More on detecting anonymization of documents in Janeway
Last weekend I wrote about how we were detecting anonymization of metadata in Janeway.
This week we improved the process further: we can now run documents through pandoc to detect whether specific bits of text appear inside the file itself!
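A minimal sketch of the pandoc step follows. It assumes pandoc is installed on the server and that submissions are in a format pandoc can read, such as .docx or .odt (pandoc cannot read PDF). The function names `pandoc_command` and `extract_text` are hypothetical and are not Janeway's actual code.

```python
import shutil
import subprocess
from typing import List


def pandoc_command(path: str) -> List[str]:
    """Build the pandoc command that converts a document to plain text.

    pandoc infers the input format from the file extension (e.g. .docx, .odt)
    and, when no -o option is given, writes the converted text to stdout.
    """
    return ["pandoc", "--to", "plain", path]


def extract_text(path: str) -> str:
    """Run pandoc on the file and return its contents as plain text."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    result = subprocess.run(
        pandoc_command(path),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as str rather than bytes
        check=True,           # raise CalledProcessError if pandoc fails
    )
    return result.stdout
```

Converting to plain text first means the same term search can run regardless of whether the author uploaded Word or LibreOffice files.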
So, for instance, we look for specific terms: the authors’ names and institutions; the word “previous” (as in “my previous work”); and words relating to “funding”. This lets us set a flag on each document in which these terms appear.
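The term check itself could look something like the sketch below. `build_terms` and `find_flagged_terms` are hypothetical helpers, the funding-related words are illustrative guesses rather than the list Janeway uses, and matching is case-insensitive on whole words, so “funding” will not match “funded” unless both are listed.

```python
import re
from typing import Iterable, List


def build_terms(author_names: Iterable[str], institutions: Iterable[str]) -> List[str]:
    """Combine submission metadata with fixed words that often de-anonymize.

    The fixed words below are hypothetical examples for illustration.
    """
    return [*author_names, *institutions, "previous", "funding", "funded", "grant"]


def find_flagged_terms(text: str, terms: Iterable[str]) -> List[str]:
    """Return, in order, the terms that occur in text as whole words (any case)."""
    found = []
    for term in terms:
        # re.escape stops names like "St. John's" being read as regex syntax;
        # \b anchors stop a term matching inside a longer word.
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


text = "This builds on my previous work at Example University, funded by a grant."
terms = build_terms(["Jane Doe"], ["Example University"])
flags = find_flagged_terms(text, terms)
is_flagged = bool(flags)  # the per-document flag
```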
The editor can then go in and see what is causing the flags.
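One way to show the editor what caused a flag is to return a short snippet of surrounding text for each match. This is only a sketch; `match_contexts` and its `width` parameter are hypothetical names chosen for illustration.

```python
import re
from typing import List


def match_contexts(text: str, term: str, width: int = 30) -> List[str]:
    """Return a snippet of up to `width` characters either side of each match."""
    snippets = []
    pattern = r"\b" + re.escape(term) + r"\b"
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        # Flatten line breaks so each snippet displays on one line.
        snippets.append(text[start:end].replace("\n", " "))
    return snippets
```

For example, `match_contexts("As shown in my previous work, the method works.", "previous", width=5)` returns `["n my previous work"]`.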
Tada! The danger here, of course, is that we might give a false sense of security. The check is good at flagging cases where something looks dodgy, but a document it reports as clean is not guaranteed to be properly anonymized.
The last step we need to take is to let editors specify which words to search for, so that we can abstract this out across languages.
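Editor-defined word lists could be kept per language and merged with some defaults. In this sketch the `DEFAULT_TERMS` table, its German entries and `terms_for` are all hypothetical; in practice the lists would presumably live in per-journal settings rather than in code.

```python
from typing import Dict, List, Optional

# Hypothetical defaults keyed by language code; editors would extend these.
DEFAULT_TERMS: Dict[str, List[str]] = {
    "en": ["previous", "funding", "grant"],
    "de": ["frühere", "Förderung"],
}


def terms_for(
    language: str,
    editor_terms: Optional[List[str]] = None,
    fallback: str = "en",
) -> List[str]:
    """Merge default terms for a language with editor-supplied ones.

    Unknown languages fall back to the `fallback` defaults. Duplicates are
    dropped case-insensitively, keeping the first occurrence.
    """
    base = DEFAULT_TERMS.get(language, DEFAULT_TERMS[fallback])
    seen = set()
    result = []
    for term in [*base, *(editor_terms or [])]:
        key = term.casefold()
        if key not in seen:
            seen.add(key)
            result.append(term)
    return result
```

Merging rather than replacing keeps a baseline check in place even if an editor has not configured anything for their language yet.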
— Martin Paul Eve