More on detecting anonymization of documents in Janeway

Janeway Dev Team
2 min read · Jan 30, 2021


Last weekend I wrote about how we were detecting anonymization of metadata in Janeway.

This week we have improved the process further. We can now run documents through pandoc to detect whether specific bits of text appear inside the file itself!

For instance, we look for specific terms: the authors’ names and institutions; the word “previous” (as in “my previous work”); and words pertaining to funding. This allows us to set a flag on each document.

The Janeway file inspector showing document status
Showing the document anonymity flags in Janeway.
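
The term check described above might look something like the following. This is a minimal sketch, not Janeway's actual code: the function names `extract_text` and `find_flagged_terms` and the sample terms are hypothetical, and it assumes the `pandoc` command-line tool is installed and that converting the document to plain text is enough to search it.

```python
import re
import subprocess


def extract_text(path):
    """Convert a document to plain text with the pandoc command-line tool.

    Assumes pandoc is installed and on the PATH. --wrap=none stops pandoc
    from hard-wrapping lines in the plain-text output.
    """
    result = subprocess.run(
        ["pandoc", path, "--to", "plain", "--wrap=none"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def find_flagged_terms(text, terms):
    """Return the terms found in the text as whole words or phrases, ignoring case."""
    # Collapse runs of whitespace so a multi-word term split over a line
    # break ("Jane\nDoe") is still found.
    normalized = " ".join(text.split())
    found = []
    for term in terms:
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, normalized, flags=re.IGNORECASE):
            found.append(term)
    return found


# Hypothetical example data; in practice the names and institutions
# would come from the submission's metadata.
terms = ["Jane Doe", "Example University", "previous", "funding"]
sample = "As shown in my previous work, funding was provided by the council."

flagged = find_flagged_terms(sample, terms)
print(flagged)  # ['previous', 'funding']
```

A document would be flagged whenever the returned list is non-empty, and the list itself is what an inspector view could show to the editor. For a real file, the text would come from `extract_text(path)` instead of the sample string.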

The editor can then go in and see what is causing the flags.

The Janeway document inspector showing a non-anonymized document
The document inspector shows us what terms came up that shouldn’t be in the document.

Tada! The danger here, of course, is that we might give a false sense of security. The check is good for flagging cases where something looks dodgy, but a document that comes back clean is not guaranteed to be anonymous.

The last step we need to take is to let editors specify which words they want to search for, so that the check can work across languages.

— Martin Paul Eve


We are Andy, Mauro, Joe and Martin and we develop the Janeway publishing platform.