Martin Paul Eve

Professor of Literature, Technology and Publishing at Birkbeck, University of London

New provisions in UK copyright law look promising for text and data mining. Last year, the government signed into effect an exemption to copyright for the purposes of non-commercial research. This states that:

If a researcher has the right to read a copyright document under the terms of the licensing agreement with the content provider, they must be permitted to copy the work for the purpose of non-commercial text and data mining.

Wonderful! So all those novels that are in copyright can actually be data-mined if we can get a digital copy. Except, as I discovered in a conversation with one of my Ph.D. students today, that "if" is quite a large caveat, and it turns out not to be so straightforward. If we have a digital copy, we can text mine it. However, if there are DRM (Digital Rights Management) restrictions on the text, we cannot remove those protections, even for the purpose of non-commercial research. Doing so would violate the Digital Millennium Copyright Act in the USA and/or Article 6 of the European Copyright Directive, both of which carry severe penalties. On the other hand, if we saw the spines off the books and run the pages through a scanner and an OCR process, that’s fine for personal research.

There is an exemption, apparently, for “Literary works distributed in e-book format when all existing e-book editions of the work (including digital text editions made available by authorized entities) contain access controls that prevent the enabling either of the book’s read-aloud function or of screen readers that render the text into a specialized format. (A renewed exemption from 2006, based on a similar exemption approved in 2003.)” But that’s no good here.

This is patently ridiculous, and there should be an exemption for non-commercial text and data mining in both the DMCA in the USA and the EUCD.