New provisions in UK copyright law look promising for text and data mining. Last year, the government signed into effect an exemption to copyright for the purposes of non-commercial research. This states that:
If a researcher has the right to read a copyright document under the terms of the licensing agreement with the content provider, they must be permitted to copy the work for the purpose of non-commercial text and data mining.
Wonderful! So all those novels that are in copyright can actually be data-mined if we can get a digital copy. Except, as I discovered in a conversation with one of my Ph.D. students today, that "if" is quite a large caveat and things turn out to be less straightforward. If we have a digital copy, we can text mine it. However, if there are DRM (Digital Rights Management) restrictions on the text, we cannot remove those protections, even for the purpose of non-commercial research. Doing so would violate the Digital Millennium Copyright Act in the USA and/or Article 6 of the European Copyright Directive, and such a violation comes with severe penalties. On the other hand, if we saw the spines off the printed books and run the pages through a scanner and OCR process, that’s fine for personal research.
There is an exemption, apparently, for “Literary works distributed in e-book format when all existing e-book editions of the work (including digital text editions made available by authorized entities) contain access controls that prevent the enabling either of the book’s read-aloud function or of screen readers that render the text into a specialized format. (A renewed exemption from 2006, based on a similar exemption approved in 2003.)” But that’s no good here.
This is patently ridiculous: removing DRM for the purpose of non-commercial text and data mining should be an exemption to the DMCA in the USA and the EUCD.