Martin Paul Eve

Professor of Literature, Technology and Publishing at Birkbeck, University of London

One of the major challenges that we face in the Jisc Open Monographs Metrics Experiment is in aligning the linguistic expression of a citation with its underlying canonical citation object. That is, “M. Foucault, Discipline and Punish” refers, despite the different linguistic expression, to the same object as “Michel Foucault, Discipline and Punish: The Birth of the Prison (London: Penguin, 1992)”. This problem is greatly eased when the artefact in question has a DOI. However, because the current Digital Object Identifier (DOI) system is a supplier-side, push mechanism, it is unlikely that all cited objects will ever have a DOI. Consumers of citation data from DOI registries, therefore, depend on content creators registering their metadata. In addition, registration comes with preservation and access requirements – the PILA agreement in the case of Crossref – that are not necessarily suitable or realistic for all types of content, given that scholarly work can cite arbitrary grey literature.
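To make the alignment problem concrete, here is a minimal sketch of one possible heuristic, not the method used in the experiment: it checks how many tokens of the shorter citation appear in the fuller one, treating a single letter as a match for any word that starts with it. The function names and the 0.8 threshold are assumptions chosen for illustration.

```python
import re
import unicodedata

FULL = "Michel Foucault, Discipline and Punish: The Birth of the Prison (London: Penguin, 1992)"
SHORT = "M. Foucault, Discipline and Punish"


def tokens(citation: str) -> list[str]:
    """Lowercase, strip accents, and split into alphanumeric tokens."""
    folded = unicodedata.normalize("NFKD", citation)
    folded = "".join(ch for ch in folded if not unicodedata.combining(ch))
    return re.findall(r"[a-z0-9]+", folded.lower())


def containment(short: str, full: str) -> float:
    """Fraction of the short citation's tokens that are found in the full one.

    A one-letter token (an initial such as "M.") counts as found when any
    token of the full citation starts with that letter.
    """
    full_tokens = set(tokens(full))
    initials = {tok[0] for tok in full_tokens}
    short_tokens = tokens(short)
    if not short_tokens:
        return 0.0
    hits = sum(
        1
        for tok in short_tokens
        if tok in full_tokens or (len(tok) == 1 and tok in initials)
    )
    return hits / len(short_tokens)


def probably_same_work(short: str, full: str, threshold: float = 0.8) -> bool:
    """Hypothetical decision rule: the threshold is an arbitrary assumption."""
    return containment(short, full) >= threshold


print(probably_same_work(SHORT, FULL))
print(probably_same_work("M. Foucault, The Order of Things", FULL))
```

A heuristic like this fails easily on translations, editions, and common titles, which is why a persistent identifier in the citation itself is so much more reliable.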

One solution that could be envisaged is a crowdsourced independent metadata repository that can be linked to DOIs but that provides canonical metadata resolution functionality for cited artefacts. Good examples of such services, in another domain, are the MetaBrainz services, such as MusicBrainz. These provide “consumer”-controlled – crowdsourced – metadata records for music releases. These are queryable via an API and addressable via a canonical URL. One could imagine the same for cited objects, although an infinitely extensible open metadata schema for any kind of object (remember, data=stuff) is ambitious (impossible), to say the least. This could be linked to and defer to a DOI if one were subsequently issued.
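As a rough sketch of what the lookup side of such a service might involve, the following in-memory registry gives each record a canonical URL and defers to a DOI once one is attached. Everything here is hypothetical: the `registry.example` base URL, the class and method names, and the use of random UUIDs as identifiers are assumptions for illustration, and the DOI in the usage example is a placeholder.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

REGISTRY_BASE = "https://registry.example/work/"  # hypothetical service URL
DOI_RESOLVER = "https://doi.org/"


@dataclass
class WorkRecord:
    title: str
    authors: list[str]
    doi: Optional[str] = None  # filled in only if a DOI is later issued


class Registry:
    """Toy crowdsourced registry: records are keyed by a random identifier."""

    def __init__(self) -> None:
        self._records: dict[str, WorkRecord] = {}

    def add(self, record: WorkRecord) -> str:
        work_id = str(uuid.uuid4())
        self._records[work_id] = record
        return work_id

    def attach_doi(self, work_id: str, doi: str) -> None:
        self._records[work_id].doi = doi

    def resolve(self, work_id: str) -> str:
        """Return the DOI URL when a DOI exists, else the registry's own URL."""
        record = self._records[work_id]
        if record.doi:
            return DOI_RESOLVER + record.doi
        return REGISTRY_BASE + work_id


registry = Registry()
work_id = registry.add(WorkRecord("Discipline and Punish", ["Michel Foucault"]))
print(registry.resolve(work_id))  # registry URL, since no DOI is attached yet
registry.attach_doi(work_id, "10.5555/12345678")  # placeholder DOI, not a real record
print(registry.resolve(work_id))
```

Note that the code is the easy part: nothing in it says who keeps the records accurate or the URLs resolving.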

The challenge, though, is that this mistakes the DOI architecture for a technical lookup service. The DOI architecture provides this functionality, for sure, on a producer-side push basis. But what Crossref and other registries do with DOIs is actually more social than technical. DOIs are a contract between the content producer and the registry to keep the work available and preserved and to keep the DOI resolving, even in the event of organisational failure. In the imagined environment above, there is no compact to hold the metadata to a standard, and no body that would have any responsibility to maintain a set of canonical identifiers that could permanently find their way into the scholarly record. In other words, it reinvents the function of DOIs as a technical, rather than social, matter, and neglects the compact.

There is also the issue of sustainability. Crossref and other DOI registries have to maintain a vast computational infrastructure that processes enormous quantities of data, on both a deposit and query basis. Breaking the system, even for a few minutes, is out of the question and can have dire consequences. This requires a staff of highly competent technicians and astute critical thinkers, often converging in the same person (see: Geoff Bilder). To sustain an organisation like this requires a business model. Crossref’s business model is membership-based: an annual tiered membership fee, based on the size of the organisation’s turnover from publishing, plus a (very small) fee per DOI deposit.

In the case of the envisaged repository mentioned above, who would the members be? Certainly one could envisage a coalition of libraries supporting such infrastructural provision through initiatives such as SCOSS (the global Sustainability Coalition for Open Science Services). But in the DOI model there is consensus among publisher members that DOIs are useful and that the system should receive ongoing infrastructural support. The same cannot be said for a new consumer-controlled central metadata service.

All in all, then, the recommendation on this front from our experiment is not that such a consumer-controlled metadata service should be created – it is likely impossible to get this to work robustly in a useful way that has consensus – but to echo other recommendations from elsewhere: assign DOIs to granular scholarly objects (books, book chapters etc.) and ensure that whenever an artefact is cited, the DOI is included. This means that publisher processes for finding and inserting DOIs need to be robust; authors are incredibly unlikely to change their practices to include DOIs across the board. Given that typesetting (PDF, XML etc.) is often outsourced, this can be a tricky quality-assurance issue. Further to this, other identifier systems could be integrated into citations. ORCID iDs, for instance, could be included alongside author names to allow for robust lookup. (If it is undesirable to have such identifiers in the version for human display, then at least some form of structured representation that includes them would be helpful.) Other databases, such as GRID, could be used to identify institutions (University Presses, for example) where entries for them exist. These all come with additional cost overheads.
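A structured representation of this kind could be as simple as the sketch below, which carries an optional DOI and optional ORCID iDs alongside the human-readable fields and checks each ORCID iD's final character using the ISO 7064 MOD 11-2 checksum that ORCID iDs carry. The field names and JSON layout are assumptions rather than an existing schema; the author is the fictitious Josiah Carberry with the sample iD commonly used in ORCID's own examples, and the title and DOI are placeholders.

```python
import json
import re
from dataclasses import asdict, dataclass, field
from typing import Optional


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the 0000-0000-0000-000X shape and the final check character."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]", orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_digit(compact[:15]) == compact[15]


@dataclass
class Author:
    name: str
    orcid: Optional[str] = None


@dataclass
class Citation:
    title: str
    authors: list[Author] = field(default_factory=list)
    year: Optional[int] = None
    doi: Optional[str] = None

    def to_json(self) -> str:
        """Serialise for machine use; raises if any ORCID iD fails its checksum."""
        for author in self.authors:
            if author.orcid is not None and not is_valid_orcid(author.orcid):
                raise ValueError(f"invalid ORCID iD for {author.name}")
        return json.dumps(asdict(self), indent=2)


citation = Citation(
    title="A Placeholder Title",  # hypothetical work
    authors=[Author("Josiah Carberry", orcid="0000-0002-1825-0097")],
    year=2019,
    doi="10.5555/12345678",  # placeholder DOI, not a real record
)
print(citation.to_json())
```

The human-readable citation can then stay free of identifiers while the typeset XML carries the structured form.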