The blog series ‘From the Engine Room’ is dedicated to the technical aspects and challenges of the ‘Edition der fränkischen Herrschererlasse’. Unlike our previous scholarly posts on findings and editorial insights, here we focus on the infrastructural, methodological and technological dimensions of a long-term digital project. Since 2014, we have been working on a new edition of the capitularies. Designed as a hybrid edition, the project poses particular challenges: How can we ensure the long-term availability of our research data? How can we retroactively integrate measures for implementing the FAIR principles into a work plan that was designed before they were established? How can we network effectively with other projects and infrastructures? And what role can AI play in the project? In shorter articles, we examine these questions from different perspectives: from research data management and networking strategies to technical infrastructure and the use of new technologies. In doing so, we share not only possible solutions, but also open questions and desiderata.
Introduction
This article examines the challenges of implementing the FAIR principles retrospectively in long-term projects that began before 2016. When the ‘Edition der fränkischen Herrschererlasse’ project started in 2014, the FAIR Data Principles (Findable, Accessible, Interoperable, Reusable) had not yet been formulated. It was not until 2016 that the FORCE11 community published its groundbreaking guidelines for handling research data (Wilkinson, M. D. et al. 2016), which have since also been adopted in funding programmes. Today, almost a decade later, we are faced with the challenge of retroactively adapting our project to standards that did not yet exist in this form when it was initially conceptualised.
The Dilemma of Anteriority
While newly proposed projects must submit a data management plan (DMP) and thus clarify the storage of their research data from the outset, Capitularia has not yet done so. The original work plan did not allocate resources for FAIR research data management (RDM) – simply because these requirements were not yet standard at the time of application. However, the subsequent integration of RDM measures is not only a conceptual issue, but above all a question of resources: Who will do the work? Where will the funds (and time) come from? And how can these tasks be prioritised alongside the core objectives of the project – the editorial work itself?
In this context, finding a suitable repository for the long-term provision of our research data is particularly challenging. This decision is not trivial. Only a few (certified) repositories are suitable for complex, TEI-XML-encoded edition data, consider themselves relevant or responsible for it, and are accessible to us at all. Inclusion in such a repository involves considerable effort: not only must the data be prepared in accordance with the specifications of the repository in question, but there is also the issue of the long-term costs of data storage and permanent provision. Who bears these costs after the end of the project? What costs should be expected, and are they even foreseeable? Moreover, we are not only concerned with providing the data layer itself; ideally, it should be maintained in conjunction with a presentation layer, since without this context the data is less useful or even incomprehensible to a wider audience.
For the latter problem, no satisfactory answers have yet been found – beyond statification and the reduction of functionalities – but for the other levels (e.g. the bit layer and the data layer), technical solutions at least exist. Established repositories such as Heidelberg Open Research Data (heiData), which serves the Heidelberg Digital Editions (heiEDITIONS), and the Göttingen TextGrid Repository (TextGridRep) hold edition data and offer various advantages and disadvantages compared to discipline-specific solutions. RADAR4Memory is a new repository for the historically oriented humanities, operated by FIZ Karlsruhe as a partner institution of the NFDI consortium NFDI4Memory.
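To make the idea of statification a little more concrete, the following minimal sketch (in Python, assuming the third-party packages requests and beautifulsoup4) crawls a placeholder edition site and writes the rendered HTML pages to disk as static files. It deliberately omits everything a real statification would need – asset downloads, URL rewriting, rate limiting, error handling – and the base URL and output directory are purely illustrative.

```python
"""Minimal statification sketch: mirror the rendered HTML pages of a
(placeholder) edition site into plain files that can be archived without
the original server stack. Not production code."""

from pathlib import Path
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://example.org/edition/"   # placeholder, not the real site
OUT_DIR = Path("static-snapshot")           # placeholder output directory


def local_path(url: str) -> Path:
    """Map a URL to a file path below OUT_DIR."""
    path = urlparse(url).path.lstrip("/") or "index.html"
    if not path.endswith(".html"):
        path = path.rstrip("/") + "/index.html"
    return OUT_DIR / path


def crawl(start: str) -> None:
    """Depth-first crawl of internal pages, saving each as a static file."""
    seen, queue = set(), [start]
    while queue:
        url = queue.pop()
        if url in seen or not url.startswith(BASE_URL):
            continue
        seen.add(url)
        html = requests.get(url, timeout=30).text
        target = local_path(url)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(html, encoding="utf-8")
        # follow internal links only
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            queue.append(urljoin(url, a["href"]))


if __name__ == "__main__":
    crawl(BASE_URL)
```

Such a snapshot preserves the human-readable presentation even after the original server stack can no longer be maintained, though interactive features (search, dynamic filtering) are necessarily lost – which is exactly the ‘reduction of functionalities’ mentioned above.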
Where available and appropriate, repositories provided by one's own institution may also be suitable. The University of Cologne has such a repository in the form of the Data Centre for the Humanities (DCH), which works closely with the CCeH as our technical partner and acts as the data centre for the Digital Humanities Coordination Office of the North Rhine-Westphalian Academy of Sciences, Humanities and the Arts. However, its focus is more on audiovisual data, so it is unlikely that anyone would search for ‘our’ research data there. Furthermore, the DCH did not yet exist in its current form and with its broad portfolio of services when the Capitularia project began, so no direct collaboration could be planned in advance. If a similar project were launched today, it would go without saying that the DCH and other service providers would be involved from the outset, consulted for advice, and a prospective data deposit clarified in advance.
Zenodo as a Pragmatic Interim Solution
In order to defuse the situation and postpone a final decision on a repository, we have set up the Zenodo community ‘Capitularia’. This open repository is generally accessible and is also used and recommended by the DCH as an external service. The (additional) storage of datasets and publications on Zenodo has emerged as a best practice in the digital humanities and beyond. Here, we store transcription files (in addition to the download option on our own website), presentations and scholarly blog posts accompanying the project, which are thus assigned DOIs and become citable. This solution offers several advantages, even if the fundamental question of long-term data storage remains unresolved (a short scripting sketch follows the list):
- Manageable effort and thus easy integration into existing workflows
- Independence from other services or individuals
- Possibility of versioning
- Free, sustainable storage by CERN
- Automatic DOI assignment for citability
- Visibility through linking to OpenAIRE (Open Access Infrastructure for Research in Europe)
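For routine deposits, this workflow can also be scripted against Zenodo's public REST API. The sketch below is an illustration rather than our production tooling: it assumes a personal access token in the environment variable ZENODO_TOKEN, uses placeholder file names and metadata values, and requests inclusion in the ‘capitularia’ community; Zenodo mints the DOI on publication.

```python
"""Sketch: deposit a file in the Zenodo community 'Capitularia' via the
Zenodo REST API. Token, file name and metadata are placeholders."""

import os

import requests

API = "https://zenodo.org/api"
TOKEN = os.environ["ZENODO_TOKEN"]            # personal access token (assumed)
PARAMS = {"access_token": TOKEN}

# 1. Create an empty deposition
deposition = requests.post(
    f"{API}/deposit/depositions", params=PARAMS, json={}
).json()

# 2. Upload the file to the deposition's bucket
with open("transcription.xml", "rb") as fp:   # placeholder file name
    requests.put(
        f"{deposition['links']['bucket']}/transcription.xml",
        data=fp,
        params=PARAMS,
    )

# 3. Attach minimal metadata, including the community identifier
metadata = {
    "metadata": {
        "title": "Example transcription (placeholder)",
        "upload_type": "dataset",
        "description": "TEI-XML transcription file (illustrative metadata).",
        "creators": [{"name": "Capitularia project"}],
        "communities": [{"identifier": "capitularia"}],
    }
}
requests.put(deposition["links"]["self"], params=PARAMS, json=metadata)

# 4. Publish: Zenodo mints the DOI at this step
published = requests.post(
    f"{API}/deposit/depositions/{deposition['id']}/actions/publish",
    params=PARAMS,
).json()
print("DOI:", published["doi"])
```

Note that inclusion in a community is subject to curation by its owners, and updating an already published record would go through Zenodo's versioning mechanism (a new version of the deposition) rather than a fresh deposit.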
Capitularia in Context
Through the CCeH as a technical partner, and currently also through a person employed in both structures, Capitularia is closely linked to the NFDI consortium Text+, which focuses on text- and language-based data and, in its data domain ‘editions’, is explicitly dedicated to the issues and challenges already mentioned here. Text+ offers a comprehensive range of consulting services, including on research data management and standards, develops guidelines on best practices in these areas, and organises workshops and training courses. The Text+ Registry also enables structured traceability of the project and networking with other projects at the metadata level. Repository providers likewise participate in Text+, so at least the contact persons are known. Several academies are involved as well, giving rise to the hope that this connection will enable the aforementioned problems to be tackled jointly with other academy projects, thereby creating synergies.
Initial Findings and Open Questions
The experiments and experiences to date indicate that retroactive or ongoing FAIRification is possible to a limited extent, but requires pragmatic solutions. Incremental or partial improvements seem more realistic to implement than a wholesale ad hoc conversion to a supposedly perfect, textbook RDM, which would probably be impossible to sustain alongside the editorial work and could fail for lack of resources. Documentation and transparency of decisions are essential here.
Formulating the difficulties we face and our own position – for example in this article, or in discussions at conferences and with other projects – helps us to reflect on our thinking and approaches and to define individual, feasible work packages. In general, however, questions remain unanswered: if certain aspects or parameters could not be considered and factored into the project planning from the outset, the structural question arises of how such (long-term) projects can be supported in integrating and implementing measures retrospectively to meet current standards. What sustainable financing models exist for data preservation and data provision after the end of a project, and what options should be available (for legacy projects)? The discussion about FAIR research data management in digital editions is still in its early stages – with pieces like this one, we hope to contribute a small part to the debate on this complex topic.
Daniela Schulz
References and Links:
- HeiData: https://heidata.uni-heidelberg.de/dataverse/root
- HeiEDITIONS: https://www.ub.uni-heidelberg.de/publikationsdienste/digitale_editionen.html
- HeiEDITIONS Documentation: https://heieditions.github.io/guidelines/toc.html
- Sandra König et al. (2024): FAIRes FDM für digitale Editionen: Konzept für einen Workshop im World Café-Format. Zenodo. https://doi.org/10.5281/zenodo.11618480
- Karoline Lemke et al.: Empfehlung zur Erstellung, Bearbeitung und Publikation FAIRer Forschungsdaten in der Datendomäne Editionen. https://textplus.pages.gwdg.de/textplus-editions/guidelines_sde/
- RADAR4Memory: https://radar.products.fiz-karlsruhe.de/de/radarabout/radar4memory
- Melanie Seltmann / Sandra König (2024): Text+ @ FORGE – FAIRes FDM für digitale Editionen. In: Text+ Blog. https://doi.org/10.58079/vfb4
- TextGrid Repository: https://textgridrep.org/
- Text+: Research Data Management. https://text-plus.org/themen-dokumentation/forschungsdatenmanagement/
- Wilkinson, M. D. et al. (2016): The FAIR Guiding Principles for scientific data management and stewardship. In: Scientific Data 3, 160018. https://doi.org/10.1038/sdata.2016.18
- Zenodo Community Capitularia: https://zenodo.org/communities/capitularia