{"id":32449,"date":"2026-06-26T10:43:23","date_gmt":"2026-06-26T08:43:23","guid":{"rendered":"https:\/\/capitularia.uni-koeln.de\/?p=32449"},"modified":"2026-07-01T10:32:48","modified_gmt":"2026-07-01T08:32:48","slug":"aus-dem-maschinenraum-capitularia-und-ki","status":"publish","type":"post","link":"https:\/\/capitularia.uni-koeln.de\/en\/blog\/aus-dem-maschinenraum-capitularia-und-ki\/","title":{"rendered":"From the Engine Room #3: Capitularia and AI. A recent assessment of possibilities and limitations"},"content":{"rendered":"<p style=\"padding: 10px; background-color: #d1d1d1; font-size: small;\">The blog series \u2018From the Engine Room\u2019 is dedicated to the technical aspects and challenges of the \u2018Edition der fr\u00e4nkischen Herrschererlasse\u2019. Unlike our previous scholarly posts on findings and editorial insights, here we focus on the infrastructural, methodological and technological dimensions of a long-term digital project. Since 2014, we have been working on a new edition of the capitularies. Designed as a hybrid edition, the project poses particular challenges: How can we ensure the long-term availability of our research data? How can we retroactively integrate measures implementing the FAIR principles into a work plan that was designed before those principles were established? How can we network effectively with other projects and infrastructures? And what role can AI play in the project? In a series of short articles, we examine these questions from different perspectives, from research data management and networking strategies to technical infrastructure and the use of new technologies. In doing so, we share not only possible solutions but also open questions and desiderata.<\/p>\n<h5>Introduction<\/h5>\n<p>The rapid development of AI technologies over the past two years and their growing presence at many levels raise questions: What potential do these technologies offer for editorial projects? Where are their limits? 
And where does Capitularia stand amid these tensions? This post outlines the tool stack currently used in the project and explains why its workflow currently includes no AI tools or methods, but also where we see opportunities.<sup><a id=\"fnref-1\" href=\"#fn-1\">1<\/a><\/sup><\/p>\n<h5>Capitularia Tool Stack and Workflow<\/h5>\n<p>In the capitularies project, we deliberately rely on manual transcription and post-collation (checking the transcription against the manuscript) in the <a href=\"https:\/\/www.oxygenxml.com\/\">oXygen XML Editor<\/a>, rule-based collation (based on <a href=\"https:\/\/collatex.net\/\">CollateX<\/a>), and traditional editorial standards. The reasons lie partly in the project's age: its conception and funding application date back to a time when many of today's tools, some of which have since become well established, did not yet exist or were still in their infancy. More importantly, however, the reasons are rooted in the material being edited. The way the capitularies have been transmitted explains why tools used successfully in other projects reach their limits here:<\/p>\n<ul>\n<li>The capitulary texts are usually comparatively short and are scattered across collections of heterogeneous content. They were often written by several hands in close succession.<\/li>\n<li>Although nearly 400 manuscripts constitute a substantial body of witnesses, the breadth of transmission varies greatly from one capitulary to the next, and some survive only in fragmentary form.<\/li>\n<li>Capitularies are normative texts containing regulations and provisions, not narrative texts with numerous references to persons or places.<\/li>\n<li>For cost reasons, the project uses existing digitizations wherever they are available. 
However, this sometimes means relying on older black-and-white scans rather than high-resolution images.<\/li>\n<\/ul>\n<p><em>Handwritten Text Recognition<\/em> (HTR) with tools such as <a href=\"https:\/\/readcoop.eu\/transkribus\/\" target=\"_blank\" rel=\"noopener\">Transkribus<\/a>, however, works best with large quantities of consistent training data and with longer, coherent texts written by a small number of scribes. Most capitularies are written in Carolingian minuscule, a script for which a Transkribus model exists. However, the transmission also includes considerably earlier as well as considerably later witnesses. Overall, applying existing models to these short and diversely transmitted texts would most likely yield comparatively high error rates, so correcting the output would probably take longer than transcribing from scratch. Moreover, high-resolution digitizations are not available for all manuscripts, and image quality strongly affects how many letters and abbreviations are recognized correctly. For these reasons, the current workflow appears to have no viable alternative: manual transcription followed by two complete post-collations by trained staff (six-eyes principle). 
The philological precision and quality achieved in this way could not be matched by HTR without substantial checking and revision.<\/p>\n<div id=\"attachment_32804\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/capitularia.uni-koeln.de\/wp-content\/uploads\/2026\/06\/WorkflowDigEd.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-32804\" class=\"wp-image-32804 size-large\" src=\"https:\/\/capitularia.uni-koeln.de\/wp-content\/uploads\/2026\/06\/WorkflowDigEd-1024x216.png\" alt=\"\" width=\"1024\" height=\"216\" srcset=\"https:\/\/capitularia.uni-koeln.de\/wp-content\/uploads\/2026\/06\/WorkflowDigEd-1024x216.png 1024w, https:\/\/capitularia.uni-koeln.de\/wp-content\/uploads\/2026\/06\/WorkflowDigEd-300x63.png 300w, https:\/\/capitularia.uni-koeln.de\/wp-content\/uploads\/2026\/06\/WorkflowDigEd-768x162.png 768w, https:\/\/capitularia.uni-koeln.de\/wp-content\/uploads\/2026\/06\/WorkflowDigEd-1536x323.png 1536w, https:\/\/capitularia.uni-koeln.de\/wp-content\/uploads\/2026\/06\/WorkflowDigEd.png 1672w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><p id=\"caption-attachment-32804\" class=\"wp-caption-text\">Workflow for the Digital Edition (own illustration; created with the help of ChatGPT)<\/p><\/div>\n<p>Another method now well established in editorial projects is <em>Named Entity Recognition<\/em> (NER), which facilitates the creation of indexes and, in turn, the linking of authority data. As noted, however, capitularies rarely name specific persons or places, apart from the ruler(s) concerned. This is compounded by the difficulty of identifying historical place names geographically. Which person, place, or region is meant can only be determined with in-depth historical expertise. 
The identification and disambiguation of <em>named entities<\/em> therefore remains an intellectual, editorial task that cannot be automated in our context.<\/p>\n<p>Even applications that promise added value or time savings warrant critical scrutiny before adoption. After twelve years, we have proven, well-established workflows. Integrating new tools retrospectively would require considerable effort for rather uncertain benefits. There are also broader scholarly and ecological concerns. AI tools do not guarantee transparency or reproducibility, and it often remains unclear how a given output came about. Training and running large models also consumes considerable energy. Moreover, the provenance and quality of the data underlying commercial models are potentially questionable. Relying on such tools also creates dependencies that a project naturally wants to avoid. Universities often prohibit the use of commercial models and providers outright. The open alternatives that must be considered instead are comparatively limited and lag behind the major players in output quality. Access to the necessary computing capacity, for instance for training our own models, is also frequently restricted.<\/p>\n<h5>Potential Use Cases for AI Applications and Methods<\/h5>\n<p>The following section explores where we see opportunities for using AI. As a matter of principle, we only consider applications that follow the <em>human-in-the-loop<\/em> approach, in which people remain embedded in the work process, check outputs for errors, and retain decision-making authority. 
Such applications can be assigned to three levels:<\/p>\n<ul>\n<li>Use in the editorial workflow (tools)<\/li>\n<li>Use in interaction with users (chatbot)<\/li>\n<li>Provision of AI-ready data<\/li>\n<\/ul>\n<h6>CTE2TEI<\/h6>\n<p>The critical edition is created in the <a href=\"https:\/\/cte.oeaw.ac.at\/\" target=\"_blank\" rel=\"noopener\">Classical Text Editor (CTE)<\/a> on the basis of the manuscript transcriptions encoded in TEI\/XML. The <a href=\"https:\/\/capitularia.uni-koeln.de\/blog\/kollationstool\/\" target=\"_blank\" rel=\"noopener\">collation tool<\/a> based on <em>CollateX<\/em> supports the constitution of the text by dynamically displaying deviations and variants between the manuscripts. <em>CollateX<\/em> itself is rule-based: it aligns the witnesses algorithmically rather than by means of machine learning. In the CTE, the editors of the individual texts produce the critical text, the variant apparatus, the explanatory notes, and the translation. In doing so, they prepare the printed edition in the MGH&#8217;s <em>Leges<\/em> series as fully as possible.<\/p>\n<div id=\"attachment_32810\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/capitularia.uni-koeln.de\/wp-content\/uploads\/2026\/06\/PrintEdWf.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-32810\" class=\"wp-image-32810 size-large\" src=\"https:\/\/capitularia.uni-koeln.de\/wp-content\/uploads\/2026\/06\/PrintEdWf-1024x137.png\" alt=\"\" width=\"1024\" height=\"137\" srcset=\"https:\/\/capitularia.uni-koeln.de\/wp-content\/uploads\/2026\/06\/PrintEdWf-1024x137.png 1024w, https:\/\/capitularia.uni-koeln.de\/wp-content\/uploads\/2026\/06\/PrintEdWf-300x40.png 300w, https:\/\/capitularia.uni-koeln.de\/wp-content\/uploads\/2026\/06\/PrintEdWf-768x103.png 768w, https:\/\/capitularia.uni-koeln.de\/wp-content\/uploads\/2026\/06\/PrintEdWf-1536x206.png 1536w, https:\/\/capitularia.uni-koeln.de\/wp-content\/uploads\/2026\/06\/PrintEdWf.png 1557w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><p 
id=\"caption-attachment-32810\" class=\"wp-caption-text\">Workflow for the Print Edition (own illustration; created with the help of ChatGPT)<\/p><\/div>\n<p>After an embargo period, the MGH prepares the printed edition for its online platform <a href=\"https:\/\/www.dmgh.de\/\" target=\"_blank\" rel=\"noopener\">dmgh.de<\/a>. It would of course be desirable for the critical text and translation to be available and searchable on the Capitularia website as well. This would link the digital and printed editions far more closely. The CTE does offer export formats, but these have occasionally proven lossy and require laborious checking. Working directly with the CTE file itself would eliminate the risk of losing information when transforming to export formats. Building on this, AI-assisted re-encoding, for example with a transformer model, could considerably accelerate the digital publication of the critical text without compromising editorial standards. Since the files follow a uniform template, recognizing apparatus entries and transferring them into the corresponding TEI structure (<code>&lt;app&gt;<\/code>, <code>&lt;lem&gt;<\/code>, <code>&lt;rdg&gt;<\/code>) should be straightforward. The same applies to emendations. This is not &#8220;creative&#8221; work but rule-governed re-encoding, so editorial standards would not be put at risk. 
Initial experiments starting from page images of the printed edition are quite promising.<\/p>\n<p>A potential workflow could be structured as follows:<\/p>\n<p style=\"text-align: center;\">CTE export (critical text)<br \/>\n\u2193<br \/>\nSpecialized transformer model (trained on 20\u201330 already encoded examples)<br \/>\n\u2193<br \/>\nTEI-XML (automatically re-encoded)<br \/>\n\u2193<br \/>\nManual review by editors<br \/>\n\u2193<br \/>\nDigital publication + web edition<\/p>\n<h6 style=\"text-align: left;\">Further Scenarios<\/h6>\n<p>Beyond re-encoding, several other areas merit consideration:<\/p>\n<ul>\n<li>At the level of user interaction, a chatbot based on retrieval-augmented generation (RAG) could answer questions across the corpus, such as &#8220;Which capitularies regulate monastic property?&#8221;. It would serve as an intelligent search interface that complements the existing full-text search across the texts edited so far. The related project &#8220;<a href=\"https:\/\/burchards-dekret-digital.de\/index.html\" target=\"_blank\" rel=\"noopener\">Burchard&#8217;s Decree Digital<\/a>&#8221;, for example, has considered how such a chatbot might be implemented.<sup><a id=\"fnref-2\" href=\"#fn-2\">2<\/a><\/sup><\/li>\n<li>Language models could help present collation results or variant histories in a more accessible way. They could also make collections or clusters of capitularies within the manuscripts more visible. 
Exploratory visualizations might help editors test hypotheses about the distribution of, and dependencies among, the textual witnesses. For general users, such visualizations could help communicate findings more clearly.<\/li>\n<li>From the &#8220;provider&#8221; perspective, transcription and edition data could be prepared in an AI-ready format. This would make them available for diverse applications and research questions (for instance in Natural Language Processing, NLP). They could thereby also become part of the knowledge graphs currently taking shape. Here, however, the question of responsibility arises. We already make our data available for reuse under an appropriate licence, e.g. via <a href=\"https:\/\/zenodo.org\/communities\/capitularia\/records?q=&amp;f=resource_type%3Adataset&amp;l=list&amp;p=1&amp;s=10&amp;sort=newest\" target=\"_blank\" rel=\"noopener\">Zenodo<\/a>. Is it also our task to prepare the data for downstream use (i.e. to make them AI-ready), or does that responsibility rest with those who wish to use them? And what would such preparation concretely entail? It should be noted that our project plan includes no corresponding work packages, so no resources are allocated for this. On the other hand, an academy project also aspires to take on a pioneering role where possible. 
Such preparation could perhaps also be organized jointly across academy projects. This would share the burden rather than placing it on any single project, and it would create synergies.<\/li>\n<\/ul>\n<div id=\"attachment_32809\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/capitularia.uni-koeln.de\/wp-content\/uploads\/2026\/06\/KIenhanced.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-32809\" class=\"wp-image-32809 size-large\" src=\"https:\/\/capitularia.uni-koeln.de\/wp-content\/uploads\/2026\/06\/KIenhanced-1024x137.png\" alt=\"\" width=\"1024\" height=\"137\" srcset=\"https:\/\/capitularia.uni-koeln.de\/wp-content\/uploads\/2026\/06\/KIenhanced-1024x137.png 1024w, https:\/\/capitularia.uni-koeln.de\/wp-content\/uploads\/2026\/06\/KIenhanced-300x40.png 300w, https:\/\/capitularia.uni-koeln.de\/wp-content\/uploads\/2026\/06\/KIenhanced-768x103.png 768w, https:\/\/capitularia.uni-koeln.de\/wp-content\/uploads\/2026\/06\/KIenhanced.png 1492w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><p id=\"caption-attachment-32809\" class=\"wp-caption-text\">Potential AI modules for expanding the workflow (own illustration; created with the help of ChatGPT)<\/p><\/div>\n<h5>Summary<\/h5>\n<p>We currently forgo AI applications not on principle but for pragmatic reasons. The manuscript tradition, our source material, and the existing workflows make common tools such as Transkribus uneconomical for us. At the same time, we recognize genuine opportunities. 
Above all, re-encoding critical texts with AI models appears promising. Extended search capabilities, visualizations, and the preparation of AI-ready data are further areas of interest.<\/p>\n<p>It does not seem productive to choose between &#8220;yes to AI&#8221; and &#8220;no to AI&#8221;. Rather, for each task we need to assess anew whether there is a genuine benefit and whether deployment can be justified scientifically, in terms of resources, and ecologically. Where we do use AI, we do so selectively and transparently. Designing infographics and visualizations for presentations, for instance, has become considerably easier with the new tools.<\/p>\n<hr \/>\n<p><a id=\"fn-1\"><\/a>[1] This post is based on a talk given at the workshop &#8220;Digital Editions of Contemporary History between AI and Linked Open Data&#8221; (Berlin, 4\u20135 December 2025) of the Commission for the History of Parliamentarism and Political Parties (KGParl). A PDF version of the presentation is available on <a href=\"https:\/\/doi.org\/10.5281\/zenodo.17895907\" target=\"_blank\" rel=\"noopener\">Zenodo<\/a>.<a href=\"#fnref-1\">\u2191<\/a><\/p>\n<p><a id=\"fn-2\"><\/a>[2] Cf. Daniela Schulz (8 May 2025). Edit recommends #3: Burchard&#8217;s Decree Digital (Resource Roundup Special). <em>Text+ Blog<\/em>. Retrieved 25 June 2026 from <a href=\"https:\/\/doi.org\/10.58079\/13w6w\">https:\/\/doi.org\/10.58079\/13w6w<\/a><a href=\"#fnref-2\">\u2191<\/a><\/p>\n<h5>Further reading on the topic:<\/h5>\n<div id=\"caption\">\n<div id=\"info2\">\n<ul>\n<li>Gerrit Br\u00fcning: Digitale Editionen von Goethes Werken seit 1998. Bilanz und Perspektiven in Zeiten Generativer KI. 
In: Daniela Schulz \/ Marcus Baumgarten \/ Torsten Scha\u00dfan (eds.): Digitales Edieren gestern, heute und morgen (= Zeitschrift f\u00fcr digitale Geisteswissenschaften \/ Sonderb\u00e4nde, 7). Wolfenb\u00fcttel 2025. 30.12.2025. HTML \/ XML \/ PDF. DOI: <a href=\"https:\/\/doi.org\/10.17175\/sb007_002\">10.17175\/sb007_002<\/a><\/li>\n<li>Christopher Pollin\u00a0\/ Franz Fischer\u00a0\/ Patrick Sahle\u00a0\/ Martina Scholger\u00a0\/ Georg Vogeler: When it was 2024\u00a0\u2013 Generative AI in the Field of Digital Scholarly Editions. In: Zeitschrift f\u00fcr digitale Geisteswissenschaften 10 (2025). 10.07.2025. HTML\u00a0\/ XML\u00a0\/ PDF. DOI:\u00a0<a href=\"https:\/\/doi.org\/10.17175\/2025_008\">10.17175\/2025_008<\/a><\/li>\n<li>Michael Schonhardt: Do One Thing and Do It Well. Vier Prinzipien einer digitalen Editionspraxis im Spannungsfeld zwischen fachlichen Standards und Deep Learning. In: Daniela Schulz\u00a0\/ Marcus Baumgarten\u00a0\/ Torsten Scha\u00dfan (eds.): Digitales Edieren gestern, heute und morgen (=\u00a0Zeitschrift f\u00fcr digitale Geisteswissenschaften \/ Sonderb\u00e4nde, 7). Wolfenb\u00fcttel 2025\u20132026. 05.03.2026. HTML\u00a0\/ XML\u00a0\/ PDF. DOI:\u00a0<a href=\"https:\/\/doi.org\/10.17175\/sb007_005\">10.17175\/sb007_005<\/a><\/li>\n<li>Daniela Schulz: Potenziale und Herausforderungen des (digitalen) Edierens. Die \u201eEdition der fr\u00e4nkischen Herrschererlasse\u201c. In: DigiTRiP (2025). DOI: <a href=\"https:\/\/doi.org\/10.58079\/13r7m\">10.58079\/13r7m<\/a><\/li>\n<\/ul>\n<\/div>\n<\/div>\n       <div class=\"cite_as\">\n         <h5>How to cite<\/h5>\n         <div>\n           <span class=\"author\">Daniela Schulz<\/span>,\n           <span class=\"title\">From the Engine Room #3: Capitularia and AI. 
A recent assessment of possibilities and limitations<\/span>,\n           in: Capitularia. Edition of the Frankish Capitularies, ed. by\n           Karl Ubl and collaborators, Cologne 2014 ff.\n           \n           URL: https:\/\/capitularia.uni-koeln.de\/en\/blog\/aus-dem-maschinenraum-capitularia-und-ki\/ (accessed on 07\/21\/2026)\n         <\/div>\n       <\/div>","protected":false},"excerpt":{"rendered":"<p>The blog series \u2018From the Engine Room\u2019 is dedicated to the technical aspects and challenges of the \u2018Edition der fr\u00e4nkischen Herrschererlasse\u2019. Unlike our previous scholarly posts on findings and editorial insights, here we focus on the infrastructural, methodological and technological dimensions of a long-term digital project. Since 2014, we have been working on a new [&hellip;]<\/p>\n","protected":false},"author":44,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[186],"tags":[],"class_list":["post-32449","post","type-post","status-publish","format-standard","hentry","category-from-the-engine-room"],"_links":{"self":[{"href":"https:\/\/capitularia.uni-koeln.de\/en\/wp-json\/wp\/v2\/posts\/32449","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/capitularia.uni-koeln.de\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/capitularia.uni-koeln.de\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/capitularia.uni-koeln.de\/en\/wp-json\/wp\/v2\/users\/44"}],"replies":[{"embeddable":true,"href":"https:\/\/capitularia.uni-koeln.de\/en\/wp-json\/wp\/v2\/comments?post=32449"}],"version-history":[{"count":12,"href":"https:\/\/capitularia.uni-koeln.de\/en\/wp-json\/wp\/v2\/posts\/32449\/revisions"}],"predecessor-version":[{"id":32811,"href":"https:\/\/capitularia.uni-koeln.de\/en\/wp-json\/wp\/v2\/posts\/32449\/revisions\/32811"}],"wp:attachment":[{"href":"h
ttps:\/\/capitularia.uni-koeln.de\/en\/wp-json\/wp\/v2\/media?parent=32449"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/capitularia.uni-koeln.de\/en\/wp-json\/wp\/v2\/categories?post=32449"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/capitularia.uni-koeln.de\/en\/wp-json\/wp\/v2\/tags?post=32449"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}