GT Set for Dresdner Hofdiarium 1665 (#168)

buccacoronatus · alix-tz · web-flow · commit c8c858218728 · 2025-01-27T12:13:18.000-05:00
* Create ground-truth-set-for-handwritten-text-recognition-htr-ocr-dresdner-hofdiarium-1665-mscrdresdk80-17th-century-kurrent-manuscript.yml

Hi! 

Just wanted to contribute! I hope everything is set up correctly. I only just discovered your GitHub repository, so I may have forgotten something.

Feel free to reach out! 

Stefan

* updated licence field

---------

Co-authored-by: Alix Chagué <33317799+alix-tz@users.noreply.github.com>
diff --git a/catalog/ground-truth-set-for-handwritten-text-recognition-htr-ocr-dresdner-hofdiarium-1665-mscrdresdk80-17th-century-kurrent-manuscript/ground-truth-set-for-handwritten-text-recognition-htr-ocr-dresdner-hofdiarium-1665-mscrdresdk80-17th-century-kurrent-manuscript.yml b/catalog/ground-truth-set-for-handwritten-text-recognition-htr-ocr-dresdner-hofdiarium-1665-mscrdresdk80-17th-century-kurrent-manuscript/ground-truth-set-for-handwritten-text-recognition-htr-ocr-dresdner-hofdiarium-1665-mscrdresdk80-17th-century-kurrent-manuscript.yml
@@ -0,0 +1,80 @@
+schema: https://htr-united.github.io/schema/2023-06-27/schema.json
+title: >-
+  Ground Truth Set for Handwritten Text Recognition (HTR/OCR): Dresdner
+  Hofdiarium 1665 (Mscr.Dresd.K.80) - 17th century Kurrent manuscript
+url: https://doi.org/10.5281/zenodo.14356190
+authors:
+  - name: Stefan
+    surname: Beckert
+    orcid: 0009-0005-2394-0075
+    roles:
+      - transcriber
+      - aligner
+      - project-manager
+      - quality-control
+institutions: []
+description: >-
+  This dataset contains ten pages of Ground Truth from the Dresden Court Diaries
+  of elector Johann Georg II. as Page XML, Alto XML and jpg. 
+language:
+  - deu
+production-software: eScriptorium + Kraken
+automatically-aligned: false
+script:
+  - iso: Latn
+    qualify: Kurrent
+script-type: only-manuscript
+time:
+  notBefore: '1665'
+  notAfter: '1665'
+hands:
+  count: '1'
+  precision: exact
+license:
+  name: CC-BY-NC-SA 4.0
+  url: https://creativecommons.org/licenses/by/4.0/
+format: Alto-XML
+sources:
+  - reference: >-
+      Beckert, S. (2024). Ground Truth Set for Handwritten Text Recognition
+      (HTR/OCR): Dresdner Hofdiarium 1665 (Mscr.Dresd.K.80) - 17th century
+      Kurrent manuscript [Data set]. Zenodo.
+      https://doi.org/10.5281/zenodo.14356190
+    link: ''
+volume:
+  - metric: pages
+    count: 10
+transcription-guidelines: >-
+  Transcription guidelines are oriented on the DTABF-M schema
+  (https://www.deutschestextarchiv.de/doku/basisformat/manuskript.html), but
+  have been adapted as follows:
+
+
+  - I and J majuscules are not distinguished
+
+  - u and v are reproduced true to the original (e.g. vnd)
+
+  - Long-s (ſ) and round-s (s) are distinguished
+
+  - sz ligature is rendered as ß in Kurrent scripts and as sz (e.g. "Libusza")
+  in Antiqua scripts
+
+  - ij ligature is rendered as y
+
+  - other ligatures, if they occur at all, are dissolved
+
+  - r graphemes are rendered as r in their modern day form
+
+  - an m with a nasal stroke was rendered as a simple m
+
+  - Where possible, abbreviation signs (Abbrechungszeichen) for the contemporary
+  identification of abbreviations have been included as single letters and not
+  marked separately. The subsequent punctuation mark (“.” or “:”) for further
+  identification of the abbreviation has also been included (cf. also Cappelli,
+  1928, Lexicon abbreviaturarum I, p.X) 
+
+  - Diacritics in u were not marked
+
+  - In the case of uncertain capitalization, an approximation is sought via the
+  letter size
+
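
A few of the transcription guidelines above are mechanical enough to sketch in code. The following is a minimal illustration only, not part of the dataset or of any HTR-United tooling. The function name `normalize_line` is hypothetical, and the Unicode encodings are assumptions: the ij ligature as U+0133/U+0132, the nasal stroke over m as a combining macron (U+0304), and the diacritic over u as a combining breve (U+0306). Umlauts are deliberately left alone, and long s (ſ) is kept distinct from round s.

```python
import unicodedata

# Assumed encodings; the dataset itself may represent these signs differently.
COMBINING_MACRON = "\u0304"  # assumed nasal stroke over m
COMBINING_BREVE = "\u0306"   # assumed diacritic (u-hook) over u
IJ_LIGATURES = {"\u0133": "y", "\u0132": "Y"}  # ij ligature rendered as y


def normalize_line(text: str) -> str:
    """Apply a small subset of the guidelines to one transcription line."""
    out = []
    base = None  # last non-combining character seen
    # NFD exposes combining marks of precomposed letters such as U+016D.
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            drop_on_u = base in {"u", "U"} and ch == COMBINING_BREVE
            drop_on_m = base in {"m", "M"} and ch == COMBINING_MACRON
            if drop_on_u or drop_on_m:
                continue  # guideline: these marks are not transcribed
            out.append(ch)  # any other mark (e.g. umlaut) is kept
            continue
        base = ch
        out.append(IJ_LIGATURES.get(ch, ch))
    # Recompose so that kept marks end up as precomposed letters again.
    return unicodedata.normalize("NFC", "".join(out))
```

For example, `normalize_line("ſeĳn")` keeps the long s and replaces the ligature, while an umlauted u passes through unchanged because only the breve is dropped.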