Hoshi: A Japanese morphological adorner for TEI XML

Jerry Bonnell, Mitsunori Ogihara

Research output: Contribution to journal › Article › peer-review

Abstract

Morphological adornment of text in Text Encoding Initiative (TEI) XML can be useful for studies in textual analysis. MorphAdorner is a principal tool for providing such functionality in English texts. However, its practical use is limited when the input XML contains branching text, e.g. when <choice> appears, because MorphAdorner modifies the input document. In such cases, preprocessing is required to obtain the desired results. This article introduces a new tool, Hoshi, with the purpose of determining how this issue can best be handled with minimal input modification and preprocessing. It also investigates whether parsing software available online can be used to supply morphological information that can be encoded in an output format like MorphAdorner's, and whether such a tool can be developed to adorn text in other languages. Challenges include those posed by the target language, the software currently available for providing morphological analysis in it, and the schema needed for encoding the results. Moreover, technical hurdles presented by segmented and branching text can complicate the alignment process, especially when the intent is to guarantee input document integrity. Our approach for handling these is presented, and the article ends by outlining future applications of Hoshi that can help to enhance TEI scholarship that prioritizes the use of morphological word metadata.

Original language: English (US)
Pages (from-to): 32-42
Number of pages: 11
Journal: Digital Scholarship in the Humanities
Volume: 36
Issue number: 1
State: Published - Apr 1 2021
Externally published: Yes

ASJC Scopus subject areas

  • Information Systems
  • Language and Linguistics
  • Linguistics and Language
  • Computer Science Applications
