TY - JOUR
T1 - Hoshi: A Japanese morphological adorner for TEI XML
AU - Bonnell, Jerry
AU - Ogihara, Mitsunori
N1 - Publisher Copyright: © 2020 The Author(s) 2020. Published by Oxford University Press on behalf of EADH. All rights reserved. For permissions, please email: journals.permissions@oup.com.
PY - 2021/4/1
Y1 - 2021/4/1
N2 - Morphological adornment of text in Text Encoding Initiative (TEI) XML can be useful for studies in textual analysis. MorphAdorner is a principal tool for providing such functionality in English texts. However, its practical use is limited when the input XML contains branching text, e.g. when appears, because MorphAdorner modifies the input document. In such cases, preprocessing is required to obtain the desired results. This article introduces a new tool, Hoshi, whose purpose is to determine how this issue can best be handled with minimal modification and preprocessing of the input. It also investigates whether parsing software available online can be used to supply morphological information that can be encoded in an output format like MorphAdorner's, and whether such a tool can be developed to adorn text in other languages. Challenges include those posed by the target language, the software currently available for morphological analysis of that language, and the schema needed for encoding the results. Moreover, technical hurdles presented by segmented and branching text can complicate the alignment process, especially when the intent is to guarantee input document integrity. Our approach to handling these challenges is presented, and the article ends by outlining future applications of Hoshi that can help to enhance TEI scholarship that prioritizes the use of morphological word metadata.
AB - Morphological adornment of text in Text Encoding Initiative (TEI) XML can be useful for studies in textual analysis. MorphAdorner is a principal tool for providing such functionality in English texts. However, its practical use is limited when the input XML contains branching text, e.g. when appears, because MorphAdorner modifies the input document. In such cases, preprocessing is required to obtain the desired results. This article introduces a new tool, Hoshi, whose purpose is to determine how this issue can best be handled with minimal modification and preprocessing of the input. It also investigates whether parsing software available online can be used to supply morphological information that can be encoded in an output format like MorphAdorner's, and whether such a tool can be developed to adorn text in other languages. Challenges include those posed by the target language, the software currently available for morphological analysis of that language, and the schema needed for encoding the results. Moreover, technical hurdles presented by segmented and branching text can complicate the alignment process, especially when the intent is to guarantee input document integrity. Our approach to handling these challenges is presented, and the article ends by outlining future applications of Hoshi that can help to enhance TEI scholarship that prioritizes the use of morphological word metadata.
UR - http://www.scopus.com/inward/record.url?scp=85126726810&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126726810&partnerID=8YFLogxK
U2 - 10.1093/llc/fqaa003
DO - 10.1093/llc/fqaa003
M3 - Article
AN - SCOPUS:85126726810
VL - 36
SP - 32
EP - 42
JO - Digital Scholarship in the Humanities
JF - Digital Scholarship in the Humanities
SN - 2055-7671
IS - 1
ER -