OU Portal
Publication activity
Record type:
Proceedings paper (D)
Home Department:
Department of Czech Language (25300)
Title:
The SIGMORPHON 2022 Shared Task on Morpheme Segmentation
Citation
Batsuren, K., Bella, G., Arora, A., Martinovic, V., Gorman, K., Žabokrtský, Z., Ganbold, A., Dohnalová, Š., Ševčíková, M., Pelegrinová, K., Giunchiglia, F., Cotterell, R. and Vylomova, E. The SIGMORPHON 2022 Shared Task on Morpheme Segmentation.
In:
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 2022-07-14, Seattle, Washington.
Seattle, Washington: Association for Computational Linguistics, 2022. pp. 103-116. ISBN 978-1-955917-82-7.
Subtitle
Publication year:
2022
Field:
Number of pages:
14
Page from:
103
Page to:
116
Form of publication:
Electronic version
ISBN code:
978-1-955917-82-7
ISSN code:
Proceedings title:
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
Proceedings:
International
Publisher name:
Association for Computational Linguistics
Place of publishing:
Seattle, Washington
Country of Publication:
Proceedings published abroad
Conference name:
Conference venue:
Seattle, Washington
Conference start date:
Event type by nationality of participants:
Worldwide event
WoS code:
EID:
2-s2.0-85139172965
Key words in English:
morpheme segmentation, tokenization, Czech, English, Spanish, Hungarian, French, Italian, Russian, Latin, Mongolian
Annotation in original language:
The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections. Subtask 1, word-level morpheme segmentation, covered 5 million words in 9 languages (Czech, English, Spanish, Hungarian, French, Italian, Russian, Latin, Mongolian) and received 13 system submissions from 7 teams; the best system averaged 97.29% F1 score across all languages, ranging from English (93.84%) to Latin (99.38%). Subtask 2, sentence-level morpheme segmentation, covered 18,735 sentences in 3 languages (Czech, English, Mongolian), received 10 system submissions from 3 teams, and the best systems outperformed all three state-of-the-art subword tokenization methods (BPE, ULM, Morfessor2) by 30.71% absolute. To facilitate error analysis and support any type of future studies, we released all system predictions, the evaluation script, and all gold standard datasets.
Annotation in English:
The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections. Subtask 1, word-level morpheme segmentation, covered 5 million words in 9 languages (Czech, English, Spanish, Hungarian, French, Italian, Russian, Latin, Mongolian) and received 13 system submissions from 7 teams; the best system averaged 97.29% F1 score across all languages, ranging from English (93.84%) to Latin (99.38%). Subtask 2, sentence-level morpheme segmentation, covered 18,735 sentences in 3 languages (Czech, English, Mongolian), received 10 system submissions from 3 teams, and the best systems outperformed all three state-of-the-art subword tokenization methods (BPE, ULM, Morfessor2) by 30.71% absolute. To facilitate error analysis and support any type of future studies, we released all system predictions, the evaluation script, and all gold standard datasets.
References
Reference
R01:
RIV/61988987:17250/22:A2302FXH