Publication activity
Record type:
article in proceedings (D)
Home department:
Department of Czech Language (25300)
Title:
The SIGMORPHON 2022 Shared Task on Morpheme Segmentation
Citation
Batsuren, K., Bella, G., Arora, A., Martinovic, V., Gorman, K., Žabokrtský, Z., Ganbold, A., Dohnalová, Š., Ševčíková, M., Pelegrinová, K., Giunchiglia, F., Cotterell, R. and Vylomova, E. The SIGMORPHON 2022 Shared Task on Morpheme Segmentation.
In:
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology 2022-07-14 Seattle, Washington.
Seattle, Washington: Association for Computational Linguistics, 2022. pp. 103-116. ISBN 978-1-955917-82-7.
Subtitle
Year of publication:
2022
Field:
Number of pages:
14
First page:
103
Last page:
116
Form of publication:
Electronic version
ISBN:
978-1-955917-82-7
ISSN:
Proceedings title:
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
Proceedings:
International
Publisher:
Association for Computational Linguistics
Place of publication:
Seattle, Washington
Country of publication:
Proceedings published abroad
Conference name:
Conference location:
Seattle, Washington
Conference start date:
Event type by nationality of participants:
Worldwide event
UT WoS code:
EID:
2-s2.0-85139172965
Keywords (English):
morpheme segmentation, tokenization, Czech, English, Spanish, Hungarian, French, Italian, Russian, Latin, Mongolian
Description in the original language:
The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections. Subtask 1, word-level morpheme segmentation, covered 5 million words in 9 languages (Czech, English, Spanish, Hungarian, French, Italian, Russian, Latin, Mongolian) and received 13 system submissions from 7 teams, and the best system averaged 97.29% F1 score across all languages, ranging from English (93.84%) to Latin (99.38%). Subtask 2, sentence-level morpheme segmentation, covered 18,735 sentences in 3 languages (Czech, English, Mongolian), received 10 system submissions from 3 teams, and the best systems outperformed all three state-of-the-art subword tokenization methods (BPE, ULM, Morfessor2) by 30.71% absolute. To facilitate error analysis and support any type of future studies, we released all system predictions, the evaluation script, and all gold standard datasets.
Description in English:
The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections. Subtask 1, word-level morpheme segmentation, covered 5 million words in 9 languages (Czech, English, Spanish, Hungarian, French, Italian, Russian, Latin, Mongolian) and received 13 system submissions from 7 teams, and the best system averaged 97.29% F1 score across all languages, ranging from English (93.84%) to Latin (99.38%). Subtask 2, sentence-level morpheme segmentation, covered 18,735 sentences in 3 languages (Czech, English, Mongolian), received 10 system submissions from 3 teams, and the best systems outperformed all three state-of-the-art subword tokenization methods (BPE, ULM, Morfessor2) by 30.71% absolute. To facilitate error analysis and support any type of future studies, we released all system predictions, the evaluation script, and all gold standard datasets.
List of citing works
Citing work
R01:
RIV/61988987:17250/22:A2302FXH