OU Portal
Publication Activity
Record type:
Paper in conference proceedings (D)
Home Department:
Institute for Research and Applications of Fuzzy Modeling (94410)
Title:
Analysis of the Semantic Vector Space Induced by a Neural Language Model and a Corpus
Citation:
Chen, X., Hůla, J. and Dvořák, A. Analysis of the Semantic Vector Space Induced by a Neural Language Model and a Corpus.
In:
22nd Conference Information Technologies - Applications and Theory (ITAT 2022): ITAT 2022. Information Technologies - Applications and Theory 2022, 2022-09-23, Zuberec.
Aachen: CEUR-WS, 2022. pp. 103-110. ISSN 1613-0073.
Subtitle:
Publication year:
2022
Field:
Informatics
Number of pages:
8
Page from:
103
Page to:
110
Form of publication:
Electronic version
ISBN code:
Not stated
ISSN code:
1613-0073
Proceedings title:
ITAT 2022. Information Technologies - Applications and Theory 2022
Proceedings:
International
Publisher name:
CEUR-WS
Place of publishing:
Aachen
Country of Publication:
Proceedings published abroad
Conference name:
22nd Conference Information Technologies - Applications and Theory (ITAT 2022)
Conference venue:
Zuberec
Conference start date:
Event type by nationality of participants:
Worldwide event
WoS code:
EID:
2-s2.0-85139854412
Key words in English:
Contextual embeddings; BERT; polysemy; clustering
Annotation in original language:
Although contextual word representations produced by transformer-based language models (e.g., BERT) have proven to be very successful in different kinds of NLP tasks, there is still little knowledge about how these contextual embeddings are connected to word meanings or semantic features. In this article, we provide a quantitative analysis of the semantic vector space induced by the XLM-RoBERTa model and the Wikicorpus. We study the geometric properties of vector embeddings of selected words. We use the HDBSCAN clustering algorithm and propose a score called the Cluster Dispersion Score, which reflects how dispersed the collection of clusters is. Our analysis shows that the number of meanings of a word is not directly correlated with the dispersion of embeddings of this word in the semantic vector space induced by the language model and a corpus. Some observations about the division of clusters of embeddings for several selected words are provided.
Annotation in English:
References
R01:
RIV/61988987:17610/22:A2302FNM