Description in the original language: |
This study examines the syntactic complexity of texts spanning multiple genres by the renowned Czech author Karel Čapek. Čapek (1890–1938) was a writer, playwright, and journalist, best known for his science fiction works. The analyzed corpus includes over 700 texts covering diverse genres (novels, short stories, newspaper articles, travel books, poems, scientific studies, personal correspondence, and children’s literature), offering a unique opportunity to analyze genre variation within the works of a single author and thus eliminating the authorship-related bias common in corpus-based genre studies.
The study focuses on syntactic features, which have traditionally been used less often in stylometric analysis than lexical features. This disparity is largely due to the historical scarcity of syntactically annotated corpora (treebanks), which has limited researchers' ability to explore sentence structures within texts. However, the accuracy of automatic syntactic annotation has improved significantly in recent years, thanks to advances in natural language processing (NLP) and the development of robust tools and models. This study applies the Surface Syntactic Universal Dependencies (SUD) framework (Gerdes et al., 2018) to the collected texts and analyzes the syntactic complexity indices derived from the resulting treebanks.
Syntactic complexity in this study is measured using multiple indicators to capture both linear and hierarchical dimensions of sentence structure. These include average sentence length (measured in words and clauses), average clause length (in words), Mean Dependency Distance (MDD), and Mean Hierarchical Distance (MHD). Average sentence length and clause length reflect the level of syntactic elaboration, with longer sentences and clauses often indicating greater complexity. MDD measures the linear distance between syntactically connected words, capturing how dispersed or tightly connected words are within a sentence, with larger MDD indicating greater complexity (cf. Liu, 2008). In contrast, MHD assesses the hierarchical depth of a sentence by calculating the mean vertical distance between the nodes of the dependency tree and its root, with higher values indicating deeper layers of subordination and greater syntactic embedding (cf. Liu, 2008). By combining these metrics, the study offers a comprehensive evaluation of syntactic complexity, accounting for both surface-level word relationships and the underlying hierarchical structures, enabling a nuanced analysis of genre variation within Čapek's writing.
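As an illustration of the two dependency-based measures, the following is a minimal sketch and not the study's actual pipeline. It assumes that each sentence is given as a list of 1-based head indices (0 marking the root), that MDD averages the absolute linear distance over all non-root words, and that MHD averages the number of edges between each non-root word and the root. The function names and the toy sentence are hypothetical, and the treatment of punctuation is left open.

```python
from statistics import mean


def mean_dependency_distance(heads):
    """Average absolute linear distance between each word and its head.

    heads[i] is the 1-based position of the head of word i + 1;
    0 marks the root, which has no head and is skipped (assumption).
    """
    distances = [
        abs(head - position)
        for position, head in enumerate(heads, start=1)
        if head != 0
    ]
    return mean(distances) if distances else 0.0


def mean_hierarchical_distance(heads):
    """Average number of edges between each non-root word and the root."""
    depths = []
    for position, head in enumerate(heads, start=1):
        if head == 0:
            continue  # the root itself is excluded (assumption)
        depth = 0
        node = position
        # Climb from the word to the root, counting the edges on the way.
        while heads[node - 1] != 0:
            node = heads[node - 1]
            depth += 1
        depths.append(depth)
    return mean(depths) if depths else 0.0


# Hypothetical toy sentence: "The dog chased the cat"
# The -> dog, dog -> chased, chased = root, the -> cat, cat -> chased
toy_heads = [2, 3, 0, 5, 3]

print(mean_dependency_distance(toy_heads))    # 1.25
print(mean_hierarchical_distance(toy_heads))  # 1.5
```

In the toy sentence the two values differ because the determiners sit next to their heads (short linear distance) while lying two levels below the root (greater hierarchical distance), which is the contrast between the linear and hierarchical dimensions described above.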
The results reveal significant variation in syntactic complexity across genres in Karel Čapek’s texts. Newspaper articles and travel books exhibit the highest complexity, characterized by longer sentences, multiple clauses, and greater hierarchical depth (MHD). Scientific texts display moderately long sentences with fewer clauses but higher hierarchical depth, reflecting their formal nature. Surprisingly, children’s literature demonstrates notable syntactic richness, with sentence lengths and dependency distances (MDD) higher than expected, suggesting a more intricate style than typically seen in this genre. In contrast, poetry and novels feature simpler syntactic structures with shorter sentences and flatter hierarchies. Short stories and correspondence occupy a middle ground, balancing syntactic complexity with readability. These findings highlight Čapek’s nuanced adaptation of syntactic strategies to align with the stylistic and communicative demands of each genre.
|