| Annotation in original language: |
This contribution maps the current state of Czech anti-establishment discourse from the perspective of quantitative linguistics. The methodology of quantitative linguistics offers a high degree of intersubjectivity and replicability, which makes it particularly suitable for studying a socially sensitive phenomenon such as disinformation. At present, our research focuses primarily on detecting morphological, syntactic, and lexical features of anti-establishment media texts; it draws on similar studies published abroad but adapts them to the specific features of Czech. The results from all three domains consistently indicate clear and systematic linguistic divergences between mainstream and anti-establishment journalism. Our analyses are based on the corpus used for the Verifee project, which focuses on determining the credibility of news with AI tools; this dataset comprises 10,116 texts divided into five categories (credible, partially credible, manipulative, misleading, and unclassifiable news). In some of the studies, we also use the ONLINE corpora, which are part of the Czech National Corpus and cover both mainstream and antisystemic media.
All the analyses single out antisystemic/manipulative news as linguistically distinctive, mostly exhibiting a high degree of complexity that places a heavy cognitive load on the reader. More specifically, anti-establishment texts contain fewer nouns, numerals, and prepositions, while pronouns, conjunctions, and punctuation are overrepresented; with respect to grammatical case, they abound in datives and vocatives. Such tendencies point to higher emotionality, rhetorical framing, and lower factuality. Closed-class categories also exhibit marked patterns, e.g., archaic pronouns, rhetorical particles, or stylistically heterogeneous prepositions. The lexical analysis refines this picture: it reveals a high lexical sophistication of manipulative news, which is, however, accompanied by repetitive collocation patterns.
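To illustrate how over- and underrepresentation of a part of speech can be quantified, the following is a minimal sketch that compares relative tag frequencies between two groups of texts. The function names and the toy tag sequences are hypothetical and invented purely for illustration; they are not taken from the Verifee or ONLINE corpora, and the sketch is not the procedure used in the studies summarized here.

```python
from collections import Counter


def pos_proportions(tags):
    """Return the relative frequency of each part-of-speech tag."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def proportion_ratio(target_tags, reference_tags, tag):
    """Ratio of a tag's share in the target group to its share in the
    reference group; values above 1 indicate overrepresentation.
    Returns None when the tag does not occur in the reference group."""
    target = pos_proportions(target_tags).get(tag, 0.0)
    reference = pos_proportions(reference_tags).get(tag, 0.0)
    if reference == 0.0:
        return None
    return target / reference


# Toy tag sequences (invented, NOT corpus data).
manipulative = ["NOUN", "PRON", "PRON", "VERB"]
credible = ["NOUN", "NOUN", "PRON", "VERB"]

pron_ratio = proportion_ratio(manipulative, credible, "PRON")
noun_ratio = proportion_ratio(manipulative, credible, "NOUN")
```

In practice, such ratios would be computed over tagged corpora and accompanied by a significance test or effect-size measure before any claim about overrepresentation is made.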
From the syntactic viewpoint, manipulative texts are the most complex on four of the five syntactic metrics studied (sentence length, clause density, mean dependency distance, and mean hierarchical distance). They achieve this by stacking short clauses into long sentences, which increases cognitive load and enhances persuasive potential. Credible news, by contrast, displays longer but more coherent clauses, reflecting editorial norms. These results are thus consistent with the findings of the morphological and lexical investigations.
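Two of the metrics named above can be sketched for a single dependency-parsed sentence as follows. The sketch assumes the commonly used definitions: mean dependency distance as the average absolute difference between the linear positions of a dependent and its head, and mean hierarchical distance as the average number of dependency edges between a token and the root, both excluding the root itself. The exact formulas used in the studies are not given here, so the function names, the input convention (1-based head indices, 0 for the root), and the example parse are assumptions.

```python
from statistics import mean


def mean_dependency_distance(heads):
    """heads[i] is the 1-based position of the head of token i + 1;
    0 marks the root. The root is excluded from the average."""
    distances = [abs(h - (i + 1)) for i, h in enumerate(heads) if h != 0]
    return mean(distances) if distances else 0.0


def mean_hierarchical_distance(heads):
    """Average number of dependency edges from each non-root token
    up to the root of the sentence."""
    def depth(token):
        steps = 0
        while heads[token - 1] != 0:
            token = heads[token - 1]
            steps += 1
        return steps

    depths = [depth(t) for t in range(1, len(heads) + 1) if heads[t - 1] != 0]
    return mean(depths) if depths else 0.0


# Assumed parse of "Vláda schválila nový zákon":
# Vláda -> schválila, schválila = root, nový -> zákon, zákon -> schválila
heads = [2, 0, 4, 2]
mdd = mean_dependency_distance(heads)    # (1 + 1 + 2) / 3
mhd = mean_hierarchical_distance(heads)  # (1 + 2 + 1) / 3
```

Text-level values would then be obtained by averaging over all sentences of a text; the sketch assumes a well-formed tree with exactly one root per sentence.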
Finally, an onomastic analysis of the proper names of Eastern European states in mainstream and antisystemic discourse shows systematic shifts in case distributions. In anti-establishment discourse, Ukraine appears more often in the nominative and Russia in the accusative or dative, while locative forms are suppressed. These grammatical choices encode divergent geopolitical narratives: mainstream news frames Ukraine as the victim and Russia as the aggressor, while anti-establishment outlets reverse these roles.
Taken together, our results provide robust empirical evidence that Czech anti-establishment discourse differs from mainstream journalism across morphological, lexical, syntactic, and onomastic dimensions. With regard to missing information, the very features that are absent from credible news (rhetorical overloading, excessive clause stacking, or re-framing of proper names) are those that carry manipulative discourse, and vice versa. Absence is thus a strategy: it defines the contrasting identities of credible and manipulative texts. Finally, the applied methodology provides a comprehensive toolkit that can be transferred to other languages; the results therefore not only shed light on Czech disinformation but also contribute to the broader cross-linguistic and cross-cultural study of the phenomenon.
|