Authors:
(1) Xiaohan Ding, Department of Computer Science, Virginia Tech, (e-mail: xiaohan@vt.edu);
(2) Mike Horning, Department of Communication, Virginia Tech, (e-mail: mhorning@vt.edu);
(3) Eugenia H. Rho, Department of Computer Science, Virginia Tech, (e-mail: eugenia@vt.edu).
Table of Links
Study 1: Evolution of Semantic Polarity in Broadcast Media Language (2010-2020)
Study 2: Words that Characterize Semantic Polarity between Fox News & CNN in 2020
Discussion and Ethics Statement
Related Work
Temporal Dynamics in Linguistic Polarization
Scholars have measured linguistic polarization through stance (Dash et al. 2022), toxicity (Sap et al. 2019), sentiment detection (Yang et al. 2017), topic modeling (Ebeling et al. 2022), and lexicon dictionaries (Polignano et al. 2022). Such methods typically capture an aggregated snapshot of polarization across large textual corpora, providing a static quantification of media bias based on preset criteria. As a result, they seldom capture temporal fluctuations in semantic polarity. Only a few studies to date quantify linguistic polarization over time. For example, researchers have measured partisan trends in congressional speech across various issues by defining polarization as the expected posterior probability that a neutral observer would infer a speaker’s political party from a randomly selected utterance (Gentzkow, Shapiro, and Taddy 2019). Demszky et al. use this Bayesian approach to capture how social media discussions on mass shootings become increasingly divisive (2020).
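To make the posterior-probability definition concrete, the following is a minimal sketch under simplifying assumptions: a 50/50 prior over the two parties, one token per utterance, and plug-in frequency estimates. The function names and toy inputs are hypothetical and are not taken from the cited work, which relies on more careful estimators because plug-in frequencies can overstate polarization in small samples.

```python
from collections import Counter


def token_posteriors(rep_tokens, dem_tokens):
    """Return P(Republican | token) under a 50/50 prior, plus raw counts.

    Plug-in estimate: each party's token frequency is used as its
    token probability (an assumption of this sketch).
    """
    rep_counts, dem_counts = Counter(rep_tokens), Counter(dem_tokens)
    rep_total, dem_total = sum(rep_counts.values()), sum(dem_counts.values())
    posteriors = {}
    for token in set(rep_counts) | set(dem_counts):
        q_rep = rep_counts[token] / rep_total
        q_dem = dem_counts[token] / dem_total
        posteriors[token] = q_rep / (q_rep + q_dem)
    return posteriors, rep_counts, dem_counts


def average_partisanship(rep_tokens, dem_tokens):
    """Expected posterior that an observer assigns to the speaker's true party."""
    rho, rep_counts, dem_counts = token_posteriors(rep_tokens, dem_tokens)
    rep_total, dem_total = sum(rep_counts.values()), sum(dem_counts.values())
    # Speaker is Republican: observer's belief in "Republican" after one token.
    correct_if_rep = sum(n / rep_total * rho[t] for t, n in rep_counts.items())
    # Speaker is Democrat: observer's belief in "Democrat" after one token.
    correct_if_dem = sum(n / dem_total * (1 - rho[t]) for t, n in dem_counts.items())
    return 0.5 * correct_if_rep + 0.5 * correct_if_dem
```

Under these assumptions, two parties with identical word frequencies score 0.5 (the observer is guessing), and two parties with fully disjoint vocabularies score 1.0.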
Such methods, however, capture linguistic polarization as a general metric across nearly all aggregated words within a textual corpus. Unlike prior work, we focus on how specific keywords polarize over time, given the salience of certain topical keywords that are inseparable from American politics. We depart from previous approaches by devising a framework that captures semantic polarity as a function of how two different entities semantically diverge across time in their contextual use of identical words that are central to American public discourse. Specifically, we measure linguistic polarization by capturing the diachronic shifts in how two news stations use the same topical words on their news programs over an 11-year period.
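One plausible way to operationalize this idea, offered only as an illustrative sketch and not necessarily the metric used in this paper, is to average each station's context vectors for a keyword within a year and take the cosine distance between the two averages. The input layout (a dictionary from year to a list of vectors) and all names below are assumptions; in practice the vectors would come from a contextual language model.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def polarity_by_year(station_a, station_b):
    """Yearly distance between two stations' usage of one keyword.

    station_a, station_b: dict mapping year -> list of context vectors
    for the same keyword (hypothetical input format).
    """
    shared_years = sorted(set(station_a) & set(station_b))
    return {
        year: cosine_distance(mean_vector(station_a[year]),
                              mean_vector(station_b[year]))
        for year in shared_years
    }
```

A rising value across years would indicate that the two stations increasingly use the same word in different contexts.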
Further, in an effort to understand the linguistic drivers behind how and why semantic polarization evolves, we aim to identify the source of diachronic shifts based on how two different entities use identical, politically contentious keywords. Few studies have endeavored to detect causal factors underlying how the meaning of words changes over time, which remains an open challenge in NLP scholarship (Kutuzov et al. 2018). Hamilton et al. show how word meanings change between consecutive decades due to cultural shifts (e.g., change in the meaning of “cell” driven by technological advancement: prison cell vs. cell phone) (2016). Yet such research typically assumes a priori that the language in the two corpora under comparison is significantly related, without statistically demonstrating how. On the other hand, studies that do provide evidence of statistical relations between two separate textual datasets (Dutta, Ma, and Choudhury 2018) generally do not explain which words drive the direction of semantic influence from one corpus to another. We overcome these limitations: first, we statistically validate the existence of a significant semantic relationship between two textual corpora, TV news language and social media discourse, through Granger causality (Study 3). Then, we strategically separate our data by the lag times associated with significant Granger-causal relations and apply a deep learning interpretation technique to demonstrate which contextual words from TV news language influence Twitter discussions around specific topical keywords (and vice versa) over time.
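As a rough illustration of what a Granger-causality check asks, the sketch below computes a lag-1 F-statistic: does adding yesterday's value of one series reduce the error in predicting today's value of the other, beyond what the other series' own past explains? This is a textbook simplification, not the procedure of Study 3. The function names and the simulated data are hypothetical, only one lag is used, and no p-value is computed; a full analysis would typically use an established statistics library.

```python
import random


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def residual_sum_of_squares(rows, targets):
    """Fit least squares via the normal equations and return the RSS."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f_stat_lag1(cause, effect):
    """F-statistic for 'cause[t-1] helps predict effect[t]' with one lag."""
    steps = range(1, len(effect))
    targets = [effect[t] for t in steps]
    restricted = [[1.0, effect[t - 1]] for t in steps]
    unrestricted = [[1.0, effect[t - 1], cause[t - 1]] for t in steps]
    rss_r = residual_sum_of_squares(restricted, targets)
    rss_u = residual_sum_of_squares(unrestricted, targets)
    dof = len(targets) - 3  # observations minus unrestricted parameters
    return (rss_r - rss_u) / (rss_u / dof)


def simulate_lagged_pair(n=200, seed=0):
    """Toy data (hypothetical): the second series follows the first by one step."""
    rng = random.Random(seed)
    tv = [rng.random() for _ in range(n)]
    twitter = [0.0] + [0.8 * tv[t - 1] + 0.1 * rng.random() for t in range(1, n)]
    return tv, twitter
```

On the simulated pair, `granger_f_stat_lag1(tv, twitter)` should be far larger than `granger_f_stat_lag1(twitter, tv)`, reflecting that the influence was built in to run in one direction only.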
Who Influences Who? Setting a Media Agenda
On any given day, newsrooms select news on a wide range of topics that cover a variety of issues. According to Agenda Setting Theory, this selection process leads audiences to see certain issues as more significant than others (McCombs 1997). Agenda Setting was particularly useful in decades past, when media was much more consolidated. In the U.S., for example, three major networks (NBC, ABC, and CBS) provided the bulk of broadcast news content, and so their news agendas had broad influence on what audiences saw as important. However, today’s media landscape is much more fragmented. People receive their news from any number of sources, including television, newspapers, radio, websites, and social media. In such an environment, Agenda Setting has certain limitations, as the theory assumes that audiences are relatively passive and simply receive information from the media. This was true when news flowed one way, from the media to the mass public.
Nowadays, by contrast, the two-way form of communication afforded by the internet and social media makes it possible for the public to both influence media agendas and be influenced by them (Papacharissi 2009; Barnard 2018). As a result, new scholarship has argued through Intermedia Agenda Setting Theory that media agendas do not come entirely from within their own organizations, but also from two other sources: other media and the mass audience. In the latter case, for example, audiences themselves can influence media agendas through online affordances (e.g., retweeting, commenting, sharing) to raise certain issues to prominence online (Rogstad 2016). In this work, we explore these emerging dynamics with an interest in understanding not only how news agendas are constructed, but also how the possible adoption of different agendas might influence agenda dynamics as media interact with the public online.
This paper is available on arxiv under CC 4.0 license.