  • Open access
  • Published: 22 May 2024

Uncovering the essence of diverse media biases from the semantic embedding space

  • Hong Huang 1 , 2 , 3 , 4 , 5 ,
  • Hua Zhu 1 , 2 , 3 , 4 , 5 ,
  • Wenshi Liu 4 , 5 ,
  • Hua Gao 5 ,
  • Hai Jin 1 , 2 , 3 , 4 , 5 &
  • Bang Liu 6  

Humanities and Social Sciences Communications volume 11, Article number: 656 (2024)

  • Cultural and media studies

Media bias widely exists in the articles published by news media, influencing their readers’ perceptions, and bringing prejudice or injustice to society. However, current analysis methods usually rely on human efforts or only focus on a specific type of bias, which cannot capture the varying magnitudes, connections, and dynamics of multiple biases, thus remaining insufficient to provide a deep insight into media bias. Inspired by the Cognitive Miser and Semantic Differential theories in psychology, and leveraging embedding techniques in the field of natural language processing, this study proposes a general media bias analysis framework that can uncover biased information in the semantic embedding space on a large scale and objectively quantify it on diverse topics. More than 8 million event records and 1.2 million news articles are collected to conduct this study. The findings indicate that media bias is highly regional and sensitive to popular events at the time, such as the Russia-Ukraine conflict. Furthermore, the results reveal some notable phenomena of media bias among multiple U.S. news outlets. While they exhibit diverse biases on different topics, some stereotypes are common, such as gender bias. This framework will be instrumental in helping people have a clearer insight into media bias and then fight against it to create a more fair and objective news environment.

Introduction

In the era of information explosion, news media play a crucial role in delivering information to people and shaping their minds. Unfortunately, media bias, also called slanted news coverage, can heavily influence readers’ perceptions of news and result in a skewing of public opinion (Gentzkow et al. 2015 ; Puglisi and Snyder Jr, 2015b ; Sunstein, 2002 ). This influence can potentially lead to severe societal problems. For example, a report from FAIR has shown that Verizon management is more than twice as vocal as worker representatives in news reports about the Verizon workers’ strike in 2016 Footnote 1 , putting workers at a disadvantage in the news and contradicting the principles of fair and objective journalism. Unfortunately, this is just the tip of the media bias iceberg.

Media bias can be defined as the bias of journalists and news producers within the mass media in selecting and covering numerous events and stories (Gentzkow et al. 2015 ). This bias can manifest in various forms, such as event selection, tone, framing, and word choice (Hamborg et al. 2019 ; Puglisi and Snyder Jr, 2015b ). Given the vast number of events happening in the world at any given moment, even the most powerful media must be selective in what they choose to report instead of covering all available facts in detail (Downs, 1957 ). This selectivity can result in the perception of bias in the news coverage, whether intentional or unintentional. Academics in journalism studies attempt to explain the news selection process by developing taxonomies of news values (Galtung and Ruge, 1965 ; Harcup and O’neill, 2001 , 2017 ), which refer to certain criteria and principles that news editors and journalists consider when selecting, editing, and reporting the news. These values help determine which stories should be considered news and the significance of these stories in news reporting. However, different news organizations and journalists may emphasize different news values based on their specific objectives and audience. Consequently, a media outlet may be very keen on reporting events about specific topics while turning a blind eye to others. For example, news coverage often ignores women-related events and issues with the implicit assumption that they are less critical than men-related contents (Haraldsson and Wängnerud, 2019 ; Lühiste and Banducci, 2016 ; Ross and Carter, 2011 ). Once events are selected, the media must consider how to organize and write their news articles. At that time, the choice of tone, framing, and word is highly subjective and can introduce bias. Specifically, the words used by the authors to refer to different entities may not be neutral but instead imply various associations and value judgments (Puglisi and Snyder Jr, 2015b ). 
As shown in Fig. 1, the same topic can be expressed in entirely different ways, depending on a media outlet’s standpoint (Footnote 2). For example, certain “left-wing” media outlets tend to support legal abortion, while some “right-wing” ones oppose it.

Fig. 1. The blue and red fonts represent the views of some “left-wing” and “right-wing” media outlets, respectively.

In fact, media bias is influenced by many factors: explicit factors such as geographic location, media position, editorial guideline, topic setting, and so on; obscure factors such as political ideology (Groseclose and Milyo, 2005 ; MacGregor, 1997 ; Merloe, 2015 ), business reason (Groseclose and Milyo, 2005 ; Paul and Elder, 2004 ), and personal career (Baron, 2006 ), etc. Besides, some studies also summarize these factors related to bias as supply-side and demand-side ones (Gentzkow et al. 2015 ; Puglisi and Snyder Jr, 2015b ). The influence of these complex factors makes the emergence of media bias inevitable. However, media bias may hinder readers from forming objective judgments about the real world, lead to skewed public opinion, and even exacerbate social prejudices and unfairness. For example, the New York Times supports Iranian women’s saying no to hijabs in defense of women’s rights Footnote 3 while criticizing the Chinese government’s initiative to encourage Uyghur women to remove hijabs and veils Footnote 4 . Besides, the influence of news coverage on voter behavior is a subject of ongoing debate. While some studies indicate that slanted news coverage can influence voters and election outcomes (Bovet and Makse, 2019 ; DellaVigna and Kaplan, 2008 ; Grossmann and Hopkins, 2016 ), others suggest that this influence is limited in certain circumstances (Stroud, 2010 ). Fortunately, research on media bias has drawn attention from multiple disciplines.

In social science, the study of media bias has a long tradition dating back to the 1950s (White, 1950 ). So far, most of the analyses in social science have been qualitative, aiming to analyze media opinions expressed in the editorial section (e.g., endorsements (Ansolabehere et al. 2006 ), editorials (Ho et al. 2008 ), ballot propositions (Puglisi and Snyder Jr, 2015a )) or find out biased instances in news articles by human annotations (Niven, 2002 ; Papacharissi and de Fatima Oliveira, 2008 ; Vaismoradi et al. 2013 ). Some researchers also conduct quantitative analysis, which primarily involves counting the frequency of specific keywords or articles related to certain issues (D’Alessio and Allen, 2000 ; Harwood and Garry, 2003 ; Larcinese et al. 2011 ). In particular, there are some attempts to estimate media bias using automatic tools (Groseclose and Milyo, 2005 ), and they commonly rely on text similarity and sentiment computation (Gentzkow and Shapiro, 2010 ; Gentzkow et al. 2006 ; Lott Jr and Hassett, 2014 ). In summary, social science research on media bias has yielded extensive and effective methodologies. These methodologies interpret media bias from diverse perspectives, marking significant progress in the realm of media studies. However, these methods usually rely on manual annotation and analysis of the texts, which requires significant manual effort and expertise (Park et al. 2009 ), thus might be inefficient and subjective. For example, in a quantitative analysis, researchers might devise a codebook with detailed definitions and rules for annotating texts, and then ask coders to read and annotate the corresponding texts (Hamborg et al. 2019 ). Developing a codebook demands substantial expertise. Moreover, the standardization process for text annotation is subjective, as different coders may interpret the same text differently, thus leading to varied annotations.

In computer science, research on social media is extensive (Lazaridou et al. 2020 ; Liu et al. 2021b ; Tahmasbi et al. 2021 ), but few methods are specifically designed to study media bias (Hamborg et al. 2019 ). Some techniques that specialize in the study of media bias focus exclusively on one type of bias (Huang et al. 2021 ; Liu et al. 2021b ; Zhang et al. 2017 ), thus not general enough. In natural language processing (NLP), research on the bias of pre-trained models or language models has attracted much attention (Qiang et al. 2023 ), aiming to identify and reduce the potential impact of bias in pre-trained models on downstream tasks (Huang et al. 2020 ; Liu et al. 2021a ; Wang et al. 2020 ). In particular, some studies on pre-trained word embedding models show that they have captured rich human knowledge and biases (Caliskan et al. 2017 ; Grand et al. 2022 ; Zeng et al. 2023 ). However, such works mainly focus on pre-trained models rather than media bias directly, which limits their applicability to media bias analysis.

A major challenge in studying media bias is that the evaluation of media bias is highly subjective because individuals have varying evaluation criteria for bias. Take political bias as an example: a story that one person views as neutral may appear left-leaning or right-leaning to someone else. To address this challenge, we develop an objective and comprehensive media bias analysis framework. We study media bias from two distinct but highly relevant perspectives: the macro level and the micro level. From the macro perspective, we focus on the event selection bias of each media outlet, i.e., the types of events each outlet tends to report on. From the micro perspective, we focus on the bias introduced by media outlets in the choice of words and sentence construction when composing news articles about the selected events.

In news articles, media outlets convey their attitudes towards a subject through the contexts surrounding it. However, the language used by the media to describe and refer to entities may not be purely neutral descriptors but rather imply various associations and value judgments. According to the cognitive miser theory in psychology, the human mind is considered a cognitive miser who tends to think and solve problems in simpler and less effortful ways to avoid cognitive effort (Fiske and Taylor, 1991 ; Stanovich, 2009 ). Therefore, faced with endless news information, ordinary readers will tend to summarize and remember the news content simply, i.e., labeling the things involved in news reports. Frequent association of certain words with a particular entity or subject in news reports can influence a media outlet’s loyal readers to adopt these words as labels for the corresponding item in their cognition due to the cognitive miser effect. Unfortunately, such a cognitive approach is inadequate and susceptible to various biases. For instance, if a media outlet predominantly focuses on male scientists while neglecting their female counterparts, some naive readers may perceive scientists to be mostly male, leading to a recognition bias in their perception of the scientist and even forming stereotypes unconsciously over time. According to the “distributional hypothesis” in modern linguistics (Firth, 1957 ; Harris, 1954 ; Sahlgren, 2008 ), a word’s meaning is characterized by the words occurring in the same context as it. Here, we simplify the complex associations between different words (or entities/subjects) and their respective context words into co-occurrence relationships. An effective technique to capture word semantics based on co-occurrence information is neural network-based word embedding models (Kenton and Toutanova, 2019 ; Le and Mikolov, 2014 ; Mikolov et al. 2013 ).

Word embedding models represent each word in the vocabulary as a vector (i.e., word embedding) within the word embedding space. In this space, words that frequently co-occur in similar contexts are positioned close to each other. For instance, if a media outlet predominantly features male scientists, the word “scientist” and related male-centric terms, such as “man” and “he” will frequently co-occur. Consequently, these words will cluster near the word “scientist” in the embedding space, while female-related words occupy more distant positions. This enables us to evaluate the media outlet’s gender bias concerning the term “scientist” by comparing the embedding distances between “scientist” and words associated with both males and females. This approach aligns closely with the Semantic Differential theory in psychology (Osgood et al. 1957 ), which gauges an individual’s attitudes toward various concepts, objects, and events using bipolar scales constructed from adjectives with opposing semantics. In this study, to identify media bias from news articles, we first define two sets of words with opposite semantics for each topic to develop media bias evaluation scales. Then, we quantify media bias on each topic by calculating the embedding distance difference between a target word (e.g., scientist) and these two sets of words (e.g., female-related words and male-related words) in the word embedding space.

Compared with the bias in news articles, event selection bias is more obscure, as only events of interest to the media are reported in the final articles, while events deliberately ignored by the media remain invisible to the public. Similar to the co-occurrence relationship between words mentioned earlier, two media outlets that frequently select and report on the same events should exhibit similar biases in event selection, as two words that occur frequently in the same contexts have similar semantics. Therefore, we refer to Latent Semantic Analysis (LSA (Deerwester et al. 1990 )) and generate vector representation (i.e., media embedding) for each media via truncated singular value decomposition (Truncated SVD (Halko et al. 2011 )). Essentially, a media embedding encodes the distribution of the events that a media outlet tends to report on. Therefore, in the media embedding space, media outlets that often select and report on the same events will be close to each other due to similar distributions of the selected events. If a media outlet shows significant differences in such a distribution compared to other media outlets, we can conclude that it is biased in event selection. Inspired by this, we conduct clustering on the media embeddings to study how different media outlets differ in the distribution of selected events, i.e., the so-called event selection bias.

These two methodologies, designed for micro-level and macro-level analysis, share a fundamental similarity: both leverage data-driven embedding models to represent each word or media outlet as a distinctive vector within the embedding space and conduct further analysis based on these vectors. Therefore, in this study, we integrate both methodologies into a unified framework for media bias analysis. We aim to uncover media bias on a large scale and quantify it objectively on diverse topics. Our experiment results show that: (1) Different media outlets have different preferences for various news events, and those from the same country or organization tend to share more similar tastes. Besides, the occurrence of international hot events will lead to the convergence of different media outlets’ event selection. (2) Despite differences in media bias, some stereotypes, such as gender bias, are common among various media outlets. These findings align well with our empirical understanding, thus validating the effectiveness of our proposed framework.

Data and methods

The first dataset is the GDELT Mention Table, a product of the Google Jigsaw-backed GDELT project (Footnote 5). This project aims to monitor news reports from all over the world, including print, broadcast, and online sources, in over 100 languages. Each time an event is mentioned in a news report, a new row is added to the Mention Table (see Supplementary Information Tab. S1 for details). Given that different media outlets may report on the same event at varying times, the same event can appear in multiple rows of the table. While the fields GlobalEventID and EventTimeDate are globally unique attributes for each event, MentionSourceName and MentionTimeDate may differ across rows. Based on the GlobalEventID and MentionSourceName fields in the Mention Table, we can count the number of times each media outlet has reported on each event, ultimately constructing a “media-event” matrix. In this matrix, the element at (i, j) denotes the number of times that media outlet j has reported on event i.
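
As a minimal sketch of this counting step, the Python snippet below builds a small dense matrix from (GlobalEventID, MentionSourceName) pairs. The sample rows, the outlet names, and the helper `build_media_event_matrix` are hypothetical and not part of GDELT; at the scale used in this study a sparse matrix would likely be the more realistic choice.

```python
from collections import Counter

import numpy as np

# Hypothetical (GlobalEventID, MentionSourceName) pairs, one per Mention Table row.
mentions = [
    (1001, "outlet-a.com"),
    (1001, "outlet-b.com"),
    (1001, "outlet-a.com"),
    (1002, "outlet-b.com"),
    (1003, "outlet-c.com"),
]


def build_media_event_matrix(rows):
    """Return (matrix, event_ids, outlets); matrix[i, j] counts mentions of event i by outlet j."""
    counts = Counter(rows)
    event_ids = sorted({event for event, _ in rows})
    outlets = sorted({outlet for _, outlet in rows})
    event_index = {event: i for i, event in enumerate(event_ids)}
    outlet_index = {outlet: j for j, outlet in enumerate(outlets)}
    matrix = np.zeros((len(event_ids), len(outlets)), dtype=np.int64)
    for (event, outlet), count in counts.items():
        matrix[event_index[event], outlet_index[outlet]] = count
    return matrix, event_ids, outlets


matrix, event_ids, outlets = build_media_event_matrix(mentions)
```

Rows index events and columns index outlets, matching the (i, j) convention described above.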

As a global event database, GDELT collects a vast amount of global events and topics, encompassing news coverage worldwide. However, despite its widespread usage in many studies, there are still some noteworthy issues. Here, we highlight some of the issues to remind readers to use it more cautiously. Above all, while GDELT provides a vast amount of data from various sources, it cannot capture every event accurately. It relies on automated data collection methods, and this could result in certain events being missed. Furthermore, its algorithms for event extraction and categorization cannot always perfectly capture the nuanced context and meaning of each event, which might lead to potential misinterpretations.

The second dataset is built on MediaCloud Footnote 6 , an open-source platform for research on media ecosystems. MediaCloud’s API enables the querying of news article URLs for a given media outlet, which can then be retrieved using a web crawler. In this study, we have collected more than 1.2 million news articles from 12 mainstream US media outlets in 2016-2021 via MediaCloud’s API (See Supplementary Information Tab. S2 for details).

Media bias estimation by media embedding

Latent Semantic Analysis (LSA (Deerwester et al. 1990 )) is a well-established technique for uncovering the topic-based semantic relationships between text documents and words. By performing truncated singular value decomposition (Truncated SVD (Halko et al. 2011 )) on a “document-word” matrix, LSA can effectively capture the topics discussed in a corpus of text documents. This is accomplished by representing documents and words as vectors in a high-dimensional embedding space, where the similarity between vectors reflects the similarity of the topics they represent. In this study, we apply this idea to media bias analysis by likening media and events to documents and words, respectively. By constructing a “media-event” matrix and performing Truncated SVD, we can uncover the underlying topics driving the media coverage of specific events. Our hypothesis posits that media outlets mentioning certain events more frequently are more likely to exhibit a biased focus on the topics related to those events. Therefore, media outlets sharing similar topic tastes during event selection will be close to each other in the embedding space, which provides a good opportunity to shed light on the media’s selection bias.

The generation procedures for media embeddings are shown in Supplementary Information Fig. S1. First, a “media-event” matrix denoted as \(A_{m\times n}\) is constructed based on the GDELT Mention Table, where \(m\) and \(n\) represent the total number of events and media outlets, respectively. Each entry \(A_{i,j}\) represents the number of times that media outlet \(j\) has reported on event \(i\). Subsequently, Truncated SVD is performed on the matrix \(A_{m\times n}\), which results in three matrices: \(U_{m\times k}\), \(\Sigma_{k\times k}\) and \(V_{n\times k}^{T}\). The product of \(\Sigma_{k\times k}\) and \(V_{n\times k}^{T}\) is represented by \(E_{k\times n}\). Each column of \(E_{k\times n}\) corresponds to a \(k\)-dimensional vector representation for a specific media outlet, i.e., a media embedding. Specifically, the decomposition of matrix \(A_{m\times n}\) can be formulated as follows:

\[A_{m\times n}=U_{m\times m}^{0}\,\Sigma_{m\times n}^{0}\,(V_{n\times n}^{0})^{T} \tag{1}\]

\[A_{m\times n}\approx U_{m\times k}\,\Sigma_{k\times k}\,V_{n\times k}^{T} \tag{2}\]

Equation (1) defines the complete singular value decomposition of \(A_{m\times n}\). Both \(U_{m\times m}^{0}\) and \((V_{n\times n}^{0})^{T}\) are orthogonal matrices. \(\Sigma_{m\times n}^{0}\) is an \(m\times n\) diagonal matrix whose diagonal elements are the non-negative singular values of the matrix \(A_{m\times n}\) in descending order. Equation (2) defines the truncated singular value decomposition (i.e., Truncated SVD) of \(A_{m\times n}\). Based on the result of the complete singular value decomposition, the part corresponding to the largest \(k\) singular values is equivalent to the result of Truncated SVD. Specifically, \(U_{m\times k}\) comprises the first \(k\) columns of the matrix \(U_{m\times m}^{0}\), while \(V_{n\times k}^{T}\) comprises the first \(k\) rows of the matrix \((V_{n\times n}^{0})^{T}\). Additionally, the diagonal matrix \(\Sigma_{k\times k}\) is composed of the first \(k\) diagonal elements of \(\Sigma_{m\times n}^{0}\), representing the largest \(k\) singular values of \(A_{m\times n}\). In particular, the media embedding model is defined as the product of the matrices \(\Sigma_{k\times k}\) and \(V_{n\times k}^{T}\), which has \(n\) \(k\)-dimensional media embeddings as follows:

\[E_{k\times n}=\Sigma_{k\times k}\,V_{n\times k}^{T} \tag{3}\]
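
The construction of the media embedding matrix can be illustrated with NumPy. This is only a sketch: it uses the exact `numpy.linalg.svd` and keeps the top k components, rather than the randomized algorithm of Halko et al. (2011) cited above, and the toy matrix and the function name `media_embeddings` are assumptions made for illustration.

```python
import numpy as np


def media_embeddings(A, k):
    """Return a k x n matrix whose columns are the media embeddings Sigma_k @ V_k^T."""
    # Thin SVD; NumPy returns the singular values in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return np.diag(s[:k]) @ Vt[:k, :]


# Toy counts: 4 events (rows) by 3 media outlets (columns).
A = np.array([
    [3.0, 2.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 4.0],
    [0.0, 1.0, 2.0],
])
E = media_embeddings(A, k=2)       # one 2-dimensional embedding per outlet
E_full = media_embeddings(A, k=3)  # no truncation for this 3-column matrix
```

Without truncation, the inner products between embeddings equal those between the original columns of the matrix, which is one way to see that each embedding summarizes an outlet's distribution over events.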

To measure the similarity between two media embedding sets, we refer to Word Mover’s Distance (WMD (Kusner et al. 2015)). WMD is designed to measure the dissimilarity between two text documents based on word embeddings. Here, we subtract the optimal value of the original WMD objective function from 1 to convert the dissimilarity value into a normalized similarity score that ranges from 0 to 1. Specifically, the similarity between two media embedding sets is formulated as follows:

\[\mathrm{Sim}(s,s^{\prime})=1-\min_{T\ge 0}\sum_{i=1}^{n}\sum_{j=1}^{n}T_{i,j}\,c(i,j)\quad \text{s.t.}\quad \sum_{j=1}^{n}T_{i,j}=s_{i}\ \ \forall i,\qquad \sum_{i=1}^{n}T_{i,j}=s_{j}^{\prime}\ \ \forall j \tag{4}\]

Let \(n\) denote the total number of media outlets, and \(s\) be an \(n\)-dimensional vector corresponding to the first media embedding set. For each \(i\), the weight of media \(i\) in the embedding set is given by \(s_{i}=\frac{t_{i}}{\sum_{k=1}^{n}t_{k}}\), where \(t_{i}=1\) if media \(i\) is in the embedding set, and \(t_{i}=0\) otherwise. Similarly, \(s^{\prime}\) is another \(n\)-dimensional vector corresponding to the second media embedding set. The distance between media \(i\) and \(j\) is calculated using \(c(i,j)=\Vert e_{i}-e_{j}\Vert_{2}\), where \(e_{i}\) and \(e_{j}\) are the embedding representations of media \(i\) and \(j\), respectively. The flow matrix \(T\in R^{n\times n}\) is used to determine how much of media \(i\) in \(s\) travels to media \(j\) in \(s^{\prime}\). Specifically, \(T_{i,j}\ge 0\) denotes the amount of flow from media \(i\) to media \(j\).
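
The transport problem described above can be sketched as a small linear program. The snippet below is an illustration under assumptions, not the authors' implementation: it solves the problem with SciPy's general-purpose `linprog`, the function name `set_similarity` and the toy embeddings are hypothetical, and the result stays within 0 and 1 only if the embeddings are scaled so that pairwise distances do not exceed 1.

```python
import numpy as np
from scipy.optimize import linprog


def set_similarity(E, set_a, set_b):
    """Return 1 minus the optimal transport cost between two sets of columns of E."""
    n = E.shape[1]
    s = np.zeros(n)
    s[list(set_a)] = 1.0 / len(set_a)        # uniform weights on the first set
    s_prime = np.zeros(n)
    s_prime[list(set_b)] = 1.0 / len(set_b)  # uniform weights on the second set
    # Pairwise Euclidean distances c(i, j) between media embeddings.
    diff = E[:, :, None] - E[:, None, :]
    cost = np.sqrt((diff ** 2).sum(axis=0))
    # The flow T is flattened row-major: variable i * n + j stands for T[i, j].
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0     # outgoing flow of i equals s[i]
        A_eq[n + i, i::n] = 1.0              # incoming flow of i equals s_prime[i]
    b_eq = np.concatenate([s, s_prime])
    result = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq,
                     bounds=(0, None), method="highs")
    return 1.0 - result.fun


# Toy embeddings: 3 outlets in 2 dimensions (one column per outlet).
E_toy = np.array([[0.0, 0.1, 0.5],
                  [0.0, 0.0, 0.0]])
sim = set_similarity(E_toy, [0, 1], [2])
```

Two identical sets give a similarity of 1, since all mass can stay in place at zero cost.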

Media bias estimation by word embedding

Word embedding models (Kenton and Toutanova, 2019; Le and Mikolov, 2014; Mikolov et al. 2013) are widely used in text-related tasks due to their ability to capture the rich semantics of natural language. In this study, we regard media bias in news articles as a special type of semantics and capture it using Word2Vec (Le and Mikolov, 2014; Mikolov et al. 2013).

Supplementary Information Fig. S2 presents the process of building media corpora and training word embedding models to capture media bias. First, we reorganize the corpus for each media outlet by up-sampling to ensure that each media corpus contains the same number of news articles. The advantage of up-sampling is that it makes full use of the existing media corpus data, as opposed to discarding part of the data like down-sampling does. Second, we superimpose all 12 media corpora to construct a large base corpus and pre-train a Word2Vec model denoted as \(W^{base}\) based on it. Third, we fine-tune the same pre-trained model \(W^{base}\) using the specific corpus of each media outlet separately and get 12 fine-tuned models denoted as \(W^{m_{i}}\) (\(i=1,2,\ldots,12\)).

In particular, the main objective of reorganizing the original corpora is to ensure that each corpus contributes equally to the pre-training process, so that a large corpus from one media outlet does not dominate the pre-trained model. As shown in Supplementary Information Tab. S2, the largest corpus in 2016-2021 is from USA Today, which contains 295,518 news articles. Therefore, we can reorganize the other 11 media corpora by up-sampling to ensure that each of the 12 corpora has 295,518 articles. For example, for NPR’s initial corpus, which has 14,654 news articles, we first repeat the corpus 295,518 // 14,654 = 20 times to get 293,080 articles and then randomly sample 295,518 % 14,654 = 2,438 articles from the initial 14,654 as a supplement. Finally, we get a reorganized NPR corpus with 295,518 articles.
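
The NPR arithmetic above can be sketched in a few lines of Python. The helper name `upsample` is hypothetical, integer IDs stand in for article texts, and drawing the supplementary articles without replacement is an assumption, since the text only says they are randomly sampled.

```python
import random


def upsample(articles, target_size, seed=0):
    """Repeat the corpus whole, then top up with a random sample drawn without replacement."""
    full_copies, remainder = divmod(target_size, len(articles))
    rng = random.Random(seed)
    return articles * full_copies + rng.sample(articles, remainder)


# Stand-in corpus with the NPR size quoted above (article IDs instead of text).
npr = list(range(14_654))
reorganized = upsample(npr, 295_518)
```

Changing `seed` gives a different supplementary sample, which corresponds to repeating the experiment with different random seeds for up-sampling.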

Semantic Differential is a psychological technique proposed by Osgood et al. (1957) to measure people’s psychological attitudes toward a given conceptual object. In the Semantic Differential theory, a given object’s semantic attributes can be evaluated in multiple dimensions. Each dimension consists of two poles corresponding to a pair of adjectives with opposite semantics (i.e., antonym pairs). The interval between the poles of each dimension is divided into seven equally sized parts. Then, given the object, respondents are asked to choose one of the seven parts in each dimension. The closer the chosen position is to a pole, the more closely the respondent believes the object is semantically related to the corresponding adjective. Supplementary Information Fig. S3 provides an example of Semantic Differential.

Constructing evaluation dimensions using antonym pairs in Semantic Differential is a reliable idea that aligns with how people generally evaluate things. For example, when imagining the gender-related characteristics of an occupation (e.g., nurse), individuals usually weigh between “man” and “woman”, both of which are antonyms regarding gender. Likewise, when it comes to giving an impression of the income level of the Asian race, people tend to weigh between “rich” (high income) and “poor” (low income), which are antonyms related to income. Based on such consistency, we can naturally apply Semantic Differential to measure a media outlet’s attitudes towards different entities and concepts, i.e., media bias.

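Specifically, given a media outlet \(m\), a topic \(T\) (e.g., gender) and two semantically opposite topic word sets \(P={\{{p}_{i}\}}_{i=1}^{{K}_{1}}\) and \(\neg P={\{\neg {p}_{i}\}}_{i=1}^{{K}_{2}}\) about topic \(T\), media \(m\)’s bias towards the target \(x\) can be defined as:

\[\mathrm{Bias}_{x}^{m}=\frac{1}{K_{1}}\sum_{i=1}^{K_{1}}Sim\left(\overrightarrow{W_{x}^{m}},\overrightarrow{W_{p_{i}}^{m}}\right)-\frac{1}{K_{2}}\sum_{i=1}^{K_{2}}Sim\left(\overrightarrow{W_{x}^{m}},\overrightarrow{W_{\neg p_{i}}^{m}}\right) \tag{5}\]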

Here, \(K_{1}\) and \(K_{2}\) denote the number of words in topic word sets \(P\) and \(\neg P\), respectively. \(W^{m}\) represents the word embedding model obtained by fine-tuning \(W^{base}\) using the specific corpus of media \(m\). \(\overrightarrow{{W}_{x}^{m}}\) is the embedding representation of the word \(x\) in \(W^{m}\). \(Sim\) is a similarity function used to measure the similarity between two vectors (i.e., word embeddings). In practice, we employ the cosine similarity function, which is commonly used in the field of natural language processing. In particular, equation (5) calculates the difference of average similarities between the target word \(x\) and two semantically opposite topic word sets, namely \(P\) and \(\neg P\). Similar to the antonym pairs in Semantic Differential, such two topic word sets are used to construct the evaluation scale of media bias. In practice, to ensure the stability of the results, we have repeated this experiment five times, each time with a different random seed for up-sampling. Therefore, the final results shown in Fig. 4 are the average bias values for each topic.
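
As a rough illustration of this difference of average cosine similarities, the snippet below scores a target word against two small word sets. Everything in it is hypothetical: the 2-dimensional vectors stand in for one outlet's fine-tuned Word2Vec embeddings, the word lists are not the paper's topic word sets, and the choice of which set plays the role of \(P\) is an assumption.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def bias_score(embeddings, target, pole_words, opposite_words):
    """Mean similarity to pole_words minus mean similarity to opposite_words."""
    x = embeddings[target]
    to_pole = np.mean([cosine(x, embeddings[w]) for w in pole_words])
    to_opposite = np.mean([cosine(x, embeddings[w]) for w in opposite_words])
    return float(to_pole - to_opposite)


# Hypothetical 2-D vectors standing in for one outlet's fine-tuned model.
toy = {
    "scientist": np.array([1.0, 0.2]),
    "he": np.array([1.0, 0.0]),
    "man": np.array([0.9, 0.1]),
    "she": np.array([0.0, 1.0]),
    "woman": np.array([0.1, 0.9]),
}
score = bias_score(toy, "scientist", ["he", "man"], ["she", "woman"])
```

A positive score means the target sits closer to the first word set, a negative score closer to the second, and a score near zero indicates no measurable lean on this scale.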

The idea of recovering media bias by embedding methods

We first analyzed media bias from the aspect of event selection to study which topics a media outlet tends to focus on or ignore. Based on the GDELT database, we constructed a large “media-event” matrix that records the number of times each media outlet mentioned each event in news reports from February to April 2022. To extract media bias information, we referred to the idea of Latent Semantic Analysis (Deerwester et al. 1990) and performed Truncated SVD (Halko et al. 2011) on this matrix to generate a vector representation (i.e., media embedding) for each media outlet (see Methods for details). Specifically, outlets with similar event selection bias (i.e., outlets that often report on events of similar topics) will have similar media embeddings. Such a bias encoded in the vector representation of each outlet is exactly the first type of media bias we aim to study.

Then, we analyzed media bias in news articles to investigate the value judgments and attitudes conveyed by media through their news articles. We collected more than 1.2 million news articles from 12 mainstream US news outlets, spanning from January 2016 to December 2021, via MediaCloud’s API. To identify media bias from each outlet’s corpus, we performed three sequential steps: (1) Pre-train a Word2Vec word embedding model based on all outlets’ corpora. (2) Fine-tune the pre-trained model by using the specific corpus of each outlet separately and obtain 12 fine-tuned models corresponding to the 12 outlets. (3) Quantify each outlet’s bias based on the corresponding fine-tuned model, combined with the idea of Semantic Differential, i.e., measuring the embedding similarities between the target words and two sets of topic words with opposite semantics (See Methods for details). An example of using Semantic Differential (Osgood et al. 1957 ) to quantify media bias is shown in Supplementary Information Fig. S4 .

Media show significant clustering due to their regions and organizations

In this experiment, we aimed to capture and analyze the event selection bias of different media outlets based on the proposed media embedding methodology. To achieve a comprehensive analysis, we selected 247 media outlets from 8 countries (Supplementary Information Tab. S6): the United States, the United Kingdom, Canada, Australia, Ireland, and New Zealand, six English-speaking nations, together with India and China, two populous countries. For each country, we chose the media outlets that were the most active during February-April 2022, with media activity measured by the quantity of news reports. We then generated embedding representations for each media outlet via Truncated SVD and performed K-means clustering (Lloyd, 1982; MacQueen, 1967) on the obtained media embedding representations (with K = 10) for further analysis. Details of the experiment are presented in the first section of the Supplementary Information. Figure 2 visualizes the clustering results.
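
The clustering step can be sketched with a plain Lloyd-style K-means in NumPy. This is a simplified stand-in rather than the authors' setup: the farthest-point initialization, the function name `kmeans`, and the six toy points are assumptions chosen to keep the example deterministic, and the study itself uses K = 10 on 247 media embeddings.

```python
import numpy as np


def kmeans(points, k, n_iter=100):
    """Lloyd iterations on the rows of points; returns (labels, centers)."""
    # Farthest-point initialization: deterministic, an assumption of this sketch.
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[int(d.argmax())])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points (keep it if the cluster is empty).
        new_centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Toy "media embeddings": one row per outlet, two well-separated groups.
toy_points = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
])
labels, centers = kmeans(toy_points, k=2)
```

Note that the rows here are outlets, so a k x n embedding matrix with one column per outlet would be transposed before clustering.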

Fig. 2. There are 247 media outlets from 8 countries: Canada (CA), Ireland (IE), United Kingdom (UK), China (CN), United States (US), India (IN), Australia (AU), and New Zealand (NZ). Each circle in the visualization represents a media outlet, with its color indicating the cluster it belongs to, and its diameter proportional to the number of events reported by the outlet between February and April 2022. The text in each circle represents the name or abbreviation of a media outlet (see Supplementary Information Tab. S6 for details). The results indicate that media outlets from the same country tend to be grouped together in clusters. Moreover, the majority of media outlets in the Fox series form a distinct cluster, indicating a high degree of similarity in their event selection bias.

First, we find that media outlets from different countries tend to form distinct clusters, signifying the regional nature of media bias. Specifically, we can interpret Fig. 2 from two different perspectives, and both lead to this conclusion. On the one hand, most media outlets from the same country tend to appear in a limited number of clusters, which suggests that they share similar event selection bias. On the other hand, media outlets in the same cluster mostly come from the same country, indicating that media outlets exhibiting similar event selection bias tend to be from the same country. In our view, differences in geographical location give media outlets from different regions different initial access to event information, thus shaping the content they choose to report.

Besides, we observe an intriguing pattern where the Associated Press (AP) and Reuters, despite their geographical separation, share similar event selection biases as they are clustered together. This abnormal phenomenon could be attributed to their status as international media outlets, which enables them to cover various global events, thus leading to extensive overlapping news coverage. In addition, 16 out of the 21 Fox series media outlets form a distinct cluster on their own, suggesting that a media outlet’s bias is strongly associated with the organization it belongs to. After all, media outlets within the same organization often tend to prioritize or overlook specific events due to shared positions, interests, and other influencing factors.

International hot topics drive media bias to converge

Previous results have revealed a significant correlation between media bias and the location of a media outlet. Therefore, we conducted an experiment to further investigate the event selection biases of media outlets from 25 different countries. To achieve this, we gathered GDELT data spanning from February to April 2022 and created three “media-event” matrices, one per month. We then subjected each month’s “media-event” matrix to the same processing steps: (1) generating an embedding representation for each media outlet through matrix decomposition, (2) collecting the embedding representations of the media outlets that belong to each country to construct a media embedding set for that country, and (3) calculating the similarity between every pair of countries (i.e., every pair of media embedding sets) using Word Mover’s Distance (WMD; Kusner et al. 2015) as the similarity metric (See Methods for details). Figure 3 presents the changes in event selection bias similarity amongst media outlets from different countries between February and April 2022.
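Step (3) compares two sets of media embeddings. Exact WMD requires solving an optimal transport problem, so the sketch below instead computes the relaxed WMD lower bound described by Kusner et al. (2015), in which each embedding simply sends its mass to the nearest embedding in the other set. This is a named substitute rather than the procedure used in the study. Uniform weights over outlets, Euclidean distance, and the toy country sets are assumptions. A smaller distance means more similar event selection; how distances are mapped to the similarity values plotted in Fig. 3 is described in Methods and is not reproduced here.

```python
import numpy as np

def relaxed_wmd(set_a, set_b):
    """Relaxed Word Mover's Distance between two embedding sets.

    Each point carries uniform mass and moves all of it to its nearest
    neighbour in the other set; the larger of the two directional costs
    is a lower bound on the exact WMD.
    """
    a = np.asarray(set_a, dtype=float)
    b = np.asarray(set_b, dtype=float)
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    a_to_b = cost.min(axis=1).mean()  # every point of A to its nearest in B
    b_to_a = cost.min(axis=0).mean()  # every point of B to its nearest in A
    return float(max(a_to_b, b_to_a))

# Hypothetical media embedding sets for two countries.
country_x = [[0.0, 0.0], [1.0, 0.0]]
country_y = [[0.0, 1.0], [1.0, 1.0]]

d_xy = relaxed_wmd(country_x, country_y)
d_xx = relaxed_wmd(country_x, country_x)
print(d_xy, d_xx)
```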

figure 3

The horizontal axis in this figure represents time, measured in months. The vertical axis indicates the event selection similarity between Ukrainian media and media from other countries. Each circle represents a country, with the text inside it giving the corresponding country’s abbreviation (see details in Supplementary Information Tab. S3). The size of a circle corresponds to the average event selection similarity between the media of a specific country and the media of all other countries. The color of the circle corresponds to the vertical axis scale. The ordinate of the blue dotted line represents the median similarity to Ukrainian media.

We find that the similarities between Ukraine and other countries peaked significantly in March 2022. This result aligns with the timeline of the Russia-Ukraine conflict: the conflict broke out around February 22, attracting media attention worldwide. In March, the conflict escalated, and the regional situation became increasingly tense, leading to even more media coverage worldwide. By April, the prolonged conflict had made the international media accustomed to it, resulting in a decline in media interest. Furthermore, we observed that the event selection biases of media outlets in both EG (Egypt) and CN (China) differed significantly from those of other countries. Given that neither country is predominantly English-speaking, their English-language media outlets may have specific objectives, such as promoting their national image and culture, which could influence and constrain the topics that a media outlet tends to cover.

Additionally, we observe that in March 2022, the country with the highest similarity to Ukraine was Russia, and in April, it was Poland. This change can be attributed to the evolving regional situation. In March, during the early stage of the conflict, media reports primarily focused on the warring parties, namely Russia and Ukraine. As the war continued, its impact on Ukraine gradually became the focus of media coverage. For instance, the war led to the migration of a large number of Ukrainian citizens to nearby countries, among which Poland received the largest number at that time.

Media show diverse biases on different topics

In this experiment, we took 12 mainstream US news outlets as examples and conducted a quantitative media bias analysis on three typical topics (Fan and Gardent, 2022; Puglisi and Snyder Jr, 2015b; Sun and Peng, 2021): Gender bias (about occupations); Political bias (about American states); Income bias (about race & ethnicity). The topic words for each topic are listed in Supplementary Information Tab. S4. These topic words are drawn from related literature (Caliskan et al. 2017) and search engines, together with the authors’ intuitive assessments.

Gender bias in terms of Occupation

In news coverage, media outlets may intentionally or unintentionally associate an occupation with a particular gender (e.g., stereotypes like police-man, nurse-woman). Such gender bias can subtly affect people’s attitudes towards different occupations and even impact employment fairness. To analyze gender biases in news coverage towards 8 common occupations (note that more occupations can be studied using the same methodology), we examined 12 mainstream US media outlets. As shown in Fig. 4a, all these outlets tend to associate “teacher” and “nurse” with women. In contrast, when reporting on “police,” “driver,” “lawyer,” and “scientist,” most outlets show bias towards men. As for “director” and “photographer,” only slightly more than half of the outlets show bias towards men. Supplementary Information Tab. S5 shows the proportion of women in the eight occupations in America according to the U.S. Bureau of Labor Statistics (BLS; Footnote 7). Women predominate among “teachers” and “nurses,” while the proportions of men among “police,” “drivers,” and “lawyers” are significantly higher. Besides, among “directors,” “scientists,” and “photographers,” the proportions of women and men are about the same. Comparing the experiment results with the BLS statistics, we find that these media outlets’ gender bias towards an occupation is highly consistent with the actual ratio of women (or men) in the occupation. Such a phenomenon highlights the potential for media outlets to perpetuate and reinforce existing gender bias in society, emphasizing the need for increased awareness and attention to media bias. Note that we reorganized the corpus of each media outlet by up-sampling during data preprocessing, which introduced some randomness to the experiment results (See Methods for details). Therefore, we set five different random seeds for up-sampling and repeated the experiment mentioned above five times. A two-tailed t-test on the difference between the results shown in Fig. 4a and the results of the repeated experiments showed no significant difference (Supplementary Information Fig. S6).
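The robustness check can be sketched as follows. The exact test design is not spelled out in this section, so the version below is only one plausible reading: a one-sample two-tailed t-test asking whether the mean of five repeated bias estimates for a single outlet and target word differs from the reported value. The bias values are invented for illustration, and the critical value 2.776 is the standard two-tailed threshold for a significance level of 0.05 with 4 degrees of freedom.

```python
import math
import statistics

# Two-tailed critical value of Student's t for alpha = 0.05 and df = 4.
T_CRIT_DF4 = 2.776

def one_sample_t(repeats, reported):
    """t statistic for H0: the mean of `repeats` equals `reported`."""
    n = len(repeats)
    mean = statistics.mean(repeats)
    std_err = statistics.stdev(repeats) / math.sqrt(n)  # sample std / sqrt(n)
    return (mean - reported) / std_err

# Hypothetical bias values from five up-sampling seeds (not from the paper).
repeated_bias = [0.12, 0.10, 0.11, 0.13, 0.09]

t_same = one_sample_t(repeated_bias, reported=0.11)
t_far = one_sample_t(repeated_bias, reported=0.50)

significant_same = abs(t_same) > T_CRIT_DF4
significant_far = abs(t_far) > T_CRIT_DF4
print(significant_same, significant_far)
```

A non-significant result, as in the first case, indicates that the randomness introduced by up-sampling does not noticeably shift the estimated bias.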

figure 4

Each column corresponds to a media outlet, and each row corresponds to a target word, which usually denotes an entity or concept in the news text. The color bar on the right describes the range of the bias value, with each interval of the bias value corresponding to a different color. As the bias value changes from negative to positive, the corresponding color changes from purple to yellow. Because the range of bias values differs across topics, the color bars of different topics can also vary. The color of each heatmap square corresponds to an interval in the color bar. Specifically, the square located in row i and column j represents the bias of media outlet j when reporting on target i. a Gender bias about eight common occupations. b Income bias about four races or ethnicities. c Political bias about the top-10 “red states” (Wyoming, West Virginia, North Dakota, Oklahoma, Idaho, Arkansas, Kentucky, South Dakota, Alabama, Texas) and the top-10 “blue states” (Hawaii, Vermont, California, Maryland, Massachusetts, New York, Rhode Island, Washington, Connecticut, Illinois) according to the CPVI ranking (Ardehaly and Culotta, 2017). Limited by the page layout, only the top-8 results are shown here. Please refer to Supplementary Information Fig. S5 for the complete results.

Income bias in terms of Race and Ethnicity

Media coverage often discusses the income of different groups of people, including many races and ethnicities. Here, we aim to investigate whether the media outlets are biased in their income coverage, such as associating a specific race or ethnicity with being rich or poor. To this end, we selected four US racial and ethnic groups as research subjects: Asian, African, Hispanic, and Latino. In line with previous studies (Grieco and Cassidy, 2015; Nerenz et al. 2009; Perez and Hirschman, 2009), we considered Asian and African as racial categories and Hispanic and Latino as ethnic categories. Referring to the income statistics from the U.S. Census Bureau (USCB; Footnote 8), we do not strictly distinguish these concepts and compare them together. As shown in Fig. 4b, for the majority of media outlets, Asian is most frequently associated with the rich, with ESPN being the only exception. This anomalous finding may be attributed to ESPN’s position as a sports outlet, with a primary emphasis on sports that are particularly popular with Hispanic, Latino, and African-American audiences, such as soccer, basketball, and golf. Additionally, there is a significant disparity in the media’s income bias toward Africans, Hispanics, and Latinos. Specifically, the biases towards Hispanic and Latino populations are generally comparable, with both groups being portrayed as richer than African Americans in most media coverage. According to the aforementioned income statistics of the U.S. population, the income rankings of different races and ethnicities have remained stable from 1950 to 2020: Asians have consistently had the highest income, followed by Hispanics with the second-highest income, and African Americans with the lowest income (the income of Black Americans is used as an approximation for African Americans). It is worth noting that USCB considers Hispanic and Latino to be the same ethnicity, although there are some controversies surrounding this practice (Mora, 2014; Rodriguez, 2000). However, these controversies are not the concern of this work, so we follow USCB and use Hispanic income as an approximation of Latino income. Comparing our experiment results with USCB’s income statistics, we find that the media outlets’ income bias towards different races and ethnicities is roughly consistent with their actual income levels. A two-tailed t-test on the difference between the results shown in Fig. 4b and the results of repeated experiments showed no significant difference (Supplementary Information Fig. S7).

Political bias in terms of Region

Numerous studies have shown that media outlets tend to publish politically biased news articles that support the political parties they favor while criticizing those they oppose (Lazaridou et al. 2020; Puglisi, 2011). For example, a report from the Red State described liberals as regressive leftists with mental health issues. Similarly, a story from Right Wing News reported that Obama’s administration was terrible (Lazaridou et al. 2020). Such political inclinations hinder readers’ objective judgment of political events and affect their attitudes toward different political parties. Therefore, we analyzed the political biases of 12 mainstream US media outlets when talking about different US states, aiming to increase public awareness of such biases in news coverage. As shown in Fig. 4c, in the reports of these media outlets, most red states lean Republican, while most blue states lean Democratic. However, some blue states, such as Hawaii and Maryland, also show a leaning toward Republicans. This anomaly can be attributed to the source of the corpus data used in this study. The corpus data, which was used to train word embedding models, spans from January 2016 to December 2021. For most of this period, the Republican Party was in power, with Trump serving as president from January 2017 to January 2021. Thus, the majority of the data was collected during the Republican administration. We suggest that Trump’s presidency resulted in increased media coverage of the Republican Party, thus causing some blue states to be associated more frequently with Republicans in news reports. A two-tailed t-test on the difference between the results shown in Fig. 4c and the results of repeated experiments showed no significant difference (Supplementary Information Fig. S8 and Fig. S9).

Media logic and news evaluation are two important concepts in social science. The former refers to the rules, conventions, and strategies that the media follow in the production, dissemination, and reception of information, reflecting the media’s organizational structure, commercial interests, and socio-cultural background (Altheide, 2015). The latter refers to the systematic analysis of the quality, effectiveness, and impact of news reports, involving multiple criteria and dimensions such as truthfulness, accuracy, fairness, balance, objectivity, and diversity. When studying media bias issues, media logic provides a framework for understanding the rules and patterns of media operations, while news evaluation helps identify and analyze potential biases in media reports. For example, to study media’s political bias, D’heer (2018) and Esser and Strömbäck (2014) compare the frameworks, languages, and perspectives used by traditional news media and social media in reporting political elections, so as to understand the impact of these differences on voters’ attitudes and behaviors. However, despite this progress, these methods often rely on manual observation and interpretation, and are thus inefficient and susceptible to human bias and errors.

In this work, we propose an automated media bias analysis framework that enables us to uncover media bias on a large scale. To carry out this study, we amassed an extensive dataset, comprising over 8 million event records and 1.2 million news articles from a diverse range of media outlets (see details of the data collection process in Methods). Our research delves into media bias from two distinct yet highly pertinent perspectives. From the macro perspective, we aim to uncover the event selection bias of each media outlet, i.e., which types of events a media outlet tends to report on. From the micro perspective, our goal is to quantify the bias of each media outlet in wording and sentence construction when composing news articles about the selected events. The experimental results align well with our existing knowledge and relevant statistical data, indicating the effectiveness of embedding methods in capturing the characteristics of media bias. The methodology we employed is unified and intuitive and follows a basic idea. First, we train embedding models using real-world data to capture and encode media bias. At this step, based on the characteristics of different types of media bias, we choose appropriate embedding methods to model them respectively (Deerwester et al. 1990 ; Le and Mikolov, 2014 ; Mikolov et al. 2013 ). Then, we utilize various methods, including cluster analysis (Lloyd, 1982 ; MacQueen, 1967 ), similarity calculation (Kusner et al. 2015 ), and semantic differential (Osgood et al. 1957 ), to extract media bias information from the obtained embedding models.

To capture the event selection biases of different media outlets, we employ Truncated SVD (Halko et al. 2011) on the “media-event” matrix to generate media embeddings. Truncated SVD is a widely used technique in NLP. In particular, LSA (Deerwester et al. 1990) applies Truncated SVD to the “document-word” matrix to capture the underlying topic-based semantic relationships between text documents and words. LSA assumes that a document tends to use relevant words when it talks about a particular topic and obtains the vector representation for each document in a latent topic space, where documents talking about similar topics are located near each other. By treating media outlets and events as analogous to documents and words, we can naturally apply Truncated SVD to explore media bias in the event selection process. Specifically, we assume that there are underlying topics behind a media outlet’s event selection bias. If a media outlet focuses on a topic, it will tend to report events related to that topic and ignore events unrelated to it. Therefore, media outlets sharing similar event selection biases (i.e., those that tend to report events about similar topics) will be close to each other in the latent topic space, which provides a good opportunity for us to study media bias (See Methods and Results for details).
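The analogy with LSA can be made concrete with a toy sketch. It builds a binary “media-event” matrix from hypothetical (outlet, event) records and keeps the top singular components as media embeddings. The binary encoding, the outlet and event names, and the use of a full NumPy SVD followed by truncation (rather than the randomized algorithm of Halko et al. 2011, which matters only at scale) are assumptions made for illustration.

```python
import numpy as np

def media_embeddings(records, n_components=2):
    """Embed outlets by truncated SVD of a binary media-event matrix."""
    media = sorted({m for m, _ in records})
    events = sorted({e for _, e in records})
    row = {m: i for i, m in enumerate(media)}
    col = {e: j for j, e in enumerate(events)}

    matrix = np.zeros((len(media), len(events)))
    for m, e in records:
        matrix[row[m], col[e]] = 1.0  # outlet m reported event e

    # Full SVD, then keep the k largest singular components.
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    k = min(n_components, len(s))
    return media, u[:, :k] * s[:k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical reporting records: outlets a/b and c/d cover disjoint events.
records = [
    ("outlet_a", "e1"), ("outlet_a", "e2"), ("outlet_a", "e3"),
    ("outlet_b", "e1"), ("outlet_b", "e2"),
    ("outlet_c", "e4"), ("outlet_c", "e5"),
    ("outlet_d", "e4"), ("outlet_d", "e5"),
]
names, emb = media_embeddings(records)
vec = dict(zip(names, emb))

cos_ab = cosine(vec["outlet_a"], vec["outlet_b"])  # overlapping coverage
cos_ac = cosine(vec["outlet_a"], vec["outlet_c"])  # disjoint coverage
print(round(cos_ab, 3), round(cos_ac, 3))
```

In this toy case, outlets with overlapping coverage end up pointing in the same direction of the latent space, while outlets with disjoint coverage are orthogonal.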

When describing something, relevant contexts must be considered. For instance, positive and negative impressions are conveyed through the use of context words such as “diligent” and “lazy”, respectively. Similarly, a media outlet’s attitude towards something is reflected in the news context in which it is presented. Here, we study the association between each target and its news contexts based on the co-occurrence relationship between words. Our underlying assumption is that frequently co-occurring words are strongly associated, which aligns with the idea of word embedding models (Devlin et al. 2019; Le and Mikolov, 2014; Mikolov et al. 2013), where the embeddings of frequently co-occurring words are relatively similar. For example, suppose that in the corpus of media M, the word “scientist” often co-occurs with female-related words (e.g., “woman” and “she”) but rarely with male-related words. Then, the semantic similarities of “scientist” with female-related words should be much higher than those with male-related words in the word embedding model. Therefore, we can conclude that media M’s reports on scientists are biased towards women.
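The co-occurrence assumption can be illustrated without any embedding model by counting which words appear within a fixed window around a target word. The three-sentence corpus, the window size, and the word lists below are hypothetical, and a real analysis would rely on the trained embeddings rather than on raw counts.

```python
from collections import Counter

def cooccurrence_counts(sentences, target, window=3):
    """Count words appearing within `window` tokens of each `target` occurrence."""
    counts = Counter()
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts

# Hypothetical tokenized corpus of media M.
corpus = [
    "she is a scientist".split(),
    "the scientist said she agreed".split(),
    "he met the scientist".split(),
]

counts = cooccurrence_counts(corpus, "scientist")
female_hits = sum(counts[w] for w in ("she", "woman"))
male_hits = sum(counts[w] for w in ("he", "man"))
print(female_hits, male_hits)
```

Here “scientist” co-occurs more often with female-related words, so an embedding model trained on such a corpus would be expected to place “scientist” closer to those words.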

According to the theory of Semantic Differential (Osgood et al. 1957 ), the difference in semantic similarities between “scientist” and female-related words versus male-related words can serve as an estimation of media M’s gender bias. Since we have kept all settings (e.g., corpus size, starting point for model fine-tuning, etc.) the same when training word embedding models for different media outlets, the estimated bias values can be interpreted as absolute ones within the same reference system. In other words, the estimated bias values for different media outlets are directly comparable in this study, with a value of 0 denoting unbiased and a value closer to 1 or -1 indicating a more pronounced bias.
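A minimal sketch of this estimate is given below, assuming that the bias value is the mean cosine similarity between the target word and one set of topic words minus the mean cosine similarity with the opposite set. The exact aggregation and sign convention are given in Methods and may differ. The two-dimensional toy vectors stand in for an outlet's fine-tuned word embeddings and are not real model output.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def semantic_differential_bias(target, pole_a, pole_b, emb):
    """Mean similarity to pole_a words minus mean similarity to pole_b words.

    Positive values lean towards pole_a, negative towards pole_b
    (an assumed convention for this sketch).
    """
    sim_a = np.mean([cosine(emb[target], emb[w]) for w in pole_a])
    sim_b = np.mean([cosine(emb[target], emb[w]) for w in pole_b])
    return float(sim_a - sim_b)

# Hypothetical embeddings for one outlet.
toy_emb = {
    "nurse": np.array([0.9, 0.1]),
    "police": np.array([0.1, 0.9]),
    "she": np.array([1.0, 0.0]),
    "woman": np.array([0.8, 0.2]),
    "he": np.array([0.0, 1.0]),
    "man": np.array([0.2, 0.8]),
}
female_words = ["she", "woman"]
male_words = ["he", "man"]

nurse_bias = semantic_differential_bias("nurse", female_words, male_words, toy_emb)
police_bias = semantic_differential_bias("police", female_words, male_words, toy_emb)
print(nurse_bias, police_bias)
```

Repeating the same computation with each outlet's own embedding model and each target word would produce a grid of bias values like the heatmaps in Fig. 4.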

We notice that there has been literature investigating the choice of events/topics and words/frames to measure media bias, such as partisan and ideological biases (Gentzkow et al. 2015 ; Puglisi and Snyder Jr, 2015b ). However, our approach not only considers bias related to the selective reporting of events (using event embedding) but also studies biased wording in news texts (using word embedding). While the former focuses on the macro level, the latter examines the micro level. These two perspectives are distinct yet highly relevant, but previous studies often only consider one of them. For the choice of events/topics, our approach allows us to explore how they change over time. For example, we can analyze the time-changing similarities between media outlets from different countries, as shown in Fig. 3 . For the choice of words/frames, prior work has either analyzed specific biases based on the frequency of particular words (Gentzkow and Shapiro, 2010 ; Gentzkow et al. 2006 ), which fails to capture deeper semantics in media language or analyzed specific biases by merely aggregating the analysis results for every single article in the corpus (e.g., calculating the sentiment (Gentzkow et al. 2006 ; Lott Jr and Hassett, 2014 ; Soroka, 2012 ) of each article or its similarity with certain authorship (Gentzkow and Shapiro, 2010 ; Groseclose and Milyo, 2005 ), then summing them up as the final bias value), without considering the relationships between different articles, thus lacking a holistic nature. In contrast, our method, based on word embeddings (Le and Mikolov, 2014 ; Mikolov et al. 2013 ), directly models the semantic associations between all words and entities in the corpus with a neural network, offering advantages in capturing both semantic meaning and holistic nature. 
Notably, we not only utilize word embedding techniques but also integrate them with appropriate psychological and sociological theories, such as the Semantic Differential theory and the Cognitive Miser theory. These theories give our approach better interpretability. In addition, the method we propose is a generalizable framework for studying media bias using embedding techniques. While this study has focused on validating its effectiveness with specific types of media bias, it can be applied to a broader range of media bias research. We will expand the application of this framework in future work.

As mentioned above, our proposed framework examines media bias from two distinct but highly relevant perspectives. Here, taking the Russia-Ukraine conflict as an example, we demonstrate how these two perspectives help provide researchers and the public with a more comprehensive and objective assessment of media bias. For instance, we can gather relevant news articles and event reporting records about the ongoing Russia-Ukraine conflict from various media outlets worldwide and generate media and word embedding models. Then, according to the embedding similarities of different media outlets, we can judge which types of events each media outlet tends to report and select some outlets that tend to report on different events. By synthesizing the news reports of the selected outlets, we can gain a more comprehensive understanding of the conflict instead of being limited to the information selectively provided by a few outlets. Besides, using the word embedding model and the bias estimation method based on Semantic Differential, we can objectively judge each media outlet’s attitude towards Russia and Ukraine (e.g., whether an outlet tends to use positive or negative words to describe either party). Once a news outlet is detected as apparently biased, we should read its articles more carefully to avoid being misled.

Finally, despite the advantages of our framework, there are still some shortcomings that need improvement. First, while the media embeddings generated by matrix decomposition have successfully captured media bias in the event selection process, interpreting these continuous numerical vectors directly can be challenging. We hope that future work will enable the media embeddings to directly explain what each topic means and which topics a media outlet is most interested in, thus helping us understand media bias better. Second, since there is no absolute, independent ground truth on which events have occurred and should have been covered, the aforementioned event selection bias, strictly speaking, should be understood as relative topic coverage, which is a narrower notion. Third, for topics involving more complex semantic relationships, estimating media bias using scales based on antonym pairs and the Semantic Differential theory may not be feasible, which needs further investigation.

Data availability

The data that support the findings of this study are available at https://github.com/CGCL-codes/media-bias .

Code availability

The code that supports the findings of this study is also available at https://github.com/CGCL-codes/media-bias .

1. https://fair.org/home/when-both-sides-are-covered-in-verizon-strike-bosses-side-is-heard-more/ .

2. These views were extracted from reports by some mainstream US media outlets in 2022 when the Democratic Party (left-wing) was in power.

3. https://www.nytimes.com/2022/09/26/world/middleeast/women-iran-protests-hijab.html .

4. https://www.nytimes.com/2014/08/08/world/asia/uighurs-veils-a-protest-against-chinas-curbs.html .

5. https://www.gdeltproject.org/ .

6. https://mediacloud.org/ .

7. https://www.bls.gov/cps/cpsaat11.htm .

8. https://www.census.gov/content/dam/Census/library/publications/2021/demo/p60-273.pdf .

Altheide DL (2015) Media logic. The international encyclopedia of political communication, pages 1–6

Ansolabehere S, Lessem R, Snyder Jr JM (2006) The orientation of newspaper endorsements in us elections, 1940–2002. Quarterly Journal of political science 1(4):393

Ardehaly EM, Culotta A (2017) Mining the demographics of political sentiment from twitter using learning from label proportions. In 2017 IEEE international conference on data mining (ICDM), pages 733–738. IEEE

Baron DP (2006) Persistent media bias. Journal of Public Economics 90(1-2):1–36

Bovet A, Makse HA (2019) Influence of fake news in twitter during the 2016 us presidential election. Nature communications 10(1):1–14

Caliskan A, Bryson JJ, Narayanan A (2017) Semantics derived automatically from language corpora contain human-like biases. Science 356(6334):183–186

D’Alessio D, Allen M (2000) Media bias in presidential elections: A meta-analysis. Journal of communication 50(4):133–156

Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. Journal of the American society for information science 41(6):391–407

DellaVigna S, Kaplan E (2008) The political impact of media bias. Information and Public Choice, page 79

Downs A (1957) An economic theory of political action in a democracy. Journal of political economy 65(2):135–150

D’heer E (2018) Media logic revisited. the concept of social media logic as alternative framework to study politicians’ usage of social media during election times. Media logic (s) revisited: Modelling the interplay between media institutions, media technology and societal change, pages 173–194

Esser F, Strömbäck J (2014) Mediatization of politics: Understanding the transformation of Western democracies. Springer

Fan A, Gardent C (2022) Generating biographies on Wikipedia: The impact of gender bias on the retrieval-based generation of women biographies. In Proceedings of the Conference of the 60th Annual Meeting of the Association for Computational Linguistics (ACL)

Firth JR (1957) A synopsis of linguistic theory, 1930–1955. Studies in linguistic analysis

Fiske ST, Taylor SE (1991) Social cognition. Mcgraw-Hill Book Company

Galtung J, Ruge MH (1965) The structure of foreign news: The presentation of the congo, cuba and cyprus crises in four norwegian newspapers. Journal of peace research 2(1):64–90

Gentzkow M, Shapiro JM (2010) What drives media slant? evidence from us daily newspapers. Econometrica 78(1):35–71

Gentzkow M, Glaeser EL, Goldin C (2006) The rise of the fourth estate. how newspapers became informative and why it mattered. In Corruption and reform: Lessons from America’s economic history, pages 187–230. University of Chicago Press

Gentzkow M, Shapiro JM, Stone DF (2015) Media bias in the marketplace: Theory. In Handbook of Media Economics, volume 1, pages 623–645. Elsevier

Grand G, Blank IA, Pereira F, Fedorenko E (2022) Semantic projection recovers rich human knowledge of multiple object features from word embeddings. Nature Human Behaviour 6(7):975–987

Grieco EM, Cassidy RC (2015) Overview of race and hispanic origin: Census 2000 brief. In ’Mixed Race’Studies, pages 225–243. Routledge

Groseclose T, Milyo J (2005) A measure of media bias. The Quarterly Journal of Economics 120(4):1191–1237

Grossmann M, Hopkins DA (2016) Asymmetric politics: Ideological Republicans and group interest Democrats. Oxford University Press

Halko N, Martinsson PG, Tropp JA (2011) Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review 53(2):217–288

Hamborg F, Donnay K, Gipp B (2019) Automated identification of media bias in news articles: an interdisciplinary literature review. International Journal on Digital Libraries 20(4):391–415

Haraldsson A, Wängnerud L (2019) The effect of media sexism on women’s political ambition: evidence from a worldwide study. Feminist media studies 19(4):525–541

Harcup T, O’neill D (2001) What is news? galtung and ruge revisited. Journalism studies 2(2):261–280

Harcup T, O’neill D (2017) What is news? news values revisited (again). Journalism studies 18(12):1470–1488

Harris ZS (1954) Distributional structure. Word 10(2-3):146–162

Harwood TG, Garry T (2003) An overview of content analysis. The marketing review 3(4):479–498

Ho DE, Quinn KM et al. (2008) Measuring explicit political positions of media. Quarterly Journal of Political Science 3(4):353–377

Huang H, Chen Z, Shi X, Wang C, He Z, Jin H, Zhang M, Li Z (2021) China in the eyes of news media: a case study under covid-19 epidemic. Frontiers of Information Technology & Electronic Engineering 22(11):1443–1457

Huang P-S, Zhang H, Jiang R, Stanforth R, Welbl J, Rae J, Maini V, Yogatama D, Kohli P (2020) Reducing sentiment bias in language models via counterfactual evaluation. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 65–83

Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, pages 4171–4186

Kusner M, Sun Y, Kolkin N, Weinberger K (2015) From word embeddings to document distances. In International conference on machine learning, pages 957–966. PMLR

Larcinese V, Puglisi R, Snyder Jr JM (2011) Partisan bias in economic news: Evidence on the agenda-setting behavior of us newspapers. Journal of public Economics 95(9–10):1178–1189

Lazaridou K, Löser A, Mestre M, Naumann F (2020) Discovering biased news articles leveraging multiple human annotations. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 1268–1277

Le Q, Mikolov T (2014) Distributed representations of sentences and documents. In International conference on machine learning, pages 1188–1196. PMLR

Liu R, Jia C, Wei J, Xu G, Wang L, Vosoughi S (2021) Mitigating political bias in language models through reinforced calibration. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 14857–14866

Liu R, Wang L, Jia C, Vosoughi S (2021) Political depolarization of news articles using attribute-aware word embeddings. In Proceedings of the 15th International AAAI Conference on Web and Social Media (ICWSM 2021)

Lloyd S (1982) Least squares quantization in pcm. IEEE transactions on information theory 28(2):129–137

Lott Jr JR, Hassett KA (2014) Is newspaper coverage of economic events politically biased? Public Choice 160(1–2):65–108

Lühiste M, Banducci S (2016) Invisible women? comparing candidates’ news coverage in Europe. Politics & Gender 12(2):223–253

MacGregor B (1997) Live, direct and biased?: Making television news in the satellite age

MacQueen J (1967) Classification and analysis of multivariate observations. In 5th Berkeley Symp. Math. Statist. Probability, pages 281–297

Merloe P (2015) Authoritarianism goes global: Election monitoring vs. disinformation. Journal of Democracy 26(3):79–93

Mikolov T, Chen K, Corrado GS, Dean J (2013) Efficient estimation of word representations in vector space. In International Conference on Learning Representations

Mora, GC (2014) Making Hispanics: How activists, bureaucrats, and media constructed a new American. University of Chicago Press

Nerenz DR, McFadden B, Ulmer C et al. (2009) Race, ethnicity, and language data: standardization for health care quality improvement

Niven, David (2002). Tilt?: The search for media bias. Greenwood Publishing Group

Osgood, Charles Egerton, Suci, George J and Tannenbaum, Percy H (1957) The measurement of meaning. Number 47. University of Illinois Press

Papacharissi Z, de Fatima Oliveira M (2008) News frames terrorism: A comparative analysis of frames employed in terrorism coverage in US and UK newspapers. The international journal of press/politics 13(1):52–74

Park S, Kang S, Chung, S, Song, J (2009) Newscube: delivering multiple aspects of news to mitigate media bias. In Proceedings of the SIGCHI conference on human factors in computing systems, pages 443–452

Paul R, Elder L (2004) The thinkers guide for conscientious citizens on how to detect media bias & propaganda in national and world news: Based on critical thinking concepts & tools

Perez AnthonyDaniel, Hirschman C (2009) The changing racial and ethnic composition of the US population: Emerging American identities. Population and development review 35(1):1–51

Puglisi, R (2011) Being the New York times: the political behaviour of a newspaper. The BE journal of economic analysis & policy 11(1)

Puglisi R, Snyder Jr JM (2015a) The balanced US press. Journal of the European Economic Association 13(2):240–264

Puglisi, Riccardo and Snyder Jr, James M (2015b) Empirical studies of media bias. In Handbook of media economics, volume 1, pages 647–667. Elsevier

Qiang J, Zhang F, Li Y, Yuan Y, Zhu Y, Wu X (2023) Unsupervised statistical text simplification using pre-trained language modeling for initialization. Frontiers of Computer Science 17(1):171303

Rodriguez, CE (2000) Changing race: Latinos, the census, and the history of ethnicity in the United States, volume 41. NYU Press

Ross K, Carter C (2011) Women and news: A long and winding road. Media, Culture & Society 33(8):1148–1165

Sahlgren M (2008) The distributional hypothesis. Italian Journal of Disability Studies 20:33–53

Google Scholar  

Soroka SN (2012) The gatekeeping function: distributions of information in media and the real world. The Journal of Politics 74(2):514–528

Stanovich KE (2009) What intelligence tests miss: The psychology of rational thought. Yale University Press

Stroud NatalieJomini (2010) Polarization and partisan selective exposure. Journal of Communication 60(3):556–576

Sun J, Peng N (2021) Men are elected, women are married: Events gender bias on wikipedia. In Proceedings of the Conference of the 59th Annual Meeting of the Association for Computational Linguistics (ACL)

Sunstein C (2002) The law of group polarization. Journal of Political Philosophy 10:175–195

Tahmasbi F, Schild L, Ling C, Blackburn J, Stringhini G, Zhang Y, Zannettou S (2021) “go eat a bat, chang!”: On the emergence of sinophobic behavior on web communities in the face of covid-19. In Proceedings of the Web Conference, pages 1122–1133

Vaismoradi M, Turunen H, Bondas T (2013) Content analysis and thematic analysis: Implications for conducting a qualitative descriptive study. Nursing & health sciences 15(3):398–405

Wang T, Lin XV, Rajani NF, McCann B, Ordonez V, Xiong, C (2020). Double-hard debias: Tailoring word embeddings for gender bias mitigation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5443–5453

White DavidManning (1950) The “gate keeper”: a case study in the selection of news. Journalism Quarterly 27(4):383–390

Zeng Y, Li Z, Chen Z, Ma H (2023) Aspect-level sentiment analysis based on semantic heterogeneous graph convolutional network. Frontiers of Computer Science 17(6):176340

Zhang Y, Wang H, Yin G, Wang T, Yu Y (2017) Social media in github: the role of@-mention in assisting software development. Science China Information Sciences 60(3):1–18

Download references

Acknowledgements

The work is supported by the National Natural Science Foundation of China (No. 62127808).

Author information

Authors and affiliations

National Engineering Research Center for Big Data Technology and System, Wuhan, China

Hong Huang, Hua Zhu & Hai Jin

Services Computing Technology and System Lab, Wuhan, China

Hong Huang, Hua Zhu & Hai Jin

Cluster and Grid Computing Lab, Wuhan, China

Hong Huang, Hua Zhu & Hai Jin

School of Computer Science and Technology, Wuhan, China

Hong Huang, Hua Zhu, Wenshi Liu & Hai Jin

Huazhong University of Science and Technology, Wuhan, China

Hong Huang, Hua Zhu, Wenshi Liu, Hua Gao & Hai Jin

DIRO, Université de Montréal & Mila & Canada CIFAR AI Chair, Montreal, Canada

Bang Liu

Contributions

HH: conceptualization, writing-review & editing, supervision; HZ: software, writing-original draft, data curation; WSL: software; HG and HJ: resources; BL: methodology, writing-review & editing, supervision.

Corresponding author

Correspondence to Hong Huang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

Ethical approval is not required as the study does not involve human participants.

Informed consent

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Huang, H., Zhu, H., Liu, W. et al. Uncovering the essence of diverse media biases from the semantic embedding space. Humanit Soc Sci Commun 11, 656 (2024). https://doi.org/10.1057/s41599-024-03143-w

Received: 26 February 2023

Accepted: 07 May 2024

Published: 22 May 2024

DOI: https://doi.org/10.1057/s41599-024-03143-w

Resources & Publications

All members of the network can share their recent work on media bias here.

The most recent models are published on Hugging Face.

[Benchmark, GitHub] MBIB – the first Media Bias Identification Benchmark Task and Dataset Collection

[Dataset, GitHub] BABE – Bias Annotations By Experts

[Scale/Questionnaire to measure bias perception] Do You Think It’s Biased? How To Ask For The Perception Of Media Bias (A set of tested questions to assess media bias perception to be used in any bias-related research)

[Dataset, Zenodo] MBIC – A Media Bias Annotation Dataset Including Annotator Characteristics

Publications

Hinterreiter, Smi; Wessel, Martin; Schliski, Fabian; Echizen, Isao; Latoschik, Marc Erich; Spinde, Timo

NewsUnfold: Creating a News-Reading Application That Indicates Linguistic Media Bias and Collects Feedback Proceedings Article Forthcoming

In: Proceedings of the International AAAI Conference on Web and Social Media (ICWSM'25), AAAI, Copenhagen, Denmark, Forthcoming (conditionally accepted for publication).

Tags: crowdsourcing, HITL, linguistic bias, media bias, news bias

  • https://media-bias-research.org/wp-content/uploads/2024/07/Preprint_ICWSM_25_New[...]

Hinterreiter, Smi; Spinde, Timo; Oberdörfer, Sebastian; Echizen, Isao; Latoschik, Marc Erich

News Ninja: Gamified Annotation of Linguistic Bias in Online News Journal Article Forthcoming

In: Proc. ACM Hum.-Comput. Interact., vol. 8, no. CHI PLAY, Forthcoming (Publisher: Association for Computing Machinery; conditionally accepted for publication).

Tags: crowdsourcing, Game With A Purpose, linguistic bias, media bias, news bias

  • https://media-bias-research.org/wp-content/uploads/2024/07/Preprint_News_Ninja.p[...]
  • doi:10.1145/3677092

Wessel, Martin; Horych, Tomas

Beyond the Surface: Spurious Cues in Automatic Media Bias Detection Proceedings Article

In: Bharathi Raja Chakravarthi, Bharathi B, Paul Buitelaar (Ed.): Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion, pp. 21–30, Association for Computational Linguistics, 2024.

  • https://aclanthology.org/2024.ltedi-1.3

Horych, Tomas; Wessel, Martin; Wahle, Jan Philip; Ruas, Terry; Wassmuth, Jerome; Greiner-Petter, Andre; Aizawa, Akiko; Gipp, Bela; Spinde, Timo

MAGPIE: Multi-Task Analysis of Media-Bias Generalization with Pre-Trained Identification of Expressions Proceedings Article

In: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, 2024.

Tags: dataset, multi-task learning, Transfer learning

  • https://aclanthology.org/2024.lrec-main.952

Wessel, Martin; Horych, Tomas; Ruas, Terry; Aizawa, Akiko; Gipp, Bela; Spinde, Timo

Introducing MBIB - the first Media Bias Identification Benchmark Task and Dataset Collection Proceedings Article

In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23), ACM, New York, NY, USA, 2023 , ISBN: 978-1-4503-9408-6/23/07 .

  • https://media-bias-research.org/wp-content/uploads/2023/04/Wessel2023Preprint.pdf
  • doi:10.1145/3539618.3591882

Spinde, Timo; Richter, Elisabeth; Wessel, Martin; Kulshrestha, Juhi; Donnay, Karsten

What do Twitter comments tell about news article bias? Assessing the impact of news article bias on its perception on Twitter Journal Article

In: Online Social Networks and Media, vol. 37-38, pp. 100264, 2023 , ISSN: 2468-6964 .

Tags: Hate speech detection, media bias, Sentiment analysis, Transfer learning

  • https://www.sciencedirect.com/science/article/pii/S246869642300023X
  • doi:10.1016/j.osnem.2023.100264

Spinde, Timo; Hinterreiter, Smi; Haak, Fabian; Ruas, Terry; Giese, Helge; Meuschke, Norman; Gipp, Bela

The Media Bias Taxonomy: A Systematic Literature Review on the Forms and Automated Detection of Media Bias Journal Article

In: arXiv preprint, 2023 .

  • https://media-bias-research.org/wp-content/uploads/2023/12/spinde2023.pdf

Krieger, David; Spinde, Timo; Ruas, Terry; Kulshrestha, Juhi; Gipp, Bela

A Domain-adaptive Pre-training Approach for Language Bias Detection in News Proceedings Article

In: 2022 ACM/IEEE Joint Conference on Digital Libraries (JCDL), Cologne, Germany, 2022 .

  • https://media-bias-research.org/wp-content/uploads/2022/06/Krieger2022_mbg.pdf
  • doi:10.1145/3529372.3530932

Zhukova, Anastasia; Hamborg, Felix; Gipp, Bela

Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes Proceedings Article

In: Proceedings of the Thirteenth Language Resources and Evaluation Conference, pp. 4884–4893, European Language Resources Association, Marseille, France, 2022 .

  • https://aclanthology.org/2022.lrec-1.522

Spinde, Timo; Krieger, Jan-David; Ruas, Terry; Mitrović, Jelena; Götz-Hahn, Franz; Aizawa, Akiko; Gipp, Bela

Exploiting Transformer-based Multitask Learning for the Detection of Media Bias in News Articles Proceedings Article

In: Proceedings of the iConference 2022, Virtual event, 2022 .

  • https://media-bias-research.org/wp-content/uploads/2022/03/Spinde2022a_mbg.pdf
  • doi:10.1007/978-3-030-96957-8_20

Spinde, Timo; Jeggle, Christin; Haupt, Magdalena; Gaissmaier, Wolfgang; Giese, Helge

How do we raise media bias awareness effectively? Effects of visualizations to communicate bias Journal Article

In: PLOS ONE, vol. 17, no. 4, pp. 1-14, 2022 .

  • https://doi.org/10.1371/journal.pone.0266204
  • doi:10.1371/journal.pone.0266204

Haak, Fabian; Schaer, Philipp

Auditing Search Query Suggestion Bias Through Recursive Algorithm Interrogation Proceedings Article

In: WebSci '22: 14th ACM Web Science Conference 2022, ACM, 2022 .

Tags: bias esupol haak myown schaer

Zhukova, Anastasia; Hamborg, Felix; Donnay, Karsten; Gipp, Bela

XCoref: Cross-Document Coreference Resolution in the Wild Proceedings Article

In: Information for a Better World: Shaping the Global Future: 17th International Conference, IConference 2022, Virtual Event, February 28 – March 4, 2022, Proceedings, Part I, pp. 272–291, Springer-Verlag, Berlin, Heidelberg, 2022 , ISBN: 978-3-030-96956-1 .

Tags: Cross-document coreference resolution, media bias, news analysis

  • https://doi.org/10.1007/978-3-030-96957-8_25
  • doi:10.1007/978-3-030-96957-8_25

Spinde, Timo; Plank, Manuel; Krieger, Jan-David; Ruas, Terry; Gipp, Bela; Aizawa, Akiko

Neural Media Bias Detection Using Distant Supervision With BABE - Bias Annotations By Experts Proceedings Article

In: Findings of the Association for Computational Linguistics: EMNLP 2021, Dominican Republic, 2021 .

  • https://media-bias-research.org/wp-content/uploads/2022/01/Neural_Media_Bias_Dete[...]
  • doi:10.18653/v1/2021.findings-emnlp.101

Hinterreiter, Smi

A Gamified Approach To Automatically Detect Biased Wording And Train Critical Reading Proceedings Article

In: 2021 IEEE International Conference on Data Mining Workshops (ICDMW), 2021 .

  • https://media-bias-research.org/wp-content/uploads/2021/10/hinterreiter2021a.pdf
  • doi:10.1109/ICDMW53433.2021.00141

Spinde, Timo

An Interdisciplinary Approach for the Automated Detection and Visualization of Media Bias in News Articles Proceedings Article

Tags: media bias, news analysis, slanted coverage, text retrieval

  • https://media-bias-research.org/wp-content/uploads/2021/09/Spinde2021g.pdf
  • doi:10.1109/ICDMW53433.2021.00144

Spinde, Timo; Sinha, Kanishka; Meuschke, Norman; Gipp, Bela

TASSY - A Text Annotation Survey System Proceedings Article

In: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2021 .

  • https://media-bias-research.org/wp-content/uploads/2022/01/Spinde2021c.pdf
  • doi:10.1109/JCDL52503.2021.00052

Spinde, Timo; Kreuter, Christina; Gaissmaier, Wolfgang; Hamborg, Felix; Gipp, Bela; Giese, Helge

Do You Think It’s Biased? How To Ask For The Perception Of Media Bias Proceedings Article

  • https://media-bias-research.org/wp-content/uploads/2022/01/Spinde2021e.pdf
  • doi:10.1109/JCDL52503.2021.00018

Spinde, Timo; Krieger, David; Plank, Manu; Gipp, Bela

Towards A Reliable Ground-Truth For Biased Language Detection Proceedings Article

In: Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), Virtual Event, 2021 .

  • https://media-bias-research.org/wp-content/uploads/2022/01/Spinde2021d.pdf
  • doi:10.1109/JCDL52503.2021.00053

Haak, Fabian; Engelmann, Björn

IRCologne at GermEval 2021: Toxicity Classification Proceedings Article

In: Proceedings of the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments, pp. 47–53, Association for Computational Linguistics, Duesseldorf, Germany, 2021 .

Tags: 2021 bias classification data engelmann haak nlp programming snorkel toxic

  • https://aclanthology.org/2021.germeval-1.7

Hamborg, F.; Heinser, K.; Zhukova, A.; Donnay, K.; Gipp, B.

Newsalyze: Effective Communication of Person-Targeting Biases in News Articles Proceedings Article

In: 2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 130-139, IEEE Computer Society, Los Alamitos, CA, USA, 2021 .

Tags: visualization, costs, atmospheric measurements, voting, natural languages, manuals, particle measurements

  • https://doi.ieeecomputersociety.org/10.1109/JCDL52503.2021.00025
  • doi:10.1109/JCDL52503.2021.00025

Cabot, Pere-Lluís Huguet; Abadi, David; Fischer, Agneta; Shutova, Ekaterina

Us vs. Them: A Dataset of Populist Attitudes, News Bias and Emotions Proceedings Article

In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pp. 1921–1945, Association for Computational Linguistics, Online, 2021 .

  • https://aclanthology.org/2021.eacl-main.165
  • doi:10.18653/v1/2021.eacl-main.165

Spinde, Timo; Rudnitckaia, Lada; Hamborg, Felix; Gipp, Bela

Identification of Biased Terms in News Articles by Comparison of Outlet-specific Word Embeddings Proceedings Article

In: Proceedings of the iConference 2021, Beijing, China (Virtual Event), 2021 .

  • https://media-bias-research.org/wp-content/uploads/2021/01/Spinde2021.pdf
  • doi:10.1007/978-3-030-71305-8_17

Spinde, Timo; Rudnitckaia, Lada; Sinha, Kanishka; Hamborg, Felix; Gipp, Bela; Donnay, Karsten

MBIC – A Media Bias Annotation Dataset Including Annotator Characteristics Proceedings Article

  • https://media-bias-research.org/wp-content/uploads/2021/01/Spinde2021a.pdf
  • doi:10.6084/m9.figshare.17192924

Spinde, Timo; Rudnitckaia, Lada; Mitrović, Jelena; Hamborg, Felix; Granitzer, Michael; Gipp, Bela; Donnay, Karsten

Automated identification of bias inducing words in news articles using linguistic and context-oriented features Journal Article

In: Information Processing & Management, vol. 58, no. 3, pp. 102505, 2021 , ISSN: 0306-4573 .

Tags: bias data set, context analysis, feature engineering, media bias, news analysis, text analysis

  • https://www.sciencedirect.com/science/article/pii/S0306457321000157/pdfft?md5=64[...]
  • doi:10.1016/j.ipm.2021.102505

Ehrhardt, Jonas; Spinde, Timo; Vardasbi, Ali; Hamborg, Felix

Omission of Information: Identifying Political Slant via an Analysis of Co-occurring Entities Book Section

In: Information between Data and Knowledge, vol. 74, pp. 80–93, Werner Hülsbusch, Glückstadt, 2021 , (Session 2: Information Behavior and Information Literacy 2) .

Tags: media bias, bias by omission, news articles, co-occurrences

  • https://epub.uni-regensburg.de/44939/

Garz, Marcel; Martin, Gregory J.

Media Influence on Vote Choices: Unemployment News and Incumbents' Electoral Prospects Journal Article

In: American Journal of Political Science, vol. 65, no. 2, pp. 278-293, 2021 .

  • https://onlinelibrary.wiley.com/doi/abs/10.1111/ajps.12539
  • doi:10.1111/ajps.12539

Babaei, Mahmoudreza; Kulshrestha, Juhi; Chakraborty, Abhijnan; Redmiles, Elissa M.; Cha, Meeyoung; Gummadi, Krishna P.

Analyzing Biases in Perception of Truth in News Stories and Their Implications for Fact Checking Journal Article

In: IEEE Transactions on Computational Social Systems, 2021 .

  • doi:10.1109/TCSS.2021.3096038

Haak, Fabian; Schaer, Philipp

Perception-Aware Bias Detection for Query Suggestions Proceedings Article

In: Boratto, Ludovico; Faralli, Stefano; Marras, Mirko; Stilo, Giovanni (Ed.): Advances in Bias and Fairness in Information Retrieval - Second International Workshop on Algorithmic Bias in Search and Recommendation, BIAS 2021, Lucca, Italy, April 1, 2021, Proceedings, Springer Nature, Switzerland, 2021 , ISBN: 978-3-030-78817-9 .

Tags: 2021 bias esupol haak myown schaer

  • doi:10.1007/978-3-030-78818-6_12

Zhukova, Anastasia; Hamborg, Felix; Donnay, Karsten; Gipp, Bela

Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons Proceedings Article

In: Diversity, Divergence, Dialogue: 16th International Conference, IConference 2021, Beijing, China, March 17–31, 2021, Proceedings, Part I, pp. 514–526, Springer-Verlag, Beijing, China, 2021 , ISBN: 978-3-030-71291-4 .

Tags: Coreference resolution, media bias, news analysis

  • https://doi.org/10.1007/978-3-030-71292-1_40
  • doi:10.1007/978-3-030-71292-1_40

Cabot, Pere-Lluís Huguet; Dankers, Verna; Abadi, David; Fischer, Agneta; Shutova, Ekaterina

The Pragmatics behind Politics: Modelling Metaphor, Framing and Emotion in Political Discourse Proceedings Article

In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 4479–4488, Association for Computational Linguistics, Online, 2020 .

  • https://aclanthology.org/2020.findings-emnlp.402
  • doi:10.18653/v1/2020.findings-emnlp.402

Spinde, Timo; Hamborg, Felix; Gipp, Bela

Media Bias in German News Articles: A Combined Approach Proceedings Article

In: Proceedings of the 8th International Workshop on News Recommendation and Analytics (INRA 2020), Virtual event, 2020.

  • https://media-bias-research.org/wp-content/uploads/2021/01/Media-Bias-in-German-N[...]
  • doi:10.1007/978-3-030-65965-3_41

Ganguly, Soumen; Kulshrestha, Juhi; An, Jisun; Kwak, Haewoon

Empirical Evaluation of Three Common Assumptions in Building Political Media Bias Datasets Proceedings Article

In: pp. 939-943, 2020 .

  • https://ojs.aaai.org/index.php/ICWSM/article/view/7362

Spinde, Timo; Hamborg, Felix; Gipp, Bela

An Integrated Approach to Detect Media Bias in German News Articles Proceedings Article

In: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, pp. 505–506, Association for Computing Machinery, Virtual Event, China, 2020 , ISBN: 9781450375856 .

Tags: content analysis, frame analysis, media bias, news bias, news slant

  • https://doi.org/10.1145/3383583.3398585
  • doi:10.1145/3383583.3398585

Spinde, Timo; Hamborg, Felix; Donnay, Karsten; Becerra, Angelica; Gipp, Bela

Enabling News Consumers to View and Understand Biased News Coverage: A Study on the Perception and Visualization of Media Bias Proceedings Article

In: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, pp. 389–392, Association for Computing Machinery, Virtual Event, China, 2020 , ISBN: 9781450375856 .

Tags: bias visualization, news bias, news slant, perception of news

  • https://doi.org/10.1145/3383583.3398619
  • doi:10.1145/3383583.3398619

Garz, Marcel; Sörensen, Jil; Stone, Daniel F.

Partisan selective engagement: Evidence from Facebook Journal Article

In: Journal of Economic Behavior & Organization, vol. 177, pp. 91-108, 2020 , ISSN: 0167-2681 .

Tags: Filter bubble, media bias, Polarization, Political immunity, Social media

  • https://www.sciencedirect.com/science/article/pii/S0167268120302079
  • doi:10.1016/j.jebo.2020.06.016

Garz, Marcel; Sood, Gaurav; Stone, Daniel F.; Wallace, Justin

The supply of media slant across outlets and demand for slant within outlets: Evidence from US presidential campaign news Journal Article

In: European Journal of Political Economy, vol. 63, pp. 101877, 2020 , ISSN: 0176-2680 .

Tags: Horse race news, media bias, Media slant, Motivated beliefs, Polarization, Selective exposure

  • https://www.sciencedirect.com/science/article/pii/S0176268020300252
  • doi:10.1016/j.ejpoleco.2020.101877

Hamborg, Felix; Zhukova, Anastasia; Gipp, Bela

Automated Identification of Media Bias by Word Choice and Labeling in News Articles Proceedings Article

In: Proceedings of the 18th Joint Conference on Digital Libraries, pp. 196–205, IEEE Press, Champaign, Illinois, 2020 , ISBN: 9781728115474 .

Tags: automated content analysis, automated frame analysis, CAQDAS, CAS, emotions, entity perception, news bias, news slant, NLP

  • https://doi.org/10.1109/JCDL.2019.00036
  • doi:10.1109/JCDL.2019.00036

Bonart, Malte; Samokhina, Anastasiia; Heisenberg, Gernot; Schaer, Philipp

An investigation of biases in web search engine query suggestions Journal Article

In: Online Information Review, vol. 44, no. 2, pp. 365-381, 2019 , ISSN: 1468-4527 .

Tags: bias bonart esupol myown schaer

  • https://www.emerald.com/insight/content/doi/10.1108/OIR-11-2018-0341/full/html
  • doi:10.1108/OIR-11-2018-0341

Kulshrestha, Juhi; Eslami, Motahhare; Messias, Johnnatan; Zafar, Muhammad Bilal; Ghosh, Saptarshi; Gummadi, Krishna P.; Karahalios, Karrie

Search bias quantification: investigating political bias in social media and web search Journal Article

In: Information Retrieval Journal, vol. 22, no. 1-2, pp. 188–227, 2019 , ISSN: 1386-4564 .

  • doi:10.1007/s10791-018-9341-2

Hamborg, Felix; Zhukova, Anastasia; Gipp, Bela

Illegal Aliens or Undocumented Immigrants? Towards the Automated Identification of Bias by Word Choice and Labeling Proceedings Article

In: Taylor, Natalie Greene; Christian-Lamb, Caitlin; Martin, Michelle H.; Nardi, Bonnie (Ed.): Information in Contemporary Society, pp. 179–187, Springer International Publishing, Cham, 2019 , ISBN: 978-3-030-15742-5 .

  • https://media-bias-research.org/wp-content/uploads/2023/05/hamborg2019.pdf

Babaei, Mahmoudreza; Kulshrestha, Juhi; Chakraborty, Abhijnan; Benevenuto, Fabrício; Gummadi, Krishna P.; Weller, Adrian

Purple Feed: Identifying High Consensus News Posts on Social Media Proceedings Article

In: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pp. 10–16, Association for Computing Machinery, New Orleans, LA, USA, 2018 , ISBN: 9781450360128 .

Tags: audience leaning based features, consensus, news consumption in social media, Polarization, purple feed

  • https://doi.org/10.1145/3278721.3278761
  • doi:10.1145/3278721.3278761

Ribeiro, Filipe N.; Henrique, Lucas; Benevenuto, Fabricio; Chakraborty, Abhijnan; Kulshrestha, Juhi; Babaei, Mahmoudreza; Gummadi, Krishna P.

Media Bias Monitor: Quantifying Biases of Social Media News Outlets at Large-Scale Proceedings Article

In: Twelfth International AAAI Conference on Web and Social Media, pp. 290–299, AAAI Press, Palo Alto, California, 2018 , ISBN: 978-1-57735-798-8 .

  • https://aaai.org/ocs/index.php/ICWSM/ICWSM18/paper/view/17878

Bonart, Malte; Schaer, Philipp

Intertemporal Connections Between Query Suggestions and Search Engine Results for Politics Related Queries Proceedings Article

In: EuroCSS 2018 Dataset Challenge, Cologne, 2018 .

Tags: bias bonart esupol myown schaer

  • https://arxiv.org/abs/1812.08585

Garz, Marcel

Good news and bad news: evidence of media bias in unemployment reports Journal Article

In: Public Choice, vol. 161, no. 3/4, pp. 499–515, 2014 , ISSN: 00485829, 15737101 .

  • http://www.jstor.org/stable/24507505

On the nature of real and perceived bias in the mainstream media

Erick Elejalde

1 Department of Computer Science, Faculty of Engineering, Universidad de Concepción, Concepción, Chile

2 Institute of Data Science, Faculty of Engineering, Universidad del Desarrollo, Santiago, Chile

3 Telefónica R&D, Santiago, Chile

Eelco Herder

4 Institute for Computing and Information Sciences, Radboud Universiteit, Nijmegen, Netherlands

Associated Data

All tweet IDs are made available at https://github.com/eelejalde/PolQuiz, together with the information collected for the news outlets. Using the tweet IDs, researchers can download the full content of the tweets from the Twitter API.

News consumers expect news outlets to be objective and balanced in their reports of events and opinions. However, there is a growing body of evidence of bias in the media caused by underlying political and socio-economic viewpoints. Previous studies have tried to classify the partiality of the media, but there is little work on quantifying it, and less still on the nature of this partiality. The vast amount of content published in social media enables us to quantify the inclination of the press to pre-defined sides of the socio-political spectrum. To describe such tendencies, we use tweets to automatically compute a news outlet’s political and socio-economic orientation. Results show that the media have a measurable bias, and illustrate this by showing the favoritism of Chilean media for the ruling political parties in the country. This favoritism becomes clearer as we empirically observe a shift in the position of the mass media when there is a change in government. Even though relative differences in bias between news outlets can be observed, public awareness of the bias of the media landscape as a whole appears to be limited by the political space defined by the news that we receive as a population. We found that the nature of the bias is reflected in the vocabulary used and the entities mentioned by different news outlets. A survey conducted among news consumers confirms that media bias has an impact on the coverage of controversial topics and that this is perceivable by the general audience. Having a more accurate method to measure and characterize media bias will help readers position outlets in the socio-economic landscape, even when a (sometimes opposite) self-declared position is stated. This will empower readers to better reflect on the content provided by their news outlets of choice.

Introduction

The media have a strong influence on how people perceive the world that surrounds them. Ever more power has been ascribed to the modern press since its inception; it has even been called the “Fourth Estate” [1], a name that emphasizes its independence and its ability to set strict limits on what governments may or may not do. There are well-known examples of the press even toppling governments: the Washington Post's role in the Watergate scandal is perhaps the most resounding.

However, as the media have grown in power, so have the political and economic interests of news outlets and of those who control them, which affects the news that the population of a territory is served. Herman and Chomsky [2], among others, argue that political and doctrinal interests have penetrated the press at different stages of the news generation process, whether deliberately or accidentally (for example, through homophily effects). In some cases the resulting bias is explicitly stated; in others, as with FOX News, the bias is known but not explicitly communicated. People usually have some intuition of media bias. For average readers, though, it is very difficult and time-consuming to be aware of, or even to find, the bias of all media outlets, let alone to quantify these biases and place them in a total order by the magnitude of their leaning.

Bias in the media is a global phenomenon, not exclusive to one kind of economy or particular political system. As such, there is now a quickly growing body of empirical evidence on its existence [ 3 – 5 ]. In previous work [ 6 ], we showed several types of bias in media coverage of ongoing news stories on crises in the world. What has not been studied as deeply, however, at least not quantitatively, is how outlets could be positioned in a socio-economic space. Knowing the nature of media bias will help individuals and organizations take actions that counteract bias. If, for example, a newspaper claims to be objective, but is in fact “right-wing, conservative” (as is the case with El Mercurio in Chile [ 7 ]), people should be able to recognize this and take this bias into account when reading its content. The case of El Mercurio is quite clear, and being a very old, traditional newspaper, the bias is actually known and arguably accepted. It is important to emphasize here that “bias” is not categorical, but comes embedded in a geopolitical news context determined by other outlets in the region [ 8 ]. In other, bolder words, some bias is inherent to all media, but how biased they are, depends, to an extent, upon a comparison to other media.

In this work, we automatically identify the (largely implicit) socio-economic “relative bias” of news outlets in the context of Chilean media. The value of our methodology and study here is to position those media outlets that do not state their socio-economic bias, or are not even aware of their bias. Socio-economic studies at this scale may help uncover patterns of editorial policies that show a systematic bias that favors governments’ propaganda or private economic interests over social welfare. Operationalizing bias is a difficult task. It relies not only on linguistic information, but also on the actual geo-socio-economic, and even historical, context of the newspaper. We propose to automatically categorize news outlets by analyzing what they “think” about certain relevant, controversial topics using their tweet content and then map these worldviews onto a well-known political quiz: “The World’s Smallest Political Quiz” (henceforth PolQuiz ) [ 9 ].

The PolQuiz has ten questions and was originally intended for an American audience. We believe this does not imply a loss of generality with respect to Latin American culture, at least in the topics chosen. It does, obviously, affect the polarity of attitudes towards those topics, but that is what we explore in these pages. The quiz was designed by the Libertarian Advocates for Self Government [ 10 ], created by Marshall Fritz in 1985. It is based on the chart proposed by David Nolan in 1971 [ 11 ], which in turn can be traced back to a 2D chart proposed in 1968 [ 12 ], representing variations in political and socio-economic orientation.

In short, we use what the media say on Twitter to position them in a Cartesian plane that tells us more about their orientation based on Fritz’ quiz. In turn, the PolQuiz results motivate a deeper investigation into the nature of the bias found, which we study through the vocabulary used and entities covered by the news outlets. Finally, we conducted a survey that confirms that media bias has a noticeable impact on how news related to controversial topics is presented.

Related work

There are several works related to the topic of media bias [ 5 , 13 – 16 ]. Some works do not try to identify bias directly, but instead identify and track events in order to present readers with different points of view on the same affair and thus counteract possible biases [ 13 ]. These are complemented by works like J. An et al. [ 17 ], which create a so-called landscape of newspapers based on the similarity of their communities. They measure the exposure of Twitter users to politically diverse news. Other authors assume a certain leaning by association with contacts [ 18 ]. In [ 4 ] the authors go deeper and try to identify different kinds of bias, which they term gatekeeping, coverage and statement bias, according to the stage at which the news acquires the alleged bias.

Most outlets identify themselves as unbiased free press, which makes the discussion on the direction and degree of media bias very controversial. To be fair, it is true that “bias” in journalism may arise naturally out of the interaction of reporters, rather than a priori, but this discussion is left for another paper. Media bias is usually found in the editorial policies that ultimately decide which stories are worth publishing and what amount and angle of coverage they get [ 4 , 13 , 19 ].

This bias reflects the political and socio-economic views of the institution, rather than the point of view of a particular reporter. For example, the authors in [ 2 ] use a few recent events to point out how the press applies the word “genocide” to cases of victimization in non-allied states, but almost never to similar or worse cases committed by the home state or allied regimes. In the latter case, they could use terms such as “repression of insurgency”.

In [ 20 ], the authors defined a model to predict political preference among Twitter users. Through this model they calculate, for each user, a ranking of the likelihood that they prefer a political party over another. This model is based on the usage of weighted words . The words and their weights are extracted from tweets of candidates of certain political parties. Using these weights, in combination with Twitter specific features (retweets, following, etc.) the authors train classifiers that achieve a performance similar to that of human annotators. Similarly, in [ 14 ], the authors estimate the bias in newspapers according to how similar the language is to that used by congressmen for which a right/left stand is known. One interesting result is that bias in the news is found to be correlated to political inclinations of readers, showing a tendency in these news outlets to maximize profit by “catering” to a certain audience.

The topology of the social network on its own has also been shown to give enough information to create classifiers concerning a user’s preference, even when the choices are very similar [ 21 ](e.g. Pepsi vs. Coca Cola, Hertz vs. Avis or McDonalds vs. BurgerKing). Although we carefully select the dataset to use in our experiments to achieve extensible results [ 22 ], we notice here that in our dataset, news outlets (which may be considered the participants of our studies), regularly talk about these controversial topics, and thus, it is possible to use traditional methods to find a political stand.

Combining topological characteristics of the social networks with language features has also been tested [ 18 ], showing that users tend to interact more frequently with like-minded people. This phenomenon is known as homophily . As we mentioned before, our dataset is derived from a special type of users (news outlets Twitter accounts), and this method may not apply directly.

As an alternative approach, in [ 15 ] the authors propose a semi-supervised classifier for detecting political preference. They design a propagation method that, starting with a few labeled items and users, creates a graph representing the connections between users and items, or even between users and other users. Based on the same phenomenon of homophily, they assume that users interacting with the same item, or with each other, most likely have the same political leaning. This way, they can propagate the labels from tagged users and tagged items to the rest of the graph. They report that the system achieves over 95% precision in cross-validation. In [ 16 , 23 ], the authors also follow a propagation strategy to compute the political preference of Twitter users, but using Congress members as the initially tagged users.

In [ 24 ], the authors describe a framework to discover and track controversial topics that involve opposing views. They first use tags that represent each side (e.g. “#prolife”—“#prochoice”) as seeds to find an expanded set of labels to represent each side. This may also help in cases where labels may change over time as the result of new arguments for either side. With these sets of labels they identify strong partisans (anchors) that have a clear lean to one side. Having these anchors and a graph representing relationships between users (based on similarity of tweet content or based on re-tweets), they propagate the classification through the graph inferring the opinion bias of “regular” users.

Yet another approach to quantifying political leaning is presented in [ 3 ]. The authors base their analysis on the number of tweets and re-tweets generated about different political events associated with some predefined topics. They developed a model that takes into account both the sentiment of the tweets and the number of times they are re-tweeted to calculate the political leaning score of each outlet.

In [ 8 ], the authors propose an unsupervised model based on how news outlets quoted president Barack Obama’s speeches. The findings suggest that quotation patterns do reveal some underlying structure in the media, and that these may be evidence of bias. They found that one of the identified dimensions roughly aligns with the traditional left(liberal)-right(conservative) political classification and the other with a mainstream/independent one. This is a strong finding. Still, we believe this is to be somewhat expected, given the selected corpus; namely, presidential speeches in the strongly bipartisan system that dominates U.S. politics. Although this model helps classify and quantify bias in the media, it does not explain the causes and nature of this bias.

In this paper, we present a new methodology that quantifies the political leaning of news outlets based on the automation of a well-known political quiz. The answer to each question for each outlet is predicted from the polarity of the outlet’s tweets on subjects related to the issues addressed in the quiz. The automation of a quiz has been used before to automatically classify mood [ 25 ] but, as far as we know, this is the first attempt to quantify media bias using this approach. We focus on Chile as a case study because most of the literature reports only on Twitter content from English-speaking countries, which may bias the knowledge we possess in general about these issues.

Methodology

In this section, we describe our dataset, followed by an overview of the PolQuiz and an explanation on how we applied this quiz to our data. In Section Rank difference, we introduce the Rank Difference method for investigating the nature of bias. We conclude with an overview of the survey that we carried out to measure perceived bias.

Every news outlet has some presence on the web, which opens the possibility for the automatic collection of the news stream they produce. Twitter, an online social network that enables users to send and read short messages called “tweets”, is a prime example of a web platform that allows this, and Chile ranks among the top-10 countries regarding the average number of Twitter users per 1000 individuals [ 26 ]. Twitter offers an open API to automatically access the flow of tweets and query the system for user profiles, followers and tweeting history. This makes it possible to explore the behavior and interactions of personal and institutional accounts, developing and testing social theories at a scale never seen before. This is the closest thing we have to a record of the everyday life of over 300 million people (Twitter reported 328 million monthly active users in the first quarter of 2017 [ 27 ]). We treat every tweet as an independent document from which we can extract a statement. We assume that these statements reflect the ideology of the news outlet as an entity. Like many others, we use Twitter as our source of documents to study news [ 3 , 28 , 29 ]. Twitter and other social media have become hubs of news for an increasing number of users [ 30 ]. A tweet from a media outlet is a man-made summary of the news, usually in the form of a headline . It conveys the main idea, and hence arguably the main editorial point of view. Headlines of online news articles have been shown to be slightly more reliable than full text for adequately providing a high-level overview of news events [ 31 – 33 ]. These summaries are expected to be representative of the newspaper’s bias [ 34 , 35 ], with the advantage that bias is easier to detect than in full articles: tweets are shorter and to the point, whereas news sites are heterogeneous (not all provide RSS feeds, and we would have to rely on scraping and on more sophisticated natural language processing tools to detect topics, sponsored news, etc.).
Tweets also contain features/annotations (e.g. hashtags (#) and mentions (@)) that help to give semantics to the text. Twitter “texts” are concise, and the interface is much richer than scraping: we do not “miss” tweets, and we get much more metadata from Twitter (timestamps, number of retweets, likes, etc.), which would otherwise have to be found and “parsed” in free-text articles. The editorializing is arguably stronger, and NLP techniques need not be as sophisticated: simple n-grams will do. Because of this, tweets, in all their simplicity, seem to be not only enough, but a better fit, for our profiling purposes.

To create our database of outlets, we used different sources, with Poderopedia’s “influence” database [ 36 ] as our baseline, manually adding other news outlets in Chile. Our database contains 399 active accounts. An account is considered active if it tweets at least once a month. The data set contains 1,916,709 tweets, spanning a period of 8 months—from October 6, 2015 to June 4, 2016. The accounts vary dramatically in tweet publication behavior, with some having published more than a hundred thousand tweets to others with less than a hundred in this timeframe. Out of the 399 active accounts, only 269 outlets published at least one document about the topics of interest.

The PolQuiz has ten questions, divided into two groups: economic and personal issues, of five questions each. The answers to the questions may be Agree , Maybe (or Don’t Know) or Disagree .

Personal issue questions:

  • 1 Government should not censor speech, press, media or Internet.
  • 2 Military service should be voluntary. There should be no draft.
  • 3 There should be no laws regarding sex between consenting adults.
  • 4 Repeal laws prohibiting adult possession and use of drugs.
  • 5 There should be no National ID card.

Economic issue questions:

  • 6 End corporate welfare. No government handouts to business.
  • 7 End government barriers to international free trade.
  • 8 Let people control their own retirement: privatize Social Security.
  • 9 Replace government welfare with private charity.
  • 10 Cut taxes and government spending by 50% or more.

Based on the answers to these questions, the quiz-taker is classified into one of five categories: left-liberal, libertarian, centrist, right-conservative, or statist. Left-liberalism is a political ideology that supports governments that take care of the welfare of vulnerable people and keep a centralized economy but, at the same time, allow a great deal of liberty in personal matters. Libertarians seek freedom in both economic and personal issues, minimizing the role of the state in all matters. An extreme position in this direction would be anarchism. On the other side, statists (supporters of big government) want the state to regulate both personal and economic issues. Examples of this position would be totalitarian regimes, such as that of Kim Jong-Un in North Korea. Right-wing conservatives are more reluctant to accept changes in personal issues and want official standards on these matters (i.e. morality and family traditions), but demand economic freedom and a free market. Finally, centrists accept or even support a balance between the reach of government and personal/economic freedom. They favor selective government interventions to address current problems while avoiding drastic measures that may shift society to either side of the spectrum.

For each Agree answer, we increase the score of the quiz-taker in the corresponding dimension by 20 points. If the answer is Maybe (or don’t know) , we only add 10 points. Finally, if the answer is Disagree , no points are added. This way, if the quiz taker agrees with all the issues in one dimension, it will be in one end of that axis. In the other extreme of the axis, we will have a quiz-taker who disagrees with all issues in that dimension. In our study, we assume that news outlets are (or strive to be) unbiased, so in an ideal world, most of their comments should have no polarity toward any side of the issue and, as such, they should score as a Maybe . Another expected behavior would be that news outlets report on both sides of the issue to cover different points of view. Both approaches would result in the news outlet being in the center of the graph.
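
The scoring rule above can be sketched in a few lines of Python. This is only an illustration of the 20/10/0 point scheme described in the text; the function names and the order of the returned coordinates (personal first, economic second) are assumptions made for this example.

```python
# Points per answer, as described in the text: Agree=20, Maybe=10, Disagree=0.
POINTS = {"agree": 20, "maybe": 10, "disagree": 0}


def dimension_score(answers):
    """Sum the points for the five answers of one dimension (0 to 100)."""
    if len(answers) != 5:
        raise ValueError("each dimension has exactly five questions")
    return sum(POINTS[a] for a in answers)


def quiz_position(personal, economic):
    """Return (personal, economic) coordinates; this axis order is assumed."""
    return dimension_score(personal), dimension_score(economic)


# An outlet that answers Maybe everywhere lands in the center of the chart.
center = quiz_position(["maybe"] * 5, ["maybe"] * 5)
```

A quiz-taker who agrees with every personal issue and disagrees with every economic issue would sit at one end of the personal axis and at the opposite end of the economic axis.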

There is a long tradition of surveys to profile individuals and position them on a socio-economic landscape (see, for example, [ 37 ]). The instrument we use here, “The World’s Smallest Political Quiz” (PolQuiz), follows this tradition. The theoretical foundations of the PolQuiz can be traced back to the works of David Nolan, Maurice Bryson and other political scientists in the late 60s and early 70s [ 11 , 12 ]. Although there are other more “complete” quizzes online (see, for example, The Political Compass [ 38 ]), an advantage of the PolQuiz is that it is “open source”, in the sense that the scoring system is known, unlike, for example, that of the Political Compass mentioned above. It is also short (only five questions per dimension) and very popular (the Advocates for Self-Government, founded by the creator of the quiz, Marshall Fritz, claims that the quiz has been taken online more than 23 million times [ 39 ]), and it has been used, evaluated and cited scientifically [ 40 , 41 ].

Operationalizing the Quiz

We filtered the collected tweets to get only those with information regarding the issues referred to in the PolQuiz . For this, we created a seed query for each question, containing a set of preselected keywords (see Table 1 ).

Question  Keywords
q1        (censura—libertad) & (prensa—discurso—expresion)
q2        (servicio—reclutamiento—entrenamiento—reserva) & (militar—ejercito—armada)
q3        (ley—legal—legislacion—regulacion—penalizacion) & (sexual—prostitucion—sexo—sodomia—gay) & ¬(infantil—menor—niño—acoso—abuso—agresion)
q4        (ley—legal—legislacion—regulacion—penalizacion) & (droga—marihuana—cannabis—psicotropico—cocaina)
q5        inmigracion—inmigrante—refugiado—xenofobia
q6        (subsidio—bienestar—ayuda) & (corporativa—empresa)
q7        (trato—tratado—convenio—negociacion—relacion) & (comercial—economica) & (internacional—bilateral—gobierno—libre—liberal—barrera—proteccion—bloque)
q8        (“seguridad social”—afp—pension—jubilado—prevision) & (privada—gobierno—estatal)
q9        (“beneficio sociale”—bono—“ayuda sociale”—“programa social”) & (gobierno)
q10       (reducion—recorte—aumento—incremento) & (impuesto—gasto) & (gobierno—gubernamental)

Our actual queries are designed so they can also find variations of the keywords (such as variations in gender and number). AFP stands for Administradoras de Fondos de Pensiones (Chilean pension system)
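
To make the structure of these seed queries concrete, the sketch below evaluates a query written as a conjunction of keyword groups (alternatives within a group) plus one optional negated group, using q3 from Table 1 as the example. It is not the authors' query engine. Accent stripping and prefix matching are assumptions standing in for their handling of gender and number variations, and all function names are hypothetical. Quoted multi-word keywords such as “seguridad social” are not handled.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase and strip accents so 'Legislación' becomes 'legislacion'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def group_matches(group, tokens):
    """True if any keyword of the group is a prefix of any token.

    Prefix matching is an assumed approximation of keyword variations.
    """
    keywords = [normalize(kw) for kw in group]
    return any(tok.startswith(kw) for kw in keywords for tok in tokens)


def matches_query(tweet, required, excluded=()):
    """Every group in `required` must match and `excluded` must not."""
    tokens = re.findall(r"\w+", normalize(tweet))
    if excluded and group_matches(excluded, tokens):
        return False
    return all(group_matches(group, tokens) for group in required)


# q3 from Table 1: two required groups and one negated group.
Q3_REQUIRED = [
    ["ley", "legal", "legislacion", "regulacion", "penalizacion"],
    ["sexual", "prostitucion", "sexo", "sodomia", "gay"],
]
Q3_EXCLUDED = ["infantil", "menor", "niño", "acoso", "abuso", "agresion"]
```

Prefix matching is deliberately crude (for instance, “ley” also matches “leyenda”), so a real pipeline would likely use stemming or explicit word lists instead.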

With the subset of documents returned by the seed queries, we then analyzed the hash-tags to find an expanded set of labels that may represent related aspects of the same issues [ 24 ]. We removed hash-tags that contain the name of a news outlet, as it is common practice in newspapers accounts to use hash-tags to refer to themselves or the original source of the news (regardless of the subject). We also remove hash-tags with names of politicians: even when these politicians could potentially provide some relevant documents, they also introduce a lot of noise, mostly due to the salience of politicians who appear regularly in the news for a wide variety of issues not necessarily related to the query in question. The new labels are added as keywords to the original query. Our enriched queries give us the final set of tweets used to evaluate any possible bias of each news outlet, see Table 2 .
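
The hashtag expansion step can be illustrated as follows. This is a minimal sketch under stated assumptions: the text does not say how candidate hashtags were ranked or cut off, so `min_count` is a hypothetical threshold, and the outlet name used in the example is invented.

```python
import re
from collections import Counter


def expand_with_hashtags(tweets, outlet_names, politician_names, min_count=2):
    """Collect candidate hashtags from the tweets matched by a seed query.

    Hashtags containing an outlet or politician name are discarded.
    `min_count` is a hypothetical cutoff, not taken from the paper.
    """
    blocked = [n.lower() for n in list(outlet_names) + list(politician_names)]
    counts = Counter()
    for tweet in tweets:
        for tag in re.findall(r"#(\w+)", tweet.lower()):
            if not any(name in tag for name in blocked):
                counts[tag] += 1
    return sorted(tag for tag, n in counts.items() if n >= min_count)


seed_matches = [
    "Debate por #TPP en el Senado #DiarioEjemplo",
    "#TPP: un misil contra la soberanía",
    "#Bachelet habla del #TPP",
]
new_keywords = expand_with_hashtags(seed_matches, ["diarioejemplo"], ["bachelet"])
```

The surviving hashtags would then be appended to the seed query as additional keywords.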

Question  Tweets  Training set (TS)  % Agree (TS)  % Maybe (TS)  % Disagree (TS)  % Not relevant (TS)  Prc (±2 * )
q1        374     179                48.6          16.7          17.8             16.7                 0.76 (± 0.14)
q2        194     132                18.1          29.5          20.4             31.8                 0.87 (± 0.11)
q3        144     78                 57.6          5.1           24.3             12.8                 0.83 (± 0.17)
q4        597     203                61.0          8.3           14.2             16.2                 0.80 (± 0.10)
q5        746     219                35.1          12.7          15.9             36.0                 0.73 (± 0.16)
q6        636     117                26.4          25.6          23.9             23.9                 0.53 (± 0.20)
q7        1162    238                29.8          24.7          39.9             5.4                  0.76 (± 0.09)
q8        251     117                21.3          9.4           41.8             27.3                 0.76 (± 0.13)
q9        298     167                5.9           13.1          69.4             11.3                 0.87 (± 0.09)
q10       8573    466                16.7          13.3          66.0             3.8                  0.71 (± 0.06)

The last column indicates the average precision obtained by the model in cross-validation (See Section Operationalizing the Quiz)

Having the set of tweets for each question, we classified their polarity with respect to the corresponding question . For example, for question 7 ( q7 ), a tweet classified as Agree is “TPP abrira puertas a más de 1.600 productos chilenos no incluidos en acuerdos vigentes.” (tr. TPP will open doors for more than 1.600 Chilean products not included in existing agreements) . For that same question, the following tweet disagrees with it: “El TPP: un misil contra la soberanía” (tr. TPP: a missile against our sovereignty) . In other words, we classify the polarity of the tweet with respect to the corresponding issue. As the number of tweets is too large to label manually, we created and trained a supervised model for each question. This approach also allows us to scale in the presence of an even larger number of resulting documents.

To create a representative sample for the training set, we randomly select, where possible, two tweets from each question from each news outlet. We took care to not include duplicate tweets (tweets with the exact same text) published by the same outlet. The training set consisted of 1916 documents (an average of about 190 documents per question). We manually classified this training set in four groups: Agree , Maybe , Disagree and Out of topic (Not relevant) . The distribution of each training set is shown in Table 2 .

For the automatic classification task, we used a “Randomized Trees” model [ 42 ], implemented in the Python library scikit-learn as sklearn.tree.ExtraTreeClassifier . Decision trees are less susceptible to overfitting, considering that we have relatively small training sets. Given that the classes in our training set are not evenly populated, we decided to evaluate the model using a 10-iteration stratified shuffle-split cross-validation. Each iteration leaves out 20% of the data for validation. The other 80% is selected while preserving the percentage of samples for each class. The accuracy values for each model are presented in Table 2 .
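
The evaluation scheme can be sketched without any external library. The generator below is a plain-Python illustration of a stratified shuffle split with 10 iterations and a 20% validation share; it is not scikit-learn's implementation (scikit-learn provides `StratifiedShuffleSplit` for this purpose), and rounding the validation size per class is an assumption of this sketch.

```python
import random
from collections import defaultdict


def stratified_shuffle_split(labels, n_splits=10, test_size=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for _ in range(n_splits):
        train, test = [], []
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            # Per-class rounding is an assumption of this sketch.
            n_test = round(len(shuffled) * test_size)
            test.extend(shuffled[:n_test])
            train.extend(shuffled[n_test:])
        yield sorted(train), sorted(test)


# Toy label distribution, invented for illustration only.
labels = (["agree"] * 50 + ["maybe"] * 20
          + ["disagree"] * 20 + ["not_relevant"] * 10)
splits = list(stratified_shuffle_split(labels))
```

Each split would be used to fit one classifier on the training indices and score it on the held-out indices, and the per-question figure is the average over the ten iterations.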

After the classification stage, we scored each news outlet on each question. We removed those documents classified as off-topic (Not relevant) . We scored the remaining documents’ polarity according to the PolQuiz scoring system and we found the average for each question/news-outlet pair. For simplicity, in the question/news-outlet pairs for which we have no associated documents, we assume a Maybe (or don’t know) answer. This assumption is the least disruptive towards the default supposition of an unbiased media.
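
A minimal sketch of this aggregation step is shown below, assuming the classifier output is available as (outlet, question, label) rows. The row format, the label strings and the function name are hypothetical. The point values, the removal of off-topic documents and the Maybe default for missing pairs follow the description above.

```python
from collections import defaultdict

POINTS = {"agree": 20, "maybe": 10, "disagree": 0}
PERSONAL = ["q1", "q2", "q3", "q4", "q5"]
ECONOMIC = ["q6", "q7", "q8", "q9", "q10"]


def outlet_scores(classified):
    """Map outlet -> (personal, economic) from (outlet, question, label) rows."""
    per_pair = defaultdict(list)
    for outlet, question, label in classified:
        if label != "not_relevant":  # off-topic documents are removed
            per_pair[(outlet, question)].append(POINTS[label])

    def axis(outlet, questions):
        total = 0.0
        for q in questions:
            pts = per_pair.get((outlet, q))
            # No documents for this pair: assume a Maybe answer (10 points).
            total += sum(pts) / len(pts) if pts else POINTS["maybe"]
        return total

    outlets = {outlet for outlet, _, _ in classified}
    return {o: (axis(o, PERSONAL), axis(o, ECONOMIC)) for o in outlets}


rows = [
    ("A", "q7", "agree"),
    ("A", "q7", "disagree"),
    ("A", "q1", "not_relevant"),
    ("A", "q10", "disagree"),
]
scores = outlet_scores(rows)
```

In this toy example outlet A has no usable personal documents, so it stays at the neutral value on that axis, while its economic score moves away from the center.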

In order to find out how sensitive the observed bias is to noise, we repeated the scoring steps 20 times. Each time we leave out 5% of the tweets, selected at random while maintaining the original distribution of documents per question. Each time we measure the average score of the news outlets for which we were able to answer at least one question in the corresponding dimension. We did not go over 5%, because the smallest news outlets already have a small set of documents: removing too many entries would have resulted in the elimination of an entire outlet, affecting the results.
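
The resampling procedure can be sketched as follows, again assuming (outlet, question, label) rows. The function name and the fixed random seed are choices made for this illustration. Only the 20 repetitions, the 5% drop rate and the preservation of the per-question distribution come from the text.

```python
import random
from collections import defaultdict


def subsample_runs(rows, n_runs=20, drop_fraction=0.05, seed=0):
    """Yield n_runs copies of rows, each missing about 5% of every question."""
    rng = random.Random(seed)
    by_question = defaultdict(list)
    for row in rows:
        by_question[row[1]].append(row)  # row = (outlet, question, label)
    for _ in range(n_runs):
        kept = []
        for group in by_question.values():
            n_drop = round(len(group) * drop_fraction)
            dropped = set(rng.sample(range(len(group)), n_drop))
            kept.extend(r for i, r in enumerate(group) if i not in dropped)
        yield kept


# Toy data: 100 tweets for q1 and 40 for q7, all from one invented outlet.
rows = [("A", "q1", "agree")] * 100 + [("A", "q7", "disagree")] * 40
runs = list(subsample_runs(rows))
```

Each reduced set would then be scored again, and the spread of the resulting averages indicates how sensitive the observed bias is to noise.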

Finally, we tested how the entire system adapts to the local environment. As a proof of concept, we introduced the subject of abortion in the personal dimension. This topic appears among the personal issues in other political quizzes (e.g., The Political Compass [ 38 ]). In addition, abortion was a very relevant and controversial topic in the Chilean media during this period because of a new bill, presented by the president and approved by the Chamber of Deputies, to legalize abortion on three grounds: pregnancy resulting from rape, fetal inviability, or danger to the life of the pregnant woman.

We formulated this new question as follows:

  • 0. All women should be free to choose whether she wants to terminate her pregnancy or not.

Notice that the question is formulated in the same “direction” as the rest of the questions. That is, agreeing with the statement indicates a more liberal tendency on the part of the quiz taker.

We apply the same methodology described above for the original PolQuiz . We named this question q0 , and the query we applied (before injecting the hash-tags) is shown in Table 3 . The enriched query returned 4891 documents from our corpus. We selected two random documents for each news outlet to create a training set containing 409 tweets. We obtained an average precision of 0.70 (±0.08) in the 10-iteration stratified shuffle-split cross-validation.

Question  Keywords
q0        (ley—legal—legislacion—regulacion—penalizacion—despenalizacion) & (aborto—interrupcion—embarazo)

Our actual queries are designed so they can also find variation of the keywords (such as variations in gender and number)

Rank difference

Using the PolQuiz , we aim to show empirically that the news media in Chile have some socio-economic leaning. This means that news outlets tend to have a stand in at least some of the controversial topics that dominate the political landscape of the country. However, we are also interested in the nature of the bias regarding such controversial topics.

To do this, we use the rank difference method proposed in [ 43 ]. Rank difference is used to identify terms that characterize a specific domain. For example, the word court will probably be identified as a term if we are analyzing a corpus of legal documents. The method ranks words by their frequency in a domain corpus and in a generic (background) corpus. By comparing their relative positions in both corpora, the algorithm identifies words that are used significantly more in a given domain. These unusual word frequencies are taken as an indication of the importance of these words in the given domain. The formula for calculating rank difference is shown in Eq 1,

$$\tau(w) = \frac{r_D(w)}{\sum_{w' \in V_D} r_D(w')} - \frac{r_B(w)}{\sum_{w' \in V_B} r_B(w')} \tag{1}$$

where $r_D(w)$ and $r_B(w)$ are the ranks of word $w$ in the domain and background corpus respectively. Rank normalization is done against the summation of all word rankings in the corresponding vocabulary ($V_D$ and $V_B$).
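
A small sketch of a rank-difference computation follows, using the normalization described above (each rank divided by the sum of all ranks in its vocabulary). Several details are assumptions of this example rather than statements from the paper: ranks are assigned in ascending order of frequency, so the most frequent word has the highest rank; ties are broken alphabetically; and words absent from the background corpus receive a background rank of 0.

```python
from collections import Counter


def ranks(counts):
    """Rank words by ascending frequency: rarest = 1, most frequent = |V|.

    Ties are broken alphabetically, which is an arbitrary choice.
    """
    ordered = sorted(counts, key=lambda w: (counts[w], w))
    return {w: i + 1 for i, w in enumerate(ordered)}


def rank_difference(domain_tokens, background_tokens):
    """Return {word: normalized domain rank - normalized background rank}."""
    r_d = ranks(Counter(domain_tokens))
    r_b = ranks(Counter(background_tokens))
    sum_d, sum_b = sum(r_d.values()), sum(r_b.values())
    # Words missing from the background corpus get rank 0 (an assumption).
    return {w: r_d[w] / sum_d - r_b.get(w, 0) / sum_b for w in r_d}


# Toy corpora, invented for illustration.
domain = ["court"] * 5 + ["judge"] * 3 + ["the"] * 6
background = ["the"] * 10 + ["court"] + ["judge"] * 2 + ["house"] * 4
scores = rank_difference(domain, background)
```

With these toy corpora, court receives the largest score because it is frequent in the domain and rare in the background, which matches the intuition given in the text.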

To investigate to what extent the bias—as measured with the PolQuiz and investigated using the rank difference method—is perceived by the general audience, we conducted an online survey. We chose abortion as the topic of this survey, as this is (as explained in Section Operationalizing the Quiz) a current and controversial item in Chile that has received an important amount of coverage in the local media. This means that most people in Chile are aware of the discussion and probably have their own opinions. We also restricted our survey to the subset of news outlets who had relevant tweets for at least four questions per dimension (see Section Results ) since these are the ones that we were able to position in the chart with the highest confidence.

We calculated the bi-grams’ rank difference (see Section Rank difference) for each news outlet. We decided to present bi-grams to users in the survey instead of words, because bi-grams offer more context, making it easier for people to assess the connotation of a word or set of words within the selected topic. We also decided to use bi-grams over named entities because people do not always recognize all the names involved in the discussion, although they do have an intuition about the discourse and the arguments used on both sides.

For each survey we presented a randomly selected and anonymised list (each list represents a news outlet) with the top-20 ranked bi-grams in one column and the bottom-20 bi-grams in another column. The top-20 list was presented as the words used with a relatively high frequency by one outlet. The bottom-20 list was presented as words the outlet tried to avoid or used with a relatively low frequency. The user had to answer if, based on these lists, he or she considered the outlet to be “in favor” or “against” abortion. The user could also respond with an “I can’t tell” option. A user could answer the survey more than once, but the random selection was always made from the remaining lists.

We scored the “perceived bias” for each news outlet based on the answers we received in the survey. For each outlet, we calculated the percentage of users that answered “in favor” and subtracted the percentage of “against” answers. These percentages are computed over all answers, including those from users who answered “I can’t tell”. We consider outlets with a negative score to have a conservative “perceived bias”. Equivalently, outlets with a positive score are considered liberal in our “perceived bias”. It is worth noting that an unbiased news outlet should be expected to score close to zero, because it should send mixed signals: either users label it in each direction in similar numbers, or most users are unable to classify it.
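
The perceived-bias score reduces to a short computation, sketched here with hypothetical answer strings. Undecided answers stay in the denominator, following the statement above that the percentages include them.

```python
def perceived_bias(answers):
    """Percentage of 'in favor' answers minus percentage of 'against' answers.

    Undecided answers are counted in the total, so they pull the score
    toward zero.
    """
    if not answers:
        raise ValueError("no survey answers for this outlet")
    favor = sum(a == "in favor" for a in answers)
    against = sum(a == "against" for a in answers)
    return 100.0 * (favor - against) / len(answers)


# Toy answers for one outlet: 6 in favor, 2 against, 2 undecided.
example = ["in favor"] * 6 + ["against"] * 2 + ["can't tell"] * 2
score = perceived_bias(example)
```

A positive value corresponds to a liberal perceived bias and a negative value to a conservative one.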

Results

In this section, we first show how the PolQuiz helps to measure the bias in the media. We investigate the benefit of contextualizing the quiz by transforming the political space and by including new questions that fit the current political landscape in Chile. We verify that our results are stable to small changes in the dataset, and we explore the nature of the bias shown by the media using the rank difference method. We show the differences in the type of coverage between news outlets of various leanings when we delve into the analysis of one particular topic. We also investigate, using a survey, to what extent this bias is perceived by the general audience. Finally, we show empirical evidence of the existence of media capture.

To summarize the results, we show that the media landscape in Chile in terms of absolute positions is highly in line with the political orientation of the government. Within this landscape, relative differences in biases can be observed, which are in line with public perception—as captured by tendencies reported in Wikipedia. Further, we show that the nature of the bias can be explained and shown by the entities and sentiment related to the news outlet. Finally, we discuss how the media landscape, in terms of absolute positions, shifted along with a shift of the government.

Measuring bias using the PolQuiz

Our list of news media covers outlets of very different sizes (as measured by the number of followers of each outlet’s Twitter account). This difference in size is also reflected in the number of documents related to the issues in the PolQuiz that we are able to retrieve for each news entity. We found that, in general terms, the larger the outlet (measured in number of followers), the more it talks about (at least) the socio-economic and political issues related to the PolQuiz . Likewise, mid-size outlets are well represented and show an active participation in the selected topics. In this work we are interested in the behaviour and bias displayed by each different outlet in the media landscape, regardless of its size. This notwithstanding, we wanted to make sure we were covering the entire spectrum of the Chilean media. Our methodology is able to find an outlet’s position even with just a few tweets. The natural extension to a more general analysis that takes into account the weight that each news outlet contributes to the global media bias is left for future work.

Absolute bias: The media landscape

To understand the results, a few preliminaries about the Chilean context are in order. First, the current president (as of June 2017), Michelle Bachelet, is affiliated with the Socialist Party. The ruling coalition is “Nueva Mayoría”, which mainly consists of center-left to left-wing parties. Second, one of the strongest components of this coalition is the Christian Democratic Party. Christian democracy is still a center-left political ideology, but probably the most conservative within the government, especially on personal issues. With this in mind, Fig 1 shows the absolute positions of 269 news outlets that published at least one tweet related to the issues of at least one of the questions of the PolQuiz. Black dots identify those outlets for which there are answers for at least four questions on each dimension. Those that do not fulfill this criterion, but for which there is still some information, are identified as gray dots. Notice that for outlets with no information on a given question we assumed, conservatively, that they were “unbiased”, in the sense that they did not explicitly pronounce their stance one way or the other. We can see that outlets tend to tweet more content pertaining to the economic axis than to the personal axis. This may suggest that communicating economic issues is more important to the news system in terms of reaching or influencing their audience. A drastic shift in economic issues may invoke fear of losing one’s job or livelihood. Meanwhile, personal issues like freedom of speech are more ideological but of less immediate effect. The figure also shows that there is a clear preference in the Chilean media for the left-liberal end of the spectrum. This is explained, at least in part, by the political context of Chile during the observed period, discussed above. So, in this case, the observed leaning of the media has a similar tendency to the political alignment of the ruling coalition.
This result also lends some evidence to the Propaganda Model [2]. This model proposes the existence of implicit filters that allow the political and economic elites to mold the content of the media to benefit their private interests. We explore further how a change in government may influence the media position in Section The influence of government orientation on the media landscape. Notice that this tendency also coincides with the “liberal media” label that is frequently used, implying a popularly perceived left-liberal leaning in the majority of the outlets [44].

Fig 1.

The chart shows the position of the outlets for which we were able to answer at least one question. Dark dots represent the 26 news outlets that had relevant tweets for at least four questions per dimension (the 26ers). The solid red dot shows the average position of the 26ers. 1. adnradiochile, 2. biobio, 3. cooperativa, 4. latercera, 5. mercuriovalpo, 6. publimetrochile, 7. emol, 8. soyarauco, 9. soyconcepcion, 10. soycoronel, 11. soyquillota, 12. soysanantonio, 13. soytalcahuano, 14. soytome, 15. dfinanciero, 16. el_ciudadano, 17. elmostrador, 18. tele13_radio, 19. el_dinamo, 20. nacioncl, 21. pinguinodiario, 22. soychillan, 23. soycopiapo, 24. soyvaldiviacl, 25. soyvalparaiso, 26. t13.

Of the 269 news outlets, our method yielded 26 that answered at least four questions on each dimension (we will call this subset of news outlets the 26ers). This represents 10% of our database and 13% of those that regularly report on economics and politics. The 26ers account for 45% of the tweets relevant to the subjects in the PolQuiz. In Fig 1 we explicitly labeled some of the most prominent ones to help understand the general landscape.

Measuring perceived bias

We wanted to investigate whether the outlets’ bias obtained using the PolQuiz corresponded with the popular perception of their political orientation. We annotated the political alignment of the 26ers using information extracted from Wikipedia, the official web site of the news outlet, or the known political alignment of their owners. Since Wikipedia pages are crowdsourced content, we consider the political alignment extracted from there as either self-declared or a popular perception. Remember that the Christian Democratic Party is part of the center-left coalition that was ruling in Chile during the observed period, and was generally in favor of the social changes promoted by the government. We used the label “Christian democracy” to group the outlets associated with this party, and assigned them a left-leaning position in our analysis. This classification is taken as ground truth to evaluate our model.

For three of the 26ers we could not find reliable information on their political leaning. Two of them (tele13_radio and t13) are owned by Chile’s Grupo Luksic (one of the richest families in Chile and Latin America). The third one (pinguinodiario) is a mid-size local daily newspaper headquartered in Punta Arenas, in the south of Chile. Among the 26ers there are also two outlets that belong to international groups originating outside of Chile: publimetrochile and adnradiochile. The first one is owned by Metro International, a Swedish global media company based in Luxembourg that publishes the Metro newspapers in many big cities around the world. The other one is controlled by a subsidiary of the Spanish group PRISA. Although these international companies may also have economic interests in the country that motivate a political leaning, such interests are probably less influential in their outlets’ editorial policy. We did not infer a bias for these cases either. Nevertheless, as for any other outlet outside the 26ers, some valuable information can be learned from their automatic classification.

At the rightmost position of the group we have mercuriovalpo, which represents El Mercurio de Valparaíso, one of the oldest newspapers in Chile currently in circulation (tags in Fig 1 are the corresponding Twitter accounts, e.g., https://twitter.com/mercuriovalpo ). This newspaper is part of a big conglomerate (El Mercurio S.A.P) that owns more than 20 newspapers and several radio stations, among other media (such as magazines and cable TV). The regional newspaper Soy Coronel (soycoronel), at the bottom, is also part of this group. In fact, 11 regional newspapers of El Mercurio S.A.P are within these 26 and are all clustered bottom-right. As we mentioned earlier, the El Mercurio newspapers are popularly perceived as right-wing conservative.

La Tercera (latercera) is owned by Copesa S.A., which is El Mercurio’s closest competitor. These two companies form a so-called news media duopoly. La Tercera, also in the lower-right but closer to the center of the group, is thought to be moderate-conservative [45]. El Mostrador (elmostrador) is an online newspaper with a perceived progressive orientation [46].

Finally, we want to mention La Nación (nacioncl), a newspaper that currently publishes only an online edition and is partially controlled by the government. This newspaper appears in the top region of the personal dimension. Compared to the other 25 news outlets, it appears as the most progressive on personal issues. This score is due to a series of populist reforms promoted by the government during the observed period (e.g., legalization of therapeutic marijuana, decriminalization of abortion, anti-xenophobic campaigns, and promotion of voluntary enlistment of women in the military service).

The perceived bias assigned to the rest of the 26ers is shown in Table 4. Notice that many outlets in that list are perceived as Right-wing, conservative. There is also a group labeled as Libertarian. For none of those outlets does the automatic PolQuiz classification correspond with the popularly recognized leaning. Our hypothesis is that this mismatch is produced by a lack of contextualization: if we look at the range of scores obtained by the automatic method, we notice that they are confined to a fraction of the entire space. To investigate this hypothesis, we normalized the original scores by making the range of observed values our entire universe. We discuss these results in the next section.

| Id | Name | Owner | Political alignment |
|----|------|-------|---------------------|
| 1 | adnradiochile | Grupo Prisa | International |
| 2 | biobio | Bío-Bío Comunicaciones | Independent |
| 3 | cooperativa | Co. Chilena de Comunicaciones | Christian democracy |
| 4 | latercera | Copesa | Classical liberalism |
| 5 | mercuriovalpo | El Mercurio | Right-wing, conservative |
| 6 | publimetrochile | Grupo metro | International |
| 7 | emol | El Mercurio | Right-wing, conservative |
| 8 | soyarauco | El Mercurio | Right-wing, conservative |
| 9 | soyconcepcion | El Mercurio | Right-wing, conservative |
| 10 | soycoronel | El Mercurio | Right-wing, conservative |
| 11 | soyquillota | El Mercurio | Right-wing, conservative |
| 12 | soysanantonio | El Mercurio | Right-wing, conservative |
| 13 | soytalcahuano | El Mercurio | Right-wing, conservative |
| 14 | soytome | El Mercurio | Right-wing, conservative |
| 15 | dfinanciero | Grupo Claro | Right-wing, conservative |
| 16 | el_ciudadano | Red de medios de los pueblos | Libertarian |
| 17 | elmostrador | La Plaza | Libertarian |
| 18 | tele13_radio | Grupo Luksic & PUC | |
| 19 | el_dinamo | Ediciones Giro Pais | Christian democracy |
| 20 | nacioncl | Estado de Chile | Left, Liberal |
| 21 | pinguinodiario | Patagónica Publicaciones | |
| 22 | soychillan | El Mercurio | Right-wing, conservative |
| 23 | soycopiapo | El Mercurio | Right-wing, conservative |
| 24 | soyvaldiviacl | El Mercurio | Right-wing, conservative |
| 25 | soyvalparaiso | El Mercurio | Right-wing, conservative |
| 26 | t13 | Grupo Luksic & PUC | |

The list is sorted by the perceived bias. Outlets with an unclear Political Alignment (shadowed rows in the table) were left out of the analysis.

Relative positioning

We applied normalization to contextualize the political leaning of the outlets to the reality of Chile. We normalized the scores on each axis to the range [0, 100]. Our entire positioning universe is now determined by the scope defined by the Chilean media. Fig 2 shows the relative positions of the 26ers as black dots. Now each outlet is positioned relative to the others. These new positions are much more in line with how people think of the leaning of these outlets.
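The rescaling described above can be sketched as a per-axis min-max normalization. This is an assumption on our part: the text only states that the observed range becomes the whole [0, 100] universe. The function name and the toy scores below are hypothetical and are not taken from the data.

```python
def normalize_axis(scores, lo=0.0, hi=100.0):
    """Min-max rescale raw scores on one axis into [lo, hi].

    Assumed reading of the normalization step: the smallest observed
    score maps to lo and the largest observed score maps to hi.
    """
    s_min, s_max = min(scores), max(scores)
    if s_max == s_min:
        # All outlets share one score; place them at the midpoint.
        return [(lo + hi) / 2.0 for _ in scores]
    span = s_max - s_min
    return [lo + (s - s_min) * (hi - lo) / span for s in scores]


# Hypothetical raw economic scores for four outlets (not from the paper).
raw_economic = [20.0, 30.0, 35.0, 40.0]
relative_economic = normalize_axis(raw_economic)
print(relative_economic)  # [0.0, 50.0, 75.0, 100.0]
```

Each axis would be rescaled independently, so an outlet's relative position depends only on the other outlets observed on that axis.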

Fig 2.

The dots represent the relative positions obtained by normalizing the PolQuiz scores. The blue dot shows the average for relative positions. Diamonds represent the score on each dimension as the average over 20 repetitions, leaving out each time a random 5% of the documents. The gray circles around diamonds are the 95% confidence interval.

This result shows that “bias” should not be a categorical measure. Media bias comes embedded in a geopolitical news context determined by other outlets in the region. In other words, some bias is inherent to the media, but how biased they are (and in what direction they lean) depends upon a comparison to other media in the same context. More importantly, our own perception of the bias seems to be adjusted and limited by the political space defined by the news that we receive as a population. This phenomenon has important sociopolitical implications (such as the possibility of artificial displacement of the center for political purposes), but we leave further considerations on these matters to social scientists.

We noticed that even with the normalized scores, the Chilean media is not balanced. For our statistical analysis we treated each axis independently, so we could work with values in only one dimension. We conducted a one-sample Student’s t-test (the Q-Q plot and the histogram suggested normality was a reasonable assumption) for each dimension (economic and personal) to test whether the mean score was significantly different from 50 (the assumed unbiased score). We used, for each dimension, the scores of those news outlets for which we were able to answer at least one question on that dimension. For the economic dimension, there is a significant bias, t(254) = −10.93, p < .001, with a leaning to the left wing (M = 40.28, SD = 14.21). On personal issues the bias is smaller, but still statistically significant, t(190) = −2.10, p < .05, with a leaning to the conservative side (M = 47.42, SD = 16.98).
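As a worked check of the test above, the t statistic can be recomputed from the reported summary values alone. The sketch assumes the sample sizes are the reported degrees of freedom plus one (255 and 191 outlets); the function name is illustrative.

```python
import math


def one_sample_t(mean, sd, n, mu0=50.0):
    """Return (t, degrees of freedom) for H0: population mean == mu0."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1


# Reported M and SD; n is inferred from the degrees of freedom (assumption).
t_econ, df_econ = one_sample_t(40.28, 14.21, 255)
t_pers, df_pers = one_sample_t(47.42, 16.98, 191)
print(round(t_econ, 2), df_econ)
print(round(t_pers, 2), df_pers)
```

Both values land close to the reported t(254) = −10.93 and t(190) = −2.10; small differences are expected because M and SD are themselves rounded.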

Once again, we think the slight left-wing bias in the economic issues might be explained, at least in part, by the political context of Chile during the observed period (see Section The influence of government orientation on the media landscape). On the personal issues dimension, we can also see some bias, although less prominent, tending to the conservative end of the spectrum. Other possible factors that might contribute to the observed tendency are the unavoidable bias introduced by the quiz itself and by the presented methodology. The PolQuiz has been criticized as being biased by using leading questions, favoring libertarian results and imposing the libertarian definition of freedom [47]. For our part, we tried to minimize the bias in the methodology by making a conscious collective selection of the initial keywords for each query, but there is always room for interpretation of the questions. Despite the alleged bias, we have shown that the quiz can differentiate outlets with opposite points of view in both dimensions and also that the automatic classification is in accordance with the widespread perception of the tendency displayed by the main outlets. This means that either the bias introduced by the methodology is not significant or it is representative of the predisposition shown by the population that we are considering in our study.

Stability of the results

In order to assess the stability of the observed bias with respect to changes in the obtained evidence (i.e., the collected tweets), we repeated the scoring steps 20 times. Each time we left out 5% of the tweets, selected at random, while maintaining the original distribution of documents per question. Each time, we measured the average score of the news outlets for which we were able to answer at least one question in the corresponding dimension. On the economic issues, we observed a consistent bias to the left (M = 40.45, 95% CI [36.91, 43.99]). The personal dimension, although it also leans to one side, is much closer to the center of the spectrum (M = 46.89, 95% CI [43.99, 49.79]). Fig 2 shows a similar analysis, but at an individual level for the 26ers. The mean for each individual outlet (diamonds in Fig 2) stays close to its original position, and each newspaper can be located in a relatively small neighborhood with high confidence, meaning that there are no drastic changes compared to the previous classification.

The relatively low impact of leaving out data in the positioning process indicates that the results are not very sensitive to changes in the data and are not driven by only a small number of tweets.
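The leave-out procedure can be sketched as follows. Several parts are assumptions made only for illustration: the `score_fn` callback stands in for the full scoring pipeline, the toy "documents" are plain stance values, and the interval uses a normal approximation across repetitions, which may differ from how the reported intervals were obtained.

```python
import random
import statistics


def leave_out_means(docs_by_question, score_fn, repetitions=20,
                    drop_fraction=0.05, seed=0):
    """Recompute a score several times, each time dropping a random
    fraction of the documents within every question (stratified), so
    the distribution of documents per question is preserved."""
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        kept = {}
        for question, docs in docs_by_question.items():
            n_drop = round(len(docs) * drop_fraction)
            kept[question] = rng.sample(docs, len(docs) - n_drop)
        means.append(score_fn(kept))
    return means


def mean_and_ci95(values):
    """Mean and a normal-approximation 95% interval (an assumption)."""
    m = statistics.mean(values)
    half = 1.96 * statistics.stdev(values) / len(values) ** 0.5
    return m, (m - half, m + half)


# Toy data: each "document" is a hypothetical stance value in [0, 100].
toy_docs = {"q1": [35.0, 40.0, 45.0] * 20, "q2": [55.0, 60.0, 65.0] * 20}


def toy_score(kept):
    # Hypothetical scorer: average the per-question mean stance.
    return statistics.mean(statistics.mean(d) for d in kept.values())


means = leave_out_means(toy_docs, toy_score)
center, interval = mean_and_ci95(means)
```

A narrow spread of `means` around the full-data score is what the stability claim amounts to in this sketch.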

Contextualizing the PolQuiz

We noticed that some of our queries, particularly in the personal issues dimension, returned only a small number of documents (e.g., q2 and q3). This is because of a lack of interest in, or too few relevant events related to, the corresponding topics during the observed period. We think that a way to counteract this environmental/circumstantial effect is to substitute the respective questions or to increase the number of questions. As a proof of concept, we repeated our analysis using q0 as a replacement for question q3 (related to laws concerning sex between consenting adults, see Section PolQuiz). We replaced q3 because it was the one with the lowest number of retrieved documents. This substitution increased the number of news outlets with at least one answer. There is now a stronger statistical effect for the personal issues dimension, t(239) = 3.54, p < .001. Interestingly, this dimension now leans to the more liberal end of the spectrum (M = 53.57, SD = 15.63) (see Fig 3).

Fig 3.

These are the scores of news outlets for which we were able to answer at least one question in the corresponding dimension.

In Fig 4 we plot the scores of the 26ers in the original quiz (dots) and the adapted quiz (diamonds). Note that the difference in scores between the quiz with q3 and the quiz with q0 is considerably larger (with a negative difference) for outlets in the right/conservative quadrant. This is expected and validates the model.

Fig 4.

Dots represent the scores with q3. Diamonds represent the scores with q0.

The influence of government orientation on the media landscape

As we mentioned earlier, we think that the overall behavior of the media, both in terms of their original/absolute position and the relative balance, is determined, at least in part, by the political alignment of the current government. This assumption is suggested by the absolute positions shown in Fig 1 and supported by some models concerned with the political economy of the mass media. The Propaganda Model [2] describes the political elite, and the government in particular, as a very influential actor. According to the model, whether through lucrative advertising contracts, control of the sources, or generating flak against opposing views, the government always tries to control the discourse. The Media Capture model [48] also presents the government as an important factor in the news selection and publishing process. For an in-depth survey on the political economy of the mass media see also [49].

In this section we investigate whether our PolQuiz methodology is able to give evidence of the influence of the political ruling class over the behavior of the mass media. For our analysis we applied the political quiz to the same set of news outlets, but using a different time frame. We collected the tweets from the analogous period of the previous administration. Since the previous government (led by Sebastián Piñera) had a different political alignment (right-wing conservative), we should be able to see a shift in the position of the outlets between the two governments.

Using Twitter’s Advanced Search we collected 832,223 tweets, from 283 news outlets, published in the period from Oct 1st, 2010 to May 31st, 2011. This data set contains only tweets published by these outlets (no retweets are included). After applying the queries, our final dataset contains 16,176 tweets related to the issues addressed in the PolQuiz.

In Fig 5 we can see the absolute position of the 186 outlets for which we were able to answer at least one question of the PolQuiz using the tweets published during the previous government. In comparison with Fig 1, it is easy to notice that the media as a whole sits closer to the center of the chart or, seen from the point of view of the current state of the media, more to the right and more conservative. This behavior might be due to the main topics being discussed (e.g., tax reforms vs. free higher education), but ultimately it indicates that, one way or another, the government is playing its role and dominating the discourse that prevails in the media.

Fig 5.

The chart shows the position of the outlets for which we were able to answer at least one question. Dark dots represent the 26ers subset. The solid red dot shows the average position for the 26ers .

We also compared the relative positions of the 26ers under both governments (two of them had not yet been created in the first period). The Kendall’s Tau-b correlation for the economic dimension between the two periods is τ_b(23) = −0.3461 (z = −2.37, p = .0178). This shows that there is an association between the two time periods (i.e., we can reject the null hypothesis of independence), but, since the coefficient is negative, there are quite a few inversions in the relative order. This is because the newspapers owned by El Mercurio stayed more or less in the same position, while most of the others moved from being to the right of El Mercurio’s outlets to being to their left (notice in Fig 5 that mercuriovalpo is at the left of the group). We mentioned before that El Mercurio S.A.P is the biggest media group in Chile, but now we can say that it is also the most stable in its editorial policy. This makes intuitive sense: its consolidated control of the market gives it more independence and makes it less susceptible to government influence.
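For readers who want to reproduce this kind of rank comparison, the following is a minimal sketch of Kendall's Tau-b and of the usual normal approximation for its significance. The function names and the toy rankings are illustrative, and the z formula shown ignores tie corrections, which is an assumption about how the reported z was obtained.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's Tau-b over paired observations, by counting pairs."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: ignored by Tau-b
        elif dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom


def tau_z_score(tau, n):
    """Normal approximation under independence (no tie correction)."""
    return 3.0 * tau * math.sqrt(n * (n - 1)) / math.sqrt(2.0 * (2 * n + 5))


# Toy rankings (hypothetical): one swapped pair out of ten pairs.
tau = kendall_tau_b([1, 2, 3, 4, 5], [1, 2, 3, 5, 4])
print(tau)  # 0.8
```

With τ_b = −0.3461 and n = 24 (the 26ers minus the two outlets that did not exist in the first period, which is our reading of the text), `tau_z_score` gives a value close to the reported z = −2.37.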

To study the individual behavior of the outlets, we also calculated the relative position of the outlets in one period with respect to the other period. For this we normalized the scores using the results from both periods combined. This allows us to see, in the overall context, the outlets’ transition across politically opposite regimes. Fig 6 shows how many points each of the 26ers moved within this context in the economic issues dimension. Notice that there is a tendency towards the left with the arrival of the left-wing government. Only some of the outlets owned by El Mercurio S.A.P. stayed in an approximately similar position or shifted to the right. El Mercurio de Valparaíso again proves to be one of the most important representatives of the right. Interestingly, the biggest movements to the left (in the same direction as the new government) come from outlets for which we were not able to find a clear popular perception or declared political leaning. Perhaps it is this flip-flopping that makes it so difficult for the public to assign them a leaning.
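One way to read the combined normalization is to pool the raw scores of both periods, rescale them together to [0, 100], and report each outlet's difference in points. The sketch below follows that reading; the min-max rescaling, the function name and the toy scores are assumptions for illustration only.

```python
def pooled_shifts(scores_before, scores_after, lo=0.0, hi=100.0):
    """Rescale both periods with a shared min and max, then return the
    shift (after minus before) for outlets present in both periods.

    Assumes the pooled scores are not all identical.
    """
    pooled = list(scores_before.values()) + list(scores_after.values())
    s_min, s_max = min(pooled), max(pooled)
    scale = (hi - lo) / (s_max - s_min)
    shifts = {}
    for outlet in scores_before.keys() & scores_after.keys():
        before = lo + (scores_before[outlet] - s_min) * scale
        after = lo + (scores_after[outlet] - s_min) * scale
        shifts[outlet] = after - before
    return shifts


# Hypothetical raw economic scores per outlet in each period.
earlier = {"outlet_a": 60.0, "outlet_b": 40.0}
later = {"outlet_a": 50.0, "outlet_b": 20.0, "outlet_c": 30.0}
shifts = pooled_shifts(earlier, later)
print(shifts)
```

Under the convention used in Fig 6, a negative shift on the economic axis would mean a move to the left; `outlet_c` is skipped because it has no score in the earlier period.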

Fig 6.

The chart shows the relative shift in points from the conservative to the liberal government. A shift > 0 (red bars) means more to the right. A shift < 0 (blue bars) means more to the left.

Fig 7 represents a similar individual analysis for the personal issues dimension. As in the results presented earlier, this dimension offers less information. The direction of the movements is more divided and seems to emphasize the known position of the outlets. Of particular interest is the outlet controlled by the government (nacioncl). This newspaper’s editorial policy seems to move (in both dimensions) to accommodate the current government. This behavior makes it a biased source of information, but a good point of reference to validate our model. Another point to notice is that, once again, the outlets without a clear perceived bias consistently show the largest shift in favor of the ruling side.

Fig 7.

The chart shows the relative shift in points from the conservative to the liberal government. A shift > 0 (yellow bars) means more liberal. A shift < 0 (green bars) means more conservative.

This empirical analysis of the behavior of the media over two politically different regimes shows that, directly or indirectly, the government does successfully interfere in the news process. The case of Chile makes a good example because the two periods are only four years apart, which makes it harder to attribute these noticeable differences to other factors (such as a very different staff, ownership or editorial policy).

Investigating the nature of bias using rank difference

The PolQuiz showed the existence of bias in Chilean media. In this section, we investigate the nature of this bias in terms of vocabulary used and entities mentioned in the different newspapers’ tweets (see Section Rank difference). We focused on the 26ers and the topic of abortion. We selected the topic of abortion, as it is one of the most polarizing issues in our dataset. Nevertheless, this is used only as an illustration: to fully understand the nature of the bias and the media landscape, the decision makers or interested parties should conduct a similar analysis on each of the questions.

Topic bias based on named entities

We used the Stanford NE recognizer system [50] to extract the entities mentioned in the tweets related to the abortion issue. We compared the extracted entities against a list of politicians, public personalities and activist groups. For the list of politicians and their position on the abortion issue, we used the voting sessions in the house of representatives [51] and in the senate [52]. We manually labeled another 53 personalities and groups according to comments and events reported in the local news. The complete list L_E has 199 labeled entities. We labeled with −1 the politicians who voted against the abortion bill and the public figures that were openly against the issue. Equivalently, we used +1 for politicians and personalities in favor of the subject. We assigned a 0 to the entities not included in our list. We will refer to these labels as the leaning of the entities (e.g., leaning(entity)).

After applying the rank difference method to the NE mention counts, we calculated a score for each outlet as a function of τ(entity) and the leaning of the entity on the issue (for every entity mentioned more than once in the news). The final score of outlet o_i is found using Eq 2.

A low value in this score indicates that this outlet tends to mention with relatively high frequency entities with a conservative leaning and/or it tends to ignore those with a more liberal view.
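Eq 2 is not reproduced in this section, so the following is only a sketch of one scoring rule that matches the verbal description: each entity contributes its rank difference multiplied by its leaning, and the contributions are summed. The plain sum, the function name and all toy values are assumptions, not the published formula.

```python
def outlet_score(rank_diff, leaning, corpus_mentions, min_mentions=2):
    """Sum of rank difference times leaning over sufficiently mentioned
    entities.

    rank_diff: entity -> rank difference for this outlet (assumed > 0
        when the outlet mentions the entity relatively often).
    leaning: entity -> -1, 0 or +1; entities missing from the labeled
        list count as 0.
    corpus_mentions: entity -> mentions across all outlets (assumed
        meaning of "mentioned more than once in the news").
    """
    total = 0.0
    for entity, tau in rank_diff.items():
        if corpus_mentions.get(entity, 0) < min_mentions:
            continue
        total += tau * leaning.get(entity, 0)
    return total


# Hypothetical entities and values for a single outlet.
rank_diff = {"A": 3, "B": -2, "C": 1, "D": 4}
leaning = {"A": +1, "B": -1, "C": -1}           # D is unlabeled -> 0
corpus_mentions = {"A": 5, "B": 3, "C": 2, "D": 1}
score = outlet_score(rank_diff, leaning, corpus_mentions)
print(score)  # 3*1 + (-2)*(-1) + 1*(-1) = 4.0
```

Under this rule, frequently mentioning conservative-leaning entities, or ignoring liberal-leaning ones, lowers the score, which agrees with the interpretation given above.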

As expected, outlets tagged as independent, libertarian and classical-liberal have higher scores (top 10 in the 26ers). Interestingly, within the top 10 we also find the outlets tagged as International, publimetrochile and adnradiochile, which means that they behave similarly to liberal outlets under the left-liberal government in office in 2016. According to our scores, all these top-scored outlets have comparably more mentions of entities with a liberal leaning than the rest of the outlets. To our surprise, the lowest values (bottom 5 in the 26ers) belong to the outlets linked to parties in the ruling coalition (Christian democracy and Left-Liberal (nacioncl)). Apparently these outlets focus their tweets on negative reports about the opposition. For example, when we look at the rank-difference results for nacioncl, within the top-20 entities only two refer to entities with a liberal leaning (‘President Michelle Bachelet’ and ‘Government’). To investigate this further, we ran a sentiment analysis on the most used bi-grams. The results are presented in the next section.

Topic bias based on bi-grams

We again applied the rank difference method, this time using the bi-gram counts in the tweets relevant to the subject of abortion. Following the same strategy as before, we calculated a score for each outlet as a function of τ(bigram) and the sentiment calculated for the bi-gram (for every bi-gram mentioned more than once in the news). To determine the sentiment of words and bi-grams we used the Spanish lexicon from [53]. This lexicon consists of a set of norms for valence and arousal for an extensive set of Spanish words. We found this to be one of the largest dictionaries in this language, and it includes items from a variety of frequencies, semantic categories, and parts of speech, including conjugated verbs. We weighted each word with its mean valence (we assigned the neutral value 5 to words not present in the dictionary). The weight of a bi-gram is the average of the weights of its component words. To calculate τ(bigram) we used a formula equivalent to that shown in Eq 2. Accordingly, we give a similar interpretation to these scores. That is, a high value indicates that the outlet tends to convey mostly positive sentiments with the bi-grams it uses with relatively high frequency and/or avoids negative sentiments when referring to the issue of abortion. For example, elmostrador, with the highest score, has “proyecto aprobado” (tr. “project approved”, referring to the bill) as a frequently used bi-gram. This bi-gram is classified as positive by the sentiment analyzer, so it adds to the score. On the other hand, this same outlet has “injusticia gobierno” (tr. “government injustice”) as a totally ignored bi-gram. Since the bi-gram is assigned a negative sentiment and the rank difference is also negative, the bi-gram also adds to the score of the outlet, pushing it to the liberal side. Following the same reasoning, an outlet with a very low score can be understood as an outlet that uses predominantly negative words with relatively high frequency.
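The bi-gram weighting step can be sketched as follows. The valence values in the toy lexicon are invented for illustration and are not taken from the lexicon in [53]; treating weights above 5 as positive and below 5 as negative is our reading of the neutral value mentioned in the text.

```python
NEUTRAL_VALENCE = 5.0  # value assigned to words missing from the lexicon


def bigram_valence(bigram, lexicon):
    """Average the mean valence of the words in a space-separated bi-gram,
    falling back to the neutral value for unknown words."""
    words = bigram.split()
    return sum(lexicon.get(w, NEUTRAL_VALENCE) for w in words) / len(words)


# Hypothetical valence norms (NOT from the lexicon cited as [53]).
toy_lexicon = {"aprobado": 7.0, "injusticia": 2.0}

positive = bigram_valence("proyecto aprobado", toy_lexicon)    # (5 + 7) / 2
negative = bigram_valence("injusticia gobierno", toy_lexicon)  # (2 + 5) / 2
print(positive, negative)  # 6.0 3.5
```

With these toy values the first bi-gram falls above the neutral point and the second below it, mirroring the two examples discussed for elmostrador.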

When we analyze the scores of the 26ers, we notice that nacioncl (controlled by the government) has the lowest score. This, together with the previous NE analysis, supports the theory that this outlet focuses on tweeting negative reports about the opposition, at least for the abortion issue. Most of the other outlets show the expected behavior, with conservative outlets in the lower half of the ranking (i.e., lower scores) and liberal outlets in the higher positions.

The question that follows is whether the bias that we are seeing with the PolQuiz and describing with the Rank Difference is perceived in the same way by the public. We help answer this question in the next section.

Survey results

For the survey described in Section Survey, we collected 372 answers from 54 unique Chilean users on how they perceive the bias on the topic of abortion in the different Chilean newspapers. Since this was an open and anonymous online survey, we do not have any demographic data on the users, but the IP addresses indicate we have a good representation of different regions of the country. We received between 11 and 19 answers for each of the 26ers (M: 14.31, SD: 2.07). We carried out 10 Fleiss’ kappa measurements; each time we selected 10 ratings at random per outlet (subject). This shows a fair agreement in the answers (M: 0.2253, SD: 0.0167). In Table 5 we show the 26ers and their corresponding “Perceived bias” (see Section Survey). The political alignment information shown in the table is again our ground-truth.

| Id | Name | Owner | Political alignment | Perceived bias | Personal issues |
|----|------|-------|---------------------|----------------|-----------------|
| 21 | pinguinodiario | Patagónica Publicaciones | | -66.67 | 39.18 |
| 24 | soyvaldiviacl | El Mercurio | Right, conservative | -66.67 | -50.49 |
| 22 | soychillan | El Mercurio | Right, conservative | -57.14 | -50.55 |
| 25 | soyvalparaiso | El Mercurio | Right, conservative | -43.75 | -51.81 |
| 8 | soyarauco | El Mercurio | Right, conservative | -42.86 | -51.27 |
| 12 | soysanantonio | El Mercurio | Right, conservative | -30.77 | -92.98 |
| 13 | soytalcahuano | El Mercurio | Right, conservative | -30.77 | -92.98 |
| 18 | tele13_radio | Grupo Luksic & PUC | | -28.57 | 52.42 |
| 9 | soyconcepcion | El Mercurio | Right, conservative | -25.00 | -94.09 |
| 14 | soytome | El Mercurio | Right, conservative | -25.00 | -92.92 |
| 7 | emol | El Mercurio | Right, conservative | -25.00 | -0.59 |
| 10 | soycoronel | El Mercurio | Right, conservative | -23.53 | -100 |
| 11 | soyquillota | El Mercurio | Right, conservative | -18.18 | -92.92 |
| 15 | dfinanciero | Grupo Claro | Right, conservative | 0.00 | 42.57 |
| 5 | mercuriovalpo | El Mercurio | Right, conservative | 21.43 | -51.81 |
| 2 | biobio | Bío-Bío Comunicaciones | Independent | 23.53 | 6.91 |
| 6 | publimetrochile | Grupo metro | International | 25.00 | 47.50 |
| 17 | elmostrador | La Plaza | Libertarian | 26.32 | 70.95 |
| 19 | el_dinamo | Ediciones Giro Pais | Christian democracy | 29.41 | -30.79 |
| 4 | latercera | Copesa | Classical liberalism | 33.33 | -3.38 |
| 1 | adnradiochile | Grupo Prisa | International | 37.50 | 52.98 |
| 16 | el_ciudadano | Red de medios de los pueblos | Libertarian | 37.50 | 44.54 |
| 23 | soycopiapo | El Mercurio | Right, conservative | 38.46 | -36.21 |
| 3 | cooperativa | Co. Chilena de Comunicaciones | Christian democracy | 57.14 | 46.04 |
| 26 | t13 | Grupo Luksic & PUC | | 57.14 | 48.45 |
| 20 | nacioncl | Estado de Chile | Left, Liberal | 63.64 | 100 |
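The inter-rater agreement reported for the survey is based on Fleiss' kappa. A minimal sketch of the statistic is given below; it takes a subjects-by-categories count table in which every subject has the same number of ratings, which is why a fixed number of ratings per outlet has to be drawn before computing it. The toy table is hypothetical and is not survey data.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a table where table[i][j] is the number of
    raters who assigned subject i to category j. Every row must sum to
    the same number of ratings, and at least two categories must be used."""
    n_subjects = len(table)
    n_ratings = sum(table[0])
    n_categories = len(table[0])
    # Share of all assignments that fall in each category.
    p_cat = [sum(row[j] for row in table) / (n_subjects * n_ratings)
             for j in range(n_categories)]
    # Observed agreement for each subject.
    p_subj = [(sum(c * c for c in row) - n_ratings)
              / (n_ratings * (n_ratings - 1)) for row in table]
    p_bar = sum(p_subj) / n_subjects
    p_exp = sum(p * p for p in p_cat)
    return (p_bar - p_exp) / (1 - p_exp)


# Hypothetical example: two outlets, ten ratings each, full agreement.
perfect = fleiss_kappa([[10, 0], [0, 10]])
print(perfect)  # 1.0
```

Repeating the computation over several random draws of ten ratings per outlet, as described in the text, would give a mean and a spread of kappa values rather than a single estimate.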

Results show that there is a perceivable difference in the language used by the outlets in both sides of the spectrum. Note that, based on the rank difference of bi-grams, the users were able to collectively classify the outlets with over 90% precision (We are not taking into account those for which we could not find a political alignment or those that belong to international groups). Our positioning of these outlets in the adapted PolQuiz has also a good agreement with the direction of the Perceived bias (80%).

To evaluate the relative positions of the outlets in our PolQuiz, we calculated the number of inversions with respect to the ranking of the outlets by perceived bias. The Kendall’s Tau-b coefficient between the two rankings is τ_b(21) = 0.4203 (z = 2.66, p < .01). Even though the popular perception resulting from the survey cannot be seen as ground truth for the relative positioning of the outlets, it is important to note that our results correlate well with the intuition of the public. As future work, we aim to add other content features (e.g., leaning of the named entities) to the polarity classification of the tweets, as these may help refine the relative positioning found by our model.
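
For readers unfamiliar with the statistic, the following is a minimal sketch of Kendall’s Tau-b computed directly from pair counts, where discordant pairs correspond to the inversions mentioned above. The function name and the toy scores are hypothetical and do not reproduce the paper’s data or its significance test.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's Tau-b between two equally long score lists.
    Discordant pairs are the inversions between the two rankings;
    the denominator corrects for ties in either list."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = total = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        total += 1
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    denom = sqrt((total - ties_x) * (total - ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one list is constant")
    return (concordant - discordant) / denom


# Toy example with one tie in each list (values are made up).
model_scores = [1, 2, 2, 3]
survey_scores = [1, 2, 3, 3]
print(kendall_tau_b(model_scores, survey_scores))
```

The tie correction matters for data like Table 5, where several outlets share the same perceived-bias value.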

To summarize, we have shown that reported political alignment is highly correlated with the PolQuiz results as well as with the bias, as perceived by the general audience. This implies that existing bias has a noticeable influence on how controversial issues such as abortion are reported in the media.

Conclusions

In this paper, we presented an automatic approach for estimating the political bias of news outlets in Chile, exploiting the well-known and widely used “The World’s Smallest Political Quiz”. We empirically confirmed the estimation results and showed that they are stable with respect to evolving data. We have demonstrated the benefits of adapting questions to the local context. Furthermore, we showed that our model is able to discover the relative political context that regulates the perceived bias of the media. Building upon the PolQuiz results, we investigated the nature of this political bias and found that it is present in the chosen vocabulary and in the entities covered by each newspaper. We also conducted a survey, whose results confirm that political bias in newspapers has an impact on how controversial topics are covered and that the general audience does notice this bias. Our methodology makes few assumptions about the underlying system, and as designed it could be applied to any Western culture. Our system can deal with any number of outlets, can compare relative quantitative positions, can show empirical evidence of consistent bias, and can partially explain the source of these tendencies.

Finally, our methodology contributes empirical evidence of media capture in modern “democratic” societies. Since most people expect the media to report a fair and unbiased account of events, we think the outlets’ behavior should be analyzed and taken into consideration in, for example, recommender systems that aim for a real diversity of information.

In summary, the results indicate that the political orientation of the media in Chile is in line with and follows the political orientation of the government. Even though relative differences in bias or orientation between individual news outlets can be observed, public awareness of the bias of the media landscape as a whole appears to be limited: our own perception of the bias seems to be adjusted and limited by the political space defined by the news that we receive, which in turn is largely defined by governmental politics.

We believe it is important to be aware of shifts, alignments and discrepancies in bias and political orientation within the government, the population and the media, as misconceptions regarding real or perceived bias may have unexpected or negative effects.

As future work, we are interested in finding the most accurate way to score missing answers. Since “coverage” is a form of bias [ 4 ], an outlet may not be neutral when it omits a specific subject. Even though the decision of which stories/events are newsworthy is subjective and depends on the editorial strategy [ 6 ], some events are very relevant in the national context and are covered by the majority of the media. So, the complete silence of a news outlet on such an event may be interpreted as something other than neutrality.

For example, question q7 is about international free trade. Taking the number of tweets and retweets as an indicator of important events [ 3 ], we can see in Fig 8 that this topic had at least one major event during this period: Chile’s adherence to the Trans-Pacific Partnership (TPP), which the country signed on Feb 3rd, 2016. Despite the magnitude of the event, only 135 out of 198 newspapers with a section on politics mentioned it. A plausible explanation is that the other news outlets decided not to report on this event, in other words, ‘bias by omission’.

[Fig 8]
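
The omission check discussed for question q7 comes down to a set difference between the outlets under study and those that mentioned the event. The sketch below illustrates this with placeholder outlet names; only the counts 135 and 198 come from the text, and the function name and data layout are assumptions.

```python
def omission_report(outlets, mentioning):
    """Summarise coverage of one event: how many of the studied outlets
    mentioned it, which stayed silent, and the resulting coverage share."""
    outlets = set(outlets)
    covered = outlets & set(mentioning)
    silent = sorted(outlets - covered)
    return {
        "covered": len(covered),
        "silent": silent,
        "coverage": len(covered) / len(outlets),
    }


# Placeholder names; the text reports 135 of 198 outlets mentioning the TPP.
all_outlets = [f"outlet_{i}" for i in range(198)]
report = omission_report(all_outlets, all_outlets[:135])
print(report["covered"], len(report["silent"]), f"{report['coverage']:.1%}")
# prints: 135 63 68.2%
```

Whether silence should count as a biased answer or as a neutral one is the open scoring question raised above; the sketch only identifies the silent outlets.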

We have shown that a careful selection or update of the questions may lead to a significant improvement in the results. If we have an inside understanding of the socio-economic environment from which the news is being collected, then we could replace the questions to capture more relevant topics. In this sense, we could benefit from advances in systems that focus on identifying controversial topics in social media [ 54 ]. On the other hand, if we do not have any intuition about the news collected, then we can accumulate new questions to widen the spectrum of topics and have a better chance of capturing relevant events/discussions with our queries.

In the future, it would be interesting to compare the results of this paper to a similar analysis conducted over full-text articles published by the same news outlets. As discussed, this will require more sophisticated NLP tools and more human supervision, but it could shed some light on the similarities and differences between traditional media and social media.

For individuals as well as for society as a whole, it is important to recognize and understand media biases that are shaped by underlying general political or socio-economic orientations. As we have shown in this paper, these general tendencies have a clear and noticeable effect on the way concrete topics are covered and commented upon, and therefore should be investigated and published.

Acknowledgments

This work was supported in part by a doctoral scholarship from Comisión Nacional de Investigación Científica y Tecnológica (CONICYT) to the first author (No. 63130228). We would like to thank the L3S Research Center in Hannover, Germany, for hosting the first author between Sept. 2016 and Feb. 2017, and particularly Asmelash Teka Hadgu for invaluable help with his Twitter Advanced Search mining tool. The authors acknowledge financial support from Movistar—Telefónica Chile, the Chilean government initiative CORFO 13CEE2-21592 (2013-21592-1-INNOVA_ PRODUCCION2013-21592-1).

Funding Statement

EE was supported by a doctoral scholarship of Comisión Nacional de Investigación Científica y Tecnológica (CONICYT) No. 63130228. LF received financial support from Movistar - Telefónica Chile and the Chilean government initiative CORFO 13CEE2-21592 (2013-21592-1-INNOVA PRODUCCION2013-21592-1). The specific roles of these authors are articulated in the ‘author contributions’ section. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data Availability

Mass Media: An Undergraduate Research Guide: Media Bias

  • Newspaper Source Plus Newspaper Source Plus includes 1,520 full-text newspapers, providing more than 28 million full-text articles.
  • Newspaper Research Guide This guide describes sources for current and historical newspapers available in print, electronically, and on microfilm through the UW-Madison Libraries. These sources are categorized by pages: Current, Historical, Local/Madison, Wisconsin, US, Alternative/Ethnic, and International.

Organizations

  • Center for Media and Democracy's PR Watch Madison, WI-based nonprofit organization that focuses on "investigating and exposing the undue influence of corporations and front groups on public policy, including PR campaigns, lobbying, and electioneering"
  • CAMERA The Committee for Accuracy in Middle East Reporting in America describes itself as "a media-monitoring, research and membership organization devoted to promoting accurate and balanced coverage of Israel and the Middle East"
  • Fairness & Accuracy in Reporting (FAIR) "FAIR, the national media watch group, has been offering well-documented criticism of media bias and censorship since 1986"
  • Media Research Center Conservative watch group with a "commitment to neutralizing left-wing bias in the news media and popular culture"

About Media Bias

This guide focuses on bias in mass media coverage of news and current events. It includes concerns of sensationalism, allegations of media bias, and criticism of media's increasingly profit-motivated ethics. It also includes examples of various types of sources coming from particular partisan viewpoints.

Try searching these terms using the resources linked on this page: media bias, sensational* AND (news or media), bias AND media coverage, (liberal or conservative) AND bias, [insert topic] AND media bias, media manipulation, misrepresent* AND media

Overview Resources - Background Information

  • International Encyclopedia of Media Studies This encyclopedia covers the broad field of "media studies", which encompasses print journalism, radio, film, TV, photography, computing, mobile phones, and digital media.
  • Opposing Viewpoints Resource Center (OVRC) provides viewpoint articles, topic overviews, statistics, primary documents, links to websites, and full-text magazine and newspaper articles related to controversial social issues.
  • FactCheck.org A nonpartisan, nonprofit "consumer advocate" for voters that aims to reduce the level of deception and confusion in U.S. politics by monitoring the factual accuracy of what is said by major U.S. political players in the form of TV ads, debates, speeches, interviews and news releases.

Articles - Scholarly and Popular

  • Academic Search Includes scholarly and popular articles on many topics.
  • Communication & Mass Media Complete Includes articles on communication and media topics.
  • ProQuest One Business (formerly ABI Inform) covers a wide range of business topics including accounting, finance, management, marketing and real estate.
  • Project Muse Disciplines covered include art, anthropology, literature, film, theatre, history, ethnic and cultural studies, music, philosophy, religion, psychology, sociology and women's studies.
  • JSTOR: The Scholarly Journal Archive full-text journal database which provides access to articles on many different topics.

Statistics and Data

  • Data Citation Index The Data Citation Index provides a single point of access to quality research data from repositories across disciplines and around the world. Through linked content and summary information, this data is displayed within the broader context of the scholarly research, enabling users to gain perspective that is lost when data sets or repositories are viewed in isolation.
  • Last Updated: Jun 7, 2024 5:02 PM
  • URL: https://researchguides.library.wisc.edu/massmediaURG

Media Bias 101: What Journalists Really Think -- and What the Public Thinks About Them

Media Bias 101 summarizes decades of survey research showing how journalists vote, what journalists think, what the public thinks about the media, and what journalists say about media bias. The following links take you to dozens of different surveys, with key findings and illustrative charts. (Most recent update: May 2014)

A printer-friendly, fully-formatted 48-page version of the report (updated January 2014) is available in PDF format here ( 1.8 MB ).  

Part One: What Journalists Think

Surveys over the past 30 years have consistently found that journalists, especially those at the highest ranks of their profession, are much more liberal than the rest of America. They are more likely to vote liberal, more likely to describe themselves as liberal, and more likely to agree with the liberal position on policy matters than members of the general public.

  • Early Polls of Journalists, 1962-1985 (Added January 2014)
  • Exhibit 1-1: The Media Elite
  • Exhibit 1-2: Major Newspaper Reporters (Updated January 2014)
  • Exhibit 1-3: The American Journalist
  • Exhibit 1-4: U.S. Newspaper Journalists
  • Exhibit 1-5: Survey of Business Reporters
  • Exhibit 1-6: Journalists - Who Are They, Really?
  • Exhibit 1-7: White House Reporters
  • Exhibit 1-8: The Media Elite Revisited (Updated January 2014)
  • Exhibit 1-9: Washington Bureau Chiefs and Correspondents
  • Exhibit 1-10: Newspaper Journalists of the 1990s
  • Exhibit 1-11: Newspaper Editors
  • Exhibit 1-12: The People and the Press: Whose Views Shape the News?
  • Exhibit 1-13: How Journalists See Journalists in 2004
  • Exhibit 1-14: Campaign Journalists (2004)
  • Exhibit 1-15: TV and Newspaper Journalists
  • Exhibit 1-16: Journalists' Ethics and Attitudes, 2005
  • Exhibit 1-17: The News Media and the War, 2005
  • Exhibit 1-18: Slate Magazine Pre-Election Staff Survey (Updated January 2014)
  • Exhibit 1-19: Indiana University Polls of Journalists (Added May 2014)

Part Two: How the Public Views the Media

A wide variety of public opinion polls have documented the fact that most Americans now see the media as politically biased, inaccurate, intrusive, and a tool of powerful interests. By a nearly three-to-one margin, those who see political bias believe the media bend their stories to favor liberals.

  • Exhibit 2-1: The People and The Press, 1997
  • Exhibit 2-2: What the People Want from the Press
  • Exhibit 2-3: ASNE Journalism Credibility Project, 1998
  • Exhibit 2-4: The People and The Press, 2000
  • Exhibit 2-5: Gallup Polls on Media Bias (Updated January 2014)
  • Exhibit 2-6: The People and The Press, 2003
  • Exhibit 2-7: Bias in the 2004 Presidential Campaign
  • Exhibit 2-8: Missouri School of Journalism 2004
  • Exhibit 2-9: American Journalism Review, 2005
  • Exhibit 2-10: CBS's "State of the Media," 2006
  • Exhibit 2-11: Institute for Politics, Democracy and the Internet/Zogby Survey
  • Exhibit 2-12: Coverage of the War in Iraq, 2007
  • Exhibit 2-13: Rasmussen Reports on Media Bias, 2007
  • Exhibit 2-14: Harvard's "National Leadership Index" Survey (2007)
  • Exhibit 2-15: Sacred Heart University Polling Institute (2007)
  • Exhibit 2-16: Public Reaction to Media Coverage of the 2008 Primaries
  • Exhibit 2-17: Rasmussen Reports on Campaign 2008 Bias
  • Exhibit 2-18: Public Overwhelmingly Saw Favoritism For Obama
  • Exhibit 2-19: Pew Study Finds Media Credibility Plummets
  • Exhibit 2-20: Confidence In Media Hits New Low
  • Exhibit 2-21: Trust and Satisfaction with the National Media (2009)
  • Exhibit 2-22: News Media Both Too Liberal and Too Powerful (2009)
  • Exhibit 2-23: 2010 Surveys Find Two-Thirds of Public Is “Angry” at the Media
  • Exhibit 2-24: Gallup Finds Media Distrusted, Public’s Confidence Low (2011)
  • Exhibit 2-25: Pew Finds Record Low Respect for News Media (2011)
  • Exhibit 2-26: Record High 67% See Political Bias in News Media
  • Exhibit 2-27: In Campaign 2012, Voters Saw Media Favoring Obama (Added January 2014)
  • Exhibit 2-28: Seeing Liberal Bias in the News (2013) (Added January 2014)

Part Three: What Journalists Say about Media Bias

Over the years, the Media Research Center has catalogued the views of journalists on the subject of bias. In spite of overwhelming evidence to the contrary, many journalists still refuse to acknowledge that most of the establishment media tilts to the left. Even so, a number of journalists have admitted that the majority of their brethren approach the news from a liberal angle.

  • Journalists Denying Liberal Bias (Updated May 2014)
  • More Journalists Denying Liberal Bias
  • Still More Journalists Denying Liberal Bias
  • Journalists Admitting Liberal Bias (Updated May 2014)
  • More Journalists Admitting Liberal Bias
Matthew A. Baum

Marvin Kalb Professor of Global Communications

Issue Bias: How Issue Coverage and Media Bias Affect Voter Perceptions of Elections

Media Bias/Fact Check

  • August 17, 2024 | MBFC’s Weekly Media Literacy Quiz Covering the Week of Aug 11th – Aug 17th
  • August 17, 2024 | MBFC’s Daily Vetted Fact Checks for 08/17/2024 (Weekend Edition)
  • August 16, 2024 | (Media News) Harris Google Ads Spark Controversy Over Perceived News Endorsements
  • August 16, 2024 | MBFC’s Daily Vetted Fact Checks for 08/16/2024
  • August 15, 2024 | MBFC’s Daily Vetted Fact Checks for 08/15/2024

We are the most comprehensive media bias resource on the internet. There are currently 8300+ media sources, journalists, and politicians listed in our database and growing every day. Don’t be fooled by Questionable sources. Use the search feature above (Header) to check the bias of any source. Use name or URL.

MBFC Media and Fact Check News

Literacy Quiz

Least Biased , Original

MBFC’s Weekly Media Literacy Quiz Covering the Week of Aug 11th – Aug 17th

Welcome to our weekly media literacy quiz. This quiz will test your knowledge of the past week’s events with a focus on facts, misinformation, bias,…

Daily Curated Fact Checks by MBFC

Fact Check , Facts Matter , Least Biased , Original

MBFC’s Daily Vetted Fact Checks for 08/17/2024 (Weekend Edition)

Media Bias Fact Check selects and publishes fact checks from around the world. We only utilize fact-checkers that are either a signatory of the International…

Media News

Least Biased , Media News , Original

(Media News) Harris Google Ads Spark Controversy Over Perceived News Endorsements

Some Kamala Harris ads have sparked concerns by appearing to mislead viewers into thinking major news outlets are endorsing her presidential campaign. The ads, which…

MBFC’s Daily Vetted Fact Checks for 08/16/2024

Media Bias Fact Check selects and publishes fact checks from around the world. We only utilize fact-checkers who are either a signatory of the International…

MBFC’s Daily Vetted Fact Checks for 08/15/2024

MBFC’s Daily Vetted Fact Checks for 08/14/2024

(Media News) Major News Outlets Decline to Publish Leaked Trump Campaign Documents

At least three prominent news organizations—Politico, The New York Times, and The Washington Post—have received confidential material from inside Donald Trump’s campaign, including a vetting…

MBFC’s Daily Vetted Fact Checks for 08/13/2024

(Media News) The Hidden Power of Repetition: How Climate Misinformation Gains Ground

Even staunch supporters of climate science may be more susceptible to misinformation than they realize, according to new research. The Conversation reported on a new…

MBFC’s Daily Vetted Fact Checks for 08/12/2024

MBFC’s Daily Vetted Fact Checks for 08/11/2024 (Weekend Edition)

Court Sides with Meta, Dismisses RFK Jr.’s Children’s Health Defense Lawsuit Over Vaccine Post Suppression

A federal appellate panel upheld the dismissal of a lawsuit by Robert F. Kennedy Jr.’s Children’s Health Defense against Meta Platforms, which claimed the company…

MBFC’s Weekly Media Literacy Quiz Covering the Week of Aug 4th – Aug 10th

MBFC’s Daily Vetted Fact Checks for 08/10/2024 (Weekend Edition)

(Media News) Trump and Harris Confirm Debate on ABC for September 10

Vice President Kamala Harris and former President Donald Trump have both agreed to a presidential debate on September 10, hosted by ABC. This announcement comes…

Verified Factual News from NFN

Disgraced former New York Republican congressman George Santos is expected to plead guilty on Monday... The post George Santos Expected to Plead […]

Vice President Kamala Harris’s campaign announced plans to spend at least $370 million on digital... The post Harris Campaign to Spend $370 Million […]

Sen. Bernie Sanders (I-Vt.) applauded Vice President Kamala Harris’s newly announced economic plan, calling it... The post Bernie Sanders Praises […]

The Washington Post Editorial Board criticized Vice President Kamala Harris’s newly-announced economic agenda, calling it... The post Washington […]

The Supreme Court on Friday blocked the Biden administration from enforcing parts of a significant... The post Supreme Court Blocks Biden […]

In North Carolina, Vice President Kamala Harris unveiled her economic plan, aiming to win over... The post VP Harris reveals reducing cost of living […]

Americans are seeing more ads supporting former President Trump than those for presumptive Democratic nominee... The post Study: Pro-Trump Ads […]

Computer Science > Computation and Language

Title: A Study on Bias Detection and Classification in Natural Language Processing

Abstract: Human biases have been shown to influence the performance of models and algorithms in various fields, including Natural Language Processing. While the study of this phenomenon is garnering focus in recent years, the available resources are still relatively scarce, often focusing on different forms or manifestations of biases. The aim of our work is twofold: 1) gather publicly-available datasets and determine how to better combine them to effectively train models in the task of hate speech detection and classification; 2) analyse the main issues with these datasets, such as scarcity, skewed resources, and reliance on non-persistent data. We discuss these issues in tandem with the development of our experiments, in which we show that the combinations of different datasets greatly impact the models' performance.
Comments: 31 pages, 15 Tables, 4 Figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
MSC classes: 68T50
ACM classes: I.2.7
Cite as: [cs.CL]
  (or [cs.CL] for this version)

80 Media Bias Essay Topic Ideas & Examples

  • 🏆 Best Media Bias Topic Ideas & Essay Examples
  • ⭐ Interesting Topics to Write about Media Bias
  • ✅ Simple & Easy Media Bias Essay Titles
  • ❓ Questions About Media Bias

  • The Role of Bias in Media Sources The main focus of this medium is to provide information about the protests related to the Black Lives Matter movement that took place in the country.
  • Media Bias Monitor: Quantifying Biases of Social Media On the other hand, the media uses selective exposure and airing of stories about leaders, leading to more bias in their stories.
  • Media Bias Fact Check: Website Analysis For instance, Fact Check relies on the evidence provided by the person or organization making a claim to substantiate the accuracy of the source.
  • Bias of the Lebanese Media Therefore, the main aim of the paper is to identify the elements of bias in the media coverage through an analysis of the media coverage of Al Manar and Future TV in 2008.
  • Media Bias in the Middle East Crisis in America A good example of this in the United States Media coverage of the Middle East crisis comes in terms of criminalizing the Israeli forces.
  • Media Bias in America and the Middle East Of course, Benjamin Franklin neglected to mention that the printing company he owned was in the running to get the job of printing the money if the plan was approved.
  • Why Study the Media, Bias, Limitations, Issues of Media The media have recently taken on an identity almost indistinguishable from entertainment or pop culture and marketing, where news serves as “spices” that add flavor to the whole serving, such as the Guardian Unlimited […]
  • Media Bias: The Organization of a Newsroom The media is, however, desperate for attention, and it’s not political ideology that dictates what we are offered in the guise of news on any particular day, but what will sell advertising.
  • Mass Media Bias Definition The mass media is the principal source of political information that has an impact on the citizens. The concept of media bias refers to the disagreement about its impact on the citizens and objectivity of […]
  • Modern Biased Media: Transparency, Independence, and Objectivity Lack The mass media is considered to be the Fourth Estate by the majority of people. The main goal of this paper is to prove that the modern media is biased because it lacks transparency, independence, […]
  • How Is the Media Biased and in What Direction? The bias in this article is aimed at discrediting mainstream media’s coverage of Clinton’s campaign while praising the conservative actions of the GOP presidential candidate.
  • Al Jazeera TV: A Propaganda Platform Al Jazeera is the largest media outlet in the Middle East reporting events mostly to the Arab world. The media outlet has equated revolutions in Egypt and Libya with the ejection of totalitarianism in the […]
  • Media Bias in the U.S. Politics The main reason for the censure of this information by the media is because it had a connection with the working masses, and Unionists. In this case, the perceived media bias comes from the state […]
  • The Impact of Media Bias Media bias is a contravention of professional standards by members of the fourth estate presenting in the form of favoritism of one section of society when it comes to the selection and reporting of events […]
  • Media Bias: Media Research Center Versus Fairness and Accuracy in Reporting
  • Advertising Spending and Media Bias: Evidence From News Coverage of Car Safety Recalls
  • Towards a More Direct Measure of Political Media Bias
  • Media Bias Towards Science
  • French Media Bias and the Vote on the European Constitution
  • Political Accountability, Electoral Control, and Media Bias
  • Media Mergers and Media Bias With Rational Consumers
  • Same-Sex Marriage and Media Bias
  • Media Bias and Stereotypes: A Long Way of Justify the Truth
  • Political Polarization and the Electoral Effects of Media Bias
  • Media Bias and Its Influence on Public Opinion on Current Events
  • The Arguments Surrounding Media Bias
  • Political Science: Media Bias and Presidential Candidates
  • Competition and Commercial Media Bias
  • Media Bias and Its Influence on News: Reporting the News Article Analysis
  • Power of Media Framing – Framing Impact on Media Bias
  • Media Bias and Conflicting Ideas
  • Detecting Media Bias and Propaganda
  • Media Bias and the Effect of Patriotism on Baseball Viewership
  • Good News and Bad News: Evidence of Media Bias in Unemployment Reports
  • Media Industries and Media Bias: How They Work Together
  • More Ads, More Revs: A Note on Media Bias in Review Likelihood
  • News Consumption and Media Bias
  • Media Bias and the Persistence of the Expectation Gap: An Analysis of Press Articles on Corporate Fraud
  • Public Opinion, Polling, Media Bias, and the Electoral College
  • Media Bias and Electoral Competition
  • Information Gatekeeping, Indirect Lobbying, and Media Bias
  • Conservative and Liberal Media Bias
  • Media Bias: Politics, Reputation, and Public Influence
  • Law and Legal Definition of Media Bias
  • Primetime Spin: Media Bias and Belief Confirming Information
  • Media Bias and the Current Situation of Reporting News and Facts in America
  • Framing the Right Suspects: Measuring Media Bias
  • Media Bias and Its Economic Impact
  • When Advertisers Have Bargaining Power – Media Bias
  • Media Bias and the Lack of Reporting on Minority Missing Persons
  • Critical Thinking vs. Media Bias
  • Social Connectivity, Media Bias, and Correlation Neglect
  • The Difference Between Media Bias and Media Corruption
  • Media Bias and How It Affects Society
  • Does Foreign Media Entry Discipline or Provoke Local Media Bias?
  • What Are the Main Issues of Media Bias?
  • How Does Media Bias Affect Campaigns?
  • Does Foreign Media Entry Tempers Government Media Bias?
  • What Is Media Bias in News Reporting?
  • How Does Media Bias Affect the World?
  • What Is the Difference Between Media Bias and Media Propaganda?
  • Is Media Bias Bad for Democracy?
  • How Do Issue Coverage and Media Bias Affect Voter Perceptions of Elections?
  • What Are Some of the Most Prominent Examples of Media Bias in Politics?
  • Does Media Bias Affect Public Opinion?
  • What Are the Reasons for Which Bias in Media Is Necessary?
  • Is There a Difference Between Media Bias and Fake News?
  • What Are the Different Types of Media Bias?
  • How Does Media Bias Affect Our Society?
  • Why Is Media Bias Unavoidable in Modern Society?
  • How Does Liberal Media Bias Distort the American Mind?
  • What Is the Effect of the Economic Development and Market Competition on Media Bias in China?
  • Is There a Relationship Between Media Bias and Reporting Inaccuracies?
  • What Are the Effects of Media Bias?
  • Are There Any Benefits of Media Bias?
  • What Is the Best Way to Deal With Media Bias?
  • How to Detect Media Bias and Propaganda?
  • Does Media Bias Matter in Elections?
  • How Do Media Trust and Media Bias Perception Influence Public Evaluation of the COVID-19 Pandemic in International Metropolises?

IvyPanda. (2024, March 2). 80 Media Bias Essay Topic Ideas & Examples. https://ivypanda.com/essays/topic/media-bias-essay-topics/

"80 Media Bias Essay Topic Ideas & Examples." IvyPanda , 2 Mar. 2024, ivypanda.com/essays/topic/media-bias-essay-topics/.


COMMENTS

  1. A systematic review on media bias detection: What is media bias, how it

    Media bias is defined by researchers as slanted news coverage or internal bias, reflected in news articles. By definition, remarkable media bias is deliberate, intentional, and has a particular purpose and tendency towards a particular perspective, ideology, or result. On the other hand, bias can also be unintentional and even unconscious. (1), (3)

  2. (PDF) Media Bias Analysis

    perception of a topic through word choice, e.g., if the author uses a word with a positive or a negative connotation to refer to an entity [116], or by varying the credibility ascribed to the ...

  3. Uncovering the essence of diverse media biases from the semantic

    Media bias widely exists in the articles published by news media, influencing their readers' perceptions, and bringing prejudice or injustice to society. However, current analysis methods ...

  4. How do we raise media bias awareness effectively? Effects of

    The term 'media bias' refers to, in part, non-neutral tonality and word choice in the news. Media Bias can consciously and unconsciously result in a narrow and one-sided point of view. How a topic or issue is covered in the news can decisively impact public debates and affect our collective decision making." Besides, an example of one-sided ...

  5. (PDF) A systematic review on media bias detection: What is media bias

    PDF | On Sep 1, 2023, Francisco-Javier Rodrigo-Ginés and others published A systematic review on media bias detection: What is media bias, how it is expressed, and how to detect it | Find, read ...

  6. How Media Exposure, Media Trust, and Media Bias Perception Influence

    Media bias perception is mainly related to the party orientation and ideological stance of the media [29,30]. Extensive research conducted on hostile media effects has revealed that partisanship influences individual perceptions of objectivity and portrayal of political and social issues in media [31,32]. This cognitive stereotyping will affect ...

  7. The Media Bias Taxonomy: A Systematic Literature Review on the Forms

    Media bias is widely recognized as having a strong impact on the public's perception of reported topics [48, 98, 145]. Media bias aggravates the problem known as filter bubbles or echo chambers [216], where readers consume only news corresponding to their beliefs, views, or personal liking [145]. The behavior likely leads to poor awareness of ...

  8. Countering Algorithmic Bias and Disinformation and Effectively

    When it comes to users, research has explored their role in algorithmic bias, using and updating traditional media theories of the 20th century, such as selective exposure theory (Knobloch-Westerwick et al., 2015), and how algorithms reinforce or counteract individual-level bias in media selection (Knobloch-Westerwick et al., 2015; Trielli ...

  9. Identification and Analysis of Media Bias in News Articles

    One of the main effects of media bias is the change of people's awareness and perception of topics (Siemens, 2014), which becomes critical for public issues, such as elections (Bernhardt ...

  10. The Media Bias Taxonomy: A Systematic Literature Review on the Forms

    The way the media presents events can significantly affect public perception, which in turn can alter people's beliefs and views. Media bias describes a one-sided or polarizing perspective on a topic. This article summarizes the research on computational methods to detect media bias by systematically reviewing 3140 research papers published between 2019 and 2022. To structure our review and ...

  11. The Role of Media Use and Misinformation Perceptions in Optimistic Bias

    Introduction. In times of major societal disruptions, the news media play a central role in informing the knowledge, behavior, and attitudes of citizens (Boukes et al., 2019; Van der Meer, 2018). However, in such heightened crisis times characterized by high media dependency, like the outbreak of SARS-CoV-2 in early 2020, the information available to individuals to base their ...

  12. Resources & Publications

    2023. Introducing MBIB - the first Media Bias Identification Benchmark Task and Dataset Collection Proceedings Article. In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23), ACM, New York, NY, USA, 2023, ISBN: 978-1-4503-9408-6/23/07.

  13. PDF Perceptions of Media Bias: viewing the news through ideological cues

    perceptions of bias in the media was found in Niven (2002), which suggests that the issue of media bias is more complex and rooted in interpersonal factors and beliefs. Niven (2002) claims that individuals formulate opinions of media bias based on their own prejudices with little evidence as to why. After all, people do view things from ...

  14. On the nature of real and perceived bias in the mainstream media

    Knowing the nature of media bias will help individuals and organizations take actions that counteract bias. If, for example, a newspaper claims to be objective, but is in fact "right-wing, conservative" (as is the case with El Mercurio in Chile [7]), people should be able to recognize this and take this bias into account when reading its ...

  15. PDF Media Bias and Reputation

    revealing rising polarization and falling trust in the news media has prompted concerns about the market's ability to deliver credible information to the public (Kohut 2004). In this paper, we develop a new model of media bias. We start from a simple assumption: A media firm wants to build a reputation as a provider of accurate information.

  16. 29 Theories of Media Bias

    The debate over media bias has drawn on a wide range of theories and methods. The tradition of critical theory has produced a rich literature that portrays the news media as a conservative force in politics. To some degree, however, this conclusion is built into the theory itself.

  17. PDF The Political Impact of Media Bias

    Estimates of the Impact of Media Bias. Table 6.1 summarizes a small number of key studies that examine the impact of media bias on political behavior and voting. The studies are grouped into four groups by the methodologies used: surveys, laboratory experiments, field experiments, and natural experiments.

  18. PDF DISCOVERY Media Bias: It's Real, but Surprising

    The results break new ground in the exploration of political bias in the media. "Past researchers have been able to say whether an outlet is conservative or liberal, but no one has compared media outlets to lawmakers," Groseclose said. "Our work gives a precise characterization of the bias and relates it to a known commodity ...

  19. Mass Media: An Undergraduate Research Guide : Media Bias

    This guide focuses on bias in mass media coverage of news and current events. It includes concerns of sensationalism, allegations of media bias, and criticism of media's increasingly profit-motivated ethics. It also includes examples of various types of sources coming from particular partisan viewpoints.

  20. Media Bias 101: What Journalists Really Think

    Media Bias 101 summarizes decades of survey research showing how journalists vote, what journalists think, what the public thinks about the media, and what journalists say about media bias. The following links take you to dozens of different surveys, with key findings and illustrative charts. (Most recent update: May 2014)

  21. Issue Bias: How Issue Coverage and Media Bias Affect Voter Perceptions

    We extend that research by investigating how issue ownership and the Hostile Media Outlet Phenomenon mediate, separately and in interaction, voter perceptions of media campaign coverage. We look at the effects of story selection on individuals' perceptions concerning which party benefits more from media issue coverage.

  22. Media Bias/Fact Check

    August 15, 2024 | MBFC's Daily Vetted Fact Checks for 08/15/2024. We are the most comprehensive media bias resource on the internet. There are currently 8300+ media sources, journalists, and politicians listed in our database and growing every day. Don't be fooled by Questionable sources. Use the search feature above (Header) to check the ...

  23. [2408.07479] A Study on Bias Detection and Classification in Natural

    Human biases have been shown to influence the performance of models and algorithms in various fields, including Natural Language Processing. While the study of this phenomenon is garnering focus in recent years, the available resources are still relatively scarce, often focusing on different forms or manifestations of biases. The aim of our work is twofold: 1) gather publicly-available ...

  24. 80 Media Bias Essay Topic Ideas & Examples

    The mass media is the principal source of political information that has an impact on the citizens. The concept of media bias refers to the disagreement about its impact on the citizens and objectivity of […] Modern Biased Media: Transparency, Independence, and Objectivity Lack. The mass media is considered to be the Fourth Estate by the ...