library(tidyverse)
library(dplyr)
library(knitr)
library(ggplot2)
polarity <- read_csv("C:/Users/Jamie/Downloads/polarity_comparison_country_time.csv")
country_time_freqrank <- read.csv("C:/Users/Jamie/Downloads/country_time_freqrank.csv")
theme <- read.csv("C:/Users/Jamie/Downloads/word_themes.csv")
theme_rank <- read.csv("C:/Users/Jamie/Downloads/word_themes_rank.csv")
headlines <- read.csv("C:/Users/Jamie/Downloads/headlines.csv/headlines.csv")
headlines_reduced <- read.csv("C:/Users/Jamie/Downloads/headlines_reduced_temporal.csv")
Social norms, gender roles, and acceptable societal behaviors are communicated and reinforced through various media, including television, social media, and news stories (Ward and Grower, 2020). The sentence structure, or even the word choice, used in a news headline can have an impact on societal attitudes and behavior (Shashkevich, 2019). Therefore, language can be used to underpin these cultural ideas and norms, or become a force of change. This paper explores the word choices and themes that occur when women are featured in news headlines, and the subtle messages they convey to support (or subvert) these cultural ideas.
The data set used in this article was created by researcher Amber Thomas (2023) and is freely available to the public for their own analysis. The goal of the data set is to examine the latent or blatant gender bias that occurs when women appear in news headlines. The data was sourced from fifty major news outlets and papers within four countries (India, South Africa, the UK, and the USA) over the ten-year period from 2010 to 2020, and was compiled using various computer interface programs. The data includes headlines, the country where each appeared, publication URL, date and time of publication, themes, polarity, and bias scores (calculated using Gender Bias Taxonomy V1.0). These were separated into several different tables that were retrieved from https://www.kaggle.com/datasets/thedevastator/women-in-headlines-bias.
Polarity is used to measure the sentiment, or emotional strength, of a text. The polarity score is usually given between -1 (negative sentiment) and 1 (positive sentiment), with 0 being neutral. Natural language processing tools are typically used to calculate these scores and include context to determine if the polarity is positive or negative.
Thomas’ data contained two data sets dealing with polarity. The data set used here excluded the name of the site from which the headlines were retrieved. News headlines tend to be written to elicit an emotional reaction (either positive or negative) in order to encourage the reader to continue reading (Nicoletti and Sarva, n.d.). In this case, any strong sentiment was treated as productive, regardless of its direction. Therefore, polarity scores here were calculated from 0 to 1, with both positive and negative sentiment reflected above 0.
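As a minimal sketch of how signed polarity could be folded onto a 0 to 1 scale, the example below takes the absolute value of each score. This is an assumption made for illustration: the data set does not state the exact method used, and the scores shown are invented rather than taken from the data.

```r
# Hypothetical signed polarity scores in [-1, 1]; not taken from the data set.
signed_polarity <- c(-0.8, -0.2, 0, 0.3, 0.9)

# Assumed transformation: keep the strength of the sentiment, drop its direction.
polarity_strength <- abs(signed_polarity)

print(polarity_strength)
```

Under this assumption, a strongly negative headline and a strongly positive headline both land near 1, while a neutral headline stays at 0.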
The following line graph shows the mean polarity scores through time for women-centered headlines (“WH”) that appeared in each country.
pol_country <- polarity %>%
group_by(country)
ggplot(pol_country, aes(x = year, y = women_polarity_mean, color = country, shape = country)) +
geom_line() +
geom_point() +
labs(x = "Year of publication", y = "Mean Polarity Scores", title = "Mean polarity scores for headlines about women")
As the graph shows, the polarity scores moved gradually upward over time. India and the US had the lowest mean polarity scores compared to the other two countries, whereas the UK had the highest average. There is a precipitous spike in the mean scores for South Africa toward the end of the timeline. This may be an artifact of the small amount of data collected from that region. Since the graph shows the yearly mean, a smaller data set allows a few headlines to have a greater impact on these scores.
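The effect of sample size on a yearly mean can be illustrated with a small arithmetic sketch. The scores and sample sizes below are invented for illustration and do not come from the data set; the point is only that the same handful of strongly worded headlines moves a small sample's mean far more than a large one's.

```r
# Invented baseline scores: a small and a large set of headlines, all at 0.3.
small_sample <- rep(0.3, 10)
large_sample <- rep(0.3, 1000)

# The same five strongly worded headlines are added to each set.
extreme <- rep(0.9, 5)

# Change in the mean caused by the five extra headlines.
small_shift <- mean(c(small_sample, extreme)) - mean(small_sample)
large_shift <- mean(c(large_sample, extreme)) - mean(large_sample)

print(c(small = small_shift, large = large_shift))
```

In this toy case the small sample's mean rises by 0.2, while the large sample's mean rises by roughly 0.003.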
The next set of graphs includes the mean polarity scores for headlines on other topics, listed as “all headlines” in the data (“AH”). The scores from the AH and WH are compared over the same time periods. Since headlines are meant to catch the reader’s attention, the polarity scores for the AH are also expected to be above neutral. The graphs indicate, however, that the WH scores were higher still than the AH scores.
all_mean <- select(polarity, c(country, year, women_polarity_mean, all_polarity_mean))
country_mean <- pivot_longer(all_mean, c(women_polarity_mean, all_polarity_mean)) %>%
rename("population" = "name", "mean" = "value")
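# Relabel the pivoted column names as readable population names
country_mean$population[country_mean$population == "women_polarity_mean"] <- "women headlines"
country_mean$population[country_mean$population == "all_polarity_mean"] <- "all headlines"
# Treat year as a factor so it is plotted on a discrete axis
country_mean$year <- as.factor(country_mean$year)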
ggplot(country_mean, aes(x = year, y = mean,
group = population, color = as.factor(population))) +
geom_line() +
geom_point() +
facet_wrap(~country) +
scale_x_discrete(breaks = function(x){x[c(TRUE, FALSE)]}) +
scale_color_discrete(name = "Headline Population", labels = c("all headlines", "women headlines")) +
theme(axis.text.x = element_text(angle = 40, hjust = 1)) +
labs(x = "Year", y = "Mean Polarity of Headlines", title = "Comparison of polarity between all headlines and those about women")
The AH mean polarity scores for South Africa were missing in 2015 and 2016, which accounts for the dip on the graph for that period. On the other hand, the large spike in WH that occurred from 2019 to 2020 may be related to protests in the region at that time. Johannesburg experienced unrest and protests toward the end of 2019, calling for an end to the rising violence against women in the country (Bauer, 2019; Chutel, 2019). The spike could also be affected by the smaller amount of data collected from that region. Overall, the pattern of the polarity scores was similar across all the countries. The trend shows that the headlines about women elicited stronger sentiments.
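Gaps like the one described above can be located before plotting. The sketch below uses a toy stand-in for the `polarity` table with invented values, and it assumes missing means are stored as `NA`. If they were instead recorded as 0, the filter condition would need to be `all_polarity_mean == 0`.

```r
library(dplyr)

# Toy stand-in for `polarity`; the values are invented for illustration.
polarity_toy <- data.frame(
  country = rep(c("South Africa", "UK"), each = 3),
  year = rep(2014:2016, times = 2),
  all_polarity_mean = c(0.31, NA, NA, 0.35, 0.36, 0.37)
)

# List every country-year pair where the all-headlines mean is missing.
missing_ah <- polarity_toy %>%
  filter(is.na(all_polarity_mean)) %>%
  select(country, year)

print(missing_ah)
```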
The headlines were broken into four general themes: crime and violence (“Crime”), empowerment (“Empowerment”), female stereotypes (“Stereotypes”), and race, ethnicity and identity (“Identity”). There was an additional category for “No theme”, which was not included in this analysis. The themes were identified by country and aggregated throughout the ten year period. The bar charts below indicate the frequency of each theme by country.
# Total the headline counts per theme within each of the four countries
all_country <- country_time_freqrank %>%
  filter(country %in% c("India", "South Africa", "UK", "USA")) %>%
  group_by(country, theme) %>%
  summarize(count = sum(count), .groups = "drop") %>%
  # Rescale so the y-axis reads in units of 100,000 headlines
  mutate(adjusted_count = count / 100000)
ggplot(all_country, aes(x = theme, y = adjusted_count, fill = theme)) +
geom_col(position = "dodge") +
facet_wrap(~country) +
theme(axis.text.x = element_text(angle = 40, hjust = 1)) +
labs(x = "Theme", y = "Number of headlines (in 100,000)", title = "Headline theme count within each country")
Empowerment and Stereotypes were the two most common themes in each country, with Empowerment occurring only slightly more often than Stereotypes. News headlines often reflect societal values, norms, and gender role expectations, so the logical expectation would be for Stereotypes to be the largest category. It may seem contradictory to see these two themes as the most prevalent in each country. However, research has shown that a society’s adherence to female stereotypes often creates limitations to achieving positions of power (Tabassum and Nayak, 2021). Since headlines are used to elicit reactions and encourage more engagement with their publications, featuring women as pioneers in their field or in positions of power creates that engagement. The Empowerment-themed headlines provide examples of the perceived gender roles being broken.
The number of headlines within the Identity theme was far below the other three groups. This theme focused on the sub-categorization of women by highlighting the intersection of their gender with their other marginalized identities. The data does not indicate whether these themes overlap, or whether each headline is counted under only one theme or under multiple themes. For example, NBC News published a headline in June 2023 (not included in this data set) that read, “First Muslim woman confirmed as federal judge” (Venkatraman, 2023). This headline could fall under the Empowerment theme, the Identity theme, or both.
The theme data were separated into three different tables based on the words used in the headlines. The words were then tallied and ranked. The scatter plot below shows the five most common words found in the news headlines for each theme.
crime_violence_words <- filter(theme_rank, theme == "crime and violence") %>%
slice_max(count, n = 5)
empowerment_words <- filter(theme_rank, theme == "empowerment") %>%
slice_max(count, n = 5)
female_stereotype_words <- filter(theme_rank, theme == "female stereotypes") %>%
slice_max(count, n = 5)
identity_words <- filter(theme_rank, theme == "race, ethnicity and identity") %>%
slice_max(count, n = 5)
cv_emp <- rbind(crime_violence_words, empowerment_words)
fs_i <- rbind(female_stereotype_words, identity_words)
all_words <- rbind(cv_emp, fs_i)
# rank is kept numeric so the continuous y and color scales below can relabel it
ggplot(all_words, aes(x = count, y = rank, color = rank)) +
geom_point(size = 1) +
geom_text(aes(label = word),
nudge_x = -1, nudge_y = -0.25,
size = 3.5, hjust = "inward", vjust = "inward") +
scale_y_continuous(breaks = c(1, 2, 3, 4, 5),
labels = c("First", "Second", "Third", "Fourth", "Fifth"),
trans = "reverse") +
scale_color_continuous(labels = c("First", "Second", "Third", "Fourth", "Fifth"),
low = "#56B1F7", high = "#132B43", trans = "reverse") +
facet_wrap(~theme) +
labs(x = "Word count in headlines", y = "Rank of the top five words",
title = "Top five most common words associated with each theme")
The word “first” was used far more often than any of the other words, both within its theme and across all the other themes. “First” was used approximately twice as often as the second-ranked word within Empowerment. The Empowerment theme tends to reflect the women who have broken stereotypes and gender expectations, so this outlier is not surprising.
It is important to note that the words associated with the Crime theme most often related to violence perpetrated against women, rather than to reports of women committing criminal acts. Schnepf & Christmann (2023) conducted a recent study using fictional stories about a woman being murdered by her domestic partner. The wording of the headline was changed to reflect either the “domestic drama” (e.g., “family tragedy”, “crime of passion”) or the criminal aspect (e.g., “murder”, “homicide”). The headlines that framed the story as a “domestic drama” downplayed the brutality against women, and were highly correlated with readers assigning some blame to the victim. This study also found that men who exhibited “hostile sexism” had a more positive reaction to the violence against women. Readers tend to align their judgment of the situation with the framing of the headline. Unfortunately, most of the headlines that report on violence against women are framed as a domestic drama. This tends to reinforce stereotypes and perceptions that women are somehow complicit in the violence perpetrated against them.
In each theme, the words were aggregated across the four countries, so it is difficult to see whether any of the words appeared more frequently in one country than in the others. It would be interesting to see how the top-ranking words vary by country.
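If per-country word counts were available, the same `slice_max()` approach used above could be applied within each country and theme. The sketch below is hypothetical: the `word_counts` table, its column layout, and all of its values are invented for illustration, since the published theme tables are aggregated across countries.

```r
library(dplyr)

# Hypothetical per-country word counts; not part of the published data set.
word_counts <- data.frame(
  country = c("India", "India", "India", "UK", "UK", "UK"),
  theme = "empowerment",
  word = c("first", "leader", "win", "first", "boss", "win"),
  count = c(120, 45, 60, 200, 80, 30)
)

# Keep the two most frequent words for each country and theme.
top_words_by_country <- word_counts %>%
  group_by(country, theme) %>%
  slice_max(count, n = 2, with_ties = FALSE) %>%
  ungroup()

print(top_words_by_country)
```

Faceting the resulting table by country as well as theme would show whether a word such as “first” dominates everywhere or only in some countries.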
The data above shows that headlines about women tend to elicit stronger reactions than headlines regarding other subjects. The most common themes found in these headlines were tied to the perpetuation of gender role stereotypes and societal expectations. This held true for all the countries where data was collected.
Bauer, N. (2019, September 13). South Africa: Protesters demand action on Violence Against Women. Al Jazeera. https://www.aljazeera.com/news/2019/9/13/south-africa-protesters-demand-action-on-violence-against-women
Chutel, L. (2019, October 25). “Women were being killed on the street”: The township struggling with domestic abuse. The Guardian. https://www.theguardian.com/cities/2019/oct/25/women-were-being-killed-on-the-street-the-township-struggling-with-gender-based-violence-diepsloot
Nicoletti, L., & Sarva, S. (n.d.). When women make headlines. The Pudding. https://pudding.cool/2022/02/women-in-headlines/
Schnepf, J., & Christmann, U. (2023). “Domestic Drama,” “Love Killing,” or “Murder”: Does the Framing of Femicides Affect Readers’ Emotional and Cognitive Responses to the Crime? Violence Against Women, 0(0). https://doi.org/10.1177/10778012231158103
Shashkevich, A. (2019, August 27). The power of language: How words shape people, culture. Stanford News. https://news.stanford.edu/2019/08/22/the-power-of-language-how-words-shape-people-culture/
Tabassum, N., & Nayak, B. S. (2021). Gender Stereotypes and Their Impact on Women’s Career Progressions from a Managerial Perspective. IIM Kozhikode Society & Management Review, 10(2), 192-208. https://doi.org/10.1177/2277975220975513
Thomas, A. (2023, January 22). Women in headlines: Bias. Kaggle. https://www.kaggle.com/datasets/thedevastator/women-in-headlines-bias
Venkatraman, S. (2023, June 15). First Muslim woman confirmed as a federal judge. NBCNews.com. https://www.nbcnews.com/news/asian-america/first-muslim-woman-confirmed-us-federal-judge-rcna89546
Ward, L. M., & Grower, P. (2020, September 15). Media and the development of gender role stereotypes. https://www.annualreviews.org/doi/10.1146/annurev-devpsych-051120-010630