Introduction of the Research

This research analyzed the tone and framing of the LGBTQ+ community in Turkish media. News articles serve as a primary source of information about current events, are trusted by most people, and therefore have the power to influence individuals' conceptions, beliefs, and behaviors. This research thus provides insights into how the Turkish media, specifically news articles, represent the LGBTQ+ community, and it creates a foundation for researchers to analyze whether the media influences how Turkish people view LGBTQ+ individuals. LGBTQ+ individuals in Turkey experience societal and economic pressures, as their identities are often dismissed as unnecessary and they are marginalized within society. Mainstream media contributes significantly to this perception: it often denies them a platform to represent themselves and defend their rights, and exposes them to hate speech. In response, many LGBTQ+ individuals turn to alternative media to advocate for their rights, express themselves openly, and address the challenges they face.

Methodology of the Research

This research used a dataset of Turkish news articles obtained from Kaggle, containing 42,000 unlabeled articles. Using AntConc, the articles containing keywords related to the LGBTQ+ community were extracted. The chosen keywords were as follows:
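Although the keyword extraction itself was carried out in AntConc, the equivalent filtering step can be sketched programmatically. In the sketch below, the folder path and the keyword list are placeholders rather than the study's actual values.

# Illustrative sketch of the keyword-based filtering step (the study used AntConc).
# The folder path and keyword list are placeholders, not the study's actual values.
import os

KEYWORDS = ["lgbt"]  # placeholder: the study's actual keyword list goes here

def contains_keyword(text, keywords=KEYWORDS):
    """Return True if the article text mentions any keyword (case-insensitive)."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)

def filter_articles(folder):
    """Yield (file name, text) for every article in the folder that contains a keyword."""
    for name in sorted(os.listdir(folder)):
        with open(os.path.join(folder, name), encoding="utf-8") as f:
            text = f.read()
        if contains_keyword(text):
            yield name, text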

Sentiment analysis was then performed in Python. Since the dataset was unlabeled, pre-trained sentiment analysis models based on BERTurk (for Turkish) and XLM-RoBERTa, both available on Hugging Face, were used. The XLM-RoBERTa model labeled and scored sentences as negative, neutral, or positive, while the BERTurk model labeled them as negative or positive. The resulting datasets were visualized in R using the ggplot2 package.
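As a rough illustration of this step, the sketch below uses the Hugging Face transformers pipeline. The exact checkpoints used in the study are not named in this report, so the models shown here (a publicly available BERTurk-based sentiment model and a multilingual XLM-RoBERTa sentiment model) and the variable names are assumptions.

# Illustrative sketch only: the model checkpoints below are assumptions
# (publicly available Hugging Face models), not necessarily the ones used in this study.
from transformers import pipeline

# XLM-RoBERTa-based multilingual model (negative / neutral / positive labels)
xlm_pipe = pipeline("sentiment-analysis",
                    model="cardiffnlp/twitter-xlm-roberta-base-sentiment")

# BERTurk-based Turkish sentiment model (negative / positive labels)
berturk_pipe = pipeline("sentiment-analysis",
                        model="savasy/bert-base-turkish-sentiment-cased")

sentences = ["..."]  # placeholder: the sentences extracted from the news articles go here

# Each result is a dict with a 'label' and a 'score', e.g. {'label': 'negative', 'score': 0.90}
xlm_results = xlm_pipe(sentences, truncation=True)
berturk_results = berturk_pipe(sentences, truncation=True)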

Datasets

After executing sentiment analysis with the aforementioned models, two datasets were created for the visualization step. Both datasets contained ‘File Number’, ‘Text’, ‘Sentiment Label’, and ‘Sentiment Score’ columns. The first dataset, whose sentiment analysis was based on the XLM-RoBERTa model, is shown below (first five rows):

The second dataset, whose sentiment analysis was based on the BERTurk model, is shown below (first five rows):

Findings

According to the BERTurk model:

On the other hand, according to the XLM-RoBERTa model:

The following excerpt was classified as ‘negative’ by both models, with BERTurk assigning it a score of 0.868 and XLM-RoBERTa a score of 0.903.

The following excerpt was classified as ‘positive’ by both models, with BERTurk assigning it a score of 0.954 and XLM-RoBERTa a score of 0.674.

Jitter Plots

Jitter plots were used to visualize the distribution of sentiment scores across sentiment labels within the datasets. The sentiment scores are represented on the x-axis, while the y-axis shows the corresponding sentiment labels.

In the first jitter plot below, the sentiment labels are color-coded so that they can be distinguished easily: large, semi-transparent red dots represent positive labels, grey dots represent neutral labels, and blue dots represent negative labels.

The jitter plot shows that positive sentiment labels are far less numerous than neutral and negative labels.

Additionally, it shows that neutral sentiment labels (grey dots) outnumber negative labels (blue dots), indicating that neutral sentiments are the most prevalent, followed by negative and positive sentiments.

It can also be seen that, compared to positive labels, negative labels cluster closer to a score of 1.0, meaning that the model classified many sentences as strongly negative.

To show how the plotting was performed, the code chunks for the jitter plots are shared below. In these chunks, sentiment scores are mapped to the x-axis, sentiment labels to the y-axis, and the points are colored by sentiment label. The alpha argument is set to 0.5, which makes the dots semi-transparent so that overlapping points appear more saturated.
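The chunks assume that the two sentiment tables produced in Python are already available in R as the data frames sentiment3 (XLM-RoBERTa) and sent3 (BERTurk). A minimal loading sketch is given below; the CSV file names are placeholders, while the data frame and column names match those used in the plotting code.

# Load the sentiment tables exported from the Python step.
# The file names are placeholders; the data frame names match the plotting code below.
sentiment3 <- read.csv("xlm_roberta_sentiment.csv")  # expects sentiment_score and sentiment_label columns
sent3      <- read.csv("berturk_sentiment.csv")      # expects sentiment_score and sentiment columns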

library(ggplot2)

# Jitter plot of sentiment scores by label for the XLM-RoBERTa dataset
ggplot(sentiment3) +
  geom_jitter(aes(x = sentiment_score,
                  y = sentiment_label,
                  color = sentiment_label),
              height = 0.3,
              alpha = 0.5,   # semi-transparent dots, so overlapping points appear denser
              size = 5) +
  scale_color_manual(values = c("positive" = "#FF6961",
                                "negative" = "#629DFF",
                                "neutral" = "#AAB1AA")) +
  labs(title = "Jitter Plot of Sentiment Scores by Label (XLM-RoBERTa)",
       x = "Sentiment Score", y = "Sentiment Label") +
  theme_bw(10) +   # base theme applied first so the text sizes below are not overridden
  theme(plot.title = element_text(size = 12),
        legend.text = element_text(size = 10),
        legend.title = element_text(size = 10),
        axis.text.x = element_text(size = 10),
        axis.text.y = element_text(size = 10),
        axis.title.x = element_text(size = 10),
        axis.title.y = element_text(size = 10)) +
  guides(color = guide_legend(title = "Sentiment Labels"))

The second jitter plot, based on the BERTurk model, represents sentiment scores and labels in the same way as the XLM-RoBERTa plot, but it contains only “positive” and “negative” sentiment labels.

Positive sentiment labels are again represented by large, semi-transparent red dots, and negative labels by blue ones.

The blue dots accumulate mostly around 1.0, indicating that the model confidently classified many sentences as negative. The jitter plot below also shows that negative sentiment labels are more prevalent in the dataset than positive labels.

# Jitter plot of sentiment scores by label for the BERTurk dataset
ggplot(sent3) +
  geom_jitter(aes(x = sentiment_score,
                  y = sentiment,
                  color = sentiment),
              height = 0.3,
              alpha = 0.5,
              size = 5) +
  scale_color_manual(values = c("positive" = "#FF6961", "negative" = "#629DFF")) +
  labs(title = "Jitter Plot of Sentiment Scores by Label (BERTurk)",
       x = "Sentiment Score", y = "Sentiment Label") +
  theme_bw(10) +   # base theme applied first so the text sizes below are not overridden
  theme(plot.title = element_text(size = 12),
        legend.text = element_text(size = 10),
        legend.title = element_text(size = 10),
        axis.text.x = element_text(size = 10),
        axis.text.y = element_text(size = 10),
        axis.title.x = element_text(size = 10),
        axis.title.y = element_text(size = 10)) +
  guides(color = guide_legend(title = "Sentiment Labels"))

Violin Plots

Violin plots, which combine features of box plots and kernel density plots, show how densely the sentiment scores are distributed. The overall shape of each violin below represents the distribution of scores for the corresponding label.

In the violin plots below, the x-axis represents the sentiment label while the y-axis represents the sentiment score.

In the first visualization, based on the XLM-RoBERTa model, it is clearly seen that positive labels, represented by the red violin, are not as common as negative or neutral labels, represented by the blue and grey violins respectively.

The blue violin shows that the scores of negatively labeled sentences are densest between 0.5 and 0.6, whereas the neutrally labeled sentences are densest between 0.4 and almost 0.8 and are more common than the other labels.
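The code for the violin plots is not reproduced in this report; the sketch below shows how the XLM-RoBERTa violin plot could be drawn with ggplot2, assuming the same sentiment3 data frame and color scheme as the jitter plot above.

# Sketch of the XLM-RoBERTa violin plot: sentiment labels on the x-axis,
# sentiment scores on the y-axis, using the colors from the jitter plots.
ggplot(sentiment3) +
  geom_violin(aes(x = sentiment_label,
                  y = sentiment_score,
                  fill = sentiment_label)) +
  scale_fill_manual(values = c("positive" = "#FF6961",
                               "negative" = "#629DFF",
                               "neutral" = "#AAB1AA")) +
  labs(title = "Violin Plot of Sentiment Scores by Label (XLM-RoBERTa)",
       x = "Sentiment Label", y = "Sentiment Score") +
  theme_bw(10) +
  guides(fill = guide_legend(title = "Sentiment Labels"))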

The visualization below, based on the BERTurk model, shows only negative and positive sentiment labels, represented by blue and red violins respectively.

The violin plot illustrates that negatively labeled sentences are more densely represented than positive ones.

The blue violin widens between scores of 0.9 and 1.0, showing that the model predicted a large number of sentences as strongly negative. The red violin does not widen as much, as it is not as dense between 0.9 and 1.0 as the blue one.

This violin plot helps us understand the BERTurk jitter plot even better: both sentiment labels are scored mostly between 0.9 and 1.0, matching how the dots accumulated close to 1.0, and the violins become narrower towards lower scores.

Bar Plots

Bar plots were used to display the distribution of label counts, that is, how many sentences within each dataset were classified as negative, positive, or neutral.

In the bar plots below, the x-axis represents the sentiment labels while the y-axis represents the sentiment counts.

The bar plot derived from the XLM-RoBERTa model depicts three sentiment labels: negative, neutral, and positive. The blue bar represents negative labels, the grey bar represents neutral labels, and the red bar represents positive labels.

The visualization distinctly indicates that the dataset contains over 150 sentences with neutral labels, more than 100 sentences with negative labels, and approximately 15 sentences with positive labels.
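As with the violin plots, the bar-plot code is not included in the report; a minimal ggplot2 sketch for the XLM-RoBERTa counts, under the same assumptions about the sentiment3 data frame, could look like the following.

# Sketch of the XLM-RoBERTa bar plot: one bar per sentiment label,
# with bar height equal to the number of sentences carrying that label.
ggplot(sentiment3) +
  geom_bar(aes(x = sentiment_label, fill = sentiment_label)) +
  scale_fill_manual(values = c("positive" = "#FF6961",
                               "negative" = "#629DFF",
                               "neutral" = "#AAB1AA")) +
  labs(title = "Sentiment Distribution (XLM-RoBERTa)",
       x = "Sentiment Label", y = "Sentiment Count") +
  theme_bw(10) +
  guides(fill = guide_legend(title = "Sentiment Labels"))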

According to the bar plot below, which was based on the BERTurk model, there are only two sentiment labels, positive and negative, represented by red and blue bars respectively.

The model labeled approximately 150 sentences as negative and approximately 100 sentences as positive. The bar plot clearly shows that negatively labeled sentences were more common than positively labeled ones.

Once again, the visualization shows that the model classified more sentences as negative, meaning that the majority of sentences carried negative connotations.

Literature Review

Turkey supports heteronormativity while marginalizing same-sex sexualities and gender-nonconforming identities (Atalay & Doan, 2019). The rhetoric of the AKP (Justice and Development Party) has pushed the country towards conservatism, religiosity, and greater societal oppression. Ozbay (2015) states that homophobia has been widespread, with certain sexualities treated as ‘deviations’ and ‘illnesses’ by public authorities and military organizations; for instance, Selma Aliye Kavaf, the Turkish Minister of State responsible for Women and Family Affairs, stated in 2010 that homosexuality was a ‘biological disorder’, an ‘illness’ that should be treated (Amnesty International, 2011, p. 5). After the 1980 coup and through the 1990s, trans individuals in particular were represented in a sexist and homophobic context on private media channels. Through broadcasting, this period witnessed the cultivation of a national “fear of the queer” ideology (Gurel, 2017). Despite the growing visibility of the LGBT community, its depiction remained negative, characterizing its members as sinful individuals, outcasts, and even monsters (Atalay & Doan, 2019).

Conclusion

According to the findings of this study, the tone and framing of the LGBTQ+ community in Turkish media are mainly neutral and negative. Comparing the results of the two sentiment analyses, the XLM-RoBERTa model displayed a better classification than BERTurk. Although BERTurk classified more sentences as positive, its sentiment scores indicated lower confidence, suggesting that if a neutral class had been available, many of those sentences would have been classified as neutral. Hence, it was concluded that the Turkish media avoids describing the LGBTQ+ community positively; instead, words with neutral or negative connotations are preferred. As a result, media organizations are likely to influence people's opinions, beliefs, and conceptions of the LGBTQ+ community negatively: they make no attempt to counter the existing negativity towards the community and thereby continue to make its members' lives harder. Further sentiment analysis should be conducted with larger datasets containing more news articles related to the LGBTQ+ community in Turkey. Additionally, different sentiment analysis models should be applied and their results compared in order to reach more robust conclusions.

References