Data Sources: Collect news reports or articles about immigrants from a range of media outlets.
Text Extraction: Extract the text from each item, either the full article or only the paragraphs that deal with immigration.
Tokenization: Break the text into words or phrases that can be analyzed individually.
Normalization: Reduce words to their base form (lemmatization) and remove stop words (very common words, such as "the" or "and", that carry little meaning for the analysis).
Data Cleaning: Remove punctuation, numbers, and symbols that are not relevant to the analysis.
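The preprocessing steps above can be sketched with tidytext as follows. This is a minimal illustration, not the pipeline actually used for the figure: the `articles` table and its two sentences are hypothetical, and lemmatization is left out because tidytext does not provide it (a separate package such as textstem would be needed).

```r
library(dplyr)
library(stringr)
library(tidytext)

# Hypothetical sample texts; the real news corpus is not part of this document
articles <- tibble(
  id   = 1:2,
  text = c("Refugees face discrimination in 2023!",
           "The economy and safety of the population improved.")
)

tokens <- articles %>%
  # Data cleaning: drop digits (unnest_tokens strips punctuation itself)
  mutate(text = str_remove_all(text, "[0-9]+")) %>%
  # Tokenization: one lowercased word per row
  unnest_tokens(word, text) %>%
  # Normalization: remove stop words using tidytext's built-in stop_words list
  anti_join(stop_words, by = "word")

print(tokens)
```

Each remaining row of `tokens` is one cleaned word, still tied to the `id` of the article it came from.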
# Word counts with a manually assigned sentiment label for each word
datawords <- data.frame(
  Words = c("Refugees", "Discrimination", "Hates", "Murder", "Steal", "Kill",
            "Death", "Racism", "Dark", "Black", "Bad", "Slaughterer", "Human",
            "Right", "Safety", "Development", "Economic", "Money", "Transfer",
            "Population", "Demography", "Show", "Massacre", "Rapist", "Gun",
            "Violence", "Terror", "Sexuality", "Anti", "Uncertainty"),
  # Words 1-12 and 23-30 are labeled Negative, words 13-22 Positive
  Sentiment = rep(c("Negative", "Positive", "Negative"), times = c(12, 10, 8)),
  # Number of occurrences of each word
  Data = c(5500, 4500, 4300, 4021, 3991, 3812, 3711, 3300, 3110, 3000, 2911, 2912,
           4900, 4211, 3999, 3555, 3122, 2821, 2742, 2641, 2521, 2240, 2821, 2712,
           2681, 2531, 2412, 2341, 2222, 1999))
library(tidytext)
library(ggplot2)
library(tidyverse)
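datawords %>%
  group_by(Sentiment) %>%
  # Keep up to 30 words per sentiment, ranked by count (here this keeps every word)
  slice_max(Data, n = 30) %>%
  ungroup() %>%
  # Words on the y-axis (ordered by count), number of occurrences on the x-axis
  ggplot(aes(x = Data, y = reorder(Words, Data), fill = Sentiment)) +
  geom_col(alpha = 0.8, show.legend = TRUE) +
  # One panel per sentiment, each with its own word axis
  facet_wrap(~Sentiment, scales = "free_y") +
  labs(title = "Sentiment Analysis of Text Framing in News About Immigrants",
       x = "Text Amount", y = NULL) +
  theme_bw()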
Sentiment Labeling: Use natural language processing (NLP) techniques to classify words or phrases into sentiment categories such as positive, negative, or neutral.
Sentiment Dictionary: Compare the words or phrases found in the text against an existing sentiment dictionary (lexicon) to determine whether each one carries a positive or negative connotation.
Word Frequency: Count how often each word or phrase appears in the news texts, and group the counts by sentiment (positive or negative).
Sentiment Separation: Split the results of the sentiment analysis into two groups, positive and negative, as shown in the figure.
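The dictionary lookup and frequency count described above can be sketched as a join followed by a count. Both the `lexicon` and the `tokens` below are small hypothetical examples made up for illustration; in practice an existing lexicon (for instance the one returned by tidytext's `get_sentiments("bing")`) would take the place of the hand-written table.

```r
library(dplyr)

# Hypothetical mini-lexicon: one sentiment label per word
lexicon <- tibble(
  word      = c("murder", "racism", "death", "safety", "development"),
  sentiment = c("negative", "negative", "negative", "positive", "positive")
)

# Hypothetical tokens, as they might come out of the preprocessing step
tokens <- tibble(
  word = c("murder", "safety", "murder", "racism", "safety", "murder", "border")
)

word_sentiment <- tokens %>%
  # Sentiment dictionary: keep only words found in the lexicon
  inner_join(lexicon, by = "word") %>%
  # Word frequency: occurrences per word and sentiment, most frequent first
  count(word, sentiment, sort = TRUE)

print(word_sentiment)
```

Words missing from the lexicon ("border" here) are dropped by the inner join, so they count toward neither group.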
Bar Chart: Visualize the word frequencies for each sentiment as a bar chart that shows how many times each word occurs.
Sentiment Panels: Split the chart into two panels, one for negative and one for positive sentiment, with the number of occurrences on the X-axis and the words on the Y-axis.
Color: Use a different fill color for each sentiment (e.g., red for negative, blue for positive).
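To fix the colors explicitly rather than rely on ggplot2's default palette, one option is `scale_fill_manual()`. The sketch below uses only four of the words and counts from `datawords` to stay short; the choice of red and blue simply follows the example named above and is an assumption, not necessarily the palette of the original figure.

```r
library(ggplot2)

# Four rows taken from datawords, for illustration only
counts <- data.frame(
  Words     = c("Murder", "Racism", "Safety", "Development"),
  Sentiment = c("Negative", "Negative", "Positive", "Positive"),
  Data      = c(4021, 3300, 3999, 3555)
)

p <- ggplot(counts, aes(x = Data, y = reorder(Words, Data), fill = Sentiment)) +
  geom_col(alpha = 0.8) +
  facet_wrap(~Sentiment, scales = "free_y") +
  # Map each sentiment label to a fixed color
  scale_fill_manual(values = c(Negative = "red", Positive = "blue")) +
  labs(x = "Text Amount", y = NULL) +
  theme_bw()

print(p)
```

Naming the elements of `values` ties each color to its label, so the mapping does not depend on the order of the factor levels.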
Framing Analysis: The most frequent words indicate how the media frames news about immigrants, whether positively or negatively.
Conclusion: Based on this analysis, conclusions can be drawn about bias or tendencies in news reporting about immigrants.
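One simple way to put a number on the overall framing is to compare the total count of negative words with the total count of positive words. The sketch below does this for four rows taken from `datawords`; because it uses only a subset, the resulting shares are illustrative and say nothing about the full data set. The `share` column is a hypothetical summary measure, not something computed in the original analysis.

```r
library(dplyr)

# Four rows taken from datawords, for illustration only
counts <- tibble(
  Words     = c("Murder", "Racism", "Safety", "Development"),
  Sentiment = c("Negative", "Negative", "Positive", "Positive"),
  Data      = c(4021, 3300, 3999, 3555)
)

sentiment_share <- counts %>%
  group_by(Sentiment) %>%
  # Total occurrences per sentiment
  summarise(total = sum(Data)) %>%
  # Share of all counted occurrences that fall under each sentiment
  mutate(share = total / sum(total))

print(sentiment_share)
```

A share well above one half for either sentiment would point to a tendency in the coverage, although word counts alone cannot show why a word was used.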
These are the general steps for producing the sentiment and text-framing analysis shown in the figure.