To answer our research question, we focus on tweets from political parties and politicians in Germany and Austria. We acquired the data from iCandid with the permission of Prof. Leen d’Haenens. The original dataset consists of tweets and their metadata posted between 2015 and 2022, collected from a list of Twitter handles of politicians and parties in four European countries (Germany, Austria, Italy and Hungary).
We chose to focus on Germany and Austria for three reasons: 1) both have important ties to Russia, yet both are also EU members; 2) German is the national language of both, which makes NLP applications and analyses easier and more consistent; 3) our team’s language skills and familiarity with the two national contexts.
Time period: The dataset covers 1 January 2015 to 3 April 2022. The invasion started on 24 February 2022. However, we wanted to monitor the emergence of the crisis, so we included all tweets from 2022.
Dataset features:
* Id: Unique id of the tweet
* Type: iCandid-generated item type. All items have the type “Message”
* Author: The original author of the tweet
* Text: The text of the tweet
* Sender: The account that sent the tweet. The values in this feature correspond to our list of accounts
* datePublished: Date of the tweet
* Url: URL of the tweet
* Keywords: Hashtags contained in the text
* Mentions: Mentions contained in the text
* Country: The country of origin of the account

We focused on sender, text, date and hashtags in our computational analyses.
This research project compares two countries that share historical, social and linguistic ties, and looks for similarities and differences in their online political discussions of the invasion of Ukraine. We first clean the data. We then produce descriptive statistics in the form of party-based and date-based visualisations and tables. Next come the linguistic analyses, starting with word clouds and moving on to more advanced NLP applications, namely sentiment analysis and topic detection of the tweets. Throughout, we pay special attention to 1) similar trends and patterns emerging in the two countries and 2) differences in linguistic expressions and social media grammars. We also try to identify the prominent political actors in each country in relation to the Ukraine crisis.
library(knitr)
library(glue)
library(tidyverse)
library(readr)
library(dplyr)
library(RColorBrewer)
library(wordcloud)
library(wordcloud2)
library(tm)
library(SnowballC)
library(RCurl)
library(XML)
library(tidytext)
library(quanteda)
library(quanteda.textstats)
library(quanteda.textplots)
library(udpipe)
library(spacyr)
library(syuzhet)
library(lubridate)
library(ggplot2)
library(scales)
library(reshape2)
library(topicmodels)
library(cowplot)
The dataset consists of 2 .csv files, one for Germany and another for Austria.
opts_knit$set(progress=FALSE, verbose=FALSE)
twitter_Germany <- read_delim("twitterDuitsePoliticiAccount.csv",
delim = "\t", escape_double = FALSE,
trim_ws = TRUE)
## Rows: 284198 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (8): id, type, author, text, sender, url, keywords, mentions
## date (1): datePublished
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# View(twitter_Germany)
opts_knit$set(progress=FALSE, verbose=FALSE)
twitter_Austria <- read_delim("twitterOostenrijksePoliticiAccount.csv",
delim = "\t", escape_double = FALSE,
trim_ws = TRUE)
## Rows: 188936 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (8): id, type, author, text, sender, url, keywords, mentions
## date (1): datePublished
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# View(twitter_Austria)
A look into the dataframes: Germany
head(twitter_Germany)
Austria
head(twitter_Austria)
Select the time period of the study and the required variables.
# Select tweets from 2022-01-01 up to the end of the dataset (2022-04-03);
# ">=" keeps the tweets published on 1 January itself
tweets_Germany <- filter(twitter_Germany,
datePublished >= as.Date("2022-01-01"))
tweets_Austria <- filter(twitter_Austria,
datePublished >= as.Date("2022-01-01"))
# select the targeted variables
tweets_Germany <- select(tweets_Germany, author, text, sender, datePublished, keywords, mentions)
tweets_Austria <- select(tweets_Austria, author, text, sender, datePublished, keywords, mentions)
In this part, the frequency of Ukraine-related hashtags in both countries is calculated and visualized.
Germany
hashtags_Germany <- data.frame(str_split_fixed(tweets_Germany$keywords, ",", 10), datePublished = tweets_Germany$datePublished, sender = tweets_Germany$sender, mentions = tweets_Germany$mentions)
hashtags_Germany <- tibble(hashtags_Germany)
hashtags_Germany$X1 <- str_detect(toupper(str_squish(hashtags_Germany$X1)), "UKRAIN.*")
hashtags_Germany$X2 <- str_detect(toupper(str_squish(hashtags_Germany$X2)), "UKRAIN.*")
hashtags_Germany$X3 <- str_detect(toupper(str_squish(hashtags_Germany$X3)), "UKRAIN.*")
hashtags_Germany$X4 <- str_detect(toupper(str_squish(hashtags_Germany$X4)), "UKRAIN.*")
hashtags_Germany$X5 <- str_detect(toupper(str_squish(hashtags_Germany$X5)), "UKRAIN.*")
hashtags_Germany$X6 <- str_detect(toupper(str_squish(hashtags_Germany$X6)), "UKRAIN.*")
hashtags_Germany$X7 <- str_detect(toupper(str_squish(hashtags_Germany$X7)), "UKRAIN.*")
hashtags_Germany$X8 <- str_detect(toupper(str_squish(hashtags_Germany$X8)), "UKRAIN.*")
hashtags_Germany$X9 <- str_detect(toupper(str_squish(hashtags_Germany$X9)), "UKRAIN.*")
hashtags_Germany$X10 <- str_detect(toupper(str_squish(hashtags_Germany$X10)), "UKRAIN.*")
hashtags_Germany$Count_g <- hashtags_Germany$X1+hashtags_Germany$X2+hashtags_Germany$X3+hashtags_Germany$X4+hashtags_Germany$X5+hashtags_Germany$X6+hashtags_Germany$X7+hashtags_Germany$X8+hashtags_Germany$X9+hashtags_Germany$X10
# Keep all columns together with the tweet text in a new object, used later to extract the Ukraine-related tweets.
Ukriane_tweets_g <- cbind(hashtags_Germany, text = tweets_Germany$text)
Austria
hashtags_Austria <- data.frame(str_split_fixed(tweets_Austria$keywords, ",", 10), datePublished = tweets_Austria$datePublished, sender = tweets_Austria$sender, mentions = tweets_Austria$mentions)
hashtags_Austria <- tibble(hashtags_Austria)
hashtags_Austria$X1 <- str_detect(toupper(str_squish(hashtags_Austria$X1)), "UKRAIN.*")
hashtags_Austria$X2 <- str_detect(toupper(str_squish(hashtags_Austria$X2)), "UKRAIN.*")
hashtags_Austria$X3 <- str_detect(toupper(str_squish(hashtags_Austria$X3)), "UKRAIN.*")
hashtags_Austria$X4 <- str_detect(toupper(str_squish(hashtags_Austria$X4)), "UKRAIN.*")
hashtags_Austria$X5 <- str_detect(toupper(str_squish(hashtags_Austria$X5)), "UKRAIN.*")
hashtags_Austria$X6 <- str_detect(toupper(str_squish(hashtags_Austria$X6)), "UKRAIN.*")
hashtags_Austria$X7 <- str_detect(toupper(str_squish(hashtags_Austria$X7)), "UKRAIN.*")
hashtags_Austria$X8 <- str_detect(toupper(str_squish(hashtags_Austria$X8)), "UKRAIN.*")
hashtags_Austria$X9 <- str_detect(toupper(str_squish(hashtags_Austria$X9)), "UKRAIN.*")
hashtags_Austria$X10 <- str_detect(toupper(str_squish(hashtags_Austria$X10)), "UKRAIN.*")
hashtags_Austria$Count_a <- hashtags_Austria$X1+hashtags_Austria$X2+hashtags_Austria$X3+hashtags_Austria$X4+hashtags_Austria$X5+hashtags_Austria$X6+hashtags_Austria$X7+hashtags_Austria$X8+hashtags_Austria$X9+hashtags_Austria$X10
# Keep all columns together with the tweet text in a new object, used later to extract the Ukraine-related tweets.
Ukriane_tweets_a <- cbind(hashtags_Austria, text = tweets_Austria$text)
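The per-column detection above could also be written as a single function. The following is a minimal sketch, not the code used for the results in this report: `count_hashtags` and `toy_keywords` are hypothetical names, and the toy strings are invented for illustration. Unlike the fixed ten-column split, it treats a missing keywords field as zero hashtags.

```r
library(stringr)

# Hypothetical helper: count the hashtags that match a pattern in a
# comma-separated keywords string (NA or empty strings give 0).
count_hashtags <- function(keywords, pattern = "UKRAIN") {
  tags <- str_split(keywords, ",")  # list with one character vector per tweet
  vapply(tags, function(x) {
    sum(str_detect(toupper(str_squish(x)), pattern), na.rm = TRUE)
  }, integer(1))
}

# Toy data standing in for the keywords column (not taken from the dataset)
toy_keywords <- c("Ukraine, Putin", "corona", NA, "ukrainekrieg,Ukraine,EU")
count_hashtags(toy_keywords)
```

Assuming the keywords column has the comma-separated form used above, the result could be assigned directly, for example `hashtags_Germany$Count_g <- count_hashtags(tweets_Germany$keywords)`.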
The sender frequency table in this section shows that parties in Germany tweet more than individual politicians, while in Austria politicians tweet more than parties.
NOTE: Would it not be better to 1) limit these tables to the top 10 (they are too long) and 2) present them as bar charts?
NOTE 2: Could we also produce the same bar charts or tables for tweets with Ukraine hashtags, or tweets that contain "ukraine"? We could then normalize them and see (for example, in percentages) which party or political leader devoted more of their tweets to Ukraine. That would make for a great discussion.
It is informative to see who tweeted the most during the relevant period. To see this, we construct a simple frequency list of tweets by the account that sent them.
Germany
# All tweets
## Germany
Sender_freq_g <- str_squish(unlist(na.omit(toupper(str_squish(tweets_Germany$sender)))))
Senders <- data.frame(sort(table(Sender_freq_g), decreasing=TRUE))
df <- data.frame(Sender = Senders$Sender_freq_g, Freq=Senders$Freq)
f1 <- df[1:10,]
## Austria
Sender_freq_a <- str_squish(unlist(na.omit(toupper(str_squish(tweets_Austria$sender)))))
Senders <- data.frame(sort(table(Sender_freq_a), decreasing=TRUE))
df <- data.frame(Sender = Senders$Sender_freq_a, Freq=Senders$Freq)
f2 <- df[1:10,]
# Ukraine tweets
## Germany
Sender_freq_g <- Ukriane_tweets_g[Ukriane_tweets_g$Count_g > 0,]
Sender_freq_g <- str_squish(unlist(na.omit(toupper(str_squish(Sender_freq_g$sender)))))
Senders <- data.frame(sort(table(Sender_freq_g), decreasing=TRUE))
df <- data.frame(Sender = Senders$Sender_freq_g, Freq=Senders$Freq)
f3 <- df[1:10,]
## Austria
Sender_freq_a <- Ukriane_tweets_a[Ukriane_tweets_a$Count_a > 0,]
Sender_freq_a <- str_squish(unlist(na.omit(toupper(str_squish(Sender_freq_a$sender)))))
Senders <- data.frame(sort(table(Sender_freq_a), decreasing=TRUE))
df <- data.frame(Sender = Senders$Sender_freq_a, Freq=Senders$Freq)
f4 <- df[1:10, ]
f_all <- data.frame(f1,f2,f3,f4)
names(f_all) <- c("Senders_1", "Freq_1", "Senders_2", "Freq_2", "Senders_3", "Freq_3", "Senders_4", "Freq_4")
kable(f_all, col.names = c("All tweets senders (Germany)","Number of tweets","All tweets senders (Austria)","Number of tweets","Ukraine tweets senders (Germany)","Number of tweets","Ukraine tweets senders (Austria)","Number of tweets"))
| All tweets senders (Germany) | Number of tweets | All tweets senders (Austria) | Number of tweets | Ukraine tweets senders (Germany) | Number of tweets | Ukraine tweets senders (Austria) | Number of tweets |
|---|---|---|---|---|---|---|---|
| SPD PARTEIVORSTAND 🇪🇺 | 1697 | RUDI ANSCHOBER | 2277 | SPD PARTEIVORSTAND 🇪🇺 | 129 | RUDI ANSCHOBER | 98 |
| CDU/CSU | 1156 | PETER PILZ | 1353 | CDU/CSU | 105 | PETER PILZ | 17 |
| FDP | 1008 | DAS NEUE ÖSTERREICH | 560 | CEM ÖZDEMIR | 64 | DAS NEUE ÖSTERREICH | 12 |
| JOANA COTAR | 665 | BEATE MEINL-REISINGER | 522 | FDP | 59 | DIE GRÜNEN | 12 |
| CSU | 623 | SPÖ | 496 | CDU DEUTSCHLANDS | 48 | HAGEN REINHOLD, MDB | 11 |
| CEM ÖZDEMIR | 587 | FPÖ | 282 | CSU | 42 | WERNER KOGLER | 10 |
| CDU DEUTSCHLANDS | 540 | MATTHIAS STROLZ | 276 | MARKUS SÖDER | 34 | SPÖ | 9 |
| DIE LINKE | 520 | DIE GRÜNEN | 236 | DIE LINKE | 32 | MATTHIAS STROLZ | 7 |
| ALTERNATIVE FÜR 🇩🇪 DEUTSCHLAND | 449 | WERNER KOGLER | 199 | CHRISTIAN LINDNER | 29 | BEATE MEINL-REISINGER | 6 |
| MARKUS SÖDER | 407 | HAGEN REINHOLD, MDB | 143 | SAHRA WAGENKNECHT | 26 | PAMELA RENDI-WAGNER | 5 |
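The raw counts in the table favour accounts that tweet a lot in general. One way to normalize them, as suggested in the notes above, is to compute the share of each sender's tweets that carry a Ukraine hashtag. The sketch below uses an invented toy data frame (`toy`, with hypothetical columns `sender` and `count`) in place of the real objects, so the numbers are illustrative only.

```r
# Toy stand-ins for the sender column and the per-tweet Ukraine hashtag count
toy <- data.frame(
  sender = c("A", "A", "A", "B", "B", "C"),
  count  = c(1, 0, 0, 2, 1, 0)
)

# Total tweets and Ukraine-tagged tweets per sender
total   <- table(toy$sender)
ukraine <- table(factor(toy$sender[toy$count > 0], levels = names(total)))

# Share of each sender's tweets that carry at least one Ukraine hashtag
share <- data.frame(
  sender        = names(total),
  tweets        = as.integer(total),
  ukraine       = as.integer(ukraine),
  share_percent = round(100 * as.integer(ukraine) / as.integer(total), 1)
)
share <- share[order(-share$share_percent), ]
share
```

With the real data, very small accounts can reach a high share with only a handful of tweets, so a minimum number of tweets per sender may be worth applying before ranking.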
In this part, we plot the number of Ukraine-related hashtags over the given period. NOTE: We should aggregate at least by week, if not by day, so that the plot is easier to read and discuss. A monthly division is too broad.
Agg_hashtags_g <- aggregate(Count_g ~ datePublished, data = hashtags_Germany, sum)
Agg_hashtags_a <- aggregate(Count_a ~ datePublished, data = hashtags_Austria, sum)
# Use the full column names (Count_g, Count_a) rather than relying on partial matching of "$Count"
plot(Agg_hashtags_g$datePublished, Agg_hashtags_g$Count_g, type = "l", xlab = "Date", ylab = "Number of Ukraine hashtags")
lines(Agg_hashtags_a$datePublished, Agg_hashtags_a$Count_a, col = "red", type = "l")
legend("topleft", legend=c("Germany", "Austria"),
col=c("Black", "Red"), lty=1, cex=0.8)
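If a weekly view is preferred, the daily counts can be summed per calendar week before plotting. This is a small sketch on invented data: `toy_daily` and its `Count` column are hypothetical stand-ins for the aggregated objects above, and weeks are assumed to start on Monday.

```r
library(lubridate)

# Toy daily counts (invented values, not taken from the dataset)
toy_daily <- data.frame(
  datePublished = as.Date(c("2022-02-21", "2022-02-24", "2022-02-27", "2022-02-28")),
  Count = c(1, 5, 3, 2)
)

# Map every date to the Monday of its week, then sum the counts per week
toy_daily$week <- floor_date(toy_daily$datePublished, unit = "week", week_start = 1)
weekly <- aggregate(Count ~ week, data = toy_daily, FUN = sum)
weekly

plot(weekly$week, weekly$Count, type = "l", xlab = "Week starting", ylab = "Number of Ukraine hashtags")
```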
There were some initial political reactions from Germany before the
invasion, while Austrian actors remained mostly silent on the issue.
Unsurprisingly, the number of tweets rises sharply in both countries
once the invasion begins. There are more Ukraine-hashtagged tweets from
German actors; however, we follow more accounts there and therefore have
more data, so this difference may simply reflect the volume of data.
Finally, German actors continue to tweet more about Ukraine, while the
debate in Austria dies down.
There are two relevant ways of generating word clouds from our data: 1) based on the hashtags and 2) based on the tweet texts. We will generate both, starting with the hashtags.
For the sentiment analysis of the tweet contents we employed the “Syuzhet” package, which describes itself as “An R package for the extraction of sentiment and sentiment-based plot arcs from text.”
NOTE: I think we run the sentiment analysis here on all tweets. Why? As it stands, this is unrelated to the research question. We can keep it, but then we should also run a sentiment analysis on the tweets about Ukraine and compare how the sentiment differs.
# Sentiment analysis for all tweets
# Germany
Text_all_g <- Ukriane_tweets_g$text
Text_all_g <- gsub("#\\S*", "", Text_all_g)
Text_all_g <- gsub("https\\S*", "", Text_all_g)
Text_all_g <- gsub("@\\S*", "", Text_all_g)
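Text_all_g <- gsub("&amp;", " ", Text_all_g, fixed = TRUE) # remove the HTML entity only, not "amp" inside words such as "Kampf"
Text_all_g <- gsub("[\r\n]", " ", Text_all_g) # replace line breaks with a space so words are not glued together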
Text_all_g <- gsub("[[:punct:]]", "", Text_all_g)
Text_all_g <- gsub("\\d", "", Text_all_g)
Text_all_g <- na.omit(toupper(str_squish(Text_all_g)))
ger_all = corpus(Text_all_g) %>%
tokens(remove_punct=T) %>%
dfm() %>%
dfm_remove(stopwords("german")) %>%
dfm_remove(stopwords("english")) %>%
dfm_remove(c("dass", "menschen"))
textplot_wordcloud(ger_all, max_words=200)
words <- sort(colSums(ger_all), decreasing = T)
df <- data.frame(word = names(words), freq=words)
df <- df[df$freq > 300, ]
barplot(df$freq, names.arg = df$word, las=2, col = 2, main = "Germany")
# Sentiment Analysis
tg <- iconv(Text_all_g)
s1 <- get_nrc_sentiment(tg, language = "german")
## Warning: `spread_()` was deprecated in tidyr 1.2.0.
## Please use `spread()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
barplot(colSums(s1),
las = 2,
col = rainbow(10),
ylab = 'Count',
main = 'Germany - Sentiment Scores Tweets')
# According to the syuzhet documentation, the language argument only applies to the "nrc" method
values_g <- get_sentiment(Text_all_g, method = "nrc", language = "german")
simple_plot(values_g)
#Austria
Text_all_a <- Ukriane_tweets_a$text
Text_all_a <- gsub("#\\S*", "", Text_all_a)
Text_all_a <- gsub("https\\S*", "", Text_all_a)
Text_all_a <- gsub("@\\S*", "", Text_all_a)
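Text_all_a <- gsub("&amp;", " ", Text_all_a, fixed = TRUE) # remove the HTML entity only, not "amp" inside words such as "Kampf"
Text_all_a <- gsub("[\r\n]", " ", Text_all_a) # replace line breaks with a space so words are not glued together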
Text_all_a <- gsub("[[:punct:]]", "", Text_all_a)
Text_all_a <- gsub("\\d", "", Text_all_a)
Text_all_a <- na.omit(toupper(str_squish(Text_all_a)))
aus_all = corpus(Text_all_a) %>%
tokens(remove_punct=T) %>%
dfm() %>%
dfm_remove(stopwords("german")) %>%
dfm_remove(stopwords("english")) %>%
dfm_remove(c("dass", "menschen"))
textplot_wordcloud(aus_all, max_words=200)
words <- sort(colSums(aus_all), decreasing = T)
df <- data.frame(word = names(words), freq=words)
df <- df[df$freq > 200, ]
barplot(df$freq, names.arg = df$word, las=2, col = 2, main = "Austria")
# Sentiment Analysis
ta <- iconv(Text_all_a)
s2 <- get_nrc_sentiment(ta, language = "german")
barplot(colSums(s2),
las = 2,
col = rainbow(10),
ylab = 'Count',
main = 'Austria - Sentiment Scores Tweets')
# According to the syuzhet documentation, the language argument only applies to the "nrc" method
values_a <- get_sentiment(Text_all_a, method = "nrc", language = "german")
simple_plot(values_a)
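The same cleaning steps are applied before every word cloud and sentiment run in this section, so they could be collected in one place. The helper below is a sketch under that assumption; `clean_tweets` is a hypothetical name and the example tweet is invented. It removes only the HTML entity `&amp;` and replaces line breaks with a space.

```r
library(stringr)

# Hypothetical helper bundling the cleaning steps used in this section
clean_tweets <- function(text) {
  text <- text[!is.na(text)]                       # drop missing tweets
  text <- gsub("#\\S*", "", text)                  # drop hashtags
  text <- gsub("https\\S*", "", text)              # drop links
  text <- gsub("@\\S*", "", text)                  # drop mentions
  text <- gsub("&amp;", " ", text, fixed = TRUE)   # drop HTML-escaped ampersands
  text <- gsub("[\r\n]", " ", text)                # line breaks become spaces
  text <- gsub("[[:punct:]]", "", text)            # drop punctuation
  text <- gsub("\\d", "", text)                    # drop digits
  text <- toupper(str_squish(text))                # normalise whitespace and case
  text[text != ""]                                 # drop tweets left empty
}

# Invented example tweet, plus a missing value
clean_tweets(c("Kampf &amp; Frieden\n#Ukraine https://t.co/x @user 2022!", NA))
```

Each of the four cleaning passes could then be reduced to one call, for example `Text_g <- clean_tweets(Text_g)`.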
# Sentiment analysis for the tweets that include #Ukraine
# Germany
#Create a vector containing only the text
Text_g <- Ukriane_tweets_g[Ukriane_tweets_g$Count_g > 0,] #selecting the tweets that include #Ukraine
Text_g <- Text_g$text
# clean the text
Text_g <- gsub("#\\S*", "", Text_g)
Text_g <- gsub("https\\S*", "", Text_g)
Text_g <- gsub("@\\S*", "", Text_g)
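Text_g <- gsub("&amp;", " ", Text_g, fixed = TRUE) # remove the HTML entity only, not "amp" inside words such as "Kampf"
Text_g <- gsub("[\r\n]", " ", Text_g) # replace line breaks with a space so words are not glued together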
Text_g <- gsub("[[:punct:]]", "", Text_g)
Text_g <- gsub("\\d", "", Text_g)
Text_g <- na.omit(toupper(str_squish(Text_g)))
ger = corpus(Text_g) %>%
tokens(remove_punct=T) %>%
dfm() %>%
dfm_remove(stopwords("german")) %>%
dfm_remove(stopwords("english")) %>%
dfm_remove(c("dass", "menschen"))
textplot_wordcloud(ger, max_words=200)
words <- sort(colSums(ger), decreasing = T)
df <- data.frame(word = names(words), freq=words)
df <- df[df$freq > 30, ]
barplot(df$freq, names.arg = df$word, las=2, col = 2, main = "Germany")
tg <- iconv(Text_g)
s1 <- get_nrc_sentiment(tg, language = "german")
barplot(colSums(s1),
las = 2,
col = rainbow(10),
ylab = 'Count',
main = 'Germany - Sentiment Scores Tweets')
# According to the syuzhet documentation, the language argument only applies to the "nrc" method
values_g <- get_sentiment(Text_g, method = "nrc", language = "german")
simple_plot(values_g)
#Austria
Text_a <- Ukriane_tweets_a[Ukriane_tweets_a$Count_a > 0,]
Text_a <- Text_a$text
# clean the text
Text_a <- gsub("#\\S*", "", Text_a)
Text_a <- gsub("https\\S*", "", Text_a)
Text_a <- gsub("@\\S*", "", Text_a)
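Text_a <- gsub("&amp;", " ", Text_a, fixed = TRUE) # remove the HTML entity only, not "amp" inside words such as "Kampf"
Text_a <- gsub("[\r\n]", " ", Text_a) # replace line breaks with a space so words are not glued together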
Text_a <- gsub("[[:punct:]]", "", Text_a)
Text_a <- gsub("\\d", "", Text_a)
Text_a <- na.omit(toupper(str_squish(Text_a)))
aus = corpus(Text_a) %>%
tokens(remove_punct=T) %>%
dfm() %>%
dfm_remove(stopwords("german")) %>%
dfm_remove(stopwords("english")) %>%
dfm_remove(c("dass", "menschen"))
textplot_wordcloud(aus, max_words=200)
words <- sort(colSums(aus), decreasing = T)
df <- data.frame(word = names(words), freq=words)
df <- df[df$freq > 10, ]
barplot(df$freq, names.arg = df$word, las=2, col = 2, main = "Austria")
ta <- iconv(Text_a)
s2 <- get_nrc_sentiment(ta, language = "german") # counts of words per NRC emotion category, plus positive and negative terms
barplot(colSums(s2),
las = 2,
col = rainbow(10),
ylab = 'Count',
main = 'Austria - Sentiment Scores Tweets')
# According to the syuzhet documentation, the language argument only applies to the "nrc" method
values_a <- get_sentiment(Text_a, method = "nrc", language = "german")
simple_plot(values_a)
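Because the set of all tweets is much larger than the Ukraine subset, raw NRC counts cannot be compared directly. A possible approach is to convert the category counts into percentage shares first. The sketch below uses two invented data frames with four emotion columns in place of the `get_nrc_sentiment()` output; `nrc_share`, `toy_all` and `toy_ukraine` are hypothetical names.

```r
# Hypothetical helper: convert per-tweet category counts into percentage shares
nrc_share <- function(nrc_counts) {
  totals <- colSums(nrc_counts)
  round(100 * totals / sum(totals), 1)
}

# Toy stand-ins for the NRC output on all tweets and on Ukraine tweets
toy_all     <- data.frame(anger = c(1, 0, 1), fear = c(0, 1, 0), joy = c(2, 1, 1), trust = c(1, 1, 1))
toy_ukraine <- data.frame(anger = c(2, 1), fear = c(3, 2), joy = c(0, 1), trust = c(1, 0))

# One row per tweet set, one column per category
comparison <- rbind(all_tweets = nrc_share(toy_all), ukraine_tweets = nrc_share(toy_ukraine))
comparison

barplot(comparison, beside = TRUE, las = 2, legend.text = TRUE,
        ylab = "Share of emotion words (%)")
```

On the real output it may be cleaner to compute the shares for the eight emotion columns and the positive/negative columns separately, since they measure different things.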
To detect topics in the tweet contents we used Latent Dirichlet Allocation (LDA). The algorithm produced the following topic categories and the keywords associated with them.
NOTE: Can we not remove stop words here as well? There are so many adverbs and other uninformative words in the result that it is hard to interpret.
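One way to act on this note is to extend the stop word list with frequent but uninformative terms and to drop rare terms before fitting the model. The sketch below uses quanteda on three invented sentences; `extra_stopwords` is an illustrative list, and which words belong on it is a judgment call rather than something the data decides.

```r
library(quanteda)

# Invented example texts (not taken from the dataset)
toy_texts <- c(
  "heute mehr Krieg in der Ukraine",
  "mehr Hilfe für die Ukraine heute",
  "Krieg und Frieden in Europa"
)

# Illustrative list only: terms that dominate the topics without carrying meaning
extra_stopwords <- c("heute", "mehr", "immer", "schon", "gibt", "geht")

toks <- tokens(toy_texts, remove_punct = TRUE)
toks <- tokens_remove(toks, c(stopwords("german"), extra_stopwords))
toy_dfm <- dfm(toks)

# Keep only terms that occur in at least two documents
toy_dfm <- dfm_trim(toy_dfm, min_docfreq = 2)
featnames(toy_dfm)
```

The trimmed matrix can then be passed to `convert(to = "topicmodels")` in the same way as the matrices used in this section.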
# LDA for all tweets
# Germany
lda_all_g = ger_all %>%
convert(to = "topicmodels") %>%
LDA(k=10,control=list(seed=123, alpha = 1/1:10))
terms(lda_all_g, 10)
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
## [1,] "müssen" "heute" "krieg" "presseinfo" "bm"
## [2,] "deutschland" "uhr" "ukraine" "mehr" "sagt"
## [3,] "dafür" "unserer" "europa" "neue" "wäre"
## [4,] "wer" "live" "unsere" "gibt" "ende"
## [5,] "unsere" "unsere" "russland" "brauchen" "heute"
## [6,] "gut" "ab" "putin" "weniger" "u"
## [7,] "mehr" "freiheit" "putins" "ministerpräsident" "darüber"
## [8,] "verantwortung" "opfer" "seite" "statt" "fragen"
## [9,] "erste" "morgen" "stehen" "müssen" "bundestag"
## [10,] "recht" "gesellschaft" "lage" "braucht" "deutsche"
## Topic 6 Topic 7 Topic 8 Topic 9
## [1,] "interview" "gute" "beim" "mehr"
## [2,] "sei" "glückwunsch" "brauchen" "euro"
## [3,] "endlich" "herzlichen" "bayern" "brauchen"
## [4,] "bürger" "danke" "müssen" "müssen"
## [5,] "müssen" "lieber" "setzen" "mrd"
## [6,] "bundesregierung" "frankwalter" "heute" "schnell"
## [7,] "heute" "erfolg" "energien" "bürger"
## [8,] "geht" "freue" "deutschland" "bundesregierung"
## [9,] "völlig" "arbeit" "cl" "macht"
## [10,] "schritt" "dank" "zukunft" "milliarden"
## Topic 10
## [1,] "frauen"
## [2,] "mehr"
## [3,] "saarland"
## [4,] "minister"
## [5,] "kinder"
## [6,] "tage"
## [7,] "landtagswahl"
## [8,] "heute"
## [9,] "themen"
## [10,] "erfahren"
# Austria
lda_all_a = aus_all %>%
convert(to = "topicmodels") %>%
LDA(k=10,control=list(seed=123, alpha = 1/1:10))
terms(lda_all_a, 10)
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
## [1,] "krieg" "russia" "österreich" "ukraine" "österreich"
## [2,] "ukraine" "ukraine" "geht" "russian" "russische"
## [3,] "putins" "russian" "övp" "ukrainian" "angriff"
## [4,] "heute" "kyiv" "wurde" "city" "russland"
## [5,] "kurz" "people" "wksta" "putin" "unsere"
## [6,] "europa" "new" "steht" "breaking" "ukraine"
## [7,] "müssen" "now" "nehammer" "says" "danke"
## [8,] "russischen" "us" "regierung" "said" "putin"
## [9,] "wien" "today" "macht" "minister" "solidarität"
## [10,] "russland" "ukrainian" "neutralität" "russias" "sanktionen"
## Topic 6 Topic 7 Topic 8 Topic 9 Topic 10
## [1,] "mehr" "impfpflicht" "mehr" "gute" "heute"
## [2,] "regierung" "immer" "immer" "heute" "geht"
## [3,] "seit" "mehr" "seit" "viele" "sobotka"
## [4,] "österreich" "regierung" "wer" "tag" "gemeinsam"
## [5,] "geht" "österreich" "schon" "pandemie" "gast"
## [6,] "endlich" "gibt" "ja" "bitte" "los"
## [7,] "jahren" "heute" "övp" "omikron" "gibt"
## [8,] "europa" "övp" "geht" "mehr" "övp"
## [9,] "heute" "schon" "müssen" "wurde" "zackzack"
## [10,] "wenig" "fpö" "jahren" "schon" "erfolg"
# LDA for the tweets that include #Ukraine
# Germany
lda_g = ger %>%
convert(to = "topicmodels") %>%
LDA(k=10,control=list(seed=123, alpha = 1/1:10))
terms(lda_g, 10)
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
## [1,] "krieg" "angriff" "unseren" "freiheit" "unsere"
## [2,] "heute" "presseinfo" "partnern" "unsere" "krieg"
## [3,] "ukraine" "krieg" "putins" "demokratie" "putins"
## [4,] "mehr" "land" "angriffskrieg" "deutschland" "heute"
## [5,] "kiew" "russland" "gemeinsam" "ukrainischen" "uhr"
## [6,] "unsere" "putin" "stehen" "frieden" "lage"
## [7,] "gast" "treffen" "seite" "präsident" "thema"
## [8,] "verurteilen" "russischen" "leid" "helfen" "gespräch"
## [9,] "schärfste" "sofort" "angriff" "russische" "danke"
## [10,] "seit" "seite" "eu" "seite" "angriffskrieg"
## Topic 6 Topic 7 Topic 8 Topic 9 Topic 10
## [1,] "ukraine" "bayern" "krieg" "ukraine" "hilfe"
## [2,] "fraktionschef" "folgen" "uhr" "krieg" "deutschland"
## [3,] "eu" "brauchen" "zeichen" "europa" "mehr"
## [4,] "gemeinsam" "krieg" "heute" "mehr" "müssen"
## [5,] "sei" "helfen" "live" "stehen" "kannst"
## [6,] "unsere" "bund" "berlin" "seite" "heute"
## [7,] "heute" "hilft" "angriff" "tag" "deutschen"
## [8,] "waffen" "müssen" "russischen" "solidarität" "unterstützung"
## [9,] "klar" "verteilung" "deutschland" "russland" "folgen"
## [10,] "krieg" "solidarität" "frieden" "gilt" "bayern"
# Austria
lda_a = aus %>%
convert(to = "topicmodels") %>%
LDA(k=10,control=list(seed=123, alpha = 1/1:10))
terms(lda_a, 10)
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
## [1,] "people" "angriff" "angriff" "krieg" "solidarität"
## [2,] "österreich" "krieg" "ukraine" "sicherheit" "ukraine"
## [3,] "russian" "unsere" "now" "starkes" "schon"
## [4,] "united" "bevölkerung" "österreich" "frieden" "medizinische"
## [5,] "danke" "heute" "russlands" "wien" "seiten"
## [6,] "ukraine" "österreichs" "seit" "heute" "volle"
## [7,] "make" "solidarität" "russland" "putins" "toy"
## [8,] "peace" "gilt" "vergessen" "zeichen" "bridge"
## [9,] "must" "ukrainischen" "steht" "europas" "people"
## [10,] "oh" "mitgefühl" "vielen" "österreichs" "österreich"
## Topic 6 Topic 7 Topic 8 Topic 9 Topic 10
## [1,] "ukraine" "russischen" "krieg" "heute" "österreich"
## [2,] "geht" "krieg" "müssen" "putins" "europa"
## [3,] "darum" "wien" "crowd" "uhr" "unsere"
## [4,] "hospital" "helfen" "unfassbar" "solidarität" "ukraine"
## [5,] "stop" "seite" "zeiten" "heldenplatz" "russland"
## [6,] "ukrainian" "millionen" "schauen" "stop" "evacuation"
## [7,] "tag" "angriff" "us" "unsere" "angriff"
## [8,] "bereits" "verloren" "jahren" "hilfe" "hours"
## [9,] "haltung" "russland" "russland" "putin" "years"
## [10,] "gedanken" "gibt" "russische" "wiener" "härtesten"
By applying this unsupervised ML method we acquired certain divisions between tweets based on keywords. This