The dataset used in this analysis is a collection of tweets with the hashtag #twitterlehrerzimmer from 1 April 2019 to 7 October 2020. In total, 207,984 tweets were collected over about 1.5 years with the TAGS tool (https://tags.hawksey.info). The tweet archive can be browsed at https://hawksey.info/tagsexplorer/arc.html?key=1FCQcodCvu0e2pzGpjdqoM6DnwstYK-m0WxinBmiyPP4&gid=400689247 (long loading times), and the data is available in an OSF project at https://osf.io/c7eqz/.
The analysis consists of three parts: (A) data preparation and frequency statistics, (B) transformation of the data into nodes and edges for further analysis and visualization, and (C) calculation of some basic network statistics.
A. Descriptive statistics
1. Installation of packages
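In a first step, we install all required packages. The calls are commented out; remove the # in front of a call to install a package that is missing.
#install.packages("dplyr")
#install.packages("ggplot2")
#install.packages("tidyverse")
#install.packages("network")
#install.packages("igraph")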
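Instead of uncommenting the calls by hand, one can first check which packages are actually missing. The following is a small sketch with a hypothetical helper, missing_packages(), which is not part of the original analysis; it relies only on base R's requireNamespace().

```r
# Hypothetical helper: names of packages that cannot be loaded on this machine
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (without attaching anything) if a package
  # is not installed or fails to load
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

required <- c("dplyr", "ggplot2", "tidyverse", "network", "igraph")
to_install <- missing_packages(required)
# install.packages(to_install)  # uncomment to install only what is missing
```

Passing only the missing names to install.packages() avoids reinstalling packages that are already present.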
2. Import of data
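We import the data, a direct CSV export from the TAGS archive. The path points to a local copy of the file and has to be adjusted to the location of your own copy.
twlz <- read.csv("~/Publications/TwittLZ19/Data/final/twlz21.csv", stringsAsFactors = FALSE)  # keep text columns as character (the default since R 4.0)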
3. Cleaning of empty rows
Rows without a tweet ID (id_str) are empty and have to be excluded before we continue the analysis.
library(dplyr)
twlz <- twlz %>% filter(!is.na(id_str))  # keep only rows that contain a tweet ID
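Why a check on id_str for NA is enough to catch empty rows can be seen on a small, made-up CSV (not taken from the TAGS export). This base-R sketch relies on documented read.csv() behaviour: blank fields in a numeric column are read as NA, whereas blank fields in a text column stay empty strings.

```r
# Hypothetical toy export: the second data row has no content at all
csv_text <- "id_str,from_user,text\n1001,anna,hello\n,,\n1003,ben,hi\n"
toy_export <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# id_str is numeric, so its blank field became NA; from_user keeps ""
toy_clean <- toy_export[!is.na(toy_export$id_str), ]
nrow(toy_export)  # 3
nrow(toy_clean)   # 2
```

The fact that empty text fields are read as "" rather than NA also matters later, when in_reply_to_screen_name is used to build the network.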
4. Reduction of data to full months
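Since we want to produce statistics for full months only, we reduce the data frame to the period April 2019 to September 2020.
# Rows 2054 to 210037 cover April 2019 to September 2020 in this particular export;
# the indices have to be checked again if the CSV is exported anew
twlz <- twlz[2054:210037, ]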
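Hard-coded row indices only hold for one specific export. A sketch of a date-based alternative is shown below on hypothetical toy data; it assumes the time column has already been converted to Date (which in this analysis happens only in the next step), so applied to twlz it would have to run after that conversion.

```r
# Hypothetical tweets around the edges of the study period
window_tweets <- data.frame(
  time = as.Date(c("2019-03-31", "2019-04-01", "2020-09-30", "2020-10-01")),
  id_str = c(1, 2, 3, 4)
)

# Full months only: from 1 April 2019 up to and including 30 September 2020
in_window <- window_tweets$time >= as.Date("2019-04-01") &
  window_tweets$time <= as.Date("2020-09-30")
window_tweets <- window_tweets[in_window, ]
window_tweets$id_str  # 2 3
```

Filtering on the date itself keeps the selection correct regardless of how the export is sorted.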
5. Conversion of the time column into a date format
twlz$time <- as.Date(twlz$time, '%d/%m/%Y')
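A quick check of what this conversion does, assuming the TAGS time column holds values such as "08/10/2020 07:12:34" (day/month/year followed by the time of day): as.Date() parses the date according to the format string and ignores the trailing time. The sample values below are hypothetical.

```r
# Hypothetical values in the assumed dd/mm/YYYY HH:MM:SS layout
raw_time <- c("01/04/2019 09:15:00", "30/09/2020 23:59:59")

# Characters after the matched format (here the time of day) are ignored
parsed <- as.Date(raw_time, "%d/%m/%Y")
format(parsed, "%Y-%m-%d")  # "2019-04-01" "2020-09-30"
```

A format that does not match the data (for example month and day swapped) yields NA or wrong dates, so checking sum(is.na(twlz$time)) after the conversion is a cheap safeguard.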
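6. Aggregation of tweets per month
# Count tweets per calendar month
tab <- table(cut(twlz$time, 'month'))
# Data frame with a month label (MM/YYYY) and the number of tweets
nrtweets <- data.frame(Date = format(as.Date(names(tab)), '%m/%Y'),
                       Frequency = as.vector(tab))
# Fix the factor levels in chronological order so the plot keeps this order
nrtweets$Date <- factor(nrtweets$Date, levels = nrtweets$Date)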
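The same aggregation on a few hypothetical dates shows one property of cut(..., "month"): it creates a level for every calendar month between the first and the last date, so a month without tweets appears with a count of zero instead of being dropped.

```r
# Hypothetical dates: two tweets in April, none in May, one in June 2019
toy_dates <- as.Date(c("2019-04-03", "2019-04-28", "2019-06-01"))

# One level per calendar month in the range, including empty months
toy_tab <- table(cut(toy_dates, "month"))
monthly <- data.frame(Date = format(as.Date(names(toy_tab)), "%m/%Y"),
                      Frequency = as.vector(toy_tab))
monthly$Frequency  # 2 0 1
```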
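7. Bar plot of the monthly tweet counts with axis labels and a title
library(ggplot2)
p <- ggplot(data = nrtweets, aes(x = Date, y = Frequency)) +
  geom_bar(stat = "identity", width = 0.8, fill = "steelblue") +
  labs(title = "Tweets with #twitterlehrerzimmer per month",
       x = "Month", y = "Number of tweets") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # keep month labels from overlapping
p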
B. Preparation of network objects and basic SNA
Having cleaned the data, we now prepare it for social network analysis (SNA), following a tutorial by Jesse Sadler (https://www.jessesadler.com/post/network-analysis-with-r/). As a first step, we derive a list of nodes and a list of edges from the data: the nodes are Twitter users, and the edges are replies from one user to another.
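The construction of both lists can be sketched in base R on a few made-up tweets (the user names are hypothetical). Every user becomes a node with a numeric id, and each edge points from the author of a reply to the replied-to user, weighted by the number of such replies. Tweets that are not replies (empty or NA in_reply_to_screen_name) contribute a node for their author but no edge.

```r
# Hypothetical tweets: "" and NA mark tweets that are not replies
toy_tweets <- data.frame(
  from_user = c("anna", "ben", "anna", "carla", "anna"),
  in_reply_to_screen_name = c("ben", "anna", "ben", "", NA),
  stringsAsFactors = FALSE
)

# Only genuine replies produce edges
target <- toy_tweets$in_reply_to_screen_name
toy_replies <- toy_tweets[!is.na(target) & target != "", ]

# Node list: every author plus every replied-to user, with a numeric id
user_labels <- unique(c(toy_tweets$from_user, toy_replies$in_reply_to_screen_name))
toy_nodes <- data.frame(id = seq_along(user_labels), label = user_labels,
                        stringsAsFactors = FALSE)

# Edge list: count replies per (author, target) pair ...
toy_edges <- aggregate(list(weight = rep(1L, nrow(toy_replies))),
                       by = list(from = toy_replies$from_user,
                                 to = toy_replies$in_reply_to_screen_name),
                       FUN = sum)
# ... and replace user names with node ids
toy_edges$from <- match(toy_edges$from, toy_nodes$label)
toy_edges$to <- match(toy_edges$to, toy_nodes$label)
```

Here toy_nodes has three users, and toy_edges has two rows: anna to ben with weight 2 and ben to anna with weight 1, while carla remains an isolated node.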
1. Creation of the list of nodes
library(tidyverse)
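sources <- twlz %>%
  distinct(from_user) %>%
  rename(label = from_user)
# Tweets that are replies; for all other tweets in_reply_to_screen_name is
# empty (or NA), which would otherwise turn into one spurious node
replies <- twlz %>%
  filter(!is.na(in_reply_to_screen_name), in_reply_to_screen_name != "")
destinations <- replies %>%
  distinct(in_reply_to_screen_name) %>%
  rename(label = in_reply_to_screen_name)
# All users who tweeted or were replied to, each with a numeric id
nodes <- full_join(sources, destinations, by = "label")
nodes <- nodes %>% rowid_to_column("id")
2. Creation of edges
Each edge points from the author of a reply to the user being replied to; its weight is the number of such replies.
# Count replies per (author, target) pair
per_route <- replies %>%
  group_by(from_user, in_reply_to_screen_name) %>%
  summarise(weight = n(), .groups = "drop")
# Replace user names with node ids
edges <- per_route %>%
  left_join(nodes, by = c("from_user" = "label")) %>%
  rename(from = id)
edges <- edges %>%
  left_join(nodes, by = c("in_reply_to_screen_name" = "label")) %>%
  rename(to = id)
edges <- select(edges, from, to, weight)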
3. Creation of a network object with the network package and basic visualization
library(network)
nodes$label <- as.character(nodes$label)
routes_network <- network(edges, vertex.attr = nodes, matrix.type = "edgelist", ignore.eval = FALSE)
class(routes_network)
## [1] "network"
plot(routes_network, vertex.cex = 3)
4. Creation of a network object for igraph and basic visualization
# network and igraph share several function names, so network is unloaded first
detach(package:network)
rm(routes_network)
library(igraph)
routes_igraph <- graph_from_data_frame(d = edges, vertices = nodes, directed = TRUE)
plot(routes_igraph, edge.arrow.size = 0.2)