This document sets out the process to identify the patients for the post-Brexit cyberhate experiment.
The first step is to scrape users from the archive account. The archive account was created to make it easier to compile an initial list of manually identified hateful users from Britain (patients) on Twitter. While scrolling Twitter, it is relatively easy to spot hateful tweets and their authors. However, storing the usernames or user IDs of potential patients is impractical because of the laborious (and boring) copy-pasting involved. It is also hard to keep the patient list synchronised across multiple researchers and devices. We therefore set up an archive account that follows each potential patient, so that the accounts it follows double as the initial patient list. The process was as follows:
Using the initial patient list as a starting point, we identified the larger network surrounding the patients.
The first step was to import the initial list into R, which we did by scraping the users followed by the archive account.
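# Load packages
library(rtweet)
library(tidyverse)
# Mark the archive account's handle as a screen name rather than a user ID
archive_user <- 'Archive_Data_E' %>%
  as_screenname()
# Get the accounts followed by the archive account, then look up their profiles
initial_patients_list <- rtweet::get_friends(users = archive_user[1], retryonratelimit = TRUE) %>%
  select(user_id) %>%
  lookup_users()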
Here is a quick peek at the potential patient list without exposing their identities.
# Keep a subset of columns (selected by position) and print every row
initial_patients_list %>%
  select(4, 8, 9, 11, 12, 13, 14, 15) %>%
  print(n = nrow(.))
## # A tibble: 36 x 8
## location followers_count friends_count statuses_count favourites_count
## <chr> <int> <int> <int> <int>
## 1 United K… 6631 6429 95129 40434
## 2 Yorkshir… 26916 133 2739 2287
## 3 Manchest… 748 595 5241 8512
## 4 "Stockpo… 102 384 2124 2414
## 5 "" 1369 2283 14103 7218
## 6 North Ea… 9730 2067 10734 18492
## 7 London, … 282 318 1559 3358
## 8 wales 186 330 8642 52
## 9 Burnie -… 96 323 2769 1649
## 10 Portsmou… 206 762 432 1157
## 11 Edinburgh 1147 805 27492 21016
## 12 "Luton E… 393844 12520 99615 58974
## 13 "Scotlan… 4027 4255 51880 77249
## 14 United K… 2498 2385 7912 5019
## 15 "" 17986 976 5031 5104
## 16 England,… 37109 37884 27798 17071
## 17 Corrupt … 6465 2041 20771 53786
## 18 ⬇️ My Y… 14412 703 3517 53351
## 19 New York… 2832 2673 23439 5294
## 20 Scotland 3308 477 22293 28359
## 21 Greenwic… 1209 1150 2685 3654
## 22 "" 617 127 7536 72070
## 23 UK 4939 4629 2061 2335
## 24 uk 527 414 13135 7278
## 25 UK 711 685 4648 728
## 26 Norway 88791 353 10096 924
## 27 France .… 7672 7288 166627 47494
## 28 In a Gul… 81 394 1578 127
## 29 Yorkshire 862 304 962 41109
## 30 Yorkshir… 2210 2728 45629 17187
## 31 Anywhere… 1935 1656 33507 54947
## 32 "" 2626 1915 41142 16281
## 33 Milton K… 17 29 327 10
## 34 "" 1294 1017 35074 19203
## 35 Milton K… 56 29 3075 42
## 36 Newport,… 140 251 2624 1889
## # ... with 3 more variables: account_created_at <dttm>, verified <lgl>,
## # profile_url <chr>
We have 36 distinct patients with varying follower and friend counts. The total number of Twitter users following the initial patient list is 643581, and the total number of users followed by the initial patient list is 101312. Note that these totals do not account for duplicates, so the numbers of unique users are bound to be lower.
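The totals quoted above can be reproduced from the printed table. The following is a minimal sketch in base R: the two count vectors are copied by hand from the output above (in the real session they would presumably come from `initial_patients_list$followers_count` and `initial_patients_list$friends_count`), and the follower IDs in the second half are made up purely to illustrate why de-duplication lowers the totals.

```r
# Counts copied from the printed tibble above (rows 1 to 36)
followers_count <- c(
  6631, 26916, 748, 102, 1369, 9730, 282, 186, 96, 206, 1147, 393844,
  4027, 2498, 17986, 37109, 6465, 14412, 2832, 3308, 1209, 617, 4939, 527,
  711, 88791, 7672, 81, 862, 2210, 1935, 2626, 17, 1294, 56, 140
)
friends_count <- c(
  6429, 133, 595, 384, 2283, 2067, 318, 330, 323, 762, 805, 12520,
  4255, 2385, 976, 37884, 2041, 703, 2673, 477, 1150, 127, 4629, 414,
  685, 353, 7288, 394, 304, 2728, 1656, 1915, 29, 1017, 29, 251
)

# Raw totals: a user who follows several patients is counted once per patient
total_followers <- sum(followers_count)  # 643581
total_friends   <- sum(friends_count)    # 101312

# Hypothetical follower IDs for two made-up patients (not real data)
followers_a <- c("u1", "u2", "u3")
followers_b <- c("u2", "u3", "u4")

raw_total    <- length(followers_a) + length(followers_b)     # 6
unique_total <- length(unique(c(followers_a, followers_b)))   # 4
```

In the toy case, two of the six follower entries are shared, so only four unique users remain. The same effect is why the unique counts for the real network can only be known once the follower and friend lists themselves have been scraped.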
Let's take a look at the histograms.
# Discourage scientific notation in axis labels
options(scipen = 10000)
# Histogram of follower counts
p1 <- initial_patients_list %>%
  ggplot(aes(followers_count)) +
  geom_histogram(bins = 100) +
  theme_bw() +
  labs(title = "Histogram of Follower Counts")
# Histogram of friend counts
p2 <- initial_patients_list %>%
  ggplot(aes(friends_count)) +
  geom_histogram(bins = 100) +
  theme_bw() +
  labs(title = "Histogram of Friend Counts")
# Stack the two histograms vertically
gridExtra::grid.arrange(p1, p2, nrow = 2)
The histograms of follower and friend counts show some outliers in the initial list, such as the patient with nearly 400000 followers (you know who) and the patient following more than 35000 users.
Before moving on to scraping the followers and friends of the initial list of patients, let's take a look at a scatterplot of follower counts against friend counts.
p3 <- initial_patients_list %>%
  ggplot(aes(x = followers_count, y = friends_count)) +
  geom_point() +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  # xlim(0,30000)+
  # ylim(0,10000)+
  labs(title = "Outliers shown")
p4 <- initial_patients_list %>%
  ggplot(aes(x = followers_count, y = friends_count)) +
  geom_point() +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlim(0, 30000) +
  ylim(0, 7500) +
  labs(title = "Outliers hidden")
gridExtra::grid.arrange(p3, p4, nrow = 1,
  top = grid::textGrob("Scatterplots of Follower vs Friend counts", gp = grid::gpar(fontsize = 18)),
  widths = c(0.55, 0.45))
## Warning: Removed 3 rows containing missing values (geom_point).
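The warning is expected: `xlim()` and `ylim()` drop points that fall outside the given limits before plotting. As a rough check, the sketch below counts how many patients lie outside the limits used for the "Outliers hidden" panel. The count vectors are copied by hand from the table printed earlier; the variable names `outside` and `n_outside` are introduced here only for illustration.

```r
# Counts copied from the printed tibble (rows 1 to 36)
followers_count <- c(
  6631, 26916, 748, 102, 1369, 9730, 282, 186, 96, 206, 1147, 393844,
  4027, 2498, 17986, 37109, 6465, 14412, 2832, 3308, 1209, 617, 4939, 527,
  711, 88791, 7672, 81, 862, 2210, 1935, 2626, 17, 1294, 56, 140
)
friends_count <- c(
  6429, 133, 595, 384, 2283, 2067, 318, 330, 323, 762, 805, 12520,
  4255, 2385, 976, 37884, 2041, 703, 2673, 477, 1150, 127, 4629, 414,
  685, 353, 7288, 394, 304, 2728, 1656, 1915, 29, 1017, 29, 251
)

# A point is dropped if either coordinate exceeds the limits
# xlim(0, 30000) and ylim(0, 7500) used for the "Outliers hidden" panel
outside <- followers_count > 30000 | friends_count > 7500

n_outside    <- sum(outside)    # number of dropped points
rows_outside <- which(outside)  # row numbers in the printed table
```

With the printed counts, `n_outside` is 3 and `rows_outside` is 12, 16 and 26, which matches the three rows reported in the warning.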
Now, let’s scrape followers and friends of our initial patient list.
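# Note: this repeats the archive-account call from above, so it returns the
# patients themselves rather than their followers or friends
initial_patients_list <- rtweet::get_friends(users = archive_user[1], retryonratelimit = TRUE) %>%
  select(user_id) %>%
  lookup_users()
# Open the help page for get_friends()
?get_friends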