Twitter Analysis

Packages needed for the analysis:

#I always keep all packages together in the opening section. When re-running the analysis, it is easier to load them all at once
library(skimr) #for summarizing the data
library(rtweet) #for communicating with the Twitter API
library(readr) #for loading data
library(dplyr) #for data manipulation
library(tidyr) #for reshaping data
library(tidytext) #for sentiment analysis
library(lubridate) #for working with date variables
library(ggplot2) #for visualization
library(ggrepel) #for non-overlapping text labels in plots

The UK government website publishes COVID-19 case statistics broken down by administrative unit. I use the regions of England, 8 regions in total. From Twitter I will download tweets from the main city of each region, on the assumption that these cities are representative of their regions.

CovidUK <- read_csv("data/covid_cases.csv")
skim(CovidUK)

Data summary
Name	CovidUK
Number of rows	37486
Number of columns	11
_______________________
Column type frequency:
character	3
Date	1
numeric	7
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
Area name	1	4	35	352
Area code	1	9	9	352
Area type	1	6	26	4

Variable type: Date

skim_variable	n_missing	complete_rate	min	max	median	n_unique
Specimen date	0	1	2020-01-30	2020-06-15	2020-04-22	127

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Daily lab-confirmed cases	4	1	16.29	110.76	0	2.0	4.0	11.0	4477.0	▇▁▁▁▁
Previously reported daily cases	37359	0	1238.53	1307.62	0	162.0	699.0	2242.0	4479.0	▇▂▂▂▁
Change in daily cases	37359	0	1.98	10.59	-5	0.0	0.0	1.0	90.0	▇▁▁▁▁
Cumulative lab-confirmed cases	0	1	916.27	6090.52	1	86.0	262.0	637.0	157545.0	▇▁▁▁▁
Previously reported cumulative cases	37359	0	75682.06	64549.41	2	1754.0	80693.0	143418.0	157293.0	▇▁▂▂▇
Change in cumulative cases	37359	0	4.13	33.59	-9	-5.0	0.0	0.0	252.0	▇▁▁▁▁
Cumulative lab-confirmed cases rate	0	1	160.30	122.09	0	51.5	148.9	242.8	844.5	▇▅▁▁▁

covidUK <- CovidUK %>% 
  select(`Area name`,`Area type`,`Specimen date`,`Cumulative lab-confirmed cases rate`) #keep only the variables we need

#visualize the spread of COVID-19 across the regions of England
covidUK %>% 
  filter(`Area type` == "Region") %>% 
  mutate(label = if_else(`Specimen date` == max(`Specimen date`), as.character(`Area name`), NA_character_)) %>% 
  ggplot(aes(x = `Specimen date`, y = `Cumulative lab-confirmed cases rate`, colour = `Area name`)) +
  geom_line(size = 1) +
  geom_label_repel(aes(label = label), nudge_x = 2, na.rm = TRUE) +
  guides(colour = "none") + #hide the legend; each line is labelled directly
  expand_limits(x = as_date("2020-06-30")) +
  theme_classic()

To connect to Twitter, we need API keys and an access token.
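
One way to keep those keys out of the script is to read them from environment variables. The sketch below is only an illustration, not the setup used in this post: the helper `read_twitter_credentials()` and the environment variable names are hypothetical, and the commented-out call assumes a pre-1.0 version of rtweet in which `create_token()` is still available.

```r
# Read Twitter API credentials from environment variables instead of
# hard-coding them. The variable names are assumptions made for this sketch.
read_twitter_credentials <- function() {
  vars <- c(
    consumer_key    = "TWITTER_CONSUMER_KEY",
    consumer_secret = "TWITTER_CONSUMER_SECRET",
    access_token    = "TWITTER_ACCESS_TOKEN",
    access_secret   = "TWITTER_ACCESS_SECRET"
  )
  creds <- Sys.getenv(vars, unset = NA_character_)
  names(creds) <- names(vars)

  # Stop early with a clear message if anything is unset or empty
  missing <- names(creds)[is.na(creds) | creds == ""]
  if (length(missing) > 0) {
    stop("Missing Twitter credentials: ", paste(missing, collapse = ", "))
  }
  as.list(creds)
}

# Usage (needs real credentials, so it is left commented out):
# creds <- read_twitter_credentials()
# token <- rtweet::create_token(
#   app             = "my_app_name",  # hypothetical app name
#   consumer_key    = creds$consumer_key,
#   consumer_secret = creds$consumer_secret,
#   access_token    = creds$access_token,
#   access_secret   = creds$access_secret
# )
```

The variables can be set in `~/.Renviron`, so the secrets never end up in the notebook or in version control.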

Irakli Kavtaradze

6/16/2020