During the COVID-19 pandemic, it became increasingly clear that Sweden has a data problem, at least where the public health agency, Folkhälsomyndigheten (FHM), is concerned.
To address one of the issues, this project keeps a daily record of the vaccination rates in Sweden.
Data comes from Folkhälsomyndigheten.
This project uses tidyverse (for data manipulation), rvest (for web scraping) and ggrepel (for advanced graphics manipulation).
library(tidyverse)
library(rvest)
library(ggrepel)
library(scales)
library(ggthemr)
library(urbanthemes)
This piece of code first reads the FHM page at the URL below and grabs the second table on it, which is the one that gives us the most information. It then renames the columns from Swedish to English.
fhm <- read_html("https://www.folkhalsomyndigheten.se/smittskydd-beredskap/utbrott/aktuella-utbrott/covid-19/vaccination-mot-covid-19/statistik/statistik-over-registrerade-vaccinationer-covid-19/")
getTab <- fhm %>%
  html_table(header = TRUE)
weeklyVacTab <- getTab[[2]]
#Tidy Names
colnames(weeklyVacTab) <- c("date", "total_first", "percent_first", "total_second", "percent_second")
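Because the table is picked by position, a change to the page layout could silently pull in the wrong table. Below is a minimal sketch of a guard against that. It uses a made-up two-table HTML fragment in place of the live FHM page; the Swedish headers, the numbers and the name demoTab are invented for illustration.

```r
library(rvest)

# Made-up stand-in for the FHM page: the second table holds the figures.
page <- minimal_html('
  <table><tr><th>Info</th></tr><tr><td>x</td></tr></table>
  <table>
    <tr><th>Datum</th><th>Antal dos 1</th><th>Andel dos 1</th></tr>
    <tr><td>2021-03-31</td><td>1 234 567</td><td>15,1</td></tr>
  </table>
')

# html_table() on a whole page returns a list with one data frame per table.
tabs <- html_table(page, header = TRUE)

# Stop early if the page no longer looks the way the script expects.
stopifnot(length(tabs) >= 2)
demoTab <- tabs[[2]]
stopifnot(ncol(demoTab) == 3)

# Only rename once the shape has been checked.
colnames(demoTab) <- c("date", "total_first", "percent_first")
```

In the real script the column check would be against five columns rather than three.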
The data in the table is not immediately usable because it uses odd separators. This part of the code therefore strips the stray spaces and other separators so that the data can be used, and converts the table data from strings to dates and numbers.
After cleaning, I also add two new columns that give the number and percentage of people who have received only one dose. This is needed because the first-dose figure includes those who have also received a second dose. It is unclear whether, once the single-dose J&J vaccine comes into use, FHM will count it under both the first and second dose.
After the new variables are calculated, one final cleaning step is performed. This essentially rotates the table so that each dose becomes a row, labelled in a dose column, rather than a separate column. You can see the differences below.
weeklyVacTab$date <- as.Date(weeklyVacTab$date)
weeklyVacTab$total_first <- as.numeric(gsub(" ", "", weeklyVacTab$total_first))
weeklyVacTab$percent_first <- as.numeric(gsub(",", ".", weeklyVacTab$percent_first))
weeklyVacTab$total_second <- gsub(" ", "", weeklyVacTab$total_second)
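#This was added due to poor(er) data entry on 31/03/2021 which included a new separator.
#The separator as written here was identical to the plain space above (possibly lost in copying),
#so as an assumption this strips every non-digit character instead.
weeklyVacTab$total_second <- gsub("[^0-9]", "", weeklyVacTab$total_second)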
weeklyVacTab$total_second <- as.numeric(weeklyVacTab$total_second)
weeklyVacTab$percent_second <- as.numeric(gsub(",", ".", weeklyVacTab$percent_second))
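The cleaning steps above repeat the same two patterns, so they could be wrapped in small helpers. This is a minimal sketch in base R, not the original script's approach. The names clean_count and clean_percent are hypothetical, and it assumes the counts are non-negative whole numbers, so every non-digit character can safely be dropped.

```r
# Hypothetical helper: drop every non-digit (plain spaces, non-breaking
# spaces, or any other thousands separator), then convert to a number.
clean_count <- function(x) {
  as.numeric(gsub("[^0-9]", "", x))
}

# Hypothetical helper: swap the Swedish decimal comma for a point.
clean_percent <- function(x) {
  as.numeric(gsub(",", ".", x, fixed = TRUE))
}

# Invented example values, including one with a non-breaking space (U+00A0).
counts <- clean_count(c("1 234 567", "12\u00a0345"))
percents <- clean_percent(c("15,1", "4,56"))
```

Stripping all non-digits is a blunter tool than removing named separators one by one, but it does not need a new line each time a new separator turns up in the source table.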
#Add Only 1 Dose
weeklyVacTab <- weeklyVacTab %>%
  mutate(total_firstOnly = total_first - total_second) %>%
  mutate(percent_firstOnly = percent_first - percent_second)
vacUpdateT <- weeklyVacTab %>%
  pivot_longer(cols = c("total_first", "total_second", "percent_first", "percent_second", "total_firstOnly", "percent_firstOnly"),
               names_to = c("type", "dose"), names_pattern = "(.*)_(.*)", values_to = "number") %>%
  pivot_wider(names_from = type, values_from = number)
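To make the rotation easier to picture, here is a small sketch of the same pivot on a single invented row. The numbers and the names toy, long and tidy are made up. It selects the columns with cols = -date as a shorthand for listing them by name.

```r
library(tidyr)

# One invented row in the wide layout: one column per measure and dose.
toy <- data.frame(
  date = as.Date("2021-03-31"),
  total_first = 100, total_second = 40, total_firstOnly = 60,
  percent_first = 10, percent_second = 4, percent_firstOnly = 6
)

# Split each column name at the underscore into a measure ("type") and a dose.
long <- pivot_longer(toy, cols = -date,
                     names_to = c("type", "dose"),
                     names_pattern = "(.*)_(.*)",
                     values_to = "number")

# Spread the measure back out, leaving one row per date and dose.
tidy <- pivot_wider(long, names_from = type, values_from = number)
```

The single wide row becomes three rows (first, second, firstOnly), each with a total and a percent column.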
Now I load in the old data, join the new data, and then save over the old data.
#Load the previously saved data
vacOld <- readRDS(file="vacTotal.Rda")
#Join
vacUpdateN <- bind_rows(vacOld, vacUpdateT) %>%
  arrange(date)
#Save over the old file
saveRDS(vacUpdateN, file="vacTotal.Rda")
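One thing to watch with this approach is that bind_rows simply stacks the two tables. If a scrape returns a date that is already in the saved file, that date would appear twice. Whether that can happen depends on what the FHM table contains on a given day, which is an assumption here. Below is a minimal sketch of one way to guard against it, using invented data and hypothetical names (old, new, combined).

```r
library(dplyr)

# Invented data: 2021-03-31 appears in both the saved file and the new scrape.
old <- tibble(date = as.Date(c("2021-03-30", "2021-03-31")),
              dose = "first", total = c(90, 100))
new <- tibble(date = as.Date(c("2021-03-31", "2021-04-01")),
              dose = "first", total = c(101, 110))

# distinct() keeps the first row it sees for each date and dose, so putting
# the new data first means a fresh figure replaces the stored one.
combined <- bind_rows(new, old) %>%
  distinct(date, dose, .keep_all = TRUE) %>%
  arrange(date)
```

Putting the old data first instead would keep the originally stored figure, which may be preferable if the record is meant to show what FHM reported on the day.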
Now to graph! First of all, I create a new data frame that collects the numbers from the most recent date so they can be displayed. Then, ggplot works its magic.