COVID-19 Vaccine illustration.
Coronavirus disease 2019, known as COVID-19, is an infectious disease caused by a newly discovered coronavirus called SARS-CoV-2. It spreads primarily through droplets of saliva or nasal discharge when an infected person coughs or sneezes. Most people infected with the virus experience mild to moderate respiratory illness and recover without requiring special treatment. On the other hand, older people and those with underlying medical conditions are more likely to develop serious complications.
COVID-19 vaccination is one of the ways to prevent people from developing the illness and its consequences. COVID-19 vaccines protect against the disease by inducing an immune response to the virus. The first mass vaccination programme started in early December 2020. This project explores the progress of COVID-19 vaccination worldwide up to March 30, 2021. The dataset used in this project is obtained from Kaggle and records daily vaccination progress for each country.
There are several things that I want to explore through this dataset:
- The most used types of COVID-19 Vaccine in the world (to date)
- Top 10 Countries with the highest percentage of people fully vaccinated in the world
- Top 10 Countries with the highest number of daily vaccinations per million on average
- ASEAN Countries’ COVID-19 Daily Vaccination Progress
- Indonesia’s COVID-19 Vaccination Progress
Before exploring the data, load the necessary libraries.
library(ggplot2) # for data visualization
library(dplyr) # for data manipulation
library(RColorBrewer) # for data visualization
library(lubridate) # for converting datetime data type
library(scales)
Read the dataset and let’s inspect the first rows.
vaccine <- read.csv("country_vaccinations.csv")
head(vaccine)
## country iso_code date total_vaccinations people_vaccinated
## 1 Afghanistan AFG 2021-02-22 0 0
## 2 Afghanistan AFG 2021-02-23 NA NA
## 3 Afghanistan AFG 2021-02-24 NA NA
## 4 Afghanistan AFG 2021-02-25 NA NA
## 5 Afghanistan AFG 2021-02-26 NA NA
## 6 Afghanistan AFG 2021-02-27 NA NA
## people_fully_vaccinated daily_vaccinations_raw daily_vaccinations
## 1 NA NA NA
## 2 NA NA 1367
## 3 NA NA 1367
## 4 NA NA 1367
## 5 NA NA 1367
## 6 NA NA 1367
## total_vaccinations_per_hundred people_vaccinated_per_hundred
## 1 0 0
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
## people_fully_vaccinated_per_hundred daily_vaccinations_per_million
## 1 NA NA
## 2 NA 35
## 3 NA 35
## 4 NA 35
## 5 NA 35
## 6 NA 35
## vaccines source_name
## 1 Oxford/AstraZeneca Government of Afghanistan
## 2 Oxford/AstraZeneca Government of Afghanistan
## 3 Oxford/AstraZeneca Government of Afghanistan
## 4 Oxford/AstraZeneca Government of Afghanistan
## 5 Oxford/AstraZeneca Government of Afghanistan
## 6 Oxford/AstraZeneca Government of Afghanistan
## source_website
## 1 http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm
## 2 http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm
## 3 http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm
## 4 http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm
## 5 http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm
## 6 http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm
After inspecting the data at a glance, let’s look at the data types and dimensions in more detail.
str(vaccine)
## 'data.frame': 9073 obs. of 15 variables:
## $ country : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ iso_code : chr "AFG" "AFG" "AFG" "AFG" ...
## $ date : chr "2021-02-22" "2021-02-23" "2021-02-24" "2021-02-25" ...
## $ total_vaccinations : num 0 NA NA NA NA NA 8200 NA NA NA ...
## $ people_vaccinated : num 0 NA NA NA NA NA 8200 NA NA NA ...
## $ people_fully_vaccinated : num NA NA NA NA NA NA NA NA NA NA ...
## $ daily_vaccinations_raw : num NA NA NA NA NA NA NA NA NA NA ...
## $ daily_vaccinations : num NA 1367 1367 1367 1367 ...
## $ total_vaccinations_per_hundred : num 0 NA NA NA NA NA 0.02 NA NA NA ...
## $ people_vaccinated_per_hundred : num 0 NA NA NA NA NA 0.02 NA NA NA ...
## $ people_fully_vaccinated_per_hundred: num NA NA NA NA NA NA NA NA NA NA ...
## $ daily_vaccinations_per_million : num NA 35 35 35 35 35 35 41 46 52 ...
## $ vaccines : chr "Oxford/AstraZeneca" "Oxford/AstraZeneca" "Oxford/AstraZeneca" "Oxford/AstraZeneca" ...
## $ source_name : chr "Government of Afghanistan" "Government of Afghanistan" "Government of Afghanistan" "Government of Afghanistan" ...
## $ source_website : chr "http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm" "http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm" "http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm" "http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm" ...
There are 9073 rows and 15 variables in this dataset.
Looks like there are missing values. Let’s check for the details.
colSums(is.na(vaccine))
## country iso_code
## 0 0
## date total_vaccinations
## 0 3576
## people_vaccinated people_fully_vaccinated
## 4150 5698
## daily_vaccinations_raw daily_vaccinations
## 4467 178
## total_vaccinations_per_hundred people_vaccinated_per_hundred
## 3576 4150
## people_fully_vaccinated_per_hundred daily_vaccinations_per_million
## 5698 178
## vaccines source_name
## 0 0
## source_website
## 0
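To put such counts in proportion, the share of missing values per column can be computed with `colMeans()`. The sketch below uses a small made-up data frame (`toy` and its values are illustrative, not taken from the dataset):

```r
# Toy data frame standing in for the real dataset (values are made up)
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  total_vaccinations = c(0, NA, 100, NA),
  daily_vaccinations = c(NA, 10, 20, 30)
)

# is.na() returns a logical matrix; the column means are the NA fractions
na_share <- round(colMeans(is.na(toy)) * 100, 1)
print(na_share)
```

Applied to the real data, the same call on `vaccine` would show, for example, that total_vaccinations is missing in 3576 of 9073 rows, roughly 39%.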
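There are a lot of missing values. A likely explanation is that no figure was reported for that country on that day. To keep the later aggregations from returning NA, this analysis converts every NA into 0. Keep in mind that in cumulative columns such as total_vaccinations, a 0 then marks a day without a report rather than a true zero.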
vaccine[is.na(vaccine)] <- 0
Let’s check that every NA has been successfully converted.
colSums(is.na(vaccine))
## country iso_code
## 0 0
## date total_vaccinations
## 0 0
## people_vaccinated people_fully_vaccinated
## 0 0
## daily_vaccinations_raw daily_vaccinations
## 0 0
## total_vaccinations_per_hundred people_vaccinated_per_hundred
## 0 0
## people_fully_vaccinated_per_hundred daily_vaccinations_per_million
## 0 0
## vaccines source_name
## 0 0
## source_website
## 0
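Replacing NA with 0 changes any average computed afterwards, because unreported days then count as days with zero vaccinations. A minimal sketch on a made-up vector (the numbers are hypothetical) shows the difference from skipping the missing values instead:

```r
# Hypothetical daily counts with two unreported days
daily <- c(100, NA, 300, NA)

# Option 1: average over reported days only
mean_skip_na <- mean(daily, na.rm = TRUE)

# Option 2: zero-fill first, the same replacement used in this analysis
filled <- daily
filled[is.na(filled)] <- 0
mean_zero_fill <- mean(filled)

print(c(skip_na = mean_skip_na, zero_fill = mean_zero_fill))
```

Which option is appropriate depends on whether a missing entry really means that no vaccinations took place; this analysis assumes it does.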
There are several variables that are not necessary for this analysis, so it’s better to eliminate them.
vaccine[, c("iso_code", "daily_vaccinations_raw", "source_name", "source_website")] <- NULL
Several variables do not have the correct data type, so let’s convert them into the correct ones.
vaccine$country <- as.factor(vaccine$country)
vaccine$date <- ymd(vaccine$date)
To make sure the variables now have the correct data types, check the structure one more time.
str(vaccine)
## 'data.frame': 9073 obs. of 11 variables:
## $ country : Factor w/ 161 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ date : Date, format: "2021-02-22" "2021-02-23" ...
## $ total_vaccinations : num 0 0 0 0 0 0 8200 0 0 0 ...
## $ people_vaccinated : num 0 0 0 0 0 0 8200 0 0 0 ...
## $ people_fully_vaccinated : num 0 0 0 0 0 0 0 0 0 0 ...
## $ daily_vaccinations : num 0 1367 1367 1367 1367 ...
## $ total_vaccinations_per_hundred : num 0 0 0 0 0 0 0.02 0 0 0 ...
## $ people_vaccinated_per_hundred : num 0 0 0 0 0 0 0.02 0 0 0 ...
## $ people_fully_vaccinated_per_hundred: num 0 0 0 0 0 0 0 0 0 0 ...
## $ daily_vaccinations_per_million : num 0 35 35 35 35 35 35 41 46 52 ...
## $ vaccines : chr "Oxford/AstraZeneca" "Oxford/AstraZeneca" "Oxford/AstraZeneca" "Oxford/AstraZeneca" ...
summary(vaccine)
## country date total_vaccinations
## Canada : 107 Min. :2020-12-13 Min. : 0
## England : 107 1st Qu.:2021-01-31 1st Qu.: 0
## Northern Ireland: 107 Median :2021-02-23 Median : 22371
## Scotland : 107 Mean :2021-02-19 Mean : 1795212
## United Kingdom : 107 3rd Qu.:2021-03-12 3rd Qu.: 513178
## Wales : 107 Max. :2021-03-30 Max. :147602345
## (Other) :8431
## people_vaccinated people_fully_vaccinated daily_vaccinations
## Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 885
## Median : 6447 Median : 0 Median : 5443
## Mean : 1219135 Mean : 371630 Mean : 62906
## 3rd Qu.: 321698 3rd Qu.: 38647 3rd Qu.: 26285
## Max. :96044046 Max. :53423486 Max. :4549143
##
## total_vaccinations_per_hundred people_vaccinated_per_hundred
## Min. : 0.000 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 0.00
## Median : 0.390 Median : 0.07
## Mean : 6.588 Mean : 4.41
## 3rd Qu.: 5.970 3rd Qu.: 3.89
## Max. :175.270 Max. :92.30
##
## people_fully_vaccinated_per_hundred daily_vaccinations_per_million
## Min. : 0.000 Min. : 0
## 1st Qu.: 0.000 1st Qu.: 338
## Median : 0.000 Median : 1355
## Mean : 1.478 Mean : 2741
## 3rd Qu.: 0.760 3rd Qu.: 3368
## Max. :82.970 Max. :118759
##
## vaccines
## Length:9073
## Class :character
## Mode :character
##
##
##
##
From the summary, it can be seen that:
- The earliest vaccination record in the dataset is dated 13 December 2020.
- The highest total number of vaccinations recorded for a single country is 147,602,345.
- The highest number of people vaccinated in a single country is 96,044,046.
- The highest number of people fully vaccinated in a single country is 53,423,486.
- The highest number of daily vaccinations is 4,549,143.
- The highest percentage of people vaccinated in a country is 92.30%.
- The highest percentage of people who received the full dose of a COVID-19 vaccine is 82.97%.
There are several types of vaccines that are used in the world.
length(unique(vaccine$vaccines))
## [1] 27
There are 27 combinations of vaccines used at the moment. Let’s see what those are:
unique(vaccine$vaccines)
## [1] "Oxford/AstraZeneca"
## [2] "Pfizer/BioNTech"
## [3] "Sputnik V"
## [4] "Oxford/AstraZeneca, Sinopharm/Beijing, Sputnik V"
## [5] "Oxford/AstraZeneca, Pfizer/BioNTech"
## [6] "Moderna, Oxford/AstraZeneca, Pfizer/BioNTech"
## [7] "Sinovac"
## [8] "Oxford/AstraZeneca, Pfizer/BioNTech, Sinopharm/Beijing, Sputnik V"
## [9] "Oxford/AstraZeneca, Sinovac"
## [10] "Sinopharm/Beijing"
## [11] "Pfizer/BioNTech, Sinovac"
## [12] "Sinopharm/Beijing, Sinopharm/Wuhan, Sinovac"
## [13] "Moderna, Pfizer/BioNTech"
## [14] "Moderna"
## [15] "Moderna, Oxford/AstraZeneca"
## [16] "Moderna, Oxford/AstraZeneca, Pfizer/BioNTech, Sinopharm/Beijing, Sputnik V"
## [17] "Covaxin, Oxford/AstraZeneca"
## [18] "Oxford/AstraZeneca, Sinopharm/Beijing"
## [19] "Pfizer/BioNTech, Sinopharm/Beijing"
## [20] "Sinopharm/Beijing, Sputnik V"
## [21] "Oxford/AstraZeneca, Pfizer/BioNTech, Sputnik V"
## [22] "Oxford/AstraZeneca, Pfizer/BioNTech, Sinovac"
## [23] "EpiVacCorona, Sputnik V"
## [24] "Johnson&Johnson"
## [25] "Pfizer/BioNTech, Sputnik V"
## [26] "Oxford/AstraZeneca, Pfizer/BioNTech, Sinopharm/Beijing, Sinopharm/Wuhan, Sputnik V"
## [27] "Johnson&Johnson, Moderna, Pfizer/BioNTech"
It seems that several countries use more than one type of vaccine. Let’s break it down, but first we need to remove the duplicated rows so that each country appears only once.
# Remove the duplicated data and extract only country and the vaccine types
vaccine_used <- vaccine %>% filter(!(duplicated(country))) %>%
  select(country, vaccines)
# Break down the vaccine types and aggregate the data
vaccine_used <- strsplit(vaccine_used$vaccines, ", ", fixed = TRUE)
vaccine_used <- as.data.frame(unlist(vaccine_used) %>% table())
names(vaccine_used) <- c("Vaccine", "Freq")
vaccine_used
## Vaccine Freq
## 1 Covaxin 1
## 2 EpiVacCorona 1
## 3 Johnson&Johnson 2
## 4 Moderna 35
## 5 Oxford/AstraZeneca 99
## 6 Pfizer/BioNTech 81
## 7 Sinopharm/Beijing 22
## 8 Sinopharm/Wuhan 2
## 9 Sinovac 14
## 10 Sputnik V 20
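The split-and-count step can be easier to follow on a tiny input. The sketch below applies the same idea (`strsplit()`, `unlist()`, `table()`) to three made-up combination strings; the vector `combos` is illustrative only:

```r
# Three hypothetical countries, each with its own vaccine combination
combos <- c("Moderna, Pfizer/BioNTech", "Pfizer/BioNTech", "Sinovac")

# Split each string on ", " and flatten the result into one character vector
parts <- unlist(strsplit(combos, ", ", fixed = TRUE))

# Count how many countries use each vaccine
counts <- as.data.frame(table(parts), stringsAsFactors = FALSE)
names(counts) <- c("Vaccine", "Freq")

# Most widely used vaccine first
counts <- counts[order(-counts$Freq), ]
print(counts)
```

Each country contributes one count per vaccine it uses, so the frequencies can sum to more than the number of countries.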
Let’s visualize the findings.
ggplot(vaccine_used, aes(Freq, reorder(Vaccine, Freq), fill = Vaccine)) +
geom_col() +
labs(title = "Type of COVID-19 Vaccines Used in the World",
subtitle = "Up to 30 March 2021",
caption = "Source: Kaggle.com",
y = NULL,
x = "Number of Countries") +
theme_minimal() +
scale_fill_brewer(palette = "Set3") +
theme(legend.position = "none") +
theme(plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5))
Interpretations: There are 10 types of COVID-19 Vaccines used in the world to date. Oxford/AstraZeneca is the most widely used type of COVID-19 Vaccine worldwide.
How about the percentage of people fully vaccinated? First, we need to aggregate the data to find the percentage of fully vaccinated people for each country.
vaccine_country <- vaccine %>%
  group_by(country) %>%
  summarise(percentage_of_people_fully_vaccinated = max(people_fully_vaccinated_per_hundred)) %>%
  arrange(-percentage_of_people_fully_vaccinated)
ggplot(vaccine_country[1:10, ], aes(percentage_of_people_fully_vaccinated/100, reorder(country, percentage_of_people_fully_vaccinated), fill = country)) +
geom_col() +
labs(title = "10 Countries with The Highest Percentage of People Fully Vaccinated",
subtitle = "Up to 30 March 2021",
x = NULL,
y = NULL,
caption = "Source: Kaggle.com") +
scale_fill_brewer(palette = "Set3") +
theme_minimal() +
theme(legend.position = "none") +
scale_x_continuous(labels = percent) +
theme(plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5))
Interpretations: Gibraltar has the highest percentage of people fully vaccinated in the world. Fully vaccinated refers to people who have received all doses prescribed by the vaccination protocol, which is 2 doses for most of the vaccines in this dataset.
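The aggregation takes `max()` per country because the per-hundred columns are cumulative and unreported days were filled with 0, so the maximum is the most recent reported figure, assuming the figure never decreases. A base R sketch of the same idea on made-up values (`toy` and `pct_full` are hypothetical names, and `aggregate()` stands in for the dplyr pipeline):

```r
# Hypothetical cumulative percentages; 0 marks a day with no reported figure
toy <- data.frame(
  country = c("A", "A", "A", "B", "B"),
  pct_full = c(1.0, 0, 2.5, 0, 0.4)
)

# Maximum per country, equivalent in spirit to group_by() + summarise(max())
latest <- aggregate(pct_full ~ country, data = toy, FUN = max)

# Sort so the country with the highest percentage comes first
latest <- latest[order(-latest$pct_full), ]
print(latest)
```

Taking the mean instead would be pulled down by the zero-filled days, which is why it is not used for cumulative columns.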
vaccine %>%
  group_by(country) %>%
summarise(avg_daily_per_mill = mean(daily_vaccinations_per_million)) %>%
arrange(-avg_daily_per_mill) %>% head(10) %>%
ggplot(aes(avg_daily_per_mill, reorder(country, avg_daily_per_mill))) +
geom_col(aes(fill = country), show.legend = F) +
labs(title = "10 Countries with The Highest Average of Daily Vaccinations (per million)",
subtitle = "Up to 30 March 2021",
y = NULL,
x = "Daily Vaccinations in Average (per million)",
caption = "Source: Kaggle.com") +
theme_minimal() +
scale_fill_brewer(palette = "Set3") +
theme(plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5))
Interpretations: Bhutan has the highest average number of daily COVID-19 vaccinations per million worldwide.
## ASEAN Countries’ COVID-19 Vaccination Progress
To display data from ASEAN countries, we need to subset the dataset first.
asean <- c("Brunei Darussalam", "Cambodia", "Indonesia", "Laos", "Malaysia", "Myanmar", "Philippines", "Singapore", "Thailand", "Vietnam")
vaccine_asean <- vaccine %>% filter(country %in% asean)
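A filter with `%in%` silently drops any name that is not spelled exactly as in the dataset, so it can be worth checking the list before plotting. A minimal sketch with made-up inputs (the spelling "Brunei" in `dataset_countries` is a hypothetical example, not confirmed from the dataset):

```r
# Names we intend to keep (shortened list for illustration)
asean_names <- c("Brunei Darussalam", "Indonesia", "Singapore")

# Hypothetical set of country names present in a dataset
dataset_countries <- c("Brunei", "Indonesia", "Singapore", "Japan")

# Names in our list that have no exact match in the dataset
missing_names <- setdiff(asean_names, dataset_countries)
print(missing_names)
```

On the real data, `setdiff(asean, levels(vaccine$country))` would list any ASEAN names that the filter cannot match, whether because of spelling or because the country has no records yet.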
vaccine_asean %>% group_by(country) %>%
  summarise(max = max(people_fully_vaccinated_per_hundred)) %>%
ggplot(aes(reorder(country, max), max/100)) +
geom_col(aes(fill = country)) +
labs(title = "Percentage of Fully COVID-19 Vaccinated People in ASEAN Countries",
subtitle = "Up to 30 March 2021",
x = NULL,
y = "Fully COVID-19 Vaccinated People",
caption = "Source: Kaggle.com") +
scale_y_continuous(labels = percent) +
scale_fill_brewer(palette = "Set3") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5), legend.position = "none")
Interpretations: Singapore has the highest percentage of fully vaccinated people among ASEAN countries. According to this dataset, Singapore and Indonesia are so far the only ASEAN countries whose residents have started to receive the full dose of a COVID-19 vaccine.
ggplot(vaccine_asean, aes(date, daily_vaccinations_per_million, color = country)) +
geom_line() +
labs(title = "Daily Vaccinations per Million in ASEAN Countries",
subtitle = "Up to 30 March 2021",
x = "Date",
y = "Number of daily vaccinations (per million)",
caption = "Source: Kaggle.com") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5)) +
scale_color_discrete("Country")
Interpretations: Singapore is the first ASEAN country to start COVID-19 vaccinations and has the highest number of daily vaccinations per million in ASEAN. The graph also shows that some countries started vaccinating later than others and do not update their vaccination progress regularly.
vaccine_indo <- vaccine[vaccine$country == "Indonesia", ]
vaccine_indo$month <- month(vaccine_indo$date, label = TRUE, abbr = TRUE)
ggplot(vaccine_indo, aes(date, daily_vaccinations, color = month)) +
geom_line() +
labs(title = "Daily Vaccinations in Indonesia",
subtitle = "Up to 28 March 2021",
x = "Date",
y = "Number of daily vaccinations",
col = "Month",
caption = "Source: Kaggle.com") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5))
Interpretations: The number of daily vaccinations trends upward from month to month, although it fluctuates toward the end of March.
vaccine$dayofweek <- wday(vaccine$date, label = TRUE, abbr = TRUE, week_start = 1)
vaccine_indo <- vaccine %>% filter(country == "Indonesia")
vaccine_indo_week_mean <- vaccine_indo %>%
  group_by(dayofweek) %>%
  summarise(mean_daily_vacc = mean(daily_vaccinations_per_million))
ggplot(vaccine_indo_week_mean, aes(dayofweek, mean_daily_vacc, fill = dayofweek)) +
geom_col() +
labs(title = "Average Number of Daily COVID-19 Vaccinations per Million in Indonesia",
subtitle = "Up to 28 March 2021",
x = "Day of week",
y = NULL,
caption = "Source: Kaggle.com") +
scale_fill_brewer(palette = "Set3") +
theme_classic() +
theme(legend.position = "none", plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5))
Interpretations: On average, Indonesia carries out the most COVID-19 vaccinations on Sundays and the fewest on Mondays.
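The weekday averaging can also be sketched in base R, without lubridate, on a few made-up rows. Here `format(date, "%u")` returns the ISO weekday number (1 = Monday, 7 = Sunday), and the data frame `toy` is purely illustrative:

```r
# Hypothetical daily figures for two Mondays and two Tuesdays
toy <- data.frame(
  date = as.Date(c("2021-03-01", "2021-03-02", "2021-03-08", "2021-03-09")),
  daily = c(10, 40, 30, 20)
)

# ISO weekday number: 1 = Monday ... 7 = Sunday (locale independent)
toy$dow <- as.integer(format(toy$date, "%u"))

# Mean of the daily figure for each weekday
by_dow <- tapply(toy$daily, toy$dow, mean)
print(by_dow)
```

With only a few weeks of data per weekday, as in this dataset, such averages are sensitive to single unusual days, so the ranking of weekdays should be read with some caution.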