1 Introduction

1.1 Data Explanation

COVID-19 Vaccine illustration.

Coronavirus disease 2019, known as COVID-19, is an infectious disease caused by a newly discovered coronavirus called SARS-CoV-2. It spreads primarily through droplets of saliva or nasal discharge when an infected person coughs or sneezes. Most people infected with the virus will experience mild to moderate respiratory illness and recover without requiring special treatment. On the other hand, older people and those with underlying medical conditions are more likely to develop serious complications.

COVID-19 vaccination is one of the ways to protect people from developing the illness and its consequences. COVID-19 vaccines protect against the disease by triggering an immune response to the virus. The first mass vaccination programme started in early December 2020. This project examines the progress of COVID-19 vaccination around the world up to March 30, 2021. The dataset used in this project was obtained from Kaggle and provides daily vaccination progress data for each country.

1.2 Questions

There are several things that I want to explore through this dataset:
- The most used types of COVID-19 Vaccine in the world (to date)
- Top 10 Countries with the highest percentage of people fully vaccinated in the world
- Top 10 Countries with the highest average number of daily vaccinations per million
- ASEAN Countries’ COVID-19 Daily Vaccination Progress
- Indonesia’s COVID-19 Vaccination Progress


2 Preparation

Before exploring the data, load the necessary libraries.

library(ggplot2) # for data visualization
library(dplyr) # for data manipulation
library(RColorBrewer) # for data visualization
library(lubridate) # for converting datetime data type
library(scales) # for percent axis labels

3 Data Wrangling

3.1 Data Importing and Explanation

Read the dataset and inspect the first rows.

vaccine <- read.csv("country_vaccinations.csv")
head(vaccine)
##       country iso_code       date total_vaccinations people_vaccinated
## 1 Afghanistan      AFG 2021-02-22                  0                 0
## 2 Afghanistan      AFG 2021-02-23                 NA                NA
## 3 Afghanistan      AFG 2021-02-24                 NA                NA
## 4 Afghanistan      AFG 2021-02-25                 NA                NA
## 5 Afghanistan      AFG 2021-02-26                 NA                NA
## 6 Afghanistan      AFG 2021-02-27                 NA                NA
##   people_fully_vaccinated daily_vaccinations_raw daily_vaccinations
## 1                      NA                     NA                 NA
## 2                      NA                     NA               1367
## 3                      NA                     NA               1367
## 4                      NA                     NA               1367
## 5                      NA                     NA               1367
## 6                      NA                     NA               1367
##   total_vaccinations_per_hundred people_vaccinated_per_hundred
## 1                              0                             0
## 2                             NA                            NA
## 3                             NA                            NA
## 4                             NA                            NA
## 5                             NA                            NA
## 6                             NA                            NA
##   people_fully_vaccinated_per_hundred daily_vaccinations_per_million
## 1                                  NA                             NA
## 2                                  NA                             35
## 3                                  NA                             35
## 4                                  NA                             35
## 5                                  NA                             35
## 6                                  NA                             35
##             vaccines               source_name
## 1 Oxford/AstraZeneca Government of Afghanistan
## 2 Oxford/AstraZeneca Government of Afghanistan
## 3 Oxford/AstraZeneca Government of Afghanistan
## 4 Oxford/AstraZeneca Government of Afghanistan
## 5 Oxford/AstraZeneca Government of Afghanistan
## 6 Oxford/AstraZeneca Government of Afghanistan
##                                                            source_website
## 1 http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm
## 2 http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm
## 3 http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm
## 4 http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm
## 5 http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm
## 6 http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm

3.2 Data Structure

After inspecting the data at a glance, let’s look at the data types and dimensions in more detail.

str(vaccine)
## 'data.frame':    9073 obs. of  15 variables:
##  $ country                            : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ iso_code                           : chr  "AFG" "AFG" "AFG" "AFG" ...
##  $ date                               : chr  "2021-02-22" "2021-02-23" "2021-02-24" "2021-02-25" ...
##  $ total_vaccinations                 : num  0 NA NA NA NA NA 8200 NA NA NA ...
##  $ people_vaccinated                  : num  0 NA NA NA NA NA 8200 NA NA NA ...
##  $ people_fully_vaccinated            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ daily_vaccinations_raw             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ daily_vaccinations                 : num  NA 1367 1367 1367 1367 ...
##  $ total_vaccinations_per_hundred     : num  0 NA NA NA NA NA 0.02 NA NA NA ...
##  $ people_vaccinated_per_hundred      : num  0 NA NA NA NA NA 0.02 NA NA NA ...
##  $ people_fully_vaccinated_per_hundred: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ daily_vaccinations_per_million     : num  NA 35 35 35 35 35 35 41 46 52 ...
##  $ vaccines                           : chr  "Oxford/AstraZeneca" "Oxford/AstraZeneca" "Oxford/AstraZeneca" "Oxford/AstraZeneca" ...
##  $ source_name                        : chr  "Government of Afghanistan" "Government of Afghanistan" "Government of Afghanistan" "Government of Afghanistan" ...
##  $ source_website                     : chr  "http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm" "http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm" "http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm" "http://www.xinhuanet.com/english/asiapacific/2021-03/16/c_139814668.htm" ...

There are 9073 rows and 15 variables in this dataset.

3.3 Missing values

Looks like there are missing values. Let’s check for the details.

colSums(is.na(vaccine))
##                             country                            iso_code 
##                                   0                                   0 
##                                date                  total_vaccinations 
##                                   0                                3576 
##                   people_vaccinated             people_fully_vaccinated 
##                                4150                                5698 
##              daily_vaccinations_raw                  daily_vaccinations 
##                                4467                                 178 
##      total_vaccinations_per_hundred       people_vaccinated_per_hundred 
##                                3576                                4150 
## people_fully_vaccinated_per_hundred      daily_vaccinations_per_million 
##                                5698                                 178 
##                            vaccines                         source_name 
##                                   0                                   0 
##                      source_website 
##                                   0

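There are a lot of missing values. A likely explanation is that no figure was reported for that country on that day. To keep the later aggregations simple, this analysis converts every NA into 0. Keep in mind that in cumulative columns such as total_vaccinations a 0 then means "not reported" rather than a real drop to zero, which is why the later sections take the maximum of those columns per country.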

vaccine[is.na(vaccine)] <- 0
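The effect of zero-filling on later aggregations can be illustrated with a small made-up vector. This is only a sketch in base R; the values and variable names below are hypothetical and not taken from the dataset.

```r
# Hypothetical daily figures with one unreported day (NA)
daily <- c(NA, 35, 35, 41)

# Mean over the reported days only
mean_reported <- mean(daily, na.rm = TRUE)

# Mean after converting NA to 0, as done in this analysis
daily_zero <- daily
daily_zero[is.na(daily_zero)] <- 0
mean_zero_filled <- mean(daily_zero)

# The maximum is the same either way because the values are non-negative
max_reported <- max(daily, na.rm = TRUE)
max_zero_filled <- max(daily_zero)

print(c(mean_reported = mean_reported, mean_zero_filled = mean_zero_filled))
```

Zero-filling lowers the mean but leaves the maximum unchanged, so results based on max() are less sensitive to this choice than results based on mean().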

Let’s check if every NA has been successfully converted.

colSums(is.na(vaccine))
##                             country                            iso_code 
##                                   0                                   0 
##                                date                  total_vaccinations 
##                                   0                                   0 
##                   people_vaccinated             people_fully_vaccinated 
##                                   0                                   0 
##              daily_vaccinations_raw                  daily_vaccinations 
##                                   0                                   0 
##      total_vaccinations_per_hundred       people_vaccinated_per_hundred 
##                                   0                                   0 
## people_fully_vaccinated_per_hundred      daily_vaccinations_per_million 
##                                   0                                   0 
##                            vaccines                         source_name 
##                                   0                                   0 
##                      source_website 
##                                   0

3.4 Removing variables

There are several variables that are not necessary for this analysis, so it’s better to eliminate them.

vaccine[, c("iso_code", "daily_vaccinations_raw", "source_name", "source_website")] <- NULL
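An alternative that keeps the original data frame intact is to select the columns to keep instead of deleting the unwanted ones. The following is a minimal sketch on a hypothetical one-row data frame (`toy`, `drop_cols`, and `toy_small` are illustrative names, and the URL is a placeholder).

```r
# Hypothetical miniature of the dataset with a few of its column names
toy <- data.frame(
  country = "Afghanistan",
  iso_code = "AFG",
  daily_vaccinations_raw = NA,
  source_name = "Government of Afghanistan",
  source_website = "http://example.org",
  vaccines = "Oxford/AstraZeneca"
)

drop_cols <- c("iso_code", "daily_vaccinations_raw", "source_name", "source_website")

# Keep every column whose name is not listed in drop_cols
toy_small <- toy[, setdiff(names(toy), drop_cols), drop = FALSE]
names(toy_small)
```

Because the result is assigned to a new object, the dropped columns remain available in `toy` if they are needed later.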

3.5 Converting Data Type

Several variables do not have the correct data type, so let’s convert them into the correct ones.

vaccine$country <- as.factor(vaccine$country)
vaccine$date <- ymd(vaccine$date)
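After a date conversion it can be worth checking that no value failed to parse. The sketch below uses base R's `as.Date()` as a stand-in for `ymd()` so that it runs without extra packages; the input strings are hypothetical.

```r
# Hypothetical date strings; the last one is deliberately malformed
raw_dates <- c("2021-02-22", "2021-02-23", "not a date")

# With an explicit format, strings that cannot be parsed become NA
parsed <- as.Date(raw_dates, format = "%Y-%m-%d")

class(parsed)        # the result is a Date vector
sum(is.na(parsed))   # number of values that failed to parse
```

A non-zero count of NA values after conversion would point to malformed entries in the date column.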

To make sure the variables now have the correct data types, check the structure one more time.

str(vaccine)
## 'data.frame':    9073 obs. of  11 variables:
##  $ country                            : Factor w/ 161 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ date                               : Date, format: "2021-02-22" "2021-02-23" ...
##  $ total_vaccinations                 : num  0 0 0 0 0 0 8200 0 0 0 ...
##  $ people_vaccinated                  : num  0 0 0 0 0 0 8200 0 0 0 ...
##  $ people_fully_vaccinated            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ daily_vaccinations                 : num  0 1367 1367 1367 1367 ...
##  $ total_vaccinations_per_hundred     : num  0 0 0 0 0 0 0.02 0 0 0 ...
##  $ people_vaccinated_per_hundred      : num  0 0 0 0 0 0 0.02 0 0 0 ...
##  $ people_fully_vaccinated_per_hundred: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ daily_vaccinations_per_million     : num  0 35 35 35 35 35 35 41 46 52 ...
##  $ vaccines                           : chr  "Oxford/AstraZeneca" "Oxford/AstraZeneca" "Oxford/AstraZeneca" "Oxford/AstraZeneca" ...

3.6 Data Summary

summary(vaccine)
##              country          date            total_vaccinations 
##  Canada          : 107   Min.   :2020-12-13   Min.   :        0  
##  England         : 107   1st Qu.:2021-01-31   1st Qu.:        0  
##  Northern Ireland: 107   Median :2021-02-23   Median :    22371  
##  Scotland        : 107   Mean   :2021-02-19   Mean   :  1795212  
##  United Kingdom  : 107   3rd Qu.:2021-03-12   3rd Qu.:   513178  
##  Wales           : 107   Max.   :2021-03-30   Max.   :147602345  
##  (Other)         :8431                                           
##  people_vaccinated  people_fully_vaccinated daily_vaccinations
##  Min.   :       0   Min.   :       0        Min.   :      0   
##  1st Qu.:       0   1st Qu.:       0        1st Qu.:    885   
##  Median :    6447   Median :       0        Median :   5443   
##  Mean   : 1219135   Mean   :  371630        Mean   :  62906   
##  3rd Qu.:  321698   3rd Qu.:   38647        3rd Qu.:  26285   
##  Max.   :96044046   Max.   :53423486        Max.   :4549143   
##                                                               
##  total_vaccinations_per_hundred people_vaccinated_per_hundred
##  Min.   :  0.000                Min.   : 0.00                
##  1st Qu.:  0.000                1st Qu.: 0.00                
##  Median :  0.390                Median : 0.07                
##  Mean   :  6.588                Mean   : 4.41                
##  3rd Qu.:  5.970                3rd Qu.: 3.89                
##  Max.   :175.270                Max.   :92.30                
##                                                              
##  people_fully_vaccinated_per_hundred daily_vaccinations_per_million
##  Min.   : 0.000                      Min.   :     0                
##  1st Qu.: 0.000                      1st Qu.:   338                
##  Median : 0.000                      Median :  1355                
##  Mean   : 1.478                      Mean   :  2741                
##  3rd Qu.: 0.760                      3rd Qu.:  3368                
##  Max.   :82.970                      Max.   :118759                
##                                                                    
##    vaccines        
##  Length:9073       
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 

From the summary, it can be seen that:
- The earliest record in the dataset is dated 13 December 2020.
- The highest cumulative number of vaccinations reported by a single country is 147,602,345.
- The highest number of people vaccinated in a single country is 96,044,046.
- The highest number of people fully vaccinated in a single country is 53,423,486.
- The highest number of daily vaccinations in a single country is 4,549,143.
- The highest percentage of people vaccinated in a country is 92.30%.
- The highest percentage of people who received the full dose of a COVID-19 vaccine is 82.97%.
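The summary reports the maxima but not which country they belong to. One way to look that up is to index the data frame with `which.max()`. The sketch below uses a hypothetical toy data frame with made-up numbers.

```r
# Hypothetical toy data: two countries, two dates each
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  date = as.Date(c("2021-03-29", "2021-03-30", "2021-03-29", "2021-03-30")),
  total_vaccinations = c(100, 150, 900, 1200)
)

# Row holding the largest cumulative total
top_row <- toy[which.max(toy$total_vaccinations), ]
top_row
```

Applied to the real data, the same pattern would be `vaccine[which.max(vaccine$total_vaccinations), ]`.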

4 Data Manipulation and Exploration

4.1 Type of vaccines

There are several types of vaccines that are used in the world.

length(unique(vaccine$vaccines))
## [1] 27

There are 27 combinations of vaccines used at the moment. Let’s see what those are:

unique(vaccine$vaccines)
##  [1] "Oxford/AstraZeneca"                                                                
##  [2] "Pfizer/BioNTech"                                                                   
##  [3] "Sputnik V"                                                                         
##  [4] "Oxford/AstraZeneca, Sinopharm/Beijing, Sputnik V"                                  
##  [5] "Oxford/AstraZeneca, Pfizer/BioNTech"                                               
##  [6] "Moderna, Oxford/AstraZeneca, Pfizer/BioNTech"                                      
##  [7] "Sinovac"                                                                           
##  [8] "Oxford/AstraZeneca, Pfizer/BioNTech, Sinopharm/Beijing, Sputnik V"                 
##  [9] "Oxford/AstraZeneca, Sinovac"                                                       
## [10] "Sinopharm/Beijing"                                                                 
## [11] "Pfizer/BioNTech, Sinovac"                                                          
## [12] "Sinopharm/Beijing, Sinopharm/Wuhan, Sinovac"                                       
## [13] "Moderna, Pfizer/BioNTech"                                                          
## [14] "Moderna"                                                                           
## [15] "Moderna, Oxford/AstraZeneca"                                                       
## [16] "Moderna, Oxford/AstraZeneca, Pfizer/BioNTech, Sinopharm/Beijing, Sputnik V"        
## [17] "Covaxin, Oxford/AstraZeneca"                                                       
## [18] "Oxford/AstraZeneca, Sinopharm/Beijing"                                             
## [19] "Pfizer/BioNTech, Sinopharm/Beijing"                                                
## [20] "Sinopharm/Beijing, Sputnik V"                                                      
## [21] "Oxford/AstraZeneca, Pfizer/BioNTech, Sputnik V"                                    
## [22] "Oxford/AstraZeneca, Pfizer/BioNTech, Sinovac"                                      
## [23] "EpiVacCorona, Sputnik V"                                                           
## [24] "Johnson&Johnson"                                                                   
## [25] "Pfizer/BioNTech, Sputnik V"                                                        
## [26] "Oxford/AstraZeneca, Pfizer/BioNTech, Sinopharm/Beijing, Sinopharm/Wuhan, Sputnik V"
## [27] "Johnson&Johnson, Moderna, Pfizer/BioNTech"

It seems that several countries use more than one type of vaccine. Let’s break it down, but first we need to keep only one row per country, because each country appears once for every reported date.

# Remove the duplicated data and extract only country and the vaccine types
vaccine_used <- vaccine %>% filter(!(duplicated(country))) %>%
  select(country, vaccines)

# Breaking down the vaccine types and do data aggregation
vaccine_used <- strsplit(vaccine_used$vaccines, ", ", fixed = TRUE)
vaccine_used <- as.data.frame(unlist(vaccine_used) %>% table())
names(vaccine_used) <- c("Vaccine", "Freq")
vaccine_used
##               Vaccine Freq
## 1             Covaxin    1
## 2        EpiVacCorona    1
## 3     Johnson&Johnson    2
## 4             Moderna   35
## 5  Oxford/AstraZeneca   99
## 6     Pfizer/BioNTech   81
## 7   Sinopharm/Beijing   22
## 8     Sinopharm/Wuhan    2
## 9             Sinovac   14
## 10          Sputnik V   20

Let’s visualize the findings.

ggplot(vaccine_used, aes(Freq, reorder(Vaccine, Freq), fill = Vaccine)) +
  geom_col() +
  labs(title = "Type of COVID-19 Vaccines Used in the World",
       subtitle = "Up to 30 March 2021",
       caption = "Source: Kaggle.com",
       y = NULL,
       x = "Number of Countries") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set3") +
  theme(legend.position = "none") +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))

Interpretations: There are 10 types of COVID-19 vaccines used in the world to date. Oxford/AstraZeneca is the most widely used, reported by 99 countries.

4.2 Countries with the highest percentage of people fully vaccinated

How about the percentage of people fully vaccinated? First, we need to aggregate the data to find the percentage of fully vaccinated people for each country.

vaccine_country <- vaccine %>%
  group_by(country) %>%
  summarise(percentage_of_people_fully_vaccinated = max(people_fully_vaccinated_per_hundred)) %>%
  arrange(-percentage_of_people_fully_vaccinated)
ggplot(vaccine_country[1:10, ], aes(percentage_of_people_fully_vaccinated/100, reorder(country, percentage_of_people_fully_vaccinated), fill = country)) +
  geom_col() + 
  labs(title = "10 Countries with The Highest Percentage of People Fully Vaccinated",
       subtitle = "Up to 30 March 2021",
       x = NULL,
       y = NULL,
       caption = "Source: Kaggle.com") +
  scale_fill_brewer(palette = "Set3") +
  theme_minimal() +
  theme(legend.position = "none") +
  scale_x_continuous(labels = percent) +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))

Interpretations: Gibraltar has the highest percentage of people fully vaccinated in the world. "Fully vaccinated" refers to people who have received all doses required by the vaccine's protocol, which is two doses for most of the vaccines listed above.
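The reason for taking the maximum per country is that the column is cumulative and unreported days were set to 0, so the largest value is the latest reported one. A minimal base R sketch on hypothetical numbers (the analysis itself uses `group_by()` and `summarise()`):

```r
# Hypothetical cumulative percentages; 0 marks a day without a report
toy <- data.frame(
  country = c("A", "A", "A", "B", "B", "B"),
  people_fully_vaccinated_per_hundred = c(1.2, 0, 2.5, 0, 0.4, 0)
)

# Largest value per country, returned as a named numeric vector
latest <- vapply(
  split(toy$people_fully_vaccinated_per_hundred, toy$country),
  max,
  numeric(1)
)

# Rank countries from highest to lowest
latest <- sort(latest, decreasing = TRUE)
latest
```

Taking the last row per country instead would return 0 for country B here, because its most recent day has no report.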

4.3 Countries with the highest average of daily vaccinations (per million)

vaccine %>%
  group_by(country) %>%
  summarise(avg_daily_per_mill = mean(daily_vaccinations_per_million)) %>%
  arrange(-avg_daily_per_mill) %>% head(10) %>%
  ggplot(aes(avg_daily_per_mill, reorder(country, avg_daily_per_mill))) + 
  geom_col(aes(fill = country), show.legend = FALSE) +
  labs(title = "10 Countries with The Highest Average of Daily Vaccinations (per million)",
       subtitle = "Up to 30 March 2021",
       y = NULL,
       x = "Daily Vaccinations in Average (per million)",
       caption = "Source: Kaggle.com") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set3") +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))

Interpretations: Bhutan has the highest average number of daily COVID-19 vaccinations per million worldwide.

ASEAN Countries’ COVID-19 Vaccination Progress

To display data from ASEAN countries, we need to subset the dataset first.

asean <- c("Brunei Darussalam", "Cambodia", "Indonesia", "Laos", "Malaysia", "Myanmar", "Philippines", "Singapore", "Thailand", "Vietnam")
vaccine_asean <- vaccine %>% filter(country %in% asean)
vaccine_asean %>% group_by(country) %>%
  summarise(max = max(people_fully_vaccinated_per_hundred)) %>%
  ggplot(aes(reorder(country, max), max/100)) +
  geom_col(aes(fill = country)) +
  labs(title = "Percentage of Fully COVID-19 Vaccinated People in ASEAN Countries",
       subtitle = "Up to 30 March 2021",
       x = NULL,
       y = "Fully COVID-19 Vaccinated People",
       caption = "Source: Kaggle.com") +
  scale_y_continuous(labels = percent) +
  scale_fill_brewer(palette = "Set3") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5), legend.position = "none")

Interpretations: Singapore has the highest percentage of fully vaccinated people among ASEAN countries. To date, Singapore and Indonesia are the only ASEAN countries in this dataset whose residents have started to receive the full dose of a COVID-19 vaccine.
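Filtering with `%in%` silently drops any name that does not match the dataset's spelling exactly, so it can help to list the names that found no match. The sketch below is an illustration only: `countries_in_data` is a hypothetical set of spellings, not the actual contents of the dataset.

```r
asean <- c("Brunei Darussalam", "Cambodia", "Indonesia", "Laos", "Malaysia",
           "Myanmar", "Philippines", "Singapore", "Thailand", "Vietnam")

# Hypothetical country names as they might appear in a dataset
countries_in_data <- c("Brunei", "Cambodia", "Indonesia", "Singapore")

# Names from the ASEAN list that have no exact match
unmatched <- setdiff(asean, countries_in_data)
unmatched
```

With the real data the check would be `setdiff(asean, unique(vaccine$country))`. An unmatched name can mean either that the country has no records yet or that it is spelled differently in the dataset.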

4.4 Daily Vaccinated People in ASEAN Countries (per million)

ggplot(vaccine_asean, aes(date, daily_vaccinations_per_million, color = country)) +
  geom_line() +
  labs(title = "Daily Vaccinations per Million in ASEAN Countries",
       subtitle = "Up to 30 March 2021",
       x = "Date",
       y = "Number of daily vaccinations (per million)",
       caption = "Source: Kaggle.com") +
  theme_minimal()  +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5)) +
  scale_color_discrete("Country")

Interpretations: Singapore was the first ASEAN country to start COVID-19 vaccination and has the highest number of daily vaccinations per million in ASEAN. The graph also shows that some countries started vaccinating later than others and do not update their vaccination progress regularly.

4.5 Number of Daily Vaccinations in Indonesia

vaccine_indo <- vaccine[vaccine$country == "Indonesia", ]
vaccine_indo$month <- month(vaccine_indo$date, label = TRUE, abbr = TRUE)

ggplot(vaccine_indo, aes(date, daily_vaccinations, color = month)) +
  geom_line() +
  labs(title = "Daily Vaccinations in Indonesia",
       subtitle = "Up to 28 March 2021",
       x = "Date",
       y = "Number of daily vaccinations",
       col = "Month",
       caption = "Source: Kaggle.com") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))

Interpretations: The number of daily vaccinations trends upward from month to month, although it fluctuates toward the end of March.

4.6 Average Daily Vaccinations in Indonesia by Day of the Week

vaccine$dayofweek <- wday(vaccine$date, label = TRUE, abbr = TRUE, week_start = 1)
vaccine_indo <- vaccine %>% filter(country == "Indonesia")
vaccine_indo_week_mean <- vaccine_indo %>%
  group_by(dayofweek) %>%
  summarise(mean_daily_vacc = mean(daily_vaccinations_per_million))

ggplot(vaccine_indo_week_mean, aes(dayofweek, mean_daily_vacc, fill = dayofweek)) +
  geom_col() +
  labs(title = "Average Number of Daily COVID-19 Vaccinations per Million in Indonesia",
       subtitle = "Up to 28 March 2021",
       x = "Day of week",
       y = NULL,
       caption = "Source: Kaggle.com") +
  scale_fill_brewer(palette = "Set3") +
  theme_classic() +
  theme(legend.position = "none", plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))

Interpretations: On average, Indonesia carries out the most COVID-19 vaccinations on Sunday and the fewest on Monday.
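The `week_start = 1` argument used above makes Monday the first day of the week. The same Monday-first numbering can be sketched in base R with the `%u` format code, which avoids locale-dependent day names; the two dates below are picked only for illustration.

```r
# Two dates in March 2021: a Monday and a Sunday
dates <- as.Date(c("2021-03-22", "2021-03-28"))

# "%u" gives the ISO weekday number: 1 = Monday, ..., 7 = Sunday
iso_weekday <- as.integer(format(dates, "%u"))
iso_weekday
```

Numbering the days from Monday keeps the bars in the chart in calendar-week order, with Sunday last.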

5 Conclusions

  • At the moment, 10 types of COVID-19 vaccines are in use worldwide. Oxford/AstraZeneca, Pfizer/BioNTech, and Moderna are the three most widely used, measured by the number of countries using them.
  • Gibraltar has the highest percentage of fully vaccinated people in the world, with the majority of its residents having received the full dose of a COVID-19 vaccine by the end of March 2021. Gibraltar also ranks second, after Bhutan, for the highest average of daily vaccinations per million.
  • Among ASEAN countries, Singapore was the first to start COVID-19 vaccination. To date, Singapore has made the most progress in South-East Asia, both in daily vaccinations per million and in the percentage of fully vaccinated people.
  • According to this dataset, Singapore and Indonesia are the only countries in South-East Asia whose residents have started receiving the second dose of a COVID-19 vaccine.
  • Indonesia ranks second among ASEAN countries in daily vaccinations per million. Its daily vaccinations trend upward from month to month, although they fluctuate toward the end of March.
  • On average, Indonesia carries out the most daily vaccinations on Sunday and the fewest on Monday.