Introduction

For my final project I decided to analyze COVID-19 vaccination, one of the most discussed topics at the moment. I will try to answer the questions below:

In which country is the vaccination program most advanced?

Where are the most people vaccinated per day, both in absolute numbers and as a percentage of the entire population?

Which vaccine or combination of vaccines is used most?

I also want to review vaccination across the US states and answer the following questions:

Which states are the most and least vaccinated?

Which states have the highest number of people vaccinated per hundred?

Data Gathering

The datasets I analyze are collected daily from Our World in Data. Links to the data sources are below:

https://www.kaggle.com/gpreda/covid-world-vaccination-progress?select=country_vaccinations.csv

https://www.kaggle.com/paultimothymooney/usa-covid19-vaccinations/code

Exploratory Analysis

Country Vaccinations

dim(datasetx)
## [1] 17607    15
str(datasetx)
## 'data.frame':    17607 obs. of  15 variables:
##  $ country                            : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ iso_code                           : chr  "AFG" "AFG" "AFG" "AFG" ...
##  $ date                               : chr  "2021-02-22" "2021-02-23" "2021-02-24" "2021-02-25" ...
##  $ total_vaccinations                 : int  0 NA NA NA NA NA 8200 NA NA NA ...
##  $ people_vaccinated                  : int  0 NA NA NA NA NA 8200 NA NA NA ...
##  $ people_fully_vaccinated            : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ daily_vaccinations_raw             : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ daily_vaccinations                 : int  NA 1367 1367 1367 1367 1367 1367 1580 1794 2008 ...
##  $ total_vaccinations_per_hundred     : num  0 NA NA NA NA NA 0.02 NA NA NA ...
##  $ people_vaccinated_per_hundred      : num  0 NA NA NA NA NA 0.02 NA NA NA ...
##  $ people_fully_vaccinated_per_hundred: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ daily_vaccinations_per_million     : int  NA 35 35 35 35 35 35 41 46 52 ...
##  $ vaccines                           : chr  "Oxford/AstraZeneca, Pfizer/BioNTech, Sinopharm/Beijing" "Oxford/AstraZeneca, Pfizer/BioNTech, Sinopharm/Beijing" "Oxford/AstraZeneca, Pfizer/BioNTech, Sinopharm/Beijing" "Oxford/AstraZeneca, Pfizer/BioNTech, Sinopharm/Beijing" ...
##  $ source_name                        : chr  "World Health Organization" "World Health Organization" "World Health Organization" "World Health Organization" ...
##  $ source_website                     : chr  "https://covid19.who.int/" "https://covid19.who.int/" "https://covid19.who.int/" "https://covid19.who.int/" ...
## Total number of countries present 
count(distinct(datasetx, country))
##     n
## 1 211
## [1] "2020-12-02"
## [1] "2021-05-13"
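
The two dates printed above are the first and last observation dates in the dataset. Since `date` is stored as character, one way to obtain such a range is to convert the column to `Date` first. The sketch below uses a small made-up data frame (`toy` is hypothetical, not the author's code) to show the idea:

```r
# Toy stand-in for datasetx; the rows are illustrative only.
toy <- data.frame(
  country = c("A", "A", "B"),
  date    = c("2021-02-22", "2020-12-02", "2021-05-13"),
  stringsAsFactors = FALSE
)

# Convert ISO-formatted strings to Date so comparisons are chronological.
toy$date <- as.Date(toy$date, format = "%Y-%m-%d")

# range() returns the earliest and the latest date in one call.
date_range <- range(toy$date, na.rm = TRUE)
format(date_range)
```

Working with `Date` values also makes later steps such as extracting the day, month and year more reliable than string manipulation.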

USA Vaccination

dim(usadf)
## [1] 7433   14
str(usadf)
## 'data.frame':    7433 obs. of  14 variables:
##  $ date                               : chr  "2021-01-12" "2021-01-13" "2021-01-14" "2021-01-15" ...
##  $ location                           : chr  "Alabama" "Alabama" "Alabama" "Alabama" ...
##  $ total_vaccinations                 : num  78134 84040 92300 100567 NA ...
##  $ total_distributed                  : num  377025 378975 435350 444650 NA ...
##  $ people_vaccinated                  : num  70861 74792 80480 86956 NA ...
##  $ people_fully_vaccinated_per_hundred: num  0.15 0.19 NA 0.28 NA NA NA 0.33 0.37 0.44 ...
##  $ total_vaccinations_per_hundred     : num  1.59 1.71 1.88 2.05 NA NA NA 2.67 2.84 3.38 ...
##  $ people_fully_vaccinated            : num  7270 9245 NA 13488 NA ...
##  $ people_vaccinated_per_hundred      : num  1.45 1.53 1.64 1.77 NA NA NA 2.33 2.47 2.95 ...
##  $ distributed_per_hundred            : num  7.69 7.73 8.88 9.07 NA ...
##  $ daily_vaccinations_raw             : num  NA 5906 8260 8267 7557 ...
##  $ daily_vaccinations                 : num  NA 5906 7083 7478 7498 ...
##  $ daily_vaccinations_per_million     : num  NA 1205 1445 1525 1529 ...
##  $ share_doses_used                   : num  0.207 0.222 0.212 0.226 NA NA NA 0.294 0.288 0.336 ...
## [1] "2020-12-20"
## [1] "2021-05-05"

Data Preparation and Cleaning

Country Vaccination

## Drop unnecessary columns by name: 'source_name', 'source_website'
datasetx <- datasetx %>%
  select(-source_name, -source_website)
## Rename vaccines to vaccine_name
datasetx <- datasetx %>%
  rename(vaccine_name = vaccines)

USA Vaccination

## Total number of locations present
count(distinct(usadf, location))
##    n
## 1 65
## Remove non-state locations (territories, federal agencies, national total) from the USA dataset
non_states <- c('Federated States of Micronesia', 'Guam', 'Long Term Care', 'Marshall Islands',
                'Dept of Defense', 'Indian Health Svc', 'Northern Mariana Islands', 'Republic of Palau',
                'United States', 'Veterans Health', 'American Samoa', 'Bureau of Prisons',
                'District of Columbia', 'Virgin Islands')
usa1 <- usadf %>% filter(!location %in% non_states)
## Rename location to state
usa <- usa1 %>%
  rename(state = location)
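
An alternative to listing every location to exclude is to keep only the names that appear in R's built-in `state.name` vector. This is a sketch of that different technique, not what the analysis above does: an allow-list also drops any location that is not one of the 50 states, and any state whose label in the data differs from the built-in spelling, so the dropped names should be checked by hand. The `toy_usa` data frame is made up for illustration.

```r
library(dplyr)

# Toy stand-in for usadf; the rows are illustrative only.
toy_usa <- data.frame(
  location = c("Alabama", "Guam", "Veterans Health", "Wyoming", "Puerto Rico"),
  total_vaccinations = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# state.name is a built-in character vector with the 50 US state names.
states_only <- toy_usa %>%
  filter(location %in% state.name) %>%
  rename(state = location)

# Locations that were dropped, for a manual check of naming mismatches.
dropped <- setdiff(toy_usa$location, state.name)
```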

Total vaccinations

Total number of vaccinations: the absolute number of immunizations administered in the country.

## Remove missing and zero values from 'total_vaccinations'
total_vaccination <- datasetx %>% group_by(country) %>% filter(!is.na(total_vaccinations), total_vaccinations != 0)
## Summarise by date and add each date's share of the overall total
total_vaccinations <- total_vaccination %>%
  group_by(date) %>%
  summarise(avg_vaccines = mean(total_vaccinations),
            min_vaccines = min(total_vaccinations),
            max_vaccines = max(total_vaccinations),
            total_vaccines = sum(total_vaccinations)) %>%
  mutate(total_vaccines_prop = prop.table(total_vaccines))
kable(head(total_vaccinations)) %>%
  kable_styling(bootstrap_options = "striped", latex_options = "hold_position", full_width = FALSE, font_size = 11)
date        avg_vaccines  min_vaccines  max_vaccines  total_vaccines  total_vaccines_prop
2020-12-02  1             1             1             1               0
2020-12-03  1             1             1             1               0
2020-12-04  2             2             2             2               0
2020-12-05  2             2             2             2               0
2020-12-06  2             2             2             2               0
2020-12-07  4             4             4             4               0
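
To make the summary step easier to follow, here is a minimal sketch of the same pattern on made-up data (`toy` and its values are hypothetical). Missing values are tested with `is.na()`, and `prop.table()` on a numeric vector divides each element by the sum of the vector, which is why the earliest dates in the table above, with only a handful of doses, show a share that rounds to 0.

```r
library(dplyr)

# Toy stand-in for datasetx; the values are illustrative only.
toy <- data.frame(
  country = c("A", "A", "B", "B", "C"),
  date    = c("d1", "d2", "d1", "d2", "d2"),
  total_vaccinations = c(10, NA, 30, 0, 60),
  stringsAsFactors = FALSE
)

daily_totals <- toy %>%
  # Keep rows with a reported, non-zero total.
  filter(!is.na(total_vaccinations), total_vaccinations != 0) %>%
  group_by(date) %>%
  summarise(
    avg_vaccines   = mean(total_vaccinations),
    total_vaccines = sum(total_vaccinations)
  ) %>%
  # Share of each date's total in the sum over all dates.
  mutate(total_vaccines_prop = prop.table(total_vaccines))
```

In this toy case the two dates have totals of 40 and 60, so their shares are 0.4 and 0.6.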

## Remove missing and zero values from 'daily_vaccinations_per_million'
daily_vaccinations <- datasetx %>% group_by(country) %>% filter(!is.na(daily_vaccinations_per_million), daily_vaccinations_per_million != 0)
## Summarise by date and add each date's share of the overall total
daily_vaccination <- daily_vaccinations %>%
  group_by(date) %>%
  summarise(avg_vaccines = mean(daily_vaccinations_per_million),
            min_vaccines = min(daily_vaccinations_per_million),
            max_vaccines = max(daily_vaccinations_per_million),
            total_vaccines = sum(daily_vaccinations_per_million)) %>%
  mutate(total_vaccines_prop = prop.table(total_vaccines))

kable(head(daily_vaccination,5))  %>%
 kable_styling(bootstrap_options = "striped", font_size = 11)
date        avg_vaccines  min_vaccines  max_vaccines  total_vaccines  total_vaccines_prop
2020-12-15  19.00000      19            19            19              4.0e-07
2020-12-16  64.33333      23            130           193             3.7e-06
2020-12-17  72.33333      23            130           217             4.1e-06
2020-12-18  75.66667      23            130           227             4.3e-06
2020-12-19  72.00000      23            130           216             4.1e-06

Total Vaccinations by Country

Total number of vaccinations: the absolute number of immunizations administered in the country. For the visualization below I took the most recent total reported for each country. The first graph shows the countries with the largest number of immunizations and the second graph shows the countries with the smallest.
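
The code that selects the latest value per country is not shown in this report. One possible way to do it, sketched on made-up data (`toy` and its values are hypothetical), is to drop the dates without a reported total and keep the most recent remaining row for each country:

```r
library(dplyr)

# Toy stand-in for datasetx; the values are illustrative only.
toy <- data.frame(
  country = c("A", "A", "B", "B", "B"),
  date    = as.Date(c("2021-05-01", "2021-05-03",
                      "2021-05-01", "2021-05-02", "2021-05-04")),
  total_vaccinations = c(100, 150, 900, 950, NA),
  stringsAsFactors = FALSE
)

latest <- toy %>%
  filter(!is.na(total_vaccinations)) %>%  # ignore dates with no reported total
  group_by(country) %>%
  filter(date == max(date)) %>%           # most recent reported row per country
  ungroup() %>%
  arrange(desc(total_vaccinations))       # largest totals first
```

Country B's last row has no total, so its value from 2021-05-02 is used. Taking `head()` or `tail()` of the sorted result then gives the countries for the two graphs.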

Total vaccinations per hundred: the ratio (in percent) between the number of vaccinations and the total population of the country, up to the given date.
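
As a worked example of this ratio, the sketch below recomputes one value from the `str()` output of the country dataset shown earlier: 8200 total vaccinations in Afghanistan. The helper `per_hundred` and the population figure are assumptions introduced only for illustration, since the dataset itself does not include a population column.

```r
# Doses (or people) per hundred inhabitants, rounded to two decimals.
per_hundred <- function(count, population) {
  round(100 * count / population, 2)
}

# Assumed population of Afghanistan, used only for illustration.
afghanistan_population <- 38928341

afghanistan_rate <- per_hundred(8200, afghanistan_population)
afghanistan_rate
```

Under that assumed population the result is 0.02, which agrees with the `total_vaccinations_per_hundred` value listed for the same row.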

People vaccinations

Total number of people fully vaccinated per hundred: the ratio (in percent) between the fully immunized population and the total population of the country, up to the given date.

The dataset contains no figures on fully vaccinated people for China and the United Arab Emirates.

people_fully_vaccinated_per_hundred <- top_countries %>% group_by(country) %>% filter(!is.na(people_fully_vaccinated_per_hundred), people_fully_vaccinated_per_hundred != 0)

Daily Vaccinations by Country

Daily vaccinations: the number of vaccinations for a given date and country. For further analysis I chose the countries with the highest numbers of immunizations.

b1 <- datasetx %>%
  filter(country %in% c("Germany", "United Kingdom", "United States", "India", "France", "China", "England", "Brazil", "Turkey", "Russia", "Indonesia", "Italy", "Israel"))
## Keep only rows with no missing values in any column
b <- na.omit(b1)
kable(head(b, 5)) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE, font_size = 11)
row  country  iso_code  date        total_vaccinations  people_vaccinated  people_fully_vaccinated  daily_vaccinations_raw  daily_vaccinations  total_vaccinations_per_hundred  people_vaccinated_per_hundred  people_fully_vaccinated_per_hundred  daily_vaccinations_per_million  vaccine_name                                  day  month  year
22   Brazil   BRA       2021-02-06  3401383             3399421            1962                     326477                  199739              1.60                            1.60                           0.00                                 940                             Oxford/AstraZeneca, Pfizer/BioNTech, Sinovac  6    2      2021
23   Brazil   BRA       2021-02-07  3553681             3534004            19677                    152298                  211375              1.67                            1.66                           0.01                                 994                             Oxford/AstraZeneca, Pfizer/BioNTech, Sinovac  7    2      2021
24   Brazil   BRA       2021-02-08  3605538             3579850            25688                    51857                   211604              1.70                            1.68                           0.01                                 996                             Oxford/AstraZeneca, Pfizer/BioNTech, Sinovac  8    2      2021
25   Brazil   BRA       2021-02-09  3820207             3786591            33616                    214669                  218237              1.80                            1.78                           0.02                                 1027                            Oxford/AstraZeneca, Pfizer/BioNTech, Sinovac  9    2      2021
26   Brazil   BRA       2021-02-10  4120332             4069677            50655                    300125                  228375              1.94                            1.91                           0.02                                 1074                            Oxford/AstraZeneca, Pfizer/BioNTech, Sinovac  10   2      2021

The graphs below show that the USA and India lead in the total number of daily immunizations.

Daily vaccinations per million: the ratio (in ppm) between the number of vaccinations and the total population of the country on the given date. The chart below shows that Israel has the highest number of daily vaccinations per million.

USA

As the number of vaccine doses distributed per hundred people rises, the share of distributed doses that have been used rises as well.
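
The report does not define `share_doses_used`. Assuming it is `total_vaccinations` divided by `total_distributed`, the sketch below recomputes it for the first four Alabama rows printed in the `str()` output of the USA dataset:

```r
# First four Alabama rows from the str() output of usadf shown earlier.
total_vaccinations <- c(78134, 84040, 92300, 100567)
total_distributed  <- c(377025, 378975, 435350, 444650)

# Assumed definition: share of distributed doses that have been administered.
share_used <- round(total_vaccinations / total_distributed, 3)
share_used
```

The results, 0.207, 0.222, 0.212 and 0.226, match the `share_doses_used` values listed for those rows, which supports the assumed definition for this sample.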

## Per-state summary of daily vaccinations, ignoring missing and zero values
um <- usa %>%
  filter(!is.na(daily_vaccinations), daily_vaccinations != 0) %>%
  group_by(state) %>%
  summarise(avg_vacc = mean(daily_vaccinations),
            min_vacc = min(daily_vaccinations),
            max_vacc = max(daily_vaccinations),
            total_vacc = sum(daily_vaccinations))
um <- na.omit(um)
kable(head(um)) %>%
  kable_styling("striped", full_width = FALSE, font_size = 11)
state       avg_vacc    min_vacc  max_vacc  total_vacc
Alabama     22671.257   5906      33381     2561852
Alaska      4733.416    2645      9046      534876
Arizona     44656.354   13390     63074     5046168
Arkansas    16587.920   7990      53975     1874435
California  268927.947  75188     494575    30388858
Colorado    38511.115   12934     65974     4351756
## Add daily_vaccination Proportion
daily_vaccination_states<- daily_vaccinations_states %>%
    group_by(date) %>%
    summarise(avg_vaccines = mean(daily_vaccinations), 
              min_vaccines = min(daily_vaccinations),
              max_vaccines = max(daily_vaccinations),
              total_vaccines= sum(daily_vaccinations))%>%
         mutate(total_vaccines_prop = prop.table(total_vaccines))

kable(head(daily_vaccination_states,5))  %>%
 kable_styling(bootstrap_options = "striped", font_size = 11)
date        avg_vaccines  min_vaccines  max_vaccines  total_vaccines  total_vaccines_prop
2021-01-12  NA            NA            NA            NA              NA
2021-01-13  17865.82      0             75188         911157          NA
2021-01-14  16658.35      1648          79496         849576          NA
2021-01-15  18041.10      1560          85553         920096          NA
2021-01-16  17509.67      1887          88381         892993          NA
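
The `NA` entries in this table follow from how R treats missing values: `sum()` and `mean()` return `NA` if any input is `NA`, and `prop.table()` divides by the sum of the whole column, so a single `NA` total turns every proportion into `NA`. The sketch below reproduces the effect on made-up data (`toy` is hypothetical) and shows one possible remedy, dropping the missing rows before summarising:

```r
library(dplyr)

# Toy stand-in for the per-state daily data; values are illustrative only.
toy <- data.frame(
  date = c("d1", "d1", "d2", "d2"),
  daily_vaccinations = c(NA, 100, 200, 300),
  stringsAsFactors = FALSE
)

# Missing value left in: the d1 total is NA, so every proportion is NA.
with_na <- toy %>%
  group_by(date) %>%
  summarise(total_vaccines = sum(daily_vaccinations)) %>%
  mutate(total_vaccines_prop = prop.table(total_vaccines))

# Missing rows removed first: totals and proportions are defined.
without_na <- toy %>%
  filter(!is.na(daily_vaccinations)) %>%
  group_by(date) %>%
  summarise(total_vaccines = sum(daily_vaccinations)) %>%
  mutate(total_vaccines_prop = prop.table(total_vaccines))
```

Note that filtering removes a date entirely when none of its rows has a value, which appears to be the case for 2021-01-12 in the table above.
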
## Daily Vaccination in USA
totalstates<-um %>%
  summarise(Ave = mean(avg_vacc),
            Min = min(min_vacc),
              Max = max(max_vacc))
kable(totalstates)%>%
  kable_styling("striped",  font_size = 11)
Ave Min Max
41007.49 1249 494575

Vaccine

covid_data_fully
## # A tibble: 150 x 3
##    country     people_fully_vaccinated people_fully_vaccinated_per_hundred
##    <chr>                         <int>                               <dbl>
##  1 Afghanistan                   55624                                0.14
##  2 Albania                      187921                                6.53
##  3 Andorra                        4702                                6.09
##  4 Angola                        40195                                0.12
##  5 Anguilla                        783                                5.22
##  6 Argentina                   1629336                                3.61
##  7 Aruba                         28509                               26.7 
##  8 Austria                     1032825                               11.5 
##  9 Azerbaijan                   729173                                7.19
## 10 Bahrain                      617139                               36.3 
## # ... with 140 more rows

Conclusion

The analysis above shows that the USA, China and India have the highest total numbers of immunizations. Israel ranks first in total vaccinations per hundred, followed by the United Arab Emirates and Chile.

India has the highest number of daily vaccinations, followed by the USA. Israel has the highest daily number of immunizations per million.

Oxford/AstraZeneca is the most widely used vaccine. AstraZeneca, Pfizer and Moderna are the vaccines most used on a daily basis. California has the highest number of total vaccinations (31M) and Wyoming the lowest (363K). New Hampshire is the state with the highest number of people vaccinated per hundred (61.16), and Mississippi is the state with the lowest (31.55).
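
The code behind the vaccine ranking is not shown in this report. One possible way to count how widely each vaccine is used, sketched on made-up data (`toy` is hypothetical), is to split the comma-separated vaccine list of each country and tabulate the names. This counts countries that use a vaccine, not doses administered.

```r
# Toy stand-in: one row per country with its comma-separated vaccine list.
toy <- data.frame(
  country  = c("A", "B", "C"),
  vaccines = c("Oxford/AstraZeneca, Pfizer/BioNTech",
               "Oxford/AstraZeneca",
               "Moderna, Pfizer/BioNTech, Oxford/AstraZeneca"),
  stringsAsFactors = FALSE
)

# Split each list into individual vaccine names.
vaccine_list <- strsplit(toy$vaccines, ", ", fixed = TRUE)

# Count in how many countries each vaccine appears, most common first.
vaccine_counts <- sort(table(unlist(vaccine_list)), decreasing = TRUE)
names(vaccine_counts)[1]
```

Tabulating the unsplit `vaccines` column instead would rank combinations of vaccines rather than individual products.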