Introduction
For my final project, I decided to analyze COVID-19 vaccination, one of the most discussed topics at the moment. I will try to answer the questions below:
In which country is the vaccination program most advanced?
Where are the most people vaccinated per day, both in absolute numbers and as a percentage of the entire population?
Which vaccine or combination of vaccines is used most?
I also want to review vaccination in the USA across the states and answer the following questions:
Which states are the most and least vaccinated?
Which states have the highest number of people vaccinated per hundred?
Data Gathering
The datasets I want to analyze are collected daily from Our World in Data. Links to the data sources are below:
https://www.kaggle.com/gpreda/covid-world-vaccination-progress?select=country_vaccinations.csv
https://www.kaggle.com/paultimothymooney/usa-covid19-vaccinations/code
Exploratory Analysis
Country Vaccinations
dim(datasetx)
## [1] 17607 15
str(datasetx)
## 'data.frame': 17607 obs. of 15 variables:
## $ country : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ iso_code : chr "AFG" "AFG" "AFG" "AFG" ...
## $ date : chr "2021-02-22" "2021-02-23" "2021-02-24" "2021-02-25" ...
## $ total_vaccinations : int 0 NA NA NA NA NA 8200 NA NA NA ...
## $ people_vaccinated : int 0 NA NA NA NA NA 8200 NA NA NA ...
## $ people_fully_vaccinated : int NA NA NA NA NA NA NA NA NA NA ...
## $ daily_vaccinations_raw : int NA NA NA NA NA NA NA NA NA NA ...
## $ daily_vaccinations : int NA 1367 1367 1367 1367 1367 1367 1580 1794 2008 ...
## $ total_vaccinations_per_hundred : num 0 NA NA NA NA NA 0.02 NA NA NA ...
## $ people_vaccinated_per_hundred : num 0 NA NA NA NA NA 0.02 NA NA NA ...
## $ people_fully_vaccinated_per_hundred: num NA NA NA NA NA NA NA NA NA NA ...
## $ daily_vaccinations_per_million : int NA 35 35 35 35 35 35 41 46 52 ...
## $ vaccines : chr "Oxford/AstraZeneca, Pfizer/BioNTech, Sinopharm/Beijing" "Oxford/AstraZeneca, Pfizer/BioNTech, Sinopharm/Beijing" "Oxford/AstraZeneca, Pfizer/BioNTech, Sinopharm/Beijing" "Oxford/AstraZeneca, Pfizer/BioNTech, Sinopharm/Beijing" ...
## $ source_name : chr "World Health Organization" "World Health Organization" "World Health Organization" "World Health Organization" ...
## $ source_website : chr "https://covid19.who.int/" "https://covid19.who.int/" "https://covid19.who.int/" "https://covid19.who.int/" ...
## Total number of countries present
count(distinct(datasetx, country))
## n
## 1 211
## [1] "2020-12-02"
## [1] "2021-05-13"
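The two dates above appear to be the first and last dates in the dataset, but the code that produced them is not shown. A minimal sketch of how such a range could be computed, using a small made-up data frame (`vacc` and its rows are hypothetical stand-ins, not the real data):

```r
# Hypothetical stand-in for the country dataset; `date` is stored as
# character in the real data, as the str() output shows.
vacc <- data.frame(
  country = c("A", "A", "B"),
  date = c("2021-02-22", "2021-02-23", "2020-12-02"),
  stringsAsFactors = FALSE
)

# Convert to Date so that comparisons are chronological
vacc$date <- as.Date(vacc$date)

# range() returns the earliest and the latest date
date_range <- range(vacc$date)
print(date_range)
```

Because the dates are in ISO format (YYYY-MM-DD), `min()` and `max()` on the character column would give the same answer, but converting to `Date` makes later date arithmetic safer.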
USA Vaccination
dim(usadf)
## [1] 7433 14
str(usadf)
## 'data.frame': 7433 obs. of 14 variables:
## $ date : chr "2021-01-12" "2021-01-13" "2021-01-14" "2021-01-15" ...
## $ location : chr "Alabama" "Alabama" "Alabama" "Alabama" ...
## $ total_vaccinations : num 78134 84040 92300 100567 NA ...
## $ total_distributed : num 377025 378975 435350 444650 NA ...
## $ people_vaccinated : num 70861 74792 80480 86956 NA ...
## $ people_fully_vaccinated_per_hundred: num 0.15 0.19 NA 0.28 NA NA NA 0.33 0.37 0.44 ...
## $ total_vaccinations_per_hundred : num 1.59 1.71 1.88 2.05 NA NA NA 2.67 2.84 3.38 ...
## $ people_fully_vaccinated : num 7270 9245 NA 13488 NA ...
## $ people_vaccinated_per_hundred : num 1.45 1.53 1.64 1.77 NA NA NA 2.33 2.47 2.95 ...
## $ distributed_per_hundred : num 7.69 7.73 8.88 9.07 NA ...
## $ daily_vaccinations_raw : num NA 5906 8260 8267 7557 ...
## $ daily_vaccinations : num NA 5906 7083 7478 7498 ...
## $ daily_vaccinations_per_million : num NA 1205 1445 1525 1529 ...
## $ share_doses_used : num 0.207 0.222 0.212 0.226 NA NA NA 0.294 0.288 0.336 ...
## [1] "2020-12-20"
## [1] "2021-05-05"
Data Preparation and Cleaning
Country Vaccination
## Dropping unnecessary columns: 'source_name', 'source_website'
datasetx <- datasetx[ -c(14,15) ]
## Rename vaccines to vaccine_name
datasetx <- datasetx %>%
rename(vaccine_name = vaccines)
USA Vaccination
## Total number of locations present
count(distinct(usadf, location))
## n
## 1 65
## Remove locations that are not states (territories, federal agencies, the national total) from the USA dataset
non_states <- c('American Samoa', 'Bureau of Prisons', 'Dept of Defense', 'District of Columbia', 'Federated States of Micronesia', 'Guam', 'Indian Health Svc', 'Long Term Care', 'Marshall Islands', 'Northern Mariana Islands', 'Republic of Palau', 'United States', 'Veterans Health', 'Virgin Islands')
usa1 <- usadf %>% filter(!location %in% non_states)
## Rename location to state
usa <- usa1 %>%
rename(state = location)
Total vaccinations
Total number of vaccinations: the absolute number of immunizations administered in the country.
## Remove NA and zero values from 'total_vaccinations'
total_vaccination <- datasetx %>% group_by(country) %>% filter(!is.na(total_vaccinations), total_vaccinations != 0)
## Add vaccination proportion
total_vaccinations<- total_vaccination %>%
group_by(date) %>%
summarise(avg_vaccines = mean(total_vaccinations),
min_vaccines = min(total_vaccinations),
max_vaccines = max(total_vaccinations),
total_vaccines= sum(total_vaccinations))%>%
mutate(total_vaccines_prop = prop.table(total_vaccines))
kable(head(total_vaccinations))%>%
kable_styling("striped","hold_position", full_width = F)%>% kable_styling(bootstrap_options = "striped", font_size = 11)
| date | avg_vaccines | min_vaccines | max_vaccines | total_vaccines | total_vaccines_prop |
|---|---|---|---|---|---|
| 2020-12-02 | 1 | 1 | 1 | 1 | 0 |
| 2020-12-03 | 1 | 1 | 1 | 1 | 0 |
| 2020-12-04 | 2 | 2 | 2 | 2 | 0 |
| 2020-12-05 | 2 | 2 | 2 | 2 | 0 |
| 2020-12-06 | 2 | 2 | 2 | 2 | 0 |
| 2020-12-07 | 4 | 4 | 4 | 4 | 0 |
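The cleaning step above removes missing and zero values before summarising. A minimal sketch of why comparing against the string 'NA' only works indirectly, and how an explicit `is.na()` test expresses the same intent (the vector `x` is a hypothetical stand-in for the `total_vaccinations` column):

```r
# Hypothetical stand-in for the total_vaccinations column
x <- c(0L, NA, 8200L, NA, 12000L)

# Comparing with the string 'NA' does not test for missingness:
# missing entries compare to NA, not FALSE. dplyr::filter() drops rows
# whose condition is NA, so the rows disappear only as a side effect.
cmp <- x != 'NA'
print(cmp)

# Explicit version: keep entries that are neither missing nor zero
keep <- !is.na(x) & x != 0
cleaned <- x[keep]
print(cleaned)
```

The explicit form behaves the same way in base R subsetting and in `filter()`, which makes the intent easier to check.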
## Remove NA and zero values from 'daily_vaccinations_per_million'
daily_vaccinations <- datasetx %>% group_by(country) %>% filter(!is.na(daily_vaccinations_per_million), daily_vaccinations_per_million != 0)
## Add daily vaccination proportion
daily_vaccination<- daily_vaccinations %>%
group_by(date) %>%
summarise(avg_vaccines = mean(daily_vaccinations_per_million),
min_vaccines = min(daily_vaccinations_per_million),
max_vaccines = max(daily_vaccinations_per_million),
total_vaccines= sum(daily_vaccinations_per_million))%>%
mutate(total_vaccines_prop = prop.table(total_vaccines))
kable(head(daily_vaccination,5)) %>%
kable_styling(bootstrap_options = "striped", font_size = 11)
| date | avg_vaccines | min_vaccines | max_vaccines | total_vaccines | total_vaccines_prop |
|---|---|---|---|---|---|
| 2020-12-15 | 19.00000 | 19 | 19 | 19 | 4.0e-07 |
| 2020-12-16 | 64.33333 | 23 | 130 | 193 | 3.7e-06 |
| 2020-12-17 | 72.33333 | 23 | 130 | 217 | 4.1e-06 |
| 2020-12-18 | 75.66667 | 23 | 130 | 227 | 4.3e-06 |
| 2020-12-19 | 72.00000 | 23 | 130 | 216 | 4.1e-06 |
Total Vaccinations by Country
Total number of vaccinations: the absolute number of immunizations administered in the country. For the visualization below, I took the latest available total for each country. The first graph shows the countries with the largest number of immunizations and the second graph shows the countries with the smallest number.
Total vaccinations per hundred: the ratio (in percent) between the number of vaccinations and the total population of the country up to that date.
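The code for picking the latest figure per country is not shown, so the following is only a sketch of one way to do it in base R, together with the per-hundred ratio. The data frame `vacc` and its `population` column are hypothetical; the real dataset ships the per-hundred values precomputed and has no population column.

```r
# Hypothetical toy data: cumulative totals reported on some dates only
vacc <- data.frame(
  country = c("A", "A", "A", "B", "B"),
  date = as.Date(c("2021-05-01", "2021-05-02", "2021-05-03",
                   "2021-05-01", "2021-05-02")),
  total_vaccinations = c(100, NA, 300, 50, 80),
  population = c(1000, 1000, 1000, 200, 200),  # assumed, for illustration
  stringsAsFactors = FALSE
)

# Drop rows without a reported total, then sort by country and date
vacc <- vacc[!is.na(vacc$total_vaccinations), ]
vacc <- vacc[order(vacc$country, vacc$date), ]

# Keep the last (most recent) row for each country
latest <- vacc[!duplicated(vacc$country, fromLast = TRUE), ]

# Ratio in percent between vaccinations and total population
latest$per_hundred <- latest$total_vaccinations / latest$population * 100

# Rank countries by absolute total, largest first
latest <- latest[order(-latest$total_vaccinations), ]
print(latest)
```

In this toy example country A leads in absolute terms, while country B leads per hundred, which is why the two rankings are worth reporting separately.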
People vaccinations
Total number of people fully vaccinated per hundred: the ratio (in percent) between the fully immunized population and the total population of the country up to that date.
Information on fully vaccinated people for China and the United Arab Emirates is missing from the data table.
people_fully_vaccinated_per_hundred <- top_countries %>% group_by(country) %>% filter(!is.na(people_fully_vaccinated_per_hundred), people_fully_vaccinated_per_hundred != 0)
Daily Vaccinations by Country
Daily vaccinations: for a given data entry, the number of vaccinations for that date and country. For further analysis, I chose the countries with the highest numbers of immunizations.
b1 <- datasetx %>%
filter(country %in% c("Germany", "United Kingdom", "United States", "India", "France", "China", "England", "Brazil", "Turkey", "Russia", "Indonesia", "Italy", "Israel"))
b<- na.omit(b1)
kable(head(b,5)) %>%
kable_styling(bootstrap_options = "striped",full_width = F, font_size = 11)
| | country | iso_code | date | total_vaccinations | people_vaccinated | people_fully_vaccinated | daily_vaccinations_raw | daily_vaccinations | total_vaccinations_per_hundred | people_vaccinated_per_hundred | people_fully_vaccinated_per_hundred | daily_vaccinations_per_million | vaccine_name | day | month | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 22 | Brazil | BRA | 2021-02-06 | 3401383 | 3399421 | 1962 | 326477 | 199739 | 1.60 | 1.60 | 0.00 | 940 | Oxford/AstraZeneca, Pfizer/BioNTech, Sinovac | 6 | 2 | 2021 |
| 23 | Brazil | BRA | 2021-02-07 | 3553681 | 3534004 | 19677 | 152298 | 211375 | 1.67 | 1.66 | 0.01 | 994 | Oxford/AstraZeneca, Pfizer/BioNTech, Sinovac | 7 | 2 | 2021 |
| 24 | Brazil | BRA | 2021-02-08 | 3605538 | 3579850 | 25688 | 51857 | 211604 | 1.70 | 1.68 | 0.01 | 996 | Oxford/AstraZeneca, Pfizer/BioNTech, Sinovac | 8 | 2 | 2021 |
| 25 | Brazil | BRA | 2021-02-09 | 3820207 | 3786591 | 33616 | 214669 | 218237 | 1.80 | 1.78 | 0.02 | 1027 | Oxford/AstraZeneca, Pfizer/BioNTech, Sinovac | 9 | 2 | 2021 |
| 26 | Brazil | BRA | 2021-02-10 | 4120332 | 4069677 | 50655 | 300125 | 228375 | 1.94 | 1.91 | 0.02 | 1074 | Oxford/AstraZeneca, Pfizer/BioNTech, Sinovac | 10 | 2 | 2021 |
From the graphs below, we can see that the USA and India are the leading countries by number of daily immunizations.
Daily vaccinations per million: the ratio (in ppm) between the number of vaccinations and the total population of the country for the current date. From the chart below, we can tell that Israel administers the highest number of daily vaccinations per million.
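As a worked example of the per-million ratio, the sketch below takes the `daily_vaccinations` value of 1367 from the Afghanistan rows in the `str(datasetx)` output. The population figure is an assumption (roughly 38.9 million) and is not part of the dataset.

```r
# Value taken from the Afghanistan rows shown in str(datasetx)
daily_vaccinations <- 1367

# Assumed population, for illustration only; not a column in the dataset
population <- 38928341

# Daily vaccinations per million inhabitants
per_million <- daily_vaccinations / population * 1e6
print(round(per_million))  # 35, matching daily_vaccinations_per_million
```

Normalizing by population is what lets a small country such as Israel rank first even though its absolute daily numbers are far below those of the USA or India.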
USA
As the number of vaccines distributed per hundred people rises, the share of doses used rises as well.
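The `share_doses_used` column is not defined in the text. Judging from the first Alabama rows in the `str(usadf)` output, it appears to equal `total_vaccinations` divided by `total_distributed`; this is an inference from the values shown, not a documented definition. A quick check with those rows:

```r
# First two Alabama rows, copied from the str(usadf) output above
alabama <- data.frame(
  date = c("2021-01-12", "2021-01-13"),
  total_vaccinations = c(78134, 84040),
  total_distributed = c(377025, 378975),
  stringsAsFactors = FALSE
)

# Share of distributed doses that were actually administered
alabama$share <- round(alabama$total_vaccinations / alabama$total_distributed, 3)
print(alabama$share)  # 0.207 0.222, as in share_doses_used
```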
um <- usa %>% filter(!is.na(daily_vaccinations), daily_vaccinations != 0) %>%
group_by(state) %>%
summarise(avg_vacc = mean(daily_vaccinations),
min_vacc = min(daily_vaccinations),
max_vacc = max(daily_vaccinations),
total_vacc= sum(daily_vaccinations))
um<- na.omit(um)
kable(head(um))%>%
kable_styling("striped", full_width = F, font_size = 11)
| state | avg_vacc | min_vacc | max_vacc | total_vacc |
|---|---|---|---|---|
| Alabama | 22671.257 | 5906 | 33381 | 2561852 |
| Alaska | 4733.416 | 2645 | 9046 | 534876 |
| Arizona | 44656.354 | 13390 | 63074 | 5046168 |
| Arkansas | 16587.920 | 7990 | 53975 | 1874435 |
| California | 268927.947 | 75188 | 494575 | 30388858 |
| Colorado | 38511.115 | 12934 | 65974 | 4351756 |
## Add daily_vaccination Proportion
daily_vaccination_states<- daily_vaccinations_states %>%
group_by(date) %>%
summarise(avg_vaccines = mean(daily_vaccinations),
min_vaccines = min(daily_vaccinations),
max_vaccines = max(daily_vaccinations),
total_vaccines= sum(daily_vaccinations))%>%
mutate(total_vaccines_prop = prop.table(total_vaccines))
kable(head(daily_vaccination_states,5)) %>%
kable_styling(bootstrap_options = "striped", font_size = 11)
| date | avg_vaccines | min_vaccines | max_vaccines | total_vaccines | total_vaccines_prop |
|---|---|---|---|---|---|
| 2021-01-12 | NA | NA | NA | NA | NA |
| 2021-01-13 | 17865.82 | 0 | 75188 | 911157 | NA |
| 2021-01-14 | 16658.35 | 1648 | 79496 | 849576 | NA |
| 2021-01-15 | 18041.10 | 1560 | 85553 | 920096 | NA |
| 2021-01-16 | 17509.67 | 1887 | 88381 | 892993 | NA |
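The `total_vaccines_prop` column in the table above is NA on every row. A likely cause is that the first date has only missing values, so its sum is NA and the grand total used by `prop.table()` becomes NA as well. A minimal sketch with made-up numbers (the data frame `daily` is hypothetical):

```r
# Hypothetical daily data: the first date has no reported values
daily <- data.frame(
  date = c("2021-01-12", "2021-01-12", "2021-01-13", "2021-01-13",
           "2021-01-14", "2021-01-14"),
  daily_vaccinations = c(NA, NA, 100, 300, 200, 400),
  stringsAsFactors = FALSE
)

# Per-date totals; one NA total is enough to poison the proportions
totals <- tapply(daily$daily_vaccinations, daily$date, sum)
prop_with_na <- prop.table(totals)
print(prop_with_na)

# Dropping missing rows first gives usable proportions
clean <- daily[!is.na(daily$daily_vaccinations), ]
totals_clean <- tapply(clean$daily_vaccinations, clean$date, sum)
prop_clean <- prop.table(totals_clean)
print(prop_clean)
```

Filtering out the missing rows before grouping, as was done for the country dataset, would avoid the all-NA column here too.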
## Daily Vaccination in USA
totalstates<-um %>%
summarise(Ave = mean(avg_vacc),
Min = min(min_vacc),
Max = max(max_vacc))
kable(totalstates)%>%
kable_styling("striped", font_size = 11)
| Ave | Min | Max |
|---|---|---|
| 41007.49 | 1249 | 494575 |
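To answer which state has the most or fewest vaccinations, the per-state summary can be searched for its extremes. The sketch below uses only the six states printed in the `um` table earlier, so its "lowest" state applies to that excerpt and not to the full dataset.

```r
# total_vacc values copied from the first six rows of the `um` table
um_head <- data.frame(
  state = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  total_vacc = c(2561852, 534876, 5046168, 1874435, 30388858, 4351756),
  stringsAsFactors = FALSE
)

# which.max() and which.min() return the row index of the extreme value
most <- um_head$state[which.max(um_head$total_vacc)]
least <- um_head$state[which.min(um_head$total_vacc)]
print(c(most = most, least = least))
```

The same two calls applied to the full `um` data frame would give the state-level extremes for the whole country.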
Vaccine
covid_data_fully
## # A tibble: 150 x 3
## country people_fully_vaccinated people_fully_vaccinated_per_hundred
## <chr> <int> <dbl>
## 1 Afghanistan 55624 0.14
## 2 Albania 187921 6.53
## 3 Andorra 4702 6.09
## 4 Angola 40195 0.12
## 5 Anguilla 783 5.22
## 6 Argentina 1629336 3.61
## 7 Aruba 28509 26.7
## 8 Austria 1032825 11.5
## 9 Azerbaijan 729173 7.19
## 10 Bahrain 617139 36.3
## # ... with 140 more rows
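For the question of which vaccines are used most, the `vaccine_name` column stores each country's vaccines as one comma-separated string, so the names have to be split before counting. The code used for this is not shown; the following is a sketch on a hypothetical three-country data frame that counts how many countries use each vaccine.

```r
# Hypothetical data in the same format as the vaccine_name column
vacc <- data.frame(
  country = c("A", "B", "C"),
  vaccine_name = c("Oxford/AstraZeneca, Pfizer/BioNTech",
                   "Oxford/AstraZeneca",
                   "Moderna, Pfizer/BioNTech"),
  stringsAsFactors = FALSE
)

# Split each combination on ", " and flatten into one vector of names
parts <- unlist(strsplit(vacc$vaccine_name, ", ", fixed = TRUE))

# Count in how many countries each vaccine appears, most common first
counts <- sort(table(parts), decreasing = TRUE)
print(counts)
```

Counting the unsplit strings with `table(vacc$vaccine_name)` would instead rank whole combinations, which answers a different question.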
Conclusion
From the analysis above, we can tell that the USA, China and India are the countries with the highest total immunizations. Israel is in first place by total vaccinations per hundred, followed by the United Arab Emirates and Chile.
India has the highest number of daily vaccinations, followed by the USA. Israel has the highest daily number of immunizations per million.
Oxford/AstraZeneca is the most used vaccine. AstraZeneca, Pfizer and Moderna are the most used vaccines on a daily basis. California has the highest number of total vaccinations (31M) and Wyoming the lowest (363K). New Hampshire is the state with the highest number of people vaccinated per hundred (61.16) and Mississippi is the state with the lowest (31.55).