The task is to create an example: using one or more tidyverse packages and any dataset from fivethirtyeight.com or Kaggle, create a programming sample “vignette” that demonstrates how to use one or more capabilities of the selected tidyverse package with the selected dataset. I picked the ‘COVID-19 World Vaccination Progress’ dataset from Kaggle.
The worldwide endeavor to create a safe and effective COVID-19 vaccine is bearing fruit. A handful of vaccines have now been authorized around the globe; many more remain in development. The biggest vaccination campaign in history is underway. More than 172 million doses have been administered across 77 countries, according to data collected by Bloomberg. The latest rate was roughly 5.92 million doses a day. To bring this pandemic to an end, a large share of the world needs to be immune to the virus. In this assignment, the Kaggle dataset is analyzed to demonstrate tidyverse capabilities.
The tidyverse is a coherent system of packages for data manipulation, exploration and visualization that share a common design philosophy. Tidyverse packages are intended to make statisticians and data scientists more productive by guiding them through workflows that facilitate communication, and result in reproducible work products.
Data Wrangling and Transformation
* dplyr
* tidyr
* stringr
* forcats
Data Import and Management
* tibble
* readr
Functional Programming
* purrr
Data Visualization and Exploration
* ggplot2
More information on tidyverse can be found here
The data set from Kaggle is used for this assignment.
library(tidyverse) # Load all "tidyverse" libraries.
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.6 v dplyr 1.0.3
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
# OR
# library(readr) # Read tabular data.
# library(tidyr) # Data frame tidying functions.
# library(dplyr) # General data frame manipulation.
# library(ggplot2) # Flexible plotting.
library(viridis) # Viridis color scale.
## Loading required package: viridisLite
# Source: https://www.kaggle.com/gpreda/covid-world-vaccination-progress/download
# The variable is named data_url so that it does not shadow base R's url() function.
data_url <- 'https://raw.githubusercontent.com/rnivas2028/MSDS/Data607/Tidyverse/country_vaccinations.csv'
country_vaccinations <- read.csv(data_url)
To look at the variable names and types
glimpse(country_vaccinations)
## Rows: 3,081
## Columns: 15
## $ country <chr> "Albania", "Albania", "Albania"...
## $ iso_code <chr> "ALB", "ALB", "ALB", "ALB", "AL...
## $ date <chr> "2021-01-10", "2021-01-11", "20...
## $ total_vaccinations <dbl> 0, NA, 128, 188, 266, 308, 369,...
## $ people_vaccinated <dbl> 0, NA, 128, 188, 266, 308, 369,...
## $ people_fully_vaccinated <dbl> NA, NA, NA, NA, NA, NA, NA, NA,...
## $ daily_vaccinations_raw <dbl> NA, NA, NA, 60, 78, 42, 61, 36,...
## $ daily_vaccinations <dbl> NA, 64, 64, 63, 66, 62, 62, 58,...
## $ total_vaccinations_per_hundred <dbl> 0.00, NA, 0.00, 0.01, 0.01, 0.0...
## $ people_vaccinated_per_hundred <dbl> 0.00, NA, 0.00, 0.01, 0.01, 0.0...
## $ people_fully_vaccinated_per_hundred <dbl> NA, NA, NA, NA, NA, NA, NA, NA,...
## $ daily_vaccinations_per_million <dbl> NA, 22, 22, 22, 23, 22, 22, 20,...
## $ vaccines <chr> "Pfizer/BioNTech", "Pfizer/BioN...
## $ source_name <chr> "Ministry of Health", "Ministry...
## $ source_website <chr> "https://shendetesia.gov.al/vak...
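The glimpse() output above lists date as <chr>, because base read.csv() does not parse dates. As a minimal sketch, readr's read_csv() can declare column types at import time. The three-row sample file and the name sample_vacc are hypothetical, built from the Albania rows shown above for illustration only; they are not part of the original analysis.

```r
library(readr)

# Hypothetical three-row sample, written to a temporary file so the
# example runs without network access.
sample_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "country,date,total_vaccinations",
  "Albania,2021-01-10,0",
  "Albania,2021-01-11,",
  "Albania,2021-01-12,128"
), sample_csv)

# Declare the column types explicitly; the empty field becomes NA.
sample_vacc <- read_csv(
  sample_csv,
  col_types = cols(
    country = col_character(),
    date = col_date(format = "%Y-%m-%d"),
    total_vaccinations = col_double()
  )
)

class(sample_vacc$date)
```

With a parsed Date column, date arithmetic and date axes in ggplot2 work without further conversion.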
To get an overview of the data set
summary(country_vaccinations)
## country iso_code date total_vaccinations
## Length:3081 Length:3081 Length:3081 Min. : 0
## Class :character Class :character Class :character 1st Qu.: 25588
## Mode :character Mode :character Mode :character Median : 160378
## Mean : 1314264
## 3rd Qu.: 673445
## Max. :50641884
## NA's :1101
## people_vaccinated people_fully_vaccinated daily_vaccinations_raw
## Min. : 0 Min. : 1 Min. : 0
## 1st Qu.: 24773 1st Qu.: 6120 1st Qu.: 1901
## Median : 142831 Median : 25175 Median : 10672
## Mean : 1098716 Mean : 318684 Mean : 72091
## 3rd Qu.: 568942 3rd Qu.: 139000 3rd Qu.: 54804
## Max. :37056122 Max. :13082172 Max. :2231326
## NA's :1438 NA's :2065 NA's :1439
## daily_vaccinations total_vaccinations_per_hundred
## Min. : 1 Min. : 0.000
## 1st Qu.: 1218 1st Qu.: 0.480
## Median : 6124 Median : 1.975
## Mean : 55768 Mean : 5.231
## 3rd Qu.: 28056 3rd Qu.: 4.550
## Max. :1916190 Max. :72.580
## NA's :121 NA's :1101
## people_vaccinated_per_hundred people_fully_vaccinated_per_hundred
## Min. : 0.000 Min. : 0.0000
## 1st Qu.: 0.500 1st Qu.: 0.0875
## Median : 2.080 Median : 0.5200
## Mean : 4.607 Mean : 1.4042
## 3rd Qu.: 3.685 3rd Qu.: 1.1425
## Max. :46.300 Max. :28.3900
## NA's :1438 NA's :2065
## daily_vaccinations_per_million vaccines source_name
## Min. : 0.0 Length:3081 Length:3081
## 1st Qu.: 345.8 Class :character Class :character
## Median : 952.5 Mode :character Mode :character
## Mean : 2129.8
## 3rd Qu.: 1787.2
## Max. :30869.0
## NA's :121
## source_website
## Length:3081
## Class :character
## Mode :character
##
##
##
##
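The NA's rows in the summary above show that several columns are sparsely populated. A possible way to get the same missing-value counts as a tidy one-row table is summarise() combined with across(), sketched here on a small hypothetical tibble (sample_vacc and its values are made up for illustration).

```r
library(dplyr)

# Hypothetical four-row sample with deliberately missing values.
sample_vacc <- tibble(
  country = c("Albania", "Albania", "Albania", "Albania"),
  total_vaccinations = c(0, NA, 128, 188),
  people_fully_vaccinated = c(NA_real_, NA, NA, NA)
)

# Count the missing values in every column.
na_counts <- sample_vacc %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

na_counts
```

Applied to country_vaccinations, the same call should reproduce the NA's counts reported by summary() for the numeric columns.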
To select specific columns from a data set with many columns
head(country_vaccinations %>% select(country, total_vaccinations, people_vaccinated, people_fully_vaccinated), 5)
## country total_vaccinations people_vaccinated people_fully_vaccinated
## 1 Albania 0 0 NA
## 2 Albania NA NA NA
## 3 Albania 128 128 NA
## 4 Albania 188 188 NA
## 5 Albania 266 266 NA
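Besides listing columns by name, select() also accepts selection helpers that match column names by pattern. The sketch below picks every column ending in "per_hundred"; the tibble sample_vacc is a hypothetical stand-in that reuses a few column names from the data set.

```r
library(dplyr)

# Hypothetical sample reusing column names from the vaccination data.
sample_vacc <- tibble(
  country = c("Albania", "Albania"),
  total_vaccinations = c(128, 188),
  total_vaccinations_per_hundred = c(0.00, 0.01),
  people_vaccinated_per_hundred = c(0.00, 0.01)
)

# Keep country plus every column whose name ends in "per_hundred".
per_hundred <- sample_vacc %>%
  select(country, ends_with("per_hundred"))

names(per_hundred)
```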
To rename columns in a data set
head(rename(country_vaccinations, 'Total Vaccinations' = total_vaccinations), 5)
## country iso_code date Total Vaccinations people_vaccinated
## 1 Albania ALB 2021-01-10 0 0
## 2 Albania ALB 2021-01-11 NA NA
## 3 Albania ALB 2021-01-12 128 128
## 4 Albania ALB 2021-01-13 188 188
## 5 Albania ALB 2021-01-14 266 266
## people_fully_vaccinated daily_vaccinations_raw daily_vaccinations
## 1 NA NA NA
## 2 NA NA 64
## 3 NA NA 64
## 4 NA 60 63
## 5 NA 78 66
## total_vaccinations_per_hundred people_vaccinated_per_hundred
## 1 0.00 0.00
## 2 NA NA
## 3 0.00 0.00
## 4 0.01 0.01
## 5 0.01 0.01
## people_fully_vaccinated_per_hundred daily_vaccinations_per_million
## 1 NA NA
## 2 NA 22
## 3 NA 22
## 4 NA 22
## 5 NA 23
## vaccines source_name
## 1 Pfizer/BioNTech Ministry of Health
## 2 Pfizer/BioNTech Ministry of Health
## 3 Pfizer/BioNTech Ministry of Health
## 4 Pfizer/BioNTech Ministry of Health
## 5 Pfizer/BioNTech Ministry of Health
## source_website
## 1 https://shendetesia.gov.al/vaksinimi-anticovid-vaksinohen-48-mjeke-dhe-infermiere/
## 2 https://shendetesia.gov.al/vaksinimi-anticovid-vaksinohen-48-mjeke-dhe-infermiere/
## 3 https://shendetesia.gov.al/vaksinimi-anticovid-vaksinohen-48-mjeke-dhe-infermiere/
## 4 https://shendetesia.gov.al/vaksinimi-anticovid-vaksinohen-48-mjeke-dhe-infermiere/
## 5 https://shendetesia.gov.al/vaksinimi-anticovid-vaksinohen-48-mjeke-dhe-infermiere/
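rename() changes one column at a time. When many columns need the same treatment, rename_with() applies a function to the selected names instead. A minimal sketch on a hypothetical tibble (sample_vacc is made up for illustration), replacing underscores with spaces:

```r
library(dplyr)

# Hypothetical sample reusing column names from the vaccination data.
sample_vacc <- tibble(
  country = c("Albania", "Albania"),
  total_vaccinations = c(128, 188),
  people_vaccinated = c(128, 188)
)

# Apply the same renaming function to every column name.
renamed <- sample_vacc %>%
  rename_with(~ gsub("_", " ", .x), .cols = everything())

names(renamed)
```

Names containing spaces are non-syntactic in R, so they have to be wrapped in backticks when referenced later, for example renamed$`total vaccinations`.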
To pick a random sample from the data set
head(sample_n(country_vaccinations, 5))
## country iso_code date total_vaccinations people_vaccinated
## 1 Argentina ARG 2021-02-02 NA NA
## 2 Latvia LVA 2021-01-06 NA 4621
## 3 Germany DEU 2021-02-06 3258881 2274441
## 4 Portugal PRT 2021-01-22 212000 NA
## 5 Scotland 2021-02-07 877513 866823
## people_fully_vaccinated daily_vaccinations_raw daily_vaccinations
## 1 NA NA 11475
## 2 NA NA NA
## 3 984440 101444 114759
## 4 NA NA 15143
## 5 10690 27665 41967
## total_vaccinations_per_hundred people_vaccinated_per_hundred
## 1 NA NA
## 2 NA 0.24
## 3 3.89 2.71
## 4 2.08 NA
## 5 16.06 15.87
## people_fully_vaccinated_per_hundred daily_vaccinations_per_million
## 1 NA 254
## 2 NA NA
## 3 1.17 1370
## 4 NA 1485
## 5 0.20 7682
## vaccines source_name
## 1 Sputnik V Ministry of Health
## 2 Moderna, Oxford/AstraZeneca, Pfizer/BioNTech National Health Service
## 3 Moderna, Oxford/AstraZeneca, Pfizer/BioNTech Robert Koch Institut
## 4 Moderna, Pfizer/BioNTech National Health Service
## 5 Oxford/AstraZeneca, Pfizer/BioNTech Government of the United Kingdom
## source_website
## 1 http://datos.salud.gob.ar/dataset/vacunas-contra-covid-19-dosis-aplicadas-en-la-republica-argentina
## 2 https://data.gov.lv/dati/eng/dataset/covid19-vakcinacijas
## 3 https://impfdashboard.de/
## 4 https://covid19.min-saude.pt/ponto-de-situacao-atual-em-portugal/
## 5 https://coronavirus.data.gov.uk/details/healthcare
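A random sample differs on every run unless the random seed is fixed. The sketch below sets a seed before sampling and uses slice_sample(), which dplyr 1.0.0 introduced as the successor to sample_n(). The tibble and the seed value 607 are arbitrary choices for illustration.

```r
library(dplyr)

# Hypothetical sample of ten rows.
sample_vacc <- tibble(
  id = 1:10,
  daily_vaccinations = seq(100, 1000, by = 100)
)

# Fixing the seed makes the random draw reproducible.
set.seed(607)
picked <- sample_vacc %>% slice_sample(n = 3)

# Resetting the same seed gives the same rows again.
set.seed(607)
picked_again <- sample_vacc %>% slice_sample(n = 3)

identical(picked, picked_again)
```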
To group the data set by country and summarise each group
by_country <- group_by(country_vaccinations, country)
# The result is named country_summary so that it does not shadow dplyr's summarise().
country_summary <- summarise(by_country, count = n(),
  country_vaccinations_mean = mean(total_vaccinations, na.rm = TRUE))
# Keep the six countries with the highest mean.
top_countries <- head(country_summary %>% arrange(desc(country_vaccinations_mean)))
ggplot(top_countries, aes(x = country_vaccinations_mean, y = country)) + geom_point()
ggplot(data = top_countries, aes(x = reorder(country, country_vaccinations_mean), y = country_vaccinations_mean)) +
  geom_bar(stat = "identity", fill = "#FF6600") + coord_flip() +
  labs(title = "Average of country vaccinations", x = "Country", y = "Country Vaccinations Mean") +
  geom_text(aes(label = round(country_vaccinations_mean, digits = 2))) +
  theme(plot.title = element_text(hjust = 0.5))
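The grouping, summarising and sorting steps can also be chained into a single pipeline, which avoids the intermediate variables. The following is a sketch on hypothetical data with two made-up countries, not a rerun of the analysis above; geom_col() is used as shorthand for geom_bar(stat = "identity").

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: two made-up countries, one missing value.
sample_vacc <- tibble(
  country = c("A", "A", "A", "B", "B"),
  total_vaccinations = c(10, 20, NA, 100, 300)
)

# Group, summarise and sort in one pipeline.
country_means <- sample_vacc %>%
  group_by(country) %>%
  summarise(
    count = n(),
    mean_total = mean(total_vaccinations, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(mean_total))

# geom_col() draws bars from the values as given.
p <- ggplot(country_means, aes(x = reorder(country, mean_total), y = mean_total)) +
  geom_col(fill = "#FF6600") +
  coord_flip() +
  labs(x = "Country", y = "Mean total vaccinations")

country_means
```

Note that n() counts every row in a group, including rows where total_vaccinations is NA, while mean() with na.rm = TRUE ignores those rows.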