Data 607: Assignment 2 - Tidyverse Create

Overview

The task is to create an example: using one or more TidyVerse packages and any dataset from fivethirtyeight.com or Kaggle, create a programming sample “vignette” that demonstrates how to use one or more of the capabilities of the selected TidyVerse package with the selected dataset. I picked the ‘COVID-19 World Vaccination Progress’ dataset from Kaggle.

Summary

The worldwide endeavor to create a safe and effective COVID-19 vaccine is bearing fruit. A handful of vaccines have now been authorized around the globe; many more remain in development. The biggest vaccination campaign in history is underway. More than 172 million doses have been administered across 77 countries, according to data collected by Bloomberg. The latest rate was roughly 5.92 million doses a day. To bring this pandemic to an end, a large share of the world needs to be immune to the virus. In this assignment, the Kaggle dataset is used to demonstrate tidyverse capabilities.

Tidyverse

The tidyverse is a coherent system of packages for data manipulation, exploration and visualization that share a common design philosophy. Tidyverse packages are intended to make statisticians and data scientists more productive by guiding them through workflows that facilitate communication, and result in reproducible work products.

Data Wrangling and Transformation
* dplyr
* tidyr 
* stringr
* forcats
Data Import and Management
* tibble
* readr 
Functional Programming
* purrr
Data Visualization and Exploration
* ggplot2

More information on the tidyverse can be found at tidyverse.org.

Data Collection

A data set from Kaggle is used for this assignment.

Load tidyverse packages

library(tidyverse) # Load all "tidyverse" libraries.

## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.6     v dplyr   1.0.3
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

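# Alternatively, load only the individual packages that are needed:
# library(readr)   # Read tabular data.
# library(tidyr)   # Data frame tidying functions.
# library(dplyr)   # General data frame manipulation.
# library(ggplot2) # Flexible plotting.

# viridis is not part of the tidyverse, so it is loaded separately.
library(viridis)   # Viridis color scale.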

## Loading required package: viridisLite

Load Vaccination Progress data

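# Source: https://www.kaggle.com/gpreda/covid-world-vaccination-progress/download
# A copy of the CSV is read from GitHub. The variable is named data_url
# so that it does not shadow the base url() function.
data_url <- 'https://raw.githubusercontent.com/rnivas2028/MSDS/Data607/Tidyverse/country_vaccinations.csv'
country_vaccinations <- read.csv(data_url)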
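Since this vignette is about the tidyverse, the same kind of import could also be done with readr. The following is only a sketch under stated assumptions: it writes a three-row stand-in file (the temporary file and the name `vacc_small` are hypothetical, and only three columns are included) and reads it with read_csv(). With an explicit column specification, `date` is parsed as a Date instead of a character string.

```r
library(readr)

# Hypothetical stand-in for the real CSV: three rows, three columns.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "country,date,total_vaccinations",
  "Albania,2021-01-10,0",
  "Albania,2021-01-11,",
  "Albania,2021-01-12,128"
), tmp)

# An explicit column specification avoids relying on type guessing.
vacc_small <- read_csv(
  tmp,
  col_types = cols(
    country = col_character(),
    date = col_date(format = "%Y-%m-%d"),
    total_vaccinations = col_double()
  )
)

vacc_small
```

The empty field in the second row is read as NA, and the result is a tibble rather than a plain data frame, so it prints differently from the read.csv() output shown in this document.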

Tidyverse capabilities

glimpse

To look at the variable names and types

glimpse(country_vaccinations)

## Rows: 3,081
## Columns: 15
## $ country                             <chr> "Albania", "Albania", "Albania"...
## $ iso_code                            <chr> "ALB", "ALB", "ALB", "ALB", "AL...
## $ date                                <chr> "2021-01-10", "2021-01-11", "20...
## $ total_vaccinations                  <dbl> 0, NA, 128, 188, 266, 308, 369,...
## $ people_vaccinated                   <dbl> 0, NA, 128, 188, 266, 308, 369,...
## $ people_fully_vaccinated             <dbl> NA, NA, NA, NA, NA, NA, NA, NA,...
## $ daily_vaccinations_raw              <dbl> NA, NA, NA, 60, 78, 42, 61, 36,...
## $ daily_vaccinations                  <dbl> NA, 64, 64, 63, 66, 62, 62, 58,...
## $ total_vaccinations_per_hundred      <dbl> 0.00, NA, 0.00, 0.01, 0.01, 0.0...
## $ people_vaccinated_per_hundred       <dbl> 0.00, NA, 0.00, 0.01, 0.01, 0.0...
## $ people_fully_vaccinated_per_hundred <dbl> NA, NA, NA, NA, NA, NA, NA, NA,...
## $ daily_vaccinations_per_million      <dbl> NA, 22, 22, 22, 23, 22, 22, 20,...
## $ vaccines                            <chr> "Pfizer/BioNTech", "Pfizer/BioN...
## $ source_name                         <chr> "Ministry of Health", "Ministry...
## $ source_website                      <chr> "https://shendetesia.gov.al/vak...

summary

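To get an overview of the data set. Note that summary() is a base R function rather than a tidyverse one; it works on any data frame.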

summary(country_vaccinations)

##    country            iso_code             date           total_vaccinations
##  Length:3081        Length:3081        Length:3081        Min.   :       0  
##  Class :character   Class :character   Class :character   1st Qu.:   25588  
##  Mode  :character   Mode  :character   Mode  :character   Median :  160378  
##                                                           Mean   : 1314264  
##                                                           3rd Qu.:  673445  
##                                                           Max.   :50641884  
##                                                           NA's   :1101      
##  people_vaccinated  people_fully_vaccinated daily_vaccinations_raw
##  Min.   :       0   Min.   :       1        Min.   :      0       
##  1st Qu.:   24773   1st Qu.:    6120        1st Qu.:   1901       
##  Median :  142831   Median :   25175        Median :  10672       
##  Mean   : 1098716   Mean   :  318684        Mean   :  72091       
##  3rd Qu.:  568942   3rd Qu.:  139000        3rd Qu.:  54804       
##  Max.   :37056122   Max.   :13082172        Max.   :2231326       
##  NA's   :1438       NA's   :2065            NA's   :1439          
##  daily_vaccinations total_vaccinations_per_hundred
##  Min.   :      1    Min.   : 0.000                
##  1st Qu.:   1218    1st Qu.: 0.480                
##  Median :   6124    Median : 1.975                
##  Mean   :  55768    Mean   : 5.231                
##  3rd Qu.:  28056    3rd Qu.: 4.550                
##  Max.   :1916190    Max.   :72.580                
##  NA's   :121        NA's   :1101                  
##  people_vaccinated_per_hundred people_fully_vaccinated_per_hundred
##  Min.   : 0.000                Min.   : 0.0000                    
##  1st Qu.: 0.500                1st Qu.: 0.0875                    
##  Median : 2.080                Median : 0.5200                    
##  Mean   : 4.607                Mean   : 1.4042                    
##  3rd Qu.: 3.685                3rd Qu.: 1.1425                    
##  Max.   :46.300                Max.   :28.3900                    
##  NA's   :1438                  NA's   :2065                       
##  daily_vaccinations_per_million   vaccines         source_name       
##  Min.   :    0.0                Length:3081        Length:3081       
##  1st Qu.:  345.8                Class :character   Class :character  
##  Median :  952.5                Mode  :character   Mode  :character  
##  Mean   : 2129.8                                                     
##  3rd Qu.: 1787.2                                                     
##  Max.   :30869.0                                                     
##  NA's   :121                                                         
##  source_website    
##  Length:3081       
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
##
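One part of the overview above, the NA counts per column, can also be produced with dplyr. The sketch below uses a small toy data frame (the name `toy` and its values are illustrative, not the full data set) and counts missing values in every column with summarise() and across(), which is available from dplyr 1.0.0 onward.

```r
library(dplyr)

# Toy data frame standing in for country_vaccinations (illustrative values).
toy <- data.frame(
  country = c("Albania", "Albania", "Latvia"),
  total_vaccinations = c(0, NA, NA),
  people_vaccinated = c(0, NA, 4621)
)

# Count the missing values in every column.
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

na_counts
```

Replacing `toy` with `country_vaccinations` should give one NA count per column of the real data.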

select

To get selected columns from a large data set with many columns

country_vaccinations %>%
  select(country, total_vaccinations, people_vaccinated, people_fully_vaccinated) %>%
  head(5)

##   country total_vaccinations people_vaccinated people_fully_vaccinated
## 1 Albania                  0                 0                      NA
## 2 Albania                 NA                NA                      NA
## 3 Albania                128               128                      NA
## 4 Albania                188               188                      NA
## 5 Albania                266               266                      NA
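select() also accepts helper functions that match columns by name pattern, which is handy here because several columns share the `_per_hundred` suffix. A minimal sketch on a toy data frame (the name `toy` is hypothetical; the values are copied from rows shown in this document):

```r
library(dplyr)

# Toy data frame with a subset of the real column names.
toy <- data.frame(
  country = c("Albania", "Germany"),
  total_vaccinations = c(188, 3258881),
  total_vaccinations_per_hundred = c(0.01, 3.89),
  people_vaccinated_per_hundred = c(0.01, 2.71)
)

# Keep country plus every column whose name ends in "_per_hundred".
per_hundred <- toy %>%
  select(country, ends_with("_per_hundred"))

names(per_hundred)
```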

rename

To rename columns in a data set

head(rename(country_vaccinations, 'Total Vaccinations'=total_vaccinations),5)

##   country iso_code       date Total Vaccinations people_vaccinated
## 1 Albania      ALB 2021-01-10                  0                 0
## 2 Albania      ALB 2021-01-11                 NA                NA
## 3 Albania      ALB 2021-01-12                128               128
## 4 Albania      ALB 2021-01-13                188               188
## 5 Albania      ALB 2021-01-14                266               266
##   people_fully_vaccinated daily_vaccinations_raw daily_vaccinations
## 1                      NA                     NA                 NA
## 2                      NA                     NA                 64
## 3                      NA                     NA                 64
## 4                      NA                     60                 63
## 5                      NA                     78                 66
##   total_vaccinations_per_hundred people_vaccinated_per_hundred
## 1                           0.00                          0.00
## 2                             NA                            NA
## 3                           0.00                          0.00
## 4                           0.01                          0.01
## 5                           0.01                          0.01
##   people_fully_vaccinated_per_hundred daily_vaccinations_per_million
## 1                                  NA                             NA
## 2                                  NA                             22
## 3                                  NA                             22
## 4                                  NA                             22
## 5                                  NA                             23
##          vaccines        source_name
## 1 Pfizer/BioNTech Ministry of Health
## 2 Pfizer/BioNTech Ministry of Health
## 3 Pfizer/BioNTech Ministry of Health
## 4 Pfizer/BioNTech Ministry of Health
## 5 Pfizer/BioNTech Ministry of Health
##                                                                       source_website
## 1 https://shendetesia.gov.al/vaksinimi-anticovid-vaksinohen-48-mjeke-dhe-infermiere/
## 2 https://shendetesia.gov.al/vaksinimi-anticovid-vaksinohen-48-mjeke-dhe-infermiere/
## 3 https://shendetesia.gov.al/vaksinimi-anticovid-vaksinohen-48-mjeke-dhe-infermiere/
## 4 https://shendetesia.gov.al/vaksinimi-anticovid-vaksinohen-48-mjeke-dhe-infermiere/
## 5 https://shendetesia.gov.al/vaksinimi-anticovid-vaksinohen-48-mjeke-dhe-infermiere/
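A column name that contains a space, such as the one created above, has to be wrapped in backticks whenever it is used afterwards. The sketch below shows this on a one-row toy data frame (the names `toy`, `renamed` and `upper` are hypothetical) and also shows rename_with(), which renames columns by applying a function to their names.

```r
library(dplyr)

# One-row toy data frame (illustrative).
toy <- data.frame(country = "Albania", total_vaccinations = 128)

# A name containing a space must be wrapped in backticks when used later.
renamed <- toy %>%
  rename(`Total Vaccinations` = total_vaccinations)
renamed %>% select(country, `Total Vaccinations`)

# rename_with() applies a function to the column names instead.
upper <- toy %>% rename_with(toupper)
names(upper)
```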

sample_n

To pick a random sample of rows from the data set

sample_n(country_vaccinations, 5)

##     country iso_code       date total_vaccinations people_vaccinated
## 1 Argentina      ARG 2021-02-02                 NA                NA
## 2    Latvia      LVA 2021-01-06                 NA              4621
## 3   Germany      DEU 2021-02-06            3258881           2274441
## 4  Portugal      PRT 2021-01-22             212000                NA
## 5  Scotland          2021-02-07             877513            866823
##   people_fully_vaccinated daily_vaccinations_raw daily_vaccinations
## 1                      NA                     NA              11475
## 2                      NA                     NA                 NA
## 3                  984440                 101444             114759
## 4                      NA                     NA              15143
## 5                   10690                  27665              41967
##   total_vaccinations_per_hundred people_vaccinated_per_hundred
## 1                             NA                            NA
## 2                             NA                          0.24
## 3                           3.89                          2.71
## 4                           2.08                            NA
## 5                          16.06                         15.87
##   people_fully_vaccinated_per_hundred daily_vaccinations_per_million
## 1                                  NA                            254
## 2                                  NA                             NA
## 3                                1.17                           1370
## 4                                  NA                           1485
## 5                                0.20                           7682
##                                       vaccines                      source_name
## 1                                    Sputnik V               Ministry of Health
## 2 Moderna, Oxford/AstraZeneca, Pfizer/BioNTech          National Health Service
## 3 Moderna, Oxford/AstraZeneca, Pfizer/BioNTech             Robert Koch Institut
## 4                     Moderna, Pfizer/BioNTech          National Health Service
## 5          Oxford/AstraZeneca, Pfizer/BioNTech Government of the United Kingdom
##                                                                                        source_website
## 1 http://datos.salud.gob.ar/dataset/vacunas-contra-covid-19-dosis-aplicadas-en-la-republica-argentina
## 2                                           https://data.gov.lv/dati/eng/dataset/covid19-vakcinacijas
## 3                                                                           https://impfdashboard.de/
## 4                                   https://covid19.min-saude.pt/ponto-de-situacao-atual-em-portugal/
## 5                                                  https://coronavirus.data.gov.uk/details/healthcare
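Because the rows are drawn at random, the sample above changes on every run. In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(); combined with set.seed(), the draw becomes reproducible. A minimal sketch on a toy data frame (the seed value and the names `toy`, `sample_a` and `sample_b` are arbitrary choices for illustration):

```r
library(dplyr)

# Toy data frame with 20 rows (illustrative).
toy <- data.frame(id = 1:20, value = (1:20) * 10)

# Fixing the seed before each draw makes the sample repeatable.
set.seed(607)
sample_a <- slice_sample(toy, n = 5)

set.seed(607)
sample_b <- slice_sample(toy, n = 5)

# Both draws contain the same five rows.
identical(sample_a, sample_b)
```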

group_by

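To split the data set into groups and summarise each group

# Count the rows and average total_vaccinations per country, then keep the
# six countries with the highest mean. The result is stored under a name
# that does not shadow the summarise() function.
by_country <- country_vaccinations %>%
  group_by(country) %>%
  summarise(count = n(),
            country_vaccinations_mean = mean(total_vaccinations, na.rm = TRUE)) %>%
  arrange(desc(country_vaccinations_mean)) %>%
  head()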
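To make the grouping step easier to check by hand, here is the same pattern on a toy data frame with two countries (all names and values are made up for illustration). It also computes the per-country maximum: in the Albania rows shown earlier, total_vaccinations looks like a running total, so the maximum may be a more natural summary than the mean. That is my reading of the data, not something stated by the source.

```r
library(dplyr)

# Toy data: country A has one missing value (illustrative).
toy <- data.frame(
  country = c("A", "A", "A", "B", "B"),
  total_vaccinations = c(100, NA, 300, 10, 30)
)

toy_summary <- toy %>%
  group_by(country) %>%
  summarise(
    count = n(),                                         # rows per country
    mean_total = mean(total_vaccinations, na.rm = TRUE), # NA values ignored
    max_total = max(total_vaccinations, na.rm = TRUE)
  ) %>%
  arrange(desc(mean_total))

toy_summary
```

Country A has three rows, a mean of 200 (the NA is dropped) and a maximum of 300; country B has two rows, a mean of 20 and a maximum of 30.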

ggplot

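To plot the grouped summary, first as a scatter plot and then as a horizontal bar chart ordered by the mean

# Scatter plot: one point per country.
ggplot(by_country, aes(x = country_vaccinations_mean, y = country)) +
  geom_point()

# Bar chart: countries are reordered by their mean and the axes are flipped
# so that the country names are readable.
ggplot(data = by_country,
       aes(x = reorder(country, country_vaccinations_mean), y = country_vaccinations_mean)) +
  geom_bar(stat = "identity", fill = "#FF6600") +
  coord_flip() +
  labs(title = "Average of country vaccinations", x = "Country", y = "Country Vaccinations Mean") +
  geom_text(aes(label = round(country_vaccinations_mean, digits = 2))) +
  theme(plot.title = element_text(hjust = 0.5))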
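The bar chart uses geom_bar(stat = "identity"); ggplot2 also provides geom_col() as a shorthand for exactly that case. The sketch below builds the same kind of ordered, flipped bar chart on a three-row toy data frame (the names `toy` and `p` and the values are made up) so that the effect of reorder() is easy to see.

```r
library(ggplot2)

# Toy summary table (illustrative values).
toy <- data.frame(
  country = c("A", "B", "C"),
  mean_total = c(200, 20, 75)
)

# reorder() sorts the country factor by mean_total, smallest first,
# so after coord_flip() the largest bar ends up at the top.
p <- ggplot(toy, aes(x = reorder(country, mean_total), y = mean_total)) +
  geom_col(fill = "#FF6600") +
  coord_flip() +
  labs(title = "Toy example", x = "Country", y = "Mean total vaccinations")

p  # prints the plot in an interactive session
```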