Week 11 Tidyverse Assignment

Your task here is to create an example. Using one or more TidyVerse packages, and any dataset from fivethirtyeight.com or Kaggle, create a programming sample “vignette” that demonstrates how to use one or more of the capabilities of the selected TidyVerse package with your selected dataset. (25 points)

Introduction

For this tidyverse assignment we were asked to pick a dataset from fivethirtyeight.com or Kaggle and use one of the tidyverse packages to create a vignette. I am using the World Happiness Report dataset from Kaggle, and the package I demonstrate is purrr.

What is the purrr package?

Purrr is a popular R package and part of the tidyverse, developed by Hadley Wickham. According to purrr.tidyverse.org, purrr "enhances R's functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors." If you have never heard of FP before, the best place to start is the family of map() functions. These let you replace many for loops with code that is both more succinct and easier to read. The best place to learn about the map() functions is the iteration chapter in R for Data Science.
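The sketch below illustrates the claim that map() can replace a for loop. It computes the mean of three columns twice, once with a loop and once with map_dbl(). It uses the built-in mtcars data instead of the happiness dataset, an assumption made only so the example runs without a download.

```r
library(purrr)

# Three numeric columns from the built-in mtcars data, stored as a named list
cols <- list(mpg = mtcars$mpg, hp = mtcars$hp, wt = mtcars$wt)

# for-loop version: preallocate a named vector, then fill it one element at a time
loop_means <- numeric(length(cols))
names(loop_means) <- names(cols)
for (i in seq_along(cols)) {
  loop_means[i] <- mean(cols[[i]])
}

# map_dbl() version: one call that returns a named double vector
map_means <- map_dbl(cols, mean)

identical(loop_means, map_means)
```

The map_dbl() version needs no preallocation or index bookkeeping. It also keeps the list names automatically.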

Data Import

In this step I import the World Happiness dataset from my GitHub repository (https://github.com/jnaval88/DATA607/blob/fc9b840efccb9a4f2743a21e3217acef8cb85cf1/Tidyverse_Assignment/world-happiness-report.csv). The code reads the raw version of the same file.

library(tidyverse)  # loads dplyr (filter, %>%) and purrr (map functions)
worldhappiness <- read.csv(file = "https://raw.githubusercontent.com/jnaval88/DATA607/main/Tidyverse_Assignment/world-happiness-report.csv")
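The column names used later, such as Healthy.life.expectancy.at.birth, contain dots. One likely explanation, assuming the raw Kaggle CSV uses spaces in its headers, is that read.csv() repairs column names with make.names() by default. The sketch below shows this behaviour on a tiny made-up CSV string, not on the real file.

```r
# Hypothetical CSV text with space-separated headers; the values are made up
csv_text <- "Log GDP per capita,Healthy life expectancy at birth\n9.5,70.1"

# Default check.names = TRUE: make.names() turns each space into a dot
names(read.csv(text = csv_text))

# check.names = FALSE keeps the headers exactly as written
names(read.csv(text = csv_text, check.names = FALSE))
```

readr::read_csv() from the tidyverse keeps the original names by default. Those names then need backticks in code, for example `Log GDP per capita`.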

Data filtering and mapping

First I will filter the data for a specific year.

worldhappiness2020 <- worldhappiness %>% 
  filter(year == 2020)  # year is numeric, so compare against a number

Filtering on year 2020 keeps only the rows recorded for that year. The next steps therefore look only at 2020 observations.
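Here is a small sketch of the same filter() pattern on the built-in airquality data (May to September 1973), so it runs without the happiness file. It also shows why comparing year to the string '2020' still worked. R coerces the number to character before comparing, which happens to give the same answer here. That coercion can surprise you for values such as 1e5, which becomes "1e+05".

```r
library(dplyr)

# Keep only the May rows; Month is stored as an integer
may <- airquality %>% filter(Month == 5)
nrow(may)  # one row per day in May

# Number vs string: the numbers are coerced to "2019" and "2020" first
c(2019, 2020) == "2020"
# Number vs number: no coercion needed
c(2019, 2020) == 2020
```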

Calculating the Average

In this step I calculate the average healthy life expectancy at birth for 2020. Setting na.rm = TRUE ignores countries with missing values.

mean(worldhappiness2020$Healthy.life.expectancy.at.birth, na.rm = TRUE)

## [1] 67.09957
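The sketch below shows why na.rm = TRUE matters. A single missing value makes mean() return NA. It uses the Ozone column of the built-in airquality data, which contains NAs. The same summary is also written as a dplyr pipeline, which some readers may find closer to the tidyverse style used elsewhere in this vignette.

```r
library(dplyr)

mean(airquality$Ozone)                # NA: the missing values propagate
mean(airquality$Ozone, na.rm = TRUE)  # NAs are dropped before averaging

# Equivalent dplyr version, returning a one-row data frame
ozone_summary <- airquality %>%
  summarise(avg_ozone = mean(Ozone, na.rm = TRUE))
ozone_summary
```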

Purrr map function

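Now I use map_dbl() from purrr on the 2020 healthy life expectancy column. Note that map_dbl() applies mean() to each element of the vector separately. The result below is therefore the original 95 values, including NAs, not a single average. The overall average is the one computed in the previous step.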

worldhappiness2020$Healthy.life.expectancy.at.birth %>% map_dbl(mean)

##  [1] 69.30 69.20 74.20 73.60 69.70 65.30 72.40 55.10 64.20 68.40 66.80 67.20
## [13] 62.40 54.30 74.00 70.10 69.90 68.30 71.40 74.10 71.30 73.00 66.40 69.10
## [25] 62.30 66.70 69.00 59.50 72.10 74.20 64.10 72.80 58.00 72.80    NA 68.40
## [37] 73.00 60.90 66.60 61.40 72.50 73.70 74.00 50.70 75.20 67.20 65.80 61.30
## [49]    NA 64.70 59.50 67.40 68.50 72.20 67.00 68.90 66.40 62.70 68.90 66.50
## [61] 59.60 57.10 72.50 73.60 50.50 65.56 73.40 62.10 70.10 72.80 65.10 66.90
## [73] 69.00 69.50 71.70 57.30 74.20 75.00 72.80 74.70    NA 64.70 58.50 67.60
## [85] 67.50 67.60 56.50 65.20 67.50 72.70 68.10 69.20 66.90 56.30 56.80
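The difference between mapping over elements and summarising the whole vector can be seen on a few values taken from the output above, plus one NA. map_dbl() returns one "mean" per element, each of a single number. Calling mean() once on the whole vector gives the overall average.

```r
library(purrr)

x <- c(69.3, 69.2, NA, 74.2)

# One mean() call per element: the output equals the input
per_element <- map_dbl(x, mean)
per_element

# One mean() call on the whole vector: a single overall average
overall <- mean(x, na.rm = TRUE)
overall
```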

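Next I apply map() across multiple columns. A data frame is a list of columns, so map() calls mean() once per column and returns a list. This step uses the full dataset (all years), not just 2020. That is why the life expectancy average differs from the 2020 value above.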

worldhappiness %>% 
  select(Healthy.life.expectancy.at.birth, Freedom.to.make.life.choices) %>% 
  map(~ mean(.x, na.rm = TRUE))  # one mean per selected column

## $Healthy.life.expectancy.at.birth
## [1] 63.35937
## 
## $Freedom.to.make.life.choices
## [1] 0.7425576
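map() returns a list. If a plain named numeric vector is more convenient, map_dbl() does the same column-wise work. The sketch below uses the built-in airquality data as a stand-in for the happiness columns. It also shows dplyr's summarise(across()), which performs the same column-wise summary and keeps the result as a one-row data frame.

```r
library(dplyr)
library(purrr)

# map_dbl() visits one column at a time and returns a named double vector
col_means <- airquality %>%
  select(Ozone, Solar.R) %>%
  map_dbl(~ mean(.x, na.rm = TRUE))
col_means

# dplyr alternative: same means, returned as a one-row data frame
col_means_df <- airquality %>%
  summarise(across(c(Ozone, Solar.R), ~ mean(.x, na.rm = TRUE)))
col_means_df
```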

Exploring the map function further

Below I use the map functions more extensively. I split the original data frame by year and fit a linear model of healthy life expectancy on log GDP per capita for each year. I then apply summary() to each model and use map_df() to extract the R-squared value for each year. Finally, I display the result as an interactive table with reactable().

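library(reactable)  # interactive HTML tables

worldhappiness %>%  
  split(.$year) %>%                          # one data frame per year
  map(~ lm(Healthy.life.expectancy.at.birth ~ Log.GDP.per.capita, data = .x)) %>%  # one model per year
  map(summary) %>%                           # summary of each model
  map_df("r.squared") %>%                    # collect the R-squared values into a data frame
  reactable()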
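The interactive table is not reproduced here. The sketch below runs the same split / fit / summarise / extract pattern on the built-in mtcars data, regressing mpg on wt within each cylinder group. It uses map_dbl("r.squared") instead of map_df(), which returns a named numeric vector that is easy to print or sort. As far as I know, map_df() is marked as superseded in purrr 1.0, which is one more reason to consider this variant.

```r
library(dplyr)  # provides %>%
library(purrr)

r2_by_cyl <- mtcars %>%
  split(.$cyl) %>%                   # named list of data frames: "4", "6", "8"
  map(~ lm(mpg ~ wt, data = .x)) %>% # one lm object per group
  map(summary) %>%                   # one summary.lm object per group
  map_dbl("r.squared")               # pull the r.squared element by name

r2_by_cyl
```

Passing a string such as "r.squared" to a map function is purrr's shorthand for extracting that named element from each item.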

Conclusion

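Using the purrr package from the tidyverse, I showed how map(), map_dbl(), and map_df() apply a function across the elements of a vector, the columns of a data frame, and per-year subsets of the World Happiness data.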