Your task here is to Create an Example. Using one or more TidyVerse packages, and any dataset from fivethirtyeight.com or Kaggle, create a programming sample “vignette” that demonstrates how to use one or more of the capabilities of the selected TidyVerse package with your selected dataset. (25 points)
For this tidyverse assignment we were to pick a dataset from fivethirtyeight.com or Kaggle and use one of the tidyverse packages to create a vignette. The dataset I will be using is the World Happiness Report from Kaggle.
What is the purrr package?
purrr is a popular R package that provides a consistent and powerful set of tools for working with functions and vectors. It was developed by Hadley Wickham and is part of the tidyverse suite of packages. purrr is an essential package for functional programming in R. According to purrr.tidyverse.org, purrr enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors. If you’ve never heard of FP before, the best place to start is the family of map() functions, which allow you to replace many for loops with code that is both more succinct and easier to read. The best place to learn about the map() functions is the iteration chapter in R for Data Science.
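To make the "replace a for loop" idea concrete, here is a minimal sketch. The `scores` list is a made-up input used only for illustration; it is not part of the happiness dataset.

```r
library(purrr)

# Hypothetical input: three small numeric vectors of different lengths
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = c(5, 5, 5, 5))

# for-loop version: pre-allocate the result, then fill one slot per iteration
loop_means <- numeric(length(scores))
for (i in seq_along(scores)) {
  loop_means[i] <- mean(scores[[i]])
}
names(loop_means) <- names(scores)

# purrr version: map_dbl() applies mean() to each list element
# and returns a named numeric vector
map_means <- map_dbl(scores, mean)

# Both approaches should give the same values
all.equal(loop_means, map_means)
```

The map_dbl() call says what is being computed in a single line, while the loop also has to manage the result vector and its names.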
In the step below I import the world happiness dataset from my GitHub account (https://github.com/jnaval88/DATA607/blob/fc9b840efccb9a4f2743a21e3217acef8cb85cf1/Tidyverse_Assignment/world-happiness-report.csv).
library(tidyverse)  # provides %>%, filter(), select(), and the purrr map functions
worldhappiness <- read.csv(file = "https://raw.githubusercontent.com/jnaval88/DATA607/main/Tidyverse_Assignment/world-happiness-report.csv")
First I will filter the data for a specific year.
worldhappiness2020 <- worldhappiness %>%
  filter(year == 2020)
I filtered the data for the year 2020, which means I will be looking only at information for that year.
For this step I will calculate the average healthy life expectancy at birth for the year 2020.
mean(worldhappiness2020$Healthy.life.expectancy.at.birth, na.rm = TRUE)
## [1] 67.09957
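Now I will use the map_dbl() function from the purrr package on the world happiness dataset filtered to 2020, looking at healthy life expectancy at birth. Because the column is a plain numeric vector, map_dbl() calls mean() on each value separately, so the output below is the original value for each country rather than a single average.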
worldhappiness2020$Healthy.life.expectancy.at.birth %>% map_dbl(mean)
## [1] 69.30 69.20 74.20 73.60 69.70 65.30 72.40 55.10 64.20 68.40 66.80 67.20
## [13] 62.40 54.30 74.00 70.10 69.90 68.30 71.40 74.10 71.30 73.00 66.40 69.10
## [25] 62.30 66.70 69.00 59.50 72.10 74.20 64.10 72.80 58.00 72.80 NA 68.40
## [37] 73.00 60.90 66.60 61.40 72.50 73.70 74.00 50.70 75.20 67.20 65.80 61.30
## [49] NA 64.70 59.50 67.40 68.50 72.20 67.00 68.90 66.40 62.70 68.90 66.50
## [61] 59.60 57.10 72.50 73.60 50.50 65.56 73.40 62.10 70.10 72.80 65.10 66.90
## [73] 69.00 69.50 71.70 57.30 74.20 75.00 72.80 74.70 NA 64.70 58.50 67.60
## [85] 67.50 67.60 56.50 65.20 67.50 72.70 68.10 69.20 66.90 56.30 56.80
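The behaviour seen in this output can be reproduced on a few values. This is a small sketch with invented numbers (the `life_expectancy` vector is hypothetical), contrasting element-wise mapping with a single call to mean() on the whole vector.

```r
library(purrr)

# Hypothetical values standing in for the 2020 column
life_expectancy <- c(69.3, 74.2, NA, 55.1)

# map_dbl() visits one element at a time, so mean() of each
# single value is just that value (NA stays NA)
per_element <- map_dbl(life_expectancy, mean)
identical(per_element, life_expectancy)

# A single average needs mean() applied to the whole vector
overall <- mean(life_expectancy, na.rm = TRUE)
overall
```

Mapping over a vector is therefore most useful when the function does real work on each element; for one summary number, calling the function directly on the vector is enough.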
For this step I use the same map function and extend it to multiple columns of the full dataset (all years).
worldhappiness %>%
  select("Healthy.life.expectancy.at.birth", "Freedom.to.make.life.choices") %>%
  map(~ mean(., na.rm = TRUE))
## $Healthy.life.expectancy.at.birth
## [1] 63.35937
##
## $Freedom.to.make.life.choices
## [1] 0.7425576
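map() always returns a list, which is why the result above prints one `$` entry per column. When every result is a single number, map_dbl() can return a named numeric vector instead. The sketch below uses a small made-up data frame (`toy`, with hypothetical column names) rather than the real data.

```r
library(purrr)

# Hypothetical stand-in for two columns of the happiness data
toy <- data.frame(
  life_expectancy = c(60, 70, NA),
  freedom         = c(0.5, 0.75, 1.0)
)

# A data frame is a list of columns, so map_dbl() visits one column
# at a time and returns one mean per column as a named numeric vector
col_means <- map_dbl(toy, ~ mean(.x, na.rm = TRUE))
col_means
```

A named vector is often easier to pass on to later steps than a list, but map_dbl() raises an error if any result is not a single number, so map() remains the safer choice when the output shape is unknown.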
Below I use the map function a bit more. I split the original data frame by year and fit a linear model of healthy life expectancy at birth on log GDP per capita for each year. I then apply the summary function to the results from each model and use the map function again to obtain the r.squared value for each year.
library(reactable)  # provides reactable() for the interactive table
worldhappiness %>%
  split(.$year) %>%
  map(~ lm(`Healthy.life.expectancy.at.birth` ~ `Log.GDP.per.capita`, data = .)) %>%
  map(summary) %>%
  map_df("r.squared") %>%
  reactable()
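The same split, model, summarise pattern can be tried without downloading anything. The sketch below swaps in the built-in mtcars data as an assumption for illustration: cyl plays the role of year and mpg ~ wt plays the role of the model above. It uses intermediate variables instead of a pipe so each step can be inspected on its own.

```r
library(purrr)

# Split the built-in mtcars data into one data frame per cylinder count
by_cyl <- split(mtcars, mtcars$cyl)

# Fit one linear model per group
models <- map(by_cyl, ~ lm(mpg ~ wt, data = .x))

# Summarise each model, then pull out the "r.squared" element by name;
# map_dbl() returns a numeric vector named after the groups
r_squared <- map_dbl(map(models, summary), "r.squared")
r_squared
```

Passing a string such as "r.squared" to a map function is purrr shorthand for extracting the element with that name from each item.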
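In this vignette I used the map functions from the purrr package in the tidyverse to show how to apply a function to each element of a vector, each column of a data frame, and each piece of a split dataset.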