The ask: show the usage of one or more tidyverse packages, using a data set from either Kaggle or FiveThirtyEight (538).

We took the data from two Kaggle data sets: the World Happiness Report and the Spotify Top 100 of 2018.

The World Happiness Report data comes as three CSV files (2015, 2016 and 2017) and the Spotify data as one CSV file. We use list.files() from base R to collect every CSV file whose name matches a pattern, then load them with purrr's map_df(), passing it readr's read_csv() so that each file is read and the results are combined into the respective data frames: one for the happiness data and one for the Spotify data.

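library(tidyverse)   # readr, purrr, dplyr, ggplot2
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), row_spec()
# Every file in the working directory whose name ends in ".csv" (the dot is escaped)
files <- list.files(".", pattern = "\\.csv$", full.names = TRUE)

kable(files) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE, position = "left", font_size = 12) %>%
  row_spec(0, background = "gray")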
x
./2015.csv
./2016.csv
./2017.csv
./top2018.csv
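A quick way to check a list.files() pattern is to try it on a vector of names with grep(), since list.files() applies the pattern to the file names rather than to the full paths. The sketch below uses hypothetical file names. It shows that escaping the dot and anchoring on the extension (\\.csv$) picks up every CSV file, and that a stricter, year-only pattern would select the happiness files by name instead of by position (as files[-4] does), so the split would not depend on the sort order of the directory listing.

```r
# Hypothetical directory contents, including two files that should be skipped
candidates <- c("2015.csv", "2016.csv", "2017.csv", "top2018.csv", "notes.txt", "csv_notes.md")

# Any name ending in ".csv"; the dot is escaped so it matches a literal "."
all_csv <- grep("\\.csv$", candidates, value = TRUE)

# Only the yearly happiness reports: exactly four digits followed by ".csv"
year_csv <- grep("^[0-9]{4}\\.csv$", candidates, value = TRUE)

print(all_csv)
print(year_csv)
```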
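# Files 1-3 are the yearly happiness reports; map_df() row-binds them, filling columns absent in a given year with NA
happinessDF <- map_df(files[-4], read_csv)
# File 4 is the Spotify top tracks of 2018
spotifyDF <- map_df(files[4], read_csv)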



kable(head(happinessDF)) %>% 
  kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"),full_width   = F,position = "left",font_size = 12) %>%
  row_spec(0, background ="gray")
| Country | Region | Happiness Rank | Happiness Score | Standard Error | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual | Lower Confidence Interval | Upper Confidence Interval | Happiness.Rank | Happiness.Score | Whisker.high | Whisker.low | Economy..GDP.per.Capita. | Health..Life.Expectancy. | Trust..Government.Corruption. | Dystopia.Residual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Switzerland | Western Europe | 1 | 7.587 | 0.03411 | 1.39651 | 1.34951 | 0.94143 | 0.66557 | 0.41978 | 0.29678 | 2.51738 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| Iceland | Western Europe | 2 | 7.561 | 0.04884 | 1.30232 | 1.40223 | 0.94784 | 0.62877 | 0.14145 | 0.43630 | 2.70201 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| Denmark | Western Europe | 3 | 7.527 | 0.03328 | 1.32548 | 1.36058 | 0.87464 | 0.64938 | 0.48357 | 0.34139 | 2.49204 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| Norway | Western Europe | 4 | 7.522 | 0.03880 | 1.45900 | 1.33095 | 0.88521 | 0.66973 | 0.36503 | 0.34699 | 2.46531 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| Canada | North America | 5 | 7.427 | 0.03553 | 1.32629 | 1.32261 | 0.90563 | 0.63297 | 0.32957 | 0.45811 | 2.45176 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| Finland | Western Europe | 6 | 7.406 | 0.03140 | 1.29025 | 1.31826 | 0.88911 | 0.64169 | 0.41372 | 0.23351 | 2.61955 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
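The NA-filled columns in the table come from the way map_df() combines results: it row-binds the data frames returned by read_csv(), and any column that exists in only some of the files is filled with NA for the others. The sketch below reproduces this with two small hypothetical files written to a temporary directory; the file contents and the source_file column name are made up for illustration. It also passes .id so that each row records which file it came from, which is one way to keep the report year in the combined data.

```r
library(purrr)
library(readr)

# Two hypothetical yearly files whose column names differ, as the real reports do
demo_dir <- tempfile("happiness_demo")
dir.create(demo_dir)
write_csv(data.frame(Country = c("A", "B"), Region = c("R1", "R2"), Score = c(7.1, 6.4)),
          file.path(demo_dir, "2015.csv"))
write_csv(data.frame(Country = c("A", "B"), Happiness.Score = c(7.0, 6.5)),
          file.path(demo_dir, "2017.csv"))

demo_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# map_df() reads each file and row-binds the results; .id stores the list names
# (here the bare file names) in a new source_file column
combined <- map_df(set_names(demo_files, basename(demo_files)), read_csv, .id = "source_file")

print(combined)
```

Rows from 2015.csv get NA in Happiness.Score and rows from 2017.csv get NA in Region, mirroring the NA columns above.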
kable(head(spotifyDF)) %>% 
  kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"),full_width   = F,position = "left",font_size = 12) %>%
  row_spec(0, background ="gray")
| id | name | artists | genre | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6DCZcSspjsKoFjzjrWoCd | God’s Plan | Drake | Hip-Hop/Rap | 0.754 | 0.449 | 7 | -9.211 | 1 | 0.1090 | 0.0332 | 8.29e-05 | 0.552 | 0.357 | 77.169 | 198973 | 4 |
| 3ee8Jmje8o58CHK66QrVC | SAD! | XXXTENTACION | Hip-Hop/Rap | 0.740 | 0.613 | 8 | -4.880 | 1 | 0.1450 | 0.2580 | 3.72e-03 | 0.123 | 0.473 | 75.023 | 166606 | 4 |
| 0e7ipj03S05BNilyu5bRz | rockstar (feat. 21 Savage) | Post Malone | Hip-Hop/Rap | 0.587 | 0.535 | 5 | -6.090 | 0 | 0.0898 | 0.1170 | 6.56e-05 | 0.131 | 0.140 | 159.847 | 218147 | 4 |
| 3swc6WTsr7rl9DqQKQA55 | Psycho (feat. Ty Dolla $ign) | Post Malone | Hip-Hop/Rap | 0.739 | 0.559 | 8 | -8.011 | 1 | 0.1170 | 0.5800 | 0.00e+00 | 0.112 | 0.439 | 140.124 | 221440 | 4 |
| 2G7V7zsVDxg1yRsu7Ew9R | In My Feelings | Drake | Hip-Hop/Rap | 0.835 | 0.626 | 1 | -5.833 | 1 | 0.1250 | 0.0589 | 6.00e-05 | 0.396 | 0.350 | 91.030 | 217925 | 4 |
| 7dt6x5M1jzdTEt8oCbisT | Better Now | Post Malone | Hip-Hop/Rap | 0.680 | 0.563 | 10 | -5.843 | 1 | 0.0454 | 0.3540 | 0.00e+00 | 0.136 | 0.374 | 145.028 | 231267 | 4 |

Starting from the spotifyDF data frame, we use base R's split() to split the rows by the value of the mode column (0 = minor key, 1 = major key). Three calls to purrr functions follow: the first, map(~ lm()), fits a model in each group and returns a list of “lm” objects; the second, map(summary), returns a list of “summary.lm” objects; the third, map_dbl("r.squared"), extracts the r.squared element from each summary and returns a named vector of double-precision values.

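The output gives the R-squared of the regression of danceability on energy for each mode: about 0.040 for minor-key songs and 0.093 for major-key songs, so energy explains very little of the variation in danceability in either group.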

spotifyDF %>%
  split(.$mode) %>%
  map(~ lm(danceability ~ energy, data = .)) %>%
  map(summary) %>%
  map_dbl("r.squared")
##          0          1 
## 0.04018521 0.09347519
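Passing a string to map_dbl(), as in map_dbl("r.squared"), is purrr shorthand for extracting the element with that name from each list item. The sketch below uses the built-in mtcars data as a stand-in for spotifyDF (cyl plays the role of mode, wt of energy, mpg of danceability) and checks that the shorthand gives the same vector as writing the extractor function out by hand.

```r
library(purrr)
library(magrittr)  # for %>%

# mtcars is a stand-in: cyl plays the role of mode, wt of energy, mpg of danceability
models <- mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .))

# String shorthand: take the element named "r.squared" from each summary
r2_by_name <- models %>% map(summary) %>% map_dbl("r.squared")

# The same extraction with the function written out explicitly
r2_by_fun <- map_dbl(models, function(m) summary(m)$r.squared)

print(r2_by_name)
```

The string form is shorter, while the explicit function is easier to extend, for example to pull adj.r.squared as well.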

Next we take the happinessDF data frame and use filter(!is.na(Region)) to drop every row whose Region is NA. Judging by the combined columns shown earlier, these appear to be the 2017 rows, since that file has no Region column and names its economy column Economy..GDP.per.Capita. instead. We then use select() to keep only Region, Family and Economy (GDP per Capita), and base R's split() to break this subset into one data frame per region. The same three purrr calls follow: map(~ lm()) returns a list of “lm” objects, map(summary) returns a list of “summary.lm” objects, and map_dbl() returns a named vector of double-precision values.

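The result is the R-squared of the regression of Family on Economy (GDP per Capita) within each region. It ranges from about 0.0007 in Western Europe to 0.93 in North America; note that North America and Australia and New Zealand each contain only two countries, so their high values rest on very few observations.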

regionRSquare <- happinessDF %>%
  filter(!is.na(Region)) %>%
  select(Region, Family, `Economy (GDP per Capita)`) %>%
  split(.$Region) %>%
  map(~ lm(Family ~ `Economy (GDP per Capita)`, data = .)) %>%
  map(summary) %>%
  map_dbl("r.squared")

kable(regionRSquare) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE, position = "left", font_size = 12) %>%
  row_spec(0, background = "gray")
| Region | R-squared |
|---|---|
| Australia and New Zealand | 0.8080644 |
| Central and Eastern Europe | 0.0532958 |
| Eastern Asia | 0.1007601 |
| Latin America and Caribbean | 0.1450172 |
| Middle East and Northern Africa | 0.3041670 |
| North America | 0.9323728 |
| Southeastern Asia | 0.2813623 |
| Southern Asia | 0.2457082 |
| Sub-Saharan Africa | 0.1071135 |
| Western Europe | 0.0007291 |
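The same per-group R-squared can also be computed with dplyr's group_by() and summarise() without storing a model per group: for a linear model with an intercept and a single predictor, R-squared equals the squared Pearson correlation of the two variables. The sketch below checks that identity on the built-in mtcars data, standing in for happinessDF with cyl as the grouping variable. It is an alternative route rather than the approach used above, and it only holds in the one-predictor case.

```r
library(dplyr)
library(purrr)

# Route 1: split() + map() + lm(), as in the pipeline above
r2_split <- mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .)) %>%
  map(summary) %>%
  map_dbl("r.squared")

# Route 2: grouped summary using R-squared = cor(x, y)^2 (one predictor only)
r2_grouped <- mtcars %>%
  group_by(cyl) %>%
  summarise(r_squared = cor(mpg, wt)^2)

print(r2_grouped)
```

Because the correlation is symmetric, this also shows that swapping response and predictor leaves R-squared unchanged, even though the fitted slope changes.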

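We then use ggplot2 (part of the tidyverse) to draw a scatter plot of Family against Economy (GDP per Capita), coloured by Region.
Next we extend the same plot with a linear fit for each subgroup (i.e. each Region) by setting method = "lm" inside geom_smooth(); a second geom_smooth() with aes(group = 1) and linetype = 2 adds a single dashed line fitted to all regions together. Because Family is on the x-axis, these lines regress Economy on Family, the reverse of the models above; for a one-predictor model the R-squared is the same either way, since it equals the squared correlation.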

ggplot(happinessDF, aes(x = Family, y = `Economy (GDP per Capita)`, col = Region)) + geom_point()

ggplot(happinessDF, aes(x = Family, y = `Economy (GDP per Capita)`, col = Region)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(aes(group = 1), method = "lm", se = FALSE, linetype = 2)
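geom_smooth(method = "lm") fits y ~ x within each group, so the choice of axes decides which regression is drawn. The sketch below uses mtcars with the predictor (wt) on the x-axis and the response (mpg) on the y-axis, so each coloured line corresponds to lm(mpg ~ wt) for that cyl group, while aes(group = 1) in the last layer overrides the colour grouping to draw one dashed pooled line. For the happiness data this would mean putting Economy (GDP per Capita) on the x-axis and Family on the y-axis to match the per-region models; that mapping is a suggestion, not what the plots above do.

```r
library(ggplot2)

# mtcars stands in for happinessDF: wt ~ predictor, mpg ~ response, factor(cyl) ~ Region
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  # one least-squares line per colour group, i.e. lm(y ~ x) within each group
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # group = 1 overrides the colour grouping: a single dashed line over all points
  geom_smooth(aes(group = 1), method = "lm", formula = y ~ x, se = FALSE, linetype = 2)

print(p)
```

ggplot2 may warn that the colour aesthetic was dropped in the pooled layer; that is expected, because a single line cannot carry several colours.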