We got the data from Kaggle: World Happiness Report | Spotify Top 100 of 2018
files <- list.files(".", pattern = "\\.csv$", full.names = TRUE)
kable(files) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE, position = "left", font_size = 12) %>%
row_spec(0, background ="gray")
x |
---|
./2015.csv |
./2016.csv |
./2017.csv |
./top2018.csv |
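The file listing above depends on a regular expression to pick out the CSV files. As a minimal sketch, assuming a scratch directory and made-up file names (none of them come from the original project), an anchored pattern such as `\\.csv$` keeps only names that end in a literal `.csv`:

```r
# Build a throwaway directory with a few hypothetical files.
tmp <- file.path(tempdir(), "csv_demo")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(
  tmp, c("2015.csv", "2016.csv", "top2018.csv", "notes.txt", "csv_readme.md")
)))

# "\\." is a literal dot and "$" anchors the match at the end of the name,
# so "csv_readme.md" and "notes.txt" are excluded.
csv_files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
basename(csv_files)
```

Anchoring the pattern avoids accidental matches on names that merely contain the letters "csv" somewhere in the middle.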
happinessDF <- map_df(files[-4],read_csv)
spotifyDF <- map_df(files[4], read_csv)
kable(head(happinessDF)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE, position = "left", font_size = 12) %>%
row_spec(0, background ="gray")
Country | Region | Happiness Rank | Happiness Score | Standard Error | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual | Lower Confidence Interval | Upper Confidence Interval | Happiness.Rank | Happiness.Score | Whisker.high | Whisker.low | Economy..GDP.per.Capita. | Health..Life.Expectancy. | Trust..Government.Corruption. | Dystopia.Residual |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Switzerland | Western Europe | 1 | 7.587 | 0.03411 | 1.39651 | 1.34951 | 0.94143 | 0.66557 | 0.41978 | 0.29678 | 2.51738 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Iceland | Western Europe | 2 | 7.561 | 0.04884 | 1.30232 | 1.40223 | 0.94784 | 0.62877 | 0.14145 | 0.43630 | 2.70201 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Denmark | Western Europe | 3 | 7.527 | 0.03328 | 1.32548 | 1.36058 | 0.87464 | 0.64938 | 0.48357 | 0.34139 | 2.49204 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Norway | Western Europe | 4 | 7.522 | 0.03880 | 1.45900 | 1.33095 | 0.88521 | 0.66973 | 0.36503 | 0.34699 | 2.46531 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Canada | North America | 5 | 7.427 | 0.03553 | 1.32629 | 1.32261 | 0.90563 | 0.63297 | 0.32957 | 0.45811 | 2.45176 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Finland | Western Europe | 6 | 7.406 | 0.03140 | 1.29025 | 1.31826 | 0.88911 | 0.64169 | 0.41372 | 0.23351 | 2.61955 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
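The NA-filled columns on the right of the table appear because the yearly files spell the same variables differently (for example `Happiness Score` versus `Happiness.Score`), so row-binding keeps both spellings as separate columns. The sketch below illustrates this with two toy tibbles whose values are invented for illustration, and shows one possible fix, renaming to a shared spelling before binding. It is not part of the original analysis.

```r
library(dplyr)

# Toy stand-ins for two yearly files (values are made up).
y2015 <- tibble(Country = c("A", "B"), `Happiness Score` = c(7.1, 6.4))
y2017 <- tibble(Country = c("A", "B"), Happiness.Score = c(7.0, 6.5))

# Binding as-is keeps both spellings as separate, half-empty columns.
naive <- bind_rows(y2015, y2017)

# Renaming to a shared spelling first gives one complete column;
# .id records which input each row came from.
harmonised <- bind_rows(
  `2015` = y2015,
  `2017` = rename(y2017, `Happiness Score` = Happiness.Score),
  .id = "Year"
)

names(naive)
names(harmonised)
```

Whether harmonising is worthwhile depends on which columns the later steps use; the regressions in this post only need Region, Family and Economy (GDP per Capita).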
kable(head(spotifyDF)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE, position = "left", font_size = 12) %>%
row_spec(0, background ="gray")
id | name | artists | genre | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6DCZcSspjsKoFjzjrWoCd | God’s Plan | Drake | Hip-Hop/Rap | 0.754 | 0.449 | 7 | -9.211 | 1 | 0.1090 | 0.0332 | 8.29e-05 | 0.552 | 0.357 | 77.169 | 198973 | 4 |
3ee8Jmje8o58CHK66QrVC | SAD! | XXXTENTACION | Hip-Hop/Rap | 0.740 | 0.613 | 8 | -4.880 | 1 | 0.1450 | 0.2580 | 3.72e-03 | 0.123 | 0.473 | 75.023 | 166606 | 4 |
0e7ipj03S05BNilyu5bRz | rockstar (feat. 21 Savage) | Post Malone | Hip-Hop/Rap | 0.587 | 0.535 | 5 | -6.090 | 0 | 0.0898 | 0.1170 | 6.56e-05 | 0.131 | 0.140 | 159.847 | 218147 | 4 |
3swc6WTsr7rl9DqQKQA55 | Psycho (feat. Ty Dolla $ign) | Post Malone | Hip-Hop/Rap | 0.739 | 0.559 | 8 | -8.011 | 1 | 0.1170 | 0.5800 | 0.00e+00 | 0.112 | 0.439 | 140.124 | 221440 | 4 |
2G7V7zsVDxg1yRsu7Ew9R | In My Feelings | Drake | Hip-Hop/Rap | 0.835 | 0.626 | 1 | -5.833 | 1 | 0.1250 | 0.0589 | 6.00e-05 | 0.396 | 0.350 | 91.030 | 217925 | 4 |
7dt6x5M1jzdTEt8oCbisT | Better Now | Post Malone | Hip-Hop/Rap | 0.680 | 0.563 | 10 | -5.843 | 1 | 0.0454 | 0.3540 | 0.00e+00 | 0.136 | 0.374 | 145.028 | 231267 | 4 |
Starting from the spotifyDF data frame, we use the base split() function to split the data by the value of the mode column. The code below then makes three calls to purrr functions: the first, map(~ lm()), creates a list of “lm” objects; the second, map(summary), creates a list of “summary.lm” objects; the third, map_dbl(), extracts the r.squared element of each summary into a vector of double-precision values.
The output gives the R-squared value for each mode from regressing danceability on energy. Both values are low (about 0.04 and 0.09), so energy explains little of the variation in danceability within either mode.
spotifyDF %>%
split(.$mode) %>%
map(~ lm(danceability ~ energy, data = .)) %>%
map(summary) %>%
map_dbl("r.squared")
## 0 1
## 0.04018521 0.09347519
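To make the three steps easier to inspect, here is the same split / map / map_dbl pattern on the built-in mtcars data, with each intermediate result stored under its own name. The variable names (`models`, `summaries`, `r2`) and the choice of `mpg ~ wt` grouped by `cyl` are illustrative assumptions, not part of the Spotify analysis.

```r
library(purrr)

# mtcars ships with R; cyl plays the role that `mode` plays above.
models <- mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .x))      # list of "lm" objects, one per group

summaries <- map(models, summary)       # list of "summary.lm" objects
r2 <- map_dbl(summaries, "r.squared")   # named double vector

class(models[[1]])
class(summaries[[1]])
r2
```

Passing the string "r.squared" to map_dbl() is shorthand for extracting the element of that name from each list item.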
Starting from the happinessDF data frame, we first filter with !is.na(Region), which drops every row whose Region is NA. We then use select() to keep only the Region, Family and Economy (GDP per Capita) columns, and the base split() function to split that subset by Region. The same three purrr calls follow: map(~ lm()) creates a list of “lm” objects, map(summary) creates a list of “summary.lm” objects, and map_dbl() extracts a vector of double-precision values.
The resulting table shows the R-squared (R2) for each Region from regressing Family on Economy (GDP per Capita).
regionRSquare <- happinessDF %>%
filter(!is.na(Region)) %>%
select(Region,Family,`Economy (GDP per Capita)`) %>%
split(.$Region) %>%
map(~ lm(Family ~ `Economy (GDP per Capita)`, data=.)) %>%
map(summary) %>%
map_dbl("r.squared")
kable(regionRSquare) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE, position = "left", font_size = 12) %>%
row_spec(0, background ="gray")
Region | R-squared |
---|---|
Australia and New Zealand | 0.8080644 |
Central and Eastern Europe | 0.0532958 |
Eastern Asia | 0.1007601 |
Latin America and Caribbean | 0.1450172 |
Middle East and Northern Africa | 0.3041670 |
North America | 0.9323728 |
Southeastern Asia | 0.2813623 |
Southern Asia | 0.2457082 |
Sub-Saharan Africa | 0.1071135 |
Western Europe | 0.0007291 |
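R-squared values from very small groups can look extreme, and regions such as North America or Australia and New Zealand probably contribute only a handful of rows here. One way to check is to report the group size next to each R-squared. The sketch below does this on the built-in mtcars data; the column names `n` and `r_squared` are illustrative choices, and the same idea should carry over to the region split above.

```r
library(purrr)

# Split once, then compute the row count and R-squared for each group.
groups <- split(mtcars, mtcars$cyl)

group_stats <- data.frame(
  group     = names(groups),
  n         = map_int(groups, nrow),
  r_squared = map_dbl(groups, ~ summary(lm(mpg ~ wt, data = .x))$r.squared),
  row.names = NULL
)

group_stats
```

A high R-squared next to a very small n is a hint to treat that value with caution rather than as evidence of a strong relationship.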
We use ggplot2, part of the tidyverse, to draw a scatter plot of Family against Economy (GDP per Capita), coloured by Region.
We then extend the plot with a fitted linear model for each subgroup (i.e. each Region) by setting method = "lm" inside geom_smooth(); a second geom_smooth() call with aes(group = 1) adds a dashed line fitted to all regions together.
ggplot(happinessDF, aes(x = Family, y = `Economy (GDP per Capita)`, col = Region)) +
  geom_point()
ggplot(happinessDF, aes(x = Family, y = `Economy (GDP per Capita)`, col = Region)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(aes(group = 1), method = "lm", se = FALSE, linetype = 2)