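# Load the 2019 World Happiness Report data (156 observations, 9 variables)
happiness <- read.csv("https://raw.githubusercontent.com/hrensimin05/Cuny_DataScience/master/2019.csv")
# read.csv() already returns a data frame, so this is simply a working copy
happy <- data.frame(happiness)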
head(happy)
## Overall.rank Country.or.region Score GDP.per.capita Social.support
## 1 1 Finland 7.769 1.340 1.587
## 2 2 Denmark 7.600 1.383 1.573
## 3 3 Norway 7.554 1.488 1.582
## 4 4 Iceland 7.494 1.380 1.624
## 5 5 Netherlands 7.488 1.396 1.522
## 6 6 Switzerland 7.480 1.452 1.526
## Healthy.life.expectancy Freedom.to.make.life.choices Generosity
## 1 0.986 0.596 0.153
## 2 0.996 0.592 0.252
## 3 1.028 0.603 0.271
## 4 1.026 0.591 0.354
## 5 0.999 0.557 0.322
## 6 1.052 0.572 0.263
## Perceptions.of.corruption
## 1 0.393
## 2 0.410
## 3 0.341
## 4 0.118
## 5 0.298
## 6 0.343
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
Is a country's generosity correlated with its happiness score?
What are the cases, and how many are there?
Each case represents a country from around the world. There are 156 observations in the data set, which covers 2019.
The data were collected by the Sustainable Development Solutions Network (SDSN) in 2019 as part of the World Happiness Report. The happiness scores and rankings use data from the Gallup World Poll.
What type of study is this (observational/experiment)?
It is an observational study.
If you collected the data, state self-collected. If not, provide a citation/link.
The data were collected by SDSN and are available online at https://www.kaggle.com/unsdsn/world-happiness. For this project, the data were downloaded as a CSV file.
“World Happiness Report Happiness scored according to economic production, social support, etc.” Sustainable Development Solutions Network, https://www.kaggle.com/unsdsn/world-happiness.
What is the response variable? Is it quantitative or qualitative?
The response variable is Generosity, and it is quantitative.
You should have two independent variables, one quantitative and one qualitative.
The independent variables are Score, which is quantitative, and Country.or.region, which is qualitative.
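As a quick check of the variable types named above, the sketch below labels each chosen column as quantitative or qualitative based on whether R stores it as numeric. It assumes `happy` already exists from the loading chunk; the fallback data frame is only a stand-in built from the six rows printed by `head(happy)`, and the helper name `var_types` is introduced here for illustration.

```r
# Assumption: `happy` exists from the loading chunk. The fallback holds only
# the six rows shown by head(happy), so this sketch can run on its own.
if (!exists("happy")) {
  happy <- data.frame(
    Country.or.region = c("Finland", "Denmark", "Norway",
                          "Iceland", "Netherlands", "Switzerland"),
    Score = c(7.769, 7.600, 7.554, 7.494, 7.488, 7.480),
    Generosity = c(0.153, 0.252, 0.271, 0.354, 0.322, 0.263)
  )
}

# Keep only the response and the two independent variables
vars <- happy[, c("Generosity", "Score", "Country.or.region")]

# Numeric columns are treated as quantitative, everything else as qualitative
var_types <- sapply(vars, function(col) {
  if (is.numeric(col)) "quantitative" else "qualitative"
})
print(var_types)
```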
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
summary(happy)
## Overall.rank Country.or.region Score GDP.per.capita
## Min. : 1.00 Length:156 Min. :2.853 Min. :0.0000
## 1st Qu.: 39.75 Class :character 1st Qu.:4.545 1st Qu.:0.6028
## Median : 78.50 Mode :character Median :5.380 Median :0.9600
## Mean : 78.50 Mean :5.407 Mean :0.9051
## 3rd Qu.:117.25 3rd Qu.:6.184 3rd Qu.:1.2325
## Max. :156.00 Max. :7.769 Max. :1.6840
## Social.support Healthy.life.expectancy Freedom.to.make.life.choices
## Min. :0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.056 1st Qu.:0.5477 1st Qu.:0.3080
## Median :1.272 Median :0.7890 Median :0.4170
## Mean :1.209 Mean :0.7252 Mean :0.3926
## 3rd Qu.:1.452 3rd Qu.:0.8818 3rd Qu.:0.5072
## Max. :1.624 Max. :1.1410 Max. :0.6310
## Generosity Perceptions.of.corruption
## Min. :0.0000 Min. :0.0000
## 1st Qu.:0.1087 1st Qu.:0.0470
## Median :0.1775 Median :0.0855
## Mean :0.1848 Mean :0.1106
## 3rd Qu.:0.2482 3rd Qu.:0.1412
## Max. :0.5660 Max. :0.4530
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------------------------------ tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.3 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts --------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
# Set bins inside geom_histogram(); adding stat_bin() as a second layer leaves
# geom_histogram() at its default of 30 bins, which triggers the bins message.
ggplot(happy, aes(x = Score)) + geom_histogram(bins = 25)
ggplot(happy, aes(x = Generosity)) + geom_histogram(bins = 25)
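Since the research question asks whether generosity and happiness score move together, a scatter plot and a correlation coefficient are a natural next step. The following is a minimal sketch, not part of the original analysis: it assumes `happy` exists from the loading chunk (the fallback uses only the six rows printed by `head(happy)`), and the names `score_generosity_cor` and `scatter` are introduced here for illustration.

```r
library(ggplot2)

# Assumption: `happy` exists from the loading chunk. The fallback holds only
# the six rows shown by head(happy), so this sketch can run on its own.
if (!exists("happy")) {
  happy <- data.frame(
    Score = c(7.769, 7.600, 7.554, 7.494, 7.488, 7.480),
    Generosity = c(0.153, 0.252, 0.271, 0.354, 0.322, 0.263)
  )
}

# Pearson correlation between happiness score and generosity
score_generosity_cor <- cor(happy$Score, happy$Generosity)
print(score_generosity_cor)

# Scatter plot with the response (Generosity) on the y-axis and a
# least-squares line to show the direction of any linear trend
scatter <- ggplot(happy, aes(x = Score, y = Generosity)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
print(scatter)
```

Because this is an observational study, a nonzero correlation here would describe an association between countries, not a causal effect.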