Project Proposal

Data Preparation

happiness<-read.csv("https://raw.githubusercontent.com/hrensimin05/Cuny_DataScience/master/2019.csv")

# read.csv() already returns a data frame (156 observations, 9 variables),
# so a copy under a shorter name is all that is needed
happy <- happiness
head(happy)

##   Overall.rank Country.or.region Score GDP.per.capita Social.support
## 1            1           Finland 7.769          1.340          1.587
## 2            2           Denmark 7.600          1.383          1.573
## 3            3            Norway 7.554          1.488          1.582
## 4            4           Iceland 7.494          1.380          1.624
## 5            5       Netherlands 7.488          1.396          1.522
## 6            6       Switzerland 7.480          1.452          1.526
##   Healthy.life.expectancy Freedom.to.make.life.choices Generosity
## 1                   0.986                        0.596      0.153
## 2                   0.996                        0.592      0.252
## 3                   1.028                        0.603      0.271
## 4                   1.026                        0.591      0.354
## 5                   0.999                        0.557      0.322
## 6                   1.052                        0.572      0.263
##   Perceptions.of.corruption
## 1                     0.393
## 2                     0.410
## 3                     0.341
## 4                     0.118
## 5                     0.298
## 6                     0.343
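
The dimension count noted in the code comment can be checked directly. The following is a minimal sketch that rebuilds only the six rows printed by head(happy) into a data frame called happy_top6 (a hypothetical name, used so the example runs without downloading the file) and applies a few structural checks. Applied to the full happy data frame, the same calls would be expected to report the 156 rows and 9 columns stated above.

```r
# Six rows copied from the head(happy) output; a stand-in for the full data.
happy_top6 <- data.frame(
  Overall.rank                 = 1:6,
  Country.or.region            = c("Finland", "Denmark", "Norway",
                                   "Iceland", "Netherlands", "Switzerland"),
  Score                        = c(7.769, 7.600, 7.554, 7.494, 7.488, 7.480),
  GDP.per.capita               = c(1.340, 1.383, 1.488, 1.380, 1.396, 1.452),
  Social.support               = c(1.587, 1.573, 1.582, 1.624, 1.522, 1.526),
  Healthy.life.expectancy      = c(0.986, 0.996, 1.028, 1.026, 0.999, 1.052),
  Freedom.to.make.life.choices = c(0.596, 0.592, 0.603, 0.591, 0.557, 0.572),
  Generosity                   = c(0.153, 0.252, 0.271, 0.354, 0.322, 0.263),
  Perceptions.of.corruption    = c(0.393, 0.410, 0.341, 0.118, 0.298, 0.343),
  stringsAsFactors = FALSE
)

dim(happy_top6)                               # number of rows and columns
sapply(happy_top6, class)                     # storage class of each column
anyNA(happy_top6)                             # TRUE if any value is missing
anyDuplicated(happy_top6$Country.or.region)   # 0 if each country appears once
```

A result of 0 from anyDuplicated() supports treating each row as one country, which is how the cases are defined in this proposal.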

Research question

You should phrase your research question in a way that matches up with the scope of inference your dataset allows for. Is a country's generosity correlated with its happiness score?
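
One way to put a number on this question is Pearson's correlation coefficient between Generosity and Score. The sketch below is illustrative only: it uses the six rows shown in the head(happy) output rather than all 156 countries, and the helper name generosity_score_cor is an assumption introduced for this example. Because the data are observational, any correlation found describes an association and cannot by itself show that generosity causes happiness.

```r
# Six rows copied from the head(happy) output; not the full data set.
happy_top6 <- data.frame(
  Country.or.region = c("Finland", "Denmark", "Norway",
                        "Iceland", "Netherlands", "Switzerland"),
  Score      = c(7.769, 7.600, 7.554, 7.494, 7.488, 7.480),
  Generosity = c(0.153, 0.252, 0.271, 0.354, 0.322, 0.263)
)

# Hypothetical helper: Pearson correlation test between the two variables.
# cor.test() returns the estimate, a confidence interval, and a p-value.
generosity_score_cor <- function(df) {
  cor.test(df$Generosity, df$Score, method = "pearson")
}

result <- generosity_score_cor(happy_top6)
result$estimate   # sample correlation coefficient r
result$p.value    # p-value for the null hypothesis of zero correlation
```

With the full data, the same helper could be called as generosity_score_cor(happy).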

Cases

What are the cases, and how many are there?

Each case represents a country. The 2019 data set contains 156 observations from around the world.

Data collection

The data were collected by the Sustainable Development Solutions Network (SDSN) in 2019 as part of the World Happiness Report. The happiness scores and rankings use data from the Gallup World Poll.

Type of study

What type of study is this (observational/experiment)?

It is an observational study.

Data Source

If you collected the data, state self-collected. If not, provide a citation/link. The data were collected by SDSN and are available online at https://www.kaggle.com/unsdsn/world-happiness. For this project, the data were downloaded as a CSV file.

“World Happiness Report: Happiness scored according to economic production, social support, etc.” Sustainable Development Solutions Network, https://www.kaggle.com/unsdsn/world-happiness.

Dependent Variable

What is the response variable? Is it quantitative or qualitative? The response variable is generosity (the Generosity column), which is quantitative.

Independent Variable

You should have two independent variables, one quantitative and one qualitative.

The independent variables are score (the Score column, quantitative) and country (the Country.or.region column, qualitative).

Relevant summary statistics

Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

summary(happy)

##   Overall.rank    Country.or.region      Score       GDP.per.capita  
##  Min.   :  1.00   Length:156         Min.   :2.853   Min.   :0.0000  
##  1st Qu.: 39.75   Class :character   1st Qu.:4.545   1st Qu.:0.6028  
##  Median : 78.50   Mode  :character   Median :5.380   Median :0.9600  
##  Mean   : 78.50                      Mean   :5.407   Mean   :0.9051  
##  3rd Qu.:117.25                      3rd Qu.:6.184   3rd Qu.:1.2325  
##  Max.   :156.00                      Max.   :7.769   Max.   :1.6840  
##  Social.support  Healthy.life.expectancy Freedom.to.make.life.choices
##  Min.   :0.000   Min.   :0.0000          Min.   :0.0000              
##  1st Qu.:1.056   1st Qu.:0.5477          1st Qu.:0.3080              
##  Median :1.272   Median :0.7890          Median :0.4170              
##  Mean   :1.209   Mean   :0.7252          Mean   :0.3926              
##  3rd Qu.:1.452   3rd Qu.:0.8818          3rd Qu.:0.5072              
##  Max.   :1.624   Max.   :1.1410          Max.   :0.6310              
##    Generosity     Perceptions.of.corruption
##  Min.   :0.0000   Min.   :0.0000           
##  1st Qu.:0.1087   1st Qu.:0.0470           
##  Median :0.1775   Median :0.0855           
##  Mean   :0.1848   Mean   :0.1106           
##  3rd Qu.:0.2482   3rd Qu.:0.1412           
##  Max.   :0.5660   Max.   :0.4530

library(tidyverse)

## -- Attaching packages ------------------------------------------------------------------------------------------------ tidyverse 1.3.0 --

## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0

## -- Conflicts --------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

# Set the bin count on geom_histogram() itself; adding a separate stat_bin()
# layer would draw a second histogram and leave the first at the default 30 bins
ggplot(happy, aes(x = Score)) + geom_histogram(bins = 25)

# Histogram of Generosity, with the bin count set on geom_histogram() itself
ggplot(happy, aes(x = Generosity)) + geom_histogram(bins = 25)
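
Since the research question concerns the relationship between two quantitative variables, a scatter plot may be a more direct visualization than two separate histograms. The following is a minimal sketch, assuming ggplot2 is installed and using only the six rows printed by head(happy), stored in the hypothetical data frame happy_top6. With the full data, happy could be passed to ggplot() instead.

```r
library(ggplot2)

# Six rows copied from the head(happy) output; not the full data set.
happy_top6 <- data.frame(
  Country.or.region = c("Finland", "Denmark", "Norway",
                        "Iceland", "Netherlands", "Switzerland"),
  Score      = c(7.769, 7.600, 7.554, 7.494, 7.488, 7.480),
  Generosity = c(0.153, 0.252, 0.271, 0.354, 0.322, 0.263)
)

# Generosity is the stated response variable, so it goes on the y-axis.
p <- ggplot(happy_top6, aes(x = Score, y = Generosity)) +
  geom_point() +
  # Least-squares line as a visual guide to the direction of the trend
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Happiness score", y = "Generosity")

print(p)
```

The fitted line is only a visual aid here. With six points from the top of the ranking it says little about the trend across all countries.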