DATA 607 Project 2

Background

The purpose of this assignment is to tidy and transform data. The dataset of interest describes Happiness v GDP for the United States and Finland and our task is to:

Create a .csv file in a “wide” structure.
Read this .csv and use tidyr and dplyr to tidy and transform.
Analyze Happiness and GDP between the nations.
Provide the .rmd file, rpubs link, and descriptions of steps taken.

(1) Create a .csv

The original data set was downloaded from Kaggle; see the data citation at the bottom of the page.

The Happiness Report uses data from the Gallup World Poll and is based on answers to the main life evaluation questions asked in the poll. Scores run from 0 (worst) to 10 (best), with each respondent rating their current life against their ideal life within the nation they call home.

A wide-format .csv was created per the assignment description and uploaded to GitHub.

Our dataset was pulled from the larger Kaggle dataset and focuses exclusively on the United States (our home nation) v Finland (the happiest nation in 2019). Our analysis will compare happiness and GDP data between these nations.
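As a rough illustration of the wide layout, the file could be assembled in R along the following lines. This is only a sketch: the values are the rounded figures printed in the tables of this report rather than the full-precision Kaggle values, and the temporary file path (csv_path) is a stand-in for wherever the .csv was actually saved.

```r
# Sketch: build the wide table by hand (values rounded as printed in this report)
happy_wide <- data.frame(
  Country = c("Finland", "United States", "Finland", "United States"),
  Year = c(2015L, 2015L, 2019L, 2019L),
  Happiness.Score = c(7.41, 7.12, 7.77, 6.89),
  GDP.per.Capita = c(1.29, 1.39, 1.34, 1.43),
  Social.Support = c(1.32, 1.25, 1.59, 1.46),
  Healthy.life.expectancy = c(0.889, 0.862, 0.986, 0.874),
  Freedom = c(0.642, 0.546, 0.596, 0.454)
)

# Hypothetical output location; a temporary file keeps the example self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(happy_wide, csv_path, row.names = FALSE)

# Read the file back to confirm the wide shape survives the round trip
check <- read.csv(csv_path)
dim(check)
```

Each row holds one country in one year, and every metric gets its own column, which is what makes this layout "wide".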

With the .csv available on GitHub, we shift to reading from this file before tidying and transforming the data therein.

(2) Read and tidy

We read the .csv (in its raw form) from GitHub and store the resulting data in a variable named “data”.

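#Load the packages used in this report
library(RCurl)   #getURL()
library(tidyr)   #gather(), spread()
library(dplyr)   #filter(), select(), as_tibble()
library(ggplot2) #qplot()

#Fetch the .csv (in raw form) from github as text, parse it, and put it into tabular form
csv_text <- getURL("https://raw.githubusercontent.com/Magnus-PS/CUNY-SPS-DATA-607/Project-2/happy_data.csv")
data <- read.csv(text = csv_text)
data <- as_tibble(data)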

#Show what we're working with:
data

## # A tibble: 4 x 7
##   Country  Year Happiness.Score GDP.per.Capita Social.Support Healthy.life.ex~
##   <chr>   <int>           <dbl>          <dbl>          <dbl>            <dbl>
## 1 Finland  2015            7.41           1.29           1.32            0.889
## 2 United~  2015            7.12           1.39           1.25            0.862
## 3 Finland  2019            7.77           1.34           1.59            0.986
## 4 United~  2019            6.89           1.43           1.46            0.874
## # ... with 1 more variable: Freedom <dbl>

With the data read and stored, we start by tidying it:

#Tidy / shape our data (using tidyr)

##Provide fitting column names for the 1st few metrics so that we can read them on the graph later.
names(data)[names(data) == "Happiness.Score"] <- "Happiness"
names(data)[names(data) == "GDP.per.Capita"] <- "Wealth"
names(data)[names(data) == "Social.Support"] <- "Fellowship"
names(data)[names(data) == "Healthy.life.expectancy"] <- "Health"

##Make observations from variables using gather()
##This is also when we convert from a 'wide' to 'long' format
data_long <- gather(data, "Metric", "Score", 3:7, factor_key=TRUE)

##Sort 'Country' column alphabetically
data_long <- data_long[order(data_long$Country),]

##Make variables from observations using spread()
data_long <- spread(data_long, "Year", "Score")
data_long

## # A tibble: 10 x 4
##    Country       Metric     `2015` `2019`
##    <chr>         <fct>       <dbl>  <dbl>
##  1 Finland       Happiness   7.41   7.77 
##  2 Finland       Wealth      1.29   1.34 
##  3 Finland       Fellowship  1.32   1.59 
##  4 Finland       Health      0.889  0.986
##  5 Finland       Freedom     0.642  0.596
##  6 United States Happiness   7.12   6.89 
##  7 United States Wealth      1.39   1.43 
##  8 United States Fellowship  1.25   1.46 
##  9 United States Health      0.862  0.874
## 10 United States Freedom     0.546  0.454
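In current releases of tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The sketch below shows roughly how the same reshaping might look with the newer functions. It uses a small hand-typed subset of the data (two metrics, rounded values) so that it runs on its own, and the object names wide_demo, long_demo, and by_year are illustrative rather than part of the original analysis.

```r
library(tidyr)

# Hand-typed subset of the wide table (rounded values, two metrics only)
wide_demo <- data.frame(
  Country = c("Finland", "United States", "Finland", "United States"),
  Year = c(2015L, 2015L, 2019L, 2019L),
  Happiness = c(7.41, 7.12, 7.77, 6.89),
  Wealth = c(1.29, 1.39, 1.34, 1.43)
)

# Wide -> long: one row per Country / Year / Metric
long_demo <- pivot_longer(wide_demo, cols = c(Happiness, Wealth),
                          names_to = "Metric", values_to = "Score")

# Spread the years back out: one row per Country / Metric, one column per Year
by_year <- pivot_wider(long_demo, names_from = Year, values_from = Score)
by_year
```

Unlike gather() with factor_key=TRUE, this sketch leaves Metric as a character column, so the metric order on a plot would be alphabetical unless it is converted to a factor.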

#We plot all metrics for 2015:
qplot(x=Metric, y=`2015`, data=data_long, col=Country, main="2015 Happiness v GDP Metrics", xlab="Metric", ylab="Value")

#We plot all metrics for 2019:
qplot(x=Metric, y=`2019`, data=data_long, col=Country, main="2019 Happiness v GDP Metrics", xlab="Metric", ylab="Value")
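qplot() is deprecated in recent ggplot2 releases (3.4.0 onward), so a similar plot can be sketched with ggplot() directly. The example below is an approximation of the 2015 plot, not the original code: scores_2015 is a hand-typed copy of the 2015 column with rounded values, and the object names are illustrative.

```r
library(ggplot2)

metrics <- c("Happiness", "Wealth", "Fellowship", "Health", "Freedom")

# Hand-typed copy of the 2015 column (rounded values from the table above)
scores_2015 <- data.frame(
  Country = rep(c("Finland", "United States"), each = 5),
  Metric = factor(rep(metrics, times = 2), levels = metrics),
  Score = c(7.41, 1.29, 1.32, 0.889, 0.642,
            7.12, 1.39, 1.25, 0.862, 0.546)
)

# One point per country and metric, coloured by country
p <- ggplot(scores_2015, aes(x = Metric, y = Score, colour = Country)) +
  geom_point() +
  labs(title = "2015 Happiness v GDP Metrics", x = "Metric", y = "Value")

print(p)
```

Keeping Metric as a factor with explicit levels preserves the column order of the original table on the x-axis.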

What started as a less clear 4x7 ‘wide’ table (“data”) is now a tidier 10x4 table (“data_long”) with one row per country and metric and one column per year, and the plots for 2015 and 2019 show how the metrics compare between the nations in each year.

Flipping between the plots, we can see that Finland edges the US in 4/5 metrics in both years. More specifically, from 2015 to 2019 Finland put tremendous ground between itself and the US for Happiness, the US kept a narrow lead in Wealth, Freedom scores fell for both nations with Finland staying ahead, and for Fellowship and Health Finland extended its lead over the US.
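The comparison between the two plots can also be checked numerically. The sketch below computes the Finland-minus-US gap for each metric in each year from the rounded values printed in the table above; the data frames finland, us, and gap are illustrative helpers, not objects from the original analysis.

```r
metrics <- c("Happiness", "Wealth", "Fellowship", "Health", "Freedom")

# Rounded scores copied from the printed data_long table
finland <- data.frame(Metric = metrics,
                      y2015 = c(7.41, 1.29, 1.32, 0.889, 0.642),
                      y2019 = c(7.77, 1.34, 1.59, 0.986, 0.596))
us <- data.frame(Metric = metrics,
                 y2015 = c(7.12, 1.39, 1.25, 0.862, 0.546),
                 y2019 = c(6.89, 1.43, 1.46, 0.874, 0.454))

# Positive gap = Finland ahead; negative gap = US ahead
gap <- data.frame(Metric = metrics,
                  gap_2015 = finland$y2015 - us$y2015,
                  gap_2019 = finland$y2019 - us$y2019)

# TRUE where Finland's position relative to the US improved from 2015 to 2019
gap$finland_improved <- gap$gap_2019 > gap$gap_2015
gap
```

A positive gap in four of the five rows corresponds to the "4/5 metrics" reading of the plots, with Wealth as the one metric where the US is ahead.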

(3) Transform and analyze

It’s not terribly clear from the 2015 and 2019 graphs above what the relationship is between happiness and wealth (at least for these 2 nations). Thus, we’ll transform our dataset and use the following equation to better analyze our data:

(Happiness (2019) - Happiness (2015)) / (GDP (2019) - GDP (2015))

We can drop all other variables and compare / plot these values to see what effect, if any, income has on happiness in 2019’s “Happiest Country in the World” (Finland) v. our home nation (the US).

#Transform the data: dplyr

##Add a column showing the change from 2015 to 2019
data_long$Chg <- data_long$`2019` - data_long$`2015`

##Extract data for Happiness and Wealth
hw <- filter(data_long, Metric == "Happiness" | Metric == "Wealth")

hw <- select(hw, Country, Metric, Chg)
hw

## # A tibble: 4 x 3
##   Country       Metric        Chg
##   <chr>         <fct>       <dbl>
## 1 Finland       Happiness  0.363 
## 2 Finland       Wealth     0.0498
## 3 United States Happiness -0.227 
## 4 United States Wealth     0.0385

#Calculate the change in Happiness / GDP for Finland and the US
fin_h_chg <- as.numeric(hw[1,3]) * 100
fin_w_chg <- as.numeric(hw[2,3]) * 100
fin_val <- round(fin_h_chg / fin_w_chg, 3)

us_h_chg <- as.numeric(hw[3,3]) * 100
us_w_chg <- as.numeric(hw[4,3]) * 100
us_val <- round(us_h_chg / us_w_chg, 3)


##Plot the change in Happiness / change in Wealth for each nation to see if we might draw any interesting insight.
barplot(c(fin_val, us_val), beside=TRUE, main = "Happiness v Wealth for Finland and the US", xlab = "Country", ylab = "Happiness/Wealth", ylim = c(-10, 10), names.arg = c("Finland", "United States"), col=c("#6699FF", "#6699FF"), border="white")

text(0.75, 6.5, as.character(fin_val))
text(1.90, -5, as.character(us_val))
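The ratio calculation above picks values out of hw by row position, which would silently break if the row order changed. A minimal sketch of a lookup by name follows; ratio_for() is a hypothetical helper, and hw_demo is a hand-typed copy of the rounded Chg values printed above, so its results will differ slightly from the full-precision ones. Multiplying both changes by 100 scales numerator and denominator equally and does not affect the ratio, so the sketch omits it.

```r
# Hand-typed copy of the printed hw table (rounded values)
hw_demo <- data.frame(
  Country = c("Finland", "Finland", "United States", "United States"),
  Metric = c("Happiness", "Wealth", "Happiness", "Wealth"),
  Chg = c(0.363, 0.0498, -0.227, 0.0385)
)

# Hypothetical helper: change in Happiness divided by change in Wealth for one country
ratio_for <- function(df, country) {
  rows <- df[df$Country == country, ]
  happiness_chg <- rows$Chg[rows$Metric == "Happiness"]
  wealth_chg <- rows$Chg[rows$Metric == "Wealth"]
  round(happiness_chg / wealth_chg, 3)
}

ratios <- c(Finland = ratio_for(hw_demo, "Finland"),
            `United States` = ratio_for(hw_demo, "United States"))
ratios
```

The named vector can be passed to barplot() as is, and its names can serve as the bar labels.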

(4) Conclude

From this graphic, it’s clear that Finland had a much higher ratio of change in Happiness to change in Wealth than the US. On top of its increase in wealth, Finland experienced a marked increase in happiness, and it shows.

What’s interesting is the contrast between the US and Finland. Finland’s change in wealth (the divisor in the equation noted before) was larger than the US’s, yet its ratio was still far higher, because Finland’s happiness rose while the US’s fell even as US wealth grew. Thus there are other factors at play: wealth may be tied to happiness, but beyond a certain point more wealth does not necessarily mean more happiness.

A deeper dive into the nature of “Happy Nations” could turn up some very interesting insights into which societies are the happiest in the world, why they are that way, and whether their happiness could be exported elsewhere.

Data citation

Sustainable Development Solutions Network. (2020). World Happiness Report [Data file]. Retrieved from https://www.kaggle.com/unsdsn/world-happiness

DATA 607 Project 2 - Dataset 1

Magnus Skonberg

2020-10-02
