Data Analysis Final Paper

Issue Description

This project investigates world happiness. Many factors contribute to a person’s happiness, such as income, social support, and health. The UN has published the World Happiness Report every year since 2012. The report underscores a global desire for governments to prioritize happiness and well-being when shaping their policies.

Questions

What is the happiest country in the world? Does money make people happy?

Data Source

The data come from the 2023 UN World Happiness Report:

https://happiness-report.s3.amazonaws.com/2023/DataForTable2.1WHR2023.xls

Documentation

The full report can be found at: https://worldhappiness.report/ed/2023/

Description of the Data

The data include the life ladder score of each country from 2005 to 2022. The happiness rankings are based on individuals’ own assessments of their lives. The other variables collected are: Log GDP per capita, Social support, Healthy life expectancy at birth, Freedom to make life choices, Generosity, Perceptions of corruption, Positive affect, and Negative affect.

library(dplyr)

## Warning: package 'dplyr' was built under R version 4.2.3

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 4.2.3

library(corrplot)

## Warning: package 'corrplot' was built under R version 4.2.3

## corrplot 0.92 loaded

library(readr)
library(plotly)

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

WHR2023 <- read_csv("C:/Users/toanp/Desktop/MSCS/Classes/CSC530-DataAnalysis/Data/WHR2023.csv")

## Rows: 2199 Columns: 11

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): Country_name
## dbl (10): year, Life_ladder, Log_GDP_per_capita, Social_support, Healthy_lif...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

summary(WHR2023)

##  Country_name            year       Life_ladder    Log_GDP_per_capita
##  Length:2199        Min.   :2005   Min.   :1.281   Min.   : 5.527    
##  Class :character   1st Qu.:2010   1st Qu.:4.647   1st Qu.: 8.500    
##  Mode  :character   Median :2014   Median :5.432   Median : 9.499    
##                     Mean   :2014   Mean   :5.479   Mean   : 9.390    
##                     3rd Qu.:2018   3rd Qu.:6.309   3rd Qu.:10.373    
##                     Max.   :2022   Max.   :8.019   Max.   :11.664    
##                                                    NA's   :20        
##  Social_support   Healthy_life_expectancy_at_birth Freedom_to_make_life_choices
##  Min.   :0.2280   Min.   : 6.72                    Min.   :0.2580              
##  1st Qu.:0.7470   1st Qu.:59.12                    1st Qu.:0.6562              
##  Median :0.8360   Median :65.05                    Median :0.7700              
##  Mean   :0.8107   Mean   :63.29                    Mean   :0.7479              
##  3rd Qu.:0.9050   3rd Qu.:68.50                    3rd Qu.:0.8590              
##  Max.   :0.9870   Max.   :74.47                    Max.   :0.9850              
##  NA's   :13       NA's   :54                       NA's   :33                  
##    Generosity       Perceptions_of_corruption Positive_affect  Negative_affect 
##  Min.   :-0.33800   Min.   :0.0350            Min.   :0.1790   Min.   :0.0830  
##  1st Qu.:-0.11200   1st Qu.:0.6880            1st Qu.:0.5720   1st Qu.:0.2080  
##  Median :-0.02300   Median :0.8000            Median :0.6630   Median :0.2610  
##  Mean   : 0.00009   Mean   :0.7452            Mean   :0.6521   Mean   :0.2715  
##  3rd Qu.: 0.09200   3rd Qu.:0.8690            3rd Qu.:0.7380   3rd Qu.:0.3230  
##  Max.   : 0.70300   Max.   :0.9830            Max.   :0.8840   Max.   :0.7050  
##  NA's   :73         NA's   :116               NA's   :24       NA's   :16

str(WHR2023)

## spc_tbl_ [2,199 × 11] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Country_name                    : chr [1:2199] "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ year                            : num [1:2199] 2008 2009 2010 2011 2012 ...
##  $ Life_ladder                     : num [1:2199] 3.72 4.4 4.76 3.83 3.78 ...
##  $ Log_GDP_per_capita              : num [1:2199] 7.35 7.51 7.61 7.58 7.66 ...
##  $ Social_support                  : num [1:2199] 0.451 0.552 0.539 0.521 0.521 0.484 0.526 0.529 0.559 0.491 ...
##  $ Healthy_life_expectancy_at_birth: num [1:2199] 50.5 50.8 51.1 51.4 51.7 ...
##  $ Freedom_to_make_life_choices    : num [1:2199] 0.718 0.679 0.6 0.496 0.531 0.578 0.509 0.389 0.523 0.427 ...
##  $ Generosity                      : num [1:2199] 0.168 0.191 0.121 0.164 0.238 0.063 0.106 0.082 0.044 -0.119 ...
##  $ Perceptions_of_corruption       : num [1:2199] 0.882 0.85 0.707 0.731 0.776 0.823 0.871 0.881 0.793 0.954 ...
##  $ Positive_affect                 : num [1:2199] 0.414 0.481 0.517 0.48 0.614 0.547 0.492 0.491 0.501 0.435 ...
##  $ Negative_affect                 : num [1:2199] 0.258 0.237 0.275 0.267 0.268 0.273 0.375 0.339 0.348 0.371 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Country_name = col_character(),
##   ..   year = col_double(),
##   ..   Life_ladder = col_double(),
##   ..   Log_GDP_per_capita = col_double(),
##   ..   Social_support = col_double(),
##   ..   Healthy_life_expectancy_at_birth = col_double(),
##   ..   Freedom_to_make_life_choices = col_double(),
##   ..   Generosity = col_double(),
##   ..   Perceptions_of_corruption = col_double(),
##   ..   Positive_affect = col_double(),
##   ..   Negative_affect = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

Cleaning and Preparation

The original dataset has 2,199 rows and 11 columns, and every numeric variable except year and Life_ladder contains missing values. The only cleaning step was to remove every row with at least one missing value using na.omit().

# Remove every row that has a missing (NA) value in any column
WHR2023 <- na.omit(WHR2023)
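
Because na.omit() drops a row when any single column is missing, it can be worth checking how many rows that removes. The following is a minimal sketch on a small made-up data frame; `toy` and its values are hypothetical and are not taken from WHR2023.

```r
# Hypothetical toy data standing in for WHR2023 (values are made up)
toy <- data.frame(
  Country_name = c("A", "B", "C", "D"),
  Life_ladder  = c(5.1, 6.3, NA, 4.8),
  Generosity   = c(0.10, NA, 0.05, -0.02)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

# Number of rows with no missing value in any column
complete_rows <- sum(complete.cases(toy))

# na.omit() keeps exactly the complete rows
toy_clean <- na.omit(toy)
rows_dropped <- nrow(toy) - nrow(toy_clean)

print(na_per_column)
print(rows_dropped)
```

Running the same calls on WHR2023 before the na.omit() step would show how many of the original 2,199 rows the analysis retains.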

Final Results

The two questions posed at the beginning are addressed in turn below, using both graphical and numerical results produced by R.

HAPPIEST COUNTRY OVER THE YEARS

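# For each year, keep the single row with the highest Life_ladder score
happiest_countries <- WHR2023 %>%
  group_by(year) %>%
  slice_max(Life_ladder, n = 1, with_ties = FALSE) %>%
  ungroup()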

# Print the result
happiest_countries

According to WHR2023, Finland has been the happiest country in the world in recent years, followed by Denmark and Canada.
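
The code above returns only the single top country for each year. To see the runners-up as well, one option is to rank the countries within the most recent year. The following is a sketch in base R on made-up data; `toy` and the country names W, X, Y, and Z are hypothetical.

```r
# Hypothetical toy data (values are made up, not from WHR2023)
toy <- data.frame(
  Country_name = c("X", "Y", "Z", "W", "X", "Y", "Z", "W"),
  year         = c(2021, 2021, 2021, 2021, 2022, 2022, 2022, 2022),
  Life_ladder  = c(7.1, 7.4, 6.9, 5.0, 7.3, 7.2, 7.5, 5.2)
)

# Keep only the most recent year in the data
latest <- toy[toy$year == max(toy$year), ]

# Sort by Life_ladder, highest first, and keep the top three rows
top3 <- head(latest[order(-latest$Life_ladder), ], 3)

print(top3)
```

Replacing `toy` with WHR2023 would list the three highest-scoring countries for 2022.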

DOES MONEY MEAN HAPPINESS?

ggplot(WHR2023, aes(x = Log_GDP_per_capita, y = Life_ladder)) +
  geom_point() +
  
  # Add labels and title
  labs(x = "Log GDP per Capita",
       y = "Life Ladder",
       title = "Scatter Plot between GDP and Life Ladder") +
  
  # Customize theme if needed
  theme_minimal()

According to the scatter plot, there is a clear positive relationship between happiness and money (log GDP per capita): as log GDP per capita increases, the life ladder tends to increase. So, is it safe to say that money brings happiness? Let’s investigate further with more information.
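
The visual impression can be backed by a number. The following is a minimal sketch, assuming a Pearson correlation is an appropriate summary of the relationship. The data are simulated (hypothetical), so the printed values say nothing about WHR2023 itself.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated stand-ins for Log_GDP_per_capita and Life_ladder (hypothetical)
log_gdp     <- runif(200, min = 6, max = 11)
life_ladder <- 0.7 * log_gdp - 1 + rnorm(200, sd = 0.7)

# Pearson correlation coefficient
r <- cor(log_gdp, life_ladder)

# Test of the null hypothesis that the true correlation is zero
test <- cor.test(log_gdp, life_ladder)

print(r)
print(test$p.value)
```

On the real data, the equivalent call would be cor.test(WHR2023$Log_GDP_per_capita, WHR2023$Life_ladder).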

MORE ABOUT CORRELATIONS

# Keep only the numeric variables (drop the year and country identifiers)
selected_columns <- WHR2023[, !names(WHR2023) %in% c('year', 'Country_name')]

# Calculate correlations with 'Life Ladder' (returns a one-row matrix)
correlations_with_life_ladder <- cor(selected_columns$Life_ladder, selected_columns)

# Extract the row as a named vector and drop Life_ladder's correlation with itself (always 1)
single_row <- correlations_with_life_ladder[1, ]
single_row <- single_row[names(single_row) != "Life_ladder"]

# Order the variables from highest to lowest correlation
ordered_vars <- names(single_row)[order(-as.numeric(single_row))]

# Create a bar plot of the correlations, labeled with the variable names
barplot(as.numeric(single_row[ordered_vars]), names.arg = ordered_vars,
        main = "Correlation of Each Variable with Life Ladder",
        ylab = "Pearson correlation",
        col = "skyblue",
        las = 2,  # Rotates x-axis labels vertically for better visibility
        cex.names = 0.7)  # Adjusts the size of variable names

Based on the correlation bar plot, money (GDP per capita) has the highest correlation with the life ladder, followed by healthy life expectancy and social support. Freedom and positive affect are also strongly correlated with happiness. Surprisingly, generosity has only a weak correlation with the life ladder. GDP per capita may correlate so strongly with happiness because it also contributes to the other variables that bring happiness, such as healthy life expectancy, social support, or freedom of choice.
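
One way to probe that last explanation is a multiple regression, which estimates the association between GDP and the life ladder while holding the other variables fixed. The sketch below uses simulated data in which GDP affects the life ladder only through social support. All variable names and coefficients are assumptions chosen for illustration, not results from WHR2023.

```r
set.seed(42)  # make the simulated data reproducible
n <- 500

# Simulated data (hypothetical): GDP raises social support,
# and only social support raises the life ladder
log_gdp        <- rnorm(n, mean = 9.4, sd = 1.2)
social_support <- 0.1 * log_gdp + rnorm(n, sd = 0.05)
life_ladder    <- 5 * social_support + rnorm(n, sd = 0.3)

# Marginal correlation: GDP looks strongly related to the life ladder
r_marginal <- cor(log_gdp, life_ladder)

# Regression with both predictors: the GDP coefficient is estimated
# while holding social support fixed
fit <- lm(life_ladder ~ log_gdp + social_support)

print(r_marginal)
print(coef(summary(fit)))
```

On the real data, a comparable call might be lm(Life_ladder ~ Log_GDP_per_capita + Social_support + Healthy_life_expectancy_at_birth + Freedom_to_make_life_choices, data = WHR2023). A GDP coefficient that shrinks sharply once the other variables are included would be consistent with the explanation above, though it would not prove it.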

Link to presentation: https://youtu.be/zs7unP-3lf8