DATA607 - Final Project - World Happiness Report

Abstract

The World Happiness Report measures global happiness through surveys collected by the Gallup World Poll. Answers are collected based on a Cantril ladder life question. Participants are asked to think of lifestyles on a ladder, with their ideal and best possible life being at a 10, or the top of the ladder, and their worst possible life at a 0, or the very bottom of the ladder. They are then asked to rate their own lives on this scale along with seven main observed factors that may contribute to their responses. These seven factors are: economic production, social support, life expectancy, freedom, perceptions of corruption, generosity, and positive and negative affect. This dataset features data collected from 2013-2016. For the purpose of this project, I decided to focus on the most recent data collected, 2018.

Happiness records are significant in recognizing the emotional and mental wellbeing of groups of people, and can, therefore, be used to make informed decisions on a governmental and organizational level to help improve quality of life for the general population. These reports can also be used to assess international progression. With this dataset, we can look at world happiness and use a variety of factors, such as social support and finances, to evaluate their potential impact on happiness. Kaggle Link for Dataset: https://www.kaggle.com/datasets/unsdsn/world-happiness/

Data Collection Method

This data was collected in the form of surveys through a poll, mkaing it an observational study.

Variables

Dependent Variable: Happiness score (Cantril Ladder Life question score)

Independent Variable: seven explanatory variables - Per Capita Income (Log GDP per capita) - Social support - Healthy Life expectancy (Healthy life expectancy at birth) - Freedom of Choice (Freedom to make life choices) - Generosity - Perceived corruption (Perceptions of corruption) - Positive and Negative affect

Research Question

What factors are significant predictors of people’s happiness?

Hypotheses

Null Hypothesis: None of the seven explanatory variables have a significant impact on people’s happiness score as measured by the Cantril ladder life question.

This null hypothesis assumes that there is no relationship or predictive power between the independent variables (the seven explanatory variables) and the dependent variable (happiness score).

Alternative Hypothesis: At least one of the seven explanatory variables has a significant impact on people’s happiness score as measured by the Cantril ladder life question.

This alternative hypothesis suggests that there is a relationship between at least one of the explanatory variables and the happiness score.

Data Exploration

library(readr)
library(ggplot2) 
url1 <- "https://raw.githubusercontent.com/renidak01/DATA607_FINALPROJECT/main/world-happiness-report.csv"
df1 <- read.csv(url1)
cat("Variables in df1:\n") #df1 variables

## Variables in df1:

print(names(df1))

##  [1] "Country.name"                     "year"                            
##  [3] "Life.Ladder"                      "Log.GDP.per.capita"              
##  [5] "Social.support"                   "Healthy.life.expectancy.at.birth"
##  [7] "Freedom.to.make.life.choices"     "Generosity"                      
##  [9] "Perceptions.of.corruption"        "Positive.affect"                 
## [11] "Negative.affect"

I began by reading the csv file for the data collected in the world happiness report. I wanted to take a look at the variable names so that I could start changing some of them, and so that I could see what variables I was working with and what needed to be kept or changed.

nrow(df1)

## [1] 1949

Here we see that there are 1949 observations in this dataset. We will need to filter out all of the years except for the year which we will be working with.

Data Wrangling/Cleaning

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

df1 <- df1 %>%
  rename(Country = Country.name,
         Year = year,
         `Healthy.Life.Expectancy` = Healthy.life.expectancy.at.birth,
         `Freedom.of.Choice` = Freedom.to.make.life.choices,
         `Perceived.Corruption` = Perceptions.of.corruption,
         'Per.Capita.Income'=Log.GDP.per.capita,
         'Social.Support'=Social.support,
         'Negative.Affect'=Negative.affect,
         'Positive.Affect'=Positive.affect)

print(names(df1))

##  [1] "Country"                 "Year"                   
##  [3] "Life.Ladder"             "Per.Capita.Income"      
##  [5] "Social.Support"          "Healthy.Life.Expectancy"
##  [7] "Freedom.of.Choice"       "Generosity"             
##  [9] "Perceived.Corruption"    "Positive.Affect"        
## [11] "Negative.Affect"

I started by renaming the variables to make them easier to work with. I did this using dplyr’s pipe operator, for the purpose of efficiency and to keep the data more concise.

Global Summary statistics from the World Happiness Report:

print(summary(df1))

##    Country               Year       Life.Ladder    Per.Capita.Income
##  Length:1949        Min.   :2005   Min.   :2.375   Min.   : 6.635   
##  Class :character   1st Qu.:2010   1st Qu.:4.640   1st Qu.: 8.464   
##  Mode  :character   Median :2013   Median :5.386   Median : 9.460   
##                     Mean   :2013   Mean   :5.467   Mean   : 9.368   
##                     3rd Qu.:2017   3rd Qu.:6.283   3rd Qu.:10.353   
##                     Max.   :2020   Max.   :8.019   Max.   :11.648   
##                                                    NA's   :36       
##  Social.Support   Healthy.Life.Expectancy Freedom.of.Choice   Generosity     
##  Min.   :0.2900   Min.   :32.30           Min.   :0.2580    Min.   :-0.3350  
##  1st Qu.:0.7498   1st Qu.:58.69           1st Qu.:0.6470    1st Qu.:-0.1130  
##  Median :0.8355   Median :65.20           Median :0.7630    Median :-0.0255  
##  Mean   :0.8126   Mean   :63.36           Mean   :0.7426    Mean   : 0.0001  
##  3rd Qu.:0.9050   3rd Qu.:68.59           3rd Qu.:0.8560    3rd Qu.: 0.0910  
##  Max.   :0.9870   Max.   :77.10           Max.   :0.9850    Max.   : 0.6980  
##  NA's   :13       NA's   :55              NA's   :32        NA's   :89       
##  Perceived.Corruption Positive.Affect  Negative.Affect 
##  Min.   :0.0350       Min.   :0.3220   Min.   :0.0830  
##  1st Qu.:0.6900       1st Qu.:0.6255   1st Qu.:0.2060  
##  Median :0.8020       Median :0.7220   Median :0.2580  
##  Mean   :0.7471       Mean   :0.7100   Mean   :0.2685  
##  3rd Qu.:0.8720       3rd Qu.:0.7990   3rd Qu.:0.3200  
##  Max.   :0.9830       Max.   :0.9440   Max.   :0.7050  
##  NA's   :110          NA's   :22       NA's   :16

Here I noticed that there is some missing data, so I decided to filter those out before analyzing the summary statistics.

Handling Missing Data

library(dplyr)
df1 <- df1 %>%
  filter(!is.na(Per.Capita.Income),
         !is.na(Social.Support),
         !is.na(Healthy.Life.Expectancy),
         !is.na(Freedom.of.Choice),
         !is.na(Generosity),
         !is.na(Perceived.Corruption),
         !is.na(Positive.Affect),
         !is.na(Negative.Affect )
         )

print(summary(df1))

##    Country               Year       Life.Ladder    Per.Capita.Income
##  Length:1708        Min.   :2005   Min.   :2.375   Min.   : 6.635   
##  Class :character   1st Qu.:2010   1st Qu.:4.595   1st Qu.: 8.394   
##  Mode  :character   Median :2013   Median :5.364   Median : 9.457   
##                     Mean   :2013   Mean   :5.447   Mean   : 9.322   
##                     3rd Qu.:2017   3rd Qu.:6.259   3rd Qu.:10.272   
##                     Max.   :2020   Max.   :7.971   Max.   :11.648   
##  Social.Support   Healthy.Life.Expectancy Freedom.of.Choice
##  Min.   :0.2900   Min.   :32.30           Min.   :0.2580   
##  1st Qu.:0.7410   1st Qu.:58.17           1st Qu.:0.6440   
##  Median :0.8350   Median :65.10           Median :0.7575   
##  Mean   :0.8103   Mean   :63.23           Mean   :0.7394   
##  3rd Qu.:0.9080   3rd Qu.:68.69           3rd Qu.:0.8520   
##  Max.   :0.9870   Max.   :77.10           Max.   :0.9850   
##    Generosity         Perceived.Corruption Positive.Affect  Negative.Affect 
##  Min.   :-0.3350000   Min.   :0.035        Min.   :0.3220   Min.   :0.0940  
##  1st Qu.:-0.1112500   1st Qu.:0.697        1st Qu.:0.6230   1st Qu.:0.2080  
##  Median :-0.0255000   Median :0.806        Median :0.7220   Median :0.2590  
##  Mean   :-0.0006376   Mean   :0.751        Mean   :0.7095   Mean   :0.2694  
##  3rd Qu.: 0.0890000   3rd Qu.:0.875        3rd Qu.:0.8013   3rd Qu.:0.3192  
##  Max.   : 0.6890000   Max.   :0.983        Max.   :0.9440   Max.   :0.7050

Summary statistics from the World Happiness Report (2018):

df_2018 <- df1 %>%
  filter(Year == 2018)

print(summary(df_2018))

##    Country               Year       Life.Ladder    Per.Capita.Income
##  Length:126         Min.   :2018   Min.   :2.694   Min.   : 6.635   
##  Class :character   1st Qu.:2018   1st Qu.:4.840   1st Qu.: 8.524   
##  Mode  :character   Median :2018   Median :5.492   Median : 9.563   
##                     Mean   :2018   Mean   :5.543   Mean   : 9.398   
##                     3rd Qu.:2018   3rd Qu.:6.247   3rd Qu.:10.361   
##                     Max.   :2018   Max.   :7.858   Max.   :11.645   
##  Social.Support   Healthy.Life.Expectancy Freedom.of.Choice   Generosity      
##  Min.   :0.4850   Min.   :48.20           Min.   :0.3740    Min.   :-0.33500  
##  1st Qu.:0.7382   1st Qu.:58.75           1st Qu.:0.7190    1st Qu.:-0.13625  
##  Median :0.8410   Median :66.25           Median :0.7955    Median :-0.04650  
##  Mean   :0.8095   Mean   :64.52           Mean   :0.7847    Mean   :-0.02354  
##  3rd Qu.:0.9097   3rd Qu.:69.53           3rd Qu.:0.8758    3rd Qu.: 0.07200  
##  Max.   :0.9660   Max.   :76.80           Max.   :0.9700    Max.   : 0.51200  
##  Perceived.Corruption Positive.Affect  Negative.Affect 
##  Min.   :0.0970       Min.   :0.4240   Min.   :0.1070  
##  1st Qu.:0.6910       1st Qu.:0.6422   1st Qu.:0.2180  
##  Median :0.7955       Median :0.7325   Median :0.2825  
##  Mean   :0.7327       Mean   :0.7120   Mean   :0.2934  
##  3rd Qu.:0.8542       3rd Qu.:0.7930   3rd Qu.:0.3585  
##  Max.   :0.9520       Max.   :0.8840   Max.   :0.5440

I have also only used the data from the year 2018, and have filtered out the rest. Here we can see some summary statistics. This is not too useful to us yet, as these are global statistics, meaning that they are not specific to a given country. This can, however, give us an idea about the emotional and psychological state of the world in general. For example, we see that the global average life expectancy is about 64.52 years old, with the minimum average life expectancy being 48.20 years old, and the maximum average life expectancy being 76.80. Because these are global statistics, it is unclear which country these extremes belong to.

World Happiness Report (2018) Dataframe:

head(df_2018[, !names(df_2018) %in% "Year"])

##       Country Life.Ladder Per.Capita.Income Social.Support
## 1 Afghanistan       2.694             7.692          0.508
## 2     Albania       5.004             9.518          0.684
## 3     Algeria       5.043             9.348          0.799
## 4   Argentina       5.793            10.032          0.900
## 5     Armenia       5.062             9.451          0.814
## 6   Australia       7.177            10.811          0.940
##   Healthy.Life.Expectancy Freedom.of.Choice Generosity Perceived.Corruption
## 1                    52.6             0.374     -0.094                0.928
## 2                    68.7             0.824      0.009                0.899
## 3                    65.9             0.583     -0.146                0.759
## 4                    68.8             0.846     -0.211                0.855
## 5                    66.9             0.808     -0.163                0.677
## 6                    73.6             0.916      0.146                0.405
##   Positive.Affect Negative.Affect
## 1           0.424           0.405
## 2           0.713           0.319
## 3           0.591           0.293
## 4           0.820           0.321
## 5           0.581           0.455
## 6           0.759           0.187

I decided to display the first few rows from the dataframe of this dataset as well. I removed the year column to avoid redundancy and save space, as we have already established that we are only working with 2018 data.

Data Visualization

Average Life Ladder Score by Country

average_scores <- df_2018 %>%
  group_by(Country) %>%
  summarize(Average_Life_Ladder = mean(Life.Ladder, na.rm = TRUE)) %>%
  arrange(desc(Average_Life_Ladder))  # Arrange in descending order of scores

ggplot(average_scores, aes(x = reorder(Country, Average_Life_Ladder), y = Average_Life_Ladder)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(x = "Country Name", y = "Average Life Ladder Score") +
  ggtitle("Average Life Ladder Score by Country") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

lowest_ll_score <- average_scores %>% arrange(Average_Life_Ladder) %>% head(1)

highest_ll_score <- average_scores %>% arrange(desc(Average_Life_Ladder)) %>% head(1)

print(lowest_ll_score)

## # A tibble: 1 × 2
##   Country     Average_Life_Ladder
##   <chr>                     <dbl>
## 1 Afghanistan                2.69

print(highest_ll_score)

## # A tibble: 1 × 2
##   Country Average_Life_Ladder
##   <chr>                 <dbl>
## 1 Finland                7.86

Average Per Capita Income by Country

average_scores <- df_2018 %>%
  group_by(Country) %>%
  summarize(Average_Log.GDP.per.capita = mean(Per.Capita.Income, na.rm = TRUE)) %>%
  arrange(desc(Average_Log.GDP.per.capita))  # Arrange in descending order of scores

ggplot(average_scores, aes(x = reorder(Country, Average_Log.GDP.per.capita), y = Average_Log.GDP.per.capita)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(x = "Country Name", y = "Average Per Capita Income") +
  ggtitle("Average Per Capita Income by Country") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

lowest_gdp <- average_scores %>% arrange(Average_Log.GDP.per.capita) %>% head(1)

highest_gdp <- average_scores %>% arrange(desc(Average_Log.GDP.per.capita)) %>% head(1)

print(lowest_gdp)

## # A tibble: 1 × 2
##   Country Average_Log.GDP.per.capita
##   <chr>                        <dbl>
## 1 Burundi                       6.64

print(highest_gdp)

## # A tibble: 1 × 2
##   Country    Average_Log.GDP.per.capita
##   <chr>                           <dbl>
## 1 Luxembourg                       11.6

Average Social Support by Country

average_scores <- df_2018 %>%
  group_by(Country) %>%
  summarize(Average_Social.support = mean(Social.Support, na.rm = TRUE)) %>%
  arrange(desc(Average_Social.support))  # Arrange in descending order of scores

ggplot(average_scores, aes(x = reorder(Country, Average_Social.support), y = Average_Social.support)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(x = "Country Name", y = "Average Social Support Score") +
  ggtitle("Average Social Support Score by Country") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

lowest_ss <- average_scores %>% arrange(Average_Social.support) %>% head(1)

highest_ss <- average_scores %>% arrange(desc(Average_Social.support)) %>% head(1)

print(lowest_ss)

## # A tibble: 1 × 2
##   Country Average_Social.support
##   <chr>                    <dbl>
## 1 Burundi                  0.485

print(highest_ss)

## # A tibble: 1 × 2
##   Country Average_Social.support
##   <chr>                    <dbl>
## 1 Norway                   0.966

Average Healthy Life Expectancy by Country

average_scores <- df_2018 %>%
  group_by(Country) %>%
  summarize(Average_Healthy.Life.Expectancy = mean(Healthy.Life.Expectancy, na.rm = TRUE)) %>%
  arrange(desc(Average_Healthy.Life.Expectancy))  # Arrange in descending order of scores

ggplot(average_scores, aes(x = reorder(Country, Average_Healthy.Life.Expectancy), y = Average_Healthy.Life.Expectancy)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(x = "Country Name", y = "Average Healthy Life Expectancy") +
  ggtitle("Average Healthy Life Expectancy by Country") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

lowest_le <- average_scores %>% arrange(Average_Healthy.Life.Expectancy) %>% head(1)

highest_le <- average_scores %>% arrange(desc(Average_Healthy.Life.Expectancy)) %>% head(1)

print(lowest_le)

## # A tibble: 1 × 2
##   Country Average_Healthy.Life.Expectancy
##   <chr>                             <dbl>
## 1 Chad                               48.2

print(highest_le)

## # A tibble: 1 × 2
##   Country   Average_Healthy.Life.Expectancy
##   <chr>                               <dbl>
## 1 Singapore                            76.8

Average Freedom of Choice Score by Country

average_scores <- df_2018 %>%
  group_by(Country) %>%
  summarize(Average_Freedom.of.Choice  = mean(Freedom.of.Choice, na.rm = TRUE)) %>%
  arrange(desc(Average_Freedom.of.Choice))  # Arrange in descending order of scores

ggplot(average_scores, aes(x = reorder(Country, Average_Freedom.of.Choice), y = Average_Freedom.of.Choice)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(x = "Country Name", y = "Average Freedom of Choice Score") +
  ggtitle("Average Freedom of Choice Score by Country") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

lowest_foc <- average_scores %>% arrange(Average_Freedom.of.Choice) %>% head(1)

highest_foc <- average_scores %>% arrange(desc(Average_Freedom.of.Choice)) %>% head(1)

print(lowest_foc)

## # A tibble: 1 × 2
##   Country     Average_Freedom.of.Choice
##   <chr>                           <dbl>
## 1 Afghanistan                     0.374

print(highest_foc)

## # A tibble: 1 × 2
##   Country    Average_Freedom.of.Choice
##   <chr>                          <dbl>
## 1 Uzbekistan                      0.97

Average Generosity by Country

average_scores <- df_2018 %>%
  group_by(Country) %>%
  summarize(Average_Generosity  = mean( Generosity, na.rm = TRUE)) %>%
  arrange(desc(Average_Generosity))  # Arrange in descending order of scores

ggplot(average_scores, aes(x = reorder(Country, Average_Generosity), y = Average_Generosity)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(x = "Country Name", y = "Average Generosity") +
  ggtitle("Average Generosity by Country") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

lowest_gen <- average_scores %>% arrange(Average_Generosity) %>% head(1)

highest_gen <- average_scores %>% arrange(desc(Average_Generosity)) %>% head(1)

print(lowest_gen)

## # A tibble: 1 × 2
##   Country Average_Generosity
##   <chr>                <dbl>
## 1 Greece              -0.335

print(highest_gen)

## # A tibble: 1 × 2
##   Country   Average_Generosity
##   <chr>                  <dbl>
## 1 Indonesia              0.512

Average Perceived Corruption by Country

average_scores <- df_2018 %>%
  group_by(Country) %>%
  summarize(Average_Perceived.Corruption  = mean(Perceived.Corruption, na.rm = TRUE)) %>%
  arrange(desc(Average_Perceived.Corruption))  # Arrange in descending order of scores

ggplot(average_scores, aes(x = reorder(Country, Average_Perceived.Corruption), y = Average_Perceived.Corruption)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(x = "Country Name", y = "Average Perceived Corruption") +
  ggtitle("Average Perceived Corruption by Country") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

lowest_percor <- average_scores %>% arrange(Average_Perceived.Corruption) %>% head(1)

highest_percor <- average_scores %>% arrange(desc(Average_Perceived.Corruption)) %>% head(1)

print(lowest_percor)

## # A tibble: 1 × 2
##   Country   Average_Perceived.Corruption
##   <chr>                            <dbl>
## 1 Singapore                        0.097

print(highest_percor)

## # A tibble: 1 × 2
##   Country  Average_Perceived.Corruption
##   <chr>                           <dbl>
## 1 Bulgaria                        0.952

Average Positive Affect by Country

average_scores <- df_2018 %>%
  group_by(Country) %>%
  summarize(Average_Positive.Affect  = mean(Positive.Affect, na.rm = TRUE)) %>%
  arrange(desc(Average_Positive.Affect))  # Arrange in descending order of scores

ggplot(average_scores, aes(x = reorder(Country, Average_Positive.Affect), y = Average_Positive.Affect)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(x = "Country Name", y = "Average Positive Affect") +
  ggtitle("Average Positive Affect by Country") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

lowest_posaff <- average_scores %>% arrange(Average_Positive.Affect) %>% head(1)

highest_posaff <- average_scores %>% arrange(desc(Average_Positive.Affect)) %>% head(1)

print(lowest_posaff)

## # A tibble: 1 × 2
##   Country     Average_Positive.Affect
##   <chr>                         <dbl>
## 1 Afghanistan                   0.424

print(highest_posaff)

## # A tibble: 1 × 2
##   Country Average_Positive.Affect
##   <chr>                     <dbl>
## 1 Panama                    0.884

Average Negative Affect by Country

average_scores <- df_2018 %>%
  group_by(Country) %>%
  summarize(Average_Negative.Affect  = mean(Negative.Affect, na.rm = TRUE)) %>%
  arrange(desc(Average_Negative.Affect))  # Arrange in descending order of scores

ggplot(average_scores, aes(x = reorder(Country, Average_Negative.Affect), y = Average_Negative.Affect)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(x = "Country Name", y = "Average Negative Affect") +
  ggtitle("Average Negative Affect by Country") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

lowest_negaff <- average_scores %>% arrange(Average_Negative.Affect) %>% head(1)

highest_negaff <- average_scores %>% arrange(desc(Average_Negative.Affect)) %>% head(1)

print(lowest_negaff)

## # A tibble: 1 × 2
##   Country   Average_Negative.Affect
##   <chr>                       <dbl>
## 1 Singapore                   0.107

print(highest_negaff)

## # A tibble: 1 × 2
##   Country Average_Negative.Affect
##   <chr>                     <dbl>
## 1 Chad                      0.544

Analysis

Then I will do some calculations to find which variable has the strongest impact on happiness. Out of curiosity, I would also like to know which one has the least

model <- lm(Life.Ladder ~ Per.Capita.Income + Social.Support + Healthy.Life.Expectancy +
               Freedom.of.Choice + Generosity + Perceived.Corruption + Positive.Affect + Negative.Affect,
            data = df_2018)

summary(model)

## 
## Call:
## lm(formula = Life.Ladder ~ Per.Capita.Income + Social.Support + 
##     Healthy.Life.Expectancy + Freedom.of.Choice + Generosity + 
##     Perceived.Corruption + Positive.Affect + Negative.Affect, 
##     data = df_2018)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8769 -0.3034  0.0334  0.2829  1.5914 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -4.10526    0.88191  -4.655 8.61e-06 ***
## Per.Capita.Income        0.39879    0.10340   3.857 0.000188 ***
## Social.Support           2.84310    0.80558   3.529 0.000597 ***
## Healthy.Life.Expectancy  0.02788    0.01441   1.934 0.055506 .  
## Freedom.of.Choice        0.67916    0.59814   1.135 0.258509    
## Generosity               0.25813    0.35168   0.734 0.464414    
## Perceived.Corruption    -0.93360    0.31117  -3.000 0.003297 ** 
## Positive.Affect          1.50665    0.60656   2.484 0.014410 *  
## Negative.Affect          3.01529    0.74832   4.029 9.98e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5341 on 117 degrees of freedom
## Multiple R-squared:  0.7759, Adjusted R-squared:  0.7605 
## F-statistic: 50.62 on 8 and 117 DF,  p-value: < 2.2e-16

Overall, based on the coefficients’ significance levels and the model’s R-squared value, it seems that Per.Capita.Income, Social.Support, Healthy.Life.Expectancy, Perceived.Corruption, and Positive and Negative Affect are statistically significant predictors of the Life Ladder score in this model. I was surprised to see that healthy life expectancy did not have as significant of an impact as I would have thought.Generosity and Freedom.of.Choice on the other hand, do not appear to have a statistically significant impact. It looks like economic production however, was the most significant factor contribution to overall happiness.

Because of the multiple r-squared value, 0.7605, approximately 76.05% of the variance in Life Ladder scores is explained by the independent variables.

The positive and negative affect are indicative of emotional and mental state of the general population. I would argue that those with a lower positive affect, and especially those with a higher negative affect, are countries which require improved conditions for a higher quality of life, and mental health resources. This type of information can be used for advocacy.

Limitations

It would be interesting to compare this data with that of another year. Comparing data across time and then correlating to important events in history can help to also determine what impacts happiness and how. It would also be useful to have more recent data, to keep the data relevant. Another limitation could be from the data collection method. The sample which was surveyed has to be an accurate reflection of the population, and that may or may not have been the case for every country.

Conclusion

To conclude, it looks like economic production is the most significant contributing factor to happiness. There is a reason people say that money buys happiness! This type of research, along with the analysis performed, can be used to improve quality of life all over the world. In the future, I would use this data set to compare what makes people the unhappiest, and work from there.