Data analysis of World Happiness Report

Alok Pratap Singh

Research Scholar, Department of Psychology, University of Allahabad. Publication Date: 24 June, 2021

Introduction

The World Happiness Report is a landmark survey of the state of global happiness. The first report was published in 2012, the second in 2013, the third in 2015, and the fourth in the 2016 Update. The World Happiness Report 2017, which ranks 155 countries by their happiness levels, was released at the United Nations at an event celebrating the International Day of Happiness on March 20th. The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields (economics, psychology, survey analysis, national statistics, health, public policy and more) describe how measurements of well-being can be used effectively to assess the progress of nations. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.

The happiness scores and rankings use data from the Gallup World Poll. The scores are based on answers to the main life evaluation question asked in the poll. This question, known as the Cantril ladder, asks respondents to think of a ladder with the best possible life for them being a 10 and the worst possible life being a 0, and to rate their own current lives on that scale.

The columns following the happiness score estimate the extent to which each of six factors (economic production, social support, life expectancy, freedom, absence of corruption, and generosity) contributes to making life evaluations higher in each country than they are in Dystopia, a hypothetical country that has values equal to the world’s lowest national averages for each of the six factors. These estimates have no impact on the total score reported for each country, but they do explain why some countries rank higher than others.

Summary

This analysis uses the World Happiness Report data, published by the United Nations Sustainable Development Solutions Network. The data cover 166 countries over the period 2005 to 2020. The variables are Life Ladder (the happiness score), Log GDP per capita, Social support, Healthy life expectancy at birth, Freedom to make life choices, Generosity, Perceptions of corruption, Positive affect, and Negative affect. Pearson’s correlation coefficient is used to measure the association between variables, and multiple and stepwise regression are used for modelling and prediction.

Libraries required in the Project

library(readr)      # reading the dataset
library(dplyr, warn.conflicts = FALSE)  # data wrangling
library(ggplot2)    # data visualization
library(ggthemes)   # additional themes and automatic colors
library(psych)      # descriptive summaries, cor.ci()
library(olsrr)      # OLS stepwise regression
library(ggcorrplot) # correlation plot
library(rcompanion) # histogram with an overlaid normal curve
library(nortest)    # statistical tests for normality

Loading data

data <- readr::read_csv("C:/Users/Asus/Documents/R Clints/Catherine Arceno/ProjectScience/World-happiness.csv") # loading dataset

Top 6 rows

head(data)

## # A tibble: 6 x 11
##   `Country name`  year `Life Ladder` `Log GDP per capita` `Social support`
##   <chr>          <dbl>         <dbl>                <dbl>            <dbl>
## 1 Afghanistan     2008          3.72                 7.37            0.451
## 2 Afghanistan     2009          4.40                 7.54            0.552
## 3 Afghanistan     2010          4.76                 7.65            0.539
## 4 Afghanistan     2011          3.83                 7.62            0.521
## 5 Afghanistan     2012          3.78                 7.70            0.521
## 6 Afghanistan     2013          3.57                 7.72            0.484
## # ... with 6 more variables: Healthy life expectancy at birth <dbl>,
## #   Freedom to make life choices <dbl>, Generosity <dbl>,
## #   Perceptions of corruption <dbl>, Positive affect <dbl>,
## #   Negative affect <dbl>

Name, dimension (rows and column), and type of variables

glimpse(data)

## Rows: 1,949
## Columns: 11
## $ `Country name`                     <chr> "Afghanistan", "Afghanistan", "Afgh~
## $ year                               <dbl> 2008, 2009, 2010, 2011, 2012, 2013,~
## $ `Life Ladder`                      <dbl> 3.724, 4.402, 4.758, 3.832, 3.783, ~
## $ `Log GDP per capita`               <dbl> 7.370, 7.540, 7.647, 7.620, 7.705, ~
## $ `Social support`                   <dbl> 0.451, 0.552, 0.539, 0.521, 0.521, ~
## $ `Healthy life expectancy at birth` <dbl> 50.80, 51.20, 51.60, 51.92, 52.24, ~
## $ `Freedom to make life choices`     <dbl> 0.718, 0.679, 0.600, 0.496, 0.531, ~
## $ Generosity                         <dbl> 0.168, 0.190, 0.121, 0.162, 0.236, ~
## $ `Perceptions of corruption`        <dbl> 0.882, 0.850, 0.707, 0.731, 0.776, ~
## $ `Positive affect`                  <dbl> 0.518, 0.584, 0.618, 0.611, 0.710, ~
## $ `Negative affect`                  <dbl> 0.258, 0.237, 0.275, 0.267, 0.268, ~

The data consist of 1,949 observations (rows) and 11 variables (columns).

unique(data$`Country name`) %>%
  length() # number of countries in our dataset

## [1] 166

range(data$year) # time period covered by the dataset

## [1] 2005 2020

We have data for 166 countries, covering the years 2005 to 2020.

Missing Cases in dataset

complete.cases(data) %>% table() # total number of complete cases

## .
## FALSE  TRUE 
##   241  1708

is.na(data) %>% table() # total number of missing values

## .
## FALSE  TRUE 
## 21066   373

Of the 1,949 cases in total, 241 have at least one missing value. The total number of missing values is 373.

How many missing values are there in each variable?

colSums(is.na(data)) %>% data.frame() # missing values (NA) in variables

##                                    .
## Country name                       0
## year                               0
## Life Ladder                        0
## Log GDP per capita                36
## Social support                    13
## Healthy life expectancy at birth  55
## Freedom to make life choices      32
## Generosity                        89
## Perceptions of corruption        110
## Positive affect                   22
## Negative affect                   16
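The three counts above (incomplete rows, missing cells, and missing values per variable) are related, and a toy example makes the relationship easy to check. This is a minimal sketch in base R; the data frame `toy` and its columns are made up for illustration and are not part of the happiness data.

```r
# Toy data frame with a few missing values (illustrative, not the real data)
toy <- data.frame(
  country = c("A", "A", "B", "B", "C"),
  gdp     = c(7.1, NA, 9.3, 9.4, NA),
  support = c(0.5, 0.6, NA, 0.8, NA)
)

n_missing_cells <- sum(is.na(toy))            # every NA cell in the table
n_incomplete    <- sum(!complete.cases(toy))  # rows with at least one NA
per_variable    <- colSums(is.na(toy))        # NA count per column
pct_per_var     <- round(100 * colMeans(is.na(toy)), 1)  # NA share per column, in percent

n_missing_cells
n_incomplete
per_variable
pct_per_var
```

The per-variable counts always sum to the total number of missing cells, while the number of incomplete rows can be smaller because a single row may miss several values (373 cells spread over 241 rows in the real data).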

Descriptive analysis

data %>% select(-c(`Country name`, year)) %>% 
    psych::describe(na.rm = T, type = 2, fast = F) %>% 
  select(-c(vars, n, se, trimmed, mad, min, max, range))

##                                   mean   sd median  skew kurtosis
## Life Ladder                       5.47 1.12   5.39  0.07    -0.69
## Log GDP per capita                9.37 1.15   9.46 -0.31    -0.86
## Social support                    0.81 0.12   0.84 -1.11     1.18
## Healthy life expectancy at birth 63.36 7.51  65.20 -0.74    -0.05
## Freedom to make life choices      0.74 0.14   0.76 -0.62    -0.13
## Generosity                        0.00 0.16  -0.03  0.81     0.84
## Perceptions of corruption         0.75 0.19   0.80 -1.50     1.84
## Positive affect                   0.71 0.11   0.72 -0.36    -0.58
## Negative affect                   0.27 0.09   0.26  0.74     0.87
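The `skew` and `kurtosis` columns are requested with `type = 2`, which, as far as I understand, refers to the bias-adjusted sample formulas also used by SAS and SPSS. The sketch below writes those formulas out in base R under that assumption; `skew_type2()` and `kurt_type2()` are illustrative helper names, not functions from `psych`, and the input vectors are made up.

```r
# Type-2 (bias-adjusted) skewness and excess kurtosis, written out in base R.
# skew_type2() and kurt_type2() are illustrative helper names.
skew_type2 <- function(x) {
  x  <- x[!is.na(x)]
  n  <- length(x)
  m2 <- mean((x - mean(x))^2)          # second central moment
  m3 <- mean((x - mean(x))^3)          # third central moment
  g1 <- m3 / m2^1.5                    # moment-based skewness
  g1 * sqrt(n * (n - 1)) / (n - 2)     # small-sample adjustment
}

kurt_type2 <- function(x) {
  x  <- x[!is.na(x)]
  n  <- length(x)
  m2 <- mean((x - mean(x))^2)
  m4 <- mean((x - mean(x))^4)          # fourth central moment
  g2 <- m4 / m2^2 - 3                  # moment-based excess kurtosis
  ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
}

symmetric    <- c(1, 2, 3, 4, 5)
right_skewed <- c(1, 1, 2, 2, 3, 10)

skew_type2(symmetric)      # symmetric sample: skewness is 0
skew_type2(right_skewed)   # long right tail: positive skewness
kurt_type2(symmetric)      # flat sample: negative excess kurtosis
```

Read this way, the negative skew of Social support and Perceptions of corruption in the table above indicates a long left tail, which is consistent with their medians lying above their means.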

Top 10 countries by GDP per capita (mean log GDP per capita over the available years, 2005 to 2020)

(data %>%
   select(`Country name`, `Log GDP per capita`) %>%
    na.omit() %>%  # drop country-year rows with a missing GDP value
    group_by(`Country name`) %>% 
    summarise(gdp= mean(`Log GDP per capita`)) %>% 
    arrange(-gdp) %>% .[1:10,] -> top10gdp)

## # A tibble: 10 x 2
##    `Country name`              gdp
##    <chr>                     <dbl>
##  1 Luxembourg                 11.6
##  2 Qatar                      11.5
##  3 Singapore                  11.3
##  4 Switzerland                11.1
##  5 Ireland                    11.1
##  6 United Arab Emirates       11.1
##  7 Norway                     11.0
##  8 Kuwait                     11.0
##  9 United States              11.0
## 10 Hong Kong S.A.R. of China  10.9

Bottom 10 countries by GDP per capita (mean log GDP per capita over the available years, 2005 to 2020)

(data %>% select(`Country name`, `Log GDP per capita`) %>%
    na.omit() %>%  # drop country-year rows with a missing GDP value
    group_by(`Country name`) %>% 
    summarise(gdp= mean(`Log GDP per capita`)) %>% 
    arrange(gdp) %>% .[1:10,] -> bottom10gdp)

## # A tibble: 10 x 2
##    `Country name`             gdp
##    <chr>                    <dbl>
##  1 Burundi                   6.72
##  2 Malawi                    6.88
##  3 Congo (Kinshasa)          6.88
##  4 Central African Republic  6.96
##  5 Niger                     6.99
##  6 Mozambique                7.01
##  7 Togo                      7.24
##  8 Liberia                   7.30
##  9 Sierra Leone              7.36
## 10 Madagascar                7.37

Top 10 countries by happiness (mean Life Ladder score over the available years)

(data %>% select(`Country name`, `Life Ladder`) %>%
    group_by(`Country name`) %>% 
    summarise(happiness= mean(`Life Ladder`)) %>% 
    arrange(-happiness) %>% .[1:10,] -> top10happiness)

## # A tibble: 10 x 2
##    `Country name` happiness
##    <chr>              <dbl>
##  1 Denmark             7.68
##  2 Finland             7.60
##  3 Switzerland         7.55
##  4 Norway              7.51
##  5 Netherlands         7.47
##  6 Iceland             7.45
##  7 Canada              7.38
##  8 Sweden              7.37
##  9 New Zealand         7.31
## 10 Australia           7.28

Bottom 10 countries by happiness (mean Life Ladder score over the available years)

(data %>% select(`Country name`, `Life Ladder`) %>% 
        group_by(`Country name`) %>% 
    summarise(happiness= mean(`Life Ladder`)) %>% 
    arrange(happiness) %>% .[1:10,] -> bottom10hapiness)

## # A tibble: 10 x 2
##    `Country name`           happiness
##    <chr>                        <dbl>
##  1 South Sudan                   3.40
##  2 Central African Republic      3.52
##  3 Burundi                       3.55
##  4 Togo                          3.56
##  5 Afghanistan                   3.59
##  6 Rwanda                        3.65
##  7 Tanzania                      3.70
##  8 Zimbabwe                      3.88
##  9 Yemen                         3.91
## 10 Comoros                       3.94

Comparing Ranks

Switzerland and Norway appear in both top-10 lists (GDP per capita and happiness).

inner_join(top10gdp, top10happiness, by=  "Country name")

## # A tibble: 2 x 3
##   `Country name`   gdp happiness
##   <chr>          <dbl>     <dbl>
## 1 Switzerland     11.1      7.55
## 2 Norway          11.0      7.51

Similarly, comparing the bottom-10 lists for happiness and GDP per capita, three countries appear in both.

inner_join(bottom10hapiness, bottom10gdp, by= "Country name")

## # A tibble: 3 x 3
##   `Country name`           happiness   gdp
##   <chr>                        <dbl> <dbl>
## 1 Central African Republic      3.52  6.96
## 2 Burundi                       3.55  6.72
## 3 Togo                          3.56  7.24
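The overlap between two ranked lists, as computed with `inner_join()` above, can also be checked with a plain set intersection. The following is a minimal base R sketch on made-up country averages; `avg` and the helper `top_n_by()` are hypothetical names introduced only for this illustration.

```r
# Illustrative country-level averages (made-up numbers, not the real data)
avg <- data.frame(
  country   = c("A", "B", "C", "D", "E", "F"),
  gdp       = c(11.2, 10.9, 10.5, 9.8, 8.1, 7.0),
  happiness = c(7.1, 7.6, 6.0, 7.3, 4.2, 3.9)
)

top_n_by <- function(df, column, n) {
  # Sort descending on `column` and keep the first n country names
  df$country[order(-df[[column]])][seq_len(n)]
}

top3_gdp   <- top_n_by(avg, "gdp", 3)
top3_happy <- top_n_by(avg, "happiness", 3)

# Countries that appear in both lists
common <- intersect(top3_gdp, top3_happy)
common
```

In this toy example country C is rich but not among the happiest, and country D is happy without being among the richest, so only A and B are shared.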

Data preparation for Hypothesis testing

data %>% na.omit() %>%          # remove every row (country-year) with a missing value
    group_by(`Country name`) %>% 
    summarise(happy= mean(`Life Ladder`),   # country-wise mean of each variable over the available years
              gdp= mean(`Log GDP per capita`),
              soci.s= mean(`Social support`),
              hlexp.b= mean(`Healthy life expectancy at birth`),
              freedom= mean(`Freedom to make life choices`),
              generosity= mean(Generosity),
              p.corruption= mean(`Perceptions of corruption`),
              p.affect= mean(`Positive affect`),
              n.affect= mean(`Negative affect`)) %>% 
    ungroup() %>%
    select(-`Country name`)-> dataset     # dataset is ready for regression analysis
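The aggregation step above (drop incomplete rows, then average each variable within a country) can be sketched in base R on a small made-up panel. The data frame `panel` and its values are illustrative assumptions, not taken from the happiness data.

```r
# Toy panel: one row per country and year (made-up values)
panel <- data.frame(
  country = c("A", "A", "A", "B", "B"),
  year    = c(2018, 2019, 2020, 2019, 2020),
  happy   = c(5.0, 5.4, 5.2, 7.0, 7.2),
  gdp     = c(9.0, NA, 9.2, 10.8, 11.0)
)

# Drop incomplete rows first, as na.omit() does in the pipeline above
complete_rows <- na.omit(panel)

# Average the numeric columns within each country
country_means <- aggregate(cbind(happy, gdp) ~ country,
                           data = complete_rows, FUN = mean)
country_means
```

Note a side effect of removing whole rows: country A's 2019 happiness score (5.4) is discarded because its GDP is missing for that year, so the country mean of `happy` is based on two years rather than three.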

Top 6 rows

dataset %>% head()

## # A tibble: 6 x 9
##   happy   gdp soci.s hlexp.b freedom generosity p.corruption p.affect n.affect
##   <dbl> <dbl>  <dbl>   <dbl>   <dbl>      <dbl>        <dbl>    <dbl>    <dbl>
## 1  3.59  7.65  0.508    52.3   0.518     0.0701        0.843    0.549    0.326
## 2  5.02  9.38  0.716    67.5   0.663    -0.0827        0.869    0.654    0.299
## 3  5.19  9.33  0.812    65.4   0.504    -0.132         0.706    0.594    0.256
## 4  4.42  8.99  0.738    53.6   0.456    -0.0883        0.867    0.614    0.351
## 5  6.31 10.0   0.904    67.9   0.768    -0.160         0.842    0.833    0.284
## 6  4.51  9.27  0.719    65.7   0.564    -0.200         0.846    0.550    0.434

Data Preparation (checking outliers and normality, removing outliers, etc.)

Abbreviated Names of Variables

Old name	New name
Country name	Country name
year	year
Life Ladder	happy
Log GDP per capita	gdp
Social support	soci.s
Healthy life expectancy at birth	hlexp.b
Freedom to make life choices	freedom
Generosity	generosity
Perceptions of corruption	p.corruption
Positive affect	p.affect
Negative affect	n.affect

Boxplots

par(mfrow= c(3,3))
for(i in c(1:9)) {
    boxplot(dataset[,i], col= i+2, lwd= 1.3)
    title(main= names(dataset[i]))
}

Perception of corruption (p.corruption) has extreme values.

Histogram with Normal Plot

par(mfrow=c(3,3))
for(i in c(1:9)) {
    rcompanion::plotNormalHistogram(dataset[,i],
                                    col=i+2,
                                    xlab= "",
                                    main= names(dataset[i]))
}

Judging from the histograms, all variables appear approximately normally distributed, except perception of corruption and generosity.

Data Cleaning: removing extreme values from the dataset

par(mfrow=c(1,2))
boxplot(dataset$p.corruption, main= "p.corruption", col= F)$out %>% 
  range()

## [1] 0.09783333 0.47650000

boxplot(dataset$generosity, main= "generosity", col= F)$out %>% 
  range()

## [1] 0.3814667 0.6090000

There are 15 outlying values in p.corruption and, as the two distinct range limits above show, at least 2 in generosity. We remove them.

dim(dataset)

## [1] 155   9

dataset %>% filter(p.corruption>0.47650000, generosity<.38 )-> dataset
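The thresholds in the filter above are copied by hand from the `$out` component of `boxplot()`. As a sketch of where such thresholds come from, the example below reconstructs the box-plot fences (1.5 times the hinge spread beyond each hinge) with `boxplot.stats()` on a made-up vector `x`; the variable names are illustrative.

```r
# Illustrative vector with one low and one high extreme value
x <- c(0.10, 0.70, 0.72, 0.75, 0.78, 0.80, 0.82, 0.85, 0.88, 1.60)

stats <- boxplot.stats(x)     # the statistics behind a box plot, without drawing it
stats$out                     # points beyond the whiskers

# Reconstruct the fences: 1.5 * (upper hinge - lower hinge) beyond each hinge
hinges <- stats$stats[c(2, 4)]
fence  <- hinges + c(-1.5, 1.5) * diff(hinges)

# Keep only the observations inside the fences
x_clean <- x[x >= fence[1] & x <= fence[2]]
length(x_clean)
```

Deriving the cut-offs from the fences rather than typing them in keeps the filter valid if the underlying data change.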

After removing the outliers, we check again for extreme values and normality.

par(mfrow= c(3,3))
for(i in c(1:9)) {
    boxplot(dataset[,i], col= i+2)
    title(main= names(dataset[i]))
}

par(mfrow=c(3,3))
for(i in c(1:9)) {
    rcompanion::plotNormalHistogram(dataset[,i],
                                    col=i+2,
                                    xlab= "",
                                    main= names(dataset[i]))
}

QQPLOT (another graphical way to check normality)

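Note: a for loop over dataset[,i] does not work with qqnorm(), most likely because single-bracket indexing of a tibble returns a one-column tibble rather than a numeric vector. The calls are therefore written out one by one.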

par(mfrow= c(3,3))
qqnorm(dataset$happy, col= "blue", pch= 19, main= "happy")
qqline(dataset$happy, col= "red", lwd=3)
qqnorm(dataset$gdp, col= "blue", pch= 19, main= "gdp")
qqline(dataset$gdp, col= "red", lwd=3)
qqnorm(dataset$soci.s, col= "blue", pch= 19, main= "soci.s")
qqline(dataset$soci.s, col= "red", lwd=3)
qqnorm(dataset$hlexp.b, col= "blue", pch= 19, main= "hlexp.b")
qqline(dataset$hlexp.b, col= "red", lwd=3)
qqnorm(dataset$freedom, col= "blue", pch= 19, main= "freedom")
qqline(dataset$freedom, col= "red", lwd=3)
qqnorm(dataset$generosity, col= "blue", pch= 19, main= "generosity")
qqline(dataset$generosity, col= "red", lwd= 3)
qqnorm(dataset$p.corruption, col= "blue", pch= 19, main= "p.corruption")
qqline(dataset$p.corruption, col= "red", lwd=3)
qqnorm(dataset$p.affect, col= "blue", pch= 19, main= "p.affect")
qqline(dataset$p.affect, col= "red", lwd=3)
qqnorm(dataset$n.affect, col= "blue", pch= 19, main= "n.affect")
qqline(dataset$n.affect, col= "red", lwd=3)
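The nine pairs of calls above can be collapsed into a loop if each column is extracted with `[[ ]]`, which returns a plain numeric vector for both data frames and tibbles. The sketch below demonstrates this on a made-up data frame `toy` standing in for `dataset`; only the column extraction differs from the loops used for the box plots.

```r
# Illustrative data frame standing in for `dataset` (made-up values)
set.seed(1)
toy <- data.frame(a = rnorm(50), b = rexp(50), c = runif(50))

par(mfrow = c(1, 3))
for (name in names(toy)) {
  values <- toy[[name]]     # [[ ]] extracts a plain numeric vector
  qqnorm(values, col = "blue", pch = 19, main = name)
  qqline(values, col = "red", lwd = 3)
}
```

With the real data, the same loop would iterate over `names(dataset)` with `par(mfrow = c(3, 3))`.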

The Q-Q plots suggest that social support (soci.s) and perception of corruption (p.corruption) deviate from normality. We proceed regardless, because most variables in this dataset are approximately normally distributed.
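A graphical impression can be complemented by a formal test. The sketch below applies the Shapiro-Wilk test from base R to every column of a made-up data frame `toy`; this is an illustration, not a step of the original analysis. The `nortest` package loaded earlier offers further tests, such as `ad.test()`, that could be substituted in the same way.

```r
set.seed(123)
toy <- data.frame(
  bell_shaped = rnorm(137, mean = 5, sd = 1),   # drawn from a normal distribution
  skewed      = rexp(137, rate = 2)             # strongly right-skewed
)

# Shapiro-Wilk p-value for every column; small values speak against normality
p_values <- sapply(toy, function(x) shapiro.test(x)$p.value)
round(p_values, 4)
```

On the real data, `sapply(dataset, function(x) shapiro.test(x)$p.value)` would give one p-value per variable.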

Descriptive Statistics

Pearson’s Correlation Coefficient Table

cor(dataset) %>% round(2)

##              happy   gdp soci.s hlexp.b freedom generosity p.corruption
## happy         1.00  0.80   0.75    0.79    0.50       0.04        -0.23
## gdp           0.80  1.00   0.73    0.84    0.29      -0.21        -0.11
## soci.s        0.75  0.73   1.00    0.64    0.41      -0.08        -0.08
## hlexp.b       0.79  0.84   0.64    1.00    0.31      -0.12        -0.10
## freedom       0.50  0.29   0.41    0.31    1.00       0.28        -0.28
## generosity    0.04 -0.21  -0.08   -0.12    0.28       1.00        -0.23
## p.corruption -0.23 -0.11  -0.08   -0.10   -0.28      -0.23         1.00
## p.affect      0.54  0.27   0.48    0.27    0.65       0.28        -0.22
## n.affect     -0.28 -0.19  -0.42   -0.15   -0.27       0.02         0.15
##              p.affect n.affect
## happy            0.54    -0.28
## gdp              0.27    -0.19
## soci.s           0.48    -0.42
## hlexp.b          0.27    -0.15
## freedom          0.65    -0.27
## generosity       0.28     0.02
## p.corruption    -0.22     0.15
## p.affect         1.00    -0.43
## n.affect        -0.43     1.00
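Each entry of the table above is a Pearson coefficient: the covariance of two variables divided by the product of their standard deviations. As a minimal sketch, the coefficient can be computed by hand and compared with `cor()`; the two short vectors are made-up values, not taken from the dataset.

```r
x <- c(7.4, 8.1, 9.0, 9.6, 10.2, 11.0)   # e.g. log GDP per capita (made-up)
y <- c(3.9, 4.4, 5.1, 5.0, 6.3, 7.2)     # e.g. average life ladder (made-up)

dx <- x - mean(x)                         # deviations from the mean
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

r_manual
cor(x, y)   # built-in Pearson correlation, same value
```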

Significance level for Correlation

ci <- cor.ci(dataset, plot= FALSE)$ci   # run the bootstrap once and reuse the result
data.frame(Association= rownames(ci),
           p_value= round(ci$p, 2))

##    Association p_value
## 1    happy-gdp    0.00
## 2  happy-soc.s    0.00
## 3  happy-hlxp.    0.00
## 4  happy-fredm    0.00
## 5  happy-gnrst    0.79
## 6  happy-p.crr    0.02
## 7  happy-p.ffc    0.00
## 8  happy-n.ffc    0.00
## 9    gdp-soc.s    0.00
## 10   gdp-hlxp.    0.00
## 11   gdp-fredm    0.00
## 12   gdp-gnrst    0.02
## 13   gdp-p.crr    0.21
## 14   gdp-p.ffc    0.00
## 15   gdp-n.ffc    0.03
## 16 soc.s-hlxp.    0.00
## 17 soc.s-fredm    0.00
## 18 soc.s-gnrst    0.35
## 19 soc.s-p.crr    0.48
## 20 soc.s-p.ffc    0.00
## 21 soc.s-n.ffc    0.00
## 22 hlxp.-fredm    0.00
## 23 hlxp.-gnrst    0.17
## 24 hlxp.-p.crr    0.22
## 25 hlxp.-p.ffc    0.00
## 26 hlxp.-n.ffc    0.04
## 27 fredm-gnrst    0.01
## 28 fredm-p.crr    0.00
## 29 fredm-p.ffc    0.00
## 30 fredm-n.ffc    0.01
## 31 gnrst-p.crr    0.03
## 32 gnrst-p.ffc    0.00
## 33 gnrst-n.ffc    0.64
## 34 p.crr-p.ffc    0.02
## 35 p.crr-n.ffc    0.15
## 36 p.ffc-n.ffc    0.00
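For a single pair of variables, the significance of a Pearson correlation can be checked with the classical t test, where t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below uses simulated data and base R only; it illustrates the test itself and makes no claim about how `cor.ci()` computes its p-values internally, so small differences from the table above are possible.

```r
set.seed(7)
n <- 137
x <- rnorm(n)
y <- 0.3 * x + rnorm(n)

r <- cor(x, y)
t_stat   <- r * sqrt(n - 2) / sqrt(1 - r^2)       # t statistic with n - 2 df
p_manual <- 2 * pt(-abs(t_stat), df = n - 2)      # two-sided p-value

p_manual
cor.test(x, y)$p.value   # same test via the built-in function
```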

describe(dataset) %>% select(mean, sd)

##               mean   sd
## happy         5.24 0.94
## gdp           9.13 1.11
## soci.s        0.79 0.11
## hlexp.b      61.72 7.42
## freedom       0.71 0.12
## generosity   -0.02 0.12
## p.corruption  0.79 0.10
## p.affect      0.70 0.10
## n.affect      0.28 0.07

Descriptive Table (Mean, SD, and Correlation values)

	Mean (SD)	1	2	3	4	5	6	7	8	9
1. Happiness	5.24(.94)	1
2. GDP	9.13(1.11)	.80	1
3. Social Supp.	.79(.11)	.75	.73	1
4. Healthy Life	61.72(7.42)	.79	.84	.64	1
5. Freedom	.71(.12)	.50	.29	.41	.31	1
6. Generosity	-.02(.12)	.04	-.21	-.08	-.12	.28	1
7. Corruption	.79(.10)	-.23	-.11	-.08	-.10	-.28	-.23	1
8. Positive Aff.	.70(.10)	.54	.27	.48	.27	.65	.28	-.22	1
9. Negative Aff.	.28(.07)	-.28	-.19	-.42	-.15	-.27	.02	.15	-.43	1
Note: the numbered columns contain Pearson correlation coefficients between the variables, taken in row order.

Correlation Plot

ggcorrplot::ggcorrplot(cor(dataset),
                       method = "circle",
                       type = "upper",
                       ggtheme = theme_foundation(),
                       legend.title = "Correlation\nCoefficient",
                       outline.color = "black")

## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

Multiple Linear Regression

We take happy (happiness) as the criterion variable and gdp, soci.s, hlexp.b, freedom, generosity, p.corruption, p.affect, and n.affect as predictor variables.

full_model <- lm(happy~., data= dataset)
summary(full_model)

## 
## Call:
## lm(formula = happy ~ ., data = dataset)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.54437 -0.22331  0.00723  0.21652  0.97397 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.420606   0.586467  -4.127 6.57e-05 ***
## gdp           0.282997   0.069576   4.067 8.25e-05 ***
## soci.s        1.653836   0.537616   3.076 0.002564 ** 
## hlexp.b       0.039277   0.009027   4.351 2.75e-05 ***
## freedom       0.429442   0.418906   1.025 0.307226    
## generosity    0.525363   0.332444   1.580 0.116505    
## p.corruption -0.657905   0.381796  -1.723 0.087270 .  
## p.affect      2.091064   0.553694   3.777 0.000242 ***
## n.affect      0.387465   0.609120   0.636 0.525843    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4202 on 128 degrees of freedom
## Multiple R-squared:  0.8133, Adjusted R-squared:  0.8016 
## F-statistic: 69.68 on 8 and 128 DF,  p-value: < 2.2e-16
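The last lines of the summary are linked by simple arithmetic, which serves as a quick consistency check. The sketch below recomputes the adjusted R-squared and the F statistic from the reported R-squared and degrees of freedom; the inputs are read off the output above, and the sample size of 137 is inferred from the 128 residual degrees of freedom plus 8 predictors and the intercept.

```r
# Values read off the summary(full_model) output above
r2       <- 0.8133   # multiple R-squared
df_resid <- 128      # residual degrees of freedom
p        <- 8        # number of predictors
n        <- df_resid + p + 1   # observations used in the fit: 137

adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid
f_stat <- (r2 / p) / ((1 - r2) / df_resid)

round(adj_r2, 4)   # 0.8016, as reported
round(f_stat, 2)   # close to the reported 69.68; R-squared was rounded first
```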

Four predictors (freedom, generosity, p.corruption, and n.affect) are not significant at the 0.05 level. To identify the most effective predictors, I use stepwise regression.
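To make the idea of p-value-based selection concrete, the sketch below performs a single forward step by hand in base R: starting from an intercept-only model, it adds the candidate with the smallest F-test p-value, provided it is below an entry threshold. The simulated data, the variable names, and the 0.05 threshold are assumptions for illustration. A full stepwise procedure repeats this step and also re-checks whether variables already in the model should be removed, which is not shown here.

```r
set.seed(42)
n <- 100
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 2 * toy$x1 + 0.5 * toy$x2 + rnorm(n)   # x3 is pure noise

current <- lm(y ~ 1, data = toy)                # start with the intercept only

# F test for adding each candidate term to the current model
candidates <- add1(current, scope = ~ x1 + x2 + x3, test = "F")
p_values <- candidates[["Pr(>F)"]][-1]          # drop the "<none>" row
names(p_values) <- rownames(candidates)[-1]

best <- names(which.min(p_values))              # strongest candidate
if (p_values[[best]] < 0.05) {                  # entry threshold (assumed)
  current <- update(current, as.formula(paste(". ~ . +", best)))
}

best
coef(current)
```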

OLS Stepwise Regression

olsrr package (alternative way)

olsrr is a package dedicated to OLS regression, and it makes stepwise selection straightforward.

olsrr::ols_step_both_p(full_model, pent= .05, prem= .10, detail= TRUE)

## Stepwise Selection Method   
## ---------------------------
## 
## Candidate Terms: 
## 
## 1. gdp 
## 2. soci.s 
## 3. hlexp.b 
## 4. freedom 
## 5. generosity 
## 6. p.corruption 
## 7. p.affect 
## 8. n.affect 
## 
## We are selecting variables based on p value...
## 
## 
## Stepwise Selection: Step 1 
## 
## - gdp added 
## 
##                         Model Summary                          
## --------------------------------------------------------------
## R                       0.800       RMSE                0.568 
## R-Squared               0.640       Coef. Var          10.844 
## Adj. R-Squared          0.638       MSE                 0.322 
## Pred R-Squared          0.631       MAE                 0.456 
## --------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                 ANOVA                                 
## ---------------------------------------------------------------------
##                Sum of                                                
##               Squares         DF    Mean Square       F         Sig. 
## ---------------------------------------------------------------------
## Regression     77.499          1         77.499    240.324    0.0000 
## Residual       43.535        135          0.322                      
## Total         121.034        136                                     
## ---------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
## ----------------------------------------------------------------------------------------
## (Intercept)    -0.963         0.403                 -2.391    0.018    -1.760    -0.167 
##         gdp     0.679         0.044        0.800    15.502    0.000     0.593     0.766 
## ----------------------------------------------------------------------------------------
## 
## 
## 
## Stepwise Selection: Step 2 
## 
## - p.affect added 
## 
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.867       RMSE               0.473 
## R-Squared               0.752       Coef. Var          9.030 
## Adj. R-Squared          0.749       MSE                0.224 
## Pred R-Squared          0.742       MAE                0.365 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                 ANOVA                                 
## ---------------------------------------------------------------------
##                Sum of                                                
##               Squares         DF    Mean Square       F         Sig. 
## ---------------------------------------------------------------------
## Regression     91.066          2         45.533    203.596    0.0000 
## Residual       29.968        134          0.224                      
## Total         121.034        136                                     
## ---------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
## ----------------------------------------------------------------------------------------
## (Intercept)    -2.609         0.397                 -6.580    0.000    -3.393    -1.825 
##         gdp     0.598         0.038        0.705    15.772    0.000     0.523     0.674 
##    p.affect     3.428         0.440        0.348     7.789    0.000     2.557     4.298 
## ----------------------------------------------------------------------------------------
## 
## 
## 
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.867       RMSE               0.473 
## R-Squared               0.752       Coef. Var          9.030 
## Adj. R-Squared          0.749       MSE                0.224 
## Pred R-Squared          0.742       MAE                0.365 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                 ANOVA                                 
## ---------------------------------------------------------------------
##                Sum of                                                
##               Squares         DF    Mean Square       F         Sig. 
## ---------------------------------------------------------------------
## Regression     91.066          2         45.533    203.596    0.0000 
## Residual       29.968        134          0.224                      
## Total         121.034        136                                     
## ---------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
## ----------------------------------------------------------------------------------------
## (Intercept)    -2.609         0.397                 -6.580    0.000    -3.393    -1.825 
##         gdp     0.598         0.038        0.705    15.772    0.000     0.523     0.674 
##    p.affect     3.428         0.440        0.348     7.789    0.000     2.557     4.298 
## ----------------------------------------------------------------------------------------
## 
## 
## 
## Stepwise Selection: Step 3 
## 
## - hlexp.b added 
## 
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.887       RMSE               0.441 
## R-Squared               0.787       Coef. Var          8.415 
## Adj. R-Squared          0.782       MSE                0.194 
## Pred R-Squared          0.773       MAE                0.343 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                 ANOVA                                 
## ---------------------------------------------------------------------
##                Sum of                                                
##               Squares         DF    Mean Square       F         Sig. 
## ---------------------------------------------------------------------
## Regression     95.203          3         31.734    163.397    0.0000 
## Residual       25.831        133          0.194                      
## Total         121.034        136                                     
## ---------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
## ----------------------------------------------------------------------------------------
## (Intercept)    -2.999         0.379                 -7.912    0.000    -3.748    -2.249 
##         gdp     0.361         0.063        0.425     5.766    0.000     0.237     0.484 
##    p.affect     3.277         0.411        0.333     7.966    0.000     2.464     4.091 
##     hlexp.b     0.043         0.009        0.340     4.616    0.000     0.025     0.062 
## ----------------------------------------------------------------------------------------
## 
## 
## 
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.887       RMSE               0.441 
## R-Squared               0.787       Coef. Var          8.415 
## Adj. R-Squared          0.782       MSE                0.194 
## Pred R-Squared          0.773       MAE                0.343 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                 ANOVA                                 
## ---------------------------------------------------------------------
##                Sum of                                                
##               Squares         DF    Mean Square       F         Sig. 
## ---------------------------------------------------------------------
## Regression     95.203          3         31.734    163.397    0.0000 
## Residual       25.831        133          0.194                      
## Total         121.034        136                                     
## ---------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
## ----------------------------------------------------------------------------------------
## (Intercept)    -2.999         0.379                 -7.912    0.000    -3.748    -2.249 
##         gdp     0.361         0.063        0.425     5.766    0.000     0.237     0.484 
##    p.affect     3.277         0.411        0.333     7.966    0.000     2.464     4.091 
##     hlexp.b     0.043         0.009        0.340     4.616    0.000     0.025     0.062 
## ----------------------------------------------------------------------------------------
## 
## 
## 
## Stepwise Selection: Step 4 
## 
## - soci.s added 
## 
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.894       RMSE               0.430 
## R-Squared               0.798       Coef. Var          8.209 
## Adj. R-Squared          0.792       MSE                0.185 
## Pred R-Squared          0.783       MAE                0.326 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                 ANOVA                                 
## ---------------------------------------------------------------------
##                Sum of                                                
##               Squares         DF    Mean Square       F         Sig. 
## ---------------------------------------------------------------------
## Regression     96.640          4         24.160    130.735    0.0000 
## Residual       24.394        132          0.185                      
## Total         121.034        136                                     
## ---------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
## ----------------------------------------------------------------------------------------
## (Intercept)    -2.889         0.372                 -7.769    0.000    -3.624    -2.153 
##         gdp     0.271         0.069        0.320     3.940    0.000     0.135     0.408 
##    p.affect     2.765         0.441        0.281     6.264    0.000     1.892     3.638 
##     hlexp.b     0.042         0.009        0.329     4.578    0.000     0.024     0.060 
##      soci.s     1.443         0.517        0.176     2.788    0.006     0.419     2.467 
## ----------------------------------------------------------------------------------------
## 
## 
## 
## Stepwise Selection: Step 5 
## 
## - generosity added 
## 
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.898       RMSE               0.423 
## R-Squared               0.806       Coef. Var          8.082 
## Adj. R-Squared          0.799       MSE                0.179 
## Pred R-Squared          0.788       MAE                0.321 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                 ANOVA                                 
## ---------------------------------------------------------------------
##                Sum of                                                
##               Squares         DF    Mean Square       F         Sig. 
## ---------------------------------------------------------------------
## Regression     97.565          5         19.513    108.919    0.0000 
## Residual       23.469        131          0.179                      
## Total         121.034        136                                     
## ---------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
## ----------------------------------------------------------------------------------------
## (Intercept)    -2.844         0.367                 -7.757    0.000    -3.569    -2.119 
##         gdp     0.302         0.069        0.356     4.372    0.000     0.166     0.439 
##    p.affect     2.400         0.463        0.244     5.180    0.000     1.484     3.317 
##     hlexp.b     0.040         0.009        0.316     4.451    0.000     0.022     0.058 
##      soci.s     1.497         0.510        0.182     2.936    0.004     0.488     2.506 
##  generosity     0.729         0.321        0.096     2.272    0.025     0.094     1.363 
## ----------------------------------------------------------------------------------------
## 
## 
## 
## No more variables to be added/removed.
## 
## 
## Final Model Output 
## ------------------
## 
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.898       RMSE               0.423 
## R-Squared               0.806       Coef. Var          8.082 
## Adj. R-Squared          0.799       MSE                0.179 
## Pred R-Squared          0.788       MAE                0.321 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                 ANOVA                                 
## ---------------------------------------------------------------------
##                Sum of                                                
##               Squares         DF    Mean Square       F         Sig. 
## ---------------------------------------------------------------------
## Regression     97.565          5         19.513    108.919    0.0000 
## Residual       23.469        131          0.179                      
## Total         121.034        136                                     
## ---------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
## ----------------------------------------------------------------------------------------
## (Intercept)    -2.844         0.367                 -7.757    0.000    -3.569    -2.119 
##         gdp     0.302         0.069        0.356     4.372    0.000     0.166     0.439 
##    p.affect     2.400         0.463        0.244     5.180    0.000     1.484     3.317 
##     hlexp.b     0.040         0.009        0.316     4.451    0.000     0.022     0.058 
##      soci.s     1.497         0.510        0.182     2.936    0.004     0.488     2.506 
##  generosity     0.729         0.321        0.096     2.272    0.025     0.094     1.363 
## ----------------------------------------------------------------------------------------

## 
##                                Stepwise Selection Summary                                
## ----------------------------------------------------------------------------------------
##                        Added/                   Adj.                                        
## Step     Variable     Removed     R-Square    R-Square      C(p)        AIC        RMSE     
## ----------------------------------------------------------------------------------------
##    1       gdp        addition       0.640       0.638    113.5380    237.7291    0.5679    
##    2     p.affect     addition       0.752       0.749     38.7100    188.5705    0.4729    
##    3     hlexp.b      addition       0.787       0.782     17.2800    170.2165    0.4407    
##    4      soci.s      addition       0.798       0.792     11.1430    164.3753    0.4299    
##    5    generosity    addition       0.806       0.799      7.9040    161.0790    0.4233    
## ----------------------------------------------------------------------------------------

Result Summary

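GDP per capita, healthy life expectancy at birth, positive affect, social support, and generosity emerge as the most important predictors of happiness, in decreasing order of standardized beta (0.356, 0.316, 0.244, 0.182, and 0.096). Together they explain about 80.6% of the variance in happiness (R-squared = 0.806).

These five best predictors were selected by stepwise OLS regression.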

Final Model

# Refit the model with the five predictors retained by stepwise selection
final_model <- lm(happy ~ gdp + p.affect + hlexp.b + soci.s + generosity, data = dataset)
summary(final_model)

## 
## Call:
## lm(formula = happy ~ gdp + p.affect + hlexp.b + soci.s + generosity, 
##     data = dataset)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.62437 -0.27131  0.00951  0.29917  0.91453 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.843837   0.366615  -7.757 2.16e-12 ***
## gdp          0.302403   0.069175   4.372 2.49e-05 ***
## p.affect     2.400211   0.463323   5.180 8.15e-07 ***
## hlexp.b      0.040196   0.009031   4.451 1.81e-05 ***
## soci.s       1.497424   0.510064   2.936  0.00393 ** 
## generosity   0.728894   0.320767   2.272  0.02469 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4233 on 131 degrees of freedom
## Multiple R-squared:  0.8061, Adjusted R-squared:  0.7987 
## F-statistic: 108.9 on 5 and 131 DF,  p-value: < 2.2e-16
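
To make the fitted equation concrete, the sketch below plugs the reported coefficients into the regression equation by hand. The predictor values in `x` are hypothetical, chosen only for illustration, and are not taken from the dataset.

```r
# Coefficients copied from the summary(final_model) output above
b <- c("(Intercept)" = -2.843837, gdp = 0.302403, p.affect = 2.400211,
       hlexp.b = 0.040196, soci.s = 1.497424, generosity = 0.728894)

# Hypothetical country profile (illustrative values, not from the dataset)
x <- c("(Intercept)" = 1, gdp = 9.5, p.affect = 0.7,
       hlexp.b = 65, soci.s = 0.8, generosity = 0)

# Predicted happiness = intercept + sum of (coefficient * predictor value)
predicted_happy <- sum(b * x[names(b)])
print(round(predicted_happy, 2))
```

With the fitted model in memory, `predict(final_model, newdata = ...)` should give the same value up to rounding of the coefficients.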

Detailed analysis for Model assumptions

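# Draw all six lm diagnostic plots in a 3 x 2 grid
par(mfrow = c(3, 2), lwd = 2)
for (i in 1:6) {
    plot(final_model, which = i, col = i + 2, pch = 19)  # which = i selects the i-th plot
}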

For more detail, search for “interpretation of regression plots in R programming”.

Residuals vs Fitted: checks linearity; the model follows this assumption.
Normal Q-Q: checks normality of the residuals; the residuals are normally distributed.
Scale-Location (also known as Spread-Location): checks homoscedasticity (homogeneity of variance). Heteroscedasticity is bad for any parametric statistical test.
Residuals vs Leverage: shows whether our model is unduly influenced by any extreme value.
The remaining two panels (Cook's distance, and Cook's distance vs Leverage) also help to flag influential observations.

From these plots we can say that our model follows all the assumptions of the regression model.
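
The visual checks above can be backed by numbers. The following is a minimal sketch using base R only; the helper name `check_assumptions`, the Shapiro-Wilk test, the hand-computed studentized Breusch-Pagan statistic, and the 4/n rule of thumb for Cook's distance are choices made here for illustration, not part of the original analysis. It is demonstrated on the built-in `mtcars` data as a stand-in.

```r
# Numeric companions to the diagnostic plots, using base R only
check_assumptions <- function(model) {
  e <- resid(model)
  X <- model.matrix(model)[, -1, drop = FALSE]  # predictors without the intercept
  n <- nobs(model)

  # Normality of residuals (Shapiro-Wilk; H0: residuals are normal)
  normality_p <- shapiro.test(e)$p.value

  # Homoscedasticity: studentized Breusch-Pagan statistic computed by hand,
  # n * R^2 from regressing the squared residuals on the predictors
  e2   <- e^2
  aux  <- lm(e2 ~ X)
  bp   <- n * summary(aux)$r.squared
  bp_p <- pchisq(bp, df = ncol(X), lower.tail = FALSE)

  # Influence: observations with Cook's distance above the 4/n rule of thumb
  n_influential <- sum(cooks.distance(model) > 4 / n)

  list(normality_p = normality_p,
       homoscedasticity_p = bp_p,
       n_influential = n_influential)
}

# Stand-in example on a built-in dataset (not the happiness data)
demo_fit <- lm(mpg ~ wt + hp, data = mtcars)
str(check_assumptions(demo_fit))
```

Calling `check_assumptions(final_model)` would apply the same checks to the happiness model; p-values of at least .05 are consistent with the corresponding assumption.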

Standardized beta values

lm.beta::lm.beta(final_model) %>% print(digits= 2)

## 
## Call:
## lm(formula = happy ~ gdp + p.affect + hlexp.b + soci.s + generosity, 
##     data = dataset)
## 
## Standardized Coefficients::
## (Intercept)         gdp    p.affect     hlexp.b      soci.s  generosity 
##       0.000       0.356       0.244       0.316       0.182       0.096
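
A standardized beta is the unstandardized slope rescaled by the ratio of the predictor's standard deviation to the outcome's standard deviation. The sketch below computes it by hand; the helper name `standardized_beta` is hypothetical, it assumes numeric predictors entered without transformations or interactions, and it uses the built-in `mtcars` data as a stand-in.

```r
# Standardized beta = unstandardized slope * sd(predictor) / sd(outcome)
standardized_beta <- function(model) {
  mf   <- model.frame(model)
  sd_y <- sd(mf[[1]])             # first column of the model frame is the outcome
  sd_x <- sapply(mf[-1], sd)      # remaining columns are the predictors
  coef(model)[-1] * sd_x / sd_y   # drop the intercept
}

# Stand-in example on a built-in dataset (not the happiness data)
demo_fit <- lm(mpg ~ wt + hp, data = mtcars)
print(round(standardized_beta(demo_fit), 3))
```

The same values result from refitting the model on z-scored variables, which is a quick way to cross-check the output.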

ANOVA

NOTE: There is no need to run ANOVA five times for the five models; the output of the “olsrr” package is more than sufficient.

Here anova() is run five times manually, both to demonstrate the comparison against the null model and to keep the output short (“olsrr” prints a great deal).

null_model <- lm(happy~1, data= dataset)
anova(null_model, lm(happy~gdp, data= dataset))

## Analysis of Variance Table
## 
## Model 1: happy ~ 1
## Model 2: happy ~ gdp
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1    136 121.034                                  
## 2    135  43.535  1    77.499 240.32 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(null_model, lm(happy~gdp+p.affect, data= dataset))

## Analysis of Variance Table
## 
## Model 1: happy ~ 1
## Model 2: happy ~ gdp + p.affect
##   Res.Df     RSS Df Sum of Sq     F    Pr(>F)    
## 1    136 121.034                                 
## 2    134  29.968  2    91.066 203.6 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(null_model, lm(happy~gdp+p.affect+hlexp.b, data= dataset))

## Analysis of Variance Table
## 
## Model 1: happy ~ 1
## Model 2: happy ~ gdp + p.affect + hlexp.b
##   Res.Df     RSS Df Sum of Sq     F    Pr(>F)    
## 1    136 121.034                                 
## 2    133  25.831  3    95.203 163.4 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(null_model, lm(happy~gdp+p.affect+hlexp.b+soci.s, data= dataset))

## Analysis of Variance Table
## 
## Model 1: happy ~ 1
## Model 2: happy ~ gdp + p.affect + hlexp.b + soci.s
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1    136 121.034                                  
## 2    132  24.394  4     96.64 130.73 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(null_model, final_model)

## Analysis of Variance Table
## 
## Model 1: happy ~ 1
## Model 2: happy ~ gdp + p.affect + hlexp.b + soci.s + generosity
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1    136 121.034                                  
## 2    131  23.469  5    97.565 108.92 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
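
The fit statistics reported at each stepwise step can be recovered from the residual sums of squares in the ANOVA tables alone. The sketch below does this without the dataset; the AIC line assumes the usual Gaussian log-likelihood form that counts the slopes, the intercept, and the error variance as parameters, which should agree with the reported values up to rounding.

```r
n   <- 137        # number of countries: total DF (136) + 1
tss <- 121.034    # total sum of squares (RSS of the null model)

# RSS after each stepwise addition, copied from the ANOVA tables above
rss <- c(gdp = 43.535, p.affect = 29.968, hlexp.b = 25.831,
         soci.s = 24.394, generosity = 23.469)
k <- seq_along(rss)   # number of predictors in the model at each step

r2     <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
f_stat <- ((tss - rss) / k) / (rss / (n - k - 1))
rmse   <- sqrt(rss / (n - k - 1))
# Gaussian AIC, counting k slopes + intercept + error variance as parameters
aic    <- n * log(2 * pi) + n * log(rss / n) + n + 2 * (k + 2)

print(round(data.frame(k, r2, adj_r2, f_stat, rmse, aic), 3))
```

Each added predictor raises R-squared and lowers AIC and RMSE, which is why the stepwise procedure keeps all five.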

EXTRA

Checking Normality (of all variables) using the K-S (Lilliefors) test

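dataset <- as.data.frame(dataset)
# Lilliefors (K-S) test per column; H0: the data are normal, so p < 0.05 rejects normality
for (i in seq_along(dataset)) {
    print(
        paste(paste0(i, "."), "The variable",
              names(dataset)[i],
              "departs from normality (p < .05):",
              nortest::lillie.test(dataset[[i]])$p.value < 0.05)
    )
}

## [1] "1. The variable happy departs from normality (p < .05): TRUE"
## [1] "2. The variable gdp departs from normality (p < .05): TRUE"
## [1] "3. The variable soci.s departs from normality (p < .05): TRUE"
## [1] "4. The variable hlexp.b departs from normality (p < .05): TRUE"
## [1] "5. The variable freedom departs from normality (p < .05): FALSE"
## [1] "6. The variable generosity departs from normality (p < .05): FALSE"
## [1] "7. The variable p.corruption departs from normality (p < .05): TRUE"
## [1] "8. The variable p.affect departs from normality (p < .05): TRUE"
## [1] "9. The variable n.affect departs from normality (p < .05): TRUE"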

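The Kolmogorov-Smirnov test (K-S test, here in its Lilliefors normality-test form) shows that only generosity and freedom follow a normal distribution: they are the only variables whose p-value is not below .05.

For a normal distribution the p-value should be at least .05, because the hypotheses of the K-S test are H0: the data are normal, and H1: the data depart from normality.

If the p-value is at least .05, we fail to reject the null hypothesis and treat the variable as consistent with a normal distribution; this does not prove that H0 is true. (Search for “K-S test” for more.)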
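
The decision rule above can be wrapped in a small helper that reports the p-value and the resulting decision side by side. This is a sketch only: the function name `normality_table` and its arguments are hypothetical, the default test is base R's Shapiro-Wilk test rather than the Lilliefors test, and the data are simulated stand-ins.

```r
# One row per column: test p-value and whether normality is retained at alpha.
# `test` is any function returning an object with a $p.value element.
normality_table <- function(df, test = stats::shapiro.test, alpha = 0.05) {
  p <- vapply(df, function(col) test(col)$p.value, numeric(1))
  data.frame(variable = names(df),
             p_value  = unname(p),
             normal   = unname(p >= alpha),   # TRUE = fail to reject H0
             row.names = NULL)
}

# Simulated stand-in data (not the happiness dataset)
set.seed(1)
sim <- data.frame(bell = rnorm(200), skewed = rexp(200))
print(normality_table(sim))
```

Passing `test = nortest::lillie.test` would apply the same test used in this analysis, assuming the nortest package is installed.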

Thank You

Regards

Alok Pratap Singh (Research Scholar)

apsingh@allduniv.ac.in

Department of Psychology

University of Allahabad

Data to information, Information to Insight, and Insight to Impact. ❤