Introduction
The World Happiness Report is a landmark survey of the state of global happiness. The first report was published in 2012, the second in 2013, the third in 2015, and the fourth in the 2016 Update. The World Happiness Report 2017, which ranks 155 countries by their happiness levels, was released at the United Nations at an event celebrating the International Day of Happiness on March 20th. The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields (economics, psychology, survey analysis, national statistics, health, public policy and more) describe how measurements of well-being can be used effectively to assess the progress of nations. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.
The happiness scores and rankings use data from the Gallup World Poll. The scores are based on answers to the main life evaluation question asked in the poll. This question, known as the Cantril ladder, asks respondents to think of a ladder with the best possible life for them being a 10 and the worst possible life being a 0, and to rate their own current lives on that scale.
The columns following the happiness score estimate the extent to which each of six factors (economic production, social support, life expectancy, freedom, absence of corruption, and generosity) contributes to making life evaluations higher in each country than they are in Dystopia, a hypothetical country that has values equal to the world's lowest national averages for each of the six factors. They have no impact on the total score reported for each country, but they do explain why some countries rank higher than others.
Summary
The World Happiness Report is published by the United Nations Sustainable Development Solutions Network. The data cover 166 countries over the years 2005 to 2020. The variables used here include Life Ladder (the level of happiness), log GDP per capita, social support, healthy life expectancy at birth, freedom to make life choices, and generosity. Pearson's correlation coefficient is used to measure the association between variables, and multiple and stepwise regression are used for modelling and prediction.
Libraries required in the Project
library(readr) # reading dataset
library(dplyr, warn.conflicts = F) # for Data wrangling
library(ggplot2) # for data visualization
library(ggthemes) # for some nice themes and automatic colors
library(psych) # for summary, etc
library(olsrr) # ols stepwise regression
library(ggcorrplot)# for correlation plot
library(rcompanion) # to plot histogram with density plot on real values
library(nortest) # Statistical test for the Normality of the dataset
Loading data
data <- readr::read_csv("C:/Users/Asus/Documents/R Clints/Catherine Arceno/ProjectScience/World-happiness.csv") # loading dataset
Top 6 rows
head(data)
## # A tibble: 6 x 11
## `Country name` year `Life Ladder` `Log GDP per capita` `Social support`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan 2008 3.72 7.37 0.451
## 2 Afghanistan 2009 4.40 7.54 0.552
## 3 Afghanistan 2010 4.76 7.65 0.539
## 4 Afghanistan 2011 3.83 7.62 0.521
## 5 Afghanistan 2012 3.78 7.70 0.521
## 6 Afghanistan 2013 3.57 7.72 0.484
## # ... with 6 more variables: Healthy life expectancy at birth <dbl>,
## # Freedom to make life choices <dbl>, Generosity <dbl>,
## # Perceptions of corruption <dbl>, Positive affect <dbl>,
## # Negative affect <dbl>
Name, dimension (rows and column), and type of variables
glimpse(data)
## Rows: 1,949
## Columns: 11
## $ `Country name` <chr> "Afghanistan", "Afghanistan", "Afgh~
## $ year <dbl> 2008, 2009, 2010, 2011, 2012, 2013,~
## $ `Life Ladder` <dbl> 3.724, 4.402, 4.758, 3.832, 3.783, ~
## $ `Log GDP per capita` <dbl> 7.370, 7.540, 7.647, 7.620, 7.705, ~
## $ `Social support` <dbl> 0.451, 0.552, 0.539, 0.521, 0.521, ~
## $ `Healthy life expectancy at birth` <dbl> 50.80, 51.20, 51.60, 51.92, 52.24, ~
## $ `Freedom to make life choices` <dbl> 0.718, 0.679, 0.600, 0.496, 0.531, ~
## $ Generosity <dbl> 0.168, 0.190, 0.121, 0.162, 0.236, ~
## $ `Perceptions of corruption` <dbl> 0.882, 0.850, 0.707, 0.731, 0.776, ~
## $ `Positive affect` <dbl> 0.518, 0.584, 0.618, 0.611, 0.710, ~
## $ `Negative affect` <dbl> 0.258, 0.237, 0.275, 0.267, 0.268, ~
The dataset consists of 1,949 observations (rows) and 11 variables (columns).
unique(data$`Country name`) %>%
  length() # Number of countries in our dataset
## [1] 166
range(data$year) # time period of the dataset
## [1] 2005 2020
The data cover 166 countries over the period 2005 to 2020.
Missing Cases in dataset
complete.cases(data) %>% table() # total number of complete cases
## .
## FALSE TRUE
## 241 1708
is.na(data) %>% table() # total number of missing values
## .
## FALSE TRUE
## 21066 373
There are 1,949 cases in total, of which 241 contain missing values. The total number of missing values is 373.
How many missing values are in each variable?
colSums(is.na(data)) %>% data.frame() # missing values (NA) in variables
## .
## Country name 0
## year 0
## Life Ladder 0
## Log GDP per capita 36
## Social support 13
## Healthy life expectancy at birth 55
## Freedom to make life choices 32
## Generosity 89
## Perceptions of corruption 110
## Positive affect 22
## Negative affect 16
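The per-column counts can also be expressed as shares, which makes columns easier to compare. A minimal sketch on a made-up data frame follows; `toy` and its columns are illustrative stand-ins, not the happiness data.

```r
# Toy data frame standing in for the happiness data (values are made up)
toy <- data.frame(
  country = c("A", "A", "B", "B", "C"),
  gdp     = c(7.4, NA, 9.1, 9.3, NA),
  support = c(0.45, 0.55, NA, 0.80, 0.70)
)

# Percentage of missing values in each column
missing_pct <- round(100 * colMeans(is.na(toy)), 1)
missing_pct

# Number of rows that contain at least one missing value
rows_with_na <- sum(!complete.cases(toy))
rows_with_na
```

On the real data the same two expressions would be applied to `data` instead of `toy`.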
Descriptive analysis
data %>% select(-c(`Country name`, year)) %>%
  psych::describe(na.rm = T, type = 2, fast = F) %>%
  select(-c(vars, n, se, trimmed, mad, min, max, range))
## mean sd median skew kurtosis
## Life Ladder 5.47 1.12 5.39 0.07 -0.69
## Log GDP per capita 9.37 1.15 9.46 -0.31 -0.86
## Social support 0.81 0.12 0.84 -1.11 1.18
## Healthy life expectancy at birth 63.36 7.51 65.20 -0.74 -0.05
## Freedom to make life choices 0.74 0.14 0.76 -0.62 -0.13
## Generosity 0.00 0.16 -0.03 0.81 0.84
## Perceptions of corruption 0.75 0.19 0.80 -1.50 1.84
## Positive affect 0.71 0.11 0.72 -0.36 -0.58
## Negative affect 0.27 0.09 0.26 0.74 0.87
Top 10 countries in GDP per capita (mean over the available years, 2005 to 2020)
(data %>% select(`Country name`, `Log GDP per capita`) %>%
  na.omit() %>% # Dropping country-year rows with missing GDP
  group_by(`Country name`) %>%
  summarise(gdp= mean(`Log GDP per capita`)) %>%
  arrange(-gdp) %>% .[1:10,] -> top10gdp)
## # A tibble: 10 x 2
## `Country name` gdp
## <chr> <dbl>
## 1 Luxembourg 11.6
## 2 Qatar 11.5
## 3 Singapore 11.3
## 4 Switzerland 11.1
## 5 Ireland 11.1
## 6 United Arab Emirates 11.1
## 7 Norway 11.0
## 8 Kuwait 11.0
## 9 United States 11.0
## 10 Hong Kong S.A.R. of China 10.9
Bottom 10 countries in GDP per capita (mean over the available years, 2005 to 2020)
(data %>% select(`Country name`, `Log GDP per capita`) %>%
  na.omit() %>% # Dropping country-year rows with missing GDP
  group_by(`Country name`) %>%
  summarise(gdp= mean(`Log GDP per capita`)) %>%
  arrange(gdp) %>% .[1:10,] -> bottom10gdp)
## # A tibble: 10 x 2
## `Country name` gdp
## <chr> <dbl>
## 1 Burundi 6.72
## 2 Malawi 6.88
## 3 Congo (Kinshasa) 6.88
## 4 Central African Republic 6.96
## 5 Niger 6.99
## 6 Mozambique 7.01
## 7 Togo 7.24
## 8 Liberia 7.30
## 9 Sierra Leone 7.36
## 10 Madagascar 7.37
Top 10 countries in happiness (mean over the available years)
(data %>% select(`Country name`, `Life Ladder`) %>%
  group_by(`Country name`) %>%
  summarise(happiness= mean(`Life Ladder`)) %>%
  arrange(-happiness) %>% .[1:10,] -> top10happiness)
## # A tibble: 10 x 2
## `Country name` happiness
## <chr> <dbl>
## 1 Denmark 7.68
## 2 Finland 7.60
## 3 Switzerland 7.55
## 4 Norway 7.51
## 5 Netherlands 7.47
## 6 Iceland 7.45
## 7 Canada 7.38
## 8 Sweden 7.37
## 9 New Zealand 7.31
## 10 Australia 7.28
Bottom 10 countries in happiness (mean over the available years)
(data %>% select(`Country name`, `Life Ladder`) %>%
  group_by(`Country name`) %>%
  summarise(happiness= mean(`Life Ladder`)) %>%
  arrange(happiness) %>% .[1:10,] -> bottom10hapiness)
## # A tibble: 10 x 2
## `Country name` happiness
## <chr> <dbl>
## 1 South Sudan 3.40
## 2 Central African Republic 3.52
## 3 Burundi 3.55
## 4 Togo 3.56
## 5 Afghanistan 3.59
## 6 Rwanda 3.65
## 7 Tanzania 3.70
## 8 Zimbabwe 3.88
## 9 Yemen 3.91
## 10 Comoros 3.94
Comparing Ranks
Switzerland and Norway appear in both top 10 lists (GDP and happiness).
inner_join(top10gdp, top10happiness, by= "Country name")
## # A tibble: 2 x 3
## `Country name` gdp happiness
## <chr> <dbl> <dbl>
## 1 Switzerland 11.1 7.55
## 2 Norway 11.0 7.51
Similarly, comparing the bottom 10 countries in happiness and in GDP, three countries appear in both lists.
inner_join(bottom10hapiness, bottom10gdp, by= "Country name")
## # A tibble: 3 x 3
## `Country name` happiness gdp
## <chr> <dbl> <dbl>
## 1 Central African Republic 3.52 6.96
## 2 Burundi 3.55 6.72
## 3 Togo 3.56 7.24
Data preparation for Hypothesis testing
data %>% na.omit %>% # Removing all rows with missing values
  group_by(`Country name`) %>%
  summarise(happy= mean(`Life Ladder`), # Country-wise mean of each variable over the available years
            gdp= mean(`Log GDP per capita`),
            soci.s= mean(`Social support`),
            hlexp.b= mean(`Healthy life expectancy at birth`),
            freedom= mean(`Freedom to make life choices`),
            generosity= mean(Generosity),
            p.corruption= mean(`Perceptions of corruption`),
            p.affect= mean(`Positive affect`),
            n.affect= mean(`Negative affect`)) %>%
  ungroup() %>%
  select(-`Country name`)-> dataset # Dataset is ready for regression analysis
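The same two steps (drop incomplete rows, then average within each country) can be sketched in base R. The example below uses a made-up data frame; `toy`, `ladder`, and `gdp` are illustrative names. It relies on the formula interface of `aggregate()`, whose default `na.action` is `na.omit`.

```r
# Toy country-year data (values are made up); one row has a missing gdp
toy <- data.frame(
  country = c("A", "A", "A", "B", "B"),
  ladder  = c(4.0, 5.5, 6.0, 7.0, 7.4),
  gdp     = c(7.0, NA, 8.0, 10.0, 10.2)
)

# Rows with any NA are dropped first (default na.action = na.omit),
# then the remaining rows are averaged within each country
country_means <- aggregate(cbind(ladder, gdp) ~ country, data = toy, FUN = mean)
country_means
```

Country A's mean `ladder` is computed from two rows only, because the row with the missing `gdp` is removed for every variable. The `na.omit` step in the pipeline above behaves the same way.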
Top 6 rows
dataset %>% head()
## # A tibble: 6 x 9
## happy gdp soci.s hlexp.b freedom generosity p.corruption p.affect n.affect
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 3.59 7.65 0.508 52.3 0.518 0.0701 0.843 0.549 0.326
## 2 5.02 9.38 0.716 67.5 0.663 -0.0827 0.869 0.654 0.299
## 3 5.19 9.33 0.812 65.4 0.504 -0.132 0.706 0.594 0.256
## 4 4.42 8.99 0.738 53.6 0.456 -0.0883 0.867 0.614 0.351
## 5 6.31 10.0 0.904 67.9 0.768 -0.160 0.842 0.833 0.284
## 6 4.51 9.27 0.719 65.7 0.564 -0.200 0.846 0.550 0.434
Data Preparation (checking outliers and normality, removing outliers, etc.)
Abbreviated Names of Variables
Old name | New Name |
---|---|
Country name | Country name |
year | year |
Life Ladder | happy |
Log GDP per capita | gdp |
Social support | soci.s |
Healthy life expectancy at birth | hlexp.b |
Freedom to make life choices | freedom |
Generosity | generosity |
Perceptions of corruption | p.corruption |
Positive affect | p.affect |
Negative affect | n.affect |
Boxplots
par(mfrow= c(3,3))
for(i in c(1:9)) {
boxplot(dataset[,i], col= i+2, lwd= 1.3)
title(main= names(dataset[i]))
}
Perception of corruption has extreme values.
Histogram with Normal Plot
par(mfrow=c(3,3))
for(i in c(1:9)) {
  rcompanion::plotNormalHistogram(dataset[,i],
                                  col=i+2,
xlab= "",
main= names(dataset[i]))
}
From the histograms, all variables appear approximately normally distributed, except perception of corruption and generosity.
Data cleaning: removing extreme values from the dataset
par(mfrow=c(1,2))
boxplot(dataset$p.corruption, main= "p.corruption", col= F)$out %>%
range()
## [1] 0.09783333 0.47650000
boxplot(dataset$generosity, main= "generosity", col= F)$out %>%
range()
## [1] 0.3814667 0.6090000
The boxplots flag 15 values in p.corruption and the values of generosity from 0.38 upward as outliers; the rows containing them are removed.
dim(dataset)
## [1] 155 9
dataset %>% filter(p.corruption > 0.47650000, generosity < .38) -> dataset
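The thresholds in the filter are typed in by hand from the printed outlier ranges. A sketch of deriving the flagged values programmatically is shown below, using `boxplot.stats()`, which applies the same 1.5 × IQR rule as `boxplot()`. The vector `x` is made up for illustration.

```r
# Made-up values with one obvious low outlier
x <- c(0.80, 0.82, 0.79, 0.85, 0.77, 0.81, 0.83, 0.30, 0.78, 0.84)

# Values lying more than 1.5 * IQR beyond the hinges
flagged <- boxplot.stats(x)$out
flagged

# Keep only the observations that were not flagged
x_clean <- x[!x %in% flagged]
length(x_clean)
```

Applied to a data frame column, the same `flagged` vector could drive a row filter, so the cut-offs would not have to be copied from console output.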
After removing outliers, checking again for extreme values and normality
par(mfrow= c(3,3))
for(i in c(1:9)) {
boxplot(dataset[,i], col= i+2)
title(main= names(dataset[i]))
}
par(mfrow=c(3,3))
for(i in c(1:9)) {
  rcompanion::plotNormalHistogram(dataset[,i],
                                  col=i+2,
xlab= "",
main= names(dataset[i]))
}
Q-Q plot (another graphical way to check normality)
Note: the for loop used for the earlier plots did not work with qqnorm(), so each variable is plotted with its own pair of calls.
par(mfrow= c(3,3))
qqnorm(dataset$happy, col= "blue", pch= 19, main= "happy")
qqline(dataset$happy, col= "red", lwd=3)
qqnorm(dataset$gdp, col= "blue", pch= 19, main= "gdp")
qqline(dataset$gdp, col= "red", lwd=3)
qqnorm(dataset$soci.s, col= "blue", pch= 19, main= "soci.s")
qqline(dataset$soci.s, col= "red", lwd=3)
qqnorm(dataset$hlexp.b, col= "blue", pch= 19, main= "hlexp.b")
qqline(dataset$hlexp.b, col= "red", lwd=3)
qqnorm(dataset$freedom, col= "blue", pch= 19, main= "freedom")
qqline(dataset$freedom, col= "red", lwd=3)
qqnorm(dataset$generosity, col= "blue", pch= 19, main= "generosity")
qqline(dataset$generosity, col= "red", lwd= 3)
qqnorm(dataset$p.corruption, col= "blue", pch= 19, main= "p.corruption")
qqline(dataset$p.corruption, col= "red", lwd=3, main= "")
qqnorm(dataset$p.affect, col= "blue", pch= 19, main= "p.affect")
qqline(dataset$p.affect, col= "red", lwd=3)
qqnorm(dataset$n.affect, col= "blue", pch= 19, main= "n.affect")
qqline(dataset$n.affect, col= "red", lwd=3)
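A likely reason the loop failed is that `dataset[, i]` on a tibble returns a one-column tibble, while `qqnorm()` expects a numeric vector. Extracting the column with `[[` avoids this. The sketch below is an assumption about the cause, not a confirmed diagnosis; `qq_grid` is a hypothetical helper, and it is demonstrated on the built-in `mtcars` data rather than on `dataset`.

```r
# Draw a normal Q-Q plot with a reference line for every column of a data frame
qq_grid <- function(df) {
  old_par <- par(mfrow = c(ceiling(ncol(df) / 3), 3))
  on.exit(par(old_par))                 # restore the plotting layout afterwards
  for (name in names(df)) {
    values <- df[[name]]                # numeric vector, not a one-column data frame
    qqnorm(values, col = "blue", pch = 19, main = name)
    qqline(values, col = "red", lwd = 3)
  }
  invisible(names(df))
}

plotted <- qq_grid(mtcars[, c("mpg", "hp", "wt")])
```

With the analysis data in memory, `qq_grid(dataset)` would be expected to produce the same 3 × 3 grid as the eighteen calls above.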
From the Q-Q plots, social support (soci.s) and perception of corruption (p.corruption) do not look normal. The analysis proceeds anyway, on the grounds that the majority of variables in this dataset follow an approximately normal distribution.
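The graphical checks can be complemented by a formal test. The sketch below runs the Shapiro-Wilk test from base R on each column of the built-in `mtcars` data, used here only as a stand-in. The `nortest` package loaded at the top offers alternatives such as `ad.test()` and `lillie.test()`, which could be swapped in the same way.

```r
vars <- mtcars[, c("mpg", "hp", "wt")]   # stand-in for the analysis variables

# One Shapiro-Wilk test per column, collected into a small table
normality <- data.frame(
  variable  = names(vars),
  W         = sapply(vars, function(v) unname(shapiro.test(v)$statistic)),
  p_value   = sapply(vars, function(v) shapiro.test(v)$p.value),
  row.names = NULL
)

# A small p-value is evidence against normality
normality$reject_normality <- normality$p_value < 0.05
normality
```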
Descriptive Statistics
Pearson’s Correlation Coefficient Table
cor(dataset) %>% round(2)
## happy gdp soci.s hlexp.b freedom generosity p.corruption
## happy 1.00 0.80 0.75 0.79 0.50 0.04 -0.23
## gdp 0.80 1.00 0.73 0.84 0.29 -0.21 -0.11
## soci.s 0.75 0.73 1.00 0.64 0.41 -0.08 -0.08
## hlexp.b 0.79 0.84 0.64 1.00 0.31 -0.12 -0.10
## freedom 0.50 0.29 0.41 0.31 1.00 0.28 -0.28
## generosity 0.04 -0.21 -0.08 -0.12 0.28 1.00 -0.23
## p.corruption -0.23 -0.11 -0.08 -0.10 -0.28 -0.23 1.00
## p.affect 0.54 0.27 0.48 0.27 0.65 0.28 -0.22
## n.affect -0.28 -0.19 -0.42 -0.15 -0.27 0.02 0.15
## p.affect n.affect
## happy 0.54 -0.28
## gdp 0.27 -0.19
## soci.s 0.48 -0.42
## hlexp.b 0.27 -0.15
## freedom 0.65 -0.27
## generosity 0.28 0.02
## p.corruption -0.22 0.15
## p.affect 1.00 -0.43
## n.affect -0.43 1.00
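Each entry of the table is Pearson's r: the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. A short sketch with made-up vectors checks a hand-written version against `cor()`; `pearson_r`, `x`, and `y` are illustrative names.

```r
# Pearson's r written out from its definition
pearson_r <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Made-up paired observations
x <- c(7.4, 8.1, 9.0, 9.6, 10.5, 11.2)
y <- c(3.7, 4.4, 5.1, 5.0, 6.3, 7.2)

r_manual  <- pearson_r(x, y)
r_builtin <- cor(x, y)                  # method = "pearson" is the default
all.equal(r_manual, r_builtin)
```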
Significance level for Correlation
data.frame(Association= cor.ci(dataset, plot= FALSE)$ci %>% round(2) %>% rownames(),
p_value= cor.ci(dataset, plot = FALSE)$ci %>% .$p %>%
round(2))
## Association p_value
## 1 happy-gdp 0.00
## 2 happy-soc.s 0.00
## 3 happy-hlxp. 0.00
## 4 happy-fredm 0.00
## 5 happy-gnrst 0.79
## 6 happy-p.crr 0.02
## 7 happy-p.ffc 0.00
## 8 happy-n.ffc 0.00
## 9 gdp-soc.s 0.00
## 10 gdp-hlxp. 0.00
## 11 gdp-fredm 0.00
## 12 gdp-gnrst 0.02
## 13 gdp-p.crr 0.21
## 14 gdp-p.ffc 0.00
## 15 gdp-n.ffc 0.03
## 16 soc.s-hlxp. 0.00
## 17 soc.s-fredm 0.00
## 18 soc.s-gnrst 0.35
## 19 soc.s-p.crr 0.48
## 20 soc.s-p.ffc 0.00
## 21 soc.s-n.ffc 0.00
## 22 hlxp.-fredm 0.00
## 23 hlxp.-gnrst 0.17
## 24 hlxp.-p.crr 0.22
## 25 hlxp.-p.ffc 0.00
## 26 hlxp.-n.ffc 0.04
## 27 fredm-gnrst 0.01
## 28 fredm-p.crr 0.00
## 29 fredm-p.ffc 0.00
## 30 fredm-n.ffc 0.01
## 31 gnrst-p.crr 0.03
## 32 gnrst-p.ffc 0.00
## 33 gnrst-n.ffc 0.64
## 34 p.crr-p.ffc 0.02
## 35 p.crr-n.ffc 0.15
## 36 p.ffc-n.ffc 0.00
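The table above is built from `psych::cor.ci()`. As an alternative sketch, base R's `cor.test()` returns the t-test p-value for a single Pearson correlation, and `combn()` can enumerate the pairs. The example uses the built-in `mtcars` data as a stand-in; its p-values are not expected to match the `cor.ci()` values exactly on the same data.

```r
vars  <- mtcars[, c("mpg", "hp", "wt", "qsec")]   # stand-in variables
pairs <- combn(names(vars), 2)                    # every unordered pair of columns

# One row per pair: label, correlation, and p-value from cor.test()
p_table <- data.frame(
  association = paste(pairs[1, ], pairs[2, ], sep = "-"),
  r = apply(pairs, 2, function(p) {
    unname(cor.test(vars[[p[1]]], vars[[p[2]]])$estimate)
  }),
  p_value = apply(pairs, 2, function(p) {
    cor.test(vars[[p[1]]], vars[[p[2]]])$p.value
  })
)
p_table$r       <- round(p_table$r, 2)
p_table$p_value <- round(p_table$p_value, 2)
p_table
```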
describe(dataset) %>% select(mean, sd)
## mean sd
## happy 5.24 0.94
## gdp 9.13 1.11
## soci.s 0.79 0.11
## hlexp.b 61.72 7.42
## freedom 0.71 0.12
## generosity -0.02 0.12
## p.corruption 0.79 0.10
## p.affect 0.70 0.10
## n.affect 0.28 0.07
Descriptive Table (Mean, SD, and Correlation values)
Variable | Mean (SD) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
1. Happiness | 5.24 (0.94) | 1 | | | | | | | | |
2. GDP | 9.13 (1.11) | .80 | 1 | | | | | | | |
3. Social Supp. | 0.79 (0.11) | .75 | .73 | 1 | | | | | | |
4. Healthy Life | 61.72 (7.42) | .79 | .84 | .64 | 1 | | | | | |
5. Freedom | 0.71 (0.12) | .50 | .29 | .41 | .31 | 1 | | | | |
6. Generosity | -0.02 (0.12) | .04 | -.21 | -.08 | -.12 | .28 | 1 | | | |
7. Corruption | 0.79 (0.10) | -.23 | -.11 | -.08 | -.10 | -.28 | -.23 | 1 | | |
8. Positive Aff. | 0.70 (0.10) | .54 | .27 | .48 | .27 | .65 | .28 | -.22 | 1 | |
9. Negative Aff. | 0.28 (0.07) | -.28 | -.19 | -.42 | -.15 | -.27 | .02 | .15 | -.43 | 1 |
Note: columns 1 to 9 are Pearson correlation coefficients, numbered in row order.
Correlation Plot
ggcorrplot::ggcorrplot(cor(dataset),
                       method = "circle", type = "upper", ggtheme = theme_foundation(), legend.title = "Correlation\nCoefficient", outline.color = "black")
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
Multiple Linear Regression
Happiness (happy) is the criterion variable; gdp, soci.s, hlexp.b, freedom, generosity, p.corruption, p.affect, and n.affect are the predictor variables.
full_model <- lm(happy~., data= dataset)
summary(full_model)
##
## Call:
## lm(formula = happy ~ ., data = dataset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.54437 -0.22331 0.00723 0.21652 0.97397
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.420606 0.586467 -4.127 6.57e-05 ***
## gdp 0.282997 0.069576 4.067 8.25e-05 ***
## soci.s 1.653836 0.537616 3.076 0.002564 **
## hlexp.b 0.039277 0.009027 4.351 2.75e-05 ***
## freedom 0.429442 0.418906 1.025 0.307226
## generosity 0.525363 0.332444 1.580 0.116505
## p.corruption -0.657905 0.381796 -1.723 0.087270 .
## p.affect 2.091064 0.553694 3.777 0.000242 ***
## n.affect 0.387465 0.609120 0.636 0.525843
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4202 on 128 degrees of freedom
## Multiple R-squared: 0.8133, Adjusted R-squared: 0.8016
## F-statistic: 69.68 on 8 and 128 DF, p-value: < 2.2e-16
Four predictors (freedom, generosity, p.corruption, and n.affect) are not significant at the 0.05 level in the full model. Stepwise regression is used next to find the most effective predictors.
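Rather than reading the non-significant predictors off the printed summary, they can be pulled from the coefficient matrix. A minimal sketch on the built-in `mtcars` data, used as a stand-in for `dataset`; the 0.05 cut-off is the conventional level assumed here.

```r
fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)   # stand-in model

# Coefficient matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(fit))

# p-values of the predictors, leaving out the intercept row
p_values <- coefs[-1, "Pr(>|t|)"]

# Names of the predictors that are not significant at the 0.05 level
not_significant <- names(p_values)[p_values >= 0.05]
not_significant
```

Replacing `fit` with `full_model` would be expected to return the predictors without a significance star in the summary above.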
OLS Stepwise Regression
olsrr package (alternative way)
The olsrr package is dedicated to OLS regression and makes stepwise selection straightforward.
olsrr::ols_step_both_p(full_model, pent= .05, prem= .10, detail= TRUE)
## Stepwise Selection Method
## ---------------------------
##
## Candidate Terms:
##
## 1. gdp
## 2. soci.s
## 3. hlexp.b
## 4. freedom
## 5. generosity
## 6. p.corruption
## 7. p.affect
## 8. n.affect
##
## We are selecting variables based on p value...
##
##
## Stepwise Selection: Step 1
##
## - gdp added
##
## Model Summary
## --------------------------------------------------------------
## R 0.800 RMSE 0.568
## R-Squared 0.640 Coef. Var 10.844
## Adj. R-Squared 0.638 MSE 0.322
## Pred R-Squared 0.631 MAE 0.456
## --------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## ---------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## ---------------------------------------------------------------------
## Regression 77.499 1 77.499 240.324 0.0000
## Residual 43.535 135 0.322
## Total 121.034 136
## ---------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------
## (Intercept) -0.963 0.403 -2.391 0.018 -1.760 -0.167
## gdp 0.679 0.044 0.800 15.502 0.000 0.593 0.766
## ----------------------------------------------------------------------------------------
##
##
##
## Stepwise Selection: Step 2
##
## - p.affect added
##
## Model Summary
## -------------------------------------------------------------
## R 0.867 RMSE 0.473
## R-Squared 0.752 Coef. Var 9.030
## Adj. R-Squared 0.749 MSE 0.224
## Pred R-Squared 0.742 MAE 0.365
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## ---------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## ---------------------------------------------------------------------
## Regression 91.066 2 45.533 203.596 0.0000
## Residual 29.968 134 0.224
## Total 121.034 136
## ---------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------
## (Intercept) -2.609 0.397 -6.580 0.000 -3.393 -1.825
## gdp 0.598 0.038 0.705 15.772 0.000 0.523 0.674
## p.affect 3.428 0.440 0.348 7.789 0.000 2.557 4.298
## ----------------------------------------------------------------------------------------
##
##
##
## Model Summary
## -------------------------------------------------------------
## R 0.867 RMSE 0.473
## R-Squared 0.752 Coef. Var 9.030
## Adj. R-Squared 0.749 MSE 0.224
## Pred R-Squared 0.742 MAE 0.365
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## ---------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## ---------------------------------------------------------------------
## Regression 91.066 2 45.533 203.596 0.0000
## Residual 29.968 134 0.224
## Total 121.034 136
## ---------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------
## (Intercept) -2.609 0.397 -6.580 0.000 -3.393 -1.825
## gdp 0.598 0.038 0.705 15.772 0.000 0.523 0.674
## p.affect 3.428 0.440 0.348 7.789 0.000 2.557 4.298
## ----------------------------------------------------------------------------------------
##
##
##
## Stepwise Selection: Step 3
##
## - hlexp.b added
##
## Model Summary
## -------------------------------------------------------------
## R 0.887 RMSE 0.441
## R-Squared 0.787 Coef. Var 8.415
## Adj. R-Squared 0.782 MSE 0.194
## Pred R-Squared 0.773 MAE 0.343
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## ---------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## ---------------------------------------------------------------------
## Regression 95.203 3 31.734 163.397 0.0000
## Residual 25.831 133 0.194
## Total 121.034 136
## ---------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------
## (Intercept) -2.999 0.379 -7.912 0.000 -3.748 -2.249
## gdp 0.361 0.063 0.425 5.766 0.000 0.237 0.484
## p.affect 3.277 0.411 0.333 7.966 0.000 2.464 4.091
## hlexp.b 0.043 0.009 0.340 4.616 0.000 0.025 0.062
## ----------------------------------------------------------------------------------------
##
##
##
## Model Summary
## -------------------------------------------------------------
## R 0.887 RMSE 0.441
## R-Squared 0.787 Coef. Var 8.415
## Adj. R-Squared 0.782 MSE 0.194
## Pred R-Squared 0.773 MAE 0.343
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## ---------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## ---------------------------------------------------------------------
## Regression 95.203 3 31.734 163.397 0.0000
## Residual 25.831 133 0.194
## Total 121.034 136
## ---------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------
## (Intercept) -2.999 0.379 -7.912 0.000 -3.748 -2.249
## gdp 0.361 0.063 0.425 5.766 0.000 0.237 0.484
## p.affect 3.277 0.411 0.333 7.966 0.000 2.464 4.091
## hlexp.b 0.043 0.009 0.340 4.616 0.000 0.025 0.062
## ----------------------------------------------------------------------------------------
##
##
##
## Stepwise Selection: Step 4
##
## - soci.s added
##
## Model Summary
## -------------------------------------------------------------
## R 0.894 RMSE 0.430
## R-Squared 0.798 Coef. Var 8.209
## Adj. R-Squared 0.792 MSE 0.185
## Pred R-Squared 0.783 MAE 0.326
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## ---------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## ---------------------------------------------------------------------
## Regression 96.640 4 24.160 130.735 0.0000
## Residual 24.394 132 0.185
## Total 121.034 136
## ---------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------
## (Intercept) -2.889 0.372 -7.769 0.000 -3.624 -2.153
## gdp 0.271 0.069 0.320 3.940 0.000 0.135 0.408
## p.affect 2.765 0.441 0.281 6.264 0.000 1.892 3.638
## hlexp.b 0.042 0.009 0.329 4.578 0.000 0.024 0.060
## soci.s 1.443 0.517 0.176 2.788 0.006 0.419 2.467
## ----------------------------------------------------------------------------------------
##
##
##
## Model Summary
## -------------------------------------------------------------
## R 0.894 RMSE 0.430
## R-Squared 0.798 Coef. Var 8.209
## Adj. R-Squared 0.792 MSE 0.185
## Pred R-Squared 0.783 MAE 0.326
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## ---------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## ---------------------------------------------------------------------
## Regression 96.640 4 24.160 130.735 0.0000
## Residual 24.394 132 0.185
## Total 121.034 136
## ---------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------
## (Intercept) -2.889 0.372 -7.769 0.000 -3.624 -2.153
## gdp 0.271 0.069 0.320 3.940 0.000 0.135 0.408
## p.affect 2.765 0.441 0.281 6.264 0.000 1.892 3.638
## hlexp.b 0.042 0.009 0.329 4.578 0.000 0.024 0.060
## soci.s 1.443 0.517 0.176 2.788 0.006 0.419 2.467
## ----------------------------------------------------------------------------------------
##
##
##
## Stepwise Selection: Step 5
##
## - generosity added
##
## Model Summary
## -------------------------------------------------------------
## R 0.898 RMSE 0.423
## R-Squared 0.806 Coef. Var 8.082
## Adj. R-Squared 0.799 MSE 0.179
## Pred R-Squared 0.788 MAE 0.321
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## ---------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## ---------------------------------------------------------------------
## Regression 97.565 5 19.513 108.919 0.0000
## Residual 23.469 131 0.179
## Total 121.034 136
## ---------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------
## (Intercept) -2.844 0.367 -7.757 0.000 -3.569 -2.119
## gdp 0.302 0.069 0.356 4.372 0.000 0.166 0.439
## p.affect 2.400 0.463 0.244 5.180 0.000 1.484 3.317
## hlexp.b 0.040 0.009 0.316 4.451 0.000 0.022 0.058
## soci.s 1.497 0.510 0.182 2.936 0.004 0.488 2.506
## generosity 0.729 0.321 0.096 2.272 0.025 0.094 1.363
## ----------------------------------------------------------------------------------------
##
##
##
## Model Summary
## -------------------------------------------------------------
## R 0.898 RMSE 0.423
## R-Squared 0.806 Coef. Var 8.082
## Adj. R-Squared 0.799 MSE 0.179
## Pred R-Squared 0.788 MAE 0.321
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## ---------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## ---------------------------------------------------------------------
## Regression 97.565 5 19.513 108.919 0.0000
## Residual 23.469 131 0.179
## Total 121.034 136
## ---------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------
## (Intercept) -2.844 0.367 -7.757 0.000 -3.569 -2.119
## gdp 0.302 0.069 0.356 4.372 0.000 0.166 0.439
## p.affect 2.400 0.463 0.244 5.180 0.000 1.484 3.317
## hlexp.b 0.040 0.009 0.316 4.451 0.000 0.022 0.058
## soci.s 1.497 0.510 0.182 2.936 0.004 0.488 2.506
## generosity 0.729 0.321 0.096 2.272 0.025 0.094 1.363
## ----------------------------------------------------------------------------------------
##
##
##
## No more variables to be added/removed.
##
##
## Final Model Output
## ------------------
##
## Model Summary
## -------------------------------------------------------------
## R 0.898 RMSE 0.423
## R-Squared 0.806 Coef. Var 8.082
## Adj. R-Squared 0.799 MSE 0.179
## Pred R-Squared 0.788 MAE 0.321
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## ---------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## ---------------------------------------------------------------------
## Regression 97.565 5 19.513 108.919 0.0000
## Residual 23.469 131 0.179
## Total 121.034 136
## ---------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------
## (Intercept) -2.844 0.367 -7.757 0.000 -3.569 -2.119
## gdp 0.302 0.069 0.356 4.372 0.000 0.166 0.439
## p.affect 2.400 0.463 0.244 5.180 0.000 1.484 3.317
## hlexp.b 0.040 0.009 0.316 4.451 0.000 0.022 0.058
## soci.s 1.497 0.510 0.182 2.936 0.004 0.488 2.506
## generosity 0.729 0.321 0.096 2.272 0.025 0.094 1.363
## ----------------------------------------------------------------------------------------
##
## Stepwise Selection Summary
## ----------------------------------------------------------------------------------------
## Added/ Adj.
## Step Variable Removed R-Square R-Square C(p) AIC RMSE
## ----------------------------------------------------------------------------------------
## 1 gdp addition 0.640 0.638 113.5380 237.7291 0.5679
## 2 p.affect addition 0.752 0.749 38.7100 188.5705 0.4729
## 3 hlexp.b addition 0.787 0.782 17.2800 170.2165 0.4407
## 4 soci.s addition 0.798 0.792 11.1430 164.3753 0.4299
## 5 generosity addition 0.806 0.799 7.9040 161.0790 0.4233
## ----------------------------------------------------------------------------------------
Result Summary
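GDP per capita, healthy life expectancy at birth, positive affect, social support, and generosity emerge as the most important predictors of happiness, in that order of standardized beta (0.356, 0.316, 0.244, 0.182, and 0.096).
These five best predictors were selected by stepwise OLS regression; together they explain about 81% of the variance in happiness (R-squared = 0.806).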
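The figures in the Model Summary can be recomputed from the ANOVA table alone. The following is a minimal sketch in base R that uses only the sums of squares and degrees of freedom printed in the final model output; the variable names are illustrative and not part of the original analysis.

```r
# Values copied from the ANOVA table of the final model
ss_regression <- 97.565
ss_residual   <- 23.469
ss_total      <- 121.034
df_residual   <- 131
df_total      <- 136

# R-squared: share of the total variation explained by the model
r_squared <- ss_regression / ss_total

# Adjusted R-squared: penalises R-squared for the number of predictors
adj_r_squared <- 1 - (ss_residual / df_residual) / (ss_total / df_total)

# MSE is the residual mean square; RMSE is its square root
mse  <- ss_residual / df_residual
rmse <- sqrt(mse)

round(c(R2 = r_squared, AdjR2 = adj_r_squared, MSE = mse, RMSE = rmse), 3)
```

Up to rounding, this should reproduce the reported R-Squared (0.806), Adj. R-Squared (0.799), MSE (0.179), and RMSE (0.423).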
Final Model
final_model <- lm(happy ~ gdp + p.affect + hlexp.b + soci.s + generosity, data = dataset)
summary(final_model)
##
## Call:
## lm(formula = happy ~ gdp + p.affect + hlexp.b + soci.s + generosity,
## data = dataset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.62437 -0.27131 0.00951 0.29917 0.91453
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.843837 0.366615 -7.757 2.16e-12 ***
## gdp 0.302403 0.069175 4.372 2.49e-05 ***
## p.affect 2.400211 0.463323 5.180 8.15e-07 ***
## hlexp.b 0.040196 0.009031 4.451 1.81e-05 ***
## soci.s 1.497424 0.510064 2.936 0.00393 **
## generosity 0.728894 0.320767 2.272 0.02469 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4233 on 131 degrees of freedom
## Multiple R-squared: 0.8061, Adjusted R-squared: 0.7987
## F-statistic: 108.9 on 5 and 131 DF, p-value: < 2.2e-16
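To make the coefficient table concrete, here is a small sketch that uses the estimates above to predict happiness for one country and to rebuild the 95% confidence interval of the gdp coefficient. The country profile is hypothetical (the values are illustrative and assumed to be on the same scale as the training data); only the coefficients, the standard error, and the residual degrees of freedom come from the output.

```r
# Coefficients copied from summary(final_model)
coefs <- c("(Intercept)" = -2.843837, gdp = 0.302403, p.affect = 2.400211,
           hlexp.b = 0.040196, soci.s = 1.497424, generosity = 0.728894)

# Hypothetical country profile (illustrative values, not taken from the data)
new_country <- c("(Intercept)" = 1, gdp = 10, p.affect = 0.7,
                 hlexp.b = 70, soci.s = 0.9, generosity = 0)

# Predicted happiness = intercept + sum of (coefficient * predictor value)
predicted_happy <- sum(coefs * new_country[names(coefs)])

# 95% confidence interval for the gdp coefficient: estimate +/- t * SE,
# with the t quantile taken on the 131 residual degrees of freedom
t_crit <- qt(0.975, df = 131)
gdp_ci <- 0.302403 + c(-1, 1) * t_crit * 0.069175

round(predicted_happy, 3)
round(gdp_ci, 3)
```

The interval should agree with the lower and upper columns of the stepwise output (0.166 and 0.439). With the fitted object at hand, predict(final_model, newdata = ...) and confint(final_model) do the same work without copying numbers by hand.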
Detailed analysis for Model assumptions
par(mfrow= c(3,2), lwd= 2)
for(i in 1:6) {
plot(final_model,i, col=i+2, pch= 19)
}
For more detail, search for "interpretation of regression diagnostic plots in R".
Residuals vs Fitted: checks linearity; the residuals should scatter around zero with no clear pattern.
Normal Q-Q: checks that the residuals are normally distributed; the points should lie close to the diagonal line.
Scale-Location (also known as Spread-Location): checks homoscedasticity (homogeneity of variance). Heteroscedasticity undermines the standard errors of any parametric statistical test.
Residuals vs Leverage: shows whether any extreme value has undue influence on the model.
Judging from these plots, our model appears to satisfy the assumptions of the regression model.
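The visual checks can be backed up with numbers. The sketch below is one possible way to do this in base R, shown on simulated stand-in data because the original dataset is not reproduced here; the object names, the simulated values, and the 4/n cut-off are assumptions for illustration. The heteroscedasticity check is a studentized Breusch-Pagan statistic computed by hand, not a function from the packages used above.

```r
set.seed(1)

# Simulated stand-in data (hypothetical; swap in the real model to use this)
n  <- 137
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n, sd = 0.4)
toy_model <- lm(y ~ x1 + x2)

# Normality of residuals (numeric counterpart of the Normal Q-Q plot)
normality_p <- shapiro.test(residuals(toy_model))$p.value

# Homoscedasticity (counterpart of Scale-Location): regress the squared
# residuals on the predictors; n * R^2 is compared with chi-square(k = 2)
sq_resid  <- residuals(toy_model)^2
aux_model <- lm(sq_resid ~ x1 + x2)
bp_stat   <- n * summary(aux_model)$r.squared
bp_p      <- pchisq(bp_stat, df = 2, lower.tail = FALSE)

# Influence (counterpart of Residuals vs Leverage): Cook's distance
cooks_d       <- cooks.distance(toy_model)
n_influential <- sum(cooks_d > 4 / n)   # 4/n is a common rule of thumb

c(normality_p = normality_p, bp_p = bp_p, n_influential = n_influential)
```

Large p-values for the first two checks and few flagged observations would support the reading of the plots; small p-values would call for a closer look.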
Standardized beta values
lm.beta::lm.beta(final_model) %>% print(digits = 2)
##
## Call:
## lm(formula = happy ~ gdp + p.affect + hlexp.b + soci.s + generosity,
## data = dataset)
##
## Standardized Coefficients::
## (Intercept) gdp p.affect hlexp.b soci.s generosity
## 0.000 0.356 0.244 0.316 0.182 0.096
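A standardized beta is the unstandardized slope rescaled by the ratio of the predictor's standard deviation to the outcome's standard deviation. The sketch below illustrates this in base R on simulated stand-in data (the data frame and its column names are hypothetical) and cross-checks the result against a model refitted on z-scored variables.

```r
set.seed(42)

# Simulated stand-in data (hypothetical variable names and values)
toy <- data.frame(x1 = rnorm(100, mean = 10, sd = 2),
                  x2 = rnorm(100, mean = 0.7, sd = 0.1))
toy$y <- 1 + 0.3 * toy$x1 + 2.4 * toy$x2 + rnorm(100, sd = 0.5)

toy_model <- lm(y ~ x1 + x2, data = toy)
b <- coef(toy_model)[-1]                 # unstandardized slopes

# Standardized beta = slope * sd(predictor) / sd(outcome)
std_beta <- b * sapply(toy[names(b)], sd) / sd(toy$y)

# Cross-check: refitting on z-scored variables gives the same slopes
z_model <- lm(y ~ x1 + x2, data = as.data.frame(scale(toy)))
all.equal(unname(std_beta), unname(coef(z_model)[-1]))
```

Because the betas are unit-free, they can be compared across predictors measured on very different scales, which is why they are used for the importance ranking in the result summary.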
ANOVA
NOTE: there is no need to run ANOVA separately for each of the five models; the “olsrr” package already reports it.
Here anova() is run five times manually, both to demonstrate the comparison and to keep the output short (“olsrr” prints a great deal of output).
null_model <- lm(happy~1, data= dataset)
anova(null_model, lm(happy~gdp, data= dataset))
## Analysis of Variance Table
##
## Model 1: happy ~ 1
## Model 2: happy ~ gdp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 136 121.034
## 2 135 43.535 1 77.499 240.32 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(null_model, lm(happy~gdp+p.affect, data= dataset))
## Analysis of Variance Table
##
## Model 1: happy ~ 1
## Model 2: happy ~ gdp + p.affect
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 136 121.034
## 2 134 29.968 2 91.066 203.6 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(null_model, lm(happy~gdp+p.affect+hlexp.b, data= dataset))
## Analysis of Variance Table
##
## Model 1: happy ~ 1
## Model 2: happy ~ gdp + p.affect + hlexp.b
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 136 121.034
## 2 133 25.831 3 95.203 163.4 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(null_model, lm(happy~gdp+p.affect+hlexp.b+soci.s, data= dataset))
## Analysis of Variance Table
##
## Model 1: happy ~ 1
## Model 2: happy ~ gdp + p.affect + hlexp.b + soci.s
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 136 121.034
## 2 132 24.394 4 96.64 130.73 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(null_model, final_model)
## Analysis of Variance Table
##
## Model 1: happy ~ 1
## Model 2: happy ~ gdp + p.affect + hlexp.b + soci.s + generosity
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 136 121.034
## 2 131 23.469 5 97.565 108.92 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
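Each F value in the tables above follows directly from the residual sums of squares. As a minimal sketch (the variable names are illustrative; the numbers are copied from the five tables), the overall F of every model and the partial F for the last step can be recomputed like this:

```r
# Residual sums of squares copied from the five anova() tables
rss_null <- 121.034
rss <- c(gdp = 43.535, p.affect = 29.968, hlexp.b = 25.831,
         soci.s = 24.394, generosity = 23.469)
n_predictors <- 1:5
res_df       <- 136 - n_predictors       # residual df of each model

# Overall F of each model against the intercept-only model:
# (drop in RSS per added predictor) / (residual mean square)
f_overall <- ((rss_null - rss) / n_predictors) / (rss / res_df)

# Partial F for the last step: adding generosity to the four-predictor model
f_generosity <- (rss[["soci.s"]] - rss[["generosity"]]) /
  (rss[["generosity"]] / 131)

round(f_overall, 2)
round(f_generosity, 2)
```

The overall values should match the F column of the tables (240.32, 203.6, 163.4, 130.73, 108.92) up to rounding. The partial F for generosity comes out close to the square of its t value in the final model (2.272), which is the usual relation between a one-degree-of-freedom F test and the corresponding t test.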
EXTRA
Checking normality of all variables using the K-S (Lilliefors) test
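dataset <- as.data.frame(dataset)
# lillie.test() has H0: the data are normal, so p < 0.05 means normality is rejected
for(i in seq_len(ncol(dataset))) {
print(
paste(paste0(i,"."), "The variable",
names(dataset)[i],
"departs from normality (p < 0.05):",
nortest::lillie.test(dataset[,i])$p.value < 0.05)
) }
## [1] "1. The variable happy departs from normality (p < 0.05): TRUE"
## [1] "2. The variable gdp departs from normality (p < 0.05): TRUE"
## [1] "3. The variable soci.s departs from normality (p < 0.05): TRUE"
## [1] "4. The variable hlexp.b departs from normality (p < 0.05): TRUE"
## [1] "5. The variable freedom departs from normality (p < 0.05): FALSE"
## [1] "6. The variable generosity departs from normality (p < 0.05): FALSE"
## [1] "7. The variable p.corruption departs from normality (p < 0.05): TRUE"
## [1] "8. The variable p.affect departs from normality (p < 0.05): TRUE"
## [1] "9. The variable n.affect departs from normality (p < 0.05): TRUE"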
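The Kolmogorov-Smirnov test (K-S test, here in its Lilliefors form) rejects normality for every variable except freedom and generosity. The flag printed above is TRUE when p < .05 and is FALSE only for these two variables, so only freedom and generosity are consistent with a normal distribution.
A variable is treated as normal when its p-value is greater than .05, because the hypotheses of the test are H0: the data are normal, and H1: the data deviate from normal.
If the p-value is greater than .05 we fail to reject the null hypothesis. This means there is no evidence against normality; it does not prove that H0 is true. (For details, search for “K-S test”.)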
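The direction of the comparison is easy to get wrong, so it can help to wrap the decision rule in a small function whose name states what TRUE means. The sketch below uses shapiro.test() from base R as a stand-in for nortest::lillie.test() (a different normality test with the same null hypothesis), and the two columns are deterministic illustrative data, not survey variables.

```r
# Returns TRUE when normality is NOT rejected at level alpha.
# shapiro.test() is a base R stand-in for nortest::lillie.test();
# both have H0: the data are normal.
looks_normal <- function(x, alpha = 0.05) {
  shapiro.test(x)$p.value >= alpha
}

# Deterministic illustrative data (hypothetical, not from the survey)
toy <- data.frame(
  bell_shaped = qnorm(ppoints(200)),   # quantiles of a normal distribution
  skewed      = qexp(ppoints(200))     # quantiles of an exponential distribution
)

normal_flags <- sapply(toy, looks_normal)
normal_flags
```

Here bell_shaped is flagged TRUE and skewed is flagged FALSE. Replacing shapiro.test with nortest::lillie.test inside looks_normal would apply the same rule to the test used in this section.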
Thank You
Regards
Alok Pratap Singh (Research Scholar)
Department of Psychology
University of Allahabad
Data to information, Information to Insight, and Insight to Impact. ❤