South Asia, a melting pot of cultures, religions and histories, offers a unique backdrop against which the pursuit of happiness unfolds in diverse and intricate ways. This region, home to the majestic Himalayas, the serene beaches of Sri Lanka, the bustling markets of India and the ancient ruins of civilizations past, presents a tapestry of life that is rich in contrast and contradiction.
The World Happiness Report serves as a compass that navigates through the various dimensions of well-being and satisfaction across countries worldwide. In the context of South Asia, it offers us valuable insights into how people perceive their happiness, influenced by a myriad of factors from economic stability and social support to health, freedom and the environment. This analysis is not merely an academic exercise but a window into the lives of millions, shedding light on their joys, struggles, and aspirations.
Join us as we navigate through the landscapes of South Asia, seeking to understand what drives happiness in this diverse region and what lessons can be learned from its approach to fostering well-being and contentment among its people.
To analyze happiness levels across several South Asian countries, we will use the World Happiness Report 2023 data published on Kaggle by USAMA BUTTAR. The South Asia subset used here spans 2005 to 2022.
Explanation of each column:
Country.Name : Name of the country
Regional.Indicator : Name of the region each country belongs to
Year : Year of the observation
Life.Ladder : Happiness level [0-10]
Log.GDP.Per.Capita : Log of gross domestic product per person in the country
Social.Support : Having someone to rely on
Healthy.Life.Expectancy.At.Birth : Expected years of healthy life from birth
Freedom.To.Make.Life.Choices : Freedom to make life choices [0-1]
Generosity : How often people make a donation each month
Perceptions.Of.Corruption : Perceived level of corruption [0-1]
Positive.Affect : Average positive affect from the previous day (laughter, happiness and interest)
Negative.Affect : Average negative affect from the previous day (worry, sadness and anger)
Confidence.In.National.Government : How much people trust the government
Needed Libraries
#Packages for dataframe transformation
library(dplyr)
library(tidyr)
library(lubridate)
#Packages for visualization
library(ggcorrplot)
library(gplots)
library(ggplot2)
library(plotly)
library(foreign)
#Packages for further analysis
library(plm)
library(lfe)
library(lmtest)
library(car)
library(tseries)
library(MLmetrics)
We will do several steps, including importing the dataset from the data_input folder, named World_Happiness_Report.csv, and keeping only the South Asia rows:
# 1. dataset import
df <- read.csv("data_input/World_Happiness_Report.csv")
# 2. using only south asia data and remove regional indicator column
df_sasia <- df %>%
filter(Regional.Indicator == "South Asia") %>%
select(-Regional.Indicator)
head(df_sasia)
We can check the balance of the data in two different ways:
1. Checking the frequencies of the data for each individual index
table(df_sasia$Country.Name)
##
## Afghanistan Bangladesh India Maldives Nepal Pakistan
## 14 17 17 1 17 16
## Sri Lanka
## 15
2. Using the is.pbalanced() function
We can use this function provided the data come in pdata.frame format; otherwise we can add the parameter index = c("individual column", "time column"). A result of TRUE means the panel is balanced.
is.pbalanced(df_sasia, index = c("Country.Name","Year"))
## [1] FALSE
From frequency checking and data balancing above, we can see that:
The data is not balanced.
Maldives is the country that has the most insufficient data followed by Afghanistan, Sri Lanka and Pakistan
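What "balanced" means can be illustrated without plm. The sketch below is a minimal base R example on a hypothetical toy panel (the toy data frame and its values are made up for illustration, not taken from the report): a panel is balanced when every individual is observed in every time period.

```r
# Hypothetical toy panel: country "B" has no 2021 observation
toy <- data.frame(
  Country.Name = c("A", "A", "A", "B", "B"),
  Year         = c(2020, 2021, 2022, 2020, 2022),
  stringsAsFactors = FALSE
)

# Observations per country, and number of distinct years in the panel
obs_per_country <- table(toy$Country.Name)
n_years <- length(unique(toy$Year))

# Balanced only if every country has one row for every year
# (this shortcut assumes there are no duplicated country-year pairs)
is_balanced <- all(obs_per_country == n_years)

print(obs_per_country)
print(is_balanced)
```

This is the same reading we apply to the frequency table above: countries with fewer rows than the number of distinct years make the panel unbalanced.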
After some consideration, I took the Maldives out of the data, as its single observation is far too little for further processing.
df_sasia <- df_sasia %>% filter(Country.Name != "Maldives")
df_sasia
1. Create Panel Data Frame
To balance the data, we first have to convert it to a panel data frame. To create one, we can use the pdata.frame() function with parameters:
data : The data that will be used
index : c("individual information", "time information")
# creating pdata.frame
df_sasia <- df_sasia %>% pdata.frame(index = c("Country.Name","Year"))
# checking the data structure
glimpse(df_sasia)
## Rows: 96
## Columns: 12
## $ Country.Name <fct> Afghanistan, Afghanistan, Afghanista…
## $ Year <fct> 2008, 2009, 2010, 2011, 2012, 2013, …
## $ Life.Ladder <pseries> 3.723590, 4.401778, 4.758381, 3.…
## $ Log.GDP.Per.Capita <pseries> 7.350416, 7.508646, 7.613900, 7.…
## $ Social.Support <pseries> 0.4506623, 0.5523084, 0.5390752,…
## $ Healthy.Life.Expectancy.At.Birth <pseries> 50.500, 50.800, 51.100, 51.400, …
## $ Freedom.To.Make.Life.Choices <pseries> 0.7181143, 0.6788964, 0.6001272,…
## $ Generosity <pseries> 0.167652458, 0.190808803, 0.1213…
## $ Perceptions.Of.Corruption <pseries> 0.8816863, 0.8500354, 0.7067661,…
## $ Positive.Affect <pseries> 0.4142970, 0.4814214, 0.5169067,…
## $ Negative.Affect <pseries> 0.2581955, 0.2370924, 0.2753238,…
## $ Confidence.In.National.Government <pseries> 0.6120721, 0.6115452, 0.2993574,…
Doing this automatically changes the data type of each column:
The index columns are converted to factor
All other columns are converted to pseries
2. Checking Data Dimension
We can use pdim() function to check the dimension of the data
pdim(df_sasia)
## Unbalanced Panel: n = 6, T = 14-17, N = 96
From this check, we get the following information:
The data is not balanced.
There are 6 countries
Each country has between 14 and 17 time periods
There are 96 observation rows in total
We now balance the data with the make.pbalanced() function, whose balance.type parameter accepts 3 options:
fill (default): The union of available time periods over all individuals is taken (w/o NA values). Missing time periods for an individual are identified and corresponding rows (elements for pseries) are inserted and filled with NA for the non–index variables (elements for a pseries). This means, only time periods present for at least one individual are inserted, if missing.
shared.times : The intersect of available time periods over all individuals is taken (w/o NA values). Thus, time periods not available for all individuals are discarded, i. e., only time periods shared by all individuals are left in the result).
shared.individuals: All available time periods are kept and those individuals are dropped for which not all time periods are available, i. e., only individuals shared by all time periods are left in the result (symmetric to “shared.times”).
We use the fill option, as it best matches our needs: it keeps every country and every observed year.
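The idea behind the fill option can be sketched in base R. This is not how plm implements make.pbalanced(); it is a minimal illustration on a hypothetical toy panel (names and values are made up) of taking the union of observed years and inserting NA rows for the missing country-year pairs.

```r
# Hypothetical toy panel: country "B" has no 2021 observation
toy <- data.frame(
  Country.Name = c("A", "A", "A", "B", "B"),
  Year         = c(2020, 2021, 2022, 2020, 2022),
  Life.Ladder  = c(4.1, 4.3, 4.2, 5.0, 5.4),
  stringsAsFactors = FALSE
)

# Every combination of country and observed year
full_grid <- expand.grid(
  Country.Name = unique(toy$Country.Name),
  Year         = sort(unique(toy$Year)),
  stringsAsFactors = FALSE
)

# Left join: combinations missing from toy get NA in the non-index columns
filled <- merge(full_grid, toy, by = c("Country.Name", "Year"), all.x = TRUE)
filled <- filled[order(filled$Country.Name, filled$Year), ]

print(filled)
```

The inserted rows carry no information of their own, which is why the NA values they introduce still have to be handled afterwards.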
Balancing with fill
balance1 <- df_sasia %>% make.pbalanced(balance.type = "fill")
table(balance1$Country.Name)
##
## Afghanistan Bangladesh India Nepal Pakistan Sri Lanka
## 18 18 18 18 18 18
unique(balance1$Year)
## [1] 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
## [16] 2020 2021 2022
## 18 Levels: 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 ... 2022
The balanced result is saved in an object named balance1.
is.pbalanced(balance1)
## [1] TRUE
pdim(balance1)
## Balanced Panel: n = 6, T = 18, N = 108
The data is now balanced.
Before we check the completeness of the data, we need to know how many rows were added in the previous step.
# NA count in the balanced data - NA count in the unbalanced data
colSums(is.na(balance1)) - colSums(is.na(df_sasia))
## Country.Name Year
## 0 0
## Life.Ladder Log.GDP.Per.Capita
## 12 12
## Social.Support Healthy.Life.Expectancy.At.Birth
## 12 12
## Freedom.To.Make.Life.Choices Generosity
## 12 12
## Perceptions.Of.Corruption Positive.Affect
## 12 12
## Negative.Affect Confidence.In.National.Government
## 12 12
This shows that the balancing step added 12 rows, each with NA in every non-index column.
We now check the completeness of the data
colSums(is.na(balance1))
## Country.Name Year
## 0 0
## Life.Ladder Log.GDP.Per.Capita
## 12 13
## Social.Support Healthy.Life.Expectancy.At.Birth
## 12 12
## Freedom.To.Make.Life.Choices Generosity
## 12 16
## Perceptions.Of.Corruption Positive.Affect
## 12 14
## Negative.Affect Confidence.In.National.Government
## 12 19
Based on the inspection above, every non-index column has missing values. Confidence.In.National.Government (19) and Generosity (16) have the most, so we drop those two columns.
balance1 <-
balance1 %>% select(-Confidence.In.National.Government, -Generosity)
To check and fill the remaining missing values, we interpolate them separately for each country.
afghan <- balance1 %>% filter(Country.Name == "Afghanistan")
colSums(is.na(afghan))
## Country.Name Year
## 0 0
## Life.Ladder Log.GDP.Per.Capita
## 4 5
## Social.Support Healthy.Life.Expectancy.At.Birth
## 4 4
## Freedom.To.Make.Life.Choices Perceptions.Of.Corruption
## 4 4
## Positive.Affect Negative.Affect
## 4 4
We found missing values in several columns; let's fill them with the na.fill() function using fill = "extend"
afghan <- afghan %>% mutate(
Life.Ladder = na.fill(Life.Ladder, fill = "extend"),
Log.GDP.Per.Capita = na.fill(Log.GDP.Per.Capita, fill = "extend"),
Social.Support = na.fill(Social.Support, fill = "extend"),
Healthy.Life.Expectancy.At.Birth = na.fill(Healthy.Life.Expectancy.At.Birth, fill = "extend"),
Freedom.To.Make.Life.Choices = na.fill(Freedom.To.Make.Life.Choices, fill = "extend"),
Perceptions.Of.Corruption = na.fill(Perceptions.Of.Corruption, fill = "extend"),
Positive.Affect = na.fill(Positive.Affect, fill = "extend"),
Negative.Affect = na.fill(Negative.Affect, fill = "extend"))
anyNA(afghan)
## [1] FALSE
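As far as we understand zoo's na.fill(), fill = "extend" interpolates linearly between known values and repeats the nearest known value at the edges. The sketch below reproduces that behaviour with base R approx(rule = 2) as a stand-in; fill_extend() is a hypothetical helper name and the input vector is made up.

```r
# Hypothetical helper that mimics the "extend" idea with base R only
fill_extend <- function(v) {
  idx <- seq_along(v)
  ok  <- !is.na(v)
  # Linear interpolation inside the observed range;
  # rule = 2 repeats the closest known value outside it
  approx(x = idx[ok], y = v[ok], xout = idx, rule = 2)$y
}

# Made-up series with a leading, an interior and a trailing gap
life_ladder <- c(NA, 4.0, NA, 5.0, NA)
filled <- fill_extend(life_ladder)

print(filled)
```

Edge gaps are filled by copying the nearest observation, so a country with several missing years at the start or end of the panel gets a flat stretch there. This helper also needs at least two known values per series.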
bangla <- balance1 %>% filter(Country.Name == "Bangladesh")
colSums(is.na(bangla))
## Country.Name Year
## 0 0
## Life.Ladder Log.GDP.Per.Capita
## 1 1
## Social.Support Healthy.Life.Expectancy.At.Birth
## 1 1
## Freedom.To.Make.Life.Choices Perceptions.Of.Corruption
## 1 1
## Positive.Affect Negative.Affect
## 2 1
We found missing values in several columns; let's fill them with the na.fill() function using fill = "extend"
bangla <- bangla %>% mutate(
Life.Ladder = na.fill(Life.Ladder, fill = "extend"),
Log.GDP.Per.Capita = na.fill(Log.GDP.Per.Capita, fill = "extend"),
Social.Support = na.fill(Social.Support, fill = "extend"),
Healthy.Life.Expectancy.At.Birth = na.fill(Healthy.Life.Expectancy.At.Birth, fill = "extend"),
Freedom.To.Make.Life.Choices = na.fill(Freedom.To.Make.Life.Choices, fill = "extend"),
Perceptions.Of.Corruption = na.fill(Perceptions.Of.Corruption, fill = "extend"),
Positive.Affect = na.fill(Positive.Affect, fill = "extend"),
Negative.Affect = na.fill(Negative.Affect, fill = "extend"))
anyNA(bangla)
## [1] FALSE
india <- balance1 %>% filter(Country.Name == "India")
colSums(is.na(india))
## Country.Name Year
## 0 0
## Life.Ladder Log.GDP.Per.Capita
## 1 1
## Social.Support Healthy.Life.Expectancy.At.Birth
## 1 1
## Freedom.To.Make.Life.Choices Perceptions.Of.Corruption
## 1 1
## Positive.Affect Negative.Affect
## 1 1
We found missing values in several columns; let's fill them with the na.fill() function using fill = "extend"
india <- india %>% mutate(
Life.Ladder = na.fill(Life.Ladder, fill = "extend"),
Log.GDP.Per.Capita = na.fill(Log.GDP.Per.Capita, fill = "extend"),
Social.Support = na.fill(Social.Support, fill = "extend"),
Healthy.Life.Expectancy.At.Birth = na.fill(Healthy.Life.Expectancy.At.Birth, fill = "extend"),
Freedom.To.Make.Life.Choices = na.fill(Freedom.To.Make.Life.Choices, fill = "extend"),
Perceptions.Of.Corruption = na.fill(Perceptions.Of.Corruption, fill = "extend"),
Positive.Affect = na.fill(Positive.Affect, fill = "extend"),
Negative.Affect = na.fill(Negative.Affect, fill = "extend"))
anyNA(india)
## [1] FALSE
nepal <- balance1 %>% filter(Country.Name == "Nepal")
colSums(is.na(nepal))
## Country.Name Year
## 0 0
## Life.Ladder Log.GDP.Per.Capita
## 1 1
## Social.Support Healthy.Life.Expectancy.At.Birth
## 1 1
## Freedom.To.Make.Life.Choices Perceptions.Of.Corruption
## 1 1
## Positive.Affect Negative.Affect
## 1 1
We found missing values in several columns; let's fill them with the na.fill() function using fill = "extend"
nepal <- nepal %>% mutate(
Life.Ladder = na.fill(Life.Ladder, fill = "extend"),
Log.GDP.Per.Capita = na.fill(Log.GDP.Per.Capita, fill = "extend"),
Social.Support = na.fill(Social.Support, fill = "extend"),
Healthy.Life.Expectancy.At.Birth = na.fill(Healthy.Life.Expectancy.At.Birth, fill = "extend"),
Freedom.To.Make.Life.Choices = na.fill(Freedom.To.Make.Life.Choices, fill = "extend"),
Perceptions.Of.Corruption = na.fill(Perceptions.Of.Corruption, fill = "extend"),
Positive.Affect = na.fill(Positive.Affect, fill = "extend"),
Negative.Affect = na.fill(Negative.Affect, fill = "extend"))
anyNA(nepal)
## [1] FALSE
pakis <- balance1 %>% filter(Country.Name == "Pakistan")
colSums(is.na(pakis))
## Country.Name Year
## 0 0
## Life.Ladder Log.GDP.Per.Capita
## 2 2
## Social.Support Healthy.Life.Expectancy.At.Birth
## 2 2
## Freedom.To.Make.Life.Choices Perceptions.Of.Corruption
## 2 2
## Positive.Affect Negative.Affect
## 3 2
pakis <- pakis %>% mutate(
Life.Ladder = na.fill(Life.Ladder, fill = "extend"),
Log.GDP.Per.Capita = na.fill(Log.GDP.Per.Capita, fill = "extend"),
Social.Support = na.fill(Social.Support, fill = "extend"),
Healthy.Life.Expectancy.At.Birth = na.fill(Healthy.Life.Expectancy.At.Birth, fill = "extend"),
Freedom.To.Make.Life.Choices = na.fill(Freedom.To.Make.Life.Choices, fill = "extend"),
Perceptions.Of.Corruption = na.fill(Perceptions.Of.Corruption, fill = "extend"),
Positive.Affect = na.fill(Positive.Affect, fill = "extend"),
Negative.Affect = na.fill(Negative.Affect, fill = "extend"))
anyNA(pakis)
## [1] FALSE
srilan <- balance1 %>% filter(Country.Name == "Sri Lanka")
colSums(is.na(srilan))
## Country.Name Year
## 0 0
## Life.Ladder Log.GDP.Per.Capita
## 3 3
## Social.Support Healthy.Life.Expectancy.At.Birth
## 3 3
## Freedom.To.Make.Life.Choices Perceptions.Of.Corruption
## 3 3
## Positive.Affect Negative.Affect
## 3 3
srilan <- srilan %>% mutate(
Life.Ladder = na.fill(Life.Ladder, fill = "extend"),
Log.GDP.Per.Capita = na.fill(Log.GDP.Per.Capita, fill = "extend"),
Social.Support = na.fill(Social.Support, fill = "extend"),
Healthy.Life.Expectancy.At.Birth = na.fill(Healthy.Life.Expectancy.At.Birth, fill = "extend"),
Freedom.To.Make.Life.Choices = na.fill(Freedom.To.Make.Life.Choices, fill = "extend"),
Perceptions.Of.Corruption = na.fill(Perceptions.Of.Corruption, fill = "extend"),
Positive.Affect = na.fill(Positive.Affect, fill = "extend"),
Negative.Affect = na.fill(Negative.Affect, fill = "extend"))
anyNA(srilan)
## [1] FALSE
After all of the missing values have been filled, we bind the countries back together and save the result to balanced2
balanced2 <- bind_rows(afghan, bangla, india, nepal, pakis, srilan)
Recheck the balance of the data
pdim(balanced2)
## Balanced Panel: n = 6, T = 18, N = 108
Checking the completeness of the data
colSums(is.na(balanced2))
## Country.Name Year
## 0 0
## Life.Ladder Log.GDP.Per.Capita
## 0 0
## Social.Support Healthy.Life.Expectancy.At.Birth
## 0 0
## Freedom.To.Make.Life.Choices Perceptions.Of.Corruption
## 0 0
## Positive.Affect Negative.Affect
## 0 0
The data is now ready for the next step.
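The per-country filling above repeats the same steps six times. As a possible alternative, the same idea can be applied to all groups in one pass. The sketch below uses base R ave() on a hypothetical toy panel; fill_extend() is a made-up helper standing in for na.fill(fill = "extend"), not the function used in this analysis.

```r
# Hypothetical stand-in for na.fill(fill = "extend"), base R only
fill_extend <- function(v) {
  idx <- seq_along(v)
  ok  <- !is.na(v)
  approx(x = idx[ok], y = v[ok], xout = idx, rule = 2)$y
}

# Made-up panel with gaps in both countries
toy <- data.frame(
  Country.Name = rep(c("A", "B"), each = 4),
  Year         = rep(2019:2022, times = 2),
  Life.Ladder  = c(NA, 4, NA, 5, 3, NA, NA, 6),
  stringsAsFactors = FALSE
)

# ave() applies the helper within each country and keeps the row order,
# so values are never interpolated across two different countries
toy$Life.Ladder <- ave(toy$Life.Ladder, toy$Country.Name, FUN = fill_extend)

print(toy)
```

Grouping matters here: filling the whole column at once would interpolate between the last year of one country and the first year of the next.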
summary(balanced2)
## Country.Name Year Life.Ladder Log.GDP.Per.Capita
## Afghanistan:18 2005 : 6 Min. :1.281 Min. :7.324
## Bangladesh :18 2006 : 6 1st Qu.:4.180 1st Qu.:7.961
## India :18 2007 : 6 Median :4.479 Median :8.312
## Nepal :18 2008 : 6 Mean :4.445 Mean :8.344
## Pakistan :18 2009 : 6 3rd Qu.:4.931 3rd Qu.:8.638
## Sri Lanka :18 2010 : 6 Max. :5.982 Max. :9.529
## (Other):72
## Social.Support Healthy.Life.Expectancy.At.Birth Freedom.To.Make.Life.Choices
## Min. :0.2282 Min. :50.50 Min. :0.3352
## 1st Qu.:0.5380 1st Qu.:55.48 1st Qu.:0.6056
## Median :0.6150 Median :59.71 Median :0.7322
## Mean :0.6442 Mean :59.06 Mean :0.6916
## 3rd Qu.:0.7807 3rd Qu.:62.28 3rd Qu.:0.8182
## Max. :0.8737 Max. :67.20 Max. :0.9064
##
## Perceptions.Of.Corruption Positive.Affect Negative.Affect
## Min. :0.6169 Min. :0.1789 Min. :0.1523
## 1st Qu.:0.7676 1st Qu.:0.4740 1st Qu.:0.2340
## Median :0.8210 Median :0.5176 Median :0.2952
## Mean :0.8128 Mean :0.5374 Mean :0.2998
## 3rd Qu.:0.8616 3rd Qu.:0.5897 3rd Qu.:0.3500
## Max. :0.9544 Max. :0.7894 Max. :0.6067
##
From the summary above, we get the following information:
The highest Life Ladder value across the South Asian countries is 5.982
The lowest Life Ladder value across the South Asian countries is 1.281
We can use the ggcorrplot() function to visualize the correlations conveniently
We need to deselect the categorical (factor) variables, here the Country.Name and Year columns
balanced2 %>% select(-Country.Name, -Year) %>% cor() %>% ggcorrplot(type = "lower", lab = TRUE)
From the plot result above, we could see that:
We can use the coplot() function to get better information from our data, with parameters:
formula = target ~ index1 given index2
type = "l" for a line plot and "b" for a point & line plot
data = dataset
rows = number of rows in the panel layout
col = color of the plot
coplot(Life.Ladder ~ Year|Country.Name,
type = "b",
data = balanced2,
rows = 1,
col = "red")
From line plot above, we could see that:
coplot(Log.GDP.Per.Capita ~ Year|Country.Name,
type = "b",
data = balanced2,
rows = 1,
col = "red")
From line plot above, we could see that:
This is the step before creating a model: the data will be split into train data and test data. Because the data has year information, we split it sequentially by year, keeping the last year (2022) as test data.
Using the filter() function
# creating train data
ladder_train <- balanced2 %>% filter(Year != 2022)
# creating test data
ladder_test <- balanced2 %>% filter(Year == 2022)
After that, we have to make sure that the train data is balanced, and rebalance it if needed.
ladder_train <- ladder_train %>%
droplevels() %>% # Cleaning 2022 time information
make.pbalanced() # doing rebalance
is.pbalanced(ladder_train)
## [1] TRUE
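The split by year can also be sketched in base R. This is a minimal illustration on a hypothetical toy panel (names and values are made up), holding out the last year as test data and dropping its unused factor level from the train data, which is the role droplevels() plays above.

```r
# Hypothetical toy panel with Year stored as a factor, as in a pdata.frame
toy <- data.frame(
  Country.Name = rep(c("A", "B"), each = 3),
  Year         = factor(rep(2020:2022, times = 2)),
  Life.Ladder  = c(4.1, 4.3, 4.2, 5.0, 5.2, 5.4),
  stringsAsFactors = FALSE
)

# Convert the factor labels to numbers before comparing years
years     <- as.integer(as.character(toy$Year))
last_year <- max(years)

# Train on every year before the last one, test on the last year
train <- droplevels(toy[years < last_year, ])
test  <- toy[years == last_year, ]

print(levels(train$Year))
print(nrow(test))
```

Splitting by time rather than at random keeps the test year strictly after the years used for fitting.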
In the earlier step, we found signs of multicollinearity between the predictor variables. Therefore, we check the multicollinearity assumption by first creating a regression model with the lm() function and then applying the vif() function.
If:
VIF value > 10: multicollinearity is present in the model
VIF value < 10: no multicollinearity is detected
We leave out Country.Name and Year, as they are categorical (factor) variables
lm(Life.Ladder ~ .-Country.Name -Year, ladder_train) %>% vif()
## Log.GDP.Per.Capita Social.Support
## 6.364906 2.642561
## Healthy.Life.Expectancy.At.Birth Freedom.To.Make.Life.Choices
## 4.346578 1.973927
## Perceptions.Of.Corruption Positive.Affect
## 1.483278 4.696021
## Negative.Affect
## 1.959480
The result: the model shows no multicollinearity (all VIF values < 10)
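For intuition, the VIF of a predictor can be computed by hand as 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. The sketch below uses simulated data (x1, x2 and x3 are made-up variables, not columns of the report) and base R only.

```r
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)  # deliberately correlated with x1
x3 <- rnorm(n)                 # unrelated to the others

# Regress x1 on the remaining predictors and take the R-squared
r2_x1 <- summary(lm(x1 ~ x2 + x3))$r.squared

# The better x1 is explained by the others, the larger its VIF
vif_x1 <- 1 / (1 - r2_x1)

print(vif_x1)
```

A VIF of 1 means the predictor is not explained by the others at all; the threshold of 10 used above corresponds to an R² of 0.9.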
For each model, we will use the plm() function from the plm package with parameters:
formula = Target ~ Predictor
data = dataframe
index = c("individual_column", "time_column")
model =
"pooling" : for the Common Effect Model (CEM)
"within" : for the Fixed Effect Model (FEM)
"random" : for the Random Effect Model (REM)
Create the Common Effect Model and save it to an object named cem
cem <- plm(
Life.Ladder ~ Log.GDP.Per.Capita
+ Social.Support
+ Healthy.Life.Expectancy.At.Birth
+ Freedom.To.Make.Life.Choices
+ Perceptions.Of.Corruption
+ Positive.Affect
+ Negative.Affect,
data = ladder_train,
index = c("Country.Name", "Year"),
model = "pooling"
)
Create the Fixed Effect Model with model = "within" and save it to an object named fem. The call below uses plm's default individual effect; adding the parameter effect = "twoways" would include both individual and time effects.
fem <- plm(
Life.Ladder ~ Log.GDP.Per.Capita
+ Social.Support
+ Healthy.Life.Expectancy.At.Birth
+ Freedom.To.Make.Life.Choices
+ Perceptions.Of.Corruption
+ Positive.Affect
+ Negative.Affect,
data = ladder_train,
index = c("Country.Name", "Year"),
model = "within"
)
The Chow test is used to choose between the Common Effect Model and the Fixed Effect Model. To do this, we can use the pooltest(model_cem, model_fem) function.
The hypotheses to be tested are:
H0: Common Effect Model
H1: Fixed Effect Model
H0 will be rejected if P-value < α. The α value is 5%.
pooltest(cem,fem)
##
## F statistic
##
## data: Life.Ladder ~ Log.GDP.Per.Capita + Social.Support + Healthy.Life.Expectancy.At.Birth + ...
## F = 24.717, df1 = 5, df2 = 89, p-value = 1.572e-15
## alternative hypothesis: unstability
From the test above, the p-value < α, so H0 is rejected and the better model for the World Happiness data is the Fixed Effect Model.
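The degrees of freedom in the pooltest() output can be checked by hand. This is a minimal sketch, assuming the usual F-test layout for comparing a pooled model with a fixed-effect model (df1 = n - 1, df2 = N - n - k); the variable names are ours and the numbers are copied from the outputs in this analysis.

```r
n_countries <- 6       # individuals in the panel
n_obs       <- 102     # rows in the train data (6 countries x 17 years)
k           <- 7       # slope coefficients shared by cem and fem
f_stat      <- 24.717  # F statistic reported by pooltest()

# Restrictions tested: the country intercepts are all equal
df1 <- n_countries - 1
# Residual degrees of freedom of the fixed-effect model
df2 <- n_obs - n_countries - k

# Upper-tail probability of the F distribution
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)

print(c(df1 = df1, df2 = df2))
print(p_value < 0.05)
```

The two values match the df1 and df2 printed by the test, which is a quick way to confirm that both models were fitted on the same rows and predictors.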
Create the Random Effect Model and save it to an object named rem. Note that this model uses only four of the seven predictors.
rem <- plm(
Life.Ladder ~ Log.GDP.Per.Capita
+ Social.Support
+ Healthy.Life.Expectancy.At.Birth
+ Freedom.To.Make.Life.Choices,
data = ladder_train,
index = c("Country.Name", "Year"),
model = "random"
)
Use the phtest(model_rem, model_fem) function to run the Hausman test with hypotheses:
H0: Random Effect Model
H1: Fixed Effect Model
H0 is rejected if p-value < α.
phtest(rem,fem)
##
## Hausman Test
##
## data: Life.Ladder ~ Log.GDP.Per.Capita + Social.Support + Healthy.Life.Expectancy.At.Birth + ...
## chisq = 5.0966, df = 4, p-value = 0.2775
## alternative hypothesis: one model is inconsistent
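Based on the Hausman test, the p-value > α, so we fail to reject the null hypothesis (H0), which would favor the Random Effect Model. For now, however, we will continue with the Fixed Effect Model and run the assumption tests.
The hypotheses for the residual normality test are:
H0: the residuals are normally distributed
H1: the residuals are not normally distributed
H0 will be rejected if P-value < α. The α value is 5%.
fem$residuals %>% shapiro.test()
##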
## Shapiro-Wilk normality test
##
## data: .
## W = 0.98697, p-value = 0.4209
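Based on the results of the residual normality test, a p-value > 0.05 was obtained, so we fail to reject H0: the residuals can be treated as normally distributed.
The hypotheses for the homogeneity (homoscedasticity) test are:
H0: the residual variance is homogeneous
H1: the residual variance is not homogeneous
H0 will be rejected if P-value < α. The α value is 5%.
fem %>% bptest()
##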
## studentized Breusch-Pagan test
##
## data: .
## BP = 11.694, df = 7, p-value = 0.1111
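Based on the results of the homogeneity test, a p-value > 0.05 was obtained, so we fail to reject H0: the residuals have a homogeneous variance.
The hypotheses for the autocorrelation test are:
H0: there is no autocorrelation between the residuals
H1: there is autocorrelation between the residuals
H0 will be rejected if P-value < α. The α value is 5%.
fem$residuals %>% Box.test(type = "Ljung-Box")
##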
## Box-Ljung test
##
## data: .
## X-squared = 17.093, df = 1, p-value = 3.559e-05
Based on the results of the autocorrelation test, a p-value < 0.05 was obtained, meaning that there was an autocorrelation problem between the residuals.
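The same decision rule applies to all three assumption tests, so it can be written once. In this sketch, decide() is a hypothetical helper of ours, and the p-values are the ones printed by the three tests in this analysis.

```r
# Hypothetical helper: compare a p-value with the significance level
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "reject H0" else "fail to reject H0"
}

# p-values copied from the Shapiro-Wilk, Breusch-Pagan and Ljung-Box outputs
p_values <- c(
  normality       = 0.4209,
  homogeneity     = 0.1111,
  autocorrelation = 3.559e-05
)

decisions <- vapply(p_values, decide, character(1))
print(decisions)
```

Only the autocorrelation test rejects its null hypothesis, so autocorrelation is the one assumption that is violated here.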
summary(fem)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = Life.Ladder ~ Log.GDP.Per.Capita + Social.Support +
## Healthy.Life.Expectancy.At.Birth + Freedom.To.Make.Life.Choices +
## Perceptions.Of.Corruption + Positive.Affect + Negative.Affect,
## data = ladder_train, model = "within", index = c("Country.Name",
## "Year"))
##
## Balanced Panel: n = 6, T = 17, N = 102
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -0.926489 -0.280156 0.033346 0.276894 1.252755
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## Log.GDP.Per.Capita 1.432868 0.651477 2.1994 0.0304402 *
## Social.Support 1.999088 0.787535 2.5384 0.0128742 *
## Healthy.Life.Expectancy.At.Birth -0.253399 0.066212 -3.8271 0.0002406 ***
## Freedom.To.Make.Life.Choices 0.235235 0.477405 0.4927 0.6234118
## Perceptions.Of.Corruption -0.699380 0.955635 -0.7318 0.4661843
## Positive.Affect 0.047065 0.950073 0.0495 0.9606017
## Negative.Affect -1.365632 0.892104 -1.5308 0.1293656
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 23.896
## Residual Sum of Squares: 14.977
## R-Squared: 0.37326
## Adj. R-Squared: 0.28876
## F-statistic: 7.57217 on 7 and 89 DF, p-value: 3.8176e-07
Interpretation:
\[\text{Life.Ladder}_{it} = 1.432868 \cdot \text{Log.GDP.Per.Capita}_{it} + 1.999088 \cdot \text{Social.Support}_{it} - 0.253399 \cdot \text{Healthy.Life.Expectancy.At.Birth}_{it} + 0.235235 \cdot \text{Freedom.To.Make.Life.Choices}_{it} - 0.699380 \cdot \text{Perceptions.Of.Corruption}_{it} + 0.047065 \cdot \text{Positive.Affect}_{it} - 1.365632 \cdot \text{Negative.Affect}_{it} + u_{it}\]
The individual (country) effects can be obtained with the fixef(model_fem) function
fixef(fem)
## Afghanistan Bangladesh India Nepal Pakistan Sri Lanka
## 5.8346 8.1019 6.6609 7.7865 6.8733 6.5465
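The within transformation behind model = "within" and the country intercepts can be illustrated in base R. This sketch uses simulated data with a single predictor (all names and values are made up) and is not plm's implementation: subtracting each group's mean removes the group intercept, and the intercept is then recovered from the group means.

```r
# Simulated panel: 3 groups, true slope 2, group intercepts 1, 5 and 9
toy <- data.frame(
  id = rep(c("A", "B", "C"), each = 4),
  x  = c(1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)
set.seed(42)
true_alpha <- c(A = 1, B = 5, C = 9)
toy$y <- unname(true_alpha[toy$id]) + 2 * toy$x + rnorm(nrow(toy), sd = 0.1)

# Within transformation: subtract each group's mean from y and x
toy$y_dm <- toy$y - ave(toy$y, toy$id)
toy$x_dm <- toy$x - ave(toy$x, toy$id)

# Slope from the demeaned data (no intercept needed after demeaning)
within_fit <- lm(y_dm ~ x_dm - 1, data = toy)
beta <- unname(coef(within_fit)["x_dm"])

# Group intercepts: group mean of y minus slope times group mean of x
alpha_hat <- tapply(toy$y, toy$id, mean) - beta * tapply(toy$x, toy$id, mean)

# Cross-check against a regression with one dummy per group
lsdv_fit <- lm(y ~ x + factor(id) - 1, data = toy)

print(beta)
print(alpha_hat)
```

The values printed by fixef() above play the role of alpha_hat: one intercept per country, on top of slopes that are shared by all countries.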
Interpretation:
Use the predict() function to make a prediction, with parameters:
object = name of the model
newdata = new data to be predicted
pred <- predict(fem, ladder_test, na.fill = FALSE)
We can use the MAPE error metric to evaluate whether our model is good or not, with the MAPE() function and parameters:
y_pred = predicted values
y_true = actual values
MAPE(y_pred = pred,
y_true = ladder_test$Life.Ladder)
## [1] 0.3730909
Insight: The FEM model's predictions are off by about 37% on average (MAPE = 0.373), i.e. roughly 63% accuracy (1 - 0.373); we can adjust the process to get a better result.
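MAPE is the mean of the absolute errors expressed as a share of the true values; to our understanding this is what MLmetrics::MAPE() computes. The sketch below spells it out in base R on made-up numbers; mape() is a hypothetical helper name.

```r
# Hypothetical helper: mean absolute percentage error as a fraction
mape <- function(y_true, y_pred) {
  mean(abs((y_true - y_pred) / y_true))
}

# Made-up values: errors of 1 on true values 4 and 5
y_true <- c(4, 5)
y_pred <- c(3, 6)

# (1/4 + 1/5) / 2
result <- mape(y_true, y_pred)
print(result)
```

Because the error is divided by the true value, the same absolute miss counts for more when the true Life Ladder score is low.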
From the analysis steps that we have done, the conclusions are: