ASA Analysis - Cakra Analitika

Data Import

The dataset used in this study captures a twenty-four-year period of socio-economic and environmental development across ASEAN countries, covering the years 2000 to 2023. In total, the dataset consists of 240 observations (10 countries observed over 24 years), where each entry represents a specific country in a particular year. Through this structure, the data reflects temporal changes within each nation and also highlights the cross-country differences that characterize the ASEAN region as a whole.

Several key variables are included to provide a comprehensive overview of the determinants of population well-being. Life expectancy, measured in years, serves as the main indicator of public health outcomes and is used as the dependent variable in this study. To explain variations in life expectancy, four explanatory factors are considered. GDP per capita represents the economic capacity of each country and acts as an indicator of prosperity. Health expenditure, expressed as a percentage of GDP, reflects the level of investment made by a country toward improving healthcare services and accessibility. CO₂ damage, expressed as a percentage of gross national income, serves as an indicator of environmental stress and sustainability challenges. Lastly, urban population, measured as the percentage of people living in urban areas, captures the demographic and structural aspects of development that may influence living conditions and access to health facilities.

Together, these variables form a balanced panel dataset that allows for both cross-sectional and longitudinal analysis. The dataset thus provides a rich foundation to examine how economic growth, environmental degradation, public health investment, and urbanization collectively shape life expectancy patterns across the ASEAN region over time.
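A quick way to confirm that a panel is balanced is to count the observations in every country-year cell. The sketch below uses a small hypothetical panel (the object `toy` and its column names are illustrative, not the study data) and only base R.

```r
# Hypothetical toy panel: 3 countries observed over 4 years (not the study data)
toy <- expand.grid(Country = c("A", "B", "C"), Year = 2000:2003,
                   stringsAsFactors = FALSE)

# Count the observations in each country-year cell
cell_counts <- table(toy$Country, toy$Year)

# Balanced panel: every country appears exactly once in every year
is_balanced <- all(cell_counts == 1)
n_obs <- nrow(toy)

is_balanced  # TRUE
n_obs        # 12 = 3 countries x 4 years
```

Applied to the study data, the same check would be expected to give 10 x 24 = 240 cells with one observation each.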

dfasa <- read.csv("D:\\Documents\\dataset_ASEAN_interpolated1.csv")
head(dfasa)

##        Country.Name Country.Code Year CO2_damage Health_expenditure
## 1 Brunei Darussalam          BRN 2000   1.459899           2.547906
## 2 Brunei Darussalam          BRN 2001   1.631936           2.546511
## 3 Brunei Darussalam          BRN 2002   1.593368           2.534479
## 4 Brunei Darussalam          BRN 2003   1.768876           2.602606
## 5 Brunei Darussalam          BRN 2004   1.429289           2.551934
## 6 Brunei Darussalam          BRN 2005   1.218064           2.233862
##   GDP_per_capita Life_expectancy Urban_population
## 1       20130.26          74.017           71.164
## 2       18287.83          74.209           71.652
## 3       18621.29          74.365           72.046
## 4       20677.90          74.509           72.421
## 5       24423.09          74.603           72.794
## 6       29386.27          74.683           73.163

MULTICOLLINEARITY CHECK

library(dplyr)

## Warning: package 'dplyr' was built under R version 4.4.2

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(car)

## Warning: package 'car' was built under R version 4.4.3

## Loading required package: carData

## Warning: package 'carData' was built under R version 4.4.3

## 
## Attaching package: 'car'

## The following object is masked from 'package:dplyr':
## 
##     recode

# Fit one model per year, 2000-2023
models <- list()
for (yr in 2000:2023) {
  models[[as.character(yr)]] <- lm(Life_expectancy ~ GDP_per_capita + Health_expenditure +
                                     Urban_population + CO2_damage,
                                   data = dfasa %>% filter(Year == yr))
}

# Pooled model over all years
models[["2000-2023"]] <- lm(Life_expectancy ~ GDP_per_capita + Health_expenditure +
                              Urban_population + CO2_damage, data = dfasa)

# Compute the VIFs for each model
vif_list <- lapply(models, function(m) as.vector(vif(m)))

# Combine the VIF results into a single table
Multikol <- do.call(rbind, vif_list)

# Add row and column names ("Tahun" is Indonesian for "Year")
rownames(Multikol) <- c(paste("Tahun", 2000:2023), "Tahun 2000-2023")
colnames(Multikol) <- c("GDP_per_capita", "Health_expenditure", "Urban_population", "CO2_damage")

# Show the result
Multikol

##                 GDP_per_capita Health_expenditure Urban_population CO2_damage
## Tahun 2000            7.203887           1.733582         7.004671   1.906001
## Tahun 2001            7.317453           1.625378         7.291937   1.725218
## Tahun 2002            7.479406           1.515481         7.555937   1.539287
## Tahun 2003            6.435008           1.573591         6.331637   1.666421
## Tahun 2004            7.237533           2.231321         6.811370   2.435341
## Tahun 2005            7.129083           3.206209         5.356053   3.348552
## Tahun 2006            6.165364           3.243916         4.464650   3.022122
## Tahun 2007            8.975299           3.115019         6.055326   3.278438
## Tahun 2008            4.853444           1.782515         4.597213   1.773481
## Tahun 2009            5.832654           1.573185         6.343632   1.566299
## Tahun 2010            6.864312           1.374944         6.987727   1.600680
## Tahun 2011            5.937848           1.263848         5.802072   1.508507
## Tahun 2012            6.269482           1.371305         5.856654   1.661384
## Tahun 2013            7.647793           1.622898         6.890945   2.224761
## Tahun 2014            8.492856           1.505473         7.545215   2.269759
## Tahun 2015            8.747136           2.222964         6.923073   3.284422
## Tahun 2016            5.087380           2.735481         4.902905   2.899405
## Tahun 2017            4.374889           2.752192         5.134873   2.740985
## Tahun 2018            4.362871           3.490844         5.103778   3.606737
## Tahun 2019            4.444851           2.580416         5.080516   2.634783
## Tahun 2020            3.936497           2.414880         4.588157   2.645581
## Tahun 2021            3.402350           2.004583         5.009454   2.060641
## Tahun 2022            3.319331           1.613806         4.264898   1.778160
## Tahun 2023            3.261713           1.605007         4.155280   1.770310
## Tahun 2000-2023       3.394062           1.207256         3.381087   1.212925

Overall, the inspection suggests no severe multicollinearity among the independent variables. Every VIF stays below the commonly used threshold of 10: the largest value is 8.98 (GDP per capita in 2007), and in the pooled 2000-2023 model no VIF exceeds 3.4. GDP per capita and urban population show the highest values in the yearly models, which points to a moderate correlation between the two, but not at a level that would be expected to seriously distort the regression estimates from the panel data model.
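To make the VIF values easier to read, it may help to see how a VIF is built. For predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the others. The sketch below reproduces this by hand on simulated data (the variables `x1`, `x2`, `x3` are hypothetical and unrelated to the study data) and uses only base R.

```r
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)  # deliberately correlated with x1
x3 <- rnorm(n)                       # unrelated to x1 and x2
toy <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing x_j on the other predictors
vif_manual <- sapply(names(toy), function(v) {
  others <- setdiff(names(toy), v)
  aux <- lm(reformulate(others, response = v), data = toy)
  1 / (1 - summary(aux)$r.squared)
})

# Equivalent shortcut: the diagonal of the inverse correlation matrix
vif_from_cor <- diag(solve(cor(toy)))

vif_manual
```

In this toy setup the correlated pair `x1` and `x2` should receive clearly larger VIFs than `x3`, whose VIF should sit close to 1.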

VARIABLES STANDARDIZATION

Before performing the regression and predictive modeling, all independent variables were standardized using the z-score transformation, while the dependent variable (life expectancy) was kept in its original scale (years). Standardization removes the unit differences among the explanatory variables, namely GDP per capita (in USD), CO₂ damage (as a percentage of GNI), health expenditure (as a percentage of GDP), and urban population (as a percentage of total population), so that their coefficients can be compared directly. The dependent variable was intentionally not standardized because it is directly interpretable in years. As a result, each coefficient reads as the change in life expectancy, in years, associated with a one-standard-deviation increase in the corresponding explanatory variable.
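As a small illustration of the z-score transformation, the sketch below standardizes a hypothetical vector by hand and compares the result with `scale()`. The values in `x` are made up for the example.

```r
x <- c(10, 20, 30, 40, 50)  # hypothetical raw values

# z-score: subtract the mean, divide by the sample standard deviation
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix; as.numeric() drops the matrix attributes
z_scale <- as.numeric(scale(x))

all.equal(z_manual, z_scale)  # TRUE
```

After the transformation the variable has mean 0 and standard deviation 1, so a one-unit change corresponds to one standard deviation of the original variable.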

dfasaaa <- data.frame(
  Country.Name = dfasa$Country.Name,
  Year = dfasa$Year,
  Life_expectancy = dfasa$Life_expectancy,
  GDP_per_capita = scale(dfasa$GDP_per_capita),
  CO2_damage = scale(dfasa$CO2_damage),
  Health_expenditure = scale(dfasa$Health_expenditure),
  Urban_population = scale(dfasa$Urban_population)
)
head(dfasaaa)

##        Country.Name Year Life_expectancy GDP_per_capita  CO2_damage
## 1 Brunei Darussalam 2000          74.017      0.5584252 -0.32186290
## 2 Brunei Darussalam 2001          74.209      0.4519184 -0.16597804
## 3 Brunei Darussalam 2002          74.365      0.4711952 -0.20092483
## 4 Brunei Darussalam 2003          74.509      0.5900832 -0.04189456
## 5 Brunei Darussalam 2004          74.603      0.8065846 -0.34959916
## 6 Brunei Darussalam 2005          74.683      1.0934949 -0.54099369
##   Health_expenditure Urban_population
## 1         -0.7306505        0.8828937
## 2         -0.7314279        0.9029805
## 3         -0.7381293        0.9191982
## 4         -0.7001833        0.9346338
## 5         -0.7284073        0.9499870
## 6         -0.9055715        0.9651756

DETERMINING THE BEST TYPE

Common Effect Model

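The Common Effect Model (CEM), also known as pooled least squares, is the simplest panel data regression model. It stacks the time series and cross-sectional observations into a single sample and estimates one intercept and one set of slopes by ordinary least squares, ignoring country-specific and year-specific effects.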

library(plm)

## Warning: package 'plm' was built under R version 4.4.3

## 
## Attaching package: 'plm'

## The following objects are masked from 'package:dplyr':
## 
##     between, lag, lead

cem <- plm(Life_expectancy ~ GDP_per_capita + Health_expenditure + Urban_population + CO2_damage,
           data=dfasaaa,
           model="pooling")

summary(cem)

## Pooling Model
## 
## Call:
## plm(formula = Life_expectancy ~ GDP_per_capita + Health_expenditure + 
##     Urban_population + CO2_damage, data = dfasaaa, model = "pooling")
## 
## Balanced Panel: n = 10, T = 24, N = 240
## 
## Residuals:
##     Min.  1st Qu.   Median  3rd Qu.     Max. 
## -7.28074 -1.48082 -0.36574  0.84318  6.86816 
## 
## Coefficients:
##                    Estimate Std. Error  t-value  Pr(>|t|)    
## (Intercept)        70.16647    0.18071 388.2843 < 2.2e-16 ***
## GDP_per_capita      0.92729    0.33362   2.7795  0.005885 ** 
## Health_expenditure  0.88007    0.19897   4.4232 1.488e-05 ***
## Urban_population    4.98226    0.33298  14.9627 < 2.2e-16 ***
## CO2_damage          1.14119    0.19944   5.7221 3.189e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    9041.6
## Residual Sum of Squares: 1841.8
## R-Squared:      0.7963
## Adj. R-Squared: 0.79283
## F-statistic: 229.663 on 4 and 235 DF, p-value: < 2.22e-16

All explanatory variables in the pooling model (GDP per capita, health expenditure, urban population, and CO₂ damage) have a significant effect on the response variable, life expectancy, at the 5% significance level. The R-squared of 0.7963 indicates that these four explanatory variables jointly explain about 79.63% of the variation in life expectancy across ASEAN countries during the 2000 to 2023 period (adjusted R-squared: 0.7928), while the remaining variation is due to factors outside the model.
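The R-squared and adjusted R-squared can be recomputed from the sums of squares printed in the pooling model summary. The sketch below copies those figures from the output above; the variable names are illustrative, and the adjustment shown is the usual degrees-of-freedom correction.

```r
# Values copied from the pooling model summary
tss <- 9041.6   # total sum of squares
rss <- 1841.8   # residual sum of squares
n_obs <- 240    # number of observations
k <- 4          # explanatory variables (excluding the intercept)

r2 <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n_obs - 1) / (n_obs - k - 1)

round(r2, 4)      # 0.7963
round(adj_r2, 4)  # 0.7928
```

R-squared is the share of variation explained by the model, and the adjusted version penalizes that share for the number of predictors.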

Fixed Effect Model

The fixed effect approach assumes that each individual (here, each country) has its own intercept, which captures unobserved characteristics that stay constant over time. The slope coefficients, in contrast, are the same for every individual and do not change over time.
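The idea can be sketched on simulated data: subtracting each unit's own mean (the within transformation) gives the same slope as including one dummy per unit, whereas pooled OLS ignores the unit intercepts. Everything below (`id`, `alpha`, `x`, `y`) is hypothetical toy data, and only base R is used rather than `plm`.

```r
set.seed(42)
id <- rep(1:5, each = 6)             # 5 hypothetical units, 6 periods each
alpha <- c(2, 4, 6, 8, 10)[id]       # unit-specific intercepts
x <- rnorm(30) + 0.5 * alpha         # regressor correlated with the intercepts
y <- alpha + 1.5 * x + rnorm(30, sd = 0.3)

# Within transformation: subtract each unit's own mean
x_within <- x - ave(x, id)
y_within <- y - ave(y, id)
b_within <- coef(lm(y_within ~ x_within - 1))[["x_within"]]

# LSDV: one dummy (own intercept) per unit
b_lsdv <- coef(lm(y ~ x + factor(id)))[["x"]]

# Pooled OLS for comparison: a single common intercept
b_pooled <- coef(lm(y ~ x))[["x"]]

c(within = b_within, lsdv = b_lsdv, pooled = b_pooled)
```

The within and LSDV slopes coincide, while the pooled slope can differ because it absorbs part of the unit-level differences into the coefficient on `x`.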

# fem ind
fem.ind <- plm(Life_expectancy ~ GDP_per_capita + Health_expenditure + Urban_population + CO2_damage, data = dfasaaa, 
               model = "within",effect= "individual", index = c("Country.Name","Year"))

summary(fem.ind)

## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = Life_expectancy ~ GDP_per_capita + Health_expenditure + 
##     Urban_population + CO2_damage, data = dfasaaa, effect = "individual", 
##     model = "within", index = c("Country.Name", "Year"))
## 
## Balanced Panel: n = 10, T = 24, N = 240
## 
## Residuals:
##     Min.  1st Qu.   Median  3rd Qu.     Max. 
## -6.19495 -0.63789  0.29316  0.93060  2.62620 
## 
## Coefficients:
##                    Estimate Std. Error t-value  Pr(>|t|)    
## GDP_per_capita      1.11142    0.25117   4.425 1.500e-05 ***
## Health_expenditure  0.35590    0.15113   2.355   0.01938 *  
## Urban_population    7.98016    0.69376  11.503 < 2.2e-16 ***
## CO2_damage          0.70222    0.12978   5.411 1.595e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    1168.9
## Residual Sum of Squares: 505.02
## R-Squared:      0.56795
## Adj. R-Squared: 0.5431
## F-statistic: 74.2718 on 4 and 226 DF, p-value: < 2.22e-16

All explanatory variables in the individual fixed effect model significantly influence the response variable (life expectancy) at the 5% significance level. The R-squared of 0.5679 indicates that the explanatory variables jointly explain about 56.8% of the within-country variation in life expectancy during the study period (adjusted R-squared: 0.5431).

# fem time
fem.time <- plm(Life_expectancy ~ GDP_per_capita + Health_expenditure + Urban_population + CO2_damage, data = dfasaaa, model = "within", effect= "time", index = c("Country.Name","Year"))

summary(fem.time)

## Oneway (time) effect Within Model
## 
## Call:
## plm(formula = Life_expectancy ~ GDP_per_capita + Health_expenditure + 
##     Urban_population + CO2_damage, data = dfasaaa, effect = "time", 
##     model = "within", index = c("Country.Name", "Year"))
## 
## Balanced Panel: n = 10, T = 24, N = 240
## 
## Residuals:
##     Min.  1st Qu.   Median  3rd Qu.     Max. 
## -6.88296 -1.87393 -0.36326  0.90597  7.77869 
## 
## Coefficients:
##                    Estimate Std. Error t-value  Pr(>|t|)    
## GDP_per_capita      0.90584    0.33660  2.6912  0.007688 ** 
## Health_expenditure  0.97048    0.22880  4.2416 3.315e-05 ***
## Urban_population    5.00939    0.32498 15.4145 < 2.2e-16 ***
## CO2_damage          1.43201    0.24595  5.8223 2.128e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    8234.3
## Residual Sum of Squares: 1571.4
## R-Squared:      0.80916
## Adj. R-Squared: 0.78486
## F-statistic: 224.725 on 4 and 212 DF, p-value: < 2.22e-16

All explanatory variables in the time fixed effect model have a significant influence on the response variable (life expectancy) at the 5% significance level. The R-squared of 0.8092 indicates that the explanatory variables jointly explain about 80.9% of the variation in life expectancy that remains after the year effects are removed (adjusted R-squared: 0.7849).

fem.twoway <- plm(Life_expectancy ~ GDP_per_capita + Health_expenditure + Urban_population + CO2_damage, data = dfasaaa, model = "within", effect= "twoway", index = c("Country.Name","Year"))

summary(fem.twoway)

## Twoways effects Within Model
## 
## Call:
## plm(formula = Life_expectancy ~ GDP_per_capita + Health_expenditure + 
##     Urban_population + CO2_damage, data = dfasaaa, effect = "twoway", 
##     model = "within", index = c("Country.Name", "Year"))
## 
## Balanced Panel: n = 10, T = 24, N = 240
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -5.528601 -0.556909 -0.052402  0.683020  2.858952 
## 
## Coefficients:
##                     Estimate Std. Error t-value  Pr(>|t|)    
## GDP_per_capita     -0.080596   0.232723 -0.3463   0.72946    
## Health_expenditure  0.227517   0.129484  1.7571   0.08041 .  
## Urban_population    0.517603   0.930890  0.5560   0.57880    
## CO2_damage          1.031358   0.118834  8.6790 1.303e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    361.61
## Residual Sum of Squares: 259.52
## R-Squared:      0.28233
## Adj. R-Squared: 0.15506
## F-statistic: 19.9648 on 4 and 203 DF, p-value: 7.0558e-14

In the two-way fixed effects model, only CO₂ damage has a statistically significant effect on life expectancy at the 5% significance level; health expenditure is significant only at the 10% level. The R-squared of 0.2823 indicates that the explanatory variables jointly explain about 28.2% of the variation in life expectancy that remains after the country and year effects are removed (adjusted R-squared: 0.1551) for ASEAN countries from 2000 to 2023.
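In a balanced panel, the two-way within estimator can be reproduced by removing both the unit means and the period means and adding back the overall mean. The sketch below checks this against a regression with unit and period dummies on simulated data; `panel` and the helper `two_way_demean()` are hypothetical names introduced for the illustration.

```r
set.seed(7)
panel <- expand.grid(id = 1:4, t = 1:5)   # hypothetical balanced 4 x 5 panel
panel$x <- rnorm(nrow(panel))
panel$y <- 2 * panel$x + panel$id + 0.5 * panel$t + rnorm(nrow(panel), sd = 0.2)

# Two-way demeaning (valid in this form for balanced panels):
# v_it - mean_i(v) - mean_t(v) + overall mean
two_way_demean <- function(v, id, t) v - ave(v, id) - ave(v, t) + mean(v)

x_tw <- two_way_demean(panel$x, panel$id, panel$t)
y_tw <- two_way_demean(panel$y, panel$id, panel$t)
b_demeaned <- coef(lm(y_tw ~ x_tw - 1))[["x_tw"]]

# Same slope from a regression with unit and period dummies
b_dummies <- coef(lm(y ~ x + factor(id) + factor(t), data = panel))[["x"]]

all.equal(b_demeaned, b_dummies)  # TRUE
```

Because the year effects absorb any trend shared by all countries, predictors that mostly move with that common trend tend to lose explanatory power, which is one possible reading of the weaker results in the two-way model above.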

Random Effect Model

The random effect approach estimates panel data under the assumption that individual-specific and time-specific differences are random and can be absorbed into the error term, so the composite errors may be correlated across time and individuals. Because these differences are treated as components of the error term, the model is also referred to as an error component model.

# rem individual
rem_ind <- plm(Life_expectancy ~ GDP_per_capita + Health_expenditure + Urban_population + CO2_damage, data = dfasaaa, index = c("Country.Name", "Year"), effect = "individual", model = "random", random.method = "amemiya")

summary(rem_ind)

## Oneway (individual) effect Random Effect Model 
##    (Amemiya's transformation)
## 
## Call:
## plm(formula = Life_expectancy ~ GDP_per_capita + Health_expenditure + 
##     Urban_population + CO2_damage, data = dfasaaa, effect = "individual", 
##     model = "random", random.method = "amemiya", index = c("Country.Name", 
##         "Year"))
## 
## Balanced Panel: n = 10, T = 24, N = 240
## 
## Effects:
##                  var std.dev share
## idiosyncratic  2.196   1.482 0.114
## individual    17.012   4.125 0.886
## theta: 0.9269
## 
## Residuals:
##     Min.  1st Qu.   Median  3rd Qu.     Max. 
## -6.07917 -0.73942  0.29385  0.89573  2.83389 
## 
## Coefficients:
##                    Estimate Std. Error z-value  Pr(>|z|)    
## (Intercept)        70.16647    1.31747 53.2583 < 2.2e-16 ***
## GDP_per_capita      1.07228    0.24958  4.2964 1.736e-05 ***
## Health_expenditure  0.41357    0.14858  2.7835  0.005378 ** 
## Urban_population    7.27416    0.61964 11.7393 < 2.2e-16 ***
## CO2_damage          0.74289    0.12827  5.7917 6.967e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    1211
## Residual Sum of Squares: 523.67
## R-Squared:      0.56757
## Adj. R-Squared: 0.56021
## Chisq: 308.441 on 4 DF, p-value: < 2.22e-16

All explanatory variables in the individual random effect model significantly affect the response variable (life expectancy) at the 5% significance level. The R-squared of 0.5676 indicates that the explanatory variables jointly explain about 56.8% of the variation in the transformed (quasi-demeaned) life expectancy data across ASEAN countries (adjusted R-squared: 0.5602).
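The `theta` value in the output above governs how much of each country mean is subtracted before estimation. A minimal sketch, assuming the standard one-way formula for a balanced panel, theta = 1 - sqrt(sigma2_idios / (T * sigma2_indiv + sigma2_idios)), recomputes it from the variance components printed in the "Effects" block. The variable names are illustrative.

```r
# Variance components copied from the "Effects" block of summary(rem_ind)
sigma2_idios <- 2.196    # idiosyncratic variance
sigma2_indiv <- 17.012   # individual (country) variance
T_periods <- 24          # years per country

# Quasi-demeaning weight for a balanced one-way random effect model
theta <- 1 - sqrt(sigma2_idios / (T_periods * sigma2_indiv + sigma2_idios))

# Share of the total error variance attributed to the country effect
share_indiv <- sigma2_indiv / (sigma2_indiv + sigma2_idios)

round(theta, 3)        # 0.927
round(share_indiv, 3)  # 0.886
```

A theta this close to 1 means the random effect estimator removes almost the full country mean, which is consistent with its coefficients being close to those of the individual fixed effect model.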

# rem time
rem_time <- plm(Life_expectancy ~ GDP_per_capita + Health_expenditure + Urban_population + CO2_damage, data = dfasaaa, index = c("Country.Name", "Year"), effect = "time", model = "random", random.method = "amemiya")

summary(rem_time)

## Oneway (time) effect Random Effect Model 
##    (Amemiya's transformation)
## 
## Call:
## plm(formula = Life_expectancy ~ GDP_per_capita + Health_expenditure + 
##     Urban_population + CO2_damage, data = dfasaaa, effect = "time", 
##     model = "random", random.method = "amemiya", index = c("Country.Name", 
##         "Year"))
## 
## Balanced Panel: n = 10, T = 24, N = 240
## 
## Effects:
##                  var std.dev share
## idiosyncratic 7.2750  2.6972 0.938
## time          0.4801  0.6929 0.062
## theta: 0.2238
## 
## Residuals:
##     Min.  1st Qu.   Median  3rd Qu.     Max. 
## -7.16208 -1.53896 -0.30969  0.73604  6.64378 
## 
## Coefficients:
##                    Estimate Std. Error  z-value  Pr(>|z|)    
## (Intercept)        70.16647    0.22617 310.2445 < 2.2e-16 ***
## GDP_per_capita      0.90602    0.32765   2.7652  0.005688 ** 
## Health_expenditure  0.89542    0.20338   4.4028 1.069e-05 ***
## Urban_population    4.99076    0.32390  15.4085 < 2.2e-16 ***
## CO2_damage          1.22892    0.20892   5.8823 4.045e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    8720.7
## Residual Sum of Squares: 1738
## R-Squared:      0.8007
## Adj. R-Squared: 0.79731
## Chisq: 944.146 on 4 DF, p-value: < 2.22e-16

All explanatory variables in the time random effect model significantly influence the response variable, life expectancy, at the 5% significance level. The R-squared of 0.8007 indicates that the explanatory variables jointly explain about 80.1% of the variation in the transformed life expectancy data across ASEAN countries from 2000 to 2023 (adjusted R-squared: 0.7973).

# rem two ways
rem_twoway <- plm(Life_expectancy ~ GDP_per_capita + Health_expenditure + Urban_population + CO2_damage, data = dfasaaa, index = c("Country.Name", "Year"), effect = "twoway", model = "random", random.method = "amemiya")

summary(rem_twoway)

## Twoways effects Random Effect Model 
##    (Amemiya's transformation)
## 
## Call:
## plm(formula = Life_expectancy ~ GDP_per_capita + Health_expenditure + 
##     Urban_population + CO2_damage, data = dfasaaa, effect = "twoway", 
##     model = "random", random.method = "amemiya", index = c("Country.Name", 
##         "Year"))
## 
## Balanced Panel: n = 10, T = 24, N = 240
## 
## Effects:
##                  var std.dev share
## idiosyncratic  1.254   1.120 0.037
## individual    30.290   5.504 0.887
## time           2.587   1.609 0.076
## theta: 0.9585 (id) 0.785 (time) 0.7843 (total)
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -5.832106 -0.577973  0.054211  0.799038  2.342464 
## 
## Coefficients:
##                    Estimate Std. Error z-value  Pr(>|z|)    
## (Intercept)        70.16647    1.76544 39.7446 < 2.2e-16 ***
## GDP_per_capita      0.23912    0.21540  1.1101 0.2669416    
## Health_expenditure  0.30849    0.12498  2.4684 0.0135730 *  
## Urban_population    2.54849    0.75695  3.3668 0.0007605 ***
## CO2_damage          1.04892    0.11522  9.1040 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    412.47
## Residual Sum of Squares: 292.25
## R-Squared:      0.29146
## Adj. R-Squared: 0.2794
## Chisq: 96.6688 on 4 DF, p-value: < 2.22e-16

All explanatory variables in the two-way random effect model, except GDP per capita, significantly influence the response variable (life expectancy) at the 5% significance level. The R-squared of 0.2915 indicates that the explanatory variables jointly explain about 29.1% of the variation in the transformed life expectancy data across ASEAN countries (adjusted R-squared: 0.2794).

Determining The Best Model

Chow Test

Hypothesis

H0 : Common Effect Model (CEM) is better to use

H1 : Fixed Effect Model (FEM) is better to use

# cem vs fem two-way
pooltest(cem, fem.twoway)

## 
##  F statistic
## 
## data:  Life_expectancy ~ GDP_per_capita + Health_expenditure + Urban_population +  ...
## F = 38.677, df1 = 32, df2 = 203, p-value < 2.2e-16
## alternative hypothesis: unstability
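The F statistic reported by `pooltest()` can be reproduced by hand from the residual sums of squares of the two models. The sketch below assumes the standard F-test for nested models and copies the figures from the summaries above; the variable names are illustrative.

```r
rss_pooled <- 1841.8   # residual sum of squares from summary(cem)
rss_fe <- 259.52       # residual sum of squares from summary(fem.twoway)
df1 <- 32              # restrictions: (10 - 1) country + (24 - 1) year effects
df2 <- 203             # residual degrees of freedom of the two-way model

# F = [(RSS_restricted - RSS_unrestricted) / df1] / [RSS_unrestricted / df2]
f_stat <- ((rss_pooled - rss_fe) / df1) / (rss_fe / df2)
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)

round(f_stat, 2)  # 38.68
p_value < 0.05    # TRUE
```

Since the p-value is far below 0.05, H0 would be rejected at the 5% level, which favours the two-way fixed effect model over the common effect model.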