Intro

The dataset I use in this regression analysis covers life expectancy in 193 countries over recent years. Life expectancy is a tool for evaluating how well a government is improving the welfare of its population in general, and its level of health in particular. My first goal is to build a model that shows which variables have the greatest influence on life expectancy, so that a government that wants to raise life expectancy knows which variables to focus on. My second goal is to build a model that predicts life expectancy from the available predictors.
Preparation
Load data
library(dplyr)  # %>%, select(), mutate(), filter()
library(caret)  # RMSE(), MAE()
life <- read.csv("lifee.csv")
str(life)
## 'data.frame': 2938 obs. of 22 variables:
## $ Country : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ Year : int 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 ...
## $ Status : chr "Developing" "Developing" "Developing" "Developing" ...
## $ Life.expectancy : num 65 59.9 59.9 59.5 59.2 58.8 58.6 58.1 57.5 57.3 ...
## $ Adult.Mortality : int 263 271 268 272 275 279 281 287 295 295 ...
## $ infant.deaths : int 62 64 66 69 71 74 77 80 82 84 ...
## $ Alcohol : num 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.03 0.02 0.03 ...
## $ percentage.expenditure : num 71.3 73.5 73.2 78.2 7.1 ...
## $ Hepatitis.B : int 65 62 64 67 68 66 63 64 63 64 ...
## $ Measles : int 1154 492 430 2787 3013 1989 2861 1599 1141 1990 ...
## $ BMI : num 19.1 18.6 18.1 17.6 17.2 16.7 16.2 15.7 15.2 14.7 ...
## $ under.five.deaths : int 83 86 89 93 97 102 106 110 113 116 ...
## $ Polio : int 6 58 62 67 68 66 63 64 63 58 ...
## $ Total.expenditure : num 8.16 8.18 8.13 8.52 7.87 9.2 9.42 8.33 6.73 7.43 ...
## $ Diphtheria : int 65 62 64 67 68 66 63 64 63 58 ...
## $ HIV.AIDS : num 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 ...
## $ GDP : num 584.3 612.7 631.7 670 63.5 ...
## $ Population : num 33736494 327582 31731688 3696958 2978599 ...
## $ thinness..1.19.years : num 17.2 17.5 17.7 17.9 18.2 18.4 18.6 18.8 19 19.2 ...
## $ thinness.5.9.years : num 17.3 17.5 17.7 18 18.2 18.4 18.7 18.9 19.1 19.3 ...
## $ Income.composition.of.resources: num 0.479 0.476 0.47 0.463 0.454 0.448 0.434 0.433 0.415 0.405 ...
## $ Schooling : num 10.1 10 9.9 9.8 9.5 9.2 8.9 8.7 8.4 8.1 ...
Check NA
## Country Year
## 0 0
## Status Life.expectancy
## 0 10
## Adult.Mortality infant.deaths
## 10 0
## Alcohol percentage.expenditure
## 194 0
## Hepatitis.B Measles
## 553 0
## BMI under.five.deaths
## 34 0
## Polio Total.expenditure
## 19 226
## Diphtheria HIV.AIDS
## 19 0
## GDP Population
## 448 652
## thinness..1.19.years thinness.5.9.years
## 34 34
## Income.composition.of.resources Schooling
## 167 163
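The call that produced the counts above is not shown. A minimal sketch of one common way to get per-column NA counts follows; the data frame `toy` and its values are made up for illustration and are not taken from the dataset.

```r
# Hypothetical toy data standing in for `life`
toy <- data.frame(
  Life.expectancy = c(65, NA, 59.9),
  Alcohol         = c(0.01, NA, NA),
  Schooling       = c(10.1, 10, 9.9)
)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(toy))
print(na_counts)
```

Applied to the real data, the same pattern would be `colSums(is.na(life))`.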
Check Outliers
boxplot(life %>% select(-c(Country, Year, Status)))
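A boxplot marks as outliers the points that lie more than 1.5 times the box length beyond the hinges. As a rough sketch, the same rule can be applied numerically with base R's `boxplot.stats()`, which is handy when there are too many columns to read off a plot. The vector `x` below is made up for illustration.

```r
# Hypothetical values with one obvious extreme point
x <- c(1, 2, 3, 4, 5, 100)

# $out holds the points beyond 1.5 * (upper hinge - lower hinge) from the hinges
stats <- boxplot.stats(x)
outliers <- stats$out
print(outliers)

# Count flagged points per numeric column of a data frame
count_outliers <- function(df) {
  sapply(df, function(col) length(boxplot.stats(col)$out))
}
print(count_outliers(data.frame(a = x, b = c(1, 2, 3, 4, 5, 6))))
```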

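Dealing with NA and Outliers
life_clean <- life %>%
  select(-c(Country, Year)) %>%           # drop identifier columns
  mutate(Status = as.factor(Status)) %>%  # treat Status as a categorical predictor
  filter(thinness..1.19.years <= 20) %>%  # trim extreme thinness values
  na.omit()                               # drop rows with any missing value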
Modeling
RNGkind(sample.kind = "Rounding")
## Warning in RNGkind(sample.kind = "Rounding"): non-uniform 'Rounding' sampler
## used
set.seed(1234)
index <- sample(nrow(life_clean), nrow(life_clean)*0.75)
train <- life_clean[index, ]
test <- life_clean[-index, ]
model1 <- lm(Life.expectancy~., data = train)
summary(model1)
##
## Call:
## lm(formula = Life.expectancy ~ ., data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.9988 -2.1057 0.0423 2.2513 11.9603
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.425e+01 9.883e-01 54.895 < 2e-16 ***
## StatusDeveloping -1.064e+00 3.914e-01 -2.718 0.00666 **
## Adult.Mortality -1.659e-02 1.069e-03 -15.523 < 2e-16 ***
## infant.deaths 9.456e-02 1.500e-02 6.306 4.01e-10 ***
## Alcohol -9.712e-02 3.888e-02 -2.498 0.01263 *
## percentage.expenditure 1.514e-04 2.371e-04 0.639 0.52318
## Hepatitis.B -6.521e-03 5.287e-03 -1.234 0.21761
## Measles -1.095e-05 1.787e-05 -0.613 0.54027
## BMI 3.616e-02 6.993e-03 5.172 2.71e-07 ***
## under.five.deaths -7.091e-02 1.009e-02 -7.025 3.57e-12 ***
## Polio 1.162e-02 5.864e-03 1.981 0.04781 *
## Total.expenditure 9.315e-02 4.893e-02 1.904 0.05718 .
## Diphtheria 1.507e-02 6.807e-03 2.214 0.02704 *
## HIV.AIDS -4.349e-01 1.939e-02 -22.427 < 2e-16 ***
## GDP 3.986e-05 3.725e-05 1.070 0.28471
## Population -1.819e-09 3.795e-09 -0.479 0.63180
## thinness..1.19.years 3.413e-02 6.239e-02 0.547 0.58447
## thinness.5.9.years -7.095e-02 6.183e-02 -1.147 0.25144
## Income.composition.of.resources 9.922e+00 9.818e-01 10.106 < 2e-16 ***
## Schooling 8.346e-01 6.953e-02 12.003 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.605 on 1205 degrees of freedom
## Multiple R-squared: 0.8399, Adjusted R-squared: 0.8373
## F-statistic: 332.6 on 19 and 1205 DF, p-value: < 2.2e-16
Based on the summary of the lm model, four predictors stand out as the most influential on life expectancy, judging by the largest absolute t values. Two have a positive effect: the income composition of resources (which I read as a proxy for per-capita income) and schooling. The other two have a negative effect: adult mortality and the HIV/AIDS rate in the country.
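Raw coefficients are hard to compare because each predictor is measured in its own units. One common alternative to ranking by t value is to refit the model on standardized variables, so that each coefficient is expressed in standard deviations. The sketch below uses R's built-in `mtcars` data as a stand-in, since the life expectancy file is not bundled here; the variable choice is purely illustrative.

```r
# Standardize the response and predictors (mean 0, sd 1)
vars <- c("mpg", "wt", "hp", "qsec")
scaled <- as.data.frame(scale(mtcars[, vars]))

# Coefficients are now in standard-deviation units and comparable in size
fit_std <- lm(mpg ~ wt + hp + qsec, data = scaled)
std_coef <- coef(fit_std)[-1]  # drop the intercept

# Rank predictors by absolute standardized effect
print(sort(abs(std_coef), decreasing = TRUE))
```

The same idea would apply to `train` after scaling its numeric columns; the factor `Status` would have to be left unscaled.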
Create a New Model to Resolve This Issue
Try Stepwise Selection in Both Directions
modelboth <- step(model1, direction = "both")
## Start: AIC=3161.75
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + Measles +
## BMI + under.five.deaths + Polio + Total.expenditure + Diphtheria +
## HIV.AIDS + GDP + Population + thinness..1.19.years + thinness.5.9.years +
## Income.composition.of.resources + Schooling
##
## Df Sum of Sq RSS AIC
## - Population 1 3.0 15666 3160.0
## - thinness..1.19.years 1 3.9 15667 3160.1
## - Measles 1 4.9 15668 3160.1
## - percentage.expenditure 1 5.3 15668 3160.2
## - GDP 1 14.9 15678 3160.9
## - thinness.5.9.years 1 17.1 15680 3161.1
## - Hepatitis.B 1 19.8 15683 3161.3
## <none> 15663 3161.8
## - Total.expenditure 1 47.1 15710 3163.4
## - Polio 1 51.0 15714 3163.7
## - Diphtheria 1 63.7 15727 3164.7
## - Alcohol 1 81.1 15744 3166.1
## - Status 1 96.1 15759 3167.2
## - BMI 1 347.7 16011 3186.6
## - infant.deaths 1 516.9 16180 3199.5
## - under.five.deaths 1 641.5 16305 3208.9
## - Income.composition.of.resources 1 1327.5 16991 3259.4
## - Schooling 1 1872.8 17536 3298.1
## - Adult.Mortality 1 3132.2 18795 3383.1
## - HIV.AIDS 1 6537.6 22201 3587.0
##
## Step: AIC=3159.98
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + Measles +
## BMI + under.five.deaths + Polio + Total.expenditure + Diphtheria +
## HIV.AIDS + GDP + thinness..1.19.years + thinness.5.9.years +
## Income.composition.of.resources + Schooling
##
## Df Sum of Sq RSS AIC
## - thinness..1.19.years 1 3.6 15670 3158.3
## - Measles 1 4.2 15670 3158.3
## - percentage.expenditure 1 5.3 15671 3158.4
## - GDP 1 14.8 15681 3159.1
## - thinness.5.9.years 1 16.4 15682 3159.3
## - Hepatitis.B 1 19.6 15686 3159.5
## <none> 15666 3160.0
## - Total.expenditure 1 48.2 15714 3161.7
## + Population 1 3.0 15663 3161.8
## - Polio 1 51.2 15717 3162.0
## - Diphtheria 1 63.0 15729 3162.9
## - Alcohol 1 82.5 15749 3164.4
## - Status 1 98.7 15765 3165.7
## - BMI 1 348.0 16014 3184.9
## - infant.deaths 1 530.9 16197 3198.8
## - under.five.deaths 1 651.0 16317 3207.9
## - Income.composition.of.resources 1 1325.2 16991 3257.5
## - Schooling 1 1869.8 17536 3296.1
## - Adult.Mortality 1 3145.0 18811 3382.1
## - HIV.AIDS 1 6536.0 22202 3585.1
##
## Step: AIC=3158.26
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + Measles +
## BMI + under.five.deaths + Polio + Total.expenditure + Diphtheria +
## HIV.AIDS + GDP + thinness.5.9.years + Income.composition.of.resources +
## Schooling
##
## Df Sum of Sq RSS AIC
## - Measles 1 4.1 15674 3156.6
## - percentage.expenditure 1 5.3 15675 3156.7
## - GDP 1 14.8 15684 3157.4
## - Hepatitis.B 1 19.2 15689 3157.8
## - thinness.5.9.years 1 23.3 15693 3158.1
## <none> 15670 3158.3
## + thinness..1.19.years 1 3.6 15666 3160.0
## - Total.expenditure 1 48.3 15718 3160.0
## + Population 1 2.7 15667 3160.1
## - Polio 1 54.0 15724 3160.5
## - Diphtheria 1 62.2 15732 3161.1
## - Alcohol 1 85.7 15755 3162.9
## - Status 1 99.6 15769 3164.0
## - BMI 1 345.5 16015 3183.0
## - infant.deaths 1 529.2 16199 3196.9
## - under.five.deaths 1 648.5 16318 3205.9
## - Income.composition.of.resources 1 1322.1 16992 3255.5
## - Schooling 1 1866.3 17536 3294.1
## - Adult.Mortality 1 3150.4 18820 3380.7
## - HIV.AIDS 1 6534.5 22204 3583.2
##
## Step: AIC=3156.59
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + BMI + under.five.deaths +
## Polio + Total.expenditure + Diphtheria + HIV.AIDS + GDP +
## thinness.5.9.years + Income.composition.of.resources + Schooling
##
## Df Sum of Sq RSS AIC
## - percentage.expenditure 1 5.3 15679 3155.0
## - GDP 1 15.0 15689 3155.8
## - Hepatitis.B 1 18.2 15692 3156.0
## - thinness.5.9.years 1 22.2 15696 3156.3
## <none> 15674 3156.6
## + Measles 1 4.1 15670 3158.3
## + thinness..1.19.years 1 3.5 15670 3158.3
## - Total.expenditure 1 48.4 15722 3158.4
## + Population 1 2.0 15672 3158.4
## - Polio 1 53.4 15727 3158.7
## - Diphtheria 1 61.4 15735 3159.4
## - Alcohol 1 86.7 15760 3161.3
## - Status 1 99.5 15773 3162.3
## - BMI 1 347.0 16021 3181.4
## - infant.deaths 1 569.8 16244 3198.3
## - under.five.deaths 1 673.9 16348 3206.2
## - Income.composition.of.resources 1 1323.7 16998 3253.9
## - Schooling 1 1866.8 17541 3292.4
## - Adult.Mortality 1 3146.7 18821 3378.7
## - HIV.AIDS 1 6545.9 22220 3582.1
##
## Step: AIC=3155
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + Hepatitis.B + BMI + under.five.deaths + Polio +
## Total.expenditure + Diphtheria + HIV.AIDS + GDP + thinness.5.9.years +
## Income.composition.of.resources + Schooling
##
## Df Sum of Sq RSS AIC
## - Hepatitis.B 1 19.7 15699 3154.5
## - thinness.5.9.years 1 21.9 15701 3154.7
## <none> 15679 3155.0
## + percentage.expenditure 1 5.3 15674 3156.6
## + Measles 1 4.1 15675 3156.7
## + thinness..1.19.years 1 3.5 15676 3156.7
## + Population 1 2.1 15677 3156.8
## - Total.expenditure 1 50.6 15730 3156.9
## - Polio 1 52.0 15731 3157.1
## - Diphtheria 1 62.6 15742 3157.9
## - Alcohol 1 88.6 15768 3159.9
## - Status 1 102.3 15781 3161.0
## - BMI 1 345.7 16025 3179.7
## - GDP 1 452.0 16131 3187.8
## - infant.deaths 1 569.0 16248 3196.7
## - under.five.deaths 1 673.3 16352 3204.5
## - Income.composition.of.resources 1 1318.6 16998 3251.9
## - Schooling 1 1864.2 17543 3290.6
## - Adult.Mortality 1 3151.3 18831 3377.4
## - HIV.AIDS 1 6543.3 22222 3580.2
##
## Step: AIC=3154.54
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + BMI + under.five.deaths + Polio + Total.expenditure +
## Diphtheria + HIV.AIDS + GDP + thinness.5.9.years + Income.composition.of.resources +
## Schooling
##
## Df Sum of Sq RSS AIC
## - thinness.5.9.years 1 25.0 15724 3154.5
## <none> 15699 3154.5
## + Hepatitis.B 1 19.7 15679 3155.0
## - Diphtheria 1 43.8 15743 3156.0
## - Polio 1 44.0 15743 3156.0
## + percentage.expenditure 1 6.9 15692 3156.0
## + thinness..1.19.years 1 3.2 15696 3156.3
## + Measles 1 3.0 15696 3156.3
## - Total.expenditure 1 48.9 15748 3156.4
## + Population 1 2.0 15697 3156.4
## - Alcohol 1 82.1 15781 3158.9
## - Status 1 94.8 15794 3159.9
## - BMI 1 338.9 16038 3178.7
## - GDP 1 468.3 16167 3188.6
## - infant.deaths 1 567.6 16266 3196.0
## - under.five.deaths 1 671.1 16370 3203.8
## - Income.composition.of.resources 1 1327.4 17026 3252.0
## - Schooling 1 1852.1 17551 3289.2
## - Adult.Mortality 1 3167.9 18867 3377.7
## - HIV.AIDS 1 6523.7 22223 3578.3
##
## Step: AIC=3154.49
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + BMI + under.five.deaths + Polio + Total.expenditure +
## Diphtheria + HIV.AIDS + GDP + Income.composition.of.resources +
## Schooling
##
## Df Sum of Sq RSS AIC
## <none> 15724 3154.5
## + thinness.5.9.years 1 25.0 15699 3154.5
## + Hepatitis.B 1 22.8 15701 3154.7
## + thinness..1.19.years 1 11.9 15712 3155.6
## - Diphtheria 1 43.1 15767 3155.8
## - Polio 1 44.0 15768 3155.9
## + percentage.expenditure 1 6.7 15717 3156.0
## + Measles 1 2.0 15722 3156.3
## + Population 1 1.7 15722 3156.4
## - Total.expenditure 1 54.3 15778 3156.7
## - Alcohol 1 71.7 15796 3158.1
## - Status 1 95.2 15819 3159.9
## - BMI 1 468.3 16192 3188.4
## - GDP 1 474.1 16198 3188.9
## - infant.deaths 1 566.8 16291 3195.9
## - under.five.deaths 1 672.0 16396 3203.8
## - Income.composition.of.resources 1 1362.4 17086 3254.3
## - Schooling 1 1862.5 17586 3289.6
## - Adult.Mortality 1 3203.3 18927 3379.6
## - HIV.AIDS 1 6582.1 22306 3580.8
summary(modelboth)
##
## Call:
## lm(formula = Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + BMI + under.five.deaths + Polio + Total.expenditure +
## Diphtheria + HIV.AIDS + GDP + Income.composition.of.resources +
## Schooling, data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.993 -2.065 0.020 2.270 11.530
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.377e+01 9.162e-01 58.693 < 2e-16 ***
## StatusDeveloping -1.050e+00 3.879e-01 -2.708 0.00687 **
## Adult.Mortality -1.671e-02 1.064e-03 -15.707 < 2e-16 ***
## infant.deaths 8.940e-02 1.353e-02 6.607 5.87e-11 ***
## Alcohol -8.970e-02 3.817e-02 -2.350 0.01894 *
## BMI 3.909e-02 6.508e-03 6.006 2.52e-09 ***
## under.five.deaths -6.806e-02 9.461e-03 -7.194 1.10e-12 ***
## Polio 1.062e-02 5.769e-03 1.841 0.06582 .
## Total.expenditure 9.945e-02 4.863e-02 2.045 0.04106 *
## Diphtheria 1.105e-02 6.064e-03 1.822 0.06865 .
## HIV.AIDS -4.349e-01 1.931e-02 -22.515 < 2e-16 ***
## GDP 6.412e-05 1.061e-05 6.042 2.02e-09 ***
## Income.composition.of.resources 9.999e+00 9.761e-01 10.244 < 2e-16 ***
## Schooling 8.296e-01 6.927e-02 11.977 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.603 on 1211 degrees of freedom
## Multiple R-squared: 0.8392, Adjusted R-squared: 0.8375
## F-statistic: 486.3 on 13 and 1211 DF, p-value: < 2.2e-16
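The AIC values that `step()` prints for a linear model can be reproduced from the residual sum of squares. The sketch below assumes the formula n * log(RSS / n) + 2 * edf, which is what `extractAIC()` applies to `lm` fits; `step_aic` is a helper name introduced here for illustration. The inputs are the rounded RSS values from the output above, so the results agree with the printed AIC only to about one decimal place.

```r
# AIC as reported by step() for lm fits (up to an additive constant)
step_aic <- function(rss, n, edf) {
  n * log(rss / n) + 2 * edf
}

n <- 1225  # training rows: 1205 residual df + 20 coefficients in the full model

full_aic  <- step_aic(15663, n, edf = 20)  # full model, 19 predictors + intercept
final_aic <- step_aic(15724, n, edf = 14)  # selected model, 13 predictors + intercept

print(round(c(full = full_aic, final = final_aic), 2))
```

The selected model has a slightly higher RSS but six fewer coefficients, and the penalty term 2 * edf is what tips the comparison in its favor.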
Try Dropping Multicollinear Predictors
modelnocoliniear <- lm(Life.expectancy ~ Status + Adult.Mortality +
                         Alcohol + Hepatitis.B + Measles +
                         BMI + Polio + Total.expenditure + Diphtheria +
                         HIV.AIDS + Population + thinness..1.19.years + thinness.5.9.years +
                         Income.composition.of.resources + Schooling, data = train)
summary(modelnocoliniear)
##
## Call:
## lm(formula = Life.expectancy ~ Status + Adult.Mortality + Alcohol +
## Hepatitis.B + Measles + BMI + Polio + Total.expenditure +
## Diphtheria + HIV.AIDS + Population + thinness..1.19.years +
## thinness.5.9.years + Income.composition.of.resources + Schooling,
## data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.2957 -2.2698 0.0644 2.3093 11.1739
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.299e+01 9.994e-01 53.026 < 2e-16 ***
## StatusDeveloping -1.613e+00 3.955e-01 -4.079 4.83e-05 ***
## Adult.Mortality -1.764e-02 1.105e-03 -15.970 < 2e-16 ***
## Alcohol -1.437e-01 3.922e-02 -3.665 0.000258 ***
## Hepatitis.B -8.212e-03 5.460e-03 -1.504 0.132888
## Measles -1.249e-05 1.509e-05 -0.828 0.407808
## BMI 3.641e-02 7.235e-03 5.032 5.58e-07 ***
## Polio 1.559e-02 6.046e-03 2.579 0.010028 *
## Total.expenditure 1.142e-01 5.049e-02 2.261 0.023936 *
## Diphtheria 2.124e-02 7.030e-03 3.021 0.002568 **
## HIV.AIDS -4.331e-01 2.014e-02 -21.501 < 2e-16 ***
## Population -1.087e-09 3.652e-09 -0.298 0.766054
## thinness..1.19.years 8.002e-03 6.467e-02 0.124 0.901558
## thinness.5.9.years -5.821e-02 6.416e-02 -0.907 0.364416
## Income.composition.of.resources 1.077e+01 1.011e+00 10.654 < 2e-16 ***
## Schooling 9.192e-01 7.127e-02 12.897 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.746 on 1209 degrees of freedom
## Multiple R-squared: 0.8265, Adjusted R-squared: 0.8244
## F-statistic: 384 on 15 and 1209 DF, p-value: < 2.2e-16
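The model above leaves out infant.deaths, under.five.deaths, percentage.expenditure, and GDP, but the check that flagged them as collinear is not shown. One common diagnostic is the variance inflation factor (VIF). The sketch below computes it in base R on the built-in `mtcars` data as a stand-in; `vif_manual` is a helper name introduced here, not something from the original analysis.

```r
# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
# predictor j on all the other predictors
vif_manual <- function(predictors) {
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    fit <- lm(reformulate(others, response = v), data = predictors)
    1 / (1 - summary(fit)$r.squared)
  })
}

X <- mtcars[, c("disp", "hp", "wt", "qsec")]
vifs <- vif_manual(X)
print(round(vifs, 2))
```

A frequently quoted rule of thumb treats values above roughly 5 to 10 as a sign of problematic collinearity, though the cutoff is a convention rather than a hard rule.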
Predicting and Evaluating Error
pred1 <- predict(model1, newdata = test)
predmodelboth <- predict(modelboth, newdata = test)
predmodelnocoliniear <- predict(modelnocoliniear, newdata = test)
preddtree <- predict(modeldtree, newdata = test)
predrf <- predict(modelrf, newdata = test)
predrflong <- predict(modelrflong, newdata = test)
RMSE
RMSE(pred1, obs = test$Life.expectancy)
## [1] 3.58042
RMSE(predmodelboth, obs = test$Life.expectancy)
## [1] 3.601059
RMSE(predmodelnocoliniear, obs = test$Life.expectancy)
## [1] 3.76168
RMSE(preddtree, obs = test$Life.expectancy)
## [1] 3.007273
RMSE(predrf, obs = test$Life.expectancy)
## [1] 1.869979
RMSE(predrflong, obs = test$Life.expectancy)
## [1] 1.166801
MAE
MAE(pred1, obs = test$Life.expectancy)
## [1] 2.696044
MAE(predmodelboth, obs = test$Life.expectancy)
## [1] 2.698483
MAE(predmodelnocoliniear, obs = test$Life.expectancy)
## [1] 2.837579
MAE(preddtree, obs = test$Life.expectancy)
## [1] 2.058884
MAE(predrf, obs = test$Life.expectancy)
## [1] 1.185654
MAE(predrflong, obs = test$Life.expectancy)
## [1] 0.6624638
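The `RMSE()` and `MAE()` calls above take `pred` and `obs` arguments, which matches the functions of the same name in the caret package (assumed here to be their source). Both metrics are short enough to write by hand, as in this sketch with made-up numbers:

```r
# Root mean squared error: penalizes large misses more heavily
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

# Mean absolute error: average size of a miss, in the units of the response
mae <- function(pred, obs) mean(abs(pred - obs))

pred <- c(1, 2, 5)  # hypothetical predictions
obs  <- c(1, 2, 3)  # hypothetical observed values

print(c(rmse = rmse(pred, obs), mae = mae(pred, obs)))
```

Because RMSE squares each error before averaging, it is never smaller than MAE, and a large gap between the two points to a few large misses.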
Conclusion
## rf variable importance
##
## Overall
## Income.composition.of.resources 100.0000
## HIV.AIDS 55.2340
## Adult.Mortality 43.0821
## Schooling 11.3506
## BMI 4.4277
## thinness.5.9.years 4.3889
## thinness..1.19.years 3.0001
## Alcohol 2.7593
## Total.expenditure 1.9680
## under.five.deaths 1.7575
## percentage.expenditure 1.7139
## GDP 1.5435
## infant.deaths 1.2092
## Polio 0.7421
## Population 0.6633
## Measles 0.5909
## Diphtheria 0.5629
## Hepatitis.B 0.5396
## StatusDeveloping 0.0000
According to the random forest model, the three variables with the most influence on life expectancy are the income composition of resources (read here as per-capita income), the HIV/AIDS rate, and adult mortality. These three score far higher than all the other variables.
A surprising result is that a country's status as developed or developing ranks last in the random forest importance, with a scaled score of 0. That score marks it as the least important predictor in this model rather than proving it has no effect: the linear models above still give StatusDeveloping a significant negative coefficient. Even so, it suggests that a developed country can still have a low life expectancy once the other predictors are taken into account. Population size also has little influence, so life expectancy can be low in countries with large populations and in countries with small ones alike.
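The importance table runs from exactly 100 down to exactly 0, which is consistent with scores that were rescaled to a 0 to 100 range. The sketch below shows that kind of min-max rescaling, on the assumption that the table was produced with caret's default scaling; `rescale_importance` and the raw values are hypothetical.

```r
# Shift so the smallest score is 0, then stretch so the largest is 100
rescale_importance <- function(imp) {
  shifted <- imp - min(imp)
  shifted / max(shifted) * 100
}

# Hypothetical raw importances: every predictor contributes something
raw <- c(a = 12.5, b = 3.0, c = 0.4)
scaled <- rescale_importance(raw)
print(round(scaled, 2))
```

Under this scaling the lowest-ranked predictor always lands on 0, even when its raw importance is above zero.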
Of all the models I built, the one with the lowest error is the random forest that takes longer to train (modelrflong), with an RMSE of about 1.17 and an MAE of about 0.66. This makes it the best fit for predicting life expectancy in the coming years. Its weakness is that every predictor must be present to make a prediction: when any predictor is NA, the model can fail.
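One possible way around the NA limitation, which is not part of the analysis above, is to fill missing predictor values before calling `predict()`. The sketch below uses median imputation with medians taken from the training data only; all names and values are hypothetical.

```r
# Hypothetical training data and new rows with gaps
train_toy <- data.frame(x1 = c(1, 2, 3, 4), x2 = c(10, 20, 30, 40))
new_toy   <- data.frame(x1 = c(2.5, NA), x2 = c(NA, 15))

# Medians come from the training data so no information leaks from new rows
train_medians <- sapply(train_toy, median, na.rm = TRUE)

# Replace each NA with the training median of its column
impute_median <- function(df, medians) {
  for (col in names(medians)) {
    df[[col]][is.na(df[[col]])] <- medians[[col]]
  }
  df
}

new_filled <- impute_median(new_toy, train_medians)
print(new_filled)
```

Imputed values add their own uncertainty, so predictions for rows with many filled-in predictors deserve less trust than the test error suggests.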