life <- read.csv("Life Expectancy Data.csv")
master.life <- life # backup data
#life <- master.life # calling backup
Explanation of the life dataset:
+ Country - Country Observed.
+ Year - Year Observed.
+ Status - Developed or Developing status.
+ Life.expectancy - Life expectancy in years.
+ Adult.Mortality - Adult Mortality Rates on both sexes (probability of dying between 15-60 years/1000 population).
+ infant.deaths - Number of Infant Deaths per 1000 population.
+ Alcohol - Alcohol recorded per capita (15+) consumption (in litres of pure alcohol).
+ percentage.expenditure - Expenditure on health as a percentage of Gross Domestic Product per capita(%).
+ Hepatitis.B - Hepatitis B (HepB) immunization coverage among 1-year-olds (%).
+ Measles - Number of reported Measles cases per 1000 population.
+ BMI - Average Body Mass Index of entire population.
+ under_5.deaths - Number of under-five deaths per 1000 population.
+ Polio - Polio (Pol3) immunization coverage among 1-year-olds (%).
+ Total.expenditure - General government expenditure on health as a percentage of total government expenditure (%).
+ Diphtheria - Diphtheria tetanus toxoid and pertussis (DTP3) immunization coverage among 1-year-olds (%).
+ HIV_AIDS - Deaths per 1 000 live births HIV/AIDS (0-4 years).
+ GDP - Gross Domestic Product per capita (in USD).
+ Population - Population of the country.
+ thinness.10_19.years - Prevalence of thinness among children and adolescents aged 10 to 19 (%).
+ thinness.5_9.years - Prevalence of thinness among children aged 5 to 9 (%).
+ Income.composition.of.resources - Human Development Index in terms of income composition of resources (index ranging from 0 to 1).
+ Schooling - Number of years of schooling (years).
Based on the life dataset, we are going to predict Life.expectancy, using the other variables as independent variables (predictors).
## 'data.frame': 1649 obs. of 22 variables:
## $ Country : Factor w/ 193 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Year : int 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 ...
## $ Status : Factor w/ 2 levels "Developed","Developing": 2 2 2 2 2 2 2 2 2 2 ...
## $ Life.expectancy : num 65 59.9 59.9 59.5 59.2 58.8 58.6 58.1 57.5 57.3 ...
## $ Adult.Mortality : int 263 271 268 272 275 279 281 287 295 295 ...
## $ infant.deaths : int 62 64 66 69 71 74 77 80 82 84 ...
## $ Alcohol : num 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.03 0.02 0.03 ...
## $ percentage.expenditure : num 71.3 73.5 73.2 78.2 7.1 ...
## $ Hepatitis.B : int 65 62 64 67 68 66 63 64 63 64 ...
## $ Measles : int 1154 492 430 2787 3013 1989 2861 1599 1141 1990 ...
## $ BMI : num 19.1 18.6 18.1 17.6 17.2 16.7 16.2 15.7 15.2 14.7 ...
## $ under_5.deaths : int 83 86 89 93 97 102 106 110 113 116 ...
## $ Polio : int 6 58 62 67 68 66 63 64 63 58 ...
## $ Total.expenditure : num 8.16 8.18 8.13 8.52 7.87 9.2 9.42 8.33 6.73 7.43 ...
## $ Diphtheria : int 65 62 64 67 68 66 63 64 63 58 ...
## $ HIV_AIDS : num 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 ...
## $ GDP : num 584.3 612.7 631.7 670 63.5 ...
## $ Population : num 33736494 327582 31731688 3696958 2978599 ...
## $ thinness.10_19.years : num 17.2 17.5 17.7 17.9 18.2 18.4 18.6 18.8 19 19.2 ...
## $ thinness.5_9.years : num 17.3 17.5 17.7 18 18.2 18.4 18.7 18.9 19.1 19.3 ...
## $ Income.composition.of.resources: num 0.479 0.476 0.47 0.463 0.454 0.448 0.434 0.433 0.415 0.405 ...
## $ Schooling : num 10.1 10 9.9 9.8 9.5 9.2 8.9 8.7 8.4 8.1 ...
## Country Year Status Life.expectancy
## Afghanistan: 16 Min. :2000 Developed : 242 Min. :44.0
## Albania : 16 1st Qu.:2005 Developing:1407 1st Qu.:64.4
## Armenia : 15 Median :2008 Median :71.7
## Austria : 15 Mean :2008 Mean :69.3
## Belarus : 15 3rd Qu.:2011 3rd Qu.:75.0
## Belgium : 15 Max. :2015 Max. :89.0
## (Other) :1557
## Adult.Mortality infant.deaths Alcohol percentage.expenditure
## Min. : 1.0 Min. : 0.00 Min. : 0.010 Min. : 0.00
## 1st Qu.: 77.0 1st Qu.: 1.00 1st Qu.: 0.810 1st Qu.: 37.44
## Median :148.0 Median : 3.00 Median : 3.790 Median : 145.10
## Mean :168.2 Mean : 32.55 Mean : 4.533 Mean : 698.97
## 3rd Qu.:227.0 3rd Qu.: 22.00 3rd Qu.: 7.340 3rd Qu.: 509.39
## Max. :723.0 Max. :1600.00 Max. :17.870 Max. :18961.35
##
## Hepatitis.B Measles BMI under_5.deaths
## Min. : 2.00 Min. : 0 Min. : 2.00 Min. : 0.00
## 1st Qu.:74.00 1st Qu.: 0 1st Qu.:19.50 1st Qu.: 1.00
## Median :89.00 Median : 15 Median :43.70 Median : 4.00
## Mean :79.22 Mean : 2224 Mean :38.13 Mean : 44.22
## 3rd Qu.:96.00 3rd Qu.: 373 3rd Qu.:55.80 3rd Qu.: 29.00
## Max. :99.00 Max. :131441 Max. :77.10 Max. :2100.00
##
## Polio Total.expenditure Diphtheria HIV_AIDS
## Min. : 3.00 Min. : 0.740 Min. : 2.00 Min. : 0.100
## 1st Qu.:81.00 1st Qu.: 4.410 1st Qu.:82.00 1st Qu.: 0.100
## Median :93.00 Median : 5.840 Median :92.00 Median : 0.100
## Mean :83.56 Mean : 5.956 Mean :84.16 Mean : 1.984
## 3rd Qu.:97.00 3rd Qu.: 7.470 3rd Qu.:97.00 3rd Qu.: 0.700
## Max. :99.00 Max. :14.390 Max. :99.00 Max. :50.600
##
## GDP Population thinness.10_19.years
## Min. : 1.68 Min. : 34 Min. : 0.100
## 1st Qu.: 462.15 1st Qu.: 191897 1st Qu.: 1.600
## Median : 1592.57 Median : 1419631 Median : 3.000
## Mean : 5566.03 Mean : 14653626 Mean : 4.851
## 3rd Qu.: 4718.51 3rd Qu.: 7658972 3rd Qu.: 7.100
## Max. :119172.74 Max. :1293859294 Max. :27.200
##
## thinness.5_9.years Income.composition.of.resources Schooling
## Min. : 0.100 Min. :0.0000 Min. : 4.20
## 1st Qu.: 1.700 1st Qu.:0.5090 1st Qu.:10.30
## Median : 3.200 Median :0.6730 Median :12.30
## Mean : 4.908 Mean :0.6316 Mean :12.12
## 3rd Qu.: 7.100 3rd Qu.:0.7510 3rd Qu.:14.00
## Max. :28.200 Max. :0.9360 Max. :20.70
##
## [1] 44 89
In total there are 22 variables: 20 of them are numerical and 2 of them are categorical.
We will need to deselect or mutate some variables for the following reasons:
- Deselect Country -> Too many levels (193), and it doesn't give additional information for predicting Life.expectancy.
- Deselect Year -> Time series index, and it doesn't give additional information for predicting Life.expectancy.
- Mutate Hepatitis.B -> The gap between the minimum (2) and the 1st quartile (74) is too wide, so the variable needs to be transformed.
- Mutate Polio -> The gap between the minimum (3) and the 1st quartile (81) is too wide, so the variable needs to be transformed.
- Mutate Diphtheria -> The gap between the minimum (2) and the 1st quartile (82) is too wide, so the variable needs to be transformed.
As stated in the Global Vaccine Action Plan 2011–2020 (GVAP) (1), endorsed by the World Health Assembly in 2012, all countries need to reach ≥90% national coverage for all vaccines in the country's routine immunization schedule by 2020. Based on that target, we are going to convert Hepatitis.B, Polio, and Diphtheria into categorical variables with two values: "<90% Covered" and ">=90% Covered". By doing this, we hope to get a clearer view of the impact of immunization on Life.expectancy.
library(dplyr)  # provides %>%, select() and mutate()

life_selected <- life %>%
  select(-Country, -Year) %>%
  # recode coverage at the 90% GVAP target, then store as factors
  mutate(Hepatitis.B = ifelse(Hepatitis.B < 90, "<90% Covered", ">=90% Covered"),
         Polio = ifelse(Polio < 90, "<90% Covered", ">=90% Covered"),
         Diphtheria = ifelse(Diphtheria < 90, "<90% Covered", ">=90% Covered"),
         Hepatitis.B = as.factor(Hepatitis.B),
         Polio = as.factor(Polio),
         Diphtheria = as.factor(Diphtheria))
str(life_selected)
## 'data.frame': 1649 obs. of 20 variables:
## $ Status : Factor w/ 2 levels "Developed","Developing": 2 2 2 2 2 2 2 2 2 2 ...
## $ Life.expectancy : num 65 59.9 59.9 59.5 59.2 58.8 58.6 58.1 57.5 57.3 ...
## $ Adult.Mortality : int 263 271 268 272 275 279 281 287 295 295 ...
## $ infant.deaths : int 62 64 66 69 71 74 77 80 82 84 ...
## $ Alcohol : num 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.03 0.02 0.03 ...
## $ percentage.expenditure : num 71.3 73.5 73.2 78.2 7.1 ...
## $ Hepatitis.B : Factor w/ 2 levels "<90% Covered",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Measles : int 1154 492 430 2787 3013 1989 2861 1599 1141 1990 ...
## $ BMI : num 19.1 18.6 18.1 17.6 17.2 16.7 16.2 15.7 15.2 14.7 ...
## $ under_5.deaths : int 83 86 89 93 97 102 106 110 113 116 ...
## $ Polio : Factor w/ 2 levels "<90% Covered",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Total.expenditure : num 8.16 8.18 8.13 8.52 7.87 9.2 9.42 8.33 6.73 7.43 ...
## $ Diphtheria : Factor w/ 2 levels "<90% Covered",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ HIV_AIDS : num 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 ...
## $ GDP : num 584.3 612.7 631.7 670 63.5 ...
## $ Population : num 33736494 327582 31731688 3696958 2978599 ...
## $ thinness.10_19.years : num 17.2 17.5 17.7 17.9 18.2 18.4 18.6 18.8 19 19.2 ...
## $ thinness.5_9.years : num 17.3 17.5 17.7 18 18.2 18.4 18.7 18.9 19.1 19.3 ...
## $ Income.composition.of.resources: num 0.479 0.476 0.47 0.463 0.454 0.448 0.434 0.433 0.415 0.405 ...
## $ Schooling : num 10.1 10 9.9 9.8 9.5 9.2 8.9 8.7 8.4 8.1 ...
To check the correlation between the numerical independent variables and the dependent variable, we will use the ggcorr function.
data_num <- life_selected %>%
  select_if(is.numeric)
ggcorr(data_num,
       label = TRUE,
       label_size = 2,
       label_round = 2,
       hjust = 1,
       size = 3,
       color = "royalblue",
       layout.exp = 5,
       low = "green3",
       mid = "gray95",
       high = "darkorange",
       name = "Correlation")
Life.expectancy, the dependent variable, has a fairly strong positive correlation with Schooling and Income.composition.of.resources; we will examine this further in the model analysis. On the other hand, it has a negative correlation with Adult.Mortality. This is a plausible finding: if the adult mortality rate is high, life expectancy will be low.
Life.expectancy has only a very weak correlation with Population and Measles. We will test this further in the next analysis.
The correlation matrix also shows a very strong correlation between infant.deaths and under_5.deaths, which indicates multicollinearity between them. Therefore, we are going to deselect under_5.deaths, on the consideration that the other variables seem more closely related to conditions during infancy.
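The multicollinearity check described above can also be done numerically rather than by reading the plot. The following is a minimal sketch in base R on synthetic stand-in data (the values, the `toy` data frame and the 0.9 cutoff are assumptions for illustration, not taken from the life dataset); it lists every pair of numeric columns whose absolute Pearson correlation exceeds the cutoff.

```r
# Synthetic stand-in data (hypothetical values, not the life expectancy dataset).
set.seed(42)
n <- 200
infant <- rpois(n, lambda = 30)
toy <- data.frame(
  infant.deaths  = infant,
  under_5.deaths = infant * 1.3 + rnorm(n, sd = 1),  # built to track infant.deaths closely
  Schooling      = rnorm(n, mean = 12, sd = 3)       # unrelated to the other two
)

# Pairwise Pearson correlations between the numeric columns.
cor_mat <- cor(toy)

# Keep each pair once (upper triangle) and flag those above the cutoff.
cutoff <- 0.9  # assumed threshold for "very strong" correlation
idx <- which(abs(cor_mat) > cutoff & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)
print(high_pairs)
```

On the real data, the same idea would be applied to `data_num`; one variable from each flagged pair is then a candidate for removal.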
Check the data distribution of Life.expectancy among all of the Categorical Variables
life_selected %>%
  group_by(Status) %>%
  summarise(count = n()) %>%
  mutate(percentage = paste0(round(count/sum(count)*100, 2), "%"))
## # A tibble: 2 x 3
## Status count percentage
## <fct> <int> <chr>
## 1 Developed 242 14.68%
## 2 Developing 1407 85.32%
plot1 <- ggplot(life_selected, aes(x = Status, y = Life.expectancy, fill = Status)) +
  geom_boxplot() +
  scale_fill_manual(values = c("green3", "darkorange")) +
  labs(x = "Development Status", y = "Life Expectancy (Age)") +
  theme(legend.position = "none")
ggplotly(plot1)
## Df Sum Sq Mean Sq F value Pr(>F)
## Status 1 25005 25005 401.7 <0.0000000000000002 ***
## Residuals 1647 102525 62
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
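The call that produced the ANOVA table above is not shown in the text. A table of this shape is what `summary(aov(...))` prints for a one-way ANOVA, so the sketch below shows that pattern on synthetic data (the `toy` data frame and its group means are assumptions for illustration only).

```r
# Synthetic stand-in data (hypothetical values, not the life expectancy dataset).
set.seed(1)
toy <- data.frame(
  Status = factor(rep(c("Developed", "Developing"), times = c(40, 160))),
  Life.expectancy = c(rnorm(40, mean = 78, sd = 4),
                      rnorm(160, mean = 67, sd = 8))
)

# One-way ANOVA: does mean Life.expectancy differ between the Status groups?
fit_aov   <- aov(Life.expectancy ~ Status, data = toy)
aov_table <- summary(fit_aov)
print(aov_table)

# The p-value for the grouping factor sits in the first row of the table.
p_value <- aov_table[[1]][["Pr(>F)"]][1]
```

A small p-value here only says that the group means differ; the boxplot is still needed to see the direction and size of the gap.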
By Status, higher Life.expectancy clearly lies with the Developed countries, with a large gap between the medians. Although there are some outliers among the Developing countries, we will keep them for the time being because they have low leverage.
life_selected %>%
  group_by(Hepatitis.B) %>%
  summarise(count = n()) %>%
  mutate(percentage = paste0(round(count/sum(count)*100, 2), "%"))
## # A tibble: 2 x 3
## Hepatitis.B count percentage
## <fct> <int> <chr>
## 1 <90% Covered 826 50.09%
## 2 >=90% Covered 823 49.91%
plot2 <- ggplot(life_selected, aes(x = Hepatitis.B, y = Life.expectancy, fill = Hepatitis.B)) +
  geom_boxplot() +
  scale_fill_manual(values = c("green3", "darkorange")) +
  labs(x = "Hepatitis B Coverage", y = "Life Expectancy (Age)") +
  theme(legend.position = "none")
ggplotly(plot2)
## Df Sum Sq Mean Sq F value Pr(>F)
## Hepatitis.B 1 12164 12164 173.7 <0.0000000000000002 ***
## Residuals 1647 115366 70
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Observations with 90% or more Hepatitis.B immunization coverage make up about half of the data (49.91%).
By Hepatitis.B coverage, higher Life.expectancy lies with the countries that cover 90% or more of their Hepatitis.B immunization, with a large gap between the medians. Although there are some outliers among the Developing countries, we will keep them for the time being because most of them have low leverage.
life_selected %>%
  group_by(Polio) %>%
  summarise(count = n()) %>%
  mutate(percentage = paste0(round(count/sum(count)*100, 2), "%"))
## # A tibble: 2 x 3
## Polio count percentage
## <fct> <int> <chr>
## 1 <90% Covered 700 42.45%
## 2 >=90% Covered 949 57.55%
plot3 <- ggplot(life_selected, aes(x = Polio, y = Life.expectancy, fill = Polio)) +
  geom_boxplot() +
  scale_fill_manual(values = c("green3", "darkorange")) +
  labs(x = "Polio Coverage", y = "Life Expectancy (Age)") +
  theme(legend.position = "none")
ggplotly(plot3)
## Df Sum Sq Mean Sq F value Pr(>F)
## Polio 1 26047 26047 422.7 <0.0000000000000002 ***
## Residuals 1647 101482 62
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Compared with Hepatitis.B coverage, the group with 90% or more Polio coverage is larger (57.55% vs. 49.91% of observations).
By Polio coverage, higher Life.expectancy lies with the countries that cover 90% or more of their Polio immunization, with a large gap between the medians. Polio has fewer upper outliers than Hepatitis.B. Although there are some outliers among the Developing countries, we will keep them for the time being because most of them have low leverage.
The ANOVA output above (F = 422.7, p < 0.001) indicates that mean Life.expectancy differs significantly by Polio coverage.
life_selected %>%
  group_by(Diphtheria) %>%
  summarise(count = n()) %>%
  mutate(percentage = paste0(round(count/sum(count)*100, 2), "%"))
## # A tibble: 2 x 3
## Diphtheria count percentage
## <fct> <int> <chr>
## 1 <90% Covered 704 42.69%
## 2 >=90% Covered 945 57.31%
plot4 <- ggplot(life_selected, aes(x = Diphtheria, y = Life.expectancy, fill = Diphtheria)) +
  geom_boxplot() +
  scale_fill_manual(values = c("green3", "darkorange")) +
  labs(x = "Diphtheria Coverage", y = "Life Expectancy (Age)") +
  theme(legend.position = "none")
ggplotly(plot4)
## Df Sum Sq Mean Sq F value Pr(>F)
## Diphtheria 1 25834 25834 418.4 <0.0000000000000002 ***
## Residuals 1647 101695 62
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Diphtheria coverage is almost the same as Polio coverage in terms of group sizes (57.31% vs. 57.55% of observations at 90% or more).
The distribution of Life.expectancy by Diphtheria coverage is somewhat similar to that of Polio. This may indicate that Polio and Diphtheria immunizations are given at the same time.
The ANOVA output above (F = 418.4, p < 0.001) indicates that mean Life.expectancy differs significantly by Diphtheria coverage.
plot5 <- ggplot(life_selected) +
  geom_mosaic(aes(x = product(Status), fill = Hepatitis.B)) +
  labs(x = NULL, y = NULL) +
  scale_fill_manual(values = c("green3", "darkorange")) +
  theme(legend.position = "none")
ggplotly(plot5)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table(life_selected$Status, life_selected$Hepatitis.B)
## X-squared = 49.835, df = 1, p-value = 0.000000000001672
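The `data:` line in the output above shows that the test was run on `table(life_selected$Status, life_selected$Hepatitis.B)`. As a minimal sketch of the same pattern, the block below builds a 2x2 table from synthetic counts (the counts and the `toy` data frame are assumptions for illustration) and runs `chisq.test()`, which applies Yates' continuity correction to 2x2 tables by default.

```r
# Synthetic stand-in data (hypothetical counts, not the life expectancy dataset).
toy <- data.frame(
  Status   = rep(c("Developed", "Developing"), times = c(50, 150)),
  Coverage = c(rep(c("<90% Covered", ">=90% Covered"), times = c(10, 40)),
               rep(c("<90% Covered", ">=90% Covered"), times = c(90, 60)))
)

# Cross-tabulate the two categorical variables.
tab <- table(toy$Status, toy$Coverage)
print(tab)

# Row proportions show the direction of the association, which the test does not.
print(prop.table(tab, margin = 1))

# Chi-squared test of independence (Yates' correction is the default for 2x2).
test <- chisq.test(tab)
print(test)
```

The test only reports whether the two variables are associated; the row proportions (or the mosaic plot) are what show which group has the higher coverage.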
The chi-squared test above (X-squared = 49.835, p < 0.001) indicates that Status and Hepatitis.B immunization coverage are associated.
plot6 <- ggplot(life_selected) +
  geom_mosaic(aes(x = product(Status), fill = Polio)) +
  labs(x = NULL, y = NULL) +
  scale_fill_manual(values = c("green3", "darkorange")) +
  theme(legend.position = "none")
ggplotly(plot6)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table(life_selected$Status, life_selected$Polio)
## X-squared = 127.6, df = 1, p-value < 0.00000000000000022
The chi-squared test above (X-squared = 127.6, p < 0.001) indicates that Status and Polio immunization coverage are associated.
plot7 <- ggplot(life_selected) +
  geom_mosaic(aes(x = product(Status), fill = Diphtheria)) +
  labs(x = NULL, y = NULL) +
  scale_fill_manual(values = c("green3", "darkorange")) +
  theme(legend.position = "none")
ggplotly(plot7)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table(life_selected$Status, life_selected$Diphtheria)
## X-squared = 129.28, df = 1, p-value < 0.00000000000000022
Diphtheria shows a pattern similar to Polio. We will see in the next test whether we only need one of them.
As mentioned at the beginning of this analysis, we are going to predict Life.expectancy using the selected variables. This is the full linear prediction model.
##
## Call:
## lm(formula = Life.expectancy ~ ., data = life_selected)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.0291 -2.1529 0.0557 2.3893 11.5018
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 55.002467863226 0.810849894348 67.833
## StatusDeveloping -0.981517391893 0.346377773952 -2.834
## Adult.Mortality -0.017799228333 0.000967427774 -18.399
## infant.deaths -0.003007364967 0.001265712464 -2.376
## Alcohol -0.155154873503 0.033801578881 -4.590
## percentage.expenditure 0.000349139357 0.000186187084 1.875
## Hepatitis.B>=90% Covered -0.637192410194 0.319248875671 -1.996
## Measles 0.000016831614 0.000010792654 1.560
## BMI 0.035845737407 0.006160560054 5.819
## Polio>=90% Covered 0.568041014306 0.443924117610 1.280
## Total.expenditure 0.069943232821 0.041790211855 1.674
## Diphtheria>=90% Covered 0.909716421385 0.489937625026 1.857
## HIV_AIDS -0.427934296549 0.018491961872 -23.142
## GDP 0.000009180942 0.000029253467 0.314
## Population 0.000000002496 0.000000001766 1.414
## thinness.10_19.years -0.050182389508 0.054691293904 -0.918
## thinness.5_9.years 0.001518971381 0.053738340913 0.028
## Income.composition.of.resources 10.477963833400 0.850733729997 12.316
## Schooling 0.884291070136 0.061717791004 14.328
## Pr(>|t|)
## (Intercept) < 0.0000000000000002 ***
## StatusDeveloping 0.00466 **
## Adult.Mortality < 0.0000000000000002 ***
## infant.deaths 0.01762 *
## Alcohol 0.00000476823 ***
## percentage.expenditure 0.06094 .
## Hepatitis.B>=90% Covered 0.04611 *
## Measles 0.11906
## BMI 0.00000000713 ***
## Polio>=90% Covered 0.20087
## Total.expenditure 0.09439 .
## Diphtheria>=90% Covered 0.06352 .
## HIV_AIDS < 0.0000000000000002 ***
## GDP 0.75368
## Population 0.15769
## thinness.10_19.years 0.35899
## thinness.5_9.years 0.97745
## Income.composition.of.resources < 0.0000000000000002 ***
## Schooling < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.686 on 1630 degrees of freedom
## Multiple R-squared: 0.8263, Adjusted R-squared: 0.8244
## F-statistic: 430.9 on 18 and 1630 DF, p-value: < 0.00000000000000022
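Coefficient names such as `StatusDeveloping` in the summary above arise because `lm()` encodes a factor with treatment contrasts by default: the first level is the baseline and each remaining level gets a coefficient measuring its difference from that baseline. The sketch below illustrates this on synthetic data (the `toy` data frame and the true effect sizes are assumptions for illustration, not estimates from the life dataset).

```r
# Synthetic stand-in data (hypothetical values, not the life expectancy dataset).
set.seed(7)
toy <- data.frame(
  Status    = factor(rep(c("Developed", "Developing"), each = 50)),
  Schooling = runif(100, min = 5, max = 18)
)
toy$Life.expectancy <- 55 + 1.2 * toy$Schooling -
  2 * (toy$Status == "Developing") + rnorm(100, sd = 1)

fit <- lm(Life.expectancy ~ ., data = toy)
print(coef(fit))

# The first factor level is the baseline absorbed into the intercept.
baseline <- levels(toy$Status)[1]

# Two hypothetical countries that differ only in Status:
new_rows <- data.frame(
  Status    = factor(c("Developed", "Developing"), levels = levels(toy$Status)),
  Schooling = 12
)
# Their predicted gap equals the StatusDeveloping coefficient.
gap <- unname(diff(predict(fit, newdata = new_rows)))
```

Read this way, a coefficient on a factor level is a shift relative to the baseline level with all other predictors held fixed, not an effect per unit increase.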
The negative coefficients on Adult.Mortality, infant.deaths, Alcohol, HIV_AIDS and thinness.10_19.years indicate that an increase in these variables is associated with a decrease in Life.expectancy. On the other hand, Income.composition.of.resources has a large positive effect on Life.expectancy. Some interesting findings also occur in the categorical variables: StatusDeveloping is expected to lower Life.expectancy by about 0.9815 years compared with the Developed status. Curiously, Hepatitis.B>=90% Covered has a negative coefficient, in contrast to the positive coefficients of Polio>=90% Covered and Diphtheria>=90% Covered.
Adult.Mortality, Alcohol, BMI, HIV_AIDS, Income.composition.of.resources, and Schooling are the most significant predictors (p < 0.001), followed by StatusDeveloping at the 0.01 significance level, and infant.deaths and Hepatitis.B>=90% Covered at the 0.05 significance level. For the others, changes in those predictors are not significantly associated with Life.expectancy.
Now we are going to select the most important predictors using R's automated stepwise selection.
life_full <- lm(formula = Life.expectancy ~ ., data = life_selected)
life_none <- lm(formula = Life.expectancy ~ 1, data = life_selected)
## Start: AIC=4321.29
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + Measles +
## BMI + Polio + Total.expenditure + Diphtheria + HIV_AIDS +
## GDP + Population + thinness.10_19.years + thinness.5_9.years +
## Income.composition.of.resources + Schooling
##
## Df Sum of Sq RSS AIC
## - thinness.5_9.years 1 0.0 22147 4319.3
## - GDP 1 1.3 22148 4319.4
## - thinness.10_19.years 1 11.4 22158 4320.1
## - Polio 1 22.2 22169 4320.9
## <none> 22147 4321.3
## - Population 1 27.1 22174 4321.3
## - Measles 1 33.0 22180 4321.8
## - Total.expenditure 1 38.1 22185 4322.1
## - Diphtheria 1 46.8 22193 4322.8
## - percentage.expenditure 1 47.8 22194 4322.8
## - Hepatitis.B 1 54.1 22201 4323.3
## - infant.deaths 1 76.7 22223 4325.0
## - Status 1 109.1 22256 4327.4
## - Alcohol 1 286.3 22433 4340.5
## - BMI 1 460.0 22607 4353.2
## - Income.composition.of.resources 1 2061.0 24208 4466.0
## - Schooling 1 2789.2 24936 4514.9
## - Adult.Mortality 1 4599.2 26746 4630.5
## - HIV_AIDS 1 7276.2 29423 4787.8
##
## Step: AIC=4319.29
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + Measles +
## BMI + Polio + Total.expenditure + Diphtheria + HIV_AIDS +
## GDP + Population + thinness.10_19.years + Income.composition.of.resources +
## Schooling
##
## Df Sum of Sq RSS AIC
## - GDP 1 1.3 22148 4317.4
## - Polio 1 22.2 22169 4318.9
## <none> 22147 4319.3
## - Population 1 27.1 22174 4319.3
## - Measles 1 33.1 22180 4319.8
## - Total.expenditure 1 38.1 22185 4320.1
## - thinness.10_19.years 1 41.7 22188 4320.4
## - Diphtheria 1 46.9 22193 4320.8
## - percentage.expenditure 1 47.8 22194 4320.8
## - Hepatitis.B 1 54.2 22201 4321.3
## - infant.deaths 1 77.2 22224 4323.0
## - Status 1 109.1 22256 4325.4
## - Alcohol 1 286.3 22433 4338.5
## - BMI 1 468.5 22615 4351.8
## - Income.composition.of.resources 1 2061.3 24208 4464.0
## - Schooling 1 2799.0 24946 4513.5
## - Adult.Mortality 1 4609.5 26756 4629.1
## - HIV_AIDS 1 7277.4 29424 4785.8
##
## Step: AIC=4317.39
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + Measles +
## BMI + Polio + Total.expenditure + Diphtheria + HIV_AIDS +
## Population + thinness.10_19.years + Income.composition.of.resources +
## Schooling
##
## Df Sum of Sq RSS AIC
## - Polio 1 22.7 22171 4317.1
## <none> 22148 4317.4
## - Population 1 27.0 22175 4317.4
## - Measles 1 33.1 22181 4317.9
## - Total.expenditure 1 37.6 22185 4318.2
## - thinness.10_19.years 1 42.0 22190 4318.5
## - Diphtheria 1 46.3 22194 4318.8
## - Hepatitis.B 1 53.2 22201 4319.4
## - infant.deaths 1 76.8 22225 4321.1
## - Status 1 111.1 22259 4323.6
## - Alcohol 1 286.1 22434 4336.6
## - BMI 1 467.8 22616 4349.9
## - percentage.expenditure 1 590.7 22739 4358.8
## - Income.composition.of.resources 1 2079.2 24227 4463.4
## - Schooling 1 2824.3 24972 4513.3
## - Adult.Mortality 1 4608.3 26756 4627.1
## - HIV_AIDS 1 7277.2 29425 4783.9
##
## Step: AIC=4317.08
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + Measles +
## BMI + Total.expenditure + Diphtheria + HIV_AIDS + Population +
## thinness.10_19.years + Income.composition.of.resources +
## Schooling
##
## Df Sum of Sq RSS AIC
## <none> 22171 4317.1
## - Population 1 27.3 22198 4317.1
## - Measles 1 31.2 22202 4317.4
## - Total.expenditure 1 36.0 22207 4317.8
## - thinness.10_19.years 1 39.9 22210 4318.0
## - Hepatitis.B 1 44.4 22215 4318.4
## - infant.deaths 1 79.3 22250 4321.0
## - Status 1 113.0 22284 4323.5
## - Diphtheria 1 206.4 22377 4330.4
## - Alcohol 1 283.1 22454 4336.0
## - BMI 1 471.9 22642 4349.8
## - percentage.expenditure 1 585.6 22756 4358.1
## - Income.composition.of.resources 1 2090.6 24261 4463.7
## - Schooling 1 2851.8 25022 4514.6
## - Adult.Mortality 1 4620.1 26791 4627.2
## - HIV_AIDS 1 7337.5 29508 4786.5
##
## Call:
## lm(formula = Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + Measles +
## BMI + Total.expenditure + Diphtheria + HIV_AIDS + Population +
## thinness.10_19.years + Income.composition.of.resources +
## Schooling, data = life_selected)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.0614 -2.1244 0.0496 2.3996 11.4712
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 54.988635634772 0.809572673070 67.923
## StatusDeveloping -0.996426463567 0.345406384670 -2.885
## Adult.Mortality -0.017814717510 0.000965712450 -18.447
## infant.deaths -0.003042129201 0.001258897693 -2.417
## Alcohol -0.154259916713 0.033781334791 -4.566
## percentage.expenditure 0.000402417581 0.000061274211 6.567
## Hepatitis.B>=90% Covered -0.568682910837 0.314538919313 -1.808
## Measles 0.000016309888 0.000010767114 1.515
## BMI 0.035941493947 0.006096149812 5.896
## Total.expenditure 0.067891276955 0.041686924792 1.629
## Diphtheria>=90% Covered 1.348666445388 0.345884907735 3.899
## HIV_AIDS -0.429134063064 0.018459188110 -23.248
## Population 0.000000002501 0.000000001765 1.417
## thinness.10_19.years -0.047793875815 0.027868686866 -1.715
## Income.composition.of.resources 10.522821501001 0.848000360190 12.409
## Schooling 0.889310284674 0.061359991042 14.493
## Pr(>|t|)
## (Intercept) < 0.0000000000000002 ***
## StatusDeveloping 0.00397 **
## Adult.Mortality < 0.0000000000000002 ***
## infant.deaths 0.01578 *
## Alcohol 0.0000053327894 ***
## percentage.expenditure 0.0000000000686 ***
## Hepatitis.B>=90% Covered 0.07079 .
## Measles 0.13002
## BMI 0.0000000045224 ***
## Total.expenditure 0.10359
## Diphtheria>=90% Covered 0.00010 ***
## HIV_AIDS < 0.0000000000000002 ***
## Population 0.15658
## thinness.10_19.years 0.08654 .
## Income.composition.of.resources < 0.0000000000000002 ***
## Schooling < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.685 on 1633 degrees of freedom
## Multiple R-squared: 0.8262, Adjusted R-squared: 0.8246
## F-statistic: 517.4 on 15 and 1633 DF, p-value: < 0.00000000000000022
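The calls that produced the backward trace above and the forward trace that follows are not shown in the text. Traces of this form are what `step()` prints, so the sketch below shows the likely pattern on the built-in `mtcars` data (the choice of `mtcars` and its predictors is an assumption for illustration). It also recomputes the AIC value that `step()` reports for a linear model, which is n·log(RSS/n) + 2·(number of coefficients). Plugging in the full model above (RSS = 22147, n = 1649, 19 coefficients) gives roughly 4321.3, in line with the reported starting AIC of 4321.29.

```r
# Built-in data used as a stand-in for life_selected.
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
none <- lm(mpg ~ 1, data = mtcars)

# Backward elimination: start from the full model and drop terms while AIC improves.
backward <- step(full, direction = "backward", trace = 0)

# Forward selection: start from the intercept-only model and add terms,
# never going beyond the terms in the full model.
forward <- step(none,
                scope = list(lower = formula(none), upper = formula(full)),
                direction = "forward", trace = 0)

# AIC as reported by step() for lm fits: n * log(RSS / n) + 2 * edf.
n   <- nrow(mtcars)
rss <- sum(residuals(full)^2)
edf <- length(coef(full))           # number of estimated coefficients
manual_aic <- n * log(rss / n) + 2 * edf
step_aic   <- extractAIC(full)[2]   # second element is the AIC value

print(formula(backward))
print(formula(forward))
```

Setting `trace = 0` suppresses the step-by-step tables; leaving it at its default prints traces like the ones in this report. Backward and forward selection need not end at the same model, which is why both are run and compared.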
## Start: AIC=7172.14
## Life.expectancy ~ 1
##
## Df Sum of Sq RSS AIC
## + Schooling 1 67520 60009 5931.1
## + Income.composition.of.resources 1 66310 61219 5964.0
## + Adult.Mortality 1 62941 64589 6052.3
## + HIV_AIDS 1 44730 82799 6461.9
## + BMI 1 37469 90060 6600.5
## + thinness.10_19.years 1 26732 100797 6786.2
## + thinness.5_9.years 1 26694 100836 6786.9
## + Polio 1 26047 101482 6797.4
## + Diphtheria 1 25834 101695 6800.9
## + Status 1 25005 102525 6814.3
## + GDP 1 24838 102691 6816.9
## + percentage.expenditure 1 21399 106130 6871.3
## + Alcohol 1 20683 106846 6882.3
## + Hepatitis.B 1 12164 115366 7008.8
## + Total.expenditure 1 3893 123636 7123.0
## + infant.deaths 1 3646 123884 7126.3
## + Measles 1 605 126924 7166.3
## <none> 127529 7172.1
## + Population 1 63 127466 7173.3
##
## Step: AIC=5931.06
## Life.expectancy ~ Schooling
##
## Df Sum of Sq RSS AIC
## + HIV_AIDS 1 25626.4 34383 5014.7
## + Adult.Mortality 1 24319.2 35690 5076.2
## + Income.composition.of.resources 1 7477.0 52532 5713.6
## + BMI 1 3525.2 56484 5833.2
## + thinness.5_9.years 1 2123.1 57886 5873.7
## + Polio 1 1743.5 58266 5884.4
## + thinness.10_19.years 1 1695.2 58314 5885.8
## + GDP 1 1660.0 58349 5886.8
## + percentage.expenditure 1 1630.5 58379 5887.6
## + Diphtheria 1 1549.3 58460 5889.9
## + Status 1 844.1 59165 5909.7
## + Hepatitis.B 1 455.8 59554 5920.5
## + Alcohol 1 439.7 59570 5920.9
## <none> 60009 5931.1
## + Measles 1 30.2 59979 5932.2
## + infant.deaths 1 22.9 59987 5932.4
## + Population 1 6.3 60003 5932.9
## + Total.expenditure 1 1.0 60009 5933.0
##
## Step: AIC=5014.67
## Life.expectancy ~ Schooling + HIV_AIDS
##
## Df Sum of Sq RSS AIC
## + Adult.Mortality 1 7231.6 27152 4627.3
## + Income.composition.of.resources 1 4265.9 30117 4798.2
## + BMI 1 1702.8 32680 4932.9
## + percentage.expenditure 1 1548.9 32834 4940.7
## + GDP 1 1527.8 32855 4941.7
## + thinness.5_9.years 1 947.7 33435 4970.6
## + thinness.10_19.years 1 805.3 33578 4977.6
## + Status 1 627.9 33755 4986.3
## + Polio 1 329.3 34054 5000.8
## + Diphtheria 1 322.9 34060 5001.1
## + Total.expenditure 1 227.9 34155 5005.7
## + infant.deaths 1 123.6 34260 5010.7
## + Hepatitis.B 1 46.2 34337 5014.5
## <none> 34383 5014.7
## + Population 1 11.9 34371 5016.1
## + Measles 1 0.8 34382 5016.6
## + Alcohol 1 0.4 34383 5016.7
##
## Step: AIC=4627.29
## Life.expectancy ~ Schooling + HIV_AIDS + Adult.Mortality
##
## Df Sum of Sq RSS AIC
## + Income.composition.of.resources 1 2815.85 24336 4448.7
## + percentage.expenditure 1 1059.31 26092 4563.7
## + GDP 1 1057.75 26094 4563.8
## + BMI 1 1011.53 26140 4566.7
## + thinness.5_9.years 1 619.35 26532 4591.2
## + thinness.10_19.years 1 591.82 26560 4592.9
## + Status 1 340.09 26812 4608.5
## + Diphtheria 1 279.04 26873 4612.3
## + Polio 1 270.01 26882 4612.8
## + infant.deaths 1 209.12 26943 4616.5
## + Total.expenditure 1 141.14 27010 4620.7
## <none> 27152 4627.3
## + Hepatitis.B 1 30.45 27121 4627.4
## + Alcohol 1 29.49 27122 4627.5
## + Population 1 25.36 27126 4627.7
## + Measles 1 11.99 27140 4628.6
##
## Step: AIC=4448.74
## Life.expectancy ~ Schooling + HIV_AIDS + Adult.Mortality + Income.composition.of.resources
##
## Df Sum of Sq RSS AIC
## + percentage.expenditure 1 706.09 23630 4402.2
## + BMI 1 664.20 23672 4405.1
## + GDP 1 653.87 23682 4405.8
## + thinness.5_9.years 1 380.74 23955 4424.7
## + thinness.10_19.years 1 344.14 23992 4427.3
## + infant.deaths 1 284.56 24051 4431.3
## + Status 1 170.61 24165 4439.1
## + Diphtheria 1 157.07 24179 4440.1
## + Polio 1 155.91 24180 4440.1
## + Total.expenditure 1 147.42 24188 4440.7
## + Population 1 44.51 24291 4447.7
## + Measles 1 32.54 24303 4448.5
## <none> 24336 4448.7
## + Alcohol 1 22.62 24313 4449.2
## + Hepatitis.B 1 18.65 24317 4449.5
##
## Step: AIC=4402.19
## Life.expectancy ~ Schooling + HIV_AIDS + Adult.Mortality + Income.composition.of.resources +
## percentage.expenditure
##
## Df Sum of Sq RSS AIC
## + BMI 1 681.26 22948 4355.9
## + thinness.5_9.years 1 328.31 23301 4381.1
## + thinness.10_19.years 1 302.59 23327 4382.9
## + infant.deaths 1 276.51 23353 4384.8
## + Polio 1 182.17 23448 4391.4
## + Diphtheria 1 172.04 23458 4392.1
## + Alcohol 1 112.62 23517 4396.3
## + Total.expenditure 1 95.00 23535 4397.5
## + Hepatitis.B 1 51.51 23578 4400.6
## + Population 1 42.71 23587 4401.2
## <none> 23630 4402.2
## + Status 1 27.79 23602 4402.2
## + Measles 1 25.33 23604 4402.4
## + GDP 1 1.11 23629 4404.1
##
## Step: AIC=4355.95
## Life.expectancy ~ Schooling + HIV_AIDS + Adult.Mortality + Income.composition.of.resources +
## percentage.expenditure + BMI
##
## Df Sum of Sq RSS AIC
## + Diphtheria 1 227.284 22721 4341.5
## + Polio 1 223.936 22724 4341.8
## + infant.deaths 1 159.159 22789 4346.5
## + Alcohol 1 124.554 22824 4349.0
## + thinness.5_9.years 1 78.289 22870 4352.3
## + Hepatitis.B 1 72.254 22876 4352.7
## + thinness.10_19.years 1 72.167 22876 4352.8
## + Total.expenditure 1 59.666 22889 4353.7
## + Status 1 27.891 22921 4355.9
## <none> 22948 4355.9
## + Population 1 19.272 22929 4356.6
## + Measles 1 3.196 22945 4357.7
## + GDP 1 2.338 22946 4357.8
##
## Step: AIC=4341.53
## Life.expectancy ~ Schooling + HIV_AIDS + Adult.Mortality + Income.composition.of.resources +
## percentage.expenditure + BMI + Diphtheria
##
## Df Sum of Sq RSS AIC
## + Alcohol 1 159.749 22561 4331.9
## + infant.deaths 1 109.257 22612 4335.6
## + thinness.10_19.years 1 78.356 22643 4337.8
## + thinness.5_9.years 1 73.033 22648 4338.2
## + Total.expenditure 1 43.232 22678 4340.4
## <none> 22721 4341.5
## + Hepatitis.B 1 27.102 22694 4341.6
## + Status 1 21.917 22699 4341.9
## + Polio 1 13.602 22708 4342.5
## + Population 1 10.366 22711 4342.8
## + GDP 1 0.676 22720 4343.5
## + Measles 1 0.432 22721 4343.5
##
## Step: AIC=4331.9
## Life.expectancy ~ Schooling + HIV_AIDS + Adult.Mortality + Income.composition.of.resources +
## percentage.expenditure + BMI + Diphtheria + Alcohol
##
## Df Sum of Sq RSS AIC
## + thinness.10_19.years 1 116.370 22445 4325.4
## + Status 1 112.074 22449 4325.7
## + thinness.5_9.years 1 106.354 22455 4326.1
## + infant.deaths 1 95.306 22466 4326.9
## + Total.expenditure 1 52.166 22509 4330.1
## + Hepatitis.B 1 41.428 22520 4330.9
## <none> 22561 4331.9
## + Polio 1 13.930 22547 4332.9
## + Population 1 9.950 22551 4333.2
## + GDP 1 1.478 22560 4333.8
## + Measles 1 0.012 22561 4333.9
##
## Step: AIC=4325.37
## Life.expectancy ~ Schooling + HIV_AIDS + Adult.Mortality + Income.composition.of.resources +
## percentage.expenditure + BMI + Diphtheria + Alcohol + thinness.10_19.years
##
## Df Sum of Sq RSS AIC
## + Status 1 113.215 22332 4319.0
## + Total.expenditure 1 39.968 22405 4324.4
## + Hepatitis.B 1 38.551 22406 4324.5
## + infant.deaths 1 31.863 22413 4325.0
## <none> 22445 4325.4
## + Polio 1 15.716 22429 4326.2
## + thinness.5_9.years 1 2.539 22442 4327.2
## + Measles 1 1.611 22443 4327.3
## + GDP 1 1.256 22444 4327.3
## + Population 1 0.047 22445 4327.4
##
## Step: AIC=4319.03
## Life.expectancy ~ Schooling + HIV_AIDS + Adult.Mortality + Income.composition.of.resources +
## percentage.expenditure + BMI + Diphtheria + Alcohol + thinness.10_19.years +
## Status
##
## Df Sum of Sq RSS AIC
## + Hepatitis.B 1 42.565 22289 4317.9
## + Total.expenditure 1 35.002 22297 4318.4
## + infant.deaths 1 28.608 22303 4318.9
## <none> 22332 4319.0
## + Polio 1 13.763 22318 4320.0
## + Measles 1 2.718 22329 4320.8
## + thinness.5_9.years 1 1.738 22330 4320.9
## + Population 1 0.239 22332 4321.0
## + GDP 1 0.192 22332 4321.0
##
## Step: AIC=4317.89
## Life.expectancy ~ Schooling + HIV_AIDS + Adult.Mortality + Income.composition.of.resources +
## percentage.expenditure + BMI + Diphtheria + Alcohol + thinness.10_19.years +
## Status + Hepatitis.B
##
## Df Sum of Sq RSS AIC
## + Total.expenditure 1 35.631 22254 4317.2
## + infant.deaths 1 30.432 22259 4317.6
## <none> 22289 4317.9
## + Polio 1 22.471 22267 4318.2
## + Measles 1 2.472 22287 4319.7
## + thinness.5_9.years 1 1.184 22288 4319.8
## + GDP 1 0.861 22288 4319.8
## + Population 1 0.113 22289 4319.9
##
## Step: AIC=4317.25
## Life.expectancy ~ Schooling + HIV_AIDS + Adult.Mortality + Income.composition.of.resources +
## percentage.expenditure + BMI + Diphtheria + Alcohol + thinness.10_19.years +
## Status + Hepatitis.B + Total.expenditure
##
## Df Sum of Sq RSS AIC
## + infant.deaths 1 27.5771 22226 4317.2
## <none> 22254 4317.2
## + Polio 1 23.8495 22230 4317.5
## + Measles 1 3.8052 22250 4319.0
## + GDP 1 1.3515 22252 4319.1
## + thinness.5_9.years 1 0.6689 22253 4319.2
## + Population 1 0.2788 22253 4319.2
##
## Step: AIC=4317.2
## Life.expectancy ~ Schooling + HIV_AIDS + Adult.Mortality + Income.composition.of.resources +
## percentage.expenditure + BMI + Diphtheria + Alcohol + thinness.10_19.years +
## Status + Hepatitis.B + Total.expenditure + infant.deaths
##
## Df Sum of Sq RSS AIC
## + Measles 1 28.2358 22198 4317.1
## <none> 22226 4317.2
## + Population 1 24.3561 22202 4317.4
## + Polio 1 21.1019 22205 4317.6
## + GDP 1 1.5363 22224 4319.1
## + thinness.5_9.years 1 0.0990 22226 4319.2
##
## Step: AIC=4317.11
## Life.expectancy ~ Schooling + HIV_AIDS + Adult.Mortality + Income.composition.of.resources +
## percentage.expenditure + BMI + Diphtheria + Alcohol + thinness.10_19.years +
## Status + Hepatitis.B + Total.expenditure + infant.deaths +
## Measles
##
## Df Sum of Sq RSS AIC
## + Population 1 27.2727 22171 4317.1
## <none> 22198 4317.1
## + Polio 1 22.9679 22175 4317.4
## + GDP 1 1.5394 22196 4319.0
## + thinness.5_9.years 1 0.0022 22198 4319.1
##
## Step: AIC=4317.08
## Life.expectancy ~ Schooling + HIV_AIDS + Adult.Mortality + Income.composition.of.resources +
## percentage.expenditure + BMI + Diphtheria + Alcohol + thinness.10_19.years +
## Status + Hepatitis.B + Total.expenditure + infant.deaths +
## Measles + Population
##
## Df Sum of Sq RSS AIC
## <none> 22171 4317.1
## + Polio 1 22.6508 22148 4317.4
## + GDP 1 1.7488 22169 4318.9
## + thinness.5_9.years 1 0.0001 22171 4319.1
##
## Call:
## lm(formula = Life.expectancy ~ Schooling + HIV_AIDS + Adult.Mortality +
## Income.composition.of.resources + percentage.expenditure +
## BMI + Diphtheria + Alcohol + thinness.10_19.years + Status +
## Hepatitis.B + Total.expenditure + infant.deaths + Measles +
## Population, data = life_selected)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.0614 -2.1244 0.0496 2.3996 11.4712
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 54.988635634772 0.809572673070 67.923
## Schooling 0.889310284674 0.061359991042 14.493
## HIV_AIDS -0.429134063064 0.018459188110 -23.248
## Adult.Mortality -0.017814717510 0.000965712450 -18.447
## Income.composition.of.resources 10.522821501001 0.848000360190 12.409
## percentage.expenditure 0.000402417581 0.000061274211 6.567
## BMI 0.035941493947 0.006096149812 5.896
## Diphtheria>=90% Covered 1.348666445388 0.345884907735 3.899
## Alcohol -0.154259916713 0.033781334791 -4.566
## thinness.10_19.years -0.047793875815 0.027868686866 -1.715
## StatusDeveloping -0.996426463567 0.345406384670 -2.885
## Hepatitis.B>=90% Covered -0.568682910837 0.314538919313 -1.808
## Total.expenditure 0.067891276955 0.041686924792 1.629
## infant.deaths -0.003042129201 0.001258897693 -2.417
## Measles 0.000016309888 0.000010767114 1.515
## Population 0.000000002501 0.000000001765 1.417
## Pr(>|t|)
## (Intercept) < 0.0000000000000002 ***
## Schooling < 0.0000000000000002 ***
## HIV_AIDS < 0.0000000000000002 ***
## Adult.Mortality < 0.0000000000000002 ***
## Income.composition.of.resources < 0.0000000000000002 ***
## percentage.expenditure 0.0000000000686 ***
## BMI 0.0000000045224 ***
## Diphtheria>=90% Covered 0.00010 ***
## Alcohol 0.0000053327894 ***
## thinness.10_19.years 0.08654 .
## StatusDeveloping 0.00397 **
## Hepatitis.B>=90% Covered 0.07079 .
## Total.expenditure 0.10359
## infant.deaths 0.01578 *
## Measles 0.13002
## Population 0.15658
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.685 on 1633 degrees of freedom
## Multiple R-squared: 0.8262, Adjusted R-squared: 0.8246
## F-statistic: 517.4 on 15 and 1633 DF, p-value: < 0.00000000000000022
## Start: AIC=4321.29
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + Measles +
## BMI + Polio + Total.expenditure + Diphtheria + HIV_AIDS +
## GDP + Population + thinness.10_19.years + thinness.5_9.years +
## Income.composition.of.resources + Schooling
##
## Df Sum of Sq RSS AIC
## - thinness.5_9.years 1 0.0 22147 4319.3
## - GDP 1 1.3 22148 4319.4
## - thinness.10_19.years 1 11.4 22158 4320.1
## - Polio 1 22.2 22169 4320.9
## <none> 22147 4321.3
## - Population 1 27.1 22174 4321.3
## - Measles 1 33.0 22180 4321.8
## - Total.expenditure 1 38.1 22185 4322.1
## - Diphtheria 1 46.8 22193 4322.8
## - percentage.expenditure 1 47.8 22194 4322.8
## - Hepatitis.B 1 54.1 22201 4323.3
## - infant.deaths 1 76.7 22223 4325.0
## - Status 1 109.1 22256 4327.4
## - Alcohol 1 286.3 22433 4340.5
## - BMI 1 460.0 22607 4353.2
## - Income.composition.of.resources 1 2061.0 24208 4466.0
## - Schooling 1 2789.2 24936 4514.9
## - Adult.Mortality 1 4599.2 26746 4630.5
## - HIV_AIDS 1 7276.2 29423 4787.8
##
## Step: AIC=4319.29
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + Measles +
## BMI + Polio + Total.expenditure + Diphtheria + HIV_AIDS +
## GDP + Population + thinness.10_19.years + Income.composition.of.resources +
## Schooling
##
## Df Sum of Sq RSS AIC
## - GDP 1 1.3 22148 4317.4
## - Polio 1 22.2 22169 4318.9
## <none> 22147 4319.3
## - Population 1 27.1 22174 4319.3
## - Measles 1 33.1 22180 4319.8
## - Total.expenditure 1 38.1 22185 4320.1
## - thinness.10_19.years 1 41.7 22188 4320.4
## - Diphtheria 1 46.9 22193 4320.8
## - percentage.expenditure 1 47.8 22194 4320.8
## + thinness.5_9.years 1 0.0 22147 4321.3
## - Hepatitis.B 1 54.2 22201 4321.3
## - infant.deaths 1 77.2 22224 4323.0
## - Status 1 109.1 22256 4325.4
## - Alcohol 1 286.3 22433 4338.5
## - BMI 1 468.5 22615 4351.8
## - Income.composition.of.resources 1 2061.3 24208 4464.0
## - Schooling 1 2799.0 24946 4513.5
## - Adult.Mortality 1 4609.5 26756 4629.1
## - HIV_AIDS 1 7277.4 29424 4785.8
##
## Step: AIC=4317.39
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + Measles +
## BMI + Polio + Total.expenditure + Diphtheria + HIV_AIDS +
## Population + thinness.10_19.years + Income.composition.of.resources +
## Schooling
##
## Df Sum of Sq RSS AIC
## - Polio 1 22.7 22171 4317.1
## <none> 22148 4317.4
## - Population 1 27.0 22175 4317.4
## - Measles 1 33.1 22181 4317.9
## - Total.expenditure 1 37.6 22185 4318.2
## - thinness.10_19.years 1 42.0 22190 4318.5
## - Diphtheria 1 46.3 22194 4318.8
## + GDP 1 1.3 22147 4319.3
## - Hepatitis.B 1 53.2 22201 4319.4
## + thinness.5_9.years 1 0.0 22148 4319.4
## - infant.deaths 1 76.8 22225 4321.1
## - Status 1 111.1 22259 4323.6
## - Alcohol 1 286.1 22434 4336.6
## - BMI 1 467.8 22616 4349.9
## - percentage.expenditure 1 590.7 22739 4358.8
## - Income.composition.of.resources 1 2079.2 24227 4463.4
## - Schooling 1 2824.3 24972 4513.3
## - Adult.Mortality 1 4608.3 26756 4627.1
## - HIV_AIDS 1 7277.2 29425 4783.9
##
## Step: AIC=4317.08
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + Measles +
## BMI + Total.expenditure + Diphtheria + HIV_AIDS + Population +
## thinness.10_19.years + Income.composition.of.resources +
## Schooling
##
## Df Sum of Sq RSS AIC
## <none> 22171 4317.1
## - Population 1 27.3 22198 4317.1
## + Polio 1 22.7 22148 4317.4
## - Measles 1 31.2 22202 4317.4
## - Total.expenditure 1 36.0 22207 4317.8
## - thinness.10_19.years 1 39.9 22210 4318.0
## - Hepatitis.B 1 44.4 22215 4318.4
## + GDP 1 1.7 22169 4318.9
## + thinness.5_9.years 1 0.0 22171 4319.1
## - infant.deaths 1 79.3 22250 4321.0
## - Status 1 113.0 22284 4323.5
## - Diphtheria 1 206.4 22377 4330.4
## - Alcohol 1 283.1 22454 4336.0
## - BMI 1 471.9 22642 4349.8
## - percentage.expenditure 1 585.6 22756 4358.1
## - Income.composition.of.resources 1 2090.6 24261 4463.7
## - Schooling 1 2851.8 25022 4514.6
## - Adult.Mortality 1 4620.1 26791 4627.2
## - HIV_AIDS 1 7337.5 29508 4786.5
##
## Call:
## lm(formula = Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + Measles +
## BMI + Total.expenditure + Diphtheria + HIV_AIDS + Population +
## thinness.10_19.years + Income.composition.of.resources +
## Schooling, data = life_selected)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.0614 -2.1244 0.0496 2.3996 11.4712
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 54.988635634772 0.809572673070 67.923
## StatusDeveloping -0.996426463567 0.345406384670 -2.885
## Adult.Mortality -0.017814717510 0.000965712450 -18.447
## infant.deaths -0.003042129201 0.001258897693 -2.417
## Alcohol -0.154259916713 0.033781334791 -4.566
## percentage.expenditure 0.000402417581 0.000061274211 6.567
## Hepatitis.B>=90% Covered -0.568682910837 0.314538919313 -1.808
## Measles 0.000016309888 0.000010767114 1.515
## BMI 0.035941493947 0.006096149812 5.896
## Total.expenditure 0.067891276955 0.041686924792 1.629
## Diphtheria>=90% Covered 1.348666445388 0.345884907735 3.899
## HIV_AIDS -0.429134063064 0.018459188110 -23.248
## Population 0.000000002501 0.000000001765 1.417
## thinness.10_19.years -0.047793875815 0.027868686866 -1.715
## Income.composition.of.resources 10.522821501001 0.848000360190 12.409
## Schooling 0.889310284674 0.061359991042 14.493
## Pr(>|t|)
## (Intercept) < 0.0000000000000002 ***
## StatusDeveloping 0.00397 **
## Adult.Mortality < 0.0000000000000002 ***
## infant.deaths 0.01578 *
## Alcohol 0.0000053327894 ***
## percentage.expenditure 0.0000000000686 ***
## Hepatitis.B>=90% Covered 0.07079 .
## Measles 0.13002
## BMI 0.0000000045224 ***
## Total.expenditure 0.10359
## Diphtheria>=90% Covered 0.00010 ***
## HIV_AIDS < 0.0000000000000002 ***
## Population 0.15658
## thinness.10_19.years 0.08654 .
## Income.composition.of.resources < 0.0000000000000002 ***
## Schooling < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.685 on 1633 degrees of freedom
## Multiple R-squared: 0.8262, Adjusted R-squared: 0.8246
## F-statistic: 517.4 on 15 and 1633 DF, p-value: < 0.00000000000000022
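The AIC column in the `step()` traces above is not the value returned by `AIC()`. For a linear model, `step()` relies on `extractAIC()`, which computes n * log(RSS / n) + 2 * k, where k is the number of estimated coefficients. A minimal sketch of that arithmetic follows, using R's built-in `mtcars` data as a stand-in (the model `mpg ~ wt + hp` is an assumption made only for illustration):

```r
# Illustrative only: mtcars stands in for the life expectancy data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

n   <- nrow(mtcars)                 # number of observations
rss <- sum(residuals(fit)^2)        # residual sum of squares
k   <- length(coef(fit))            # estimated coefficients, incl. intercept

aic_manual <- n * log(rss / n) + 2 * k
aic_step   <- extractAIC(fit)[2]    # the value step() prints as "AIC"

all.equal(aic_manual, aic_step)     # TRUE

# Same arithmetic with the rounded figures of the final trace above
# (RSS = 22171, n = 1649, 15 predictors + intercept = 16 coefficients)
1649 * log(22171 / 1649) + 2 * 16
```

With the rounded RSS the last line gives roughly 4317.1, in line with the reported AIC=4317.08. Because this criterion differs from `AIC()` only by a constant for a fixed data set, both rank candidate models in the same order.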
regs <- regsubsets(Life.expectancy ~ ., data = life_selected, nbest = 10)
plot(regs,
     scale = "adjr2",
     main = "All possible regression: ranked by Adjusted R-squared")
Based on the plot, we can identify the most important variables as those that appear in the subsets with the largest adjusted R-squared:
Adult.Mortality, Alcohol, percentage.expenditure, BMI, Diphtheria, HIV_AIDS, Income.composition.of.resources, and Schooling. These are also the variables whose p-values are marked as highly significant (three stars, ***) in the other models.
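To make the ranking criterion behind the plot explicit, here is a rough base-R sketch of an "all possible regressions" search. It is not how `regsubsets()` works internally (that function uses a much more efficient search); it simply fits every subset and sorts by adjusted R-squared. The `mtcars` data and the four candidate predictors are assumptions chosen for illustration.

```r
# Illustrative only: exhaustive search over all predictor subsets with base R,
# using mtcars as stand-in data. leaps::regsubsets() does this far more
# efficiently; this loop just makes the ranking criterion explicit.
predictors <- c("wt", "hp", "qsec", "drat")
response   <- "mpg"

# Every non-empty subset of the candidate predictors
subsets <- unlist(
  lapply(seq_along(predictors),
         function(m) combn(predictors, m, simplify = FALSE)),
  recursive = FALSE
)

# Fit one lm per subset and record its adjusted R-squared
adj_r2 <- vapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response = response), data = mtcars)
  summary(fit)$adj.r.squared
}, numeric(1))

ranking <- data.frame(
  variables = vapply(subsets, paste, character(1), collapse = " + "),
  adj_r2    = adj_r2
)
ranking <- ranking[order(-ranking$adj_r2), ]
head(ranking, 5)
```

The number of subsets doubles with every added predictor, so this brute-force version is only practical for a handful of candidates.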
Create Model Based on Selected Variables:
model_regs <- lm(formula = Life.expectancy ~ Adult.Mortality + Alcohol + percentage.expenditure + BMI + Diphtheria + HIV_AIDS + Income.composition.of.resources + Schooling, data = life_selected)
summary(model_regs)
##
## Call:
## lm(formula = Life.expectancy ~ Adult.Mortality + Alcohol + percentage.expenditure +
## BMI + Diphtheria + HIV_AIDS + Income.composition.of.resources +
## Schooling, data = life_selected)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.2945 -2.0744 0.0991 2.4163 12.4001
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 53.22608649 0.58395177 91.148
## Adult.Mortality -0.01805638 0.00096422 -18.726
## Alcohol -0.10392185 0.03049635 -3.408
## percentage.expenditure 0.00046830 0.00005927 7.901
## BMI 0.04215698 0.00568985 7.409
## Diphtheria>=90% Covered 0.94152766 0.21554902 4.368
## HIV_AIDS -0.42895610 0.01838683 -23.330
## Income.composition.of.resources 10.53525738 0.84747104 12.431
## Schooling 0.93299775 0.06085501 15.331
## Pr(>|t|)
## (Intercept) < 0.0000000000000002 ***
## Adult.Mortality < 0.0000000000000002 ***
## Alcohol 0.000671 ***
## percentage.expenditure 0.00000000000000502 ***
## BMI 0.00000000000020250 ***
## Diphtheria>=90% Covered 0.00001332152624881 ***
## HIV_AIDS < 0.0000000000000002 ***
## Income.composition.of.resources < 0.0000000000000002 ***
## Schooling < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.709 on 1640 degrees of freedom
## Multiple R-squared: 0.8231, Adjusted R-squared: 0.8222
## F-statistic: 953.8 on 8 and 1640 DF, p-value: < 0.00000000000000022
We would also like to see which variables come out as the best when only numeric variables are used.
##
## Call:
## lm(formula = as.formula(as.character(formul)), data = don)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.9301 -3.0501 0.1115 3.1362 15.1075
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.58327 0.52114 89.39 <0.0000000000000002 ***
## HIV_AIDS -0.66888 0.01910 -35.03 <0.0000000000000002 ***
## Schooling 1.98401 0.04121 48.14 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.57 on 1646 degrees of freedom
## Multiple R-squared: 0.7304, Adjusted R-squared: 0.7301
## F-statistic: 2230 on 2 and 1646 DF, p-value: < 0.00000000000000022
Create Model Based on Selected Variables:
model_regMod <- lm(formula = Life.expectancy ~ HIV_AIDS + Schooling, data = life_selected)
summary(model_regMod)
##
## Call:
## lm(formula = Life.expectancy ~ HIV_AIDS + Schooling, data = life_selected)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.9301 -3.0501 0.1115 3.1362 15.1075
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.58327 0.52114 89.39 <0.0000000000000002 ***
## HIV_AIDS -0.66888 0.01910 -35.03 <0.0000000000000002 ***
## Schooling 1.98401 0.04121 48.14 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.57 on 1646 degrees of freedom
## Multiple R-squared: 0.7304, Adjusted R-squared: 0.7301
## F-statistic: 2230 on 2 and 1646 DF, p-value: < 0.00000000000000022
data.frame(model = c("model_backward", "model_forward", "model_both", "model_regs", "model_regMod"),
           AdjRsquare = c(summary(model_backward)$adj.r.squared,
                          summary(model_forward)$adj.r.squared,
                          summary(model_both)$adj.r.squared,
                          summary(model_regs)$adj.r.squared,
                          summary(model_regMod)$adj.r.squared))
##            model AdjRsquare
## 1 model_backward 0.8245570
## 2 model_forward 0.8245570
## 3 model_both 0.8245570
## 4 model_regs 0.8222260
## 5 model_regMod 0.7300628
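From this result, we will choose model_backward as our model to predict Life.expectancy. The three stepwise models share the same adjusted R-squared (0.8246) because the forward and both-direction searches shown above end with the same 15 predictors.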
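A comparison table like the one above can also be built from a named list of models, which avoids repeating `summary()` for each one. The sketch below is only an illustration: the three `mtcars` models are hypothetical stand-ins for the models fitted in this document, and the extra `n_coef` and `AIC` columns are additions that are not part of the original comparison.

```r
# Illustrative only: three nested models on mtcars stand in for the
# model_backward / model_regs / model_regMod objects of this document.
models <- list(
  small  = lm(mpg ~ wt, data = mtcars),
  medium = lm(mpg ~ wt + hp, data = mtcars),
  large  = lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
)

comparison <- data.frame(
  model      = names(models),
  n_coef     = vapply(models, function(m) length(coef(m)), integer(1)),
  AdjRsquare = vapply(models, function(m) summary(m)$adj.r.squared, numeric(1)),
  AIC        = vapply(models, AIC, numeric(1)),
  row.names  = NULL
)
comparison[order(-comparison$AdjRsquare), ]
```

Showing the number of coefficients next to adjusted R-squared helps when two models score almost the same, as model_backward (15 predictors) and model_regs (8 predictors) do here.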
Create Prediction Model to define
Checking Errors with Various Methods
data.frame(Method = c("MSE", "RMSE", "MAE", "MAPE"),
           Error.Value = c(MSE(life_pred, life_selected$Life.expectancy),
                           RMSE(life_pred, life_selected$Life.expectancy),
                           MAE(life_pred, life_selected$Life.expectancy),
                           MAPE(life_pred, life_selected$Life.expectancy)))
##   Method Error.Value
## 1 MSE 13.44479835
## 2 RMSE 3.66671493
## 3 MAE 2.81676542
## 4 MAPE 0.04251349
## [1] 44 89
Looking at the error values from every method, the errors are small compared to the range of Life.expectancy (44 to 89), the dependent variable. We can therefore expect the predicted values to be not far from the actual values.
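For reference, the four error measures can be written out in a few lines of base R. This is a sketch under assumptions: `mtcars` stands in for `life_selected`, `pred` plays the role of `life_pred` (whose construction is not shown here), and the formulas are the textbook definitions rather than the exact functions called above.

```r
# Illustrative only: mtcars stands in for life_selected, and `pred` plays the
# role of life_pred (whose construction is not shown in this document).
fit    <- lm(mpg ~ wt + hp, data = mtcars)
pred   <- predict(fit, newdata = mtcars)
actual <- mtcars$mpg

err <- actual - pred
errors <- data.frame(
  Method      = c("MSE", "RMSE", "MAE", "MAPE"),
  Error.Value = c(mean(err^2),
                  sqrt(mean(err^2)),
                  mean(abs(err)),
                  mean(abs(err / actual)))   # a proportion, not a percentage
)
errors
```

Read this way, the MAPE of 0.0425 reported above corresponds to an average error of about 4.3% of the actual value, and the RMSE is the square root of the MSE.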
Most of the residuals are concentrated around the center, which suggests that they are roughly normally distributed.
Most of the residuals gather along the center line, which again suggests that they are roughly normally distributed.
##
## Shapiro-Wilk normality test
##
## data: model_regs$residuals
## W = 0.98912, p-value = 0.0000000008975
Based on the Shapiro-Wilk normality test, the p-value is below 0.05, which implies that the distribution of the residuals differs significantly from a normal distribution. Therefore, we need to make some adjustments to the data.
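As a small illustration of how to read this test, the sketch below runs `shapiro.test()` on two simulated samples (made-up data, not the model residuals): one drawn from a normal distribution and one that is clearly skewed.

```r
# Illustrative only: simulated data, not the life expectancy residuals.
set.seed(42)
normal_sample <- rnorm(500)          # drawn from a normal distribution
skewed_sample <- rexp(500)           # clearly right-skewed

sw_normal <- shapiro.test(normal_sample)
sw_skewed <- shapiro.test(skewed_sample)

# H0: the sample comes from a normal distribution.
# A p-value below 0.05 rejects H0 at the 5% level.
c(normal = sw_normal$p.value, skewed = sw_skewed$p.value)
```

With a large sample such as the 1649 observations here, the test can pick up small departures from normality, so a W statistic close to 1 (0.989 above) may still come with a very small p-value. That is one plausible reason the plots look roughly normal while the test rejects normality.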
We will try to remove the outliers that were kept in the previous findings.
outliers_out <- boxplot(life_selected$Life.expectancy, plot = FALSE)$out # get the outliers
life_clean <- life_selected[!(life_selected$Life.expectancy %in% outliers_out), ] # remove the outliers from the data
Let us look at the boxplot after removing the outliers.
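The boxplot rule used for this clean-up can be sketched on a small made-up vector. The values below are invented for illustration; the point is that removing flagged values shifts the quartiles, so a second pass may flag values that were inside the fences before.

```r
# Illustrative only: a small made-up vector, not the life expectancy data.
x <- c(30, 31, 52, 60, 62, 63, 64, 65, 66, 67, 68, 70, 72, 74)

# boxplot.stats() applies the same 1.5 * IQR rule as boxplot(), without plotting
out1  <- boxplot.stats(x)$out       # 30 and 31 are flagged
keep1 <- x[!(x %in% out1)]          # logical indexing is safe even if out1 is empty

# Removing points changes the quartiles, so a second pass can flag new points
out2 <- boxplot.stats(keep1)$out    # now 52 falls below the lower fence

# Pitfall of x[-which(...)]: with no matches it drops everything
no_match <- x[-which(x %in% numeric(0))]
length(no_match)                    # 0
```

Filtering with a logical condition rather than `-which(...)` avoids the last pitfall, because an empty set of outliers then leaves the data unchanged.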
Unfortunately, there are still some outliers, so we will eliminate all rows with a Life.expectancy of 50 or below.
life_clean1 <- life_clean[life_clean$Life.expectancy > 50, ] # keep only rows with Life.expectancy above 50
boxplot(life_clean1$Life.expectancy, ylab = "Life Expectancy (Age)")
clean_full <- lm(formula = Life.expectancy ~ ., data = life_clean1)
clean_none <- lm(formula = Life.expectancy ~1, data = life_clean1)
clean_backward <- step(clean_full, direction = "backward")
## Start: AIC=3973.83
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + Measles +
## BMI + Polio + Total.expenditure + Diphtheria + HIV_AIDS +
## GDP + Population + thinness.10_19.years + thinness.5_9.years +
## Income.composition.of.resources + Schooling
##
## Df Sum of Sq RSS AIC
## - thinness.5_9.years 1 0.8 18900 3971.9
## - GDP 1 2.3 18901 3972.0
## - Polio 1 9.0 18908 3972.6
## - thinness.10_19.years 1 11.0 18910 3972.8
## - Population 1 12.5 18911 3972.9
## <none> 18899 3973.8
## - Diphtheria 1 38.8 18938 3975.1
## - percentage.expenditure 1 43.9 18943 3975.5
## - Hepatitis.B 1 50.0 18949 3976.0
## - Status 1 71.5 18970 3977.8
## - infant.deaths 1 79.2 18978 3978.5
## - Measles 1 79.9 18979 3978.5
## - Total.expenditure 1 80.1 18979 3978.6
## - Alcohol 1 94.4 18993 3979.8
## - BMI 1 312.0 19211 3997.9
## - Income.composition.of.resources 1 1690.6 20590 4108.1
## - Schooling 1 2527.7 21427 4171.4
## - Adult.Mortality 1 3246.1 22145 4223.9
## - HIV_AIDS 1 4334.9 23234 4300.2
##
## Step: AIC=3971.9
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + Measles +
## BMI + Polio + Total.expenditure + Diphtheria + HIV_AIDS +
## GDP + Population + thinness.10_19.years + Income.composition.of.resources +
## Schooling
##
## Df Sum of Sq RSS AIC
## - GDP 1 2.3 18902 3970.1
## - Polio 1 8.9 18909 3970.6
## - Population 1 12.4 18912 3970.9
## <none> 18900 3971.9
## - thinness.10_19.years 1 24.8 18925 3972.0
## - Diphtheria 1 38.5 18938 3973.1
## - percentage.expenditure 1 44.0 18944 3973.6
## - Hepatitis.B 1 49.6 18949 3974.1
## - Status 1 71.3 18971 3975.9
## - infant.deaths 1 78.4 18978 3976.5
## - Measles 1 79.3 18979 3976.6
## - Total.expenditure 1 79.5 18979 3976.6
## - Alcohol 1 94.6 18994 3977.8
## - BMI 1 313.7 19213 3996.1
## - Income.composition.of.resources 1 1691.6 20591 4106.2
## - Schooling 1 2541.5 21441 4170.5
## - Adult.Mortality 1 3248.6 22148 4222.1
## - HIV_AIDS 1 4337.9 23238 4298.4
##
## Step: AIC=3970.09
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + Measles +
## BMI + Polio + Total.expenditure + Diphtheria + HIV_AIDS +
## Population + thinness.10_19.years + Income.composition.of.resources +
## Schooling
##
## Df Sum of Sq RSS AIC
## - Polio 1 9.3 18911 3968.9
## - Population 1 12.3 18914 3969.1
## <none> 18902 3970.1
## - thinness.10_19.years 1 25.2 18927 3970.2
## - Diphtheria 1 37.7 18940 3971.3
## - Hepatitis.B 1 48.4 18950 3972.2
## - Status 1 73.2 18975 3974.2
## - infant.deaths 1 77.9 18980 3974.6
## - Total.expenditure 1 78.7 18981 3974.7
## - Measles 1 79.3 18981 3974.7
## - Alcohol 1 94.5 18996 3976.0
## - BMI 1 312.9 19215 3994.2
## - percentage.expenditure 1 597.4 19499 4017.6
## - Income.composition.of.resources 1 1708.1 20610 4105.6
## - Schooling 1 2568.5 21470 4170.7
## - Adult.Mortality 1 3247.5 22149 4220.2
## - HIV_AIDS 1 4337.8 23240 4296.6
##
## Step: AIC=3968.86
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + Measles +
## BMI + Total.expenditure + Diphtheria + HIV_AIDS + Population +
## thinness.10_19.years + Income.composition.of.resources +
## Schooling
##
## Df Sum of Sq RSS AIC
## - Population 1 12.4 18924 3967.9
## <none> 18911 3968.9
## - thinness.10_19.years 1 24.0 18935 3968.9
## - Hepatitis.B 1 42.9 18954 3970.5
## - Status 1 73.9 18985 3973.1
## - Measles 1 77.4 18989 3973.4
## - Total.expenditure 1 77.6 18989 3973.4
## - infant.deaths 1 79.6 18991 3973.5
## - Alcohol 1 92.7 19004 3974.6
## - Diphtheria 1 132.1 19043 3977.9
## - BMI 1 314.8 19226 3993.1
## - percentage.expenditure 1 594.3 19505 4016.1
## - Income.composition.of.resources 1 1714.8 20626 4104.9
## - Schooling 1 2584.1 21495 4170.5
## - Adult.Mortality 1 3255.0 22166 4219.4
## - HIV_AIDS 1 4390.3 23302 4298.8
##
## Step: AIC=3967.91
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + Measles +
## BMI + Total.expenditure + Diphtheria + HIV_AIDS + thinness.10_19.years +
## Income.composition.of.resources + Schooling
##
## Df Sum of Sq RSS AIC
## - thinness.10_19.years 1 23.3 18947 3967.9
## <none> 18924 3967.9
## - Hepatitis.B 1 43.4 18967 3969.6
## - infant.deaths 1 71.9 18995 3971.9
## - Status 1 73.0 18997 3972.0
## - Measles 1 74.5 18998 3972.2
## - Total.expenditure 1 77.3 19001 3972.4
## - Alcohol 1 94.2 19018 3973.8
## - Diphtheria 1 134.6 19058 3977.2
## - BMI 1 320.7 19244 3992.6
## - percentage.expenditure 1 596.5 19520 4015.3
## - Income.composition.of.resources 1 1714.6 20638 4103.8
## - Schooling 1 2628.0 21552 4172.7
## - Adult.Mortality 1 3247.1 22171 4217.7
## - HIV_AIDS 1 4405.0 23329 4298.6
##
## Step: AIC=3967.86
## Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + Measles +
## BMI + Total.expenditure + Diphtheria + HIV_AIDS + Income.composition.of.resources +
## Schooling
##
## Df Sum of Sq RSS AIC
## <none> 18947 3967.9
## - Hepatitis.B 1 45.7 18993 3969.7
## - Status 1 71.4 19018 3971.8
## - Alcohol 1 80.9 19028 3972.6
## - Total.expenditure 1 83.4 19030 3972.8
## - Measles 1 87.1 19034 3973.2
## - Diphtheria 1 129.1 19076 3976.7
## - infant.deaths 1 137.8 19085 3977.4
## - BMI 1 423.0 19370 4001.0
## - percentage.expenditure 1 598.0 19545 4015.3
## - Income.composition.of.resources 1 1760.5 20707 4107.1
## - Schooling 1 2660.5 21607 4174.8
## - Adult.Mortality 1 3255.5 22202 4218.0
## - HIV_AIDS 1 4485.5 23432 4303.7
##
## Call:
## lm(formula = Life.expectancy ~ Status + Adult.Mortality + infant.deaths +
## Alcohol + percentage.expenditure + Hepatitis.B + Measles +
## BMI + Total.expenditure + Diphtheria + HIV_AIDS + Income.composition.of.resources +
## Schooling, data = life_clean1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.785 -1.980 -0.008 2.263 13.405
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 55.41212847 0.72528872 76.400
## StatusDeveloping -0.79640456 0.32680840 -2.437
## Adult.Mortality -0.01758518 0.00106863 -16.456
## infant.deaths -0.00295954 0.00087412 -3.386
## Alcohol -0.08341745 0.03216117 -2.594
## percentage.expenditure 0.00040704 0.00005772 7.053
## Hepatitis.B>=90% Covered -0.59461631 0.30495556 -1.950
## Measles 0.00002824 0.00001049 2.691
## BMI 0.03242806 0.00546683 5.932
## Total.expenditure 0.10607817 0.04028236 2.633
## Diphtheria>=90% Covered 1.09897888 0.33539972 3.277
## HIV_AIDS -0.59567125 0.03083829 -19.316
## Income.composition.of.resources 9.67520937 0.79952369 12.101
## Schooling 0.86616701 0.05822545 14.876
## Pr(>|t|)
## (Intercept) < 0.0000000000000002 ***
## StatusDeveloping 0.014923 *
## Adult.Mortality < 0.0000000000000002 ***
## infant.deaths 0.000727 ***
## Alcohol 0.009582 **
## percentage.expenditure 0.00000000000262 ***
## Hepatitis.B>=90% Covered 0.051372 .
## Measles 0.007195 **
## BMI 0.00000000367619 ***
## Total.expenditure 0.008537 **
## Diphtheria>=90% Covered 0.001073 **
## HIV_AIDS < 0.0000000000000002 ***
## Income.composition.of.resources < 0.0000000000000002 ***
## Schooling < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.467 on 1576 degrees of freedom
## Multiple R-squared: 0.8063, Adjusted R-squared: 0.8047
## F-statistic: 504.6 on 13 and 1576 DF, p-value: < 0.00000000000000022
## Start: AIC=6551.77
## Life.expectancy ~ 1
##
## Df Sum of Sq RSS AIC
## + Schooling 1 54529 43287 5257.5
## + Income.composition.of.resources 1 51607 46209 5361.4
## + Adult.Mortality 1 44653 53163 5584.3
## + BMI 1 28468 69348 6006.9
## + HIV_AIDS 1 28405 69410 6008.3
## + Status 1 20987 76829 6169.8
## + GDP 1 20982 76833 6169.9
## + thinness.5_9.years 1 20704 77112 6175.6
## + thinness.10_19.years 1 20532 77284 6179.2
## + Alcohol 1 19296 78519 6204.4
## + Polio 1 18881 78934 6212.8
## + Diphtheria 1 18680 79136 6216.8
## + percentage.expenditure 1 18206 79610 6226.3
## + Hepatitis.B 1 8291 89525 6412.9
## + Total.expenditure 1 4313 93503 6482.1
## + infant.deaths 1 3390 94426 6497.7
## + Measles 1 516 97300 6545.4
## + Population 1 135 97681 6551.6
## <none> 97816 6551.8
##
## Step: AIC=5257.55
## Life.expectancy ~ Schooling
##
## Df Sum of Sq RSS AIC
## + Adult.Mortality 1 15022.0 28265 4581.8
## + HIV_AIDS 1 14988.1 28299 4583.8
## + Income.composition.of.resources 1 5267.1 38020 5053.3
## + BMI 1 2517.9 40769 5164.3
## + GDP 1 1676.9 41610 5196.7
## + percentage.expenditure 1 1657.6 41629 5197.5
## + thinness.5_9.years 1 1598.0 41689 5199.7
## + thinness.10_19.years 1 1206.0 42081 5214.6
## + Polio 1 1046.5 42241 5220.6
## + Diphtheria 1 901.3 42386 5226.1
## + Status 1 887.3 42400 5226.6
## + Hepatitis.B 1 184.0 43103 5252.8
## + Alcohol 1 93.7 43193 5256.1
## + infant.deaths 1 65.8 43221 5257.1
## + Total.expenditure 1 56.4 43231 5257.5
## <none> 43287 5257.5
## + Measles 1 25.8 43261 5258.6
## + Population 1 1.6 43285 5259.5
##
## Step: AIC=4581.85
## Life.expectancy ~ Schooling + Adult.Mortality
##
## Df Sum of Sq RSS AIC
## + HIV_AIDS 1 5173.7 23091 4262.4
## + Income.composition.of.resources 1 2642.8 25622 4427.8
## + BMI 1 1020.4 27245 4525.4
## + GDP 1 947.6 27317 4529.6
## + percentage.expenditure 1 925.2 27340 4530.9
## + thinness.5_9.years 1 764.0 27501 4540.3
## + thinness.10_19.years 1 706.1 27559 4543.6
## + Polio 1 479.0 27786 4556.7
## + Diphtheria 1 451.5 27814 4558.2
## + Status 1 340.2 27925 4564.6
## + infant.deaths 1 172.0 28093 4574.1
## + Hepatitis.B 1 100.4 28165 4578.2
## <none> 28265 4581.8
## + Total.expenditure 1 25.5 28240 4582.4
## + Alcohol 1 24.1 28241 4582.5
## + Population 1 17.8 28247 4582.8
## + Measles 1 0.1 28265 4583.8
##
## Step: AIC=4262.4
## Life.expectancy ~ Schooling + Adult.Mortality + HIV_AIDS
##
## Df Sum of Sq RSS AIC
## + Income.composition.of.resources 1 2383.53 20708 4091.2
## + percentage.expenditure 1 1117.44 21974 4185.5
## + GDP 1 1108.05 21983 4186.2
## + BMI 1 698.64 22393 4215.5
## + thinness.5_9.years 1 502.29 22589 4229.4
## + thinness.10_19.years 1 500.41 22591 4229.6
## + Status 1 363.47 22728 4239.2
## + Total.expenditure 1 211.74 22880 4249.8
## + Diphtheria 1 163.51 22928 4253.1
## + Alcohol 1 147.89 22943 4254.2
## + infant.deaths 1 145.57 22946 4254.3
## + Polio 1 140.98 22950 4254.7
## <none> 23091 4262.4
## + Population 1 28.61 23063 4262.4
## + Hepatitis.B 1 5.36 23086 4264.0
## + Measles 1 1.11 23090 4264.3
##
## Step: AIC=4091.17
## Life.expectancy ~ Schooling + Adult.Mortality + HIV_AIDS + Income.composition.of.resources
##
## Df Sum of Sq RSS AIC
## + percentage.expenditure 1 784.52 19923 4031.8
## + GDP 1 729.44 19978 4036.2
## + BMI 1 446.26 20262 4058.5
## + thinness.5_9.years 1 304.70 20403 4069.6
## + thinness.10_19.years 1 289.35 20418 4070.8
## + Total.expenditure 1 209.53 20498 4077.0
## + infant.deaths 1 206.25 20502 4077.3
## + Status 1 201.28 20507 4077.6
## + Diphtheria 1 82.48 20625 4086.8
## + Polio 1 70.22 20638 4087.8
## + Population 1 46.81 20661 4089.6
## <none> 20708 4091.2
## + Alcohol 1 7.60 20700 4092.6
## + Measles 1 1.81 20706 4093.0
## + Hepatitis.B 1 1.51 20706 4093.1
##
## Step: AIC=4031.76
## Life.expectancy ~ Schooling + Adult.Mortality + HIV_AIDS + Income.composition.of.resources +
## percentage.expenditure
##
## Df Sum of Sq RSS AIC
## + BMI 1 459.75 19464 3996.6
## + thinness.5_9.years 1 255.24 19668 4013.3
## + thinness.10_19.years 1 248.96 19674 4013.8
## + infant.deaths 1 198.64 19725 4017.8
## + Total.expenditure 1 143.66 19780 4022.3
## + Diphtheria 1 92.69 19831 4026.4
## + Polio 1 87.94 19835 4026.7
## + Population 1 45.13 19878 4030.2
## + Status 1 36.88 19886 4030.8
## <none> 19923 4031.8
## + Hepatitis.B 1 17.28 19906 4032.4
## + Alcohol 1 10.71 19913 4032.9
## + GDP 1 1.67 19922 4033.6
## + Measles 1 0.37 19923 4033.7
##
## Step: AIC=3996.64
## Life.expectancy ~ Schooling + Adult.Mortality + HIV_AIDS + Income.composition.of.resources +
## percentage.expenditure + BMI
##
## Df Sum of Sq RSS AIC
## + Diphtheria 1 129.618 19334 3988.0
## + infant.deaths 1 117.086 19346 3989.1
## + Polio 1 115.371 19348 3989.2
## + Total.expenditure 1 105.723 19358 3990.0
## + thinness.10_19.years 1 73.332 19390 3992.6
## + thinness.5_9.years 1 71.556 19392 3992.8
## + Status 1 37.304 19426 3995.6
## + Hepatitis.B 1 29.100 19434 3996.3
## <none> 19464 3996.6
## + Population 1 24.439 19439 3996.6
## + Alcohol 1 15.874 19448 3997.3
## + Measles 1 4.318 19459 3998.3
## + GDP 1 2.880 19461 3998.4
##
## Step: AIC=3988.02
## Life.expectancy ~ Schooling + Adult.Mortality + HIV_AIDS + Income.composition.of.resources +
## percentage.expenditure + BMI + Diphtheria
##
## Df Sum of Sq RSS AIC
## + Total.expenditure 1 87.195 19247 3982.8
## + infant.deaths 1 84.773 19249 3983.0
## + thinness.10_19.years 1 78.368 19256 3983.6
## + thinness.5_9.years 1 68.258 19266 3984.4
## + Hepatitis.B 1 33.278 19301 3987.3
## + Status 1 31.775 19302 3987.4
## + Alcohol 1 28.304 19306 3987.7
## <none> 19334 3988.0
## + Population 1 16.356 19318 3988.7
## + Measles 1 8.449 19325 3989.3
## + Polio 1 2.701 19331 3989.8
## + GDP 1 1.355 19333 3989.9
##
## Step: AIC=3982.83
## Life.expectancy ~ Schooling + Adult.Mortality + HIV_AIDS + Income.composition.of.resources +
## percentage.expenditure + BMI + Diphtheria + Total.expenditure
##
## Df Sum of Sq RSS AIC
## + infant.deaths 1 72.396 19174 3978.8
## + thinness.10_19.years 1 63.923 19183 3979.5
## + thinness.5_9.years 1 53.326 19193 3980.4
## + Alcohol 1 34.439 19212 3982.0
## + Hepatitis.B 1 33.222 19214 3982.1
## + Status 1 26.246 19221 3982.7
## <none> 19247 3982.8
## + Measles 1 13.391 19233 3983.7
## + Population 1 12.352 19234 3983.8
## + Polio 1 3.237 19244 3984.6
## + GDP 1 2.116 19245 3984.7
##
## Step: AIC=3978.84
## Life.expectancy ~ Schooling + Adult.Mortality + HIV_AIDS + Income.composition.of.resources +
## percentage.expenditure + BMI + Diphtheria + Total.expenditure +
## infant.deaths
##
## Df Sum of Sq RSS AIC
## + Measles 1 84.473 19090 3973.8
## + Hepatitis.B 1 34.805 19140 3978.0
## + Alcohol 1 29.598 19145 3978.4
## + Status 1 26.751 19148 3978.6
## <none> 19174 3978.8
## + thinness.10_19.years 1 23.899 19150 3978.9
## + thinness.5_9.years 1 17.443 19157 3979.4
## + Population 1 9.306 19165 3980.1
## + GDP 1 2.314 19172 3980.6
## + Polio 1 1.986 19172 3980.7
##
## Step: AIC=3973.82
## Life.expectancy ~ Schooling + Adult.Mortality + HIV_AIDS + Income.composition.of.resources +
## percentage.expenditure + BMI + Diphtheria + Total.expenditure +
## infant.deaths + Measles
##
## Df Sum of Sq RSS AIC
## + Hepatitis.B 1 34.563 19055 3972.9
## + Alcohol 1 29.720 19060 3973.3
## + Status 1 28.268 19062 3973.5
## <none> 19090 3973.8
## + thinness.10_19.years 1 14.092 19076 3974.6
## + Population 1 12.580 19077 3974.8
## + thinness.5_9.years 1 8.747 19081 3975.1
## + Polio 1 3.251 19087 3975.5
## + GDP 1 2.250 19088 3975.6
##
## Step: AIC=3972.94
## Life.expectancy ~ Schooling + Adult.Mortality + HIV_AIDS + Income.composition.of.resources +
## percentage.expenditure + BMI + Diphtheria + Total.expenditure +
## infant.deaths + Measles + Hepatitis.B
##
## Df Sum of Sq RSS AIC
## + Alcohol 1 37.094 19018 3971.8
## + Status 1 27.610 19028 3972.6
## <none> 19055 3972.9
## + Population 1 12.271 19043 3973.9
## + thinness.10_19.years 1 11.856 19043 3973.9
## + Polio 1 7.734 19048 3974.3
## + thinness.5_9.years 1 6.623 19049 3974.4
## + GDP 1 3.776 19052 3974.6
##
## Step: AIC=3971.84
## Life.expectancy ~ Schooling + Adult.Mortality + HIV_AIDS + Income.composition.of.resources +
## percentage.expenditure + BMI + Diphtheria + Total.expenditure +
## infant.deaths + Measles + Hepatitis.B + Alcohol
##
## Df Sum of Sq RSS AIC
## + Status 1 71.394 18947 3967.9
## <none> 19018 3971.8
## + thinness.10_19.years 1 21.639 18997 3972.0
## + thinness.5_9.years 1 13.487 19005 3972.7
## + Population 1 10.846 19007 3972.9
## + Polio 1 8.924 19009 3973.1
## + GDP 1 4.738 19013 3973.4
##
## Step: AIC=3967.86
## Life.expectancy ~ Schooling + Adult.Mortality + HIV_AIDS + Income.composition.of.resources +
## percentage.expenditure + BMI + Diphtheria + Total.expenditure +
## infant.deaths + Measles + Hepatitis.B + Alcohol + Status
##
## Df Sum of Sq RSS AIC
## <none> 18947 3967.9
## + thinness.10_19.years 1 23.2693 18924 3967.9
## + thinness.5_9.years 1 13.9540 18933 3968.7
## + Population 1 11.6307 18935 3968.9
## + Polio 1 8.2642 18939 3969.2
## + GDP 1 2.7072 18944 3969.6
##
## Call:
## lm(formula = Life.expectancy ~ Schooling + Adult.Mortality +
## HIV_AIDS + Income.composition.of.resources + percentage.expenditure +
## BMI + Diphtheria + Total.expenditure + infant.deaths + Measles +
## Hepatitis.B + Alcohol + Status, data = life_clean1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.785 -1.980 -0.008 2.263 13.405
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 55.41212847 0.72528872 76.400
## Schooling 0.86616701 0.05822545 14.876
## Adult.Mortality -0.01758518 0.00106863 -16.456
## HIV_AIDS -0.59567125 0.03083829 -19.316
## Income.composition.of.resources 9.67520937 0.79952369 12.101
## percentage.expenditure 0.00040704 0.00005772 7.053
## BMI 0.03242806 0.00546683 5.932
## Diphtheria>=90% Covered 1.09897888 0.33539972 3.277
## Total.expenditure 0.10607817 0.04028236 2.633
## infant.deaths -0.00295954 0.00087412 -3.386
## Measles 0.00002824 0.00001049 2.691
## Hepatitis.B>=90% Covered -0.59461631 0.30495556 -1.950
## Alcohol -0.08341745 0.03216117 -2.594
## StatusDeveloping -0.79640456 0.32680840 -2.437
## Pr(>|t|)
## (Intercept) < 0.0000000000000002 ***
## Schooling < 0.0000000000000002 ***
## Adult.Mortality < 0.0000000000000002 ***
## HIV_AIDS < 0.0000000000000002 ***
## Income.composition.of.resources < 0.0000000000000002 ***
## percentage.expenditure 0.00000000000262 ***
## BMI 0.00000000367619 ***
## Diphtheria>=90% Covered 0.001073 **
## Total.expenditure 0.008537 **
## infant.deaths 0.000727 ***
## Measles 0.007195 **
## Hepatitis.B>=90% Covered 0.051372 .
## Alcohol 0.009582 **
## StatusDeveloping 0.014923 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.467 on 1576 degrees of freedom
## Multiple R-squared: 0.8063, Adjusted R-squared: 0.8047
## F-statistic: 504.6 on 13 and 1576 DF, p-value: < 0.00000000000000022
Unfortunately, the adjusted R-squared dropped from 0.8245570 to 0.8047 after the outliers were removed. This is not the improvement we were hoping for. Therefore, we will keep the original data, outliers included.
log_life <- lm(formula = log1p(Life.expectancy) ~ Status + log1p(Adult.Mortality) + log1p(infant.deaths) +
log1p(Alcohol) + log1p(percentage.expenditure) + Hepatitis.B + log1p(Measles) +
log1p(BMI) + log1p(Total.expenditure) + Diphtheria + log1p(HIV_AIDS) + log1p(thinness.10_19.years) + log1p(Income.composition.of.resources) + log1p(Schooling), data = life_selected)
summary(log_life)
##
## Call:
## lm(formula = log1p(Life.expectancy) ~ Status + log1p(Adult.Mortality) +
## log1p(infant.deaths) + log1p(Alcohol) + log1p(percentage.expenditure) +
## Hepatitis.B + log1p(Measles) + log1p(BMI) + log1p(Total.expenditure) +
## Diphtheria + log1p(HIV_AIDS) + log1p(thinness.10_19.years) +
## log1p(Income.composition.of.resources) + log1p(Schooling),
## data = life_selected)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.255542 -0.028055 0.001655 0.030288 0.189695
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 3.9699293 0.0252575 157.178
## StatusDeveloping -0.0065261 0.0045033 -1.449
## log1p(Adult.Mortality) -0.0099200 0.0013338 -7.438
## log1p(infant.deaths) -0.0054746 0.0011774 -4.650
## log1p(Alcohol) 0.0029923 0.0020038 1.493
## log1p(percentage.expenditure) 0.0066102 0.0008564 7.719
## Hepatitis.B>=90% Covered -0.0090513 0.0044033 -2.056
## log1p(Measles) -0.0002588 0.0005172 -0.500
## log1p(BMI) 0.0024199 0.0019714 1.228
## log1p(Total.expenditure) 0.0068497 0.0035765 1.915
## Diphtheria>=90% Covered 0.0128582 0.0047825 2.689
## log1p(HIV_AIDS) -0.0923361 0.0019639 -47.018
## log1p(thinness.10_19.years) -0.0096884 0.0024131 -4.015
## log1p(Income.composition.of.resources) 0.1784654 0.0157667 11.319
## log1p(Schooling) 0.0995534 0.0096981 10.265
## Pr(>|t|)
## (Intercept) < 0.0000000000000002 ***
## StatusDeveloping 0.14748
## log1p(Adult.Mortality) 0.0000000000001648 ***
## log1p(infant.deaths) 0.0000035909852119 ***
## log1p(Alcohol) 0.13555
## log1p(percentage.expenditure) 0.0000000000000203 ***
## Hepatitis.B>=90% Covered 0.03998 *
## log1p(Measles) 0.61695
## log1p(BMI) 0.21979
## log1p(Total.expenditure) 0.05564 .
## Diphtheria>=90% Covered 0.00725 **
## log1p(HIV_AIDS) < 0.0000000000000002 ***
## log1p(thinness.10_19.years) 0.0000621539384934 ***
## log1p(Income.composition.of.resources) < 0.0000000000000002 ***
## log1p(Schooling) < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0515 on 1634 degrees of freedom
## Multiple R-squared: 0.851, Adjusted R-squared: 0.8498
## F-statistic: 666.8 on 14 and 1634 DF, p-value: < 0.00000000000000022
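This looks promising: the adjusted R-squared (0.8498) is even higher than that of “model_backward”. Note that the response is now on a log scale, so the two values are only indicative and not strictly comparable.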
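Because the two models are fitted on different response scales, one way to compare them more fairly is to back-transform the log-scale predictions and compute R-squared against the untransformed response. The sketch below illustrates the idea on the built-in `mtcars` data, which is only a stand-in; the helper `r2_original_scale` and the variables used are assumptions for illustration, not part of the original analysis.

```r
# Stand-in models: one on the raw response, one on log1p(response)
fit_raw <- lm(mpg ~ wt + hp, data = mtcars)
fit_log <- lm(log1p(mpg) ~ wt + hp, data = mtcars)

# Back-transform log-scale fitted values with expm1(), the inverse of log1p()
pred_raw <- fitted(fit_raw)
pred_log <- expm1(fitted(fit_log))

# Hypothetical helper: R-squared measured against the untransformed response
r2_original_scale <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

r2_raw <- r2_original_scale(mtcars$mpg, pred_raw)
r2_log <- r2_original_scale(mtcars$mpg, pred_log)
c(raw = r2_raw, log = r2_log)
```

The back-transformed predictions are not exact conditional means, so this comparison is still approximate, but both numbers now refer to the same scale.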
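# bc is assumed to hold the Box-Cox profile computed earlier
# (lambda grid in bc$x, log-likelihoods in bc$y)
lambda <- bc$x[which.max(bc$y)] # choose the lambda with the highest log-likelihood
# note: this name masks car::powerTransform() if the car package is attached
powerTransform <- function(y, lambda1, lambda2 = NULL, method = "boxcox") {
  boxcoxTrans <- function(x, lam1, lam2 = NULL) {
    # if lambda2 is not supplied, set it to zero: this gives the one-parameter transformation
    lam2 <- if (is.null(lam2)) 0 else lam2
    if (lam1 == 0) {
      log(x + lam2)
    } else {
      (((x + lam2)^lam1) - 1) / lam1
    }
  }
  switch(method,
         boxcox = boxcoxTrans(y, lambda1, lambda2),
         tukey = y^lambda1
  )
}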
# re-run with transformation
boxcox_life <- lm(powerTransform(Life.expectancy, lambda) ~ Status + Adult.Mortality + infant.deaths +
Alcohol + percentage.expenditure + Hepatitis.B + Measles +
BMI + Total.expenditure + Diphtheria + HIV_AIDS + thinness.10_19.years +
Income.composition.of.resources + Schooling, data = life_selected)
summary(boxcox_life)
##
## Call:
## lm(formula = powerTransform(Life.expectancy, lambda) ~ Status +
## Adult.Mortality + infant.deaths + Alcohol + percentage.expenditure +
## Hepatitis.B + Measles + BMI + Total.expenditure + Diphtheria +
## HIV_AIDS + thinness.10_19.years + Income.composition.of.resources +
## Schooling, data = life_selected)
##
## Residuals:
## Min 1Q Median 3Q Max
## -216.690 -32.484 0.082 34.665 194.624
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 423.2163283 11.8150671 35.820
## StatusDeveloping -18.4114001 5.0610413 -3.638
## Adult.Mortality -0.2569250 0.0141492 -18.158
## infant.deaths -0.0275159 0.0149145 -1.845
## Alcohol -2.2417078 0.4947660 -4.531
## percentage.expenditure 0.0068207 0.0008978 7.597
## Hepatitis.B>=90% Covered -9.9855002 4.6088322 -2.167
## Measles 0.0002341 0.0001576 1.486
## BMI 0.5115523 0.0892458 5.732
## Total.expenditure 1.2746992 0.6108360 2.087
## Diphtheria>=90% Covered 20.8908095 5.0661567 4.124
## HIV_AIDS -5.5564512 0.2704781 -20.543
## thinness.10_19.years -0.7798239 0.4082681 -1.910
## Income.composition.of.resources 153.2651542 12.4259713 12.334
## Schooling 13.0819270 0.8964068 14.594
## Pr(>|t|)
## (Intercept) < 0.0000000000000002 ***
## StatusDeveloping 0.000283 ***
## Adult.Mortality < 0.0000000000000002 ***
## infant.deaths 0.065231 .
## Alcohol 0.0000063012976700 ***
## percentage.expenditure 0.0000000000000507 ***
## Hepatitis.B>=90% Covered 0.030409 *
## Measles 0.137488
## BMI 0.0000000118010916 ***
## Total.expenditure 0.037060 *
## Diphtheria>=90% Covered 0.0000391729179153 ***
## HIV_AIDS < 0.0000000000000002 ***
## thinness.10_19.years 0.056298 .
## Income.composition.of.resources < 0.0000000000000002 ***
## Schooling < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 53.99 on 1634 degrees of freedom
## Multiple R-squared: 0.8212, Adjusted R-squared: 0.8197
## F-statistic: 536 on 14 and 1634 DF, p-value: < 0.00000000000000022
The adjusted R-squared (0.8197) is lower than that of “model_backward”, so we will not use the Box-Cox transformation and will adopt the log-transformed model as our new model.
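The Box-Cox step relies on an object holding a grid of candidate lambdas and their profile log-likelihoods. A minimal sketch of how such an object can be obtained with `MASS::boxcox()` is shown below; it uses the built-in `mtcars` data as a stand-in, and the names `fit_demo`, `bc_demo`, `lambda_demo`, and `y_bc` are assumptions for illustration rather than the objects used in this report.

```r
library(MASS)  # provides boxcox()

# Fit with y = TRUE and qr = TRUE so boxcox() can reuse the stored pieces
fit_demo <- lm(mpg ~ wt + hp, data = mtcars, y = TRUE, qr = TRUE)

# Profile log-likelihood over a grid of lambdas, without drawing the plot
bc_demo <- boxcox(fit_demo, lambda = seq(-2, 2, by = 0.1), plotit = FALSE)

# Lambda with the highest log-likelihood
lambda_demo <- bc_demo$x[which.max(bc_demo$y)]

# Apply the one-parameter Box-Cox transformation to the response
y_bc <- if (abs(lambda_demo) < 1e-8) {
  log(mtcars$mpg)
} else {
  (mtcars$mpg^lambda_demo - 1) / lambda_demo
}
lambda_demo
```

A tolerance is used instead of an exact comparison with zero because the lambda grid is built from floating-point steps.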
The residuals appear to be better distributed around the center.
Points below -2 and above 2 fall well away from the center line, so the residuals still do not appear to follow a normal distribution.
## -----------------------------------------------
## Test Statistic pvalue
## -----------------------------------------------
## Shapiro-Wilk 0.9798 0.0000
## Kolmogorov-Smirnov 0.0439 0.0035
## Cramer-von Mises 497.4152 0.1200
## Anderson-Darling 6.03 0.0000
## -----------------------------------------------
Based on the Shapiro-Wilk, Kolmogorov-Smirnov, and Anderson-Darling normality tests, the p-values are < 0.05, implying that the distribution of the residuals differs significantly from a normal distribution, so some adjustment to the data would be needed. Only the Cramer-von Mises test gives a p-value > 0.05. We therefore conclude that the residuals still do not follow a normal distribution.
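For readers who want to reproduce this kind of check without extra packages, two of the tests are available in base R. The sketch below applies them to the residuals of a stand-in model on `mtcars`; the model and object names are assumptions for illustration. The Kolmogorov-Smirnov test here uses standardized residuals against a standard normal, which is only approximate because the mean and standard deviation are estimated from the same data.

```r
# Stand-in model; replace with the model of interest
fit_demo <- lm(mpg ~ wt + hp, data = mtcars)
res <- residuals(fit_demo)

# Shapiro-Wilk test: H0 = residuals are normally distributed
sw <- shapiro.test(res)

# Kolmogorov-Smirnov test on standardized residuals against N(0, 1)
ks <- ks.test(as.numeric(scale(res)), "pnorm")

# Small p-values (< 0.05) suggest a departure from normality
normality_p <- c(shapiro = sw$p.value, ks = ks$p.value)
normality_p
```

With well over a thousand observations, as in this dataset, such tests can flag even small departures from normality, which is one reason to read them alongside the plots.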
Judging visually from the plot, the residuals do not appear to follow any particular pattern.
##
## studentized Breusch-Pagan test
##
## data: log_life
## BP = 81.344, df = 14, p-value = 0.00000000001592
##
## Breusch Pagan Test for Heteroskedasticity
## -----------------------------------------
## Ho: the variance is constant
## Ha: the variance is not constant
##
## Data
## --------------------------------------------------
## Response : log1p(Life.expectancy)
## Variables: fitted values of log1p(Life.expectancy)
##
## Test Summary
## ----------------------------------------------
## DF = 1
## Chi2 = 85.30346
## Prob > Chi2 = 0.0000000000000000000255915
Using two different functions to test for homoscedasticity, we reach the same conclusion: the residual variance is not constant.
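The studentized Breusch-Pagan statistic can also be computed by hand, which makes clear what the test measures: the squared residuals are regressed on the predictors, and n times the R-squared of that auxiliary regression is compared with a chi-squared distribution. The sketch below does this in base R on a stand-in `mtcars` model; all object names are assumptions for illustration.

```r
# Stand-in model; replace with the model of interest
fit_demo <- lm(mpg ~ wt + hp, data = mtcars)

# Auxiliary regression: squared residuals on the same predictors
aux_data <- transform(mtcars, sq_res = residuals(fit_demo)^2)
aux <- lm(sq_res ~ wt + hp, data = aux_data)

# Studentized (Koenker) Breusch-Pagan statistic: n * R-squared of the auxiliary fit
n <- nobs(fit_demo)
bp_stat <- n * summary(aux)$r.squared
bp_df <- length(coef(aux)) - 1  # number of predictors in the auxiliary regression
bp_p <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

c(BP = bp_stat, df = bp_df, p_value = bp_p)
```

A small p-value rejects the null hypothesis of constant variance, matching the interpretation used above.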
## Status
## 1.578679
## log1p(Adult.Mortality)
## 1.210762
## log1p(infant.deaths)
## 2.340268
## log1p(Alcohol)
## 1.866246
## log1p(percentage.expenditure)
## 1.678846
## Hepatitis.B
## 3.013337
## log1p(Measles)
## 1.754956
## log1p(BMI)
## 1.351176
## log1p(Total.expenditure)
## 1.100274
## Diphtheria
## 3.478853
## log1p(HIV_AIDS)
## 1.534598
## log1p(thinness.10_19.years)
## 1.864135
## log1p(Income.composition.of.resources)
## 2.353177
## log1p(Schooling)
## 3.115301
All VIF values are below 10, so we conclude that there is no problematic multicollinearity among the independent variables.
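The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The sketch below computes it by hand for numeric predictors on the stand-in `mtcars` data; `vif_manual` is a hypothetical helper written for illustration, and it does not handle factor predictors the way dedicated packages do.

```r
# Hypothetical helper: VIF for each numeric predictor, computed from
# the R-squared of regressing it on the remaining predictors
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    aux <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(aux)$r.squared)
  })
}

vifs <- vif_manual(mtcars, c("wt", "hp", "disp"))
vifs

# Flag predictors above the threshold of 10 used in this report
names(vifs)[vifs > 10]
```

A VIF close to 1 means the predictor is nearly uncorrelated with the others; larger values mean more of its variance is explained by them.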
# helper to run cor.test() of the target against every column at once
cor.test.all <- function(data, target) {
  names <- names(data)
  df <- NULL
  for (i in seq_along(names)) {
    y <- target
    x <- names[[i]]
    # p-value of the correlation test between the target and column x
    p_value <- cor.test(data[, y], data[, x])$p.value
    temp <- data.frame(x = x,
                       y = y,
                       p_value = as.numeric(p_value))
    df <- rbind(df, temp)
  }
  return(df)
}
data_num2 <- life_selected %>%
  select(Life.expectancy, Status, Adult.Mortality, infant.deaths, Alcohol, percentage.expenditure, Hepatitis.B, Measles,
         BMI, Total.expenditure, Diphtheria, HIV_AIDS, thinness.10_19.years,
         Income.composition.of.resources, Schooling) %>% # select only variables in model_backward
  select_if(is.numeric) # non-numeric columns are dropped here
p_value <- cor.test.all(data_num2, "Life.expectancy")
p_value %>%
  filter(p_value > 0.05)
## [1] x y p_value
## <0 rows> (or 0-length row.names)
All selected numeric variables are significantly linearly correlated with the dependent variable, since none has a p-value > 0.05.
The linear model appears suitable for predicting Life.expectancy based on its adjusted R-squared and error values, and it passes 2 of the 4 assumption checks: multicollinearity and linearity. However, the normality and homoscedasticity checks do not give the expected results. The residual plots look roughly consistent with normality and homoscedasticity, but the statistical tests indicate otherwise.
The linear model can be used to explain the linear relationship between Life.expectancy and the selected independent variables. However, this model is highly sensitive to outliers, which are numerous in this data and which we chose not to remove. We therefore strongly recommend examining the outlier pattern before applying this model to a new set of Life.expectancy data.
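One possible way to examine the outlier pattern is to look at Cook's distance, which measures how much each observation influences the fitted coefficients. The sketch below uses base R on a stand-in `mtcars` model; the cutoff of 4/n is a common rule of thumb rather than something taken from this analysis, and all object names are assumptions for illustration.

```r
# Stand-in model; replace with the model of interest
fit_demo <- lm(mpg ~ wt + hp, data = mtcars)

# Cook's distance for every observation
cooks <- cooks.distance(fit_demo)

# Rule-of-thumb cutoff (an assumption, not a strict criterion)
cutoff <- 4 / nobs(fit_demo)
influential <- which(cooks > cutoff)

# Inspect the flagged rows, most influential first
flagged <- mtcars[influential, c("mpg", "wt", "hp")]
flagged[order(cooks[influential], decreasing = TRUE), ]
```

Checking whether the flagged rows share a country, year range, or status would show whether the outliers follow a pattern or are scattered.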