We use the following libraries (car and lmtest provide regression diagnostics):
library(car)
library(lmtest)
rm(list=ls())
adot <- read.csv("Life_Expectancy_Data.csv", header = TRUE)  # comma-separated file with "." as decimal mark
head(adot)
Data selection: we keep only the observations from the year 2011.
adot <- adot[adot$Year == 2011, ]  # explicit column reference instead of attach()
adot
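`attach()` puts a copy of the data frame on the search path, so later reassignments of `adot` are not reflected in it and repeated calls produce masking warnings; referring to `adot$Year` directly avoids both problems. A minimal sketch on a small made-up data frame (the name `toy` and all values are hypothetical) shows that bracket indexing and `subset()` select the same rows:

```r
# Hypothetical toy data standing in for Life_Expectancy_Data.csv
toy <- data.frame(
  Country = c("A", "A", "B", "B", "C"),
  Year = c(2010, 2011, 2010, 2011, 2011),
  Life_expectancy = c(70.1, 70.4, 65.0, 65.3, 80.2)
)

# Row selection with an explicit column reference (no attach() needed)
toy_2011 <- toy[toy$Year == 2011, ]

# Equivalent selection with subset(), which looks up Year inside toy
toy_2011_subset <- subset(toy, Year == 2011)

print(toy_2011)
```

One difference worth knowing: if the condition contains NA, `[` returns an all-NA row for it, while `subset()` silently drops that row.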
We keep only the response variable and the three regressors:
adot <- adot[,c("Life_expectancy","Alcohol","percentage_expenditure","BMI")]
Records with missing values (NA) in any of the selected columns are deleted:
adot <- na.omit(adot)
summary(adot)
 Life_expectancy    Alcohol       percentage_expenditure      BMI
 Min.   :48.90   Min.   : 0.010   Min.   :    0.00        Min.   : 2.50
 1st Qu.:64.30   1st Qu.: 1.230   1st Qu.:   20.83        1st Qu.:22.00
 Median :73.40   Median : 4.090   Median :  158.28        Median :45.30
 Mean   :70.78   Mean   : 4.887   Mean   : 1039.82        Mean   :39.85
 3rd Qu.:76.10   3rd Qu.: 8.110   3rd Qu.:  683.92        3rd Qu.:57.90
 Max.   :88.00   Max.   :17.310   Max.   :18822.87        Max.   :75.70
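`na.omit()` performs listwise deletion: a row is dropped as soon as any selected column is missing. A short sketch on hypothetical values shows how to count the missing values per column beforehand and how many rows the deletion removes, which is worth reporting next to `summary()`:

```r
# Hypothetical values; NA marks a missing entry
toy <- data.frame(
  Life_expectancy = c(70.4, 65.3, NA, 80.2),
  Alcohol = c(4.1, NA, 2.0, 9.5),
  BMI = c(45.3, 22.0, 57.9, 60.1)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

# Listwise deletion: keep only rows without any NA
toy_complete <- na.omit(toy)

# Number of rows removed by na.omit()
n_dropped <- nrow(toy) - nrow(toy_complete)

print(na_per_column)
print(n_dropped)
```

The indices of the dropped rows are also stored in `attr(toy_complete, "na.action")`.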
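# GDPPerCapita <- adot$GDP / adot$Population  # would require keeping GDP and Population in the column selection above
result <- lm(Life_expectancy ~ Alcohol + BMI + percentage_expenditure, data = adot)  # the intercept is included by default
summary(result)
Call:
lm(formula = Life_expectancy ~ Alcohol + BMI + percentage_expenditure, data = adot)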
Residuals:
     Min       1Q   Median       3Q      Max
-17.2645  -4.0634   0.6862   4.0606  17.3444
Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)            5.993e+01  1.190e+00  50.356  < 2e-16 ***
Alcohol                5.215e-01  1.377e-01   3.786 0.000209 ***
BMI                    1.879e-01  2.582e-02   7.277 1.07e-11 ***
percentage_expenditure 7.888e-04  2.057e-04   3.834 0.000175 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.736 on 177 degrees of freedom
Multiple R-squared:  0.4344,  Adjusted R-squared:  0.4248
F-statistic: 45.31 on 3 and 177 DF,  p-value: < 2.2e-16
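The fitted equation and the goodness-of-fit figures in this output can be checked by hand. With $n = 181$ observations (177 residual degrees of freedom plus 4 estimated coefficients) and $k = 3$ regressors:

```latex
\widehat{\text{Life\_expectancy}} = 59.93 + 0.5215\,\text{Alcohol} + 0.1879\,\text{BMI} + 0.0007888\,\text{percentage\_expenditure}

\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} = 1 - (1 - 0.4344)\,\frac{180}{177} \approx 0.4248

F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)} = \frac{0.4344 / 3}{0.5656 / 177} \approx 45.31
```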
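All three estimated coefficients are positive and significant at the 0.1% level. Holding the other regressors fixed, a one-unit increase in Alcohol is associated with about 0.52 more years of life expectancy, a one-unit increase in BMI with about 0.19 more years, and an increase of 1000 in percentage_expenditure with about 0.79 more years. These are associations, not causal effects; in particular, the positive sign of Alcohol is counter-intuitive and we are not able to explain it with this model. Since the model explains only about 43% of the variance of life expectancy (adjusted R-squared 0.42), it is recommended to include additional variables from the original database.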
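One possible reason for the counter-intuitive sign is omitted-variable bias: if a variable left out of the model (for example national income) raises both alcohol consumption and life expectancy, the alcohol coefficient absorbs part of that effect. The simulation below is purely hypothetical (the variables `gdp`, `alcohol`, `life` and all coefficients are invented for illustration, not estimated from this dataset): alcohol has no true effect, yet it receives a positive coefficient until `gdp` is added to the model.

```r
set.seed(1)
n <- 5000

# Hypothetical confounder, e.g. a standardized income level
gdp <- rnorm(n)

# Alcohol consumption rises with income but has no effect of its own on life
alcohol <- 2 * gdp + rnorm(n)

# Life expectancy depends on income only
life <- 70 + 5 * gdp + rnorm(n, sd = 2)

# Short model omits the confounder, so alcohol picks up part of the gdp effect
short_model <- lm(life ~ alcohol)

# Long model includes the confounder, so alcohol's coefficient falls to about 0
long_model <- lm(life ~ alcohol + gdp)

coef(short_model)["alcohol"]  # approximately 2 (= cov(alcohol, life) / var(alcohol) = 10 / 5)
coef(long_model)["alcohol"]   # approximately 0
```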
par(mfrow = c(2, 2))  # show the four diagnostic plots in one 2x2 panel
plot(result)
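Interpret the graphs (Residuals vs Fitted: linearity; Normal Q-Q: normality of the residuals; Scale-Location: constant variance; Residuals vs Leverage: influential observations)…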
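The visual checks can be complemented by formal tests, which is what the lmtest and car packages loaded at the start provide (for example `lmtest::bptest()` for heteroskedasticity and `car::vif()` for multicollinearity). To show what those numbers mean, the sketch below computes the studentized (Koenker) Breusch-Pagan statistic and the variance inflation factors by hand in base R on simulated data; the data and the helper names `bp_test` and `vif_manual` are hypothetical.

```r
set.seed(42)
n <- 200

# Hypothetical regressors; x2 is built to be correlated with x1
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)
x3 <- rnorm(n)

# Error standard deviation grows with x3, so the errors are heteroskedastic by design
y <- 1 + x1 + x2 + x3 + rnorm(n, sd = exp(0.5 * x3))
dat <- data.frame(y, x1, x2, x3)

model <- lm(y ~ x1 + x2 + x3, data = dat)

# Studentized Breusch-Pagan test: regress the squared residuals on the
# regressors; the statistic n * R^2 is compared with a chi-squared(k) law
bp_test <- function(model) {
  e2 <- residuals(model)^2
  X <- model.matrix(model)[, -1, drop = FALSE]  # regressors without the intercept
  aux <- lm(e2 ~ X)
  stat <- length(e2) * summary(aux)$r.squared
  df <- ncol(X)
  c(statistic = stat, df = df, p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Variance inflation factors: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
# regressing regressor j on all the other regressors
vif_manual <- function(model) {
  X <- model.matrix(model)[, -1, drop = FALSE]
  sapply(colnames(X), function(j) {
    r2 <- summary(lm(X[, j] ~ X[, colnames(X) != j]))$r.squared
    1 / (1 - r2)
  })
}

bp_test(model)
vif_manual(model)
```

Applied to the fitted model above, `lmtest::bptest(result)` should agree with `bp_test(result)`, since `bptest()` uses the studentized version by default; a small p-value points to heteroskedasticity (visible as a funnel shape in the Scale-Location plot), and VIF values well above 1 indicate correlated regressors.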