The attached who.csv dataset contains real-world data from 2008. The variables are as follows.
Country: name of the country
LifeExp: average life expectancy for the country in years
InfantSurvival: proportion of those surviving to one year or more
Under5Survival: proportion of those surviving to five years or more
TBFree: proportion of the population without TB
PropMD: proportion of the population who are MDs
PropRN: proportion of the population who are RNs
PersExp: mean personal expenditures on healthcare in US dollars at average exchange rate
GovtExp: mean government expenditures per capita on healthcare, US dollars at average exchange rate
TotExp: sum of personal and government expenditures
data<-read.csv("C:\\Users\\jkks9\\Documents\\DATA 605\\who.csv")
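# Scatterplot with the predictor (TotExp) on the x-axis and the response (LifeExp) on the y-axis
plot(data$TotExp,data$LifeExp)
# Named lm1 so the model object does not mask the lm() function
lm1<-lm(data$LifeExp ~ data$TotExp)
summary(lm1)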
##
## Call:
## lm(formula = data$LifeExp ~ data$TotExp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.764 -4.778 3.154 7.116 13.292
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.475e+01 7.535e-01 85.933 < 2e-16 ***
## data$TotExp 6.297e-05 7.795e-06 8.079 7.71e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.371 on 188 degrees of freedom
## Multiple R-squared: 0.2577, Adjusted R-squared: 0.2537
## F-statistic: 65.26 on 1 and 188 DF, p-value: 7.714e-14
The p-value (7.714e-14) is far below the .05 threshold, so the relationship is statistically significant. The R-squared, however, is very low: at 0.2577, the model explains only about 26% of the variance in life expectancy. On that basis it would be considered a very poor model.
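R-squared can be read as the share of the variance in the response that the fitted line accounts for. As a minimal sketch of that reading, the block below computes R-squared by hand as 1 - SS_res/SS_tot and checks it against the value reported by summary(). The data are synthetic and made up purely for illustration (they are not from who.csv), and the names x, y, fit and r2_manual are assumptions of this sketch.

```r
# Synthetic data for illustration only; these are NOT values from who.csv.
set.seed(605)
x <- runif(100, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(100, sd = 2)

fit <- lm(y ~ x)

ss_res <- sum(residuals(fit)^2)   # variation left unexplained by the model
ss_tot <- sum((y - mean(y))^2)    # total variation in the response
r2_manual <- 1 - ss_res / ss_tot  # share of variation explained

r2_summary <- summary(fit)$r.squared
print(c(manual = r2_manual, summary = r2_summary))
```

The two numbers should agree up to floating-point error, which is why a low R-squared says the model leaves most of the variation in the response unexplained.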
lm2<-lm((data$LifeExp^4.6)~I(data$TotExp^.06))
summary(lm2)
##
## Call:
## lm(formula = (data$LifeExp^4.6) ~ I(data$TotExp^0.06))
##
## Residuals:
## Min 1Q Median 3Q Max
## -308616089 -53978977 13697187 59139231 211951764
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -736527910 46817945 -15.73 <2e-16 ***
## I(data$TotExp^0.06) 620060216 27518940 22.53 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 90490000 on 188 degrees of freedom
## Multiple R-squared: 0.7298, Adjusted R-squared: 0.7283
## F-statistic: 507.7 on 1 and 188 DF, p-value: < 2.2e-16
The p-value is even lower (< 2.2e-16), so this model is also statistically significant. The R-squared rises from 0.2577 to 0.7298, about 47 percentage points higher, which indicates a much better fit. Based on R-squared comparisons alone, the second model is better.
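# forecast is the transformed predictor, TotExp^.06
LifeExp<-function(forecast)
{ y <- -736527910 + 620060216 * (forecast) # predicted LifeExp^4.6
y <- y^(1/4.6) # back-transform to years
print(y)
}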
LifeExp(1.5)
## [1] 63.31153
LifeExp(2.5)
## [1] 86.50645
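As a possible variant of the forecast function above, the sketch below returns the forecast instead of printing it, accepts a vector of inputs, and converts each transformed input back to raw TotExp for context. The helper names forecast_life_exp and raw_tot_exp are hypothetical, and the coefficients are the rounded estimates copied from the lm2 summary output.

```r
# Coefficients copied (rounded) from the lm2 summary output
b0 <- -736527910
b1 <- 620060216

# Hypothetical helper: x is the transformed predictor TotExp^0.06
forecast_life_exp <- function(x) {
  (b0 + b1 * x)^(1 / 4.6)  # undo the LifeExp^4.6 transformation
}

# Hypothetical helper: recover raw TotExp from TotExp^0.06
raw_tot_exp <- function(x) {
  x^(1 / 0.06)
}

x <- c(1.5, 2.5)
result <- data.frame(
  TotExp_transformed = x,
  TotExp_raw = raw_tot_exp(x),
  LifeExp_forecast = forecast_life_exp(x)
)
print(result)
```

Seeing the raw expenditure next to each forecast makes it easier to judge whether an input of 1.5 or 2.5 on the transformed scale is a plausible spending level.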
lm3<-lm(data$LifeExp ~ data$PropMD + data$TotExp + data$PropMD*data$TotExp)
summary(lm3)
##
## Call:
## lm(formula = data$LifeExp ~ data$PropMD + data$TotExp + data$PropMD *
## data$TotExp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.320 -4.132 2.098 6.540 13.074
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.277e+01 7.956e-01 78.899 < 2e-16 ***
## data$PropMD 1.497e+03 2.788e+02 5.371 2.32e-07 ***
## data$TotExp 7.233e-05 8.982e-06 8.053 9.39e-14 ***
## data$PropMD:data$TotExp -6.026e-03 1.472e-03 -4.093 6.35e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.765 on 186 degrees of freedom
## Multiple R-squared: 0.3574, Adjusted R-squared: 0.3471
## F-statistic: 34.49 on 3 and 186 DF, p-value: < 2.2e-16
The p-value is close to that of the second model (both < 2.2e-16). The R-squared (0.3574) is better than the first model's (0.2577) but about 37 percentage points below the second model's (0.7298). This model therefore falls between the best model so far, the second, and the worst of the three, the first.
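A side note on the formula used for the interaction model: in R's formula syntax, a * b already expands to a + b + a:b, so listing the main effects separately is harmless but redundant. The sketch below checks this on synthetic data (made up for illustration, not taken from who.csv); the names d, a, b, fit_long and fit_short are assumptions of this sketch.

```r
# Synthetic data for illustration only; these are NOT values from who.csv.
set.seed(42)
d <- data.frame(a = runif(50), b = runif(50))
d$y <- 1 + 2 * d$a + 3 * d$b - 1.5 * d$a * d$b + rnorm(50, sd = 0.1)

fit_long  <- lm(y ~ a + b + a * b, data = d)  # main effects spelled out
fit_short <- lm(y ~ a * b, data = d)          # shorthand for the same model

same_coefs <- all.equal(coef(fit_long), coef(fit_short))
print(same_coefs)
print(names(coef(fit_short)))
```

Passing the data frame through the data argument also keeps the coefficient names short (a, b, a:b) rather than prefixed with the data frame name.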
LifeExp2<-((6.277*10^1) + (1.497*10^3)*.03 + (7.233*10^(-5))*14 - ((6.026*10^(-3))*0.03*14))
LifeExp2
## [1] 107.6785
The forecast life expectancy of about 107.7 years does not appear to be realistic because it is too high. I base that on googling current life expectancy: one site referenced the CDC National Center for Health Statistics, which puts US life expectancy at around 80 years, almost 28 years lower than the forecast value.
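To see where the high forecast comes from, the sketch below splits it into the contribution of each term, using the rounded coefficients from the lm3 summary output and the inputs PropMD = .03 and TotExp = 14. The names coefs, contrib and forecast are assumptions of this sketch.

```r
# Coefficients copied (rounded) from the lm3 summary output
coefs <- c(intercept = 6.277e+01, prop_md = 1.497e+03,
           tot_exp = 7.233e-05, interaction = -6.026e-03)

prop_md <- 0.03  # proportion of the population who are MDs
tot_exp <- 14    # total healthcare expenditure

# Contribution of each term to the forecast, in years
contrib <- c(
  intercept   = coefs[["intercept"]],
  prop_md     = coefs[["prop_md"]] * prop_md,
  tot_exp     = coefs[["tot_exp"]] * tot_exp,
  interaction = coefs[["interaction"]] * prop_md * tot_exp
)
print(round(contrib, 4))

forecast <- sum(contrib)
print(forecast)
```

The PropMD term alone adds about 44.9 years on top of the intercept of 62.77, while the TotExp and interaction terms are negligible at these inputs, so almost all of the overshoot is driven by the PropMD value.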