MyData <- read.csv(file = "/Users/GD/Desktop/who.csv", header = TRUE, sep = ",")
head(MyData)
## Country LifeExp InfantSurvival Under5Survival TBFree
## 1 Afghanistan 42 0.835 0.743 0.99769
## 2 Albania 71 0.985 0.983 0.99974
## 3 Algeria 71 0.967 0.962 0.99944
## 4 Andorra 82 0.997 0.996 0.99983
## 5 Angola 41 0.846 0.740 0.99656
## 6 Antigua and Barbuda 73 0.990 0.989 0.99991
## PropMD PropRN PersExp GovtExp TotExp
## 1 0.000228841 0.000572294 20 92 112
## 2 0.001143127 0.004614439 169 3128 3297
## 3 0.001060478 0.002091362 108 5184 5292
## 4 0.003297297 0.003500000 2589 169725 172314
## 5 0.000070400 0.001146162 36 1620 1656
## 6 0.000142857 0.002773810 503 12543 13046
model1 <- lm(LifeExp ~ TotExp, data = MyData)
summary(model1)
##
## Call:
## lm(formula = LifeExp ~ TotExp, data = MyData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.764 -4.778 3.154 7.116 13.292
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.475e+01 7.535e-01 85.933 < 2e-16 ***
## TotExp 6.297e-05 7.795e-06 8.079 7.71e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.371 on 188 degrees of freedom
## Multiple R-squared: 0.2577, Adjusted R-squared: 0.2537
## F-statistic: 65.26 on 1 and 188 DF, p-value: 7.714e-14
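Looking at the values above, model1 is not a good fit. The TotExp slope is statistically significant (p = 7.71e-14), but the multiple R-squared of 0.2577 means total expenditure explains only about 26% of the variation in life expectancy. The residual standard error is 9.371 years, and the residuals are skewed (minimum -24.764 against maximum 13.292).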
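To make the R-squared figures concrete, here is a minimal sketch on simulated data (the variables `x`, `y` and the seed are hypothetical, not the WHO data). It checks that R-squared is 1 minus the ratio of residual to total sum of squares, and that adjusted R-squared applies a penalty for the number of predictors `p`:

```r
# Simulated data (hypothetical): a weak linear signal buried in noise
set.seed(42)
x <- runif(190, 0, 100)
y <- 65 + 0.05 * x + rnorm(190, sd = 9)

fit <- lm(y ~ x)

# R-squared = 1 - SS_residual / SS_total
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((y - mean(y))^2)
r2_manual <- 1 - ss_res / ss_tot

# Adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
n <- length(y)
p <- 1
adj_r2_manual <- 1 - (1 - r2_manual) * (n - 1) / (n - p - 1)

# Compare with the values reported by summary()
c(manual = r2_manual, lm = summary(fit)$r.squared)
c(manual = adj_r2_manual, lm = summary(fit)$adj.r.squared)
```

The same formula reproduces model1's adjusted value: with n = 190 and p = 1, 1 - (1 - 0.2577) * 189 / 188 is approximately 0.2537.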
# Power-transform the response and the predictor
T_LifeExp <- MyData$LifeExp^4.6
T_TotExp <- MyData$TotExp^0.06
model2 <- lm(T_LifeExp ~ T_TotExp)
summary(model2)
##
## Call:
## lm(formula = T_LifeExp ~ T_TotExp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -308616089 -53978977 13697187 59139231 211951764
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -736527910 46817945 -15.73 <2e-16 ***
## T_TotExp 620060216 27518940 22.53 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 90490000 on 188 degrees of freedom
## Multiple R-squared: 0.7298, Adjusted R-squared: 0.7283
## F-statistic: 507.7 on 1 and 188 DF, p-value: < 2.2e-16
plot(T_TotExp, T_LifeExp, xlab = "TotExp^0.06", ylab = "LifeExp^4.6")
abline(model2)
Comparing model1 and model2, we can conclude that model2 is the better fit. Its adjusted R-squared is 0.7283 against 0.2537 for model1, so transformed total expenditure explains about 73% of the variation in transformed life expectancy (the closer R-squared is to 1, the more of the variation the model explains). Model2's residual standard error (90,490,000) is far larger than model1's (9.371), but this does not indicate a worse fit: model2's response is LifeExp raised to the power 4.6, so its residual standard error is measured in those transformed units and cannot be compared directly with model1's, which is in years.
Therefore model2 is ‘better’.
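One way to put the two models' errors on the same footing is to back-transform model2's fitted values to years before measuring the error. A minimal sketch, assuming a hypothetical helper `rmse_original_scale()` (not part of the original analysis):

```r
# Hypothetical helper: root-mean-square error in the response's original
# units for a model fitted to y^lambda. Fitted values are back-transformed
# with ^(1/lambda); negative fitted values would give NaN under a
# fractional power.
rmse_original_scale <- function(y, fitted_transformed, lambda) {
  y_hat <- fitted_transformed^(1 / lambda)
  sqrt(mean((y - y_hat)^2))
}

# Toy check: fitted values (4, 4) on the squared scale back-transform to 2,
# so the errors against (1, 3) are -1 and 1
rmse_original_scale(y = c(1, 3), fitted_transformed = c(4, 4), lambda = 2)
```

With the objects above, `rmse_original_scale(MyData$LifeExp, fitted(model2), 4.6)` could be set against `sqrt(mean(residuals(model1)^2))`. Both divide by n rather than by the residual degrees of freedom, so they come out slightly smaller than a residual standard error on the same scale.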
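We can write a function to forecast life expectancy from the fitted model2 equation, where X = TotExp^0.06 and Y = LifeExp^4.6:
Y = -736527909 + 620060216 * X
Raising Y to the power 1/4.6 converts the prediction back to years.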
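Written out with the back-transform that the function below applies, and worked through for X = 1.5 using the same coefficients:

$$
\begin{aligned}
\widehat{\text{LifeExp}} &= \left(-736527909 + 620060216\,X\right)^{1/4.6}, \qquad X = \text{TotExp}^{0.06} \\
X = 1.5:\quad \widehat{\text{LifeExp}} &= \left(-736527909 + 930090324\right)^{1/4.6} = 193562415^{1/4.6} \approx 63.31
\end{aligned}
$$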
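# Forecast LifeExp (in years) from X = TotExp^0.06 using model2
forecast <- function(X) {
  Y <- -736527909 + 620060216 * X  # predicted LifeExp^4.6
  Y^(1 / 4.6)                      # back-transform to years
}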
round(forecast(1.5),2)
## [1] 63.31
round(forecast(2.5),2)
## [1] 86.51
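Because `forecast()` takes X = TotExp^0.06 rather than total expenditure itself, a small wrapper can accept raw TotExp instead. A minimal sketch, assuming a hypothetical `forecast_raw()` (the `forecast()` definition is repeated so the block runs on its own):

```r
# Same back-transformed model2 forecast as above (X = TotExp^0.06)
forecast <- function(X) {
  (-736527909 + 620060216 * X)^(1 / 4.6)
}

# Hypothetical wrapper: take raw total expenditure and transform it first
forecast_raw <- function(tot_exp) {
  forecast(tot_exp^0.06)
}

# Raw TotExp values corresponding to TotExp^0.06 = 1.5 and 2.5
raw_values <- c(1.5, 2.5)^(1 / 0.06)
round(raw_values)
round(forecast_raw(raw_values), 2)
```

The raw values are roughly 861 and 4.3 million, so the second forecast corresponds to an expenditure far above any TotExp in the rows shown by `head()`. With the fitted object, `predict(model2, newdata = data.frame(T_TotExp = c(1.5, 2.5)))^(1 / 4.6)` gives the same forecasts using the unrounded coefficients.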
Next, we fit a multiple regression of life expectancy on PropMD, TotExp and their interaction:
LifeExp = b0 + b1 * PropMD + b2 * TotExp + b3 * PropMD * TotExp
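R's formula syntax can build the interaction term itself: `a * b` expands to `a + b + a:b`. A minimal sketch on simulated data (the data frame `sim` and its values are hypothetical, not the WHO data) showing that this gives the same coefficients as adding a hand-built product column:

```r
# Simulated data (hypothetical)
set.seed(1)
n <- 190
sim <- data.frame(PropMD = runif(n, 0, 0.005), TotExp = runif(n, 10, 50000))
sim$LifeExp <- 60 + 1500 * sim$PropMD + 7e-5 * sim$TotExp -
  6e-3 * sim$PropMD * sim$TotExp + rnorm(n, sd = 8)

# Hand-built product column, as with last_leg in model3
sim$last_leg <- sim$PropMD * sim$TotExp
fit_manual <- lm(LifeExp ~ PropMD + TotExp + last_leg, data = sim)

# Formula shorthand: PropMD * TotExp = PropMD + TotExp + PropMD:TotExp
fit_formula <- lm(LifeExp ~ PropMD * TotExp, data = sim)

cbind(manual = coef(fit_manual), formula = coef(fit_formula))
```

The formula version has a practical advantage: `predict()` computes the interaction for new data automatically, whereas a hand-built column has to be supplied in `newdata` as well.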
# Interaction term PropMD * TotExp, built by hand
last_leg <- MyData$PropMD * MyData$TotExp
model3 <- lm(LifeExp ~ PropMD + TotExp + last_leg, data = MyData)
summary(model3)
##
## Call:
## lm(formula = LifeExp ~ PropMD + TotExp + last_leg, data = MyData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.320 -4.132 2.098 6.540 13.074
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.277e+01 7.956e-01 78.899 < 2e-16 ***
## PropMD 1.497e+03 2.788e+02 5.371 2.32e-07 ***
## TotExp 7.233e-05 8.982e-06 8.053 9.39e-14 ***
## last_leg -6.026e-03 1.472e-03 -4.093 6.35e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.765 on 186 degrees of freedom
## Multiple R-squared: 0.3574, Adjusted R-squared: 0.3471
## F-statistic: 34.49 on 3 and 186 DF, p-value: < 2.2e-16
model3 has an F-statistic of 34.49 with a p-value < 2.2e-16, a residual standard error of 8.765, and an adjusted R-squared of 0.3471. All three terms, including the interaction, are significant at the 0.001 level. The model is still not a good fit, since it explains only about 34.71% of the variation in life expectancy, but it is better than model1, which has a larger residual standard error (9.371) and a lower adjusted R-squared (0.2537).
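Because of the interaction term, model3 does not assign TotExp a single effect. Differentiating the model equation shows that the fitted slope on TotExp depends on PropMD. Using the rounded coefficients printed above:

$$
\frac{\partial\,\widehat{\text{LifeExp}}}{\partial\,\text{TotExp}} = b_2 + b_3\,\text{PropMD} \approx 7.233\times10^{-5} - 6.026\times10^{-3}\,\text{PropMD},
\qquad
\text{zero at } \text{PropMD} = \frac{7.233\times10^{-5}}{6.026\times10^{-3}} \approx 0.0120
$$

So, within this fitted model, higher expenditure is associated with higher life expectancy only where PropMD is below about 0.012. Above that value the fitted slope is negative.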
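PropMD <- 0.03
TotExp <- 14
# Coefficients of a model fitted to LifeExp^4.6 (they differ from the model3
# summary above); raising to 1/4.6 converts the forecast back to years
Expect_forcast <- function(propmd, totexp) {
  (-724418697 + (47273338389 * propmd) + (604795792 * totexp) -
     (21214671638 * propmd * totexp))^(1/4.6)
}
Expect_forcast(PropMD, TotExp)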
## [1] 66.97703
The maximum forecast we had earlier was 83.3 (as per question 3), so 66.98 years seems a realistic forecast.
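For comparison, model3 itself (the untransformed interaction model whose summary is shown above) can be evaluated at the same inputs. A minimal sketch using the rounded coefficients printed in that summary, so the result is approximate; `model3_forecast()` is a hypothetical name:

```r
# Rounded coefficients as printed in summary(model3)
b <- c(b0 = 62.77, b1 = 1497, b2 = 7.233e-05, b3 = -6.026e-03)

# LifeExp = b0 + b1 * PropMD + b2 * TotExp + b3 * PropMD * TotExp
model3_forecast <- function(prop_md, tot_exp) {
  b[["b0"]] + b[["b1"]] * prop_md + b[["b2"]] * tot_exp +
    b[["b3"]] * prop_md * tot_exp
}

model3_forecast(0.03, 14)
```

This gives roughly 107.7 years, well above the 86.51 forecast from model2 and any LifeExp in the rows shown by `head()`, mainly because the PropMD coefficient (1497) times 0.03 alone adds about 44.9 years. With the fitted object, `predict(model3, newdata = data.frame(PropMD = 0.03, TotExp = 14, last_leg = 0.03 * 14))` would use the unrounded coefficients. Here `last_leg` must be supplied in `newdata` because it is a variable in the model formula.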