Get the Data and Review Data
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 190 obs. of 10 variables:
## $ Country : chr "Afghanistan" "Albania" "Algeria" "Andorra" ...
## $ LifeExp : num 42 71 71 82 41 73 75 69 82 80 ...
## $ InfantSurvival: num 0.835 0.985 0.967 0.997 0.846 0.99 0.986 0.979 0.995 0.996 ...
## $ Under5Survival: num 0.743 0.983 0.962 0.996 0.74 0.989 0.983 0.976 0.994 0.996 ...
## $ TBFree : num 0.998 1 0.999 1 0.997 ...
## $ PropMD : num 2.29e-04 1.14e-03 1.06e-03 3.30e-03 7.04e-05 ...
## $ PropRN : num 0.000572 0.004614 0.002091 0.0035 0.001146 ...
## $ PersExp : num 20 169 108 2589 36 ...
## $ GovtExp : num 92 3128 5184 169725 1620 ...
## $ TotExp : num 112 3297 5292 172314 1656 ...
## - attr(*, "spec")=
## .. cols(
## .. Country = col_character(),
## .. LifeExp = col_double(),
## .. InfantSurvival = col_double(),
## .. Under5Survival = col_double(),
## .. TBFree = col_double(),
## .. PropMD = col_double(),
## .. PropRN = col_double(),
## .. PersExp = col_double(),
## .. GovtExp = col_double(),
## .. TotExp = col_double()
## .. )
## Country LifeExp InfantSurvival Under5Survival
## Length:190 Min. :40.00 Min. :0.8350 Min. :0.7310
## Class :character 1st Qu.:61.25 1st Qu.:0.9433 1st Qu.:0.9253
## Mode :character Median :70.00 Median :0.9785 Median :0.9745
## Mean :67.38 Mean :0.9624 Mean :0.9459
## 3rd Qu.:75.00 3rd Qu.:0.9910 3rd Qu.:0.9900
## Max. :83.00 Max. :0.9980 Max. :0.9970
## TBFree PropMD PropRN
## Min. :0.9870 Min. :0.0000196 Min. :0.0000883
## 1st Qu.:0.9969 1st Qu.:0.0002444 1st Qu.:0.0008455
## Median :0.9992 Median :0.0010474 Median :0.0027584
## Mean :0.9980 Mean :0.0017954 Mean :0.0041336
## 3rd Qu.:0.9998 3rd Qu.:0.0024584 3rd Qu.:0.0057164
## Max. :1.0000 Max. :0.0351290 Max. :0.0708387
## PersExp GovtExp TotExp
## Min. : 3.00 Min. : 10.0 Min. : 13
## 1st Qu.: 36.25 1st Qu.: 559.5 1st Qu.: 584
## Median : 199.50 Median : 5385.0 Median : 5541
## Mean : 742.00 Mean : 40953.5 Mean : 41696
## 3rd Qu.: 515.25 3rd Qu.: 25680.2 3rd Qu.: 26331
## Max. :6350.00 Max. :476420.0 Max. :482750
- Provide a scatter plot of LifeExp~TotExp, and run simple linear regression. Do not transform the variables. Provide and interpret the F statistics, R^2, standard error, and p-values only. Discuss whether the assumptions of simple linear regression are met.
Scatter Plot - Life Expectancy vs Total Expenditures

Model - Mod1: lm(LifeExp~TotExp, whodat)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 64.753375 | 0.7535366 | 85.932619 | 0 |
| TotExp | 0.000063 | 0.0000078 | 8.078626 | 0 |
Model Evaluation
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.2576922 | 0.2537437 | 9.371033 | 65.2642 | 0 | 2 | -693.7415 | 1393.483 | 1403.224 | 16509.46 | 188 |
The scatter plot of Total Expenditures and Life Expectancy does not scream linear relationship, so it is difficult to say that all the assumptions of a simple linear regression are met. The intercept of 64.75 means that with no expenditures one could expect to live about 65 years. The coefficient for TotExp is very small (0.000063 years per unit of expenditure, or about 0.63 years per 10,000 units); however, this likely reflects the scale of the expenditures. The p-values for the intercept and coefficient are displayed as 0 because of rounding; they are far below any conventional threshold, which indicates both terms are statistically significant. The R-Squared and Adj R-Squared of around .25 indicate that TotExp explains approximately 25% of the variance - additional variables may improve the fit. The residual standard error (sigma) of 9.37 means a typical prediction misses by about 9 years. The F-statistic of 65.26 and its p-value of effectively zero mean that we can reject the hypothesis that the model is no better than the zero beta (intercept-only) model.
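The statistics reported for Mod1 are tied to one another, which gives a quick way to sanity-check the tables. The sketch below uses only numbers printed in the output; the variable names are illustrative, and it assumes that `df = 2` in the evaluation table counts the intercept, so the model has one predictor.

```r
# Values copied from the Mod1 output tables
t_slope     <- 8.078626   # t statistic for TotExp
f_stat      <- 65.2642    # overall F statistic
rss         <- 16509.46   # "deviance" column, i.e. residual sum of squares
df_residual <- 188

# With a single predictor, the overall F equals the squared slope t statistic
f_from_t <- t_slope^2

# R^2 recovered from F and the degrees of freedom (1 numerator df)
r2_from_f <- f_stat / (f_stat + df_residual)

# Residual standard error: sqrt(RSS / residual df)
sigma_from_rss <- sqrt(rss / df_residual)

# The p-value prints as 0 only because of rounding; it is tiny but positive
p_value <- pf(f_stat, df1 = 1, df2 = df_residual, lower.tail = FALSE)

print(c(F = f_from_t, R2 = r2_from_f, sigma = sigma_from_rss, p = p_value))
```

If the three recomputed values agree with the `statistic`, `r.squared`, and `sigma` columns, the table was read correctly.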
- Raise life expectancy to the 4.6 power (i.e., LifeExp^4.6). Raise total expenditures to the 0.06 power (nearly a log transform, TotExp^.06). Plot LifeExp^4.6 as a function of TotExp^.06, and re-run the simple regression model using the transformed variables. Provide and interpret the F statistics, R^2, standard error, and p-values. Which model is “better”?
Scatter Plot

Model - Mod2: lm(LifeExpTrans2 ~ TotExpTrans2, whodat2)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -736527909 | 46817945 | -15.73174 | 0 |
| TotExpTrans2 | 620060216 | 27518940 | 22.53213 | 0 |
Model Evaluation
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.7297673 | 0.7283299 | 90492393 | 507.6967 | 0 | 2 | -3749.541 | 7505.081 | 7514.822 | 1.539508e+18 | 188 |
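Because Mod2 is fit on LifeExp^4.6, its coefficients and sigma are not on the scale of years, so they cannot be read against Mod1 directly. One way to make a Mod2 prediction interpretable is to back-transform it. The sketch below is a minimal illustration using the coefficients from the table; the helper `predict_life_exp` is a hypothetical name, and the input (the median TotExp from the summary output) is chosen only as an example.

```r
# Coefficients copied from the Mod2 coefficient table
b0 <- -736527909
b1 <- 620060216

# Predict LifeExp in years for a given (untransformed) TotExp
predict_life_exp <- function(tot_exp) {
  y_trans <- b0 + b1 * tot_exp^0.06   # prediction of LifeExp^4.6
  y_trans^(1 / 4.6)                   # undo the power transform
}

# Illustrative input: the median TotExp reported by summary()
pred_median <- predict_life_exp(5541)
print(pred_median)
```

Note that if the transformed prediction is negative, which can happen for very small TotExp values, the back-transform is undefined and R returns `NaN`.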
Model Evaluation
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.3574352 | 0.3470713 | 8.765493 | 34.48833 | 0 | 4 | -680.0333 | 1370.067 | 1386.302 | 14291.1 | 186 |
The R-Squared and Adj R-Squared of .357 and .347 indicate that the model is only capturing about 35% to 36% of the variance. This could be an indication that we are missing some important explanatory variables. The F-statistic of 34.49 and a p-value of effectively zero mean we can reject the null hypothesis that our model is no better than the zero beta (intercept-only) model. Additionally, the p-values of the coefficients are all small and close to zero, indicating the coefficients are statistically significant. This is also consistent with our F-statistic. The sigma, or residual standard error, of 8.765 is the typical size of a prediction error in years; against a lifespan of 80 years that is roughly 11% - not bad.
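The adjusted R-Squared and the relative size of sigma quoted above can be reproduced from the table. This is a small sketch using only reported values; the reference lifespans (80 years from the text, and the mean LifeExp of 67.38 from the summary output) are just two possible yardsticks.

```r
# Values copied from the Model Evaluation table above
r2          <- 0.3574352
sigma       <- 8.765493
df_residual <- 186
n           <- 190   # observations, from the str() output

# Adjusted R^2 penalizes R^2 for the number of estimated coefficients
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_residual

# Residual standard error relative to two reference lifespans
rel_error <- sigma / c(ref_80 = 80, mean_obs = 67.38)

print(round(adj_r2, 4))
print(round(rel_error, 3))
```

Measured against the sample mean rather than 80 years, the relative error is somewhat larger, so the "not bad" reading depends on the yardstick.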
- Forecast LifeExp when PropMD=.03 and TotExp = 14. Does this forecast seem realistic? Why or why not?
## [1] 107.6953
While this result is possible, it does not seem realistic: the forecast of about 107.7 years is well above the maximum life expectancy of 83 in the data. The model would seem to indicate that if we reduce spending but increase doctors, we can increase life expectancy. I suspect that there is a correlation between our two explanatory variables and that this may be undermining our results.
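One way to see why the forecast is suspect is to compare the inputs with the ranges observed in the data. The sketch below uses only the minima and maxima from the summary output and the forecast printed above; the variable names are illustrative.

```r
# Observed ranges copied from the summary() output
obs_min <- c(PropMD = 0.0000196, TotExp = 13)
obs_max <- c(PropMD = 0.0351290, TotExp = 482750)
max_life_exp <- 83

# Forecast inputs and result from the text
new_point <- c(PropMD = 0.03, TotExp = 14)
forecast  <- 107.6953

# Where each input sits within its observed range (0 = minimum, 1 = maximum)
position <- (new_point - obs_min) / (obs_max - obs_min)
print(round(position, 4))

# Does the forecast exceed every observed life expectancy?
print(forecast > max_life_exp)
```

The point pairs a PropMD near the top of its range with a TotExp at the very bottom, a combination the model has likely seen little or no data for, so the forecast is an extrapolation.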