HW 12

WHO Data

library(RCurl)
## Loading required package: bitops
library(knitr)
library(ggplot2)
who <- read.csv('/Users/admin/Documents/Data 605/who.csv', header = TRUE, stringsAsFactors = FALSE)
kable(head(who))
|Country             | LifeExp| InfantSurvival| Under5Survival|  TBFree|    PropMD|    PropRN| PersExp| GovtExp| TotExp|
|:-------------------|-------:|--------------:|--------------:|-------:|---------:|---------:|-------:|-------:|------:|
|Afghanistan         |      42|          0.835|          0.743| 0.99769| 0.0002288| 0.0005723|      20|      92|    112|
|Albania             |      71|          0.985|          0.983| 0.99974| 0.0011431| 0.0046144|     169|    3128|   3297|
|Algeria             |      71|          0.967|          0.962| 0.99944| 0.0010605| 0.0020914|     108|    5184|   5292|
|Andorra             |      82|          0.997|          0.996| 0.99983| 0.0032973| 0.0035000|    2589|  169725| 172314|
|Angola              |      41|          0.846|          0.740| 0.99656| 0.0000704| 0.0011462|      36|    1620|   1656|
|Antigua and Barbuda |      73|          0.990|          0.989| 0.99991| 0.0001429| 0.0027738|     503|   12543|  13046|

Simple Linear Regression

ggplot(data = who, aes(x = TotExp, y = LifeExp)) +
  geom_point() +
  ggtitle('Life Expectancy vs. Total Expenditure') +
  xlab('Total Expenditure') +
  ylab('Life Expectancy') +
  geom_smooth(method='lm')

who.lm <- lm(LifeExp ~ TotExp, data=who)
summary(who.lm)
## 
## Call:
## lm(formula = LifeExp ~ TotExp, data = who)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -24.764  -4.778   3.154   7.116  13.292 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.475e+01  7.535e-01  85.933  < 2e-16 ***
## TotExp      6.297e-05  7.795e-06   8.079 7.71e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.371 on 188 degrees of freedom
## Multiple R-squared:  0.2577, Adjusted R-squared:  0.2537 
## F-statistic: 65.26 on 1 and 188 DF,  p-value: 7.714e-14

New Model

totexp06 <- who$TotExp^0.06
lifeexp46 <- who$LifeExp^4.6
ggplot(data = who, aes(x = TotExp^0.06, y = LifeExp^4.6)) +
  geom_point() +
  ggtitle('Life Expectancy^4.6 vs. Total Expenditure^0.06') +
  xlab('Total Expenditure^0.06') +
  ylab('Life Expectancy^4.6') +
  geom_smooth(method='lm')

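# Note: TotExp^0.06 is the response in this call, the reverse of the plot above.
# With one predictor, R-squared and the F-test are the same in either direction.
who.lm2 <- lm(totexp06 ~ lifeexp46, data=who)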
summary(who.lm2)
## 
## Call:
## lm(formula = totexp06 ~ lifeexp46, data = who)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32362 -0.08036 -0.00708  0.07949  0.39762 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.322e+00  1.845e-02   71.64   <2e-16 ***
## lifeexp46   1.177e-09  5.223e-11   22.53   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1247 on 188 degrees of freedom
## Multiple R-squared:  0.7298, Adjusted R-squared:  0.7283 
## F-statistic: 507.7 on 1 and 188 DF,  p-value: < 2.2e-16

This model fits better than the first one. Its R-squared is 0.73, compared with 0.26 for the first model, so the transformed variables explain far more of the variance. The F-statistic is also much higher (507.7 vs. 65.26), with a smaller p-value (< 2.2e-16 vs. 7.714e-14).
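As a quick check on the comparison, R-squared can be recovered from the F-statistic and its degrees of freedom with the standard identity R² = pF / (pF + df2), where p is the number of predictors and df2 the residual degrees of freedom. The sketch below uses only the numbers printed in the two summaries above; the helper name `r2_from_f` is hypothetical.

```r
# R-squared implied by an F-statistic with p predictors and df2 residual df.
r2_from_f <- function(f, p, df2) (p * f) / (p * f + df2)

r2_first       <- r2_from_f(65.26, 1, 188)  # first model
r2_transformed <- r2_from_f(507.7, 1, 188)  # transformed model

round(c(first = r2_first, transformed = r2_transformed), 4)
```

Both values should agree with the Multiple R-squared lines in the summaries, up to rounding of the printed F-statistics.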

Life Expectancy Prediction

\[y = -736527910 + 620060216\,x, \qquad y = LifeExp^{4.6},\ x = TotExp^{0.06}\]
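The coefficients in this equation do not appear in the who.lm2 summary, because that call has TotExp^0.06 as the response. They are consistent with the regression in the other direction (LifeExp^4.6 on TotExp^0.06). In simple linear regression the two slopes multiply to R-squared, which gives a small consistency check. The variable names below are hypothetical.

```r
# Slope used in the prediction equation (LifeExp^4.6 on TotExp^0.06).
slope_life_on_exp <- 620060216
# Slope printed in summary(who.lm2) (TotExp^0.06 on LifeExp^4.6).
slope_exp_on_life <- 1.177e-09

# For a single predictor, the product of the two slopes equals R-squared.
slope_product <- slope_life_on_exp * slope_exp_on_life
slope_product
```

The product should be close to the Multiple R-squared of 0.7298 reported for who.lm2, within the rounding of the printed slope.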

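# Forecast life expectancy from x = TotExp^0.06
expect <- function(x) {
  y <- -736527910 + 620060216 * x  # forecast on the LifeExp^4.6 scale
  return(y^(1/4.6))                # back-transform to years
}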

paste("Forecasted life expectancy when TotExp^.06 = 1.5 is ", round(expect(1.5),2))
## [1] "Forecasted life expectancy when TotExp^.06 = 1.5 is  63.31"
paste("Forecasted life expectancy when TotExp^.06 = 2.5 is ", round(expect(2.5),2))
## [1] "Forecasted life expectancy when TotExp^.06 = 2.5 is  86.51"
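The two inputs above are values of TotExp^0.06, not of TotExp itself. A minimal sketch, assuming the same equation, of a helper that takes raw TotExp instead and of the raw expenditures that correspond to 1.5 and 2.5. The name `predict_lifeexp` is hypothetical.

```r
# Coefficients from the prediction equation above.
intercept <- -736527910
slope     <- 620060216

# Hypothetical helper: forecast LifeExp (years) from raw TotExp.
predict_lifeexp <- function(totexp) {
  y <- intercept + slope * totexp^0.06  # forecast on the LifeExp^4.6 scale
  y^(1 / 4.6)                           # back-transform to years
}

# Raw TotExp values whose 0.06 power equals 1.5 and 2.5.
raw_totexp <- c(1.5, 2.5)^(1 / 0.06)
raw_totexp

# Same forecasts as above, now from raw expenditure.
forecasts <- predict_lifeexp(raw_totexp)
round(forecasts, 2)
```

The back-transform is only defined when the forecast on the LifeExp^4.6 scale is positive; for very small TotExp the equation gives a negative value and R returns NaN.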

3rd Model

who.lm3 <- lm(who$LifeExp ~ who$PropMD + who$TotExp + who$PropMD*who$TotExp)
summary(who.lm3)
## 
## Call:
## lm(formula = who$LifeExp ~ who$PropMD + who$TotExp + who$PropMD * 
##     who$TotExp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -27.320  -4.132   2.098   6.540  13.074 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            6.277e+01  7.956e-01  78.899  < 2e-16 ***
## who$PropMD             1.497e+03  2.788e+02   5.371 2.32e-07 ***
## who$TotExp             7.233e-05  8.982e-06   8.053 9.39e-14 ***
## who$PropMD:who$TotExp -6.026e-03  1.472e-03  -4.093 6.35e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.765 on 186 degrees of freedom
## Multiple R-squared:  0.3574, Adjusted R-squared:  0.3471 
## F-statistic: 34.49 on 3 and 186 DF,  p-value: < 2.2e-16

3rd Model Forecast

\[LifeExp = b_0 + b_1 \times PropMD + b_2 \times TotExp + b_3 \times PropMD \times TotExp\]

# Coefficients from summary(who.lm3)
b0 = 6.277*10^1
b1 = 1.497*10^3
b2 = 7.233* (10^-5)
b3 = -6.026* (10^-3)  # the interaction coefficient is negative

propmd = .03
totexp = 14

lifeexp2 = b0 +(b1*propmd) + (b2*totexp) + (b3 *propmd*totexp)
lifeexp2
## [1] 107.6785

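This forecast of about 107.7 years is not realistic. It is an extrapolation: PropMD = 0.03 is roughly nine times the largest value in the rows shown above (0.0033 for Andorra), and TotExp = 14 is well below the smallest (112 for Afghanistan).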
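A minimal sketch of the same forecast as a function, with the coefficients copied from the summary(who.lm3) output above, including the negative sign on the interaction term. The names `forecast_lifeexp` and `contrib` are hypothetical. The term-by-term breakdown shows that the PropMD term alone adds about 45 years to the intercept, which is what pushes the forecast past any plausible life expectancy.

```r
# Coefficients copied from the summary(who.lm3) output.
b <- c(intercept = 6.277e+01, propmd = 1.497e+03,
       totexp = 7.233e-05, interaction = -6.026e-03)

# Hypothetical helper: forecast LifeExp for given PropMD and TotExp.
forecast_lifeexp <- function(propmd, totexp, coef = b) {
  coef[["intercept"]] +
    coef[["propmd"]] * propmd +
    coef[["totexp"]] * totexp +
    coef[["interaction"]] * propmd * totexp
}

forecast <- forecast_lifeexp(propmd = 0.03, totexp = 14)
forecast

# Contribution of each term to the forecast.
contrib <- c(intercept   = b[["intercept"]],
             propmd      = b[["propmd"]] * 0.03,
             totexp      = b[["totexp"]] * 14,
             interaction = b[["interaction"]] * 0.03 * 14)
round(contrib, 5)
```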