HW12

WHO Dataset

The attached who.csv dataset contains real-world data from 2008. The variables included follow.

Simple Regression

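  1. Provide a scatterplot of LifeExp~TotExp, and run simple linear regression. Do not transform the variables. Provide and interpret the F statistics, R^2, standard error, and p-values only. Discuss whether the assumptions of simple linear regression are met.
  • The F-statistic (65.26 on 1 and 188 DF, p-value 7.714e-14) tests whether the model explains more variation than an intercept-only model. With a single predictor it is equivalent to the t-test on the slope (8.079^2 ≈ 65.3), so it adds no information beyond that test. R^2 is the proportion of the variance in LifeExp that the model explains. Here the multiple R^2 is .2577 (adjusted .2537), which is low. The residual standard error, 9.371, is the typical size of a residual in years of life expectancy. If the residuals were normally distributed, the 1st and 3rd quartiles would fall near ±0.674 times the residual standard error, about ±6.3. The observed quartiles (-4.778 and 7.116) and the median of 3.154 are not symmetric around zero, which points to skewed residuals. A p-value is the probability of seeing a t value at least this extreme if the true coefficient were zero. Both p-values are very low, so both coefficients are statistically significant.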
who <- read.csv('C:\\Users\\apagan\\Documents\\CUNYSPS\\IS605FundamentalsofcomputationalMath\\who.csv')
who.lm <- lm(who$LifeExp ~ who$TotExp)
summary(who.lm)
## 
## Call:
## lm(formula = who$LifeExp ~ who$TotExp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -24.764  -4.778   3.154   7.116  13.292 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.475e+01  7.535e-01  85.933  < 2e-16 ***
## who$TotExp  6.297e-05  7.795e-06   8.079 7.71e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.371 on 188 degrees of freedom
## Multiple R-squared:  0.2577, Adjusted R-squared:  0.2537 
## F-statistic: 65.26 on 1 and 188 DF,  p-value: 7.714e-14
  • The plots below show that the residuals are not normally distributed, so the assumptions of simple linear regression are not fully met.
plot(who$TotExp, who$LifeExp)
abline(who.lm)

plot(fitted(who.lm), resid(who.lm))

qqnorm(resid(who.lm))
qqline(resid(who.lm))
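
The link between the F-statistic and the slope's t value in a one-predictor model, and the quartiles that normal residuals would have, can be checked with a short sketch. It uses simulated data rather than who.csv; `x`, `y`, and `fit` are hypothetical names chosen for illustration.

```r
# Simulated data; x and y are hypothetical stand-ins, not the who.csv columns
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 0.5 * x + rnorm(100)

fit <- lm(y ~ x)
s <- summary(fit)

t_slope <- s$coefficients["x", "t value"]   # t statistic for the slope
f_stat  <- unname(s$fstatistic["value"])    # overall F statistic

# With one predictor the two tests coincide: F equals t squared
all.equal(f_stat, t_slope^2)

# Quartiles expected for normal residuals with this residual standard error
qnorm(c(0.25, 0.75)) * s$sigma

# Observed residual quartiles, for comparison
quantile(resid(fit), c(0.25, 0.75))
```

Running the same last two lines on `who.lm` would give the comparison for the actual model.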

Exponential

  1. Raise life expectancy to the 4.6 power (i.e., LifeExp^4.6). Raise total expenditures to the 0.06 power (nearly a log transform, TotExp^.06). Plot LifeExp^4.6 as a function of TotExp^.06, and re-run the simple regression model using the transformed variables. Provide and interpret the F statistics, R^2, standard error, and p-values. Which model is “better?”
  • As in the first model, the F-statistic (231.6 on 1 and 188 DF) is equivalent to the t-test on the slope because there is only one predictor. Note that the model fit below takes the log of both transformed variables, so it is a log-log model rather than the plain power transform. R^2 more than doubled, from .2577 to .552, but is still moderate. The residual standard error, 0.5468, is on the log(LifeExp^4.6) scale and cannot be compared directly with the 9.371 of the first model. For normal residuals the 1st and 3rd quartiles would fall near ±0.674 × 0.5468 ≈ ±0.37. The observed quartiles are -0.1985 and 0.3110. Both p-values are again very low (< 2e-16), so both coefficients are statistically significant. Judged by R^2, this model is the better of the two.
who.lm <- lm(log(who$LifeExp^4.6) ~ log(who$TotExp^.06))
summary(who.lm)
## 
## Call:
## lm(formula = log(who$LifeExp^4.6) ~ log(who$TotExp^0.06))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9342 -0.1985  0.1223  0.3110  1.0861 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           17.1248     0.1483  115.44   <2e-16 ***
## log(who$TotExp^0.06)   4.2539     0.2795   15.22   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5468 on 188 degrees of freedom
## Multiple R-squared:  0.552,  Adjusted R-squared:  0.5496 
## F-statistic: 231.6 on 1 and 188 DF,  p-value: < 2.2e-16
  • The QQ plot still departs from a normal distribution in the lower tail, but the residuals follow the normal line closely for theoretical quantiles greater than about -1.
qqnorm(resid(who.lm))
qqline(resid(who.lm))
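
The question asks for a regression of LifeExp^4.6 on TotExp^.06, while the model above also wraps both sides in log(). A minimal sketch of how the two specifications differ in R, assuming a data frame with LifeExp and TotExp columns. The data frame `df` is simulated and hypothetical, so its numbers say nothing about who.csv.

```r
# Simulated stand-in for who.csv; the column names follow the assignment,
# but the values are made up for illustration only
set.seed(2)
df <- data.frame(TotExp = exp(runif(150, log(10), log(5e5))))
df$LifeExp <- 40 + 3.5 * log(df$TotExp) + rnorm(150, sd = 3)

# I() makes ^ mean arithmetic power instead of the formula crossing operator
fit_pow <- lm(I(LifeExp^4.6) ~ I(TotExp^0.06), data = df)

# The variant used in this write-up: logs of both transformed variables
fit_log <- lm(log(LifeExp^4.6) ~ log(TotExp^0.06), data = df)

# Each R^2 refers to its own response scale
summary(fit_pow)$r.squared
summary(fit_log)$r.squared

# Scatterplot of the power-transformed variables with the fitted line
plot(df$TotExp^0.06, df$LifeExp^4.6)
abline(fit_pow)
```

Passing `data = df` instead of writing `df$LifeExp` in the formula also keeps the coefficient names short and lets `predict()` accept new data later.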

Forecast 1

  1. Using the results from 3, forecast life expectancy when TotExp^.06 =1.5. Then forecast life expectancy when TotExp^.06=2.5.
print(who.lm)
## 
## Call:
## lm(formula = log(who$LifeExp^4.6) ~ log(who$TotExp^0.06))
## 
## Coefficients:
##          (Intercept)  log(who$TotExp^0.06)  
##               17.125                 4.254
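# Predictor is log(TotExp^.06) and response is log(LifeExp^4.6): log the input, then back-transform with exp(y/4.6)
who.1<-round(exp((who.lm$coefficients[1] + who.lm$coefficients[2]*log(1.5))/4.6),2)
who.2<-round(exp((who.lm$coefficients[1] + who.lm$coefficients[2]*log(2.5))/4.6),2)
  • The equation is log(LifeExp^4.6) = b0 + b1 * log(TotExp^.06). Plugging 1.5 and 2.5 in directly gives 23.51 and 27.76, but those values are on the log(LifeExp^4.6) scale and skip the log of the predictor. Back-transformed to years, the forecast for TotExp^.06 = 1.5 is about 60.2, and the forecast for TotExp^.06 = 2.5 is about 96.6.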
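
Because the fitted model is on the log scale on both sides, turning its output into years of life expectancy takes one inversion step. The derivation below is a sketch that uses the rounded coefficients printed above (17.125 and 4.254), with x standing for TotExp^.06, so the last digit may differ slightly from what R reports.

```latex
\log\left(\mathrm{LifeExp}^{4.6}\right) = b_0 + b_1 \log x
\quad\Longrightarrow\quad
\mathrm{LifeExp} = \exp\!\left(\frac{b_0 + b_1 \log x}{4.6}\right)

x = 1.5:\quad \frac{17.125 + 4.254 \times 0.4055}{4.6} \approx \frac{18.850}{4.6} \approx 4.098,
\qquad e^{4.098} \approx 60.2

x = 2.5:\quad \frac{17.125 + 4.254 \times 0.9163}{4.6} \approx \frac{21.023}{4.6} \approx 4.570,
\qquad e^{4.570} \approx 96.6
```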

Multiple Regression

  1. Build the following multiple regression model and interpret the F Statistics, R^2, standard error, and p-values. How good is the model? LifeExp = b0+b1 x PropMd + b2 x TotExp +b3 x PropMD x TotExp
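  • The F-statistic (39.99 on 2 and 187 DF, p-value 3.479e-15) is large and indicates that the linear model is more compatible with the data than a constant average model. R^2 is the proportion of the variance in LifeExp that the model explains. Here the multiple R^2 is .2996 (adjusted .2921), which is low. The residual standard error, 9.127, is the typical size of a residual and is only slightly lower than the 9.371 of the simple model. For normal residuals the 1st and 3rd quartiles would fall near ±0.674 × 9.127 ≈ ±6.2; the observed quartiles are -4.880 and 6.958. The p-values for PropMD (0.000998) and TotExp (2.95e-10) are both below .05, so both coefficients are statistically significant. The formula below repeats PropMD and TotExp instead of multiplying them, so the fitted model has no PropMD x TotExp interaction term and b3 is not estimated. The QQ plot shows that the residuals of this model are not normally distributed.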
who.lm <- lm(who$LifeExp ~ who$PropMD+who$TotExp+who$PropMD+who$TotExp)
summary(who.lm)
## 
## Call:
## lm(formula = who$LifeExp ~ who$PropMD + who$TotExp + who$PropMD + 
##     who$TotExp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.996  -4.880   3.042   6.958  13.415 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.397e+01  7.706e-01  83.012  < 2e-16 ***
## who$PropMD  6.508e+02  1.946e+02   3.344 0.000998 ***
## who$TotExp  5.378e-05  8.074e-06   6.661 2.95e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.127 on 187 degrees of freedom
## Multiple R-squared:  0.2996, Adjusted R-squared:  0.2921 
## F-statistic: 39.99 on 2 and 187 DF,  p-value: 3.479e-15
qqnorm(resid(who.lm))
qqline(resid(who.lm))
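
The model in the question includes a b3 x PropMD x TotExp term, which the formula above does not produce. A minimal sketch of how R treats the two formulas, using a simulated data frame `d` whose values are hypothetical and unrelated to who.csv.

```r
# Simulated data with the assignment's column names; values are made up
set.seed(3)
d <- data.frame(PropMD = runif(120, 0, 0.03), TotExp = runif(120, 10, 5000))
d$LifeExp <- 60 + 400 * d$PropMD + 0.002 * d$TotExp + rnorm(120, sd = 4)

# Repeated terms are dropped: this is just LifeExp ~ PropMD + TotExp
fit_dup <- lm(LifeExp ~ PropMD + TotExp + PropMD + TotExp, data = d)

# PropMD * TotExp expands to PropMD + TotExp + PropMD:TotExp
fit_int <- lm(LifeExp ~ PropMD * TotExp, data = d)

names(coef(fit_dup))  # intercept, PropMD, TotExp
names(coef(fit_int))  # the same three plus the PropMD:TotExp interaction
```

Applied to the real data, the second form would estimate b3; its summary would replace the output shown above.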

Forecast 2

  1. Forecast LifeExp when PropMD=.03 and TotExp = 14. Does this forecast seem realistic? Why or why not?
who.3<-round(who.lm$coefficients[1] + (who.lm$coefficients[2]*.03) + (who.lm$coefficients[3]*14),2)
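  • The equation is Y = b0 + b1 * .03 + b2 * 14, using the coefficients of the additive model above. The forecast for PropMD = .03 and TotExp = 14 is 83.49. This does not seem realistic: PropMD = .03 means 3 physicians per 100 people, which contributes 650.8 × .03 ≈ 19.5 years to the forecast, while total expenditures of only 14 contribute less than 0.001 years. A country spending that little is unlikely to have so many physicians, so the forecast probably extrapolates to a combination the model was not fit on.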
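
A forecast like this can also be obtained with `predict()`, which returns a prediction interval alongside the point forecast. The sketch below assumes the model is fit with a `data =` argument, since `predict()` cannot match new data to `who$...` terms. The data frame `d` is simulated and hypothetical.

```r
# Simulated data with the assignment's column names; values are made up
set.seed(4)
d <- data.frame(PropMD = runif(120, 0, 0.03), TotExp = runif(120, 10, 5000))
d$LifeExp <- 60 + 400 * d$PropMD + 0.002 * d$TotExp + rnorm(120, sd = 4)

fit <- lm(LifeExp ~ PropMD + TotExp, data = d)

# New observation to forecast, with the same column names as the predictors
new_obs <- data.frame(PropMD = 0.03, TotExp = 14)
pred <- predict(fit, newdata = new_obs, interval = "prediction")
pred  # columns: fit (point forecast), lwr and upr (95% prediction interval)

# The point forecast matches b0 + b1 * .03 + b2 * 14 computed by hand
by_hand <- sum(coef(fit) * c(1, 0.03, 14))
all.equal(unname(pred[1, "fit"]), by_hand)

# Comparing the new point with the observed ranges flags extrapolation
range(d$PropMD)
range(d$TotExp)
```

Checking the ranges of the real PropMD and TotExp columns in the same way would show whether .03 and 14 fall inside the data.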