This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
data <- read.csv(file = "who.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
colnames(data)
## [1] "Country" "LifeExp" "InfantSurvival" "Under5Survival"
## [5] "TBFree" "PropMD" "PropRN" "PersExp"
## [9] "GovtExp" "TotExp"
plot(LifeExp ~ TotExp,data)
m1 = lm(LifeExp ~ TotExp, data)
abline(m1)
summary(m1)
##
## Call:
## lm(formula = LifeExp ~ TotExp, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.764 -4.778 3.154 7.116 13.292
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.475e+01 7.535e-01 85.933 < 2e-16 ***
## TotExp 6.297e-05 7.795e-06 8.079 7.71e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.371 on 188 degrees of freedom
## Multiple R-squared: 0.2577, Adjusted R-squared: 0.2537
## F-statistic: 65.26 on 1 and 188 DF, p-value: 7.714e-14
plot(m1)
The F statistic tests whether any coefficient in a multiple regression is significantly different from zero. Since this is a simple regression with a single predictor, the F test is equivalent to the t test on the TotExp coefficient (F = t², with the same p-value), so it adds little information here. The p-value indicates the significance of the model: here p = 7.714e-14 < 0.05, so the model is statistically significant. The R-squared statistic gives an overall measure of how well the model fits the data: R² = 0.2577 means the model explains 25.77% of the variation in LifeExp. The residual plot shows a clear pattern in the residuals, and the Q-Q plot shows that the residuals are not normally distributed. Both plots indicate that the linear regression model does not fit the data well.
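As a quick arithmetic check of the F = t² relationship, the sketch below recomputes the F statistic from numbers copied out of `summary(m1)` above (it does not refit the model). It also uses the standard identity F = (R²/k) / ((1 − R²)/df), where k is the number of predictors and df the residual degrees of freedom; the variable names are illustrative.

```r
# Values copied from summary(m1) above (not re-estimated here)
t_totexp <- 8.079   # t value of the TotExp coefficient
r2 <- 0.2577        # Multiple R-squared
k <- 1              # number of predictors
df_resid <- 188     # residual degrees of freedom

# In simple regression the overall F statistic equals the squared t statistic
f_from_t <- t_totexp^2

# F recovered from R-squared: F = (R^2 / k) / ((1 - R^2) / df_resid)
f_from_r2 <- (r2 / k) / ((1 - r2) / df_resid)

c(f_from_t = f_from_t, f_from_r2 = f_from_r2)
```

Both values come out at about 65.27, matching the reported F-statistic of 65.26 up to the rounding of the printed inputs.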
data2 <- data
data2$LifeExp <- data2$LifeExp^4.6
data2$TotExp <- data2$TotExp^0.06
m2 <- lm(LifeExp ~ TotExp, data2)
plot(LifeExp ~ TotExp, data2)
abline(m2)
summary(m2)
##
## Call:
## lm(formula = LifeExp ~ TotExp, data = data2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -308616089 -53978977 13697187 59139231 211951764
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -736527910 46817945 -15.73 <2e-16 ***
## TotExp 620060216 27518940 22.53 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 90490000 on 188 degrees of freedom
## Multiple R-squared: 0.7298, Adjusted R-squared: 0.7283
## F-statistic: 507.7 on 1 and 188 DF, p-value: < 2.2e-16
plot(m2)
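The F statistic is 507.7 (p < 2.2e-16), so the coefficient of TotExp is significantly different from 0, and since p < 0.05 the model is statistically significant. R-squared is 0.7298, meaning the model explains 72.98% of the variation in the transformed LifeExp. Because R-squared is higher for this model, and the residual and Q-Q plots suggest that the linear regression assumptions are better fulfilled than in model 1, model 2 is the better model. (Strictly speaking, the two R-squared values are computed on different response scales, LifeExp versus LifeExp^4.6, so the diagnostic plots are the more direct evidence.)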
# Predict on the TotExp^0.06 scale, then undo the ^4.6 transform of LifeExp
results <- predict(m2, newdata = data.frame(TotExp = c(1.5, 2.5)))^(1/4.6)
Estimated life expectancy is 63.31 years when TotExp^0.06 is 1.5.
Estimated life expectancy is 86.51 years when TotExp^0.06 is 2.5.
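Because both variables are transformed in model 2, the inputs and outputs above are easier to read once mapped back to the original scales. The sketch below uses the coefficients as printed in `summary(m2)` (copied, not re-estimated), raises the fitted value to 1/4.6 to recover LifeExp, and raises the inputs to 1/0.06 to show which raw TotExp values they correspond to. Variable names are illustrative.

```r
# Coefficients of model 2 as printed in summary(m2) above (not re-estimated)
b0 <- -736527910
b1 <- 620060216

# Predictor values on the transformed scale, i.e. TotExp^0.06
x_transformed <- c(1.5, 2.5)

# Model 2 predicts LifeExp^4.6, so raise the fitted value to 1/4.6 to undo it
# (this only works while b0 + b1 * x is positive; otherwise R returns NaN)
life_exp <- (b0 + b1 * x_transformed)^(1 / 4.6)

# Undo the 0.06 power to express the inputs as raw TotExp values
tot_exp_raw <- x_transformed^(1 / 0.06)

data.frame(x_transformed, tot_exp_raw, life_exp)
```

A transformed value of 1.5 corresponds to a raw TotExp of roughly 861, while 2.5 corresponds to roughly 4.3 million, so the two predictions sit at very different points on the original expenditure scale.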
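# PropMD * TotExp already expands to PropMD + TotExp + PropMD:TotExp, so listing the main effects is redundant but harmless
m3 <- lm(LifeExp ~ PropMD + TotExp + PropMD * TotExp, data)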
summary(m3)
##
## Call:
## lm(formula = LifeExp ~ PropMD + TotExp + PropMD * TotExp, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.320 -4.132 2.098 6.540 13.074
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.277e+01 7.956e-01 78.899 < 2e-16 ***
## PropMD 1.497e+03 2.788e+02 5.371 2.32e-07 ***
## TotExp 7.233e-05 8.982e-06 8.053 9.39e-14 ***
## PropMD:TotExp -6.026e-03 1.472e-03 -4.093 6.35e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.765 on 186 degrees of freedom
## Multiple R-squared: 0.3574, Adjusted R-squared: 0.3471
## F-statistic: 34.49 on 3 and 186 DF, p-value: < 2.2e-16
plot(m3)
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
$$\text{Average Life Expectancy} = 62.77270326 + 1497.49395252 \times \text{PropMD} + 0.00007233 \times \text{TotExp} - 0.00602569 \times \text{PropMD} \times \text{TotExp}$$
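Because of the interaction term, the effect of TotExp in this equation is not a single number: differentiating with respect to TotExp gives a slope of 0.00007233 − 0.00602569 × PropMD. The sketch below evaluates that slope at a few illustrative PropMD values and finds where it changes sign; the coefficients are copied from the equation above and the function name is hypothetical.

```r
# Coefficients as printed in the fitted equation for model 3 (copied, not re-estimated)
b_totexp <- 0.00007233
b_interaction <- -0.00602569

# Slope of LifeExp with respect to TotExp at a given PropMD:
# d(LifeExp)/d(TotExp) = b_totexp + b_interaction * PropMD
totexp_slope <- function(prop_md) b_totexp + b_interaction * prop_md

# Value of PropMD at which the TotExp slope is zero
prop_md_zero <- -b_totexp / b_interaction

totexp_slope(c(0, 0.005, 0.02))
prop_md_zero
```

Under this fitted equation the TotExp slope is positive for PropMD below about 0.012 and negative above it.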
The F statistic is 34.49 with p < 2.2e-16, which means at least one variable is a significant predictor. For each of the three terms (PropMD, TotExp, and the PropMD:TotExp interaction), the p-value is below 0.05, so all three are significant. R-squared is 0.3574, meaning the model explains 35.74% of the variation in LifeExp. However, the residual plot shows that the variance of the residuals is not constant, which means the constant-variance assumption of linear regression is not met.
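# Predicted LifeExp for PropMD = 0.03 and TotExp = 14 from model 3's coefficients
# (rounded to 4 decimals; predict(m3, newdata = ...) would use full precision)
y <- round(m3$coefficients[1], 4) + (round(m3$coefficients[2], 4) * 0.03) +
  (round(m3$coefficients[3], 4) * 14) + (round(m3$coefficients[4], 4) * 14 * 0.03)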
print(y)
## (Intercept)
## 107.6964
The predicted life expectancy is about 107.7 years, which is unrealistic by common sense; the highest life expectancy in the data set is only around 80 years.
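To see where the unrealistic value comes from, the sketch below recomputes the prediction from the unrounded coefficients printed in the model 3 equation and splits it into per-term contributions. The coefficient values are copied from that equation (not re-estimated), and the variable names are illustrative.

```r
# Coefficients of model 3 as printed in the fitted equation above (not re-estimated)
b <- c(intercept = 62.77270326, prop_md = 1497.49395252,
       tot_exp = 0.00007233, interaction = -0.00602569)

prop_md <- 0.03
tot_exp <- 14

# Contribution of each term to the predicted LifeExp
terms <- c(intercept   = b[["intercept"]],
           prop_md     = b[["prop_md"]] * prop_md,
           tot_exp     = b[["tot_exp"]] * tot_exp,
           interaction = b[["interaction"]] * prop_md * tot_exp)

round(terms, 4)

# Full-precision prediction: the sum of all term contributions
y_full <- sum(terms)
y_full
```

The PropMD term alone adds about 44.9 years on top of the intercept, while the TotExp and interaction terms contribute only thousandths of a year at TotExp = 14, so the implausible prediction is driven almost entirely by the PropMD value.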