# Load the data
who_data <- read.csv("who.csv")
# 1. Scatterplot and Simple Linear Regression
plot(who_data$TotExp, who_data$LifeExp, xlab = "Total Expenditures", ylab = "Life Expectancy")
lm_fit <- lm(LifeExp ~ TotExp, data = who_data)
summary(lm_fit)
##
## Call:
## lm(formula = LifeExp ~ TotExp, data = who_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.764 -4.778 3.154 7.116 13.292
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.475e+01 7.535e-01 85.933 < 2e-16 ***
## TotExp 6.297e-05 7.795e-06 8.079 7.71e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.371 on 188 degrees of freedom
## Multiple R-squared: 0.2577, Adjusted R-squared: 0.2537
## F-statistic: 65.26 on 1 and 188 DF, p-value: 7.714e-14
The simple linear regression of LifeExp on TotExp is statistically significant: the F-statistic is 65.26 on 1 and 188 degrees of freedom, with a p-value of 7.714e-14. The R-squared of 0.2577 means that TotExp explains only 25.77% of the variation in LifeExp, so most of the variation is left unexplained. The intercept (b0) is 64.75, the expected LifeExp when TotExp is zero. The slope (b1) is 6.297e-05, so each one-unit increase in TotExp is associated with an increase of 6.297e-05 years in LifeExp. The residual standard error is 9.371, meaning predictions typically miss the observed LifeExp by about 9.4 years. The residuals are also asymmetric (median 3.154, minimum -24.764, maximum 13.292), which hints that the normality assumption may not hold. The assumptions of simple linear regression (linearity, normality of residuals, independence, and equal variance) should be checked before drawing conclusions from this model.
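The assumption checks mentioned above could be sketched as follows. This is a minimal sketch, not part of the original analysis: it uses the built-in `cars` data as a stand-in so that it runs without who.csv, and `demo_fit` is a hypothetical name. Substituting `lm_fit` would apply the same checks to the model above.

```r
# Stand-in model on a built-in dataset (assumption: replace with lm_fit)
demo_fit <- lm(dist ~ speed, data = cars)

res <- residuals(demo_fit)
fitted_vals <- fitted(demo_fit)

# Linearity and equal variance: residuals vs. fitted values should show
# no curve and no funnel shape
plot(fitted_vals, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality: points should lie close to the reference line
qqnorm(res)
qqline(res)

# Formal normality test (null hypothesis: residuals are normally distributed)
sw <- shapiro.test(res)
sw$p.value
```

A curved band in the first plot points to non-linearity, and a small Shapiro-Wilk p-value points to non-normal residuals. Either would be a reason to consider transforming the variables.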
# 2. Transform Variables and Re-run Regression
who_data$LifeExp_trans <- who_data$LifeExp^4.6
who_data$TotExp_trans <- who_data$TotExp^0.06
plot(who_data$TotExp_trans, who_data$LifeExp_trans, xlab = "Total Expenditures (transformed)", ylab = "Life Expectancy (transformed)")
lm_fit_trans <- lm(LifeExp_trans ~ TotExp_trans, data = who_data)
summary(lm_fit_trans)
##
## Call:
## lm(formula = LifeExp_trans ~ TotExp_trans, data = who_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -308616089 -53978977 13697187 59139231 211951764
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -736527910 46817945 -15.73 <2e-16 ***
## TotExp_trans 620060216 27518940 22.53 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 90490000 on 188 degrees of freedom
## Multiple R-squared: 0.7298, Adjusted R-squared: 0.7283
## F-statistic: 507.7 on 1 and 188 DF, p-value: < 2.2e-16
This model regresses LifeExp^4.6 on TotExp^0.06. The intercept is -736,527,910 and the slope is 620,060,216. When TotExp^0.06 is zero, the expected value of LifeExp^4.6 is -736,527,910, an extrapolation with no practical meaning because LifeExp^4.6 cannot be negative. Each one-unit increase in TotExp^0.06 is associated with an increase of 620,060,216 in the expected value of LifeExp^4.6. The slope's standard error is 27,518,940, which is small relative to the estimate (t = 22.53), so the slope is estimated precisely. Its p-value is below 2e-16, strong evidence that the slope differs from zero. The R-squared of 0.7298 means that 72.98% of the variation in LifeExp^4.6 is explained by the linear relationship with TotExp^0.06, and the F-statistic of 507.7 (p-value < 2.2e-16) indicates that the model is significant. The residual standard error of 90,490,000 is in units of LifeExp^4.6, so it cannot be compared directly with the 9.371 years of the untransformed model. Judged by R-squared (0.7298 against 0.2577) and the F-statistic, the transformed model fits much better than the model on the original variables.
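To see what the transformed fit implies on the original scale, the fitted line can be inverted by hand. The sketch below hard-codes the rounded coefficients printed above. `life_exp_curve` is a hypothetical helper name, and the TotExp values are arbitrary illustration points, not observations from the data.

```r
# Rounded coefficients copied from summary(lm_fit_trans)
b0 <- -736527910
b1 <- 620060216

# Implied life expectancy in years for a raw TotExp value
life_exp_curve <- function(tot_exp) {
  pred_trans <- b0 + b1 * tot_exp^0.06
  # A negative fitted value has no real 1/4.6 power, so return NA there
  ifelse(pred_trans > 0, pred_trans^(1 / 4.6), NA_real_)
}

tot_exp_grid <- c(1000, 10000, 100000)  # illustrative values only
curve_vals <- life_exp_curve(tot_exp_grid)
curve_vals

# Gain in years for each tenfold increase in TotExp
diff(curve_vals)
```

Under these coefficients each tenfold increase in TotExp adds fewer years than the previous one. This is the curvature that the untransformed straight-line model could not capture.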
# 3. Forecast Life Expectancy (Transformed Model)
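new_data <- data.frame(TotExp_trans = c(1.5, 2.5))
# predict.lm() has no `inverse` argument (it was silently ignored), so it is
# dropped here. The predictions are on the LifeExp^4.6 scale.
predicted_lifeexp <- predict(lm_fit_trans, newdata = new_data)
predicted_lifeexp
## 1 2
## 193562414 813622630
These forecasts are values of LifeExp^4.6, not years. Raising them to the power 1/4.6 gives a forecast life expectancy of about 63.3 years when TotExp^0.06 is 1.5 and about 86.5 years when it is 2.5.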
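Because the model's response is LifeExp^4.6, a forecast in years needs the inverse power applied to the prediction. The following is a minimal sketch that uses the rounded coefficients from the summary output so that it runs without the data. With the fitted model available, `predict(lm_fit_trans, newdata = new_data)^(1 / 4.6)` should give the same result. `pred_trans` and `pred_years` are illustrative names.

```r
# Rounded coefficients copied from summary(lm_fit_trans)
b0 <- -736527910
b1 <- 620060216

totexp_trans <- c(1.5, 2.5)

# Prediction on the transformed (LifeExp^4.6) scale
pred_trans <- b0 + b1 * totexp_trans

# Undo the power transform to get years
pred_years <- pred_trans^(1 / 4.6)
round(pred_years, 2)  # roughly 63.31 and 86.51
```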
# 4. Multiple Linear Regression
lm_mult <- lm(LifeExp ~ PropMD + TotExp + PropMD*TotExp, data = who_data)
summary(lm_mult)
##
## Call:
## lm(formula = LifeExp ~ PropMD + TotExp + PropMD * TotExp, data = who_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.320 -4.132 2.098 6.540 13.074
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.277e+01 7.956e-01 78.899 < 2e-16 ***
## PropMD 1.497e+03 2.788e+02 5.371 2.32e-07 ***
## TotExp 7.233e-05 8.982e-06 8.053 9.39e-14 ***
## PropMD:TotExp -6.026e-03 1.472e-03 -4.093 6.35e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.765 on 186 degrees of freedom
## Multiple R-squared: 0.3574, Adjusted R-squared: 0.3471
## F-statistic: 34.49 on 3 and 186 DF, p-value: < 2.2e-16
The coefficients table gives the estimated intercept and the coefficient for each term. The intercept is 62.77. PropMD has a positive coefficient (beta = 1497, p = 2.32e-07), and so does TotExp (beta = 7.233e-05, p = 9.39e-14). The interaction between PropMD and TotExp is also significant (beta = -0.006026, p = 6.35e-05). Because the interaction is in the model, the PropMD and TotExp coefficients describe each variable's effect when the other is zero. At a given PropMD, the effect of TotExp on LifeExp is 7.233e-05 - 0.006026 * PropMD, which shrinks as PropMD grows.
The Residuals table shows the minimum, 1st quartile, median, 3rd quartile, and maximum of the model residuals.
The R-squared of 0.3574 indicates that the model explains 35.74% of the variation in LifeExp, up from 25.77% with TotExp alone. The adjusted R-squared of 0.3471 is only slightly lower than the R-squared and is higher than the 0.2537 of the single-predictor model, so the extra terms improve the fit by more than the penalty for adding them. The F-statistic of 34.49 on 3 and 186 degrees of freedom, with a p-value below 2.2e-16, indicates that the model as a whole is statistically significant.
Overall, the model suggests that both PropMD and TotExp are significant predictors of LifeExp, and their interaction should be considered when interpreting the effect of TotExp on LifeExp.
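One way to read the interaction is to compute the slope of TotExp at different levels of PropMD. The sketch below uses the rounded coefficients from the output above. `totexp_slope` is a hypothetical helper, and the PropMD values are arbitrary illustration points, not taken from the data.

```r
# Rounded coefficients copied from summary(lm_mult)
b_totexp <- 7.233e-05
b_inter  <- -6.026e-03

# Fitted change in LifeExp per unit of TotExp at a given PropMD
totexp_slope <- function(prop_md) b_totexp + b_inter * prop_md

prop_md_grid <- c(0, 0.005, 0.01, 0.02)  # illustrative values only
slopes <- totexp_slope(prop_md_grid)
slopes

# PropMD at which the fitted TotExp slope changes sign (about 0.012)
sign_change <- -b_totexp / b_inter
sign_change
```

Above that PropMD value the fitted effect of TotExp is negative. This is a property of the linear interaction term and need not reflect a real effect.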
# 5. Forecast Life Expectancy (Multiple Regression Model)
new_data2 <- data.frame(PropMD = 0.03, TotExp = 14)
predicted_lifeexp2 <- predict(lm_mult, newdata = new_data2)
predicted_lifeexp2
## 1
## 107.696
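When PropMD is 0.03 and TotExp is 14, the predicted life expectancy is 107.696 years. That is implausibly high for a national life expectancy. Almost all of the increase over the intercept comes from the PropMD term (1497 * 0.03 = 44.91), so the forecast is best read as a sign that this combination of inputs stretches the linear model beyond where it is reliable, not as a realistic estimate.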
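The forecast can be reproduced by hand to see which term drives it. The sketch below uses the rounded coefficients from the summary output, so the total differs slightly from the `predict()` result. `contributions` and `hand_pred` are illustrative names.

```r
# Rounded coefficients copied from summary(lm_mult)
b0       <- 6.277e+01
b_propmd <- 1.497e+03
b_totexp <- 7.233e-05
b_inter  <- -6.026e-03

prop_md <- 0.03
tot_exp <- 14

# Contribution of each term to the prediction, in years
contributions <- c(
  intercept   = b0,
  PropMD      = b_propmd * prop_md,
  TotExp      = b_totexp * tot_exp,
  interaction = b_inter * prop_md * tot_exp
)
contributions

hand_pred <- sum(contributions)
hand_pred  # close to 107.696; the small gap comes from coefficient rounding
```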