Import libraries

library(tidyverse)
library(readr)
library(ggplot2)

Question 1

Scatterplot of average life expectancy for the country in years vs sum of personal and government expenditures.

# Get the data
who_data <- read_csv("https://raw.githubusercontent.com/Kossi-Akplaka/Data605_Computational_mathematics/main/data605/Week%2012/who.csv")

# Create a scatterplot of LifeExp against TotExp
ggplot(who_data, aes(x = TotExp, y = LifeExp)) +
  geom_point() +
  labs(x = "Total Expenditure (USD)", y = "Life Expectancy (years)",
       title = "Scatterplot of Life Expectancy vs. Total Expenditure")

Run simple linear regression

lm_model <- lm(LifeExp ~ TotExp, data = who_data)
summary(lm_model)
## 
## Call:
## lm(formula = LifeExp ~ TotExp, data = who_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -24.764  -4.778   3.154   7.116  13.292 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.475e+01  7.535e-01  85.933  < 2e-16 ***
## TotExp      6.297e-05  7.795e-06   8.079 7.71e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.371 on 188 degrees of freedom
## Multiple R-squared:  0.2577, Adjusted R-squared:  0.2537 
## F-statistic: 65.26 on 1 and 188 DF,  p-value: 7.714e-14

Provide explanation of F statistics, R^2, standard error,and p-values only, and Discuss whether the assumptions of simple linear regression met.

F-statistic: Overall significance of the regression model. In this case, the F-statistic is 65.26 with a very low p-value (7.714e-14), indicating that the regression model is statistically significant.

R-squared : Measures the proportion of variance in the dependent variable (LifeExp) that is explained by the independent variable (TotExp). Here, R-squared is 0.2577, which means that approximately 25.77% of the variability in life expectancy is explained by total expenditure on healthcare.

Residual standard error: This represents the standard deviation of the residuals, which are the differences between the observed and predicted values. In this case, it’s approximately 9.371.

P-values: The p-value associated with the coefficient of TotExp is 7.71e-14, indicating that total expenditure on healthcare is significantly associated with life expectancy.

The relationship doesn’t appear linear.

Question 2

# Transform variables
who_data_transformed <- who_data %>%
  mutate(LifeExp_transformed = LifeExp^4.6,
         TotExp_transformed = TotExp^0.06)

# Plot transformed variables
ggplot(who_data_transformed, aes(x = TotExp_transformed, y = LifeExp_transformed)) +
  geom_point() +
  labs(x = "Transformed Total Expenditure", y = "Transformed Life Expectancy",
       title = "Scatterplot of Transformed Life Expectancy vs. Transformed Total Expenditure")

# Run simple linear regression with transformed variables
lm_model_transformed <- lm(LifeExp_transformed ~ TotExp_transformed, data = who_data_transformed)
summary(lm_model_transformed)
## 
## Call:
## lm(formula = LifeExp_transformed ~ TotExp_transformed, data = who_data_transformed)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -308616089  -53978977   13697187   59139231  211951764 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        -736527910   46817945  -15.73   <2e-16 ***
## TotExp_transformed  620060216   27518940   22.53   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 90490000 on 188 degrees of freedom
## Multiple R-squared:  0.7298, Adjusted R-squared:  0.7283 
## F-statistic: 507.7 on 1 and 188 DF,  p-value: < 2.2e-16

The R-squared value for the model using the transformed variables (0.7298) is higher than that of the original model (0.2577). This suggests that the model using the transformed variables explains more variability in the data.

Question 3

# Coefficients
intercept <- -736527910
coef_totexp <- 620060216

# Values of TotExp_transformed
totexp_1 <- 1.5
totexp_2 <- 2.5

# Forecast life expectancy
lifeexp_1 <- intercept + coef_totexp * totexp_1
lifeexp_2 <- intercept + coef_totexp * totexp_2

# Print forecasts
print(paste("Forecasted life expectancy when TotExp^.06 = 1.5:", round(lifeexp_1, digits = 2)))
## [1] "Forecasted life expectancy when TotExp^.06 = 1.5: 193562414"
print(paste("Forecasted life expectancy when TotExp^.06 = 2.5:", round(lifeexp_2, digits = 2)))
## [1] "Forecasted life expectancy when TotExp^.06 = 2.5: 813622630"

Question 4

Build the following multiple regression model and interpret the F Statistics, R^2, standard error, and p-values. How good is the model?

# Run multiple linear regression
multiple_lm_model <- lm(LifeExp ~ PropMD + TotExp + PropMD * TotExp, data = who_data)

# Summary of the regression model
summary(multiple_lm_model)
## 
## Call:
## lm(formula = LifeExp ~ PropMD + TotExp + PropMD * TotExp, data = who_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -27.320  -4.132   2.098   6.540  13.074 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    6.277e+01  7.956e-01  78.899  < 2e-16 ***
## PropMD         1.497e+03  2.788e+02   5.371 2.32e-07 ***
## TotExp         7.233e-05  8.982e-06   8.053 9.39e-14 ***
## PropMD:TotExp -6.026e-03  1.472e-03  -4.093 6.35e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.765 on 186 degrees of freedom
## Multiple R-squared:  0.3574, Adjusted R-squared:  0.3471 
## F-statistic: 34.49 on 3 and 186 DF,  p-value: < 2.2e-16

Based on these results, the model appears to be statistically significant, as indicated by the low p-value of the F-statistic. However, the R-squared value of 0.3574 suggests that the model explains only about 35.74% of the variability in life expectancy, leaving a considerable amount of variability unexplained. Therefore, while the model is statistically significant, it may not be considered particularly good in terms of explaining life expectancy variability.

Question 5

# Coefficients
intercept <- 62.77
coef_propmd <- 1497
coef_totexp <- 0.00007233
coef_propmd_totexp <- -0.006026

# Values
propmd <- 0.03
totexp <- 14

# Forecast life expectancy
lifeexp_forecast <- intercept + coef_propmd * propmd + coef_totexp * totexp + coef_propmd_totexp * propmd * totexp

# Print forecast
print(paste("Forecasted life expectancy:", round(lifeexp_forecast, digits = 2)))
## [1] "Forecasted life expectancy: 107.68"

The forecasted life expectancy of 107.68 years seems extremely high and unrealistic. …

LS0tDQp0aXRsZTogIldlZWsgMTIiDQphdXRob3I6ICJLb3NzaSBBa3BsYWthIg0KZGF0ZTogImByIFN5cy5EYXRlKClgIg0Kb3V0cHV0OiBvcGVuaW50cm86OmxhYl9yZXBvcnQNCi0tLQ0KDQojIyBJbXBvcnQgbGlicmFyaWVzDQoNCmBgYHtyIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0V9DQpsaWJyYXJ5KHRpZHl2ZXJzZSkNCmxpYnJhcnkocmVhZHIpDQpsaWJyYXJ5KGdncGxvdDIpDQpgYGANCg0KDQojIyBRdWVzdGlvbiAxDQoNCiMjIyBTY2F0dGVycGxvdCBvZiBhdmVyYWdlIGxpZmUgZXhwZWN0YW5jeSBmb3IgdGhlIGNvdW50cnkgaW4geWVhcnMgdnMgc3VtIG9mIHBlcnNvbmFsIGFuZCBnb3Zlcm5tZW50IGV4cGVuZGl0dXJlcy4NCg0KYGBge3IgbWVzc2FnZT1GQUxTRSwgd2FybmluZz1GQUxTRX0NCiMgR2V0IHRoZSBkYXRhDQp3aG9fZGF0YSA8LSByZWFkX2NzdigiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL0tvc3NpLUFrcGxha2EvRGF0YTYwNV9Db21wdXRhdGlvbmFsX21hdGhlbWF0aWNzL21haW4vZGF0YTYwNS9XZWVrJTIwMTIvd2hvLmNzdiIpDQoNCiMgQ3JlYXRlIGEgc2NhdHRlcnBsb3Qgb2YgTGlmZUV4cCBhZ2FpbnN0IFRvdEV4cA0KZ2dwbG90KHdob19kYXRhLCBhZXMoeCA9IFRvdEV4cCwgeSA9IExpZmVFeHApKSArDQogIGdlb21fcG9pbnQoKSArDQogIGxhYnMoeCA9ICJUb3RhbCBFeHBlbmRpdHVyZSAoVVNEKSIsIHkgPSAiTGlmZSBFeHBlY3RhbmN5ICh5ZWFycykiLA0KICAgICAgIHRpdGxlID0gIlNjYXR0ZXJwbG90IG9mIExpZmUgRXhwZWN0YW5jeSB2cy4gVG90YWwgRXhwZW5kaXR1cmUiKQ0KYGBgDQoNCiMjIyBSdW4gc2ltcGxlIGxpbmVhciByZWdyZXNzaW9uDQoNCmBgYHtyfQ0KbG1fbW9kZWwgPC0gbG0oTGlmZUV4cCB+IFRvdEV4cCwgZGF0YSA9IHdob19kYXRhKQ0Kc3VtbWFyeShsbV9tb2RlbCkNCmBgYA0KIyMjIFByb3ZpZGUgZXhwbGFuYXRpb24gb2YgRiBzdGF0aXN0aWNzLCBSXjIsIHN0YW5kYXJkIGVycm9yLGFuZCBwLXZhbHVlcyBvbmx5LCBhbmQgRGlzY3VzcyB3aGV0aGVyIHRoZSBhc3N1bXB0aW9ucyBvZiBzaW1wbGUgbGluZWFyIHJlZ3Jlc3Npb24gbWV0Lg0KDQpGLXN0YXRpc3RpYzogT3ZlcmFsbCBzaWduaWZpY2FuY2Ugb2YgdGhlIHJlZ3Jlc3Npb24gbW9kZWwuIEluIHRoaXMgY2FzZSwgdGhlIEYtc3RhdGlzdGljIGlzIDY1LjI2IHdpdGggYSB2ZXJ5IGxvdyBwLXZhbHVlICg3LjcxNGUtMTQpLCBpbmRpY2F0aW5nIHRoYXQgdGhlIHJlZ3Jlc3Npb24gbW9kZWwgaXMgc3RhdGlzdGljYWxseSBzaWduaWZpY2FudC4NCg0KUi1zcXVhcmVkIDogTWVhc3VyZXMgdGhlIHByb3BvcnRpb24gb2YgdmFyaWFuY2UgaW4gdGhlIGRlcGVuZGVudCB2YXJpYWJsZSAoTGlmZUV4cCkgdGhhdCBpcyBleHBsYWluZWQgYnkgdGhlIGluZGVwZW5kZW50IHZhcmlhYmxlIChUb3RFeHApLiBIZXJlLCBSLXNxdWFyZWQgaXMgMC4yNTc3LCB3aGljaCBtZWFucyB0aGF0IGFwcHJveGltYXRlbHkgMjUuNzclIG9mIHRoZSB2YXJpYWJpbGl0eSBpbiBsaWZlIGV4cGVjdGFuY3kgaXMgZXhwbGFpbmVkIGJ5IHRvdGFsIGV4cGVuZGl0dXJlIG9uIGhlYWx0aGNhcmUuDQoNClJlc2lkdWFsIHN0YW5kYXJkIGVycm9yOiBUaGlzIHJlcHJlc2VudHMgdGhlIHN0YW5kYXJkIGRldmlhdGlvbiBvZiB0aGUgcmVzaWR1YWxzLCB3aGljaCBhcmUgdGhlIGRpZmZlcmVuY2VzIGJldHdlZW4gdGhlIG9ic2VydmVkIGFuZCBwcmVkaWN0ZWQgdmFsdWVzLiBJbiB0aGlzIGNhc2UsIGl0J3MgYXBwcm94aW1hdGVseSA5LjM3MS4NCg0KUC12YWx1ZXM6IFRoZSBwLXZhbHVlIGFzc29jaWF0ZWQgd2l0aCB0aGUgY29lZmZpY2llbnQgb2YgVG90RXhwIGlzIDcuNzFlLTE0LCBpbmRpY2F0aW5nIHRoYXQgdG90YWwgZXhwZW5kaXR1cmUgb24gaGVhbHRoY2FyZSBpcyBzaWduaWZpY2FudGx5IGFzc29jaWF0ZWQgd2l0aCBsaWZlIGV4cGVjdGFuY3kuDQoNClRoZSByZWxhdGlvbnNoaXAgZG9lc24ndCBhcHBlYXIgbGluZWFyLg0KDQojIyBRdWVzdGlvbiAyDQoNCmBgYHtyfQ0KIyBUcmFuc2Zvcm0gdmFyaWFibGVzDQp3aG9fZGF0YV90cmFuc2Zvcm1lZCA8LSB3aG9fZGF0YSAlPiUNCiAgbXV0YXRlKExpZmVFeHBfdHJhbnNmb3JtZWQgPSBMaWZlRXhwXjQuNiwNCiAgICAgICAgIFRvdEV4cF90cmFuc2Zvcm1lZCA9IFRvdEV4cF4wLjA2KQ0KDQojIFBsb3QgdHJhbnNmb3JtZWQgdmFyaWFibGVzDQpnZ3Bsb3Qod2hvX2RhdGFfdHJhbnNmb3JtZWQsIGFlcyh4ID0gVG90RXhwX3RyYW5zZm9ybWVkLCB5ID0gTGlmZUV4cF90cmFuc2Zvcm1lZCkpICsNCiAgZ2VvbV9wb2ludCgpICsNCiAgbGFicyh4ID0gIlRyYW5zZm9ybWVkIFRvdGFsIEV4cGVuZGl0dXJlIiwgeSA9ICJUcmFuc2Zvcm1lZCBMaWZlIEV4cGVjdGFuY3kiLA0KICAgICAgIHRpdGxlID0gIlNjYXR0ZXJwbG90IG9mIFRyYW5zZm9ybWVkIExpZmUgRXhwZWN0YW5jeSB2cy4gVHJhbnNmb3JtZWQgVG90YWwgRXhwZW5kaXR1cmUiKQ0KDQpgYGANCg0KYGBge3J9DQojIFJ1biBzaW1wbGUgbGluZWFyIHJlZ3Jlc3Npb24gd2l0aCB0cmFuc2Zvcm1lZCB2YXJpYWJsZXMNCmxtX21vZGVsX3RyYW5zZm9ybWVkIDwtIGxtKExpZmVFeHBfdHJhbnNmb3JtZWQgfiBUb3RFeHBfdHJhbnNmb3JtZWQsIGRhdGEgPSB3aG9fZGF0YV90cmFuc2Zvcm1lZCkNCnN1bW1hcnkobG1fbW9kZWxfdHJhbnNmb3JtZWQpDQpgYGANClRoZSBSLXNxdWFyZWQgdmFsdWUgZm9yIHRoZSBtb2RlbCB1c2luZyB0aGUgdHJhbnNmb3JtZWQgdmFyaWFibGVzICgwLjcyOTgpIGlzIGhpZ2hlciB0aGFuIHRoYXQgb2YgdGhlIG9yaWdpbmFsIG1vZGVsICgwLjI1NzcpLiBUaGlzIHN1Z2dlc3RzIHRoYXQgdGhlIG1vZGVsIHVzaW5nIHRoZSB0cmFuc2Zvcm1lZCB2YXJpYWJsZXMgZXhwbGFpbnMgbW9yZSB2YXJpYWJpbGl0eSBpbiB0aGUgZGF0YS4NCg0KIyMgUXVlc3Rpb24gMw0KDQpgYGB7cn0NCiMgQ29lZmZpY2llbnRzDQppbnRlcmNlcHQgPC0gLTczNjUyNzkxMA0KY29lZl90b3RleHAgPC0gNjIwMDYwMjE2DQoNCiMgVmFsdWVzIG9mIFRvdEV4cF90cmFuc2Zvcm1lZA0KdG90ZXhwXzEgPC0gMS41DQp0b3RleHBfMiA8LSAyLjUNCg0KIyBGb3JlY2FzdCBsaWZlIGV4cGVjdGFuY3kNCmxpZmVleHBfMSA8LSBpbnRlcmNlcHQgKyBjb2VmX3RvdGV4cCAqIHRvdGV4cF8xDQpsaWZlZXhwXzIgPC0gaW50ZXJjZXB0ICsgY29lZl90b3RleHAgKiB0b3RleHBfMg0KDQojIFByaW50IGZvcmVjYXN0cw0KcHJpbnQocGFzdGUoIkZvcmVjYXN0ZWQgbGlmZSBleHBlY3RhbmN5IHdoZW4gVG90RXhwXi4wNiA9IDEuNToiLCByb3VuZChsaWZlZXhwXzEsIGRpZ2l0cyA9IDIpKSkNCnByaW50KHBhc3RlKCJGb3JlY2FzdGVkIGxpZmUgZXhwZWN0YW5jeSB3aGVuIFRvdEV4cF4uMDYgPSAyLjU6Iiwgcm91bmQobGlmZWV4cF8yLCBkaWdpdHMgPSAyKSkpDQoNCmBgYA0KDQoNCiMjIFF1ZXN0aW9uIDQNCg0KQnVpbGQgdGhlIGZvbGxvd2luZyBtdWx0aXBsZSByZWdyZXNzaW9uIG1vZGVsIGFuZCBpbnRlcnByZXQgdGhlIEYgU3RhdGlzdGljcywgUl4yLCBzdGFuZGFyZCBlcnJvciwgYW5kIHAtdmFsdWVzLiBIb3cgZ29vZCBpcyB0aGUgbW9kZWw/DQoNCg0KYGBge3J9DQojIFJ1biBtdWx0aXBsZSBsaW5lYXIgcmVncmVzc2lvbg0KbXVsdGlwbGVfbG1fbW9kZWwgPC0gbG0oTGlmZUV4cCB+IFByb3BNRCArIFRvdEV4cCArIFByb3BNRCAqIFRvdEV4cCwgZGF0YSA9IHdob19kYXRhKQ0KDQojIFN1bW1hcnkgb2YgdGhlIHJlZ3Jlc3Npb24gbW9kZWwNCnN1bW1hcnkobXVsdGlwbGVfbG1fbW9kZWwpDQpgYGANCkJhc2VkIG9uIHRoZXNlIHJlc3VsdHMsIHRoZSBtb2RlbCBhcHBlYXJzIHRvIGJlIHN0YXRpc3RpY2FsbHkgc2lnbmlmaWNhbnQsIGFzIGluZGljYXRlZCBieSB0aGUgbG93IHAtdmFsdWUgb2YgdGhlIEYtc3RhdGlzdGljLiBIb3dldmVyLCB0aGUgUi1zcXVhcmVkIHZhbHVlIG9mIDAuMzU3NCBzdWdnZXN0cyB0aGF0IHRoZSBtb2RlbCBleHBsYWlucyBvbmx5IGFib3V0IDM1Ljc0JSBvZiB0aGUgdmFyaWFiaWxpdHkgaW4gbGlmZSBleHBlY3RhbmN5LCBsZWF2aW5nIGEgY29uc2lkZXJhYmxlIGFtb3VudCBvZiB2YXJpYWJpbGl0eSB1bmV4cGxhaW5lZC4gVGhlcmVmb3JlLCB3aGlsZSB0aGUgbW9kZWwgaXMgc3RhdGlzdGljYWxseSBzaWduaWZpY2FudCwgaXQgbWF5IG5vdCBiZSBjb25zaWRlcmVkIHBhcnRpY3VsYXJseSBnb29kIGluIHRlcm1zIG9mIGV4cGxhaW5pbmcgbGlmZSBleHBlY3RhbmN5IHZhcmlhYmlsaXR5Lg0KDQojIyBRdWVzdGlvbiA1DQoNCmBgYHtyfQ0KIyBDb2VmZmljaWVudHMNCmludGVyY2VwdCA8LSA2Mi43Nw0KY29lZl9wcm9wbWQgPC0gMTQ5Nw0KY29lZl90b3RleHAgPC0gMC4wMDAwNzIzMw0KY29lZl9wcm9wbWRfdG90ZXhwIDwtIC0wLjAwNjAyNg0KDQojIFZhbHVlcw0KcHJvcG1kIDwtIDAuMDMNCnRvdGV4cCA8LSAxNA0KDQojIEZvcmVjYXN0IGxpZmUgZXhwZWN0YW5jeQ0KbGlmZWV4cF9mb3JlY2FzdCA8LSBpbnRlcmNlcHQgKyBjb2VmX3Byb3BtZCAqIHByb3BtZCArIGNvZWZfdG90ZXhwICogdG90ZXhwICsgY29lZl9wcm9wbWRfdG90ZXhwICogcHJvcG1kICogdG90ZXhwDQoNCiMgUHJpbnQgZm9yZWNhc3QNCnByaW50KHBhc3RlKCJGb3JlY2FzdGVkIGxpZmUgZXhwZWN0YW5jeToiLCByb3VuZChsaWZlZXhwX2ZvcmVjYXN0LCBkaWdpdHMgPSAyKSkpDQoNCmBgYA0KDQpUaGUgZm9yZWNhc3RlZCBsaWZlIGV4cGVjdGFuY3kgb2YgMTA3LjY4IHllYXJzIHNlZW1zIGV4dHJlbWVseSBoaWdoIGFuZCB1bnJlYWxpc3RpYy4NCi4uLg0KDQo=