## country iso2c iso3c year status lastupdated life_expectancy
## 1 Afghanistan AF AFG 2019 2025-10-07 62.941
## 2 Albania AL ALB 2019 2025-10-07 79.467
## 3 Algeria DZ DZA 2019 2025-10-07 75.682
## 4 American Samoa AS ASM 2019 2025-10-07 72.751
## 5 Andorra AD AND 2019 2025-10-07 84.098
## 6 Angola AO AGO 2019 2025-10-07 63.051
## gdp_per_capita region capital longitude latitude
## 1 557.8615 South Asia Kabul 69.1761 34.5228
## 2 4563.4674 Europe & Central Asia Tirane 19.8172 41.3317
## 3 4672.6641 Middle East & North Africa Algiers 3.05097 36.7397
## 4 12524.0160 East Asia & Pacific Pago Pago -170.691 -14.2846
## 5 39346.2750 Europe & Central Asia Andorra la Vella 1.5218 42.5075
## 6 2664.4385 Sub-Saharan Africa Luanda 13.242 -8.81155
## income lending
## 1 Low income IDA
## 2 Upper middle income IBRD
## 3 Upper middle income IBRD
## 4 High income Not classified
## 5 High income Not classified
## 6 Lower middle income IBRD
# Fit the linear model
model_fit <- lm(life_expectancy ~ gdp_per_capita, data = clean_data)
# Show results in a clean table
kable(tidy(model_fit), caption = "Regression Results")
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 69.5704919 | 0.5386633 | 129.15394 | 0 |
| gdp_per_capita | 0.0001967 | 0.0000178 | 11.03726 | 0 |
# Show model fit statistics (R-squared, etc.)
kable(glance(model_fit), caption = "Model Fit Statistics")
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.3750405 | 0.3719619 | 6.278971 | 121.8211 | 0 | 1 | -666.5047 | 1339.009 | 1348.979 | 8003.372 | 203 | 205 |
# We use 'tidy' from broom to get data for plotting
coef_data <- tidy(model_fit, conf.int = TRUE) %>%
filter(term != "(Intercept)") # Remove intercept to focus on the variable
ggplot(coef_data, aes(x = term, y = estimate)) +
geom_point(size = 3, color = "darkred") +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2) +
labs(title = "Coefficient Estimate for GDP per Capita",
y = "Estimate (Impact on Life Expectancy)",
x = "Variable") +
theme_minimal()
# Add residuals to the original data
data_with_resid <- clean_data %>%
mutate(residuals = residuals(model_fit))
ggplot(data_with_resid, aes(x = residuals)) +
geom_histogram(binwidth = 2, fill = "orange", color = "white") +
labs(title = "Histogram of Model Residuals",
x = "Residual Value",
y = "Count") +
theme_minimal()
par(mfrow = c(2, 2)) # Arrange plots in a 2x2 grid
plot(model_fit)
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
summary(cars)
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
You can also embed plots, for example:
Note that the echo = FALSE parameter was added to the
code chunk to prevent printing of the R code that generated the
plot.
Comment on Linearity: