library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
data <- read.csv("~/Dropbox/Teaching/second_year_stats/titanic/life_expectancy_data.csv")
# Add a rescaled GDP column (gdp_capita multiplied by 100) and shift alcohol up by 5 units
randomized_data <- data |>
  mutate(
    gdp_per_capita = gdp_capita * 100,
    alcohol = alcohol + 5
  )
head(randomized_data)
## X country pollution gdp_capita alcohol
## 1 1 Afghanistan 61.30970 560.8847 5.21
## 2 2 Albania 19.72553 4425.1019 12.17
## 3 3 Algeria 34.90462 4333.4257 5.93
## 4 4 Antigua and Barbuda 20.23048 13893.8370 10.89
## 5 5 Argentina 15.11450 12819.5533 14.55
## 6 6 Armenia 34.90932 3760.0216 10.69
## cause_of_death_communicable dpt_im meas_im pol_im hospital_beds diabetes
## 1 39.390981 66.0 63.5 68.5 0.500 8.90
## 2 3.023108 99.0 97.5 99.0 2.885 6.75
## 3 14.766231 95.0 95.0 95.0 1.900 7.60
## 4 11.285998 98.0 96.0 95.5 3.010 10.10
## 5 15.491782 91.5 94.0 90.0 4.810 5.80
## 6 4.808442 94.0 97.0 96.0 4.055 6.95
## overweight life_exp fertility_rate pop gdp_per_capita
## 1 21.1 62.966 5.163 33892198 56088.47
## 2 55.5 77.813 1.673 2884904 442510.19
## 3 59.8 75.878 3.022 39325856 433342.57
## 4 46.3 76.349 1.998 93064 1389383.70
## 5 61.2 75.913 2.312 42900733 1281955.33
## 6 52.8 74.273 1.732 2918978 376002.16
toy.model <- lm(life_exp ~ log(gdp_capita) + alcohol, data = randomized_data)
summary(toy.model)
##
## Call:
## lm(formula = life_exp ~ log(gdp_capita) + alcohol, data = randomized_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.8634 -1.8544 0.5227 2.6087 7.5855
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.88654 2.04585 17.052 <2e-16 ***
## log(gdp_capita) 4.36819 0.25934 16.843 <2e-16 ***
## alcohol -0.05773 0.08688 -0.665 0.507
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.094 on 157 degrees of freedom
## Multiple R-squared: 0.6877, Adjusted R-squared: 0.6837
## F-statistic: 172.8 on 2 and 157 DF, p-value: < 2.2e-16
Reporting estimates
The magnitude of the coefficient tells you how much the predicted outcome changes for a one-unit change in the predictor, holding the other predictors in the model constant. For example, if the coefficient for “hours of study” is 0.5, then for every additional hour of study, the predicted test score increases by 0.5 points.
For every one extra litre of alcohol drunk per person over the age of 15, the predicted life expectancy of the country decreases by 0.058 years, holding GDP per capita constant. I’m ignoring that it’s non-significant (p = 0.507) just to give you an example today.
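A minimal sketch of how this estimate could be reported, using only the numbers copied from the summary() output above (nothing is refitted). The `extra_units` values are hypothetical, and the interval is the usual t-based 95% interval, which is an assumption about how you would want to report uncertainty:

```r
# Estimate and standard error for alcohol, copied from the summary() output
b_alcohol  <- -0.05773
se_alcohol <- 0.08688
df_resid   <- 157   # residual degrees of freedom from the same output

# Predicted change in life expectancy (years) for hypothetical increases
# in alcohol, holding log(gdp_capita) constant
extra_units <- c(1, 2, 5)
predicted_change <- b_alcohol * extra_units
print(round(predicted_change, 3))

# Approximate 95% confidence interval: estimate +/- t critical value * SE
t_crit <- qt(0.975, df = df_resid)
ci <- b_alcohol + c(-1, 1) * t_crit * se_alcohol
print(round(ci, 3))
```

If the interval spans zero, that matches the non-significant p-value: the data are compatible with no association as well as with a small negative or positive one.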
Log predictors
https://library.virginia.edu/data/articles/interpreting-log-transformations-in-a-linear-model
“Only independent/predictor variable(s) is log-transformed. Divide the coefficient by 100. This tells us that a 1% increase in the independent variable increases (or decreases) the dependent variable by (coefficient/100) units. Example: the coefficient is 0.198. 0.198/100 = 0.00198. For every 1% increase in the independent variable, our dependent variable increases by about 0.002. For x percent increase, multiply the coefficient by log(1.x). Example: For every 10% increase in the independent variable, our dependent variable increases by about 0.198 * log(1.10) = 0.02.”
log(gdp_capita): It’s not as simple as “a 1 dollar increase in GDP per capita increases life expectancy by 4.37 years”. Rather, 4.37/100 = 0.0437, so for every 1% increase in GDP per capita, the predicted life expectancy of the country increases by about 0.04 years (not 0.04%, because only the predictor is log-transformed), holding alcohol constant.
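The arithmetic can be checked with a short sketch that uses the coefficient copied from the summary() output. The percentage increases in `pct_increase` are hypothetical values chosen for illustration. The code compares the exact change, coefficient * log(1 + p/100), with the divide-by-100 rule of thumb:

```r
# Coefficient on log(gdp_capita), copied from the summary() output
b_log_gdp <- 4.36819

# Hypothetical percentage increases in GDP per capita
pct_increase <- c(1, 10, 100)

# Exact change in predicted life expectancy (years) for a p% increase
exact_change <- b_log_gdp * log(1 + pct_increase / 100)

# Rule of thumb: (coefficient / 100) years per 1% increase
approx_change <- b_log_gdp / 100 * pct_increase

print(data.frame(pct_increase, exact_change, approx_change))
```

The two columns agree closely for a 1% increase and drift apart as the percentage grows, so the divide-by-100 shortcut is best kept for small changes. For a doubling of GDP per capita the exact figure is about 3 years (4.37 * log(2)).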