library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
data <- read.csv("~/Dropbox/Teaching/second_year_stats/titanic/life_expectancy_data.csv")



randomized_data <- data |>
  mutate(
    gdp_per_capita = gdp_capita * 100,
    alcohol = alcohol + 5
  )


head(randomized_data)
##   X             country pollution gdp_capita alcohol
## 1 1         Afghanistan  61.30970   560.8847    5.21
## 2 2             Albania  19.72553  4425.1019   12.17
## 3 3             Algeria  34.90462  4333.4257    5.93
## 4 4 Antigua and Barbuda  20.23048 13893.8370   10.89
## 5 5           Argentina  15.11450 12819.5533   14.55
## 6 6             Armenia  34.90932  3760.0216   10.69
##   cause_of_death_communicable dpt_im meas_im pol_im hospital_beds diabetes
## 1                   39.390981   66.0    63.5   68.5         0.500     8.90
## 2                    3.023108   99.0    97.5   99.0         2.885     6.75
## 3                   14.766231   95.0    95.0   95.0         1.900     7.60
## 4                   11.285998   98.0    96.0   95.5         3.010    10.10
## 5                   15.491782   91.5    94.0   90.0         4.810     5.80
## 6                    4.808442   94.0    97.0   96.0         4.055     6.95
##   overweight life_exp fertility_rate      pop gdp_per_capita
## 1       21.1   62.966          5.163 33892198       56088.47
## 2       55.5   77.813          1.673  2884904      442510.19
## 3       59.8   75.878          3.022 39325856      433342.57
## 4       46.3   76.349          1.998    93064     1389383.70
## 5       61.2   75.913          2.312 42900733     1281955.33
## 6       52.8   74.273          1.732  2918978      376002.16

Question 5: Describing your model. Report the results of your model. This includes significance statistics (t and p values), estimate sizes and errors (8 marks) and, most importantly, a narrative explanation for each significant predictor (35 marks). This is more involved than for Question 2. You can’t just say “There is an increasing linear relationship between life expectancy and log(gdp)”. You have now carried out a statistical analysis, so by looking at the estimates you can say, as a made-up example, “a 1000 dollar increase in per capita GDP increases average life expectancy in a country by 4 months”. Be very careful with predictors that you log transformed; they are not trivial to work out.

toy.model <- lm(life_exp~log(gdp_capita) + 
                   alcohol , data = randomized_data )
summary(toy.model)
## 
## Call:
## lm(formula = life_exp ~ log(gdp_capita) + alcohol, data = randomized_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.8634  -1.8544   0.5227   2.6087   7.5855 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     34.88654    2.04585  17.052   <2e-16 ***
## log(gdp_capita)  4.36819    0.25934  16.843   <2e-16 ***
## alcohol         -0.05773    0.08688  -0.665    0.507    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.094 on 157 degrees of freedom
## Multiple R-squared:  0.6877, Adjusted R-squared:  0.6837 
## F-statistic: 172.8 on 2 and 157 DF,  p-value: < 2.2e-16
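
The numbers needed for the report can also be pulled out of a fitted model rather than read off the printout. A minimal sketch, assuming a stand-in model on R's built-in `mtcars` data (not the life expectancy data); the names `fit`, `coef_table`, `ci` and `report` are illustrative only:

```r
# Stand-in model on R's built-in mtcars data (not the life expectancy data)
fit <- lm(mpg ~ log(hp) + wt, data = mtcars)

# Coefficient table as a numeric matrix:
# columns are Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(fit))

# 95% confidence intervals for each coefficient
ci <- confint(fit, level = 0.95)

# Combine and round for reporting
report <- round(cbind(coef_table, ci), 3)
print(report)
```

The same calls should work on `toy.model` once the data set is loaded.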

Reporting estimates

The magnitude of the coefficient tells you the amount of change in the outcome for a one-unit change in the predictor. For example, if the coefficient for “hours of study” is 0.5, then for every additional hour of study, the predicted test score increases by 0.5 points.

For every one extra litre of alcohol drunk per person over the age of 15, the predicted average life expectancy of the country decreases by 0.058 years, holding GDP per capita constant. I’m ignoring that it’s non-significant just to give you an example today.
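
The t and p values in the coefficient table follow directly from the estimate and its standard error. A minimal sketch, using the alcohol row copied by hand from the summary output above (the variable names are illustrative):

```r
# Values copied from the alcohol row of the summary() output above
estimate  <- -0.05773
std_error <- 0.08688
df_resid  <- 157  # residual degrees of freedom

# The t value is the estimate divided by its standard error
t_value <- estimate / std_error

# Two-sided p value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# 95% confidence interval for the slope
ci_95 <- estimate + c(-1, 1) * qt(0.975, df = df_resid) * std_error

print(round(c(t = t_value, p = p_value), 3))
print(round(ci_95, 3))
```

The interval includes zero, which is consistent with the non-significant p value for alcohol.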

Log predictors

https://library.virginia.edu/data/articles/interpreting-log-transformations-in-a-linear-model

“Only independent/predictor variable(s) is log-transformed. Divide the coefficient by 100. This tells us that a 1% increase in the independent variable increases (or decreases) the dependent variable by (coefficient/100) units. Example: the coefficient is 0.198. 0.198/100 = 0.00198. For every 1% increase in the independent variable, our dependent variable increases by about 0.002. For x percent increase, multiply the coefficient by log(1.x). Example: For every 10% increase in the independent variable, our dependent variable increases by about 0.198 * log(1.10) = 0.02.”

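gdp_per_capita: It’s not as simple as “a 1 dollar increase in GDP per capita increases life expectancy by 4.37 years”. Rather, 4.37/100 = 0.0437. So for every 1% increase in GDP per capita, the predicted life expectancy of the country increases by about 0.04 years (roughly 16 days). Note that the change is in years, not percent, because only the predictor was log transformed.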
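
Dividing by 100 is an approximation that only works well for small percentage changes. A minimal sketch of the exact calculation, using the `log(gdp_capita)` coefficient copied from the summary output above; the helper `effect_of_pct_increase` is a hypothetical name introduced here for illustration:

```r
# Coefficient on log(gdp_capita), copied from the summary() output above
b_log_gdp <- 4.36819

# Change in predicted life expectancy (years) for a pct% increase in GDP per capita.
# log() is the natural logarithm, matching the log() used in the model formula.
effect_of_pct_increase <- function(pct, b = b_log_gdp) {
  b * log(1 + pct / 100)
}

approx_1pct    <- b_log_gdp / 100             # the "divide by 100" shortcut
exact_1pct     <- effect_of_pct_increase(1)   # 1% increase
exact_10pct    <- effect_of_pct_increase(10)  # 10% increase
exact_doubling <- effect_of_pct_increase(100) # GDP per capita doubles

print(round(c(approx_1pct, exact_1pct, exact_10pct, exact_doubling), 3))
```

For a 1% increase the shortcut and the exact value agree to about 0.04 years. For larger changes, such as a doubling of GDP per capita, use the exact form.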