The file gapminderData5.csv contains a subset of the GapMinder dataset for 142 countries over 12 time steps (every 5 years between 1952 and 2007).
Read the data in and use the tapply() function to estimate the mean life expectancy by country
setwd("N:/Projects/geog6000/lab06")
gap = read.csv("../datafiles/gapminderData5.csv")
mlifeexp = tapply(gap$lifeExp, gap$country, mean)
mlifeexp
## Afghanistan Albania Algeria
## 37.47883 68.43292 59.03017
## Angola Argentina Australia
## 37.88350 69.06042 74.66292
## Austria Bahrain Bangladesh
## 73.10325 65.60567 49.83408
## Belgium Benin Bolivia
## 73.64175 48.77992 52.50458
## Bosnia and Herzegovina Botswana Brazil
## 67.70783 54.59750 62.23950
## Bulgaria Burkina Faso Burundi
## 69.74375 44.69400 44.81733
## Cambodia Cameroon Canada
## 47.90275 48.12850 74.90275
## Central African Republic Chad Chile
## 43.86692 46.77358 67.43092
## China Colombia Comoros
## 61.78514 63.89775 52.38175
## Congo, Dem. Rep. Congo, Rep. Costa Rica
## 44.54375 52.50192 70.18142
## Cote d'Ivoire Croatia Cuba
## 48.43617 70.05592 71.04508
## Czech Republic Denmark Djibouti
## 71.51050 74.37017 46.38075
## Dominican Republic Ecuador Egypt
## 61.55450 62.81683 56.24300
## El Salvador Equatorial Guinea Eritrea
## 59.63333 42.96000 45.99925
## Ethiopia Finland France
## 44.47575 72.99192 74.34892
## Gabon Gambia Germany
## 51.22050 44.40058 73.44442
## Ghana Greece Guatemala
## 52.34067 73.73317 56.72942
## Guinea Guinea-Bissau Haiti
## 43.23983 39.21025 50.16525
## Honduras Hong Kong, China Hungary
## 57.92083 73.49283 69.39317
## Iceland India Indonesia
## 76.51142 53.16608 54.33575
## Iran Iraq Ireland
## 58.63658 56.58175 73.01725
## Israel Italy Jamaica
## 73.64583 74.01383 68.74933
## Japan Jordan Kenya
## 74.82692 59.78642 52.68100
## Korea, Dem. Rep. Korea, Rep. Kuwait
## 63.60733 65.00100 68.92233
## Lebanon Lesotho Liberia
## 65.86567 50.00708 42.47625
## Libya Madagascar Malawi
## 59.30417 47.77058 43.35158
## Malaysia Mali Mauritania
## 64.27958 43.41350 52.30208
## Mauritius Mexico Mongolia
## 64.95325 65.40883 55.89033
## Montenegro Morocco Mozambique
## 70.29917 57.60883 40.37950
## Myanmar Namibia Nepal
## 53.32167 53.49133 48.98633
## Netherlands New Zealand Nicaragua
## 75.64850 73.98950 58.34942
## Niger Nigeria Norway
## 44.55867 43.58133 75.84300
## Oman Pakistan Panama
## 58.44267 54.88225 67.80175
## Paraguay Peru Philippines
## 66.80908 58.85933 60.96725
## Poland Portugal Puerto Rico
## 70.17692 70.41983 72.73933
## Reunion Romania Rwanda
## 66.64425 68.29067 41.48158
## Sao Tome and Principe Saudi Arabia Senegal
## 57.89633 58.67875 50.62592
## Serbia Sierra Leone Singapore
## 68.55100 36.76917 71.22025
## Slovak Republic Slovenia Somalia
## 70.69608 71.60075 40.98867
## South Africa Spain Sri Lanka
## 53.99317 74.20342 66.52608
## Sudan Swaziland Sweden
## 48.40050 49.00242 76.17700
## Switzerland Syria Taiwan
## 75.56508 61.34617 70.33667
## Tanzania Thailand Togo
## 47.91233 62.20025 51.49875
## Trinidad and Tobago Tunisia Turkey
## 66.82800 60.72100 59.69642
## Uganda United Kingdom United States
## 47.61883 73.92258 73.47850
## Uruguay Venezuela Vietnam
## 70.78158 66.58067 57.47950
## West Bank and Gaza Yemen, Rep. Zambia
## 60.32867 46.78042 45.99633
## Zimbabwe
## 52.66317
Make a scatter plot of lifeExp vs. year for all countries
plot(gap$lifeExp, gap$year)
make two line plots of lifeExp vs. year for two countries (not the United States!):
ecart_hexagone = subset(gap, country == 'France')
plot(ecart_hexagone$year, ecart_hexagone$lifeExp,
xlab = "Year", ylab = "Life Expectancy (yr)",
type = "l")
Use these data to build a simple linear model (lm()) of lifeExp as a function of year. Before making the model, center the year variable by subtracting 1952 from all values (0 now represents the year 1952)
gap$cyear = gap$year - 1952
gap_lm1 = lm(lifeExp ~ cyear, data = gap)
summary(gap_lm1)
##
## Call:
## lm(formula = lifeExp ~ cyear, data = gap)
##
## Residuals:
## Min 1Q Median 3Q Max
## -39.949 -9.651 1.697 10.335 22.158
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 50.51208 0.53000 95.31 <2e-16 ***
## cyear 0.32590 0.01632 19.96 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.63 on 1702 degrees of freedom
## Multiple R-squared: 0.1898, Adjusted R-squared: 0.1893
## F-statistic: 398.6 on 1 and 1702 DF, p-value: < 2.2e-16
Mean life expectancy in 1952, the first year reported, across all countries was 50.5 years (intercept). For each consecutive year, the linear model predicts that life expectancy will increase by about .3 years.
The p-value indicates that this relationship between life expectancy and years is highly significant.
Give the F-statistic and associated p-value
F-statistic: 398.6, p-value: < 2.2e-16
Now build a random intercept model using the lme() function, using country as the grouping variable for the random effect (the random effect will look something like random = ~ 1 | country).
library(nlme)
gap_lme1 = lme(lifeExp ~ cyear,
random = ~1 | country,
data = gap)
summary(gap_lme1)
## Linear mixed-effects model fit by REML
## Data: gap
## AIC BIC logLik
## 9874.263 9896.022 -4933.132
##
## Random effects:
## Formula: ~1 | country
## (Intercept) Residual
## StdDev: 11.09716 3.584196
##
## Fixed effects: lifeExp ~ cyear
## Value Std.Error DF t-value p-value
## (Intercept) 50.51208 0.9454671 1561 53.42553 0
## cyear 0.32590 0.0050305 1561 64.78581 0
## Correlation:
## (Intr)
## cyear -0.146
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -6.1691513 -0.4910339 0.1216455 0.6030002 2.4194109
##
## Number of Observations: 1704
## Number of Groups: 142
Report: The fixed effect coefficients and associated p-values (see under Fixed effects in the summary() output)
fixed.effects(gap_lme1)
## (Intercept) cyear
## 50.5120841 0.3259038
The fixed effects are the same as in our linear model, with the intercept saying that mean life expectancy in 1952, across all countries, is 50.5 years (grand mean intercept), and for each consecutive year, life expectancy increases by .3 years. I was given a p-value of 0 on both, which means highly significant terms
Report: The reported standard deviation for the two random effects ((Intercept) and Residual) Intercept: 11.097 Residual: 3.584 We are allowing the intercept to vary by country, and this allows us to look at the spread of how much the intercepts vary. The residual term gives us the residuals errors left over in the individual countries (unexplained), which looks to be pretty low. Variation between groups is much larger than variation within groups.
Calculate the ICC for the random intercept in this model. You will need to use VarCorr() to find the variance of the intercept and residual term (quick reminder, the intercept ICC is the variance of the intercept, divided by the sum of variances). Given the value you obtain, does this support the use of a random intercept model?
VarCorr(gap_lme1)
## country = pdLogChol(1)
## Variance StdDev
## (Intercept) 123.14688 11.097156
## Residual 12.84646 3.584196
We get the intra-class correlation (ICC) by dividing the intercept by the sum of the residual and the intercept. The ICC for this model says that over 90% of the variance is explained by variance between countries, opposed to within the countries. Thus we know a mixed-effects model is appropriate.
Now build a third model with a random intercept and slope. The random effect will look something like random = ~ year2 | country. Show the output of summary() for this model.
gap_lme2 = lme(lifeExp ~ cyear,
random = ~ cyear|country,
data = gap)
summary(gap_lme2)
## Linear mixed-effects model fit by REML
## Data: gap
## AIC BIC logLik
## 8747.277 8779.915 -4367.639
##
## Random effects:
## Formula: ~cyear | country
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 12.3027957 (Intr)
## cyear 0.1582867 -0.434
## Residual 2.1807917
##
## Fixed effects: lifeExp ~ cyear
## Value Std.Error DF t-value p-value
## (Intercept) 50.51208 1.0371995 1561 48.70045 0
## cyear 0.32590 0.0136312 1561 23.90865 0
## Correlation:
## (Intr)
## cyear -0.439
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -8.08192929 -0.34891405 0.02690066 0.39742349 4.67388085
##
## Number of Observations: 1704
## Number of Groups: 142
Finally, use the AIC() function to get the AIC for the three models (linear, random intercept and random intercept and slope).Which of your models is best?
AIC(gap_lm1)
## [1] 13201.73
AIC(gap_lme1)
## [1] 9874.263
AIC(gap_lme2)
## [1] 8747.277
The linear mixed effects model where we allowed both slope and intercept to be random resulted in the lowest AIC value. It looks like we should go with this model.The AIC for the linear model is significantly higher; this makes sense as the 1952 life expectancy varies quite a bit among countries.