EricGoodwin_Lab06.rmd

Lab 06 Exercises

Eric Goodwin || October 21, 2021

Exercise 1

The file gapminderData5.csv contains a subset of the GapMinder dataset for 142 countries over 12 time steps (every 5 years between 1952 and 2007).

Read the data in and use the tapply() function to estimate the mean life expectancy by country

setwd("N:/Projects/geog6000/lab06")
gap = read.csv("../datafiles/gapminderData5.csv")
mlifeexp = tapply(gap$lifeExp, gap$country, mean)
mlifeexp

##              Afghanistan                  Albania                  Algeria 
##                 37.47883                 68.43292                 59.03017 
##                   Angola                Argentina                Australia 
##                 37.88350                 69.06042                 74.66292 
##                  Austria                  Bahrain               Bangladesh 
##                 73.10325                 65.60567                 49.83408 
##                  Belgium                    Benin                  Bolivia 
##                 73.64175                 48.77992                 52.50458 
##   Bosnia and Herzegovina                 Botswana                   Brazil 
##                 67.70783                 54.59750                 62.23950 
##                 Bulgaria             Burkina Faso                  Burundi 
##                 69.74375                 44.69400                 44.81733 
##                 Cambodia                 Cameroon                   Canada 
##                 47.90275                 48.12850                 74.90275 
## Central African Republic                     Chad                    Chile 
##                 43.86692                 46.77358                 67.43092 
##                    China                 Colombia                  Comoros 
##                 61.78514                 63.89775                 52.38175 
##         Congo, Dem. Rep.              Congo, Rep.               Costa Rica 
##                 44.54375                 52.50192                 70.18142 
##            Cote d'Ivoire                  Croatia                     Cuba 
##                 48.43617                 70.05592                 71.04508 
##           Czech Republic                  Denmark                 Djibouti 
##                 71.51050                 74.37017                 46.38075 
##       Dominican Republic                  Ecuador                    Egypt 
##                 61.55450                 62.81683                 56.24300 
##              El Salvador        Equatorial Guinea                  Eritrea 
##                 59.63333                 42.96000                 45.99925 
##                 Ethiopia                  Finland                   France 
##                 44.47575                 72.99192                 74.34892 
##                    Gabon                   Gambia                  Germany 
##                 51.22050                 44.40058                 73.44442 
##                    Ghana                   Greece                Guatemala 
##                 52.34067                 73.73317                 56.72942 
##                   Guinea            Guinea-Bissau                    Haiti 
##                 43.23983                 39.21025                 50.16525 
##                 Honduras         Hong Kong, China                  Hungary 
##                 57.92083                 73.49283                 69.39317 
##                  Iceland                    India                Indonesia 
##                 76.51142                 53.16608                 54.33575 
##                     Iran                     Iraq                  Ireland 
##                 58.63658                 56.58175                 73.01725 
##                   Israel                    Italy                  Jamaica 
##                 73.64583                 74.01383                 68.74933 
##                    Japan                   Jordan                    Kenya 
##                 74.82692                 59.78642                 52.68100 
##         Korea, Dem. Rep.              Korea, Rep.                   Kuwait 
##                 63.60733                 65.00100                 68.92233 
##                  Lebanon                  Lesotho                  Liberia 
##                 65.86567                 50.00708                 42.47625 
##                    Libya               Madagascar                   Malawi 
##                 59.30417                 47.77058                 43.35158 
##                 Malaysia                     Mali               Mauritania 
##                 64.27958                 43.41350                 52.30208 
##                Mauritius                   Mexico                 Mongolia 
##                 64.95325                 65.40883                 55.89033 
##               Montenegro                  Morocco               Mozambique 
##                 70.29917                 57.60883                 40.37950 
##                  Myanmar                  Namibia                    Nepal 
##                 53.32167                 53.49133                 48.98633 
##              Netherlands              New Zealand                Nicaragua 
##                 75.64850                 73.98950                 58.34942 
##                    Niger                  Nigeria                   Norway 
##                 44.55867                 43.58133                 75.84300 
##                     Oman                 Pakistan                   Panama 
##                 58.44267                 54.88225                 67.80175 
##                 Paraguay                     Peru              Philippines 
##                 66.80908                 58.85933                 60.96725 
##                   Poland                 Portugal              Puerto Rico 
##                 70.17692                 70.41983                 72.73933 
##                  Reunion                  Romania                   Rwanda 
##                 66.64425                 68.29067                 41.48158 
##    Sao Tome and Principe             Saudi Arabia                  Senegal 
##                 57.89633                 58.67875                 50.62592 
##                   Serbia             Sierra Leone                Singapore 
##                 68.55100                 36.76917                 71.22025 
##          Slovak Republic                 Slovenia                  Somalia 
##                 70.69608                 71.60075                 40.98867 
##             South Africa                    Spain                Sri Lanka 
##                 53.99317                 74.20342                 66.52608 
##                    Sudan                Swaziland                   Sweden 
##                 48.40050                 49.00242                 76.17700 
##              Switzerland                    Syria                   Taiwan 
##                 75.56508                 61.34617                 70.33667 
##                 Tanzania                 Thailand                     Togo 
##                 47.91233                 62.20025                 51.49875 
##      Trinidad and Tobago                  Tunisia                   Turkey 
##                 66.82800                 60.72100                 59.69642 
##                   Uganda           United Kingdom            United States 
##                 47.61883                 73.92258                 73.47850 
##                  Uruguay                Venezuela                  Vietnam 
##                 70.78158                 66.58067                 57.47950 
##       West Bank and Gaza              Yemen, Rep.                   Zambia 
##                 60.32867                 46.78042                 45.99633 
##                 Zimbabwe 
##                 52.66317

Make a scatter plot of lifeExp vs. year for all countries

plot(gap$lifeExp, gap$year)

make two line plots of lifeExp vs. year for two countries (not the United States!):

ecart_hexagone = subset(gap, country == 'France')
plot(ecart_hexagone$year, ecart_hexagone$lifeExp,
     xlab = "Year", ylab = "Life Expectancy (yr)",
     type = "l")

Use these data to build a simple linear model (lm()) of lifeExp as a function of year. Before making the model, center the year variable by subtracting 1952 from all values (0 now represents the year 1952)

gap$cyear = gap$year - 1952
gap_lm1 = lm(lifeExp ~ cyear, data = gap)
summary(gap_lm1)

## 
## Call:
## lm(formula = lifeExp ~ cyear, data = gap)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -39.949  -9.651   1.697  10.335  22.158 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 50.51208    0.53000   95.31   <2e-16 ***
## cyear        0.32590    0.01632   19.96   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.63 on 1702 degrees of freedom
## Multiple R-squared:  0.1898, Adjusted R-squared:  0.1893 
## F-statistic: 398.6 on 1 and 1702 DF,  p-value: < 2.2e-16

Mean life expectancy in 1952, the first year reported, across all countries was 50.5 years (intercept). For each consecutive year, the linear model predicts that life expectancy will increase by about .3 years.

The p-value indicates that this relationship between life expectancy and years is highly significant.

Give the F-statistic and associated p-value

F-statistic: 398.6, p-value: < 2.2e-16

Now build a random intercept model using the lme() function, using country as the grouping variable for the random effect (the random effect will look something like random = ~ 1 | country).

library(nlme)
gap_lme1 = lme(lifeExp ~ cyear,
               random = ~1 | country,
               data = gap)
summary(gap_lme1)

## Linear mixed-effects model fit by REML
##   Data: gap 
##        AIC      BIC    logLik
##   9874.263 9896.022 -4933.132
## 
## Random effects:
##  Formula: ~1 | country
##         (Intercept) Residual
## StdDev:    11.09716 3.584196
## 
## Fixed effects:  lifeExp ~ cyear 
##                Value Std.Error   DF  t-value p-value
## (Intercept) 50.51208 0.9454671 1561 53.42553       0
## cyear        0.32590 0.0050305 1561 64.78581       0
##  Correlation: 
##       (Intr)
## cyear -0.146
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -6.1691513 -0.4910339  0.1216455  0.6030002  2.4194109 
## 
## Number of Observations: 1704
## Number of Groups: 142

Report: The fixed effect coefficients and associated p-values (see under Fixed effects in the summary() output)

fixed.effects(gap_lme1)

## (Intercept)       cyear 
##  50.5120841   0.3259038

The fixed effects are the same as in our linear model, with the intercept saying that mean life expectancy in 1952, across all countries, is 50.5 years (grand mean intercept), and for each consecutive year, life expectancy increases by .3 years. I was given a p-value of 0 on both, which means highly significant terms

Report: The reported standard deviation for the two random effects ((Intercept) and Residual) Intercept: 11.097 Residual: 3.584 We are allowing the intercept to vary by country, and this allows us to look at the spread of how much the intercepts vary. The residual term gives us the residuals errors left over in the individual countries (unexplained), which looks to be pretty low. Variation between groups is much larger than variation within groups.

Calculate the ICC for the random intercept in this model. You will need to use VarCorr() to find the variance of the intercept and residual term (quick reminder, the intercept ICC is the variance of the intercept, divided by the sum of variances). Given the value you obtain, does this support the use of a random intercept model?

VarCorr(gap_lme1)

## country = pdLogChol(1) 
##             Variance  StdDev   
## (Intercept) 123.14688 11.097156
## Residual     12.84646  3.584196

We get the intra-class correlation (ICC) by dividing the intercept by the sum of the residual and the intercept. The ICC for this model says that over 90% of the variance is explained by variance between countries, opposed to within the countries. Thus we know a mixed-effects model is appropriate.

Now build a third model with a random intercept and slope. The random effect will look something like random = ~ year2 | country. Show the output of summary() for this model.

gap_lme2 = lme(lifeExp ~ cyear, 
               random = ~ cyear|country,
               data = gap)
summary(gap_lme2)

## Linear mixed-effects model fit by REML
##   Data: gap 
##        AIC      BIC    logLik
##   8747.277 8779.915 -4367.639
## 
## Random effects:
##  Formula: ~cyear | country
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev     Corr  
## (Intercept) 12.3027957 (Intr)
## cyear        0.1582867 -0.434
## Residual     2.1807917       
## 
## Fixed effects:  lifeExp ~ cyear 
##                Value Std.Error   DF  t-value p-value
## (Intercept) 50.51208 1.0371995 1561 48.70045       0
## cyear        0.32590 0.0136312 1561 23.90865       0
##  Correlation: 
##       (Intr)
## cyear -0.439
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -8.08192929 -0.34891405  0.02690066  0.39742349  4.67388085 
## 
## Number of Observations: 1704
## Number of Groups: 142

Finally, use the AIC() function to get the AIC for the three models (linear, random intercept and random intercept and slope).Which of your models is best?

AIC(gap_lm1)

## [1] 13201.73

AIC(gap_lme1)

## [1] 9874.263

AIC(gap_lme2)

## [1] 8747.277

The linear mixed effects model where we allowed both slope and intercept to be random resulted in the lowest AIC value. It looks like we should go with this model.The AIC for the linear model is significantly higher; this makes sense as the 1952 life expectancy varies quite a bit among countries.