Instructions: The file gapminderData5.csv contains a subset of the GapMinder dataset for 142 countries over 12 time steps (every 5 years between 1952 and 2007).
gap <- read.csv("gapminderData5.csv")
m_le <- tapply(gap$lifeExp, gap$country, mean)
head(m_le)
## Afghanistan Albania Algeria Angola Argentina Australia
## 37.47883 68.43292 59.03017 37.88350 69.06042 74.66292
plt_lifeExp_year <- ggplot(gap, aes(year, lifeExp, color=continent)) +
geom_point(alpha = 0.3, cex = 2) +
labs(y = "Life Expectancy (y)", x = "Year") +
ggtitle("Life Expectancy for All Countries by Year") +
theme_bw() +
theme(legend.title = element_blank()) +
scale_colour_brewer(palette = "Set1")
plt_lifeExp_year
gap_subset <- gap[which(gap$country == "Montenegro" | gap$country == "Morocco"),]
plt_line <- ggplot(gap_subset, aes(x = year, y = lifeExp,
colour = country)) +
geom_line(size = 0.75) +
theme_bw() +
labs(x = "Year", y = "Life Expectancy (y)") +
ggtitle("Life Expectancy Comparison Between \nMontenegro and Morocco") +
scale_colour_brewer(palette = "Dark2")
plt_line
gap$year_c_s <- (gap$year - 1952) / 10
summary(life_lm <- lm(lifeExp ~ year_c_s,
data = gap))
##
## Call:
## lm(formula = lifeExp ~ year_c_s, data = gap)
##
## Residuals:
## Min 1Q Median 3Q Max
## -39.949 -9.651 1.697 10.335 22.158
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 50.5121 0.5300 95.31 <2e-16 ***
## year_c_s 3.2590 0.1632 19.96 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.63 on 1702 degrees of freedom
## Multiple R-squared: 0.1898, Adjusted R-squared: 0.1893
## F-statistic: 398.6 on 1 and 1702 DF, p-value: < 2.2e-16
Answer: The intercept is 50.5121 (p<2e-16), meaning that in the year 1952 (t = 0), the expected value for life expectancy globally was 50.5 years or so. The coefficient estimate for year is 3.259 (p < 2e-16) — every decade after 1952, the life expectancy increased by about 3 years and 3 months. Our F-statistic is 398.6 (p < 2.2e-16).
summary(life_lme <- lme(lifeExp ~ year_c_s,
random = ~ 1 | country,
data = gap))
## Linear mixed-effects model fit by REML
## Data: gap
## AIC BIC logLik
## 9869.658 9891.417 -4930.829
##
## Random effects:
## Formula: ~1 | country
## (Intercept) Residual
## StdDev: 11.09716 3.584196
##
## Fixed effects: lifeExp ~ year_c_s
## Value Std.Error DF t-value p-value
## (Intercept) 50.51208 0.9454671 1561 53.42553 0
## year_c_s 3.25904 0.0503048 1561 64.78581 0
## Correlation:
## (Intr)
## year_c_s -0.146
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -6.1691513 -0.4910339 0.1216455 0.6030002 2.4194109
##
## Number of Observations: 1704
## Number of Groups: 142
Answer: The fixed effect coefficients are: Intercept: 50.512 (p = 0), Year: 3.259 (p = 0). The standard deviations are 11.097 (Intercept) and 3.584 (Residual).
icc <- VarCorr(life_lme)
icc
## country = pdLogChol(1)
## Variance StdDev
## (Intercept) 123.14688 11.097156
## Residual 12.84646 3.584196
123.14688 + 12.84646 #Total Variance in the Model: 135.993
## [1] 135.9933
123.14688 / 135.993 # Proportion of Variation Due to Slope (ICC): 0.9055
## [1] 0.9055384
12.84646 + 3.584196 # Residual Variation: 16.43
## [1] 16.43066
Answer: With an ICC of 90.55%, there is support for using the random intercept model.
summary(life_rand <- lme(lifeExp ~ year_c_s,
random = ~ year_c_s | country,
data = gap))
## Linear mixed-effects model fit by REML
## Data: gap
## AIC BIC logLik
## 8742.672 8775.31 -4365.336
##
## Random effects:
## Formula: ~year_c_s | country
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 12.302796 (Intr)
## year_c_s 1.582867 -0.434
## Residual 2.180792
##
## Fixed effects: lifeExp ~ year_c_s
## Value Std.Error DF t-value p-value
## (Intercept) 50.51208 1.0371995 1561 48.70045 0
## year_c_s 3.25904 0.1363121 1561 23.90865 0
## Correlation:
## (Intr)
## year_c_s -0.439
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -8.08192929 -0.34891405 0.02690066 0.39742349 4.67388085
##
## Number of Observations: 1704
## Number of Groups: 142
Answer:
AIC(life_lm, life_lme, life_rand)
## Warning in AIC.default(life_lm, life_lme, life_rand): models are not all fitted
## to the same number of observations
## df AIC
## life_lm 3 13201.732
## life_lme 4 9869.658
## life_rand 6 8742.672
Answer: With an AIC of 8742.672, the model with the best fit is the model with a random intercept and slope. This makes intuitive sense, since there was a lot of variation in life expectancy between countries in 1952 — we would expect the intercept to vary by country.
While global average life expectancy increased, we would also expect countries to have different trends — for instance, the famine in Somalia caused life expectancy to drop around 1990, but at the same time the India saw a relatively steep increase in life expectancy. Zimbabwe had, on average, a decreased life expectancy of -3.97 years per decade — the lowest slope of the datasets. Oman had the highest increase in life expectancy overall, with a slope of 3.26.
x <- data.frame(life_rand[["coefficients"]][["random"]][["country"]])
max(x$year_c_s)
## [1] 4.236716
min(x$year_c_s)
## [1] -3.972096
max(life_lme[["coefficients"]][["fixed"]][["year_c_s"]])
## [1] 3.259038
min(life_lme[["coefficients"]][["fixed"]][["year_c_s"]])
## [1] 3.259038
gap_subset_2 <- gap[which(gap$country == "India" | gap$country == "Somalia"),]
gap_subset_3 <- gap[which(gap$country == "Zimbabwe" | gap$country == "Oman"),]
ggplot(gap_subset_2, aes(x = year, y = lifeExp,
colour = country)) +
geom_line(size = 0.75) +
theme_bw() +
labs(x = "Year", y = "Life Expectancy (y)") +
ggtitle("Highs and Lows in 1992") +
scale_colour_brewer(palette = "Dark2") +
geom_vline(xintercept = c(1992, 1992), linetype = "dotted")
ggplot(gap_subset_3, aes(x = year, y = lifeExp,
colour = country)) +
geom_line(size = 0.75) +
theme_bw() +
labs(x = "Year", y = "Life Expectancy (y)") +
ggtitle("Lowest Slope vs. Highest Slope") +
scale_colour_brewer(palette = "Dark2") +
geom_vline(xintercept = c(1992, 1992), linetype = "dotted")