Macronutrient and diet has been hypothesized to explain why some countries live longer than others, particularly East Asia and Southern Europe. To test for this, I downloaded data from the United Nations and Wikipedia to test whether macronutrient intake or the ingestion of certain alcohols explains why some countries live longer than others.
First, a map of life expectancy by country:
HDI$alpha3 <- HDI$iso3
HDI2021 <- HDI %>% select(alpha3, le_2021, gnipc_2021, eys_2021)
world_map <- map_data("world")
world_map$alpha3 <- countrycode(world_map$region, origin = "country.name", destination = "iso3c")
Warning: Some values were not matched unambiguously: Ascension Island, Azores, Barbuda, Bonaire, Canary Islands, Chagos Archipelago, Grenadines, Heard Island, Kosovo, Madeira Islands, Micronesia, Saba, Saint Martin, Siachen Glacier, Sint Eustatius, Virgin Islands
HDI2021 <- HDI2021 %>% filter(!is.na(alpha3))
world_map_data <- left_join(world_map, HDI2021, by = c('alpha3'))
HDI2021$le_2021
[1] 61.98240 61.64340 76.46260 80.36840 78.71040 75.38990 72.04310 78.49680 84.52650 81.57970 69.36580 61.66270 81.87870 59.82100
[15] 59.26960 72.38110 71.79800 78.76050 71.59830 75.30030 72.43800 70.46970 63.63040 72.75040 77.57140 74.64240 71.81500 61.14090
[29] 53.89470 82.65650 83.98720 78.94350 78.21070 58.59830 60.33340 59.19300 63.51870 72.82960 63.41740 74.05180 77.02320 73.68290
[43] 81.20330 77.72830 80.63010 62.30490 72.81400 81.37530 72.61460 76.37670 73.67000 70.22070 66.53580 83.01000 77.14360 64.97480
[57] 82.03810 67.11400 82.49880 70.71000 65.82100 80.74220 71.69400 63.79540 58.89220 62.08300 59.65230 60.59430 80.11060 74.93620
[71] 69.23680 65.67340 85.47340 70.12290 77.58040 63.19240 74.53010 67.57030 67.23980 81.99760 73.87490 70.37790 82.67820 82.25500
[85] 82.85020 70.50030 74.25630 84.78390 69.36220 61.42700 69.97740 69.58350 67.41720 71.68220 83.69780 78.67290 68.06080 75.04720
[99] 60.74720 71.91120 71.11140 83.25750 76.39910 53.06200 73.72010 82.62870 73.57860 74.04190 85.94630 68.84590 64.48530 79.91820
[113] 70.21270 65.26880 73.84150 58.94140 83.77690 65.67160 76.34260 70.97540 59.32470 64.36360 73.55520 62.90440 74.88390 59.26900
[127] 61.57630 52.67600 73.83680 81.68730 83.23390 68.44950 63.61700 82.45130 72.54060 66.09790 76.22330 72.37690 69.26640 66.02040
[141] 65.35060 76.45720 73.28450 81.04430 70.26170 73.47270 79.27160 74.18470 69.41900 66.07220 76.93600 65.26670 67.09260 82.75450
[155] 70.34770 60.06240 70.74790 80.87860 55.28030 74.19230 54.97520 67.59120 70.27430 74.91010 80.69040 82.98330 57.06570 71.29450
[169] 72.06270 52.52540 61.61940 78.71540 71.59420 69.26440 67.73690 70.98570 72.97090 73.77190 76.03240 64.54660 66.20070 62.70480
[183] 71.62400 75.43560 77.19820 70.86160 69.62910 70.55360 73.61810 70.44900 72.76750 63.75340 62.34100 61.22340 59.25310 78.52130
[197] 74.70909 67.43832 61.31099 70.89504 75.57965 72.85653 72.09989 67.85553 60.11247 71.36546
quantile(HDI2021$le_2021)
0% 25% 50% 75% 100%
52.52540 65.71030 71.68810 76.44268 85.94630
world_map_data$color_category <- cut(world_map_data$le_2021,
breaks = c(-Inf, 63, 66, 69, 72, 75, 78, 81, Inf),
labels = c('=< 63', '63 to 66', '66 to 69', '69 to 72', '72 to 75', '75 to 78', '78 to 81', '> 81'),
right = TRUE)
p <- ggplot(data = world_map_data, aes(x = long, y = lat, group = group, fill = color_category)) +
geom_polygon(color = "black") +
scale_fill_manual(name = "Life Expectancy",
values = c("=< 63" = "darkred",
"63 to 66" = "red1",
"66 to 69" = "orange",
"69 to 72" = "#FFDD55",
"72 to 75" = "#FFEECC",
"75 to 78" = "#66CCFF",
"78 to 81" = "#4477EE",
"> 81" = "#4422AA")) +
theme_minimal() +
theme(plot.background = element_rect(fill = "white"),
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank(),
axis.line = element_blank()) +
labs(title = "")
p
Follows the infamous “map of everything”. A version where GNI per capita is controlled for:
HDI2021$logGNI <- log(HDI2021$gnipc_2021 + 1)
HDI2021$resid_le[!is.na(HDI2021$logGNI) & !is.na(HDI2021$eys_2021)] <- correctfor(deevee='le_2021', form2 = 'rcs(logGNI, 5)', datafr=HDI2021)
Warning: number of items to replace is not a multiple of replacement length
HDI2021 <- HDI2021 %>% filter(!is.na(alpha3))
world_map_data <- left_join(world_map, HDI2021, by = c('alpha3'))
HDI2021$le_2021
[1] 61.98240 61.64340 76.46260 80.36840 78.71040 75.38990 72.04310 78.49680 84.52650 81.57970 69.36580 61.66270 81.87870 59.82100
[15] 59.26960 72.38110 71.79800 78.76050 71.59830 75.30030 72.43800 70.46970 63.63040 72.75040 77.57140 74.64240 71.81500 61.14090
[29] 53.89470 82.65650 83.98720 78.94350 78.21070 58.59830 60.33340 59.19300 63.51870 72.82960 63.41740 74.05180 77.02320 73.68290
[43] 81.20330 77.72830 80.63010 62.30490 72.81400 81.37530 72.61460 76.37670 73.67000 70.22070 66.53580 83.01000 77.14360 64.97480
[57] 82.03810 67.11400 82.49880 70.71000 65.82100 80.74220 71.69400 63.79540 58.89220 62.08300 59.65230 60.59430 80.11060 74.93620
[71] 69.23680 65.67340 85.47340 70.12290 77.58040 63.19240 74.53010 67.57030 67.23980 81.99760 73.87490 70.37790 82.67820 82.25500
[85] 82.85020 70.50030 74.25630 84.78390 69.36220 61.42700 69.97740 69.58350 67.41720 71.68220 83.69780 78.67290 68.06080 75.04720
[99] 60.74720 71.91120 71.11140 83.25750 76.39910 53.06200 73.72010 82.62870 73.57860 74.04190 85.94630 68.84590 64.48530 79.91820
[113] 70.21270 65.26880 73.84150 58.94140 83.77690 65.67160 76.34260 70.97540 59.32470 64.36360 73.55520 62.90440 74.88390 59.26900
[127] 61.57630 52.67600 73.83680 81.68730 83.23390 68.44950 63.61700 82.45130 72.54060 66.09790 76.22330 72.37690 69.26640 66.02040
[141] 65.35060 76.45720 73.28450 81.04430 70.26170 73.47270 79.27160 74.18470 69.41900 66.07220 76.93600 65.26670 67.09260 82.75450
[155] 70.34770 60.06240 70.74790 80.87860 55.28030 74.19230 54.97520 67.59120 70.27430 74.91010 80.69040 82.98330 57.06570 71.29450
[169] 72.06270 52.52540 61.61940 78.71540 71.59420 69.26440 67.73690 70.98570 72.97090 73.77190 76.03240 64.54660 66.20070 62.70480
[183] 71.62400 75.43560 77.19820 70.86160 69.62910 70.55360 73.61810 70.44900 72.76750 63.75340 62.34100 61.22340 59.25310 78.52130
[197] 74.70909 67.43832 61.31099 70.89504 75.57965 72.85653 72.09989 67.85553 60.11247 71.36546
quantile(HDI2021$resid_le, na.rm=T)
0% 25% 50% 75% 100%
-13.8651384 -2.4687010 0.2122233 3.0680857 7.5332925
world_map_data$color_category <- cut(world_map_data$resid_le,
breaks = c(-Inf, -10, -6, -2, 2, 6, Inf),
labels = c('=< -10', '-10 to -6', '-6 to -2', '-2 to 2', '2 to 6', '> 6'),
right = TRUE)
p <- ggplot(data = world_map_data, aes(x = long, y = lat, group = group, fill = color_category)) +
geom_polygon(color = "black") +
scale_fill_manual(name = "Residual Life Expectancy",
values = c("=< -10" = "darkred",
"-10 to -6" = "red1",
"-6 to -2" = "orange",
"-2 to 2" = "#FFEECC",
"2 to 6" = "#66CCFF",
"> 6" = "#4422AA")) +
theme_minimal() +
theme(plot.background = element_rect(fill = "white"),
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank(),
axis.line = element_blank()) +
labs(title = "")
p
Much more interesting. Most Sub-Saharan African countries (barring Ethiopia and South Africa) have low life expectancies, even when taking into account economic development. The same goes for the Balkans. Southern Europe and East Asia are notable overperformers.
Here I test for the effect of the consumption of particular alcoholic beverages on residual life expectancy (aka controlled for GNI per capita). I do get a p < .01, but the p-value for the whole multiple linear regression is p = .04. Very uninspiring. No effect for overall consumption either.
alcohol$alpha3 <- countrycode(alcohol$Country, origin = "country.name", destination = "iso3c")
Warning: Some values were not matched unambiguously: Guyana, Micronesia
HDI2 <- left_join(HDI2021, alcohol, by = c('alpha3'))
HDI2$Unrecorded.consumption[HDI2$Unrecorded.consumption=='<0.1'] <- '0.05'
HDI2$Recorded.consumption[HDI2$Recorded.consumption=='<0.1'] <- '0.05'
HDI2$Unrecorded.consumption <- as.numeric(HDI2$Unrecorded.consumption)
HDI2$Recorded.consumption <- as.numeric(HDI2$Recorded.consumption)
HDI2$consumption <- as.numeric(HDI2$Recorded.consumption) + as.numeric(HDI2$Unrecorded.consumption)
HDI2$Spirits <- as.numeric(HDI2$Spirits)
Warning: NAs introduced by coercion
HDI2$Beer <- as.numeric(HDI2$Beer)
Warning: NAs introduced by coercion
HDI2$Wine <- as.numeric(HDI2$Wine)
Warning: NAs introduced by coercion
lr <- lm(data=HDI2,resid_le ~ consumption)
summary(lr)
Call:
lm(formula = resid_le ~ consumption, data = HDI2)
Residuals:
Min 1Q Median 3Q Max
-14.0285 -2.4507 0.2816 3.0221 7.2409
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.38005 0.51250 0.742 0.459
consumption -0.05159 0.07385 -0.699 0.486
Residual standard error: 4.084 on 181 degrees of freedom
(23 observations deleted due to missingness)
Multiple R-squared: 0.002689, Adjusted R-squared: -0.002821
F-statistic: 0.4879 on 1 and 181 DF, p-value: 0.4857
lr <- lm(data=HDI2,resid_le ~ Spirits + Beer + Wine)
summary(lr)
Call:
lm(formula = resid_le ~ Spirits + Beer + Wine, data = HDI2)
Residuals:
Min 1Q Median 3Q Max
-12.1407 -2.3095 0.0303 3.0045 7.7123
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.33401 1.42076 -2.347 0.02011 *
Spirits 0.03709 0.01713 2.165 0.03180 *
Beer 0.03188 0.01896 1.681 0.09466 .
Wine 0.05703 0.02106 2.708 0.00747 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.047 on 168 degrees of freedom
(34 observations deleted due to missingness)
Multiple R-squared: 0.0483, Adjusted R-squared: 0.0313
F-statistic: 2.842 on 3 and 168 DF, p-value: 0.03947
Note how unpredictive wine is of residual life expectancy. Most of the efect seems to be driven by four (semi) outliers.
GG_scatter(HDI2, 'Wine', 'resid_le', case_names='alpha3')
Same, but for food intake. First, testing for the effect of total intake:
nut$alpha3 <- nut$Code
nut$animalprot <- nut$Daily.calorie.supply.per.person.that.comes.from.animal.protein
nut$plantprot <- nut$Daily.calorie.supply.per.person.that.comes.from.vegetal.protein
nut$fat <- nut$Daily.calorie.supply.per.person.from.fat
nut$carbs <- nut$Daily.calorie.supply.per.person.from.carbohydrates
nut$totalintake <- nut$animalprot + nut$plantprot + nut$fat + nut$carbs
HDI3 <- left_join(HDI2, nut %>% filter(Year==2021), by = c('alpha3'))
lr <- lm(data=HDI3, resid_le ~ totalintake)
summary(lr)
Call:
lm(formula = resid_le ~ totalintake, data = HDI3)
Residuals:
Min 1Q Median 3Q Max
-13.6510 -2.6715 0.2402 2.8008 7.6710
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.4653456 1.8897817 -1.305 0.194
totalintake 0.0008994 0.0006375 1.411 0.160
Residual standard error: 4.055 on 178 degrees of freedom
(26 observations deleted due to missingness)
Multiple R-squared: 0.01106, Adjusted R-squared: 0.005501
F-statistic: 1.99 on 1 and 178 DF, p-value: 0.1601
Nothing. Effect for specific macronutrients is also null (reference group is carbs).
lr <- lm(data=HDI3, resid_le ~ fat + plantprot + animalprot)
summary(lr)
Call:
lm(formula = resid_le ~ fat + plantprot + animalprot, data = HDI3)
Residuals:
Min 1Q Median 3Q Max
-13.4967 -2.7278 0.2213 2.8366 7.8182
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.375151 1.675069 -1.418 0.158
fat 0.001456 0.001883 0.773 0.441
plantprot 0.005872 0.007653 0.767 0.444
animalprot 0.001427 0.006877 0.207 0.836
Residual standard error: 4.052 on 176 degrees of freedom
(26 observations deleted due to missingness)
Multiple R-squared: 0.02365, Adjusted R-squared: 0.007008
F-statistic: 1.421 on 3 and 176 DF, p-value: 0.2382
This project is a failure. Beyond the regional patterns of the Balkans and Sub-Saharan Africans being unusually short-lived and the Southern Europeans/Eastern Asians being unusually long-lived, there doesn’t seem to be any stable variation in life expectancy. The outliers within regions (e.g. Russia) are probably due to idiosyncratic conditions; in the case of Russia this would be alcoholism.