library(WHO)
library(plyr)
library(dplyr)
library(ggplot2)
library(scales)
library(showtext)
library(scatterD3)
library(stringr)
library(RColorBrewer)
library(dygraphs)
I watched a presentation by Zoë Harcombe in which she referenced her analysis of World Health Organization data on cholesterol and cardiovascular disease mortality rates. She mentioned that the relationship between cholesterol numbers and cardiovascular-related deaths is the opposite of what you’d expect. I found her original post on the subject: Zoë Harcombe’s cholesterol and heart disease relationship.
I was pretty surprised that she did not publish how she came to this conclusion. In other words, the results were not reproducible. I wanted to reproduce this result and share all of the data and code so that others can critique my methods and results. You can find all of the source code, including the data acquisition code, on GitHub.
World Health Organization (WHO) Global Health Observatory (GHO)
The data takes quite a while to download from the WHO (3-4 minutes on a fast internet connection). You will find a stand-alone R script in the GitHub repo that saves the data to local files to speed up ad-hoc analysis.
CHO_Age_Stand <- get_data("CHOL_03")
CHO_Crude <- get_data("CHOL_04")
#CVD_Cerebrovascular_DALY <- get_data("SA_0000001689")
CVD_Cerebrovascular <- get_data("SA_0000001690")
#CVD_Ischaemic_DALY <- get_data("SA_0000001425")
CVD_Ischaemic <- get_data("SA_0000001444")
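The idea of saving the downloads to local files can be sketched roughly as follows. This is not the script from the repo; `cached_fetch`, its `fetch_fun` argument, and the `who_cache` directory are hypothetical names used only for illustration.

```r
# Hypothetical helper: fetch an indicator once, then reuse the local copy.
# `fetch_fun` is whatever function performs the slow download.
cached_fetch <- function(code, fetch_fun, cache_dir = "who_cache") {
  path <- file.path(cache_dir, paste0(code, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))  # cache hit: skip the download
  }
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  result <- fetch_fun(code)  # cache miss: download and store
  saveRDS(result, path)
  result
}

# Usage (assumes the WHO package is installed and the network is reachable):
# CHO_Crude <- cached_fetch("CHOL_04", WHO::get_data)
```

Passing the download function in as an argument keeps the helper independent of any one data source, so it can be tried out with a stand-in function before hitting the WHO servers.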
The cholesterol (CHO) values are stored as character strings such as “5.0 [4.8-5.3]”. The following code strips out the bracketed range, brackets included, leaving a numeric value for each entry. I also convert from mmol/L to mg/dL by multiplying by 38.67, since I’m in the US and we have to be different.
CHO_Age_Stand$value <- gsub(" *\\[.*?\\] *", "", CHO_Age_Stand$value)
CHO_Age_Stand$value <- as.numeric(CHO_Age_Stand$value)
## Warning: NAs introduced by coercion
CHO_Age_Stand$value <- CHO_Age_Stand$value * 38.67
CHO_Crude$value <- gsub( " *\\[.*?\\] *", "", CHO_Crude$value)
CHO_Crude$value <- as.numeric(CHO_Crude$value)
## Warning: NAs introduced by coercion
CHO_Crude$value <- CHO_Crude$value * 38.67
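Since the same three steps are applied to both data frames, they could be wrapped in one small function. The sketch below uses the same regular expression and conversion factor as above; `parse_cholesterol` and `MMOL_TO_MGDL` are hypothetical names, and the "No data" entry is a made-up example of a non-numeric value.

```r
# Conversion factor for cholesterol, mmol/L to mg/dL, as used in this post.
MMOL_TO_MGDL <- 38.67

# Hypothetical helper wrapping the strip / convert / rescale steps.
parse_cholesterol <- function(x) {
  point <- gsub(" *\\[.*?\\] *", "", x)         # drop the bracketed range
  value <- suppressWarnings(as.numeric(point))  # non-numeric entries become NA
  value * MMOL_TO_MGDL
}

parse_cholesterol(c("5.0 [4.8-5.3]", "No data"))
```

Here the first entry becomes 193.35 mg/dL (5.0 × 38.67) and the second becomes `NA`, which is where the coercion warnings above come from.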
Combine the cerebrovascular (stroke) and ischaemic (heart attack) death rates to get a picture of all cardiovascular death rates.
CVD_ALL <- merge(CVD_Cerebrovascular, CVD_Ischaemic, by=c("year", "region", "country", "sex", "publishstate"))
CVD_ALL$totalCVDvalue = CVD_ALL$value.x + CVD_ALL$value.y
colnames(CVD_ALL)[7] <- "cerrbrovascularValue"
colnames(CVD_ALL)[9] <- "ischaemicValue"
CVD_ALL$gho.y <- NULL
CVD_ALL$gho.x <- NULL
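Renaming columns by position works, but it breaks silently if the column order ever changes. As an alternative sketch, `merge()` accepts a `suffixes` argument that labels the clashing `value` columns at merge time. The toy data frames below are invented for illustration and are not WHO values.

```r
# Toy data, invented for illustration only.
stroke <- data.frame(country = c("A", "B"), sex = c("Male", "Male"),
                     value = c(10, 20))
heart  <- data.frame(country = c("A", "B"), sex = c("Male", "Male"),
                     value = c(30, 40))

# The suffixes are appended to the shared non-key column name `value`.
combined <- merge(stroke, heart, by = c("country", "sex"),
                  suffixes = c(".cerebrovascular", ".ischaemic"))

# Sum the two rates using the self-describing column names.
combined$totalCVDvalue <- combined$value.cerebrovascular +
  combined$value.ischaemic
combined
```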
Merge cholesterol data with all cardiovascular death data
db <- merge(CVD_ALL, CHO_Crude, by=c("year", "region", "country", "sex", "publishstate"))
colnames(db)[11] <- "choCrudeValue"
db$gho <- NULL
choDeathData <- merge(db, CHO_Age_Stand, by=c("year", "region", "country", "sex", "publishstate", "agegroup"))
colnames(choDeathData)[12] <- "choAgeStandValue"
choDeathData$gho <- NULL
choDeathData$publishstate <- NULL
choDeathData$agegroup <- NULL
choDeathData$sex <- as.factor((choDeathData$sex))
I must choose between the crude and the age-standardized cholesterol values, since not all values are present for all data points.
Mean Age Standardized: 182.9794091
Mean Crude: 182.0314213
Median Age Standardized: 181.749
Median Crude: 181.749
IQR Age Standardized: 27.069
IQR Crude: 30.936
Number of complete samples Crude: 356
Number of complete samples Age Standardized: 374
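One way figures like these could be computed is sketched below, assuming the column names `choCrudeValue` and `choAgeStandValue` from the merge step. `summarise_cho` is a hypothetical helper and the toy values are invented, so the output will not match the numbers above.

```r
# Hypothetical helper: summary statistics that ignore missing values.
summarise_cho <- function(x) {
  c(mean       = mean(x, na.rm = TRUE),
    median     = median(x, na.rm = TRUE),
    iqr        = IQR(x, na.rm = TRUE),
    n_complete = sum(!is.na(x)))  # number of non-missing entries
}

# Toy data, invented for illustration only.
toy <- data.frame(choCrudeValue    = c(170, 180, NA, 200),
                  choAgeStandValue = c(175, 185, 190, 205))

# One column of statistics per cholesterol measure.
cho_summary <- sapply(toy[c("choCrudeValue", "choAgeStandValue")], summarise_cho)
cho_summary
```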
I decided to use the age-standardized values since they have 18 more complete cases (374 versus 356).
#choDeathData$choCrudeValue <- NULL
choDeathData <- choDeathData[complete.cases(choDeathData[,c(7,8)]),]
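The same filter can be written with column names instead of positions, which makes it clearer which variables must be present. The sketch below assumes the relevant columns are `totalCVDvalue` and `choAgeStandValue`; that is a guess for illustration, not a statement about which columns positions 7 and 8 hold. The toy data is invented.

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  country          = c("A", "B", "C"),
  totalCVDvalue    = c(100, NA, 300),
  choAgeStandValue = c(180, 190, NA)
)

# Keep only rows where both required columns are present.
needed <- c("totalCVDvalue", "choAgeStandValue")
kept <- toy[complete.cases(toy[, needed]), ]
kept
```

Only country A survives, because B is missing the death rate and C is missing the cholesterol value.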
This chart is interactive. You can zoom in and out on the data points. You can also hover over the legend on the right to highlight only the data that corresponds to the symbol or color you are hovering over. Hovering over a data point gives you more information. Finally, you can lasso (SHIFT-click and drag) a set of points to single them out.
From the data, there seems to be no positive correlation between high cholesterol and cardiovascular disease deaths. In fact, the opposite seems to be true: countries with higher cholesterol levels have lower CVD death rates than countries with lower cholesterol levels.
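A visual impression like this could be backed by a number, for example a rank correlation. The sketch below assumes the `choAgeStandValue` and `totalCVDvalue` columns from the merge step. `cho_cvd_correlation` is a hypothetical helper, and the toy data is invented purely to show the call, so its result says nothing about the real WHO data.

```r
# Hypothetical helper: Spearman rank correlation between two columns,
# ignoring rows where either value is missing.
cho_cvd_correlation <- function(df, x = "choAgeStandValue", y = "totalCVDvalue") {
  cor(df[[x]], df[[y]], use = "complete.obs", method = "spearman")
}

# Toy data, invented for illustration only.
toy <- data.frame(choAgeStandValue = c(160, 180, 200, 220),
                  totalCVDvalue    = c(400, 300, 250, 100))

cho_cvd_correlation(toy)

# On the real data the call would be:
# cho_cvd_correlation(choDeathData)
```

A value near -1 indicates that higher cholesterol goes with lower death rates, a value near 0 indicates no monotonic relationship, and a value near 1 indicates the conventional expectation. Running it separately for each `sex` level would show whether the pattern holds in both groups.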