Load packages

library(WHO)
library(plyr)
library(dplyr)
library(ggplot2)
library(scales)
library(showtext)
library(scatterD3)
library(stringr)
library(RColorBrewer)
library(dygraphs)

Background and Inspiration

I watched a presentation by Zoë Harcombe in which she referenced her analysis of World Health Organization data on cholesterol and cardiovascular disease mortality rates. She mentioned that the relationship between cholesterol numbers and cardiovascular-related deaths is the opposite of what you’d expect. I found her original post on the subject: Zoë Harcombe’s cholesterol and heart disease relationship.

I was pretty surprised that she did not publish how she came to this conclusion. In other words, the results were not reproducible. I wanted to reproduce this result and share all of the data and code so that others can critique my methods and results. Here is my repo with all of the code, including the data acquisition code, so others can investigate my work: GitHub.

Data Acquisition

The data takes quite a while to download from the WHO (3-4 minutes on a fast internet connection). The GitHub repo includes a stand-alone R script that saves the data to local files to speed up ad hoc analysis.

Use the WHO API to get the data we want to analyze.

CHO_Age_Stand <- get_data("CHOL_03")
CHO_Crude <- get_data("CHOL_04")
#CVD_Cerebrovascular_DALY <- get_data("SA_0000001689")
CVD_Cerebrovascular <- get_data("SA_0000001690")
#CVD_Ischaemic_DALY <- get_data("SA_0000001425")
CVD_Ischaemic <- get_data("SA_0000001444")
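Since these downloads are slow, here is a minimal sketch of one way to cache each result locally. It is not the stand-alone script from the repo. The helper name `cached_fetch`, the `cache_dir` argument, and the `.rds` file layout are all assumptions made for illustration.

```r
# Hypothetical helper (not part of the WHO package): download an indicator
# once, save it as an .rds file, and reuse the local copy on later calls.
cached_fetch <- function(code, fetch, cache_dir = "data") {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(cache_dir, paste0(code, ".rds"))

  # Reuse the saved copy if one exists
  if (file.exists(path)) {
    return(readRDS(path))
  }

  # Otherwise call the supplied download function and save the result
  result <- fetch(code)
  saveRDS(result, path)
  result
}

# Usage, assuming the WHO package is loaded so that get_data() is available:
# CHO_Age_Stand <- cached_fetch("CHOL_03", get_data)
```

Passing the download function in as an argument keeps the helper independent of the WHO package, so it can be tried out with a stand-in function before hitting the API.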

Tidying up the data

The cholesterol (CHO) values are stored as character strings such as “5.0 [4.8-5.3]”. The following code strips out the square brackets and everything between them, leaving just the leading number, which is then converted to a numeric value for each entry. I also convert from mmol/L to mg/dL by multiplying by 38.67, since I’m in the US and we have to be different.

CHO_Age_Stand$value <- gsub(" *\\[.*?\\] *", "", CHO_Age_Stand$value)
CHO_Age_Stand$value <- as.numeric(CHO_Age_Stand$value)
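As a quick check of the steps described above, here is a small sketch that applies the same regular expression and the 38.67 conversion factor to a couple of sample strings. The vector `raw` and its second value are made up for illustration and are not WHO data.

```r
# Illustrative input in the same "value [low-high]" format as the WHO data
raw <- c("5.0 [4.8-5.3]", "4.6 [4.2-5.0]")

# Remove the bracketed range (and surrounding spaces), then convert to numeric
mmol_per_l <- as.numeric(gsub(" *\\[.*?\\] *", "", raw))

# Convert mmol/L to mg/dL using the factor given in the text
mg_per_dl <- mmol_per_l * 38.67

print(data.frame(raw, mmol_per_l, mg_per_dl))
```

If the pattern ever fails to match, `as.numeric()` returns `NA` with a warning, which is an easy way to spot entries in an unexpected format.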