Steve Carufel
February 16, 2020
The dataset can be grabbed here : https://data.world/makeovermonday/2020w3-is-it-time-to-treat-sugar-like-smoking
Loading some packages first.
library(httr)
library(readxl)
library(tidyverse)
library(RColorBrewer)Getting the data.
GET("https://query.data.world/s/xeuu2paegkke62iu4sc2x76dkbyder", write_disk(tf <- tempfile(fileext = ".xlsx")))df <- read_excel(tf)Taking a look at the data.
print(df)## # A tibble: 15 x 5
## `Free sugars in~ `(2008/09 - 200~ `(2010/11 - 201~ `(2012/13 - 201~
## <chr> <dbl> <dbl> <dbl>
## 1 Children 1.5-3 ~ 12.1 13 12.8
## 2 Children 4-10 y~ 14.7 15.5 14
## 3 Children 11-18 ~ 15.9 15.8 15.8
## 4 Adults 19-64 ye~ 11.8 11.7 12.1
## 5 Men 19-64 years 12 12.3 12.5
## 6 Women 19-64 yea~ 11.6 11.2 11.6
## 7 Adults 65 years~ 10.9 11.4 10.8
## 8 Men 65 years an~ 9.7 11 11.6
## 9 Women 65 years ~ 10.1 11.3 9.5
## 10 Adults 65-74 ye~ 9.9 11.2 10.5
## 11 Men 65-74 years 9.7 11 11.6
## 12 Women 65-74 yea~ 10.1 11.3 9.5
## 13 Adults 75 years~ 12.2 11.8 11.3
## 14 Men 75 years an~ 11.8 12.3 11.5
## 15 Women 75 years ~ 12.4 11.2 11.2
## # ... with 1 more variable: `(2014/15-2015/16)` <dbl>
Some duplicates we can't compare, like Men 65 years old and over and Men 65-74 years old and over. We'll just compare the most we can with no overlapping categories.
df_agegroups <- rbind(df[1:4,], df[10,], df[13,])Column names are a bit messy.
colnames(df_agegroups) <- c("Age", "2008-2010", "2010-2012", "2012-2014", "2014-2016")
print(colnames(df_agegroups))## [1] "Age" "2008-2010" "2010-2012" "2012-2014" "2014-2016"
We need to put the data into a long format, as opposed to the wide format.
df_tidy_age <- gather(df_agegroups, paired_years, proportion, -Age)
df_tidy_age$Age <- factor(df_tidy_age$Age, levels = c("Children 1.5-3 years", "Children 4-10 years", "Children 11-18 years", "Adults 19-64 years", "Adults 65-74 years", "Adults 75 years and over"))Now in my vizzes I'd like children and adults to have a group color. This way below, R looks into the first column and gives back a TRUE statement if he detects the word "Children" and a FALSE if he doesn't - meaning our Adults here.
df_tidy_age$IsItChild <- as.character(str_detect(df_tidy_age$Age, "Children"))
df_tidy_age$IsItChild <- str_replace(df_tidy_age$IsItChild, "TRUE", "Children")
df_tidy_age$IsItChild <- str_replace(df_tidy_age$IsItChild, "FALSE", "Adults")Let's make some sort of plot in which it will be easy to see the picture per age group, and compare them between them. I'll get a message telling me using transparency for discrete variables is not advised, but in this case I think it's appropriate.
In the following viz, the darker a dot, the more recency the time period has. Shown as below, I just thought this would be a quick way to show the range of proportional sugar consumption each age group have been consuming.
ggplot(df_tidy_age, aes(x = Age, y = proportion, col = IsItChild)) +
geom_point(aes(alpha = paired_years), size = 6) +
# Transparency - known as alpha in jargon
scale_alpha_discrete(range = c(0.2, 1)) +
# Labels on the X acis looked weird since they are long, so we'll rotate them a bit
theme(
text = element_text(face = "bold"),
axis.text.x = element_text(
angle = 45,
vjust = 1,
hjust = 1
),
panel.grid = element_blank(),
axis.ticks = element_blank(),
legend.title=element_blank()
) +
# Choosing our labels ourselves. By default R will put the variables names
labs(
x = "Age group",
y = "% of daily calories which come from free sugars",
title = "Children do seem to consume more free sugars than adults",
caption = "Source : National Diet and Nutrition Survey, 2008-2014 (UK)") +
coord_cartesian(ylim = c(5, 17)) # 5 % being the recommended maximum proportion## Warning: Using alpha for a discrete variable is not advised.
Cool. Now let's try something else. Like a timeline - which is pretty much our previous viz that is being rotated, sort of.
ggplot(df_tidy_age, aes(x = paired_years, y = proportion, group = Age, color = IsItChild)) +
geom_line(aes(alpha = Age), size = 4) +
scale_alpha_discrete(range = c(0.2, 1)) +
# Everything else below is just for customization - labels, titles, font, etc.
theme(
text = element_text(face = "bold"),
panel.grid = element_blank(),
axis.ticks = element_blank(),
legend.title=element_blank()
) +
# Choosing our labels ourselves. By default R will put the variables names
labs(
x = "Paired years",
y = "% of daily calories which come from free sugars",
title = "Children do seem to consume more free sugars than adults",
caption = "Source : National Diet and Nutrition Survey, 2008-2014 (UK)") +
coord_cartesian(ylim = c(5, 17)) # 5 % being the recommended maximum proportion## Warning: Using alpha for a discrete variable is not advised.
Fair enough. Now let's try a last viz type : the heatmap (I'm a big fan).
There is a little adjustment we'll do. By default, heatmaps usually works like this : the lowest value gets the lowest color possible. But a 9% sugar consumption is still about twice the recommended proportion, so 9% shouldn't not be seen as a good, low value. Our basis is 5%, so if it were in the heatmap, this would be our value with the lightest color. But it's absent from it.
mycol <- brewer.pal(9,"Greens") # From the RColorBrewer package
df_tidy_age_dev <- df_tidy_age %>%
mutate(Deviation = proportion/5) # Tells how much in term of multiplication the consumption deviates from the recommended 5%
ggplot(df_tidy_age_dev, aes(paired_years, Age)) +
geom_tile(aes(fill = Deviation)) +
geom_text(aes(label = round(Deviation, 1)), size = 8) +
scale_fill_gradientn(limits = c(1, 3.5), colours = c("white", "red")) +
# Everything else below is just for customization - labels, titles, font, etc.
theme(
text = element_text(face = "bold"),
panel.grid = element_blank(),
axis.ticks = element_blank()
) +
# Choosing our labels ourselves. By default R will put the variables names
labs(
x = "Paired years",
y = element_blank(),
title = "Free sugar consumption. Everyone is in the red zone!",
subtitle = "How many times do each group ingest more than the recommended level?",
caption = "Source : National Diet and Nutrition Survey, 2008-2014 (UK)")Here you have it, a nice and sweet heatmap.