1 Introduction

This DataViz provides an overview of the variation in healthy life expectancy (HALE), which quantifies the number of years of life expected to be lived in good health, across 195 countries and territories in 2017. It also assesses the major disease and injury causes of premature death and disability in Singapore in 2017, measured using disability-adjusted life years (DALYs), a complementary measure to HALE. The DALY is a composite measure of disease burden that captures both premature mortality (i.e. years of life lost due to premature mortality, YLLs) and the prevalence and severity of ill health (i.e. years lived with disability, YLDs), as illustrated in the image below. Image Credit: https://d-rev.org/2014/04/dalys-cost-effectiveness-analysis-brilliance/

One DALY is essentially equal to one year of healthy life lost. DALYs were developed in 1990 by Harvard University for the World Bank. In 1996, the World Health Organization (WHO) adopted DALYs as its standard metric for quantifying the impact of a disease or health condition across a population in terms of the deaths (“mortality”) and disabilities (“morbidity”) caused.
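The relationship between the three measures can be written as a one-line calculation. The sketch below is illustrative only: the function name `compute_daly` and the input figures are hypothetical and are not taken from the GBD data.

```r
# DALYs are the sum of years of life lost (YLLs) and
# years lived with disability (YLDs).
compute_daly <- function(yll, yld) {
  stopifnot(is.numeric(yll), is.numeric(yld))
  yll + yld
}

# Hypothetical burden figures for two causes (not GBD estimates)
burden <- data.frame(
  cause = c("Cause A", "Cause B"),
  yll   = c(1200, 50),
  yld   = c(300, 900)
)
burden$daly <- compute_daly(burden$yll, burden$yld)
print(burden)
```

Cause A is dominated by premature mortality and Cause B by disability, yet both contribute to the same DALY total, which is why the later bubble chart plots YLLs and YLDs on separate axes.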

1.1 Datasets

Data used is based on study findings from the Global Burden of Disease, Injuries, and Risk Factors Study (GBD) 2017, the third annual update in the series, which provides a comparative assessment of health loss across 359 specific diseases and injuries and 73 age and sex groups for 195 countries and territories.

Results for life expectancy (LE) at birth and HALE at birth by countries/territories as well as DALYs, YLLs and YLDs for a total of 169 disease and injury groupings, were queried and downloaded in separate csv files via the GBD Results Tool available on http://ghdx.healthdata.org/gbd-results-tool.

2 Major Data and Design Challenges

  • The required measures have to be queried separately via the GBD results tool and are downloaded as different csv files. The data files on HALE and LE therefore need to be merged before further processing for visualisation.

  • The country code used in the GBD study is different from that used in the rnaturalearth package. Country names would thus be used to match the GBD and rnaturalearth datasets in order to obtain the geometries of these countries for world mapping. However, the 2 datasets name some countries differently. For example, rnaturalearth uses Taiwan whereas the GBD dataset uses Taiwan (Province of China).

  • Given the large number of countries/territories (195), it could be challenging to visualise and compare HALE and related measures across countries effectively. A similar issue arises when visualising the large number of disease and injury categories in the dataset.

  • Data have to be restructured into forms useful for our intent. For example, the dataset containing the measures for DALYs, YLLs and YLDs has to be restructured into the wide format. Data on YLLs and YLDs for the different disease and injury categories are also likely to be skewed, so a logarithmic transformation would be more appropriate for charting purposes.
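The naming inconsistency described in the list above can be detected before any recoding is attempted. The following is a minimal sketch using base R; the two name vectors are small hypothetical stand-ins for the country names in the GBD and rnaturalearth datasets, not the full lists.

```r
# Hypothetical name vectors standing in for the two sources
gbd_names <- c("Singapore", "Taiwan (Province of China)", "Russian Federation")
ne_names  <- c("Singapore", "Taiwan", "Russia", "Aruba")

# GBD names with no exact match in the map data; these need recoding
unmatched <- setdiff(gbd_names, ne_names)
print(unmatched)
```

Running the same check again after recoding is a quick way to confirm that no GBD country would be dropped when the two datasets are merged.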

3 Suggestions to Overcome these Challenges

  • Appropriate data wrangling using the dplyr and tidyr functions in the tidyverse library would be performed to transform the data.

  • Recoding of the country name in the GBD dataset would be performed to match that used in the rnaturalearth dataset.

  • Visualisation of the variation in HALE across the numerous countries would be explored using a map (static and interactive) displaying the geographical location of these countries and using color on the polygon shape of these countries to reflect the different value range of HALE. For the visualisation of major causes of disease burden (DALYs, YLLs and YLDs) in Singapore, an interactive bubble chart would be explored.

  • The GBD study covers a total of 359 specific diseases and injuries, which are in turn classified into 169 disease and injury categories and then into 3 broad disease categories. The 169 disease and injury categories would be used instead of the detailed specific diseases and injuries. For visualisation purposes, these categories are manually mapped into the 3 broad disease categories: (1) communicable, maternal, neonatal, and nutritional diseases; (2) non-communicable diseases; and (3) injuries.

3.1 Sketch of the Proposed Design

One would first use a static choropleth map to visualise HALE at birth across the different countries, followed by interactive faceted choropleth maps that show HALE at birth and the proportion of life expectancy at birth spent in poor health side by side and in sync. An interactive horizontal bar chart showing, at a glance, the 10 countries with the highest and the 10 with the lowest HALE at birth would be useful. Lastly, the major causes of DALYs in Singapore would be visualised via an interactive bubble plot.

4 DataViz Step-by-step Guide

4.1 Installing and Launching R Packages

The following required packages would be installed and loaded:

  • sf: Provides simple features as records in a data.frame or tibble with a geometry list-column in R.

  • tmap: Draws thematic maps.

  • rnaturalearth: Provides a map of countries of the entire world.

  • tidyverse: Includes ggplot2 for data visualisation; dplyr for data manipulation; tidyr for data tidying, readr for data import, purrr for functional programming; tibble for re-imagining of data frames; stringr for strings; and forcats for factors.

  • plotly: Provides interactive web graphics.

packages = c('sf','tmap','rnaturalearth','tidyverse','plotly')
for (p in packages){
  # Install the package only if it cannot be loaded
  if(!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p,character.only = TRUE)
}

4.2 Data Wrangling

4.2.1 Preparation of data for HALE visualisation

Natural Earth vector data from the rnaturalearth library is loaded to obtain the list of countries and their respective geometries for world mapping. The function ne_countries is used to pull country data with the scale and return class specified. There are a total of 241 countries/administrative areas in the dataset with 64 variables.

rm(list=ls())
world <- ne_countries(scale = "medium", returnclass = "sf")
class(world)
## [1] "sf"         "data.frame"
#Inspect the list of variables and structure of the dataset
str(world)
## Classes 'sf' and 'data.frame':   241 obs. of  64 variables:
##  $ scalerank : int  3 1 1 1 1 3 3 1 1 1 ...
##  $ featurecla: chr  "Admin-0 country" "Admin-0 country" "Admin-0 country" "Admin-0 country" ...
##  $ labelrank : num  5 3 3 6 6 6 6 4 2 6 ...
##  $ sovereignt: chr  "Netherlands" "Afghanistan" "Angola" "United Kingdom" ...
##  $ sov_a3    : chr  "NL1" "AFG" "AGO" "GB1" ...
##  $ adm0_dif  : num  1 0 0 1 0 1 0 0 0 0 ...
##  $ level     : num  2 2 2 2 2 2 2 2 2 2 ...
##  $ type      : chr  "Country" "Sovereign country" "Sovereign country" "Dependency" ...
##  $ admin     : chr  "Aruba" "Afghanistan" "Angola" "Anguilla" ...
##  $ adm0_a3   : chr  "ABW" "AFG" "AGO" "AIA" ...
##  $ geou_dif  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ geounit   : chr  "Aruba" "Afghanistan" "Angola" "Anguilla" ...
##  $ gu_a3     : chr  "ABW" "AFG" "AGO" "AIA" ...
##  $ su_dif    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ subunit   : chr  "Aruba" "Afghanistan" "Angola" "Anguilla" ...
##  $ su_a3     : chr  "ABW" "AFG" "AGO" "AIA" ...
##  $ brk_diff  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ name      : chr  "Aruba" "Afghanistan" "Angola" "Anguilla" ...
##  $ name_long : chr  "Aruba" "Afghanistan" "Angola" "Anguilla" ...
##  $ brk_a3    : chr  "ABW" "AFG" "AGO" "AIA" ...
##  $ brk_name  : chr  "Aruba" "Afghanistan" "Angola" "Anguilla" ...
##  $ brk_group : chr  NA NA NA NA ...
##  $ abbrev    : chr  "Aruba" "Afg." "Ang." "Ang." ...
##  $ postal    : chr  "AW" "AF" "AO" "AI" ...
##  $ formal_en : chr  "Aruba" "Islamic State of Afghanistan" "People's Republic of Angola" NA ...
##  $ formal_fr : chr  NA NA NA NA ...
##  $ note_adm0 : chr  "Neth." NA NA "U.K." ...
##  $ note_brk  : chr  NA NA NA NA ...
##  $ name_sort : chr  "Aruba" "Afghanistan" "Angola" "Anguilla" ...
##  $ name_alt  : chr  NA NA NA NA ...
##  $ mapcolor7 : num  4 5 3 6 1 4 1 2 3 3 ...
##  $ mapcolor8 : num  2 6 2 6 4 1 4 1 1 1 ...
##  $ mapcolor9 : num  2 8 6 6 1 4 1 3 3 2 ...
##  $ mapcolor13: num  9 7 1 3 6 6 8 3 13 10 ...
##  $ pop_est   : num  103065 28400000 12799293 14436 3639453 ...
##  $ gdp_md_est: num  2258 22270 110300 109 21810 ...
##  $ pop_year  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ lastcensus: num  2010 1979 1970 NA 2001 ...
##  $ gdp_year  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ economy   : chr  "6. Developing region" "7. Least developed region" "7. Least developed region" "6. Developing region" ...
##  $ income_grp: chr  "2. High income: nonOECD" "5. Low income" "3. Upper middle income" "3. Upper middle income" ...
##  $ wikipedia : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ fips_10   : chr  NA NA NA NA ...
##  $ iso_a2    : chr  "AW" "AF" "AO" "AI" ...
##  $ iso_a3    : chr  "ABW" "AFG" "AGO" "AIA" ...
##  $ iso_n3    : chr  "533" "004" "024" "660" ...
##  $ un_a3     : chr  "533" "004" "024" "660" ...
##  $ wb_a2     : chr  "AW" "AF" "AO" NA ...
##  $ wb_a3     : chr  "ABW" "AFG" "AGO" NA ...
##  $ woe_id    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ adm0_a3_is: chr  "ABW" "AFG" "AGO" "AIA" ...
##  $ adm0_a3_us: chr  "ABW" "AFG" "AGO" "AIA" ...
##  $ adm0_a3_un: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ adm0_a3_wb: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ continent : chr  "North America" "Asia" "Africa" "North America" ...
##  $ region_un : chr  "Americas" "Asia" "Africa" "Americas" ...
##  $ subregion : chr  "Caribbean" "Southern Asia" "Middle Africa" "Caribbean" ...
##  $ region_wb : chr  "Latin America & Caribbean" "South Asia" "Sub-Saharan Africa" "Latin America & Caribbean" ...
##  $ name_len  : num  5 11 6 8 7 5 7 20 9 7 ...
##  $ long_len  : num  5 11 6 8 7 13 7 20 9 7 ...
##  $ abbrev_len: num  5 4 4 4 4 5 4 6 4 4 ...
##  $ tiny      : num  4 NA NA NA NA 5 5 NA NA NA ...
##  $ homepart  : num  NA 1 1 NA 1 NA 1 1 1 1 ...
##  $ geometry  :sfc_MULTIPOLYGON of length 241; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:10, 1:2] -69.9 -69.9 -69.9 -70 -70.1 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
##   ..- attr(*, "names")= chr [1:63] "scalerank" "featurecla" "labelrank" "sovereignt" ...
#Check the list of countries available in the dataset
world <- world %>% 
  mutate(across(where(is_character),as_factor))
#levels(world$admin)

Only variables on the country names and the continent the country is in, together with the countries’ respective geometry information would be retained in the dataset.

world <- world %>% 
  select(`admin`,`continent`)

Next, data on HALE and LE at birth by country, year, age and sex are imported from 2 separate csv files using read_csv. Only the variables of interest are selected, and the data are filtered to retain only results for 2017 and for both sexes (i.e. males and females combined). The 2 dataframes containing the HALE and LE measures are then joined by location_id, location_name, sex_name, age_name and year using the left_join() function, and the result is stored as joined1.

The function recode_factor() is used to recode location_name to match the exact spelling of countries used in the world dataframe. The recoded variable is assigned to a newly created variable named admin using the mutate function. In addition, 2 other measures are created: years_poorhealth_2017, the number of years lived in poor health, obtained by subtracting HALE at birth from LE at birth; and percent_poorhealth_2017, the proportion of LE at birth lived in poor health.

hale_cty <- read_csv("data/IHME-GBD_2017_DATA_HALEatbirth_CTY.csv")
le_cty <- read_csv("data/IHME-GBD_2017_DATA_LEatbirth_CTY.csv")

hale_cty_2017 <- hale_cty %>% 
  select(location_id, location_name, sex_name, age_name, year, hale) %>% 
  filter(sex_name == "Both") %>% 
  filter(year==2017)

le_cty_2017 <- le_cty %>% 
  select(location_id, location_name, sex_name, age_name, year, le) %>% 
  filter(sex_name == "Both") %>% 
  filter(year==2017)

joined1 <- left_join(hale_cty_2017, le_cty_2017, 
                   by = c("location_id", "location_name", "sex_name", "age_name", "year"))

joined1 <- joined1 %>% 
  mutate(across(where(is_character),as_factor)) %>% 
  mutate(admin = recode_factor(location_name,
                               `Guinea-Bissau` = "Guinea Bissau",
                               `Congo` = "Republic of Congo",
                               `Serbia` =   "Republic of Serbia",
                               `Russian Federation` = "Russia",
                               `Taiwan (Province of China)` = "Taiwan",
                               `United States` = "United States of America",
                               `Cote d'Ivoire` = "Ivory Coast",
                               `Tanzania`   = "United Republic of Tanzania",
                               `The Gambia` = "Gambia",
                               `Timor-Leste`= "East Timor",
                               `Virgin Islands, U.S.` = "United States Virgin Islands"
                               )) %>% 
  mutate(years_poorhealth_2017 = le - hale) %>% 
  mutate(percent_poorhealth_2017 = years_poorhealth_2017/le*100)

str(joined1)
## tibble [195 x 10] (S3: tbl_df/tbl/data.frame)
##  $ location_id            : num [1:195] 154 155 156 157 160 161 162 163 164 165 ...
##  $ location_name          : Factor w/ 195 levels "Tunisia","Turkey",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ sex_name               : Factor w/ 1 level "Both": 1 1 1 1 1 1 1 1 1 1 ...
##  $ age_name               : Factor w/ 1 level "<1 year": 1 1 1 1 1 1 1 1 1 1 ...
##  $ year                   : num [1:195] 2017 2017 2017 2017 2017 ...
##  $ hale                   : num [1:195] 67.3 67.9 63.1 56.7 53 ...
##  $ le                     : num [1:195] 78.3 78.9 73.3 68.1 63.4 ...
##  $ admin                  : Factor w/ 195 levels "Guinea Bissau",..: 12 13 14 15 16 17 18 19 20 21 ...
##  $ years_poorhealth_2017  : num [1:195] 11 11.1 10.2 11.3 10.3 ...
##  $ percent_poorhealth_2017: num [1:195] 14.1 14 13.9 16.7 16.3 ...
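As a quick arithmetic check of the two derived measures, the sketch below applies the same formulas to the Singapore figures quoted elsewhere in this write-up (HALE of 74.2 years and 10.6 years in poor health). The LE value is derived here from those two numbers and is an assumption of this example rather than a value read from the data file.

```r
# Singapore, 2017: HALE at birth and years in poor health as quoted in the text
hale <- 74.2
years_poorhealth <- 10.6

# LE at birth implied by the two figures above (derived, not read from the data)
le <- hale + years_poorhealth

# Same formula as percent_poorhealth_2017 in the pipeline
percent_poorhealth <- years_poorhealth / le * 100
round(percent_poorhealth, 1)
```

The result agrees with the 12.5% reported for Singapore in the discussion of the bar chart.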

The data are then merged with the world dataframe, which contains the geometry list-column of the countries, using the merge function. Countries with a missing HALE value are removed and the dataset is sorted by HALE in descending order.

merge_df_cty <- merge(world, joined1, all=TRUE) %>% 
  filter(!is.na(hale)) %>% 
  arrange(desc(hale))

#str(merge_df_cty)

4.2.2 Preparation of data for DALY visualisation for Singapore

In this data preprocessing step, data for the measures: DALYs, YLLs and YLDs by disease and injuries categories for Singapore for the year 2017 (all sexes and ages combined) are imported into the dataframe named daly_2017. These 3 measures are currently imported as separate rows in the dataframe, and thus are restructured as column variables via spread().

Further filtering excludes causes with DALYs of 100 life years or fewer (small values). Log-transformed YLDs and YLLs are created to handle skewness in the data for charting purposes. YLDs and YLLs with values of 0 or NA are recoded as 1 to avoid taking log(0).

daly <- read_csv("data/IHME-GBD_2017_DATA_DALY_SpecificDis.csv")
cause_mapping <- read_csv("data/specificdisease_broadcause_mapping.csv")

daly_2017 <- daly %>% 
  mutate(across(where(is_character),as_factor)) %>% 
  filter(measure_name %in% c("DALYs (Disability-Adjusted Life Years)","YLDs (Years Lived with Disability)","YLLs (Years of Life Lost)")) %>% 
  filter(age_name == "All Ages") %>% 
  filter(sex_name == "Both") %>% 
  filter(metric_name == "Number") %>% 
  filter(location_name == "Singapore") %>% 
  filter(year == 2017) %>% 
  select(measure_name, cause_name, cause_id, val) %>% 
  spread(measure_name, val) %>% 
  rename("DALYs" = "DALYs (Disability-Adjusted Life Years)",
         "YLDs" = "YLDs (Years Lived with Disability)",
         "YLLs" = "YLLs (Years of Life Lost)") %>%
  filter(DALYs > 100) %>% 
  mutate(YLDs = recode(YLDs, `0` = 1)) %>% 
  mutate(YLLs = recode(YLLs, `0` = 1)) %>% 
  mutate(across(where(is.numeric), ~ replace_na(.x, 1))) %>% 
  mutate(logDALY = log(DALYs)) %>% 
  mutate(logYLD = log(YLDs)) %>% 
  mutate(logYLL = log(YLLs))

daly_2017 <- left_join(daly_2017, cause_mapping, 
                   by = c("cause_id","cause_name"))
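The reshaping and log-guard steps can be seen more easily on a toy table. The sketch below is an assumption-laden illustration, not the pipeline itself: the data are hypothetical, and it uses pivot_wider(), which the tidyr documentation recommends over the superseded spread().

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format rows: one row per cause and measure
long <- tibble(
  cause_name   = c("Cause A", "Cause A", "Cause A", "Cause B", "Cause B"),
  measure_name = c("DALYs", "YLDs", "YLLs", "DALYs", "YLDs"),
  val          = c(1500, 300, 1200, 900, 900)
)

wide <- long %>%
  # One column per measure; Cause B has no YLLs row, so YLLs becomes NA
  pivot_wider(names_from = measure_name, values_from = val) %>%
  # Replace NA or 0 with 1 so that log() returns 0 rather than NA or -Inf
  mutate(across(c(YLDs, YLLs), ~ if_else(is.na(.x) | .x == 0, 1, .x))) %>%
  mutate(logYLD = log(YLDs), logYLL = log(YLLs))

print(wide)
```

Recoding to 1 places causes with no recorded burden on the zero line of the log axis, which keeps them visible in the chart instead of dropping them.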

4.3 Choropleth map showing the geographical distribution of HALE at birth by countries/territories

To draw a choropleth map showing the geographical distribution of HALE at birth in 2017 by countries/territories, tm_shape() is used to define the input data (i.e. merge_df_cty) and the target variable hale is passed to tm_fill(). The colour palette RdYlGn and the cont style are used to map the continuous variable hale onto a continuous colour gradient.

To add more insight to the visualisation, annotations are added to the plot to identify the 10 countries with the highest HALE at birth, by applying another tm_shape() layer that extracts these 10 countries and labelling them with tm_text(). One limitation, however, is that one is not able to add further text showing the HALE value for these 10 countries.

The map reveals large variations across countries in healthy life expectancy at birth in 2017, ranging from 45 years to approximately 75 years. Healthy life expectancy at birth in countries located in Central Africa is comparatively lower than in other countries. The 10 countries with the highest HALE at birth are: Bermuda, Iceland, France, Israel, Italy, Japan, Singapore, South Korea, Spain and Switzerland (not in any order of HALE).

tmap_mode("plot")

fig1 <- tm_shape(merge_df_cty) +
  tm_fill("hale",
          style = "cont",
          title = "Years of healthy life",
          palette = "RdYlGn",
          border.col = "gray30",
          legend.is.portrait = FALSE) +
  tm_layout(legend.height = 0.5, 
            legend.width = 0.25,
            legend.title.size = 0.8,
            legend.outside = FALSE,
            legend.position = c("left","bottom"),
            frame = FALSE,
            #inner.margins = c(0.01, 0.01, 0.01, 0.01),
            main.title = "Healthy Life Expectancy* at birth, both sexes, 2017", 
            main.title.size = 1, main.title.position="center") +
  tm_credits(paste0("*Healthy life expectancy is the number of years that a person at a given age can expect to live in full health,\n",
  "taking into account mortality and disability. Top 10 countries with the highest HALE in 2017 are annotated."),
             position = c("right", "bottom")) +
  tm_borders(alpha = 0.2) + 
  tm_shape(merge_df_cty %>% slice_max(order_by = hale, n = 10)) + 
  tm_text("location_name",size = 0.55, col = "gray30", auto.placement = TRUE)

fig1

4.4 Faceted interactive maps of HALE and Proportion of LE spent in poor health in sync

A unique feature of tmap is its ability to create static and interactive maps using the same code. Maps can be viewed interactively by simply switching to view mode, using the command tmap_mode("view").

tmap’s view mode also works with faceted plots. Here, the sync argument of tm_facets() produces two maps, one of HALE at birth and one of the proportion of LE at birth spent in poor health, with synchronised zoom and pan settings. The quantile style is used with five categories (the default n). Additional information on each country/territory is supplied through the popup.vars argument and is revealed when one clicks on a polygon in the map.

tmap_mode("view")

fig2 <- tm_shape(merge_df_cty) +
  tm_fill(c("hale","percent_poorhealth_2017"),
          style = c("quantile","quantile"),
          palette = list("RdYlGn","Reds"),
          title = c("Healthy Life Expectancy 2017 (years)","% of Life Expectancy in Poor Health 2017"),
          popup.vars = c("HALE at birth 2017 (years)" = "hale",
                         "Life Expectancy (LE) at birth 2017 (years)" = "le",
                         "Years spent in poor health, (LE-HALE)" = "years_poorhealth_2017",
                         "% LE spent in poor health, (LE-HALE)/LE" = "percent_poorhealth_2017")) +
  tm_borders(alpha = 0.5) +
  tm_facets(sync = TRUE, ncol = 2)

fig2

One can switch tmap back to plotting mode with the following code.

tmap_mode("plot")
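The idea behind the quantile style used in the faceted maps can be approximated in base R. This is only a sketch under stated assumptions: the HALE values are hypothetical, and tmap's own break calculation may differ in detail from quantile() with its default settings.

```r
# Hypothetical HALE values for ten countries (not GBD estimates)
hale <- c(45, 52, 58, 61, 63, 66, 68, 70, 72, 74)

# Break points at the 0th, 20th, ..., 100th percentiles
breaks <- quantile(hale, probs = seq(0, 1, by = 0.2))

# Assign each value to one of the five classes
classes <- cut(hale, breaks = breaks, include.lowest = TRUE)
table(classes)
```

Each class holds the same number of countries, so the colours are spread evenly across the map even when the underlying values are skewed. The trade-off is that class widths are unequal, which is worth bearing in mind when reading the legend.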

4.5 Interactive bar chart on HALE by countries/territories (top 10 highest and lowest)

Next, an interactive horizontal bar plot is built with the plot_ly function, restricted to the countries with the highest and lowest HALE at birth. Colour distinguishes the continent each country is in. A tooltip reveals specific details for each country when one hovers over its bar: life expectancy at birth, years lived in poor health and proportion of life expectancy at birth lived in poor health. This supplements the static choropleth map above with additional information for a selected subset of countries.

From this chart, one can see at a glance which 10 countries have the highest and lowest HALE at birth in 2017, their relative ranking, their HALE values and, through the tooltip, other specific information.

Singapore has the highest HALE at birth, at 74.2 years in 2017 out of 195 countries/territories, followed by Japan (73.1 years) and Spain (72.1 years). For Singapore, about 10.6 years on average would be spent in poor health, which accounted for 12.5% of its life expectancy at birth in 2017.

#Identify the top 10 countries with highest HALE at birth in 2017
maxhale <- merge_df_cty %>% 
  arrange(desc(hale)) %>% 
  slice_max(order_by = hale, n = 10)

#Identify the bottom 10 countries with lowest HALE at birth in 2017
minhale <- merge_df_cty %>% 
  arrange(desc(hale)) %>% 
  slice_min(order_by = hale, n = 10)

#Combining the 2 datasets above via row bind
minmaxhale <- rbind(maxhale,minhale) %>% 
  select(continent, location_name, hale, le, years_poorhealth_2017, percent_poorhealth_2017)

data <- data.frame(minmaxhale$continent, minmaxhale$location_name, minmaxhale$hale, minmaxhale$le, minmaxhale$years_poorhealth_2017, minmaxhale$percent_poorhealth_2017)
#str(data)
minmaxhale$location_name <- as.factor(minmaxhale$location_name) %>% 
  fct_reorder(minmaxhale$hale)

colors <- c('#4AC6B7', '#1972A4', '#965F8A', '#FF7070', '#C61951')

# reusable function for creating annotation object
label <- function(txt,x,y) {
  list(
    text = txt, x = x, y = y,
    ax = 0, ay = 0,
    xref = "paper", yref = "paper", 
    align = "center",
    font = list(family = "serif", size = 15, color = "white"),
    bgcolor = "#b3b3b3", bordercolor = "black", borderwidth = 2
  )
}

fig3 <- plot_ly(
  data,
  x = ~minmaxhale$hale, y = ~minmaxhale$location_name, color = ~minmaxhale$continent,
  type = 'bar', orientation = 'h',
  colors = ~colors, opacity = 0.6,
  text = ~paste('HALE at birth 2017 (years): ', round(`minmaxhale.hale`,1),
                '<br>Life Expectancy (LE) at birth 2017 (years): ',round(`minmaxhale.le`,1),
                '<br>Years spent in poor health, (LE-HALE): ', round(`minmaxhale.years_poorhealth_2017`,1),
                '<br>% LE spent in poor health, (LE-HALE)/LE: ', round(`minmaxhale.percent_poorhealth_2017`,1))
) %>% 
  
 layout(
    annotations = label("Bottom 10 Countries from here downwards",0.96,0.47),
    #legend=list(orientation="h"),
    title="Ten countries with the highest and lowest HALE at birth respectively, both sexes, 2017 ",
        xaxis = list(title = 'Healthy Life Expectancy(HALE), in years',
                     range = c(0,80),
                     gridcolor = 'rgb(243, 243, 243)',
                     ticklen = 5,
                     gridwidth = 1),
        yaxis = list(title ='',
                     gridcolor = 'rgb(243, 243, 243)',
                     ticklen = 5,
                     gridwidth = 1)
 )

fig3
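The selection and ordering logic behind the bar chart can be checked on a small table. The sketch below uses hypothetical countries and values and picks the top 2 and bottom 2 rather than 10, purely to keep the output short.

```r
library(dplyr)
library(forcats)

# Hypothetical HALE values (not GBD estimates)
hale_tbl <- tibble(
  location_name = c("A", "B", "C", "D", "E", "F"),
  hale          = c(70, 48, 74, 55, 66, 60)
)

# Two highest and two lowest, mirroring the top 10 / bottom 10 selection
extremes <- bind_rows(
  slice_max(hale_tbl, order_by = hale, n = 2),
  slice_min(hale_tbl, order_by = hale, n = 2)
) %>%
  # Order factor levels by HALE so bars are drawn from lowest to highest
  mutate(location_name = fct_reorder(location_name, hale))

levels(extremes$location_name)
```

Without the reordering step the bars would follow alphabetical order, which hides the ranking that the chart is meant to show.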

4.6 Interactive bubble chart on the major disease/injury causes of YLLs, YLDs and DALYs for Singapore in 2017

Lastly, the major disease/injury causes that contribute to the life years lost in the Singapore population are examined and visualised. The code below creates an interactive bubble plot with years of life lost due to premature mortality (YLLs) on a logarithmic scale on the y-axis against years lived with disability (YLDs) on a logarithmic scale on the x-axis, by disease/injury category as defined in the GBD study. The total disability-adjusted life years (DALYs) for each disease/injury category is represented by the size of the bubble, and the three broad disease categories are represented by colour.

Similar to the interactive charts/maps above, a tooltip has been added to reveal specific details on the rates of DALYs, YLLs and YLDs (per 100,000 population in Singapore). The population is based on the population estimate for Singapore from the same GBD 2017 study.

The interactive bubble chart reveals that low back pain, headache disorders and depressive disorders are the top 3 health problems causing the most disability in the Singapore population (i.e. highest log YLD values), while ischaemic heart disease, lower respiratory tract infections and lung cancer are the top 3 causes of premature mortality (highest log YLL values). Combining both premature death and disability, the top 3 causes are ischaemic heart disease, low back pain and stroke (bubbles with the largest size).

colors <- c('#4AC6B7', '#1972A4', '#965F8A')
SingaporePopn_2017 <- 5568480

daly_2017 <- daly_2017 %>% 
  arrange(desc(DALYs))

fig4 <- plot_ly(
  daly_2017, x = ~`logYLD`, y = ~`logYLL`, color = ~`broad_cause`, 
  type = "scatter",
  mode = "markers", 
  colors = ~colors, 
  size = ~`DALYs`,
  marker = list(symbol = 'circle', sizemode = 'diameter', sizeref = 1.5,
                line = list(width = 2, color = '#FFFFFF'), opacity=0.4),
  text = ~paste(sep='','Disease Category: ', `cause_name`,
                '<br>DALYs :',round(`DALYs`,0),
                '<br>DALY per 100,000 population ', round(`DALYs`/SingaporePopn_2017*100000,1),
                '<br>YLL per 100,000 population ', round(`YLLs`/SingaporePopn_2017*100000,1),
                '<br>YLD rate per 100,000 population ', round(`YLDs`/SingaporePopn_2017*100000,1))
          
  )%>%
  layout(
        title="Causes of Premature Deaths and Disability in Singapore, 2017",
        xaxis = list(title = 'log Years lost due to disability (log YLDs)',
                      gridcolor = 'rgb(243, 243, 243)',
                      ticklen = 5,
                      gridwidth = 1),
        yaxis = list(title = 'log Years lost due to premature deaths (log YLLs)',
                      gridcolor = 'rgb(243, 243, 243)',
                      ticklen = 5,
                      gridwidth = 1),
        legend = list(x = 0, y = 0.2)
  )
fig4
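The per-100,000 rates shown in the tooltip all follow the same conversion, which can be factored into a small helper. In the sketch below, `rate_per_100k` is a hypothetical function name and the DALY count is made up for illustration; only the population figure comes from the code above.

```r
# Population estimate for Singapore used in the tooltip code above
SingaporePopn_2017 <- 5568480

# Convert a count of life years into a rate per 100,000 population
rate_per_100k <- function(count, population) {
  round(count / population * 100000, 1)
}

# Hypothetical DALY count (not a GBD estimate)
rate_per_100k(55684.8, SingaporePopn_2017)
```

Expressing the burden as a rate makes the figures comparable with other populations of different sizes, which raw counts of life years are not.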

5 Insights and Final Visualisation

All the above maps and plots shall be included in the final visualisation for this dataviz.

Global trend in healthy life expectancy at birth (HALE) and the major causes of premature deaths and disability in Singapore, 2017

There are large variations across countries in healthy life expectancy at birth in 2017, ranging from 45 years to 74 years. Healthy life expectancy at birth in countries located in Central Africa is comparatively lower than in other countries (Fig. 1). Comparative information on healthy life expectancy and the corresponding proportion of life expectancy spent in poor health across countries can be visualised via Fig. 2. Approximately 11.6%-16.7% of life expectancy at birth across the various countries is spent in poor health.

Singapore has the highest healthy life expectancy at birth, at 74.2 years in 2017 out of 195 countries/territories, followed by Japan (73.1 years) and Spain (72.1 years). For Singapore, about 10.6 years on average would be spent in ill-health, which accounted for 12.5% of its life expectancy at birth in 2017 (Fig. 3).

Low back pain, headache disorders and depressive disorders are the top 3 health problems causing the most disability in Singapore’s population (i.e. highest log YLD values) in 2017, while ischaemic heart disease, lower respiratory tract infections and lung cancer are the top 3 causes of premature mortality (Fig. 4). In terms of premature death and disability combined, the top 3 causes of years lost are ischaemic heart disease, low back pain and stroke (bubbles with the largest size).

Figure 1

Figure 2 Detailed information for a specific country can be obtained by locating the country in the interactive map and clicking on it.

Figure 3 Hover over the bar for additional information.

Figure 4 Hover over the bubble for additional information.

6 References

Global Burden of Disease Collaborative Network. Global Burden of Disease Study 2017 (GBD 2017) Results. Seattle, United States: Institute for Health Metrics and Evaluation (IHME), 2018. Available from http://ghdx.healthdata.org/gbd-results-tool.