Introduction

Background

Health care and how it should be delivered are perennial topics in the political life of the United States. Debates on the subject are full of anecdotal examples offered for or against the current system, a government-provided health insurance system, or a state-provided (universal) health care system. According to the World Bank, the United States spends more on health care as a percentage of GDP than any other country except Tuvalu and the Marshall Islands, and more in US dollars per capita than any other country. As a result, the most common debate centers on whether Americans get good value for their money, or whether other systems work better (De Abreu, 2021). Several studies have argued that the United States, despite spending so much on health care, actually performs worse than comparable countries (healthsystemtracker.org; pgpf.org).

Comparing the United States, Canada, and the United Kingdom is a useful way to explore this issue. In the United States, everyone except the very poor and retirees pays for private insurance (or pays for their own health care if they are uninsured). Canada has a government-provided health insurance system, and the United Kingdom has a universal health care system. All three are high-income countries and OECD members, and they share other similarities that make comparisons useful. In a study of the health systems of these three countries, M.M.O. Seipel found statistically significant differences in access to health care among them, which makes them useful to study (Seipel, 2012).

How to measure the differences between these health systems effectively is a key question. There is no established framework for evaluating health care systems, but one effective approach is to look at the burden of disease in a population. The sum of mortality and morbidity is referred to as the ‘burden of disease’ and can be measured by a metric called ‘Disability-Adjusted Life Years’ (DALYs). Conceptually, one DALY is the equivalent of losing one year of good health because of premature death, disease, or disability (Roser and Ritchie, 2016). I will use this measure to analyze the performance of the health care systems in the United States, the United Kingdom, and Canada.
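For a rough sense of how mortality and morbidity combine, the GBD splits DALYs into years of life lost to premature death (YLL) and years lived with disability (YLD), so that DALYs = YLL + YLD. The sketch below applies that decomposition to made-up numbers for a single condition in one year. The function names are hypothetical, and the real GBD estimates rest on far more detailed inputs, such as reference life tables and disability weights for each health state.

```r
# Hypothetical helper. Years of life lost (YLL): deaths multiplied by the
# remaining standard life expectancy at the age of death.
yll <- function(deaths, remaining_life_expectancy) {
  deaths * remaining_life_expectancy
}

# Hypothetical helper. Years lived with disability (YLD): prevalent cases
# multiplied by the disability weight (0 = full health, 1 = equivalent to death).
yld <- function(prevalent_cases, disability_weight) {
  prevalent_cases * disability_weight
}

# Made-up condition in one year: 10 deaths at an age with 30 years of remaining
# life expectancy, and 1,000 people living with it at a disability weight of 0.1.
example_yll   <- yll(deaths = 10, remaining_life_expectancy = 30)
example_yld   <- yld(prevalent_cases = 1000, disability_weight = 0.1)
example_dalys <- example_yll + example_yld
print(c(YLL = example_yll, YLD = example_yld, DALYs = example_dalys))
# YLL = 300, YLD = 100, DALYs = 400
```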

The Data

The data for this project come from the Global Burden of Disease (GBD) study, whose data collection is complicated and multi-tiered. In brief: “High‐quality, ongoing estimation requires a constant stream of the most up‐to‐date data available for a wide range of indicators. This necessitates continuous extraction of studies from the literature and the addition of key data sources throughout the GBD (Global Burden of Disease).” (http://www.healthdata.org/sites/default/files/files/Projects/GBD/March2020_GBD%20Protocol_v4.pdf)

The GBD takes input data (microdata or tabulated data obtained directly from data holders or from publications); cleans, aggregates, and processes it; and then produces fully imputed datasets with detailed estimates.

Thus, analyzing these data requires some degree of trust in the skill and integrity of the GBD team. We are willing to extend that trust for two reasons. First, the team cites and makes publicly available the data used at each step of the process (so, if we were not beginners constrained by time, we could verify their results independently). Second, the Institute for Health Metrics and Evaluation (IHME), which provides the final data, is widely respected and relied upon by global experts. While this is no guarantee of quality, it is good enough for our purposes.

The potential biases in this sort of methodology stem primarily from the perspectives of the researchers compiling the data. In addition, since the data are observational and derived largely from published studies, they could reflect publication bias (i.e., journals tend to publish only “meaningful” results, so relying on publications would tend to hide more benign trends). Finally, without examining each individual article and its methodology, we cannot rule out sampling bias in these data.

Statistics

The primary statistic I will use is the DALY (disability-adjusted life year). It is the sum of mortality and morbidity: conceptually, one DALY is equivalent to one lost year of good health due to premature death, disease, or disability. The lower a country's DALY rate (DALYs per 100,000 people), the better the health of its population. I will compare the distributions of DALYs in each of the three countries.
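Because DALY counts grow with population, comparing countries requires a rate. A minimal sketch of the conversion follows, using a hypothetical helper name (daly_rate_per_100k): the crude rate is the DALY count divided by the population, multiplied by 100,000. This is a crude rate only; the age-standardized rates in the GBD data additionally adjust for differences in age structure, so the two will not match.

```r
# Hypothetical helper: convert an absolute DALY count into a crude rate per
# 100,000 people. It scales by population size only; it does not adjust for age.
daly_rate_per_100k <- function(dalys, population) {
  stopifnot(all(population > 0))
  dalys / population * 1e5
}

# Made-up inputs: 25,000 DALYs in a population of 1,000,000 people.
daly_rate_per_100k(dalys = 25000, population = 1e6)  # 2500

# The function is vectorized, so it works on whole columns at once.
daly_rate_per_100k(dalys = c(25000, 90000), population = c(1e6, 3e6))  # 2500 3000
```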

I will also use health expenditure per capita, and the share of the disease burden due to communicable and non-communicable diseases relative to each country's GDP per capita. In addition, I will use the burden of disease by age, to see whether I can rule out population age as an explanation for differences in disease burden. Using these other data should allow me to build models that lend themselves to preliminary conclusions and hypotheses about the health care systems in these three countries.
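One standard way to remove the effect of different age structures is direct age standardization: compute a rate within each age group, then take a weighted average using a fixed standard population's age distribution. The sketch below illustrates that technique; it is not necessarily the exact procedure behind the GBD's age-standardized rates. The function name, age-specific rates, and standard weights are all hypothetical, and computing real age-specific rates would require age-group population counts, which are not among the datasets loaded here.

```r
# Direct age standardization: a weighted average of age-specific rates, with
# weights taken from a standard population's age distribution.
age_standardized_rate <- function(age_specific_rates, standard_weights) {
  stopifnot(length(age_specific_rates) == length(standard_weights))
  # Normalize so the weights sum to 1, even if raw population counts are given.
  w <- standard_weights / sum(standard_weights)
  sum(age_specific_rates * w)
}

# Hypothetical DALY rates per 100,000 for the five age groups in burden_age
# (Under 5, 5-14, 15-49, 50-69, 70+) and hypothetical standard weights.
rates   <- c(30000, 8000, 15000, 35000, 80000)
weights <- c(0.09, 0.17, 0.52, 0.16, 0.06)
age_standardized_rate(rates, weights)  # about 22,260
```

Applying the same weights to each country means that any remaining difference reflects age-specific DALY rates rather than age structure.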

Variables

Name: country name, with 231 unique entities

Code: three-letter name abbreviation

Year: year of measurement ranging from 1990 to 2017

DALYs (Disability-Adjusted Life Years) - All causes - Sex: Both - Age: Age-standardized (Rate): age-standardized DALYs per 100,000 people; each DALY represents one lost year of healthy life

DALYs by cause: the absolute number of DALYs attributed to each cause

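Health expenditure per capita (current US$): health spending per person, in current US dollars

Population (historical estimates): population of the country

Continent: continent in which the country is located

GDP per capita, PPP (constant 2017 international $): GDP per person, adjusted for purchasing power parity and expressed in constant 2017 international dollars

Age: the breakdown of the total disease burden by age group, and the rates of burden per 100,000 individuals within a given age group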

Overarching Question

The question I would like to explore with these data, statistics, and variables is: how do the health outcomes under the U.S. health care system compare with those under other types of systems, as measured by the health of the population?

Data Exploration/Analysis

Load and Clean Data

#Load libraries
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.3     ✓ dplyr   1.0.0
## ✓ tidyr   1.1.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(BSDA)
## Loading required package: lattice
## 
## Attaching package: 'BSDA'
## The following object is masked from 'package:datasets':
## 
##     Orange
library(ggthemes)

# Set Working Directory
setwd("~/Documents/Math 217/Final Project")
# Note: in my initial data exploration I realized I was mixing DALY rates
# and DALY totals. I've adjusted to all totals here
# Load datasets
dalys_all <- read_csv("total-disease-burden-by-cause.csv")
burden_vs_expenditure <- read_csv("disease-burden-vs-health-expenditure-per-capita.csv")
burden_by_cause <- read_csv("burden-of-disease-by-cause.csv")
burden_communicable_diseases_vs_gdp <- read_csv("share-of-disease-burden-from-communicable-diseases-vs-gdp.csv")
burden_from_ncds_vs_gdp <- read_csv("share-of-disease-burden-from-ncds-vs-gdp.csv")
burden_age <- read_csv("disease-burden-by-age.csv")
#Examine the structures to get a better understanding of the data
str(dalys_all)
## tibble [6,468 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Entity                                                                                                                                  : chr [1:6468] "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ Code                                                                                                                                    : chr [1:6468] "AFG" "AFG" "AFG" "AFG" ...
##  $ Year                                                                                                                                    : num [1:6468] 1990 1991 1992 1993 1994 ...
##  $ DALYs (Disability-Adjusted Life Years) - Non-communicable diseases - Sex: Both - Age: All Ages (Number)                                 : num [1:6468] 3597007 3623008 3874902 4350359 4636101 ...
##  $ DALYs (Disability-Adjusted Life Years) - Communicable, maternal, neonatal, and nutritional diseases - Sex: Both - Age: All Ages (Number): num [1:6468] 7753778 7747820 8202150 10048535 11195441 ...
##  $ DALYs (Disability-Adjusted Life Years) - Injuries - Sex: Both - Age: All Ages (Number)                                                  : num [1:6468] 1157180 1380655 1531035 1692441 2155120 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Entity = col_character(),
##   ..   Code = col_character(),
##   ..   Year = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Non-communicable diseases - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Communicable, maternal, neonatal, and nutritional diseases - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Injuries - Sex: Both - Age: All Ages (Number)` = col_double()
##   .. )

The dalys_all dataset gives the total number of DALYs in three broad cause categories (non-communicable diseases; injuries; and communicable, maternal, neonatal, and nutritional diseases) for each country by year. This gives a good overview of the overall disease burden in each country.

#Examine the structures to get a better understanding of the data
str(burden_vs_expenditure)
## tibble [57,469 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Entity                                                                                        : chr [1:57469] "Abkhazia" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ Code                                                                                          : chr [1:57469] "OWID_ABK" "AFG" "AFG" "AFG" ...
##  $ Year                                                                                          : num [1:57469] 2015 1990 1991 1992 1993 ...
##  $ DALYs (Disability-Adjusted Life Years) - All causes - Sex: Both - Age: Age-standardized (Rate): num [1:57469] NA 104103 101241 90918 88234 ...
##  $ Health expenditure per capita (current US$)                                                   : num [1:57469] NA NA NA NA NA NA NA NA NA NA ...
##  $ Population (historical estimates)                                                             : num [1:57469] NA 12412311 13299016 14485543 15816601 ...
##  $ Continent                                                                                     : chr [1:57469] "Asia" NA NA NA ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Entity = col_character(),
##   ..   Code = col_character(),
##   ..   Year = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - All causes - Sex: Both - Age: Age-standardized (Rate)` = col_double(),
##   ..   `Health expenditure per capita (current US$)` = col_double(),
##   ..   `Population (historical estimates)` = col_double(),
##   ..   Continent = col_character()
##   .. )

The burden_vs_expenditure dataset contains the age-standardized DALY rate from all causes (per 100,000 people), health expenditure per capita, population, and continent, by country and year.
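Both the DALY rate and the expenditure columns contain missing values (several appear in the str() output above), so any comparison between the two should use only country-years where both are known. The sketch below shows one way to do this with base R's complete.cases(). It uses a small made-up data frame with shortened, hypothetical column names (DALY_rate, Expenditure) in place of the long original names.

```r
# Hypothetical stand-in for burden_vs_expenditure with shortened column names;
# the values are made up, and NA marks a year with no estimate.
toy <- data.frame(
  Entity      = rep("Canada", 4),
  Year        = c(2016, 2017, 2018, 2019),
  DALY_rate   = c(20500, 20400, NA, NA),
  Expenditure = c(4300, 4500, 4600, NA)
)

# Keep only country-years where both the DALY rate and spending are known,
# so that any comparison between the two uses the same set of years.
both_known <- toy[complete.cases(toy[, c("DALY_rate", "Expenditure")]), ]
both_known$Year  # 2016 2017
```

Filtering on both columns at once, rather than on each separately, keeps the two variables aligned to the same years.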

#Examine the structures to get a better understanding of the data
str(burden_by_cause)
## tibble [6,468 × 26] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Entity                                                                                                                                         : chr [1:6468] "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ Code                                                                                                                                           : chr [1:6468] "AFG" "AFG" "AFG" "AFG" ...
##  $ Year                                                                                                                                           : num [1:6468] 1990 1991 1992 1993 1994 ...
##  $ DALYs (Disability-Adjusted Life Years) - HIV/AIDS and tuberculosis - Sex: Both - Age: All Ages (Number)                                        : num [1:6468] 382490 403593 433692 471877 513017 ...
##  $ DALYs (Disability-Adjusted Life Years) - Diarrhea, lower respiratory, and other common infectious diseases - Sex: Both - Age: All Ages (Number): num [1:6468] 4809769 4851414 4965971 5252930 5469160 ...
##  $ DALYs (Disability-Adjusted Life Years) - Neglected tropical diseases and malaria - Sex: Both - Age: All Ages (Number)                          : num [1:6468] 792564 761387 730885 747450 716255 ...
##  $ DALYs (Disability-Adjusted Life Years) - Maternal disorders - Sex: Both - Age: All Ages (Number)                                               : num [1:6468] 128907 132931 160901 195583 209111 ...
##  $ DALYs (Disability-Adjusted Life Years) - Neonatal disorders - Sex: Both - Age: All Ages (Number)                                               : num [1:6468] 1609085 1633639 1781428 2428191 2655559 ...
##  $ DALYs (Disability-Adjusted Life Years) - Nutritional deficiencies - Sex: Both - Age: All Ages (Number)                                         : num [1:6468] 274045 273272 302851 356457 401777 ...
##  $ DALYs (Disability-Adjusted Life Years) - Other communicable, maternal, neonatal, and nutritional diseases - Sex: Both - Age: All Ages (Number) : num [1:6468] 133627 137317 144862 154332 165391 ...
##  $ DALYs (Disability-Adjusted Life Years) - Neoplasms - Sex: Both - Age: All Ages (Number)                                                        : num [1:6468] 309042 311985 330363 355911 372074 ...
##  $ DALYs (Disability-Adjusted Life Years) - Cardiovascular diseases - Sex: Both - Age: All Ages (Number)                                          : num [1:6468] 1174287 1185606 1235672 1299673 1342037 ...
##  $ DALYs (Disability-Adjusted Life Years) - Chronic respiratory diseases - Sex: Both - Age: All Ages (Number)                                     : num [1:6468] 210322 212743 226521 243687 253746 ...
##  $ DALYs (Disability-Adjusted Life Years) - Cirrhosis and other chronic liver diseases - Sex: Both - Age: All Ages (Number)                       : num [1:6468] 47665 48075 50693 54572 57421 ...
##  $ DALYs (Disability-Adjusted Life Years) - Digestive diseases - Sex: Both - Age: All Ages (Number)                                               : num [1:6468] 168461 170895 185328 207639 221799 ...
##  $ DALYs (Disability-Adjusted Life Years) - Neurological disorders - Sex: Both - Age: All Ages (Number)                                           : num [1:6468] 162476 165872 186267 211757 224701 ...
##  $ DALYs (Disability-Adjusted Life Years) - Mental and substance use disorders - Sex: Both - Age: All Ages (Number)                               : num [1:6468] 165265 172773 205121 239490 250027 ...
##  $ DALYs (Disability-Adjusted Life Years) - Diabetes, urogenital, blood, and endocrine diseases - Sex: Both - Age: All Ages (Number)              : num [1:6468] 329602 344649 367629 396913 429143 ...
##  $ DALYs (Disability-Adjusted Life Years) - Musculoskeletal disorders - Sex: Both - Age: All Ages (Number)                                        : num [1:6468] 136720 139316 153442 168411 172884 ...
##  $ DALYs (Disability-Adjusted Life Years) - Other non-communicable diseases - Sex: Both - Age: All Ages (Number)                                  : num [1:6468] 824540 812043 858316 1075174 1221071 ...
##  $ DALYs (Disability-Adjusted Life Years) - Transport injuries - Sex: Both - Age: All Ages (Number)                                               : num [1:6468] 279677 286719 334799 410369 454481 ...
##  $ DALYs (Disability-Adjusted Life Years) - Exposure to forces of nature - Sex: Both - Age: All Ages (Number)                                     : num [1:6468] 1194 83758 41668 11521 15145 ...
##  $ DALYs (Disability-Adjusted Life Years) - Conflict and terrorism - Sex: Both - Age: All Ages (Number)                                           : num [1:6468] 322882 443580 541085 561631 885499 ...
##  $ DALYs (Disability-Adjusted Life Years) - Self-harm - Sex: Both - Age: All Ages (Number)                                                        : num [1:6468] 31451 32682 38347 45404 48301 ...
##  $ DALYs (Disability-Adjusted Life Years) - Interpersonal violence - Sex: Both - Age: All Ages (Number)                                           : num [1:6468] 78443 93278 104732 120217 140623 ...
##  $ DALYs (Disability-Adjusted Life Years) - Unintentional injuries - Sex: Both - Age: All Ages (Number)                                           : num [1:6468] 444333 523972 511557 554181 625504 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Entity = col_character(),
##   ..   Code = col_character(),
##   ..   Year = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - HIV/AIDS and tuberculosis - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Diarrhea, lower respiratory, and other common infectious diseases - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Neglected tropical diseases and malaria - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Maternal disorders - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Neonatal disorders - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Nutritional deficiencies - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Other communicable, maternal, neonatal, and nutritional diseases - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Neoplasms - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Cardiovascular diseases - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Chronic respiratory diseases - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Cirrhosis and other chronic liver diseases - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Digestive diseases - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Neurological disorders - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Mental and substance use disorders - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Diabetes, urogenital, blood, and endocrine diseases - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Musculoskeletal disorders - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Other non-communicable diseases - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Transport injuries - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Exposure to forces of nature - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Conflict and terrorism - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Self-harm - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Interpersonal violence - Sex: Both - Age: All Ages (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Unintentional injuries - Sex: Both - Age: All Ages (Number)` = col_double()
##   .. )

The burden_by_cause dataset gives the absolute number of DALYs for 23 specific causes (diseases and injuries) by country and year.

#Examine the structures to get a better understanding of the data
str(burden_communicable_diseases_vs_gdp)
## tibble [57,942 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Entity                                                                                                                                   : chr [1:57942] "Abkhazia" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ Code                                                                                                                                     : chr [1:57942] "OWID_ABK" "AFG" "AFG" "AFG" ...
##  $ Year                                                                                                                                     : num [1:57942] 2015 1990 1991 1992 1993 ...
##  $ DALYs (Disability-Adjusted Life Years) - Communicable, maternal, neonatal, and nutritional diseases - Sex: Both - Age: All Ages (Percent): num [1:57942] NA 62 60.8 60.3 62.4 ...
##  $ GDP per capita, PPP (constant 2017 international $)                                                                                      : num [1:57942] NA NA NA NA NA NA NA NA NA NA ...
##  $ Population (historical estimates)                                                                                                        : num [1:57942] NA 12412311 13299016 14485543 15816601 ...
##  $ Continent                                                                                                                                : chr [1:57942] "Asia" NA NA NA ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Entity = col_character(),
##   ..   Code = col_character(),
##   ..   Year = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Communicable, maternal, neonatal, and nutritional diseases - Sex: Both - Age: All Ages (Percent)` = col_double(),
##   ..   `GDP per capita, PPP (constant 2017 international $)` = col_double(),
##   ..   `Population (historical estimates)` = col_double(),
##   ..   Continent = col_character()
##   .. )

The burden_communicable_diseases_vs_gdp dataset gives the share (in percent) of all DALYs due to communicable, maternal, neonatal, and nutritional diseases, along with GDP per capita and population, for each country and year.

#Examine the structures to get a better understanding of the data
str(burden_from_ncds_vs_gdp)
## tibble [57,942 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Entity                                                                                                  : chr [1:57942] "Abkhazia" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ Code                                                                                                    : chr [1:57942] "OWID_ABK" "AFG" "AFG" "AFG" ...
##  $ Year                                                                                                    : num [1:57942] 2015 1990 1991 1992 1993 ...
##  $ DALYs (Disability-Adjusted Life Years) - Non-communicable diseases - Sex: Both - Age: All Ages (Percent): num [1:57942] NA 28.8 28.4 28.5 27 ...
##  $ GDP per capita, PPP (constant 2017 international $)                                                     : num [1:57942] NA NA NA NA NA NA NA NA NA NA ...
##  $ Population (historical estimates)                                                                       : num [1:57942] NA 12412311 13299016 14485543 15816601 ...
##  $ Continent                                                                                               : chr [1:57942] "Asia" NA NA NA ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Entity = col_character(),
##   ..   Code = col_character(),
##   ..   Year = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - Non-communicable diseases - Sex: Both - Age: All Ages (Percent)` = col_double(),
##   ..   `GDP per capita, PPP (constant 2017 international $)` = col_double(),
##   ..   `Population (historical estimates)` = col_double(),
##   ..   Continent = col_character()
##   .. )

The burden_from_ncds_vs_gdp dataset gives the share (in percent) of all DALYs due to non-communicable diseases, along with GDP per capita and population, for each country and year.
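Because the three broad GBD cause groups together cover all causes, the percent columns in these two datasets should be consistent with the absolute counts in dalys_all. As a quick check, the sketch below recomputes the shares for Afghanistan in 1990 from the counts printed by str(dalys_all); the results agree with the first values printed for the two percent columns (about 62% for communicable, maternal, neonatal, and nutritional diseases and 28.8% for non-communicable diseases).

```r
# Afghanistan, 1990: absolute DALY counts for the three broad cause groups,
# copied from the first row of the str(dalys_all) output.
counts <- c(
  NCDs     = 3597007,
  CMNN     = 7753778,  # communicable, maternal, neonatal, and nutritional
  Injuries = 1157180
)

# Each group's share of the all-cause total, in percent.
shares <- 100 * counts / sum(counts)
round(shares, 1)
# NCDs 28.8, CMNN 62.0, Injuries 9.3
```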

#Examine the structures to get a better understanding of the data
str(burden_age)
## tibble [6,468 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Entity                                                                                     : chr [1:6468] "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ Code                                                                                       : chr [1:6468] "AFG" "AFG" "AFG" "AFG" ...
##  $ Year                                                                                       : num [1:6468] 1990 1991 1992 1993 1994 ...
##  $ DALYs (Disability-Adjusted Life Years) - All causes - Sex: Both - Age: Under 5 (Number)    : num [1:6468] 7840001 7845275 8251367 10253173 11723996 ...
##  $ DALYs (Disability-Adjusted Life Years) - All causes - Sex: Both - Age: 5-14 years (Number) : num [1:6468] 607886 642064 679501 717363 773289 ...
##  $ DALYs (Disability-Adjusted Life Years) - All causes - Sex: Both - Age: 15-49 years (Number): num [1:6468] 1921965 2097622 2481363 2885627 3208874 ...
##  $ DALYs (Disability-Adjusted Life Years) - All causes - Sex: Both - Age: 50-69 years (Number): num [1:6468] 1534624 1555981 1580855 1613093 1647469 ...
##  $ DALYs (Disability-Adjusted Life Years) - All causes - Sex: Both - Age: 70+ years (Number)  : num [1:6468] 603490 610541 615000 622078 633035 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Entity = col_character(),
##   ..   Code = col_character(),
##   ..   Year = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - All causes - Sex: Both - Age: Under 5 (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - All causes - Sex: Both - Age: 5-14 years (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - All causes - Sex: Both - Age: 15-49 years (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - All causes - Sex: Both - Age: 50-69 years (Number)` = col_double(),
##   ..   `DALYs (Disability-Adjusted Life Years) - All causes - Sex: Both - Age: 70+ years (Number)` = col_double()
##   .. )

The burden_age dataset gives the absolute number of all-cause DALYs in five age groups (under 5, 5-14, 15-49, 50-69, and 70+) for each country and year.
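A relative breakdown by age (the share of each country-year's DALYs that falls in each age group) can be computed by dividing each row by its own total. The sketch below does this with base R's prop.table() on the first two rows printed by str(burden_age) (Afghanistan, 1990 and 1991). It also shows that the 1990 age-group total, 12,507,966, is within one DALY of the 1990 cause-group total in dalys_all (12,507,965), which suggests the two datasets agree up to rounding.

```r
# Afghanistan, 1990 and 1991: all-cause DALYs by age group, copied from the
# first values in the str(burden_age) output.
age_counts <- rbind(
  `1990` = c(7840001, 607886, 1921965, 1534624, 603490),
  `1991` = c(7845275, 642064, 2097622, 1555981, 610541)
)
colnames(age_counts) <- c("Under 5", "5-14", "15-49", "50-69", "70+")

# Relative breakdown: divide each row (country-year) by its own total.
age_shares <- prop.table(age_counts, margin = 1)
round(100 * age_shares, 1)

# Cross-check against the cause-group total for the same country-year.
sum(age_counts["1990", ])  # 12507966
```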

# Clean the datasets so I have just what I need, reducing each one to Canada,
# the United Kingdom, and the United States
countries <- c("Canada", "United Kingdom", "United States")

dalys_all <- dalys_all %>% filter(Entity %in% countries)
burden_vs_expenditure <- burden_vs_expenditure %>% filter(Entity %in% countries)
burden_by_cause <- burden_by_cause %>% filter(Entity %in% countries)
burden_communicable_diseases_vs_gdp <- burden_communicable_diseases_vs_gdp %>% filter(Entity %in% countries)
burden_from_ncds_vs_gdp <- burden_from_ncds_vs_gdp %>% filter(Entity %in% countries)
burden_age <- burden_age %>% filter(Entity %in% countries)
# Check summary statistics to see if there are anomalies

summary(dalys_all)
##     Entity              Code                Year     
##  Length:84          Length:84          Min.   :1990  
##  Class :character   Class :character   1st Qu.:1997  
##  Mode  :character   Mode  :character   Median :2004  
##                                        Mean   :2004  
##                                        3rd Qu.:2010  
##                                        Max.   :2017  
##  DALYs (Disability-Adjusted Life Years) - Non-communicable diseases - Sex: Both - Age: All Ages (Number)
##  Min.   : 5916964                                                                                       
##  1st Qu.: 7235163                                                                                       
##  Median :15502210                                                                                       
##  Mean   :31298192                                                                                       
##  3rd Qu.:65453363                                                                                       
##  Max.   :85221367                                                                                       
##  DALYs (Disability-Adjusted Life Years) - Communicable, maternal, neonatal, and nutritional diseases - Sex: Both - Age: All Ages (Number)
##  Min.   : 363401                                                                                                                         
##  1st Qu.: 415773                                                                                                                         
##  Median :1087098                                                                                                                         
##  Mean   :2383952                                                                                                                         
##  3rd Qu.:5252688                                                                                                                         
##  Max.   :6896255                                                                                                                         
##  DALYs (Disability-Adjusted Life Years) - Injuries - Sex: Both - Age: All Ages (Number)
##  Min.   :  785922                                                                      
##  1st Qu.:  839346                                                                      
##  Median : 1252894                                                                      
##  Mean   : 3915686                                                                      
##  3rd Qu.: 9261626                                                                      
##  Max.   :10449884

The dalys_all dataset, now reduced to Canada, the United Kingdom, and the United States, contains 84 rows (28 years for each of the three countries). The data range from 1990 to 2017 and include DALY counts for three major categories: non-communicable diseases; communicable, maternal, neonatal, and nutritional diseases; and injuries. Over this 28-year period, the mean number of DALYs for non-communicable diseases is 31,298,192, with a minimum of 5,916,964 and a maximum of 85,221,367. The mean number of DALYs for communicable, maternal, neonatal, and nutritional diseases is 2,383,952, with a minimum of 363,401 and a maximum of 6,896,255. The mean number of DALYs for injuries is 3,915,686, with a minimum of 785,922 and a maximum of 10,449,884.
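The 84 rows should break down as exactly one row per country per year. A minimal check is sketched below with a hypothetical helper (check_countries) and a small made-up data frame standing in for the filtered datasets: it confirms that only the three expected countries remain and that no country-year appears twice. This matters because a misspelled country name in filter() drops that country silently rather than raising an error.

```r
# Hypothetical helper: confirm that a filtered data frame contains exactly the
# expected countries and no duplicated country-year rows.
check_countries <- function(df, expected) {
  found <- sort(unique(as.character(df$Entity)))
  countries_ok <- identical(found, sort(expected))
  no_duplicates <- !any(duplicated(df[, c("Entity", "Year")]))
  countries_ok && no_duplicates
}

countries <- c("Canada", "United Kingdom", "United States")

# Small made-up stand-in for a filtered dataset: three countries, two years each.
toy <- data.frame(
  Entity = rep(countries, each = 2),
  Year   = rep(c(1990, 1991), times = 3)
)
check_countries(toy, countries)  # TRUE

# If one country were lost (e.g. through a typo in filter()), the check fails.
check_countries(toy[toy$Entity != "Canada", ], countries)  # FALSE
```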

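These numbers are large and hard to interpret because they are absolute counts pooled across three countries of very different sizes, not rates per 100,000 people; the means sit well above the medians largely because the United States has a much larger population than Canada or the United Kingdom. Even so, non-communicable diseases clearly account for the largest number of DALYs, with a mean of 31,298,192 per country-year. This is not surprising in high-GDP countries, where communicable, maternal, neonatal, and nutritional diseases should be better controlled.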
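Because the pooled means mix the three countries, per-country summaries are easier to interpret. The sketch below uses base R's aggregate() on a small made-up stand-in for dalys_all; only the DALYs_NCDs column name comes from the renaming step in this project, and all values are hypothetical. The same formula could be applied to the real dataset once its columns are renamed.

```r
# Made-up stand-in for dalys_all after renaming: NCD DALY counts for two years
# per country (values are hypothetical, for illustration only).
toy <- data.frame(
  Entity     = rep(c("Canada", "United Kingdom", "United States"), each = 2),
  Year       = rep(c(1990, 1991), times = 3),
  DALYs_NCDs = c(6e6, 6.1e6, 15e6, 15.2e6, 65e6, 66e6)
)

# The pooled mean is dominated by the largest country...
mean(toy$DALYs_NCDs)

# ...so a per-country mean is more informative.
per_country <- aggregate(DALYs_NCDs ~ Entity, data = toy, FUN = mean)
per_country
```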

# Rename the variables so they are easier to work with

dalys_all <- dalys_all %>% 
rename("DALYs_NCDs" = "DALYs (Disability-Adjusted Life Years) - Non-communicable diseases - Sex: Both - Age: All Ages (Number)") %>%
rename("DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases" = "DALYs (Disability-Adjusted Life Years) - Communicable, maternal, neonatal, and nutritional diseases - Sex: Both - Age: All Ages (Number)") %>% 
rename("DALYs_Injuries" = "DALYs (Disability-Adjusted Life Years) - Injuries - Sex: Both - Age: All Ages (Number)")
# Add a column with total DALYs for all three groups of causes
dalys_all <- dalys_all %>%
  mutate(Total_DALYs = DALYs_NCDs +
           DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases +
           DALYs_Injuries)
# This dataset includes strange years: I suspect that they did some extrapolation
# to come up with "historical estimates" of the population that required them to go
# backwards and insert dates. We will delete everything prior to 1990.

burden_vs_expenditure2 <- burden_vs_expenditure %>% 
filter(Year >= 1990)
summary(burden_vs_expenditure2)
##     Entity              Code                Year     
##  Length:96          Length:96          Min.   :1990  
##  Class :character   Class :character   1st Qu.:1998  
##  Mode  :character   Mode  :character   Median :2006  
##                                        Mean   :2006  
##                                        3rd Qu.:2013  
##                                        Max.   :2021  
##                                                      
##  DALYs (Disability-Adjusted Life Years) - All causes - Sex: Both - Age: Age-standardized (Rate)
##  Min.   :19214                                                                                 
##  1st Qu.:20657                                                                                 
##  Median :23058                                                                                 
##  Mean   :22853                                                                                 
##  3rd Qu.:24566                                                                                 
##  Max.   :27651                                                                                 
##  NA's   :12                                                                                    
##  Health expenditure per capita (current US$) Population (historical estimates)
##  Min.   :1364                                Min.   : 27541323                
##  1st Qu.:2237                                1st Qu.: 35572387                
##  Median :3899                                Median : 60554651                
##  Mean   :4254                                Mean   :129809436                
##  3rd Qu.:5587                                3rd Qu.:272579053                
##  Max.   :9403                                Max.   :332915074                
##  NA's   :36                                                                   
##   Continent        
##  Length:96         
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 

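A summary of the filtered burden_vs_expenditure data (burden_vs_expenditure2) shows 96 entries, 32 years from 1990 to 2021 for each of the three countries, although not every column has data for every year: 12 of the DALY values and 36 of the health expenditure values are missing. The mean age-standardized DALYs rate (number of DALYs per 100,000 people) for Canada, the United Kingdom, and the United States over this period is 22,853; the minimum is 19,214 and the maximum is 27,651. The mean health expenditure per capita (in current US dollars) is 4,254 USD per year, with a minimum of 1,364 USD and a maximum of 9,403 USD.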
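Because several columns in these summaries contain missing values, the 1990 to 2021 range describes the rows, not necessarily the years that actually have data. One way to check coverage is a small base R helper like the sketch below. year_coverage() is a hypothetical function written for this illustration, and it assumes the Entity and Year column names used in these datasets.

```r
# Hypothetical helper (not part of the original analysis): for each country,
# report the first and last year with a non-missing value in `value_col`
# and the number of years that have data.
year_coverage <- function(data, value_col) {
  kept <- data[!is.na(data[[value_col]]), c("Entity", "Year")]
  years_by_country <- split(kept$Year, kept$Entity)
  data.frame(
    Entity = names(years_by_country),
    first_year = unname(sapply(years_by_country, min)),
    last_year = unname(sapply(years_by_country, max)),
    n_years = unname(lengths(years_by_country)),
    stringsAsFactors = FALSE
  )
}

# Example call on the filtered data (column name as it is before renaming):
# year_coverage(burden_vs_expenditure2, "Health expenditure per capita (current US$)")
```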

# Rename the variable for ease of use
burden_vs_expenditure2 <- burden_vs_expenditure2 %>% 
rename("Health_Spending" = "Health expenditure per capita (current US$)")
# Rename the variables to make them easier to use
burden_by_cause <- burden_by_cause %>% 
rename("DALYs_HIV/AIDS_and_tuberculosis" = "DALYs (Disability-Adjusted Life Years) - HIV/AIDS and tuberculosis - Sex: Both - Age: All Ages (Number)") %>% rename("DALYs_Common_Infectious_Diseases" = "DALYs (Disability-Adjusted Life Years) - Diarrhea, lower respiratory, and other common infectious diseases - Sex: Both - Age: All Ages (Number)") %>% 
rename("DALYs_tropical_diseases" = "DALYs (Disability-Adjusted Life Years) - Neglected tropical diseases and malaria - Sex: Both - Age: All Ages (Number)") %>% rename("DALYs_Maternal_Disorders" = "DALYs (Disability-Adjusted Life Years) - Maternal disorders - Sex: Both - Age: All Ages (Number)") %>% rename("DALYs_Neonatal_Disorders" = "DALYs (Disability-Adjusted Life Years) - Neonatal disorders - Sex: Both - Age: All Ages (Number)") %>% rename("DALYs_Nutritional_Deficiencies" = "DALYs (Disability-Adjusted Life Years) - Nutritional deficiencies - Sex: Both - Age: All Ages (Number)") %>% rename("DALYs_Other_Diseases" = "DALYs (Disability-Adjusted Life Years) - Other communicable, maternal, neonatal, and nutritional diseases - Sex: Both - Age: All Ages (Number)") %>% rename("DALYs_Neoplasms" = "DALYs (Disability-Adjusted Life Years) - Neoplasms - Sex: Both - Age: All Ages (Number)") %>% rename("DALYs_Cardiovascular_Diseases" = "DALYs (Disability-Adjusted Life Years) - Cardiovascular diseases - Sex: Both - Age: All Ages (Number)") %>% rename("DALYs_Chronic_Respiratory_Diseases" = "DALYs (Disability-Adjusted Life Years) - Chronic respiratory diseases - Sex: Both - Age: All Ages (Number)") %>% rename("DALYs_Liver_Diseases" = "DALYs (Disability-Adjusted Life Years) - Cirrhosis and other chronic liver diseases - Sex: Both - Age: All Ages (Number)") %>% rename("DALYs_Digestive_Diseases" = "DALYs (Disability-Adjusted Life Years) - Digestive diseases - Sex: Both - Age: All Ages (Number)") %>% rename("DALYs_Neurological_Disorders" = "DALYs (Disability-Adjusted Life Years) - Neurological disorders - Sex: Both - Age: All Ages (Number)") %>% rename("DALYs_Mental_and_Substance_Disorders" = "DALYs (Disability-Adjusted Life Years) - Mental and substance use disorders - Sex: Both - Age: All Ages (Number)") %>% rename("DALYs_Diabetes_Etc" = "DALYs (Disability-Adjusted Life Years) - Diabetes, urogenital, blood, and endocrine diseases - Sex: Both - Age: All Ages (Number)") %>% 
rename("DALYs_Musculoskeletal_Disorders" = "DALYs (Disability-Adjusted Life Years) - Musculoskeletal disorders - Sex: Both - Age: All Ages (Number)") %>% rename("DALYs_Other_NCDs" = "DALYs (Disability-Adjusted Life Years) - Other non-communicable diseases - Sex: Both - Age: All Ages (Number)") %>% rename("DALYs_Transport_Injuries" = "DALYs (Disability-Adjusted Life Years) - Transport injuries - Sex: Both - Age: All Ages (Number)") %>% rename("DALYs_Nature" = "DALYs (Disability-Adjusted Life Years) - Exposure to forces of nature - Sex: Both - Age: All Ages (Number)") %>% rename("DALYs_Conflict_and_Terrorism" = "DALYs (Disability-Adjusted Life Years) - Conflict and terrorism - Sex: Both - Age: All Ages (Number)") %>% rename("DALYs_Self-harm" = "DALYs (Disability-Adjusted Life Years) - Self-harm - Sex: Both - Age: All Ages (Number)") %>% rename("DALYs_Interpersonal_Violence" = "DALYs (Disability-Adjusted Life Years) - Interpersonal violence - Sex: Both - Age: All Ages (Number)") %>% rename("DALYs_Accidents" = "DALYs (Disability-Adjusted Life Years) - Unintentional injuries - Sex: Both - Age: All Ages (Number)")
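The renaming above is done by hand, one column at a time. Since every original name here follows the same pattern (a shared prefix, a cause name, and a shared suffix), the pattern could instead be stripped with regular expressions. The sketch below is a hypothetical alternative in base R, not what this analysis uses; it produces names such as DALYs_HIV_AIDS_and_tuberculosis, which differ from the hand-picked names above, so later code would have to use the new names.

```r
# Hypothetical helper: shorten OWID-style DALY column names in one pass.
# Names that do not match the prefix/suffix pattern are returned unchanged.
shorten_daly_names <- function(nms) {
  prefix <- "^DALYs \\(Disability-Adjusted Life Years\\) - "
  suffix <- " - Sex: Both - Age: All Ages \\(Number\\)$"
  matched <- grepl(prefix, nms) & grepl(suffix, nms)
  if (any(matched)) {
    # Keep only the cause name between the shared prefix and suffix
    core <- sub(suffix, "", sub(prefix, "", nms[matched]))
    # Replace runs of spaces, commas, slashes, etc. with a single underscore
    core <- gsub("[^A-Za-z0-9]+", "_", core)
    core <- gsub("^_+|_+$", "", core)
    nms[matched] <- paste0("DALYs_", core)
  }
  nms
}
```

With dplyr 1.0 or later, the helper could be applied to every column at once with rename_with(burden_by_cause, shorten_daly_names); columns that do not match the pattern, such as Entity and Year, are left as they are.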
# Rename variables to make them easier to use
burden_communicable_diseases_vs_gdp <- rename(burden_communicable_diseases_vs_gdp, "DALYs_Communicable_maternal_neonatal_and_nutritional_diseases_(percent)" = "DALYs (Disability-Adjusted Life Years) - Communicable, maternal, neonatal, and nutritional diseases - Sex: Both - Age: All Ages (Percent)")
# This dataset has the same date problem as above, so filter to get only 1990 on
burden_communicable_gdp <- burden_communicable_diseases_vs_gdp %>% filter(Year >= 1990)
summary(burden_communicable_gdp)
##     Entity              Code                Year     
##  Length:96          Length:96          Min.   :1990  
##  Class :character   Class :character   1st Qu.:1998  
##  Mode  :character   Mode  :character   Median :2006  
##                                        Mean   :2006  
##                                        3rd Qu.:2013  
##                                        Max.   :2021  
##                                                      
##  DALYs_Communicable_maternal_neonatal_and_nutritional_diseases_(percent)
##  Min.   :4.562                                                          
##  1st Qu.:5.206                                                          
##  Median :5.926                                                          
##  Mean   :5.919                                                          
##  3rd Qu.:6.397                                                          
##  Max.   :8.462                                                          
##  NA's   :12                                                             
##  GDP per capita, PPP (constant 2017 international $)
##  Min.   :30036                                      
##  1st Qu.:40415                                      
##  Median :44146                                      
##  Mean   :44949                                      
##  3rd Qu.:48954                                      
##  Max.   :62631                                      
##  NA's   :10                                         
##  Population (historical estimates)  Continent        
##  Min.   : 27541323                 Length:96         
##  1st Qu.: 35572387                 Class :character  
##  Median : 60554651                 Mode  :character  
##  Mean   :129809436                                   
##  3rd Qu.:272579053                                   
##  Max.   :332915074                                   
## 

A summary of the burden_communicable_gdp dataset shows 96 entries covering the years from 1990 to 2021 (with 12 DALY values and 10 GDP values missing). The mean share of DALYs due to communicable, maternal, neonatal, and nutritional diseases was 5.919 percent, with a minimum of 4.562 percent and a maximum of 8.462 percent. The mean GDP per capita in these three countries during these years was 44,949 international dollars (PPP, constant 2017 prices); the minimum was 30,036 and the maximum was 62,631.

# Rename variables to make them easier to use
burden_from_ncds_vs_gdp <- rename(burden_from_ncds_vs_gdp, "DALYs_NCDs_(Percent)" = "DALYs (Disability-Adjusted Life Years) - Non-communicable diseases - Sex: Both - Age: All Ages (Percent)")
# Filter out pre-1990 dates and summarize data
burden_ncds_gdp <- burden_from_ncds_vs_gdp %>% filter(Year >= 1990)
summary(burden_ncds_gdp)
##     Entity              Code                Year      DALYs_NCDs_(Percent)
##  Length:96          Length:96          Min.   :1990   Min.   :78.76       
##  Class :character   Class :character   1st Qu.:1998   1st Qu.:83.18       
##  Mode  :character   Mode  :character   Median :2006   Median :85.28       
##                                        Mean   :2006   Mean   :84.59       
##                                        3rd Qu.:2013   3rd Qu.:86.50       
##                                        Max.   :2021   Max.   :87.11       
##                                                       NA's   :12          
##  GDP per capita, PPP (constant 2017 international $)
##  Min.   :30036                                      
##  1st Qu.:40415                                      
##  Median :44146                                      
##  Mean   :44949                                      
##  3rd Qu.:48954                                      
##  Max.   :62631                                      
##  NA's   :10                                         
##  Population (historical estimates)  Continent        
##  Min.   : 27541323                 Length:96         
##  1st Qu.: 35572387                 Class :character  
##  Median : 60554651                 Mode  :character  
##  Mean   :129809436                                   
##  3rd Qu.:272579053                                   
##  Max.   :332915074                                   
## 

The burden_ncds_gdp dataset is similar to the burden_communicable_gdp dataset, except that it looks at DALYs from non-communicable diseases (NCDs). Here the dominant role of NCDs in health in Canada, the United Kingdom, and the United States is clear: the mean share of DALYs from NCDs over 1990 to 2021 (again with 12 values missing) was 84.59 percent, with a minimum of 78.76 percent and a maximum of 87.11 percent. The GDP data are the same as in the communicable disease dataset.

# Rename variables to make them easier to use and then summarize
burden_age <- rename(burden_age, "DALYs_<_5" = "DALYs (Disability-Adjusted Life Years) - All causes - Sex: Both - Age: Under 5 (Number)")
burden_age <- rename(burden_age, "DALYs_5-14" = "DALYs (Disability-Adjusted Life Years) - All causes - Sex: Both - Age: 5-14 years (Number)")
burden_age <- rename(burden_age, "DALYs_15-49" = "DALYs (Disability-Adjusted Life Years) - All causes - Sex: Both - Age: 15-49 years (Number)")
burden_age <- rename(burden_age, "DALYs_50-69" = "DALYs (Disability-Adjusted Life Years) - All causes - Sex: Both - Age: 50-69 years (Number)")
burden_age <- rename(burden_age, "DALYs_70+" = "DALYs (Disability-Adjusted Life Years) - All causes - Sex: Both - Age: 70+ years (Number)")

summary(burden_age)
##     Entity              Code                Year        DALYs_<_5      
##  Length:84          Length:84          Min.   :1990   Min.   : 223616  
##  Class :character   Class :character   1st Qu.:1997   1st Qu.: 259996  
##  Mode  :character   Mode  :character   Median :2004   Median : 482229  
##                                        Mean   :2004   Mean   :1409003  
##                                        3rd Qu.:2010   3rd Qu.:3038625  
##                                        Max.   :2017   Max.   :4506149  
##    DALYs_5-14       DALYs_15-49        DALYs_50-69         DALYs_70+       
##  Min.   : 193400   Min.   : 2559285   Min.   : 2137379   Min.   : 1872328  
##  1st Qu.: 221412   1st Qu.: 2659810   1st Qu.: 2893325   1st Qu.: 2565938  
##  Median : 427328   Median : 5074429   Median : 5419465   Median : 6535803  
##  Mean   :1009097   Mean   :12507146   Mean   :11730484   Mean   :10942098  
##  3rd Qu.:2339701   3rd Qu.:29297533   3rd Qu.:22702519   3rd Qu.:23087372  
##  Max.   :2490733   Max.   :30961450   Max.   :36232317   Max.   :28194157

The burden_age dataset shows DALYs from 1990 to 2017 for five age groups: under 5, 5-14, 15-49, 50-69, and 70 and over. These are absolute numbers of DALYs, not rates per 100,000 people. The mean DALYs were 1,409,003 for under 5; 1,009,097 for 5-14; 12,507,146 for 15-49; 11,730,484 for 50-69; and 10,942,098 for 70 and older. This is where the definition of a DALY (one lost year of good health due to premature death or to disease or disability) becomes hard to interpret: death and disease are lumped together, so the dataset does not tell us how many of the lost years come from death and how many from illness, or how much lost time is assigned to each affliction. (As a simplified example, the World Bank puts life expectancy in Canada at 82.05 years. If someone died at 70, that death would count as 12.05 DALYs, but a death at 10 would count as 72.05 DALYs.) This dataset also does not include population by age, so we do not know how many people these individual DALYs apply to. (If 100 70-year-olds died, that would be 1,205 DALYs, whereas a single death of a 10-year-old would be 72.05 DALYs. Clearly population numbers matter for overall DALYs, and the age distribution of a country could affect the distribution of DALYs.)
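The life-expectancy arithmetic in the paragraph above can be written out explicitly. This is a deliberately simplified sketch that follows the paragraph's own illustration: it treats years of life lost as one life expectancy minus age at death and ignores the disability component. The Global Burden of Disease methodology instead uses a standard reference life table and adds years lived with disability, so these numbers illustrate the idea rather than reproduce the dataset. simple_yll() is a hypothetical name.

```r
# Simplified years of life lost: life expectancy minus age at death, summed
# over deaths (deaths past the life expectancy contribute zero).
# Illustration only; not the GBD reference-life-table method.
simple_yll <- function(ages_at_death, life_expectancy) {
  sum(pmax(life_expectancy - ages_at_death, 0))
}

life_exp_canada <- 82.05  # World Bank figure quoted in the text

simple_yll(70, life_exp_canada)            # 12.05 (one death at 70)
simple_yll(10, life_exp_canada)            # 72.05 (one death at 10)
simple_yll(rep(70, 100), life_exp_canada)  # 1205 (one hundred deaths at 70)
```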

Population size is also relevant for assessing health in a country. The number of DALYs is an absolute count, so we need to know how many people live in a country to judge whether that count is high or low. Happily, there is a dataset with DALYs rates, which presents DALYs per 100,000 people rather than raw counts. The rate in this dataset is also age-standardized, as its column name indicates, so it adjusts for differences in the age structure of each country's population.
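Converting a count to a rate per 100,000 people is a single division. The sketch below uses made-up numbers (hypothetical, not from the datasets) to show why the same count means very different things for populations of different sizes. rate_per_100k() is a hypothetical helper, and it computes a crude rate, not an age-standardized one.

```r
# Crude rate per 100,000 people: count / population * 100,000.
rate_per_100k <- function(count, population) {
  count / population * 1e5
}

# Hypothetical example: the same 5 million DALYs in a country of 30 million
# people and in a country of 300 million people.
rate_per_100k(5e6, 30e6)   # about 16,667 per 100,000
rate_per_100k(5e6, 300e6)  # about 1,667 per 100,000
```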

# Import dalys-rate data

dalys_rate <- read_csv("dalys-rate-from-all-causes.csv")
# Clean as above: rename the variable and keep only the three countries

dalys_rate <- dalys_rate %>%
  rename("Total_DALYs_Rate" = "DALYs (Disability-Adjusted Life Years) - All causes - Sex: Both - Age: Age-standardized (Rate)") %>%
  filter(Entity %in% c("United Kingdom", "United States", "Canada"))
#Wrangle the data into one dataset

burden_of_disease <- merge(burden_age, burden_ncds_gdp, all=TRUE)
burden_of_disease <- merge(burden_of_disease, burden_communicable_gdp, all=TRUE)
burden_of_disease <- merge(burden_of_disease, burden_by_cause, all = TRUE )
burden_of_disease <- merge(burden_of_disease, burden_vs_expenditure2, all = TRUE)
burden_of_disease <- merge(burden_of_disease, dalys_all, all = TRUE )
burden_of_disease <- merge(burden_of_disease, dalys_rate, all = TRUE)
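merge() joins on every column name the two tables share unless by is given, and all = TRUE keeps rows that appear in only one table (a full outer join), filling the gaps with NA. The toy example below, with made-up values, shows that behaviour with an explicit key and ends with a check that the result still has one row per country-year; the same anyDuplicated() check could be run on burden_of_disease.

```r
# Two toy tables (made-up values) sharing the key columns Entity and Year.
rates <- data.frame(Entity = "Canada", Year = c(2016, 2017), rate = c(1, 2))
spend <- data.frame(Entity = "Canada", Year = c(2017, 2018), spend = c(10, 20))

# Full outer join on an explicit key: 2016 has no spending, 2018 has no rate.
joined <- merge(rates, spend, by = c("Entity", "Year"), all = TRUE)
print(joined)

# Guard against accidental row duplication: each country-year should appear once.
stopifnot(anyDuplicated(joined[c("Entity", "Year")]) == 0)
```

Passing by explicitly while other columns (such as Population) are also shared would keep both copies with .x and .y suffixes, so joining on all shared columns, as the analysis does, is reasonable when those columns agree. If they ever disagree, the implicit join would split a country-year into several rows, which is what the duplicate check catches.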
# Create country-specific data sets

canada <- burden_of_disease %>% filter(Entity == "Canada")
united_kingdom <- burden_of_disease %>% filter(Entity == "United Kingdom")
united_states <- burden_of_disease %>% filter(Entity == "United States")

Initial Visualizations

DALYs per Country

To get a sense of the overall performance of each country’s health system, we should examine the number and trend of DALYs in each country.

# Plot distribution of annual DALYs in Canada

ggplot(canada)+
  geom_histogram(aes(x=Total_DALYs_Rate), fill = "midnight blue")+
  labs(x = "\nTotal DALYs per 100,000 people", title = "Canada:", subtitle = "\nDALYs per 100,000 people (frequency count)")+
  theme_economist()

This histogram shows the distribution of Canada's annual DALYs per 100,000 people: how many years between 1990 and 2017 fall into each range of DALY rates. With the exception of two cases, lower DALY rates occur more frequently.

#Plot DALYs by year in Canada

ggplot(canada)+
  geom_point(aes(x = Year, y = Total_DALYs_Rate))+
  labs(y = "Total DALYs per 100,000 people\n", title = "Canada:  DALYs Rate by Year")+
  theme_economist()

This plot shows Canada's DALYs rate by year. The downward trend indicates a fairly steady improvement in health in Canada (fewer DALYs each year), except for a slight hiccup between 2010 and 2017.

# Plot distribution of annual DALYs in the United Kingdom

ggplot(united_kingdom)+
  geom_histogram(aes(x=Total_DALYs_Rate), fill = "midnight blue")+
  labs(x = "\nTotal DALYs per 100,000 people", title = "United Kingdom:", subtitle =  "\nDALYs per 100,000 people (frequency count)")+
  theme_economist()

This histogram shows the distribution of the United Kingdom's annual DALYs per 100,000 people from 1990 to 2017. The most frequent values are around 20,000 DALYs per 100,000 people.

#Plot DALYs by year in the United Kingdom

ggplot(united_kingdom)+
  geom_point(aes(x = Year, y = Total_DALYs_Rate))+
  labs(y = "Total DALYs per 100,000 people\n", title = "United Kingdom:  DALYs Rate by Year")+
  theme_economist()

Like the Canada plot of DALYs by year, the United Kingdom plot shows a steady decline in DALYs (an improvement in health), with a slight increase and a leveling out around 2015.

# Plot distribution of annual DALYs in the United States

ggplot(united_states)+
  geom_histogram(aes(x=Total_DALYs_Rate), fill = "midnight blue")+
  labs(x = "\nTotal DALYs per 100,000 people", title = "United States:", subtitle = "\nDALYs per 100,000 people (frequency count)")+
  theme_economist()

The United States' histogram sits further to the right: its annual DALY rates are higher than those of the United Kingdom and Canada.

#Plot DALYs by year in the United States

ggplot(united_states)+
  geom_point(aes(x = Year, y = Total_DALYs_Rate))+
  labs(y = "Total DALYs per 100,000 people\n", title = "United States:  DALYs Rate by Year")+
  theme_economist()

The United States plot of DALYs by year also shows a decrease over time, but with a marked increase after about 2012. The patterns look similar in all three countries, but the y-axis scales differ between the plots, so comparing them directly is misleading.

#Plot DALYs by year in all three countries

ggplot(burden_of_disease)+
  geom_point(aes(x= Year, y = Total_DALYs_Rate, color = Entity))+
  labs(y = "Total DALYs per 100,000 people\n", title = "Canada, United Kingdom, United States:", subtitle = "\n  DALYs Rate by Year")+
  theme_economist()

Plotted together on a common y-axis, the data suggest that the United States has a consistently higher DALYs rate than either the United Kingdom or Canada.

How do the mean DALY rates compare? First, let's summarize each country's rate.

# Get summary statistics 
summary(canada$Total_DALYs_Rate)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   19214   19487   20759   20991   22203   23699       4
# Get summary statistics 
summary(united_kingdom$Total_DALYs_Rate)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   19977   20584   22172   22285   23741   25623       4
# Get summary statistics 
summary(united_states$Total_DALYs_Rate)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   23651   24189   25161   25283   26137   27651       4

The mean DALYs rate is 20,991 for Canada, 22,285 for the United Kingdom, and 25,283 for the United States. That looks like a large difference to me, but is it statistically significant?

Each country contributes only 28 annual values, and the histograms above look right-skewed rather than approximately normal, so the normality assumption behind ANOVA is questionable. We will nevertheless compare the means of the total DALY rate in all three countries simultaneously using a one-way analysis of variance (ANOVA) test, and then check the normality of its residuals.

The null hypothesis is that the mean total DALY rates in the US, Canada, and the UK are all equal. The alternative hypothesis is that at least one mean differs.
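The next chunk builds a long-format data frame with rep(..., times = ...). Base R's stack() does the same reshaping from a named list of vectors; this is an alternative sketch, not the code used in the analysis, and the toy vectors are made up.

```r
# A named list of vectors, one per group (toy values).
groups <- list(canada = c(1, 2), uk = c(3, 4, 5), us = 6)

# stack() returns a data frame with a `values` column and a factor `ind`
# recording which list element each value came from.
long <- stack(groups)
print(long)

# On the real data this would be
#   stack(list(canada = canadaTDR, uk = ukTDR, us = usTDR))
# and the result fits directly into aov(values ~ ind, data = ...).
```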

# Create vectors with the data we want to find the mean of
canadaTDR <- canada$Total_DALYs_Rate
ukTDR <- united_kingdom$Total_DALYs_Rate
usTDR <- united_states$Total_DALYs_Rate
# Create a dataframe with total DALY rates by country
totalDALY_frame <- data.frame(totalDALYsRate = c(canadaTDR, ukTDR, usTDR), country = factor(rep(c("canadaTDR", "ukTDR", "usTDR"), times = c(length(canadaTDR), length(ukTDR), length(usTDR)))))
# Run the ANOVA test
fit5 <- aov(totalDALYsRate~country, data = totalDALY_frame)
summary(fit5)
##             Df    Sum Sq   Mean Sq F value   Pr(>F)    
## country      2 271392816 135696408   56.28 4.75e-16 ***
## Residuals   81 195285941   2410938                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 12 observations deleted due to missingness

The p-value is extremely small (4.75e-16, effectively zero), so there is very strong evidence against the null hypothesis that the mean total DALY rates of Canada, the UK, and the US are all equal.
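The F statistic in the ANOVA table can be reproduced from the printed sums of squares: each mean square is a sum of squares divided by its degrees of freedom, and F is the between-country mean square divided by the residual mean square. The p-value is the upper tail of an F distribution with 2 and 81 degrees of freedom. The numbers below are copied from the summary(fit5) output.

```r
# Values copied from summary(fit5)
ss_country <- 271392816; df_country <- 2
ss_resid   <- 195285941; df_resid   <- 81

ms_country <- ss_country / df_country  # 135696408
ms_resid   <- ss_resid / df_resid      # about 2410938
f_value    <- ms_country / ms_resid

round(f_value, 2)  # 56.28
# Upper-tail p-value; should agree with Pr(>F) in the table up to rounding
pf(f_value, df_country, df_resid, lower.tail = FALSE)
```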

Here is a graphical depiction of this result.

# Box plot of total DALYs rate by country
boxplot(totalDALYsRate~country, data = totalDALY_frame)

# Show the means for each country and the grand mean
print(model.tables(fit5, "means"), digits = 3)
## Tables of means
## Grand mean
##         
## 22853.2 
## 
##  country 
## country
## canadaTDR     ukTDR     usTDR 
##     20991     22285     25283
# Plot residuals to check for normality
qqnorm(residuals(fit5))
qqline(residuals(fit5))

The qqplot of residuals is not a perfectly straight line, but it is close enough to normal, with some departure at both tails. The table of means and the box plot show that all three means differ, and that the U.S. mean total DALY rate is noticeably higher than those of the UK and Canada.
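Because the residuals only look approximately normal, a rank-based cross-check can be useful. The Kruskal-Wallis test (kruskal.test() in base R's stats package) is the usual nonparametric counterpart of one-way ANOVA. It is not part of the original analysis, and the toy data below are made up so the result can be checked by hand; on the real data the call would be kruskal.test(totalDALYsRate ~ country, data = totalDALY_frame).

```r
# Three toy groups with no overlap, so the ranks are 1-5, 6-10 and 11-15.
toy <- data.frame(
  value = c(1:5, 11:15, 21:25),
  group = factor(rep(c("a", "b", "c"), each = 5))
)

kt <- kruskal.test(value ~ group, data = toy)
kt$statistic  # H = 12.5 for completely separated groups of five
kt$p.value    # about 0.0019, the chi-squared (2 df) upper tail at 12.5
```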

Now that the ANOVA test has shown that the mean total DALY rates are not all equal, we can see which countries differ from one another with pairwise Wilcoxon-Mann-Whitney (rank-sum) tests. These test for a shift in the location of two distributions rather than comparing means directly, so they do not rely on the normality assumption.

The null hypothesis is that there is no difference in the distribution of the Total DALYs rate between Canada and the United Kingdom. The alternative hypothesis is that there is a shift in distributions in the Total DALYs rate.

# Perform the test on Canada - UK
wilcox.test(canadaTDR, ukTDR)
## 
##  Wilcoxon rank sum exact test
## 
## data:  canadaTDR and ukTDR
## W = 228, p-value = 0.00669
## alternative hypothesis: true location shift is not equal to 0

The p-value for the test of the distributions of total DALYs rate for Canada and the UK is .007. With a significance level (alpha) of .10, a p-value of .007 is very strong evidence to reject the null hypothesis and conclude that there is a shift in the distribution of total DALYs rate between Canada and the UK.

The null hypothesis is that there is no difference in the distribution of the Total DALYs rate between the United Kingdom and the United States. The alternative hypothesis is that there is a shift in distributions in the Total DALYs rate.

# Perform the test on UK - US
wilcox.test(ukTDR, usTDR)
## 
##  Wilcoxon rank sum exact test
## 
## data:  ukTDR and usTDR
## W = 76, p-value = 1.616e-08
## alternative hypothesis: true location shift is not equal to 0

The p-value for the test of the distributions of total DALYs rate for the US and the UK is .000000016. With a significance level (alpha) of .10, this is very strong evidence to reject the null hypothesis and conclude that there is a shift in the distribution of total DALYs rate between the US and the UK.

The null hypothesis is that there is no difference in the distribution of the Total DALYs rate between Canada and the United States. The alternative hypothesis is that there is a shift in distributions in the Total DALYs rate.

# Perform the test on US - Canada
wilcox.test(canadaTDR, usTDR)
## 
##  Wilcoxon rank sum exact test
## 
## data:  canadaTDR and usTDR
## W = 2, p-value = 1.046e-15
## alternative hypothesis: true location shift is not equal to 0

The p-value for the test of the distributions of total DALYs rate for Canada and the US is about .000000000000001 (1.046e-15), which is effectively zero. With a significance level (alpha) of .10, this is very strong evidence to reject the null hypothesis and conclude that there is a shift in the distribution of total DALYs rate between Canada and the US.

The pairwise tests support what we found with our ANOVA test, and they indicate that the strongest shifts in the distributions are between the United States and each of the other two countries.
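Running three separate tests raises the chance of at least one false positive. Base R's pairwise.wilcox.test() runs all the pairwise rank-sum tests at once and adjusts the p-values for multiple comparisons (here with the Holm method). This is an optional extension, not part of the original analysis; on the real data the call would be pairwise.wilcox.test(totalDALY_frame$totalDALYsRate, totalDALY_frame$country, p.adjust.method = "holm"). The toy groups below are made up.

```r
# Three toy groups of five with no overlapping values.
value <- c(1:5, 11:15, 21:25)
group <- factor(rep(c("a", "b", "c"), each = 5))

pw <- pairwise.wilcox.test(value, group, p.adjust.method = "holm")
pw$p.value  # lower-triangular matrix of Holm-adjusted p-values
```

Each unadjusted exact p-value here is 2/252 (about 0.008); after the Holm adjustment all three become about 0.024, still below .05 but three times larger.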

Health Expenditure per Capita

Now we’ll look at health expenditure per capita to see how each country compares when it comes to spending on health care.

# Plot the frequency distribution of annual health expenditure per capita in Canada
ggplot(canada)+
  geom_histogram(aes(x=Health_Spending), fill = "dark red")+
  labs(x = "\nHealth Expenditure per Capita (in USD)", title = "Canada:", subtitle = "\nHealth Expenditure per Capita (frequency count)")+
  theme_economist()

The frequency distribution shows that Canada's health spending per capita ranges from just under 2,000 USD to just under 6,000 USD.

# Plot spending per capita per year in Canada

ggplot(canada)+
  geom_point(aes(x = Year, y = Health_Spending))+
  labs(y = "Health Expenditure per Capita (in USD)\n", title = "Canada:  Health Expenditure per Capita")+
  theme_economist()

The plot of Canadian health expenditures per capita by year shows an increasing amount per year, with a peak in about 2011 and a decrease since then.

# Plot the frequency distribution of annual health expenditure per capita in the United Kingdom

ggplot(united_kingdom)+
  geom_histogram(aes(x=Health_Spending), fill = "dark red")+
  labs(x = "\nHealth Expenditure per Capita (in USD)", title = "United Kingdom:", subtitle = "\nHealth Expenditure per Capita (frequency count)")+
  theme_economist()

The frequency plot for spending per capita in the UK shows that the UK's values cluster toward the upper end of its distribution, a little less than 4,000 USD per capita. There are also a number of years in which the UK spent less than 2,000 USD per capita.

# Plot spending per capita per year in the UK

ggplot(united_kingdom)+
  geom_point(aes(x = Year, y = Health_Spending))+
  labs(y = "Health Expenditure per Capita (in USD)\n", title = "United Kingdom:  Health Expenditure per Capita")+
  theme_economist()

The plot of UK health spending per capita by year shows an increase until about 2007, followed by a drop and then another increase back to roughly 2007 levels.

# Plot the frequency distribution of annual health expenditure per capita in the US

ggplot(united_states)+
  geom_histogram(aes(x=Health_Spending), fill = "dark red")+
  labs(x = "\nHealth Expenditure per Capita (in USD)", title = "United States:", subtitle = "\nHealth Expenditure per Capita (frequency count)")+
  theme_economist()

The United States' frequency distribution of health spending per capita is roughly flat, except for values below 4,000 USD per capita, which occurred in two years. Spending ranges from 3,788 USD per capita, slightly below the UK's maximum of 3,937 USD, to 9,403 USD per capita.

# Plot spending per capita per year in the US

ggplot(united_states)+
  geom_point(aes(x = Year, y = Health_Spending))+
  labs(y = "Health Expenditure per Capita (in USD)\n", title = "United States:  Health Expenditure per Capita")+
  theme_economist()

Health care spending per capita has increased steadily in the United States (and sharply relative to the other two countries), nearly reaching the top of the plot by 2017.

# Plot health spending in all three countries

ggplot(burden_of_disease)+
  geom_point(aes(x= Year, y = Health_Spending, color = Entity))+
  labs(y = "Health Expenditure per Capita (in USD)\n", title = "Health Expenditure per Capita")+
  theme_economist()

Once again, the United States is notably different from the UK and Canada, with consistently higher spending. Are these visual differences statistically meaningful? We will start by looking at the mean expenditure per capita in each country and then, as with the DALY rates, compare the countries using an ANOVA test, keeping in mind that these small samples are not clearly normal.

# Get means for each country
summary(canada$Health_Spending)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1831    2061    3269    3517    4834    5719      12
summary(united_kingdom$Health_Spending)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1364    1768    3070    2759    3649    3937      12
summary(united_states$Health_Spending)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    3788    4722    6555    6485    8085    9403      12

The mean annual health expenditure per capita is 3,517 USD for Canada, 2,759 USD for the UK, and 6,485 USD for the United States.
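To put the spending gap in proportional terms, the mean expenditures can be divided directly. The figures below are the rounded means reported by summary(), so the ratios are approximate.

```r
# Mean health expenditure per capita (USD), copied from the summaries above
mean_spending <- c(Canada = 3517, UK = 2759, US = 6485)

# How many times larger the US mean is than the Canadian and UK means
ratios <- round(mean_spending["US"] / mean_spending[c("Canada", "UK")], 2)
print(ratios)  # Canada 1.84, UK 2.35
```

In other words, over the years with data, the United States spent roughly 1.8 times Canada's mean per capita and about 2.35 times the UK's.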

# Create vectors for health expenditures
canadaHE <- canada$Health_Spending
ukHE <- united_kingdom$Health_Spending
usHE <- united_states$Health_Spending

We can compare the means for all three countries simultaneously using the analysis of variance (ANOVA) test.

The null hypothesis is that the mean health expenditures per capita in the US, Canada, and the UK are all equal. The alternative hypothesis is that at least one mean differs.

# Create a dataframe with health expenditures by country
expense_frame <- data.frame(expenses = c(canadaHE, ukHE, usHE), country = factor(rep(c("canadaHE", "ukHE", "usHE"), times = c(length(canadaHE), length(ukHE), length(usHE)))))
summary(expense_frame) # show the summary table
##     expenses        country  
##  Min.   :1364   canadaHE:32  
##  1st Qu.:2237   ukHE    :32  
##  Median :3899   usHE    :32  
##  Mean   :4254                
##  3rd Qu.:5587                
##  Max.   :9403                
##  NA's   :36
# create the ANOVA model
fit4 <- aov(expenses~country, data = expense_frame)
summary(fit4)
##             Df    Sum Sq  Mean Sq F value   Pr(>F)    
## country      2 155119717 77559859   34.45 1.56e-10 ***
## Residuals   57 128338042  2251545                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 36 observations deleted due to missingness

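The p-value is very small (1.56e-10, or about .00000000016), so there is very strong evidence against the null hypothesis that mean health expenditure per capita is the same in Canada, the UK, and the US. Note that 36 observations were dropped as missing, so the test uses 20 years of spending data per country.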

Here is a graphical depiction of this result.

boxplot(expenses~country, data = expense_frame)

print(model.tables(fit4, "means"), digits = 3)
## Tables of means
## Grand mean
##          
## 4253.617 
## 
##  country 
## country
## canadaHE     ukHE     usHE 
##     3517     2759     6485
qqnorm(residuals(fit4))
qqline(residuals(fit4))

The qqplot of residuals is not a perfectly straight line, but it is close enough to normal. The table of means and the box plot show that all three means differ, and that the U.S. mean health expenditure per capita is far higher than those of the UK and Canada.

Now that the ANOVA has shown that mean health expenditure differs among the three countries, we can compare each pair of countries.

The Wilcoxon-Mann-Whitney (rank-sum) test, which does not assume normality, tells us whether there is a statistically significant shift between the distributions of each pair.
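The W statistic that `wilcox.test()` reports for two samples is the number of pairs in which the first sample's value exceeds the second's (ties count one half), which equals the first sample's rank sum minus \(n_1(n_1+1)/2\). It therefore ranges from 0 to \(n_1 n_2\), and values near either end indicate a strong shift. A small sketch with made-up numbers:

```r
x <- c(1.5, 3, 5)   # hypothetical first sample
y <- c(2, 4)        # hypothetical second sample

# Count of pairs (x_i, y_j) with x_i > y_j; ties would count 0.5
w_pairs <- sum(outer(x, y, ">")) + 0.5 * sum(outer(x, y, "=="))

# Equivalent rank-sum form: ranks of x within the pooled sample
r <- rank(c(x, y))
w_ranks <- sum(r[seq_along(x)]) - length(x) * (length(x) + 1) / 2

# Both agree with the statistic reported by wilcox.test()
w_test <- unname(wilcox.test(x, y)$statistic)
c(w_pairs, w_ranks, w_test)  # 3 3 3
```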

First pairwise test:

The null hypothesis is that there is no difference in the distribution of the health expenditures per capita in Canada and the UK. The alternative hypothesis is that there is a shift in distributions of the health expenditures per capita.

# Perform the test on Canada - UK
wilcox.test(canadaHE, ukHE)
## 
##  Wilcoxon rank sum exact test
## 
## data:  canadaHE and ukHE
## W = 268, p-value = 0.0675
## alternative hypothesis: true location shift is not equal to 0

The p-value for the Canada-UK comparison is .068. At a significance level (alpha) of .10, this is modest evidence against the null hypothesis, suggesting a shift in the distribution of health expenditure per capita between Canada and the UK.

Second pairwise test:

The null hypothesis is that there is no difference in the distribution of the health expenditures per capita in the UK and the US. The alternative hypothesis is that there is a shift in distributions of the health expenditures per capita.

# Perform the test on UK - US
wilcox.test(usHE, ukHE)
## 
##  Wilcoxon rank sum exact test
## 
## data:  usHE and ukHE
## W = 397, p-value = 1.016e-10
## alternative hypothesis: true location shift is not equal to 0

The p-value for the UK-US comparison is about .0000000001. At a significance level (alpha) of .10, this is very strong evidence against the null hypothesis, indicating a shift in the distribution of health expenditure per capita between the UK and the US.

Third pairwise test:

The null hypothesis is that there is no difference in the distribution of the health expenditures per capita in Canada and the US. The alternative hypothesis is that there is a shift in distributions of the health expenditures per capita.

# Perform the test on US - Canada
wilcox.test(canadaHE, usHE)
## 
##  Wilcoxon rank sum exact test
## 
## data:  canadaHE and usHE
## W = 54, p-value = 2.898e-05
## alternative hypothesis: true location shift is not equal to 0

The p-value for the Canada-US comparison is .000029. At a significance level (alpha) of .10, this is very strong evidence against the null hypothesis, indicating a shift in the distribution of health expenditure per capita between Canada and the US.
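Running three pairwise tests raises the chance of at least one false positive. One way to account for this is to adjust the p-values, for example with the Holm or Bonferroni methods in base R's `p.adjust()`; `pairwise.wilcox.test()` performs the pairwise tests and this adjustment in one call, with Holm as its default. Applied to the three p-values reported above:

```r
# Raw p-values from the three wilcox.test() calls above
p_raw <- c(canada_uk = 0.0675, uk_us = 1.016e-10, canada_us = 2.898e-05)

# Holm: step-down adjustment, never less powerful than Bonferroni
p_holm <- p.adjust(p_raw, method = "holm")

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

signif(rbind(raw = p_raw, holm = p_holm, bonferroni = p_bonf), 3)
```

Under Holm, the Canada-UK p-value stays at .0675 because it is the largest of the three; under the more conservative Bonferroni correction it rises to about .20, above the .10 threshold. Both comparisons involving the US remain highly significant under either method.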

DALYs by Cause

Differences in the causes of disease in each country may help explain the differences in DALYs and health expenditure per capita.

# pivot data to enable us to look at causes

canada3 <- pivot_longer(canada, cols = 39:41, names_to = "causes", values_to = "cases")

# Plot frequency of different causes

ggplot(canada3, aes(x = causes, y = cases))+
  geom_col( fill = "purple")+
  labs(x = "\nCauses of DALYs", title = "Canada:  Causes of DALYs\n")+
  scale_x_discrete(breaks=c("DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases", "DALYs_Injuries", "DALYs_NCDs"),
                      labels=c("Communicable Diseases", "Injuries", "Non-Communicable Diseases"))+
  theme_economist()

Non-communicable diseases are the leading cause of DALYs in Canada.

# pivot data to assess causes

uk1 <- pivot_longer(united_kingdom, cols = 39:41, names_to = "causes", values_to = "cases")

# plot frequency of DALYs by cause

ggplot(uk1, aes(x = causes, y = cases))+
  geom_col( fill = "purple")+
  labs(x = "\nCauses of DALYs", title = "United Kingdom:  Causes of DALYs\n")+
  scale_x_discrete(breaks=c("DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases", "DALYs_Injuries", "DALYs_NCDs"),
                      labels=c("Communicable Diseases", "Injuries", "Non-Communicable Diseases"))+
  theme_economist()

For the United Kingdom, the leading cause of DALYs is also non-communicable diseases.

# Pivot data to assess causes

us1 <- pivot_longer(united_states, cols = 39:41, names_to = "causes", values_to = "cases")

# Plot DALYs by cause

ggplot(us1, aes(x = causes, y = cases))+
  geom_col( fill = "purple")+
  labs(x = "\nCauses of DALYs", title = "United States:  Causes of DALYs\n")+
  scale_x_discrete(breaks=c("DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases", "DALYs_Injuries", "DALYs_NCDs"),
                      labels=c("Communicable Diseases", "Injuries", "Non-Communicable Diseases"))+
  theme_economist()

In the United States, non-communicable diseases are also the leading cause of DALYs. It appears that analyzing DALYs by cause (at least at a basic level) will not clarify the difference in DALYs and spending among Canada, the United Kingdom, and the United States.
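Raw bar heights mix differences in level with differences in composition, and because `geom_col()` stacks every row that shares an x value, each bar above sums all years. Comparing each cause's share of a country's total is one way to check whether the mix differs. The sketch below uses invented totals purely to show the calculation; with the real data, the three `DALYs_*` cause columns would first be summed per country.

```r
# Hypothetical DALY totals by broad cause (rows = countries); not real data
dalys <- matrix(
  c(50, 100, 850,
    60,  90, 850,
    40, 130, 830),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Canada", "United Kingdom", "United States"),
                  c("Communicable", "Injuries", "NCDs"))
)

# Divide each row by its total so the causes are comparable across countries
shares <- prop.table(dalys, margin = 1)
round(100 * shares, 1)  # percentage of each country's DALYs by cause
```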

DALYs by Age

Examining how the DALYs are distributed by age group in each country may also show differences between the three countries that could explain their different total DALYs rates.

# Pivot data to analyze by age group

canada4 <- pivot_longer(canada, cols = 7:11, names_to = "ages", values_to = "cases")

# order so it groups by age order
canada4$ages <- ordered(canada4$ages, levels = c("DALYs_<_5", "DALYs_5-14", "DALYs_15-49", "DALYs_50-69", "DALYs_70+"))

# Plot DALYs by age group
ggplot(canada4, aes(x = ages, y = cases))+
  geom_col( fill = "purple")+
  labs(x = "\nAge Groups of DALYs", title = "Canada:  Age Groups of DALYs\n")+
  scale_x_discrete(breaks=c("DALYs_<_5", "DALYs_15-49", "DALYs_5-14", "DALYs_50-69", "DALYs_70+"),
                      labels=c("<5", "15-49", "5-14", "50-69", "70+"))+
  theme_economist()

In Canada, most DALYs occur among people aged 15 and older.
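The `ordered()` call matters because ggplot2 places discrete x-axis values in factor-level order, and `factor()` on character data sorts the levels, which would put `DALYs_5-14` after `DALYs_15-49`. A minimal base-R sketch of the difference:

```r
ages <- c("DALYs_70+", "DALYs_5-14", "DALYs_<_5", "DALYs_50-69", "DALYs_15-49")
age_levels <- c("DALYs_<_5", "DALYs_5-14", "DALYs_15-49", "DALYs_50-69", "DALYs_70+")

# Default: levels are the sorted unique values (sort order depends on the locale)
levels(factor(ages))

# Explicit levels: youngest to oldest, as in the plotting code above
age_f <- factor(ages, levels = age_levels)
levels(age_f)
```

For plotting, `factor(..., levels = ...)` is enough; `ordered()` additionally marks the factor as ordinal, which changes its default contrasts (polynomial rather than treatment) if it is later used in a model.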

# Pivot data to assess DALYs by age
uk2 <- pivot_longer(united_kingdom, cols = 7:11, names_to = "ages", values_to = "cases")

# order by age group

uk2$ages <- ordered(uk2$ages, levels = c("DALYs_<_5", "DALYs_5-14", "DALYs_15-49", "DALYs_50-69", "DALYs_70+"))

# Plot DALYs by age group

ggplot(uk2, aes(x = ages, y = cases))+
  geom_col( fill = "purple")+
  labs(x = "\nAge Groups of DALYs", title = "United Kingdom:  Age Groups of DALYs\n")+
  scale_x_discrete(breaks=c("DALYs_<_5", "DALYs_15-49", "DALYs_5-14", "DALYs_50-69", "DALYs_70+"),
                      labels=c("<5", "15-49", "5-14", "50-69", "70+"))+
  theme_economist()

In the United Kingdom, most DALYs also occur among people aged 15 and older.

# Pivot data to work with ages
us2 <- pivot_longer(united_states, cols = 7:11, names_to = "ages", values_to = "cases")

# order by age group
us2$ages <- ordered(us2$ages, levels = c("DALYs_<_5", "DALYs_5-14", "DALYs_15-49", "DALYs_50-69", "DALYs_70+"))

#plot by age group
ggplot(us2, aes(x = ages, y = cases))+
  geom_col( fill = "purple")+
  labs(x = "\nAge Groups of DALYs", title = "United States:  Age Groups of DALYs\n")+
  scale_x_discrete(breaks=c("DALYs_<_5", "DALYs_15-49", "DALYs_5-14", "DALYs_50-69", "DALYs_70+"),
                      labels=c("<5", "15-49", "5-14", "50-69", "70+"))+
  theme_economist()

In the United States, as in Canada and the United Kingdom, most DALYs occur among people aged 15 and older. Because this pattern is so similar across the three countries, it is unlikely to explain the differences in DALYs and health expenditure.

Relationships between Variables

Now we should look at the relationship between DALYs and health expenditure per capita.

We can begin with a graphical examination of the relationship between these two variables in each country.

# Plot spending vs. DALYs in Canada
ggplot(canada)+
  geom_point(aes(x = Health_Spending, y = Total_DALYs_Rate))+
  labs(x = "\nHealth Expenditure per Capita (in USD)", y = "Total DALYs per 100,000 people\n", title = "Canada:  DALYs vs. Spending")+
  theme_economist()

In Canada, DALYs appear to decrease as spending per capita increases.

# Plot spending vs. DALYs in the UK
ggplot(united_kingdom)+
  geom_point(aes(x = Health_Spending, y = Total_DALYs_Rate))+
  labs(x = "\nHealth Expenditure per Capita (in USD)", y = "Total DALYs per 100,000 people\n", title = "United Kingdom:  DALYs vs. Spending")+
  theme_economist()

The United Kingdom's relationship is less linear than Canada's: the overall trend is downward, but DALYs fluctuate toward the higher end of its spending range.

# Plot spending vs. DALYs in the US

ggplot(united_states)+
  geom_point(aes(x = Health_Spending, y = Total_DALYs_Rate))+
  labs(x = "\nHealth Expenditure per Capita (in USD)", y = "Total DALYs per 100,000 people\n", title = "United States:  DALYs vs. Spending")+
  theme_economist()

In the United States, DALYs decrease as spending increases until spending reaches about 8,000 USD per capita, after which the decline stops (suggesting diminishing marginal returns to health care spending beyond that point).
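One way to test the diminishing-returns reading is to add a squared spending term and compare the two fits with an F test: a positive, significant coefficient on the squared term means the decline in DALYs flattens as spending grows. The sketch uses simulated data only; on the real data the formula would be `Total_DALYs_Rate ~ Health_Spending + I(Health_Spending^2)` with `data = united_states`.

```r
set.seed(1)
# Simulated spending (USD per capita) and a DALY rate that flattens out
spending <- seq(3000, 10000, length.out = 30)
dalys <- 40000 - 4 * spending + 0.00025 * spending^2 + rnorm(30, sd = 100)

fit_lin  <- lm(dalys ~ spending)
fit_quad <- lm(dalys ~ spending + I(spending^2))

# Positive squared term: the negative slope shrinks as spending rises
coef(fit_quad)[["I(spending^2)"]]

# F test of whether the squared term improves on the straight line
cmp <- anova(fit_lin, fit_quad)
cmp
```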

Linear regression models should give us more detailed information about what these plots suggest.

# create a linear regression model for Canada spending vs DALYs
modelcanada <- lm(canada$Total_DALYs_Rate~canada$Health_Spending)
summary(modelcanada)
## 
## Call:
## lm(formula = canada$Total_DALYs_Rate ~ canada$Health_Spending)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -527.05 -186.34  -14.57  134.29  959.48 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             2.316e+04  2.100e+02  110.25  < 2e-16 ***
## canada$Health_Spending -7.067e-01  5.506e-02  -12.84  1.7e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 363.9 on 18 degrees of freedom
##   (12 observations deleted due to missingness)
## Multiple R-squared:  0.9015, Adjusted R-squared:  0.896 
## F-statistic: 164.7 on 1 and 18 DF,  p-value: 1.696e-10

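This model shows a strong negative linear relationship between health spending and DALYs in Canada: the slope is -0.707 DALYs per 100,000 people per additional USD of spending per capita, with a p-value of 1.7e-10. The multiple \(R^2\) is .9015 (adjusted \(R^2\) .896), so spending alone accounts for about 90 percent of the variability in Canada's total DALYs rate.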
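The slope is easier to read on a larger scale: each additional 1,000 USD per capita is associated with roughly 707 fewer DALYs per 100,000 people. A 95% confidence interval follows from the estimate, its standard error, and a t distribution with 18 residual degrees of freedom; `confint(modelcanada)` should give essentially the same interval directly. Using the printed coefficients:

```r
est     <- -7.067e-01  # slope on Health_Spending (DALYs per 100,000 per USD)
se      <-  5.506e-02  # its standard error
df_resid <- 18         # residual degrees of freedom

t_value <- est / se                    # about -12.84, as printed
t_crit  <- qt(0.975, df_resid)         # two-sided 95% critical value
ci      <- est + c(-1, 1) * t_crit * se

# Rescale to the change in DALYs per 100,000 for a 1,000 USD increase
round(1000 * c(estimate = est, lower = ci[1], upper = ci[2]))
# about -707, with a 95% CI of roughly -822 to -591
```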

# What does normalizing the data do to the model?
burden_of_disease_norm <- burden_of_disease %>% 
  mutate(Total_DALYs_Rate_norm = Total_DALYs_Rate/max(Total_DALYs_Rate,na.rm=TRUE)) %>%
  mutate(Health_Spending_norm = Health_Spending/max(Health_Spending,na.rm=TRUE))
burden <- burden_of_disease_norm[-c(38,55),]
# run the linear model with normalized data
fit2 <- lm(burden$Total_DALYs_Rate_norm ~ burden$Health_Spending_norm + burden$Entity)
summary(fit2)
## 
## Call:
## lm(formula = burden$Total_DALYs_Rate_norm ~ burden$Health_Spending_norm + 
##     burden$Entity)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.044286 -0.008593  0.002103  0.008192  0.042752 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  0.833171   0.006911 120.562  < 2e-16 ***
## burden$Health_Spending_norm -0.228651   0.015047 -15.196  < 2e-16 ***
## burden$EntityUnited Kingdom  0.027588   0.005943   4.643 2.24e-05 ***
## burden$EntityUnited States   0.226492   0.007397  30.618  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01793 on 54 degrees of freedom
##   (36 observations deleted due to missingness)
## Multiple R-squared:  0.9493, Adjusted R-squared:  0.9465 
## F-statistic: 336.8 on 3 and 54 DF,  p-value: < 2.2e-16

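With normalized data, the model estimates a single negative slope on health spending (-0.229, p < 2e-16) shared by all three countries. The country terms are intercept shifts relative to Canada: at the same level of spending, the UK's normalized DALYs rate is about 0.028 higher and the US's about 0.226 higher. (Dividing each variable by its maximum rescales the coefficients but does not change their t-values or the \(R^2\).) This model has an adjusted \(R^2\) of .9465, so it accounts for about 95 percent of the variability in the data.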

# plot residuals to check for normality, etc.
plot(fit2)

As the plots suggested, there is a statistically very strong negative relationship between health care spending and DALYs: higher spending per capita is associated with fewer DALYs (years lost to illness and premature death). Because this model has no interaction between spending and country, it does not estimate separate slopes for the UK and the US. Instead, the positive country coefficients show that, holding spending constant, the UK has slightly higher DALYs than Canada and the US has substantially higher DALYs.
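In an additive model such as `fit2`, the country terms only shift the intercept, so all three countries share one spending slope. To let each country have its own slope, one could add an interaction, for example `lm(Total_DALYs_Rate_norm ~ Health_Spending_norm * Entity, data = burden)`. The sketch below shows how country-specific slopes are read off such a model, using simulated data with known (hypothetical) slopes.

```r
set.seed(7)
n <- 20
country  <- factor(rep(c("Canada", "United Kingdom", "United States"), each = n))
spending <- runif(3 * n, min = 0.2, max = 1)   # normalized spending (simulated)

# Hypothetical true slopes and intercepts for each country
true_slope <- c(Canada = -0.30, `United Kingdom` = -0.20, `United States` = 0.05)
true_int   <- c(Canada = 0.90,  `United Kingdom` = 0.92,  `United States` = 1.00)
dalys <- true_int[as.character(country)] +
  true_slope[as.character(country)] * spending + rnorm(3 * n, sd = 0.005)

# '*' expands to main effects plus the spending:country interaction
fit_int <- lm(dalys ~ spending * country)
b <- coef(fit_int)

# Canada is the reference level; other slopes add their interaction term
slopes <- c(
  Canada           = b[["spending"]],
  `United Kingdom` = b[["spending"]] + b[["spending:countryUnited Kingdom"]],
  `United States`  = b[["spending"]] + b[["spending:countryUnited States"]]
)
round(slopes, 2)
```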

Multiple Linear Regression

Now that we have found a strong association between health expenditure and DALYs, we can explore what other factors might affect the quality of the health care system (as measured by the total DALYs rate). To do this, we will fit a multiple linear regression model that adds DALYs by age and DALYs by cause as predictors.

# Create a multiple linear regression model
# (DALYs_Accidents appears twice in the formula; lm() keeps only one copy of the term)

fit3 <- lm(burden$Total_DALYs_Rate ~ burden$Health_Spending + burden$Entity + burden$DALYs_NCDs + burden$`DALYs_<_5` + burden$`DALYs_15-49`+ burden$`DALYs_5-14` + burden$`DALYs_50-69` + burden$`DALYs_70+` + burden$`DALYs_HIV/AIDS_and_tuberculosis` +burden$DALYs_Accidents + burden$DALYs_Cardiovascular_Diseases + burden$DALYs_Chronic_Respiratory_Diseases + burden$DALYs_Common_Infectious_Diseases + burden$DALYs_tropical_diseases + burden$DALYs_Maternal_Disorders + burden$DALYs_Neonatal_Disorders + burden$DALYs_Conflict_and_Terrorism + burden$DALYs_Nutritional_Deficiencies + burden$DALYs_Other_Diseases + burden$DALYs_Neoplasms + burden$DALYs_Liver_Diseases + burden$DALYs_Digestive_Diseases + burden$DALYs_Neurological_Disorders + burden$DALYs_Mental_and_Substance_Disorders + burden$DALYs_Diabetes_Etc + burden$DALYs_Musculoskeletal_Disorders + burden$DALYs_Other_NCDs + burden$DALYs_Transport_Injuries + burden$DALYs_Nature + burden$`DALYs_Self-harm` + burden$DALYs_Interpersonal_Violence + burden$DALYs_Accidents)

summary(fit3)
## 
## Call:
## lm(formula = burden$Total_DALYs_Rate ~ burden$Health_Spending + 
##     burden$Entity + burden$DALYs_NCDs + burden$`DALYs_<_5` + 
##     burden$`DALYs_15-49` + burden$`DALYs_5-14` + burden$`DALYs_50-69` + 
##     burden$`DALYs_70+` + burden$`DALYs_HIV/AIDS_and_tuberculosis` + 
##     burden$DALYs_Accidents + burden$DALYs_Cardiovascular_Diseases + 
##     burden$DALYs_Chronic_Respiratory_Diseases + burden$DALYs_Common_Infectious_Diseases + 
##     burden$DALYs_tropical_diseases + burden$DALYs_Maternal_Disorders + 
##     burden$DALYs_Neonatal_Disorders + burden$DALYs_Conflict_and_Terrorism + 
##     burden$DALYs_Nutritional_Deficiencies + burden$DALYs_Other_Diseases + 
##     burden$DALYs_Neoplasms + burden$DALYs_Liver_Diseases + burden$DALYs_Digestive_Diseases + 
##     burden$DALYs_Neurological_Disorders + burden$DALYs_Mental_and_Substance_Disorders + 
##     burden$DALYs_Diabetes_Etc + burden$DALYs_Musculoskeletal_Disorders + 
##     burden$DALYs_Other_NCDs + burden$DALYs_Transport_Injuries + 
##     burden$DALYs_Nature + burden$`DALYs_Self-harm` + burden$DALYs_Interpersonal_Violence + 
##     burden$DALYs_Accidents)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -90.096 -28.281  -2.367  31.089 147.125 
## 
## Coefficients:
##                                               Estimate Std. Error t value
## (Intercept)                                  2.238e+04  3.961e+03   5.650
## burden$Health_Spending                      -3.818e-02  9.664e-02  -0.395
## burden$EntityUnited Kingdom                 -1.902e+04  6.090e+03  -3.122
## burden$EntityUnited States                   2.101e+04  3.724e+04   0.564
## burden$DALYs_NCDs                           -7.626e-03  3.807e-03  -2.003
## burden$`DALYs_<_5`                          -9.617e-03  8.364e-03  -1.150
## burden$`DALYs_15-49`                         6.189e-03  3.580e-03   1.729
## burden$`DALYs_5-14`                          2.890e-02  1.071e-02   2.699
## burden$`DALYs_50-69`                         5.293e-03  3.307e-03   1.600
## burden$`DALYs_70+`                           8.420e-03  3.217e-03   2.617
## burden$`DALYs_HIV/AIDS_and_tuberculosis`    -5.375e-03  3.291e-03  -1.633
## burden$DALYs_Accidents                      -2.925e-02  6.004e-03  -4.872
## burden$DALYs_Cardiovascular_Diseases        -1.374e-03  1.640e-03  -0.838
## burden$DALYs_Chronic_Respiratory_Diseases   -1.189e-02  6.348e-03  -1.873
## burden$DALYs_Common_Infectious_Diseases     -3.845e-03  4.768e-03  -0.806
## burden$DALYs_tropical_diseases               4.966e-01  1.532e-01   3.241
## burden$DALYs_Maternal_Disorders             -5.042e-02  3.067e-02  -1.644
## burden$DALYs_Neonatal_Disorders              1.334e-02  1.959e-02   0.681
## burden$DALYs_Conflict_and_Terrorism         -2.410e-03  3.476e-03  -0.693
## burden$DALYs_Nutritional_Deficiencies        1.532e-01  3.119e-02   4.912
## burden$DALYs_Other_Diseases                 -5.662e-02  1.463e-02  -3.871
## burden$DALYs_Neoplasms                       2.922e-03  2.192e-03   1.333
## burden$DALYs_Liver_Diseases                 -2.472e-02  3.318e-02  -0.745
## burden$DALYs_Digestive_Diseases              2.266e-02  2.299e-02   0.985
## burden$DALYs_Neurological_Disorders          2.152e-02  8.435e-03   2.551
## burden$DALYs_Mental_and_Substance_Disorders -1.082e-02  5.790e-03  -1.870
## burden$DALYs_Diabetes_Etc                   -3.415e-03  2.114e-03  -1.615
## burden$DALYs_Musculoskeletal_Disorders      -1.366e-02  3.763e-03  -3.629
## burden$DALYs_Other_NCDs                      2.878e-02  8.490e-03   3.390
## burden$DALYs_Transport_Injuries             -9.479e-03  6.758e-03  -1.403
## burden$DALYs_Nature                          2.313e-02  5.607e-03   4.126
## burden$`DALYs_Self-harm`                     8.997e-03  9.955e-03   0.904
## burden$DALYs_Interpersonal_Violence         -2.139e-03  4.311e-03  -0.496
##                                             Pr(>|t|)    
## (Intercept)                                 7.01e-06 ***
## burden$Health_Spending                      0.696145    
## burden$EntityUnited Kingdom                 0.004491 ** 
## burden$EntityUnited States                  0.577619    
## burden$DALYs_NCDs                           0.056144 .  
## burden$`DALYs_<_5`                          0.261110    
## burden$`DALYs_15-49`                        0.096169 .  
## burden$`DALYs_5-14`                         0.012295 *  
## burden$`DALYs_50-69`                        0.122090    
## burden$`DALYs_70+`                          0.014830 *  
## burden$`DALYs_HIV/AIDS_and_tuberculosis`    0.114905    
## burden$DALYs_Accidents                      5.19e-05 ***
## burden$DALYs_Cardiovascular_Diseases        0.410126    
## burden$DALYs_Chronic_Respiratory_Diseases   0.072857 .  
## burden$DALYs_Common_Infectious_Diseases     0.427695    
## burden$DALYs_tropical_diseases              0.003363 ** 
## burden$DALYs_Maternal_Disorders             0.112642    
## burden$DALYs_Neonatal_Disorders             0.501989    
## burden$DALYs_Conflict_and_Terrorism         0.494548    
## burden$DALYs_Nutritional_Deficiencies       4.68e-05 ***
## burden$DALYs_Other_Diseases                 0.000690 ***
## burden$DALYs_Neoplasms                      0.194475    
## burden$DALYs_Liver_Diseases                 0.463237    
## burden$DALYs_Digestive_Diseases             0.333848    
## burden$DALYs_Neurological_Disorders         0.017236 *  
## burden$DALYs_Mental_and_Substance_Disorders 0.073285 .  
## burden$DALYs_Diabetes_Etc                   0.118844    
## burden$DALYs_Musculoskeletal_Disorders      0.001275 ** 
## burden$DALYs_Other_NCDs                     0.002323 ** 
## burden$DALYs_Transport_Injuries             0.173018    
## burden$DALYs_Nature                         0.000358 ***
## burden$`DALYs_Self-harm`                    0.374739    
## burden$DALYs_Interpersonal_Violence         0.624183    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 72.64 on 25 degrees of freedom
##   (36 observations deleted due to missingness)
## Multiple R-squared:  0.9995, Adjusted R-squared:  0.9989 
## F-statistic:  1549 on 32 and 25 DF,  p-value: < 2.2e-16

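This model has an adjusted \(R^2\) of .9989, so it accounts for nearly all (about 99.9 percent) of the variability in the total DALYs rate. Much of this fit is expected, because the cause- and age-specific DALYs used as predictors are components of total DALYs. The coefficients with the strongest statistical significance (p < .001) are accidents, nutritional deficiencies, other diseases, and exposure to forces of nature; this shows they are strong predictors within this model, not necessarily that they are the main causes of DALYs.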
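A very high \(R^2\) says little about which factors matter when the predictors include pieces of the response itself. The sketch below, on simulated data, regresses a total on the three parts that add up to it (plus a little noise): the fit is nearly perfect whatever the parts represent.

```r
set.seed(3)
n <- 50
# Three simulated components (for example, DALYs from three causes)
part_a <- rnorm(n, mean = 500, sd = 50)
part_b <- rnorm(n, mean = 300, sd = 40)
part_c <- rnorm(n, mean = 200, sd = 30)

# The "total" is just their sum plus a small amount of noise
total <- part_a + part_b + part_c + rnorm(n, sd = 1)

fit_parts <- lm(total ~ part_a + part_b + part_c)
summary(fit_parts)$adj.r.squared   # essentially 1
round(coef(fit_parts), 2)          # slopes close to 1 for every part
```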

# Plot residuals
plot(fit3)

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

The residuals look approximately normal and show no clear pattern, so the model looks reasonable from that perspective. However, many points have high leverage. We will remove the three points with the most leverage to see how the model changes without them.
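Rather than reading row numbers off the plot, high-leverage points can be listed with `hatvalues()` and influential ones with `cooks.distance()`. The average leverage always equals \(p/n\), where \(p\) is the number of coefficients and \(n\) the number of observations, and a common rule of thumb flags leverage above \(2p/n\). For `fit3` (33 coefficients, 58 observations) the average leverage is already 33/58, about 0.57, which helps explain why so many points show high leverage; the \(2p/n\) cutoff exceeds 1 there, so ranking by leverage or Cook's distance is more practical. Shown here on simulated data with one planted extreme point:

```r
set.seed(2)
x <- c(rnorm(20), 10)   # simulated predictor; point 21 is far from the rest
y <- 2 * x + rnorm(21)
fit <- lm(y ~ x)

h <- hatvalues(fit)     # leverage of each observation
p <- length(coef(fit))  # number of estimated coefficients
n <- length(h)

mean(h)                 # always equals p / n
which(h > 2 * p / n)    # rule-of-thumb high-leverage points

# Largest Cook's distances, i.e. the most influential points
head(sort(cooks.distance(fit), decreasing = TRUE), 3)
```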

# remove those points from the dataset

newBurden <- burden[-c(68, 74, 79), ]

# Create a new multiple linear regression model
# (DALYs_Accidents appears twice in the formula; lm() keeps only one copy of the term)

fit12 <- lm(newBurden$Total_DALYs_Rate ~ newBurden$Health_Spending + newBurden$Entity + newBurden$DALYs_NCDs + newBurden$`DALYs_<_5` + newBurden$`DALYs_15-49`+ newBurden$`DALYs_5-14` + newBurden$`DALYs_50-69` + newBurden$`DALYs_70+` + newBurden$`DALYs_HIV/AIDS_and_tuberculosis` +newBurden$DALYs_Accidents + newBurden$DALYs_Cardiovascular_Diseases + newBurden$DALYs_Chronic_Respiratory_Diseases + newBurden$DALYs_Common_Infectious_Diseases + newBurden$DALYs_tropical_diseases + newBurden$DALYs_Maternal_Disorders + newBurden$DALYs_Neonatal_Disorders + newBurden$DALYs_Conflict_and_Terrorism + newBurden$DALYs_Nutritional_Deficiencies + newBurden$DALYs_Other_Diseases + newBurden$DALYs_Neoplasms + newBurden$DALYs_Liver_Diseases + newBurden$DALYs_Digestive_Diseases + newBurden$DALYs_Neurological_Disorders + newBurden$DALYs_Mental_and_Substance_Disorders + newBurden$DALYs_Diabetes_Etc + newBurden$DALYs_Musculoskeletal_Disorders + newBurden$DALYs_Other_NCDs + newBurden$DALYs_Transport_Injuries + newBurden$DALYs_Nature + newBurden$`DALYs_Self-harm` + newBurden$DALYs_Interpersonal_Violence + newBurden$DALYs_Accidents)

summary(fit12)
## 
## Call:
## lm(formula = newBurden$Total_DALYs_Rate ~ newBurden$Health_Spending + 
##     newBurden$Entity + newBurden$DALYs_NCDs + newBurden$`DALYs_<_5` + 
##     newBurden$`DALYs_15-49` + newBurden$`DALYs_5-14` + newBurden$`DALYs_50-69` + 
##     newBurden$`DALYs_70+` + newBurden$`DALYs_HIV/AIDS_and_tuberculosis` + 
##     newBurden$DALYs_Accidents + newBurden$DALYs_Cardiovascular_Diseases + 
##     newBurden$DALYs_Chronic_Respiratory_Diseases + newBurden$DALYs_Common_Infectious_Diseases + 
##     newBurden$DALYs_tropical_diseases + newBurden$DALYs_Maternal_Disorders + 
##     newBurden$DALYs_Neonatal_Disorders + newBurden$DALYs_Conflict_and_Terrorism + 
##     newBurden$DALYs_Nutritional_Deficiencies + newBurden$DALYs_Other_Diseases + 
##     newBurden$DALYs_Neoplasms + newBurden$DALYs_Liver_Diseases + 
##     newBurden$DALYs_Digestive_Diseases + newBurden$DALYs_Neurological_Disorders + 
##     newBurden$DALYs_Mental_and_Substance_Disorders + newBurden$DALYs_Diabetes_Etc + 
##     newBurden$DALYs_Musculoskeletal_Disorders + newBurden$DALYs_Other_NCDs + 
##     newBurden$DALYs_Transport_Injuries + newBurden$DALYs_Nature + 
##     newBurden$`DALYs_Self-harm` + newBurden$DALYs_Interpersonal_Violence + 
##     newBurden$DALYs_Accidents)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -85.914 -25.093  -7.121  31.547  98.692 
## 
## Coefficients:
##                                                  Estimate Std. Error t value
## (Intercept)                                     2.559e+04  4.668e+03   5.482
## newBurden$Health_Spending                      -6.774e-02  9.584e-02  -0.707
## newBurden$EntityUnited Kingdom                 -1.963e+04  5.783e+03  -3.394
## newBurden$EntityUnited States                   4.465e+04  4.053e+04   1.102
## newBurden$DALYs_NCDs                           -6.344e-03  3.788e-03  -1.675
## newBurden$`DALYs_<_5`                          -1.018e-02  8.516e-03  -1.195
## newBurden$`DALYs_15-49`                         4.680e-03  3.536e-03   1.323
## newBurden$`DALYs_5-14`                          2.662e-02  1.147e-02   2.321
## newBurden$`DALYs_50-69`                         4.654e-03  3.449e-03   1.349
## newBurden$`DALYs_70+`                           7.577e-03  3.768e-03   2.011
## newBurden$`DALYs_HIV/AIDS_and_tuberculosis`    -3.318e-03  3.274e-03  -1.014
## newBurden$DALYs_Accidents                      -3.243e-02  9.301e-03  -3.487
## newBurden$DALYs_Cardiovascular_Diseases        -2.493e-03  2.144e-03  -1.163
## newBurden$DALYs_Chronic_Respiratory_Diseases   -1.183e-02  6.659e-03  -1.777
## newBurden$DALYs_Common_Infectious_Diseases      5.031e-04  4.865e-03   0.103
## newBurden$DALYs_tropical_diseases               4.165e-01  1.675e-01   2.486
## newBurden$DALYs_Maternal_Disorders             -3.878e-02  2.984e-02  -1.300
## newBurden$DALYs_Neonatal_Disorders              1.417e-02  1.828e-02   0.775
## newBurden$DALYs_Conflict_and_Terrorism          7.533e-03  5.431e-03   1.387
## newBurden$DALYs_Nutritional_Deficiencies        1.565e-01  2.996e-02   5.223
## newBurden$DALYs_Other_Diseases                 -3.070e-02  1.655e-02  -1.854
## newBurden$DALYs_Neoplasms                       1.773e-03  2.017e-03   0.879
## newBurden$DALYs_Liver_Diseases                 -6.064e-02  3.346e-02  -1.812
## newBurden$DALYs_Digestive_Diseases              4.843e-02  2.334e-02   2.075
## newBurden$DALYs_Neurological_Disorders          2.234e-02  8.934e-03   2.500
## newBurden$DALYs_Mental_and_Substance_Disorders -1.952e-02  6.524e-03  -2.991
## newBurden$DALYs_Diabetes_Etc                   -3.316e-03  2.026e-03  -1.636
## newBurden$DALYs_Musculoskeletal_Disorders      -1.517e-02  4.413e-03  -3.438
## newBurden$DALYs_Other_NCDs                      3.393e-02  8.692e-03   3.903
## newBurden$DALYs_Transport_Injuries             -1.190e-02  7.184e-03  -1.656
## newBurden$DALYs_Nature                          3.155e-02  8.360e-03   3.774
## newBurden$`DALYs_Self-harm`                     1.457e-02  9.253e-03   1.574
## newBurden$DALYs_Interpersonal_Violence          7.986e-03  9.250e-03   0.863
##                                                Pr(>|t|)    
## (Intercept)                                    1.65e-05 ***
## newBurden$Health_Spending                      0.487124    
## newBurden$EntityUnited Kingdom                 0.002609 ** 
## newBurden$EntityUnited States                  0.282496    
## newBurden$DALYs_NCDs                           0.108181    
## newBurden$`DALYs_<_5`                          0.244725    
## newBurden$`DALYs_15-49`                        0.199271    
## newBurden$`DALYs_5-14`                         0.029931 *  
## newBurden$`DALYs_50-69`                        0.190984    
## newBurden$`DALYs_70+`                          0.056755 .  
## newBurden$`DALYs_HIV/AIDS_and_tuberculosis`    0.321807    
## newBurden$DALYs_Accidents                      0.002090 ** 
## newBurden$DALYs_Cardiovascular_Diseases        0.257433    
## newBurden$DALYs_Chronic_Respiratory_Diseases   0.089383 .  
## newBurden$DALYs_Common_Infectious_Diseases     0.918570    
## newBurden$DALYs_tropical_diseases              0.020984 *  
## newBurden$DALYs_Maternal_Disorders             0.207142    
## newBurden$DALYs_Neonatal_Disorders             0.446608    
## newBurden$DALYs_Conflict_and_Terrorism         0.179331    
## newBurden$DALYs_Nutritional_Deficiencies       3.08e-05 ***
## newBurden$DALYs_Other_Diseases                 0.077143 .  
## newBurden$DALYs_Neoplasms                      0.388828    
## newBurden$DALYs_Liver_Diseases                 0.083607 .  
## newBurden$DALYs_Digestive_Diseases             0.049912 *  
## newBurden$DALYs_Neurological_Disorders         0.020355 *  
## newBurden$DALYs_Mental_and_Substance_Disorders 0.006730 ** 
## newBurden$DALYs_Diabetes_Etc                   0.116017    
## newBurden$DALYs_Musculoskeletal_Disorders      0.002350 ** 
## newBurden$DALYs_Other_NCDs                     0.000763 ***
## newBurden$DALYs_Transport_Injuries             0.111871    
## newBurden$DALYs_Nature                         0.001045 ** 
## newBurden$`DALYs_Self-harm`                    0.129677    
## newBurden$DALYs_Interpersonal_Violence         0.397254    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 64.76 on 22 degrees of freedom
##   (36 observations deleted due to missingness)
## Multiple R-squared:  0.9996, Adjusted R-squared:  0.999 
## F-statistic:  1686 on 32 and 22 DF,  p-value: < 2.2e-16

This model fits slightly better than the one that includes the three high-leverage points (adjusted \(R^2\) of .9990 as opposed to .9989; residual standard error of 64.76 as opposed to 72.64).

# plot residuals

plot(fit12)

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

The residual plots still show some influential points, but with an adjusted \(R^2\) of .999 and an approximately normal Q-Q plot, there seems to be little point in refining the model further.

Exploring DALYs by Country

Looking at the main causes of DALYs within each country may help explain the differences in total DALYs rate among the three countries. The simplest approach is to use the three broad groups: non-communicable diseases; communicable, maternal, neonatal, and nutritional diseases; and injuries.

Canada:

# Create a multiple linear regression model 

fit6 <- lm(canada$Total_DALYs_Rate ~ canada$DALYs_NCDs + canada$DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases + canada$DALYs_Injuries)

summary(fit6)
## 
## Call:
## lm(formula = canada$Total_DALYs_Rate ~ canada$DALYs_NCDs + canada$DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases + 
##     canada$DALYs_Injuries)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -624.53 -205.16   65.67  226.34  568.02 
## 
## Coefficients:
##                                                                    Estimate
## (Intercept)                                                       2.761e+04
## canada$DALYs_NCDs                                                -2.879e-03
## canada$DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases  7.207e-03
## canada$DALYs_Injuries                                             1.240e-02
##                                                                  Std. Error
## (Intercept)                                                       1.779e+03
## canada$DALYs_NCDs                                                 1.525e-04
## canada$DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases  4.466e-03
## canada$DALYs_Injuries                                             4.154e-03
##                                                                  t value
## (Intercept)                                                       15.520
## canada$DALYs_NCDs                                                -18.879
## canada$DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases   1.614
## canada$DALYs_Injuries                                              2.987
##                                                                  Pr(>|t|)    
## (Intercept)                                                      5.16e-14 ***
## canada$DALYs_NCDs                                                6.60e-16 ***
## canada$DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases  0.11965    
## canada$DALYs_Injuries                                             0.00641 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 315.1 on 24 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.9608, Adjusted R-squared:  0.9559 
## F-statistic:   196 on 3 and 24 DF,  p-value: < 2.2e-16

The Canadian model (adjusted \(R^2\) of .9559) shows that non-communicable diseases (p < .001) and injuries (p = .006) are the statistically significant predictors, while the communicable, maternal, neonatal, and nutritional group is not (p = .12).

United Kingdom:

# Create a multiple linear regression model 

fit7 <- lm(united_kingdom$Total_DALYs_Rate ~ united_kingdom$DALYs_NCDs + united_kingdom$DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases + united_kingdom$DALYs_Injuries)

summary(fit7)
## 
## Call:
## lm(formula = united_kingdom$Total_DALYs_Rate ~ united_kingdom$DALYs_NCDs + 
##     united_kingdom$DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases + 
##     united_kingdom$DALYs_Injuries)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1616.54  -630.52   -28.27   763.59  1194.82 
## 
## Coefficients:
##                                                                            Estimate
## (Intercept)                                                              -9.582e+03
## united_kingdom$DALYs_NCDs                                                 4.587e-04
## united_kingdom$DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases  1.747e-02
## united_kingdom$DALYs_Injuries                                             3.949e-03
##                                                                          Std. Error
## (Intercept)                                                               7.698e+03
## united_kingdom$DALYs_NCDs                                                 1.265e-03
## united_kingdom$DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases  3.225e-03
## united_kingdom$DALYs_Injuries                                             1.360e-02
##                                                                          t value
## (Intercept)                                                               -1.245
## united_kingdom$DALYs_NCDs                                                  0.363
## united_kingdom$DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases   5.416
## united_kingdom$DALYs_Injuries                                              0.290
##                                                                          Pr(>|t|)
## (Intercept)                                                                 0.225
## united_kingdom$DALYs_NCDs                                                   0.720
## united_kingdom$DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases 1.45e-05
## united_kingdom$DALYs_Injuries                                               0.774
##                                                                             
## (Intercept)                                                                 
## united_kingdom$DALYs_NCDs                                                   
## united_kingdom$DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases ***
## united_kingdom$DALYs_Injuries                                               
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 851.9 on 24 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.8017, Adjusted R-squared:  0.7769 
## F-statistic: 32.34 on 3 and 24 DF,  p-value: 1.347e-08

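The United Kingdom model, with an adjusted \(R^2\) of only .7769, shows that communicable, maternal, neonatal, and nutritional diseases (p = 1.45e-05) have the most significance in shaping the model; neither non-communicable diseases nor injuries is significant. The lower adjusted \(R^2\) suggests this model explains the total DALYs rate in the United Kingdom less well than the corresponding models do for Canada and the United States.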
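Adjusted \(R^2\) penalises the ordinary \(R^2\) for the number of predictors: \(\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-p-1}\), where \(n\) is the number of rows used and \(p\) the number of predictors. Each cause-of-disease model used \(n = 28\) rows (24 residual degrees of freedom plus 4 estimated coefficients) and \(p = 3\). A small R sketch that reproduces the three adjusted values from the printed multiple \(R^2\):

```r
# Adjusted R^2 for a model with p predictors fitted to n observations
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Multiple R^2 printed for the three cause-of-disease models
r2 <- c(Canada = 0.9608, United_Kingdom = 0.8017, United_States = 0.9605)

# 24 residual df + 3 slopes + 1 intercept = 28 rows used in each fit
adj <- adj_r2(r2, n = 28, p = 3)
round(adj, 4)
```

Because the inputs are the rounded \(R^2\) values, the United States result comes out at 0.9556 rather than the printed 0.9555; the Canada and United Kingdom values match exactly.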

United States:

# Create a multiple linear regression model 

fit8 <- lm(united_states$Total_DALYs_Rate ~ united_states$DALYs_NCDs + united_states$DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases + united_states$DALYs_Injuries)

summary(fit8)
## 
## Call:
## lm(formula = united_states$Total_DALYs_Rate ~ united_states$DALYs_NCDs + 
##     united_states$DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases + 
##     united_states$DALYs_Injuries)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -358.76 -220.20    2.54  207.73  544.05 
## 
## Coefficients:
##                                                                           Estimate
## (Intercept)                                                              2.279e+04
## united_states$DALYs_NCDs                                                -1.465e-04
## united_states$DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases -2.382e-04
## united_states$DALYs_Injuries                                             1.480e-03
##                                                                         Std. Error
## (Intercept)                                                              1.695e+03
## united_states$DALYs_NCDs                                                 1.481e-05
## united_states$DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases  3.022e-04
## united_states$DALYs_Injuries                                             2.769e-04
##                                                                         t value
## (Intercept)                                                              13.445
## united_states$DALYs_NCDs                                                 -9.889
## united_states$DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases  -0.788
## united_states$DALYs_Injuries                                              5.346
##                                                                         Pr(>|t|)
## (Intercept)                                                             1.15e-12
## united_states$DALYs_NCDs                                                6.11e-10
## united_states$DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases    0.438
## united_states$DALYs_Injuries                                            1.74e-05
##                                                                            
## (Intercept)                                                             ***
## united_states$DALYs_NCDs                                                ***
## united_states$DALYs_Communicable_Maternal_NeoNatal_Nutritional_Diseases    
## united_states$DALYs_Injuries                                            ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 277.4 on 24 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.9605, Adjusted R-squared:  0.9555 
## F-statistic: 194.4 on 3 and 24 DF,  p-value: < 2.2e-16

The United States model, with an adjusted \(R^2\) of .9555, shows that non-communicable diseases and injuries have the most significance in shaping the model.

All three models are statistically significant overall (F-test p-values of 1.35e-08 or smaller). In Canada, non-communicable diseases have a negative coefficient, so they are negatively associated with the total DALYs rate once the other predictors are held constant, while injuries have a positive coefficient. In the United Kingdom, communicable, maternal, neonatal, and nutritional diseases are the only significant predictor and are positively associated with the total DALYs rate. In the United States, as in Canada, non-communicable diseases have a negative coefficient and injuries a positive one.
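Each coefficient in these models is a partial effect: it describes how the total DALYs rate moves with one predictor while the other predictors are held fixed, so a negative sign does not by itself mean that a disease category lowers the total burden. To compare signs and significance across models without reading three long printouts, a helper like the hypothetical significant_terms() below (not part of any package) could pull them out of any fitted lm object.

```r
# List the non-intercept terms of a fitted lm whose p-value is below alpha,
# together with the sign of each estimate.
significant_terms <- function(fit, alpha = 0.05) {
  # Coefficient matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
  tab <- coef(summary(fit))
  tab <- tab[rownames(tab) != "(Intercept)", , drop = FALSE]
  keep <- tab[, "Pr(>|t|)"] < alpha
  data.frame(
    term      = rownames(tab)[keep],
    estimate  = tab[keep, "Estimate"],
    direction = ifelse(tab[keep, "Estimate"] > 0, "positive", "negative"),
    p_value   = tab[keep, "Pr(>|t|)"],
    row.names = NULL
  )
}
```

For example, significant_terms(fit8) should list the non-communicable disease term with a negative estimate and the injuries term with a positive one, matching the United States output above.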

We can also examine the impact of age in the three countries to see if that explains the difference in DALYs.

Canada:

# create a multiple linear regression model by age
fit9 <- lm(canada$Total_DALYs_Rate ~ canada$`DALYs_<_5` + canada$`DALYs_5-14` + canada$`DALYs_15-49` + canada$`DALYs_50-69` + canada$`DALYs_70+`)
summary(fit9)
## 
## Call:
## lm(formula = canada$Total_DALYs_Rate ~ canada$`DALYs_<_5` + canada$`DALYs_5-14` + 
##     canada$`DALYs_15-49` + canada$`DALYs_50-69` + canada$`DALYs_70+`)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -226.78  -39.23   12.75   66.09  170.39 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           7.441e+03  2.904e+03   2.562  0.01778 *  
## canada$`DALYs_<_5`    1.064e-02  2.995e-03   3.553  0.00178 ** 
## canada$`DALYs_5-14`   6.381e-02  8.912e-03   7.160 3.54e-07 ***
## canada$`DALYs_15-49`  1.037e-03  9.332e-04   1.111  0.27843    
## canada$`DALYs_50-69`  5.016e-04  8.050e-04   0.623  0.53963    
## canada$`DALYs_70+`   -2.773e-03  1.010e-03  -2.744  0.01184 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 103.5 on 22 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.9961, Adjusted R-squared:  0.9952 
## F-statistic:  1128 on 5 and 22 DF,  p-value: < 2.2e-16

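The Canada model, with an adjusted \(R^2\) of .9952, shows that DALYs among children aged 14 and under (the <5 and 5-14 groups) have the most significance in shaping the model; the 70+ group is also significant at the 0.05 level, with a negative coefficient.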

United Kingdom:

# create a multiple linear regression model by age
fit10 <- lm(united_kingdom$Total_DALYs_Rate ~ united_kingdom$`DALYs_<_5` + united_kingdom$`DALYs_5-14` + united_kingdom$`DALYs_15-49` + united_kingdom$`DALYs_50-69` + united_kingdom$`DALYs_70+`)
summary(fit10)
## 
## Call:
## lm(formula = united_kingdom$Total_DALYs_Rate ~ united_kingdom$`DALYs_<_5` + 
##     united_kingdom$`DALYs_5-14` + united_kingdom$`DALYs_15-49` + 
##     united_kingdom$`DALYs_50-69` + united_kingdom$`DALYs_70+`)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -269.66  -66.36  -28.39   60.63  245.45 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   5.449e+02  5.207e+03   0.105   0.9176    
## united_kingdom$`DALYs_<_5`    1.268e-02  9.976e-04  12.708 1.31e-11 ***
## united_kingdom$`DALYs_5-14`   4.321e-02  6.786e-03   6.368 2.08e-06 ***
## united_kingdom$`DALYs_15-49` -5.617e-05  6.567e-04  -0.086   0.9326    
## united_kingdom$`DALYs_50-69` -1.068e-03  3.947e-04  -2.706   0.0129 *  
## united_kingdom$`DALYs_70+`    4.779e-04  3.924e-04   1.218   0.2361    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 124 on 22 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.9961, Adjusted R-squared:  0.9953 
## F-statistic:  1138 on 5 and 22 DF,  p-value: < 2.2e-16

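The United Kingdom model, with an adjusted \(R^2\) of .9953, shows that DALYs among children aged 14 and under (the <5 and 5-14 groups) have the most significance in shaping the model; the 50-69 group is also significant at the 0.05 level, with a negative coefficient.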

United States:

# create a multiple linear regression model by age
fit11 <- lm(united_states$Total_DALYs_Rate ~ united_states$`DALYs_<_5` + united_states$`DALYs_5-14` + united_states$`DALYs_15-49` + united_states$`DALYs_50-69` + united_states$`DALYs_70+`)
summary(fit11)
## 
## Call:
## lm(formula = united_states$Total_DALYs_Rate ~ united_states$`DALYs_<_5` + 
##     united_states$`DALYs_5-14` + united_states$`DALYs_15-49` + 
##     united_states$`DALYs_50-69` + united_states$`DALYs_70+`)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -508.07 -218.45    8.28  183.96  493.77 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 -1.157e+04  5.240e+03  -2.208   0.0379 *  
## united_states$`DALYs_<_5`    2.983e-03  3.654e-04   8.164 4.21e-08 ***
## united_states$`DALYs_5-14`   1.207e-02  2.427e-03   4.972 5.63e-05 ***
## united_states$`DALYs_15-49` -1.756e-04  1.219e-04  -1.441   0.1637    
## united_states$`DALYs_50-69`  1.250e-04  5.946e-05   2.101   0.0473 *  
## united_states$`DALYs_70+`   -2.104e-05  1.666e-04  -0.126   0.9006    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 286.5 on 22 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.9613, Adjusted R-squared:  0.9526 
## F-statistic: 109.4 on 5 and 22 DF,  p-value: 8.804e-15

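The United States model, with an adjusted \(R^2\) of .9526, shows that DALYs among children aged 14 and under (the <5 and 5-14 groups) have the most significance in shaping the model; the 50-69 group is also significant at the 0.05 level, with a positive coefficient.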

In all three countries, the under-5 and 5-14 age groups are the most significant predictors of the total DALYs rate. This reflects the fact that this linear model is based on already calculated statistics. As a result, it does not produce an equation that explains DALYs in terms of age, but it does show which age groups are most influential in determining the slope of the DALYs rate curve.

The similarity of the models across the three countries indicates that differences in DALYs among age groups do not explain the difference in total DALYs rate among the three countries.
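Because fit9, fit10, and fit11 differ only in the data frame they use, the same model can be written once and applied to each country. The sketch below assumes, as the backticked names in the calls above suggest, that each country's data frame has columns named Total_DALYs_Rate, DALYs_<_5, DALYs_5-14, DALYs_15-49, DALYs_50-69, and DALYs_70+; compare_age_models() is a hypothetical helper name.

```r
# Age-group predictors used in the three age models (non-syntactic names)
age_cols <- c("DALYs_<_5", "DALYs_5-14", "DALYs_15-49", "DALYs_50-69", "DALYs_70+")

# Build Total_DALYs_Rate ~ `DALYs_<_5` + ... once; backticks protect the names
age_formula <- as.formula(
  paste("Total_DALYs_Rate ~", paste0("`", age_cols, "`", collapse = " + "))
)

# Fit the same model to every data frame in a named list and tabulate
# the adjusted R^2 and the number of rows actually used (after NA removal)
compare_age_models <- function(countries) {
  fits <- lapply(countries, function(df) lm(age_formula, data = df))
  data.frame(
    country   = names(fits),
    adj_r2    = sapply(fits, function(f) summary(f)$adj.r.squared),
    n_used    = sapply(fits, nobs),
    row.names = NULL
  )
}
```

Called as compare_age_models(list(Canada = canada, United_Kingdom = united_kingdom, United_States = united_states)), it should return the adjusted \(R^2\) values reported above (.9952, .9953, and .9526) in one table, which makes the similarity between the countries easier to see.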

Conclusion

The data analyzed above allow us to draw several conclusions and to posit some hypotheses about how the outcome of the U.S. healthcare system compares with other types of systems.

Observations

Despite spending much more on health care, the United States has a significantly higher DALYs rate than the United Kingdom and Canada.

A comparison of the rate of DALYs per 100,000 people over time in each of the three countries shows an improving trend (with a slight hiccup in the two most recent data years), but the United States’ numbers are consistently higher than the UK’s and Canada’s. The summary statistics for the DALYs rate in each country bear this out. The United States’ mean DALY rate is 25,283 per 100,000 people per year, with a minimum of 23,651 and a maximum of 27,651. Canada’s and the UK’s rates are lower: Canada’s mean is 20,991 and the UK’s mean is 22,285 per 100,000 people per year, and their minimums and maximums are also lower than the United States’. An Analysis of Variance (ANOVA) test indicated that the difference is statistically significant, with a p-value of 4.75e-16. A boxplot of the results also showed very little overlap between the United States and the other two countries.
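The exact ANOVA call used for this test is not shown in this section. One way to run a one-way ANOVA of this kind in R is to stack the three countries’ DALY rates into a single long table with a country factor and pass it to aov(). anova_by_country() is a hypothetical helper, and the default column name Total_DALYs_Rate is taken from the regression models above.

```r
# Stack one column from each country's data frame into long format and
# fit a one-way ANOVA of that column on country.
anova_by_country <- function(canada, united_kingdom, united_states,
                             col = "Total_DALYs_Rate") {
  long <- rbind(
    data.frame(country = "Canada",         rate = canada[[col]]),
    data.frame(country = "United Kingdom", rate = united_kingdom[[col]]),
    data.frame(country = "United States",  rate = united_states[[col]])
  )
  long <- long[!is.na(long$rate), ]     # drop years with no DALY estimate
  long$country <- factor(long$country)
  aov(rate ~ country, data = long)      # use summary() on the result for the F test
}
```

With the document’s data frames, summary(anova_by_country(canada, united_kingdom, united_states)) prints the F test. Because ANOVA only indicates that at least one country mean differs, TukeyHSD() on the same fit would show which pairs of countries differ.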

The analysis also showed that the United States, Canada, and the United Kingdom have all increased their per capita health care spending over the same period in which their DALY rates declined. The United States, however, consistently spends more than the other two countries. Its mean annual health care spending per capita is 6,485 USD, with a minimum of 3,788 USD and a maximum of 9,403 USD. In contrast, Canada spends a mean of 3,517 USD per capita per year and the UK only 2,759 USD. As with the DALYs, the UK’s and Canada’s minimums and maximums are also lower than the United States’. An ANOVA test showed that these differences were statistically significant, with a p-value of 1.6e-10.
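Putting the two sets of means side by side makes the contrast concrete. The arithmetic below uses only the means reported in this section and assumes they cover comparable periods. It shows that the United States spends roughly 1.84 times as much per capita as Canada and 2.35 times as much as the UK, while its mean DALY rate is about 20% and 13% higher, respectively.

```r
# Means reported above: DALYs per 100,000 people per year, and USD per capita per year
daly_mean  <- c(Canada = 20991, United_Kingdom = 22285, United_States = 25283)
spend_mean <- c(Canada = 3517,  United_Kingdom = 2759,  United_States = 6485)

peers <- c("Canada", "United_Kingdom")

# US value divided by each peer's value (result is named by the peer)
spend_ratio <- spend_mean[["United_States"]] / spend_mean[peers]
daly_ratio  <- daly_mean[["United_States"]]  / daly_mean[peers]

round(spend_ratio, 2)             # Canada 1.84, United_Kingdom 2.35
round(100 * (daly_ratio - 1), 1)  # % higher US DALY rate: Canada 20.4, United_Kingdom 13.5
```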

An exploration of the distribution of the causes of the DALYs and the ages contributing to the DALYs did not yield any surprises. As expected in rich, developed countries, the leading causes of mortality and morbidity were non-communicable diseases. Similarly, the young in all three countries seemed to suffer fewer DALYs.

Because DALYs are already derived statistics rather than raw data, it is impossible to use them to create explanatory models. That said, multiple linear regression can indicate which factors have the biggest impact on the total DALY rate. In Canada and the United States, non-communicable diseases and injuries have the biggest impact on the total DALY rate. In the United Kingdom, communicable, maternal, neonatal, and nutritional diseases have the biggest effect. In all three countries, DALYs among children aged 14 and under had the greatest influence on the total DALY rate.

Implications

Money (in the form of health care spending) does not seem to be the key to improving health outcomes. A linear regression analysis suggests that as Canada spends more on health care, it achieves a better outcome of fewer DALYs. In contrast, and surprisingly, as the United States and United Kingdom spend more, they actually have more DALYs!
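The regression behind this comparison is not shown in this section. A minimal way to get a per-country slope of the total DALYs rate on spending is sketched below. spending_slopes() is a hypothetical helper, and because the name of the spending column does not appear here, it is passed in as an argument rather than guessed (it must be a syntactically valid R name for reformulate() to accept it). A slope of this kind describes how the two series move together over time within one country, not the causal effect of spending.

```r
# For each data frame in a named list, regress the outcome on the spending
# column and return the slope and its p-value, one row per country.
spending_slopes <- function(countries, spending_col,
                            outcome_col = "Total_DALYs_Rate") {
  rows <- lapply(names(countries), function(nm) {
    fit  <- lm(reformulate(spending_col, response = outcome_col),
               data = countries[[nm]])
    ctab <- coef(summary(fit))  # row 2 is the spending term
    data.frame(country = nm,
               slope   = ctab[2, "Estimate"],
               p_value = ctab[2, "Pr(>|t|)"])
  })
  do.call(rbind, rows)
}
```

It would be called with the three country data frames in a named list, for example list(Canada = canada, United_Kingdom = united_kingdom, United_States = united_states), together with the actual name of the spending column; a negative slope corresponds to the Canadian pattern described above and a positive slope to the United States and United Kingdom pattern.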

What does that mean? It is difficult to say based upon these observational data. There could be lurking variables that explain the different health outcomes in the three countries. For example, people in the UK or Canada may get more exercise than their American counterparts, or Americans could have other habits (smoking? driving?) that negatively affect their health. We also do not know what kind of health expenses are counted in the data. Do people in the United States seek health care for conditions that people in other countries do not, and are they seeking treatment for things that do not affect morbidity and mortality (like cosmetic surgery)? Does the United States’ health system treat conditions that other systems do not try to treat (because of things like cost and rarity of the disease)?
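A small simulation illustrates how a lurking variable can produce this kind of pattern. The data below are entirely simulated and are not drawn from the DALY or spending data: a hypothetical variable z (standing in for something like a health habit) drives both spending x and the outcome y, while x has no direct effect on y at all.

```r
set.seed(42)
n <- 200
z <- rnorm(n)              # simulated lurking variable (e.g. a health habit)
x <- 2 * z + rnorm(n)      # simulated "spending", partly driven by z
y <- 5 * z + rnorm(n)      # simulated outcome, driven by z only

# Regressing y on x alone attributes z's effect to x
naive <- coef(lm(y ~ x))["x"]

# Adding z to the model recovers the true direct effect of x, which is zero
adjusted <- coef(lm(y ~ x + z))["x"]

c(naive = unname(naive), adjusted = unname(adjusted))
```

In this setup the regression of y on x alone returns a slope near 2 (the covariance 10 divided by the variance of x, 5) even though x has no direct effect; once z is included, the slope on x falls to about zero. Without measurements of such variables, a spending slope cannot distinguish between these situations.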

These data are fascinating and especially valuable for descriptive statistics. Because they are already derived statistics, however, they make it more difficult to explore relationships between variables. It would be nice to know things like the ages of people who become seriously ill and/or die; the causes of those deaths and illnesses; and the amount being spent per sector of health care. The methodology for deriving the DALYs also raises unanswered questions. Relying upon published articles for the data probably leaves out a great deal of information on healthy people. (I do not imagine a great many medical journals publish articles on healthy people who stay healthy.) These data do, however, provide a useful starting point for exploring the topic and developing more questions to answer in order to determine how the outcome of the U.S. health care system compares with other types of systems.

Bibliography