HCAHPS Patient Survey Results by State

HCAHPS are patient surveys conducted independently after patients are hospitalized. In this dataset, they are reported by state. I’ll start by converting the .csv to dataframe. I also added a command to change “Not Available” to NA instead and ensure that the percent column is numeric. I also removed the three leftmost columns as these contained no new information – survey results are from the same timeframe between August 2018 through March of 2019.

library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(stringr)

HCAHPS_orig <- read.csv("https://raw.githubusercontent.com/hillt5/DATA607_Project2/master/Patient_survey__HCAHPS__-_State.csv") %>%
  na_if("Not Available")
HCAHPS_orig$HCAHPS.Answer.Percent <- as.numeric(HCAHPS_orig$HCAHPS.Answer.Percent)
HCAHPS_orig <- HCAHPS_orig[-c(6,7,8,9)]
head(HCAHPS_orig)
##   State
## 1    AK
## 2    AK
## 3    AK
## 4    AK
## 5    AK
## 6    AK
##                                                                                          HCAHPS.Question
## 1                                     Patients who reported that their nurses "Always" communicated well
## 2                       Patients who reported that their nurses "Sometimes" or "Never" communicated well
## 3                                    Patients who reported that their nurses "Usually" communicated well
## 4                Patients who reported that their nurses "Always" treated them with courtesy and respect
## 5 Patients who reported that their nurses "Sometimes" or "Never"  treated them with courtesy and respect
## 6              Patients who reported that their nurses "Usually"  treated them with courtesy and respect
##      HCAHPS.Measure.ID
## 1         H_COMP_1_A_P
## 2        H_COMP_1_SN_P
## 3         H_COMP_1_U_P
## 4  H_NURSE_RESPECT_A_P
## 5 H_NURSE_RESPECT_SN_P
## 6  H_NURSE_RESPECT_U_P
##                                               HCAHPS.Answer.Description
## 1                                     Nurses "always" communicated well
## 2                       Nurses "sometimes" or "never" communicated well
## 3                                    Nurses "usually" communicated well
## 4               Nurses "always" treated them with courtesy and  respect
## 5 Nurses "sometimes" or "never" treated them with courtesy and  respect
## 6             Nurses "usually"  treated them with courtesy and  respect
##   HCAHPS.Answer.Percent
## 1                    77
## 2                    34
## 3                     9
## 4                    85
## 5                    12
## 6                     4

As expected, results are reported by state/territory, then by ordinal response to HCAHPS question - ‘always, usually, sometimes-never’ or ‘Strongly agree, agree…’- in addition to answer percent. There is also a column with measure ID’s, which appear to characterize the question being asked in addition to the response. This column may be important for comparing reponses.

colnames((HCAHPS_orig))
## [1] "State"                     "HCAHPS.Question"          
## [3] "HCAHPS.Measure.ID"         "HCAHPS.Answer.Description"
## [5] "HCAHPS.Answer.Percent"
sort(unique(HCAHPS_orig$HCAHPS.Measure.ID))
##  [1] H_BATH_HELP_A_P       H_BATH_HELP_SN_P      H_BATH_HELP_U_P      
##  [4] H_CALL_BUTTON_A_P     H_CALL_BUTTON_SN_P    H_CALL_BUTTON_U_P    
##  [7] H_CLEAN_HSP_A_P       H_CLEAN_HSP_SN_P      H_CLEAN_HSP_U_P      
## [10] H_COMP_1_A_P          H_COMP_1_SN_P         H_COMP_1_U_P         
## [13] H_COMP_2_A_P          H_COMP_2_SN_P         H_COMP_2_U_P         
## [16] H_COMP_3_A_P          H_COMP_3_SN_P         H_COMP_3_U_P         
## [19] H_COMP_5_A_P          H_COMP_5_SN_P         H_COMP_5_U_P         
## [22] H_COMP_6_N_P          H_COMP_6_Y_P          H_COMP_7_A           
## [25] H_COMP_7_D_SD         H_COMP_7_SA           H_CT_MED_A           
## [28] H_CT_MED_D_SD         H_CT_MED_SA           H_CT_PREFER_A        
## [31] H_CT_PREFER_D_SD      H_CT_PREFER_SA        H_CT_UNDER_A         
## [34] H_CT_UNDER_D_SD       H_CT_UNDER_SA         H_DISCH_HELP_N_P     
## [37] H_DISCH_HELP_Y_P      H_DOCTOR_EXPLAIN_A_P  H_DOCTOR_EXPLAIN_SN_P
## [40] H_DOCTOR_EXPLAIN_U_P  H_DOCTOR_LISTEN_A_P   H_DOCTOR_LISTEN_SN_P 
## [43] H_DOCTOR_LISTEN_U_P   H_DOCTOR_RESPECT_A_P  H_DOCTOR_RESPECT_SN_P
## [46] H_DOCTOR_RESPECT_U_P  H_HSP_RATING_0_6      H_HSP_RATING_7_8     
## [49] H_HSP_RATING_9_10     H_MED_FOR_A_P         H_MED_FOR_SN_P       
## [52] H_MED_FOR_U_P         H_NURSE_EXPLAIN_A_P   H_NURSE_EXPLAIN_SN_P 
## [55] H_NURSE_EXPLAIN_U_P   H_NURSE_LISTEN_A_P    H_NURSE_LISTEN_SN_P  
## [58] H_NURSE_LISTEN_U_P    H_NURSE_RESPECT_A_P   H_NURSE_RESPECT_SN_P 
## [61] H_NURSE_RESPECT_U_P   H_QUIET_HSP_A_P       H_QUIET_HSP_SN_P     
## [64] H_QUIET_HSP_U_P       H_RECMND_DN           H_RECMND_DY          
## [67] H_RECMND_PY           H_SIDE_EFFECTS_A_P    H_SIDE_EFFECTS_SN_P  
## [70] H_SIDE_EFFECTS_U_P    H_SYMPTOMS_N_P        H_SYMPTOMS_Y_P       
## 72 Levels: H_BATH_HELP_A_P H_BATH_HELP_SN_P ... H_SYMPTOMS_Y_P

After sorting through all of the values for ID, there are 72 unique entries, with 3 values per question for the ratings. The ‘COMP’ values also have a number attached to them which appear to specify the staff member in question - nurse, doctor, staff. A few variables I’m particularly interested in looking at are ‘clean’ and ‘quiet’ responses, as I can say anecdotally this is something that every hospital I’ve worked at has monitored. There are also some yes/no questions about staff members communicating specific things, like information about side effects, what to expect after discharge, and whether the patient would recommend the hospital. Finally, there is a 1-10 rating of the hospital overall.

Looking at Favorable Ratings by State

I’ll start by looking at favorability ratings by state. I’ll start by looking at 1-10 rating, with good favorability being scores of 9-10.

HCAHPS_state_fav910 <- drop_na(HCAHPS_orig) %>%
  select(State, HCAHPS.Measure.ID, HCAHPS.Answer.Percent) %>%
  filter(HCAHPS.Measure.ID == "H_HSP_RATING_9_10")%>%
  spread(HCAHPS.Measure.ID, HCAHPS.Answer.Percent) %>%
  group_by(State) %>%
  arrange(desc(H_HSP_RATING_9_10))

head(HCAHPS_state_fav910,10)
## # A tibble: 10 x 2
## # Groups:   State [10]
##    State H_HSP_RATING_9_10
##    <fct>             <dbl>
##  1 KS                   77
##  2 NE                   77
##  3 SD                   77
##  4 IA                   76
##  5 MN                   76
##  6 WI                   76
##  7 ID                   75
##  8 LA                   75
##  9 ND                   75
## 10 CO                   74
tail(HCAHPS_state_fav910)
## # A tibble: 6 x 2
## # Groups:   State [6]
##   State H_HSP_RATING_9_10
##   <fct>             <dbl>
## 1 NV                   65
## 2 MD                   63
## 3 NJ                   63
## 4 NY                   63
## 5 PR                   60
## 6 DC                   58
summary(HCAHPS_state_fav910$H_HSP_RATING_9_10)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   58.00   69.00   70.00   70.25   73.00   77.00
hist(HCAHPS_state_fav910$H_HSP_RATING_9_10)

States/territories reporting favorability data had at least of 58% of their hospitals rated as 9 or 10. Almost all of the top performers are the Midwestern states, with one exception of Louisiana. The high favorability dips in more densely populated areas, like Washington DC/ Maryland and New York/New Jersey. Puerto Rico is also noteworthy as a territory who may be still recovering from a series of devastating hurricaines in 2017. Overall, results indicate that over half of patients surveyed are giving the hospitals nearly perfect ratings. To verify this, I’m going to see if people would recommend the hospital, and also see if their are inverse patterns for poor ratings - less than 7/10 on a scale of 10.

HCAHPS_state_fav06 <- drop_na(HCAHPS_orig) %>%
  select(State, HCAHPS.Measure.ID, HCAHPS.Answer.Percent) %>%
  filter(HCAHPS.Measure.ID == "H_HSP_RATING_0_6")%>%
  spread(HCAHPS.Measure.ID, HCAHPS.Answer.Percent) %>%
  group_by(State) %>%
  arrange(desc(H_HSP_RATING_0_6))

tail(HCAHPS_state_fav06, 10)
## # A tibble: 10 x 2
## # Groups:   State [10]
##    State H_HSP_RATING_0_6
##    <fct>            <dbl>
##  1 WI                  45
##  2 IA                  34
##  3 DC                   7
##  4 PR                   5
##  5 NJ                   3
##  6 CT                   2
##  7 FL                   2
##  8 MD                   2
##  9 NV                   2
## 10 NY                   2
head(HCAHPS_state_fav06)
## # A tibble: 6 x 2
## # Groups:   State [6]
##   State H_HSP_RATING_0_6
##   <fct>            <dbl>
## 1 AZ                  89
## 2 CA                  89
## 3 GA                  89
## 4 NM                  89
## 5 AL                  78
## 6 DE                  78
summary(HCAHPS_state_fav06$H_HSP_RATING_0_6)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     2.0    45.0    67.0    57.6    78.0    89.0
hist(HCAHPS_state_fav06$H_HSP_RATING_0_6)

Conversely, there appears to be a lot more variability when scores 0-6 are compared on a state-by-state basis. Looking at the histogram, a handful of states tend to have low 0-6 ratings. Looking at the tail, these are generally more densely populated states, in addition to Florida and Nevada, two states attractive to retirees. States with persistently low hospital ratings include Southwestern states Arizona, New Mexico, and California, as well as Southern states Alabama and Georgia. Southwestern states like NM and AZ in particular have higher Native American populations serviced by their own federal health service - IHS or Indian Health Service. As a final step to gauging favorability, I will omit the cluster of low scores from the 0-6 ratings, and perform the same summaries on another revealing question – whether the patient would NOT recommend the hospital.

HCAHPS_state_fav06_norm <- HCAHPS_state_fav06 %>%
  filter(H_HSP_RATING_0_6 > 10)
  
tail(HCAHPS_state_fav06_norm, 10)
## # A tibble: 10 x 2
## # Groups:   State [10]
##    State H_HSP_RATING_0_6
##    <fct>            <dbl>
##  1 OR                  56
##  2 VT                  56
##  3 KS                  45
##  4 MN                  45
##  5 ND                  45
##  6 NE                  45
##  7 SD                  45
##  8 UT                  45
##  9 WI                  45
## 10 IA                  34
summary(HCAHPS_state_fav06_norm$H_HSP_RATING_0_6)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    34.0    56.0    67.0    67.5    78.0    89.0
hist(HCAHPS_state_fav06_norm$H_HSP_RATING_0_6)

Omitting the outliers, this generally matches up with the highest-rated states mentioned previously - generally Midwest and Western states.

HCAHPS_state_bad_rec <- drop_na(HCAHPS_orig) %>%
  select(State, HCAHPS.Measure.ID, HCAHPS.Answer.Percent) %>%
  filter(HCAHPS.Measure.ID == "H_RECMND_DN")%>%
  spread(HCAHPS.Measure.ID, HCAHPS.Answer.Percent) %>%
  group_by(State) %>%
  arrange(H_RECMND_DN)

head(HCAHPS_state_bad_rec,10)
## # A tibble: 10 x 2
## # Groups:   State [10]
##    State H_RECMND_DN
##    <fct>       <dbl>
##  1 DC              3
##  2 IA             12
##  3 NE             12
##  4 AK             23
##  5 ID             23
##  6 KS             23
##  7 MN             23
##  8 ND             23
##  9 SD             23
## 10 WI             23
tail(HCAHPS_state_bad_rec)
## # A tibble: 6 x 2
## # Groups:   State [6]
##   State H_RECMND_DN
##   <fct>       <dbl>
## 1 FL             67
## 2 MD             67
## 3 NJ             67
## 4 NY             67
## 5 NV             89
## 6 PR             89
summary(HCAHPS_state_bad_rec$H_RECMND_DN)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.00   34.00   45.00   42.29   56.00   89.00
hist(HCAHPS_state_bad_rec$H_RECMND_DN)

HCAHPS_state_bad_rec_norm <- HCAHPS_state_bad_rec %>%
  filter(H_RECMND_DN > 10 && H_RECMND_DN < 89)
  
head(HCAHPS_state_bad_rec_norm,10)
## # A tibble: 10 x 2
## # Groups:   State [10]
##    State H_RECMND_DN
##    <fct>       <dbl>
##  1 IA             12
##  2 NE             12
##  3 AK             23
##  4 ID             23
##  5 KS             23
##  6 MN             23
##  7 ND             23
##  8 SD             23
##  9 WI             23
## 10 WY             23
tail(HCAHPS_state_bad_rec_norm)
## # A tibble: 6 x 2
## # Groups:   State [6]
##   State H_RECMND_DN
##   <fct>       <dbl>
## 1 SC             56
## 2 TN             56
## 3 FL             67
## 4 MD             67
## 5 NJ             67
## 6 NY             67
summary(HCAHPS_state_bad_rec_norm$H_RECMND_DN)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.00   34.00   45.00   41.18   45.00   67.00
hist(HCAHPS_state_bad_rec_norm$H_RECMND_DN)

Looking at the negative recommendation survey question, this mostly agrees with states having low favorability ratings. There were a few very high and very low values that I omitted, but it didn’t meaningfully change the other parameters. Overall, I feel safe to conclude that patients think favorably of more rural, Midwestern states and less favorably of urban hospitals. However, there also exist other important exceptions hinted at in the ratings, including states with large retiree populations, native populations, or areas recently affected by natural disasters.

Search for an Explanatory Variable: Cleanliness and Noisiness of Hospital Stay

As mentioned earlier, two explanatory variables that may underly favorability are the cleanliness and noisiness of the hospital environment. These are particularly interesting factors as they are tied toward the performance of support staff such as custodians and nursing assistants to respond to the non-medical issues of a patient’s hospital course. I also suspect they are tied to the turnover of a hospital - busier hospitals are louder spaces and require increased attention to keeping floors clean, call buttons answered, etc.

I will start by looking at cleanliness results:

HCAHPS_state_wide <- drop_na(HCAHPS_orig) %>%
  select(State, HCAHPS.Measure.ID, HCAHPS.Answer.Percent) %>%
  spread(HCAHPS.Measure.ID, HCAHPS.Answer.Percent)
         
HCAHPS_state_cleanliness <- HCAHPS_state_wide %>%
  select(State, H_CLEAN_HSP_A_P, H_CLEAN_HSP_U_P) %>%
  group_by(State) %>%
  mutate(Pct_Always_Usually_Clean = as.numeric(sum(H_CLEAN_HSP_A_P, H_CLEAN_HSP_U_P))) %>%
  arrange(desc(Pct_Always_Usually_Clean))

HCAHPS_state_cleanliness_bad <- HCAHPS_state_wide %>%
  select(State, H_CLEAN_HSP_SN_P) %>%
  group_by(State) %>%
  arrange(desc(H_CLEAN_HSP_SN_P))

head(HCAHPS_state_cleanliness,10)
## # A tibble: 10 x 4
## # Groups:   State [10]
##    State H_CLEAN_HSP_A_P H_CLEAN_HSP_U_P Pct_Always_Usually_Clean
##    <fct>           <dbl>           <dbl>                    <dbl>
##  1 IA                 81               6                       87
##  2 NE                 82               5                       87
##  3 ID                 79               7                       86
##  4 KS                 80               6                       86
##  5 MN                 79               7                       86
##  6 SD                 80               6                       86
##  7 WI                 79               7                       86
##  8 ME                 79               6                       85
##  9 UT                 75              10                       85
## 10 HI                 76               8                       84
tail(HCAHPS_state_cleanliness)
## # A tibble: 6 x 4
## # Groups:   State [6]
##   State H_CLEAN_HSP_A_P H_CLEAN_HSP_U_P Pct_Always_Usually_Clean
##   <fct>           <dbl>           <dbl>                    <dbl>
## 1 NC                 71              10                       81
## 2 NJ                 68              13                       81
## 3 NV                 72               9                       81
## 4 FL                 69              11                       80
## 5 SC                 70              10                       80
## 6 DC                 62              13                       75
summary(HCAHPS_state_cleanliness$Pct_Always_Usually_Clean)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   75.00   82.00   83.00   82.88   84.00   87.00
hist(HCAHPS_state_cleanliness$Pct_Always_Usually_Clean)

head(HCAHPS_state_cleanliness_bad,10)
## # A tibble: 10 x 2
## # Groups:   State [10]
##    State H_CLEAN_HSP_SN_P
##    <fct>            <dbl>
##  1 AZ                  89
##  2 DE                  89
##  3 GA                  89
##  4 NC                  89
##  5 NV                  89
##  6 NY                  89
##  7 AL                  78
##  8 CA                  78
##  9 CT                  78
## 10 KY                  78
tail(HCAHPS_state_cleanliness_bad)
## # A tibble: 6 x 2
## # Groups:   State [6]
##   State H_CLEAN_HSP_SN_P
##   <fct>            <dbl>
## 1 NE                  34
## 2 DC                   7
## 3 FL                   2
## 4 MD                   2
## 5 NJ                   2
## 6 SC                   2

Scores for cleanliness are overall very high with a minimum 75% of surveys reporting usually or always clean environment. The results are highly concentrated between 82-84%, which encompasses half of the states.

HCAHPS_state_noisiness <- HCAHPS_state_wide %>%
  select(State, H_QUIET_HSP_A_P, H_QUIET_HSP_U_P) %>%
  group_by(State) %>%
  mutate(Pct_Always_Usually_Quiet = as.numeric(sum(H_QUIET_HSP_A_P, H_QUIET_HSP_U_P))) %>%
  arrange(desc(Pct_Always_Usually_Quiet))

HCAHPS_state_noisiness_bad <- HCAHPS_state_wide %>%
  select(State, H_QUIET_HSP_SN_P) %>%
  group_by(State) %>%
  arrange(desc(H_QUIET_HSP_SN_P))

head(HCAHPS_state_noisiness,10)
## # A tibble: 10 x 4
## # Groups:   State [10]
##    State H_QUIET_HSP_A_P H_QUIET_HSP_U_P Pct_Always_Usually_Quiet
##    <fct>           <dbl>           <dbl>                    <dbl>
##  1 ND                 69              17                       86
##  2 NE                 69              17                       86
##  3 LA                 72              13                       85
##  4 SD                 68              17                       85
##  5 AL                 68              16                       84
##  6 IA                 64              20                       84
##  7 KS                 65              19                       84
##  8 MN                 66              18                       84
##  9 MS                 71              13                       84
## 10 WI                 63              21                       84
tail(HCAHPS_state_noisiness)
## # A tibble: 6 x 4
## # Groups:   State [6]
##   State H_QUIET_HSP_A_P H_QUIET_HSP_U_P Pct_Always_Usually_Quiet
##   <fct>           <dbl>           <dbl>                    <dbl>
## 1 CT                 49              26                       75
## 2 MA                 47              28                       75
## 3 NJ                 49              26                       75
## 4 NY                 48              27                       75
## 5 CA                 47              27                       74
## 6 DC                 52              20                       72
summary(HCAHPS_state_noisiness$Pct_Always_Usually_Quiet)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   72.00   78.00   81.00   80.46   83.00   86.00
hist(HCAHPS_state_noisiness$Pct_Always_Usually_Quiet)

head(HCAHPS_state_noisiness_bad, 10)
## # A tibble: 10 x 2
## # Groups:   State [10]
##    State H_QUIET_HSP_SN_P
##    <fct>            <dbl>
##  1 IL                  89
##  2 IN                  89
##  3 MO                  89
##  4 NC                  89
##  5 OH                  89
##  6 SC                  89
##  7 WY                  89
##  8 AK                  78
##  9 AR                  78
## 10 GA                  78
tail(HCAHPS_state_noisiness_bad)
## # A tibble: 6 x 2
## # Groups:   State [6]
##   State H_QUIET_HSP_SN_P
##   <fct>            <dbl>
## 1 PA                   3
## 2 WV                   3
## 3 MI                   2
## 4 NM                   2
## 5 OR                   2
## 6 VA                   2

Again, these perecents are overwhelmingly favorable but some of the top favorable states do appear to be the quietests. As one last chance to see any correlation between favorability and noisiness, I’ll plot favorability versus quiet.

library(ggplot2)
HCAHPS_fav_vs_noisiness <- HCAHPS_state_wide %>%
  select(State, H_HSP_RATING_9_10, H_QUIET_HSP_A_P, H_QUIET_HSP_U_P) %>%
  group_by(State) %>%
  mutate(Pct_Always_Usually_Quiet = as.numeric(sum(H_QUIET_HSP_A_P, H_QUIET_HSP_U_P))) %>%
  ggplot(aes(x = Pct_Always_Usually_Quiet, y = H_HSP_RATING_9_10)) + geom_point()

HCAHPS_fav_vs_noisiness

Conclusions

There appears to be a loose positive correlation between how quiet a hospital is during a patient’s stay and a consequent high rating. For the purposes of this project I limited myself to investigating the relationship between favorability and two plausible explanatory variables. Alternatively, I could have taken a more comprehensive exploration of HCAHPS scores by initially using ggplot and the function facet_wrap to compare all variables to favorable survey results. This would have generated dozens of scatterplots and the most highly correlated plots could have been investigated further.


Under 5 Mortality by Country, Year

Mortality for children under ages 5 is a noteworthy statistic as it is historically a driver of increasing life expectancy and indicates transition to a more developed society. In this data set, the under 5 mortality rate is given as the rate per 1,000 live births.

under_5_mort_orig <- read.csv("https://raw.githubusercontent.com/hillt5/DATA607_Project2/master/unicef-u5mr.csv")
head(under_5_mort_orig)
##         CountryName U5MR.1950 U5MR.1951 U5MR.1952 U5MR.1953 U5MR.1954 U5MR.1955
## 1       Afghanistan        NA        NA        NA        NA        NA        NA
## 2           Albania        NA        NA        NA        NA        NA        NA
## 3           Algeria        NA        NA        NA        NA       251     249.9
## 4           Andorra        NA        NA        NA        NA        NA        NA
## 5            Angola        NA        NA        NA        NA        NA        NA
## 6 Antigua & Barbuda        NA        NA        NA        NA        NA        NA
##   U5MR.1956 U5MR.1957 U5MR.1958 U5MR.1959 U5MR.1960 U5MR.1961 U5MR.1962
## 1        NA        NA        NA        NA        NA     356.5     350.6
## 2        NA        NA        NA        NA        NA        NA        NA
## 3       249       248     247.5     246.7     246.3     246.1     246.2
## 4        NA        NA        NA        NA        NA        NA        NA
## 5        NA        NA        NA        NA        NA        NA        NA
## 6        NA        NA        NA        NA        NA        NA        NA
##   U5MR.1963 U5MR.1964 U5MR.1965 U5MR.1966 U5MR.1967 U5MR.1968 U5MR.1969
## 1     345.0     339.7     334.1     328.7     323.3     318.1     313.0
## 2        NA        NA        NA        NA        NA        NA        NA
## 3     246.8     247.4     248.2     248.7     248.4     247.4     245.3
## 4        NA        NA        NA        NA        NA        NA        NA
## 5        NA        NA        NA        NA        NA        NA        NA
## 6        NA        NA        NA        NA        NA        NA        NA
##   U5MR.1970 U5MR.1971 U5MR.1972 U5MR.1973 U5MR.1974 U5MR.1975 U5MR.1976
## 1     307.8     302.1     296.4     290.8     284.9     279.4     273.6
## 2        NA        NA        NA        NA        NA        NA        NA
## 3     241.7     236.5     230.0     222.5     214.2     205.0     195.2
## 4        NA        NA        NA        NA        NA        NA        NA
## 5        NA        NA        NA        NA        NA        NA        NA
## 6        NA        NA        NA        NA        NA        NA        NA
##   U5MR.1977 U5MR.1978 U5MR.1979 U5MR.1980 U5MR.1981 U5MR.1982 U5MR.1983
## 1     267.8     261.6     255.5     249.1     242.7     236.2     229.7
## 2        NA      91.1      84.7      78.6      73.0      67.8      62.8
## 3     184.9     173.8     161.8     148.1     132.5     115.8      99.2
## 4        NA        NA        NA        NA        NA        NA        NA
## 5        NA        NA        NA     234.1     232.8     231.5     230.2
## 6        NA        NA        NA        NA        NA        NA        NA
##   U5MR.1984 U5MR.1985 U5MR.1986 U5MR.1987 U5MR.1988 U5MR.1989 U5MR.1990
## 1     222.9     216.0     209.2     202.1     195.0     187.8     181.0
## 2      58.3      54.3      50.7      47.6      44.9      42.5      40.6
## 3      83.8      71.2      61.9      55.4      51.2      48.5      46.8
## 4        NA        NA        NA        NA        NA        NA       8.5
## 5     229.1     228.3     227.5     226.9     226.5     226.2     226.0
## 6        NA        NA        NA        NA        NA        NA      25.5
##   U5MR.1991 U5MR.1992 U5MR.1993 U5MR.1994 U5MR.1995 U5MR.1996 U5MR.1997
## 1     174.2     167.8     162.0     156.8     152.3     148.6     145.5
## 2      38.8      37.3      36.0      34.6      33.2      31.8      30.3
## 3      45.7      44.9      44.1      43.3      42.5      41.8      41.1
## 4       7.9       7.4       6.9       6.4       6.0       5.7       5.3
## 5     225.9     226.0     225.8     225.5     224.8     224.0     222.6
## 6      24.2      23.1      21.9      20.8      19.7      18.8      17.9
##   U5MR.1998 U5MR.1999 U5MR.2000 U5MR.2001 U5MR.2002 U5MR.2003 U5MR.2004
## 1     142.6     139.9     137.0     133.8     130.3     126.8     123.2
## 2      28.9      27.5      26.2      24.9      23.6      22.5      21.5
## 3      40.6      40.2      39.7      38.9      37.8      36.5      35.1
## 4       5.0       4.8       4.6       4.4       4.2       4.1       4.0
## 5     220.8     218.9     216.7     214.1     211.7     209.2     206.7
## 6      17.0      16.2      15.5      14.8      14.1      13.5      12.9
##   U5MR.2005 U5MR.2006 U5MR.2007 U5MR.2008 U5MR.2009 U5MR.2010 U5MR.2011
## 1     119.6     116.3     113.2     110.4     107.6     105.0     102.3
## 2      20.5      19.5      18.7      17.9      17.3      16.6      16.0
## 3      33.6      32.1      30.7      29.4      28.3      27.3      26.6
## 4       3.9       3.7       3.6       3.5       3.4       3.3       3.2
## 5     203.9     200.5     196.4     192.0     187.3     182.5     177.3
## 6      12.4      11.8      11.3      10.9      10.4       9.9       9.5
##   U5MR.2012 U5MR.2013 U5MR.2014 U5MR.2015
## 1      99.5      96.7      93.9      91.1
## 2      15.5      14.9      14.4      14.0
## 3      26.1      25.8      25.6      25.5
## 4       3.1       3.0       2.9       2.8
## 5     172.2     167.1     162.2     156.9
## 6       9.1       8.7       8.4       8.1

I’ll start by getting the data in a tall format. Judging from the head, there are also many missing values prior to the 1980’s.

u5m_tall <- under_5_mort_orig %>%
  gather(Year, Under_5_Mortality_Rate, U5MR.1950:U5MR.2015) %>%
  separate("Year", c(NA, "Year"), convert = TRUE)

head(u5m_tall)
##         CountryName Year Under_5_Mortality_Rate
## 1       Afghanistan 1950                     NA
## 2           Albania 1950                     NA
## 3           Algeria 1950                     NA
## 4           Andorra 1950                     NA
## 5            Angola 1950                     NA
## 6 Antigua & Barbuda 1950                     NA

Then I calculated the average under 5 mortality by year.

average_u5m_by_year <- u5m_tall %>%
group_by(Year) %>%
  summarize(mean_U5MR = mean(Under_5_Mortality_Rate, na.rm = TRUE))


ggplot(average_u5m_by_year, aes(x = Year, y = mean_U5MR )) + geom_point()

After some ups and downs in the late 1950’s, under five infant mortality has steadily decreased to an average value of 40 deaths per 1,000 live births. However, this number is likely to underestimate the true decrease in mortality rate as it is not population weighted. Countries like India and China have populations an order of magnitude larger than many other countries, so for future analysis I’ll make comparisons on a by-country basis.

North_America <- c("Canada", "Mexico", "United States of America")
u5m_plot <- u5m_tall %>%
  filter(CountryName == North_America) %>%
  ggplot( aes(x = Year, y = Under_5_Mortality_Rate, color = CountryName)) + geom_point()
u5m_plot
## Warning: Removed 4 rows containing missing values (geom_point).

Above I’ve looked at the three countries that comprise North America. All three countries experienced significant decreases in under 5 mortality rates, although Mexico experienced the most extreme decrease over the time period.

Conclusion

This data set is very useful to commpare a given country’s decrease in under 5 mortality over the appxorimately 65 year period from 1950 to 2015. Limitations include missing information with respect to mortality rates for many countries in the first two decades. In addition there is a lack of other demographic information like population that would add context to the absolute under 5 mortality as well as provide the necessary information for a worldwide mortality rate.


National Survey of Drug Use, Ages 12 +

library(rvest)
## Warning: package 'rvest' was built under R version 3.6.3
## Loading required package: xml2
url <- "https://www.drugabuse.gov/national-survey-drug-use-health"
drug_use_orig <- url %>% 
  xml2::read_html() %>%
  html_nodes(xpath='//*[@id="node-4661"]/div/div/div/div/div/table') %>%
  html_table(header = TRUE, fill = TRUE)
drug_use_orig <- as.data.frame(drug_use_orig)
head(drug_use_orig)
##                   Drug Time.Period Ages.12.or.Older Ages.12.or.Older.1
## 1                 Drug Time Period             2016               2017
## 2              Alcohol    Lifetime            80.20              80.90
## 3                        Past Year            64.80              65.70
## 4                       Past Month            50.70              51.70
## 5 Cigarettes (any use)    Lifetime            57.40              57.10
## 6                        Past Year            22.70              21.50
##   Ages.12.or.Older.2 Ages.12.to.17 Ages.12.to.17.1 Ages.12.to.17.2
## 1               2018          2016            2017            2018
## 2              80.80         27.00           27.10           26.30
## 3              65.50         21.60           21.90           20.80
## 4              51.10          9.20            9.90            9.00
## 5              55.70         11.60           10.80            9.60
## 6              21.00          7.20            6.30            5.50
##   Ages.18.to.25 Ages.18.to.25.1 Ages.18.to.25.2 Ages.26.or.Older
## 1          2016            2017            2018             2016
## 2         81.30           81.10           79.90            86.40
## 3         74.40           74.00           73.10            68.40
## 4         57.10           56.30           55.10            54.60
## 5         50.50           49.50           45.90            64.00
## 6         31.70           31.00           27.90            23.10
##   Ages.26.or.Older.1 Ages.26.or.Older.2
## 1               2017               2018
## 2              87.10              87.30
## 3              69.50              69.50
## 4              55.80              55.30
## 5              63.80              62.60
## 6              21.70              21.70
drug_use_orig <- drug_use_orig[-1,]
head(drug_use_orig)
##                   Drug Time.Period Ages.12.or.Older Ages.12.or.Older.1
## 2              Alcohol    Lifetime            80.20              80.90
## 3                        Past Year            64.80              65.70
## 4                       Past Month            50.70              51.70
## 5 Cigarettes (any use)    Lifetime            57.40              57.10
## 6                        Past Year            22.70              21.50
## 7                       Past Month            19.10              17.90
##   Ages.12.or.Older.2 Ages.12.to.17 Ages.12.to.17.1 Ages.12.to.17.2
## 2              80.80         27.00           27.10           26.30
## 3              65.50         21.60           21.90           20.80
## 4              51.10          9.20            9.90            9.00
## 5              55.70         11.60           10.80            9.60
## 6              21.00          7.20            6.30            5.50
## 7              17.20          3.40            3.20            2.70
##   Ages.18.to.25 Ages.18.to.25.1 Ages.18.to.25.2 Ages.26.or.Older
## 2         81.30           81.10           79.90            86.40
## 3         74.40           74.00           73.10            68.40
## 4         57.10           56.30           55.10            54.60
## 5         50.50           49.50           45.90            64.00
## 6         31.70           31.00           27.90            23.10
## 7         23.50           22.30           19.10            20.20
##   Ages.26.or.Older.1 Ages.26.or.Older.2
## 2              87.10              87.30
## 3              69.50              69.50
## 4              55.80              55.30
## 5              63.80              62.60
## 6              21.70              21.70
## 7              18.90              18.50

Above are survey results for 2016 - 2018 for drug, alcohol, and tobacco abuse for subjects ages 12 and older. According to authors, there were some methodological changes after 2016 that make it difficult to compare to previous years. Looking at the first few rows, there appears to be a subheader further distinguishing between the three years. Additionally, the original table has subheaders for each substance indicating rates of lifetime use, as well as use in the last month and year.

drug_abuse_tall_2016 <- drug_use_orig %>%
  select(Drug, Time.Period, Ages.12.or.Older, Ages.12.to.17, Ages.18.to.25, Ages.26.or.Older) %>%
  gather(Age_Group, "Percent Use, 2016", Ages.12.or.Older:Ages.26.or.Older)
drug_abuse_tall_2016$`Percent Use, 2016` <- as.numeric(drug_abuse_tall_2016$`Percent Use, 2016`)
## Warning: NAs introduced by coercion
drug_abuse_tall_2017 <- drug_use_orig %>%
  select(Drug, Time.Period, Ages.12.or.Older.1, Ages.12.to.17.1, Ages.18.to.25.1, Ages.26.or.Older.1) %>%
  gather(Age_Group, "Percent Use, 2017", Ages.12.or.Older.1:Ages.26.or.Older.1)
drug_abuse_tall_2017$`Percent Use, 2017` <- as.numeric(drug_abuse_tall_2017$`Percent Use, 2017`)
## Warning: NAs introduced by coercion
drug_abuse_tall_2018 <- drug_use_orig %>%
  select(Drug, Time.Period, Ages.12.or.Older.2, Ages.12.to.17.2, Ages.18.to.25.2, Ages.26.or.Older.2) %>%
  gather(Age_Group, "Percent Use, 2018", Ages.12.or.Older.2:Ages.26.or.Older.2)
drug_abuse_tall_2018$`Percent Use, 2018` <- as.numeric(drug_abuse_tall_2018$`Percent Use, 2018`)
## Warning: NAs introduced by coercion
drug_abuse_tall <- cbind(drug_abuse_tall_2016,"Percent Use, 2017" = drug_abuse_tall_2017$`Percent Use, 2017`,"Percent Use, 2018" = drug_abuse_tall_2018$`Percent Use, 2018`)
head(drug_abuse_tall)
##                   Drug Time.Period        Age_Group Percent Use, 2016
## 1              Alcohol    Lifetime Ages.12.or.Older              80.2
## 2                        Past Year Ages.12.or.Older              64.8
## 3                       Past Month Ages.12.or.Older              50.7
## 4 Cigarettes (any use)    Lifetime Ages.12.or.Older              57.4
## 5                        Past Year Ages.12.or.Older              22.7
## 6                       Past Month Ages.12.or.Older              19.1
##   Percent Use, 2017 Percent Use, 2018
## 1              80.9              80.8
## 2              65.7              65.5
## 3              51.7              51.1
## 4              57.1              55.7
## 5              21.5              21.0
## 6              17.9              17.2

After separating the original dataframe into three separate ones per year, I added the Percent Use columns into a single dataframe. Next, I’ll explore the Percent Lifetime Use for Illicit Drugs.

drug_abuse_lifetime <- drug_abuse_tall %>%
  filter(Time.Period == 'Lifetime', Drug == 'Illicit Drugs') %>%
  gather(Year, Percent, 'Percent Use, 2016':'Percent Use, 2018') %>%
  separate(Year, c(NA, "Year"), sep = ",", convert = TRUE)
drug_abuse_lifetime
##             Drug Time.Period        Age_Group Year Percent
## 1  Illicit Drugs    Lifetime Ages.12.or.Older 2016    48.5
## 2  Illicit Drugs    Lifetime    Ages.12.to.17 2016    23.0
## 3  Illicit Drugs    Lifetime    Ages.18.to.25 2016    56.3
## 4  Illicit Drugs    Lifetime Ages.26.or.Older 2016    50.2
## 5  Illicit Drugs    Lifetime Ages.12.or.Older 2017    49.5
## 6  Illicit Drugs    Lifetime    Ages.12.to.17 2017    23.9
## 7  Illicit Drugs    Lifetime    Ages.18.to.25 2017    57.0
## 8  Illicit Drugs    Lifetime Ages.26.or.Older 2017    51.3
## 9  Illicit Drugs    Lifetime Ages.12.or.Older 2018    49.2
## 10 Illicit Drugs    Lifetime    Ages.12.to.17 2018    23.9
## 11 Illicit Drugs    Lifetime    Ages.18.to.25 2018    55.6
## 12 Illicit Drugs    Lifetime Ages.26.or.Older 2018    51.2
library(ggplot2)
plot_drug_abuse <- drug_abuse_lifetime %>%
  ggplot(aes(x= Year, y = Percent)) + geom_bar(stat = "identity") +
  facet_wrap(~Age_Group)

plot_drug_abuse

Conclusion

I only had time to perform a preliminary look into drug abuse rates, but I was able to facet_wrap in order to see a comparison of the three age groups part of the survey. Illicit drug use remains lowest in children, with ages 18-25 most likely to report use. However, the biggest increase from 2016 to 2018 happened in the 12-17 group.