Problem Statement

We work for a public health agency tasked with identifying counties to target for environmental interventions to reduce acute asthma. Using environmental measures from CalEnviroScreen and rates of asthma emergency department (ED) visits, we will summarize the data into county-level measures. We are interested in county-level CalEnviroScreen (CES) scores, PM2.5 concentrations, and age-adjusted, asthma-related ED visit rates. Using these summarized data, we will create visualizations of the relationships among our variables of interest. These visualizations will allow us to select the environmental measure that best demonstrates a relationship with asthma ED visits and to use that measure to select counties to target for interventions.

Methods

We had access to three datasets for this project, which together provide information on environmental measures and asthma ED visits.

The first dataset, “calenviroscreen_measures_2021.csv”, contains environmental measures at the California census tract level for 2021. We first converted all column names to snake case for readability and consistency between datasets. We then renamed the column “california_county” to “county” and removed the word “county” from each value in that column. From this dataset we created a new variable, “county_level_pm25”, by grouping by county and taking the median PM2.5 concentration for each county.
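The cleaning and summarizing steps above can be sketched as follows. This is a minimal illustration in standard-library Python, not the R code used for the project; the raw column names and the PM2.5 values are hypothetical, and the helper functions are made up for the example.

```python
import re
from collections import defaultdict
from statistics import median

def to_snake_case(name):
    """Lower-case a column name and join its words with underscores."""
    words = re.findall(r"[A-Za-z0-9]+", name)
    return "_".join(w.lower() for w in words)

def clean_county(value):
    """Drop a trailing 'County' (any case) and surrounding whitespace."""
    return re.sub(r"\s*county\s*$", "", value.strip(), flags=re.IGNORECASE)

def county_median(rows, county_col, value_col):
    """Group rows by county and return {county: median of value_col}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[county_col]].append(float(row[value_col]))
    return {county: median(vals) for county, vals in groups.items()}

# Hypothetical tract-level rows; names and values are assumptions for illustration.
raw_rows = [
    {"California County": "Alameda County", "PM2.5": "8.0"},
    {"California County": "Alameda County", "PM2.5": "10.0"},
    {"California County": "Alameda County", "PM2.5": "9.0"},
    {"California County": "Fresno County", "PM2.5": "14.0"},
    {"California County": "Fresno County", "PM2.5": "12.0"},
]

rows = []
for raw in raw_rows:
    # Step 1: snake-case every column name.
    row = {to_snake_case(k): v for k, v in raw.items()}
    # Step 2: rename california_county to county and strip the word "County".
    row["county"] = clean_county(row.pop("california_county"))
    rows.append(row)

# Step 3: one median PM2.5 value per county.
county_level_pm25 = county_median(rows, "county", "pm2_5")
print(county_level_pm25)
```

The same pattern, with a mean in place of the median, would apply to a county-level CES score.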

The second dataset, “calenviroscreen_scores_demog_2021.csv”, comes from CalEnviroScreen for the year 2021 and contains tract-level CalEnviroScreen (CES) scores and demographics. We first converted all column names to snake case. In the “county” column, we removed the word “county” after each county name for consistency with the other datasets. From this dataset we created a new variable, “county_level_ces”, by grouping by county and taking the mean CES score for each county.

The third dataset, “chhs_asthma_ed.csv”, is from the California Health and Human Services department and contains age-adjusted asthma emergency department (ED) visit counts and rates by county, year, age group, and race/ethnicity. This dataset covers 2015 to 2020. As with the other two datasets, we converted the column names to snake case. In the “county” column, we changed the capitalization so that only the first letter of each word in a county name is capitalized, for consistency with the other datasets. In the columns “strata name” and “age group”, we recoded values that did not read correctly. We then subset the data to the most recent year, 2020. Finally, we selected a demographic stratum of interest, race, subset the data to that stratum, and pivoted the result so that each county has exactly one row.
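The subsetting and pivoting steps for the asthma dataset can be illustrated with a short sketch. It is written in standard-library Python rather than the project's R code, and every column name, stratum label, and rate below is a hypothetical stand-in for the real file's contents.

```python
from collections import defaultdict

# Hypothetical long-format rows: one row per county, year, and stratum value.
long_rows = [
    {"county": "ALAMEDA", "year": 2020, "strata": "Race/Ethnicity",
     "strata_name": "White", "rate": 20.0},
    {"county": "ALAMEDA", "year": 2020, "strata": "Race/Ethnicity",
     "strata_name": "Black", "rate": 60.0},
    {"county": "ALAMEDA", "year": 2019, "strata": "Race/Ethnicity",
     "strata_name": "White", "rate": 25.0},
    {"county": "SAN DIEGO", "year": 2020, "strata": "Race/Ethnicity",
     "strata_name": "White", "rate": 15.0},
    {"county": "SAN DIEGO", "year": 2020, "strata": "Age Group",
     "strata_name": "0-17", "rate": 30.0},
]

def pivot_by_county(rows, year, strata):
    """Keep one year and one stratum, then spread stratum values into columns.

    Returns {county: {"rate_<stratum value>": rate, ...}}, one entry per county.
    """
    wide = defaultdict(dict)
    for r in rows:
        if r["year"] != year or r["strata"] != strata:
            continue  # drop other years and other demographic strata
        county = r["county"].title()  # "SAN DIEGO" -> "San Diego"
        column = "rate_" + r["strata_name"].lower().replace(" ", "_")
        wide[county][column] = r["rate"]
    return dict(wide)

asthma_wide = pivot_by_county(long_rows, year=2020, strata="Race/Ethnicity")
for county, rates in asthma_wide.items():
    print(county, rates)
```

Each county ends up with a single record whose columns hold the rate for each race/ethnicity group, which is the shape needed to join against the county-level environmental summaries.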

Visualizations

Plot 1: Asthma visits compared to CES scores

Plot 2: Asthma visits compared to PM2.5 concentrations

Table 1: Correlation between asthma visits and CES scores/PM2.5 concentrations

Table 1: Associations between Age-Adjusted ED Visit Rates and Environmental Measures at the County Level

Environmental Measure        Mean    Correlation   Direction   Interpretation
Mean CES Score               23.21    0.2637849    Positive    Weak-Moderate, may warrant further study
Median PM2.5 Concentration    8.19   -0.0752427    Negative    Weak, unlikely to warrant further study

Note: Mean age-adjusted ED rate across counties = 26.60 (SD = 9.66)

Results

Plot 1 displays the relationship between county-level mean CES score and age-adjusted asthma ED visit rates, with each point representing one county in California. This plot suggests a positive association between a county’s CES score and its age-adjusted asthma-related ED visit rate when the mean CES score is below ~20; above this threshold, ED visit rates do not appear to rise further as the mean CES score increases.

Plot 2 displays the relationship between county-level median PM2.5 concentration and age-adjusted asthma ED visit rates, with each point representing one county in California. This plot does not suggest a clear association between these two variables. It is possible that using the median as the county-level summary statistic for this environmental measure conceals an existing relationship, or that a stronger association exists at a more granular, sub-county level or within strata of another variable, such as ethnicity. Because a biological relationship exists between particulate air pollution and asthma, further analysis is recommended before ruling out an epidemiological association.

Table 1 uses Pearson correlation coefficients to assess the strength of the linear association between asthma-related ED visit rates and each of the two environmental measures. The findings corroborate Plots 1 and 2: mean CES score is weakly to moderately correlated with asthma ED visit rates (r ≈ 0.26), while median PM2.5 concentration shows essentially no linear correlation under these analysis parameters (r ≈ -0.08).
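For reference, the Pearson coefficient is the covariance of the two variables divided by the product of their standard deviations. A minimal standard-library Python sketch of that calculation is shown here; it is not the project's R code, and the county values in the usage lines are hypothetical, not the project's data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical county-level values, for illustration only.
mean_ces = [10.0, 20.0, 30.0, 40.0]
ed_rate = [15.0, 25.0, 24.0, 30.0]
print(round(pearson_r(mean_ces, ed_rate), 3))
```

Because the coefficient only captures linear association, a relationship that levels off above a threshold, as Plot 1 suggests for CES score, can yield a modest value even when the association is clear over part of the range.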

Discussion

Based on our visualizations and correlation table, CES score is weakly to moderately correlated with asthma ED visit rates and, of the two environmental measures we examined, is the one that warrants further study for determining which counties should receive interventions. Median PM2.5 concentration and age-adjusted asthma ED visit rates do not appear to have a clear association. It is possible that using the county-level median as our summary statistic conceals an existing relationship, or that a stronger relationship exists within strata of a different variable. Further analysis is recommended before ruling out an epidemiological association and the related intervention methods.

QUESTIONS FOR TEACHING TEAM

  1. Should we include the breakdown of who made each visualization as was requested in milestone 4 (since each team member is supposed to contribute at least one)?
  2. Do you have any recommendations for improving the legibility of the charts, e.g. the data-ink ratio or styling? We used the default styling and color schemes, but we were not sure whether that is considered unprofessional or amateurish within the R community, so to speak.