#Analysis of California Environmental Factors and Asthma Rates

##For Milestone 4, we perform a final analysis to address our 2 research questions:
##1. Compare county asthma ED rates with a county CES measure to assess whether there is a correlation.
##2. Compare asthma ED rates with county-level summaries for specific environmental measures to determine whether those measures may be worth further investigation.

In this report, we analyze the relationship between environmental factors, asthma rates, and Emergency Department visits across California counties.

Each of our 3 datasets (CalEnviro Measures, CalEnviro Scores, and Asthma ER Visits) has been cleaned and transformed for analysis. All comparisons use 2019 data: the CalEnviro Measures and CalEnviro Scores datasets contain 2019 data only, so Asthma ER Visits is filtered to 2019 to make the datasets comparable.
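
The year filter described above can be sketched as follows. This is a minimal Python illustration rather than the report's own code; the `year` column name, the `filter_year` helper, and all row values are hypothetical.

```python
# Hypothetical Asthma ER Visits rows; the values are illustrative only.
visits = [
    {"county": "Fresno", "year": 2018, "age_adjusted_ed_visit_rate": 58.0},
    {"county": "Fresno", "year": 2019, "age_adjusted_ed_visit_rate": 61.0},
    {"county": "Kern", "year": 2019, "age_adjusted_ed_visit_rate": 55.0},
]


def filter_year(rows, year):
    """Keep only the rows recorded for the given year."""
    return [row for row in rows if row["year"] == year]


# Restrict the visits data to 2019 so it lines up with the 2019 CES data.
visits_2019 = filter_year(visits, 2019)
```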

###Note to Reader: This analysis for milestone 4 leverages dataframes created in milestone 3. Refer to milestone 3 for its code, narrative, tables and graphs. Here we reuse only the code and the resulting dataframes; the narrative, tables and graphs from milestone 3 are not included in this file. The Data Dictionary of key variables is retained in the last section for your reference.

#BEGIN MILESTONE 4
#The CalEnviroScreen study used 21 indicators to characterize Pollution Burden and Population Characteristics by County. Our study evaluated asthma, expressed as the Age Adjusted Rate of Emergency Department Visits, and its potential association with selected environmental pollution variables: PM2.5, diesel PM, traffic, and toxic release. We found a weak correlation with the pollution variables, as shown in the tables and visuals below.

##Cleaning Dataset 1 to prepare for joining with Dataset 3

## # A tibble: 8,035 × 15
##    census_tract ces_4_0_score ces_4_0_percentile ces_4_0_percentile_range
##           <dbl>         <dbl>              <dbl> <chr>                   
##  1   6001400100          4.85               2.80 1-5% (lowest scores)    
##  2   6001400200          4.88               2.87 1-5% (lowest scores)    
##  3   6001400300         11.2               15.9  15-20%                  
##  4   6001400400         12.4               19.0  15-20%                  
##  5   6001400500         16.7               29.7  25-30%                  
##  6   6001400600         20.0               37.6  35-40%                  
##  7   6001400700         36.7               70.1  70-75%                  
##  8   6001400800         37.1               70.7  70-75%                  
##  9   6001400900         40.7               76.1  75-80%                  
## 10   6001401000         43.7               80.4  80-85%                  
## # ℹ 8,025 more rows
## # ℹ 11 more variables: total_population <dbl>, children_10_years_percent <dbl>,
## #   pop_10_64_years_percent <dbl>, elderly_64_years_percent <dbl>,
## #   hispanic <dbl>, white <dbl>, black <dbl>, native_american <dbl>,
## #   asian <dbl>, other <dbl>, county <chr>

##Cleaning Dataset 3 to prepare for joining with Dataset 1

Join Datasets 1 and 3

Join All 3 Datasets
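
A join of county-level tables can be sketched as below. This is a minimal Python illustration under two assumptions that the report does not confirm: that the join key is `county`, and that only counties present in both tables are kept (an inner join). The `inner_join` helper and all row values are hypothetical.

```python
# Hypothetical county-level rows; the values are illustrative only.
asthma_by_county = [
    {"county": "Fresno", "asthma_median_by_county": 60.0},
    {"county": "Kern", "asthma_median_by_county": 55.0},
    {"county": "Marin", "asthma_median_by_county": 20.0},
]
ces_by_county = [
    {"county": "Fresno", "mean_CES_score": 40.0},
    {"county": "Kern", "mean_CES_score": 38.0},
    {"county": "Alpine", "mean_CES_score": 10.0},
]


def inner_join(left, right, key):
    """Return rows whose key appears in both inputs, merging their columns."""
    right_index = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_index.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined


# Counties missing from either table (Marin, Alpine) drop out of the result.
joined = inner_join(asthma_by_county, ces_by_county, "county")
```

Joining a third table works the same way: pass `joined` as the left input and the third table as the right input.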

#Correlation between CalEnviroScreen Variables and Median Age Adjusted Asthma ED Visit Rate by County

#Correlation Coefficients between CalEnviroScreen Variables and Median Age Adjusted Asthma ED Visit Rate by County

#The analysis above shows that most of the counties with the highest Asthma ED Visit Rates are in the Central Valley region, and that these counties also have high CES 4.0 scores. The section below therefore evaluates the pollution variables further within the Central Valley counties. Pearson correlation coefficients were higher for this subset of counties than for all California counties.
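
The Pearson coefficient used for these comparisons is the covariance of two variables divided by the product of their standard deviations. The sketch below is a minimal Python illustration of computing it for all counties and for a subset; it is not the report's own code, and the `pearson` helper, the county subset, and every value are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(var_x * var_y)


# Hypothetical rows: (county, median PM2.5, median asthma ED visit rate).
rows = [
    ("Fresno", 14.0, 62.0),
    ("Kern", 15.0, 58.0),
    ("Tulare", 16.0, 66.0),
    ("Marin", 8.0, 40.0),
    ("Alpine", 4.0, 45.0),
]
central_valley = {"Fresno", "Kern", "Tulare"}

# Coefficient across every county in the toy table.
r_all = pearson([r[1] for r in rows], [r[2] for r in rows])

# Coefficient restricted to the regional subset.
subset = [r for r in rows if r[0] in central_valley]
r_subset = pearson([r[1] for r in subset], [r[2] for r in subset])
```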

#Correlation between CalEnviroScreen Variables and Median Age Adjusted Asthma ED Visit Rate within Central Valley Counties of California

#Correlation Coefficients between Pollution Variables and Median Age Adjusted Asthma ED Visit Rate by Central Valley Counties Subset

#As revealed by the analyses above, eight of the eleven Central Valley counties rank among the top 20 counties for ED visit rates.
#At this stage of the analysis, we decided to focus on the Central Valley counties. To understand the data further, we examined Asthma ED Visit Rates by census tract within these counties to see how the rates are distributed, rather than relying on the county medians used in the initial rough analysis.
##Central California is defined at https://www.cdph.ca.gov/Programs/RPHO (see Data Dictionary).
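
Counting how many counties of a region fall among the highest-ranked counties can be sketched as below. This is a minimal Python illustration, not the report's own code: the county list comes from the Data Dictionary, while the `count_region_in_top` helper and the rates are hypothetical.

```python
# Region membership as listed in the Data Dictionary of this report.
CENTRAL_CALIFORNIA = {
    "Calaveras", "Fresno", "Kern", "Kings", "Madera", "Mariposa",
    "Merced", "San Joaquin", "Stanislaus", "Tulare", "Tuolumne",
}


def count_region_in_top(rates, region, n):
    """Count region members among the n counties with the highest rates."""
    ranked = sorted(rates, key=rates.get, reverse=True)
    return sum(1 for county in ranked[:n] if county in region)


# Hypothetical median ED visit rates by county; the values are illustrative only.
example_rates = {
    "Fresno": 70.0,
    "Kern": 65.0,
    "Merced": 60.0,
    "Marin": 20.0,
    "Alpine": 15.0,
}

# Number of regional counties among the three highest-rate counties.
top3_hits = count_region_in_top(example_rates, CENTRAL_CALIFORNIA, 3)
```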

#Asthma ED Visit Rates by County Census Tracts.
##Note: Counties are ordered by median Asthma Rates (ascending)

#CES Score and Asthma ED Visit Rates by County Census Tracts.
##Note: Counties are ordered by median Asthma Rates (ascending).
##The correlation between CES score and Asthma ED Visit Rates is 0.636

#Correlation between Median Age Adjusted Asthma ED Visits and Mean CES 4.0 Score by County

Correlation Coefficient Between Median Age-Adjusted Asthma ED Visits and Mean CES 4.0 Score by County

| Variable1 | Variable2 | Correlation |
|---|---|---|
| median asthma ED rate | mean ces 4.0 score | 0.636 |
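
The two county-level quantities compared here (a median asthma ED rate and a mean CES score) are summaries of census-tract rows. The sketch below is a minimal Python illustration of that aggregation step, not the report's own code; the tract-level field names `asthma_rate` and `ces_score`, the `summarize_by_county` helper, and all values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical census-tract rows; the values are illustrative only.
tracts = [
    {"county": "Fresno", "asthma_rate": 50.0, "ces_score": 40.0},
    {"county": "Fresno", "asthma_rate": 70.0, "ces_score": 50.0},
    {"county": "Fresno", "asthma_rate": 90.0, "ces_score": 30.0},
    {"county": "Kern", "asthma_rate": 40.0, "ces_score": 20.0},
    {"county": "Kern", "asthma_rate": 60.0, "ces_score": 30.0},
]


def summarize_by_county(rows):
    """Collapse tract rows to one summary per county."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["county"]].append(row)
    summary = {}
    for county, members in grouped.items():
        summary[county] = {
            # Median is robust to a few extreme tracts.
            "asthma_median_by_county": median([m["asthma_rate"] for m in members]),
            # Simple (unweighted) average of tract scores.
            "mean_CES_score": mean([m["ces_score"] for m in members]),
        }
    return summary


county_summary = summarize_by_county(tracts)
```

The resulting pairs of county values are what a correlation coefficient would then be computed over.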

#Correlation between Median Asthma ED Visit Rates and CES 4.0 Score by County
##Note: Correlation Coefficient is found to be 0.636

Interactive Boxplots: ED Visit Rates by Central Valley Counties Census Tracts

Interactive Boxplots of CES Scores by Central Valley Counties’ Census Tracts

#Correlation between Median Age Adjusted Asthma ED Visits and Mean CES 4.0 Score for Central Valley Counties
##Note: Correlation Coefficient is found to be 0.751, demonstrating a stronger signal for the Central Valley subset than for all California counties

Correlation Coefficient Between Median Age-Adjusted Asthma ED Visits and Mean CES 4.0 Score by County

| Variable1 | Variable2 | Correlation |
|---|---|---|
| median asthma ED rate | mean ces 4.0 score | 0.751 |

#Correlation between Median Asthma ED Visit Rates and CES 4.0 Score for Central Valley County Subset
##Correlation Coefficient 0.751

#Interactive Chart: Asthma ED Visit Rates and CES Scores for Central Valley Counties

#Filtered Dataset for Central Valley Counties

#Correlation Between Median Age-Adjusted ED Visit Rates and Age Groups in Central Valley, 2019

##                              Age_Group Spearman_Correlation
## median_children_visits Children (0-17)                0.305
## median_adults_visits    Adults (18-64)                0.465
## median_seniors_visits    Seniors (65+)                0.365

##Interpretation of the Spearman Correlation Coefficients for Median Age-Adjusted ED Visit Rates and Age Groups (Central Valley):
###The results show a moderate positive correlation (0.465) for adults (18-64), indicating a moderate association between the median age-adjusted ED visit rate and visits in this age group. For seniors (65+) and children (0-17), the weaker positive correlations (0.365 and 0.305, respectively) suggest a less pronounced relationship. Moving forward, we explore the selected environmental factors that correlate most strongly with each age group.
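
A Spearman coefficient is the Pearson coefficient computed on ranks instead of raw values, which makes it sensitive to any monotonic relationship rather than only a linear one. The sketch below is a minimal Python illustration of that definition, with tied values sharing their average rank; it is not the report's own code, and the helper names and input values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson applied to the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical county medians; the values are illustrative only.
overall_rate = [40.0, 55.0, 60.0, 62.0, 70.0]
adult_visits = [120.0, 150.0, 140.0, 200.0, 260.0]
rho_adults = spearman(overall_rate, adult_visits)
```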

#Correlation Between Environmental Factors and Age Group Visits (Central Valley, 2019)

###The data from the Central Valley shows positive correlations between pollution variables (Traffic, Diesel PM, PM2.5, and Toxic Release) and median asthma ED visits across different age groups, with stronger associations observed for Traffic and Diesel PM. These findings suggest that increased exposure to certain pollutants is linked to higher asthma-related emergency visits, particularly in children and adults.

#Correlation of Race/Ethnicity and PM2.5 Measurements to Assess Impacted Demographic Groups

Median PM2.5 Air Quality Measures Associated with Population Demography in California for Year 2019

| | Demographic_Group | Correlation |
|---|---|---|
| white | White | -0.584 |
| hispanic | Hispanic | 0.558 |
| native_american | Native American | -0.451 |
| black | Black | 0.404 |
| asian | Asian | 0.283 |
| other | Other | -0.065 |

#Geospatial Exploration: Geographic Map of ER Visit Rates with Demographic Density

#Data Dictionary
#Analysis of California Environmental Factors and Asthma Rates
##New Variables for Analysis:

##Term used throughout: the “Central California” region is defined as Calaveras, Fresno, Kern, Kings, Madera, Mariposa, Merced, San Joaquin, Stanislaus, Tulare, and Tuolumne counties; refer to https://www.cdph.ca.gov/Programs/RPHO.

Dataset 1:

asthma_median_by_county: median age-adjusted rate of emergency department visits for asthma, by county.

education_median_by_county: median percent of population over 25 with less than a high school education, by county

poverty_median_by_county: median percent of population living below two times the federal poverty level, by county

unemployment_median_by_county: median percent of the population over the age of 16 that is unemployed and eligible for the labor force, by county

housing_burden_median_by_county: median percent housing burdened low income households, by county

pm2_5_median_by_county: median annual mean PM 2.5 concentrations, by county

diesel_pm_median_by_county: median diesel PM emissions from on-road and non-road sources, by county

traffic_median_by_county: median traffic density, in vehicle-kilometers per hour per road length, within 150 meters of the census tract boundary, by county

tox_release_median_by_county: median toxicity-weighted concentrations of modeled chemical releases to air from facility emissions and off-site incineration (from RSEI)

Dataset 2:

county: California county name

mean_CES_score: mean CES score by county as a simple average

total_population: sum population from census tracts aggregated by county

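hispanic: Hispanic race proportion by county, a weighted average across census tracts using total_population as the weight

white: White race proportion by county, a weighted average across census tracts using total_population as the weight

black: Black race proportion by county, a weighted average across census tracts using total_population as the weight

native_american: Native American race proportion by county, a weighted average across census tracts using total_population as the weight

asian: Asian race proportion by county, a weighted average across census tracts using total_population as the weight

other: Other race proportion by county, a weighted average across census tracts using total_population as the weight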
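
A population-weighted average of this kind differs from a simple average whenever census tracts differ in size. The sketch below is a minimal Python illustration, assuming the weights are tract-level `total_population` values; the `weighted_mean` helper and the numbers are hypothetical.

```python
def weighted_mean(values, weights):
    """Average of values in which each value counts in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Two hypothetical tracts in one county: percent Hispanic and total population.
hispanic_percent = [20.0, 60.0]
tract_population = [1000, 3000]

# The larger tract pulls the county figure toward its own value.
county_hispanic = weighted_mean(hispanic_percent, tract_population)

# For comparison, the simple average treats both tracts equally.
simple_average = sum(hispanic_percent) / len(hispanic_percent)
```

By contrast, mean_CES_score in this dictionary is described as a simple average, so it would not use population weights.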

Dataset 3:

county: The name of the county where the data is collected.

strata_name: Original demographic categorization, includes both age and race/ethnicity information

age_group: Age group of individuals visiting the ED

race_ethnicity: Race/ethnicity of individuals visiting the ED

age_category: Broader age categorization based on age_group

age_adjusted_ed_visit_rate: Age-adjusted rate of ED visits per 100,000 population

total_ed_visits: Aggregated count of ED visits per county

mean_ed_visits: Average number of ED visits per county or demographic subgroup

mean_age_adjusted_rate: Average age-adjusted ED visit rate per county or demographic subgroup