Title: Exploratory correlation analysis of high contamination in drinking water and low birth weight
Subtitle: A study of Southern California counties
Data Source: CalEnviroScreen (CES) 4.0 dataset provided by PHW251B course facilitators: calenviroscreen40resultsdatadictionary.xlsx
Background: CalEnviroScreen (CES) 4.0 is a data tool developed by the California Office of Environmental Health Hazard Assessment (OEHHA) to identify the California communities most affected by certain environmental health risks. CES 4.0 uses a scoring system that incorporates socioeconomic, environmental, and health data. This exploratory study addresses the following research question: Is there a correlation between high levels of drinking water contamination and the rate of low birth weight in Southern California counties?
This analysis utilizes the following data from the CES 4.0 dataset and dictionary:
-Drinking Water - Drinking water contaminant index for selected contaminants
-Drinking Water Pctl - Drinking water percentile
-Low Birth Weight - Percent low birth weight
-Low Birth Weight Pctl - Low birth weight percentile
Southern California counties in this analysis include: Imperial, Kern, Los Angeles, Orange, Riverside, San Bernardino, San Diego, San Luis Obispo, Santa Barbara, and Ventura counties.
The data and results presented come from a preliminary study of CES data that explores the relationship between contaminated drinking water and the incidence of low birth weight in Southern California. They can serve as an example for further studies utilizing CES 4.0 data.
Results: After the dataset was filtered to include only Southern California counties, the distribution of drinking water contamination was plotted by county. The three counties with the highest drinking water contaminant index (>600) were selected for further analysis: Kern, Los Angeles, and San Bernardino.
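The filtering and selection step described above can be sketched in Python as follows. This is a minimal illustration, not the code used in the study: the rows are made-up values, "California County" is an assumed name for the county column, and the sketch assumes the >600 threshold is applied to each county's highest tract-level Drinking Water value.

```python
# Illustrative rows only; real values would come from the CES 4.0 results file.
# "Drinking Water" and "Low Birth Weight" follow the field names listed above;
# "California County" is an assumed name for the county column.
rows = [
    {"California County": "Kern", "Drinking Water": 812.4, "Low Birth Weight": 5.1},
    {"California County": "Kern", "Drinking Water": 430.0, "Low Birth Weight": 6.3},
    {"California County": "Los Angeles", "Drinking Water": 655.2, "Low Birth Weight": 4.8},
    {"California County": "Orange", "Drinking Water": 512.9, "Low Birth Weight": 3.9},
    {"California County": "Alameda", "Drinking Water": 701.0, "Low Birth Weight": 5.5},
    {"California County": "Ventura", "Drinking Water": None, "Low Birth Weight": 4.2},
]

# The ten Southern California counties named in this analysis.
SOCAL_COUNTIES = {
    "Imperial", "Kern", "Los Angeles", "Orange", "Riverside",
    "San Bernardino", "San Diego", "San Luis Obispo", "Santa Barbara", "Ventura",
}


def filter_socal(records):
    """Keep only the rows whose county is in the Southern California set."""
    return [r for r in records if r["California County"] in SOCAL_COUNTIES]


def counties_above(records, threshold=600.0):
    """Return counties whose highest Drinking Water value exceeds the threshold."""
    peaks = {}
    for r in records:
        value = r["Drinking Water"]
        if value is None:  # skip rows with a missing contaminant index
            continue
        county = r["California County"]
        peaks[county] = max(peaks.get(county, value), value)
    return sorted(c for c, peak in peaks.items() if peak > threshold)


socal = filter_socal(rows)
selected = counties_above(socal)
print(selected)
```

If the threshold was instead applied to a county-level summary such as the mean or median, only the aggregation inside `counties_above` would need to change.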
Next, drinking water contamination was plotted against percent low birth weight for each of the three counties. The hypothesis was a positive correlation (higher drinking water contamination occurring with higher incidence of low birth weight), but this trend was not found. All three graphs show a relatively flat line through the scatter of the two variables, with a moderate amount of variation that differs by county.
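A flat line on a scatter plot corresponds to a correlation coefficient and a fitted slope near zero, so the per-county comparison can also be summarized numerically. The sketch below computes the Pearson correlation and the least-squares slope for each county. It assumes Pearson correlation is an appropriate summary, which the study does not state, and the tuples in `example_rows` are hypothetical values chosen only to make the example runnable.

```python
from collections import defaultdict
from math import sqrt


def pearson_and_slope(xs, ys):
    """Return (Pearson r, least-squares slope of y on x) for paired values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy), sxy / sxx


def summarize_by_county(records, counties):
    """Group (county, drinking_water, low_birth_weight) tuples and summarize each county."""
    grouped = defaultdict(lambda: ([], []))
    for county, water, lbw in records:
        # Drop rows outside the selected counties or with a missing value.
        if county in counties and water is not None and lbw is not None:
            grouped[county][0].append(water)
            grouped[county][1].append(lbw)
    return {c: pearson_and_slope(xs, ys) for c, (xs, ys) in grouped.items()}


# Hypothetical (county, Drinking Water, Low Birth Weight) observations.
example_rows = [
    ("Kern", 100.0, 5.0),
    ("Kern", 300.0, 4.0),
    ("Kern", 500.0, 6.0),
    ("Kern", 700.0, 4.0),
    ("Kern", 900.0, 6.0),
    ("Kern", 650.0, None),  # missing birth weight value, ignored
    ("Orange", 400.0, 3.5),  # not among the selected counties, ignored
]

summary = summarize_by_county(example_rows, {"Kern", "Los Angeles", "San Bernardino"})
for county, (r, slope) in summary.items():
    print(f"{county}: r = {r:.3f}, slope = {slope:.5f}")
```

Reporting r alongside the slope keeps the two ideas separate: the slope describes how steep the fitted line is, while r describes how tightly the points cluster around it.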
This indicates that the analysis as-is does not show a trend between the two variables in the Southern California counties with the most contaminated drinking water. It is worth exploring the relationship further by including other counties and/or regions and by introducing more complex analyses to account for variation.