Problem Statement

We are researching communities in California to determine how funds should be allocated to improve healthcare facilities in rural areas. Populations in rural areas have been shown to have poorer health outcomes and less access to care than populations in urban areas (CDC). A policy was recently developed that allocates funds to create a public-private partnership to improve healthcare facilities in rural areas. We analyze rural populations based on criteria set forth by the California Department of Public Health Office of Health Equity (OHE).

The criteria we were given for this funding target rural counties in California that have received low funding over the last five years from the Department of Health Care Access and Information (HCAI). The populations we are interested in are those that live in rural areas, rent rather than own their homes, and are aging. In addition, we analyze mortality rates due to chronic health conditions in these counties.

Methods

CA Demographic Data

Source: The 2012 demographic data for all counties in California was pulled from the United States Census. The data was retrieved on August 29, 2020.
Years and/or dates of data: The data set only includes information for the year of 2012.
Description of cleaning and creating new variables:

  • Dropped all columns except name, pop2012, pop12_sqmi, med_age, owner_occ, and renter_occ.
  • Created a new variable called proportion_renters, equal to renter_occ / owner_occ. This is the ratio of renters to homeowners, so it can exceed 1; a higher value indicates that the county has more renters relative to homeowners.
  • We are left with a dataset with one row per county and one column for each demographic characteristic of interest (ready for merging).
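
The renter variable described above can be sketched as follows. This is a minimal illustration in Python rather than the code used for the analysis; the two sample rows and their counts are made-up placeholders, and only the column names renter_occ and owner_occ come from the cleaning steps.

```python
# Hypothetical sample rows; the counts are placeholders, not census values.
counties = [
    {"name": "County A", "renter_occ": 300, "owner_occ": 600},
    {"name": "County B", "renter_occ": 900, "owner_occ": 500},
]


def renter_ratio(row):
    """Renter-occupied count divided by owner-occupied count."""
    return row["renter_occ"] / row["owner_occ"]


# Add the derived column to every row, rounded to two decimals.
for row in counties:
    row["proportion_renters"] = round(renter_ratio(row), 2)
```

Because the denominator is owner_occ rather than the total number of occupied units, a county with more renters than owners (County B here) gets a value above 1.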

CA Mortality Data

Source: The mortality counts for counties in California came from the California Department of Public Health and were sourced from the California Open Data Portal. The data is stratified by gender, age, race-ethnicity, and death place type. The counts of death were based on information on death certificates and the categories used for the cause of death are coded the same as the International Classification of Diseases. The data was retrieved on August 25, 2022.
Years and/or dates of data: The dataset ranges from year 2014 to 2022.
Description of cleaning and creating new variables:

  • Renamed all columns so that they were in snake case and used underscores (no spaces)
  • Filtered to only keep rows where strata = “Total Population” (we don’t need data stratified by population subgroups)
  • Dropped all columns except year, county, cause, cause_desc, and count
  • We looked at all the causes of death reported and determined which ones we consider to be chronic diseases (codes = “ALZ”, “CLD”, “DIA”, “HTD”, “HYP”, “LIV”, “NEP”, “PAR”). We then created an indicator variable is_chronic that equals 1 if the death was due to chronic disease and 0 otherwise.
  • Recoded all NAs in the count column to be 0.
  • Filtered the dataset to only include rows for chronic disease deaths (where is_chronic = 1).
  • We grouped by county and year and summarized so that we ended up with one row per county-year that would tell us the number of deaths due to chronic disease in that year in that county.
  • We pivoted the table from long format to wide format so that we had one row per county and a column for each year that contained the total deaths due to chronic disease for that year.
  • We are left with a dataset with one row per county and a column for each year from 2014-2020 with how many chronic disease deaths occurred in that year (ready for merging).
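
The cleaning steps above (filter to the total population, keep chronic causes, recode missing counts, aggregate, and pivot) can be sketched in plain Python. This is an illustration under assumptions, not the original code: the sample rows, the "Gender" strata label, the "INJ" code, and the function name chronic_deaths_wide are placeholders, while the chronic cause codes are the ones listed above.

```python
from collections import defaultdict

# Cause codes the report treats as chronic disease.
CHRONIC_CODES = {"ALZ", "CLD", "DIA", "HTD", "HYP", "LIV", "NEP", "PAR"}

# Hypothetical rows in the cleaned column layout; counts are placeholders.
rows = [
    {"year": 2014, "county": "Lake", "strata": "Total Population", "cause": "DIA", "count": 12},
    {"year": 2014, "county": "Lake", "strata": "Total Population", "cause": "HTD", "count": None},
    {"year": 2014, "county": "Lake", "strata": "Gender", "cause": "DIA", "count": 7},
    {"year": 2015, "county": "Lake", "strata": "Total Population", "cause": "ALZ", "count": 9},
    {"year": 2015, "county": "Lake", "strata": "Total Population", "cause": "INJ", "count": 4},
]


def chronic_deaths_wide(rows):
    """Return {county: {year: chronic disease deaths}}, one entry per county."""
    wide = defaultdict(lambda: defaultdict(int))
    for row in rows:
        if row["strata"] != "Total Population":
            continue  # drop rows stratified by population subgroup
        if row["cause"] not in CHRONIC_CODES:
            continue  # keep chronic causes only (is_chronic = 1)
        count = row["count"] or 0  # recode missing counts to 0
        wide[row["county"]][row["year"]] += count
    return {county: dict(years) for county, years in wide.items()}


wide = chronic_deaths_wide(rows)
```

The nested dictionary plays the role of the wide table: the outer key is the county (one row per county) and each inner key is a year column.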

HCAI Data

Source: The dataset, Total Construction Cost of Healthcare Projects, came from the Department of Health Care Access and Information and was sourced from the California Open Data Portal. It reports the total dollar value and the number of projects that are “in review”, “pending construction”, “in construction” or “in closure”. The finest resolution of the raw data is the county; city- or district-level detail is not available. The dataset has been updated biweekly since 2013. Data for this project was retrieved on August 29, 2022.
Years and/or dates of data: The dataset included information for dates between October 14, 2013 and August 11, 2022.
Description of cleaning and creating new variables:

  • Renamed all columns so that they were in snake case and used underscores (no spaces).
  • total_costs_oshpd was in character format with commas and a leading “$”. We recoded it to be numeric by stripping the $ and removing all commas, and stored the result in a new variable called total_costs_oshpd_num.
  • Dropped all columns except county, data_generation_date, oshpd_project_status, total_costs_oshpd, total_costs_oshpd_num
  • We cleaned the county variable to strip the leading numbers and dash. For example, “01 - Alameda” was recoded to just “Alameda” so that it matches our other datasets.
  • We filtered to only include rows for projects “In Closure”
  • We filtered to only include the “most recent account,” which we interpreted as the most recent project date in the dataset, 8/11/22.
  • We are left with a dataset with one row per county that tells us the total cost of the most recent account for that county (ready for merging).
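
The two string-cleaning steps above (currency to number, and stripping the county code prefix) can be sketched as below. The function names parse_cost and clean_county are illustrative, and the regular expression assumes every county label has the "NN - Name" shape shown in the example.

```python
import re


def parse_cost(text):
    """Turn a string such as '$1,234.50' into the float 1234.5."""
    return float(text.replace("$", "").replace(",", ""))


def clean_county(text):
    """Strip a leading numeric code and dash: '01 - Alameda' -> 'Alameda'."""
    return re.sub(r"^\s*\d+\s*-\s*", "", text).strip()


cost = parse_cost("$15,250,836.10")
county = clean_county("01 - Alameda")
```

Anchoring the pattern at the start of the string means a county name that already lacks the prefix passes through unchanged.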

Analytic Methods

  • We started by merging the mortality dataset (which has one row per county and a column for each year that tells you the number of chronic disease deaths in that year) with the demographic dataset (which has one row per county and columns for various demographic characteristics of that county).
  • We wanted to estimate the chronic disease mortality rate in each county. To do this, for each county, we averaged the yearly chronic disease death counts from 2014-2020 and divided by the county’s 2012 population, expressing the result per 1,000 people. We stored this value in a new column in our merged dataset called estimated_mortality_rate.
  • We then merged this dataset with our cleaned HCAI dataset, so we had one row per county, and columns for median age, the proportion of renters, the population density, the mortality rate due to chronic disease, and the OSHPD total cost of the most recent account. We were now ready to analyze this data and pick counties that are fit for funding.
  • We are looking for rural counties (i.e. low population density) with a high median age, a high proportion of renters, a high mortality rate due to chronic disease, and low OSHPD funding (i.e. low OSHPD total project cost). We created some tables and figures to help us reach this decision (see results).
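
The rate calculation described in this section can be written out as a short sketch. It assumes the rate is the mean of the 2014-2020 yearly death counts divided by the 2012 population and scaled to 1,000 people (the scale stated for Figure 1); the death counts and population below are placeholders, and the function name is illustrative.

```python
from statistics import mean


def estimated_mortality_rate(deaths_by_year, pop2012, per=1000):
    """Average yearly chronic disease deaths per `per` residents."""
    return mean(deaths_by_year.values()) * per / pop2012


# Placeholder numbers for illustration only.
deaths = {2014: 50, 2015: 55, 2016: 60, 2017: 65, 2018: 70, 2019: 75, 2020: 80}
rate = estimated_mortality_rate(deaths, pop2012=10_000)
```

Because the 2012 population is used for every year, a county whose population grew or shrank after 2012 will have its rate over- or understated accordingly.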

Results

We started by creating a table to help us highlight potential counties that we should fund (Table 1). The color scheme of Table 1 is described below the table.

Table 1

County           Median  Proportion Renters  Population per  Chronic Disease      Total HCAI
                 Age     vs. Homeowners      Square Mile     Mortality per 1,000  Funding
Alameda          36.6    0.87                2062.40         5.25                 $15,250,836.10
Alpine           46.4    0.39                1.54            0.00                 $0
Amador           48.2    0.34                63.29           8.56                 $0
Butte            37.1    0.72                132.55          9.36                 $0
Calaveras        49.1    0.30                44.58           7.18                 $0
Colusa           33.5    0.63                18.83           3.67                 $0
Contra Costa     38.4    0.49                1405.33         5.86                 $7,837,754.00
Del Norte        39.0    0.62                28.30           6.27                 $0
El Dorado        43.5    0.37                102.16          6.62                 $30,961.00
Fresno           30.7    0.82                157.17          6.69                 $5,230,681.00
Glenn            35.3    0.61                21.49           4.64                 $0
Humboldt         37.3    0.82                38.06           7.24                 $0
Imperial         32.0    0.79                39.74           4.80                 $0
Inyo             45.5    0.57                1.82            6.02                 $0
Kern             30.7    0.67                104.28          6.69                 $2,000,187.99
Kings            31.1    0.85                111.43          4.47                 $0
Lake             45.0    0.52                49.08           8.71                 $0
Lassen           37.0    0.53                7.42            3.31                 $0
Los Angeles      34.8    1.10                2423.26         6.36                 $129,179,056.02
Madera           33.1    0.56                71.07           5.56                 $139,488.40
Marin            44.5    0.60                486.10          6.29                 $5,788,177.72
Mariposa         49.2    0.47                12.61           5.78                 $0
Mendocino        41.6    0.70                25.08           6.83                 $34,803.00
Merced           29.6    0.84                129.90          5.19                 $167,026.00
Modoc            46.0    0.46                2.33            5.85                 $0
Mono             37.2    0.79                4.60            1.36                 $0
Monterey         33.0    0.97                126.86          4.68                 $10,657,237.90
Napa             39.7    0.60                172.31          7.45                 $2,743,185.00
Nevada           47.5    0.39                102.56          8.22                 $625,345.00
Orange           36.2    0.69                3822.42         6.09                 $64,278,886.66
Placer           40.3    0.41                237.08          8.36                 $3,985,582.15
Plumas           49.5    0.44                7.65            5.99                 $0
Riverside        33.7    0.48                305.04          7.29                 $268,651,237.29
Sacramento       34.8    0.74                1441.22         7.01                 $12,724,854.30
San Benito       34.3    0.54                40.63           3.46                 $0
San Bernardino   31.7    0.59                102.56          6.59                 $55,980,818.58
San Diego        34.7    0.84                740.58          5.95                 $58,237,267.71
San Francisco    38.5    1.80                17398.35        5.45                 $17,012,804.99
San Joaquin      32.7    0.69                482.64          6.87                 $0
San Luis Obispo  39.4    0.67                81.82           7.11                 $89,105.00
San Mateo        39.2    0.68                1591.22         5.56                 $4,254,277.00
Santa Barbara    33.7    0.90                154.04          6.53                 $1,709,878.00
Santa Clara      36.2    0.73                1401.07         4.51                 $21,401,921.35
Santa Cruz       36.8    0.74                587.52          5.09                 $232,403.00
Shasta           41.8    0.55                46.48           12.14                $505,710.00
Sierra           51.0    0.39                3.35            0.00                 $0
Siskiyou         46.8    0.54                7.12            9.38                 $0
Solano           36.9    0.58                470.01          6.33                 $0
Sonoma           39.8    0.66                306.32          7.05                 $1,084,897.00
Stanislaus       32.9    0.66                342.54          8.26                 $3,039,277.00
Sutter           34.6    0.64                157.13          5.77                 $0
Tehama           39.5    0.55                21.52           7.90                 $0
Trinity          49.2    0.42                4.38            5.06                 $0
Tulare           29.6    0.70                92.74           6.17                 $0
Tuolumne         47.1    0.43                24.30           8.82                 $0
Ventura          36.2    0.53                444.79          6.25                 $17,037,565.00
Yolo             30.5    0.89                199.66          5.31                 $0
Yuba             32.2    0.68                113.15          8.16                 $0

Dark red cells in the population per square mile column mark counties defined as rural, using a cut-off of fewer than 500 people per square mile (USDA). Dark red cells in the HCAI funding column mark counties identified as receiving $0 in funding. These were the primary indicators. The gradient colors for proportion of renters vs. homeowners, median age, and chronic disease mortality rate show a wide range across the counties that meet the rural and lack-of-funding criteria. Further analysis is required to identify grantees.

Next, we filtered the dataset to only include counties that are considered rural by USDA’s standards (<500 people / square mile) and that received no recent funding from OSHPD (n = 29). We took the median of the median ages of the remaining counties, 37.3, and filtered to only include counties whose median age was above that value (n = 14). We then took the median proportion_renters of the remaining counties, 0.45, and filtered to only include counties whose proportion_renters was above that value (n = 7). We are left with 7 counties, and we plotted their mortality rate due to chronic disease (Figure 1). We quickly see that three counties stand out (Lake, Siskiyou, and Tehama) as potential targets. However, the remaining four counties have similar mortality rates. To decide between these counties, we looked at their mortality rates over time to see which counties have an increasing trend and to prioritize those (Figure 2).
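
The successive median filters can be reproduced from the rounded values in Table 1. The sketch below starts from the 29 counties that already pass the rural and $0 funding filters; the tuple layout and the helper name above_median are illustrative choices, and because the inputs are rounded table values, it is a check on the logic rather than the original analysis code.

```python
from statistics import median

# (county, median age, renter ratio, chronic disease mortality rate) for the
# 29 Table 1 counties with < 500 people per square mile and $0 HCAI funding.
RURAL_UNFUNDED = [
    ("Alpine", 46.4, 0.39, 0.00), ("Amador", 48.2, 0.34, 8.56),
    ("Butte", 37.1, 0.72, 9.36), ("Calaveras", 49.1, 0.30, 7.18),
    ("Colusa", 33.5, 0.63, 3.67), ("Del Norte", 39.0, 0.62, 6.27),
    ("Glenn", 35.3, 0.61, 4.64), ("Humboldt", 37.3, 0.82, 7.24),
    ("Imperial", 32.0, 0.79, 4.80), ("Inyo", 45.5, 0.57, 6.02),
    ("Kings", 31.1, 0.85, 4.47), ("Lake", 45.0, 0.52, 8.71),
    ("Lassen", 37.0, 0.53, 3.31), ("Mariposa", 49.2, 0.47, 5.78),
    ("Modoc", 46.0, 0.46, 5.85), ("Mono", 37.2, 0.79, 1.36),
    ("Plumas", 49.5, 0.44, 5.99), ("San Benito", 34.3, 0.54, 3.46),
    ("San Joaquin", 32.7, 0.69, 6.87), ("Sierra", 51.0, 0.39, 0.00),
    ("Siskiyou", 46.8, 0.54, 9.38), ("Solano", 36.9, 0.58, 6.33),
    ("Sutter", 34.6, 0.64, 5.77), ("Tehama", 39.5, 0.55, 7.90),
    ("Trinity", 49.2, 0.42, 5.06), ("Tulare", 29.6, 0.70, 6.17),
    ("Tuolumne", 47.1, 0.43, 8.82), ("Yolo", 30.5, 0.89, 5.31),
    ("Yuba", 32.2, 0.68, 8.16),
]


def above_median(rows, index):
    """Keep rows whose value at `index` is strictly above the group median."""
    cutoff = median(row[index] for row in rows)
    return cutoff, [row for row in rows if row[index] > cutoff]


# Step 1: counties older than the median of the 29 median ages.
age_cutoff, older = above_median(RURAL_UNFUNDED, 1)
# Step 2: of those, counties with a renter ratio above the new median.
rent_cutoff, finalists = above_median(older, 2)
# Rank the finalists by chronic disease mortality rate, highest first.
ranked = sorted(finalists, key=lambda row: row[3], reverse=True)
```

Note that the second median is taken over the 14 counties that survive the age filter, not over all 29, so the order of the filters affects which counties remain.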


Figure 1. Estimated chronic disease mortality rates per 1,000 people for the counties that met the criteria for being rural (based on population per square mile), having a median age above the median, having a high proportion of renters (higher than the median), and having no HCAI funding. Note that the estimated mortality rates are calculated from the average of mortality totals for 2014-2020 in the numerator and the 2012 total population estimate in the denominator. In this figure, the counties of Lake, Siskiyou, and Tehama have the three highest estimated chronic disease mortality rates. It is more challenging to see the differences among the counties of Del Norte, Inyo, Mariposa, and Modoc, so further analysis is needed to narrow down the county selections (see Figure 2).

Figure 2. Chronic disease mortality rates over time for the 7 remaining counties. Of the four counties with the lowest mortality rates, both Del Norte and Inyo have decreasing mortality rates. Modoc’s rate is increasing while Mariposa’s is remaining steady.
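
The trends in Figure 2 were read from the plot. One way to put a number on "increasing" or "decreasing" would be a least-squares slope of rate against year, sketched below. This is a suggested technique and not something the analysis used, and the two yearly series are made-up placeholders rather than county data.

```python
def trend_slope(years, rates):
    """Ordinary least-squares slope of rate against year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(rates) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, rates))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


years = list(range(2014, 2021))
# Placeholder series for illustration only, not the report's data.
rising = [5.0, 5.2, 5.4, 5.6, 5.8, 6.0, 6.2]
falling = [7.0, 6.8, 6.6, 6.4, 6.2, 6.0, 5.8]

rising_slope = trend_slope(years, rising)
falling_slope = trend_slope(years, falling)
```

A positive slope indicates a rate that rises on average per year and a negative slope a falling one; with only seven points per county, a single unusual year can move the slope noticeably.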

Discussion

Based on Table 1, many counties meet the definition of rural and also received no recent HCAI funding. To narrow the list further, our team limited the analysis to counties with aging populations and a high proportion of renters, as described in the methods section. As shown in Figure 1, the counties that met these criteria were Del Norte, Inyo, Lake, Mariposa, Modoc, Siskiyou, and Tehama. The last criterion to consider when deciding which of these seven counties to fund is the mortality rate due to chronic disease. Figure 1 showed that three counties clearly had the highest rates of mortality from chronic disease: Lake, Siskiyou, and Tehama. However, there was no clear distinction in mortality rates among the remaining counties of Del Norte, Inyo, Mariposa, and Modoc.

To choose fairly between these four counties, our team plotted the trends in mortality rates, with the goal of finding the counties whose mortality rate due to chronic illness is increasing the most and choosing those for funding (as opposed to counties with steady or decreasing chronic disease mortality rates). From Figure 2 we can see that both Del Norte and Inyo have a decreasing chronic disease mortality rate. Modoc had a sharp increase in its chronic disease mortality rate, and Mariposa’s is remaining fairly steady (Mariposa also had a notably high chronic disease mortality rate in 2017). Based on these criteria, we determined that Mariposa and Modoc will be the two of these four counties to receive funding. Our team’s final decision based on the above analysis is to provide funding to the counties of Mariposa, Modoc, Lake, Siskiyou, and Tehama. It is important to note that mortality rates for each year were based on the respective county’s 2012 census total population. In the future, population estimates for each year should be used to obtain more precise rates.

Our team’s final determination, based on data analysis and visualizations, is that the counties of Mariposa, Lake, Modoc, Siskiyou, and Tehama should receive the funding to improve healthcare facilities. In the future, our team would like to explore disparities within counties that meet the minimum criteria to guide funding decisions; specifically, our team would like to disaggregate chronic disease mortality by income and race/ethnicity.