The Oregon Department of Transportation’s (ODOT) Strategic Action Plan identifies Social Equity as a core priority toward which the agency should direct its projects, policies, and programs. As part of recognizing Social Equity as a core priority, the agency developed the Social Equity Index (SEI) to help agency staff and leadership understand where communities of concern are located throughout Oregon. Using U.S. Census data at the block group level, the SEI is intended as a decision support tool that helps target agency resources in a way that reduces social disparities in the allocation of transportation resources. This report documents the data and methods used in the current SEI and the updated SEI, both to make future updates easier and to fully disclose the process ODOT used.
This report also documents the data and methods for multiple approaches to updating the SEI. It also documents transportation outcome data intended to help staff determine whether the index categorizes the state in a way that would help target investments where they would be most effective.
Data used in this report derive from multiple sources. Data from each source are documented below, including the vintage used, the processing applied, and the relevant scripts used to prepare the data.
The current SEI is based solely on data from the American Community Survey (ACS) 5-year sample, a U.S. Census data product, and includes the attributes described in the table below. For the 2023 SEI update, these data elements are acquired from the Census using an R script that calls the Census API and is available on GitHub here. The script requires an API key, which can be retrieved here for free. Otherwise, it should work as long as all the necessary R libraries are installed.
As noted in the table below, the current version of the SEI (labeled “2021”) uses some tract-level data elements. These were apportioned to block groups using the proportion of the tract’s population represented in each block group. The 2023 update avoids this practice by using only data elements available at the block group level. Specifically, poverty level and population living with a disability were used at the tract level in the 2021 SEI. Poverty level is available at the block group level from the same table, so this data element is unchanged. Total population with a disability is not available at the block group level. The population age 20 to 64 with a disability is available at that level and is used instead.
The two analysis periods require two sets of block group geometries: one period predates the 2020 Census, while the other uses data from after this decennial Census. The earlier period uses the 2010 block group geometries (2,634 block groups), while the later period uses the 2020 block group geometries (2,970 block groups).
Data Element | SEI Version | Geographic Level | ACS Table |
---|---|---|---|
Population | Both | Block Group | B01003 |
Population living at 200% Poverty or below | 2021 | Tract | C17002 |
Population living at 200% Poverty or below | 2023 | Block Group | C17002 |
Population w/ Disability | 2021 | Tract | B18101 |
Population Age 20 to 64 w/ Disability | 2023 | Block Group | B23024 |
Limited English Proficiency | Both | Block Group | B16004 |
Population Age 65 & Older | Both | Block Group | B01001 |
Population White | Both | Block Group | B03002 |
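The 2021 SEI apportioned tract-level data to block groups. The sketch below illustrates that step: it splits a tract-level count across the tract's block groups in proportion to each block group's share of the tract population. It is a minimal Python illustration, not the R workflow used for the SEI, and the function name and example numbers are hypothetical.

```python
def apportion_tract_value(tract_value, block_group_pops):
    """Split a tract-level count across block groups by population share.

    tract_value: tract-level count (e.g., people with a disability).
    block_group_pops: dict of block group ID -> total population.
    Returns a dict of block group ID -> apportioned count.
    """
    tract_pop = sum(block_group_pops.values())
    if tract_pop == 0:
        # No population to apportion against; assign zero everywhere.
        return {bg: 0.0 for bg in block_group_pops}
    return {
        bg: tract_value * pop / tract_pop
        for bg, pop in block_group_pops.items()
    }


# Hypothetical tract made up of three block groups.
shares = apportion_tract_value(300, {"bg1": 1000, "bg2": 500, "bg3": 1500})
print(shares)  # {'bg1': 100.0, 'bg2': 50.0, 'bg3': 150.0}
```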
For the purposes of this report, the following Census data variables are used and referred to as the “input variables”. Each is calculated by dividing the value of the variable by the block group’s total population (or total households, for the household-based measures), with two exceptions. First, the BIPOC input variable is derived by subtracting the White population from the total population before dividing. Second, the Disability 20 to 64 proportion is calculated by dividing by the population aged 20 to 64 rather than by the total population.
Data Element | Description | Calculation Method |
---|---|---|
Poverty % | Percent of population living at 200% of poverty or below | Direct from Census |
BIPOC % | Percent of population that is Black, Indigenous & People of Color (BIPOC) | Total population minus White population |
Limited English Proficiency % | Percent of population that speaks English ‘not well’ or ‘not at all’ | Direct from Census |
Disability (20-64) % | Percent of population age 20-64 that has a disability | Direct from Census |
Age Over 64 % | Percent of the population that is 65 years or older | Direct from Census |
Crowded Households % | Percent of households classified as overcrowded (more than 1 occupant per room) | Direct from Census |
Youth Population % | Percent of population that is 18 years old or younger | Direct from Census |
Vulnerable Population % | Percent of population that is 65 or older, 18 or younger, or age 20 to 64 with a disability | Sum of 65 & older, 18 & younger, and 20 to 64 with a disability |
Zero Vehicle Households % | Percent of households that do not own a vehicle | Direct from Census |
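The conversion from raw counts to input variables can be sketched as follows. The sketch assumes the counts have already been pulled from the ACS tables listed earlier. The key names are hypothetical placeholders, not ACS variable codes. It applies the two exceptions described above: BIPOC is total population minus White population, and the disability share uses the population aged 20 to 64 as its denominator. Dividing the household-based measures by total households is an assumption drawn from the “Percent of households” descriptions.

```python
def input_proportions(bg):
    """Convert raw block group counts into SEI input proportions.

    `bg` is a dict of counts; the key names are hypothetical placeholders.
    """
    total = bg["total_pop"]
    households = bg["households"]

    def share(numerator, denominator):
        # Guard against unpopulated block groups (zero denominator).
        return numerator / denominator if denominator else 0.0

    return {
        "poverty": share(bg["pop_200pct_poverty_or_below"], total),
        # BIPOC is derived: total population minus the White population.
        "bipoc": share(total - bg["pop_white"], total),
        "limited_english": share(bg["pop_limited_english"], total),
        # Exception: the denominator is the population aged 20 to 64.
        "disability_20_64": share(bg["pop_20_64_disabled"], bg["pop_20_64"]),
        "age_65_plus": share(bg["pop_65_plus"], total),
        "youth": share(bg["pop_18_and_under"], total),
        # Assumption: household measures divide by total households.
        "crowded_hh": share(bg["hh_crowded"], households),
        "zero_vehicle_hh": share(bg["hh_zero_vehicle"], households),
    }


example = {
    "total_pop": 1000, "pop_white": 700, "pop_200pct_poverty_or_below": 250,
    "pop_limited_english": 40, "pop_20_64": 600, "pop_20_64_disabled": 60,
    "pop_65_plus": 150, "pop_18_and_under": 220,
    "households": 400, "hh_crowded": 12, "hh_zero_vehicle": 30,
}
props = input_proportions(example)
print(props["bipoc"], props["disability_20_64"])  # 0.3 0.1
```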
One use of these Census data has been to analyze the relationship between sociodemographic characteristics and traffic injuries. ODOT crash data are used to understand how the SEI relates to both the frequency and the population-based rate of traffic injury. Since the Census data represent a 5-year sample of the population, this report likewise uses 5 years of crash records in each analysis period. These data are derived from citizen-filed crash reports and police reports. ODOT’s Crash Analysis and Reporting Unit (CARS) then processes and augments these reports to produce the state’s official traffic injury data.
Crashes are joined to block groups using a spatial join function in R. The join relies on the precise location of each crash to place the crash point in exactly one block group. Many Census geographies use roadways to define their boundaries, so many crashes occur on roadways that coincide with block group boundaries, which makes this method of spatial joining imperfect. However, ODOT’s Research Unit documented that this issue is unlikely to significantly affect analysis using block group level data. This holds especially when the data are grouped into an index, because of the spatial autocorrelation of Census data. Spatial autocorrelation, in this context, refers to the tendency of sociodemographic values in one block group to be correlated with, or similar to, the values in neighboring block groups. This phenomenon has minimal implications for the analysis in this report. Any statistical analysis, however, would need to account for it using techniques such as geographically weighted regression or mixed-effects regression. The latter approach was employed in Roll and McNeil (2020), which used a multilevel modeling technique.
Analysis Period | Crash Data Years Used | Census Geographies |
---|---|---|
2014 to 2018 | 2014 to 2018 | 2010 |
2017 to 2021 | 2017 to 2021 | 2020 |
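The report performs this join with a spatial join function in R. The Python sketch below is a library-free illustration of the same idea. It assigns each crash point to the first block group polygon that contains it, using a standard ray-casting point-in-polygon test, and then computes a population-based rate. The coordinates, block group IDs, and the per-1,000-residents rate scale are hypothetical, and coordinates are assumed to be in a projected system. In the example, a crash lying exactly on the shared edge is counted once (in block group B). This matches the one-crash-one-block-group rule described above, but it is a property of this simple test, not of the R workflow.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) falls inside `polygon`.

    `polygon` is a list of (x, y) vertices; the ring closes implicitly
    (the last vertex connects back to the first).
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast to the right of (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_crashes(crashes, block_groups):
    """Count crashes per block group, assigning each crash to one block group.

    crashes: list of (x, y) crash points.
    block_groups: dict of block group ID -> polygon vertex list.
    """
    counts = {bg_id: 0 for bg_id in block_groups}
    for x, y in crashes:
        for bg_id, polygon in block_groups.items():
            if point_in_polygon(x, y, polygon):
                counts[bg_id] += 1
                break  # stop at the first match so no crash is double counted
    return counts


def rate_per_1000(count, population):
    """Population-based rate; the per-1,000 scale is an assumption."""
    return 1000 * count / population if population else 0.0


# Two hypothetical adjacent square block groups sharing the edge x = 1.
bgs = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
counts = join_crashes([(0.5, 0.5), (1.5, 0.2), (1.0, 0.5)], bgs)
print(counts)  # {'A': 1, 'B': 2}
print(rate_per_1000(counts["B"], 800))  # 2.5
```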
The Highway Performance Monitoring System (HPMS) data represent ODOT’s official vehicle volume estimates and are available in a detailed geospatial network format. These network data are available from 2011 through 2020. They include important network features such as annual average daily traffic (AADT), functional classification, and speed limit, among other geometric and operational data elements. This report uses these data to measure vehicle miles traveled (VMT) density and the density of high-speed arterial and collector miles in block groups. Each crash point is assigned to just one block group. Network line features and their associated VMT and speed limits, by contrast, can be assigned to multiple block groups, as when a roadway forms all or part of a block group boundary. This seems logical, given that a roadway on a block group boundary ‘exposes’ each of the adjacent block groups to the conditions of that roadway.
The 2014 to 2018 analysis period uses 2014 to 2018 HPMS data. The 2017 to 2021 analysis period uses 2017 to 2020 data, since 2021 HPMS data are not yet fully released. To calculate the measures shared below, the HPMS segments are first split by block group and then buffered to ensure they are associated with any adjacent block groups. Next, total VMT and roadway lane miles are computed for each block group. Lastly, the density of these measures is computed using the land area of the block group, with results presented per square mile.
Analysis Period | HPMS Years Used | Census Geographies |
---|---|---|
2014 to 2018 | 2014 to 2018 | 2010 |
2017 to 2021 | 2017 to 2020 | 2020 |
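The density step can be sketched as follows. The sketch assumes each HPMS segment has already been split and buffered onto a block group and carries its AADT, length in miles, and lane count (hypothetical field names). Daily VMT for a segment is taken as AADT times segment length; annual VMT would multiply this by 365. Land area is assumed to be in square meters, as in the Census TIGER/Line ALAND field, and is converted to square miles.

```python
SQ_METERS_PER_SQ_MILE = 2_589_988.110336  # (1,609.344 m)^2


def roadway_densities(segments, land_area_sq_m):
    """Compute daily VMT and lane-mile density per square mile for one block group.

    segments: list of dicts with hypothetical keys "aadt", "miles", "lanes",
        one per HPMS segment piece associated with this block group.
    land_area_sq_m: block group land area in square meters.
    """
    daily_vmt = sum(s["aadt"] * s["miles"] for s in segments)
    lane_miles = sum(s["lanes"] * s["miles"] for s in segments)
    area_sq_mi = land_area_sq_m / SQ_METERS_PER_SQ_MILE
    return {
        "daily_vmt_per_sq_mi": daily_vmt / area_sq_mi,
        "lane_miles_per_sq_mi": lane_miles / area_sq_mi,
    }


# Hypothetical block group of 2 square miles with two segments.
segs = [
    {"aadt": 10_000, "miles": 0.5, "lanes": 4},
    {"aadt": 2_000, "miles": 1.0, "lanes": 2},
]
print(roadway_densities(segs, 2 * SQ_METERS_PER_SQ_MILE))
```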
Data on the location of transit stops are used to understand how the indexing methods developed and tested in this report relate to transit usage. ODOT maintains a database of transit stop locations based on the General Transit Feed Specification (GTFS) standard, and copies of the data are stored every two years. Transit stops are an imperfect measure of transit usage; ridership or service miles would be better. Even so, transit stop count and transit stop density can reveal a great deal about transit access within block groups. The 2014 to 2018 analysis period uses GTFS transit stop data for 2015 and 2017, and the 2017 to 2021 analysis period uses 2017, 2019, and 2021 data. As with the crash data, each transit stop’s location is used to assign it to a block group using a spatial join. Since stop locations are quite precise, this spatial join should be clean and introduce little misassociation.
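A minimal sketch of the stop count and density measures follows. It reads stop coordinates from a GTFS stops.txt file (stop_id, stop_lat, and stop_lon are standard GTFS columns) and counts stops per block group. The `locate` function stands in for the point-in-polygon spatial join and is hypothetical, as are the example stops, block group IDs, and areas. The report does not specify how multiple GTFS years are combined, so the sketch handles a single feed.

```python
import csv
import io
from collections import Counter


def read_gtfs_stops(stops_txt):
    """Parse GTFS stops.txt content into {stop_id: (lon, lat)}.

    Rows without coordinates (permitted for some GTFS location types)
    are skipped.
    """
    stops = {}
    for row in csv.DictReader(io.StringIO(stops_txt)):
        if row.get("stop_lat") and row.get("stop_lon"):
            stops[row["stop_id"]] = (float(row["stop_lon"]), float(row["stop_lat"]))
    return stops


def stop_counts_and_density(stops, locate, land_area_sq_mi):
    """Return {block group ID: (stop count, stops per square mile)}.

    locate: function (lon, lat) -> block group ID or None, standing in
        for a point-in-polygon spatial join.
    land_area_sq_mi: {block group ID: land area in square miles}.
    """
    counts = Counter()
    for lon, lat in stops.values():
        bg_id = locate(lon, lat)
        if bg_id is not None:
            counts[bg_id] += 1
    return {
        bg_id: (counts[bg_id], counts[bg_id] / area if area else 0.0)
        for bg_id, area in land_area_sq_mi.items()
    }


def locate(lon, lat):
    """Hypothetical lookup: two block groups split at longitude -122.6."""
    return "west" if lon < -122.6 else "east"


feed = """stop_id,stop_name,stop_lat,stop_lon
1,Main & 1st,45.52,-122.68
2,Main & 5th,45.52,-122.65
3,Oak & 2nd,45.51,-122.55
"""
stops = read_gtfs_stops(feed)
print(stop_counts_and_density(stops, locate, {"west": 0.5, "east": 2.0}))
# {'west': (2, 4.0), 'east': (1, 0.5)}
```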
There are several ways to create meaningful composite indices from multiple factors. The goal for the SEI is to make the index calculation easy to explain and intuitive. The existing SEI calculation and categorization are explained first, followed by a proposed approach for the updated SEI.
The existing SEI calculation produces values between 0.31 and 1.95. A value greater than 1.0 indicates that more of the populations of concern live in that block group than in block groups with an SEI value less than 1.0. However, the number itself is not meaningful. A value of 1.0 does not mean every person in the block group belongs to a population of concern, since a single person can belong to multiple groups.
Once the composite index value is calculated, the ArcGIS natural Jenks algorithm was used to create four categories: Low, Low/Medium, Medium/High, and High. These represent less to more concentrated populations of concern. ArcGIS’s natural Jenks algorithm aims to create categories based on natural groupings inherent in the data, grouping similar values together while maximizing the differences between classes (Geospatial Analysis 2021). For many applications this algorithm performs reasonably. For this application, however, it appears to skew the groupings in a way that may not best support agency decision making, as it places many block groups with fewer bicycle and pedestrian injuries in the High category. Additionally, the ArcGIS Jenks algorithm could not be replicated outside of that software package, which may present problems in future updates like this one.
For comparison and clarity, the current SEI uses a simple calculation: it adds the counts of each population of concern and then divides by the total population, using the equation below, where i represents each block group:
\[ \begin{aligned} SEI_{i} = \frac{\text{Poverty}_{i} + \text{BIPOC}_{i} + \text{Disability}_{i} + \text{Limited English}_{i} + \text{Population65Older}_{i}}{\text{TotalPopulation}_{i}} \end{aligned} \]
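The existing calculation and its categorization step can be sketched in Python as below. `sei_2021` mirrors the equation above, with hypothetical dictionary keys holding block group counts. For categorization, the sketch uses a textbook Fisher-Jenks dynamic program, which finds the class breaks that minimize the total within-class sum of squared deviations. It is an open-source stand-in, not the ArcGIS implementation. Because the report notes that the ArcGIS results could not be replicated elsewhere, its breaks may not match the 2021 categories.

```python
def sei_2021(bg):
    """Existing SEI: summed counts of the populations of concern divided by
    total population (keys are hypothetical)."""
    concern = (bg["poverty"] + bg["bipoc"] + bg["disability"]
               + bg["limited_english"] + bg["age_65_plus"])
    return concern / bg["total_pop"]


def jenks_breaks(values, n_classes):
    """Fisher-Jenks optimal 1-D classification via dynamic programming.

    Minimizes the total within-class sum of squared deviations and returns
    the upper bound of each class, lowest class first.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums give any run's sum of squared deviations in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for idx, v in enumerate(data):
        s1[idx + 1] = s1[idx] + v
        s2[idx + 1] = s2[idx] + v * v

    def ssd(i, j):
        """Sum of squared deviations of data[i:j] from its mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: lowest cost of splitting data[:j] into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    # start[k][j]: index where the k-th class begins in that best split.
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    start[k][j] = i

    # Walk back through the recorded splits to recover each class's upper bound.
    breaks = []
    j = n
    for k in range(n_classes, 0, -1):
        breaks.append(data[j - 1])
        j = start[k][j]
    return breaks[::-1]


example_bg = {"total_pop": 1000, "poverty": 300, "bipoc": 400,
              "disability": 80, "limited_english": 50, "age_65_plus": 170}
print(sei_2021(example_bg))  # 1.0
print(jenks_breaks([1, 2, 3, 10, 11, 12, 30, 31], 3))  # [3, 12, 31]
```

The example block group scores exactly 1.0 even though people may be counted in more than one group, which illustrates why the raw value is hard to interpret. For the SEI itself, the call would be `jenks_breaks(sei_values, 4)`.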
The values from the index calculation above are hard to explain and unintuitive. The proposed approach therefore uses quintiles to bucket each input value and adds those quintile scores into a composite score. It then categorizes the composite score into Low to High categories using quantiles. Quintiles divide a set of values into five equal-sized groups, each containing 20% of the observations. The first quintile represents the lowest fifth of the data (the bottom 20% of values), the second quintile the next fifth (21% to 40%), and so on up to 100%. Quintiles are computed for each input value (Poverty, BIPOC, etc.) and used to break the data into groups. Values in the first quintile are assigned a 1, values in the second quintile a 2, and so on through the fifth quintile, which is assigned a 5. The quintile scores for the input values are then added, resulting in a score from 5 to 25 for an index with five input variables. A score of 5 means that all of the block group’s values fall in the bottom 20% of observed values. A score of 25 means that the block group ranks in the top 20% for all values used in the index. If only four input variables are used, the lowest score would be 4 and the highest 20.
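The quintile scoring can be sketched as follows. The cut points come from Python's `statistics.quantiles` with `method="inclusive"`. The report does not state which quantile definition was used, so this choice is an assumption. So are the tie rule (a value equal to a cut point goes to the lower quintile) and the variable names.

```python
import bisect
import statistics


def quintile_scores(values):
    """Score each value 1-5 by the quintile of `values` it falls in.

    Assumption: values equal to a cut point go to the lower quintile
    (bisect_left).
    """
    cuts = statistics.quantiles(values, n=5, method="inclusive")
    return [bisect.bisect_left(cuts, v) + 1 for v in values]


def composite_scores(block_groups, variables):
    """Sum the per-variable quintile scores for each block group.

    block_groups: list of dicts of input proportions (hypothetical keys).
    variables: names of the input proportions used by the index.
    """
    totals = [0] * len(block_groups)
    for var in variables:
        scores = quintile_scores([bg[var] for bg in block_groups])
        for idx, score in enumerate(scores):
            totals[idx] += score
    return totals


poverty = [0.02, 0.05, 0.08, 0.11, 0.15, 0.19, 0.24, 0.30, 0.41, 0.55]
print(quintile_scores(poverty))  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```

When many values are tied, cut points can coincide. In block groups where no one has limited English proficiency, for example, all the zero values would receive a score of 1 under this tie rule.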
The figure below shows the distribution of each input variable by quintile. In the proposed approach, each data element is not the population count for a population of concern but that population's proportion of the total population. For instance, instead of the BIPOC population count, this approach uses the proportion of the block group’s total population that is BIPOC. Using proportions instead of counts normalizes the data and limits skewing toward block groups with large populations.
Figure 3.1: Distribution of Input Census Variable by Quintile
Four potential approaches are summarized below. They are identical in method but use different input variables. Each approach simply sums the quintile scores of its input variables (Poverty %, BIPOC %, Limited English Proficiency %, etc.) to create a composite score. To determine the categories, quartiles of the composite score are calculated. Block groups in the lowest quartile are categorized as “Low” SEI, the 2nd quartile as “Low/Medium”, the 3rd quartile as “Medium/High”, and the 4th quartile as “High”. The approaches differ as follows:
- The first uses the five original input variables.
- The second adds variables from the Transportation Disadvantaged Index (TDI).
- The third keeps some TDI variables but combines Age Over 64, Youth Population, and Age 20 to 64 with a Disability into a single input called Vulnerable Population.
- The fourth keeps the third approach’s inputs but disaggregates Vulnerable Population back into its three components.

The approaches are laid out in the equations below, where i is the block group.
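Categorizing composite scores into the four SEI categories might look like the sketch below. The quantile method (`statistics.quantiles` with `method="inclusive"`) is an assumption rather than the documented procedure. So is the tie rule: scores equal to a cut point go to the lower category. Composite scores are integers with a narrow range (5 to 25 for five inputs), so many block groups share the same score. As a result, the four categories will generally not contain equal numbers of block groups.

```python
import bisect
import statistics

CATEGORIES = ["Low", "Low/Medium", "Medium/High", "High"]


def categorize(composite):
    """Map composite scores to SEI categories using quartiles of the scores."""
    cuts = statistics.quantiles(composite, n=4, method="inclusive")
    # bisect_left returns 0-3: the number of cut points below the score.
    return [CATEGORIES[bisect.bisect_left(cuts, s)] for s in composite]


print(categorize([5, 7, 9, 11, 13, 15, 17, 19]))
# ['Low', 'Low', 'Low/Medium', 'Low/Medium', 'Medium/High', 'Medium/High', 'High', 'High']
```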
The first approach, titled Approach A, uses the original five Census variables and is summarized below:
\[
\begin{aligned}
\text{Approach A}_{i} = \text{Poverty Prop}_{i} + \text{BIPOC Prop}_{i} + \text{Disability Prop}_{i} + \text{Limited English Prop}_{i} + \text{Population65Older Prop}_{i}
\end{aligned}
\]
The second approach, titled TDI Approach, uses the original five Census variables but adds the Transportation Disadvantaged Index inputs Crowded Housing, Youth Population, and Zero Vehicle Households. It is summarized below:
\[
\begin{aligned}
\text{TDI Approach}_{i} = {} & \text{Poverty Prop}_{i} + \text{BIPOC Prop}_{i} + \text{Disability Prop}_{i} + \text{Limited English Prop}_{i} \\
& + \text{Population65Older Prop}_{i} + \text{Crowded Housing Prop}_{i} + \text{Youth Prop}_{i} + \text{Zero Vehicle Hh Prop}_{i}
\end{aligned}
\]
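The third approach, titled TDI Light, keeps three of the original inputs (Poverty, BIPOC, and Limited English Proficiency) and adds Zero Vehicle Households from the TDI. It replaces Age Over 64 and Age 20 to 64 with a Disability with a new variable called Vulnerable Population. This variable is created by adding Age Over 64, Youth Population (age 18 and younger), and Disabled Population Age 20-64. This approach is summarized below: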
\[ \begin{aligned} \text{TDI Light Approach}_{i} = \text{Poverty Prop}_{i} + \text{BIPOC Prop}_{i} + \text{Limited English Prop}_{i} + \text{Vulnerable Population Prop}_{i} + \text{Zero Vehicle Hh Prop}_{i} \end{aligned} \]
where:
\[
\begin{aligned}
\text{Vulnerable Population Prop}_{i} = \text{Youth Prop}_{i} + \text{Disability Prop}_{i} + \text{Population65Older Prop}_{i}
\end{aligned}
\]
The fourth approach disaggregates the vulnerable populations and is described below:
\[
\begin{aligned}
\text{TDI Light Approach DisAggVulPop}_{i} = {} & \text{Poverty Prop}_{i} + \text{BIPOC Prop}_{i} + \text{Limited English Prop}_{i} + \text{Youth Prop}_{i} \\
& + \text{Disability Prop}_{i} + \text{Population65Older Prop}_{i} + \text{Zero Vehicle Hh Prop}_{i}
\end{aligned}
\]
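The four approaches differ only in their input lists, so they can be expressed as configuration. The sketch below records each approach's inputs, using hypothetical keys for the input proportions. It derives Vulnerable Population Prop as the sum given in the TDI Light definition. It also reports the possible composite score range, from one point per input (all bottom quintile) to five points per input (all top quintile).

```python
APPROACHES = {
    "Approach A": ["poverty", "bipoc", "disability_20_64",
                   "limited_english", "age_65_plus"],
    "TDI Approach": ["poverty", "bipoc", "disability_20_64",
                     "limited_english", "age_65_plus",
                     "crowded_hh", "youth", "zero_vehicle_hh"],
    "TDI Light": ["poverty", "bipoc", "limited_english",
                  "vulnerable", "zero_vehicle_hh"],
    "TDI Light DisAggVulPop": ["poverty", "bipoc", "limited_english",
                               "youth", "disability_20_64",
                               "age_65_plus", "zero_vehicle_hh"],
}


def add_vulnerable(bg):
    """Return a copy of `bg` with Vulnerable Population Prop added:
    Youth Prop + Disability Prop + Population65Older Prop."""
    out = dict(bg)
    out["vulnerable"] = bg["youth"] + bg["disability_20_64"] + bg["age_65_plus"]
    return out


def score_range(approach):
    """Minimum and maximum composite score: 1 to 5 points per input."""
    n_inputs = len(APPROACHES[approach])
    return n_inputs, 5 * n_inputs


for name in APPROACHES:
    print(name, score_range(name))
```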
The approaches are summarized below. The results are shown in a summary table that summarizes the input variables by SEI category. The table also includes information on crashes and the built environment that was added to the Census block groups.
Figure 4.1: Chart of Socio-demographic Values by Index Approach and Index Category
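A summary of this kind averages input variables and totals crash or built environment measures by SEI category. It could be produced along the lines of the sketch below. The field names are hypothetical. The use of unweighted means across block groups is an assumption, since the report does not say whether means were population weighted.

```python
from collections import defaultdict


def summarize_by_category(block_groups, categories, mean_fields, sum_fields):
    """Summarize block group attributes by SEI category.

    mean_fields: attributes averaged across block groups (e.g., input
        proportions); the unweighted mean is an assumption.
    sum_fields: attributes totaled (e.g., injury counts, transit stops).
    """
    groups = defaultdict(list)
    for bg, category in zip(block_groups, categories):
        groups[category].append(bg)

    summary = {}
    for category, members in groups.items():
        row = {"block_groups": len(members)}
        for field in mean_fields:
            row["mean_" + field] = sum(m[field] for m in members) / len(members)
        for field in sum_fields:
            row["total_" + field] = sum(m[field] for m in members)
        summary[category] = row
    return summary


bgs = [
    {"poverty": 0.10, "ped_bike_injuries": 1},
    {"poverty": 0.30, "ped_bike_injuries": 4},
    {"poverty": 0.20, "ped_bike_injuries": 2},
]
result = summarize_by_category(
    bgs, ["Low", "High", "Low"], ["poverty"], ["ped_bike_injuries"]
)
print(result["Low"]["block_groups"], result["Low"]["total_ped_bike_injuries"])  # 2 3
```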
The three maps below show the different approaches so that their differences and similarities can be reviewed. Each point represents the centroid of a block group. Centroids are used in these maps instead of block group boundaries to improve map rendering performance, although in practice block group polygon boundaries would be used. The map order is as follows: