Data

This report summarizes the key insights extracted from the variables available in the CoSA Municipal Court InCode Court case management system database extracts.

The analysis below is based on the Citations, Violations, Violations sent to OmniBase, violations with warrants, and violation administrative history data extracts provided by the City of San Antonio Municipal Court. Please refer to the appendix for a description of the data delivery, import, and sub-setting process.

Descriptive statistics

The sections below present the characteristics of the key variables available in the data extract, as well as constructed variables. Descriptive statistics are provided for the continuous variables. Where appropriate, continuous and categorical variables, as well as bivariate group comparisons, are visualized. The table below summarizes the variables available in the citations file.

Distribution of Citations by age

The histogram shows that citations skew notably toward younger drivers (a distribution consistent with insurance companies' practice of charging higher premiums to drivers under the age of 25). The mean age is 34.5 and the median is 31, further illustrating the skew (i.e., half of the citations are associated with individuals between 16 and 31 years of age). The variable “age” has been cleaned by removing observations with typos resulting in impossible values (e.g., 0 or negative), as well as observations with possible but less plausible values (e.g., ages under 16 or over 90, including recorded values as high as 120). Recommendation: while the number of affected records is not very large, data quality could be improved by implementing automatic checks during data entry that 1) reject impossible values and 2) provide warnings for possible but less plausible values.
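A minimal sketch of the two-tier check recommended above, assuming ages arrive as whole numbers at data entry. The names `check_age` and `AgeCheck` are hypothetical, and the 16 and 90 thresholds simply mirror the cleaning rules used in this analysis; this is not a description of any existing InCode feature.

```python
from enum import Enum


class AgeCheck(Enum):
    """Outcome of a (hypothetical) data-entry check on a recorded age."""
    OK = "ok"
    WARN = "warn"      # possible but less plausible: ask the clerk to confirm
    REJECT = "reject"  # impossible value: block the entry


# Assumed thresholds, mirroring the cleaning rules used in this report
MIN_PLAUSIBLE_AGE = 16
MAX_PLAUSIBLE_AGE = 90


def check_age(age: int) -> AgeCheck:
    """Classify an entered age as OK, WARN, or REJECT."""
    if age <= 0:
        return AgeCheck.REJECT  # typos such as 0 or negative values
    if age < MIN_PLAUSIBLE_AGE or age > MAX_PLAUSIBLE_AGE:
        return AgeCheck.WARN    # e.g. 12 or 120: possible, but confirm
    return AgeCheck.OK
```

For example, an entry of 0 would be rejected, 120 would trigger a warning, and 31 would pass. The same function can double as the analysis cleaning rule by keeping only records classified as `OK`.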

Distribution of Citations by race and gender

The chart above summarizes the race information recorded in the court database for each citation. Records with unknown or missing race were removed, and the categories “Asian”, “Middle-eastern”, and “Native American” were recoded into “Other”. This distribution needs to be interpreted carefully because of the unknown degree of overlap between the categories “White” and “Hispanic” (most Hispanic individuals also identify as white). For example, the population of San Antonio is 71.4% white and 64.5% Hispanic, i.e., non-Hispanic whites are a fairly small minority. The share of Black court clients in the database (10.5%) is higher than the city-wide share of Black residents (6.8%).
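The recoding step could be expressed as a small mapping function. This is a sketch assuming the raw labels are the strings quoted above and that unknown race is stored as an empty string, `"Unknown"`, or a null; the function names are hypothetical. Shares are computed only over records with a known race, i.e., after the drop.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Assumed raw labels; the actual codes in the court extract may differ.
RECODE_TO_OTHER = {"Asian", "Middle-eastern", "Native American"}
UNKNOWN_LABELS = {"", "Unknown"}


def recode_race(raw: Optional[str]) -> Optional[str]:
    """Map a raw race label to an analysis category; None means drop the record."""
    if raw is None:
        return None
    label = raw.strip()
    if label in UNKNOWN_LABELS:
        return None
    if label in RECODE_TO_OTHER:
        return "Other"
    return label


def race_shares(raw_values: Iterable[Optional[str]]) -> Dict[str, float]:
    """Share of each category among records with a known race."""
    kept = [c for c in (recode_race(v) for v in raw_values) if c is not None]
    counts = Counter(kept)
    return {race: n / len(kept) for race, n in counts.items()}
```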

Acknowledging the implementation difficulties and the dependence on state and other systems, we recommend exploring a transition of race data to the two-question format used by the Census (Hispanic origin and race recorded separately).

The gender distribution of court clients in this database is 61.2% male and 38.4% female.

OmniBase holds by race and gender

The table below shows the distribution of OmniBase holds by race. There is minor but notable variation: while the rate of OmniBase holds for the entire data set is 17.4%, the share among Black court clients is 21%, followed by Hispanic (18%), white (16%), and other (8%) court clients.
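The group rates above are conditional shares: within each group, the number of records with an OmniBase hold divided by the group's total. A minimal sketch, assuming one (group, flag) pair per violation; the reported overall rate of 17.4% matches the ratio of OmniBase reports to violations in the final working set (140,176 / 805,100), which suggests, but does not confirm, that violations are the unit of analysis. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def hold_rate_by_group(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of records with an OmniBase hold within each group.

    records: (group label, has_omnibase_hold) pairs, one per violation.
    """
    totals: Dict[str, int] = defaultdict(int)
    holds: Dict[str, int] = defaultdict(int)
    for group, has_hold in records:
        totals[group] += 1
        holds[group] += int(has_hold)
    return {group: holds[group] / totals[group] for group in totals}


def overall_hold_rate(records: Iterable[Tuple[str, bool]]) -> float:
    """Share of all records with an OmniBase hold, ignoring group."""
    flags = [has_hold for _, has_hold in records]
    return sum(flags) / len(flags)
```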

**Distribution of Omni flags by race**


By gender, however, although male drivers make up a much larger share of court clients (68%), there is no observable difference in the rate of OmniBase holds between male and female court clients. See the table below.

**Distribution of Omni flags by sex**


Distribution of OmniBase holds by geography (zip code)

The first map below shows the rate of OmniBase holds per 10k population in each ZIP code, against the backdrop of each ZIP code's median income. This approach is similar to the analysis in the “Driven by Debt: the Failure of the OmniBase program” report prepared by Texas Appleseed and the Texas Fair Defense Project.

Similar to their findings, we observe a notable, virtually linear inverse relationship between the rate of OmniBase holds and area median income, seemingly supporting the argument that the program places an especially heavy burden on the poorest segments of the population. The scatter plot following the map, which shows an almost linear relationship between ZIP code median income and the rate of OmniBase holds, seems to further support this conclusion.

We use the word “seemingly” because we find this measure methodologically problematic. Simply calculating the rate of OmniBase holds per population ignores the fact that the rate of OmniBase holds in an area depends on the overall rate of violations among its residents, and the rate of violations is not uniform across the city. Indeed, the rate of violations is likely related to both income and the rate of OmniBase holds, and can therefore produce a spurious association between the two: heavier traffic and more traffic violations are much more likely in inner-city areas (which also tend to be poorer) than on quiet subdivision roads (which also tend to be more affluent). One reason for the use of such a suboptimal measure of the OmniBase burden may have been the lack of violation-level court data like those used in the present analysis.

Accordingly, we propose a more valid measure that avoids the spurious effect of variation in the rate of violations: the percent of violations in each ZIP code that were reported to OmniBase.
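The difference between the two measures can be shown with two hypothetical ZIP codes. In this sketch the counts are invented for illustration (not taken from the court data), and `ZipStats` is a hypothetical container.

```python
from dataclasses import dataclass


@dataclass
class ZipStats:
    """Hypothetical per-ZIP aggregates (court counts plus Census population)."""
    zip_code: str
    population: int       # residents, e.g. from Census estimates
    violations: int       # violations linked to residents of this ZIP code
    omnibase_holds: int   # of those violations, the number reported to OmniBase

    def holds_per_10k(self) -> float:
        """Population-based measure: sensitive to how many violations occur."""
        return self.omnibase_holds * 10_000 / self.population

    def pct_violations_with_hold(self) -> float:
        """Proposed measure: share of violations that reached OmniBase."""
        return self.omnibase_holds * 100 / self.violations


# Two made-up ZIP codes with equal population and equal propensity to
# reach OmniBase, but a fourfold difference in violation volume.
busy = ZipStats("A", population=20_000, violations=4_000, omnibase_holds=700)
quiet = ZipStats("B", population=20_000, violations=1_000, omnibase_holds=175)
```

In this illustration the population-based rate differs fourfold (350 vs. 87.5 per 10k), while the proposed measure is identical (17.5%) in both areas, i.e., the entire difference in the first measure comes from violation volume.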

Rate of OmniBase holds and median household income by ZIP code (all San Antonio ZIP codes)

Plot of ZIP code median income vs. rate of OmniBase holds

The scatter plot above reinforces the pattern suggested by the ZIP code map: there is a notable, virtually linear inverse relationship between income and the rate of OmniBase holds.

However, as noted above, this approach is at least partially methodologically questionable because it does not control for the rate of violations. The rate of violations is necessarily related to the rate of OmniBase holds, all else being equal, and it is also likely related to area income: poorer inner-city areas carry heavier traffic and will necessarily experience a higher rate of violations (and attendant OmniBase holds) than less traffic-dense, more affluent areas on the outskirts.

To remedy this problem, below we repeat the analysis, replacing “rate of OmniBase holds per population” with “percent of violations subjected to OmniBase holds”, which eliminates the spurious effect of differing violation rates across areas.

Percent of cases with OmniBase holds and median household income by ZIP code (all San Antonio ZIP codes)

Unlike the previous map showing the rate of OmniBase holds per population, the map above plots the percent of violations with OmniBase holds. Careful examination still reveals some degree of association: at least in the most affluent areas, the percentage of OmniBase cases appears smaller than in all other areas. However, the relationship is not nearly as clear as when the rate of OmniBase holds per population is used. Rather than linear, the pattern appears bifurcated: there is no apparent trend in the lower range of incomes, but a drop in OmniBase cases in the most affluent areas. To further examine this nascent pattern, we also provide a plot of the two variables (income and percent of cases with OmniBase holds) below.

Plot of ZIP code median income vs. percent of cases with OmniBase holds

The plot reinforces and clarifies the nature of the somewhat muted relationship implicit in the ZIP code map. It shows no discernible relationship between income and the percent of violations with OmniBase holds until an area's median income reaches about $60,000. Only above that income level does an inverse relationship between the two variables become notable.
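One simple way to check the bifurcated pattern numerically is to fit separate least-squares lines on either side of the ~$60,000 threshold and compare their slopes. The sketch below uses synthetic points, the function names are hypothetical, and the threshold is an assumption taken from visual inspection of the plot.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (ZIP median income in $, % of violations with OmniBase hold)


def ols_slope(points: List[Point]) -> float:
    """Ordinary least-squares slope of y on x (needs >= 2 distinct x values)."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    return sxy / sxx


def slopes_around_threshold(points: List[Point], threshold: float = 60_000) -> Tuple[float, float]:
    """Fit separate lines below and at/above an assumed income threshold."""
    below = [p for p in points if p[0] < threshold]
    above = [p for p in points if p[0] >= threshold]
    return ols_slope(below), ols_slope(above)


# Synthetic points shaped like the pattern described above (not court data)
toy = [(20_000, 17), (30_000, 17), (40_000, 17), (50_000, 17),
       (70_000, 14), (80_000, 12), (90_000, 10)]
slope_below, slope_above = slopes_around_threshold(toy)
```

A slope of -0.0002 above the threshold corresponds to about 2 percentage points fewer OmniBase referrals per additional $10,000 of median income. This is a split-sample comparison with a threshold chosen by eye, not a formal segmented regression that estimates the breakpoint; the latter would be the more rigorous test.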

This calls for some further nuance when considering the consequences of the OmniBase program. It is generally assumed that its burdens fall most heavily on the poorest and most vulnerable segments of the population. The results presented here suggest this concern might be somewhat overstated, insofar as areas with incomes around the area median income (i.e., not impoverished by definition) show percentages of violations referred to OmniBase very similar to those seen in poorer areas.

One interpretation is that a traffic violation and the possible attendant fine represent a non-trivial challenge (financial or otherwise, e.g., as a disruption) for most individuals or families, including those broadly considered middle class. Whether for financial reasons or because of competing priorities and distractions, in all areas with median incomes from $20,000 to $60,000, a stable 15% to 20% of violations remain unattended and accordingly get reported to OmniBase. Only after the ZIP code median income surpasses ~$60,000 does the percent of violations reported to OmniBase begin to drop. Even so, it should be noted that even most of the most affluent areas show a non-trivial percent (10%+) of violations sent to OmniBase. This suggests that the implications of the program are not only financial: a certain proportion of the population appears not to prioritize case resolution regardless of the degree of financial constraint, although this is certainly more common in lower-income areas.

Vehicle Year by OmniBase status

The above comparisons were based on aggregated Census data used to estimate incomes in different ZIP codes in the city. While this is a sound and widely used approach, its main limitation is that it relies on aggregate data to draw conclusions at the individual level. This is necessary in part because there is no way to obtain income data for individual violations.

However, the data set contains information on the vehicle subject to citation, including its year of manufacture. We propose using this variable as a reliable and reasonably valid proxy for income. While there is wide variation, as a general trend more affluent individuals and households drive newer vehicles. Existing studies have established a linear relationship between income and average vehicle age.

This is confirmed in the CoSA Municipal Court data set. The average model year is 2007 for vehicles in citations reported to OmniBase and 2009 for those not reported, a difference of approximately 2 years; i.e., court clients operating older vehicles are more likely to neglect their cases and receive an OmniBase report. While it cannot be estimated what difference in income this difference in vehicle age signifies, per the existing studies mentioned above it can be up to $50,000 annually. (NOTE: vehicles with model years prior to 1973 were removed from this analysis. This cutoff was somewhat arbitrary, chosen on the assumption that it mostly retains vehicles realistically used for commuting or work, as opposed to older and more exotic vehicles that might show an inverse correlation with income.)
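The comparison can be sketched as follows, assuming each record carries a model year and an OmniBase flag (both the record layout and the function name are hypothetical). The toy records in any example run would be made up; the filter applies the 1973 cutoff described in the note.

```python
from statistics import mean
from typing import Iterable, Tuple

MIN_MODEL_YEAR = 1973  # cutoff used in this analysis


def mean_model_year_by_status(vehicles: Iterable[Tuple[int, bool]]) -> Tuple[float, float]:
    """Return (mean model year if reported to OmniBase, mean model year otherwise).

    vehicles: (model_year, reported_to_omnibase) pairs; years before the
    cutoff are excluded first. Raises StatisticsError if a group is empty.
    """
    kept = [(year, flag) for year, flag in vehicles if year >= MIN_MODEL_YEAR]
    reported = [year for year, flag in kept if flag]
    not_reported = [year for year, flag in kept if not flag]
    return mean(reported), mean(not_reported)
```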

Appendix

Data provided by CoSA Municipal court

The UTSA team received 5 tables from the Municipal Court team, with the following total numbers of records (prior to any subsetting):

  • Citations (n=690,103)
  • Violations (n=904,035)
  • Warrants (n=641,892)
  • OmniBase reports (n=141,104)
  • (Case) History (n=27,547,045)

Traffic violations represent 89.6% of all violations in the provided extract. As agreed with the court, this iteration of the analysis focuses on traffic violations only; accordingly, non-traffic violations were removed, together with all associated records from the other tables. After removing the non-traffic violations from the violations file, along with all records linked to them by citation number in the OmniBase, warrants, history, and citations tables, the working set of observations is as listed below:

  • Citations (n=596,299)
  • Violations (n=810,166)
  • Warrants (n=572,510)
  • OmniBase reports (n=140,667)
  • (Case) History (n=24,425,084)
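The filtering step amounts to a semi-join on the citation number: first keep the traffic violations, then keep only those rows in each of the other tables whose citation number appears among them. A minimal sketch with hypothetical field names (`citation_number`, `is_traffic`); how citations mixing traffic and non-traffic violations were treated is not stated above, so the rule in the code (keep the citation if any of its violations is a traffic violation) is an assumption.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def keep_traffic_only(
    violations: List[Record], linked_tables: Dict[str, List[Record]]
) -> Tuple[List[Record], Dict[str, List[Record]]]:
    """Keep traffic violations and the linked records that share their citation number.

    Assumption: a citation is kept if it has at least one traffic violation.
    """
    traffic = [v for v in violations if v["is_traffic"]]
    keep_ids = {v["citation_number"] for v in traffic}
    filtered = {
        name: [row for row in rows if row["citation_number"] in keep_ids]
        for name, rows in linked_tables.items()
    }
    return traffic, filtered
```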

Further subsetting of the citations file included the following steps:

  • Dropped 1,934 records with missing data for race;
  • Dropped 638 records with missing age (the missing values for age partly result from recoding as missing any values under 16 or over 90; although there are plausible scenarios in which lower or higher ages are correct, the rarity of such scenarios does not justify the possible biases such outliers may introduce);
  • Dropped 831 observations with missing data for gender;
  • Dropped 641 observations with missing values for state;
  • Dropped 4 of the remaining observations with violation dates prior to 2001.

All of the above reduces the final number of citations to 592,750. The final numbers of records, after removing entries linked by citation number from the rest of the files, are as follows:

  • Citations (n=592,750)
  • Violations (n=805,100)
  • Warrants (n=569,556)
  • OmniBase reports (n=140,176)
  • (Case) History (n=24,284,126)
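The subsetting steps listed above can be sketched as an ordered sequence of filters that logs how many citations each step removes, which is also the information a flow chart of the process needs. Field names are hypothetical, and because the filters run in sequence, each logged count depends on the steps before it.

```python
from typing import Callable, Dict, List, Optional, Tuple

Citation = Dict[str, object]


def recode_age(age: Optional[int]) -> Optional[int]:
    """Treat ages outside 16-90 as missing, as described above."""
    if age is None or age < 16 or age > 90:
        return None
    return age


# Applied in order; each count depends on the steps before it.
FILTERS: List[Tuple[str, Callable[[Citation], bool]]] = [
    ("missing race", lambda c: c.get("race") is not None),
    ("missing age", lambda c: recode_age(c.get("age")) is not None),
    ("missing gender", lambda c: c.get("gender") is not None),
    ("missing state", lambda c: c.get("state") is not None),
    ("violation before 2001", lambda c: c["violation_year"] >= 2001),
]


def apply_filters(citations: List[Citation]) -> Tuple[List[Citation], Dict[str, int]]:
    """Apply FILTERS sequentially and record how many citations each step drops."""
    remaining = list(citations)
    drop_log: Dict[str, int] = {}
    for label, keep in FILTERS:
        before = len(remaining)
        remaining = [c for c in remaining if keep(c)]
        drop_log[label] = before - len(remaining)
    return remaining, drop_log
```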

This process is also summarized in the flow chart below.

The data extraction process has not been explicitly documented, but per the UTSA team's understanding based on discussions during meetings, the extract represents cases that have received some type of resolution/disposition during the past 5 years, regardless of the original violation date. As a result, the data set includes citations going as far back as 2001, with two outliers dating back to 1972 and 1982 (both non-traffic violations: a violation of water conservation rules and petty theft, respectively); see the chart below.

NOTE: The distribution of citations by year in the table above and the chart below includes all original records, prior to any subsetting. All other analyses presented in this report are based on the subset of data described at the beginning of this section.

We also provide a bar chart visualizing the distribution of citations by year. Per the chart below, the “laggard” cases, i.e., cases with an unusually long time to resolution, are likely best defined as those with violation dates in 2015 and before.

In this extract, the peak year for citations is 2017 (n=151,349). The data also show a major drop in citations in 2020 (n=54,470, down from 115,633 in 2019), reflecting COVID-19 and the associated restrictions, which led to a major drop in mobility. The number in 2021 appears exceptionally small (n=18,064). This is most likely due to recency: recent cases by definition have had less time to reach resolution. Secondarily, some COVID restrictions and reduced activity continued in 2021, but the extent to which this, rather than recency, accounts for the lower case count is unclear.
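The yearly counts and the size of the 2020 drop can be reproduced with a few lines, assuming one violation date per citation; the function names are hypothetical, and only the 2019 and 2020 counts come from the text above.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable


def citations_per_year(violation_dates: Iterable[date]) -> Dict[int, int]:
    """Count citations by calendar year of the violation date."""
    return dict(Counter(d.year for d in violation_dates))


def pct_change(old: int, new: int) -> float:
    """Percent change from old to new."""
    return (new - old) / old * 100


# Using the 2019 and 2020 counts reported above
print(round(pct_change(115_633, 54_470), 1))  # -52.9
```

By this calculation, the 2020 count is about 53% below the 2019 count.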

NOTE: Following this exploratory review of the distribution, all records with violation dates prior to 2001 were removed. However, further exclusion may be appropriate, as discussed below.

The court provided a table listing all violations sent to OmniBase. When reviewed by year, the data suggest sporadic maintenance and/or implementation in the early years.

Choice of geography

Geospatial analysis at the ZIP code level is by far the most common approach. Many agencies collect data down to the ZIP code level, ZIP code geographies are relatively stable, and data are easily retrievable at all levels (e.g., local, county, state, and national). These practical advantages are partially offset by shortcomings. The most notable are that 1) ZIP codes were primarily designed as mail delivery routes (rather than natural geographical groupings), and, relatedly, 2) there is significant variation in socioeconomic conditions within ZIP codes, raising the possibility of aggregation bias.

In general, smaller aggregation areas such as census tracts are preferable for geospatial analysis, but some difficulties currently preclude their wider application. The main one is that few agencies explicitly collect census tract identifiers. This requires an extra step: geocoding addresses or coordinates and linking them to a census tract. The UTSA team initially considered conducting the geospatial analysis at the census tract level, which proved infeasible within the scope of the project because 1) the street address data contain a lot of “noise” (many observations would need to be discarded due to incomplete, contaminated, or missing addresses), and 2) existing capabilities for batch processing addresses and cross-walking them to census tracts require either substantial computing time or paid services.

The final reason for using ZIP codes was that the legal boundaries of cities are generally far too complicated to allow a clear delineation of city versus non-city observations (e.g., locations that are legally outside the city boundaries might nonetheless be part of an organic neighborhood or community best understood as part of the city). In the San Antonio case, this includes the multitude of small cities and unincorporated areas within the metropolitan boundaries. For reference, the legal outline of the City of San Antonio is presented below.

For all these reasons, the UTSA team settled on using ZIP codes for aggregating court data and for merging with Census data, consistent with prevailing practice. This involved selecting all ZIP codes beginning with “782”, which produced a study area that reasonably approximates the boundaries of the City of San Antonio while excluding the most rural areas of Bexar County.
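The ZIP code selection rule can be sketched as a normalize-then-filter step. This assumes the address ZIP field is stored as text and may contain ZIP+4 values or stray whitespace, which is an assumption about the extract; the function names are hypothetical.

```python
import re
from typing import Optional

# Five digits, optionally followed by a ZIP+4 suffix (with or without a hyphen)
ZIP_PATTERN = re.compile(r"^\s*(\d{5})(?:-?\d{4})?\s*$")


def normalize_zip(raw: Optional[str]) -> Optional[str]:
    """Return the 5-digit ZIP code, or None if the value cannot be parsed."""
    if raw is None:
        return None
    match = ZIP_PATTERN.match(raw)
    return match.group(1) if match else None


def in_study_area(raw: Optional[str], prefix: str = "782") -> bool:
    """True if the normalized ZIP code starts with the study-area prefix."""
    zip5 = normalize_zip(raw)
    return zip5 is not None and zip5.startswith(prefix)
```

Normalizing before filtering keeps ZIP+4 entries such as `78207-1234` in the study area instead of silently dropping them.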

The interactive map below presents the final study area, including selected (computed) court-level and Census variables. NOTE: additional variables can be added upon request.

Interactive map of San Antonio ZIP codes (click on a ZIP code to see the list of variable values)

Date ranges, case studies and (non)utilization of the History file

The main dates used in the analysis presented above are the original violation date and the violation status date; in addition, the sent date and cleared date from the OmniBase table were used. No other dates (e.g., from the history file) have been used in this iteration of the analysis, as explained below.

For this iteration of the analysis, the History file was of limited utility. First, the data contain multiple entries of administrative significance but limited substantive/analytical utility. Second, the main outcomes of interest (e.g., OmniBase reports, report and clearance dates, violation dates, and last violation status) were readily available in the other data tables provided. Third, it is unclear to what extent the history file consistently tracks case status, especially for old cases, where we encountered long periods with no activity (see below). Nonetheless, this is a potentially rich data source, and the UTSA team would welcome further planning discussions to take advantage of it.

The UTSA team reviewed several cases, in particular old ones, to gain a better understanding of the nature of the data. For example, citation number L675430 records two violations in 2001 (driving without a license and without insurance). The violation status is DP, updated in April 2020. The case history for this citation contains 30 entries. The first two entries are from the violation year (2001) and are “warrant issued” (there is no record of actions that may have preceded the warrant). The rest of the entries are from 2020, i.e., there is no case history activity for this citation between 2001 and 2020. We have not made a systematic attempt to evaluate the prevalence of cases with such extended periods of no recorded activity (an interesting question, but outside the scope of the current analysis). The most recent StatusChangeDate is identical to the ViolationStatusDate in the violations file.
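A systematic check for such dormant periods could compute, for each citation, the longest interval between consecutive history entries. The sketch below assumes the history file can be grouped into a list of entry dates per citation number; the five-year threshold, the function names, and any citation IDs used with it are hypothetical.

```python
from datetime import date
from typing import Dict, List


def longest_gap_days(entry_dates: List[date]) -> int:
    """Longest interval, in days, between consecutive history entries."""
    ordered = sorted(entry_dates)
    if len(ordered) < 2:
        return 0
    return max((later - earlier).days for earlier, later in zip(ordered, ordered[1:]))


def flag_dormant(history: Dict[str, List[date]], min_gap_days: int = 5 * 365) -> List[str]:
    """Citation numbers whose history contains a gap of at least min_gap_days."""
    return sorted(
        citation for citation, dates in history.items()
        if longest_gap_days(dates) >= min_gap_days
    )
```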

Citation L675453 (driving without license plate and insurance) is similar in that it shows no activity between the violation date (in 2001) and 2020 in the history file. However, this is a citation that was reported to OmniBase. The most recent date in the history file is identical to the cleared date in the omni table; however, the history entries for this citation do not list the date when the information was sent to OmniBase. This date is available in the omni file, in the SentDate field.

Finally, citation S854366 (from 2017, speeding) also shows 30 history entries. The case was resolved (dismissed after deferred disposition) within ~11 months, including an OmniBase report and clearance. There is a small discrepancy, likely normal and due to administrative lag, in the date the OmniBase report was sent (12 April 2017 in the history file, 21 April 2017 in the omni file), while the OmniBase clearance date is the same in both (14 June 2017). The congruence between the dates, in addition to greater ease of use, justified using the dates in the omni file. Another consideration was that OmniBase reports and clearances are not uniformly recorded in the history file (the strings and descriptions used vary).
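The date comparison for this citation could be automated across all citations with a simple tolerance check. In this sketch the 30-day tolerance is an assumed stand-in for "normal administrative lag", and the function names are hypothetical; the dates are those given above for citation S854366.

```python
from datetime import date


def date_gap_days(history_date: date, omni_date: date) -> int:
    """Absolute difference in days between the two recorded dates."""
    return abs((omni_date - history_date).days)


def dates_agree(history_date: date, omni_date: date, tolerance_days: int = 30) -> bool:
    """Treat small gaps (assumed administrative lag) as agreement."""
    return date_gap_days(history_date, omni_date) <= tolerance_days


# Dates for citation S854366 as described above
sent_gap = date_gap_days(date(2017, 4, 12), date(2017, 4, 21))
cleared_gap = date_gap_days(date(2017, 6, 14), date(2017, 6, 14))
```

Here the send dates differ by 9 days and the clearance dates by 0, so both pass; citations exceeding the tolerance would be candidates for manual review.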