This report summarizes the key insights extracted from the variables available in the CoSA Municipal Court InCode Court case management system database extracts.

The analysis below is based on the Citations, Violations, Violations sent to OmniBase, violations with warrants, and violation administrative history data extracts provided by the City of San Antonio Municipal Court. Please refer to the appendix for a description of the data delivery, import, and sub-setting process.

Executive summary

The analysis of court case data (traffic violations) revealed informative and actionable descriptive patterns, including the distribution of cases with an OmniBase report, estimates of times to disposition, and variance in the likelihood of an OmniBase report by demographic variables and geography.

OmniBase reports. In the entire citations data set, 16.5% of all citations are associated with an OmniBase report. Regarding demographic comparisons, clients with cases reported to OmniBase are more likely to be younger, may be generally poorer, and tend to have citations with a greater number of violations. Finally, Black and Hispanic clients are slightly over-represented within the group of cases reported to OmniBase, and clients with cases reported to OmniBase are more likely to have triggered the racial profiling data entry process.

Number of violations. The majority of court clients (72%) are cited for one violation, but citations with two violations are also common (20% of citations); a smaller percentage have three violations (7.8%). The distributions of number of violations by OmniBase report status, however, differ notably: citations with 2 and 3 violations make up larger shares of the citations reported to OmniBase (26% and 11% respectively) than of the citations not reported (19% and 7.1% respectively). Most important, citations with more than 1 violation take drastically longer to resolve. Considering strategies to alleviate administrative complexity (e.g., different procedures to settle different cases) and financial burden (e.g., multiple fines) may be an effective approach to reducing times to disposition.

Mean and median times to disposition. For the entire data set (i.e., all years) the median time to disposition is 212 days and the mean is 845 days, reflecting a very skewed distribution. For the subset of cases occurring in 2017 and 2018 (chosen as most representative), the median time to disposition is 148 days and the mean is 278 days. Unsurprisingly, for cases reported to OmniBase, the time to disposition is more than double that of cases not reported (mean = 596 days, and median = 476 days).

Timelines for cases (cases reported to OmniBase only). Cases reported to OmniBase have a time to disposition that is more than double that of cases not reported (mean = 596 days, and median = 476 days). The median number of days elapsing between violation date and an OmniBase report is 128 days for all cases, i.e. approximately 4 months. However, once reported to OmniBase, cases tend to be unresolved for a very significant period of time: the median time between report to OmniBase and clearance is over two years for all cases. Within the subset of most typical cases (2017-2018) the period is still quite long: 418 days median. There is no difference across race in the time elapsing between violation and OmniBase report. Similarly, regarding days between OmniBase report and clearance, there are very minor variations across racial groups. Finally, within the subset of OmniBase recorded cases (2017-2018) we find no statistically significant variation across racial groups. We find some minor variation in clearance rates. We do not find evidence that the process is uniquely burdensome to court clients who are members of minority groups.

Distribution of OmniBase holds by geography (zip code). We observed a linear pattern in which the rate of OmniBase holds is inversely related to area median income. Calculating the rate of OmniBase holds per population, however, ignores the fact that the rate of OmniBase holds will be related to the overall rate of violations by residents of a given area. Using a more valid measure, the percent of violations for each zip code that were reported to OmniBase, reveals that the relationship is not nearly as clear. Rather than linear, the pattern appears bifurcated: there is no apparent trend in the lower range of incomes, but above an area income of about $60,000 an inverse relationship between the two variables becomes notable, with a drop in OmniBase reports in the most affluent areas. While some relationship exists, overall we do not find evidence that the rate of OmniBase reports per area is uniquely reflective of socio-economic disadvantage.

Distributions by disposition and OmniBase reports. Violations reported to OmniBase show a higher proportion of closed cases, and a lower proportion of cases disposed of via alternative means. For example, cases reported to OmniBase are roughly half as likely to be dismissed after a probationary period, but they are much more likely (more than twice as likely) to appear in court for a plea appearance. For the very top offenses (speeding and driving without a license), the proportion of these offenses within the OmniBase-reported group is notably lower. However, we also see a distinct set of offenses demonstrating a completely diverging pattern - i.e. their share within the group reported to OmniBase is notably higher than within the group not reported. These include driving without proof of insurance and failure to display registration. A third category is “driving while license invalid”: the share of such violations is 3 times as high in the group reported to OmniBase.

The remainder of the text contains the detailed tables and charts serving as the basis of this summary, with extended interpretation of the findings.

Descriptive analysis

The sections below present the characteristics of the key variables available in the citations and violations data tables, and the constructed variables. Descriptive statistics are provided for the continuous variables (medians and inter-quartile ranges). Where appropriate, continuous and categorical variables, and bivariate group comparisons, are visualized. Table 1 below summarizes selected variables available in the citations file (see Appendix for a discussion of levels of analysis, including citation vs. violation); the same table also provides the distributions of these variables contingent on OmniBase status (a binary variable indicating whether any violation associated with a citation number has been reported to OmniBase).

Overall distributions and comparison by OmniBase report status

OmniBase reports - share of cases. In the entire citations data table 16.5% of the citations (n = 97,833) are associated with an OmniBase report. For variation in the likelihood of this outcome see the remainder of this section. NOTE: the estimate of OmniBase reports will vary depending on the unit of analysis; for example, within the violations file, the proportion of violations reported to OmniBase is 17.4%; the estimates for warrants for citation-level and violation-level data are 42% and 51% respectively. Such discrepancies are normal in the context of this analysis: as noted, citations involving more than one violation are more likely to have at least one of these violations result in an OmniBase report or warrant; the citation-level estimate merely indicates whether any of the violations on the ticket (if there are multiple violations) is associated with an OmniBase report or warrant, while the violation-level estimate provides this information against the benchmark of all other violations.
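The difference between the two units of analysis can be illustrated with a minimal sketch. The records below are invented toy data, not values from the court extract, and the function names are hypothetical; the point is only that the same violation-level flags produce different shares depending on whether they are counted per violation or collapsed to "any violation on the citation".

```python
from collections import defaultdict

# Toy violation-level records: (citation_number, reported_to_omnibase).
# Invented for illustration only; not taken from the court data.
violations = [
    ("C1", False),
    ("C2", True), ("C2", True),
    ("C3", False),
    ("C4", False),
    ("C5", False),
]

def violation_level_share(rows):
    """Share of individual violations reported to OmniBase."""
    return sum(1 for _, flag in rows if flag) / len(rows)

def citation_level_share(rows):
    """Share of citations with at least one reported violation."""
    any_reported = defaultdict(bool)
    for citation, flag in rows:
        any_reported[citation] = any_reported[citation] or flag
    return sum(1 for flag in any_reported.values() if flag) / len(any_reported)

print(f"violation level: {violation_level_share(violations):.3f}")
print(f"citation level:  {citation_level_share(violations):.3f}")
```

In this toy example two of six violations are reported, but only one of five citations is flagged, so the two estimates diverge even though they describe the same underlying events.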

The unit of analysis for the descriptive statistics in Table 1 above is the citation. Although demographic data is available, the unit of analysis is not the individual, but every specific event when a written notice (i.e. citation) is issued to an entity (individual) with specific characteristics; certainly there are duplicated individuals in the citations set, although it is neither desirable (due to privacy and legal issues), nor feasible (in the data extract provided by the Court) to assess to what extent. This caveat does not invalidate any analysis including demographics, but needs to be kept in mind.

The variables in Table 1 are arranged in no particular order. Table 1 provides all overall descriptive statistics and comparisons by OmniBase report status. The discussion below summarizes the results, and adds further visualisations as appropriate.

Age. The demographics of court clients for traffic citations skew young (see visualization below), with a median age of 31 (younger than the median age for San Antonio, which according to US census data is 33.8 years even though the underlying distribution in the court data does not include young children).

Contingent on OmniBase report status, court clients with cases reported to OmniBase tend to be younger (median age = 30) than those resolving their case without an OmniBase report (median age = 32).

The histogram showing the full distribution by age reveals that the rate of citations skews notably younger, with drivers aged 22-24 years receiving more citations than any other age (this distribution is also fully congruent with insurance companies’ policy to charge elevated premiums for drivers under the age of 25). The mean age is 34.5 and the median is 31, further illustrating the skew (i.e., half of the citations are associated with individuals between 16 and 31 years of age).

NOTE: The variable “age” has been cleaned by removing observations with typos resulting in impossible values (e.g., 0 or negative), and also removing observations with perfectly possible but less plausible values (e.g. ages under 16, or ages over 90 [including values up to 120 currently recorded]). Recommendation: while the number of records affected is not very large, data quality can be improved with implementation of an auto check during data entry: 1) Rejecting impossible values, and 2) providing warnings for possible, but less plausible values.
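The recommended entry-time check can be sketched as follows. This is an illustration under stated assumptions, not the Court's system: the function name and the three return labels are hypothetical, while the thresholds (reject 0 or negative values, warn below 16 or above 90) follow the cleaning rules described in the note above.

```python
# Thresholds taken from the cleaning rules described in the text.
MIN_PLAUSIBLE_AGE = 16
MAX_PLAUSIBLE_AGE = 90

def check_age(age):
    """Classify an entered age as 'reject', 'warn', or 'ok' (hypothetical labels)."""
    if age is None or age <= 0:
        return "reject"   # impossible value, e.g. 0 or negative
    if age < MIN_PLAUSIBLE_AGE or age > MAX_PLAUSIBLE_AGE:
        return "warn"     # possible but less plausible value
    return "ok"

for value in (-3, 0, 12, 31, 120):
    print(value, check_age(value))
```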

Vehicle Age (proxy for income). The data allows calculating the age of the vehicle in which the violation was committed. The relationship between income and case outcomes is of special interest to the Court and related stakeholders; however, no direct information on income is available in the data. At the aggregate, vehicle age can be used as a crude proxy for personal income in bivariate comparisons. The median vehicle age in the court data is 8 years (mean 8.6).

The vehicle age estimates for court clients are interesting, as the mean vehicle age of court clients (8.6) appears to be significantly lower than the national vehicle age of 12.2 years. The importance of this - if any - is unclear, as is whether it signifies behavioral patterns correlated with types of vehicles owned, enforcement patterns, traffic patterns, or all of the above.

The citations data set contains information on the vehicle operated by the individual the citation is issued to, including year of manufacture. We propose using this variable as a reliable (aside from data entry errors) and somewhat valid crude proxy for income. The face validity is acceptable, but only applies at the aggregate: while there is wide variation, as a general trend more affluent individuals and households will drive newer (and likely more expensive, even when of the same vintage) vehicles. Existing studies have established a linear relationship between income and average vehicle age. For example, the data gathered by the Federal Highway Administration in 2017 (below) show that a 1 year difference in average vehicle age might reflect up to a ~$25,000 difference in household income.

This is indirectly confirmed in the CoSA Municipal Court data set insofar as individuals with lower incomes are expected to be more likely to face difficulties, including resolving their cases prior to an OmniBase report. The median difference in vehicle ages between citations reported to OmniBase and those not reported is approximately 1 year, i.e. court clients operating older vehicles are more likely to neglect their cases and experience an OmniBase report. While it cannot be exactly determined what difference in income this difference in the age of vehicles signifies, and 1 year may not sound like much, per the study above it may approximate an income difference of up to $25,000.

NOTE: vehicles with model years prior to 1973 were removed from this analysis; this cut off was somewhat arbitrary, chosen on the assumption that this is a cutoff that may mostly retain vehicles that are realistically still used for commuting or working, as opposed to older and more exotic vehicles that might show positive correlation with income.
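A minimal sketch of how the vehicle age variable could be derived is shown below. The formula (violation year minus model year) is an assumption, since the report does not spell out the exact computation; the function name is hypothetical, and only the 1973 cutoff comes from the note above.

```python
from datetime import date

MIN_MODEL_YEAR = 1973  # cutoff described in the note above

def vehicle_age(violation_date, model_year):
    """Vehicle age in years at the time of violation, or None if excluded.

    Assumption: age = violation year - model year.
    """
    if model_year is None or model_year < MIN_MODEL_YEAR:
        return None  # missing, or older than the cutoff: dropped from analysis
    return violation_date.year - model_year

print(vehicle_age(date(2018, 5, 1), 2010))
print(vehicle_age(date(2018, 5, 1), 1965))
```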

Further, as often acknowledged but rarely explored, means and medians alone do not provide an assessment of the underlying distributions of the variable of interest. The plot below presents boxplots (means and interquartile ranges) combined with density distributions of vehicle age, by OmniBase report status. The distributions in both cases are bimodal; however, for the violations not reported to OmniBase they skew towards newer vehicles, while in the reported violations the second “hump” of the distribution is larger, i.e. older vehicles are marginally more prevalent in spite of the similarity in means.

Number of violations. The majority of court clients (72%) are cited for one violation, but citations with two violations are also common (20% of citations), as are tickets with three violations (7.8%). The distributions of number of violations by OmniBase report status differ notably: citations with 2 and 3 violations make up larger shares of the citations reported to OmniBase (26% and 11% respectively) than of the citations not reported (19% and 7.1% respectively). This suggests that both financial burden (e.g., more fines) and administrative complexity (e.g., settling more than one violation) are predictive factors for an OmniBase report.

Race. The figure below visualizes the distribution by race presented in Table 1 above. The chart summarizes the race information as recorded in the court database for each citation. Records with unknown or missing race were removed, and the categories “Asian”, “Middle-eastern”, and “Native American” were recoded into “Other”. This distribution needs to be interpreted carefully due to the unknown degree of overlap between race and ethnicity in the categories “White” and “Hispanic” (as most Hispanic individuals are also “White” for purposes of racial categories). For example, the population of San Antonio is 71.4% white and 64.5% Hispanic, i.e. with a fairly small minority of non-Hispanic whites. The proportion of Black residents in the database (10.5%) is higher than the city-wide share (6.8%). Acknowledging the implementation difficulties, and the dependence on state and other systems, we recommend exploring a transition of the race data to the two-question format used by the Census.

There is some minor, but notable variation in the racial distribution of cases by OmniBase report status as shown in Table 1. African-American court clients are the most over-represented among violations reported to OmniBase, followed by Hispanic clients. While the proportion of African-American court clients is 10% overall, it rises to 13% within the OmniBase group; Hispanic clients represent 53% of cases overall, but 55% of OmniBase reports. Conversely, White clients and members of other race/ethnicity groups are under-represented within the OmniBase group relative to the overall distribution.

Gender. The gender distribution of court clients in this database is skewed: 61.2% male, 38.4% female. The reason for this discrepancy is unclear. While recent research has contested the idea that there are significant gender differences in risk perceptions and driving skills, studies on driving behavior have found that the level of concern about risk may differ significantly across genders. Assuming no gender-based differences in patterns of traffic enforcement by police, this could partially explain the over-representation of men in the court database. With regard to OmniBase reports, although male clients are over-represented among court clients overall (68%), there is absolutely no difference by gender in the likelihood of a case being reported to OmniBase - male and female court clients demonstrate identical propensities to be reported (16.5%).

State. The vast majority of cases (98%) are associated with Texas residents. Since the OmniBase report is a state-level enforcement mechanism, unsurprisingly 100% of cases reported to OmniBase involve Texas residents.

Racial profiling data. A follow-up request by the court involved providing an assessment of the racial profiling data (a set of 8 variables contained in the citations file). The court provided a list of variable names and values, but not an explanation of the meaning of the racial profiling data collection process. It appears that on the enforcement side, a certain procedure triggers the racial profiling data recording process. It is not clear what circumstances can be inferred from the presence of racial profiling data for any given citation. Pending further clarification, in this iteration of the analysis, the UTSA team has created a binary racial profiling variable coded 1 if there is any data in any of the 8 racial profiling columns, and 0 otherwise. 20% of citations (n = 116,007) have racial profiling data entries associated with them. However, this share is higher (26%) within the OmniBase-reported group. This raises interesting questions about the racial profiling data collection process: if the share of racially profiled citations within more problematic cases (as an OmniBase report by definition indicates) is higher than the share within cases that are not (18%), this puts the meaning of the collection process in question; for example, we don’t know whether stops that result in citations lead police officers to over-report instances of possible racial profiling. This is not possible to answer conclusively without data on racial profiling entries for stops that did not result in citations. Within the subset of racially profiled citations, in 91% of cases the answer to the question “A2 - Race/Ethnicity known prior to detention” is “NO” (with 6% missing data and 2.7% “YES” answers). Further analysis within the subset of racially profiled citations is available on request if of interest to the court.

Thus the main obstacle to gauging the meaning of the racial profiling flag in the court data is that by definition all records represent real violations, while racial profiling generally means the reliance on racial stereotypes - rather than on the actual behavior of individuals - to initiate a stop. SAPD has no clear publicly available description of the racial profiling data collection, aside from a prohibition on engaging in it; the same guide instructs police officers to submit racial profiling data for incidents that did not result in citations - something that does not apply to the court data - but also instructs officers to append this data to the citation information if the stop has resulted in a citation. From this we can infer that the process is reliant on officer acknowledgement that racial profiling has played a role in citations for which such data is entered, but once more, this is not necessarily the case, judging from the cases with racial profiling data entry by OmniBase status. Conversely, an entirely opposite scenario is possible: in cases where racial profiling may have played a role in the stop, officers may be more inclined to exercise their discretion to issue a citation (e.g. to penalize minor moving code violations, such as failure to signal a lane change), rather than use the same discretion to issue non-citation alternatives (e.g. warnings, etc.). The US Department of Justice has prepared a “Resource Guide on Racial Profiling Data Collection Systems” which may be useful in further discussion on the meaning of this data collected at the citation level. Studying the distribution of cases with racial profiling data by type of violation may also be able to shed light on the role of officer discretion.
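The binary racial profiling variable described above (1 if any of the 8 racial profiling columns contains data, 0 otherwise) can be sketched as follows. The column names are hypothetical placeholders, since the actual field names are not listed in this report, and treating blank or whitespace-only strings as missing is an assumption.

```python
# Hypothetical placeholder names for the 8 racial profiling columns.
RP_COLUMNS = [f"rp_{i}" for i in range(1, 9)]

def has_rp_data(record):
    """Return 1 if any racial profiling column holds a value, else 0.

    Assumption: None and blank/whitespace-only strings count as missing.
    """
    for column in RP_COLUMNS:
        value = record.get(column)
        if value is not None and str(value).strip() != "":
            return 1
    return 0

# Toy citation records, invented for illustration.
citations = [
    {"citation_number": "C1", "rp_2": "NO"},
    {"citation_number": "C2", "rp_1": "  ", "rp_5": None},
    {"citation_number": "C3"},
]
flags = [has_rp_data(c) for c in citations]
print(flags, sum(flags) / len(flags))
```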

Warrants. Last, we also provide estimates of the distribution of warrants overall and by OmniBase report status. Of all citations, 42% are associated with some type of warrant. Since an OmniBase report is by definition associated with a warrant (“OB warrant”), 100% of OmniBase cases must have a warrant flag, as they do; within cases not reported to OmniBase, the share of citations with warrants is 31%. However, since the majority of cases are not reported to OmniBase, it should also be noted that although the share is smaller, the sheer number of warrants issued to non-OmniBase cases is ~1.5 times higher than the number of OmniBase warrants.

Table 2 below supplements Table 1 by breaking down the descriptives by race (rather than OmniBase report status). This table further illustrates the ambiguity noted above: there is no clear pattern of racial profiling data entries within subgroups by race. The share is highest in the group of White clients (23%), followed by Black (21%), Hispanic (17%), and other (14%). This is also likely affected by the current data collection policy, which does not distinguish between race and Hispanic ethnicity (see comments to Table 1 above).
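The warrant figures above can be cross-checked with a short calculation using only percentages quoted in this report (16.5% of citations reported to OmniBase, 100% of those with a warrant, 31% of the rest with a warrant). The variable names are illustrative.

```python
share_omnibase = 0.165         # citations with an OmniBase report
warrant_given_omnibase = 1.00  # all OmniBase cases carry a warrant flag
warrant_given_other = 0.31     # warrant share among cases not reported

# Warrants as a share of all citations, split by OmniBase status.
omnibase_warrants = share_omnibase * warrant_given_omnibase
other_warrants = (1 - share_omnibase) * warrant_given_other

overall_warrant_share = omnibase_warrants + other_warrants
ratio_other_to_omnibase = other_warrants / omnibase_warrants

print(round(overall_warrant_share, 2))    # 0.42
print(round(ratio_other_to_omnibase, 2))  # 1.57
```

The weighted total reproduces the 42% overall warrant share, and the ratio of roughly 1.6 is in line with the "~1.5 times" figure, within the rounding of the published percentages.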

There are, however, other patterns by race that are meaningful. Age: Black court clients are the youngest, followed by Hispanic, White, and other. Black and Hispanic clients are likely to have lower incomes (as approximated by vehicle age). The variation by race and number of violations is comparatively minor; however, Black and Hispanic court clients tend to have slightly higher shares of citations with multiple violations. No significant variation is seen in gender distribution by race.

The most notable variation by race seen in this analysis is the share of citations with warrants (any kind of warrant) by racial group. It is highest among Black court clients (51%), followed by Hispanic (46%), while White and other are under-represented.

In summary, clients with cases reported to OmniBase are more likely to be younger, may be generally poorer, and tend to have citations with a greater number of violations; Black and Hispanic clients are slightly over-represented within the group of cases reported to OmniBase, and clients with cases reported to OmniBase are more likely to have triggered the racial profiling data entry process.

This concludes the descriptive statistics section, conducted with the citation as the unit of analysis. In the remaining sections, the analysis proceeds with the violation as the unit of analysis. The demographic and other variables needed from the citations file are merged into the violations file using citation number as the primary key. For operations at the violation level (e.g. creating a violation-level OmniBase report flag) a composite key of citation number and violation number is created. (This function is generally fulfilled by the existing DocketNUmber field in the violations file, but it contains a sufficient number of errors or discrepancies to warrant recreation.)
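The merge and composite key described above can be sketched as follows. Field names, the key format (citation number and violation number joined by a hyphen), and the whitespace normalization are assumptions made for illustration; the actual layout of the extract may differ.

```python
# Toy citation-level records keyed by citation number (invented values).
citations = {
    "A100": {"age": 29, "race": "Hispanic"},
    "A200": {"age": 41, "race": "White"},
}

# Toy violation-level records (invented values).
violations = [
    {"citation_number": "A100", "violation_number": 1},
    {"citation_number": "A100", "violation_number": 2},
    {"citation_number": "A200", "violation_number": 1},
]

def composite_key(citation_number, violation_number):
    """Build a violation-level key; the hyphenated format is an assumption."""
    return f"{str(citation_number).strip()}-{int(violation_number)}"

def merge_citation_fields(violation_rows, citation_lookup):
    """Attach citation-level fields to each violation row (left join)."""
    merged = []
    for row in violation_rows:
        out = dict(row)
        out.update(citation_lookup.get(row["citation_number"], {}))
        out["violation_key"] = composite_key(
            row["citation_number"], row["violation_number"]
        )
        merged.append(out)
    return merged

merged = merge_citation_fields(violations, citations)
for row in merged:
    print(row["violation_key"], row["age"], row["race"])
```

A useful sanity check after such a merge is that the number of distinct composite keys equals the number of violation rows, which flags duplicated or malformed keys.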

Timelines analysis

Violation time to disposition (number of days)

The overall and subgroup patterns in length of time from violation to disposition are of special interest to the court. This section provides overall and subgroup estimates of the time periods elapsing between the date of violation, and the date some type of disposition is recorded by the court. NOTE: the variable is computed as the number of days elapsed between violation date and disposition date; this includes any type of disposition.

With regard to the assessment of time to disposition, two important limitations of the data must be elaborated: 1) the court data extract includes cases with any kind of disposition within the last 5 years, regardless of initial violation date; violation dates go as far back as 2001 (see Appendix). 2) The inverse problem is that an unknown proportion of the more recent cases will be problematic, will obtain a disposition at various times in the future, and are not part of the data set.

The implication of 1) is that any estimate based on the entire data set will introduce a very significant upward bias in the statistics computed for time elapsed between violation date and status date. While outlier cases will no doubt have this effect, they represent legitimate data points, rather than errors. The implication of 2) is the inverse: for any estimate of more recent cases (we suggest analyzing a subset of cases with violations in 2017 and 2018), there will be a minor but unknown downward bias in the time period estimates, as problematic recent cases do not yet have dispositions and are therefore unavailable for analysis.

For these two reasons, below we provide two estimates for time to disposition: 1) one including all cases in the data set, and 2) one including only cases with a violation date in either 2017 or 2018. These two years have more citations than any other year and combined account for 43% of all citations in the data set. We suggest treating them as the most typical cases overall, while noting the remarkable effect of outliers described below.

Mean and median time to disposition:

  • For the entire data set (i.e., all years) the median time to disposition is 212 days, the mean is 845 days, appropriately reflecting a very skewed distribution.

  • For the subset of cases occurring in 2017 and 2018, the median time to disposition is 148 days and the mean is 278 days.
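The two estimates above follow the same computation applied to different case selections. The sketch below uses invented dates, not court data, to show how a single long-running case inflates the all-years mean while the 2017-2018 subset is unaffected; the function names are hypothetical.

```python
from datetime import date
from statistics import mean, median

# Toy cases: (violation_date, disposition_date). Invented for illustration.
cases = [
    (date(2003, 6, 1), date(2019, 6, 1)),   # old violation, recent disposition
    (date(2017, 1, 10), date(2017, 3, 1)),
    (date(2018, 5, 1), date(2018, 11, 1)),
    (date(2018, 2, 1), date(2019, 2, 1)),
]

def days_to_disposition(violation_date, disposition_date):
    """Days elapsed between violation date and disposition date."""
    return (disposition_date - violation_date).days

def summarize(rows, years=None):
    """Return (mean, median) days, optionally restricted to violation years."""
    durations = [
        days_to_disposition(v, d)
        for v, d in rows
        if years is None or v.year in years
    ]
    return mean(durations), median(durations)

all_mean, all_median = summarize(cases)
sub_mean, sub_median = summarize(cases, years={2017, 2018})
print(f"all years: mean={all_mean:.1f}, median={all_median}")
print(f"2017-2018: mean={sub_mean:.1f}, median={sub_median}")
```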

The main implication of these statistics is that any analysis of times to disposition needs to be cognizant of the implications of case selection and the related tradeoffs: a comprehensive analysis featuring all cases with any or certain dispositions regardless of date of violation will be heavily influenced by outliers and will grossly overstate average and median times from violation to disposition. An approach choosing the most typical cases in a data set (as explored here) will produce more reliable estimates but with some downward bias, as such an approach will by definition always exclude the cases unresolved at the time of the data extract, which may include major outliers going forward. We feel that the second approach is nonetheless preferable as it provides actionable estimates, but a separate project investigating the characteristics of extreme outliers may be considered.

Table 3 below represents subgroup estimates of means and medians of time to disposition (2017 and 2018 violations only).

Unsurprisingly, for cases reported to OmniBase the time to disposition is more than double that of cases not reported (mean = 596 days, and median = 476 days).

The variation by race is not striking; however, there is a continuum in the length of time needed to resolve cases that mirrors the gradient of socio-economic disadvantage approximated by race: times to disposition are highest for Black court clients, followed by Hispanic (on average 3 weeks shorter disposition times), White, and other. As before, variation by gender is negligible (although now statistically significant, unlike for OmniBase report status). Citations for which racial profiling data collection is reported exhibit significantly longer times to disposition. Finally, and perhaps most notable, violations that are part of tickets with multiple violations take significantly longer to resolve: the time to disposition on tickets with 3 violations is more than double that of tickets with one.

Timelines for cases (cases reported to OmniBase only)

The following is an estimate of the timelines between events for violations reported to OmniBase (i.e. the subset of violations sent to OmniBase). Four time variables (length in days) are calculated based on violation date, date of report to OmniBase, date of OmniBase clearance, and date of violation status.

The same problem with estimates biased by outliers that was noted above in computing times to disposition applies here - albeit to a somewhat lesser extent. Thus Table 4 below reports both overall time estimates, and estimates for the subset of 2017-2018 violations for comparison. Clearance rate is also included. The table reports the means, medians and standard deviations of the computed continuous measures for both sets of cases. 37,476 records do not have a clearance date yet; this does not contradict the logic of the data extract as “Sent to Omnibase” is one of the disposition codes in the data provided.

The median number of days elapsing between violation date and an OmniBase report is 128 days for all cases, i.e. approximately 4 months, which appears consistent with the routine court processes including automatic resets, follow-ups, etc. However, once reported to OmniBase, cases tend to be unresolved for a very significant period of time: the median time between report to OmniBase and clearance is over two years for all cases. Within the subset of most typical cases (2017-2018) the period is still quite long: 418 days median. This casts some doubt on the effectiveness of the program as a negative incentive, at least as a timely one. OmniBase puts a hold on license renewal (rather than suspending the license). Considering that most driver’s licenses are valid for 10 years, unless one’s driver’s license is about to expire, there is no apparent incentive to rush to resolution, hence the rather lengthy period to do so.

There is virtually no distinction between OmniBase clearance dates and status dates, and there should not be, as the clearance is typically the result of completing court requirements first. The mean number of days between disposition and clearance is 1 day, i.e. practically immediate. The mean number of days is negative, reflecting a subset of cases where clearance is contingent on fulfilling certain conditions following disposition. Although computed anew from different sources, the overall time to disposition estimates (violation date to status date) for 2017-2018 cases reported to OmniBase are identical to the estimates in Table 3, as they should be.

Finally, the clearance rate is 73% overall, and 64% for 2017-2018 citations. This reinforces the earlier observation about the questionable effectiveness of the OmniBase report as an incentive to resolve a case: in addition to the already lengthy (over a year) period between OmniBase report and clearance, for the 2017-2018 subset of cases, the most recent of which occurred about 4 years (!) prior to the data extract, more than 1/3 (36%) are still in the system and pending clearance, almost certainly due to lack of follow-up action from the affected court clients.
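The timeline measures and the clearance rate discussed above can be sketched as follows. The exact pairing of dates into the four intervals is an assumption (the report lists the four source dates but not each pair), the field names are hypothetical, and the records are invented toy data. Records without a clearance date yield no clearance intervals and count as not cleared.

```python
from datetime import date

def gap_days(start, end):
    """Days from start to end, or None if either date is missing."""
    if start is None or end is None:
        return None
    return (end - start).days

def timeline(record):
    """Interval lengths in days for one OmniBase-reported violation."""
    return {
        "violation_to_report": gap_days(record["violation"], record["reported"]),
        "report_to_clearance": gap_days(record["reported"], record["cleared"]),
        "status_to_clearance": gap_days(record["status"], record["cleared"]),
        "violation_to_status": gap_days(record["violation"], record["status"]),
    }

def clearance_rate(records):
    """Share of reported violations that have a clearance date."""
    return sum(1 for r in records if r["cleared"] is not None) / len(records)

# Toy records, invented for illustration.
records = [
    {"violation": date(2017, 1, 1), "reported": date(2017, 5, 1),
     "status": date(2018, 6, 1), "cleared": date(2018, 6, 2)},
    {"violation": date(2018, 3, 1), "reported": date(2018, 7, 1),
     "status": date(2018, 7, 1), "cleared": None},
]

print(timeline(records[0]))
print(clearance_rate(records))
```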

Last, we present the estimates for these timelines by race in Table 5 below. Predictably, there is no meaningful difference across race in the time elapsing between violation and OmniBase report; although the differences are statistically significant, this is a possible artifact of a large data set - there is no conceivable conceptual reason to consider the very minor differences (2-10 days) across groups as systematic. To a similar extent, the same may apply to days between OmniBase report and clearance - there are very minor variations across racial groups. This is more interesting and important, as it is not solely an administrative process, but is contingent on individual efforts to resolve a case. Finally, related and most interesting, within the subset of OmniBase recorded cases (2017-2018) we find no statistically significant variation across racial groups. We find some minor variation in clearance rates.

Distribution of OmniBase holds by geography (zip code)

The first map below shows the rate of OmniBase holds per 10k population in each zip code, against the backdrop of every zip code median income. This approach is similar to the analysis implemented in the “Driven by Debt: the Failure of the OmniBase program” report prepared by Texas Apleseed and Texas Fair Defence Project.

Similar to their findings, we observe a notable, virtually linear pattern in which the rate of OmniBase holds is inversely related to area median income, i.e. seemingly supporting the argument that the program represents an especially problematic burden for the poorest segments of the population. This seems to be further supported by the almost linear relationship between zip code median income and rate of OmniBase holds on the scatter plot following the map.

We used the word “seemingly”, because we find the measure methodologically objectionable. Simply calculating the rate of OmniBase holds per population ignores the obvious detail that the rate of OmniBase holds will be related to the overall rate of violations among residents of a given area. The rate of violations is not uniform across city areas. Indeed, it will be spuriously related to both income and rate of OmniBase holds: more intense traffic and a greater number of traffic violations are much more likely in inner city areas (which also tend to be poorer) than on serene sub-division roads (which also tend to be more affluent). Perhaps one reason for the use of such a suboptimal measure of OmniBase burden has been the lack of violation-level court data like the data used in the present analysis.

Accordingly, we propose a more valid measure that avoids the spurious effect of variation in the rate of violations: the percent of violations in each zip code that were reported to OmniBase.
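The difference between the two measures can be illustrated with a minimal sketch. The area labels, counts, and function names are made-up assumptions for illustration only, not values or code from the court extract.

```python
def holds_per_10k(holds, population):
    """Rate of OmniBase holds per 10,000 residents."""
    return 10_000 * holds / population


def pct_violations_held(holds, violations):
    """Percent of an area's violations that were reported to OmniBase."""
    return 100 * holds / violations


# Hypothetical areas: same population and same share of violations reported,
# but area "A" has four times as many violations as area "B".
areas = {
    "A": {"population": 40_000, "violations": 8_000, "holds": 1_600},
    "B": {"population": 40_000, "violations": 2_000, "holds": 400},
}

for name, a in areas.items():
    rate = holds_per_10k(a["holds"], a["population"])
    pct = pct_violations_held(a["holds"], a["violations"])
    print(f"{name}: {rate:.0f} holds per 10k, {pct:.1f}% of violations held")
```

In this toy case the per-population rate differs fourfold between the two areas purely because of the difference in violation volume, while the percent of violations reported is identical.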

Rate of OmniBase holds and median household income by ZIP code (all San Antonio ZIP codes)

Plot of ZIP code median income vs. rate of OmniBase holds

The scatter plot above reinforces the pattern suggested by the zip code map: there is a notable, virtually linear inverse relationship between income and rate of OmniBase holds.

However, as noted above, this approach is at least partially methodologically questionable because it does not control for the rate of violations. The rate of violations is necessarily related to the rate of OmniBase holds even if all else is equal, and it is also likely spuriously related to area income: poorer inner city areas carry heavier traffic and will necessarily experience a higher rate of violations (and attendant OmniBase holds) than less traffic-dense, more affluent areas on the outskirts.

To remedy this problem, below we repeat the analysis by replacing “rate of OmniBase holds per population” with “Percent of violations subjected to OmniBase holds”, which eliminates the spurious effect of rate of violations across different areas.

Percent of cases with OmniBase holds and median household income by ZIP code (all San Antonio ZIP codes)

Unlike the previous map showing the rate of OmniBase holds per population, the map below plots the percent of violations with OmniBase holds. Careful examination still demonstrates some degree of connection: at least in the most affluent areas, the percentage of OmniBase cases appears smaller than in all other areas. However, the relationship is not nearly as clear as when rate of OmniBase holds per population is used (as above). Rather than linear, the pattern appears bifurcated: no apparent trend in the lower range of incomes, but a drop (in OmniBase cases) in the most affluent areas. To further examine the nascent pattern, we also provide a plot of the two variables (income and percent of cases with OmniBase holds) below.

Plot of ZIP code median income vs. percent of cases with OmniBase holds

The plot reinforces and clarifies the nature of the somewhat muted relationship implicit in the zip code map. The plot further shows that there is no discernible relationship between income and percent of OmniBase holds up until an area reaches a median income of about $60,000. Only above that level of income does an inverse relationship between the two variables become notable.

This necessitates introducing some further nuance when contemplating the consequences of the OmniBase program. Generally it is assumed that its burdens fall most heavily on the poorest and most vulnerable segments of the population. The results presented here suggest this concern might be somewhat exaggerated, insofar as areas with incomes around the area median income (i.e. not impoverished by definition) show very similar patterns in percent of violations referred to OmniBase to the patterns seen in poorer areas.

One interpretation is that a traffic violation and the possible attendant fine represent a non-trivial challenge (financial or otherwise, as a disruption) for most individuals or families, including what is approximately considered middle class. Whether it is for financial reasons or for competing priorities and distractions, in all areas with median incomes from $20,000 to $60,000, a stable 15%-20% of violations remain unattended and accordingly get reported to OmniBase. Only after the zip code median income surpasses ~$60,000 does the percent of violations reported to OmniBase begin to drop. Even so, it should be noted that most of even the most affluent areas show a non-trivial percent (10%+) of violations sent to OmniBase. This suggests that the implications of the program are not only financial: a certain proportion of the population shows a propensity to not prioritize case resolution regardless of the degree of financial constraints, although certainly more so in lower income areas.

Distributions by disposition, violation type and OmniBase reports

As is typical in analysis of administrative databases, the municipal court client management system contains a very large number of categories, significantly larger than it would be practical to analyze: there are 76 distinct disposition codes, and 359 distinct offense codes. Accordingly, some cutoff points were necessary. For dispositions, we selected all dispositions accounting for more than 0.1% of all recorded dispositions; this reduced the total to 12 categories accounting for 98.5% of all violations (plus a 13th category, “Other”). For offenses, we selected all offenses representing more than 1% of all offenses, which resulted in 13 categories accounting for 87% of all offenses.
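A cutoff of this kind can be sketched as follows. This is an illustrative sketch only: the function name, the sample codes, and their frequencies are assumptions, not the court's actual distribution or the code used for the analysis.

```python
from collections import Counter


def collapse_rare(codes, min_share):
    """Keep codes whose share of all records exceeds min_share; pool the rest as 'Other'."""
    counts = Counter(codes)
    total = sum(counts.values())
    keep = {code for code, n in counts.items() if n / total > min_share}
    collapsed = Counter(code if code in keep else "Other" for code in codes)
    # Share of all records covered by the retained (non-"Other") categories
    coverage = sum(n for code, n in collapsed.items() if code != "Other") / total
    return collapsed, coverage


# Hypothetical disposition codes with made-up frequencies
codes = ["CL"] * 60 + ["DA"] * 30 + ["D2"] * 8 + ["X1"] + ["X2"]
collapsed, coverage = collapse_rare(codes, min_share=0.01)
print(dict(collapsed), f"coverage={coverage:.1%}")
```

The same function applies to both cutoffs described above by changing `min_share` (0.001 for dispositions, 0.01 for offenses).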

Below we present the distribution of the top dispositions as well as the distribution of dispositions by OmniBase report status. The same analysis is repeated for offense types.

Detailed interpretation of this breakdown should be left to subject matter experts. It appears that violations reported to OmniBase show higher proportions of closed cases, and a lower proportion of cases disposed of via alternative means. For example, cases reported to OmniBase are almost half as likely to be dismissed after a probationary period, and half as likely to be dismissed for completing requirements. The option to dismiss the case after completing a driver safety course seems virtually unavailable by definition to such cases. However, they are more than twice as likely to appear in court for a plea appearance. This finding appears highly congruent with the Court's aspiration to use OmniBase not punitively, but as a means to induce case resolution, i.e. for respondents to take action, including appearing back in court for a plea.

The overall distribution of the (top) offenses and distribution by OmniBase report status are presented below.

The distribution is also (numerically) presented below, contingent on OmniBase report. A close look at the table reveals potentially very interesting, important, and actionable insight.

For the very top offenses (speeding and driving without a license), the proportion of these offenses within the OmniBase-reported group is notably lower. These are different offenses, but part of our interpretation is that they (in particular speeding) are what the public generally associates with a traffic ticket - a known, common, agreed-upon violation. Accordingly, it is not surprising that they are resolved at a higher rate. Perhaps to a somewhat lesser degree, the same reasoning applies to driving without a license, or speeding in a school zone.

However, we also see a distinct set of offenses demonstrating a completely diverging pattern - i.e. their share within the group reported to OmniBase is notably higher than within the group not reported. These include driving without proof of insurance and failure to display registration. We suggest that the notably higher share of such violations being reported to OmniBase may be at least partially explained by an individual and public perception that such infractions are not “real offenses” (as opposed to speeding etc.), resulting in lower inclination to resolve the case. Related and perhaps more important - this type of offense already demonstrates reduced concern with up-to-date, legally required paperwork; if that is the case, there is good reason to expect that drivers cited for such offenses will be less likely to respond to the prospect of an OmniBase report.

A third but related category, interesting in its own right, is “driving while license invalid”: the share of such violations is 3 times as high in the group reported to OmniBase, for what should be obvious reasons: if a citizen is already indifferent to operating a vehicle with an invalid license, preventing renewal of an already expired license is not by itself likely to prompt action to resolve the case.

Appendix

Data provided by CoSA Municipal court

The UTSA team received 5 tables from the Municipal court team, with the following total number of records (prior to any sub-setting):

  • Citations (n=690,103)
  • Violations (n=904,035)
  • Warrants (n=641,892)
  • OmniBase reports (n=141,104)
  • (Case) History (n=27,547,045)

Traffic violations represent 89.6% of all the violations in the provided extract. As agreed with the court, this iteration of the analysis was to focus on traffic violations only; accordingly, non-traffic violations were removed, in conjunction with all associated records from the other tables. After removing the non-traffic violations from the violations file and all records, linked by citation number, in the omni, warrants, history, and citations data, the working set of observations is as listed below:

  • Citations (n=596,299)
  • Violations (n=810,166)
  • Warrants (n=572,510)
  • OmniBase reports (n=140,667)
  • (Case) History (n=24,425,084)

Further subsetting from the citations file includes: A) Dropped 1,934 records with missing data for race; B) Dropped 638 records with missing age (missing values for age are partially accounted for by recoding into missing any values under 16 or over 90; although there are plausible scenarios where lower and higher ages are the correct age, the rarity of such scenarios does not justify the possible biases such outliers may introduce); C) Dropped 831 observations with missing data for gender; D) Dropped 641 observations with missing values for state; E) Dropped 4 of the remaining observations prior to 2001. All of the above reduces the final number of citations to 592,750. The final records after removing entries linked by citation number from the rest of the files are as follows:

  • Citations (n=592,750)
  • Violations (n=805,100)
  • Warrants (n=569,556)
  • OmniBase reports (n=140,176)
  • (Case) History (n=24,284,126)
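The subsetting logic described above (citation-level exclusions, then removal of linked records by citation number) can be sketched roughly as follows. The field names (`citation`, `race`, `gender`, `state`, `age`, `year`), the sample records, and the function names are hypothetical; the actual extract uses its own column names and the analysis was not necessarily implemented this way.

```python
def clean_citations(citations, first_year=2001):
    """Drop citations with missing race, gender, state, or age, or dated before first_year."""
    kept = []
    for c in citations:
        age = c.get("age")
        # Recode implausible ages (under 16 or over 90) to missing
        if age is not None and not 16 <= age <= 90:
            age = None
        if age is None or any(c.get(k) is None for k in ("race", "gender", "state")):
            continue
        if c["year"] < first_year:
            continue
        kept.append(c)
    return kept


def keep_linked(records, citation_numbers):
    """Keep only records whose citation number survived the citation-level filter."""
    return [r for r in records if r["citation"] in citation_numbers]


# Hypothetical records for illustration
citations = [
    {"citation": "X1", "race": "W", "gender": "F", "state": "TX", "age": 34, "year": 2017},
    {"citation": "X2", "race": None, "gender": "M", "state": "TX", "age": 41, "year": 2018},
    {"citation": "X3", "race": "H", "gender": "M", "state": "TX", "age": 12, "year": 2019},
    {"citation": "X4", "race": "B", "gender": "F", "state": "TX", "age": 55, "year": 1982},
]
violations = [{"citation": "X1"}, {"citation": "X1"}, {"citation": "X3"}]

kept = clean_citations(citations)
kept_ids = {c["citation"] for c in kept}
kept_violations = keep_linked(violations, kept_ids)
print(len(kept), len(kept_violations))
```

The same `keep_linked` step would be repeated for the warrants, OmniBase, and history tables.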

This process is also summarized in the flow chart below.

The data extraction process has not been explicitly documented, but per the UTSA team’s understanding, based on discussion during meetings, the extract represents cases that have received some type of resolution/disposition during the past 5 years, regardless of the original violation date. As a result, the data set features citations going as far back as 2001, with a couple of outliers going back to 1972 and 1982 respectively (both cases are non-traffic violations: violation of water conservation rules and petty theft respectively); see chart below.

NOTE: This distribution of citations by year in the table above and the chart below includes all original records, prior to any subsetting. All other analyses presented in this report are based on the subset of data as described in the beginning of this section.

We also provide a bar chart visualizing the distribution of citations by year. Per the chart below, the “laggard” cases, i.e. cases with unusually long time to resolution, are likely best defined as those with violation dates in 2015 and before.

In this extract, the year with peak citations is 2017 (n=151,349). The data also shows a major drop in citations in 2020 (COVID and associated restrictions leading to a major drop in mobility) - n=54,470, down from 115,633 in 2019. The number in 2021 appears exceptionally small (n=18,064). This is most likely due to recency - recent cases by definition have had less time to reach resolution and thus the data extract contains only the subset of unusually quickly resolved ones. Secondarily, while some COVID restrictions and reduced activity continued in 2021, the extent to which this accounts for the lower case number vis-à-vis recency is unclear.

NOTE: Following this exploratory distribution, all records with violation date prior to 2001 were removed. However, further exclusion may be appropriate, as discussed below.

The court provided a table listing all violations sent to OmniBase. When reviewed by year, the data suggest sporadic maintenance and/or implementation in the early years.

Choice of geography

Geospatial analysis at the ZIP code level is by far the most common. Many agencies collect data down to the ZIP code level, the ZIP code geographies are stable, and data is easily retrievable at all levels (e.g., from local, to county, to state, to national). These practical advantages are partially diminished by shortcomings. Most notable among them are that 1) ZIP codes were primarily designed as mail route areas (rather than natural geographical groupings), and, relatedly, 2) there is significant variation in socioeconomic conditions within ZIP codes, bringing the possibility of aggregation bias.

In general, smaller aggregation areas such as census tracts are preferable for geospatial analysis, but there are some difficulties that currently preclude wider application. The main one is that few agencies explicitly collect census tract level flags. This requires an extra step: cross-walking coordinates or addresses and linking them to a census tract. The UTSA team initially considered conducting the geospatial analysis at the census tract level, which proved infeasible within the scope of the project as 1) the street address data contains a lot of “noise” (many observations would need to be discarded due to incomplete, contaminated, or missing addresses) and 2) the existing capabilities for batch processing addresses and cross-walking them with census tracts require either substantial computing time or paid services.

A final reason to use zip codes was that the legal boundaries of cities are generally far too complicated to allow clear delineation of city versus non-city observations (e.g. locations that are legally outside the boundaries of the city might nonetheless be part of an organic neighborhood or community best understood as part of the city). In the San Antonio case, this includes the multitude of unincorporated cities within the metropolitan boundaries. For reference, the legal outline of the City of San Antonio is presented below.

For all these reasons the UTSA team settled on using the ZIP code level for court data aggregation and for merging with census data, consistent with prevailing practice. This involved selecting all ZIP codes beginning with “782”, which produced a study area that approximates the boundaries of the City of San Antonio reasonably well, while excluding the most rural areas found in Bexar county.
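The prefix rule can be sketched as a simple filter. The function name and the sample ZIP values are illustrative assumptions; the format of the ZIP field in the court extract is not documented here.

```python
def in_study_area(zip_code, prefix="782"):
    """True if the ZIP code (possibly ZIP+4 or padded with spaces) starts with the prefix."""
    return str(zip_code).strip().startswith(prefix)


# Hypothetical ZIP values in mixed formats
sample = ["78205", " 78249-1234", 78006, "78155"]
selected = [z for z in sample if in_study_area(z)]
print(selected)
```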

The interactive map below presents the final area, including selected (computed) court level and census variables. NOTE: additional variables can be added upon request.

Interactive map of San Antonio Zip codes (click on a ZIP code to see list of variable values)

Date ranges, case studies and (non)utilization of the History file

The main dates used in the analysis presented above are the original violation date, the violation status date, and the violation last changed date; further, sent date and cleared date from the OmniBase table were used. No further dates (e.g. from the history file) have been used in this iteration of the analysis, as explained below.

For this iteration of the analysis the History file was extremely helpful, though mainly as a source of anecdotes and case studies that help explain some of the court and client outcomes. First, the data contains multiple entries with administrative significance, but of limited substantive/analytical utility. Second, the main outcomes of interest (e.g., OmniBase reports, report and clearance dates, violation dates and last violation status) were readily available in the other data tables provided. Third, it is unclear to what extent the history file consistently tracks case status, especially for old cases, where we encountered significant periods with no activity (see below). Nonetheless, this is a rich data source and the UTSA team would appreciate further planning discussion to potentially take advantage of this data.

The UTSA team reviewed several cases, in particular old ones, to gain a better understanding of the nature of the data. For example, citation number L675430 records two violations in 2001 (driving without license and insurance). The violation status is DP, updated in April of 2020. The case history for this file contains 30 entries. The first two entries are from the violation year (2001) and are “warrant issued” (there is no record of actions that may have preceded the warrant). The rest of the entries are from 2020, i.e. there is no case history activity for this citation between 2001 and 2020. We have not made a systematic attempt to evaluate the prevalence of such cases with extended periods of no recorded activity (an interesting question, but outside the scope of the current analysis). The most recent StatusChangeDate is identical to the ViolationStatusDate in the violations file.

Citation L675453 (driving without license plate and insurance) is similar in that it shows no activity between the violation date (in 2001) and 2020 in the history file. However, this is a citation that was reported to OmniBase. The most recent date in the history file is identical to the cleared date in the omni table; however, the history entries for this citation do not list the date when the information was sent to OmniBase. This date is available in the omni file, in the SentDate field.

Finally, citation S854366 (from 2017, speeding) also shows 30 history entries. The case is resolved (dismissed after deferred disposition) within ~11 months, including OmniBase report and clearance. There is a small discrepancy, likely normal and due to administrative lag, in the date of sending the OmniBase report (12 April 2017 in the history file, 21 April 2017 in the omni file); the OmniBase clearance date (14 June 2017) is the same in both files. The congruence between the dates, in addition to the greater ease of use, justified using the dates present in the omni file. Another consideration was that the OmniBase reports and clearances are not uniformly recorded in the history file (variation in the strings and descriptions used).

Dates and levels of analysis: citation vs violation level of analysis

The analyses presented were performed at either the citation or the violation level as appropriate. For example, basic demographics were analyzed at the citation level (and even then it is important to emphasize that the unit of analysis is citations, not people), while others were analyzed at the violation level, because it is impractical to merge status and date from the violations file into the citations file. While in most cases citations with multiple violations get resolved at once (i.e. the violation status and date for the different violations will be the same), there is a sizable group for which this is not the case. As we established above, the working set of citations is 592,750. Ideally, retrieving a list of distinct citation number and violation status combinations would result in the same number; however, this is not the case, i.e. there is a lack of uniformity in dispositions within some cases/citations. There are 641,243 distinct citation number-violation status combinations in the violations file, i.e. a sizable number of citations with more than one violation have different dispositions at a given point in time. The majority of the distinct status combinations are CL and DA (case closed and case dismissed after probationary period), but not exclusively so (there are other combinations, including D2 [dismissed after completing driver safety course] and JC [judgement/conviction] and many others). For example, citation S984735 has two violations (driving with invalid license and driving while using handheld communication device); the former violation is recorded as case closed in June 2018, while the latter is recorded as “dismissed after deferred disposition” in July 2018. Thus we conduct some of the analysis at the violation level, rather than the citation level (as above).

The same problem applies to a slightly greater extent to distinct citation number-violation status date combinations (n=648,983), reflecting that a number of violations on the same citation resolve at different times.
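The comparison of distinct counts described above can be sketched as follows. The field names, citation numbers, and records are made up for illustration and do not reproduce the court's schema or data.

```python
def count_distinct(records, *fields):
    """Number of distinct combinations of the given fields across all records."""
    return len({tuple(r[f] for f in fields) for r in records})


# Hypothetical violation-level records: citation "A1" has two violations with
# different statuses and dates; citation "A2" has two violations resolved together.
violations = [
    {"citation": "A1", "status": "CL", "status_date": "2018-06"},
    {"citation": "A1", "status": "DA", "status_date": "2018-07"},
    {"citation": "A2", "status": "CL", "status_date": "2019-01"},
    {"citation": "A2", "status": "CL", "status_date": "2019-01"},
]

n_citations = count_distinct(violations, "citation")
n_citation_status = count_distinct(violations, "citation", "status")
n_citation_date = count_distinct(violations, "citation", "status_date")
print(n_citations, n_citation_status, n_citation_date)
```

Whenever the count of distinct combinations exceeds the count of distinct citations, at least one citation has violations with differing values, so that field cannot be carried to the citation level without a rule for choosing among them.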

The variable whose distinct values, in combination with citation numbers, produce the count closest to the original number of citations is whether an attorney is involved in the case (n=596,696). It is understandable that in the vast majority of cases an attorney will represent a client on all violations in a ticket, and that only rarely will an attorney be involved with one of the violations but not another; at least this is what the data suggest.

A final reason to proceed at the violation level is that while in the citations file all citations have entries for the original violation date, there are 391,813 missing values in the field LastChangeDate in the citations file. The source of this problem is unclear, as all violations and citations in the data extract have both some type of status/disposition and a disposition date. Rather than missing, most of these 391,813 entries are ‘0’s. In the aforementioned case of citation S984735, for example, the value of CitationLastChangedDate is 0, while both violations have properly entered ViolationStatusDate dates in the violations file. We suspect the reason for the discrepancy may be that the citations LastChangedDate field may apply only to violations with conviction dates, and not other types of disposition.