Introduction

The purpose of this paper is to determine whether tightening regulations on vehicle emissions, specifically by requiring more robust tests, has a significant effect on state-wide air pollution, measured here by nitrogen dioxide, carbon monoxide, and ozone. Because regulations and technology advance continually, we would expect pollution levels to decline slowly over time. I therefore measure whether pollution levels change at a different rate in areas that move to stricter testing, relative to program areas with less stringent regulations. My research will help determine whether increasing restrictions improves air quality.

California is currently home to six of the ten most polluted cities in the United States. Year-round air pollution in California has been consistently higher than in the rest of the country (American Lung Association, n.d.). To combat pollution, California spent nearly $30 billion between 2012 and 2015, approximately $25 billion more than any of its closest geographic neighbors (US EPA, n.d.).

Air pollution is a considerable environmental and health risk. Between 2005 and 2007, air pollution cost Californians $193 million in medical expenses and resulted in over 12,000 emergency room visits to treat asthma in children 17 and under (Romley et al., 2010). Fifty-four percent of California’s CO2 emissions in 2013 could be attributed to transportation (US EPA, n.d.).

The Smog Check Program, first implemented in 1984, was created to inspect vehicle emissions and control smog levels across the state (SB 33). The program has been amended multiple times to improve its efficacy by making vehicle checks more stringent, adding further testing with On-Board Diagnostics (OBDII) technology and enhanced tailpipe testing. For example, the Standardized Testing and Reporting (STAR) program was implemented on January 1, 2013 to meet the higher standards set by amendment AB-2289.

Literature Review

Few studies attempt to quantify the effect of the Smog Check program. California’s Bureau of Automotive Repair (BAR) is required to publish an annual report on the performance of the program (Bureau of Automotive Repair, 2019b). However, this report only summarizes the percentage of vehicles tested at program stations that pass or fail inspection, without attempting to quantify the effect on air quality. The BAR findings are notable because about 18-20% of vehicles tested on the roadside fail their test. BAR suggests that low-performing test stations may have inaccurately passed vehicles that should have failed. Another possible factor is fraudulent test results, which BAR is attempting to reduce.

In 2000, a study of the Enhanced level of the Smog Check program found that repairing vehicles 10 years or older that failed and then passed their inspection significantly reduced local hydrocarbons (HC) and carbon monoxide (Wenzel et al., 2000). Emissions from newer vehicles that passed their first inspection actually increased by a substantial amount over the months following inspection. The authors also estimate that vehicles exempt from the Smog Check program because of their age account for about 4-8% of on-road emissions.

The latest study of the program, released in 2015, examined the performance and quality of STAR stations and their local effect on smog levels (Sanders & Sandler, 2015). The study found that the test and repair of older vehicles improved local air quality, while low-quality (less restrictive) STAR stations showed no significant impact on local air quality. This paper contributes to this literature an alternative analysis of California’s Smog Check Program, using regression methods on complete counties across the state to provide a more causal estimate of program efficacy. This analysis provides insight into the large-scale efficacy of the Smog Check program, going beyond the earlier experimental studies of individual stations.

Data

The data for this paper are all publicly available from government sources and cover the years 2000-2018. Air quality data were gathered from the EPA (US EPA, 2016), population estimates were taken from the California Department of Finance (Department of Finance, n.d.), program-level information was estimated from California’s Bureau of Automotive Repair, and area data were taken from the California State Association of Counties (California State Association of Counties, 2014). The California BAR also granted my request for program-level information at the ZIP code level, which I then aggregated to the county level for this project. This formed a panel data set organized by county and comprising 974 observations across 55 counties. Of these 55 counties, San Francisco was the only county to change all of its ZIP codes from Basic to Enhanced.

There are limitations, despite the amount of reliable governmental data. Some important variables such as vehicles on the road and travel time are omitted due to the lack of available data for all necessary periods in the study.

Key Measures

The key dependent variables I will consider are:

  • CO 2nd Max 1-hr - a measure of the 2nd highest carbon monoxide level in a county during the year in parts per million (ppm)

  • NO2 Mean 1-hr - a measure of the annual mean nitrogen dioxide level in a county in parts per billion (ppb)

  • Ozone 2nd Max 1-hr - a measure of the 2nd highest ozone level in a county during the year in parts per million (ppm)

These measures were chosen for the availability and completeness of their data, as well as for their negative health effects.
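To make the two kinds of statistic concrete, the sketch below computes a second-highest value and a mean from a handful of made-up readings. The vector `hourly_co` and its values are purely illustrative assumptions; the EPA publishes these statistics already aggregated, so this is not the agency's procedure, only an illustration of what the measures mean.

```r
# Hypothetical hourly CO readings (ppm) for one county-year; values are made up.
hourly_co <- c(1.2, 0.8, 3.4, 2.9, 3.1, 0.5)

# Second-highest value: sort in descending order and take the second element.
second_max <- sort(hourly_co, decreasing = TRUE)[2]

# Mean of the readings, the form of statistic used for the NO2 measure.
annual_mean <- mean(hourly_co)

print(second_max)
print(annual_mean)
```

Using the second-highest rather than the highest reading makes the measure less sensitive to a single extreme hour.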

My primary independent variable in each regression will be the “Program” variable, a categorical variable representing the level of a program in the county in a given year. The variable is measured on three levels:

  • CoO - Change of Ownership counties that only require an emissions check when a vehicle is sold. Areas with these policies are generally more rural and less polluted initially.

  • Basic - A general level of the program that checks emissions every two years after registration. Initial pollution levels are marginal in these counties.

  • Enhanced - A strict level combining CoO and Basic with further testing at STAR stations with higher standards. These counties have higher than average initial levels of Ozone and Carbon Monoxide.

Because some counties contain ZIP codes at different program levels, each county is assigned, for simplicity, the program level held by the majority of its ZIP codes. There also exists a “Partial” program level, which is treated as Enhanced in this paper because it shares many properties of the Enhanced program level.
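The majority rule described above could be sketched as follows. The data frame `zip_levels`, its column names, and its values are hypothetical stand-ins for the ZIP-level BAR data, and the tie-breaking behavior is an assumption of this sketch rather than something stated in the paper.

```r
library(dplyr)

# Hypothetical ZIP-level records; column names and values are illustrative only.
zip_levels <- tibble(
  County  = c("A", "A", "A", "B", "B"),
  Year    = c(2002, 2002, 2002, 2002, 2002),
  Program = c("Basic", "Enhanced", "Partial", "CoO", "CoO")
)

county_levels <- zip_levels %>%
  # Treat "Partial" as "Enhanced", as described in the text.
  mutate(Program = if_else(Program == "Partial", "Enhanced", Program)) %>%
  # Count ZIP codes at each level within a county-year.
  count(County, Year, Program, name = "n_zips") %>%
  group_by(County, Year) %>%
  # Keep the most common level; ties are broken by row order (an assumption).
  slice_max(n_zips, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(County, Year, Program)

print(county_levels)
```

In this toy input, county A has one Basic ZIP code and two that count as Enhanced, so it is assigned Enhanced.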

Changes to the program levels in some ZIP codes led to interesting findings about how the program levels were handled and made this project easier to interpret. ZIP codes in a Change of Ownership (CoO) program level never increased or eliminated their level of the program, meaning that their restrictions did not change over the course of this study. Similarly, Enhanced program levels did not decrease or eliminate their program level over the course of this study. Only program areas originally in the Basic program level increased their program level to Enhanced or Partial, and never decreased or removed their program level over the course of this study.

#Packages used
library(knitr)
knitr::opts_chunk$set(
    echo = TRUE,
    message = FALSE,
    warning = FALSE
)
library(tidyverse)
library(stargazer)
library(readxl)
library(Synth)
library(caret)

#Initial data cleaning and prep

#Load source data as a data frame and clean column types
Air <- as.data.frame(read_excel("Air_data.xlsx", 
    col_types = c("numeric", "text", "numeric", 
        "numeric", "numeric", "numeric", 
        "numeric", "numeric", "numeric", 
        "numeric", "numeric", "numeric", "numeric", "numeric", 
        "numeric", "numeric", "numeric", "numeric", 
        "numeric", "text", "numeric", "numeric", 
        "numeric", "numeric", "numeric", "numeric")))


#Create Population Density variable
Air <- Air %>%
  mutate(PopDensity = Pop. / Area)

#Clean up data types
##Cleaned Synthetic Controls data first due to package requirements
Air_Syn <- Air %>%
  mutate(across(Program, as.factor)) %>%
  mutate(Program = fct_relevel(Program, "CoO", "Basic", "Enhanced"))

#Year and Area become factors here, so lm() estimates a separate dummy for each level
Air <- Air %>%
  mutate(across(c(Program, Year, Area), as.factor)) %>%
  mutate(Program = fct_relevel(Program, "Enhanced", "Basic", "CoO"))


#Create Enhanced dummy variable
Air <- Air %>%
  mutate(Enhanced_dummy = ifelse(Program == "Enhanced" & Post == 1, 1, 0))

Identification Strategy

I use a Generalized Difference-in-Differences (DD) design to compare counties that increased their program level from a less stringent level to a more stringent one with counties that did not change. Thirteen counties increased their program level between the years 2001 and 2003. All counties that increased their program level started in a Basic program area and changed to an Enhanced level.

I use multiple regression to estimate the effect of increasing the Smog Check program level on three forms of pollution (CO, NO2, O3). I use a Difference-in-Differences approach to estimate any increase or decrease in air pollution levels in counties that change their program level relative to counties that do not. This will help determine if implementing stricter emissions testing has a causal relationship with the reduction of air pollution.

The general equation to be estimated will take the following form:

\[ \ln(y_{it}) = \beta_0 + \beta_1 (Program)_i \times (Post)_t + \varphi_i + \tau_t + \varepsilon_{it} \]

Here \(y_{it}\) is the pollutant level in county \(i\) in year \(t\), \(\varphi_i\) and \(\tau_t\) are county and year effects, and \(\beta_1\) is the DD coefficient of interest. A semi-log (log-lin) form is used because it is not logical for these pollutants to be completely eliminated from the atmosphere. County fixed effect controls include factors such as Population, Area, and whether a county is near a Coast or not. These are included because they can have a significant impact on pollution levels. As population increases, we would expect pollution to rise due to a greater number of sources. The ocean along coastal counties can absorb some of the air pollutants, which would reduce the amount left in the atmosphere.
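As an illustration of how an equation of this form can be estimated, the sketch below simulates a small panel with a known effect and recovers it with `lm()` using county and year dummies. Everything in it is an assumption made for illustration: the 20 counties, the 2003 treatment year, the true effect of -0.10, and the variable names. It is not the paper's data or its exact specification.

```r
set.seed(1)

# Simulated panel: 20 hypothetical counties over 2000-2008; counties 1-10 are treated from 2003.
panel <- expand.grid(county = 1:20, year = 2000:2008)
panel$treated <- as.integer(panel$county <= 10)
panel$post    <- as.integer(panel$year >= 2003)
panel$did     <- panel$treated * panel$post

# Log pollution built from county effects, a common year trend, and a true effect of -0.10.
county_fe   <- rnorm(20)[panel$county]
year_fe     <- -0.02 * (panel$year - 2000)
panel$log_y <- 1 + county_fe + year_fe - 0.10 * panel$did +
  rnorm(nrow(panel), sd = 0.05)

# Two-way fixed effects via dummy variables; the coefficient on `did` plays the role of beta_1.
fit    <- lm(log_y ~ did + factor(county) + factor(year), data = panel)
beta_1 <- unname(coef(fit)["did"])

print(beta_1)
```

The main effects of `treated` and `post` are left out on purpose, since the county and year dummies absorb them.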

The DD approach requires strong assumptions about parallel trends in pollutants, and, although policies are constrained by geopolitical boundaries, air is mobile. So, I also use an alternative approach that aims to estimate program impact for San Francisco county only. It is a large county by population and an important county for air quality. In the 2010 census year, San Francisco contained nearly 805,000 people in about 47 square miles. I used a Synthetic Controls method (Abadie et al., 2010) to build a control county that mimics the properties of San Francisco. To do this, I sampled from other counties in California that were in a Basic program level and weighted them to create a synthetic control county as the synthetic counterfactual San Francisco. The synthetic San Francisco meets the parallel trends assumption by construction.
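The weighting idea can be sketched in a simplified form: choose non-negative donor weights that sum to one and minimize the pre-period gap between the treated county and the weighted donors. The sketch below is not the Synth package's algorithm, which also matches on predictors and optimizes predictor weights. It uses base R `optim()` on simulated outcomes only, and all names and values are hypothetical.

```r
set.seed(42)

# Hypothetical pre-period outcomes: rows are 4 years, columns are 4 donor counties.
donors_pre <- matrix(runif(16, min = 1, max = 3), nrow = 4)

# Construct a treated series that is an exact blend of donors 1 and 3.
true_w      <- c(0.6, 0, 0.4, 0)
treated_pre <- as.vector(donors_pre %*% true_w)

# Softmax keeps the weights non-negative and summing to one.
softmax <- function(theta) {
  z <- exp(theta - max(theta))
  z / sum(z)
}

# Pre-period sum of squared gaps between the treated unit and the weighted donors.
loss <- function(theta) {
  sum((treated_pre - donors_pre %*% softmax(theta))^2)
}

# Start from equal weights and minimize the pre-period loss.
opt <- optim(rep(0, 4), loss, method = "BFGS")
w   <- softmax(opt$par)

print(round(w, 3))
```

The softmax parameterization is one simple way to impose the weight constraints without a constrained solver. Weights that should be exactly zero are only approached, never reached.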

Results

model_CO <- lm(log(`CO 2nd Max 1-hr`) ~ Enhanced_dummy + Year + Pop. + Area + Coast, data = Air)

model_O3 <- lm(log(`Ozone 2nd Max 1-hr`) ~ Enhanced_dummy + Year + Pop. + Area + Coast, data = Air)

model_NO2 <- lm(log(`NO2 Mean 1-hr`) ~ Enhanced_dummy + Year + Pop. + Area + Coast, data = Air)

stargazer(model_CO, model_O3, model_NO2, type = "html", omit = c("Year","Pop.","Area","Coast"))
                                          Dependent variable:
                      log(CO 2nd Max 1-hr)       log(Ozone 2nd Max 1-hr)    log(NO2 Mean 1-hr)
                      (1)                        (2)                        (3)
Enhanced_dummy        -0.045                     0.033*                     0.050
                      (0.088)                    (0.018)                    (0.034)
Constant              1.733***                   -2.639***                  2.716***
                      (0.210)                    (0.045)                    (0.080)
Observations          546                        921                        634
R2                    0.704                      0.879                      0.932
Adjusted R2           0.668                      0.869                      0.925
Residual Std. Error   0.352 (df = 486)           0.084 (df = 850)           0.139 (df = 577)
F Statistic           19.606*** (df = 59; 486)   87.907*** (df = 70; 850)   140.464*** (df = 56; 577)
Note: *p<0.1; **p<0.05; ***p<0.01

In this regression output, each of the three dependent variables has its own column. The “Enhanced_dummy” variable is the Difference-in-Differences term of interest. Other explanatory variables have been omitted for the sake of brevity and readability.

A change in program level is associated with approximate changes of -4.5, 3.3, and 5.0 percent in county-wide CO, O3, and NO2, respectively, holding population, area, and coastal location constant and adjusting for non-parametric trends using year fixed effects. However, only the estimated impact on ozone is statistically significant, and only at the 10 percent level; that estimate is also positive rather than negative. There is not enough evidence to support the hypothesis that an increase in Smog Check regulations reduces air pollution.
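The percent interpretation and the significance statement can be checked with a short calculation using only the coefficients and standard errors reported in the regression table. The sketch uses the exact log-linear conversion and a normal critical value. The normal approximation is an assumption, though with several hundred degrees of freedom it differs little from the t distribution.

```r
# Coefficients and standard errors copied from the regression table (CO, O3, NO2).
est <- c(CO = -0.045, O3 = 0.033, NO2 = 0.050)
se  <- c(CO = 0.088, O3 = 0.018, NO2 = 0.034)

# Exact percent change implied by a log-linear coefficient: 100 * (exp(b) - 1).
pct_change <- 100 * (exp(est) - 1)

# Approximate 90 percent confidence intervals using the normal critical value.
z90  <- qnorm(0.95)
ci90 <- cbind(lower = est - z90 * se, upper = est + z90 * se)

# An estimate is significant at the 10 percent level when its interval excludes zero.
excludes_zero <- ci90[, "lower"] > 0 | ci90[, "upper"] < 0

print(round(pct_change, 2))
print(round(ci90, 3))
print(excludes_zero)
```

For coefficients this small, the exact conversion is close to reading the coefficient times 100 as a percent, and only the ozone interval excludes zero.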

Synthetic Controls Results

My Synthetic Controls method provided similar results to the multiple regression models above. The Ozone estimates were the only ones consistently below the synthetic control in both the pre- and post-treatment periods, meaning that this model is not a good fit for this outcome. Because the Carbon Monoxide and Nitrogen Dioxide estimates varied, they do not provide enough evidence to support the hypothesis.

#Synthetic Controls prep

#Filter out missing values and return a vector of complete County Code observations
balvec_CO <- Air_Syn %>% 
  filter(!is.na(`CO 2nd Max 1-hr`), Year < 2009) %>%
  count(`County Code`) %>%
  filter(n == 9) %>%
  select(`County Code`) %>%
  as_vector()

#Filter the source data for balancing County Codes in timing period and create new data frame
Bal_CO <- as.data.frame(Air_Syn %>% filter(`County Code` %in% balvec_CO, Year < 2009))

#Repeat for other variables
balvec_O3 <- Air_Syn %>% 
  filter(!is.na(`Ozone 2nd Max 1-hr`), Year < 2009) %>%
  count(`County Code`) %>%
  filter(n == 9) %>%
  select(`County Code`) %>%
  as_vector()

Bal_O3 <- as.data.frame(Air_Syn %>% filter(`County Code` %in% balvec_O3, Year < 2009))

balvec_NO2 <- Air_Syn %>% 
  filter(!is.na(`NO2 Mean 1-hr`), Year < 2009) %>%
  count(`County Code`) %>%
  filter(n == 9) %>%
  select(`County Code`) %>%
  as_vector()

Bal_NO2 <- as.data.frame(Air_Syn %>% filter(`County Code` %in% balvec_NO2, Year < 2009))

#Create donor pool for each variable: counties with at least one Basic year, excluding San Francisco (6075)
basicvec_CO <- Bal_CO %>%
  filter(Program == "Basic") %>%
  distinct(`County Code`) %>%
  filter(`County Code` != 6075) %>%
  as_vector()

basicvec_O3 <- Bal_O3 %>%
  filter(Program == "Basic") %>%
  distinct(`County Code`) %>%
  filter(`County Code` != 6075) %>%
  as_vector()

basicvec_NO2 <- Bal_NO2 %>%
  filter(Program == "Basic") %>%
  distinct(`County Code`) %>%
  filter(`County Code` != 6075) %>%
  as_vector()

#Create the dataprep object to be loaded into the Synth function
dataprep.out_CO <- dataprep(
                  foo = Bal_CO,
                         predictors = "CO 2nd Max 1-hr",
                         predictors.op = "mean",
                         dependent = "CO 2nd Max 1-hr",
                         unit.variable = "County Code",
                         time.variable = "Year",
                         special.predictors = list(
                           list("Pop.", 2000:2003, "mean"),
                           list("Area", 2000:2003, "median"),
                           list("PopDensity", 2000:2003, "median")
                         ),
                         treatment.identifier = 6075,
                         controls.identifier = basicvec_CO,
                         time.predictors.prior = c(2000:2002),
                         time.optimize.ssr = c(2000:2003),
                         unit.names.variable = "County",
                         time.plot = 2000:2008)

synth.data_CO <- synth(dataprep.out_CO)

#Repeat for other variables
dataprep.out_O3 <- dataprep(
                  foo = Bal_O3,
                         predictors = "Ozone 2nd Max 1-hr",
                         predictors.op = "mean",
                         dependent = "Ozone 2nd Max 1-hr",
                         unit.variable = "County Code",
                         time.variable = "Year",
                         special.predictors = list(
                           list("Pop.", 2000:2003, "mean"),
                           list("Area", 2000:2003, "median"),
                           list("PopDensity", 2000:2003, "median")
                         ),
                         treatment.identifier = 6075,
                         controls.identifier = basicvec_O3,
                         time.predictors.prior = c(2000:2002),
                         time.optimize.ssr = c(2000:2003),
                         unit.names.variable = "County",
                         time.plot = 2000:2008)

synth.data_O3 <- synth(dataprep.out_O3)

dataprep.out_NO2 <- dataprep(
                  foo = Bal_NO2,
                         predictors = "NO2 Mean 1-hr",
                         predictors.op = "mean",
                         dependent = "NO2 Mean 1-hr",
                         unit.variable = "County Code",
                         time.variable = "Year",
                         special.predictors = list(
                           list("Pop.", 2000:2003, "mean"),
                           list("Area", 2000:2003, "median"),
                           list("PopDensity", 2000:2003, "median")
                         ),
                         treatment.identifier = 6075,
                         controls.identifier = basicvec_NO2,
                         time.predictors.prior = c(2000:2002),
                         time.optimize.ssr = c(2000:2003),
                         unit.names.variable = "County",
                         time.plot = 2000:2008)

synth.data_NO2 <- synth(dataprep.out_NO2)

#Create summary tables from Synth prep
synth.table_CO <- synth.tab(dataprep.res = dataprep.out_CO,
                         synth.res = synth.data_CO)

synth.table_O3 <- synth.tab(dataprep.res = dataprep.out_O3,
                         synth.res = synth.data_O3)

synth.table_NO2 <- synth.tab(dataprep.res = dataprep.out_NO2,
                         synth.res = synth.data_NO2)

print(c(synth.table_CO, synth.table_O3, synth.table_NO2))

#Plot San Francisco against its synthetic control, with the treatment line at 2003
plot_CO <- path.plot(synth.res = synth.data_CO,
                  dataprep.res = dataprep.out_CO,
                  Ylab = c("CO (ppm)"),
                  tr.intake = 2003)

plot_O3 <- path.plot(synth.res = synth.data_O3,
                  dataprep.res = dataprep.out_O3,
                  Ylab = c("O3 (ppm)"),
                  tr.intake = 2003)

plot_NO2 <- path.plot(synth.res = synth.data_NO2,
                  dataprep.res = dataprep.out_NO2,
                  Ylab = c("NO2 (ppb)"),
                  tr.intake = 2003)

Trends in the pre-period of the Synthetic Controls indicate that this method is not a good fit for this outcome. Overall, the two analyses do not provide compelling evidence of a reduction in pollution due to the Smog Check program.
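One common way to quantify pre-period fit is the root mean squared prediction error (RMSPE) of the gap between the treated county and its synthetic control. The sketch below uses made-up series, not the paper's results, and assumes a treatment year of 2003 to match the `tr.intake` value in the plotting code. With real Synth output, the two series would come from the `dataprep()` and `synth()` objects.

```r
# Hypothetical annual series for a treated county and its synthetic control (made-up values).
years     <- 2000:2008
treated   <- c(2.0, 1.9, 1.8, 1.7, 1.5, 1.4, 1.4, 1.3, 1.2)
synthetic <- c(2.3, 2.1, 2.1, 1.9, 1.8, 1.6, 1.5, 1.5, 1.4)

# Gap between the treated unit and its synthetic counterpart in each year.
gap <- treated - synthetic

# Root mean squared prediction error, split at the assumed treatment year of 2003.
rmspe      <- function(g) sqrt(mean(g^2))
pre_rmspe  <- rmspe(gap[years < 2003])
post_rmspe <- rmspe(gap[years >= 2003])

print(c(pre = pre_rmspe, post = post_rmspe))
```

In this toy example the pre-period gap is already as large as the post-period gap, which is the pattern that signals a poor synthetic match rather than a treatment effect.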

Conclusion

The methods used in this study were more likely to provide causal estimates of program efficacy, but no significant reduction was found. Limitations from a lack of station quality data and travel data may have contributed to the lack of significance in these estimates. Assumptions about exactly when programs changed also had to be made when aggregating ZIP code data, which may have affected the results.

The results of this study were not as expected given the findings of other authors on this subject. Although other studies find mixed results on how effective the Smog Check program has been, the consensus is that the program has had a positive effect on reducing air pollution. Factors such as low-quality testing stations seemed to be a primary concern for the lack of pollution reduction in some areas, as well as the possibility of testing fraud that would leave high-polluting vehicles on the road (Bureau of Automotive Repair, 2019b).

The Smog Check program should not be eliminated or rolled back due to these findings, but instead careful consideration should be given when evaluating an increase to testing requirements.

References

Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program. Journal of the American Statistical Association, 105(490), 493–505. https://doi.org/10.1198/jasa.2009.ap08746

American Lung Association. (n.d.). Year-Round Particle Pollution | State of the Air. American Lung Association. Retrieved October 17, 2019, from https://www.lung.org/our-initiatives/healthy-air/sota/key-findings/year-round-particle-pollution.html

Bahadur, R., Feng, Y., Russell, L. M., & Ramanathan, V. (2011). Impact of California’s air pollution laws on black carbon and their implications for direct radiative forcing. Atmospheric Environment, 45(5), 1162–1167. https://doi.org/10.1016/j.atmosenv.2010.10.054

Bureau of Automotive Repair. (2019a). History of Smog Check in California—Bureau of Automotive Repair. https://www.bar.ca.gov/Consumer/Smog_Check_Program_History.aspx

Bureau of Automotive Repair. (2019b). Smog Check Performance Report 2019. Department of Consumer Affairs, 23.

Bureau, U. S. C. (n.d.). American FactFinder—Results. Retrieved November 1, 2019, from https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?src=bkmk

California State Association of Counties. (2014). Square Mileage by County. California State Association of Counties. https://www.counties.org/pod/square-mileage-county

Department of Finance. (n.d.). Estimates. Retrieved November 1, 2019, from http://dof.ca.gov/forecasting/demographics/estimates/

Romley, J. A., Hackbarth, A., & Goldman, D. P. (2010). The impact of air quality on hospital spending. RAND.

Sanders, N. J., & Sandler, R. (2015). Do Smog Checks Affect Smog? Emissions Inspections, Station Quality and Local Air Pollution (SSRN Scholarly Paper ID 2690341). Social Science Research Network. https://papers.ssrn.com/abstract=2690341

Sharma, S., Sharma, P., Khare, M., & Kwatra, S. (2016). Statistical behavior of ozone in urban environment. Sustainable Environment Research, 26(3), 142–148. https://doi.org/10.1016/j.serj.2016.04.006

US EPA. (n.d.). Environmental policy in California. Ballotpedia. Retrieved October 17, 2019, from https://ballotpedia.org/Environmental_policy_in_California

US EPA. (2016, August 11). Air Quality Statistics Report [Data and Tools]. US EPA. https://www.epa.gov/outdoor-air-quality-data/air-quality-statistics-report

US EPA, & US Census Bureau. (n.d.). Explore Air Pollution in the United States | 2018 Annual Report. America’s Health Rankings. Retrieved October 17, 2019, from https://www.americashealthrankings.org/explore/annual/measure/air/state/ALL

Wenzel, T., Singer, B., Gumerman, E., & Sawyer, R. (2000). Evaluation of the Enhanced Smog Check Program: 68.