Abstract

When a car manufacturer recalls millions of vehicles, does Wall Street flinch? And does it matter why the recall happened (a failed sensor versus a cracked axle) or how many vehicles are affected? As automobiles become increasingly software-defined, recalls have quietly shifted from grease-and-metal problems toward faulty lines of code, yet it remains unclear whether financial markets have noticed, or care.

This poster examines the relationship between National Highway Traffic Safety Administration (NHTSA) vehicle recalls and stock market performance for six major automakers in the U.S. market (Ford, GM, Stellantis, Toyota, Honda, and Nissan) from 2000 to 2025. We draw on 3,532 recalls covering an estimated 517 million vehicle-units, linked to daily stock return data, and ask two questions. Have technology-driven recalls grown at a significantly higher rate than mechanical recalls over this period? And does recall severity systematically translate into abnormal stock returns?

We employ negative binomial regression to characterize recall trends and seasonal patterns, and a two-way fixed effects (TWFE) panel model to estimate the within-brand financial impact of recall severity.

Keywords: NHTSA recalls · automotive industry · stock returns · two-way fixed effects · negative binomial regression · technology-driven recalls · recall severity

1 Introduction

Vehicle safety recalls are among the most visible and recurrent operational risk events in the U.S. automotive industry. Under the National Traffic and Motor Vehicle Safety Act, the recall system requires manufacturers to notify registered owners of defects and provide a remedy at no cost. This creates both direct remediation expenditures and indirect reputational consequences. Between 2000 and 2025, the six manufacturers studied here collectively issued 3,532 vehicle recalls covering an estimated 517 million vehicle-units, roughly 1.6 vehicles for every person in the United States.

The financial implications of recalls are theoretically ambiguous. A recall announcement transmits negative information: it reveals a previously unpriced product defect, signals potential liability costs, and may damage brand equity accumulated over decades. Yet under semi-strong market efficiency, stock prices should already reflect observable recall risk. Individual announcements should therefore carry incremental information only to the extent that they are unexpected. Two questions remain empirical: whether investors update valuations upon specific recall events, and whether the magnitude of their response scales with severity. Both bear directly on corporate risk management, insurance pricing, and investment strategy.

Research questions:

RQ1: How have vehicle recalls changed across the automotive industry from 2000 to 2025 in terms of trend, technological composition, and seasonal patterns?
RQ2: Do recalls affect company stock returns, and does recall severity strengthen that relationship?

To answer these questions, we combine NHTSA microdata with daily stock price histories and apply two complementary econometric frameworks.

For the recall trend analysis, we use a negative binomial regression model with monthly seasonality terms and a Bai-Perron structural break test to detect the technology inflection.

For the financial impact analysis, we estimate a TWFE panel model with brand-by-severity interaction terms. Its date fixed effects absorb any daily shock common to all six stocks. This makes the model robust to market-wide shocks without an explicit market-return regressor. S&P 500 movements are covered to the extent that the six brands respond to them similarly.

2 Data Construction

2.1 NHTSA Recall Data

The primary data source is NHTSA’s publicly available recall database, filtered to passenger vehicle recalls issued between January 2000 and December 2025. After restricting to the six study manufacturers and removing non-vehicle recall types (equipment, tire, child seat), the working sample comprises 3,532 recall events associated with an estimated total of 517 million potentially affected vehicle-units.

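library(dplyr)       # rename(), mutate(), filter(), summarise()
library(lubridate)   # mdy(), year(), month()
library(stringr)     # str_trim(), str_to_upper(), str_replace_all()
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), row_spec(), footnote()

raw <- read.csv("BANL.csv", stringsAsFactors = FALSE)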

recalls <- raw %>%
  dplyr::rename(
    recall_date  = Column1,
    manufacturer = Manufacturer,
    recall_type  = Recall.Type,
    component    = Component,
    consequence  = Consequence.Summary,
    do_not_drive = Do.Not.Drive.Advisory,
    affected_raw = Potentially.Affected
  ) %>%
  dplyr::mutate(
    recall_date = mdy(recall_date),
    year        = year(recall_date),
    month       = month(recall_date),
    brand       = map_brand(manufacturer),   # project helper defined elsewhere: NHTSA name -> study brand (NA if none)
    component   = str_trim(str_to_upper(component)),
    affected    = as.numeric(str_replace_all(affected_raw, ",", ""))
  ) %>%
  dplyr::filter(
    recall_type == "Vehicle",
    year >= 2000, year <= 2025,
    !is.na(recall_date), !is.na(brand)
  )

recalls %>%
  group_by(brand) %>%
  dplyr::summarise(
    N          = n(),
    Pct        = round(n()/nrow(recalls)*100, 1),
    Mean_aff   = round(mean(affected, na.rm=TRUE), 0),
    Total_aff  = sum(affected, na.rm=TRUE),
    First_year = min(year),
    Last_year  = max(year),
    .groups    = "drop"
  ) %>%
  dplyr::arrange(desc(N)) %>%
  dplyr::mutate(
    Mean_aff  = scales::comma(Mean_aff),
    Total_aff = scales::comma(Total_aff),
    Period    = paste0(First_year, "-", Last_year)
  ) %>%
  dplyr::select(
    Brand                     = brand,
    `N Recalls`               = N,
    `% of Total`              = Pct,
    `Mean Vehicles Affected`  = Mean_aff,
    `Total Vehicles Affected` = Total_aff,
    Period
  ) %>%
  kable(
    caption = "Table 1: Recall dataset summary by manufacturer, 2000-2025",
    align   = c("l","r","r","r","r","c")
  ) %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = FALSE) %>%
  row_spec(0, bold=TRUE, background=NAVY, color="white")

Table 1: Recall dataset summary by manufacturer, 2000-2025
Brand	N Recalls	% of Total	Mean Vehicles Affected	Total Vehicles Affected	Period
Ford	915	25.9	142,105	130,026,285	2000-2025
GM	766	21.7	146,928	112,546,541	2000-2025
Stellantis	765	21.7	131,124	100,309,566	2000-2025
Toyota	375	10.6	193,529	72,573,527	2000-2025
Honda	365	10.3	184,498	67,341,844	2000-2025
Nissan	346	9.8	100,154	34,653,155	2000-2025
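The cleaning code calls map_brand(), which is not shown in this section. It then drops rows where the helper returns NA. Below is a minimal sketch of what such a helper might look like. It assumes that NHTSA manufacturer strings can be matched with simple case-insensitive patterns. The name map_brand_sketch() and all of its patterns are hypothetical, not the authors' mapping.

```r
# Hypothetical helper: map raw NHTSA manufacturer strings to the six study brands.
# The patterns are illustrative assumptions, not the authors' map_brand().
map_brand_sketch <- function(manufacturer) {
  patterns <- c(
    Ford       = "\\bford\\b",
    GM         = "\\bgeneral motors\\b",
    Stellantis = "\\bfca\\b|\\bchrysler\\b|\\bstellantis\\b",
    Toyota     = "\\btoyota\\b",
    Honda      = "\\bhonda\\b",
    Nissan     = "\\bnissan\\b"
  )
  x   <- tolower(manufacturer)
  out <- rep(NA_character_, length(x))
  for (b in names(patterns)) {
    # first matching pattern wins; unmatched names stay NA and are filtered out later
    hit <- is.na(out) & grepl(patterns[[b]], x, perl = TRUE)
    out[hit] <- b
  }
  out
}

map_brand_sketch(c("Ford Motor Company", "General Motors LLC",
                   "Chrysler (FCA US LLC)", "Tesla, Inc."))
# "Ford" "GM" "Stellantis" NA
```

In practice the real mapping should be checked against the distinct manufacturer strings in the raw file. Name variants across 26 years are easy to miss.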

2.2 Severity Classification via NLP

A key methodological contribution is a four-level severity taxonomy that follows the NHTSA framework. It is applied to each recall through keyword-based NLP of the NHTSA consequence summary field, combined with the Do Not Drive advisory flag.

The critical design challenge is boilerplate. Approximately 80% of NHTSA consequence summaries include the legal phrase “increasing the risk of a crash” regardless of actual defect severity, so a naive keyword match would classify 84% of recalls as Severe. Our classifier instead identifies the primary harm that the defective component is stated to cause:

Level	Definition	Examples
Critical	Confirmed fatality language or Do Not Drive advisory	“death,” “fatal,” “explosion,” Do Not Drive = YES
Severe	Fire/burn as stated consequence; loss of control/steering; brake failure; airbag non-deployment	“risk of fire,” “loss of steering,” “wheel may detach”
Moderate	Injury as primary harm; stalling; overheating; smoke; loss of propulsion	“risk of injury,” “vehicle may stall,” “smoke”
Minor	Generic crash-risk boilerplate; labeling; unspecified consequences	“increasing the risk of a crash” only

critical_kw <- paste(c(
  "\\bdeath\\b","\\bfatal(?:ly|ities|ity)?\\b",
  "\\bexplosion\\b","\\belectrocution\\b"
), collapse="|")

severe_kw <- paste(c(
  "(?:risk of|may cause|can cause|could cause|lead(?:ing)? to|result(?:ing)? in)\\s+(?:a\\s+)?fire\\b",
  "\\bburn(?:ing)?\\b",
  "\\bloss of (?:vehicle\\s+)?control\\b","\\bloss of steering\\b",
  "\\bbrake failure\\b","\\brollover\\b","\\bwheel (?:can\\s+)?detach",
  "\\bair\\s*bag(?:s)? (?:may|might|could|can|will) not deploy\\b"
), collapse="|")

moderate_kw <- paste(c(
  "(?:risk of|may cause|can cause|could cause|lead(?:ing)? to|result(?:ing)? in)\\s+(?:serious\\s+)?injur",
  "\\bstall(?:ing)?\\b","\\boverheat(?:ing)?\\b","\\bshort circuit\\b",
  "\\bsmoke\\b","\\bloss of (?:drive |propulsion|power)\\b",
  "\\bwarning light\\b","\\bleak(?:age|ing)?\\b"
), collapse="|")

recalls <- recalls %>%
  dplyr::mutate(
    text_sev  = str_to_lower(coalesce(consequence, "")),
    dnd_flag  = str_to_upper(coalesce(do_not_drive, "No")) == "YES",
    inj_mod   = str_detect(text_sev,
      "(?:risk of|may cause|can cause|could cause|lead(?:ing)? to|result(?:ing)? in)\\s+(?:serious\\s+)?injur") |
      (str_detect(text_sev, "\\binjur(?:y|ies|ed)\\b") &
         !str_detect(text_sev, fixed("increasing the risk of a crash"))),
    crit_flag = dnd_flag | str_detect(text_sev, critical_kw),
    sev_flag  = str_detect(text_sev, severe_kw),
    mod_flag  = inj_mod  | str_detect(text_sev, moderate_kw),
    Severity  = factor(case_when(
      crit_flag                         ~ "Critical",
      !crit_flag & sev_flag             ~ "Severe",
      !crit_flag & !sev_flag & mod_flag ~ "Moderate",
      TRUE                              ~ "Minor"
    ), levels = sev_levels)
  ) %>%
  dplyr::select(-text_sev, -dnd_flag, -inj_mod, -crit_flag, -sev_flag, -mod_flag)

recalls %>%
  dplyr::count(Severity, name="N") %>%
  dplyr::mutate(
    `%`            = round(N/sum(N)*100, 1),
    `Cumulative %` = round(cumsum(N)/sum(N)*100, 1)
  ) %>%
  dplyr::arrange(match(Severity, sev_levels)) %>%
  kable(
    caption   = "Table 2: Severity classification distribution, 2000-2025",
    align     = "lrrr",
    col.names = c("Severity Level","N","%","Cumulative %")
  ) %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = FALSE) %>%
  row_spec(0, bold=TRUE, background=NAVY, color="white") %>%
  row_spec(1, background="#EEF4FB") %>%
  row_spec(2, background="#FEF3C7") %>%
  row_spec(3, background="#FDECEA") %>%
  row_spec(4, bold=TRUE, color="white", background="#8B1A1A")

Table 2: Severity classification distribution, 2000-2025
Severity Level	N	%	Cumulative %
Minor	1483	42.0	42.0
Moderate	1319	37.3	79.3
Severe	600	17.0	96.3
Critical	130	3.7	100.0
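The sketch below shows how the precedence rule handles the boilerplate problem. It applies abbreviated versions of the keyword sets above to four made-up consequence summaries. classify_severity() and the example sentences are hypothetical. The tier order mirrors the case_when() logic above: Do Not Drive or fatality language first, then Severe, then Moderate, and otherwise Minor.

```r
# Abbreviated keyword sets (small subsets of those defined above)
critical_re <- "\\bdeath\\b|\\bfatal"
severe_re   <- "(?:risk of|may cause|can cause)\\s+(?:a\\s+)?fire\\b|\\bloss of steering\\b"
moderate_re <- "\\bstall(?:ing)?\\b|\\bsmoke\\b"

# Hypothetical helper: the first matching tier wins; boilerplate alone falls to Minor
classify_severity <- function(text, do_not_drive = FALSE) {
  txt <- tolower(text)
  ifelse(do_not_drive | grepl(critical_re, txt, perl = TRUE), "Critical",
    ifelse(grepl(severe_re, txt, perl = TRUE), "Severe",
      ifelse(grepl(moderate_re, txt, perl = TRUE), "Moderate", "Minor")))
}

examples <- c(
  "A loose connector may stall the engine, increasing the risk of a crash.",
  "A fuel leak can cause a fire, increasing the risk of a crash.",
  "An incorrect load label, increasing the risk of a crash.",
  "The vehicle may stall, increasing the risk of a crash."
)
classify_severity(examples, do_not_drive = c(FALSE, FALSE, FALSE, TRUE))
# "Moderate" "Severe" "Minor" "Critical"
```

All four sentences contain the crash-risk boilerplate. Only the stated harm, or the Do Not Drive flag, moves a recall above Minor.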

2.3 Electrical vs. Mechanical Classification

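Each recall is also classified as Electrical/Technology-driven or Mechanical by keyword search across the component and consequence summary text. Recalls that match both keyword sets are labelled Both, and those that match neither are labelled Other. The trend analysis in Section 3 uses only the two unambiguous classes.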

# Electrical/technology keywords
elec_kw <- paste(c(
  "electrical","software","electronic","sensor","module","camera","computer",
  "control unit","ecu","tcm","battery","wiring","infotainment",
  "adas","forward collision","back over","backup"
), collapse="|")

# Mechanical keywords
mech_kw <- paste(c(
  "engine","transmission","brake","suspension","steering",
  "fuel system","exhaust","power train","axle","driveshaft",
  "clutch","gearbox","differential","tire","wheel","coolant"
), collapse="|")

# Caution: these alternations match plain substrings, so "ecu" also hits "secured"
# and "tire" hits "entire"; paste0("\\b(?:", ..., ")\\b") would restrict to whole words.
recalls <- recalls %>%
  dplyr::mutate(
    # search text: component name + consequence summary, lower-cased
    tc         = str_to_lower(paste(coalesce(component,""), coalesce(consequence,""), sep=" ")),
    is_elec    = str_detect(tc, elec_kw),
    is_mech    = str_detect(tc, mech_kw),
    type_class = case_when(
      is_elec & !is_mech ~ "Electrical/Tech",
      is_mech & !is_elec ~ "Mechanical",
      is_elec & is_mech  ~ "Both",
      TRUE               ~ "Other"
    )
  ) %>%
  dplyr::select(-tc)
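The keyword lists are joined with a plain `|`, so each keyword also matches inside longer words. The sketch below contrasts that with a word-boundary version on a few hypothetical phrases. Applying the bounded form to the real data would change some classifications, so the counts reported in this poster reflect the original substring rule.

```r
kw      <- c("ecu", "tire", "wiring")
plain   <- paste(kw, collapse = "|")                         # substring matching
bounded <- paste0("\\b(?:", paste(kw, collapse = "|"), ")\\b") # whole-word matching

txt <- c("bolt may not be secured", "the entire assembly",
         "ecu software error", "tire tread separation")
grepl(plain,   txt, perl = TRUE)  # TRUE  TRUE  TRUE TRUE
grepl(bounded, txt, perl = TRUE)  # FALSE FALSE TRUE TRUE
```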

2.4 Stock Return Data

Daily stock return data for each manufacturer were sourced from FactSet and matched to the recall dataset by brand and trading date. Panel coverage varies by listing history. GM re-listed in November 2010 following its Chapter 11 exit, and Stellantis began trading as STLA only in January 2021 following the PSA-FCA merger. After dropping observations with missing returns, the merged panel contains 33,131 usable brand-day observations across six brands and 6,539 unique trading dates spanning 2000-2025.
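The text does not say how recalls dated on weekends or market holidays are aligned with trading dates. One common convention, assumed here rather than taken from the authors' pipeline, is to assign such recalls to the next trading date. next_trading_day() is a hypothetical helper.

```r
# Hypothetical helper: map each event date to the first trading date on or after it
next_trading_day <- function(event_dates, trading_dates) {
  trading_dates <- sort(unique(trading_dates))
  # number of trading dates strictly before each event date (dates are whole days)
  n_before <- findInterval(as.numeric(event_dates) - 1, as.numeric(trading_dates))
  idx <- n_before + 1
  idx[idx > length(trading_dates)] <- NA    # event falls after the last trading date
  trading_dates[idx]
}

trading <- as.Date(c("2024-01-05", "2024-01-08", "2024-01-09"))  # Fri, Mon, Tue
events  <- as.Date(c("2024-01-05", "2024-01-06", "2024-01-10"))  # Fri, Sat, Wed
next_trading_day(events, trading)   # "2024-01-05" "2024-01-08" NA
```

This rule ignores intraday timing. A recall posted after the close is still attributed to that day's return.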

3 RQ1: Recall Trends, Technology, and Seasonality

3.1 Structural Shift Toward Technology-Driven Recalls

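library(strucchange)   # breakpoints(): Bai-Perron structural break estimation

annual_type <- recalls %>%
  dplyr::filter(type_class %in% c("Electrical/Tech","Mechanical")) %>%
  dplyr::count(year, type_class, name="n")

# Annual Electrical/Tech counts; complete() fills any year with zero recalls
# so that the ts() index stays aligned with calendar years
elec_annual <- recalls %>%
  dplyr::filter(type_class=="Electrical/Tech") %>%
  dplyr::count(year) %>%
  tidyr::complete(year = 2000:2025, fill = list(n = 0L)) %>%
  dplyr::arrange(year)

ts_elec <- ts(elec_annual$n, start=2000)

# Bai-Perron test for breaks in the mean level of the annual series.
# strucchange indexes a break by the LAST observation of the earlier regime;
# the analysis below nonetheless codes break_yr as the first post-break year.
bp       <- breakpoints(ts_elec ~ 1)
break_yr <- elec_annual$year[bp$breakpoints[1]]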

ggplot(annual_type, aes(x=year, y=n, color=type_class, group=type_class)) +
  geom_line(linewidth=1.7) +
  geom_point(size=2.8) +
  geom_vline(xintercept=break_yr, linetype="dashed", color="grey40", linewidth=0.9) +
  annotate("text", x=break_yr+0.4, y=max(annual_type$n)*0.92,
           label=paste0(break_yr," structural break"), hjust=0, size=3.8, color="grey35",
           family="Times New Roman") +
  scale_color_manual(
    values = c("Electrical/Tech"="#E07B39","Mechanical"="#2E86AB"),
    labels = c("Electrical / Technology","Mechanical")
  ) +
  scale_y_continuous(labels=comma, expand=expansion(mult=c(0.02,0.08))) +
  scale_x_continuous(breaks=seq(2000,2025,5)) +
  labs(x=NULL, y="Annual Recall Count", color=NULL) +
  theme_minimal(base_size=13, base_family="Times New Roman") +
  theme(
    legend.position  = "bottom",
    legend.text      = element_text(size=12),
    panel.grid.minor = element_blank(),
    plot.background  = element_rect(fill="white",color=NA)
  )


Figure 1: Annual recall counts by component type. A marked structural inflection is visible around 2013, the Bai-Perron breakpoint, after which Electrical/Technology recalls accelerate sharply. This inflection coincides with three developments: industry-wide adoption of advanced driver-assistance systems (ADAS), mass-market software-controlled powertrains, and expanded back-over prevention requirements (FMVSS 111). Each introduced electronic failure modes that were largely absent from the pre-2010 recall landscape.
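breakpoints() from strucchange estimates Bai-Perron breaks by dynamic programming over all admissible partitions of the series. For a single break in the mean, the idea reduces to a grid search: try every admissible split and keep the one with the smallest total sum of squared residuals. The sketch below illustrates this on simulated data and is not a replacement for breakpoints(). The function one_break_mean_shift() and the minimum segment length h = 3 are assumptions.

```r
# Simplified one-break mean-shift search (illustrative, not strucchange)
one_break_mean_shift <- function(y, h = 3) {
  n <- length(y)
  candidates <- h:(n - h)                  # each regime keeps at least h points
  ssr <- sapply(candidates, function(k) {
    left  <- y[1:k]
    right <- y[(k + 1):n]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  candidates[which.min(ssr)]               # last index of the first regime
}

set.seed(1)
y <- c(rnorm(13, mean = 10, sd = 1),       # 13 "years" at a low level
       rnorm(13, mean = 25, sd = 1))       # then a jump in the mean
k <- one_break_mean_shift(y)
(2000:2025)[k]                             # 2012: last year of the low regime
```

Like breakpoints(), the function returns the index of the last observation in the first regime. The detected "break year" is therefore the final pre-break year.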

3.2 Negative Binomial Regression - H1 Test

3.2.1 Model Specification

Hypothesis H1 (electrical recalls grew faster than mechanical recalls) is tested with a negative binomial regression on monthly recall counts by component type:

\[\log(\hat{\mu}_{t}) = \beta_0 + \beta_1 \text{Year}_t + \beta_2 \mathbb{1}[\text{Elec}] + \beta_3 (\text{Year}_t \times \mathbb{1}[\text{Elec}]) + \beta_4 \mathbb{1}[\text{Post-break}]\]

A negative binomial specification is preferred over Poisson regression because monthly recall counts are overdispersed: their variance exceeds their mean.
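A quick check of the overdispersion claim is the Pearson dispersion statistic of a Poisson fit. It should be close to 1 when the Poisson variance assumption holds. The sketch below uses simulated monthly counts rather than the NHTSA data. The trend, theta = 2, and the seed are all assumptions.

```r
library(MASS)   # glm.nb(), rnegbin()

set.seed(42)
year <- rep(2000:2024, each = 12)                 # 300 simulated months
mu   <- exp(1.5 + 0.05 * (year - 2000))           # assumed trend in expected counts
y    <- rnegbin(length(year), mu = mu, theta = 2) # overdispersed counts

pois <- glm(y ~ year, family = poisson)
# Pearson dispersion: about 1 if the Poisson variance assumption holds
dispersion <- sum(residuals(pois, type = "pearson")^2) / df.residual(pois)

nb <- glm.nb(y ~ year)
c(dispersion = dispersion, theta_hat = nb$theta)
AIC(pois, nb)   # lower AIC favours the negative binomial fit
```

On the real monthly_agg data, the same two fits and the dispersion ratio give a direct check of the choice.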

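# Monthly counts by component type. Note: count() omits year-months with zero
# recalls of a given type; adding tidyr::complete(year, month, type_class,
# fill = list(n_recalls = 0L)) before mutate() would retain those zeros.
monthly_agg <- recalls %>%
  dplyr::filter(type_class %in% c("Electrical/Tech","Mechanical")) %>%
  dplyr::count(year, month, type_class, name="n_recalls") %>%
  dplyr::mutate(
    is_elec    = as.integer(type_class=="Electrical/Tech"),
    post_break = as.integer(year >= break_yr)   # break_yr coded as first post-break year
  )

# H1 model: common year trend, electrical level shift, type-specific slope, post-break shift
nb_h1 <- MASS::glm.nb(n_recalls ~ year * is_elec + post_break, data=monthly_agg)

# Monthly counts before/after the break, by component type
pre_elec  <- monthly_agg %>% dplyr::filter(is_elec==1, year <  break_yr) %>% dplyr::pull(n_recalls)
post_elec <- monthly_agg %>% dplyr::filter(is_elec==1, year >= break_yr) %>% dplyr::pull(n_recalls)
pre_mech  <- monthly_agg %>% dplyr::filter(is_elec==0, year <  break_yr) %>% dplyr::pull(n_recalls)
post_mech <- monthly_agg %>% dplyr::filter(is_elec==0, year >= break_yr) %>% dplyr::pull(n_recalls)
# One-sided Mann-Whitney (Wilcoxon rank-sum) test: post-break electrical counts > pre-break
mw_elec   <- wilcox.test(post_elec, pre_elec, alternative="greater")

# Wald z-test for the H1 interaction term
nb_coef <- coef(nb_h1)
nb_se   <- sqrt(diag(vcov(nb_h1)))
int_z   <- nb_coef["year:is_elec"] / nb_se["year:is_elec"]
int_p   <- 2 * pnorm(-abs(int_z))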

as.data.frame(summary(nb_h1)$coef) %>%
  tibble::rownames_to_column("Term") %>%
  dplyr::rename(SE=`Std. Error`, Z=`z value`, P=`Pr(>|z|)`) %>%
  dplyr::mutate(
    across(c(Estimate,SE,Z), ~round(.x,4)),
    P   = round(P,4),
    Sig = case_when(P<0.001~"***",P<0.01~"**",P<0.05~"*",P<0.10~".",TRUE~""),
    Term = recode(Term,
      "(Intercept)"  = "Intercept",
      "year"         = "Year trend",
      "is_elec"      = "Electrical flag",
      "post_break"   = paste0("Post-",break_yr," indicator"),
      "year:is_elec" = "Year × Electrical (H1 key term)"
    )
  ) %>%
  kable(
    caption   = paste0("Table 3: Negative binomial regression results - H1 test (structural break: ",break_yr,")"),
    align     = "lrrrrc",
    col.names = c("Term","Estimate","SE","Z","p-value","Sig.")
  ) %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width=FALSE) %>%
  row_spec(0, bold=TRUE, background=NAVY, color="white") %>%
  row_spec(5, bold=TRUE, background="#E8F5EE") %>%
  footnote(general = paste0("Bai-Perron breakpoint: ",break_yr,". *** p<0.001  ** p<0.01  * p<0.05  . p<0.10"))

Table 3: Negative binomial regression results - H1 test (structural break: 2013)
Term	Estimate	SE	Z	p-value	Sig.
Intercept	3.3406	14.0297	0.2381	0.8118
Year trend	-0.0009	0.0070	-0.1355	0.8922
Electrical flag	-113.2482	18.3040	-6.1871	0.0000	***
Post-2013 indicator	0.2964	0.1022	2.8989	0.0037	**
Year × Electrical (H1 key term)	0.0558	0.0091	6.1517	0.0000	***
Note:
Bai-Perron breakpoint: 2013. *** p<0.001 ** p<0.01 * p<0.05 . p<0.10

H1 Result - Technology recalls outpace mechanical recalls

The Year × Electrical interaction \(\hat{\beta}_3\) = 0.0558 (Z = 6.152, p < 0.001) indicates that electrical/technology recalls grew at a significantly higher rate than mechanical recalls. From 2013 onward (the post-break period as coded), mean monthly electrical recall volume was 2.3x its pre-2013 level, compared with 1.4x for mechanical recalls.

A one-sided Mann-Whitney U test confirms a significant post-break level shift in electrical recalls (W = 5,122, p < 0.001). The interaction coefficient implies an annual growth differential of roughly 5.7 percentage points, since \(e^{0.0558} \approx 1.057\).
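Because the H1 model is log-linear in Year, \(\hat{\beta}_3\) converts directly into a ratio of annual growth factors. Let \(\mu^{k}(y)\) denote the expected monthly count for component type \(k\) in calendar year \(y\), and let \(\Delta P_y\) be the change in the post-break indicator between \(y\) and \(y+1\). The post-break term shifts both types equally and cancels:

\[\frac{\mu^{\text{Elec}}(y+1) / \mu^{\text{Elec}}(y)}{\mu^{\text{Mech}}(y+1) / \mu^{\text{Mech}}(y)} = \frac{e^{\beta_1 + \beta_3 + \beta_4 \Delta P_y}}{e^{\beta_1 + \beta_4 \Delta P_y}} = e^{\beta_3}, \qquad e^{\hat{\beta}_3} = e^{0.0558} \approx 1.057\]

In multiplicative terms, electrical recall counts grow about 5.7% per year faster than mechanical counts.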

3.2.2 Brand-Level Trend Analysis

brand_trends <- recalls %>%
  # Note: count() drops brand-year-months with zero recalls;
  # tidyr::complete(brand, year, month, fill = list(n_recalls = 0L)) would keep them
  dplyr::count(brand, year, month, name="n_recalls") %>%
  group_by(brand) %>%
  group_modify(~ {
    # month enters linearly here; Section 3.3 models it as a factor
    m <- tryCatch(MASS::glm.nb(n_recalls ~ year + month, data=.x), error=function(e) NULL)
    if (is.null(m)) return(tibble())
    ct <- summary(m)$coef
    tibble(
      beta   = ct["year","Estimate"],
      p_val  = ct["year","Pr(>|z|)"],
      IRR_an = exp(ct["year","Estimate"])   # year is in calendar years: exp(beta) is the annual IRR
    )
  }) %>%
  ungroup() %>%
  dplyr::mutate(
    pct   = round((IRR_an-1)*100, 2),
    sig   = case_when(p_val<0.001~"***",p_val<0.01~"**",p_val<0.05~"*",TRUE~"n.s."),
    Dir   = case_when(sig=="n.s." ~ "No sig. trend", beta > 0 ~ "Increasing", TRUE ~ "Decreasing"),
    brand = factor(brand, levels=brand_order)
  )

ggplot(brand_trends, aes(x=brand, y=pct, fill=brand)) +
  geom_col(width=0.65, alpha=0.9) +
  geom_text(aes(label=paste0(ifelse(pct>=0,"+",""),pct,"%\n",sig)),
            vjust=-0.3, size=3.8, lineheight=1.2,
            color=GREY, family="Times New Roman") +
  geom_hline(yintercept=0, linetype="dashed", color="grey40") +
  scale_fill_manual(values=brand_colors) +
  scale_y_continuous(labels=function(x) paste0(x,"%"),
                     expand=expansion(mult=c(0,0.22))) +
  labs(x=NULL, y="Annual Recall Growth Rate (IRR-1)x100%") +
  theme_minimal(base_size=13, base_family="Times New Roman") +
  theme(
    legend.position    = "none",
    panel.grid.major.x = element_blank(),
    plot.background    = element_rect(fill="white",color=NA)
  )

Figure 2: Annual recall growth rate by manufacturer. Bars show (IRR-1)x100%, the estimated annual percentage change in recall frequency. Stars mark significance (*** p<0.001, ** p<0.01, * p<0.05; n.s. = not significant). GM and Nissan show no statistically significant trend.

brand_trends %>%
  dplyr::mutate(
    beta_r = round(beta, 5),
    IRR_yr = round(exp(beta), 4),   # year is in calendar years, so exp(beta) is the annual IRR
    p_r    = round(p_val, 4)
  ) %>%
  dplyr::select(
    Brand            = brand,
    `β̂ (per year)`   = beta_r,
    `IRR (annual)`   = IRR_yr,
    `p-value`        = p_r,
    Sig              = sig,
    Trend            = Dir
  ) %>%
  kable(
    caption = "Table 4: NB regression on monthly recall counts - trend by brand (2000-2025)",
    align   = "lrrrcl"
  ) %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width=FALSE) %>%
  row_spec(0, bold=TRUE, background=NAVY, color="white") %>%
  footnote(general = "β̂ = change in log expected monthly recalls per calendar year | IRR(annual) = exp(β̂) | *** p<0.001  ** p<0.01  * p<0.05  n.s. = not significant")

Table 4: NB regression on monthly recall counts - trend by brand (2000-2025)
Brand	β̂ (per year)	IRR (annual)	p-value	Sig	Trend
Ford	0.04239	1.0433	0.0000	***	Increasing
GM	-0.00123	0.9988	0.8212	n.s.	No sig. trend
Honda	0.01619	1.0163	0.0223	*	Increasing
Nissan	0.00866	1.0087	0.2391	n.s.	No sig. trend
Stellantis	0.02168	1.0219	0.0000	***	Increasing
Toyota	0.01879	1.0190	0.0100	**	Increasing
Note:
β̂ = change in log expected monthly recalls per calendar year | IRR(annual) = exp(β̂) | *** p<0.001 ** p<0.01 * p<0.05 n.s. = not significant

Four of six manufacturers (Ford, Honda, Stellantis, and Toyota) show a statistically significant upward trend. GM (β̂ = -0.00123, p = 0.821) and Nissan (β̂ = 0.00866, p = 0.239) are the exceptions. We attribute GM's flat trend to its historically elevated recall baseline following the 2014 ignition-switch crisis, a structural shock that compressed subsequent relative growth.

Ford exhibits the highest estimated growth rate, approximately +4.3% per year (annual IRR = 1.043).
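In these brand-level models, year enters in calendar years. exp(β̂) for that term is therefore already an annual incidence rate ratio, whereas exp(12β̂) would compound the effect over twelve years. The simulation sketch below uses hypothetical data with an assumed 5% annual growth rate. It shows that glm.nb() on monthly counts recovers roughly that annual IRR.

```r
library(MASS)   # glm.nb(), rnegbin()

set.seed(7)
d <- expand.grid(month = 1:12, year = 2000:2024)        # one row per month
growth <- 0.05                                          # assumed 5% annual growth
d$n <- rnegbin(nrow(d), mu = exp(1 + log(1 + growth) * (d$year - 2000)), theta = 5)

m <- glm.nb(n ~ year + month, data = d)                 # same form as the brand models
b <- coef(m)[["year"]]
c(annual_IRR = exp(b),                                  # close to 1.05
  twelve_year_ratio = exp(12 * b))                      # growth compounded over 12 years
```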

3.3 Seasonality Analysis

month_lbls <- c("Jan","Feb","Mar","Apr","May","Jun",
                 "Jul","Aug","Sep","Oct","Nov","Dec")

season_list <- recalls %>%
  dplyr::mutate(month_f = factor(month, levels=1:12, labels=month_lbls)) %>%
  # Note: count() drops brand-year-months with zero recalls; tidyr::complete() could restore them
  dplyr::count(brand, year, month_f, name="n_recalls") %>%
  group_by(brand) %>%
  group_modify(~ {
    m <- tryCatch(MASS::glm.nb(n_recalls ~ year + month_f, data=.x), error=function(e) NULL)
    if (is.null(m)) return(tibble())
    ct <- as.data.frame(summary(m)$coef) %>%
      tibble::rownames_to_column("term") %>%
      dplyr::filter(str_detect(term,"month_f")) %>%
      dplyr::mutate(
        month = str_remove(term,"month_f"),
        IRR   = round(exp(Estimate),2),
        sig   = case_when(
          `Pr(>|z|)`<0.001~"***",`Pr(>|z|)`<0.01~"**",
          `Pr(>|z|)`<0.05~"*", `Pr(>|z|)`<0.10~".",TRUE~""
        )
      )
    bind_rows(tibble(month="Jan",IRR=1.00,sig="(ref)"), ct)
  }) %>%
  ungroup() %>%
  dplyr::mutate(
    label = paste0(IRR, ifelse(sig=="(ref)","",sig)),
    month = factor(month, levels=month_lbls)
  ) %>%
  dplyr::select(brand, month, label) %>%
  pivot_wider(names_from=month, values_from=label) %>%
  dplyr::arrange(match(brand, brand_order))

season_list %>%
  kable(
    caption = "Table 5: Seasonality by brand - monthly IRR vs January baseline (2000-2025)",
    align   = c("l",rep("c",12))
  ) %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width=TRUE, font_size=11) %>%
  row_spec(0, bold=TRUE, background=NAVY, color="white") %>%
  footnote(general = "IRR = Incidence Rate Ratio relative to January (reference month). *** p<0.001  ** p<0.01  * p<0.05  . p<0.10  (ref) = baseline")

Table 5: Seasonality by brand - monthly IRR vs January baseline (2000-2025)
brand	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec
Ford	1	1.81*	2.19***	1.46	1.75*	1.91**	1.37	1.84**	1.7*	1.84**	1.83**	1.74*
GM	1	1.48.	1.68*	1.5.	1.4	1.96**	1.57*	1.41	1.75*	1.46.	1.72*	1.4
Stellantis	1	1.35	1.59*	1.23	1.7*	1.51.	1.66*	1.43.	1.68*	1.68*	1.48.	1.62*
Toyota	1	1.19	0.81	1.02	0.83	0.95	0.94	0.82	0.87	0.96	0.95	0.83
Honda	1	0.87	0.91	0.82	0.78	1.06	0.9	0.78	1.08	0.88	1.19	0.98
Nissan	1	1.43	1.62.	1.14	1.06	1.28	1.24	0.95	1.2	1.27	1.16	1.1
Note:
IRR = Incidence Rate Ratio relative to January (reference month). *** p<0.001 ** p<0.01 * p<0.05 . p<0.10 (ref) = baseline

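Seasonal recall patterns are strongest and most statistically significant for Ford and Stellantis. For these brands, recall activity is significantly elevated relative to January in many months, most notably March (Ford IRR = 2.19, p < 0.001; Stellantis IRR = 1.59, p < 0.05) and October (IRR = 1.84 and 1.68, respectively). These months plausibly correspond to regulatory enforcement windows and new-model-year production ramp-up cycles that surface design-related defects.

GM shows moderate, intermittent seasonality (e.g., June IRR = 1.96, p < 0.01). Nissan's only departure from January is a marginal March effect (IRR = 1.62, p < 0.10).

Honda and Toyota exhibit largely flat monthly patterns with no significant month effects. This possibly reflects more uniform year-round production schedules and a lower dependence on U.S. domestic model-year conventions.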
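Table 5 reports each month's IRR as exp(β̂) on a January-referenced month factor. A natural complement, not part of the original analysis, is a likelihood-ratio test of whether the eleven month effects are jointly zero. The sketch below uses simulated counts with assumed seasonal multipliers. The objects m0 and m1 and the multipliers are hypothetical.

```r
library(MASS)   # glm.nb(), rnegbin()

set.seed(11)
d <- expand.grid(month = 1:12, year = 2000:2024)
season <- c(1, 1.2, 1.6, 1.2, 1.1, 1.3, 1.1, 1.2, 1.2, 1.3, 1.3, 1.2)  # assumed multipliers
d$n <- rnegbin(nrow(d), mu = 4 * season[d$month], theta = 5)
d$month_f <- factor(month.abb[d$month], levels = month.abb)         # January = reference

m0 <- glm.nb(n ~ year, data = d)              # no seasonality
m1 <- glm.nb(n ~ year + month_f, data = d)    # month dummies, as in Table 5

irr <- exp(coef(m1)[grep("^month_f", names(coef(m1)))])   # IRR vs January
lr  <- m1$twologlik - m0$twologlik                         # likelihood-ratio statistic
p   <- pchisq(lr, df = 11, lower.tail = FALSE)             # 11 month restrictions
round(irr, 2)
p
```

A joint test of this kind guards against reading too much into isolated significant months across 11 comparisons per brand.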

4 RQ2: Financial Impact of Recall Severity

4.1 Research Design

Estimating the causal effect of recalls on stock returns requires addressing three identification challenges.

Market-wide movements on any trading day affect all stocks simultaneously and must be controlled to isolate firm-specific effects.
Manufacturers differ persistently in baseline risk profiles (Ford’s beta, earnings volatility, and market position differ structurally from Toyota’s), and these differences must be accounted for.
Common temporal shocks (financial crises, pandemic disruptions, interest rate regimes) must be absorbed.

Our TWFE model addresses all three challenges in a single specification by including both brand fixed effects \(\gamma_i\) and date fixed effects \(\delta_t\). Crucially, the 6,539 date fixed effects absorb every daily shock common to all six stocks. This removes the need for an explicit market-return regressor or CAPM adjustment. S&P 500 movements are absorbed to the extent that the six brands respond to them similarly.

The identifying variation is within-brand and within-day. Each \(\theta_{b,s}\) measures how brand \(b\)’s stock performed on a day with a severity-\(s\) recall, relative to that brand’s typical non-recall day, after removing the return component shared by all six stocks on that day.
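This argument can be checked on simulated data, with all values hypothetical. The sketch builds a balanced six-brand panel in which every brand loads equally on a common daily market shock. It then compares a naive regression of returns on a recall dummy with one that adds brand and date fixed effects. Estimation here uses base R lm() rather than fixest.

```r
set.seed(3)
p <- expand.grid(brand = paste0("B", 1:6), date = 1:400, stringsAsFactors = FALSE)
market   <- rnorm(400, sd = 0.02)                  # common daily shock, equal loading
p$recall <- rbinom(nrow(p), 1, 0.05)               # hypothetical recall days
theta    <- -0.004                                  # assumed true recall effect
p$ret    <- market[p$date] + theta * p$recall + rnorm(nrow(p), sd = 0.005)

naive <- lm(ret ~ recall, data = p)
twfe  <- lm(ret ~ recall + factor(brand) + factor(date), data = p)

recall_se <- function(m) sqrt(diag(vcov(m)))[["recall"]]
rbind(estimate = c(naive = coef(naive)[["recall"]], twfe = coef(twfe)[["recall"]]),
      std_err  = c(naive = recall_se(naive),        twfe = recall_se(twfe)))
```

The date dummies absorb the common shock, so the fixed-effects estimate is several times more precise. If brands loaded on the market with different betas, the date effects would absorb only the common component. This is why the equal-loading caveat matters.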

4.2 Model Specification

\[R_{i,t} = \sum_{b \in B} \sum_{s \in S} \theta_{b,s} \cdot \mathbf{1}\{Brand_i = b\} \cdot \mathbf{1}\{Severity_{i,t} = s\} + \gamma_i + \delta_t + \varepsilon_{i,t}\]

where \(S = \{\text{Minor, Moderate, Severe, Critical}\}\) and No Recall is the omitted baseline for each brand, so that:

\[\theta_{b,s} = \mathbb{E}[R_{b,t} - \delta_t \mid \text{Severity}_{b,t}=s] - \mathbb{E}[R_{b,t} - \delta_t \mid \text{No Recall}]\]

Standard errors are clustered at the brand level. With only six clusters, cluster-robust inference is imprecise, so p-values should be read with caution. The fixest implementation uses the i(brand, severity_day, ref2 = "No Recall") interaction syntax, which directly yields \(\hat{\theta}_{b,s}\) for each of the 24 brand-severity cells.
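The interaction term can be understood as one indicator per brand-severity cell, with "No Recall" dropped so that each coefficient is measured against that brand's own non-recall days. The base R sketch below builds such indicators for a toy panel of two brands. It illustrates the design only and is not how fixest constructs its model matrix internally. All data and names are hypothetical.

```r
sev_levels_demo <- c("No Recall", "Minor", "Moderate", "Severe", "Critical")
sev   <- factor(c("No Recall", "Minor", "Critical", "No Recall", "Minor", "No Recall"),
                levels = sev_levels_demo)
brand <- factor(c("Ford", "Ford", "Ford", "Honda", "Honda", "Honda"))

# One indicator per (brand, severity) cell; "No Recall" is dropped as the baseline
cells <- expand.grid(s = sev_levels_demo[-1], b = levels(brand), stringsAsFactors = FALSE)
X <- sapply(seq_len(nrow(cells)), function(j)
  as.integer(brand == cells$b[j] & sev == cells$s[j]))
colnames(X) <- paste0(cells$b, ":", cells$s)

dim(X)       # 6 rows; 2 brands x 4 severities = 8 columns
colSums(X)   # all-zero columns are empty cells with no estimable effect
```

With six brands and four severities, the same construction gives the 24 columns estimated in Table 6.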

4.3 Estimation and Results

library(fixest)   # feols() with i() interactions
library(tidyr)    # replace_na()

# Load the brand-day panel
panel_raw <- read.csv("BANL_panel_final.csv", stringsAsFactors = FALSE)

# Safety: rename 'return' -> 'ret' if needed
if ("return" %in% names(panel_raw)) {
  names(panel_raw)[names(panel_raw) == "return"] <- "ret"
}

# Confirm required columns exist (recall_day is used to flag non-recall days)
stopifnot(all(c("ret", "date", "brand", "severity", "recall_day") %in% names(panel_raw)))

panel_df <- panel_raw %>%
  dplyr::mutate(
    date         = as.Date(as.character(date), format = "%d-%b-%Y"),
    brand        = as.character(brand),
    ret          = as.numeric(ret),
    recall_day   = as.integer(recall_day),
    severity_day = str_trim(as.character(severity)),
    severity_day = case_when(
      is.na(severity_day) | severity_day == "" | recall_day == 0 ~ "No Recall",
      str_to_lower(severity_day) == "minor"                      ~ "Minor",
      str_to_lower(severity_day) == "moderate"                   ~ "Moderate",
      str_to_lower(severity_day) == "severe"                     ~ "Severe",
      str_to_lower(severity_day) == "critical"                   ~ "Critical",
      TRUE ~ severity_day
    )
  ) %>%
  dplyr::filter(!is.na(date), !is.na(brand), !is.na(ret), is.finite(ret)) %>%
  dplyr::mutate(
    brand        = factor(brand),
    severity_day = factor(severity_day,
                          levels = c("No Recall","Minor","Moderate","Severe","Critical"))
  )

# Sanity checks
if (nrow(panel_df) == 0) {
  stop(paste0(
    "Panel has 0 rows. Check that BANL_panel_final.csv is in your working directory.\n",
    "Run getwd() to see where R is looking.\n",
    "Run list.files() to see what files are present."
  ))
}

if (nrow(panel_df) < 10000) {
  warning(sprintf(
    "Panel has only %d rows; expected ~33,131. Date parsing may have failed.",
    nrow(panel_df)
  ))
}

cat(sprintf(
  "Panel loaded: %s rows | %d brands | %s to %s\n",
  format(nrow(panel_df), big.mark=","),
  n_distinct(panel_df$brand),
  format(min(panel_df$date), "%Y-%m-%d"),
  format(max(panel_df$date), "%Y-%m-%d")
))

model <- feols(
  ret ~ i(brand, severity_day, ref2 = "No Recall") | brand + date,
  data    = panel_df,
  cluster = ~brand
)

results <- broom::tidy(model, conf.int=TRUE) %>%
  dplyr::mutate(
    effect_pct = round(estimate  * 100, 3),
    se_pct     = round(std.error * 100, 3),
    ci_lo      = round(conf.low  * 100, 3),
    ci_hi      = round(conf.high * 100, 3),
    brand      = str_extract(term, "(?<=brand::).*?(?=:severity_day)"),
    severity   = str_extract(term, "(?<=severity_day::).*"),
    sig = case_when(
      p.value<0.001~"***", p.value<0.01~"**",
      p.value<0.05~"*",    p.value<0.10~".", TRUE~""
    )
  )

recall_counts <- panel_df %>%
  dplyr::filter(severity_day != "No Recall") %>%
  dplyr::count(brand, severity_day, name="n_days") %>%
  dplyr::mutate(
    brand    = as.character(brand),
    severity = as.character(severity_day)
  ) %>%
  dplyr::select(brand, severity, n_days)

results <- results %>%
  left_join(recall_counts, by=c("brand","severity")) %>%
  dplyr::mutate(
    n_days   = replace_na(n_days, 0L),
    severity = factor(severity, levels=sev_levels),
    brand    = factor(brand,    levels=brand_order)
  ) %>%
  dplyr::arrange(brand, severity)

results %>%
  dplyr::mutate(`95% CI` = paste0("[",ci_lo,", ",ci_hi,"]")) %>%
  dplyr::select(
    Brand      = brand,
    Severity   = severity,
    `N (days)` = n_days,
    `θ̂ (%)`    = effect_pct,
    `SE (%)`   = se_pct,
    `95% CI`,
    Sig        = sig
  ) %>%
  kable(
    caption = "Table 6: Within-brand recall severity shock on daily stock returns",
    align   = "llrrrrr"
  ) %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width=FALSE) %>%
  row_spec(0, bold=TRUE, background=NAVY, color="white") %>%
  footnote(
    general = paste0(
      "theta_{b,s} = E[R_{b,t}|Sev=s] - E[R_{b,t}|No Recall]. ",
      "Model: ret ~ brand x severity + brand FE + date FE. ",
      "SE clustered at brand level. Obs: ",
      format(nobs(model), big.mark=","),
      " | Brand FE: 6 | Date FE: ",
      format(length(unique(panel_df$date)), big.mark=",")
    )
  )

Table 6: Within-brand recall severity shock on daily stock returns
Brand	Severity	N (days)	θ̂ (%)	SE (%)	95% CI	Sig
Ford	Minor	147	-0.023	0.044	[-0.137, 0.092]
Ford	Moderate	173	0.280	0.086	[0.059, 0.5]	*
Ford	Severe	148	0.119	0.054	[-0.018, 0.257]	.
Ford	Critical	24	-0.291	0.129	[-0.623, 0.04]	.
GM	Minor	109	0.047	0.057	[-0.1, 0.193]
GM	Moderate	147	-0.103	0.090	[-0.334, 0.129]
GM	Severe	76	0.319	0.170	[-0.119, 0.757]
GM	Critical	17	0.017	0.199	[-0.493, 0.528]
Stellantis	Minor	102	-0.216	0.101	[-0.477, 0.044]	.
Stellantis	Moderate	143	-0.116	0.058	[-0.267, 0.034]
Stellantis	Severe	62	0.045	0.102	[-0.218, 0.308]
Stellantis	Critical	10	1.508	0.118	[1.205, 1.812]	***
Toyota	Minor	129	0.030	0.041	[-0.074, 0.135]
Toyota	Moderate	114	-0.188	0.120	[-0.496, 0.121]
Toyota	Severe	60	0.206	0.102	[-0.055, 0.467]	.
Toyota	Critical	15	-0.649	0.137	[-1, -0.298]	**
Honda	Minor	101	0.050	0.114	[-0.243, 0.342]
Honda	Moderate	137	-0.012	0.038	[-0.11, 0.086]
Honda	Severe	57	0.271	0.144	[-0.101, 0.642]
Honda	Critical	17	-0.484	0.186	[-0.964, -0.005]	*
Nissan	Minor	131	-0.188	0.030	[-0.266, -0.11]	**
Nissan	Moderate	114	-0.149	0.039	[-0.249, -0.048]	*
Nissan	Severe	61	0.051	0.112	[-0.236, 0.338]
Nissan	Critical	12	0.205	0.096	[-0.042, 0.453]	.
Note:
theta_{b,s} = E[R_{b,t}|Sev=s] - E[R_{b,t}|No Recall]. Model: ret ~ brand x severity + brand FE + date FE. SE clustered at brand level. Obs: 33,131 | Brand FE: 6 | Date FE: 6,539
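The intervals in Table 6 are noticeably wider than ±1.96 SE. Reconstructing one from the rounded Ford × Moderate entries suggests a t distribution with G − 1 = 5 degrees of freedom, where G = 6 brand clusters. We understand this to be fixest's default small-sample treatment for clustered inference, but that attribution is an assumption to confirm in the fixest documentation.

```r
# Reconstruct a Table 6 interval from its rounded estimate and SE (percentage points)
est_pp <- 0.280   # Ford x Moderate
se_pp  <- 0.086
G      <- 6       # number of brand clusters

ci_t <- est_pp + c(-1, 1) * qt(0.975, df = G - 1) * se_pp   # t with G - 1 df
ci_z <- est_pp + c(-1, 1) * qnorm(0.975) * se_pp            # normal approximation
round(rbind(ci_t, ci_z), 3)
```

The t-based interval reproduces the reported [0.059, 0.5] up to rounding. The normal approximation would be markedly narrower.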

4.4 Heatmap - Poster Visual

heat_df <- results %>%
  dplyr::mutate(
    brand    = factor(brand, levels=rev(brand_order)),
    severity = factor(severity, levels=sev_levels),
    label    = paste0(sprintf("%.3f",effect_pct), sig,
                      "\n(", sprintf("%.3f",se_pct), ")")
  )

ggplot(heat_df, aes(x=severity, y=brand, fill=effect_pct)) +
  geom_tile(color="black", linewidth=0.7, width=0.94, height=0.94) +
  geom_text(aes(label=label), size=3.8, lineheight=1.0, family="Times New Roman") +
  scale_fill_gradient2(
    low="#C00000", mid="#BFBFBF", high="#00A651", midpoint=0,
    name="Beta\n(pp)"
  ) +
  labs(
    title    = "Recall Severity Effects on Daily Stock Returns",
    subtitle = "Each tile: β̂ and SE | Omitted category = No Recall | Brand + Date FE",
    x        = "Recall Severity",
    y        = NULL,
    caption  = "Model: ret ~ brand x severity + brand FE + date FE"
  ) +
  theme_minimal(base_size=13, base_family="Times New Roman") +
  theme(
    panel.grid      = element_blank(),
    axis.text       = element_text(face="bold"),
    plot.title      = element_text(face="bold", color=NAVY),
    plot.background = element_rect(fill="white",color=NA)
  )

Figure 3: Recall Severity Effects on Daily Stock Returns. Each tile shows the beta coefficient with its significance code and, in parentheses, its standard error. Omitted category = No Recall.

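In Figure 3, green tiles indicate a positive coefficient, meaning the brand's returns on recall days exceed its no-recall baseline (a neutral-to-positive market reaction); red tiles indicate a negative coefficient (the market penalizes the recall); and grey tiles indicate estimates close to zero.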

4.5 Interpretation

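The results support one consistent finding: recall severity does not systematically generate abnormal same-day stock returns. All 24 \(\hat{\theta}_{b,s}\) estimates lie within roughly ±1.5 percentage points (the largest, Stellantis x Critical, is +1.508 pp), and 23 of 24 lie within ±0.7 percentage points. Six cells have 95% confidence intervals that exclude zero (Ford x Moderate, Stellantis x Critical, Toyota x Critical, Honda x Critical, Nissan x Minor, and Nissan x Moderate), but they share neither a sign nor a severity level, and with only six brand clusters their standard errors are likely understated (see Section 7).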

Several economically informative patterns nonetheless emerge.

Ford shows essentially no response to Minor recalls (-0.023 pp); its coefficients are positive for Moderate (+0.280 pp, p < 0.05) and Severe (+0.119 pp, p < 0.10) recalls and negative only for Critical recalls (-0.291 pp, p < 0.10). This is consistent with the market already anticipating much of Ford's routine recall activity, given its recall history, and penalizing only the most serious events.
Honda's Critical coefficient is negative (-0.484 pp, p < 0.05), as is Toyota's (-0.649 pp, p < 0.01), consistent with a quality-brand penalty hypothesis: a critical safety defect represents a larger information shock for a manufacturer with a strong reliability reputation than for brands with weaker safety profiles.
Toyota's Minor coefficient is close to zero (+0.030 pp, not significant), so the data offer little support for a quality-signaling effect in which investors read Toyota's proactive issuance of minor recalls as evidence of rigorous internal quality management.
Stellantis exhibits a large positive Critical coefficient (+1.508 pp, p < 0.001), but it rests on only 10 critical recall days and a six-cluster standard error, so it should be treated as exploratory.
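One way to see the sign heterogeneity is to tabulate Table 6 by severity level. The sketch below re-enters the 24 point estimates from Table 6 by hand (`tab6` and `summ` are hypothetical names) and reports, for each severity level, the mean coefficient across brands and how many brands have positive versus negative estimates. It deliberately ignores sampling uncertainty.

```r
# Table 6 point estimates (percentage points), entered by hand for illustration
tab6 <- data.frame(
  brand    = rep(c("Ford", "GM", "Stellantis", "Toyota", "Honda", "Nissan"), each = 4),
  severity = factor(rep(c("Minor", "Moderate", "Severe", "Critical"), times = 6),
                    levels = c("Minor", "Moderate", "Severe", "Critical")),
  est      = c(-0.023,  0.280, 0.119, -0.291,
                0.047, -0.103, 0.319,  0.017,
               -0.216, -0.116, 0.045,  1.508,
                0.030, -0.188, 0.206, -0.649,
                0.050, -0.012, 0.271, -0.484,
               -0.188, -0.149, 0.051,  0.205)
)

# For each severity level: mean estimate across brands and sign counts
summ <- do.call(rbind, lapply(split(tab6, tab6$severity), function(d) {
  data.frame(severity = as.character(d$severity[1]),
             mean_est = mean(d$est),
             n_pos    = sum(d$est > 0),
             n_neg    = sum(d$est < 0))
}))
print(summ, digits = 3, row.names = FALSE)
```

Minor and Critical recalls split evenly, three brands positive and three negative, and Moderate recalls are negative for five of six brands. Severe recalls are positive for all six brands, although none of those six estimates has a 95% interval that excludes zero.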

The general absence of significant effects is consistent with the efficient markets hypothesis: if recall risk is persistent and observable at the firm level, investors continuously incorporate it into valuations rather than reacting discretely to individual announcements.

5 Research Poster

The competition poster presenting these findings is reproduced below for reference. All numerical results displayed on the poster are derived from the code and models presented in this document.

[Figure: competition poster, BANL COMP_FINAL_1.png]

6 Key Findings Summary

Table 7: Key Findings & Statistical Evidence

Technology Recalls Outpacing Mechanical

Technology-related recalls increased significantly faster than mechanical recalls after the onset of the software era (post-2013).

NB Year x Electrical: beta = 0.0558 (p < 0.001); Electrical volume 2.3x higher post-2013 vs 1.4x mechanical; Mann-Whitney p < 0.001

Recall Frequency Trending Upward

Every automaker in the sample except GM shows a statistically significant upward trend in annual recall frequency.

5/6 brands: IRR_annual > 1.0, p < 0.05. GM: beta = 0.00015, p = 0.776 (not significant).

Seasonal Patterns Vary by Brand

Seasonal recall patterns are statistically significant for Ford, Stellantis, and Nissan, peaking in spring and fall; Honda and Toyota show no significant seasonality.

NB month-FE coefficients; March, April, October, November significant at p < 0.05 for Ford and Stellantis.

Severity Does Not Drive Abnormal Returns

Recall severity does not systematically translate into company-specific abnormal stock returns at the day-of-announcement level.

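All theta_{b,s} within about +/-1.5 pp (largest: Stellantis x Critical = +1.508 pp, 10 recall days); the few significant cells differ in sign (e.g., Toyota x Critical = -0.649 pp, Honda x Critical = -0.484 pp). TWFE model, brand + date FE, 33,131 obs.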

7 Limitations

Sample and coverage constraints

The analysis is restricted to the six largest light-vehicle manufacturers in the U.S. market by sales volume. Findings may not generalize to smaller OEMs, luxury-only brands, or EV-first manufacturers entering the market after 2020. Hyundai-Kia, despite its growing recall volume, was excluded from the financial analysis because its primary listing on the Korea Exchange makes its daily return data non-comparable in a U.S. panel. General Motors' pre-2010 recall history is similarly excluded because of the structural discontinuity introduced by its Chapter 11 reorganization and subsequent re-IPO.

Cluster-robust inference with few clusters

The TWFE model clusters standard errors at the brand level. With only six clusters, well below the 30 to 50 typically cited as the minimum for reliable cluster-robust inference, standard errors on individual \(\theta_{b,s}\) are likely downward-biased, inflating apparent precision. This concern is most acute for Critical severity cells, where per-brand observation counts are as low as ten. Researchers seeking confirmatory inference should consider wild cluster bootstrap procedures or clustering at the date level (6,539 dates), which provides far more clusters.
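The wild cluster bootstrap mentioned above can be illustrated without extra packages. The sketch below is deliberately simplified and rests on several assumptions: one regressor plus an intercept, no fixed effects, simulated data with six clusters, CR1 cluster-robust standard errors, and Webb six-point weights (often recommended when clusters are very few, because Rademacher weights yield only 2^6 = 64 distinct draws with six clusters). `ols_cr1()` and `wcr_boot_p()` are hypothetical helpers written for illustration, not functions from fixest or any other package.

```r
# CR1 cluster-robust OLS: coefficients and standard errors
ols_cr1 <- function(X, y, cl) {
  XtX_inv <- solve(crossprod(X))
  b       <- XtX_inv %*% crossprod(X, y)
  u       <- as.vector(y - X %*% b)
  meat    <- matrix(0, ncol(X), ncol(X))
  for (idx in split(seq_along(u), cl)) {
    s    <- crossprod(X[idx, , drop = FALSE], u[idx])  # K x 1 cluster score
    meat <- meat + s %*% t(s)
  }
  G <- length(unique(cl)); N <- nrow(X); K <- ncol(X)
  V <- (G / (G - 1)) * ((N - 1) / (N - K)) * XtX_inv %*% meat %*% XtX_inv
  list(coef = as.vector(b), se = sqrt(diag(V)))
}

# Wild cluster restricted (WCR) bootstrap p-value for H0: coefficient k = 0
wcr_boot_p <- function(X, y, cl, k = 2, B = 999) {
  fit  <- ols_cr1(X, y, cl)
  t0   <- fit$coef[k] / fit$se[k]
  # Restricted model imposes the null by dropping column k
  Xr   <- X[, -k, drop = FALSE]
  br   <- solve(crossprod(Xr), crossprod(Xr, y))
  yhat <- as.vector(Xr %*% br)
  ur   <- as.vector(y - yhat)
  webb   <- c(-sqrt(1.5), -1, -sqrt(0.5), sqrt(0.5), 1, sqrt(1.5))
  groups <- unique(cl)
  t_star <- numeric(B)
  for (i in seq_len(B)) {
    v         <- sample(webb, length(groups), replace = TRUE)  # one weight per cluster
    ystar     <- yhat + ur * v[match(cl, groups)]
    fb        <- ols_cr1(X, ystar, cl)
    t_star[i] <- fb$coef[k] / fb$se[k]
  }
  mean(abs(t_star) >= abs(t0))  # symmetric two-sided p-value
}

# Simulated data: 6 clusters ("brands"), true slope = 0, cluster-level shocks
set.seed(1)
G  <- 6; n <- 200
cl <- rep(seq_len(G), each = n)
x  <- rnorm(G * n) + rep(rnorm(G), each = n)
y  <- rep(rnorm(G), each = n) + rnorm(G * n)
X  <- cbind(1, x)

fit    <- ols_cr1(X, y, cl)
p_t5   <- 2 * pt(-abs(fit$coef[2] / fit$se[2]), df = G - 1)
p_boot <- wcr_boot_p(X, y, cl)
c(p_t5 = p_t5, p_wcr = p_boot)
```

For the actual TWFE specification, the same resampling would be applied after partialling out the brand and date effects; this toy version only shows the mechanics of imposing the null and reweighting residuals cluster by cluster.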

Same-day measurement horizon

The model captures same-day stock return responses only. If markets process recall information gradually, or if institutional investors need additional time to assess recall scope and liability, the true financial impact may be distributed over a multi-day event window not captured here. Additionally, recalls announced outside trading hours (after market close or on weekends) are attributed to the next trading day, which introduces measurement noise.
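Both issues raised in this limitation, assigning off-hours announcements to a trading day and widening the event window, can be sketched in a few lines of base R. The sketch assumes a market-adjusted return model (abnormal return = stock return minus a market index return), a known after-close flag for each announcement, and illustrative numbers rather than study data; `event_index()` and `car_window()` are hypothetical helpers, not the document's code.

```r
# Hypothetical helper: index of the first trading day on which an announcement
# can be priced. After-close announcements roll to the next trading day;
# weekend/holiday announcements roll to the next available trading day.
event_index <- function(announce_date, after_close, trading_dates) {
  eligible <- if (after_close) trading_dates > announce_date else trading_dates >= announce_date
  which(eligible)[1]  # NA if no eligible trading day exists
}

# Hypothetical helper: cumulative abnormal return over event days [0, k],
# ASSUMING a market-adjusted model (abnormal = stock return - market return).
car_window <- function(ret, mkt, idx, k = 1) {
  if (is.na(idx) || idx + k > length(ret)) return(NA_real_)
  win <- idx:(idx + k)
  sum(ret[win] - mkt[win])
}

# Illustrative data (not from the study): five trading days in January 2024
trading_dates <- as.Date(c("2024-01-02", "2024-01-03", "2024-01-04",
                           "2024-01-05", "2024-01-08"))
ret <- c(0.5, -1.0, 0.2, 0.3, -0.4)  # daily stock returns (%)
mkt <- c(0.1, -0.2, 0.0, 0.1, 0.0)   # daily market returns (%)

i_after <- event_index(as.Date("2024-01-03"), TRUE,  trading_dates)  # Wed, after close
i_sat   <- event_index(as.Date("2024-01-06"), FALSE, trading_dates)  # Saturday
car_window(ret, mkt, i_after, k = 1)  # CAR over [0, +1]
car_window(ret, mkt, i_sat,   k = 1)  # window runs past the data, so NA
```

Setting k = 0 gives a same-day measure; larger values of k probe whether the response is spread over several trading days.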

8 Conclusion

This study examined 25 years of NHTSA vehicle recall data linked to daily stock returns across six major U.S. automotive manufacturers. The analysis delivers two clear, policy-relevant conclusions.

The first is structural: automotive recalls have become increasingly technology-driven. Electrical and software-related recalls grew at approximately double the rate of mechanical recalls from 2013 onward, reflecting the industry's accelerating dependence on embedded electronics, ADAS, and software-defined vehicle architectures. This structural shift has implications for regulatory frameworks, actuarial pricing of recall insurance, and product liability law, all of which were historically calibrated to mechanical failure modes that are more predictable and easier to detect before they fail.
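To make the Year x Electrical coefficient reported in Table 7 (beta = 0.0558) concrete: if the negative binomial model uses a log link, a linear year term, and mechanical recalls as the reference category (assumptions, since the specification is not restated here), then exp(beta) is the ratio of the electrical to the mechanical annual incidence rate ratio, and exp(beta x t) is the cumulative relative change in expected counts over t years. A quick check in R, assuming a 12-year horizon from 2013 to 2025:

```r
beta_year_elec <- 0.0558                  # Year x Electrical coefficient (Table 7)
irr_ratio <- exp(beta_year_elec)          # electrical vs. mechanical annual IRR
years     <- 2025 - 2013                  # assumed horizon: 12 years
cum_ratio <- exp(beta_year_elec * years)  # cumulative relative change in expected counts
round(c(annual = irr_ratio, cumulative = cum_ratio), 3)
```

The annual ratio is about 1.057, so under these assumptions electrical recall counts grow roughly 5.7% per year faster than mechanical ones in multiplicative terms, and compounding over 12 years gives a relative factor just under 2.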

The second conclusion concerns financial pricing: recall announcements do not reliably generate abnormal stock returns, even when classified by severity. Brand-specific effects are mostly small, heterogeneous in sign, and, where nominally significant, based on only six clusters and few recall days. This pattern is consistent with financial markets pricing recall risk continuously and prospectively, incorporating it into stock valuations as part of routine operational risk assessment rather than reacting discretely to individual announcements. From an investment perspective, this suggests that recall announcements carry little information at the daily frequency and that systematic strategies based on recall-day return patterns are unlikely to generate consistent alpha.

Future research could examine multi-day event windows, class-action lawsuit filings as a mediating mechanism, or differentiated reactions by recall scope (number of vehicles affected) and manufacturer financial condition. The rapid growth of software-defined vehicles also raises the question of whether OTA (over-the-air) software recalls, which require no physical service visit, are priced differently from physical component recalls, given their fundamentally different remediation costs and consumer burden.

9 References

FactoData. (2025). Car market share in the USA: An overview. https://factodata.com/car-market-share-in-usa-an-overview/
National Highway Traffic Safety Administration. (n.d.). NHTSA datasets and APIs. U.S. Department of Transportation. https://www.nhtsa.gov/nhtsa-datasets-and-apis
National Highway Traffic Safety Administration. (n.d.). Resources related to investigations and recalls. U.S. Department of Transportation. https://www.nhtsa.gov/resources-investigations-recalls
National Highway Traffic Safety Administration. (2020, November). Risk-based processes for safety defect analysis and management of recalls (Report No. DOT HS 812 984). U.S. Department of Transportation. https://www.nhtsa.gov/sites/nhtsa.gov/files/documents/14895_odi_defectsrecallspubdoc_110520-v6a-tag.pdf
National Highway Traffic Safety Administration. (2026). NHTSA recalls by manufacturer [Data set]. U.S. Department of Transportation. https://data.transportation.gov/Automobiles/NHTSA-Recalls-by-Manufacturer/mu99-t4jn
National Highway Traffic Safety Administration, Office of Defects Investigation. (2025). Vehicle safety recall completion rates (Report No. DOT HS 813 687). U.S. Department of Transportation. https://rosap.ntl.bts.gov/view/dot/79374

Reproducibility note: This document was prepared using R 4.6.0 and knitted with rmarkdown. All results are reproducible from BANL.csv and BANL_panel_final.csv. Required packages: conflicted, tidyverse, lubridate, fixest, MASS, strucchange, broom, knitr, kableExtra, scales, ggplot2, base64enc, htmltools.

Financial Impact of NHTSA Recalls on the Automotive Industry

Chavali Jessica; Fernandes Joyston; Jordan Zoriaya; Shaikh Marium
Advisor: Prof. Yaroslav Prokopets, University of New Haven

May 18, 2026
