Inference on many jumps in nonparametric panel regression models

This R markdown file accompanies the paper Inference on many jumps in nonparametric panel regression models. In particular, it illustrates how the R-package hdthreshold (available on CRAN) can be used to test for existence of possible threshold effects in high dimensional nonparametric panel regression models. The following empirical example is taken from section 5.1 of the above paper and is concerned with discontinuities in the news impact curve, i.e., in the relationship between volatility and past stock returns.

Install and load package

As a first step we need to install and load the R package hdthreshold from CRAN:

#install.packages("hdthreshold")
library(hdthreshold)
library(xtable)
library(rdrobust)
library(fdrtool)
library(KernSmooth)

## KernSmooth 2.23 loaded
## Copyright M. P. Wand 1997-2009

Read in data and create plots for stock returns and volatility

Read in the data which covers a subset of 12 stocks out of the entire sample of 500 stocks. The data includes the date, the ticker symbol, volatility and the lagged stock return.

data <- read.csv("sp500_subset.csv")[,-1]

To get an overview of the data, plot the time series of stock returns and volatility for the stock of AT&T, adding the empirical 10% and 90% quantiles as dashed lines to the return plot (Same as the Figure 2 in our paper):

d = data[data$Symbol=="T",]
x = d$Lag.return
y = d$Volatility

par(mar=c(2, 2, 0.5, 0.5))
plot(x~as.Date(d$Date),type="p",xlab="",ylab="",xaxt='n',pch=".")
axis.Date(1, at=seq(min(as.Date(d$Date)), max(as.Date(d$Date)), by="1 year"), format="%Y")
abline(h=quantile(x,0.1),lty=5,lwd=2)
abline(h=quantile(x,0.9),lty=5,lwd=2)

par(mar=c(2, 2, 0.5, 0.5))
plot(y~as.Date(d$Date),type="l",xlab="",ylab="",xaxt='n',lwd=0.5)
axis.Date(1, at=seq(min(as.Date(d$Date)), max(as.Date(d$Date)), by="1 year"), format="%Y")

Model and Test statistic

Consider \(Y_{jt}\) as the output variable which is the volatility of firm \(j\)’s stock at time \(t\) in our example with a total of \(N=12\) stocks (in this subset of stocks, in general we have \(N=499\)) and \(T_j\) is the number of observations of stock \(j\). Variable \(X_{jt}\) is an observed covariate which is the lagged stock return in this case. \[ \begin{align*} Y_{jt}=&h_{j}(X_{jt})+\tau_{j}(X_{jt})\mathbf{1}_{\{X_{jt}\geq c_{0j}\}}+e_{jt},\\ \gamma_j=&\tau_j (c_{0j} ), \end{align*} \] where \(h_{j}(\cdot)\) and \(\tau_j(\cdot)\) are some smooth varying functions, \(\gamma_j\) represents the jump at \(c_{0j}\) (the unknown break location) and \(e_{jt}\)’s represent the error terms. Since \(c_{0j}\) is not known, we consider a grid of possible threshold locations \([c_1,c_{2},\ldots ,c_K]\) for each individual \(j\in \lbrack N]\).

Consider the estimated jump \(\widehat{\gamma}_{j}(c_i)\) at candidate location \(c_i\) as \[\widehat{\gamma}_{j}(c_i)=\sum_{t=1}^{T}w_{jt,b}^{+}(c_i)Y_{jt}-\sum_{t=1}^{T}w_{jt,b}^{-}(c_i)Y_{jt},\] where \(w_{jt,b}^{+}(c_i)\) and \(w_{jt,b}^{-}(c_i)\) are some weights that depend on the covariate \(X_{jt}\).

Consider the feasible test statistic: \[\widehat{I}^C=\max_{1\leq i\leq K}\max_{1\leq j\leq N}(Tb_{j})^{1/2}|\widehat{\gamma}_{j}(c_i)/\widehat{v}_{j}(c_i)|,\] where \(\widehat v_{j}(c_i)\) is an estimator for \[v_{j}^{2}(c_i)=(Tb_{j}) \sum_{t=1}^{T}(w_{jt,b}^{+}(c_i)-w_{jt,b}^{-}(c_i))^{2}\mathrm{Var}\big(e_{jt}|X_{jt}\big).\]

By the results of Theorem 2, we select critical values by taking the \((1-\alpha)\) quantile of the Gaussian random variable \(\max_{1\leq k \leq K}\max_{1\leq j\leq N}|Z_{jk}|\) with \(Z_{jk}\)’s being independent standard normal variables.

Uniform test procedure for existence of threshold effects

We can use the function threshold.test to test the existence of threshold effects in the relationship between volatility and lagged stock returns. We have to specify the response variable, the running variable and the id variable (the stock ticker symbol). Further, we specify a grid for possible threshold locations, the bandwidth and a threshold parameter alpha which determines the only stocks that are significant at level alpha are displayed in the resulting output table.

data$Volatility = data$Volatility*100
test = threshold.test(data=data, response="Volatility", running="Lag.return", id="Symbol", C=c(-0.01,-0.005,0,0.005,0.01), bw=0.026, alternative="two", alpha=0.01)

## Individual  1

## Individual  2

## Individual  3

## Individual  4

## Individual  5

## Individual  6

## Individual  7

## Individual  8

## Individual  9

## Individual  10

## Individual  11

## Individual  12

The output of the function is a list containing the test statistic, the corresponding p-value, the dimension N, the critical values and a table displaying the stocks which have a significant threshold effect at an alpha significance level (Same as Table 1 in our paper).

test

## $I_hat
## [1] 6.793281
## 
## $p_value
## [1] 6.576739e-10
## 
## $N
## [1] 12
## 
## $Critical_values
##      90%      95%      99%    99.9% 
## 3.128926 3.334502 3.763590 4.305315 
## 
## $Table
##      ID   c_0j gamma_hat_j v_hat_j I_hat_j
## 1   ALL -0.010      -0.240   0.043  -5.535
## 2   AXP -0.010      -0.223   0.048  -4.702
## 3  CTAS -0.010      -0.189   0.041  -4.632
## 4  ORLY -0.010      -0.211   0.044  -4.750
## 5   SJM -0.010      -0.191   0.040  -4.742
## 6  NXPI -0.005      -0.529   0.078  -6.793
## 7   HON  0.005       0.136   0.028   4.763
## 8     T  0.010       0.162   0.033   4.857
## 9   MCK  0.010       0.211   0.042   5.000
## 10  TRV  0.010       0.156   0.031   4.978
## 11   WM  0.010       0.181   0.034   5.290

We can also easily export the output of the summary table to latex code:

print(xtable(test$Table,digits=c(0,0,3,3,3,3)),include.rownames=FALSE)

## % latex table generated in R 4.4.1 by xtable 1.8-4 package
## % Thu Jan  9 16:11:49 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrrr}
##   \hline
## ID & c\_0j & gamma\_hat\_j & v\_hat\_j & I\_hat\_j \\ 
##   \hline
## ALL & -0.010 & -0.240 & 0.043 & -5.535 \\ 
##   AXP & -0.010 & -0.223 & 0.048 & -4.702 \\ 
##   CTAS & -0.010 & -0.189 & 0.041 & -4.632 \\ 
##   ORLY & -0.010 & -0.211 & 0.044 & -4.750 \\ 
##   SJM & -0.010 & -0.191 & 0.040 & -4.742 \\ 
##   NXPI & -0.005 & -0.529 & 0.078 & -6.793 \\ 
##   HON & 0.005 & 0.136 & 0.028 & 4.763 \\ 
##   T & 0.010 & 0.162 & 0.033 & 4.857 \\ 
##   MCK & 0.010 & 0.211 & 0.042 & 5.000 \\ 
##   TRV & 0.010 & 0.156 & 0.031 & 4.978 \\ 
##   WM & 0.010 & 0.181 & 0.034 & 5.290 \\ 
##    \hline
## \end{tabular}
## \end{table}

Visulalization of threshold effects

Finally, we can visualize the estimated threshold effect for AT&T which is significant at 1% level at the location c=0.01. As a comparison, we also plot the effect for Akamai which does not have any significant threshold effects for any threshold locations (Same as Figure 3 in our paper) we consider:

rdplot(y,x,masspoints = "off",kernel="uni",h=0.026,c=0.01,p=1,nbins=c(20,20),x.lim=c(-0.12,0.12),y.lim=c(0,0.06),title="AT&T",x.label="LagReturn",y.label="Volatility")

rdplot(data$Volatility[data$Symbol=="AKAM"]/100,data$Lag.return[data$Symbol=="AKAM"],masspoints = "off",kernel="uni",h=0.026,c=0.01,p=1,nbins=c(20,20),x.lim=c(-0.12,0.12),y.lim=c(0,0.06),title="Akamai",x.label="LagReturn",y.label="Volatility")