This R markdown file accompanies the paper Inference on many jumps in
nonparametric panel regression models. In particular, it illustrates how
the R-package hdthreshold (available on CRAN) can be used
to test for existence of possible threshold effects in high dimensional
nonparametric panel regression models. The following empirical example
is taken from section 5.1 of the above paper and is concerned with
discontinuities in the news impact curve, i.e., in the relationship
between volatility and past stock returns.
As a first step we need to install and load the R package
hdthreshold from CRAN:
#install.packages("hdthreshold")
library(hdthreshold)
library(xtable)
library(rdrobust)
library(fdrtool)
library(KernSmooth)
## KernSmooth 2.23 loaded
## Copyright M. P. Wand 1997-2009
Read in the data which covers a subset of 12 stocks out of the entire sample of 500 stocks. The data includes the date, the ticker symbol, volatility and the lagged stock return.
data <- read.csv("sp500_subset.csv")[,-1]
To get an overview of the data, plot the time series of stock returns and volatility for the stock of AT&T, adding the empirical 10% and 90% quantiles as dashed lines to the return plot (Same as the Figure 2 in our paper):
d = data[data$Symbol=="T",]
x = d$Lag.return
y = d$Volatility
par(mar=c(2, 2, 0.5, 0.5))
plot(x~as.Date(d$Date),type="p",xlab="",ylab="",xaxt='n',pch=".")
axis.Date(1, at=seq(min(as.Date(d$Date)), max(as.Date(d$Date)), by="1 year"), format="%Y")
abline(h=quantile(x,0.1),lty=5,lwd=2)
abline(h=quantile(x,0.9),lty=5,lwd=2)
par(mar=c(2, 2, 0.5, 0.5))
plot(y~as.Date(d$Date),type="l",xlab="",ylab="",xaxt='n',lwd=0.5)
axis.Date(1, at=seq(min(as.Date(d$Date)), max(as.Date(d$Date)), by="1 year"), format="%Y")
Consider \(Y_{jt}\) as the output variable which is the volatility of firm \(j\)’s stock at time \(t\) in our example with a total of \(N=12\) stocks (in this subset of stocks, in general we have \(N=499\)) and \(T_j\) is the number of observations of stock \(j\). Variable \(X_{jt}\) is an observed covariate which is the lagged stock return in this case. \[ \begin{align*} Y_{jt}=&h_{j}(X_{jt})+\tau_{j}(X_{jt})\mathbf{1}_{\{X_{jt}\geq c_{0j}\}}+e_{jt},\\ \gamma_j=&\tau_j (c_{0j} ), \end{align*} \] where \(h_{j}(\cdot)\) and \(\tau_j(\cdot)\) are some smooth varying functions, \(\gamma_j\) represents the jump at \(c_{0j}\) (the unknown break location) and \(e_{jt}\)’s represent the error terms. Since \(c_{0j}\) is not known, we consider a grid of possible threshold locations \([c_1,c_{2},\ldots ,c_K]\) for each individual \(j\in \lbrack N]\).
Consider the estimated jump \(\widehat{\gamma}_{j}(c_i)\) at candidate location \(c_i\) as \[\widehat{\gamma}_{j}(c_i)=\sum_{t=1}^{T}w_{jt,b}^{+}(c_i)Y_{jt}-\sum_{t=1}^{T}w_{jt,b}^{-}(c_i)Y_{jt},\] where \(w_{jt,b}^{+}(c_i)\) and \(w_{jt,b}^{-}(c_i)\) are some weights that depend on the covariate \(X_{jt}\).
Consider the feasible test statistic: \[\widehat{I}^C=\max_{1\leq i\leq K}\max_{1\leq j\leq N}(Tb_{j})^{1/2}|\widehat{\gamma}_{j}(c_i)/\widehat{v}_{j}(c_i)|,\] where \(\widehat v_{j}(c_i)\) is an estimator for \[v_{j}^{2}(c_i)=(Tb_{j}) \sum_{t=1}^{T}(w_{jt,b}^{+}(c_i)-w_{jt,b}^{-}(c_i))^{2}\mathrm{Var}\big(e_{jt}|X_{jt}\big).\]
By the results of Theorem 2, we select critical values by taking the \((1-\alpha)\) quantile of the Gaussian random variable \(\max_{1\leq k \leq K}\max_{1\leq j\leq N}|Z_{jk}|\) with \(Z_{jk}\)’s being independent standard normal variables.
We can use the function threshold.test to test the
existence of threshold effects in the relationship between volatility
and lagged stock returns. We have to specify the response variable, the
running variable and the id variable (the stock ticker symbol). Further,
we specify a grid for possible threshold locations, the bandwidth and a
threshold parameter alpha which determines the only stocks that are
significant at level alpha are displayed in the resulting output
table.
data$Volatility = data$Volatility*100
test = threshold.test(data=data, response="Volatility", running="Lag.return", id="Symbol", C=c(-0.01,-0.005,0,0.005,0.01), bw=0.026, alternative="two", alpha=0.01)
## Individual 1
## Individual 2
## Individual 3
## Individual 4
## Individual 5
## Individual 6
## Individual 7
## Individual 8
## Individual 9
## Individual 10
## Individual 11
## Individual 12
The output of the function is a list containing the test statistic, the corresponding p-value, the dimension N, the critical values and a table displaying the stocks which have a significant threshold effect at an alpha significance level (Same as Table 1 in our paper).
test
## $I_hat
## [1] 6.793281
##
## $p_value
## [1] 6.576739e-10
##
## $N
## [1] 12
##
## $Critical_values
## 90% 95% 99% 99.9%
## 3.128926 3.334502 3.763590 4.305315
##
## $Table
## ID c_0j gamma_hat_j v_hat_j I_hat_j
## 1 ALL -0.010 -0.240 0.043 -5.535
## 2 AXP -0.010 -0.223 0.048 -4.702
## 3 CTAS -0.010 -0.189 0.041 -4.632
## 4 ORLY -0.010 -0.211 0.044 -4.750
## 5 SJM -0.010 -0.191 0.040 -4.742
## 6 NXPI -0.005 -0.529 0.078 -6.793
## 7 HON 0.005 0.136 0.028 4.763
## 8 T 0.010 0.162 0.033 4.857
## 9 MCK 0.010 0.211 0.042 5.000
## 10 TRV 0.010 0.156 0.031 4.978
## 11 WM 0.010 0.181 0.034 5.290
We can also easily export the output of the summary table to latex code:
print(xtable(test$Table,digits=c(0,0,3,3,3,3)),include.rownames=FALSE)
## % latex table generated in R 4.4.1 by xtable 1.8-4 package
## % Thu Jan 9 16:11:49 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrrr}
## \hline
## ID & c\_0j & gamma\_hat\_j & v\_hat\_j & I\_hat\_j \\
## \hline
## ALL & -0.010 & -0.240 & 0.043 & -5.535 \\
## AXP & -0.010 & -0.223 & 0.048 & -4.702 \\
## CTAS & -0.010 & -0.189 & 0.041 & -4.632 \\
## ORLY & -0.010 & -0.211 & 0.044 & -4.750 \\
## SJM & -0.010 & -0.191 & 0.040 & -4.742 \\
## NXPI & -0.005 & -0.529 & 0.078 & -6.793 \\
## HON & 0.005 & 0.136 & 0.028 & 4.763 \\
## T & 0.010 & 0.162 & 0.033 & 4.857 \\
## MCK & 0.010 & 0.211 & 0.042 & 5.000 \\
## TRV & 0.010 & 0.156 & 0.031 & 4.978 \\
## WM & 0.010 & 0.181 & 0.034 & 5.290 \\
## \hline
## \end{tabular}
## \end{table}
Finally, we can visualize the estimated threshold effect for AT&T which is significant at 1% level at the location c=0.01. As a comparison, we also plot the effect for Akamai which does not have any significant threshold effects for any threshold locations (Same as Figure 3 in our paper) we consider:
rdplot(y,x,masspoints = "off",kernel="uni",h=0.026,c=0.01,p=1,nbins=c(20,20),x.lim=c(-0.12,0.12),y.lim=c(0,0.06),title="AT&T",x.label="LagReturn",y.label="Volatility")
rdplot(data$Volatility[data$Symbol=="AKAM"]/100,data$Lag.return[data$Symbol=="AKAM"],masspoints = "off",kernel="uni",h=0.026,c=0.01,p=1,nbins=c(20,20),x.lim=c(-0.12,0.12),y.lim=c(0,0.06),title="Akamai",x.label="LagReturn",y.label="Volatility")