1 Data loading

This report analyzes the direction of intraday 5-minute stock movements (TargetVariable, coded 0/1) and fits a logistic regression on the four Variable74 series (OPEN, HIGH, LOW, LAST_PRICE), entered both as levels and as lag-12 differences. The response is the 13-period lag of the target.
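As a rough illustration of the two transformations named above, the sketch below builds a lag-12 difference and a 13-period lag on toy data. It is plain Python rather than the report's own code. The helper names `lag` and `lag_diff`, the toy values, and the convention that "lag" means "value from k periods earlier" are all assumptions; the source does not state the direction of the shift.

```python
from typing import List, Optional


def lag(series: List[float], k: int) -> List[Optional[float]]:
    """Return a series whose element t holds the value from period t - k.

    The first k positions have no history and are filled with None.
    """
    k = min(k, len(series))
    return [None] * k + list(series[: len(series) - k])


def lag_diff(series: List[float], k: int) -> List[Optional[float]]:
    """Return x[t] - x[t - k]; the first k positions are None."""
    return [
        None if t < k else series[t] - series[t - k]
        for t in range(len(series))
    ]


# Toy data (hypothetical values, not taken from the report).
prices = [float(i * i) for i in range(15)]
target = [i % 2 for i in range(15)]

d12 = lag_diff(prices, 12)  # stand-in for one of the lag-12 difference series
y13 = lag(target, 13)       # stand-in for the 13-period lag of the target

# Rows with None in either series would be dropped before model fitting.
usable = [t for t in range(len(prices)) if d12[t] is not None and y13[t] is not None]
```

With a 13-period lag and lag-12 differences, the first 13 rows have no complete history, so only rows from index 13 onward are usable in this toy setup.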

2 Intraday patterns

2.1 Average day

2.2 By month

2.3 By weekday

3 Missing data handling

We check missingness by column and apply median imputation to numeric predictors (the target is not imputed).

Top 20 columns by missing rate
variable         missing_rate
Variable167LAST     0.0133401
Variable168LAST     0.0133401
Variable169LAST     0.0133401
Variable170LAST     0.0133401
Variable171LAST     0.0133401
Variable172LAST     0.0133401
Variable173LAST     0.0133401
Variable174LAST     0.0133401
Variable175LAST     0.0133401
Variable176LAST     0.0133401
Variable177LAST     0.0133401
Variable178LAST     0.0133401
Variable179LAST     0.0133401
Variable180LAST     0.0133401
Variable157OPEN     0.0001689
Variable157HIGH     0.0001689
Variable157LOW      0.0001689
Variable157LAST     0.0001689
Timestamp           0.0000000
TargetVariable      0.0000000
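The missingness check and median imputation described in this section can be sketched as follows. This is a minimal plain-Python illustration, not the report's code: the helper names `missing_rate` and `impute_median`, the use of `None` for missing values, and the toy table are assumptions. The source does not say whether the median is taken over the full sample or the training rows only; the sketch simply uses whatever column it is given.

```python
from statistics import median
from typing import Dict, List, Optional

Column = List[Optional[float]]


def missing_rate(col: Column) -> float:
    """Share of entries in the column that are missing (None)."""
    return sum(v is None for v in col) / len(col)


def impute_median(col: Column) -> List[float]:
    """Replace missing entries with the median of the observed entries."""
    observed = [v for v in col if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in col]


# Toy table (hypothetical values); the target column is left untouched.
TARGET = "TargetVariable"
table: Dict[str, Column] = {
    "Variable167LAST": [1.0, None, 3.0, 5.0],
    TARGET: [0, 1, 1, 0],
}

rates = {name: missing_rate(col) for name, col in table.items()}
ranked = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)

imputed = {
    name: (col if name == TARGET else impute_median(col))
    for name, col in table.items()
}
```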

4 Variable74 exploration

4.1 Levels

4.2 Lag-12 differences

5 Train/test split

The test set is the last 2539 periods (each period is 5 minutes).

Split summary
test_periods  minutes_in_test  training_prop_target_1
        2539            12695               0.6027195
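A chronological hold-out of this kind can be sketched as below. The test size (2539) and the period length (5 minutes) come from the report; the function name `split_last`, the total row count, and the toy target are hypothetical and only there to make the example run.

```python
from typing import List, Tuple

PERIOD_MINUTES = 5   # each period is 5 minutes (from the report)
TEST_PERIODS = 2539  # size of the test set (from the report)


def split_last(rows: List[int], n_test: int) -> Tuple[List[int], List[int]]:
    """Keep the last n_test rows as the test set, everything before as training."""
    if not 0 < n_test < len(rows):
        raise ValueError("n_test must be between 1 and len(rows) - 1")
    return rows[:-n_test], rows[-n_test:]


# Hypothetical 0/1 target with 10000 rows, in time order.
rows = [1 if i % 5 < 3 else 0 for i in range(10000)]

train, test = split_last(rows, TEST_PERIODS)
minutes_in_test = len(test) * PERIOD_MINUTES      # 2539 * 5 = 12695
training_prop_target_1 = sum(train) / len(train)  # share of 1s in training rows
```

Because the rows are never shuffled, every test observation comes after every training observation, which is the point of a "last N periods" split for time-ordered data.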

6 Logistic regression model

Model specification:
- Y = TargetVariable lagged by 13 periods
- X = Variable74 levels + lag-12 differences

Logistic regression coefficients (with odds ratios)
term                  estimate  std.error  statistic  p.value    odds_ratio    conf.low     conf.high
(Intercept)            -6.3971     1.7212    -3.7166   0.0002  1.700000e-03      0.0001  4.860000e-02
Variable74OPEN         -2.0705     9.8477    -0.2103   0.8335  1.261000e-01      0.0000  3.042871e+07
Variable74HIGH         30.3154    10.3353     2.9332   0.0034  1.464923e+13  23347.3915  9.191598e+21
Variable74LOW         -26.8219    10.8340    -2.4757   0.0133  0.000000e+00      0.0000  3.700000e-03
Variable74LAST_PRICE   -0.8619    10.3361    -0.0834   0.9335  4.224000e-01      0.0000  2.654536e+08
d74OPEN               -14.4349     6.7643    -2.1340   0.0328  0.000000e+00      0.0000  3.082000e-01
d74HIGH               -48.2618     7.4886    -6.4447   0.0000  0.000000e+00      0.0000  0.000000e+00
d74LOW                -14.2190     7.3647    -1.9307   0.0535  0.000000e+00      0.0000  1.240900e+00
d74LAST                17.5875     7.1894     2.4463   0.0144  4.346584e+07     32.9944  5.726054e+13
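The odds_ratio and confidence columns can be related to the estimate and std.error columns with a short calculation. The sketch below assumes a Wald interval, exp(estimate ± 1.96 × std.error); the report does not say which interval method was used, but this formula appears consistent with the intercept and d74LAST rows. The function name is hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_with_wald_ci(
    estimate: float, std_error: float, z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound) on the odds scale.

    Assumes a Wald interval on the log-odds scale, then exponentiates.
    """
    return (
        math.exp(estimate),
        math.exp(estimate - z * std_error),
        math.exp(estimate + z * std_error),
    )


# Intercept row of the coefficient table: estimate -6.3971, std.error 1.7212.
or_int, lo_int, hi_int = odds_ratio_with_wald_ci(-6.3971, 1.7212)

# d74LAST row: estimate 17.5875, std.error 7.1894.
or_last, lo_last, hi_last = odds_ratio_with_wald_ci(17.5875, 7.1894)

# Variable74LOW: the odds ratio is tiny but not exactly zero.
or_low = math.exp(-26.8219)
```

An odds ratio is the multiplicative change in the odds per one-unit increase in the predictor. Entries printed as 0.000000e+00 are rounded values: for Variable74LOW, exp(-26.8219) is on the order of 1e-12.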

7 Forecast evaluation

Rule: predict 1 when probability > 0.5, else 0.
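The rule above can be sketched as follows, assuming the fitted probability is the logistic function of the linear predictor. The function names and the toy coefficients are hypothetical and are not the fitted values from the report.

```python
import math
from typing import Sequence


def predicted_probability(
    intercept: float, coefs: Sequence[float], x: Sequence[float]
) -> float:
    """Logistic function of the linear predictor, written to avoid overflow."""
    eta = intercept + sum(b * v for b, v in zip(coefs, x))
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def classify(probability: float, threshold: float = 0.5) -> int:
    """Predict 1 only when the probability is strictly above the threshold."""
    return 1 if probability > threshold else 0


# Toy model with one predictor (hypothetical coefficients).
p_zero = predicted_probability(0.0, [1.0], [0.0])  # linear predictor 0 -> 0.5
p_pos = predicted_probability(0.0, [1.0], [2.0])   # linear predictor 2 -> above 0.5

labels = [classify(p_zero), classify(p_pos)]
```

With a 0.5 threshold the rule is equivalent to predicting 1 exactly when the linear predictor is positive; a probability of exactly 0.5 is classified as 0 because the inequality is strict.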

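Confusion matrix template (positive class = 1):
- a = TP (predicted 1, actual 1)
- b = FP (predicted 1, actual 0)
- c = FN (predicted 0, actual 1)
- d = TN (predicted 0, actual 0)

The value reported as Recall_per_template equals a / (a + b) = 1519 / 1668, a ratio conventionally called precision. Recall in the usual sense, a / (a + c), would be 1519 / 1578 ≈ 0.9626.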

## $confusion_matrix
##                Actual Value
## Predicted Value   0    1
##               0 812   59
##               1 149 1519
## 
## $Accuracy
## [1] 0.918078
## 
## $Recall_per_template
## [1] 0.9106715
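The reported figures can be re-derived from the four cells of the confusion matrix. The sketch below uses the counts printed above with positive class 1; the variable names are illustrative.

```python
# Cell counts read from the confusion matrix (rows = predicted, columns = actual).
tp = 1519  # predicted 1, actual 1
fp = 149   # predicted 1, actual 0
fn = 59    # predicted 0, actual 1
tn = 812   # predicted 0, actual 0

total = tp + fp + fn + tn  # should equal the 2539 test periods

accuracy = (tp + tn) / total

# a / (a + b) in the template's notation: share of predicted 1s that are correct.
# This is the ratio that matches the reported Recall_per_template value.
tp_over_predicted_pos = tp / (tp + fp)

# a / (a + c): share of actual 1s that are predicted as 1.
tp_over_actual_pos = tp / (tp + fn)

# Share of actual 1s in the test set, i.e. the accuracy of always predicting 1.
share_actual_pos = (tp + fn) / total

print(f"total={total} accuracy={accuracy:.6f}")
print(f"TP/(TP+FP)={tp_over_predicted_pos:.7f} TP/(TP+FN)={tp_over_actual_pos:.4f}")
```

The cell counts sum to 2539, matching the test-set size from the split summary.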