We are studying a sample dataset of bitcoin with its historical volatilities within specific time interval between last November & December 2025.
In this analysis we will try not just to analyse trends, but also predict number of points having specific value(peak) based on poisson regression.
Head of first six Bitcoin data rows.
| Date | Price | Open | High | Low | Vol. | Change % |
|---|---|---|---|---|---|---|
| 12/17/2025 | 89333.5 | 87859.5 | 89469.7 | 86238.9 | 48.55K | 1.68% |
| 12/16/2025 | 87853.4 | 86429.2 | 88133.0 | 85392.1 | 59.23K | 1.65% |
| 12/15/2025 | 86428.6 | 88154.4 | 89974.1 | 85228.8 | 61.65K | -1.96% |
| 12/14/2025 | 88155.3 | 90248.3 | 90472.3 | 87657.9 | 32.26K | -2.32% |
| 12/13/2025 | 90249.3 | 90272.0 | 90638.1 | 89781.3 | 23.47K | -0.03% |
| 12/12/2025 | 90272.0 | 92510.4 | 92744.6 | 89498.1 | 49.84K | -2.42% |
Data manipulation
Creating the volatility column based on difference between closing and open price.
Data <-Bitcoin_Historical_Data %>% mutate(volatility_pts=Price-Open)
View(Data)
class(Data$Date)
## [1] "character"
Data$Date <- as.Date(Data$Date,format="%m/%d/%Y")
any(is.na(Data$Date))
## [1] FALSE
Data %>% ggplot(aes(x=Date,y=Price))+
geom_point(col="blue")
Studying linear relationship between Open Price, Closing Price & Low volatility during..
Data %>% ggplot(aes(x=`Open`,y=`Price`,color=`Low`))+
geom_point()+
geom_smooth(method="lm",
se=FALSE,
color="pink")
## `geom_smooth()` using formula = 'y ~ x'
There is a linear correlation between the three variables but with some dispersion, so let’s model and analyze!
Model1 <- lm(`Price` ~ `Open`+`Low`+`High`,data=Data)
summary(Model1)
##
## Call:
## lm(formula = Price ~ Open + Low + High, data = Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2483.09 -552.63 22.89 335.54 1728.10
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13918.8660 6784.0763 2.052 0.05001 .
## Open -0.4718 0.1431 -3.297 0.00274 **
## Low 0.6007 0.1150 5.221 1.68e-05 ***
## High 0.7156 0.1564 4.576 9.51e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 965.8 on 27 degrees of freedom
## Multiple R-squared: 0.8538, Adjusted R-squared: 0.8375
## F-statistic: 52.55 on 3 and 27 DF, p-value: 2.113e-11
Low & High prices are Very significant predictors for closing price, while Open price is little bit significant but still significant and effective to our response closing Price.
Now, we will create new predictors based on our original ones. In addition to the weekly return, weekly volatility and volatility mean..
Data$above_90k <- ifelse(Data$Price>90000,1,0)
Data_weekly <- Data %>% mutate(week=floor_date(Date,"week")) %>%
group_by(week) %>%
summarise(
peaks_90k=sum(above_90k),
Vol_pts_max=max(volatility_pts),
Vol_pts_min=min(volatility_pts),
Vol_pts_abs=mean(abs(volatility_pts)),
ret_week=log(last(Price)/first(Price)),
vol_week=sd(diff(log(Price))),
open_week=first(Open),
close_week=last(Price),
high_week=max(High),
low_week=min(Low),
.groups="drop")
gt(Data_weekly)%>%opt_stylize(style=4,color="blue")
| week | peaks_90k | Vol_pts_max | Vol_pts_min | Vol_pts_abs | ret_week | vol_week | open_week | close_week | high_week | low_week |
|---|---|---|---|---|---|---|---|---|---|---|
| 2025-11-16 | 3 | 765.7 | -4929.4 | 1842.2167 | 0.084535312 | 0.02378041 | 85117.2 | 92195.0 | 96002.7 | 80697.7 |
| 2025-11-23 | 4 | 3133.4 | -929.3 | 1282.7714 | -0.044751766 | 0.01676387 | 90894.7 | 86826.6 | 92942.6 | 84729.0 |
| 2025-11-30 | 4 | 4972.4 | -4063.1 | 2263.3143 | 0.012967929 | 0.03717708 | 89333.6 | 90374.2 | 94077.0 | 83858.1 |
| 2025-12-07 | 7 | 2056.9 | -2238.4 | 990.9143 | 0.001818858 | 0.01546032 | 90272.0 | 90413.6 | 94591.4 | 87747.6 |
| 2025-12-14 | 0 | 1474.0 | -2093.0 | 1679.2500 | -0.013276526 | 0.02096455 | 87859.5 | 88155.3 | 90472.3 | 85228.8 |
open_mean <-mean(Data_weekly$open_week)
open_sd <- sd(Data_weekly$open_week)
vol_mean<- mean(Data_weekly$Vol_pts_abs)
vol_sd <- sd(Data_weekly$Vol_pts_abs)
Data_weekly <- Data_weekly %>%
mutate(open_week_sc = (open_week-open_mean)/open_sd,
vol_pts_abs_sc=(Vol_pts_abs-vol_mean)/vol_sd)
M1 <- glm(peaks_90k ~`vol_pts_abs_sc`,
family=poisson(link="log"),
data=Data_weekly)
M2 <- glm(peaks_90k ~ `ret_week`,
family=poisson(link="log"),
data=Data_weekly)
M3 <- glm(peaks_90k ~ `open_week_sc`,
family=poisson(link="log"),
data=Data_weekly)
M4 <- glm(peaks_90k ~ `vol_pts_abs_sc`+`ret_week`+`open_week_sc`,
family=poisson(link="log"),
data=Data_weekly)
AIC(M1,M2,M3,M4)
## df AIC
## M1 2 25.62309
## M2 2 27.22972
## M3 2 25.27782
## M4 4 23.75676
So after comparing three AIC according to our four models, we conclude model M4 is of lowest AIC.
summary(M4)
##
## Call:
## glm(formula = peaks_90k ~ vol_pts_abs_sc + ret_week + open_week_sc,
## family = poisson(link = "log"), data = Data_weekly)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.7962 0.3776 2.108 0.0350 *
## vol_pts_abs_sc -0.1240 0.2579 -0.481 0.6306
## ret_week 29.7523 14.8780 2.000 0.0455 *
## open_week_sc 1.5282 0.7117 2.147 0.0318 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 9.9015 on 4 degrees of freedom
## Residual deviance: 2.4258 on 1 degrees of freedom
## AIC: 23.757
##
## Number of Fisher Scoring iterations: 5
Although weekly absolute volatility was not individually significant, its inclusion improved overall model fit as measured by AIC, therefore it will be retained in the final specification after comparing with another reduced model!
M_reduced <- glm(peaks_90k ~`ret_week`+`open_week_sc`,
family=poisson(link="log"),
data=Data_weekly)
AIC(M4,M_reduced)
## df AIC
## M4 4 23.75676
## M_reduced 3 21.99744
Although weekly absolute volatility was not individually statistically significant, its inclusion improved overall model fit, as indicated by a lower AIC(Delta AIC >2). Therefore, it was retained in the final model.
final_model <- M4
summary(final_model)
##
## Call:
## glm(formula = peaks_90k ~ vol_pts_abs_sc + ret_week + open_week_sc,
## family = poisson(link = "log"), data = Data_weekly)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.7962 0.3776 2.108 0.0350 *
## vol_pts_abs_sc -0.1240 0.2579 -0.481 0.6306
## ret_week 29.7523 14.8780 2.000 0.0455 *
## open_week_sc 1.5282 0.7117 2.147 0.0318 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 9.9015 on 4 degrees of freedom
## Residual deviance: 2.4258 on 1 degrees of freedom
## AIC: 23.757
##
## Number of Fisher Scoring iterations: 5
new_week <- data.frame(ret_week=0.04,
open_week_sc=(92000-open_mean)/open_sd,
vol_pts_abs_sc=(1500-vol_mean)/vol_sd
)
predict(final_model,newdata = new_week,
type="response")
## 1
## 67.10492
Due to the limited sample size,weekly aggregation resulted in an insufficient number of observations and unstable Poisson estimates. Therefore, the analysis gives us un_interpretable predicions..
So, we will create daily peak count.
Data2 <- Data %>% mutate(peak_90k=ifelse(Price>90000,1,0))
This is now:0 means no peak that day, and 1 means peak that day.
Data2 <- Data2 %>% mutate(vol_pts=abs(volatility_pts),
ret_day=log(Price/lag(Price))) %>%
drop_na()
Data2 <- Data2 %>% mutate(vol_pts_sc=scale(vol_pts),
open_sc=scale(Open),
ret_day_sc=scale(ret_day))
Now we will try logistic regression models to be more natural with binary daily peaks.
Model1 <-glm(peak_90k ~ vol_pts_sc ,
family=binomial,
data=Data2)
Model2 <-glm(peak_90k ~ ret_day_sc ,
family=binomial,
data=Data2)
Model3 <-glm(peak_90k ~ open_sc ,
family=binomial,
data=Data2)
AIC(Model1,Model2,Model3)
## df AIC
## Model1 2 42.90213
## Model2 2 38.79658
## Model3 2 34.86184
After comparison, Model 3 of open price predictor was the most effective within that time period since it is of lowest AIC
FinalModel <- Model3
summary(FinalModel)
##
## Call:
## glm(formula = peak_90k ~ open_sc, family = binomial, data = Data2)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.4937 0.4453 1.109 0.26756
## open_sc 1.3816 0.5347 2.584 0.00977 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 40.381 on 29 degrees of freedom
## Residual deviance: 30.862 on 28 degrees of freedom
## AIC: 34.862
##
## Number of Fisher Scoring iterations: 4
Open price variable is very significant in our binary daily model, as it identify if at that day we will have a 90k peak!