We are studying a sample dataset of bitcoin with its historical volatilities within specific time interval between last November & December 2025.

In this analysis we will try not just to analyse trends, but also predict number of points having specific value(peak) based on poisson regression.

Head of first six Bitcoin data rows.

Date Price Open High Low Vol. Change %
12/17/2025 89333.5 87859.5 89469.7 86238.9 48.55K 1.68%
12/16/2025 87853.4 86429.2 88133.0 85392.1 59.23K 1.65%
12/15/2025 86428.6 88154.4 89974.1 85228.8 61.65K -1.96%
12/14/2025 88155.3 90248.3 90472.3 87657.9 32.26K -2.32%
12/13/2025 90249.3 90272.0 90638.1 89781.3 23.47K -0.03%
12/12/2025 90272.0 92510.4 92744.6 89498.1 49.84K -2.42%

Data manipulation

Creating the volatility column based on difference between closing and open price.

Data <-Bitcoin_Historical_Data  %>% mutate(volatility_pts=Price-Open)
View(Data)

Visualization

class(Data$Date)
## [1] "character"
Data$Date <- as.Date(Data$Date,format="%m/%d/%Y")
any(is.na(Data$Date))
## [1] FALSE
Data %>% ggplot(aes(x=Date,y=Price))+
  geom_point(col="blue")

Studying linear relationship between Open Price, Closing Price & Low volatility during..

Data %>% ggplot(aes(x=`Open`,y=`Price`,color=`Low`))+
  geom_point()+
  geom_smooth(method="lm",
                se=FALSE,
                color="pink")
## `geom_smooth()` using formula = 'y ~ x'

There is a linear correlation between the three variables but with some dispersion, so let’s model and analyze!

Building Model

Model1 <- lm(`Price` ~ `Open`+`Low`+`High`,data=Data)
summary(Model1)
## 
## Call:
## lm(formula = Price ~ Open + Low + High, data = Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2483.09  -552.63    22.89   335.54  1728.10 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13918.8660  6784.0763   2.052  0.05001 .  
## Open           -0.4718     0.1431  -3.297  0.00274 ** 
## Low             0.6007     0.1150   5.221 1.68e-05 ***
## High            0.7156     0.1564   4.576 9.51e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 965.8 on 27 degrees of freedom
## Multiple R-squared:  0.8538, Adjusted R-squared:  0.8375 
## F-statistic: 52.55 on 3 and 27 DF,  p-value: 2.113e-11

Low & High prices are Very significant predictors for closing price, while Open price is little bit significant but still significant and effective to our response closing Price.

Studying peak of 90k

Summarizing peak points by week

Now, we will create new predictors based on our original ones. In addition to the weekly return, weekly volatility and volatility mean..

Data$above_90k <- ifelse(Data$Price>90000,1,0)
Data_weekly <- Data %>% mutate(week=floor_date(Date,"week")) %>% 
                                 group_by(week) %>% 
                                 summarise(
                                   peaks_90k=sum(above_90k),
                                   Vol_pts_max=max(volatility_pts),
                                   Vol_pts_min=min(volatility_pts),
                                   Vol_pts_abs=mean(abs(volatility_pts)),
                                   ret_week=log(last(Price)/first(Price)),
                                   vol_week=sd(diff(log(Price))),
                                   open_week=first(Open),
                                   close_week=last(Price),
                                   high_week=max(High),
                                   low_week=min(Low),
                                   .groups="drop")
gt(Data_weekly)%>%opt_stylize(style=4,color="blue")
week peaks_90k Vol_pts_max Vol_pts_min Vol_pts_abs ret_week vol_week open_week close_week high_week low_week
2025-11-16 3 765.7 -4929.4 1842.2167 0.084535312 0.02378041 85117.2 92195.0 96002.7 80697.7
2025-11-23 4 3133.4 -929.3 1282.7714 -0.044751766 0.01676387 90894.7 86826.6 92942.6 84729.0
2025-11-30 4 4972.4 -4063.1 2263.3143 0.012967929 0.03717708 89333.6 90374.2 94077.0 83858.1
2025-12-07 7 2056.9 -2238.4 990.9143 0.001818858 0.01546032 90272.0 90413.6 94591.4 87747.6
2025-12-14 0 1474.0 -2093.0 1679.2500 -0.013276526 0.02096455 87859.5 88155.3 90472.3 85228.8

Scaling BTC large numbers

open_mean <-mean(Data_weekly$open_week)
open_sd <- sd(Data_weekly$open_week)

vol_mean<- mean(Data_weekly$Vol_pts_abs)
vol_sd <- sd(Data_weekly$Vol_pts_abs)
Data_weekly <- Data_weekly %>% 
  mutate(open_week_sc = (open_week-open_mean)/open_sd,
                         vol_pts_abs_sc=(Vol_pts_abs-vol_mean)/vol_sd)

Testing Poisson Models

M1 <- glm(peaks_90k ~`vol_pts_abs_sc`,
                  family=poisson(link="log"),
                  data=Data_weekly)

M2 <- glm(peaks_90k ~ `ret_week`,
          family=poisson(link="log"),
                  data=Data_weekly)

M3 <- glm(peaks_90k ~ `open_week_sc`,
          family=poisson(link="log"),
                  data=Data_weekly)

M4 <- glm(peaks_90k ~ `vol_pts_abs_sc`+`ret_week`+`open_week_sc`,
          family=poisson(link="log"),
                  data=Data_weekly)
AIC(M1,M2,M3,M4)
##    df      AIC
## M1  2 25.62309
## M2  2 27.22972
## M3  2 25.27782
## M4  4 23.75676

So after comparing three AIC according to our four models, we conclude model M4 is of lowest AIC.

summary(M4)
## 
## Call:
## glm(formula = peaks_90k ~ vol_pts_abs_sc + ret_week + open_week_sc, 
##     family = poisson(link = "log"), data = Data_weekly)
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)  
## (Intercept)      0.7962     0.3776   2.108   0.0350 *
## vol_pts_abs_sc  -0.1240     0.2579  -0.481   0.6306  
## ret_week        29.7523    14.8780   2.000   0.0455 *
## open_week_sc     1.5282     0.7117   2.147   0.0318 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 9.9015  on 4  degrees of freedom
## Residual deviance: 2.4258  on 1  degrees of freedom
## AIC: 23.757
## 
## Number of Fisher Scoring iterations: 5

Although weekly absolute volatility was not individually significant, its inclusion improved overall model fit as measured by AIC, therefore it will be retained in the final specification after comparing with another reduced model!

M_reduced <- glm(peaks_90k ~`ret_week`+`open_week_sc`,
          family=poisson(link="log"),
                  data=Data_weekly)
AIC(M4,M_reduced)
##           df      AIC
## M4         4 23.75676
## M_reduced  3 21.99744

Although weekly absolute volatility was not individually statistically significant, its inclusion improved overall model fit, as indicated by a lower AIC(Delta AIC >2). Therefore, it was retained in the final model.

final_model <- M4
summary(final_model)
## 
## Call:
## glm(formula = peaks_90k ~ vol_pts_abs_sc + ret_week + open_week_sc, 
##     family = poisson(link = "log"), data = Data_weekly)
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)  
## (Intercept)      0.7962     0.3776   2.108   0.0350 *
## vol_pts_abs_sc  -0.1240     0.2579  -0.481   0.6306  
## ret_week        29.7523    14.8780   2.000   0.0455 *
## open_week_sc     1.5282     0.7117   2.147   0.0318 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 9.9015  on 4  degrees of freedom
## Residual deviance: 2.4258  on 1  degrees of freedom
## AIC: 23.757
## 
## Number of Fisher Scoring iterations: 5

Predict on new weekly data

new_week <- data.frame(ret_week=0.04,
                       open_week_sc=(92000-open_mean)/open_sd,
                       vol_pts_abs_sc=(1500-vol_mean)/vol_sd
                       )
predict(final_model,newdata = new_week,
        type="response")
##        1 
## 67.10492

Due to the limited sample size,weekly aggregation resulted in an insufficient number of observations and unstable Poisson estimates. Therefore, the analysis gives us un_interpretable predicions..

So, we will create daily peak count.

Data2 <- Data %>% mutate(peak_90k=ifelse(Price>90000,1,0))

This is now:0 means no peak that day, and 1 means peak that day.

Data2 <- Data2 %>% mutate(vol_pts=abs(volatility_pts),
                          ret_day=log(Price/lag(Price))) %>% 
                          drop_na()

Scaling

Data2 <- Data2 %>% mutate(vol_pts_sc=scale(vol_pts),
                          open_sc=scale(Open),
                          ret_day_sc=scale(ret_day))

Now we will try logistic regression models to be more natural with binary daily peaks.

Model1 <-glm(peak_90k ~ vol_pts_sc ,
    family=binomial,
    data=Data2)

Model2 <-glm(peak_90k ~ ret_day_sc ,
    family=binomial,
    data=Data2)

Model3 <-glm(peak_90k ~ open_sc ,
    family=binomial,
    data=Data2)

AIC(Model1,Model2,Model3)
##        df      AIC
## Model1  2 42.90213
## Model2  2 38.79658
## Model3  2 34.86184

After comparison, Model 3 of open price predictor was the most effective within that time period since it is of lowest AIC

FinalModel <- Model3
summary(FinalModel)
## 
## Call:
## glm(formula = peak_90k ~ open_sc, family = binomial, data = Data2)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)   0.4937     0.4453   1.109  0.26756   
## open_sc       1.3816     0.5347   2.584  0.00977 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 40.381  on 29  degrees of freedom
## Residual deviance: 30.862  on 28  degrees of freedom
## AIC: 34.862
## 
## Number of Fisher Scoring iterations: 4

Open price variable is very significant in our binary daily model, as it identify if at that day we will have a 90k peak!