We are studying a sample dataset of bitcoin with its historical volatilities within specific time interval between last November & December 2025.

In this analysis we will try not just to analyse trends, but also predict number of points having specific value(peak) based on poisson regression.

Head of first six Bitcoin data rows.

Date	Price	Open	High	Low	Vol.	Change %
12/17/2025	89333.5	87859.5	89469.7	86238.9	48.55K	1.68%
12/16/2025	87853.4	86429.2	88133.0	85392.1	59.23K	1.65%
12/15/2025	86428.6	88154.4	89974.1	85228.8	61.65K	-1.96%
12/14/2025	88155.3	90248.3	90472.3	87657.9	32.26K	-2.32%
12/13/2025	90249.3	90272.0	90638.1	89781.3	23.47K	-0.03%
12/12/2025	90272.0	92510.4	92744.6	89498.1	49.84K	-2.42%

Data manipulation

Creating the volatility column based on difference between closing and open price.

Data <-Bitcoin_Historical_Data  %>% mutate(volatility_pts=Price-Open)
View(Data)

Visualization

class(Data$Date)

## [1] "character"

Data$Date <- as.Date(Data$Date,format="%m/%d/%Y")
any(is.na(Data$Date))

## [1] FALSE

Data %>% ggplot(aes(x=Date,y=Price))+
  geom_point(col="blue")

Studying linear relationship between Open Price, Closing Price & Low volatility during..

Data %>% ggplot(aes(x=`Open`,y=`Price`,color=`Low`))+
  geom_point()+
  geom_smooth(method="lm",
                se=FALSE,
                color="pink")

## `geom_smooth()` using formula = 'y ~ x'

There is a linear correlation between the three variables but with some dispersion, so let’s model and analyze!

Building Model

Model1 <- lm(`Price` ~ `Open`+`Low`+`High`,data=Data)
summary(Model1)

## 
## Call:
## lm(formula = Price ~ Open + Low + High, data = Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2483.09  -552.63    22.89   335.54  1728.10 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13918.8660  6784.0763   2.052  0.05001 .  
## Open           -0.4718     0.1431  -3.297  0.00274 ** 
## Low             0.6007     0.1150   5.221 1.68e-05 ***
## High            0.7156     0.1564   4.576 9.51e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 965.8 on 27 degrees of freedom
## Multiple R-squared:  0.8538, Adjusted R-squared:  0.8375 
## F-statistic: 52.55 on 3 and 27 DF,  p-value: 2.113e-11

Low & High prices are Very significant predictors for closing price, while Open price is little bit significant but still significant and effective to our response closing Price.

Studying peak of 90k

Summarizing peak points by week

Now, we will create new predictors based on our original ones. In addition to the weekly return, weekly volatility and volatility mean..

Data$above_90k <- ifelse(Data$Price>90000,1,0)
Data_weekly <- Data %>% mutate(week=floor_date(Date,"week")) %>% 
                                 group_by(week) %>% 
                                 summarise(
                                   peaks_90k=sum(above_90k),
                                   Vol_pts_max=max(volatility_pts),
                                   Vol_pts_min=min(volatility_pts),
                                   Vol_pts_abs=mean(abs(volatility_pts)),
                                   ret_week=log(last(Price)/first(Price)),
                                   vol_week=sd(diff(log(Price))),
                                   open_week=first(Open),
                                   close_week=last(Price),
                                   high_week=max(High),
                                   low_week=min(Low),
                                   .groups="drop")

gt(Data_weekly)%>%opt_stylize(style=4,color="blue")

week	peaks_90k	Vol_pts_max	Vol_pts_min	Vol_pts_abs	ret_week	vol_week	open_week	close_week	high_week	low_week
2025-11-16	3	765.7	-4929.4	1842.2167	0.084535312	0.02378041	85117.2	92195.0	96002.7	80697.7
2025-11-23	4	3133.4	-929.3	1282.7714	-0.044751766	0.01676387	90894.7	86826.6	92942.6	84729.0
2025-11-30	4	4972.4	-4063.1	2263.3143	0.012967929	0.03717708	89333.6	90374.2	94077.0	83858.1
2025-12-07	7	2056.9	-2238.4	990.9143	0.001818858	0.01546032	90272.0	90413.6	94591.4	87747.6
2025-12-14	0	1474.0	-2093.0	1679.2500	-0.013276526	0.02096455	87859.5	88155.3	90472.3	85228.8

Scaling BTC large numbers

open_mean <-mean(Data_weekly$open_week)
open_sd <- sd(Data_weekly$open_week)

vol_mean<- mean(Data_weekly$Vol_pts_abs)
vol_sd <- sd(Data_weekly$Vol_pts_abs)

Data_weekly <- Data_weekly %>% 
  mutate(open_week_sc = (open_week-open_mean)/open_sd,
                         vol_pts_abs_sc=(Vol_pts_abs-vol_mean)/vol_sd)

Testing Poisson Models

M1 <- glm(peaks_90k ~`vol_pts_abs_sc`,
                  family=poisson(link="log"),
                  data=Data_weekly)

M2 <- glm(peaks_90k ~ `ret_week`,
          family=poisson(link="log"),
                  data=Data_weekly)

M3 <- glm(peaks_90k ~ `open_week_sc`,
          family=poisson(link="log"),
                  data=Data_weekly)

M4 <- glm(peaks_90k ~ `vol_pts_abs_sc`+`ret_week`+`open_week_sc`,
          family=poisson(link="log"),
                  data=Data_weekly)

AIC(M1,M2,M3,M4)

##    df      AIC
## M1  2 25.62309
## M2  2 27.22972
## M3  2 25.27782
## M4  4 23.75676

So after comparing three AIC according to our four models, we conclude model M4 is of lowest AIC.

summary(M4)

## 
## Call:
## glm(formula = peaks_90k ~ vol_pts_abs_sc + ret_week + open_week_sc, 
##     family = poisson(link = "log"), data = Data_weekly)
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)  
## (Intercept)      0.7962     0.3776   2.108   0.0350 *
## vol_pts_abs_sc  -0.1240     0.2579  -0.481   0.6306  
## ret_week        29.7523    14.8780   2.000   0.0455 *
## open_week_sc     1.5282     0.7117   2.147   0.0318 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 9.9015  on 4  degrees of freedom
## Residual deviance: 2.4258  on 1  degrees of freedom
## AIC: 23.757
## 
## Number of Fisher Scoring iterations: 5

Although weekly absolute volatility was not individually significant, its inclusion improved overall model fit as measured by AIC, therefore it will be retained in the final specification after comparing with another reduced model!

M_reduced <- glm(peaks_90k ~`ret_week`+`open_week_sc`,
          family=poisson(link="log"),
                  data=Data_weekly)

AIC(M4,M_reduced)

##           df      AIC
## M4         4 23.75676
## M_reduced  3 21.99744

Although weekly absolute volatility was not individually statistically significant, its inclusion improved overall model fit, as indicated by a lower AIC(Delta AIC >2). Therefore, it was retained in the final model.

final_model <- M4
summary(final_model)

## 
## Call:
## glm(formula = peaks_90k ~ vol_pts_abs_sc + ret_week + open_week_sc, 
##     family = poisson(link = "log"), data = Data_weekly)
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)  
## (Intercept)      0.7962     0.3776   2.108   0.0350 *
## vol_pts_abs_sc  -0.1240     0.2579  -0.481   0.6306  
## ret_week        29.7523    14.8780   2.000   0.0455 *
## open_week_sc     1.5282     0.7117   2.147   0.0318 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 9.9015  on 4  degrees of freedom
## Residual deviance: 2.4258  on 1  degrees of freedom
## AIC: 23.757
## 
## Number of Fisher Scoring iterations: 5

Predict on new weekly data

new_week <- data.frame(ret_week=0.04,
                       open_week_sc=(92000-open_mean)/open_sd,
                       vol_pts_abs_sc=(1500-vol_mean)/vol_sd
                       )

predict(final_model,newdata = new_week,
        type="response")

##        1 
## 67.10492

Due to the limited sample size,weekly aggregation resulted in an insufficient number of observations and unstable Poisson estimates. Therefore, the analysis gives us un_interpretable predicions..

So, we will create daily peak count.

Data2 <- Data %>% mutate(peak_90k=ifelse(Price>90000,1,0))

This is now:0 means no peak that day, and 1 means peak that day.

Data2 <- Data2 %>% mutate(vol_pts=abs(volatility_pts),
                          ret_day=log(Price/lag(Price))) %>% 
                          drop_na()

Scaling

Data2 <- Data2 %>% mutate(vol_pts_sc=scale(vol_pts),
                          open_sc=scale(Open),
                          ret_day_sc=scale(ret_day))

Now we will try logistic regression models to be more natural with binary daily peaks.

Model1 <-glm(peak_90k ~ vol_pts_sc ,
    family=binomial,
    data=Data2)

Model2 <-glm(peak_90k ~ ret_day_sc ,
    family=binomial,
    data=Data2)

Model3 <-glm(peak_90k ~ open_sc ,
    family=binomial,
    data=Data2)

AIC(Model1,Model2,Model3)

##        df      AIC
## Model1  2 42.90213
## Model2  2 38.79658
## Model3  2 34.86184

After comparison, Model 3 of open price predictor was the most effective within that time period since it is of lowest AIC

FinalModel <- Model3
summary(FinalModel)

## 
## Call:
## glm(formula = peak_90k ~ open_sc, family = binomial, data = Data2)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)   0.4937     0.4453   1.109  0.26756   
## open_sc       1.3816     0.5347   2.584  0.00977 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 40.381  on 29  degrees of freedom
## Residual deviance: 30.862  on 28  degrees of freedom
## AIC: 34.862
## 
## Number of Fisher Scoring iterations: 4

Open price variable is very significant in our binary daily model, as it identify if at that day we will have a 90k peak!

Bitcoin Analysis|2025

Batoul Kalot