suppressPackageStartupMessages({
  library(dplyr)
  library(ggplot2)
  library(broom)
})
In this report, I analyze a cryptocurrency price dataset using visualization and statistical methods. The dataset is loaded from a CSV file so that the analysis can be rerun from the same source data.
crypto <- read.csv("/Users/wilmercifuentes/Downloads/top_100_cryptos_with_correct_network.csv") %>%
  mutate(
    date = as.Date(date),
    network = as.factor(network),
    symbol = as.factor(symbol)
  )
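The path above is absolute and specific to one machine. As a minimal sketch of a more portable load, the helper below reads the CSV from whatever path is supplied and fails early if the file is missing. `load_crypto()` is a hypothetical name; the column names `date`, `network`, and `symbol` are taken from the code above. It uses base R only, and the demonstration writes a tiny made-up file to a temporary location rather than touching the real dataset.

```r
# Hypothetical helper: read the CSV and apply the same type conversions as above.
load_crypto <- function(path) {
  if (!file.exists(path)) {
    stop("CSV not found: ", path)
  }
  df <- read.csv(path, stringsAsFactors = FALSE)
  df$date    <- as.Date(df$date)
  df$network <- as.factor(df$network)
  df$symbol  <- as.factor(df$symbol)
  df
}

# Demonstration on a tiny made-up file (not the real dataset).
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    date    = c("2024-01-01", "2024-01-02"),
    symbol  = c("BTCUSDT", "BTCUSDT"),
    network = c("Bitcoin", "Bitcoin"),
    close   = c(100, 105)
  ),
  tmp,
  row.names = FALSE
)
demo <- load_crypto(tmp)
str(demo)
```

Passing the path as an argument keeps the machine-specific location out of the analysis code itself.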
btc <- filter(crypto, symbol == "BTCUSDT")
ggplot(btc, aes(x = date, y = close)) +
  geom_line() +
  labs(
    title = "Bitcoin (BTCUSDT) — Daily Closing Price Over Time",
    x = "Date",
    y = "Closing Price (USD)",
    caption = "Source: top_100_cryptos_with_correct_network.csv"
  ) +
  theme_minimal()
Explanation: This line chart shows the trend in BTC’s closing
price, highlighting the rallies and drawdowns typical of crypto market
volatility.
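The drawdowns mentioned above can be quantified as the percentage drop from the running peak. The following is a small sketch on made-up prices, not the real BTC series; `drawdown()` is a hypothetical helper name.

```r
# Hypothetical helper: drawdown relative to the running maximum price.
drawdown <- function(close) {
  close / cummax(close) - 1
}

# Made-up prices for illustration only.
toy_close <- c(100, 120, 90, 130, 65)
dd <- drawdown(toy_close)
dd        # 0.00 0.00 -0.25 0.00 -0.50
min(dd)   # -0.5, the maximum drawdown
```

Applied to the real data, the same function could be called as `drawdown(btc$close)`, assuming the rows are sorted by date.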
Here I compute the mean and standard deviation of BTC closing prices.
mean_close_btc <- mean(btc$close, na.rm = TRUE)
sd_close_btc <- sd(btc$close, na.rm = TRUE)
data.frame(
  metric = c("Mean closing price (USD)", "Standard deviation (USD)"),
  value = c(round(mean_close_btc, 2), round(sd_close_btc, 2))
)
## metric value
## 1 Mean closing price (USD) 35688.33
## 2 Standard deviation (USD) 28971.93
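Explanation: The mean summarizes the typical BTC price level, while the standard deviation measures how widely prices vary around it. Here the SD (28,971.93 USD) is about 81% of the mean (35,688.33 USD), so closing prices ranged very widely over the sample period. Because this is the SD of price levels rather than of returns, it reflects the long-run price range as well as day-to-day volatility.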
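To make "SD relative to the mean" concrete, the sketch below computes the coefficient of variation from the two summary values reported above. It also shows one common alternative volatility measure, the standard deviation of daily log returns. `log_return_sd()` is a hypothetical helper, and the toy prices are made up.

```r
# Coefficient of variation from the summary statistics reported above.
mean_close <- 35688.33
sd_close   <- 28971.93
cv <- sd_close / mean_close
round(cv, 2)  # 0.81

# Hypothetical helper: standard deviation of daily log returns.
log_return_sd <- function(close) {
  returns <- diff(log(close))
  sd(returns, na.rm = TRUE)
}

# Made-up prices for illustration only.
toy_close <- c(100, 102, 99, 105, 103)
log_return_sd(toy_close)
```

On the real data this would be `log_return_sd(btc$close)`, assuming the rows are in date order.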
I examine the relationship between opening and closing prices for BTC via correlation and simple linear regression.
# Correlation
corr_btc <- cor(btc$open, btc$close, use = "complete.obs")
fit_btc <- lm(close ~ open, data = btc)
coef_tbl <- tidy(fit_btc)
fit_stats <- glance(fit_btc)
list(
  correlation_open_close = round(corr_btc, 4),
  regression_coefficients = coef_tbl,
  model_fit = fit_stats
)
## $correlation_open_close
## [1] 0.999
##
## $regression_coefficients
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 31.3 40.3 0.776 0.438
## 2 open 1.00 0.000878 1140. 0
##
## $model_fit
## # A tibble: 1 × 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.998 0.998 1285. 1299188. 0 1 -21975. 43955. 43973.
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
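Explanation: The correlation quantifies the strength of the linear association; at 0.999, open and close move almost in lockstep. The regression slope indicates how much close changes per unit change in open: the estimate of 1.00 means a 1 USD higher open is associated with about a 1 USD higher close, and the intercept (31.3) is not distinguishable from zero (p = 0.438). The slope p-value prints as 0 because it is smaller than the displayed precision, so the relationship is statistically significant at any conventional level (e.g., 0.05). This near-perfect fit (R² = 0.998) is expected, since the open and close of the same day differ only by one day's price movement.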
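To illustrate why a slope near 1 and a correlation near 1 are unsurprising for same-day open and close, here is a rough sketch on simulated data. It is not the real BTC series: the prices are a random walk, and the open is assumed to equal the previous day's close, an assumption made only for illustration. It also shows that, in simple linear regression, R² equals the squared correlation, which matches the reported values (0.999² ≈ 0.998).

```r
set.seed(42)

# Simulated prices (not the real data): a random walk starting near 30,000.
n <- 2000
close <- 30000 + cumsum(rnorm(n, mean = 0, sd = 500))
# Assumption for illustration: each day's open equals the previous day's close.
open <- c(30000, head(close, -1))

sim_fit   <- lm(close ~ open)
sim_slope <- unname(coef(sim_fit)["open"])
sim_r     <- cor(open, close)

# Slope and correlation are both close to 1; R-squared equals correlation squared.
c(slope = sim_slope, correlation = sim_r, r_squared = summary(sim_fit)$r.squared)
```

A regression on daily returns rather than price levels would say more about whether the open carries information about the close.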
ggplot(btc, aes(x = close)) +
  geom_histogram(binwidth = 500, fill = "skyblue", color = "black") +
  labs(
    title = "Histogram of Bitcoin Daily Closing Prices",
    x = "Closing Price (USD)",
    y = "Frequency"
  ) +
  theme_minimal()
BTC closing prices often appear right-skewed. Skew influences the
choice of test; a log transform is often preferred before parametric
comparisons because it compresses the long right tail.
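The effect of a log transform on skew can be checked numerically. The sketch below uses simulated log-normal values rather than the real prices, and `sample_skewness()` is a hypothetical helper implementing the simple moment-based estimator.

```r
# Hypothetical helper: moment-based sample skewness.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

set.seed(1)
# Simulated right-skewed values (log-normal), not the real prices.
toy_prices <- rlnorm(1000, meanlog = 10, sdlog = 0.8)

skew_raw <- sample_skewness(toy_prices)
skew_log <- sample_skewness(log(toy_prices))

# Raw values are strongly right-skewed; the logged values are close to symmetric.
c(raw = skew_raw, log = skew_log)
```

The same check on the real data would be `sample_skewness(btc$close)` against `sample_skewness(log(btc$close))`.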
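I compare daily closing price levels between assets on the Bitcoin and Ethereum networks. Note that the filter below is on network rather than symbol, so the Ethereum group contains every asset labelled with that network and may include tokens other than ETH; its low mean closing price (about 411.58 USD) is consistent with that. Because prices are skewed and on different scales, I run a two-sample Welch t-test on log prices. I also show the raw-price test for completeness.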
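# Closing prices by network (note: filtered on network, not symbol)
btc_close <- crypto %>% filter(network == "Bitcoin") %>% pull(close)
eth_close <- crypto %>% filter(network == "Ethereum") %>% pull(close)
# Small offset so that log() stays finite if any close is exactly 0
eps <- 1e-8
btc_log <- log(btc_close + eps)
eth_log <- log(eth_close + eps)
# (A) Two-sample t-test on raw prices (shown for completeness)
t.test(btc_close, eth_close, var.equal = FALSE)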
##
## Welch Two Sample t-test
##
## data: btc_close and eth_close
## t = 61.624, df = 2562.2, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 34154.23 36399.27
## sample estimates:
## mean of x mean of y
## 35688.3342 411.5832
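For readers who want to see where the t statistic and the fractional degrees of freedom come from, here is a short sketch of the Welch calculation, checked against `t.test()` on simulated data. `welch_t()` is a hypothetical helper, and the two samples are made up.

```r
# Hypothetical helper: Welch t statistic and Welch-Satterthwaite degrees of freedom.
welch_t <- function(x, y) {
  vx <- var(x) / length(x)  # squared standard error of mean(x)
  vy <- var(y) / length(y)  # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  c(t = t_stat, df = df)
}

set.seed(7)
# Made-up samples with unequal variances and sizes.
x <- rnorm(200, mean = 10, sd = 5)
y <- rnorm(500, mean = 9, sd = 1)

manual  <- welch_t(x, y)
builtin <- t.test(x, y, var.equal = FALSE)

# The manual values should agree with the built-in test.
rbind(
  manual  = manual,
  builtin = c(t = unname(builtin$statistic), df = unname(builtin$parameter))
)
```

When one group's variance dominates, the degrees of freedom land close to that group's sample size minus one.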
# (B) Two-sample t-test on log prices (preferred for skewed data)
t.test(btc_log, eth_log, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: btc_log and eth_log
## t = 476.89, df = 5034.5, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 10.57911 10.66645
## sample estimates:
## mean of x mean of y
## 10.0933900 -0.5293892
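A difference in mean log prices is easier to read after back-transforming: exponentiating it gives the ratio of the two groups' geometric mean prices. The sketch below does this using the values printed in the output above, assuming the `eps` offset is negligible at these price levels.

```r
# Values copied from the log-price test output above.
mean_log_btc <- 10.0933900
mean_log_eth <- -0.5293892
ci_log_diff  <- c(10.57911, 10.66645)

# A difference in mean logs corresponds to a ratio of geometric means.
geo_mean_ratio <- exp(mean_log_btc - mean_log_eth)
ci_ratio <- exp(ci_log_diff)

signif(c(ratio = geo_mean_ratio, lower = ci_ratio[1], upper = ci_ratio[2]), 3)
```

The ratio describes typical price levels per unit of each asset, not market capitalization or returns.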