suppressPackageStartupMessages({
  library(dplyr)
  library(ggplot2)
  library(broom)
})

Introduction

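In this report, I analyze a dataset of daily prices for the top 100 cryptocurrencies, using visualization and statistical methods to answer five questions (Q1 to Q5). The data are loaded from a CSV file, and the date and factor columns are typed explicitly so the analysis can be rerun from the same file.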

crypto <- read.csv("/Users/wilmercifuentes/Downloads/top_100_cryptos_with_correct_network.csv") %>%
  mutate(
    date = as.Date(date),
    network = as.factor(network),
    symbol  = as.factor(symbol)
  )

Q1) Load CSV, plot selected columns and briefly explain.

btc <- filter(crypto, symbol == "BTCUSDT")

ggplot(btc, aes(x = date, y = close)) +
  geom_line() +
  labs(
    title = "Bitcoin (BTCUSDT) — Daily Closing Price Over Time",
    x = "Date",
    y = "Closing Price (USD)",
    caption = "Source: top_100_cryptos_with_correct_network.csv"
  ) +
  theme_minimal()

Explanation: This line chart shows the trend in BTC's daily closing price, with the sharp rallies and drawdowns typical of crypto market volatility.

Q2) Do a simple statistical calculation (e.g., mean, sd) and explain

Here I compute the mean and standard deviation of BTC closing prices.

mean_close_btc <- mean(btc$close, na.rm = TRUE)
sd_close_btc   <- sd(btc$close,   na.rm = TRUE)

data.frame(
  metric = c("Mean closing price (USD)", "Standard deviation (USD)"),
  value  = c(round(mean_close_btc, 2), round(sd_close_btc, 2))
)
##                     metric    value
## 1 Mean closing price (USD) 35688.33
## 2 Standard deviation (USD) 28971.93

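Explanation: The mean summarizes the typical BTC price level over the sample period, while the standard deviation measures how widely the price ranged around that level. Here the SD is about 81% of the mean (28,971.93 / 35,688.33 ≈ 0.81), so the price varied enormously over the period. Note that the SD of price levels mostly reflects long-run trends and cycles. Day-to-day volatility is usually measured on daily returns instead.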
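A more conventional volatility measure is the standard deviation of daily log returns, log(P_t / P_{t-1}). The sketch below is a minimal illustration under stated assumptions. The helper name `daily_log_return_sd` is hypothetical, and it assumes the prices are already sorted by date.

```r
# Hypothetical helper: SD of daily log returns, a common volatility measure.
# Assumes `prices` is ordered by date; NA prices are dropped before
# differencing, so a missing day is bridged by a two-day return.
daily_log_return_sd <- function(prices) {
  prices <- prices[!is.na(prices)]
  log_returns <- diff(log(prices))  # log(P_t) - log(P_{t-1})
  sd(log_returns)
}

# Toy check: a constant 10% daily rise has (up to rounding) zero volatility
daily_log_return_sd(c(100, 110, 121))

# Alternating doubling and halving gives large volatility
daily_log_return_sd(c(100, 200, 100))

# Applied to the report's data (assumes `btc` from Q1 is loaded):
# daily_log_return_sd(btc$close[order(btc$date)])
```

Unlike the SD of price levels, this measure does not grow simply because the price level trended upward over the sample.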

Q3) Apply correlation or regression to detect a relationship and explain

I examine the relationship between BTC opening and closing prices using correlation and simple linear regression.

# Correlation
corr_btc <- cor(btc$open, btc$close, use = "complete.obs")

# Regression: close ~ open
fit_btc   <- lm(close ~ open, data = btc)
coef_tbl  <- tidy(fit_btc)
fit_stats <- glance(fit_btc)

list(
  correlation_open_close = round(corr_btc, 4),
  regression_coefficients = coef_tbl,
  model_fit = fit_stats
)
## $correlation_open_close
## [1] 0.999
## 
## $regression_coefficients
## # A tibble: 2 × 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)    31.3  40.3          0.776   0.438
## 2 open            1.00  0.000878  1140.      0    
## 
## $model_fit
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic p.value    df  logLik    AIC    BIC
##       <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>   <dbl>  <dbl>  <dbl>
## 1     0.998         0.998 1285.  1299188.       0     1 -21975. 43955. 43973.
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

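Explanation: The correlation quantifies the strength of the linear association. At 0.999, opening and closing prices move almost in lockstep. The regression slope (about 1.00) gives the change in the close per one-dollar change in the open. The intercept (31.3, p = 0.438) is not significantly different from zero, so the fitted line is essentially close = open. A small slope p-value (e.g., < 0.05) indicates a statistically significant relationship. Here it is printed as 0 because it is smaller than double precision can represent, and R² is 0.998.

This near-perfect fit is expected rather than surprising. In a market that trades around the clock, each day's open is typically at or very near the previous day's close, so both columns track the same slowly moving price level.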
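To see why the fit is so strong, the sketch below simulates a hypothetical random-walk price series in which each day's open equals the previous day's close by construction. It is an illustration under that assumption, not a model of the actual dataset. Even with purely random daily changes, `lm(close ~ open)` returns a slope near 1 and an R² near 1.

```r
# Simulated illustration (not the report's data): a random-walk close price
# and an open that equals the previous day's close by construction.
set.seed(42)
n <- 2000
sim_close <- 100 + cumsum(rnorm(n))      # random walk with N(0, 1) daily steps
sim <- data.frame(
  open  = c(NA, head(sim_close, -1)),    # open_t = close_{t-1}
  close = sim_close
)

sim_fit <- lm(close ~ open, data = sim)  # the row with NA open is dropped
slope   <- unname(coef(sim_fit)["open"])
r_sq    <- summary(sim_fit)$r.squared
c(slope = slope, r_squared = r_sq)       # both near 1

# The intraday change (close - open) is just the random step, so it is
# essentially uncorrelated with the opening level.
change_cor <- cor(sim$open, sim$close - sim$open, use = "complete.obs")
change_cor
```

The analogous check on the real data is `cor(btc$open, btc$close - btc$open, use = "complete.obs")`. A value near zero there would suggest that the open/close correlation mostly reflects the shared price level rather than any predictive relationship.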

Q4) Plot a histogram of a numerical column and discuss its distribution

ggplot(btc, aes(x = close)) +
  geom_histogram(binwidth = 500, fill = "skyblue", color = "black") +
  labs(
    title = "Histogram of Bitcoin Daily Closing Prices",
    x = "Closing Price (USD)",
    y = "Frequency"
  ) +
  theme_minimal()

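Explanation: Each bar counts the days on which BTC closed within a $500 price band. Multi-year BTC price series usually look right-skewed in such a histogram, with many days at lower price levels and a long tail toward the highest prices. Skewness influences the choice of test. For parametric comparisons such as a t-test, a log transform is often applied first because it makes a right-skewed, strictly positive variable more symmetric.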
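Skewness can be checked numerically rather than by eye. The sketch below uses a hypothetical helper, `sample_skewness`, implementing the simple moment estimator in base R. Other definitions apply small-sample corrections, so values from packages may differ slightly. It also shows how a log transform removes skew from values spaced by a constant factor.

```r
# Hypothetical helper: moment-based sample skewness (no extra packages).
# Positive values indicate a long right tail; 0 indicates symmetry.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Prices spaced by a constant factor are right-skewed on the raw scale
# but symmetric after a log transform.
prices <- c(1, 10, 100, 1000)
sample_skewness(prices)        # positive
sample_skewness(log(prices))   # 0, up to rounding

# Applied to the report's data (assumes `btc` from Q1 is loaded):
# sample_skewness(btc$close); sample_skewness(log(btc$close))
```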

Q5) Split into two groups and test for significant differences (consistent with distribution)

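I compare daily closing prices on the Bitcoin network (BTC) with those on the Ethereum network. Note that `network == "Ethereum"` keeps every asset whose network is Ethereum, which may include tokens issued on Ethereum as well as ETH itself. The Ethereum-group mean of about $412 and its mean log price below zero (≈ -0.53, i.e. a geometric mean under $1) suggest that many low-priced tokens are included.

Because prices are skewed and on very different scales, the main comparison is a Welch two-sample t-test on log prices. The raw-price test is shown for completeness. Both tests treat each day as an independent observation, which daily price series are not, so the p-values are best read as descriptive rather than exact.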
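Since the question asks for a test consistent with the distribution, a rank-based alternative is also worth noting. The Wilcoxon rank-sum test (`wilcox.test` in base R's stats package) makes no normality assumption. Because ranks are unchanged by any increasing transformation, it gives the same result on raw and log prices, so no epsilon is needed. The sketch uses small made-up vectors. Like the t-test, it still assumes independent observations.

```r
# Toy vectors (made up for illustration), not the report's data
high_priced <- c(5, 12, 30, 45, 80)
low_priced  <- c(1, 2, 3, 8, 20)

raw_test <- wilcox.test(high_priced, low_priced)
log_test <- wilcox.test(log(high_priced), log(low_priced))

# Same W statistic and p-value: log() preserves the ordering of the values
c(raw_p = raw_test$p.value, log_p = log_test$p.value)

# Applied to the report's data (assumes btc_close and eth_close from Q5):
# wilcox.test(btc_close, eth_close)
```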

btc_close <- crypto %>% filter(network == "Bitcoin")   %>% pull(close)
eth_close <- crypto %>% filter(network == "Ethereum") %>% pull(close)

# Log transform to mitigate skew (tiny epsilon to avoid log(0))

eps <- 1e-8
btc_log <- log(btc_close + eps)
eth_log <- log(eth_close + eps)

# (A) Two-sample t-test on raw prices (scale-dependent)

t.test(btc_close, eth_close, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  btc_close and eth_close
## t = 61.624, df = 2562.2, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  34154.23 36399.27
## sample estimates:
##  mean of x  mean of y 
## 35688.3342   411.5832

# (B) Two-sample t-test on log prices (preferred for skewed data)
t.test(btc_log, eth_log, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  btc_log and eth_log
## t = 476.89, df = 5034.5, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  10.57911 10.66645
## sample estimates:
##  mean of x  mean of y 
## 10.0933900 -0.5293892
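The log-scale estimate in (B) is easiest to interpret after back-transforming. exp(mean log x - mean log y) is the ratio of the two groups' geometric means, and exponentiating the confidence limits gives an interval for that ratio. Applied to the printed output, this gives a ratio of roughly 41,000, with an interval of roughly 39,300 to 42,900. The helper name `geo_mean_ratio` is hypothetical.

```r
# Hypothetical helper: ratio of geometric means of two positive samples,
# i.e. exp(mean(log(x)) - mean(log(y))), ignoring NA values.
geo_mean_ratio <- function(x, y) {
  exp(mean(log(x), na.rm = TRUE) - mean(log(y), na.rm = TRUE))
}

# Back-transform the log-scale results printed by t.test() in (B)
mean_log_btc <- 10.0933900
mean_log_eth <- -0.5293892
ci_log       <- c(10.57911, 10.66645)

exp(mean_log_btc - mean_log_eth)   # ratio of geometric means
exp(ci_log)                        # 95% CI for that ratio

# Computed directly from the data (assumes btc_close, eth_close, eps from Q5):
# geo_mean_ratio(btc_close + eps, eth_close + eps)
```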

Conclusion