Roshan R Naidu (29/03/2026)

Loading and Exploring The Dataset

# Load the dataset
bike_data <- read.csv("/Users/roshannaidu/Desktop/IU Sem 2/Stats 1/bike+sharing+dataset/hour.csv")

# Create our binary outcome variable: High Demand Periods
# We'll define "high demand" as periods where total rentals (cnt) exceed the 75th percentile
demand_threshold <- quantile(bike_data$cnt, 0.75)
bike_data$high_demand <- as.numeric(bike_data$cnt > demand_threshold)

# Display the distribution of our binary outcome
table(bike_data$high_demand)
## 
##     0     1 
## 13053  4326

Binary Variable Selection Rationale

I chose to create a binary variable for “high demand” periods (1 = high demand, 0 = normal/low demand) because: * It helps identify peak usage periods that require additional resource allocation * It’s directly relevant to operational decision-making * Understanding what drives high-demand periods can improve service quality

Building the Logistic Regression Model

I selected three key explanatory variables: 1. Temperature (temp) - normalized temperature values 2. Hour of day (hr) - to capture daily patterns 3. Working day indicator (workingday) - to account for commuting patterns

# Fit logistic regression model
model <- glm(high_demand ~ temp + hr + workingday, 
            data = bike_data, 
            family = binomial(link = "logit"))

# Get model summary
model_summary <- summary(model)

# Display model summary
print(model_summary)
## 
## Call:
## glm(formula = high_demand ~ temp + hr + workingday, family = binomial(link = "logit"), 
##     data = bike_data)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -4.386229   0.081101  -54.08  < 2e-16 ***
## temp         4.458206   0.111067   40.14  < 2e-16 ***
## hr           0.080942   0.002969   27.26  < 2e-16 ***
## workingday  -0.196880   0.041272   -4.77 1.84e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 19504  on 17378  degrees of freedom
## Residual deviance: 16541  on 17375  degrees of freedom
## AIC: 16549
## 
## Number of Fisher Scoring iterations: 5
# Get coefficients table using broom for easier extraction
coef_table <- tidy(model)
print(coef_table)
## # A tibble: 4 × 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)  -4.39     0.0811     -54.1  0        
## 2 temp          4.46     0.111       40.1  0        
## 3 hr            0.0809   0.00297     27.3  1.17e-163
## 4 workingday   -0.197    0.0413      -4.77 1.84e-  6

Interpreting the Coefficients

Let’s interpret each coefficient:

# Extract coefficients and standard errors
temp_coef <- coef_table$estimate[coef_table$term == "temp"]
temp_se <- coef_table$std.error[coef_table$term == "temp"]
hr_coef <- coef_table$estimate[coef_table$term == "hr"]
workingday_coef <- coef_table$estimate[coef_table$term == "workingday"]

# Calculate odds ratios
temp_odds <- exp(temp_coef)
hr_odds <- exp(hr_coef)
workingday_odds <- exp(workingday_coef)
  1. Temperature (temp):
  1. Hour (hr):
  1. Working Day:

Confidence Interval Analysis

Let’s focus on the temperature coefficient, as it shows a strong effect:

# Calculate 95% CI for temperature coefficient
temp_ci_lower <- temp_coef - 1.96 * temp_se
temp_ci_upper <- temp_coef + 1.96 * temp_se

# Convert to odds ratios for easier interpretation
temp_ci_odds_lower <- exp(temp_ci_lower)
temp_ci_odds_upper <- exp(temp_ci_upper)

Interpretation of Confidence Interval:

  • We are 95% confident that the true temperature coefficient lies between 4.241 and 4.676 (in log odds)
  • In terms of odds ratios, we’re 95% confident that a one-unit increase in temperature multiplies the odds of high demand by a factor between 69.44 and 107.33
  • The confidence interval does not include 0 (or 1 for odds ratios), indicating a statistically significant effect
  • Even at the lower bound, temperature has a substantial positive effect on high demand probability

Insights and Significance

  1. Temperature Effect:
  1. Time of Day Impact:
  1. Working Day Effect:

Further Questions to Investigate

  1. Interaction Effects:
  1. Threshold Effects:
  1. Model Improvements:
  1. Operational Implications: