R Markdown


library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(infer)
library(ggplot2)
##bike_data <- read.csv('C:/Users/ADMIN/Downloads/db1bike.csv')
bike_data <- read.csv("D:/dataset/bike.csv")
str(bike_data)
## 'data.frame':    8760 obs. of  14 variables:
##  $ Date                 : chr  "1/12/2017" "1/12/2017" "1/12/2017" "1/12/2017" ...
##  $ Rented.Bike.Count    : int  254 204 173 107 78 100 181 460 930 490 ...
##  $ Hour                 : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ Temperature          : num  -5.2 -5.5 -6 -6.2 -6 -6.4 -6.6 -7.4 -7.6 -6.5 ...
##  $ Humidity             : int  37 38 39 40 36 37 35 38 37 27 ...
##  $ Wind.speed           : num  2.2 0.8 1 0.9 2.3 1.5 1.3 0.9 1.1 0.5 ...
##  $ Visibility           : int  2000 2000 2000 2000 2000 2000 2000 2000 2000 1928 ...
##  $ Dew.point.temperature: num  -17.6 -17.6 -17.7 -17.6 -18.6 -18.7 -19.5 -19.3 -19.8 -22.4 ...
##  $ Solar.Radiation      : num  0 0 0 0 0 0 0 0 0.01 0.23 ...
##  $ Rainfall             : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Snowfall             : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Seasons              : chr  "Winter" "Winter" "Winter" "Winter" ...
##  $ Holiday              : chr  "No Holiday" "No Holiday" "No Holiday" "No Holiday" ...
##  $ Functioning.Day      : chr  "Yes" "Yes" "Yes" "Yes" ...

Null Hypothesis 1: There is no difference in rented bike count between winter and summer seasons

For this hypothesis, I will compare Rented Bike Count between the winter (December-February) and summer (June-August) seasons.

Alpha level: 0.01 Power level: 0.9

Minimum effect size: 150 bikes. I want to be able to detect a difference of at least 150 rented bikes between seasons. I will use a two-sample t-test:

Filter to summer and winter months

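# Date is stored as day/month/year text, so parse it before extracting the month
bike_data <- bike_data %>%
  mutate(month = lubridate::month(lubridate::dmy(Date)))
summer <- bike_data %>% 
  filter(month %in% 6:8)
# 12:2 counts down from 12 to 2, so list the winter months explicitly
winter <- bike_data %>% 
  filter(month %in% c(12, 1, 2))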
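
To see why the winter months need to be listed explicitly, here is a small base R sketch on made-up dates written in the same day/month/year format as the Date column. The `dates` vector is illustrative only and is not taken from the data set.

```r
# Toy dates in the same day/month/year text format as the Date column
dates <- c("1/12/2017", "15/1/2018", "28/2/2018", "1/6/2018", "30/11/2018")

# Parse explicitly as day/month/year, then extract the month number
parsed <- as.Date(dates, format = "%d/%m/%Y")
month  <- as.integer(format(parsed, "%m"))
month
# 12  1  2  6 11

# 12:2 is the descending sequence 12, 11, ..., 2, so it misses January
# and wrongly matches June and November
month %in% 12:2
# TRUE FALSE  TRUE  TRUE  TRUE

# Listing the months selects December, January and February only
month %in% c(12, 1, 2)
# TRUE  TRUE  TRUE FALSE FALSE
```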

T-test

t.test(Rented.Bike.Count ~ Seasons, data = rbind(summer, winter))

The p-value is < 0.001, which is below the alpha level of 0.01, so we reject the null hypothesis. There is a significant difference in rented bike count between summer and winter.

Under Fisher's significance-testing interpretation, the small p-value is itself evidence against the null hypothesis.
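
The decision rule used here can be sketched on simulated data. The group means, standard deviations and sample sizes below are hypothetical and chosen only for illustration; they are not estimates from the bike data. The sketch compares the p-value with the alpha level of 0.01 and checks whether the 99% confidence interval for the difference lies above the 150-bike minimum effect size.

```r
set.seed(1)

# Simulated counts; all means, SDs and sample sizes here are hypothetical
toy <- data.frame(
  Seasons = rep(c("Summer", "Winter"), each = 200),
  Rented.Bike.Count = c(rnorm(200, mean = 1000, sd = 600),
                        rnorm(200, mean = 250,  sd = 150))
)

# Welch two-sample t-test (the default for t.test), with a 99% interval
# to match alpha = 0.01
res <- t.test(Rented.Bike.Count ~ Seasons, data = toy, conf.level = 0.99)

alpha      <- 0.01
min_effect <- 150

reject_null    <- res$p.value < alpha
# Groups are ordered alphabetically, so the interval is for Summer minus Winter
exceeds_effect <- res$conf.int[1] > min_effect

reject_null
exceeds_effect
```

Checking the interval against the minimum effect size separates "statistically significant" from "large enough to matter", which the p-value alone does not do.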

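The data are hourly, so the t-test compares 2,208 summer observations (92 days) with 2,160 winter observations (90 days), not 92 per season. Samples of this size should provide enough power to detect a difference of 150 bikes at 0.9 power, although the exact power depends on the standard deviation of the counts.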
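
The power claim can be checked with `power.t.test()` from base R's stats package. This is a minimal sketch: `sd_guess` is a hypothetical standard deviation, and `n_per_group` is an illustrative per-group size based on 90 days of hourly rows. In practice `sd_guess` would be replaced by an estimate such as `sd(bike_data$Rented.Bike.Count)`.

```r
# Hypothetical standard deviation of hourly counts, not taken from the data
sd_guess <- 600

# Per-group sample size needed to detect a 150-bike difference
# at alpha = 0.01 with power = 0.9
needed <- power.t.test(delta = 150, sd = sd_guess,
                       sig.level = 0.01, power = 0.9)
ceiling(needed$n)

# Power achieved for a given per-group sample size (illustrative value)
n_per_group <- 2160
achieved <- power.t.test(n = n_per_group, delta = 150, sd = sd_guess,
                         sig.level = 0.01)
achieved$power
```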

Visualize by season

# Visualize by season
bike_data %>%
  ggplot(aes(x = Seasons, y = Rented.Bike.Count)) +
  geom_boxplot() +
  labs(title = "Rented Bike Count by Season",
       x = "Season",
       y = "Rented Bike Count")

The boxplots show higher rented bike counts in summer compared to winter.

Null Hypothesis 2: There is no difference in the probability of rain between seasons

For this hypothesis, I will compare the probability of rain between seasons using a chi-squared test.

Alpha level: 0.05 Power level: 0.8 Minimum effect size: 20% difference in probabilities.

bike_data <- bike_data %>%
  mutate(rain = ifelse(Rainfall > 0, "Rain", "No Rain"))

# Contingency table
xtabs(~ rain + Seasons, bike_data) %>%
  chisq.test()
## 
##  Pearson's Chi-squared test
## 
## data:  .
## X-squared = 133.12, df = 3, p-value < 2.2e-16

The p-value is < 2.2e-16, far below 0.05, so we reject the null hypothesis. There is strong evidence that the probability of rain differs between seasons.
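
Because the p-value says nothing about the size of the association, it can help to turn the chi-squared statistic into an effect size. The sketch below computes Cramér's V from the reported output. It assumes that all 8760 rows entered the 2 x 4 table (two rain categories, four seasons), which is consistent with df = 3.

```r
chi_sq <- 133.12  # X-squared from the chisq.test output
n_obs  <- 8760    # rows in bike_data, assumed to have no missing values
n_rows <- 2       # rain categories: "Rain", "No Rain"
n_cols <- 4       # seasons

# Cramér's V = sqrt(X^2 / (n * min(rows - 1, cols - 1)))
cramers_v <- sqrt(chi_sq / (n_obs * min(n_rows - 1, n_cols - 1)))
round(cramers_v, 3)
# 0.123
```

A value of about 0.12 is a weak association by common conventions, so a very small p-value here mostly reflects the large number of observations.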

Under Fisher's significance-testing interpretation, a p-value this small is strong evidence against the null hypothesis.

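The test uses all 8,760 hourly observations, not 365 daily ones, which should provide enough power to detect a 20% probability difference at 0.8 power. Consecutive hours are not independent, however, so the effective sample size is smaller than the row count.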
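
A rough check of this power claim is possible with `power.prop.test()` from base R. Note that this is a simplification: it compares two proportions, not the four seasons tested by the chi-squared test. The probabilities `p_low` and `p_high` are hypothetical values 20 percentage points apart and are not estimated from the data.

```r
# Hypothetical rain probabilities in two seasons, 20 percentage points apart
p_low  <- 0.05
p_high <- 0.25

# Per-group sample size needed at alpha = 0.05 and power = 0.8
prop_power <- power.prop.test(p1 = p_low, p2 = p_high,
                              sig.level = 0.05, power = 0.8)
ceiling(prop_power$n)
```

The required size changes with the baseline probability, so the calculation is worth repeating with proportions observed in the data.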

Visualize rain proportions

bike_data %>% 
  group_by(Seasons) %>%
  summarize(prop_rain = mean(Rainfall > 0)) %>%
  ggplot(aes(x = Seasons, y = prop_rain)) +
  geom_col() +
  labs(title = "Proportion of Hourly Observations with Rain by Season",
       x = "Season",
       y = "Proportion with Rain")

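The chart shows the proportion of hourly observations with rain in each season. The chi-squared test above indicates that these proportions are not equal across seasons.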

In summary, the hypothesis tests provide evidence to reject both null hypotheses: rented bike count differs between summer and winter, and the probability of rain differs across seasons.