###GENERALIZED LINEAR MODEL

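library(stats)    # provides lm(); attached by default
library(dplyr)    # filter(), mutate()
library(ggplot2)  # ggplot(), geom_point(), geom_smooth()

# Read the data (adjust the file path to where bike_data.csv is stored)
bike_data <- read.csv('D:/FALL 2023/STATISTICS/datasets/bike_data.csv')

# Recode Holiday as a 0/1 indicator: 1 = "Holiday", 0 = otherwise
bike_data$Holiday <- as.numeric(bike_data$Holiday == "Holiday")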

# Create the linear regression model with Temperature as the explanatory variable
model <- lm(Rented.Bike.Count ~ Temperature, data = bike_data)

# Summarize the model
summary(model)
## 
## Call:
## lm(formula = Rented.Bike.Count ~ Temperature, data = bike_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1100.60  -336.57   -49.69   233.81  2525.19 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 329.9525     8.5411   38.63   <2e-16 ***
## Temperature  29.0811     0.4862   59.82   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 543.5 on 8758 degrees of freedom
## Multiple R-squared:   0.29,  Adjusted R-squared:   0.29 
## F-statistic:  3578 on 1 and 8758 DF,  p-value: < 2.2e-16
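
The quantities printed by `summary()` can also be pulled out programmatically, which is handy when they are reused in plot labels or tables. The following is a minimal sketch on simulated data, since the original CSV is not available here; the data frame `toy` and its generating values are assumptions made purely for illustration and are not the bike-sharing data.

```r
# Simulated stand-in for the bike data (illustrative values only)
set.seed(1)
toy <- data.frame(Temperature = runif(200, min = -10, max = 35))
toy$Rented.Bike.Count <- 330 + 29 * toy$Temperature + rnorm(200, sd = 500)

fit <- lm(Rented.Bike.Count ~ Temperature, data = toy)

# Point estimates of the intercept and slope
coef(fit)

# 95% confidence intervals for both coefficients
confint(fit, level = 0.95)

# Proportion of variance explained by the model
summary(fit)$r.squared
```

`confint()` gives a range for the slope rather than a single number, which is often a more honest way to report the temperature effect than the point estimate alone.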

###EVALUATION

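# Fit the same model on holidays only (Holiday == 1)
model1 <- lm(Rented.Bike.Count ~ Temperature,
             data = filter(bike_data, Holiday == 1))

# Use the holiday-only fit (model1) so the R-squared matches the plotted data
rsquared <- summary(model1)$r.squared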

bike_data |> 
  filter( Holiday == 1 ) |>
  ggplot(mapping = aes(x = Temperature, 
                       y = Rented.Bike.Count)) +
  geom_point() +
  geom_smooth(method = 'lm', color = 'gray', linetype = 'dashed', 
              se = FALSE) +
  geom_smooth(se = FALSE) +
  labs(title = "Rented Bike Count vs. Temperature",
       subtitle = paste("Linear Fit R-Squared =", round(rsquared, 3))) +
  theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
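
The R-squared used in the evaluation is the share of the response's variance that the linear fit explains, that is, one minus the ratio of the residual sum of squares to the total sum of squares. A small sketch on simulated data (the `toy` data frame is an assumption for illustration, not the bike data) shows that computing it by hand agrees with `summary()`:

```r
# Simulated data (illustrative only)
set.seed(42)
toy <- data.frame(x = seq(0, 30, length.out = 100))
toy$y <- 300 + 30 * toy$x + rnorm(100, sd = 400)

fit <- lm(y ~ x, data = toy)

# R-squared = 1 - SS_residual / SS_total
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((toy$y - mean(toy$y))^2)
r2_manual <- 1 - ss_res / ss_tot

# Compare with the value reported by summary()
all.equal(r2_manual, summary(fit)$r.squared)
```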

Let us apply a square-root transformation to the explanatory variable, as suggested by our lambda value: \(\lambda = 0.501\), which is positive and close to 0.5.
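
The step from a lambda value to a transformation follows the usual ladder of powers: a lambda near 1 means no transformation, near 0.5 a square root, near 0 a logarithm, and so on. The source does not show how lambda was estimated, so the sketch below only covers the lookup step; `suggest_transform()` is a hypothetical helper written for illustration, not a function from any package.

```r
# Hypothetical helper: map an estimated lambda to the nearest rung of the
# ladder of power transformations.
suggest_transform <- function(lambda) {
  ladder <- c("reciprocal (1/x)" = -1,
              "log(x)"           = 0,
              "sqrt(x)"          = 0.5,
              "none (x)"         = 1,
              "square (x^2)"     = 2)
  # Pick the rung whose power is closest to the estimated lambda
  names(ladder)[which.min(abs(ladder - lambda))]
}

suggest_transform(0.501)  # "sqrt(x)"
suggest_transform(0.97)   # "none (x)"
```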

bike_data <- bike_data |>
  mutate(Temperature_sqrt = sqrt(Temperature))  # add new variable
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Temperature_sqrt = sqrt(Temperature)`.
## Caused by warning in `sqrt()`:
## ! NaNs produced
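# Regress on the square-root term only, matching the dashed linear fit plotted below
# (rows with NaN in Temperature_sqrt are dropped by lm())
model_sqrt <- lm(Rented.Bike.Count ~ Temperature_sqrt,
                 data = filter(bike_data, Holiday == 1))

rsquared <- summary(model_sqrt)$r.squared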

bike_data |> 
  filter(Holiday == 1) |>
  ggplot(mapping = aes(x = sqrt(Temperature), 
                       y = Rented.Bike.Count)) +
  geom_point() +
  geom_smooth(method = 'lm', color = 'gray', linetype = 'dashed',
              se = FALSE) +
  geom_smooth(se = FALSE) +
  labs(title = "Rented.Bike.Count vs. sqrt(Temperature)",
       subtitle = paste("Linear Fit R-Squared =", round(rsquared, 3))) +
  theme_classic()
## Warning in sqrt(Temperature): NaNs produced
## Warning in sqrt(Temperature): NaNs produced

## Warning in sqrt(Temperature): NaNs produced

## Warning in sqrt(Temperature): NaNs produced
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 294 rows containing non-finite values (`stat_smooth()`).
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## Warning: Removed 294 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 294 rows containing missing values (`geom_point()`).
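
The `NaNs produced` warnings and the 294 removed rows arise because `sqrt()` is undefined for negative temperatures. Two common workarounds are sketched below on made-up values: shifting the variable so its minimum is zero, or taking a signed square root. Both are assumptions about what might suit this analysis, not something the original report does, and either one changes how the fitted coefficient should be read.

```r
# Made-up temperatures, including sub-zero values (illustrative only)
temperature <- c(-17.8, -5.2, 0, 4.1, 13.7, 27.3)

# Naive square root: negative inputs become NaN
naive <- suppressWarnings(sqrt(temperature))
sum(is.nan(naive))  # 2

# Option 1: shift so the smallest value maps to zero, then take the root
shift <- -min(temperature)
shifted_sqrt <- sqrt(temperature + shift)

# Option 2: signed square root keeps the sign of the original value
signed_sqrt <- sign(temperature) * sqrt(abs(temperature))
```

Both options keep every row and preserve the ordering of the temperatures, so no observations are silently dropped from the plot or the fit.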

The scatter plot for the square-root transformation is approximately monotonic. In conclusion: "If \(\lambda \approx 1\), no transformation is needed."

###INTERPRETATION

For each one-degree increase in temperature, we can expect an average increase of approximately 29.08 rented bikes (coefficient estimate 29.0811), assuming all other factors remain constant.

This implies that as the temperature rises, more people are likely to rent bikes, which is a positive relationship between temperature and the number of rented bikes. It’s important to note that this interpretation assumes a linear relationship between temperature and bike rentals. The coefficient represents the estimated change in the response variable based on the dataset and model used.
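
To make the interpretation concrete, the fitted line can be evaluated directly from the coefficients reported in the summary (intercept 329.9525, slope 29.0811). This is a small sketch; `predict_count()` is a hypothetical helper name, and predictions outside the observed temperature range should be treated with caution.

```r
# Coefficients copied from the model summary above
intercept <- 329.9525
slope     <- 29.0811

# Hypothetical helper: predicted rented-bike count at a given temperature
predict_count <- function(temperature) {
  intercept + slope * temperature
}

predict_count(20)                      # 911.5745
predict_count(11) - predict_count(10)  # equals the slope: one extra degree
```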