bike_data <- read.csv('D:/FALL 2023/STATISTICS/datasets/bike_data.csv')
bike_data$Holiday <- as.numeric(bike_data$Holiday == "Holiday")
View(bike_data)
# Fit the logistic regression model
model <- glm(Holiday ~ Temperature + Humidity + Wind.speed, data = bike_data, family = "binomial")
# Print the model coefficients
coef(model)
## (Intercept) Temperature Humidity Wind.speed
## -1.550233366 -0.000523561 -0.003637841 0.056958168
summary(model)
##
## Call:
## glm(formula = Holiday ~ Temperature + Humidity + Wind.speed,
## family = "binomial", data = bike_data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.5502334 0.1205530 -12.859 <2e-16 ***
## Temperature -0.0005236 0.0024563 -0.213 0.8312
## Humidity -0.0036378 0.0015487 -2.349 0.0188 *
## Wind.speed 0.0569582 0.0293593 1.940 0.0524 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 7671.1 on 8759 degrees of freedom
## Residual deviance: 7656.7 on 8756 degrees of freedom
## AIC: 7664.7
##
## Number of Fisher Scoring iterations: 4
Intercept: Represents the log-odds of a holiday (Holiday = 1) when all predictors are 0.
Temperature: The negative coefficient indicates that as temperature increases, the log-odds of a holiday decrease. However, the coefficient is very small and not statistically significant (p = 0.8312).
Humidity: The negative coefficient indicates that as humidity increases, the log-odds of a holiday decrease. The coefficient is small per unit of humidity, but it is statistically significant at the 5% level (p = 0.0188).
Wind.speed: The positive coefficient indicates that as wind speed increases, the log-odds of a holiday increase. The effect is only marginally significant (p = 0.0524).
To summarize, temperature has no detectable effect on the log-odds, humidity has a small but significant negative effect, and wind speed has a small, marginally significant positive effect. The residual deviance (7656.7) is only slightly below the null deviance (7671.1), so the three predictors together explain little of the variation in the outcome.
We would need to exponentiate the coefficients to get odds ratios, and compute predicted probabilities, to fully assess the model. Based on the coefficients alone, we can see only the direction and relative scale of each relationship.
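As a minimal sketch of the two follow-up steps mentioned above, the coefficients can be exponentiated into odds ratios and passed through the logistic function to get a predicted probability. The coefficient values are copied from the `coef(model)` output; the observation in `new_obs` (temperature 20, humidity 50, wind speed 2) is a made-up example, not a row from the data.

```r
# Coefficients copied from the coef(model) output above
coefs <- c("(Intercept)" = -1.550233366,
           "Temperature" = -0.000523561,
           "Humidity"    = -0.003637841,
           "Wind.speed"  =  0.056958168)

# Odds ratios: multiplicative change in the odds of a holiday
# for a one-unit increase in each predictor (intercept excluded)
odds_ratios <- exp(coefs[-1])

# Hypothetical observation (assumed values, for illustration only);
# the leading 1 multiplies the intercept
new_obs <- c(1, 20, 50, 2)

log_odds     <- sum(coefs * new_obs)
prob_holiday <- plogis(log_odds)  # inverse logit: 1 / (1 + exp(-log_odds))

print(odds_ratios)
print(prob_holiday)
```

With the fitted object available, `exp(coef(model))` and `predict(model, newdata = ..., type = "response")` should give the same quantities without copying numbers by hand.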
Confidence Interval = Coefficient ± Critical Value × Standard Error
Lower Bound = 0.056958168 − 1.96 × 0.0293593
Upper Bound = 0.056958168 + 1.96 × 0.0293593
Calculating the values:
Lower Bound ≈ −0.0006, Upper Bound ≈ 0.1145
Interpretation: The 95% confidence interval for the “Wind.speed” coefficient is approximately (−0.0006, 0.1145).
This means that the interval comes from a procedure that captures the true coefficient about 95% of the time: if we were to repeatedly sample from the population and build an interval this way each time, approximately 95% of those intervals would contain the true value of the “Wind.speed” coefficient. Because the interval includes 0, the effect is not significant at the 5% level, which agrees with the p-value of 0.0524.
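The interval arithmetic can be checked in a few lines of R. This is a sketch under the normal (Wald) approximation, using the estimate and standard error copied from the `summary(model)` output; the variable names are illustrative.

```r
# Estimate and standard error copied from the summary(model) output above
estimate  <- 0.056958168
std_error <- 0.0293593

# Critical value for a two-sided 95% interval under the normal approximation
z_crit <- qnorm(0.975)

# Coefficient plus/minus critical value times standard error
wald_ci <- estimate + c(-1, 1) * z_crit * std_error
names(wald_ci) <- c("lower", "upper")

print(round(wald_ci, 4))
```

With the fitted object available, `confint.default(model, "Wind.speed")` computes the same kind of Wald interval. `confint(model)` on a `glm` uses profile likelihood instead, so its bounds may differ slightly.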
### Linear relation between variables
library(dplyr)    # filter(), mutate()
library(ggplot2)  # ggplot(), geom_point(), geom_smooth()

# Linear fit of bike rentals on temperature, holidays only
model1 <- lm(Rented.Bike.Count ~ Temperature,
             data = filter(bike_data, Holiday == 1))
rsquared <- summary(model1)$r.squared  # use model1, not the logistic model
bike_data |>
  filter(Holiday == 1) |>
  ggplot(mapping = aes(x = Temperature,
                       y = Rented.Bike.Count)) +
  geom_point() +
  geom_smooth(method = 'lm', color = 'gray', linetype = 'dashed',
              se = FALSE) +
  geom_smooth(se = FALSE) +
  labs(title = "Temperature vs. Rented.Bike.Count",
       subtitle = paste("Linear Fit R-Squared =", round(rsquared, 3))) +
  theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
The scatter plot shows the relation between Temperature and Rented.Bike.Count, which is slightly non-linear. We might therefore apply a polynomial transformation to make the relationship more linear; to do that, we add the square of Temperature as a predictor.
bike_data <- bike_data |>
mutate(Temperature_2 = Temperature ^ 2) # add new variable
model <- lm(Rented.Bike.Count ~ Temperature_2 + Temperature,
filter(bike_data, Holiday == 1))
rsquared <- summary(model)$r.squared
bike_data |>
filter(Holiday == 1) |>
ggplot(mapping = aes(x = Temperature^2,
y = Rented.Bike.Count)) +
geom_point() +
geom_smooth(method = 'lm', color = 'gray', linetype = 'dashed',
se = FALSE) +
geom_smooth(se = FALSE) +
labs(title = "Rented.Bike.Count vs. (Temperature) ^ 2",
subtitle = paste("Linear Fit R-Squared =", round(rsquared, 3))) +
theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
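To judge whether a squared term is worth keeping, one option is to compare the linear and quadratic fits directly. The sketch below uses simulated data (the `sim` data frame and its coefficients are invented for illustration, not the bike data). R-squared can never decrease when a term is added, so the F-test from `anova()` is included as a fairer check.

```r
set.seed(42)

# Simulated data (hypothetical, not the bike data): a curved relationship
sim <- data.frame(x = runif(200, min = -10, max = 30))
sim$y <- 500 + 20 * sim$x + 1.5 * sim$x^2 + rnorm(200, sd = 100)

fit_linear    <- lm(y ~ x, data = sim)
fit_quadratic <- lm(y ~ x + I(x^2), data = sim)  # I() keeps ^ as arithmetic

r2_linear    <- summary(fit_linear)$r.squared
r2_quadratic <- summary(fit_quadratic)$r.squared

# F-test: does the squared term improve the fit beyond chance?
comparison <- anova(fit_linear, fit_quadratic)

print(c(linear = r2_linear, quadratic = r2_quadratic))
print(comparison)
```

Writing the squared term as `I(x^2)` inside the formula avoids creating a separate column with `mutate()`, though both approaches fit the same model.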
Let us use a square-root transformation, since our lambda value (\(\lambda = 0.501\)) is close to 0.5.
bike_data <- bike_data |>
mutate(Temperature_sqrt = sqrt(Temperature)) # add new variable
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Temperature_sqrt = sqrt(Temperature)`.
## Caused by warning in `sqrt()`:
## ! NaNs produced
model <- lm(Rented.Bike.Count ~ Temperature_sqrt + Temperature,
filter(bike_data, Holiday == 1))
rsquared <- summary(model)$r.squared
bike_data |>
filter(Holiday == 1) |>
ggplot(mapping = aes(x = sqrt(Temperature),
y = Rented.Bike.Count)) +
geom_point() +
geom_smooth(method = 'lm', color = 'gray', linetype = 'dashed',
se = FALSE) +
geom_smooth(se = FALSE) +
labs(title = "Rented.Bike.Count vs. sqrt(Temperature)",
subtitle = paste("Linear Fit R-Squared =", round(rsquared, 3))) +
theme_classic()
## Warning in sqrt(Temperature): NaNs produced
## Warning in sqrt(Temperature): NaNs produced
## Warning in sqrt(Temperature): NaNs produced
## Warning in sqrt(Temperature): NaNs produced
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 294 rows containing non-finite values (`stat_smooth()`).
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## Warning: Removed 294 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 294 rows containing missing values (`geom_point()`).
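The warnings above arise because `sqrt()` returns `NaN` for negative inputs, and ggplot2 then drops those rows. The sketch below shows the behavior on a few made-up temperatures and two possible ways to handle it; which one is appropriate for the bike data is a judgment call the analysis would need to make.

```r
# Hypothetical temperatures in degrees Celsius (not taken from the data)
temperature <- c(-5.2, -0.4, 0, 4.1, 16.3)

# sqrt() of a negative number is NaN (with a warning), which is why rows get dropped
raw_sqrt <- suppressWarnings(sqrt(temperature))

# Option 1: keep only non-negative temperatures before transforming
sqrt_nonneg <- sqrt(temperature[temperature >= 0])

# Option 2: shift so the minimum becomes 0, then transform
# (this changes the meaning of the variable, so interpret with care)
sqrt_shifted <- sqrt(temperature - min(temperature))

print(raw_sqrt)
print(sqrt_nonneg)
print(sqrt_shifted)
```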
The scatter plot for the square-root transformation appears approximately monotonic, although the 294 rows whose square root is undefined are excluded from it. In conclusion: “If \(\lambda \approx 1\), no transformation is needed.”
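The report does not show how the lambda value of 0.501 was obtained. One common route is a Box-Cox profile likelihood, sketched here with `MASS::boxcox()` on simulated data (the `sim` data frame is invented so that the square root of the response is linear in the predictor). Box-Cox requires a strictly positive response.

```r
set.seed(1)

# Simulated data (hypothetical): sqrt(y) is linear in x, so lambda should land near 0.5
sim <- data.frame(x = runif(200, min = 1, max = 10))
sim$y <- (2 + 0.5 * sim$x + rnorm(200, sd = 0.2))^2

# Profile log-likelihood over a grid of lambda values (no plot drawn)
bc <- MASS::boxcox(y ~ x, data = sim,
                   lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# The lambda with the highest log-likelihood is the point estimate
lambda_hat <- bc$x[which.max(bc$y)]
print(lambda_hat)
```

An estimate near 1 suggests leaving the response as it is, near 0.5 suggests a square root, and near 0 suggests a logarithm.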