rmarkdown_week10

Including Plots


# Fit the logistic regression model
model <- glm(Holiday ~ Temperature + Humidity + Wind.speed, data = bike_data, family = "binomial")

# Print the model coefficients
coef(model)

##  (Intercept)  Temperature     Humidity   Wind.speed 
## -1.550233366 -0.000523561 -0.003637841  0.056958168

summary(model)

## 
## Call:
## glm(formula = Holiday ~ Temperature + Humidity + Wind.speed, 
##     family = "binomial", data = bike_data)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.5502334  0.1205530 -12.859   <2e-16 ***
## Temperature -0.0005236  0.0024563  -0.213   0.8312    
## Humidity    -0.0036378  0.0015487  -2.349   0.0188 *  
## Wind.speed   0.0569582  0.0293593   1.940   0.0524 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 7671.1  on 8759  degrees of freedom
## Residual deviance: 7656.7  on 8756  degrees of freedom
## AIC: 7664.7
## 
## Number of Fisher Scoring iterations: 4

Interpreting these coefficients

Intercept: The log-odds of the outcome (Holiday = 1) when all predictors are 0.

Temperature: The coefficient is negative, so as temperature increases, the log-odds of the outcome decrease, by about 0.0005 per unit. The effect is tiny and not statistically significant (p = 0.83).

Humidity: The coefficient is negative, so as humidity increases, the log-odds of the outcome decrease, by about 0.0036 per unit. The per-unit effect is small, but it is statistically significant at the 5% level (p = 0.0188).

Wind.speed: The coefficient is positive, so as wind speed increases, the log-odds of the outcome increase, by about 0.057 per unit. This effect is only marginally significant (p = 0.0524).

To summarize, temperature has a negligible effect on the log-odds, humidity has a small but significant negative effect, and wind speed has a modest positive effect that just misses the 5% level. The intercept gives the baseline log-odds when all predictors are 0.

To express these effects as odds ratios, we would need to exponentiate the coefficients, and to fully assess the model we would compute predicted probabilities. From the coefficients alone, though, we can already see the direction of each relationship and its size per unit of the predictor.
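Both steps can be sketched in R using the coefficients printed by coef(model) above. This is a minimal illustration, not part of the original analysis: the "new day" values (Temperature = 20, Humidity = 50, Wind.speed = 2) are hypothetical and chosen only to show the arithmetic. With the fitted glm still stored in model, the equivalent calls would be exp(coef(model)) and predict(model, newdata = data.frame(Temperature = 20, Humidity = 50, Wind.speed = 2), type = "response").

```r
# Coefficients copied from the coef(model) output above
b <- c("(Intercept)" = -1.550233366,
       Temperature   = -0.000523561,
       Humidity      = -0.003637841,
       Wind.speed    =  0.056958168)

# Odds ratios: multiplicative change in the odds per one-unit increase
odds_ratios <- exp(b)
print(round(odds_ratios, 4))

# Baseline probability when all predictors are 0 (inverse logit of the intercept)
baseline_p <- plogis(b[["(Intercept)"]])

# Hypothetical day, values for illustration only; the 1 multiplies the intercept
new_day <- c("(Intercept)" = 1, Temperature = 20, Humidity = 50, Wind.speed = 2)

# Linear predictor (log-odds), then convert to a probability
eta <- sum(b * new_day[names(b)])
p   <- plogis(eta)   # same as 1 / (1 + exp(-eta))
print(round(c(baseline = baseline_p, new_day = p), 3))
```

For example, the Wind.speed odds ratio is a little above 1, meaning each extra unit of wind speed multiplies the odds of a holiday by roughly 1.06 with the other predictors held fixed.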

Confidence interval

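\[ \text{Confidence interval} = \hat{\beta} \pm z_{0.975} \times \mathrm{SE}(\hat{\beta}), \qquad z_{0.975} \approx 1.96 \]

For Wind.speed, \(\hat{\beta} = 0.056958168\) and \(\mathrm{SE}(\hat{\beta}) = 0.0293593\). Calculating the values:

\[ \text{Lower bound} = 0.056958168 - 1.96 \times 0.0293593 \approx -0.000586 \]

\[ \text{Upper bound} = 0.056958168 + 1.96 \times 0.0293593 \approx 0.114502 \]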

Interpretation: The 95% confidence interval for the Wind.speed coefficient is approximately (-0.00059, 0.1145). Because the interval contains 0, the Wind.speed effect is not statistically significant at the 5% level, which agrees with its p-value of 0.0524.

We can say that we are 95% confident that the true Wind.speed coefficient lies within this interval. More precisely, the 95% describes the procedure: if we repeatedly sampled from the population and built an interval this way each time, about 95% of those intervals would contain the true coefficient.
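The interval arithmetic can be checked in R. This sketch copies the Wind.speed estimate and standard error from the summary(model) output and uses qnorm(0.975) (about 1.959964) rather than the rounded 1.96, so the last digits differ very slightly from the hand calculation. For the fitted glm itself, confint.default(model) returns these Wald intervals for every coefficient, while confint(model) uses profile likelihood and can give slightly different bounds.

```r
# Wind.speed estimate and standard error, copied from summary(model) above
est <- 0.056958168
se  <- 0.0293593

# Two-sided 95% Wald interval: estimate +/- z * SE
z  <- qnorm(0.975)
ci <- est + c(-1, 1) * z * se
print(round(ci, 6))

# The interval just includes 0, matching a two-sided p-value just above 0.05
z_value <- est / se
p_value <- 2 * pnorm(-abs(z_value))
print(round(c(z = z_value, p = p_value), 4))

# On the odds-ratio scale, exponentiate the endpoints
print(round(exp(ci), 4))
```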

Linear relation between variables

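library(dplyr)    # filter(), mutate()
library(ggplot2)  # ggplot(), geom_point(), geom_smooth()

# Simple linear fit of rentals on temperature, holiday days only
model1 <- lm(Rented.Bike.Count ~ Temperature,
             data = filter(bike_data, Holiday == 1))

# R-squared must come from model1: the glm stored in `model` has no r.squared
rsquared <- summary(model1)$r.squared

bike_data |> 
  filter(Holiday == 1) |>
  ggplot(mapping = aes(x = Temperature, 
                       y = Rented.Bike.Count)) +
  geom_point() +
  geom_smooth(method = 'lm', color = 'gray', linetype = 'dashed', 
              se = FALSE) +   # straight-line fit
  geom_smooth(se = FALSE) +   # flexible smoother, to reveal curvature
  labs(title = "Temperature vs. Bikes Rented",
       subtitle = paste("Linear Fit R-Squared =", round(rsquared, 3))) +
  theme_classic()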

## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

Interpretation

The scatter plot shows the relationship between Temperature and Rented.Bike.Count on holidays, and it looks somewhat non-linear. A polynomial transformation may capture it better, so next we add the square of Temperature as an extra predictor.
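As an aside, the squared term does not have to be stored as a new column first. The sketch below uses R's built-in cars data (not the bike data, which is not reproduced here) to show that I(x^2) inside the formula and poly(x, 2, raw = TRUE) give the same fit. Because the linear model is nested inside the quadratic one, the quadratic R-squared can never be lower, so an F-test via anova() is a fairer comparison than R-squared alone.

```r
# Built-in `cars` data, used only to illustrate the technique
fit_linear    <- lm(dist ~ speed, data = cars)
fit_quadratic <- lm(dist ~ speed + I(speed^2), data = cars)

# poly(raw = TRUE) builds the same raw terms: speed and speed^2
fit_poly <- lm(dist ~ poly(speed, 2, raw = TRUE), data = cars)

r2_linear    <- summary(fit_linear)$r.squared
r2_quadratic <- summary(fit_quadratic)$r.squared
print(round(c(linear = r2_linear, quadratic = r2_quadratic), 4))

# F-test: does the squared term improve the fit beyond chance?
print(anova(fit_linear, fit_quadratic))
```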

# Add a squared temperature term
bike_data <- bike_data |>
  mutate(Temperature_2 = Temperature ^ 2)

# Quadratic model: Temperature plus Temperature^2 (holiday days only)
model <- lm(Rented.Bike.Count ~ Temperature_2 + Temperature,
            data = filter(bike_data, Holiday == 1))

rsquared <- summary(model)$r.squared

bike_data |> 
  filter(Holiday == 1) |>
  ggplot(mapping = aes(x = Temperature_2, 
                       y = Rented.Bike.Count)) +
  geom_point() +
  geom_smooth(method = 'lm', color = 'gray', linetype = 'dashed',
              se = FALSE) +
  geom_smooth(se = FALSE) +
  labs(title = "Rented.Bike.Count vs. Temperature^2",
       subtitle = paste("Quadratic Model R-Squared =", round(rsquared, 3))) +
  theme_classic()

## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

Interpretation

Next, let us try a square-root transformation, based on our estimated power \(\lambda = 0.501\): this value is positive and very close to 0.5, the power that corresponds to a square root.
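For reference, a power near 0.5 points to the square root because of the Box-Cox family of power transforms, which for positive y is \((y^\lambda - 1)/\lambda\), or \(\log y\) when \(\lambda = 0\). The sketch below implements that formula; box_cox is a hypothetical helper, not part of the original analysis, and the document does not show how \(\lambda = 0.501\) was estimated (MASS::boxcox() is one function that profiles the likelihood over \(\lambda\) for a linear model's response).

```r
# Hypothetical helper: Box-Cox power transform for strictly positive y
box_cox <- function(y, lambda) {
  if (any(y <= 0)) stop("Box-Cox requires strictly positive values")
  if (abs(lambda) < 1e-8) {
    log(y)                    # lambda = 0 is defined as the log
  } else {
    (y^lambda - 1) / lambda   # lambda = 0.5 gives a rescaled square root
  }
}

y <- c(1, 4, 9, 16)
half <- box_cox(y, 0.5)   # equals 2 * (sqrt(y) - 1)
none <- box_cox(y, 1)     # equals y - 1: a shift only, shape unchanged
logs <- box_cox(y, 0)     # equals log(y)
print(half)
```

The \(\lambda = 1\) case is why \(\lambda \approx 1\) is read as "no transformation needed": it only shifts the values, so the shape of the relationship does not change.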

# sqrt() returns NaN (with a warning) for negative temperatures;
# those rows are later dropped by lm() and ggplot()
bike_data <- bike_data |>
  mutate(Temperature_sqrt = sqrt(Temperature))

## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Temperature_sqrt = sqrt(Temperature)`.
## Caused by warning in `sqrt()`:
## ! NaNs produced

model <- lm(Rented.Bike.Count ~ Temperature_sqrt + Temperature,
            filter(bike_data, Holiday == 1))

rsquared <- summary(model)$r.squared

bike_data |> 
  filter(Holiday == 1) |>
  ggplot(mapping = aes(x = sqrt(Temperature), 
                       y = Rented.Bike.Count)) +
  geom_point() +
  geom_smooth(method = 'lm', color = 'gray', linetype = 'dashed',
              se = FALSE) +
  geom_smooth(se = FALSE) +
  labs(title = "Rented.Bike.Count vs. sqrt(Temperature)",
       subtitle = paste("Square-Root Model R-Squared =", round(rsquared, 3))) +
  theme_classic()

## Warning in sqrt(Temperature): NaNs produced

## Warning in sqrt(Temperature): NaNs produced

## Warning in sqrt(Temperature): NaNs produced

## Warning in sqrt(Temperature): NaNs produced

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 294 rows containing non-finite values (`stat_smooth()`).

## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

## Warning: Removed 294 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 294 rows containing missing values (`geom_point()`).

Interpretation

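With the square-root transformation, the scatter plot looks roughly monotonic. Note that sqrt() is undefined for negative temperatures, so 294 holiday observations are dropped from this plot and fit. As a rule of thumb, \(\lambda \approx 1\) would mean that no transformation is needed; our \(\lambda \approx 0.5\) is why the square root was tried instead.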
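One way to keep the sub-zero days, sketched here as an assumption rather than what the original analysis did, is to shift the variable so its minimum becomes 0 before taking the square root. The vector temps below is a toy example, not taken from bike_data.

```r
# Toy temperatures including sub-zero values (illustrative only)
temps <- c(-10, -2.5, 0, 4, 21)

# Plain sqrt() gives NaN for negatives, which is why rows were dropped above
naive <- suppressWarnings(sqrt(temps))
n_lost <- sum(is.nan(naive))

# Possible workaround: shift so the smallest value is 0, then take the root
shift <- -min(temps)
shifted_sqrt <- sqrt(temps + shift)
print(round(shifted_sqrt, 3))
```

Applied to the bike data, this would look something like mutate(Temperature_sqrt = sqrt(Temperature - min(Temperature))), computed over the full data set. The trade-off is interpretability: the new term is the square root of degrees above the observed minimum, not of the raw temperature.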



roshan

2023-10-26
