R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the written content and the output of any embedded R code chunks. You can embed an R code chunk like this:

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
bike_data <- read_csv('D:/dataset/db1bike.csv')
## Rows: 199 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): Date, Seasons, Holiday, Functioning Day
## dbl (10): Rented_Bike_Count, Hour, Temperature, Humidity, Wind_speed, Visibi...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(bike_data)
## spc_tbl_ [199 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Date                 : chr [1:199] "01-12-2017" "01-12-2017" "01-12-2017" "01-12-2017" ...
##  $ Rented_Bike_Count    : num [1:199] 254 204 173 107 78 100 181 460 930 490 ...
##  $ Hour                 : num [1:199] 0 1 2 3 4 5 6 7 8 9 ...
##  $ Temperature          : num [1:199] -5.2 -5.5 -6 -6.2 -6 -6.4 -6.6 -7.4 -7.6 -6.5 ...
##  $ Humidity             : num [1:199] 37 38 39 40 36 37 35 38 37 27 ...
##  $ Wind_speed           : num [1:199] 2.2 0.8 1 0.9 2.3 1.5 1.3 0.9 1.1 0.5 ...
##  $ Visibility           : num [1:199] 2000 2000 2000 2000 2000 ...
##  $ Dew point temperature: num [1:199] -17.6 -17.6 -17.7 -17.6 -18.6 -18.7 -19.5 -19.3 -19.8 -22.4 ...
##  $ Solar Radiation      : num [1:199] 0 0 0 0 0 0 0 0 0.01 0.23 ...
##  $ Rainfall             : num [1:199] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Snowfall             : num [1:199] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Seasons              : chr [1:199] "Winter" "Winter" "Winter" "Winter" ...
##  $ Holiday              : chr [1:199] "No Holiday" "No Holiday" "No Holiday" "No Holiday" ...
##  $ Functioning Day      : chr [1:199] "Yes" "Yes" "Yes" "Yes" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Date = col_character(),
##   ..   Rented_Bike_Count = col_double(),
##   ..   Hour = col_double(),
##   ..   Temperature = col_double(),
##   ..   Humidity = col_double(),
##   ..   Wind_speed = col_double(),
##   ..   Visibility = col_double(),
##   ..   `Dew point temperature` = col_double(),
##   ..   `Solar Radiation` = col_double(),
##   ..   Rainfall = col_double(),
##   ..   Snowfall = col_double(),
##   ..   Seasons = col_character(),
##   ..   Holiday = col_character(),
##   ..   `Functioning Day` = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>

The following linear regression models the rented bike count as a function of temperature, humidity, and wind speed:

bike_lm <- lm(Rented_Bike_Count ~ Temperature + Humidity + Wind_speed, data = bike_data)

summary(bike_lm)
## 
## Call:
## lm(formula = Rented_Bike_Count ~ Temperature + Humidity + Wind_speed, 
##     data = bike_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -306.04  -99.08  -43.99   84.28  660.89 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 561.3730    48.8856  11.483  < 2e-16 ***
## Temperature  15.2569     3.3376   4.571 8.60e-06 ***
## Humidity     -3.5854     0.6343  -5.653 5.54e-08 ***
## Wind_speed   -7.3326    11.1647  -0.657    0.512    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 175 on 195 degrees of freedom
## Multiple R-squared:  0.1754, Adjusted R-squared:  0.1627 
## F-statistic: 13.82 on 3 and 195 DF,  p-value: 3.279e-08
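
The summary table reports point estimates only; confidence intervals often make the coefficients easier to interpret. The following is a minimal sketch, not part of the original analysis. It uses the built-in `mtcars` data as a stand-in for `bike_data` (an assumption, so the block runs without the CSV file), and the names `fit`, `coef_table`, and `r_squared` are illustrative.

```r
# Stand-in model: mtcars replaces bike_data so the block runs on its own.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# 95% confidence intervals for each coefficient
ci <- confint(fit, level = 0.95)

# Coefficient matrix from summary(): estimate, std. error, t value, p-value
coefs <- coef(summary(fit))

# Combine estimates, interval bounds, and p-values into one table
coef_table <- cbind(
  Estimate = coefs[, "Estimate"],
  ci,
  p_value = coefs[, "Pr(>|t|)"]
)
print(round(coef_table, 4))

# Share of variance explained by the model
r_squared <- summary(fit)$r.squared
print(r_squared)
```

With the bike model, the same calls would apply to `bike_lm` directly, for example `confint(bike_lm)`.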

To diagnose the linear regression model, I will use the following tools:

Residual Plots

par(mfrow = c(1,2))
plot(bike_lm)

The residuals vs. fitted plot shows the residuals scattered randomly around 0, with no clear pattern across the fitted values. This suggests that the linearity and constant-variance assumptions are reasonably met.

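The Q-Q plot shows the residuals generally following the normal reference line, with some deviation at the tails. This indicates the assumption of normally distributed errors is reasonably met, although the residual summary is asymmetric (minimum -306.04, median -43.99, maximum 660.89), so the upper tail deserves a closer look.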
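
Visual checks can be complemented with formal tests. The sketch below is an illustration under stated assumptions rather than part of the original analysis: it uses the built-in `mtcars` data as a stand-in for `bike_data`, runs a Shapiro-Wilk test on the residuals, and computes the studentized (Koenker) form of the Breusch-Pagan test by hand in base R so that no extra package is required. All object names are illustrative.

```r
# Stand-in model: mtcars replaces bike_data so the block runs on its own.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
res <- residuals(fit)

# Normality of residuals: Shapiro-Wilk test
sw <- shapiro.test(res)
print(sw$p.value)

# Constant variance: studentized (Koenker) Breusch-Pagan test.
# Regress the squared residuals on the same predictors; under
# homoscedasticity, n * R^2 of this auxiliary regression is approximately
# chi-squared with df equal to the number of predictors.
aux_data <- transform(mtcars, sq_res = res^2)
aux <- lm(sq_res ~ wt + hp + qsec, data = aux_data)

bp_stat <- nobs(fit) * summary(aux)$r.squared
bp_df <- length(coef(fit)) - 1
bp_p <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)
print(c(statistic = bp_stat, df = bp_df, p_value = bp_p))
```

Small p-values would point to non-normal residuals or non-constant variance, respectively. For the bike model, the predictors in the auxiliary regression would be `Temperature`, `Humidity`, and `Wind_speed`.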

Influence Plots

car::influencePlot(bike_lm)

##       StudRes        Hat        CookD
## 9   3.7308679 0.02177967 0.0726631306
## 81  3.9659967 0.02458816 0.0921634190
## 86 -0.1360456 0.05372922 0.0002640557
## 88 -0.5113523 0.06502233 0.0045634122

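The influence plot flags four observations. Observations 86 and 88 have the highest leverage (hat values of about 0.054 and 0.065) but small studentized residuals, while observations 9 and 81 have large studentized residuals (3.73 and 3.97) at low leverage. The largest Cook's distance is about 0.09, far below 1, so none of these cases appears to be strongly influential.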
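
The three quantities reported by `car::influencePlot()` can also be computed in base R and screened against rules of thumb. The sketch below assumes the built-in `mtcars` data as a stand-in for `bike_data`; the cutoffs used (hat value above 2p/n, absolute studentized residual above 2, Cook's distance above 4/n) are common screening conventions, not fixed rules, and other cutoffs are equally defensible.

```r
# Stand-in model: mtcars replaces bike_data so the block runs on its own.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

n <- nobs(fit)          # number of observations
p <- length(coef(fit))  # number of coefficients, including the intercept

# Same three measures that car::influencePlot() reports
infl <- data.frame(
  StudRes = rstudent(fit),
  Hat     = hatvalues(fit),
  CookD   = cooks.distance(fit)
)

# Screening heuristics (assumed conventions, adjust as needed)
high_leverage <- infl$Hat > 2 * p / n
large_resid   <- abs(infl$StudRes) > 2
high_cook     <- infl$CookD > 4 / n

# Observations flagged by at least one heuristic
flagged <- infl[high_leverage | large_resid | high_cook, ]
print(flagged)
```

Flagged rows are candidates for inspection, not automatic removal; refitting the model without them shows how much the coefficients actually move.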

Collinearity Diagnostics

car::vif(bike_lm)
## Temperature    Humidity  Wind_speed 
##    1.189676    1.322395    1.166072

The VIF values are all below 2 (the largest is 1.32, for Humidity), well under the commonly used thresholds of 5 or 10, indicating there are no issues with multicollinearity in the model.
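
For a model with only numeric predictors, the VIF of predictor j equals 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on the remaining ones. The sketch below reproduces this calculation in base R, again on the built-in `mtcars` data as a stand-in for `bike_data`; the name `vif_manual` is illustrative.

```r
# Stand-in model: mtcars replaces bike_data so the block runs on its own.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Predictor matrix without the intercept column
X <- model.matrix(fit)[, -1, drop = FALSE]

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing predictor j on the others
vif_manual <- sapply(colnames(X), function(v) {
  others <- X[, colnames(X) != v, drop = FALSE]
  r2 <- summary(lm(X[, v] ~ others))$r.squared
  1 / (1 - r2)
})
print(vif_manual)

# Cross-check: the VIFs are the diagonal of the inverse correlation matrix
vif_check <- diag(solve(cor(X)))
print(all.equal(unname(vif_manual), unname(vif_check)))
```

A VIF of 1 means a predictor is uncorrelated with the others; larger values indicate that its coefficient variance is inflated by collinearity.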

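Overall, the diagnostics do not highlight any major issues with this linear regression model. The assumptions appear reasonably met and there are no highly influential cases. Note, however, that the model explains only about 17.5% of the variance in rental counts (multiple R-squared of 0.1754) and that Wind_speed is not statistically significant (p = 0.512).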