This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
bike_data <- read_csv('D:/dataset/db1bike.csv')
## Rows: 199 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Date, Seasons, Holiday, Functioning Day
## dbl (10): Rented_Bike_Count, Hour, Temperature, Humidity, Wind_speed, Visibi...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(bike_data)
## spc_tbl_ [199 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Date : chr [1:199] "01-12-2017" "01-12-2017" "01-12-2017" "01-12-2017" ...
## $ Rented_Bike_Count : num [1:199] 254 204 173 107 78 100 181 460 930 490 ...
## $ Hour : num [1:199] 0 1 2 3 4 5 6 7 8 9 ...
## $ Temperature : num [1:199] -5.2 -5.5 -6 -6.2 -6 -6.4 -6.6 -7.4 -7.6 -6.5 ...
## $ Humidity : num [1:199] 37 38 39 40 36 37 35 38 37 27 ...
## $ Wind_speed : num [1:199] 2.2 0.8 1 0.9 2.3 1.5 1.3 0.9 1.1 0.5 ...
## $ Visibility : num [1:199] 2000 2000 2000 2000 2000 ...
## $ Dew point temperature: num [1:199] -17.6 -17.6 -17.7 -17.6 -18.6 -18.7 -19.5 -19.3 -19.8 -22.4 ...
## $ Solar Radiation : num [1:199] 0 0 0 0 0 0 0 0 0.01 0.23 ...
## $ Rainfall : num [1:199] 0 0 0 0 0 0 0 0 0 0 ...
## $ Snowfall : num [1:199] 0 0 0 0 0 0 0 0 0 0 ...
## $ Seasons : chr [1:199] "Winter" "Winter" "Winter" "Winter" ...
## $ Holiday : chr [1:199] "No Holiday" "No Holiday" "No Holiday" "No Holiday" ...
## $ Functioning Day : chr [1:199] "Yes" "Yes" "Yes" "Yes" ...
## - attr(*, "spec")=
## .. cols(
## .. Date = col_character(),
## .. Rented_Bike_Count = col_double(),
## .. Hour = col_double(),
## .. Temperature = col_double(),
## .. Humidity = col_double(),
## .. Wind_speed = col_double(),
## .. Visibility = col_double(),
## .. `Dew point temperature` = col_double(),
## .. `Solar Radiation` = col_double(),
## .. Rainfall = col_double(),
## .. Snowfall = col_double(),
## .. Seasons = col_character(),
## .. Holiday = col_character(),
## .. `Functioning Day` = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
Here is a linear regression model using the bike rental data:
bike_lm <- lm(Rented_Bike_Count ~ Temperature + Humidity + Wind_speed, data = bike_data)
summary(bike_lm)
##
## Call:
## lm(formula = Rented_Bike_Count ~ Temperature + Humidity + Wind_speed,
## data = bike_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -306.04 -99.08 -43.99 84.28 660.89
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 561.3730 48.8856 11.483 < 2e-16 ***
## Temperature 15.2569 3.3376 4.571 8.60e-06 ***
## Humidity -3.5854 0.6343 -5.653 5.54e-08 ***
## Wind_speed -7.3326 11.1647 -0.657 0.512
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 175 on 195 degrees of freedom
## Multiple R-squared: 0.1754, Adjusted R-squared: 0.1627
## F-statistic: 13.82 on 3 and 195 DF, p-value: 3.279e-08
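The summary above reports standard errors but not confidence intervals, which can make the coefficients easier to read. For instance, the Wind_speed estimate (-7.33) is smaller in magnitude than its standard error (11.16), so its interval spans zero. The following is a minimal sketch of how such a table could be built with base R. The helper name `coef_table` is hypothetical, and the built-in `mtcars` data is only a stand-in because the bike CSV is not bundled with this document.

```r
# Hypothetical helper: coefficient estimates with confidence intervals.
coef_table <- function(model, level = 0.95) {
  est <- coef(summary(model))             # matrix: Estimate, Std. Error, t value, Pr(>|t|)
  ci  <- confint(model, level = level)    # matrix: lower and upper bounds
  data.frame(
    term      = rownames(est),
    estimate  = est[, "Estimate"],
    lower     = ci[, 1],
    upper     = ci[, 2],
    p_value   = est[, "Pr(>|t|)"],
    row.names = NULL
  )
}

# Stand-in model on built-in data (an assumption for illustration only).
demo_fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
demo_tab <- coef_table(demo_fit)
print(demo_tab)

# With the model from this document the call would be: coef_table(bike_lm)
```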
To diagnose the linear regression model, I will use the following tools:
Residual Plots
par(mfrow = c(1, 2))
plot(bike_lm, which = 1:2)  # 1 = residuals vs fitted, 2 = normal Q-Q
The residuals vs. fitted plot shows the residuals scattered randomly around 0, with no clear pattern across the fitted values. This suggests that the assumptions of linearity and constant variance are reasonably met.
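A visual check of constant variance can be complemented by a formal test. The sketch below computes the studentized (Koenker) form of the Breusch-Pagan statistic with base R only: the squared residuals are regressed on the model's predictors, and n times the R-squared of that auxiliary regression is compared with a chi-squared distribution. The helper name `bp_check` is hypothetical, and `mtcars` is used purely as stand-in data.

```r
# Hypothetical helper: studentized (Koenker) Breusch-Pagan test, base R only.
bp_check <- function(model) {
  e2  <- residuals(model)^2               # squared residuals
  X   <- model.matrix(model)              # design matrix, intercept included
  aux <- lm.fit(X, e2)                    # auxiliary regression of e^2 on the predictors
  r2  <- 1 - sum(aux$residuals^2) / sum((e2 - mean(e2))^2)
  stat <- length(e2) * r2                 # LM statistic: n * R^2
  df   <- ncol(X) - 1                     # number of predictors (intercept excluded)
  c(statistic = stat, df = df,
    p_value = pchisq(stat, df, lower.tail = FALSE))
}

# Stand-in model on built-in data (an assumption for illustration only).
demo_fit  <- lm(mpg ~ wt + hp + qsec, data = mtcars)
bp_result <- bp_check(demo_fit)
print(bp_result)

# With the model from this document the call would be: bp_check(bike_lm)
```

A small p-value would point to non-constant variance; a large one is consistent with the visual impression but does not prove homoscedasticity.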
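The Q-Q plot shows the residuals generally following the reference line, with some deviation in the tails. This indicates that the assumption of normally distributed errors is reasonably met, although the asymmetric residual range in the model summary (minimum -306.04, maximum 660.89) suggests some right skew.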
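The Q-Q reading can be backed up with a Shapiro-Wilk test on the residuals, which is available in base R as `shapiro.test()`. This is only a sketch: the wrapper name `normality_check` is hypothetical, and `mtcars` stands in for the bike data.

```r
# Hypothetical wrapper around stats::shapiro.test() for model residuals.
normality_check <- function(model) {
  sw <- shapiro.test(residuals(model))    # valid for 3 to 5000 observations
  c(W = unname(sw$statistic), p_value = sw$p.value)
}

# Stand-in model on built-in data (an assumption for illustration only).
demo_fit  <- lm(mpg ~ wt + hp + qsec, data = mtcars)
sw_result <- normality_check(demo_fit)
print(sw_result)

# With the model from this document the call would be: normality_check(bike_lm)
```

A W statistic close to 1 with a large p-value is consistent with normal errors. With a few hundred observations the test can flag mild tail deviations, so it is best read alongside the Q-Q plot rather than instead of it.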
Influence Plots
car::influencePlot(bike_lm)
## StudRes Hat CookD
## 9 3.7308679 0.02177967 0.0726631306
## 81 3.9659967 0.02458816 0.0921634190
## 86 -0.1360456 0.05372922 0.0002640557
## 88 -0.5113523 0.06502233 0.0045634122
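The influence plot flags two high-leverage points (observations 86 and 88, with hat values of about 0.054 and 0.065) and two observations with large studentized residuals (9 and 81, about 3.7 and 4.0), but no major influential cases: the largest Cook's distance is about 0.09.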
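The same quantities can be tabulated with base R and compared against explicit cutoffs. The sketch below flags leverage above 2p/n (a common rule of thumb; for this model, 2 × 4 / 199 ≈ 0.040) and Cook's distance above 0.5, which matches the lower contour level drawn by `plot.lm`. Both cutoffs are conventions rather than hard limits, and stricter ones such as 4/n exist. The helper name `influence_table` is hypothetical, and `mtcars` is stand-in data.

```r
# Hypothetical helper: per-observation influence measures with simple flags.
influence_table <- function(model, cook_cutoff = 0.5) {
  n <- nobs(model)
  p <- length(coef(model))                # estimated coefficients, intercept included
  out <- data.frame(
    student_res = rstudent(model),        # externally studentized residuals
    hat         = hatvalues(model),       # leverage
    cooks_d     = cooks.distance(model)
  )
  out$high_leverage <- out$hat > 2 * p / n
  out$influential   <- out$cooks_d > cook_cutoff
  out
}

# Stand-in model on built-in data (an assumption for illustration only).
demo_fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
infl     <- influence_table(demo_fit)
print(infl[infl$high_leverage | infl$influential, ])

# With the model from this document the call would be: influence_table(bike_lm)
```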
Collinearity Diagnostics
car::vif(bike_lm)
## Temperature Humidity Wind_speed
## 1.189676 1.322395 1.166072
The VIF values are all below 2 (the largest is 1.32, for Humidity), indicating no meaningful multicollinearity among the predictors.
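To make the VIF figures less of a black box: for a predictor j, the VIF is 1 / (1 - R²_j), where R²_j comes from regressing that predictor on all the others. The sketch below reproduces this by hand in base R for a model with numeric predictors. The helper name `manual_vif` is hypothetical, and `mtcars` is stand-in data.

```r
# Hypothetical helper: VIF from first principles, for numeric predictors.
manual_vif <- function(model) {
  X <- model.matrix(model)
  X <- X[, colnames(X) != "(Intercept)", drop = FALSE]
  vapply(colnames(X), function(v) {
    others <- X[, colnames(X) != v, drop = FALSE]
    r2 <- summary(lm(X[, v] ~ others))$r.squared   # R^2 of predictor v on the rest
    1 / (1 - r2)
  }, numeric(1))
}

# Stand-in model on built-in data (an assumption for illustration only).
demo_fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
demo_vif <- manual_vif(demo_fit)
print(demo_vif)

# With the model from this document the call would be: manual_vif(bike_lm)
```

A VIF of 1 means the predictor is uncorrelated with the others; values grow as the predictor becomes more predictable from the rest.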
Overall, the diagnostics do not highlight any major issues with this linear regression model. The assumptions appear to be reasonably met, and there are no highly influential cases.