Bike sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user can easily rent a bike at one location and return it at another. Currently, there are over 500 bike-sharing programs around the world, comprising over 500 thousand bicycles. Today, there is great interest in these systems due to their important role in traffic, environmental, and health issues.
Apart from the interesting real-world applications of bike sharing systems, the characteristics of the data they generate make them attractive for research. Unlike other transport services such as bus or subway, the duration of travel and the departure and arrival positions are explicitly recorded in these systems. This feature turns a bike sharing system into a virtual sensor network that can be used for sensing mobility in the city. Hence, it is expected that most important events in the city could be detected by monitoring these data.
I downloaded the Bike Sharing Dataset from the UCI Machine Learning Repository.
rm(list = ls())
day <- read.csv("~/Desktop/UCSC//UCSC Data Analysis - R/Project/Bike-Sharing-Dataset/day.csv")
bk_sh_dy <- day
head(bk_sh_dy)
## instant dteday season yr mnth holiday weekday workingday weathersit
## 1 1 2011-01-01 1 0 1 0 6 0 2
## 2 2 2011-01-02 1 0 1 0 0 0 2
## 3 3 2011-01-03 1 0 1 0 1 1 1
## 4 4 2011-01-04 1 0 1 0 2 1 1
## 5 5 2011-01-05 1 0 1 0 3 1 1
## 6 6 2011-01-06 1 0 1 0 4 1 1
## temp atemp hum windspeed casual registered cnt
## 1 0.344167 0.363625 0.805833 0.1604460 331 654 985
## 2 0.363478 0.353739 0.696087 0.2485390 131 670 801
## 3 0.196364 0.189405 0.437273 0.2483090 120 1229 1349
## 4 0.200000 0.212122 0.590435 0.1602960 108 1454 1562
## 5 0.226957 0.229270 0.436957 0.1869000 82 1518 1600
## 6 0.204348 0.233209 0.518261 0.0895652 88 1518 1606
dim(bk_sh_dy)
## [1] 731 16
names(bk_sh_dy) ## names of the columns
## [1] "instant" "dteday" "season" "yr" "mnth"
## [6] "holiday" "weekday" "workingday" "weathersit" "temp"
## [11] "atemp" "hum" "windspeed" "casual" "registered"
## [16] "cnt"
anyNA(bk_sh_dy) ## Checking for missing values (is.null() only tests whether the object itself is NULL)
## [1] FALSE
is.integer(bk_sh_dy)
## [1] FALSE
bk_sh_dy<- data.frame(bk_sh_dy)
str(bk_sh_dy)
## 'data.frame': 731 obs. of 16 variables:
## $ instant : int 1 2 3 4 5 6 7 8 9 10 ...
## $ dteday : Factor w/ 731 levels "2011-01-01","2011-01-02",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ season : int 1 1 1 1 1 1 1 1 1 1 ...
## $ yr : int 0 0 0 0 0 0 0 0 0 0 ...
## $ mnth : int 1 1 1 1 1 1 1 1 1 1 ...
## $ holiday : int 0 0 0 0 0 0 0 0 0 0 ...
## $ weekday : int 6 0 1 2 3 4 5 6 0 1 ...
## $ workingday: int 0 0 1 1 1 1 1 0 0 1 ...
## $ weathersit: int 2 2 1 1 1 1 2 2 1 1 ...
## $ temp : num 0.344 0.363 0.196 0.2 0.227 ...
## $ atemp : num 0.364 0.354 0.189 0.212 0.229 ...
## $ hum : num 0.806 0.696 0.437 0.59 0.437 ...
## $ windspeed : num 0.16 0.249 0.248 0.16 0.187 ...
## $ casual : int 331 131 120 108 82 88 148 68 54 41 ...
## $ registered: int 654 670 1229 1454 1518 1518 1362 891 768 1280 ...
## $ cnt : int 985 801 1349 1562 1600 1606 1510 959 822 1321 ...
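A note on checking for missing values: `is.null()` reports only whether an object itself is `NULL`, so it returns `FALSE` for any data frame, even one full of `NA`s. The sketch below uses a small made-up data frame (`toy` is hypothetical, not part of the dataset) to show how `anyNA()` and `colSums(is.na())` inspect the cells instead.

```r
# Hypothetical toy data frame with one missing value (not from day.csv)
toy <- data.frame(cnt = c(985L, NA, 1349L), temp = c(0.34, 0.36, 0.20))

is.null(toy)   # FALSE: the object exists; this says nothing about NAs
anyNA(toy)     # TRUE: at least one cell is missing

# Number of missing cells in each column
na_per_column <- colSums(is.na(toy))
na_per_column
```

On the real data, `colSums(is.na(bk_sh_dy))` would give the same per-column breakdown.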
instant: record index
dteday: date
season: season (1:spring, 2:summer, 3:fall, 4:winter)
yr: year (0: 2011, 1:2012)
mnth: month ( 1 to 12)
holiday: whether the day is a holiday or not (extracted from http://dchr.dc.gov/page/holiday-schedule)
weekday: day of the week (0: Sunday to 6: Saturday)
workingday: 1 if the day is neither a weekend nor a holiday, otherwise 0
weathersit:
1: Clear, Few clouds, Partly cloudy
2: Mist and Cloudy, Mist and Broken clouds, Mist and Few clouds, Mist
3: Light Snow, Light Rain and Thunderstorm and Scattered clouds, Light Rain and Scattered clouds
4: Heavy Rain and Ice Pellets and Thunderstorm and Mist, Snow and Fog
temp: Normalized temperature in Celsius. The values are divided by 41 (max)
atemp: Normalized feeling temperature in Celsius. The values are divided by 50 (max)
hum: Normalized humidity. The values are divided by 100 (max)
windspeed: Normalized wind speed. The values are divided by 67 (max)
casual: count of casual users
registered: count of registered users
cnt: count of total rental bikes including both casual and registered
We created new attributes that convert the normalized values back to their original scale, since the normalized values are small and hard to interpret, and we converted the categorical attributes to factors.
actual_temp: Converted normalized temperature in Celsius
actual_windspeed: Converted normalized windspeed
actual_humidity: Converted normalized humidity
actual_feel_temp: Converted normalized feeling temperature in Celsius
mean_acttemp_feeltemp: Mean of the actual temperature and the feeling temperature
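As a quick sanity check on these conversions, the sketch below applies the scaling constants from the data dictionary (41, 50, 100, 67) to the first row of the dataset as printed by `head()`. The helper name `denormalize` is made up for illustration.

```r
# Hypothetical helper: multiply a normalized value by its documented maximum
denormalize <- function(x, max_value) x * max_value

# Normalized values from the first row of day.csv (2011-01-01)
first_day <- c(temp = 0.344167, atemp = 0.363625,
               hum = 0.805833, windspeed = 0.160446)

actual <- c(
  actual_temp      = denormalize(first_day[["temp"]], 41),
  actual_feel_temp = denormalize(first_day[["atemp"]], 50),
  actual_humidity  = denormalize(first_day[["hum"]], 100),
  actual_windspeed = denormalize(first_day[["windspeed"]], 67)
)
round(actual, 2)  # 14.11 18.18 80.58 10.75
```

These values agree with the first entries of the new columns in the `str()` output further down.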
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(corrplot)
library(ggplot2)
library(stats)
bk_sh_dy$season <- factor(bk_sh_dy$season,
                          levels = 1:4, labels = c("Spring","Summer","Fall","Winter"))
table(bk_sh_dy$season)
##
## Spring Summer Fall Winter
## 181 184 188 178
bk_sh_dy$holiday <- factor(bk_sh_dy$holiday,
                           levels = 0:1, labels = c("Working Day","Holiday"))
table(bk_sh_dy$holiday)
##
## Working Day Holiday
## 710 21
bk_sh_dy$weathersit <- factor(bk_sh_dy$weathersit,
                              levels = 1:4,
                              labels = c("Good:Clear/Sunny","Moderate:Cloudy/Mist","Bad: Rain/Snow/Fog","Worse: Heavy Rain/Snow/Fog"))
table(bk_sh_dy$weathersit)
##
## Good:Clear/Sunny Moderate:Cloudy/Mist
## 463 247
## Bad: Rain/Snow/Fog Worse: Heavy Rain/Snow/Fog
## 21 0
bk_sh_dy$yr <- factor(bk_sh_dy$yr,
                      levels = 0:1, labels = c("2011","2012"))
table(bk_sh_dy$yr)
##
## 2011 2012
## 365 366
bk_sh_dy$actual_temp <- bk_sh_dy$temp*41
bk_sh_dy$actual_feel_temp <- bk_sh_dy$atemp*50
bk_sh_dy$actual_windspeed <- bk_sh_dy$windspeed*67
bk_sh_dy$actual_humidity <- bk_sh_dy$hum*100
bk_sh_dy$mean_acttemp_feeltemp <- (bk_sh_dy$actual_temp+bk_sh_dy$actual_feel_temp)/2
str(bk_sh_dy)
## 'data.frame': 731 obs. of 21 variables:
## $ instant : int 1 2 3 4 5 6 7 8 9 10 ...
## $ dteday : Factor w/ 731 levels "2011-01-01","2011-01-02",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ season : Factor w/ 4 levels "Spring","Summer",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ yr : Factor w/ 2 levels "2011","2012": 1 1 1 1 1 1 1 1 1 1 ...
## $ mnth : int 1 1 1 1 1 1 1 1 1 1 ...
## $ holiday : Factor w/ 2 levels "Working Day",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ weekday : int 6 0 1 2 3 4 5 6 0 1 ...
## $ workingday : int 0 0 1 1 1 1 1 0 0 1 ...
## $ weathersit : Factor w/ 4 levels "Good:Clear/Sunny",..: 2 2 1 1 1 1 2 2 1 1 ...
## $ temp : num 0.344 0.363 0.196 0.2 0.227 ...
## $ atemp : num 0.364 0.354 0.189 0.212 0.229 ...
## $ hum : num 0.806 0.696 0.437 0.59 0.437 ...
## $ windspeed : num 0.16 0.249 0.248 0.16 0.187 ...
## $ casual : int 331 131 120 108 82 88 148 68 54 41 ...
## $ registered : int 654 670 1229 1454 1518 1518 1362 891 768 1280 ...
## $ cnt : int 985 801 1349 1562 1600 1606 1510 959 822 1321 ...
## $ actual_temp : num 14.11 14.9 8.05 8.2 9.31 ...
## $ actual_feel_temp : num 18.18 17.69 9.47 10.61 11.46 ...
## $ actual_windspeed : num 10.7 16.7 16.6 10.7 12.5 ...
## $ actual_humidity : num 80.6 69.6 43.7 59 43.7 ...
## $ mean_acttemp_feeltemp: num 16.15 16.29 8.76 9.4 10.38 ...
summary(bk_sh_dy)
## instant dteday season yr mnth
## Min. : 1.0 2011-01-01: 1 Spring:181 2011:365 Min. : 1.00
## 1st Qu.:183.5 2011-01-02: 1 Summer:184 2012:366 1st Qu.: 4.00
## Median :366.0 2011-01-03: 1 Fall :188 Median : 7.00
## Mean :366.0 2011-01-04: 1 Winter:178 Mean : 6.52
## 3rd Qu.:548.5 2011-01-05: 1 3rd Qu.:10.00
## Max. :731.0 2011-01-06: 1 Max. :12.00
## (Other) :725
## holiday weekday workingday
## Working Day:710 Min. :0.000 Min. :0.000
## Holiday : 21 1st Qu.:1.000 1st Qu.:0.000
## Median :3.000 Median :1.000
## Mean :2.997 Mean :0.684
## 3rd Qu.:5.000 3rd Qu.:1.000
## Max. :6.000 Max. :1.000
##
## weathersit temp atemp
## Good:Clear/Sunny :463 Min. :0.05913 Min. :0.07907
## Moderate:Cloudy/Mist :247 1st Qu.:0.33708 1st Qu.:0.33784
## Bad: Rain/Snow/Fog : 21 Median :0.49833 Median :0.48673
## Worse: Heavy Rain/Snow/Fog: 0 Mean :0.49538 Mean :0.47435
## 3rd Qu.:0.65542 3rd Qu.:0.60860
## Max. :0.86167 Max. :0.84090
##
## hum windspeed casual registered
## Min. :0.0000 Min. :0.02239 Min. : 2.0 Min. : 20
## 1st Qu.:0.5200 1st Qu.:0.13495 1st Qu.: 315.5 1st Qu.:2497
## Median :0.6267 Median :0.18097 Median : 713.0 Median :3662
## Mean :0.6279 Mean :0.19049 Mean : 848.2 Mean :3656
## 3rd Qu.:0.7302 3rd Qu.:0.23321 3rd Qu.:1096.0 3rd Qu.:4776
## Max. :0.9725 Max. :0.50746 Max. :3410.0 Max. :6946
##
## cnt actual_temp actual_feel_temp actual_windspeed
## Min. : 22 Min. : 2.424 Min. : 3.953 Min. : 1.500
## 1st Qu.:3152 1st Qu.:13.820 1st Qu.:16.892 1st Qu.: 9.042
## Median :4548 Median :20.432 Median :24.337 Median :12.125
## Mean :4504 Mean :20.311 Mean :23.718 Mean :12.763
## 3rd Qu.:5956 3rd Qu.:26.872 3rd Qu.:30.430 3rd Qu.:15.625
## Max. :8714 Max. :35.328 Max. :42.045 Max. :34.000
##
## actual_humidity mean_acttemp_feeltemp
## Min. : 0.00 Min. : 3.189
## 1st Qu.:52.00 1st Qu.:15.251
## Median :62.67 Median :22.347
## Mean :62.79 Mean :22.014
## 3rd Qu.:73.02 3rd Qu.:28.664
## Max. :97.25 Max. :38.413
##
h <- hist(bk_sh_dy$cnt, breaks = 25, ylab = 'Frequency of Rental', xlab = 'Total Bike Rental Count', main = 'Distribution of Total Bike Rental Count', col = 'blue' )
xfit <- seq(min(bk_sh_dy$cnt),max(bk_sh_dy$cnt), length = 50)
yfit <- dnorm(xfit, mean =mean(bk_sh_dy$cnt),sd=sd(bk_sh_dy$cnt))
yfit <- yfit*diff(h$mids[1:2])*length(bk_sh_dy$cnt)
lines(xfit,yfit, col='red', lwd= 3)
First, we examined how the response variable, Total Bike Rentals (cnt), is distributed.
From the histogram above, the total number of rented bikes appears to follow an approximately normal distribution. This is not surprising for count data: in a Poisson distribution the mean and variance are equal, and as the mean grows the distribution approaches a normal distribution.
Next, we looked at the relationship between the response variable and each explanatory variable. We selected a few plots that show clear patterns, as shown below.
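One way to probe the mean-variance remark is to compare the two quantities directly. The sketch below does this on simulated Poisson counts; the `lambda` of 4500 is an arbitrary stand-in close to the observed mean of cnt, not a fitted value.

```r
set.seed(42)  # arbitrary seed so the simulation is reproducible

# Simulated counts; lambda = 4500 is an illustrative assumption
sim_cnt <- rpois(10000, lambda = 4500)

# Ratio of variance to mean: close to 1 for Poisson data
dispersion <- var(sim_cnt) / mean(sim_cnt)
dispersion

# With a mean this large, the quantiles track a normal distribution closely
qqnorm(sim_cnt)
qqline(sim_cnt, col = "red")
```

Computing `var(bk_sh_dy$cnt) / mean(bk_sh_dy$cnt)` on the real data would show whether it behaves similarly. A ratio far above 1 would indicate overdispersion, in which case the Poisson argument would not apply directly.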
par(mfcol=c(2,2))
boxplot(bk_sh_dy$cnt ~ bk_sh_dy$season,
data = bk_sh_dy,
main = "Total Bike Rentals Vs Season",
xlab = "Season",
ylab = "Total Bike Rentals",
col = c("coral", "coral1", "coral2", "coral3"))
boxplot(bk_sh_dy$cnt ~ bk_sh_dy$holiday,
data = bk_sh_dy,
main = "Total Bike Rentals Vs Holiday/Working Day",
xlab = "Holiday/Working Day",
ylab = "Total Bike Rentals",
col = c("pink", "pink1", "pink2", "pink3"))
boxplot(bk_sh_dy$cnt ~ bk_sh_dy$weathersit,
data = bk_sh_dy,
main = "Total Bike Rentals Vs Weather Situation",
xlab = "Weather Situation",
ylab = "Total Bike Rentals",
col = c("purple", "purple1", "purple2", "purple3"))
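# dteday is read as a factor; convert it to Date so that a scatter plot over time is drawn
plot(as.Date(bk_sh_dy$dteday), bk_sh_dy$cnt, type = "p",
     main = "Total Bike Rentals Vs DateDay",
     xlab = "Date",
     ylab = "Total Bike Rentals",
     col = "orange",
     pch = 19)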
The season boxplot shows the relationship between Total Bike Rentals (cnt) and season. Rentals tend to be highest during summer and fall.
The holiday boxplot shows the relationship between Total Bike Rentals (cnt) and holiday. Rentals tend to be higher on non-holidays (labeled "Working Day") than on holidays.
The weather boxplot shows the relationship between Total Bike Rentals (cnt) and weather situation. Rentals clearly decrease as the weather gets worse.
The date plot shows the relationship between Total Bike Rentals (cnt) and time. The overall trend increased over the two-year span. Within each year, rentals peak during the summer and fall seasons.
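The visual comparisons above can be backed with numbers by summarising cnt per group. The sketch below uses `aggregate()` on a tiny made-up data frame (`toy` and its values are hypothetical); with the real data, the same call would take `data = bk_sh_dy`.

```r
# Hypothetical toy data standing in for bk_sh_dy
toy <- data.frame(
  season = factor(c("Spring", "Spring", "Fall", "Fall"),
                  levels = c("Spring", "Summer", "Fall", "Winter")),
  cnt = c(1000, 2000, 5000, 6000)
)

# Median rentals per season (the thick line inside each box of a boxplot)
season_median <- aggregate(cnt ~ season, data = toy, FUN = median)
season_median
```

Replacing `season` with `holiday` or `weathersit` in the formula gives the corresponding summaries for the other boxplots.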
par(mfrow=c(2,2))
plot(bk_sh_dy$actual_temp, bk_sh_dy$cnt ,type = 'h', col= 'yellow', xlab = 'Actual Temperature', ylab = 'Total Bike Rentals')
plot(bk_sh_dy$actual_feel_temp, bk_sh_dy$cnt ,type = 'h', col= 'yellow', xlab = 'Actual Feel Temperature', ylab = 'Total Bike Rentals')
plot(bk_sh_dy$actual_windspeed, bk_sh_dy$cnt ,type = 'h', col= 'yellow', xlab = 'Actual Windspeed', ylab = 'Total Bike Rentals')
plot(bk_sh_dy$actual_humidity, bk_sh_dy$cnt ,type = 'h', col= 'yellow', xlab = 'Actual Humidity', ylab = 'Total Bike Rentals')
The plots suggest that these numerical variables are spread fairly smoothly over their ranges.
Correlations between bike rental count, actual temperature, feel temperature, the mean of actual and feel temperature, windspeed, and humidity:
Cor_actual_temp<-cor(x = bk_sh_dy$actual_temp, y = bk_sh_dy$cnt)
Cor_actual_feel_temp <- cor(x = bk_sh_dy$actual_feel_temp, y =bk_sh_dy$cnt)
bk_sh_dy_cor<- bk_sh_dy %>% select (cnt,actual_temp,actual_feel_temp,mean_acttemp_feeltemp,actual_humidity,actual_windspeed)
bk_sh_dy_cor<- data.frame(bk_sh_dy_cor)
# Give the columns readable names for the correlation matrix and plot
colnames(bk_sh_dy_cor) <- c("Total Number of Bike Rentals", "Temperature", "Feel Temperature",
                            "Mean Actual Temp Feel Temp", "Humidity", "Windspeed")
cor(bk_sh_dy_cor)
## Total Number of Bike Rentals Temperature
## Total Number of Bike Rentals 1.0000000 0.6274940
## Temperature 0.6274940 1.0000000
## Feel Temperature 0.6310657 0.9917016
## Mean Actual Temp Feel Temp 0.6306607 0.9977489
## Humidity -0.1006586 0.1269629
## Windspeed -0.2345450 -0.1579441
## Feel Temperature Mean Actual Temp Feel Temp
## Total Number of Bike Rentals 0.6310657 0.6306607
## Temperature 0.9917016 0.9977489
## Feel Temperature 1.0000000 0.9980905
## Mean Actual Temp Feel Temp 0.9980905 1.0000000
## Humidity 0.1399881 0.1340209
## Windspeed -0.1836430 -0.1716773
## Humidity Windspeed
## Total Number of Bike Rentals -0.1006586 -0.2345450
## Temperature 0.1269629 -0.1579441
## Feel Temperature 0.1399881 -0.1836430
## Mean Actual Temp Feel Temp 0.1340209 -0.1716773
## Humidity 1.0000000 -0.2484891
## Windspeed -0.2484891 1.0000000
corplot_bk_sh <- cor(bk_sh_dy_cor)
corrplot(corplot_bk_sh, method="number")
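From the correlation matrix and plot above, temperature has the strongest correlation with bike rentals (about 0.63 for both actual and feel temperature). Humidity (about -0.10) and windspeed (about -0.23) are weakly negatively correlated with rentals.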
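Note that `cor()` reports correlation coefficients only; testing whether a correlation differs from zero requires `cor.test()`. Below is a minimal sketch on simulated data (the variables `temp_sim` and `cnt_sim` and the relationship between them are invented for illustration).

```r
set.seed(1)  # arbitrary seed for reproducibility

# Simulated stand-ins: rentals rise with temperature, plus noise
temp_sim <- runif(200, min = 2, max = 35)
cnt_sim  <- 1200 + 160 * temp_sim + rnorm(200, sd = 1500)

ct <- cor.test(temp_sim, cnt_sim)  # Pearson correlation test by default
ct$estimate   # sample correlation
ct$p.value    # p-value for H0: the true correlation is 0
ct$conf.int   # 95% confidence interval for the correlation
```

On the real data, the equivalent call would be `cor.test(bk_sh_dy$actual_temp, bk_sh_dy$cnt)`.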
library(ggplot2)
ggplot_Temp_Rent <- ggplot(bk_sh_dy, aes(x = actual_temp, y = cnt)) + geom_point(shape = 1) + geom_smooth(method = lm) + xlab("Actual Temp. in Celsius") + ylab("Bike Rentals")
# The title reports the intercept of the fitted line (1214.6 in the lm summary below)
ggplot_Temp_Rent + scale_y_continuous(breaks = c(0,1100,2345,3500,5000,6000,7000,8000)) + labs(title = "Total Bike Rentals Vs Actual Temperature | Intercept = 1215")
lm_test<- lm(bk_sh_dy$cnt~bk_sh_dy$actual_temp)
summary(lm_test)
##
## Call:
## lm(formula = bk_sh_dy$cnt ~ bk_sh_dy$actual_temp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4615.3 -1134.9 -104.4 1044.3 3737.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1214.642 161.164 7.537 1.43e-13 ***
## bk_sh_dy$actual_temp 161.969 7.444 21.759 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1509 on 729 degrees of freedom
## Multiple R-squared: 0.3937, Adjusted R-squared: 0.3929
## F-statistic: 473.5 on 1 and 729 DF, p-value: < 2.2e-16
plot(lm_test, col = "green")
From the linear regression of bike rentals (cnt) on actual temperature, we found an R-squared of about 39% (0.3937), and the p-value for actual temperature is highly significant.
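For a regression with a single predictor, R-squared equals the square of the Pearson correlation, so the R-squared reported above can be cross-checked against the correlation matrix. The sketch below does this with the reported correlation and then verifies the identity on simulated data (the variable names and coefficients in the simulation are invented).

```r
# Cross-check with the numbers reported in this document
r_reported <- 0.6274940   # cor(cnt, actual_temp) from the correlation matrix
r_squared  <- r_reported^2
round(r_squared, 4)       # 0.3937, matching Multiple R-squared

# The same identity on simulated data (illustrative values only)
set.seed(7)
x <- runif(100, 0, 35)
y <- 1000 + 150 * x + rnorm(100, sd = 1400)
fit <- lm(y ~ x)
identity_holds <- isTRUE(all.equal(summary(fit)$r.squared, cor(x, y)^2))
identity_holds
```

The identity holds only for a single predictor; with several predictors, R-squared is the squared correlation between the observed and fitted values instead.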
Linear Regression between Total Bike Rentals, Temperature, Windspeed and Humidity
lm_test1<- lm(sqrt(bk_sh_dy$cnt)~bk_sh_dy$actual_temp+bk_sh_dy$actual_humidity+bk_sh_dy$actual_windspeed)
lm_test1
##
## Call:
## lm(formula = sqrt(bk_sh_dy$cnt) ~ bk_sh_dy$actual_temp + bk_sh_dy$actual_humidity +
## bk_sh_dy$actual_windspeed)
##
## Coefficients:
## (Intercept) bk_sh_dy$actual_temp
## 61.6726 1.3374
## bk_sh_dy$actual_humidity bk_sh_dy$actual_windspeed
## -0.2531 -0.6035
summary(lm_test1)
##
## Call:
## lm(formula = sqrt(bk_sh_dy$cnt) ~ bk_sh_dy$actual_temp + bk_sh_dy$actual_humidity +
## bk_sh_dy$actual_windspeed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -47.460 -8.065 0.531 7.811 25.632
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 61.67260 2.70399 22.808 < 2e-16 ***
## bk_sh_dy$actual_temp 1.33744 0.05721 23.378 < 2e-16 ***
## bk_sh_dy$actual_humidity -0.25313 0.03073 -8.237 8.21e-16 ***
## bk_sh_dy$actual_windspeed -0.60348 0.08468 -7.127 2.48e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.41 on 727 degrees of freedom
## Multiple R-squared: 0.4781, Adjusted R-squared: 0.4759
## F-statistic: 222 on 3 and 727 DF, p-value: < 2.2e-16
lm_test2<- lm(((bk_sh_dy$cnt)^2)~bk_sh_dy$actual_temp+bk_sh_dy$actual_humidity+bk_sh_dy$actual_windspeed)
lm_test2
##
## Call:
## lm(formula = ((bk_sh_dy$cnt)^2) ~ bk_sh_dy$actual_temp + bk_sh_dy$actual_humidity +
## bk_sh_dy$actual_windspeed)
##
## Coefficients:
## (Intercept) bk_sh_dy$actual_temp
## 22179942 1347024
## bk_sh_dy$actual_humidity bk_sh_dy$actual_windspeed
## -280650 -617458
summary(lm_test2)
##
## Call:
## lm(formula = ((bk_sh_dy$cnt)^2) ~ bk_sh_dy$actual_temp + bk_sh_dy$actual_humidity +
## bk_sh_dy$actual_windspeed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -38825404 -9628562 -3271989 9996519 45699800
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22179942 3299775 6.722 3.63e-11 ***
## bk_sh_dy$actual_temp 1347024 69816 19.294 < 2e-16 ***
## bk_sh_dy$actual_humidity -280650 37503 -7.483 2.10e-13 ***
## bk_sh_dy$actual_windspeed -617458 103337 -5.975 3.60e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13920000 on 727 degrees of freedom
## Multiple R-squared: 0.3875, Adjusted R-squared: 0.3849
## F-statistic: 153.3 on 3 and 727 DF, p-value: < 2.2e-16
lm_test3<- lm((log(bk_sh_dy$cnt))~bk_sh_dy$actual_temp+bk_sh_dy$actual_humidity+bk_sh_dy$actual_windspeed)
lm_test3
##
## Call:
## lm(formula = (log(bk_sh_dy$cnt)) ~ bk_sh_dy$actual_temp + bk_sh_dy$actual_humidity +
## bk_sh_dy$actual_windspeed)
##
## Coefficients:
## (Intercept) bk_sh_dy$actual_temp
## 8.225902 0.046859
## bk_sh_dy$actual_humidity bk_sh_dy$actual_windspeed
## -0.009481 -0.023428
summary(lm_test3)
##
## Call:
## lm(formula = (log(bk_sh_dy$cnt)) ~ bk_sh_dy$actual_temp + bk_sh_dy$actual_humidity +
## bk_sh_dy$actual_windspeed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5836 -0.2396 0.0637 0.2787 0.7905
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.225902 0.103620 79.39 < 2e-16 ***
## bk_sh_dy$actual_temp 0.046859 0.002192 21.37 < 2e-16 ***
## bk_sh_dy$actual_humidity -0.009481 0.001178 -8.05 3.37e-15 ***
## bk_sh_dy$actual_windspeed -0.023428 0.003245 -7.22 1.31e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4371 on 727 degrees of freedom
## Multiple R-squared: 0.4408, Adjusted R-squared: 0.4385
## F-statistic: 191 on 3 and 727 DF, p-value: < 2.2e-16
lm_final<- lm(bk_sh_dy$cnt~bk_sh_dy$actual_temp+bk_sh_dy$actual_humidity+bk_sh_dy$actual_windspeed)
lm_final
##
## Call:
## lm(formula = bk_sh_dy$cnt ~ bk_sh_dy$actual_temp + bk_sh_dy$actual_humidity +
## bk_sh_dy$actual_windspeed)
##
## Coefficients:
## (Intercept) bk_sh_dy$actual_temp
## 4084.36 161.60
## bk_sh_dy$actual_humidity bk_sh_dy$actual_windspeed
## -31.00 -71.75
summary(lm_final)
##
## Call:
## lm(formula = bk_sh_dy$cnt ~ bk_sh_dy$actual_temp + bk_sh_dy$actual_humidity +
## bk_sh_dy$actual_windspeed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4780.5 -1082.6 -62.2 1056.5 3653.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4084.363 337.862 12.089 < 2e-16 ***
## bk_sh_dy$actual_temp 161.598 7.148 22.606 < 2e-16 ***
## bk_sh_dy$actual_humidity -31.001 3.840 -8.073 2.83e-15 ***
## bk_sh_dy$actual_windspeed -71.745 10.581 -6.781 2.48e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1425 on 727 degrees of freedom
## Multiple R-squared: 0.4609, Adjusted R-squared: 0.4587
## F-statistic: 207.2 on 3 and 727 DF, p-value: < 2.2e-16
plot(lm_final,col = "gold", main = "Linear Regression: Bike Rentals, Temp, Windspeed and Humidity")
Because the correlation matrix showed that humidity and windspeed are also weakly related to bike rentals, we fitted a linear model with all three predictors. Its R-squared is about 46% (0.4609), and the p-values of all three variables are significant.
However, the residual plot and the Q-Q plot show that the residuals follow a pattern and are not normally distributed, which means the linear model does not fit the data very well.
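The visual diagnosis can be complemented with formal checks. The sketch below fits a straight line to simulated data whose relationship is curved by construction (all names and values are invented, and curvature is only one hypothetical cause of patterned residuals). It then applies `shapiro.test()` to the residuals and uses `anova()` to test whether a quadratic term improves the fit.

```r
set.seed(3)  # arbitrary seed for reproducibility

# Simulated predictor and a response that is curved by construction
x <- runif(300, 0, 35)
y <- 500 + 400 * x - 8 * x^2 + rnorm(300, sd = 800)

fit_linear <- lm(y ~ x)
fit_quad   <- lm(y ~ x + I(x^2))

# Shapiro-Wilk test: a small p-value suggests non-normal residuals
shapiro.test(residuals(fit_linear))

# F-test: does the quadratic term explain the pattern left in the residuals?
cmp <- anova(fit_linear, fit_quad)
cmp
```

On the real data, `shapiro.test(residuals(lm_final))` would give a numeric counterpart to the Q-Q plot. Whether a curved term is appropriate for the bike data would need to be checked separately.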