Bike share schemes are being implemented and researched in many cities as a solution to air pollution and traffic congestion. Brighton’s bike share scheme starts in September, so I used the bike sharing data from the UCI Machine Learning Repository for some basic data analysis.
The original dataset has 16 variables and 731 records. For this analysis I kept only the season, temperature and total count columns, and added a Temperature column in degrees Celsius (the normalised temp column multiplied by 41).
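library(dplyr)
# Keep only season, normalised temperature and total daily count
new_dfrm <- select(day, season, temp, cnt)
# Rescale the normalised temperature by 41 to get degrees Celsius
final_dfrm <- mutate(new_dfrm, Temperature = temp * 41)
head(final_dfrm)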
## # A tibble: 6 × 4
## season temp cnt Temperature
## <int> <dbl> <int> <dbl>
## 1 1 0.344167 985 14.110847
## 2 1 0.363478 801 14.902598
## 3 1 0.196364 1349 8.050924
## 4 1 0.200000 1562 8.200000
## 5 1 0.226957 1600 9.305237
## 6 1 0.204348 1606 8.378268
The data was split into seasonal subsets of total usage count and temperature.
Spring <- subset(final_dfrm, select = c(cnt, Temperature), subset = (season == 1))
Summer <- subset(final_dfrm, select = c(cnt, Temperature), subset = (season == 2))
Autumn <- subset(final_dfrm, select = c(cnt, Temperature), subset = (season == 3))
Winter <- subset(final_dfrm, select = c(cnt, Temperature), subset = (season == 4))
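The four subset() calls can also be written as a single split by season. The sketch below is only an illustration: toy_dfrm is a hypothetical stand-in for final_dfrm with made-up values, and the season names assume the codes 1 to 4 map to Spring, Summer, Autumn and Winter as above.

```r
# Toy stand-in for final_dfrm; the values are made up for illustration only
toy_dfrm <- data.frame(
  season = c(1, 1, 2, 3, 4, 4),
  cnt = c(985, 801, 4000, 5500, 4700, 3600),
  Temperature = c(14.1, 14.9, 22.0, 29.0, 17.0, 13.5)
)

# split() returns one data frame per season, ordered by the season code
by_season <- split(toy_dfrm[c("cnt", "Temperature")], toy_dfrm$season)
names(by_season) <- c("Spring", "Summer", "Autumn", "Winter")

# Apply summary() to every seasonal subset in one call
season_summaries <- lapply(by_season, summary)
```

Each subset is then available as, for example, by_season$Spring, and the list keeps the seasons together if the same step needs to be repeated for each one.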
A boxplot of the counts for each season was produced using ggplot2.
library(ggplot2)
season_box <- ggplot(final_dfrm, aes(x = factor(season), y = cnt)) +
  geom_boxplot() + xlab("Season") + ylab("Total") +
  ggtitle("Boxplot of Total Count by Season")
season_box
Summary statistics for each season were produced.
summary(Spring)
## cnt Temperature
## Min. : 431 Min. : 2.424
## 1st Qu.:1538 1st Qu.: 9.123
## Median :2209 Median :11.719
## Mean :2604 Mean :12.208
## 3rd Qu.:3456 3rd Qu.:14.831
## Max. :7836 Max. :23.473
summary(Summer)
## cnt Temperature
## Min. : 795 Min. :10.37
## 1st Qu.:4003 1st Qu.:18.78
## Median :4942 Median :23.05
## Mean :4992 Mean :22.32
## 3rd Qu.:6377 3rd Qu.:25.90
## Max. :8362 Max. :33.14
summary(Autumn)
## cnt Temperature
## Min. :1115 Min. :19.24
## 1st Qu.:4586 1st Qu.:27.35
## Median :5354 Median :29.30
## Mean :5644 Mean :28.96
## 3rd Qu.:6929 3rd Qu.:30.76
## Max. :8714 Max. :35.33
summary(Winter)
## cnt Temperature
## Min. : 22 Min. : 9.054
## 1st Qu.:3616 1st Qu.:13.581
## Median :4634 Median :16.776
## Mean :4728 Mean :17.339
## 3rd Qu.:5624 3rd Qu.:21.055
## Max. :8555 Max. :26.957
Correlation between temperature and bike usage was calculated using Pearson’s coefficient.
cor.test(final_dfrm$Temperature,final_dfrm$cnt)
##
## Pearson's product-moment correlation
##
## data: final_dfrm$Temperature and final_dfrm$cnt
## t = 21.759, df = 729, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5814369 0.6695422
## sample estimates:
## cor
## 0.627494
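As a check on the output above, the t statistic can be recomputed from the correlation and the degrees of freedom with the standard formula t = r * sqrt(df) / sqrt(1 - r^2). This is a minimal sketch that uses only the two numbers printed by cor.test(). The variable names are mine.

```r
r <- 0.627494   # sample correlation reported by cor.test()
df <- 729       # degrees of freedom, n - 2 with n = 731

# t statistic for testing that the true correlation is zero
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

round(t_stat, 3)
```

Up to rounding, this should agree with the t = 21.759 shown in the cor.test() output.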
The correlation was significant, so a linear model was fitted. Note that, as written, the call below uses Temperature as the response and cnt as the predictor.
bikeshare_lm <- lm(final_dfrm$Temperature ~ final_dfrm$cnt)
summary(bikeshare_lm)
##
## Call:
## lm(formula = final_dfrm$Temperature ~ final_dfrm$cnt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.938 -4.772 -0.773 4.332 17.469
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.3606148 0.5477469 17.09 <2e-16 ***
## final_dfrm$cnt 0.0024310 0.0001117 21.76 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.848 on 729 degrees of freedom
## Multiple R-squared: 0.3937, Adjusted R-squared: 0.3929
## F-statistic: 473.5 on 1 and 729 DF, p-value: < 2.2e-16
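To predict usage from temperature, the formula would normally be written the other way round, with cnt on the left. The sketch below illustrates this on a hypothetical toy_dfrm with made-up values, not the real data. It also shows that, with a single predictor, R-squared is the squared Pearson correlation and therefore does not depend on the direction. This is consistent with the outputs above, where 0.627494 squared is about 0.3937.

```r
# Toy stand-in for final_dfrm; the values are made up for illustration only
toy_dfrm <- data.frame(
  Temperature = c(5, 9, 12, 15, 18, 22, 25, 28, 31, 34),
  cnt = c(1200, 1900, 2600, 3300, 4100, 5200, 6100, 5900, 5400, 4800)
)

# Usage as the response, temperature as the predictor
usage_lm <- lm(cnt ~ Temperature, data = toy_dfrm)
# The direction used in the post: temperature as the response
temp_lm <- lm(Temperature ~ cnt, data = toy_dfrm)

# With a single predictor, R-squared equals the squared correlation,
# so it is identical in both directions; only the coefficients differ
r2_usage <- summary(usage_lm)$r.squared
r2_temp <- summary(temp_lm)$r.squared
r_squared_from_cor <- cor(toy_dfrm$Temperature, toy_dfrm$cnt)^2
```

Passing data = toy_dfrm instead of writing toy_dfrm$cnt in the formula also keeps the coefficient names short and lets predict() work with new data.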
Although the linear model explains only about 39% of the variance (R-squared 0.39), the scatter plot below shows a clear trend: usage increases as the temperature rises to about 25 °C and then drops.
temp_gg <- ggplot(final_dfrm, aes(x = Temperature, y = cnt, colour = cnt)) +
  geom_point() + geom_smooth() +
  xlab("Temperature") + ylab("Total Count") +
  ggtitle("Total Count of Bikes used depending on Temperature")
temp_gg
temp_gg
## `geom_smooth()` using method = 'loess'
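A rise followed by a drop is something a straight line cannot capture, which may be part of why the linear fit looked weak. One possible way to allow for a peak is to add a squared temperature term. The sketch below is not part of the original analysis: it uses made-up data constructed to peak near 25 degrees, purely to show the mechanics.

```r
# Made-up data with a peak near 25 degrees, for illustration only
Temperature <- seq(2, 35, by = 1)
cnt <- 8000 - 15 * (Temperature - 25)^2 + 200 * sin(Temperature)
toy_dfrm <- data.frame(Temperature, cnt)

# Straight-line fit for comparison
linear_fit <- lm(cnt ~ Temperature, data = toy_dfrm)
# I() protects the square so that it is treated as arithmetic
quad_fit <- lm(cnt ~ Temperature + I(Temperature^2), data = toy_dfrm)

# Temperature at which the fitted parabola peaks: -b1 / (2 * b2)
b <- coef(quad_fit)
peak_temp <- unname(-b[2] / (2 * b[3]))

# Compare how much variance each model explains
r2_linear <- summary(linear_fit)$r.squared
r2_quad <- summary(quad_fit)$r.squared
```

On the real data the same two lm() calls could be run with data = final_dfrm. Whether the quadratic term helps there would need to be checked rather than assumed.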