As a response to air pollution and traffic congestion, bike share schemes are being implemented and studied in many cities. Brighton’s bike share scheme starts in September, and I have used data from the UCI Machine Learning Repository for some basic data analysis.

The original dataset has 16 variables and 731 records, which I reduced to season, temperature and total count for this analysis.

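library(dplyr)    # select(), mutate()
library(ggplot2)  # plots further down
# `day` is the daily dataset, assumed to be already loaded as a tibble
new_dfrm <- select(day, season, temp, cnt)
final_dfrm <- mutate(new_dfrm, Temperature = temp * 41)  # rescale normalised temp
head(final_dfrm)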
## # A tibble: 6 × 4
##   season     temp   cnt Temperature
##    <int>    <dbl> <int>       <dbl>
## 1      1 0.344167   985   14.110847
## 2      1 0.363478   801   14.902598
## 3      1 0.196364  1349    8.050924
## 4      1 0.200000  1562    8.200000
## 5      1 0.226957  1600    9.305237
## 6      1 0.204348  1606    8.378268
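As a quick check on the rescaling, the sketch below recomputes `Temperature` in base R from the six `temp` values printed above. It assumes, as the code above does, that multiplying by 41 recovers degrees Celsius; the `demo` data frame is just those six rows typed in by hand, not the full dataset.

```r
# First six rows of the output above, typed in by hand (not the full dataset)
demo <- data.frame(
  season = rep(1L, 6),
  temp   = c(0.344167, 0.363478, 0.196364, 0.200000, 0.226957, 0.204348),
  cnt    = c(985L, 801L, 1349L, 1562L, 1600L, 1606L)
)

# Assumption carried over from the analysis: temp * 41 gives degrees Celsius
demo$Temperature <- demo$temp * 41

# Rounded for display
round(demo$Temperature, 2)
```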

The data was split into seasonal subsets of total usage count and temperature.

Spring <- subset(final_dfrm, select = c(cnt, Temperature), subset = (season == 1))
Summer <- subset(final_dfrm, select = c(cnt, Temperature), subset = (season == 2))
Autumn <- subset(final_dfrm, select = c(cnt, Temperature), subset = (season == 3))
Winter <- subset(final_dfrm, select = c(cnt, Temperature), subset = (season == 4))
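The four `subset()` calls can also be written as a single `split()`, which returns a list with one data frame per season code. This is only a sketch on a small made-up data frame (`toy` and its values are illustrative, not taken from the dataset); on the real data `toy` would be `final_dfrm`.

```r
# Made-up rows for illustration only; not values from the bike share data
toy <- data.frame(
  season      = c(1, 1, 2, 3, 3, 4),
  cnt         = c(100, 150, 400, 500, 550, 300),
  Temperature = c(5, 8, 20, 28, 30, 12)
)

# One data frame per season code, keeping only the two columns of interest
by_season <- split(toy[, c("cnt", "Temperature")], toy$season)

names(by_season)   # season codes become the list names
by_season[["3"]]   # the rows coded as season 3
```

Each element can then be passed to `summary()` in the same way as the separate seasonal data frames.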

A boxplot of the counts for each season was produced using ggplot2.

season_box <- ggplot(final_dfrm, aes(x = factor(season), y = cnt)) +
  geom_boxplot() +
  xlab("Season") + ylab("Total") + ggtitle("Boxplot of Total Count by Season")
season_box
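The boxplot's x axis shows the numeric codes 1 to 4. One way to show names instead is to turn the codes into a labelled factor before plotting. The sketch below assumes the same code-to-name mapping used for the seasonal subsets above; `season_codes` is a hypothetical stand-in for `final_dfrm$season`.

```r
# Hypothetical stand-in for final_dfrm$season
season_codes <- c(1, 2, 3, 4, 1)

# Mapping assumed from the subsets above: 1 = Spring, 2 = Summer, 3 = Autumn, 4 = Winter
season_name <- factor(season_codes,
                      levels = 1:4,
                      labels = c("Spring", "Summer", "Autumn", "Winter"))

table(season_name)
```

A factor built this way could be used in place of `factor(season)` in the `aes()` call.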

Summary statistics for each season were produced.

summary(Spring)
##       cnt        Temperature    
##  Min.   : 431   Min.   : 2.424  
##  1st Qu.:1538   1st Qu.: 9.123  
##  Median :2209   Median :11.719  
##  Mean   :2604   Mean   :12.208  
##  3rd Qu.:3456   3rd Qu.:14.831  
##  Max.   :7836   Max.   :23.473
summary(Summer)
##       cnt        Temperature   
##  Min.   : 795   Min.   :10.37  
##  1st Qu.:4003   1st Qu.:18.78  
##  Median :4942   Median :23.05  
##  Mean   :4992   Mean   :22.32  
##  3rd Qu.:6377   3rd Qu.:25.90  
##  Max.   :8362   Max.   :33.14
summary(Autumn)
##       cnt        Temperature   
##  Min.   :1115   Min.   :19.24  
##  1st Qu.:4586   1st Qu.:27.35  
##  Median :5354   Median :29.30  
##  Mean   :5644   Mean   :28.96  
##  3rd Qu.:6929   3rd Qu.:30.76  
##  Max.   :8714   Max.   :35.33
summary(Winter)
##       cnt        Temperature    
##  Min.   :  22   Min.   : 9.054  
##  1st Qu.:3616   1st Qu.:13.581  
##  Median :4634   Median :16.776  
##  Mean   :4728   Mean   :17.339  
##  3rd Qu.:5624   3rd Qu.:21.055  
##  Max.   :8555   Max.   :26.957

Correlation and Regression

The correlation between temperature and bike usage was calculated using Pearson’s coefficient.

cor.test(final_dfrm$Temperature, final_dfrm$cnt)
## 
##  Pearson's product-moment correlation
## 
## data:  final_dfrm$Temperature and final_dfrm$cnt
## t = 21.759, df = 729, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5814369 0.6695422
## sample estimates:
##      cor 
## 0.627494
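The t statistic in this output follows directly from the correlation and the degrees of freedom, via t = r * sqrt(df / (1 - r^2)). A short check using the rounded values printed above:

```r
r  <- 0.627494   # sample correlation from the output above
df <- 729        # degrees of freedom (731 records minus 2)

# t statistic for testing whether the true correlation is zero
t_stat <- r * sqrt(df / (1 - r^2))

# Share of variance in one variable accounted for by a straight-line fit on the other
r_squared <- r^2

round(c(t = t_stat, r_squared = r_squared), 3)
```

The result should agree with the reported t of 21.759 up to rounding.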

The result showed a significant positive correlation (r = 0.63), so a linear model was fitted.

# Note: as written, Temperature is the response and cnt is the predictor
bikeshare_lm <- lm(final_dfrm$Temperature ~ final_dfrm$cnt)
summary(bikeshare_lm)
## 
## Call:
## lm(formula = final_dfrm$Temperature ~ final_dfrm$cnt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.938  -4.772  -0.773   4.332  17.469 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.3606148  0.5477469   17.09   <2e-16 ***
## final_dfrm$cnt 0.0024310  0.0001117   21.76   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.848 on 729 degrees of freedom
## Multiple R-squared:  0.3937, Adjusted R-squared:  0.3929 
## F-statistic: 473.5 on 1 and 729 DF,  p-value: < 2.2e-16
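Note that the formula above has `Temperature` on the left-hand side, so the model predicts temperature from usage. To predict usage from temperature, the two would be swapped. The sketch below shows that form using the `data` argument, run on just the six rows from the `head()` output earlier (so its coefficients are not meaningful); `demo`, `usage_lm` and `temp_lm` are names introduced here for illustration.

```r
# Six rows copied from the head() output earlier; far too few for real inference
demo <- data.frame(
  cnt         = c(985, 801, 1349, 1562, 1600, 1606),
  Temperature = c(14.110847, 14.902598, 8.050924, 8.200000, 9.305237, 8.378268)
)

# Usage as the response, temperature as the predictor
usage_lm <- lm(cnt ~ Temperature, data = demo)

# The direction used in the analysis above, for comparison
temp_lm <- lm(Temperature ~ cnt, data = demo)

# In simple linear regression R-squared is the same in both directions
c(summary(usage_lm)$r.squared, summary(temp_lm)$r.squared)
```

Swapping the variables changes the slope and intercept but not R-squared, so the share of variance explained on the full data would stay as reported above.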

Although the linear fit was disappointing (R-squared of 0.39), there is a clear trend: usage increases as the temperature rises to about 25°C and then drops, as illustrated below.

temp_gg <- ggplot(final_dfrm, aes(x = Temperature, y = cnt, colour = cnt)) +
  geom_point() + geom_smooth() +
  xlab("Temperature") + ylab("Total Count") + ggtitle("Total Count of Bikes used depending on Temperature")
temp_gg
## `geom_smooth()` using method = 'loess'
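Since usage rises and then falls, a straight line cannot capture the shape. One common alternative, not used in the analysis above, is to add a squared temperature term. The sketch below fits that on made-up data built to peak at 25 (the `toy` values are purely illustrative) and recovers the peak from the coefficients.

```r
# Made-up data with a known peak at 25; not values from the bike share data
toy <- data.frame(Temperature = seq(5, 35, by = 5))
toy$cnt <- 5000 - 10 * (toy$Temperature - 25)^2

# Quadratic model: cnt = b0 + b1 * Temperature + b2 * Temperature^2
quad_lm <- lm(cnt ~ Temperature + I(Temperature^2), data = toy)
b <- unname(coef(quad_lm))

# A downward-opening parabola peaks at -b1 / (2 * b2)
peak_temp <- -b[2] / (2 * b[3])
peak_temp
```

On the real data the same formula could be fitted with `data = final_dfrm`, though whether a quadratic is an adequate description there would need checking against the residuals.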