2017 Cycling Stats

By Bike

Data were recorded from 1/1/17 to 12/31/17

Bike names were truncated for ease of visualization:
1. Masi FG = Masi Speciale Fixed FG Road
2. Masi LTD = Masi Speciale Fixed LTD FG Road
3. Lang FG = Specialized Langster FG Road
4. Roub EXP = Specialized Roubaix EXPERT Road
5. Tri COMP = Specialized Tricross COMP CX Road
6. Tri EXP = Specialized Tricross EXPERT CX Road
7. Tri SING = Specialized TriCross SINGLE CX Road

90.3% of rides are covered by only 4 bikes.

90.6% of distance is covered by only 3 bikes.

By Day of the Week

Speed doesn’t vary day to day, but Distance definitely does: longer rides on the weekends, shorter rides on weekdays. Monday has slightly longer rides, likely because Monday is sometimes part of a 3-day weekend.

Over Time

Altitude Gain and Distance


Call:
lm(formula = Altitude.Gain..ft. ~ Distance..miles., data = bikes)

Residuals:
    Min      1Q  Median      3Q     Max 
-656.05 -185.36  -47.61  143.65 1268.33 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)      -198.335     70.816  -2.801  0.00586 ** 
Distance..miles.   44.786      1.827  24.510  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 325.1 on 133 degrees of freedom
Multiple R-squared:  0.8187,    Adjusted R-squared:  0.8174 
F-statistic: 600.7 on 1 and 133 DF,  p-value: < 2.2e-16

Year-by-year Comparison

Year-by-year Overview

Seems like he’s going on longer rides (the number of rides used to outpace the distance, but that hasn’t been the case over the past 3 years). Let’s see if this is true based on average distance.

Average ride distance was almost identical over the first 3 years and has definitely decreased in recent years.

Here’s an alternative barplot visualization of Ride Frequency and Distance.

The highest total distance occurred in 2012, while the lowest total distance occurred in 2014. 2017 was about par for the course, with total distance around 4,800 miles in 2013, 2015, and 2017.

Altitude gained follows a similar trend to the distance gained. Let’s take a closer look at the highest distance year, lowest distance year, and last year.

In 2017, he was on an even slower pace than in 2014 heading into the summer months, but made up a lot of ground and finished around the average mileage total. The mileage gain in 2012 was consistently paced throughout.

Although 2012 was by far the best year in terms of distance gained, the average speeds were a lot more erratic. Much more consistent riding in 2017.

2016 consisted of much longer rides than any other year (shown by the higher median and longer upper whisker than other years). The individual points show outliers (greater than 1.5*IQR higher than the 75th percentile). The plot shows that the large total distance in 2016 is explained by longer rides, while the large total distance in 2012 is explained by a greater volume of bike rides.

An alternative representation that supports the hypothesis above. The violin plots show the relative density of points. In 2012, there’s a greater density of shorter-distance rides, while in 2016, the density is higher around the median.

Ride Type by Year

An overwhelming majority of the rides are Rolling Road rides over the past 6 years. Last year, no off-road rides were recorded and only one Hills Road ride has been recorded in the past 4 years.

Bike Type

Finally, we’ll look at the bikes used for the road rides over the past 6 years.

The 10 bikes used over the past 6 years are shown in the table below. Values for altitude gain, distance, and speed are average values for each bike.

| Bike | Rides | Alt. Gain (ft) | Distance (mi) | Speed (mph) |
|:-----|------:|---------------:|--------------:|------------:|
| Masi Speciale Fixed FG Road | 33 | 560.3636 | 20.68303 | 13.311976 |
| Specialized Langster FG Road | 102 | 923.1275 | 29.45637 | 14.854365 |
| Specialized Rockhopper EXPERT 29er | 2 | 441.5000 | 12.65000 | 8.899292 |
| Specialized Roubaix EXPERT Road | 310 | 2117.2774 | 47.41097 | 15.208614 |
| Specialized Tricross COMP CX Road | 241 | 1274.5643 | 33.17975 | 14.394964 |
| Specialized Tricross SPORT CX Road | 18 | 678.6667 | 16.50889 | 13.841827 |
| Centurion Super Le Mans FG Road | 63 | 665.6349 | 20.63762 | 13.897439 |
| Masi Speciale Fixed LTD FG Road | 28 | 564.6429 | 19.96429 | 13.514350 |
| Specialized Tricross EXPERT CX Road | 41 | 1437.4390 | 35.49049 | 14.458006 |
| Specialized TriCross SINGLE CX Road | 8 | 346.3750 | 21.51250 | 12.514158 |

Rides by Day

A lot more weekday rides in 2012 and 2017 than there were in other years. Pretty consistent Sunday and Saturday riding, with a couple of years exhibiting more than 52 rides on those days.

As expected, a lot more distance is accumulated on the weekends while weekdays are usually shorter rides.

---
title: "2017 Bike Stats"
author: "Alex Stransky"
date: "January 1, 2018"
output: html_notebook
---

## 2017 Cycling Stats

```{r echo = FALSE, warning = FALSE, message = FALSE}
setwd("~/R")
source("fteplots.R")
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
setwd("~/R/fun")
bikes <- read.csv("stransky_2017_journal.csv")
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
library(ggplot2)
library(dplyr)
library(stringr)
library(lubridate)
library(scales)
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
# Convert ride date to date format
bikes$Ride.Date <- mdy(bikes$Ride.Date)
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
bikes <- bikes %>%
  mutate(
    hr = as.numeric(str_sub(Ride.Time..hh.mm.ss., 1, 1)),
    min = as.numeric(str_sub(Ride.Time..hh.mm.ss., 3,4)),
    sec = as.numeric(str_sub(Ride.Time..hh.mm.ss., 6,7))
  )
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
bikes <- bikes %>%
  mutate(
    Total.Time = hr + min / 60 + sec / 3600,
    Avg.Speed = Distance..miles. / Total.Time
  )
```
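
The chunks above read the hour from a single character of `Ride.Time..hh.mm.ss.`, which holds as long as every ride is under ten hours and the field is written `h:mm:ss`. Here's a minimal sketch of a parser that splits on the colons instead, so it doesn't depend on character positions. The helper name `ride_hours` and the sample strings are made up for illustration and aren't part of the journal.

```r
# Hypothetical helper: convert "h:mm:ss" (or "hh:mm:ss") strings to decimal hours
ride_hours <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    p[1] + p[2] / 60 + p[3] / 3600
  }, numeric(1))
}

# Made-up ride times, not values from the journal
hours <- ride_hours(c("1:30:00", "2:15:36", "10:00:00"))
hours
```

Dividing `Distance..miles.` by the result would give the same `Avg.Speed` as before for single-digit-hour rides.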

### By Bike

```{r echo = FALSE, warning = FALSE, message = FALSE}
bike.type <- bikes %>%
  group_by(Bike) %>%
  summarise(
    n = n(),
    Altitude = mean(Altitude.Gain..ft.),
    Distance = mean(Distance..miles.),
    Speed = mean(Avg.Speed)
  )
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
bike.type$BikeTrunc = c("Masi FG", "Masi LTD", "Lang FG", "Roub EXP", "Tri COMP", "Tri EXP", "Tri SING")
bike.type$BikeCat = c("Masi Speciale", "Masi Speciale", "Langster", "Roubaix", "Tricross", "Tricross", "Tricross")
```

Data were recorded from 1/1/17 to 12/31/17

Bike names were truncated for ease of visualization: <br>
1. Masi FG = Masi Speciale Fixed FG Road <br>
2. Masi LTD = Masi Speciale Fixed LTD FG Road <br>
3. Lang FG = Specialized Langster FG Road <br>
4. Roub EXP = Specialized Roubaix EXPERT Road <br>
5. Tri COMP = Specialized Tricross COMP CX Road <br>
6. Tri EXP = Specialized Tricross EXPERT CX Road <br>
7. Tri SING = Specialized TriCross SINGLE CX Road <br>

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(bike.type, aes(x = BikeTrunc)) +
  geom_bar(aes(y = Speed, fill = BikeCat), colour = "#535353", stat = "identity", alpha = 0.80) +
  ggtitle("Bike Speed") +
  fte +
  xlab("Bike") + ylab("Average Speed (mph)") +
  theme(axis.text.x = element_text(vjust = 0.6, angle = 75)) +
  guides(fill = guide_legend(title = "Category"))
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
bike.type <- bike.type %>%
  mutate(
    Tot.Alt = n * Altitude,
    Tot.Dist = n * Distance
  )
```
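
The totals above are rebuilt from the per-bike count and mean rather than summed directly. The two are the same thing, since a mean is just the total divided by the count. A quick check on made-up distances (not journal values):

```r
# Made-up ride distances (miles) for a single bike
d <- c(20, 35, 50)
n <- length(d)

total_from_mean <- n * mean(d)  # what Tot.Dist = n * Distance computes
total_direct    <- sum(d)       # summing the rides directly

c(total_from_mean, total_direct)
```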

```{r echo = FALSE, warning = FALSE, message = FALSE}
#Stroke modifies the width of the border
ggplot(bike.type, aes(x = n, y = Tot.Dist)) +
  geom_point(size = 2.5, alpha = 0.3, stroke = 1.5) +
  fte +
  ggtitle("Bike Distance and Frequency of Use") +
  xlab("Number of Rides") +
  ylab("Total Distance") +
  geom_text(label = if_else(bike.type$BikeTrunc %in% c("Roub EXP", "Tri COMP"), bike.type$BikeTrunc, ' '), size = 3.5, colour = "chocolate1", fontface = "bold", hjust = 1, vjust = 1.5) +
  geom_text(label = if_else(bike.type$BikeTrunc %in% c("Tri EXP"), bike.type$BikeTrunc, ' '), size = 3.5, colour = "chocolate1", fontface = "bold", hjust = -0.10, vjust = -0.75) +
  annotate("segment", x = 9, y = 500, xend = 8, yend = 250, size = 0.75, arrow = arrow(length = unit(0.2, "cm")), colour = "chocolate1") +
  annotate("text", x = 9.5, y = 600, label = "Tri SING", colour = "chocolate1", fontface = "bold", size = 3.5) +
  # "Masi LTD" is labelled by the arrow annotation below, so only "Masi FG" is matched here
  geom_text(label = if_else(bike.type$BikeTrunc %in% c("Masi FG"), bike.type$BikeTrunc, ' '), size = 3.5, colour = "chocolate1", fontface = "bold", hjust = -0.05, vjust = 1.2) +
  geom_text(label = if_else(bike.type$BikeTrunc %in% c("Lang FG"), bike.type$BikeTrunc, ' '), size = 3.5, colour = "chocolate1", fontface = "bold", hjust = 0.40, vjust = -0.80) +
  annotate("segment", x = 12, y = 25, xend = 9.5, yend = 140, size = 0.75, arrow = arrow(length = unit(0.2, "cm")), colour = "chocolate1") +
  annotate("text", x = 15.5, y = 25, label = "Masi LTD", colour = "chocolate1", fontface = "bold", size = 3.5)
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
bike.type$RidePerc <- str_c(round(100 * bike.type$n / sum(bike.type$n), 1), "%", sep = "")
pie(bike.type$n, labels = bike.type$RidePerc, main = "Ride Pie Chart", col = topo.colors(length(bike.type$n)))
legend("left", bike.type$BikeTrunc, cex = 0.8, fill = topo.colors(length(bike.type$n)))
```

90.3% of rides are covered by only 4 bikes.
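
That figure is a cumulative share: sort the bikes by ride count, add up their shares from the top, and see how many bikes it takes to pass 90%. A minimal sketch of the calculation, using hypothetical counts rather than the journal's numbers; with the real data, `bike.type$n` would take the place of `rides`.

```r
# Hypothetical ride counts per bike (not the journal's actual values)
rides <- c(A = 50, B = 30, C = 12, D = 5, E = 3)

# Sort from most- to least-used, then accumulate the share of all rides
sorted    <- sort(rides, decreasing = TRUE)
cum_share <- cumsum(sorted) / sum(sorted)
round(100 * cum_share, 1)

# Smallest number of bikes whose combined share reaches 90%
first_90 <- which(cum_share >= 0.90)[1]
first_90
```

The same steps on `bike.type$Tot.Dist` would give the distance version of the statement.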

```{r echo = FALSE, warning = FALSE, message = FALSE}
bike.type$DistPerc <- str_c(round(100 * bike.type$Tot.Dist / sum(bike.type$Tot.Dist), 1), "%", sep = "")
pie(bike.type$Tot.Dist, labels = bike.type$DistPerc, main = "Distance Pie Chart", col = topo.colors(length(bike.type$n)))
legend("left", bike.type$BikeTrunc, cex = 0.8, fill = topo.colors(length(bike.type$n)))
```

90.6% of distance is covered by only 3 bikes.

```{r echo = FALSE, warning = FALSE, message = FALSE}
library(reshape2)
bike.type.melt <- bike.type %>%
  select(BikeTrunc, Speed, Distance)
bike.type.melt <- melt(bike.type.melt, id.vars = "BikeTrunc")
bike.type.melt$value <- as.numeric(bike.type.melt$value)
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(bike.type.melt, aes(x = BikeTrunc, y = value, fill = variable)) +
  geom_bar(colour = "#535353", stat = "identity", position = "dodge", alpha = 0.80) +
  fte +
  ggtitle("Average Speed and Distance by Bike") +
  xlab("Bike") +
  ylab("Average Miles (per Hour)") +
  theme(axis.text.x = element_text(angle = 90)) +
  guides(fill = guide_legend(title = "Variable"))
```

### By Day of the Week

```{r echo = FALSE, warning = FALSE, message = FALSE}
bikes$day <- weekdays(bikes$Ride.Date, abbreviate = T)

dayofweek <- bikes %>%
  group_by(day) %>%
  summarise(
    n = n(),
    Altitude = mean(Altitude.Gain..ft.),
    Distance = mean(Distance..miles.),
    Speed = mean(Avg.Speed)
  )
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
#Relevel the factor in order of the days of the week
dayofweek$day <- as.factor(dayofweek$day)
dayofweek$day <- factor(dayofweek$day, c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(dayofweek, aes(x = day)) +
  geom_bar(aes(y = n, fill = day), colour = "#535353", stat = "identity", alpha = 0.80) +
  ggtitle("Rides by Day") +
  fte +
  xlab("Day") + ylab("Rides") +
  theme(legend.position = "none")
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
dayofweek.melt <- dayofweek %>%
  select(day, Distance, Speed)
dayofweek.melt <- melt(dayofweek.melt, id.vars = "day")
dayofweek.melt$value <- as.numeric(dayofweek.melt$value)
```


```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(dayofweek.melt, aes(x = day, y = value, fill = variable)) +
  geom_bar(colour = "#535353", stat = "identity", position = "dodge", alpha = 0.80) +
  fte +
  ggtitle("Distance and Speed by Day") +
  xlab("Day") +
  ylab("Average Miles (per Hour)") +
  theme(axis.text.x = element_text(angle = 90)) +
  guides(fill = guide_legend(title = "Variable"))
```

Speed doesn't vary day to day, but Distance definitely does: longer rides on the weekends, shorter rides on weekdays. Monday has slightly longer rides, likely because Monday is sometimes part of a 3-day weekend.

### Over Time

```{r echo = FALSE, warning = FALSE, message = FALSE}
overtime <- bikes %>%
  select(Ride.Date, Distance..miles., Altitude.Gain..ft., Total.Time, Avg.Speed, Ride.Type) %>%
  mutate(
    cumDist = cumsum(Distance..miles.),
    cumAlt = cumsum(Altitude.Gain..ft.),
    cumTime = cumsum(Total.Time)
  )
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(overtime, aes(x = Ride.Date, y = cumDist)) +
  geom_line(size = 1, colour = "#BA0C2E") +
  fte +
  ggtitle("Distance over the Year") +
  xlab("Date") +
  scale_y_continuous(name = "Distance (miles)", labels = comma) +
  annotate("text", x = as.Date("2017-11-30"), y = 4900, label = "Total Distance\n4,806.1 mi", size = 3.5, fontface = "bold", colour = "#BA0C2E") +
  annotate("point", x = as.Date("2017-12-31"), y = 4806.1, colour = "#BA0C2E", size = 2)
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(overtime, aes(x = Ride.Date, y = cumTime)) +
  geom_line(size = 1, colour = "#002F6C") +
  fte +
  ggtitle("Cumulative Time over the Year") +
  xlab("Date") +
  ylab("Time (hours)") +
  annotate("text", x = as.Date("2017-11-30"), y = 335, label = "Total Time\n328.88 hr", size = 3.5, fontface = "bold", colour = "#002F6C") +
  annotate("point", x = as.Date("2017-12-31"), y = 328.8842, colour = "#002F6C", size = 2)
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(overtime, aes(x = Ride.Date, y = cumAlt)) +
  geom_line(size = 1, colour = "#006203") +
  fte +
  ggtitle("Altitude Gained over the Year") +
  xlab("Date") +
  scale_y_continuous(name = "Altitude Gained (feet)", labels = comma) +
  annotate("text", x = as.Date("2017-11-30"), y = 192000, label = "Total Altitude\n188,473 ft", size = 3.5, fontface = "bold", colour = "#006203") +
  annotate("point", x = as.Date("2017-12-31"), y = 188473, colour = "#006203", size = 2)
```

### Altitude Gain and Distance

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(bikes, aes(x = Distance..miles., y = Altitude.Gain..ft.)) +
  geom_point(size = 2, alpha = 0.50) +
  fte +
  ggtitle("Altitude versus Distance") +
  xlab("Distance (miles)") +
  scale_y_continuous(name = "Altitude Gained (feet)", labels = comma) +
  geom_hline(yintercept = 0, size = 1.2, colour = "#535353") +
  annotate("text", x = 66, y = 4250, label = "8/5/17\n4,209 ft", size = 3.5, fontface = "bold", colour = "chocolate1") +
  annotate("text", x = 85, y = 2750, label = "10/1/17\n84.3 mi", size = 3.5, fontface = "bold", colour = "chocolate1")
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
fm00 <- lm(Altitude.Gain..ft. ~ Distance..miles., data = bikes)
summary(fm00)
```
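
For the 2017 data, `summary(fm00)` reports an intercept of -198.335 ft and a slope of 44.786 ft per mile. A short sketch of how to read that line, with the coefficients typed in by hand; the helper `predicted_gain` and the example distances are only for illustration.

```r
# Coefficients copied from the summary(fm00) output for the 2017 data
intercept <- -198.335  # feet
slope     <- 44.786    # feet of altitude gain per mile

# Hypothetical helper: fitted altitude gain (ft) for a ride of `miles` miles
predicted_gain <- function(miles) intercept + slope * miles

fitted <- predicted_gain(c(10, 30, 50))
fitted

# Distance at which the fitted line crosses zero altitude gain
zero_cross <- -intercept / slope
zero_cross
```

The negative intercept means the line predicts negative gain for rides shorter than about 4.4 miles, so the fit shouldn't be read literally for very short rides. With the model object in hand, `predict(fm00, newdata = data.frame(Distance..miles. = 30))` should return the same fitted value.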

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(bikes, aes(x = Distance..miles., y = Altitude.Gain..ft.)) +
  geom_point(size = 2, alpha = 0.50) +
  fte +
  ggtitle("Altitude versus Distance") +
  xlab("Distance (miles)") +
  scale_y_continuous(name = "Altitude Gained (feet)", labels = comma) +
  geom_hline(yintercept = 0, size = 1.2, colour = "#535353") +
  annotate("text", x = 66, y = 4250, label = "8/5/17\n4,209 ft", size = 3.5, fontface = "bold", colour = "chocolate1") +
  annotate("text", x = 85, y = 2750, label = "10/1/17\n84.3 mi", size = 3.5, fontface = "bold", colour = "chocolate1") +
  geom_smooth(method = "lm", formula = y~x, se = T, colour = "chocolate1", size = 1)
```

## Year-by-year Comparison

### Year-by-year Overview

```{r echo = FALSE, warning = FALSE, message = FALSE}
# Read one year's ride journal and derive ride time, average speed,
# and running totals for time, distance, and altitude gain.
read_journal <- function(year) {
  read.csv(str_c("stransky_", year, "_journal.csv")) %>%
    mutate(
      Ride.Date = mdy(Ride.Date),
      # The parsing assumes h:mm:ss with a single-digit hour
      hr = as.numeric(str_sub(Ride.Time..hh.mm.ss., 1, 1)),
      min = as.numeric(str_sub(Ride.Time..hh.mm.ss., 3, 4)),
      sec = as.numeric(str_sub(Ride.Time..hh.mm.ss., 6, 7)),
      Total.Time = hr + min / 60 + sec / 3600,
      Avg.Speed = Distance..miles. / Total.Time,
      cumTime = cumsum(Total.Time),
      cumDist = cumsum(Distance..miles.),
      cumAltGain = cumsum(Altitude.Gain..ft.)
    )
}

bikes2017 <- read_journal(2017)
bikes2016 <- read_journal(2016)
bikes2015 <- read_journal(2015)
bikes2014 <- read_journal(2014)
bikes2013 <- read_journal(2013)
bikes2012 <- read_journal(2012)
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
bikestotal <- rbind(bikes2012, bikes2013, bikes2014, bikes2015, bikes2016, bikes2017)
bikestotal <- filter(bikestotal, Distance..miles. != 0)
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
bikestotal$Year <- format(bikestotal$Ride.Date, "%Y")
byyear <- bikestotal %>%
  group_by(Year) %>%
  summarise(
    n = n(),
    AltGain = mean(Altitude.Gain..ft.),
    Speed = mean(Avg.Speed),
    Distance = mean(Distance..miles.),
    TotAltGain = n * AltGain,
    TotDistance = n * Distance
  )
```

```{r results = "hide", echo = FALSE, warning = FALSE, message = FALSE}
mean(byyear$n)
mean(byyear$TotDistance)
mean(byyear$TotDistance) / mean(byyear$n)
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(byyear, aes(x = Year, group = 1)) +
  geom_line(aes(y = n, colour = "Rides"), size = 1) +
  geom_line(aes(y = TotDistance / 35, colour = "Distance"), size = 1) +
  geom_point(aes(y = n, colour = "Rides"), size = 2) +
  geom_point(aes(y = TotDistance / 35, colour = "Distance"), size = 2) +
  scale_y_continuous(sec.axis = sec_axis(~.*35, name = "Distance (miles)", labels = comma)) +
  scale_colour_manual(values = c("#BA0C2E", "#002F6C")) +
  ggtitle("Ride Frequency and Distance 2012-2017") +
  xlab("Year") + ylab("Rides") +
  withborderfte +
  guides(colour = guide_legend(title = "Variable"))
```

Seems like he's going on longer rides (the number of rides used to outpace the distance, but that hasn't been the case over the past 3 years). Let's see if this is true based on average distance.
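
A note on how the plot above fits two units on one panel: distance is divided by a fixed factor (35) before plotting so it lands on the ride-count scale, and `sec_axis(~.*35)` multiplies back so the right-hand axis reads in miles. The hidden chunk before the plot computes mean total distance over mean ride count, which is presumably where the factor comes from. A small sketch of the round trip with hypothetical totals:

```r
# Hypothetical yearly totals (not the journal's actual values)
rides    <- c(150, 140, 120)
distance <- c(5250, 4900, 4200)  # miles

# Factor that maps distance onto the ride-count axis; the notebook hard-codes 35
k <- 35

# Positions drawn against the primary (rides) axis
plotted <- distance / k

# Values the secondary axis shows for those positions
recovered <- plotted * k

data.frame(rides, plotted, recovered)
```

When a year's `plotted` value sits above its `rides` value, the average ride that year was longer than `k` miles, which is what the gap between the two lines shows.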

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(byyear, aes(x = Year, y = Distance)) +
  geom_bar(aes(fill = Year), stat = "identity", colour = "#535353", alpha = 0.80) +
  fte +
  ggtitle("Average Distance by Year") +
  xlab("Year") + ylab("Average Distance (miles)") +
  theme(legend.position = "none")
```

Average ride distance was almost identical over the first 3 years and has definitely decreased in recent years.

Here's an alternative barplot visualization of Ride Frequency and Distance.

```{r echo = FALSE, warning = FALSE, message = FALSE}
byyearmelt <- select(byyear, Year, n, TotDistance)
byyearmelt$TotDistance <- byyearmelt$TotDistance / 35
byyearmelt <- melt(byyearmelt, id.vars = "Year")
byyearmelt$value <- as.numeric(byyearmelt$value)
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(byyearmelt, aes(x = Year, y = value, fill = variable)) +
  geom_bar(colour = "#535353", stat = "identity", position = "dodge", alpha = 0.80) +
  scale_y_continuous(sec.axis = sec_axis(~.*35, name = "Distance (mi)", labels = comma)) +
  scale_fill_manual(values = c("#F8766D", "#00BFC4"), labels = c("Rides", "Distance")) +
  ggtitle("Ride Frequency and Distance 2012-2017") +
  xlab("Year") + ylab("Rides") +
  fte +
  guides(fill = guide_legend(title = "Variable"))
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
bikestotal$MonthDay <- format(bikestotal$Ride.Date, "%d-%b")
bikestotal$MonthDay <- ydm(paste0(2012, bikestotal$MonthDay))
```
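
The chunk above moves every ride onto the same calendar year so the yearly curves can share one x-axis. Here's a sketch of a slightly different way to do the same thing in base R, which keeps the month as a number and so doesn't depend on the locale's month abbreviations. The dates are made up.

```r
# Made-up ride dates from different years
dates <- as.Date(c("2014-03-15", "2016-02-29", "2017-12-31"))

# Keep month and day, replace the year with 2012.
# 2012 is a leap year, so 29 February remains a valid date.
aligned <- as.Date(format(dates, "2012-%m-%d"))
aligned
```

Picking a leap year as the common year matters: with a non-leap year, any 29 February ride would fail to parse and come back as `NA`.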

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(bikestotal, aes(x = MonthDay, y = cumDist)) +
  geom_line(aes(colour = Year), size = 1) +
  scale_x_date(
    name = "Date",
    date_breaks = "1 month",
    date_labels = "%b"
  ) +
  fte +
  ggtitle("Distance over the Year 2012-2017") +
  scale_y_continuous(name = "Distance (miles)", labels = comma) +
  annotate("point", x = as.Date("2012-12-31"), y = 6306.18, colour = "#F8766D", size = 2) +
  annotate("point", x = as.Date("2012-12-28"), y = 3991.30, colour = "#00BA38", size = 2) +
  annotate("text", x = as.Date("2012-11-01"), y = 6300, label = "2012\n6,306.18 mi", colour = "#F8766D", size = 3.5, fontface = "bold") +
  annotate("text", x = as.Date("2012-12-20"), y = 3550, label = "2014\n3,991.30 mi", colour = "#00BA38", size = 3.5, fontface = "bold") +
  geom_hline(yintercept = 5200, lty = 2, colour = "#535353", size = 0.75) +
  annotate("text", x = as.Date("2012-02-14"), y = 5000, label = "Target: 5,200 mi", colour = "#535353", fontface = "bold", size = 3.5)
```

The highest total distance occurred in 2012, while the lowest total distance occurred in 2014. 2017 was about par for the course, with total distance around 4,800 miles in 2013, 2015, and 2017.

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(bikestotal, aes(x = MonthDay, y = cumAltGain)) +
  geom_line(aes(colour = Year), size = 1) +
  scale_x_date(
    name = "Date",
    date_breaks = "1 month",
    date_labels = "%b"
  ) +
  fte +
  ggtitle("Altitude Gained over the Year 2012-2017") +
  theme(axis.text.y = element_text(angle = 60)) +
  scale_y_continuous(name = "Altitude Gained (feet)", labels = comma)
```

Altitude gained follows a similar trend to the distance gained. Let's take a closer look at the highest distance year, lowest distance year, and last year.

```{r echo = FALSE, warning = FALSE, message = FALSE}
bikestotfilt <- bikestotal %>%
  filter(Year %in% c(2012, 2014, 2017))

ggplot(bikestotfilt, aes(x = MonthDay, y = cumDist, colour = Year)) +
  geom_line(aes(colour = Year), size = 1) +
  scale_x_date(
    name = "Date",
    date_breaks = "1 month",
    date_labels = "%b"
  ) +
  fte +
  ggtitle("Distance Gained 2012, 2014, 2017") +
  scale_y_continuous(name = "Distance (miles)", labels = comma) +
  facet_grid(facets = Year ~ ., margins = F)
```

In 2017, he was on an even slower pace than in 2014 heading into the summer months, but made up a lot of ground and finished around the average mileage total. The mileage gain in 2012 was consistently paced throughout.

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(bikestotfilt, aes(x = MonthDay, y = Avg.Speed, colour = Year)) +
  geom_line(aes(colour = Year), size = 0.75) +
  scale_x_date(
    name = "Date",
    date_breaks = "1 month",
    date_labels = "%b"
  ) +
  fte +
  ggtitle("Average Speed 2012, 2014, 2017") +
  scale_y_continuous(name = "Speed (mph)") +
  facet_grid(facets = Year ~ ., margins = F)
```

Although 2012 was by far the best year in terms of distance gained, the average speeds were a lot more erratic. Much more consistent riding in 2017.

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(bikestotal, aes(x = Year, y = Distance..miles.)) +
  geom_boxplot(colour = "#002F6C") +
  coord_flip() +
  fte +
  ggtitle("Ride Distance Comparison 2012-2017") +
  ylab("Distance (miles)")
```

2016 consisted of much longer rides than any other year (shown by the higher median and longer upper whisker than other years). The individual points show outliers (greater than 1.5*IQR higher than the 75th percentile). The plot shows that the large total distance in 2016 is explained by longer rides, while the large total distance in 2012 is explained by a greater volume of bike rides.
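
As a worked example of the outlier rule, here's a sketch on made-up distances that computes the upper fence (75th percentile plus 1.5 times the IQR) with `quantile()` at its default settings. Quartile definitions vary slightly between implementations, so treat the exact cutoff as approximate.

```r
# Made-up ride distances in miles (not from the journal)
dist <- c(12, 18, 20, 22, 25, 28, 30, 33, 35, 90)

# 25th and 75th percentiles, then the fence at Q3 + 1.5 * IQR
q <- quantile(dist, c(0.25, 0.75), names = FALSE)
upper_fence <- q[2] + 1.5 * (q[2] - q[1])
upper_fence

# Rides that would be drawn as individual points above the upper whisker
outliers <- dist[dist > upper_fence]
outliers
```

Here the quartiles are 20.5 and 32.25, so the fence sits at 49.875 miles and only the 90-mile ride is flagged.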

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(bikestotal, aes(x = Year, y = Distance..miles.)) +
  geom_violin(colour = "#002F6C") +
  geom_boxplot(colour = "#002F6C", alpha = 0.75) +
  coord_flip() +
  fte +
  ggtitle("Ride Distance Comparison 2012-2017") +
  ylab("Distance (miles)")
```

An alternative representation that supports the hypothesis above. The violin plots show the relative density of points. In 2012, there's a greater density of shorter-distance rides, while in 2016, the density is higher around the median.

### Ride Type by Year

```{r echo = FALSE, warning = FALSE, message = FALSE}
ridetype <- bikestotal %>%
  group_by(Year, Ride.Type) %>%
  summarise(
    n = n()
  )
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(ridetype, aes(x = Year, y = n, fill = Ride.Type)) +
  geom_bar(colour = "#535353", stat = "identity", position = "dodge", alpha = 0.80) +
  fte +
  ggtitle("Ride Type Frequency by Year") +
  ylab("Number of Rides") +
  guides(fill = guide_legend(title = "Ride Type"))
```

An overwhelming majority of the rides are Rolling Road rides over the past 6 years. Last year, no off-road rides were recorded and only one Hills Road ride has been recorded in the past 4 years.

### Bike Type

Finally, we'll look at the bikes used for the road rides over the past 6 years.

```{r echo = FALSE, warning = FALSE, message = FALSE}
biketyperoad <- bikestotal %>%
  filter(Ride.Type %in% c("Road: Flat", "Road: Hills", "Road: Rolling")) %>%
  group_by(Bike) %>%
  summarise(
    n = n(),
    Altitude = mean(Altitude.Gain..ft.),
    Distance = mean(Distance..miles.),
    Speed = mean(Avg.Speed)
  )

biketyperoad$BikeTrunc <- c("Masi FG", "Langster", "Rockhopper", "Roubaix", "Tri COMP", "Tri SPORT", "Centurion", "Masi LTD FG", "Tri EXP", "Tri SING")
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(biketyperoad, aes(x = BikeTrunc, y = n, fill = BikeTrunc)) +
  geom_bar(stat = "identity", colour = "#535353", alpha = 0.80) +
  fte +
  ggtitle("Bike Frequency of Use") +
  xlab("Bike") +
  ylab("Number of Rides") +
  theme(axis.text.x = element_text(vjust = 0.4, angle = 90)) +
  theme(legend.position = "none")
```

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(biketyperoad, aes(x = BikeTrunc, y = Speed, fill = BikeTrunc)) +
  geom_bar(stat = "identity", colour = "#535353", alpha = 0.80) +
  fte +
  ggtitle("Bike Average Speed") +
  xlab("Bike") +
  ylab("Average Speed (mph)") +
  theme(axis.text.x = element_text(vjust = 0.4, angle = 90)) +
  theme(legend.position = "none")
```

The 10 bikes used over the past 6 years are shown in the table below. Values for altitude gain, distance, and speed are average values for each bike.

```{r echo = FALSE, warning = FALSE, message = FALSE}
knitr::kable(biketyperoad %>%
               select(Bike, "Rides" = n, "Alt. Gain (ft)" = Altitude, "Distance (mi)" = Distance, "Speed (mph)" = Speed))
```

### Rides by Day

```{r echo = FALSE, warning = FALSE, message = FALSE}
bikestotal$dow <- if_else(
  bikestotal$Day.of.Week == "Sun",
  "Sunday",
  if_else(
    bikestotal$Day.of.Week == "Sat",
    "Saturday",
    "Weekday"
  )
)

byday <- bikestotal %>%
  group_by(Year, dow) %>%
  summarise(
    n = n(),
    Dist = sum(Distance..miles.),
    Alt = sum(Altitude.Gain..ft.),
    avgDist = mean(Distance..miles.)
  )
```
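
The grouping above relies on the journal's own `Day.of.Week` column. If that column were missing, the same three-way split could be derived from the ride date. A minimal sketch in base R; the helper name `day_type` is hypothetical, and it uses the numeric weekday so it doesn't depend on locale-specific day names.

```r
# Hypothetical helper: label a date as "Sunday", "Saturday", or "Weekday"
day_type <- function(date) {
  wday <- as.POSIXlt(date)$wday  # 0 = Sunday, ..., 6 = Saturday
  ifelse(wday == 0, "Sunday", ifelse(wday == 6, "Saturday", "Weekday"))
}

# 2017-01-01 was a Sunday, 2017-01-07 a Saturday, 2017-01-02 a Monday
labels <- day_type(as.Date(c("2017-01-01", "2017-01-07", "2017-01-02")))
labels
```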

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(byday, aes(x = dow, y = n, fill = Year)) +
  geom_bar(stat = "identity", colour = "#535353", alpha = 0.80, position = "dodge") +
  fte +
  ggtitle("Rides by Day") +
  xlab("Day Type") +
  ylab("Rides")
```

A lot more weekday rides in 2012 and 2017 than there were in other years. Pretty consistent Sunday and Saturday riding, with a couple of years exhibiting more than 52 rides on those days.

```{r echo = FALSE, warning = FALSE, message = FALSE}
ggplot(byday, aes(x = dow, y = avgDist, fill = Year)) +
  geom_bar(stat = "identity", colour = "#535353", alpha = 0.80, position = "dodge") +
  fte +
  ggtitle("Distance by Day") +
  xlab("Day Type") +
  scale_y_continuous(name = "Distance (miles)", labels = comma)
```

As expected, a lot more distance is accumulated on the weekends while weekdays are usually shorter rides.

