This html file helps answer the question: how many steps did I take, given that I biked x miles?

\[ \text{meter development} = \frac{\text{chainring}}{\text{cog}} \times \text{radius ratio} \]
\[ \text{radius ratio} = \frac{\text{wheel radius}}{\text{crank radius}} \]
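As a worked check of the formula, the sketch below plugs in the bike used later in this file: a 40t chainring, an 11t cog, a 685.6 mm wheel and a 160 mm crank.

The product chainring/cog × radius ratio has no units; it is often called the gain ratio. Multiplying it by the circumference of the pedal circle (2π × crank radius) gives the distance the bike travels per crank revolution. That distance equals chainring/cog × wheel circumference. With a 160 mm crank the pedal circle is about 1.005 m around, so here the formula's value is within roughly 0.5% of the development in meters. The variable names are illustrative.

```r
# Worked check of the meter development formula for one gear (40t / 11t).
chainring <- 40                 # teeth on the front chainring
cog <- 11                       # teeth on the rear cog
wheel_radius_mm <- 685.6 / 2    # wheel radius in mm
crank_radius_mm <- 160          # crank arm length in mm

radius_ratio <- wheel_radius_mm / crank_radius_mm
ratio <- (chainring / cog) * radius_ratio    # the formula above (unitless)

# Distance per crank revolution: the ratio times the pedal-circle circumference,
# which is the same as chainring/cog times the wheel circumference (in meters).
pedal_circle_m <- 2 * pi * crank_radius_mm / 1000
dev_per_rev_m <- (chainring / cog) * pi * (2 * wheel_radius_mm) / 1000

print(round(ratio, 6))                   # 7.790909
print(round(ratio * pedal_circle_m, 4))
print(round(dev_per_rev_m, 4))           # same value as the line above
```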




Packages needed

library(dplyr)
library(rstanarm)
library(ggplot2)
library(bayesplot)
library(tidybayes)
library(broom)
library(knitr)




Information needed about the bike

Cogs: the individual sprockets of the bike's rear cassette, which serve as the rear gears.
  • Most cassettes have 8 to 12 cogs. In this markdown we assume an 11-speed cassette, a popular set of gears. However, cog sizes, given in teeth (t), vary, so you will have to identify your own cassette and cog sizes.
Here is an example of a set of cogs.

We create a vector containing the tooth count of each cog, in order from smallest to largest.
  • For an 11-speed 11-46t cassette, the cog sizes are:
cogs <- c(11, 13, 15, 18, 21, 24, 28, 32, 36, 40, 46)
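A quick sanity check on this vector, assuming the 11-46t cassette listed above. It should hold 11 cogs, and its middle (6th) cog, 24t, is the value used later to center the cogs.

The gear ratios with the 40t chainring used in this file also show that the small cogs give the highest (hardest) ratios. The names n_cogs, middle_cog and gear_ratios are illustrative.

```r
cogs <- c(11, 13, 15, 18, 21, 24, 28, 32, 36, 40, 46)

n_cogs <- length(cogs)                    # 11 for an 11-speed cassette
middle_cog <- cogs[ceiling(n_cogs / 2)]   # 6th cog of the sorted vector
gear_ratios <- round(40 / cogs, 2)        # 40t chainring divided by each cog

print(n_cogs)       # 11
print(middle_cog)   # 24
print(gear_ratios)  # decreases from the smallest (hardest) to the largest (easiest) cog
```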


Next, we identify the size of the chainring, the front gear of the bike.
Here is an example of a front crankset with two chainrings, a popular setup.


The example above also shows the crank arm. You will need to know your crank arm length (standard lengths range from 160 to 175 mm).
  • Chainrings are also sized in teeth. In our case we have a single 40t chainring up front with a 160 mm crank arm. The crank arm length serves as the crank radius.
chain_ring <- 40
crank_length_rad_mm <- 160



The final piece of information needed is your wheel diameter in mm. We assume that the front and rear wheels have the same diameter.
Here is an example of popular wheel sizes.

In our case we have a 650b wheel with a 2" tire, giving a wheel diameter of:
wheel_dia_mm <- 685.60 
wheel_rad_mm <- wheel_dia_mm/2
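The 685.6 mm figure can be reproduced by hand under two assumptions. First, a 650b rim has a 584 mm bead seat diameter (ISO 584). Second, the tire's height is taken as equal to its nominal width (2 in = 50.8 mm) and is added on both sides of the rim.

The sketch also computes the radius ratio that feeds the formula at the top of this file.

```r
bead_seat_mm <- 584          # ISO bead seat diameter of a 650b rim
tire_width_mm <- 2 * 25.4    # 2 inch tire, 25.4 mm per inch

# Approximation: tire height taken as equal to its nominal width,
# added on both sides of the rim.
wheel_dia_mm <- bead_seat_mm + 2 * tire_width_mm
wheel_rad_mm <- wheel_dia_mm / 2
crank_length_rad_mm <- 160

radius_ratio <- wheel_rad_mm / crank_length_rad_mm

print(wheel_dia_mm)   # 685.6
print(radius_ratio)   # 2.1425
```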




We combine all this information in a function that builds a data frame.

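The following function computes the possible step count for every cog in the cassette. It also adds a centered cog variable, because we assume the middle cogs of a cassette are used most often. Smaller cogs are the harder gears, and larger cogs are the easier gears (for climbing hills).
  • Because we deal in miles, the development is converted from meters to yards (1,760 yards to a mile). The radius ratio divides one length in millimeters by another, so it needs no unit conversion.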
bike_function <- function(x, named.df = "df"){
  
  # conversions
  # 1 m = 1.09361 yards
  m_to_yrd <- 1.09361
  # 1,760 yards in a mile
  yrd_in_mile <- 1760
  
  miles <- x
  
  # chain_ring, cogs, wheel_rad_mm and crank_length_rad_mm come from the
  # global environment (defined above)
  df <- tibble(chain_ring, cogs) %>% 
    mutate(
      meter_dev = (chain_ring / cogs) * 
        (wheel_rad_mm / crank_length_rad_mm), # gear ratio x radius ratio
      yrd_dev = meter_dev * m_to_yrd, # convert from meters to yards
      # crank revolutions it takes to ride x miles in each cog
      revolutions = (yrd_in_mile * miles / yrd_dev), 
      possible_steps = revolutions * 2, # 2 pedal strokes ("steps") per revolution
      centered_cogs = cogs - 24) # center on the middle (24t) cog for the model
  
  # store the table in the global environment under the name given in named.df
  assign(named.df, df, envir = .GlobalEnv)
}
  • Input the total miles you rode into the function:
bike_function(50)
kable(df) # print table
chain_ring  cogs  meter_dev   yrd_dev  revolutions  possible_steps  centered_cogs
40            11   7.790909  8.520216     10328.38        20656.75            -13
40            13   6.592308  7.209414     12206.26        24412.53            -11
40            15   5.713333  6.248158     14084.15        28168.30             -9
40            18   4.761111  5.206799     16900.98        33801.96             -6
40            21   4.080952  4.462970     19717.81        39435.62             -3
40            24   3.570833  3.905099     22534.64        45069.28              0
40            28   3.060714  3.347228     26290.41        52580.83              4
40            32   2.678125  2.928824     30046.19        60092.37              8
40            36   2.380556  2.603399     33801.96        67603.92             12
40            40   2.142500  2.343059     37557.73        75115.47             16
40            46   1.863044  2.037443     43191.39        86382.79             22
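If you would rather not rely on global variables or on assign(), a base-R variant can take the bike measurements as arguments and return the table directly. The function name bike_steps() and its argument names are hypothetical.

The constants are the ones bike_function() uses, so the rows should match the table above.

```r
# Hypothetical package-free variant of bike_function(): takes the bike
# measurements as arguments and returns a data frame instead of assigning it.
bike_steps <- function(miles, chain_ring, cogs, wheel_dia_mm, crank_mm,
                       center_cog = 24) {
  m_to_yrd <- 1.09361    # yards per meter, same constant as bike_function()
  yrd_in_mile <- 1760    # yards per mile

  meter_dev <- (chain_ring / cogs) * ((wheel_dia_mm / 2) / crank_mm)
  yrd_dev <- meter_dev * m_to_yrd
  revolutions <- yrd_in_mile * miles / yrd_dev   # crank revolutions needed in each cog
  data.frame(
    chain_ring = chain_ring,
    cogs = cogs,
    meter_dev = meter_dev,
    yrd_dev = yrd_dev,
    revolutions = revolutions,
    possible_steps = 2 * revolutions,            # two pedal strokes per revolution
    centered_cogs = cogs - center_cog
  )
}

steps_50 <- bike_steps(
  miles = 50, chain_ring = 40,
  cogs = c(11, 13, 15, 18, 21, 24, 28, 32, 36, 40, 46),
  wheel_dia_mm = 685.6, crank_mm = 160
)
print(round(steps_50$possible_steps, 2))
```

Passing the measurements in makes it easy to compare bikes or distances without editing global variables. The trade-off is a longer call.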


If you know that you used only one cog the whole time you rode, then you can take the value under possible_steps for that cog.
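The step count grows linearly with distance, so a ride split across several cogs can be handled the same way. Weight each cog's possible_steps by the share of the distance ridden in that cog, then add the weighted values.

The shares below are made-up numbers for illustration. The step values are copied from the 24t and 28t rows of the table for a 50-mile ride.

```r
# possible_steps for a full 50-mile ride in each cog (24t and 28t rows above)
steps_full_ride <- c(`24` = 45069.28, `28` = 52580.83)

# Hypothetical shares of the distance ridden in each cog (must sum to 1)
distance_share <- c(`24` = 0.5, `28` = 0.5)
stopifnot(isTRUE(all.equal(sum(distance_share), 1)))

# Weighted sum: steps contributed by each cog's share of the distance
total_steps <- sum(distance_share * steps_full_ride)
print(total_steps)
```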










Modeling our data:

  • Assumption: the middle cogs tend to be used most often during a long ride. We place normal priors on the model coefficients: normal(0, 20) on the slope for centered_cogs and normal(0, 50000) on the intercept.
    • We also center the cogs on cog #6, the middle one in our cassette (24t). The intercept is then the expected step count when centered_cogs = 0, that is, on the cog we think we use the most.
  • Here we model the number of steps taken as a function of cog size for a 50-mile bike ride.
  • We use a Bayesian linear model (an analysis of variance model would also work) to estimate the posterior distribution of the number of steps for a given cog.
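fit <- stan_glm(possible_steps ~ centered_cogs, data = df,
    family = gaussian(),
    prior = normal(location = 0, scale = 20),              # prior on the centered_cogs slope
    prior_intercept = normal(location = 0, scale = 50000), # prior on the intercept
    iter = 2000,   # iterations per chain, including warmup
    warmup = 500,  # warmup iterations discarded from each chain
    thin = 1,      # keep every draw
    cores = 4)     # run the chains in parallel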

summary(fit)


Model results and fit

tidy(fit)
## # A tibble: 2 x 3
##   term          estimate std.error
##   <chr>            <dbl>     <dbl>
## 1 (Intercept)     45069.  0.00147 
## 2 centered_cogs    1878.  0.000134
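The tiny standard errors in this output have a simple explanation. In bike_function(), possible_steps works out to 2 × 1760 × miles × cog / (chain_ring × radius ratio × 1.09361). It is therefore exactly proportional to cog size, and a straight line in centered_cogs fits with essentially no residual error.

The base-R sketch below derives the slope, and the intercept at the 24t centering cog, from the same constants used in bike_function(). The results agree with the estimates above to the displayed precision.

```r
chain_ring <- 40
radius_ratio <- (685.6 / 2) / 160   # wheel radius / crank radius
m_to_yrd <- 1.09361                 # yards per meter, as in bike_function()
miles <- 50

# possible_steps = 2 * 1760 * miles / ((chain_ring / cog) * radius_ratio * m_to_yrd)
#                = slope * cog, where:
slope <- 2 * 1760 * miles / (chain_ring * radius_ratio * m_to_yrd)

# With centered_cogs = cogs - 24, the intercept is the step count on the 24t cog.
intercept <- slope * 24

print(round(slope, 2))       # 1877.89
print(round(intercept, 2))   # 45069.28
```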
Test statistics of the outcome variable across all replications

  • The mean of each replicated dataset is plotted as a histogram against the mean of the observed outcome.

# the dark blue line marks the mean of the observed data
pp_check(fit, "stat")

Examining the mean and standard deviation of the replications and of the observed data: the observed mean and sd fall near the center of the replicated values, evidence that the model fits the data.
pp_check(fit, "stat_2d")
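Under the hood, these checks compare a statistic of the observed outcome with the same statistic computed on datasets simulated from the fitted model. In rstanarm the simulated datasets come from posterior_predict(fit).

The base-R sketch below mimics the "stat" and "stat_2d" checks, but only to illustrate the mechanics. In place of real posterior draws it uses one hypothetical parameter set: the hand-derived intercept and slope, plus an assumed sigma of 0.015, roughly the sigma values in the draws shown later in this file.

```r
set.seed(1)

cogs <- c(11, 13, 15, 18, 21, 24, 28, 32, 36, 40, 46)
x <- cogs - 24                        # centered_cogs
slope <- 1877.887                     # hand-derived slope (assumption)
intercept <- slope * 24               # steps on the 24t cog
y_obs <- intercept + slope * x        # approximately the possible_steps column

# Stand-in for posterior draws: a single parameter set with sigma assumed 0.015
sigma <- 0.015
n_reps <- 4000
mu <- intercept + slope * x
# Each row is one replicated dataset; column j corresponds to observation j
y_rep <- matrix(rnorm(n_reps * length(x), mean = rep(mu, each = n_reps), sd = sigma),
                nrow = n_reps)

rep_means <- rowMeans(y_rep)          # "stat": mean of each replication
rep_sds <- apply(y_rep, 1, sd)        # "stat_2d" also uses each replication's sd
obs_mean <- mean(y_obs)
obs_sd <- sd(y_obs)

# Share of replicated means at or above the observed mean
p_value <- mean(rep_means >= obs_mean)
print(p_value)
```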

Since our predictor, centered_cogs, is centered on the middle cog, the intercept is the expected number of steps when centered_cogs = 0 (the 24t cog). Its posterior mean is the average of the intercept draws.
draws <- tidy_draws(fit, `(Intercept)`, centered_cogs)
head(draws)
## # A tibble: 6 x 12
##   .chain .iteration .draw `(Intercept)` centered_cogs  sigma accept_stat__
##    <int>      <int> <int>         <dbl>         <dbl>  <dbl>         <dbl>
## 1      1          1     1        45069.         1878. 0.0139         0.991
## 2      1          2     2        45069.         1878. 0.0143         0.996
## 3      1          3     3        45069.         1878. 0.0150         0.906
## 4      1          4     4        45069.         1878. 0.0150         0.999
## 5      1          5     5        45069.         1878. 0.0152         1.00 
## 6      1          6     6        45069.         1878. 0.0151         0.949
## # … with 5 more variables: stepsize__ <dbl>, treedepth__ <dbl>,
## #   n_leapfrog__ <dbl>, divergent__ <dbl>, energy__ <dbl>

Examining the draws

mean(draws$`(Intercept)`) 
## [1] 45069.28


Finally, the intercept, the predicted step count on the middle (24t) cog, is our estimate of how many steps were taken given the characteristics of the bike and the distance traveled.

fit$coefficients[1]
## (Intercept) 
##    45069.28