Introduction

  • I am a middle-aged person. From December 17, 2024 to January 20, 2025, I walked between 2.1 and 4.0 miles on a treadmill once per day and recorded the distance and time, to explore whether consistent effort over time could improve walking speed.
  • In this presentation, we explore simple linear regression using these walking data.
  • We analyze how walking speed changes over time and whether there is any hope of improvement for those of (slightly) advanced age.

Data Overview

library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
# Create data
days <- 1:35
distanceWalked <- c(2.1,2.2,2.2,2.2,2.2,2.4,2.2,3.1,2.2,2.3,2.3,2.3,2.5,
                    2.3,2.3,2.3,3.3,2.3,2.3,4.0,2.3,2.7,2.3,2.3,4.0,2.3,
                    2.3,2.3,2.3,2.3,2.3,2.3,2.3,2.3,2.4)
timeWalked <- c(35.7,36.517,32.983,34.5,32.983,37.3,33.017,55.783,32.217,33.5,
                34.85,34.283,40.45,35.5,33.133,33.3,54.25,33.8,33.02,64.917,
                33.033,45.817,33.033,33.517,59.517,39.75,33.3,32.783,32.517,
                38.633,33.467,33.5833,33.77,33.2,34.833)

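# Compute walking speed in miles per hour
hrs <- timeWalked / 60
speed <- distanceWalked / hrs
# Keep distance and time in the data frame so later plots can reference them
walkingDf <- data.frame(days, distanceWalked, timeWalked, speed)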
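
As a quick check on the unit conversion, the sketch below recomputes the speed for day 1 by hand, using the first distance and time values from the data above. The variable names are illustrative and not part of the original analysis.

```r
# Day 1 values copied from the data above
distance_mi <- 2.1   # miles walked
time_min <- 35.7     # minutes taken

# Convert minutes to hours, then divide distance by time
speed_mph <- distance_mi / (time_min / 60)
round(speed_mph, 2)  # 3.53
```

The result should match the first entry of speed.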

Scatter Plot

ggplot(walkingDf, aes(x = days, y = speed)) +
  geom_point(color = 'blue') +
  geom_smooth(method = 'lm', color = 'red') +
  labs(title = "Walking Speed Over Time", x = "Days", y = "Speed (mi/h)")
## `geom_smooth()` using formula = 'y ~ x'

Regression Model

  • The simple linear regression model is:

\[ Speed = \beta_0 + \beta_1 \times Days + \epsilon \]

  • We estimate the coefficients using least squares estimation.
model <- lm(speed ~ days, data = walkingDf)
summary(model)
## 
## Call:
## lm(formula = speed ~ days, data = walkingDf)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.54487 -0.11091  0.07134  0.17941  0.24295 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3.793230   0.082586  45.931   <2e-16 ***
## days        0.008590   0.004001   2.147   0.0393 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2391 on 33 degrees of freedom
## Multiple R-squared:  0.1225, Adjusted R-squared:  0.09595 
## F-statistic: 4.609 on 1 and 33 DF,  p-value: 0.03925
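
For simple linear regression, the least squares estimates have a closed form: the slope is the sum of cross-deviations of Days and Speed divided by the sum of squared deviations of Days, and the intercept is the mean of Speed minus the slope times the mean of Days. The sketch below checks this against lm(). Note that ols_by_hand is a hypothetical helper, not part of the original analysis, and that only the first five days of the data are used to keep the example short, so the numbers will differ from the full-model output above.

```r
# Hypothetical helper: closed-form least squares for one predictor
ols_by_hand <- function(x, y) {
  slope <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
  intercept <- mean(y) - slope * mean(x)
  c(intercept = intercept, slope = slope)
}

# First five days only (values copied from the data above)
x <- 1:5
y <- c(2.1, 2.2, 2.2, 2.2, 2.2) / (c(35.7, 36.517, 32.983, 34.5, 32.983) / 60)

by_hand <- ols_by_hand(x, y)
by_lm <- coef(lm(y ~ x))

# The two approaches should agree up to floating-point error
all.equal(unname(by_hand), unname(by_lm))
```

Calling ols_by_hand(walkingDf$days, walkingDf$speed) on the full data should reproduce the estimates reported by summary(model).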

3D Plotly Visualization

  • Here we plot the distance walked, the time taken, and the day on three axes to show that most walks covered about 2.2 to 2.3 miles.

fig <- plot_ly(
  data = walkingDf, 
  x = ~days, 
  y = ~distanceWalked, 
  z = ~timeWalked, 
  type = "scatter3d", 
  mode = "markers",
  marker = list(size = 5, color = ~distanceWalked, colorscale = "Viridis", opacity = 0.8)
)
fig <- fig %>% layout(
  title = "3D Plot of Walking Data",
  scene = list(
    xaxis = list(title = "Days"),
    yaxis = list(title = "Distance Walked (miles)"),
    zaxis = list(title = "Time Walked (minutes)")
  )
)
fig

Hypothesis Testing

  • We test whether \(\beta_1\) is significantly different from zero.
  • The null hypothesis, no improvement, is:

\[ H_0: \beta_1 = 0 \]

  • The alternative hypothesis, that speed changed over time with practice, is:

\[ H_A: \beta_1 \neq 0 \]

  • If the p-value is small (conventionally below 0.05), we reject \(H_0\) and conclude that walking speed changes significantly with days of practice, and that there is hope of improvement with diligent and consistent practice.
summary(model)$coefficients
##                Estimate  Std. Error   t value     Pr(>|t|)
## (Intercept) 3.793229705 0.082586238 45.930530 1.727732e-31
## days        0.008589927 0.004001316  2.146775 3.925447e-02
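
The t value and p-value for the slope can be recovered from the estimate and its standard error. The sketch below does this with the numbers copied from the coefficient table above and the 33 residual degrees of freedom reported by summary(model). The variable names are illustrative.

```r
# Values copied from the coefficient table for days
estimate <- 0.008589927
std_error <- 0.004001316
df_resid <- 33   # residual degrees of freedom (35 observations - 2 coefficients)

# Test statistic for H_0: beta_1 = 0
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

round(c(t = t_value, p = p_value), 4)
```

The values should agree with the t value and Pr(>|t|) columns above, up to rounding of the inputs.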

Residual Plot

ggplot(data.frame(fitted = fitted(model), resid = resid(model)),
       aes(x = fitted, y = resid)) +
  geom_point(color = 'blue') +
  geom_hline(yintercept = 0, linetype = 'dashed', color = 'red') +
  labs(title = "Residual Plot", x = "Fitted Values", y = "Residuals")

Conclusion

  • We analyzed how walking speed changes over time.
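  • The linear regression model estimates a small positive slope: speed increased by about 0.0086 mi/h per day.
  • The slope is statistically significant at the 0.05 level (p = 0.039), although days explain only about 12% of the variation in speed (R-squared = 0.1225).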
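
To put the fitted slope in practical terms, the sketch below evaluates the regression line on the first and last day, using the rounded coefficients from the summary(model) output. The variable names are illustrative, and the result describes the fitted line, not any single observed walk.

```r
# Coefficients copied from the summary(model) output
intercept <- 3.793230
slope <- 0.008590

# Fitted speed on the first and last day of the study
fitted_day1 <- intercept + slope * 1
fitted_day35 <- intercept + slope * 35

# Estimated gain over the 34 days between day 1 and day 35
gain <- fitted_day35 - fitted_day1
round(c(day1 = fitted_day1, day35 = fitted_day35, gain = gain), 2)
```

Under this model, the fitted speed rises from about 3.80 mi/h to about 4.09 mi/h, a gain of roughly 0.29 mi/h.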

R Code

plot(walkingDf$days, walkingDf$speed, main = "Walking Speed Over Time",
     xlab = "Days", ylab = "Speed (mi/h)", col = "blue", pch = 16)
abline(model, col = "red")

Thank You!

  • This presentation was generated using R Markdown and ioslides.
  • Published on RPubs.