2026-02-10

3dplotly.knit

The Equation for Linear Regression

\[ y = \beta_0 + \beta_1 x + \epsilon \]
\[ \beta_0 = \text{y-intercept} \]
\[ \beta_1 = \text{slope coefficient} \]
\[ \epsilon = \text{error term} \]

Best-Fit Line Equation

\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]
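As a minimal sketch of where the estimated intercept and slope come from, the snippet below fits a line with base R's `lm()`. The `toy` data frame and its values are made up for illustration only and are not Carlos Santana's statistics; the column names `Age` and `OBP` simply mirror the ones used in the plotting code.

```r
# Toy data, NOT real player statistics: points that lie exactly on the
# line OBP = 0.700 - 0.010 * Age, so the fit is easy to check by hand.
toy <- data.frame(Age = 30:35)
toy$OBP <- 0.700 - 0.010 * toy$Age

# Ordinary least squares fit of OBP on Age (same shape as y ~ x).
fit <- lm(OBP ~ Age, data = toy)

# Estimated intercept and slope, i.e. the two hatted coefficients.
beta_hat <- coef(fit)
print(beta_hat)

# Value of the fitted line at a chosen age.
pred_36 <- predict(fit, newdata = data.frame(Age = 36))
print(pred_36)
```

A negative estimated slope corresponds to a trend line that falls as age increases. With real data the points will not sit exactly on a line, so the estimates only summarize the average trend.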

ggplot R code

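library(ggplot2)

# Scatter plot of OBP against age with a least-squares trend line
ggplot(data = carlos_santana_baseballref, aes(x = Age, y = OBP)) +
  geom_point(alpha = 0.5) +
  labs(
    title = "On-base Percentage for Each Age",
    x = "Age",
    y = "On-base Percentage"
  ) +
  # Linear fit without the confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "blue") +
  theme_bw() +
  # aspect.ratio sets the panel shape; coord_fixed() is dropped because it
  # sets a competing aspect ratio based on the data units
  theme(aspect.ratio = 0.5)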

1st ggplot

2nd ggplot

Plotly Plot

Conclusion

Carlos Santana’s offensive production has continued to decline as he has gotten older. Although the odds are not zero, it is safe to say he will not be the NL MVP next season.

Source for my dataset