Harlan D. Harris, PhD
CSP, February 2018
* AKA, “Cancelling Classes for Fun and Profit”
Input:
Output:
Observations: 1,008
Variables: 6
$ t <int> 1, 2, 3, 4, 5, 6, 7,...
$ t_mo <dbl> 1, 1, 1, 1, 1, 1, 1,...
$ t_dom <dbl> 1, 2, 3, 4, 5, 6, 7,...
$ bts <lgl> FALSE, FALSE, FALSE,...
$ set <fct> Train, Train, Train,...
$ sales <dbl> 115, 91, 109, 125, 1...
\[ \forall y, P(\text{total sales}=y \ \vert\ \text{frac of month}=z, \text{sales to date}=x) \propto \\ P(\text{sales to date}=x \ \vert\ \text{frac of month}=z, \text{total sales}=y) \cdot {\bf P(\text{total sales}=y) } \]
Location: \[ \text{sales} = f(t,x) + N(0, \sigma) \]
plus Scale: \[ \text{sales} = f(t,x) + N(0, g(t,x)) \]
plus Shape: \[ \text{sales} = f(t,x) + ?(0, g(t,x), h(t,x)) \]
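The difference between modeling location only and modeling location plus scale can be sketched with a small base-R simulation. The functions `f` and `g` and all numeric values here are hypothetical and chosen only for illustration; they are not taken from the sales data.

```r
# Illustrative simulation: noise whose spread grows with t.
set.seed(1)
t <- 1:200
f <- function(t) 100 + 0.5 * t   # hypothetical location function f(t)
g <- function(t) 2 + 0.1 * t     # hypothetical scale function g(t)
sales <- f(t) + rnorm(length(t), mean = 0, sd = g(t))

# Location-only view: a single residual sd for the whole series.
fit <- lm(sales ~ t)
res <- residuals(fit)
sd_all <- sd(res)

# Location-plus-scale view: residual spread differs between early and late t.
sd_early <- sd(res[t <= 100])
sd_late  <- sd(res[t > 100])
c(overall = sd_all, early = sd_early, late = sd_late)
```

A location-only model would report the single overall spread, which overstates uncertainty early in the series and understates it late.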
“Mathematically speaking, GAM is an additive modeling technique where the impact of the predictive variables is captured through smooth functions which–depending on the underlying patterns in the data–can be nonlinear:”
GAM: The Predictive Modeling Silver Bullet (Kim Larsen, Stitch Fix, 2015)
Family: c("TF", "t Family")
Fitting method: RS()
Call:
gamlss(formula = log(sales) ~ cs(t_mo) + bts, sigma.formula = ~t_mo,
nu.formula = ~t_mo, family = TF, data = monthly_training,
control = gamlss.control(trace = FALSE))
Mu Coefficients:
(Intercept) cs(t_mo) btsTRUE
8.27662 0.01552 0.25989
Sigma Coefficients:
(Intercept) t_mo
-2.755428 0.007908
Nu Coefficients:
(Intercept) t_mo
2.1004 0.3959
Degrees of Freedom for the fit: 9.999 Residual Deg. of Freedom 23
Global Deviance: -78.6846
AIC: -58.6858
SBC: -43.7216
# Simplified:
nd <- data.frame(t_mo=c(15,36),
bts=c(FALSE, TRUE))
with(predictAll(fit, newdata=nd),
dTF(log(2000:15000),
mu=mu,
sigma=sigma,
nu=nu))
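Because the model is fit to `log(sales)`, a density evaluated at `log(y)` is a density over log sales, not over sales. A minimal sketch of the change of variables follows, using base R only. It assumes the `TF` family is a Student-t shifted by `mu` and scaled by `sigma` with `nu` degrees of freedom; the helper `d_t_family` and the parameter values are hypothetical, not the fitted ones.

```r
# Shifted and scaled Student-t density, written with base R only.
# Assumption: same (mu, sigma, nu) meaning as the TF family used above.
d_t_family <- function(x, mu, sigma, nu) dt((x - mu) / sigma, df = nu) / sigma

# Hypothetical parameter values on the log scale.
mu <- log(8000); sigma <- 0.1; nu <- 5

y <- seq(2000, 15000, by = 1)                  # candidate monthly sales totals
dens_log <- d_t_family(log(y), mu, sigma, nu)  # density of log(sales) at log(y)
dens_y   <- dens_log / y                       # density of sales: Jacobian of log is 1/y

# Riemann sum with step 1; close to 1 when the grid covers most of the mass.
total_mass <- sum(dens_y)
total_mass
```

When the densities are only compared or normalized over a fixed grid of `y` values, skipping the `1/y` factor changes the shape of the curve, so it is worth deciding explicitly which scale the curve is meant to live on.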
\[ \forall y, P(\text{sales}=y \ \vert\ \text{frac of month}=z, \text{sales to date}=x) \propto \\ P(\text{sales to date}=x \ \vert\ \text{frac of month}=z, \text{sales}=y) \cdot P(\text{sales}=y) \\ \text{or} \\ {\bf P(\text{fraction sales to date}=\frac{x}{y}|\text{fraction of month}=z)} \cdot P(\text{sales} = y) \]
Family: c("BEINF0", "Beta Inflated zero")
Fitting method: RS()
Call:
gamlss(formula = prop_sales ~ cs(prop_month), sigma.formula = ~cs(prop_month),
nu.formula = ~cs(prop_month), family = BEINF0,
data = daily_training, control = gamlss.control(trace = FALSE))
Mu Coefficients:
(Intercept) cs(prop_month)
-2.614 4.886
Sigma Coefficients:
(Intercept) cs(prop_month)
-2.57953 -0.01477
Nu Coefficients:
(Intercept) cs(prop_month)
-2.328 -44.281
Degrees of Freedom for the fit: 15 Residual Deg. of Freedom 909
Global Deviance: -3946.15
AIC: -3916.15
SBC: -3843.72
\[ \forall y, P(\text{sales}=y \ \vert\ \text{frac of month}=z, \text{sales to date}=x) \propto \\ P(\text{sales to date}=x \ \vert\ \text{frac of month}=z, \text{sales}=y) \cdot \\ P(\text{sales}=y) \]
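One way to read the proportionality above is as a grid approximation: evaluate the prior and the likelihood at each candidate total, multiply, and normalize. The sketch below uses base R and entirely hypothetical numbers. The prior is a Student-t on the log scale, and the likelihood is a plain Beta density on the fraction `x / y` with mean `z`, which ignores the zero-inflation part of `BEINF0`. It illustrates the idea and is not the presentation's actual implementation.

```r
# Grid approximation of the posterior over month-end sales y.
# All numbers below are hypothetical; none come from the fitted models.
y <- seq(2000, 15000, by = 10)  # grid of candidate totals
x <- 3000                       # sales observed so far
z <- 0.4                        # fraction of the month elapsed

# Prior P(sales = y): Student-t on log(y), converted to the y scale (factor 1/y).
prior <- dt((log(y) - log(8000)) / 0.15, df = 5) / (0.15 * y)

# Likelihood of x given y, via the fraction x / y: Beta density with mean z.
# Assumption: sales accrue evenly on average. dbeta() is 0 where x / y > 1.
precision <- 50
lik <- dbeta(x / y, shape1 = z * precision, shape2 = (1 - z) * precision) / y

# Bayes' rule on the grid: multiply, then normalize so the weights sum to 1.
post <- prior * lik
post <- post / sum(post)

prior_mean <- sum(y * prior) / sum(prior)
post_mean  <- sum(y * post)
c(prior_mean = prior_mean, posterior_mean = post_mean)
```

With these made-up inputs the observed pace (3000 by 40% of the month) is slower than the prior expects, so the posterior mean moves below the prior mean.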
Algorithm:
\[ \log(y) = s(\text{time}) + \text{seasonality} + \text{holiday} + \text{changepoints} + \text{error} \]
Harlan D. Harris, PhD
Twitter, Medium, GitHub: @harlanh
This Presentation on GitHub: https://github.com/HarlanH/gamlss_accum
I work at WayUp: https://www.wayup.com/profile/Harlan-Harris-7864660d87/
\[ X_t = c + \sum_{i=1}^p \phi_i X_{t-i} + \epsilon_t \]
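The autoregressive recursion can be sketched in base R by simulating a series and regressing it on its own lags. The coefficients `phi`, the constant `c0`, and the series length are hypothetical values chosen for illustration.

```r
set.seed(42)
phi <- c(0.6, -0.3)  # hypothetical AR(2) coefficients (a stationary choice)
c0  <- 1             # constant term c
n   <- 2000
p   <- length(phi)

x   <- numeric(n)
eps <- rnorm(n)
for (t in (p + 1):n) {
  # X_t = c + sum_i phi_i * X_{t-i} + eps_t
  x[t] <- c0 + sum(phi * x[t - seq_len(p)]) + eps[t]
}

# Least-squares fit of X_t on its lags; embed() gives columns X_t, X_{t-1}, X_{t-2}.
lags <- embed(x, p + 1)
fit  <- lm(lags[, 1] ~ lags[, -1])
est  <- unname(coef(fit))
est
```

The estimates in `est` are ordered as intercept, first-lag coefficient, second-lag coefficient, and should land near `c0` and `phi` for a series of this length.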