My interest in fashion inspired me to explore how retail internet sales have developed over time, using a historical monthly time series.
The Prophet library is a forecasting tool developed by Meta Platforms for time series data. When installing the package to use the prophet function, I explored two methods: the default install and the latest version. The default install required two additional packages to be downloaded, whilst the latest version required six. I used the default install, as this worked best with my R code, giving me the foundation for analysing, forecasting and visualising my data:
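library(prophet)
# Original date column from the source table (kept for reference, not used below)
ds <- internet_reference_tables[,1]
# Anchor the monthly dates on the first of each month: stepping by month from the
# 30th overflows in February (2007-02-30 rolls over to 2007-03-02)
start_date <- as.Date("2006-11-01")
end_date <- as.Date("2026-01-01")
ds_1 <- seq.Date(from = start_date, to = end_date, by = "1 month")
# The second column holds the sales values
y <- as.numeric(internet_reference_tables[[2]])
# Both vectors must have the same length before building the data frame
stopifnot(length(ds_1) == length(y))
# Prophet expects the columns to be named ds (dates) and y (values)
d <- data.frame(ds = ds_1, y = y)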
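One detail worth checking when building monthly dates with seq.Date is the day of the month used as the starting point. Stepping by month from a day such as the 30th overflows in shorter months, so the February entry rolls forward into March. The short base R sketch below illustrates this and a first-of-month alternative; the variable names are illustrative only and are not part of my analysis.

```r
# Stepping by month from the 30th: February has no 30th, so that date rolls forward
month_end_steps <- seq.Date(from = as.Date("2006-11-30"), by = "month", length.out = 4)
print(month_end_steps)

# Anchoring on the first of the month keeps every step on a valid calendar date
month_starts <- seq.Date(from = as.Date("2006-11-01"), to = as.Date("2026-01-01"), by = "month")

# Check that every date falls on day 1 and count the months in the range
all_on_first <- all(format(month_starts, "%d") == "01")
n_months <- length(month_starts)
```

The month count here matches the 231 observations implied by the regression output further down (229 residual degrees of freedom plus 2 coefficients).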
The prophet function takes several arguments. Given my monthly data, I disabled the weekly and daily seasonality, as these patterns are not relevant, and focused on the monthly behaviour of internet sales.
m <- prophet(d, weekly.seasonality = FALSE, daily.seasonality = FALSE)
# freq defaults to "day", so monthly steps have to be requested explicitly
f <- make_future_dataframe(m, periods = 11, freq = "month")
p <- predict(m, f)
The make_future_dataframe function extends the data frame of dates by 11 months, so that the predict function can forecast sales beyond the observed data.
The plot of the average monthly internet sales shows the historical data, the forecasted values and the prediction interval. Together these portray the long-term trend, the predicted growth or decline, and the uncertainty band around the forecast.
The prophet_plot_components function plots the components of the Prophet forecast:
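The two plots described above can be produced with the calls sketched below. This is a minimal illustration on a synthetic monthly series (an assumption, standing in for the sales data), and the names demo, m_demo, f_demo and p_demo are hypothetical. The Prophet part only runs if the prophet package is installed.

```r
# Synthetic monthly series with a trend and a yearly cycle (assumption: not the real data)
demo <- data.frame(
  ds = seq.Date(from = as.Date("2015-01-01"), by = "month", length.out = 60),
  y = 100 + 2 * seq_len(60) + 10 * sin(2 * pi * seq_len(60) / 12)
)

if (requireNamespace("prophet", quietly = TRUE)) {
  library(prophet)
  m_demo <- prophet(demo, weekly.seasonality = FALSE, daily.seasonality = FALSE)
  f_demo <- make_future_dataframe(m_demo, periods = 11, freq = "month")
  p_demo <- predict(m_demo, f_demo)

  # History, forecast and uncertainty band in one figure
  print(plot(m_demo, p_demo))
  # Separate panels for the fitted components (trend and yearly seasonality here)
  prophet_plot_components(m_demo, p_demo)
}
```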
A linear regression model is used to estimate the long-term trend in internet sales. The plot below highlights that internet sales are increasing.
The plot shows that the residuals are not randomly distributed: they decline gradually, then rise sharply and become widely spread. This suggests the linear model is not fully capturing the structure of the data.
##
## Call:
## lm(formula = y ~ t)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -451.73 -252.97  -36.96  185.27 1462.60
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -193.4518    41.5872  -4.652 5.57e-06 ***
## t             12.3080     0.3108  39.599  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 315 on 229 degrees of freedom
## Multiple R-squared: 0.8726, Adjusted R-squared: 0.872
## F-statistic: 1568 on 1 and 229 DF, p-value: < 2.2e-16
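The trend fit and residual check discussed above can be sketched as follows. The data here are synthetic (an assumption, so the numbers will not match the output above), and the names y_demo and trend_model are illustrative; the point is only to show the shape of the lm call on a time index t and how the residuals are extracted.

```r
set.seed(1)  # reproducible synthetic data (assumption: not the real sales series)
t <- seq_len(231)
y_demo <- 5 + 2 * t + rnorm(length(t), sd = 3)

# Fit the linear trend y = a + b * t
trend_model <- lm(y_demo ~ t)
slope <- unname(coef(trend_model)["t"])

# If the linear model is adequate, residuals scatter evenly around zero
res <- resid(trend_model)
if (interactive()) {
  plot(t, res, xlab = "t (month index)", ylab = "Residual")
  abline(h = 0, lty = 2)
}

print(summary(trend_model))
```

A curved or fanning pattern in the residual plot, as seen with my sales data, is the sign that a straight line misses part of the structure.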
Let the significance level be 5%.
Testing the series for stationarity with the Augmented Dickey-Fuller test gives:
Null hypothesis: the time series is not stationary
##
## Augmented Dickey-Fuller Test
##
## data: y
## Dickey-Fuller = -2.6842, Lag order = 6, p-value = 0.2884
## alternative hypothesis: stationary
The p-value (0.2884) is larger than 0.05, so there is insufficient evidence to reject H0; the result is consistent with the time series being non-stationary.
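For reference, a test of this form can be run with adf.test from the tseries package; this is my assumption about the package, based on the format of the output above. The sketch below uses a synthetic random walk with drift rather than the sales data, and skips the test if tseries is not installed.

```r
set.seed(42)
# Synthetic random walk with drift, a textbook non-stationary series (assumption)
walk <- cumsum(rnorm(231, mean = 0.5))

adf_result <- NULL
if (requireNamespace("tseries", quietly = TRUE)) {
  # Null hypothesis: unit root (non-stationary); alternative: stationary
  adf_result <- tseries::adf.test(walk)
  print(adf_result)
}
```

The decision rule is the same as in the text: reject H0 only if the reported p-value is below the 5% significance level.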
Null hypothesis: the residuals of the linear model are homoskedastic (constant variance)
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 13.862, df = 1, p-value = 0.0001967
The p-value (0.0001967) is smaller than 0.05, so there is sufficient evidence to reject H0, which indicates that the residuals are heteroskedastic, in agreement with the residual plot.
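To make the test statistic less of a black box, the studentized (Koenker) form of the Breusch-Pagan statistic can be computed by hand in base R as n times the R-squared of an auxiliary regression of the squared residuals on the regressor. The sketch below does this on synthetic data whose noise grows with t (an assumption, not the sales data); the names model_demo and aux are illustrative. In practice a packaged function such as bptest from lmtest is the more convenient route.

```r
set.seed(7)
n <- 500
t <- seq_len(n)
# Synthetic data whose noise grows with t, so it is heteroskedastic by construction
y_demo <- 2 * t + rnorm(n, sd = 0.1 * t)
model_demo <- lm(y_demo ~ t)

# Auxiliary regression: squared residuals on the regressor
aux <- lm(resid(model_demo)^2 ~ t)

# Studentized Breusch-Pagan statistic: n * R-squared, chi-squared under H0
bp_stat <- n * summary(aux)$r.squared
bp_df <- 1  # one regressor besides the intercept
bp_p <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)
print(c(BP = bp_stat, df = bp_df, p.value = bp_p))
```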
Null hypothesis: the time series is white noise (no autocorrelation)
##
## Box-Ljung test
##
## data: y
## X-squared = 215.76, df = 1, p-value < 2.2e-16
The p-value is much smaller than 0.05, so there is sufficient evidence to reject H0, which indicates the time series is not white noise.
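A test of this kind is available in base R through Box.test with type = "Ljung-Box"; with lag = 1 it has one degree of freedom, as in the output above. The sketch below applies it to a simple deterministic trend (an illustrative series, not the sales data), where neighbouring values are strongly correlated.

```r
# A deterministic upward trend: each value is very close to its neighbour
trend_series <- as.numeric(1:100)

# Ljung-Box test at lag 1; H0 is that the series has no autocorrelation
lb <- Box.test(trend_series, lag = 1, type = "Ljung-Box")
print(lb)
```

Any strongly trending series will reject the white-noise hypothesis in this way, which is why the result for the sales data is expected given the upward trend.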
The optimal lambda for the Box-Cox transformation is lambda = -0.09388. Since lambda is close to 0, the transformation behaves similarly to the log transformation. It compresses the large values, which reduces heteroskedasticity and stabilises the variance, giving a series that is better suited to forecasting.
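The closeness to the log transformation can be checked directly. The sketch below defines the Box-Cox transform by hand in base R and compares it with log for a few illustrative values (assumed, not taken from the sales data); box_cox is a helper written for this illustration, not a library function.

```r
# Box-Cox transform: (y^lambda - 1) / lambda, with log(y) as the lambda = 0 limit
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (abs(lambda) < 1e-12) log(y) else (y^lambda - 1) / lambda
}

y_demo <- c(10, 100, 1000)  # illustrative values spanning two orders of magnitude
lambda_hat <- -0.09388      # value reported in the text

bc_values <- box_cox(y_demo, lambda_hat)
log_values <- log(y_demo)
print(rbind(box_cox = bc_values, log = log_values))
```

Both transformations compress large values; with a slightly negative lambda, the Box-Cox values grow a little more slowly than the log values.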
```