Section 1: Meta’s Prophet forecasting system

1.1 Finding my data

My interest in fashion inspired me to explore a historical time series showing how retail internet sales have developed over time.

1.2 Setting up Time Series

The Prophet library is a time series forecasting tool developed by Meta Platforms. To install the package that provides the prophet function, I explored two options: the default install and the latest version. The default install required two additional packages, whilst the latest version required six. I used the default install, as it worked best with my R code, and it set the foundation for analysing, forecasting and visualising my data:

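# The dates are rebuilt below, so the original date column is kept only for reference
ds <- internet_reference_tables[, 1]
# Use the first day of each month: stepping by month from the 30th rolls
# February over into March (e.g. 2007-03-02), which gives irregular dates
start_date <- as.Date("2006-11-01")
end_date <- as.Date("2026-01-01")
ds_1 <- seq.Date(from = start_date, to = end_date, by = "1 month")
y <- as.numeric(internet_reference_tables[[2]])
# Check that both columns have the same length before building the data frame
stopifnot(length(ds_1) == length(y))
d <- data.frame(ds = ds_1, y = y)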

Here:

  1. ds_1 creates a monthly date sequence that serves as the time column, ds, so Prophet recognises the time structure
  2. y holds the retail internet sales between 2006 and 2026
  3. d combines both into a data frame, which requires the two columns to have equal length
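When building a monthly sequence with seq.Date, the day of the month matters: R's documentation notes that when advancing the month produces an invalid day, the date is counted forward into the next month. The minimal sketch below, using only base R, compares stepping from the 30th with stepping from the 1st. The variable names are illustrative.

```r
# Stepping by month from the 30th: February has no 30th, so that date rolls over
from_30th <- seq.Date(from = as.Date("2006-11-30"), by = "1 month", length.out = 5)

# Stepping by month from the 1st always lands on a valid date
from_1st <- seq.Date(from = as.Date("2006-11-01"), by = "1 month", length.out = 5)

print(from_30th)
print(from_1st)

# Day of the month in each sequence; a value other than the starting day
# signals a rollover
print(as.integer(format(from_30th, "%d")))
print(as.integer(format(from_1st, "%d")))
```

Anchoring the sequence on the first of the month keeps the spacing regular, which is what a monthly model expects.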

1.3 Exploring Prophet

The prophet function has numerous options. Because my data is monthly, I disabled the weekly and daily seasonality, as these patterns are not relevant, and focused on the monthly behaviour of internet sales.

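library(prophet)
# Fit the model without weekly and daily seasonality (the data is monthly)
m <- prophet(d, weekly.seasonality = FALSE, daily.seasonality = FALSE)

# Extend the dates by 11 months; freq is needed because the default is "day"
f <- make_future_dataframe(m, periods = 11, freq = "month")

# Predict over both the historical and the future dates
p <- predict(m, f)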

The make_future_dataframe function extends the dataset by 11 months so that the predict function can forecast sales beyond the observed data.
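The fit, extend and predict steps can be tried on a small synthetic series before running them on real data. The sketch below is illustrative only: d_demo, m_demo, f_demo and p_demo are hypothetical names, and the data is simulated rather than the retail series. It assumes the prophet package is installed.

```r
library(prophet)

# Synthetic monthly series (an assumption for illustration, not the retail data)
set.seed(1)
n <- 120
d_demo <- data.frame(
  ds = seq.Date(from = as.Date("2010-01-01"), by = "1 month", length.out = n),
  y  = 100 + 2 * seq_len(n) + 10 * sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 3)
)

m_demo <- prophet(d_demo, weekly.seasonality = FALSE, daily.seasonality = FALSE)

# freq = "month" makes each of the 11 periods one month (the default is "day")
f_demo <- make_future_dataframe(m_demo, periods = 11, freq = "month")
p_demo <- predict(m_demo, f_demo)

# The last 11 rows are the forecast: point estimate and interval bounds
print(tail(p_demo[, c("ds", "yhat", "yhat_lower", "yhat_upper")], 11))
```

The future data frame contains the historical dates as well, so the forecast rows are the ones at the end.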

1.4 Prediction and Plot

The plot of ‘the average monthly internet sale’ shows the historical data, the forecasted values and the prediction interval. It portrays the long-term trend, the growth or decline of the future predictions, and the uncertainty band.

The prophet_plot_components function plots the components of the Prophet forecast:

  1. A positive trend
  2. Yearly seasonality
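Both plots can be produced directly from a fitted model and its forecast. The following is a minimal sketch on simulated data, with hypothetical object names (d_demo, m_demo, p_demo), and not the code used for the figures in this report.

```r
library(prophet)

# Simulated monthly data with a trend and a yearly cycle (illustrative only)
set.seed(1)
n <- 120
d_demo <- data.frame(
  ds = seq.Date(from = as.Date("2010-01-01"), by = "1 month", length.out = n),
  y  = 100 + 2 * seq_len(n) + 10 * sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 3)
)
m_demo <- prophet(d_demo, weekly.seasonality = FALSE, daily.seasonality = FALSE)
p_demo <- predict(m_demo, make_future_dataframe(m_demo, periods = 11, freq = "month"))

# Forecast plot: observations as points, yhat as a line, interval as a band
forecast_plot <- plot(m_demo, p_demo)
print(forecast_plot)

# Component plots: here the trend and the yearly seasonality
prophet_plot_components(m_demo, p_demo)
```

Because plot returns a ggplot object, labels such as a title can be added with the usual ggplot2 functions.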

Section 2: Exploration of Data

2.1 Linear regression

A linear regression model is used to estimate the long-term trend in internet sales. The plot below highlights that internet sales are increasing.

The residual plot shows that the residuals are not randomly distributed: they decrease gradually and then rise suddenly, with the points becoming widely spread. This suggests the linear model is not fully capturing the structure of the data.

## 
## Call:
## lm(formula = y ~ t)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -451.73 -252.97  -36.96  185.27 1462.60 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -193.4518    41.5872  -4.652 5.57e-06 ***
## t             12.3080     0.3108  39.599  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 315 on 229 degrees of freedom
## Multiple R-squared:  0.8726, Adjusted R-squared:  0.872 
## F-statistic:  1568 on 1 and 229 DF,  p-value: < 2.2e-16
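A summary of this form can be obtained by regressing the series on a time index. The sketch below uses simulated data and assumes that t is simply the observation number 1, 2, ..., 231, which is consistent with the 229 residual degrees of freedom reported above. The report does not state how t was actually defined.

```r
# Simulated stand-in for the sales series (an assumption, not the real data)
set.seed(1)
n <- 231
t <- seq_len(n)                                    # assumed time index
y <- 50 + 0.05 * t^2 + rnorm(n, sd = 0.1 * t + 1)  # curved trend, growing spread

model <- lm(y ~ t)
print(summary(model))

# Residuals against fitted values: a curve or funnel shape suggests misfit
plot(fitted(model), resid(model), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

A curved or fanning residual pattern, as in this simulated case, is the kind of structure that a straight-line trend cannot absorb.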

2.3 Hypothesis Testing of Time Series

Let the significance level be 5%.

When we test the dataset, we get:

Dickey-Fuller Test

null hypothesis: time series is not stationary

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## 
##  Augmented Dickey-Fuller Test
## 
## data:  y
## Dickey-Fuller = -2.6842, Lag order = 6, p-value = 0.2884
## alternative hypothesis: stationary

The p-value (0.2884) is larger than 0.05, so we have insufficient evidence to reject H0. The test gives no evidence that the time series is stationary, so we treat it as non-stationary.
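The output format above matches adf.test from the tseries package, so a test of this kind might be run as follows. The series here is a simulated random walk with drift, used only as a stand-in; y_demo and adf_result are hypothetical names.

```r
library(tseries)

# Simulated random walk with drift (an assumption standing in for the sales data)
set.seed(1)
y_demo <- cumsum(rnorm(231, mean = 1))

# The default alternative hypothesis is "stationary"
adf_result <- adf.test(y_demo)
print(adf_result)

# Decision at the 5% significance level
alpha <- 0.05
if (adf_result$p.value > alpha) {
  message("Fail to reject H0: no evidence of stationarity")
} else {
  message("Reject H0: the series appears stationary")
}
```

With 231 observations the default lag order is trunc((231 - 1)^(1/3)) = 6, which agrees with the lag order shown in the output above.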

Breusch-Pagan Test

null hypothesis: the residuals of the linear model are homoskedastic

## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 13.862, df = 1, p-value = 0.0001967

The p-value is smaller than 0.05, so we have sufficient evidence to reject H0. This indicates the residuals of the linear model are heteroskedastic, confirming what the residual plot showed.
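The "studentized Breusch-Pagan test" heading is what bptest from the lmtest package prints, so the test can be sketched as below. The data is simulated so that the noise grows with time; model_demo and bp_result are hypothetical names and the numbers will differ from those in this report.

```r
library(lmtest)

# Simulated series whose spread grows with time (illustrative only)
set.seed(1)
t <- seq_len(231)
y_demo <- 10 * t + rnorm(231, sd = t)

model_demo <- lm(y_demo ~ t)

# bptest uses the studentized version of the test by default
bp_result <- bptest(model_demo)
print(bp_result)
```

The test is applied to the fitted linear model rather than to the raw series, which is why the output lists the model as its data.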

Ljung-Box Test

null hypothesis: time series is white noise

## 
##  Box-Ljung test
## 
## data:  y
## X-squared = 215.76, df = 1, p-value < 2.2e-16

The p-value is much smaller than 0.05, so we have sufficient evidence to reject H0, which indicates the time series is autocorrelated and therefore not white noise.
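The Ljung-Box test is available in base R through Box.test, whose default lag of 1 matches the df = 1 in the output above. The sketch below contrasts a simulated trending series with simulated white noise. Both series are assumptions for illustration.

```r
set.seed(1)
trending <- cumsum(rnorm(231, mean = 1))  # strongly autocorrelated series
noise <- rnorm(231)                       # white noise for comparison

# lag = 1 is the default, giving one degree of freedom
lb_trending <- Box.test(trending, type = "Ljung-Box")
lb_noise <- Box.test(noise, type = "Ljung-Box")

print(lb_trending)
print(lb_noise)
```

The trending series is rejected decisively, whereas a white noise series will usually not be rejected at the 5% level.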

Box-Cox Transformation

The optimal lambda for the Box-Cox transformation is lambda = -0.09388. Since lambda is close to 0, the transformation behaves similarly to the log transformation. It compresses large values, which reduces heteroskedasticity and stabilises the variance. This makes the time series more stable and therefore easier to forecast.
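The report does not state which function was used to estimate lambda, so the sketch below only applies the transformation itself, written out in base R. The helper box_cox and the sample values are hypothetical. Only the value of lambda comes from the text.

```r
# Box-Cox transform: (y^lambda - 1) / lambda, with log(y) as the limit at lambda = 0
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

lambda <- -0.09388                 # value reported in the text
y_demo <- c(10, 100, 1000, 10000)  # illustrative values, not the sales data

transformed <- box_cox(y_demo, lambda)
print(data.frame(y = y_demo, box_cox = transformed, log = log(y_demo)))

# Each tenfold increase in y adds less and less after the transform
print(diff(transformed))
```

Under the log transformation each tenfold increase adds the same amount, so a slightly negative lambda compresses large values a little more strongly than the log does.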
