This report analyses and forecasts a retail sales time series using the Prophet package in R. The dataset contains monthly retail sales values from January 1992 to May 2016. The goal is to study the main features of the series, especially trend and seasonality, and then use Prophet to generate forecasts for future months.
This dataset is a good choice for time series analysis because it contains 293 monthly observations and shows visible changes over time. Retail sales data is also linked to consumer behaviour, business activity, and seasonal shopping patterns.
The data used in this analysis was provided in a CSV file called example_retail_sales.csv. The file was placed in the data/ folder of the project.
The data contains the two column names required by Prophet: ds for the date variable and y for the observed value of sales.
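As a minimal sketch of the expected input format, the snippet below writes the first six rows of the series (the values printed further down in this report) to a temporary CSV file and reads them back with base R. The temporary file and the use of read.csv are assumptions made for illustration; they stand in for the real file in data/ and for whichever reader the analysis actually used.

```r
# Illustrative only: a temporary file stands in for data/example_retail_sales.csv
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "ds,y",
  "1992-01-01,146376",
  "1992-02-01,147079",
  "1992-03-01,159336",
  "1992-04-01,163669",
  "1992-05-01,170068",
  "1992-06-01,168663"
), csv_path)

# Read the file and store ds as a Date column
retail_sample <- read.csv(csv_path, stringsAsFactors = FALSE)
retail_sample$ds <- as.Date(retail_sample$ds)

# Check that both columns needed by Prophet are present
has_required_columns <- all(c("ds", "y") %in% names(retail_sample))
print(has_required_columns)
str(retail_sample)
```

A check like this is cheap to run before fitting, since a misnamed column is one of the easier mistakes to make when preparing data for Prophet.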
## Rows: 293 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (1): y
## date (1): ds
# Converts the date column into Date format
retail_sales_data <- retail_sales_data %>%
  mutate(ds = as.Date(ds))
# Checks the structure of the data
str(retail_sales_data)
## tibble [293 × 2] (S3: tbl_df/tbl/data.frame)
##  $ ds: Date[1:293], format: "1992-01-01" "1992-02-01" ...
##  $ y : num [1:293] 146376 147079 159336 163669 170068 ...
## # A tibble: 6 × 2
##   ds              y
##   <date>      <dbl>
## 1 1992-01-01 146376
## 2 1992-02-01 147079
## 3 1992-03-01 159336
## 4 1992-04-01 163669
## 5 1992-05-01 170068
## 6 1992-06-01 168663
We can visualise the full time series to understand its overall behaviour.
# Basic plot of retail sales over time
plot(retail_sales_data$ds, retail_sales_data$y, type = "l",
     main = "Retail Sales Over Time",
     xlab = "Date", ylab = "Retail Sales",
     col = "blue", lwd = 2)
The plot shows a clear upward trend over time. It also shows a repeating seasonal pattern, with large peaks appearing regularly each year. The peaks near the end of each year are likely linked to stronger seasonal spending, such as holiday shopping for Christmas.
It is useful to look more closely at the series using a scatter and line plot and also fit a simple linear regression to understand the long-run direction of the data.
# Add a time index for a simple regression model
retail_sales_data$time_index <- 1:nrow(retail_sales_data)
# Fit a linear regression model
retail_sales_trend_model <- lm(y ~ time_index, data = retail_sales_data)
summary(retail_sales_trend_model)
##
## Call:
## lm(formula = y ~ time_index, data = retail_sales_data)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -67104 -12468   -616  12476  76877
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 165792.04    2875.05   57.67   <2e-16 ***
## time_index     974.01      16.95   57.46   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24540 on 291 degrees of freedom
## Multiple R-squared:  0.919,  Adjusted R-squared:  0.9187
## F-statistic:  3301 on 1 and 291 DF,  p-value: < 2.2e-16
# Add fitted values from the linear model
retail_sales_data$linear_fit <- predict(retail_sales_trend_model)
# Plot actual data and fitted linear trend
plot(retail_sales_data$ds, retail_sales_data$y, type = "l",
     main = "Retail Sales with Linear Trend",
     xlab = "Date", ylab = "Retail Sales",
     col = "darkblue", lwd = 2)
lines(retail_sales_data$ds, retail_sales_data$linear_fit,
      col = "red", lwd = 2, lty = 2)
legend("topleft", legend = c("Retail sales", "Linear trend"),
       col = c("darkblue", "red"), lty = c(1, 2), lwd = 2, bty = "n")
The regression confirms that the series has a positive long-term trend. However, the linear trend line does not capture the seasonal peaks and troughs, so a more flexible time series model will be needed.
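To make the regression coefficients concrete, the following sketch plugs the printed estimates into the fitted line. The variable names are illustrative; the numbers are copied from the lm() summary in this report, and the results are in the same units as y.

```r
# Coefficients copied from the lm() summary in this report
intercept <- 165792.04
slope_per_month <- 974.01

# Average increase implied by the linear trend over 12 months
slope_per_year <- slope_per_month * 12

# Fitted trend values at the first and last observations (time_index 1 and 293)
fitted_first <- intercept + slope_per_month * 1
fitted_last <- intercept + slope_per_month * 293

print(slope_per_year)
print(c(fitted_first, fitted_last))
```

Under this straight-line view, the trend level rises by a little under 1,000 per month, or roughly 11,700 per year, across the whole sample.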
Now the Prophet model is fitted to the data.
# Fit the Prophet model
retail_sales_model <- prophet(retail_sales_data[, c("ds", "y")], yearly.seasonality = TRUE)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
Prophet is useful here because it decomposes the time series into a trend component and seasonal components, and reports uncertainty intervals around its forecasts. This suits retail sales, as the data clearly has a repeating yearly pattern.
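For reference, the additive decomposition that Prophet fits is usually written as below, where g(t) is the trend, s(t) the seasonal component, h(t) an optional holiday effect (not used in this report), and the last term is the error. The notation is added here as a reminder of the general model form, not as a description of the exact settings used above.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```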
We then create future data frames to forecast retail sales for the next 12 months and the next 36 months.
# Create a future dataframe with 12 additional months
future_1yr <- make_future_dataframe(retail_sales_model, periods = 12, freq = "month")
# Create a future dataframe for the next 36 months
future_3yr <- make_future_dataframe(retail_sales_model, periods = 36, freq = "month")
# Display the tail of the future dataframes
tail(future_1yr)
##             ds
## 300 2016-12-01
## 301 2017-01-01
## 302 2017-02-01
## 303 2017-03-01
## 304 2017-04-01
## 305 2017-05-01
tail(future_3yr)
##             ds
## 324 2018-12-01
## 325 2019-01-01
## 326 2019-02-01
## 327 2019-03-01
## 328 2019-04-01
## 329 2019-05-01
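The row numbers and end dates in this output can be checked with base R alone. The sketch below does not call make_future_dataframe; it uses seq() as a stand-in to rebuild the future month-start dates, assuming the last observation is 2016-05-01 (row 293).

```r
# Last observed month in the series (row 293)
last_observed <- as.Date("2016-05-01")

# Build 12 and 36 future month-start dates, dropping the starting date itself
future_dates_1yr <- seq(last_observed, by = "month", length.out = 13)[-1]
future_dates_3yr <- seq(last_observed, by = "month", length.out = 37)[-1]

# Row counts after appending the future dates to the 293 historical rows
n_rows_1yr <- 293 + length(future_dates_1yr)
n_rows_3yr <- 293 + length(future_dates_3yr)

print(range(future_dates_1yr))
print(range(future_dates_3yr))
print(c(n_rows_1yr, n_rows_3yr))
```

The totals of 305 and 329 rows match the last row labels shown by tail(), which confirms that the future dataframes contain the history plus the requested horizon.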
We use the predict function to forecast future values and then plot the results.
# Make the 1 year forecast
forecast_1yr <- predict(retail_sales_model, future_1yr)
# Make the 3 year forecast
forecast_3yr <- predict(retail_sales_model, future_3yr)
# View the tail of the forecast data
tail(forecast_1yr[, c("ds", "yhat", "yhat_lower", "yhat_upper")])
##             ds     yhat yhat_lower yhat_upper
## 300 2016-12-01 517377.2   506410.0   527334.2
## 301 2017-01-01 444451.2   434095.9   453648.0
## 302 2017-02-01 439215.3   429054.1   449655.4
## 303 2017-03-01 479705.0   469784.9   489567.1
## 304 2017-04-01 474713.8   464121.2   485818.9
## 305 2017-05-01 492035.7   481355.5   502495.4
tail(forecast_3yr[, c("ds", "yhat", "yhat_lower", "yhat_upper")])
##             ds     yhat yhat_lower yhat_upper
## 324 2018-12-01 555420.5   541374.4   568775.5
## 325 2019-01-01 475254.1   462767.9   488706.8
## 326 2019-02-01 476942.0   463557.8   491156.8
## 327 2019-03-01 510123.0   496397.3   525110.9
## 328 2019-04-01 507619.7   492286.6   521766.0
## 329 2019-05-01 525760.2   510628.7   540788.7
1-Year Forecast
The 1-year forecast continues the general upward movement in retail sales and preserves the strong yearly seasonal pattern. The model predicts that the largest values will continue to occur near the end of the year, which is consistent with the historical data. The uncertainty interval remains fairly narrow over this short horizon (about 21,000 wide around a point forecast of roughly 492,000 for May 2017), so the short-term forecast appears reasonable.
3-Year Forecast
The 3-year forecast also shows continued growth together with repeated seasonal peaks and troughs. However, the uncertainty interval widens as the forecast horizon increases: by May 2019 it is about 30,000 wide, compared with about 21,000 for May 2017. This means that while Prophet can project the general structure of the series, the exact values become less certain further into the future. This is expected, as retail sales can be affected by changes in inflation, income, business conditions, and consumer confidence.
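The widening of the interval can be quantified directly from the forecast tables. The sketch below compares the last row of each table (May 2017 and May 2019); the variable names are illustrative and the numbers are copied from the printed output rather than recomputed from the model.

```r
# Interval bounds copied from the forecast tables in this report
may_2017 <- c(yhat = 492035.7, lower = 481355.5, upper = 502495.4)
may_2019 <- c(yhat = 525760.2, lower = 510628.7, upper = 540788.7)

# Absolute width of each uncertainty interval
width_2017 <- unname(may_2017["upper"] - may_2017["lower"])
width_2019 <- unname(may_2019["upper"] - may_2019["lower"])

# Width relative to the point forecast, as a percentage
relative_2017 <- 100 * width_2017 / unname(may_2017["yhat"])
relative_2019 <- 100 * width_2019 / unname(may_2019["yhat"])

print(round(c(width_2017, width_2019), 1))
print(round(c(relative_2017, relative_2019), 2))
```

Expressing the width as a share of the point forecast is a design choice: it separates the growth in uncertainty from the growth in the level of sales itself.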
Trend Component:
The trend component shows a strong long-term increase in retail sales over the sample period. This suggests that average spending rises over time, which may reflect population growth, economic expansion, inflation, or changes in consumption patterns. There are also some periods where the slope of the trend changes, showing that growth is not completely constant.
Yearly Seasonality:
The yearly seasonality graph shows a clear repeating pattern within each year. The strongest positive seasonal effects appear near the end of the calendar year, which is consistent with higher retail spending during the holiday period. There are also lower points earlier in the year when consumer spending is usually weaker.
The size of the seasonal peaks appears to grow as the level of the series increases. This suggests that the variability may rise with the level of the data. A log transformation can sometimes help stabilise this.
# Create a log-transformed version of the series
retail_sales_log_data <- retail_sales_data[, c("ds", "y")]
retail_sales_log_data$y <- log(retail_sales_log_data$y)
# Fit Prophet on the log-transformed data
retail_sales_log_model <- prophet(retail_sales_log_data, yearly.seasonality = TRUE)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
# Forecast 12 months ahead on the log scale
future_log_1yr <- make_future_dataframe(retail_sales_log_model, periods = 12, freq = "month")
forecast_log_1yr <- predict(retail_sales_log_model, future_log_1yr)
The log-transformed model produces a smoother series on the transformed scale. This suggests that taking logs might be useful when the variation increases as the level of the series rises.
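One practical consequence of modelling log(y) is that the forecasts are also on the log scale and need to be exponentiated before they can be compared with the original series. The sketch below shows that step on a small data frame whose values are hypothetical placeholders, not output from the fitted model; only the column names follow the Prophet forecast tables shown earlier.

```r
# Hypothetical log-scale forecast rows (placeholder values, not model output)
log_forecast <- data.frame(
  ds = as.Date(c("2016-06-01", "2016-07-01")),
  yhat = log(c(480000, 490000)),
  yhat_lower = log(c(470000, 478000)),
  yhat_upper = log(c(491000, 503000))
)

# Exponentiate the forecast and its interval bounds to return to sales units
value_columns <- c("yhat", "yhat_lower", "yhat_upper")
original_scale <- log_forecast
original_scale[value_columns] <- exp(log_forecast[value_columns])

print(original_scale)
```

Because exp() is monotonic, the ordering of yhat_lower, yhat, and yhat_upper is preserved after the back-transformation, although the interval need not be symmetric around yhat on the original scale.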
The results from the graphs suggest that retail sales are influenced by both long-run growth and strong yearly seasonality. The upward trend indicates that the average level of retail sales has increased substantially over time. The yearly seasonal pattern shows that some months consistently perform better than others, particularly those at the end of the year.
Prophet makes these components easy to separate and interpret. This helps to explain why the forecast takes its particular shape. At the same time, the model is still limited because it only uses past values of the series. It does not directly include external factors such as inflation, recessions, interest rates, policy changes, or financial crises, which could help explain events such as the sudden dip around 2008.
In this project, I used the Prophet model to analyse and forecast a monthly retail sales time series. The data showed two important features: a strong upward trend and a clear yearly seasonal pattern. Prophet was able to capture both features and provide forecasts for the next 1 year and 3 years. The short-term forecast looked more reliable than the longer-term forecast because the uncertainty interval widened over time. The component plots gave useful insight into the main structure of the data, especially the strong end-of-year seasonal effect. I also explored a log transformation as an extension, which helped show how changing the scale can affect the behaviour of the model.
Overall, this project showed that Prophet is a useful and accessible tool for practical time series forecasting. It works especially well when the data contains clear trend and seasonal structure.