A time series is a sequence of data points recorded in chronological order, usually at regular intervals. It is used to track changes over time, identify trends, and make forecasts. Prophet is a forecasting tool developed by Meta for modelling time series with strong trend and seasonal components. It lets us fit a forecasting model with very little configuration while still leaving room for further tuning.
The aim of this project is to analyse a time series and produce forecasts for future values. R's built-in co2 dataset is used to identify important features such as long-term trends and seasonal patterns. Visualisations and a simple regression model are used to understand how CO2 levels have changed over time, and Meta’s Prophet model is used to generate forecasts for future values.
library(prophet)
library(zoo)
library(plotly)
The co2 dataset, which ships with R, records monthly atmospheric carbon dioxide concentrations in parts per million (ppm) measured at the Mauna Loa Observatory in Hawaii. It contains 468 observations, from January 1959 to December 1997, and is widely used in time series analysis because real measurements rarely show a trend and a seasonal cycle this clearly.
The data displays two important characteristics. First, there is a long-term upward trend, which indicates that CO2 levels have been steadily increasing over time. Second, there is a repeating seasonal pattern within each year, where CO2 levels rise and fall with natural processes, chiefly the annual cycle of plant growth and decay.
head(co2)
## Jan Feb Mar Apr May Jun
## 1959 315.42 316.31 316.50 317.56 318.13 318.00
summary(co2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 313.2 323.5 335.2 337.1 350.3 366.8
plot(co2,
     main = "Atmospheric CO2 Concentration",
     xlab = "Year",
     ylab = "CO2 (ppm)")
The time series plot shows a clear and consistent increase in CO2 levels over the observed period. This upward movement indicates the presence of a strong long-term trend, suggesting that the data is driven by persistent underlying factors rather than short-term random fluctuations.
Alongside the long-term increase, the plot also reveals regular fluctuations within each year. These repeating patterns indicate the presence of seasonality, meaning that the series follows a structured and predictable cycle over time, which is helpful for building an effective forecasting model.
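One way to make the trend and the seasonal pattern described above explicit is a decomposition of the series. The sketch below uses stl() from base R's stats package, which is not part of the original analysis. The setting s.window = "periodic" is an assumption that the seasonal shape is identical in every year, and all variable names are illustrative.

```r
# Decompose the monthly series into seasonal, trend and remainder parts.
# s.window = "periodic" assumes the same seasonal shape in every year.
co2_stl <- stl(co2, s.window = "periodic")

# Pull the three components out of the fitted object.
seasonal_part  <- co2_stl$time.series[, "seasonal"]
trend_part     <- co2_stl$time.series[, "trend"]
remainder_part <- co2_stl$time.series[, "remainder"]

# Average seasonal effect per calendar month (1 = January, ..., 12 = December).
monthly_effect <- tapply(
  as.numeric(seasonal_part),
  as.integer(cycle(co2)),
  mean
)
round(monthly_effect, 2)

# One panel each for the data, seasonal, trend and remainder series.
plot(co2_stl, main = "STL decomposition of co2")
```

The three components add back up to the original series, so the plot gives a quick visual check of how much of the movement is trend and how much is the yearly cycle.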
co2_dataframe_temp <- data.frame(
  time = time(co2),
  value = as.numeric(co2)
)
plot_ly(
  co2_dataframe_temp,
  x = ~time,
  y = ~value,
  type = "scatter",
  mode = "lines"
)
The interactive chart enhances the analysis by allowing the data to be explored more closely. Users can zoom into specific time periods, making it easier to examine how the series behaves during different intervals and identify patterns that may not be immediately visible in a static plot.
This level of interaction helps confirm the consistency of the observed structure across the dataset. It also allows for a more detailed inspection of fluctuations, so we can get a better understanding of how both long-term and short-term patterns change over time.
Prophet requires a data frame with two columns: ds, which holds the dates, and y, which holds the values to be forecast.
# Convert the ts time index to year-month, then to the first day of each month
time_index <- zoo::as.yearmon(time(co2))
co2_dataframe <- data.frame(
  ds = as.Date(time_index),
  y = as.numeric(co2)
)
head(co2_dataframe)
## ds y
## 1 1959-01-01 315.42
## 2 1959-02-01 316.31
## 3 1959-03-01 316.50
## 4 1959-04-01 317.56
## 5 1959-05-01 318.13
## 6 1959-06-01 318.00
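If zoo is not available, the same two-column layout can be built with base R alone. This is a sketch of an alternative, not the method used in this report; the names first_date, ds_base and co2_dataframe_base are hypothetical.

```r
# Base-R alternative to zoo::as.yearmon(): build the monthly dates directly.
# start(co2) returns the first year and month of the series.
first_year  <- as.integer(start(co2)[1])
first_month <- as.integer(start(co2)[2])
first_date  <- as.Date(sprintf("%d-%02d-01", first_year, first_month))

# One date per observation, stepping a calendar month at a time.
ds_base <- seq(first_date, by = "month", length.out = length(co2))

# The two columns Prophet expects: ds (Date) and y (numeric).
co2_dataframe_base <- data.frame(
  ds = ds_base,
  y = as.numeric(co2)
)

str(co2_dataframe_base)
head(co2_dataframe_base, 3)
```

Using seq() with by = "month" avoids converting fractional years by hand, which can land on the wrong day through rounding.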
prophet_model <- prophet::prophet(co2_dataframe)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
The Prophet model is fitted to the dataset to capture its underlying structure and generate forecasts. It works by decomposing the time series into different components, such as trend and seasonality, and modelling each of these elements separately.
This approach allows the model to remain flexible and adapt to changes in the data while preserving important patterns. By combining these components, Prophet is able to produce forecasts that reflect both the overall direction and the repeating behaviour observed in the historical series.
future_dates <- prophet::make_future_dataframe(
  prophet_model,
  periods = 24,
  freq = "month"
)
Once the model is fitted, it can produce predicted values for both the historical period and the future dates, because make_future_dataframe() keeps the historical dates by default and appends the 24 new months. These predictions are based on the patterns identified in the data and represent the model’s best estimate of how the series evolves over time.
Including predictions for the historical period allows for a comparison between observed and fitted values. This helps assess how well the model captures the data, while the future predictions provide insight into expected behaviour beyond the available observations.
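To see which dates such a frame covers, the sketch below rebuilds them with base R only. It assumes that the 24 future periods are simply the 24 calendar months following the last observation, which is what periods = 24 and freq = "month" request; the variable names are illustrative and Prophet itself is not called.

```r
# Monthly dates of the observed series (January 1959 onwards).
history_dates <- seq(as.Date("1959-01-01"), by = "month",
                     length.out = length(co2))

# Assumption: the future part is the next `horizon` months after the
# last observed date.
horizon   <- 24
last_date <- max(history_dates)
new_dates <- seq(last_date, by = "month", length.out = horizon + 1)[-1]

# History first, then the forecast horizon.
all_dates <- c(history_dates, new_dates)

range(new_dates)   # first and last forecast month
length(all_dates)  # observed months plus the horizon
```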
forecast_results <- predict(prophet_model, future_dates)
head(forecast_results)
## ds trend additive_terms additive_terms_lower additive_terms_upper
## 1 1959-01-01 315.3626 -0.0775880 -0.0775880 -0.0775880
## 2 1959-02-01 315.4469 0.5946394 0.5946394 0.5946394
## 3 1959-03-01 315.5230 1.2325855 1.2325855 1.2325855
## 4 1959-04-01 315.6073 2.4609156 2.4609156 2.4609156
## 5 1959-05-01 315.6888 3.0206586 3.0206586 3.0206586
## 6 1959-06-01 315.7731 2.3515302 2.3515302 2.3515302
## yearly yearly_lower yearly_upper multiplicative_terms
## 1 -0.0775880 -0.0775880 -0.0775880 0
## 2 0.5946394 0.5946394 0.5946394 0
## 3 1.2325855 1.2325855 1.2325855 0
## 4 2.4609156 2.4609156 2.4609156 0
## 5 3.0206586 3.0206586 3.0206586 0
## 6 2.3515302 2.3515302 2.3515302 0
## multiplicative_terms_lower multiplicative_terms_upper yhat_lower yhat_upper
## 1 0 0 314.7755 315.7767
## 2 0 0 315.5499 316.5139
## 3 0 0 316.3031 317.2487
## 4 0 0 317.5969 318.5721
## 5 0 0 318.2367 319.1703
## 6 0 0 317.6593 318.6449
## trend_lower trend_upper yhat
## 1 315.3626 315.3626 315.2850
## 2 315.4469 315.4469 316.0415
## 3 315.5230 315.5230 316.7556
## 4 315.6073 315.6073 318.0682
## 5 315.6888 315.6888 318.7095
## 6 315.7731 315.7731 318.1247
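The comparison between observed and fitted values can be summarised with simple error measures. The sketch below defines a hypothetical helper, fit_errors(), and applies it only to the six observed values and six yhat values printed above, so the result describes those six months and not the whole fit.

```r
# Hypothetical helper: summarise the gap between observed and fitted values.
fit_errors <- function(observed, fitted) {
  stopifnot(length(observed) == length(fitted))
  residuals <- observed - fitted
  c(mae = mean(abs(residuals)), rmse = sqrt(mean(residuals^2)))
}

# First six observed values and fitted values, copied from the printed output.
observed_head <- c(315.42, 316.31, 316.50, 317.56, 318.13, 318.00)
yhat_head     <- c(315.2850, 316.0415, 316.7556, 318.0682, 318.7095, 318.1247)

head_errors <- fit_errors(observed_head, yhat_head)
round(head_errors, 3)

# With the full objects from this report the same call would be (not run here):
# n_obs <- nrow(co2_dataframe)
# fit_errors(co2_dataframe$y, forecast_results$yhat[seq_len(n_obs)])
```

Restricting yhat to the first n_obs rows matters because forecast_results also contains the 24 future months, for which no observations exist.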
plot(prophet_model, forecast_results)
The forecast plot shows the observed data and the model’s predictions together. The predicted values continue the historical series smoothly, which suggests that the model has captured the trend and seasonal structure of the data and can extend it into the future.
The shaded region surrounding the forecast line represents uncertainty in the predictions. This interval widens as the forecast moves further ahead, reflecting increased uncertainty over longer horizons, while still maintaining a consistent overall direction in the projected values.
prophet_plot_components(prophet_model, forecast_results)
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## ℹ The deprecated feature was likely used in the prophet package.
## Please report the issue at <https://github.com/facebook/prophet/issues>.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
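The components plot separates the forecast into a trend and a yearly seasonal effect. Because the model here is additive (the multiplicative_terms column in the printed forecast is 0), the two components should sum to yhat. The short check below uses only the six rows printed earlier in this report; the variable names are illustrative.

```r
# Values copied from the first six rows of forecast_results printed earlier.
trend_head  <- c(315.3626, 315.4469, 315.5230, 315.6073, 315.6888, 315.7731)
yearly_head <- c(-0.0775880, 0.5946394, 1.2325855,
                 2.4609156, 3.0206586, 2.3515302)
yhat_head   <- c(315.2850, 316.0415, 316.7556, 318.0682, 318.7095, 318.1247)

# Additive model: forecast = trend + yearly seasonality.
reconstructed <- trend_head + yearly_head

# Largest difference between the reconstruction and the printed yhat.
max_gap <- max(abs(reconstructed - yhat_head))

# The gap should be no larger than the rounding in the printed table.
max_gap < 2e-4
```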
To examine whether variance stabilisation improves the model, we apply a logarithmic transformation.
# Model log(CO2) instead of CO2; forecasts are then on the log scale
log_co2_dataframe <- data.frame(
  ds = co2_dataframe$ds,
  y = log(co2_dataframe$y)
)
log_prophet_model <- prophet::prophet(log_co2_dataframe)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
log_future_dates <- prophet::make_future_dataframe(
  log_prophet_model,
  periods = 24,
  freq = "month"
)
logforecast <- predict(log_prophet_model, log_future_dates)
plot(log_prophet_model, logforecast)
The log transformation is applied to stabilise the variance of the series. When the size of the fluctuations grows with the level of the series, a model fitted to the raw values gives disproportionate weight to the larger, later observations.
By transforming the data, the model focuses on proportional changes rather than absolute differences. This can result in more balanced predictions and improve the model’s ability to capture the underlying structure, particularly when variability increases with the level of the series. The forecasts from this model are on the log scale, so they must be exponentiated before they can be compared with the original ppm values.
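A model fitted to log(y) returns forecasts of log(y), so a back-transformation is needed before reading them as ppm. The sketch below defines a hypothetical helper, back_transform_forecast(), that applies exp() to the yhat, yhat_lower and yhat_upper columns. The input is illustrative only: it is built from logs of three raw-scale rows printed earlier, standing in for a real log-scale forecast.

```r
# Hypothetical helper: move forecast columns from the log scale back to
# the original scale. Assumes columns yhat, yhat_lower and yhat_upper exist,
# as they do in the predict() output shown earlier in this report.
back_transform_forecast <- function(log_forecast) {
  cols <- c("yhat", "yhat_lower", "yhat_upper")
  stopifnot(all(cols %in% names(log_forecast)))
  log_forecast[cols] <- lapply(log_forecast[cols], exp)
  log_forecast
}

# Illustrative input only: logs of values printed earlier in this report.
toy_log_forecast <- data.frame(
  ds = as.Date(c("1959-01-01", "1959-02-01", "1959-03-01")),
  yhat = log(c(315.2850, 316.0415, 316.7556)),
  yhat_lower = log(c(314.7755, 315.5499, 316.3031)),
  yhat_upper = log(c(315.7767, 316.5139, 317.2487))
)

ppm_toy <- back_transform_forecast(toy_log_forecast)
ppm_toy

# With the objects from this report the call would be (not run here):
# ppm_forecast <- back_transform_forecast(logforecast)
```

Because exp() is increasing, the lower and upper bounds keep their order after the transformation, although the interval is no longer symmetric around yhat.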
A linear regression model is used to estimate the overall trend in the data as a simple benchmark. This approach provides a straightforward way to quantify the relationship between time and CO2 levels.
The results show a strong positive relationship: the fitted slope is about 0.109 ppm per month (roughly 1.3 ppm per year) and the model explains about 97% of the variance (R-squared = 0.9695). This confirms the upward movement observed in the time series. Although this model is less flexible than Prophet and ignores seasonality, it supports the findings and provides a useful comparison for understanding the overall direction of the data.
time_index_numeric <- seq_along(co2)  # month index 1, 2, ..., 468
trend_model <- lm(co2 ~ time_index_numeric)
summary(trend_model)
##
## Call:
## lm(formula = co2 ~ time_index_numeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.0399 -1.9476 -0.0017 1.9113 6.5149
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.115e+02 2.424e-01 1284.9 <2e-16 ***
## time_index_numeric 1.090e-01 8.958e-04 121.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.618 on 466 degrees of freedom
## Multiple R-squared: 0.9695, Adjusted R-squared: 0.9694
## F-statistic: 1.479e+04 on 1 and 466 DF, p-value: < 2.2e-16
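The slope in the output above is measured per month because the regressor is a month index. The sketch below refits the same straight line against the decimal year from time(co2), so that the slope can be read directly in ppm per year. It uses base R only, and the names trend_model_month, trend_model_year and decimal_year are illustrative.

```r
# Same benchmark as above: CO2 against a month index 1, 2, ..., 468.
month_index <- seq_along(co2)
trend_model_month <- lm(as.numeric(co2) ~ month_index)

# Alternative regressor: decimal year (1959.000, 1959.083, ...).
decimal_year <- as.numeric(time(co2))
trend_model_year <- lm(as.numeric(co2) ~ decimal_year)

slope_month <- unname(coef(trend_model_month)["month_index"])
slope_year  <- unname(coef(trend_model_year)["decimal_year"])

# The yearly slope is twelve times the monthly slope, since one month
# is 1/12 of a year on the decimal-year axis.
c(ppm_per_month = slope_month,
  ppm_per_year = slope_year,
  twelve_times_monthly = 12 * slope_month)
```

Both fits describe the same line, so R-squared and the residuals are unchanged; only the units of the slope and the meaning of the intercept differ.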
This project analysed the atmospheric CO2 dataset and applied Meta’s Prophet forecasting model to predict future values. The data exhibits a strong upward trend together with seasonal variation. Prophet successfully captures these components and produces forecasts for future CO2 levels.
Additional analysis using a log transformation and a linear regression helps to further understand the behaviour of the time series. Prophet provides a flexible approach to time series forecasting and can be applied to many other datasets with trend and seasonal structure.