This project demonstrates time series forecasting in R using Meta’s
Prophet forecasting system. Prophet is designed to handle
seasonality, trends, and holiday effects in time series data. In this
example, I use the built-in co2 dataset, which contains
monthly measurements of atmospheric CO₂ levels recorded at the Mauna Loa
Observatory.
Running ?co2 shows that co2 is a dataset of monthly observations of
atmospheric CO2 concentration from 1959 to 1997. The data are stored
as a time series (TS).
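The same properties can be checked from the console. The short sketch below uses only base R functions on the built-in co2 object; it is an illustration, not necessarily the inspection done in the original analysis.

```r
# Inspect the built-in co2 time series (datasets package, attached by default)
class(co2)      # "ts"
start(co2)      # 1959 1  (January 1959)
end(co2)        # 1997 12 (December 1997)
frequency(co2)  # 12 observations per year, i.e. monthly data
length(co2)     # 468 monthly values
```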
plot(co2, main = "Monthly CO2 Atmospheric Conc. (1959-1997)", ylab = "CO2 Conc. (ppm)", xlab = "Year")
The basic plot above shows an increasing trend: CO2 concentration rises over time. We now move on to Prophet’s forecast.
First, we need to prepare our data. Prophet requires a data frame
with a column for dates (ds) and a column for the data
(y).
## ds y
## 1 Jan 1959 315.42
## 2 Feb 1959 316.31
## 3 Mar 1959 316.50
## 4 Apr 1959 317.56
## 5 May 1959 318.13
## 6 Jun 1959 318.00
The data preparation works as follows. co2 is a built-in R
dataset containing monthly atmospheric CO2 concentrations. We first
extract the time index from the TS, then convert the extracted numbers
into a year-month format for Prophet. Finally, we create a data frame with
two columns: ds and y.
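The steps just described can be sketched as follows. This is a minimal reconstruction, assuming the year-month conversion is done with as.yearmon() from the zoo package (the printed "Jan 1959" labels suggest this, but the original code is not shown here).

```r
library(zoo)  # assumption: as.yearmon() supplies the year-month format

# time(co2) gives the time index as decimal years (1959.000, 1959.083, ...)
co2.df <- data.frame(
  ds = as.yearmon(as.numeric(time(co2))),  # year-month labels such as "Jan 1959"
  y  = as.numeric(co2)                     # observed CO2 concentration in ppm
)

head(co2.df)  # first six rows, for a quick sanity check
```

Prophet only looks at the columns named ds and y, so any extra columns in the data frame would be ignored unless they are added to the model as regressors.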
Now, we fit a Prophet model to our data, create future time points, and forecast the future values.
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
The variable m holds the Prophet model, initialised and fitted
using the co2.df data frame. The variable f
extends the data frame by 8 further periods at quarterly frequency.
We then generate predictions and plot them.
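A minimal sketch of these steps is given below, keeping the names m and f from the text; the name p for the predictions is an assumption. The dates are converted to Date (first day of each month) here as a cautious choice, since that is a format Prophet is documented to accept; the original analysis may have passed the year-month values directly.

```r
library(prophet)
library(zoo)

# Rebuild the input so the example is self-contained
co2.df <- data.frame(
  ds = as.Date(as.yearmon(as.numeric(time(co2)))),  # first day of each month
  y  = as.numeric(co2)
)

m <- prophet(co2.df)                                          # initialise and fit the model
f <- make_future_dataframe(m, periods = 8, freq = "quarter")  # history plus 8 future quarters
p <- predict(m, f)                                            # forecast with uncertainty bounds

plot(m, p)                                                    # observed points and forecast
tail(p[, c("ds", "yhat", "yhat_lower", "yhat_upper")], 8)     # the 8 forecast rows
```

Because make_future_dataframe() keeps the historical dates by default, p contains fitted values for the observed period as well as the 8 new quarters.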
Analysis of the Prophet Forecast: The forecast indicates a continued rise in CO2 levels over the next 8 quarters.
We now run a linear regression to understand the growth of the series.
##
## Call:
## lm(formula = y ~ ds, data = co2.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.0399 -1.9476 -0.0017 1.9113 6.5149
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.250e+03 2.127e+01 -105.8 <2e-16 ***
## ds 1.308e+00 1.075e-02 121.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.618 on 466 degrees of freedom
## Multiple R-squared: 0.9695, Adjusted R-squared: 0.9694
## F-statistic: 1.479e+04 on 1 and 466 DF, p-value: < 2.2e-16
Interpreting the linear regression results: atmospheric CO2 levels have been increasing steadily over time, at an estimated rate of about 1.308 ppm per year, which confirms the upward trend seen in the plot. Since the p-values are below 0.001, the coefficients are statistically significant, and the relationship between CO2 and time is unlikely to be due to chance. The model explains around 97% of the variation in CO2 levels, so a straight line captures most of the long-term behaviour of the series. The remaining variation includes the seasonal pattern, which a linear trend cannot represent.
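The regression can be sketched as below. The data frame name co2.num is hypothetical, and the time index is stored as decimal years (an assumption made so that the slope reads directly as ppm per year).

```r
# Decimal-year time index, so the slope is expressed in ppm per year
co2.num <- data.frame(
  ds = as.numeric(time(co2)),
  y  = as.numeric(co2)
)

model <- lm(y ~ ds, data = co2.num)  # straight-line trend in time
summary(model)                       # coefficients, residuals, R-squared

coef(model)[["ds"]]                  # estimated growth per year
summary(model)$r.squared             # share of variance explained by the trend
```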
In this section, we explore the trend, seasonality, and other components of the forecast.
We can decompose the TS into trend, seasonal, and residual components.
In the decomposition above, we first have the observed data. We then have
the trend, which shows increasing levels of CO2. The seasonal component shows
the regular fluctuation of CO2 levels within each year. Finally, the
residual noise is what is left of our TS after we have removed the
trend and seasonal patterns.
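A decomposition of this kind can be produced with base R. The sketch below assumes the classical additive decomposition from decompose(); the original analysis may have used a different function, such as stl().

```r
# Classical decomposition into trend, seasonal and random (residual) parts
dec <- decompose(co2)  # additive model by default

plot(dec)              # panels: observed, trend, seasonal, random

head(dec$seasonal, 12) # the seasonal effect for each month of the first year
range(dec$trend, na.rm = TRUE)  # trend is NA at both ends (moving-average edges)
```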
A TS is called heteroscedastic if its variance changes through time. When we plotted the original TS in part 1, we observed that the spread of the data did not obviously grow as CO2 levels increased.
To check this observation, we run a Breusch-Pagan test for heteroscedasticity:
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 3.6238, df = 1, p-value = 0.05696
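Since the p-value (0.057) is greater than 0.05, we do not reject the null hypothesis of constant variance at the 5% level, which suggests that heteroscedasticity is not present. The result is borderline, however, so it should be read with some caution.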
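The test can be sketched with bptest() from the lmtest package, applied to a linear model of CO2 on time. The data frame name co2.num and the decimal-year regressor are assumptions made to keep the example self-contained.

```r
library(lmtest)

# Linear trend model whose residual variance is being tested
co2.num <- data.frame(
  ds = as.numeric(time(co2)),
  y  = as.numeric(co2)
)
model <- lm(y ~ ds, data = co2.num)

bp <- bptest(model)  # studentized Breusch-Pagan test (the default)
bp

# Null hypothesis: constant residual variance (homoscedasticity)
bp$p.value > 0.05    # TRUE means we do not reject the null at the 5% level
```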
Had it been present, we could address heteroscedasticity by transforming the data: instead of considering \(x_t\), we look at \(x_t^{\text{new}} = f(x_t)\) for some suitable function \(f\). One common choice is the Box-Cox transformation.
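For reference, the Box-Cox family is usually written with a parameter \(\lambda\) (standard notation, not a symbol used elsewhere in this report), for positive \(x_t\):

\[
f_\lambda(x_t) =
\begin{cases}
\dfrac{x_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[6pt]
\log x_t, & \lambda = 0.
\end{cases}
\]

As \(\lambda \to 0\), the first branch tends to \(\log x_t\), which is why an estimated \(\lambda\) close to 0 is often replaced by a plain logarithm.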
## [1] -0.03432107
From the above plot, we can see that the variance doesn’t change.
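The estimated Box-Cox parameter printed above (about -0.034) is close to 0, the value that corresponds to a log transformation. Motivated by parsimony, we can therefore use a simple logarithm to stabilise the variance.
After the log transformation, the variance of the series is stable.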
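A sketch of this step is shown below. It assumes the Box-Cox parameter is estimated with BoxCox.lambda() from the forecast package using its default method; the original analysis may have used different settings, so the value it returns is not guaranteed to match the one printed above. The name co2.log is hypothetical.

```r
library(forecast)  # assumption: source of BoxCox.lambda()

# Estimate the Box-Cox parameter; a value near 0 points to a log transformation
lambda <- BoxCox.lambda(co2)
lambda

# Simple logarithm as the parsimonious alternative to a fitted lambda
co2.log <- log(co2)
plot(co2.log, main = "Log of Monthly CO2 Conc.", ylab = "log(CO2 Conc.)", xlab = "Year")
```

Taking logs of a ts object keeps its time index, so co2.log can be decomposed or modelled in the same way as the original series.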
In this project, we used Meta’s Prophet forecasting system to analyse and forecast CO2 concentration levels.
To begin, our analysis shows a long-term upward trend in CO2 concentration, which is consistent with global warming. Human activities, in particular the burning of fossil fuels, deforestation, and livestock farming, influence the climate and contribute to rising CO2 levels.
The seasonal decomposition showed CO2 levels fluctuating regularly throughout the time frame, which suggests that a natural, seasonal element is at play.
The project reinforces growing concerns about global warming and rising CO2 levels. To improve the project, we could analyse other data sets that may affect CO2 levels. For instance, we could analyse and forecast deforestation, the number of petrol and diesel cars produced and sold, or the number of cattle farms. These factors could help us better understand the reasons behind rising CO2 levels.