The aim of this project is to explore a time series using Meta’s
Prophet forecasting system. We use the co2 time series, which has
468 monthly observations from 1959 to 1997. It is based on the
Mauna Loa atmospheric \(CO_2\)
concentration. Atmospheric concentrations of \(CO_2\) are expressed in parts per million
(ppm) and reported in the preliminary 1997 SIO manometric mole fraction
scale.
To build the input data frame we’ve used zoo::as.yearmon, which converts
the time index to zoo’s yearmon class for representing monthly data. The
data frame is in the format Prophet expects, with ds as the date column
and y as the value column.
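The data preparation described above can be sketched as follows. This is a minimal sketch rather than the exact code used: the name co2_df is an assumption, and wrapping the yearmon index in as.Date is my own choice so that ds is a plain date on the first of each month.

```r
library(zoo)

# co2 ships with base R (datasets package): monthly Mauna Loa CO2, 1959-1997.
# co2_df is a hypothetical name; as.Date() is an added step (an assumption)
# that turns each yearmon value into the first day of that month.
co2_df <- data.frame(
  ds = as.Date(zoo::as.yearmon(time(co2))),  # date column expected by Prophet
  y  = as.numeric(co2)                       # concentration in ppm
)

head(co2_df)
```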
We fit a base model using the prophet function. Note
that I’ve got two models: because the data are monthly, the prophet
function disables weekly and daily seasonality automatically, so to see
their effects too I enable them manually in a second model using
weekly.seasonality = TRUE and daily.seasonality = TRUE. To begin with
these models will look the same; it’s only later that they show
different effects, and I’ll get to that shortly.
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
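A sketch of the two model fits described above, assuming the input data frame is called co2_df (a hypothetical name) and that the second model is called model2 (also an assumption; only model appears in the original code). The last two lines list which seasonal components each model ended up with.

```r
library(prophet)
library(zoo)

# Input data frame (hypothetical name co2_df), rebuilt so the block runs on its own.
co2_df <- data.frame(ds = as.Date(zoo::as.yearmon(time(co2))), y = as.numeric(co2))

# Base model: with monthly data prophet switches weekly and daily seasonality off.
model <- prophet(co2_df)

# Second model (hypothetical name model2): weekly and daily seasonality forced on.
model2 <- prophet(co2_df, weekly.seasonality = TRUE, daily.seasonality = TRUE)

# Compare the seasonal components each fitted model contains.
names(model$seasonalities)
names(model2$seasonalities)
```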
The next part uses the function
make_future_dataframe, which takes the model object and a
number of periods to forecast and produces a suitable data frame at a
chosen frequency (e.g. quarterly or monthly). Here I’ve got two
variables, one with a much longer forecast horizon than the other,
so we can see what effect a longer forecast horizon has on the
plot and its accuracy. The horizon is set by
periods = ... and the frequency by
freq = "...".
future = make_future_dataframe(model, periods=10, freq="quarter")
future2 = make_future_dataframe(model, periods=80, freq="quarter")

Then we can use the generic predict function to get our
forecast, passing a model and a future data frame as arguments.
There are four forecasts here because of the two models and the two sets
of future values. Again, two of these will still look identical when
plotted, because so far only the choice of future data frame has a
visible effect, hence the two differences. It
becomes clearer further down where we plot the data.
Here we can simply use plot to create plots of the models,
passing the model and its forecast data frame as arguments. Although
there are four plots, as mentioned above only two of them are visually
different.
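The predict and plot steps above can be sketched as below. Only model, future, future2 and predictedmodel are names taken from this report; co2_df, model2 and the other forecast names are assumptions made for illustration.

```r
library(prophet)
library(zoo)

# Setup repeated so the block runs on its own (co2_df and model2 are hypothetical names).
co2_df <- data.frame(ds = as.Date(zoo::as.yearmon(time(co2))), y = as.numeric(co2))
model  <- prophet(co2_df)
model2 <- prophet(co2_df, weekly.seasonality = TRUE, daily.seasonality = TRUE)

future  <- make_future_dataframe(model, periods = 10, freq = "quarter")
future2 <- make_future_dataframe(model, periods = 80, freq = "quarter")

# Four forecasts: two models crossed with two horizons.
predictedmodel       <- predict(model,  future)
predictedmodel_long  <- predict(model,  future2)   # hypothetical name
predictedmodel2      <- predict(model2, future)    # hypothetical name
predictedmodel2_long <- predict(model2, future2)   # hypothetical name

# plot() on a prophet model takes the model and a forecast data frame.
p_short <- plot(model, predictedmodel)
p_long  <- plot(model, predictedmodel_long)
print(p_short)
print(p_long)
```

By default make_future_dataframe keeps the historical dates, so each forecast covers the 468 observed months plus the requested number of future quarters.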
For now let’s talk about the plots below. In the first plot, the
atmospheric concentration of \(CO_2\)
generally increases over time, but with a regular zig-zag rise and fall
within each year, suggesting that something
is causing it to change during the year. For example,
\(CO_2\) emissions differ over the course of the year, which
would affect the atmospheric concentration. The two models give
near-identical plots, as the code so far doesn’t really change anything
visible; the differences show up in later code, as will be seen.
What we can see is the effect of the different forecast horizons. The
plots with the longer forecast horizon show greater uncertainty: the
shaded area around the forecast is large, whereas the plot with fewer
periods to predict is mostly a line with very little shaded area.
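To put a number on the widening shaded area, one option is to compare the width of the uncertainty interval at the last forecast date for the two horizons. This is a minimal sketch under assumptions: final_width is a hypothetical helper, not part of prophet, and co2_df is an assumed name. The exact widths vary slightly between runs because prophet simulates the intervals.

```r
library(prophet)
library(zoo)

# Setup repeated so the block runs on its own (co2_df is a hypothetical name).
co2_df <- data.frame(ds = as.Date(zoo::as.yearmon(time(co2))), y = as.numeric(co2))
model  <- prophet(co2_df)

# Hypothetical helper: interval width (yhat_upper - yhat_lower) at the final
# forecast date, for a horizon of n_quarters quarters.
final_width <- function(n_quarters) {
  future <- make_future_dataframe(model, periods = n_quarters, freq = "quarter")
  fc     <- predict(model, future)
  last   <- fc[nrow(fc), ]
  last$yhat_upper - last$yhat_lower
}

width_short <- final_width(10)
width_long  <- final_width(80)
c(short = width_short, long = width_long)
```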
As a check we can call head on each of our variables to
see that the data frames contain the columns we want.
## $growth
## [1] "linear"
##
## $changepoints
## [1] "1960-04-01 GMT" "1961-07-01 GMT" "1962-10-01 GMT" "1964-01-01 GMT"
## [5] "1965-04-01 GMT" "1966-07-01 GMT" "1967-09-01 GMT" "1968-12-01 GMT"
## [9] "1970-03-01 GMT" "1971-06-01 GMT" "1972-09-01 GMT" "1973-12-01 GMT"
## [13] "1975-03-01 GMT" "1976-06-01 GMT" "1977-09-01 GMT" "1978-12-01 GMT"
## [17] "1980-03-01 GMT" "1981-06-01 GMT" "1982-08-01 GMT" "1983-11-01 GMT"
## [21] "1985-02-01 GMT" "1986-05-01 GMT" "1987-08-01 GMT" "1988-11-01 GMT"
## [25] "1990-02-01 GMT"
##
## $n.changepoints
## [1] 25
##
## $changepoint.range
## [1] 0.8
##
## $yearly.seasonality
## [1] "auto"
##
## $weekly.seasonality
## [1] "auto"
## ds
## 1 1959-01-01
## 2 1959-02-01
## 3 1959-03-01
## 4 1959-04-01
## 5 1959-05-01
## 6 1959-06-01
## ds trend additive_terms additive_terms_lower additive_terms_upper
## 1 1959-01-01 315.3537 -0.07753751 -0.07753751 -0.07753751
## 2 1959-02-01 315.4387 0.59487739 0.59487739 0.59487739
## 3 1959-03-01 315.5155 1.23263039 1.23263039 1.23263039
## 4 1959-04-01 315.6005 2.46038945 2.46038945 2.46038945
## 5 1959-05-01 315.6828 3.02022342 3.02022342 3.02022342
## 6 1959-06-01 315.7678 2.35150786 2.35150786 2.35150786
## yearly yearly_lower yearly_upper multiplicative_terms
## 1 -0.07753751 -0.07753751 -0.07753751 0
## 2 0.59487739 0.59487739 0.59487739 0
## 3 1.23263039 1.23263039 1.23263039 0
## 4 2.46038945 2.46038945 2.46038945 0
## 5 3.02022342 3.02022342 3.02022342 0
## 6 2.35150786 2.35150786 2.35150786 0
## multiplicative_terms_lower multiplicative_terms_upper yhat_lower yhat_upper
## 1 0 0 314.7905 315.7814
## 2 0 0 315.5674 316.4958
## 3 0 0 316.2761 317.2251
## 4 0 0 317.6072 318.5529
## 5 0 0 318.2456 319.2117
## 6 0 0 317.6307 318.5535
## trend_lower trend_upper yhat
## 1 315.3537 315.3537 315.2762
## 2 315.4387 315.4387 316.0336
## 3 315.5155 315.5155 316.7481
## 4 315.6005 315.6005 318.0609
## 5 315.6828 315.6828 318.7030
## 6 315.7678 315.7678 318.1193
Now here is where the changes we mentioned before occur, and the
bulk of our analysis. We use the function
prophet_plot_components, which takes our
model and predictedmodel as arguments. This
function breaks the forecast down into its components: for the first
model these are the trend and yearly seasonality, and with different
parameters it can also break it down into monthly, weekly or other
seasonality.
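A sketch of the component plots for both models. As before, co2_df, model2 and predictedmodel2 are assumed names; model and predictedmodel are the names used in this report.

```r
library(prophet)
library(zoo)

# Setup repeated so the block runs on its own (co2_df, model2, predictedmodel2
# are hypothetical names).
co2_df <- data.frame(ds = as.Date(zoo::as.yearmon(time(co2))), y = as.numeric(co2))
model  <- prophet(co2_df)
model2 <- prophet(co2_df, weekly.seasonality = TRUE, daily.seasonality = TRUE)

future          <- make_future_dataframe(model, periods = 10, freq = "quarter")
predictedmodel  <- predict(model,  future)
predictedmodel2 <- predict(model2, future)

# Base model: trend and yearly seasonality panels.
panels <- prophet_plot_components(model, predictedmodel)

# Second model: additionally shows the weekly and daily panels.
panels2 <- prophet_plot_components(model2, predictedmodel2)
```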
This leads on from what was discussed before. The trend component
is a steadily increasing line over time, supporting the general
increase in atmospheric \(CO_2\)
concentrations. The yearly seasonality plot changes a lot during the
year: it is higher earlier in the year (peaking around May), lower
later in the year, and rises again as it gets
closer to January. This could be explained by the seasons. Seasonal
differences in \(CO_2\) emissions may play a part, but the effect of
plants is likely the main driver: \(CO_2\) accumulates over the
Northern Hemisphere winter, when plants absorb little, which gives the
high values in spring, and is then drawn down as plant growth absorbs
\(CO_2\) through the summer. A similar
idea is reflected in the daily plot, where levels are high in the early
morning and late at night (between 22:00 and 02:00) but
lower during the day, which could relate to plant respiration and human
activity patterns. Note that the data are monthly, so the daily
component is not directly observed and this reading is tentative.
What can we see here?
There is variation in \(CO_2\) levels
across the days of the week, with noticeable dips on Tuesdays and
higher levels on Thursdays and Sundays, possibly linked to human
activity such as industrial emissions. Atmospheric \(CO_2\) concentrations are driven by larger
climate forces, but the weekly changes may be influenced by
localised emissions from industry, transport and other human
activities. Since the data contain only one observation per month,
however, the weekly component is estimated from very little information
and may be an artefact of the fit rather than a real weekly cycle.
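As a rough check on how much weekly information monthly data can carry, the sketch below rebuilds the 468 time stamps and looks at where they fall. It assumes, as the head output earlier suggests, that every observation is dated on the first of its month; ds here is a stand-alone vector created for illustration.

```r
# Rebuild the 468 monthly time stamps (first of each month, 1959-1997).
ds <- seq(as.Date("1959-01-01"), by = "month", length.out = 468)

# Every observation falls on day 1 of its month...
unique(format(ds, "%d"))

# ...so each weekday is represented only by the months whose 1st lands on it.
# "%u" is the ISO weekday number: 1 = Monday, ..., 7 = Sunday.
table(format(ds, "%u"))
```

With one point per month, the weekly component is fitted from whichever weekday the first of the month happens to be, which is why it deserves a cautious reading.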
I’ve also created an interactive plot, using our first model and
predictedmodel as arguments, which can be explored
below.
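One way to produce such an interactive plot is prophet’s dyplot.prophet function, which builds a dygraphs widget from a model and its forecast. The report does not name the function it used, so treating it as dyplot.prophet is an assumption, as is the name co2_df; the dygraphs package must be installed.

```r
library(prophet)
library(zoo)

# Setup repeated so the block runs on its own (co2_df is a hypothetical name).
co2_df <- data.frame(ds = as.Date(zoo::as.yearmon(time(co2))), y = as.numeric(co2))
model  <- prophet(co2_df)
future <- make_future_dataframe(model, periods = 10, freq = "quarter")
predictedmodel <- predict(model, future)

# Assumption: the interactive plot was made with dyplot.prophet().
interactive_plot <- dyplot.prophet(model, predictedmodel)
interactive_plot
```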
There is a continuous long-term increase in atmospheric \(CO_2\) concentrations, in line with anthropogenic \(CO_2\) emissions.
The models show strong seasonal patterns for the yearly, weekly and daily cycles, with the weekly and daily variations attributed to smaller but still influential human activities and natural processes.
Forecast uncertainty increases the further into the future you predict.
What can we take away from this? The trend and seasonality analysis shows that \(CO_2\) accumulation is predictable, and the findings suggest that intervention is required, as rising atmospheric \(CO_2\) concentrations will only lead to environmental catastrophe.