In this project, we look at the atmospheric CO2 (carbon dioxide) concentration recorded at Mauna Loa. The aim is to fit a forecasting model using the Prophet package and then analyze trends, seasonality, and overall growth in the data.
The dataset is a monthly time series of 468 observations, running from 1959 to 1997.
Note that the values for February, March and April of 1964 were missing and have been obtained by interpolating linearly between the values for January and May of 1964.
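As a minimal sketch of what that linear interpolation means, the snippet below recomputes evenly spaced values between the January and May 1964 readings with base R's `approx()` and prints them next to the stored values. It assumes the built-in `co2` series from the `datasets` package; the variable names are illustrative only.

```r
# Extract January to May 1964 from the built-in co2 series
jan_to_may <- window(co2, start = c(1964, 1), end = c(1964, 5))

# Linearly interpolate months 2-4 from the two endpoints (months 1 and 5)
interp <- approx(x = c(1, 5),
                 y = as.numeric(jan_to_may)[c(1, 5)],
                 xout = 1:5)$y

# Show stored and recomputed values side by side
print(data.frame(month        = month.abb[1:5],
                 stored       = as.numeric(jan_to_may),
                 interpolated = round(interp, 2)))
```

Linear interpolation adds the same increment each month, so the three recomputed values sit on a straight line between the two observed endpoints.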
## Warning: package 'prophet' was built under R version 4.3.3
## Loading required package: Rcpp
## Loading required package: rlang
# Convert the CO2 time series to a data frame
co2.df = data.frame(
ds=zoo::as.yearmon(time(co2)), # Convert time to friendlier 'year-month' format
y=co2)
# Fit Prophet model to CO2 data
m = prophet::prophet(co2.df)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
# Create a data frame for forecasting future dates (8 quarters)
f = prophet::make_future_dataframe(m, periods=8, freq="quarter")
# Finally generate the forecast using the model
p = predict(m, f)
plot(m,p)
Note that Prophet disabled daily and weekly seasonality automatically, as the messages above show. This is appropriate because our data is recorded monthly, a much coarser resolution, so fitting daily or weekly components would risk overfitting the model and finding patterns that simply don’t exist in the data.
The atmospheric CO2 content in ppm (parts per million) is rising over time, with clear, similar seasonal patterns year over year. The increase appears to be roughly linear in time, so the plot shows no obvious evidence of exponential growth in the atmospheric CO2 content.
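One hedged way to probe the visual impression of linearity is to compare a straight-line fit against one that allows curvature. The sketch below uses only base R and the built-in `co2` series; the quadratic model and the names `fit_linear` and `fit_quadratic` are illustrative choices, not part of the Prophet analysis.

```r
# Time in decimal years and CO2 in ppm, from the built-in co2 series
d <- data.frame(t = as.numeric(time(co2)), y = as.numeric(co2))
d$tc <- d$t - mean(d$t)  # centre time so the quadratic term is well conditioned

fit_linear    <- lm(y ~ tc, data = d)
fit_quadratic <- lm(y ~ tc + I(tc^2), data = d)

# F-test: does the quadratic term improve on the straight line?
print(anova(fit_linear, fit_quadratic))
```

A significant quadratic term would suggest that the growth rate itself changes over time, although that alone would not establish exponential growth.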
Looking at the forecast for the 8 quarters beyond the observed data, we can see the same pattern continuing. The forecast suggests the model has captured the underlying trend in the data and its seasonal fluctuations. The forecast is only as reliable as the underlying dataset, so this should be taken into account whenever the model is used.
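To look at the trend and seasonal pieces separately, a sketch along the following lines could be used. It assumes the `prophet` package is installed and, unlike the code above, builds the `ds` column from plain `Date` values in base R rather than `zoo::as.yearmon`; the data frame name `df` is illustrative.

```r
library(prophet)

# Rebuild the input with plain Date stamps (first day of each month)
df <- data.frame(
  ds = seq(as.Date("1959-01-01"), by = "month", length.out = length(co2)),
  y  = as.numeric(co2)
)

m <- prophet(df)
f <- make_future_dataframe(m, periods = 8, freq = "quarter")
p <- predict(m, f)

# Point forecast with its uncertainty interval for the 8 future dates
print(tail(p[, c("ds", "yhat", "yhat_lower", "yhat_upper")], 8))

# Separate panels for the trend and the yearly seasonal component
prophet_plot_components(m, p)
```

The `yhat_lower` and `yhat_upper` columns give a sense of how much uncertainty the model attaches to each forecast point.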
# Perform a linear regression of CO2 values on time
model_lm <- lm(y ~ as.numeric(ds), data = co2.df)
summary(model_lm)
##
## Call:
## lm(formula = y ~ as.numeric(ds), data = co2.df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.0399 -1.9476 -0.0017  1.9113  6.5149
##
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)    -2.250e+03  2.127e+01  -105.8   <2e-16 ***
## as.numeric(ds)  1.308e+00  1.075e-02   121.6   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.618 on 466 degrees of freedom
## Multiple R-squared: 0.9695, Adjusted R-squared: 0.9694
## F-statistic: 1.479e+04 on 1 and 466 DF, p-value: < 2.2e-16
We performed a linear regression of CO2 values on time to quantify the trend.
Because time is measured in years, the slope estimate of 1.308 means the CO2 content rose by about 1.3 ppm per year on average over the period.
Both the intercept and the slope have extremely small p-values, so the relationship between time and CO2 levels is statistically significant, and it’s unlikely that the trend captured in our model was due to chance.
The Multiple R-squared value of ~0.97 indicates that the model explains 97% of the variability in CO2 levels.
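To make the meaning of R-squared concrete, the sketch below recomputes it as one minus the ratio of residual to total variation. It uses base R and the built-in `co2` series, with decimal years from `time(co2)` standing in for the `as.numeric(ds)` predictor; the names `rss`, `tss` and `r2_manual` are illustrative.

```r
d   <- data.frame(t = as.numeric(time(co2)), y = as.numeric(co2))
fit <- lm(y ~ t, data = d)

rss <- sum(residuals(fit)^2)     # variation left unexplained by the line
tss <- sum((d$y - mean(d$y))^2)  # total variation around the mean
r2_manual <- 1 - rss / tss

# The manual value should agree with the one reported by summary()
print(c(manual = r2_manual, summary = summary(fit)$r.squared))
```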
This simple linear regression was performed in addition to the time series analysis. The linear regression does not capture the seasonal variation in the CO2 data set, whereas the time series model shows it clearly. Both models confirm an upward trend in the atmospheric CO2 content over time.
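As a rough illustration of what the straight line misses, the sketch below adds a month-of-year factor to the regression and compares residual standard errors. This is a hypothetical extension in base R, not part of the analysis above, and the names `fit_trend` and `fit_seasonal` are illustrative.

```r
d <- data.frame(
  t     = as.numeric(time(co2)),
  y     = as.numeric(co2),
  month = factor(as.integer(cycle(co2)), levels = 1:12, labels = month.abb)
)

fit_trend    <- lm(y ~ t, data = d)          # trend only
fit_seasonal <- lm(y ~ t + month, data = d)  # trend plus monthly offsets

# Residual standard error (ppm) with and without the monthly term
print(c(trend_only = sigma(fit_trend), with_month = sigma(fit_seasonal)))
```

The monthly coefficients play a role loosely similar to Prophet's yearly seasonality component, although Prophet models seasonality with a smooth Fourier series rather than one offset per month.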
Some things that could be explored in future projects are:
Using more frequent measurements (for example weekly or daily) to analyse the seasonal fluctuations in more detail
Using alternative forecasting methods, such as ARIMA or ML techniques, to see how their predictions differ
Comparing the CO2 data from multiple locations, to see if there is a geographical factor in the rising CO2 content