Carbon Dioxide Concentration

Carbon dioxide, or CO2 for short, is one of three main greenhouse gases on Earth, along with methane and water vapor. Monitoring the concentration of these gases in the atmosphere is an extremely important matter in the 21st century because of the influence they have on temperature and weather patterns that affect water and food supply.

The Mauna Loa Observatory

Mauna Loa Observatory on the Big Island in Hawaii

The Mauna Loa Observatory is an atmospheric baseline station in Hawaii. Mauna Loa has the oldest continuous record of atmospheric carbon dioxide measurements.

NOAA Earth System Research Laboratory

Atmospheric carbon dioxide concentration data are monitored and published by the National Oceanic and Atmospheric Administration’s Earth System Research Laboratory.

Weekly aggregates of CO2 measurements at the Mauna Loa site, from 1974 to the present time, are released in text format via FTP: ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_weekly_mlo.txt

...
#      Start of week      CO2 molfrac           (-999.99 = no data)  increase
# (yr, mon, day, decimal)    (ppm)  #days       1 yr ago  10 yr ago  since 1800
  1974   5  19  1974.3795    333.29  7          -999.99   -999.99     50.31
  1974   5  26  1974.3986    332.94  6          -999.99   -999.99     50.06
  1974   6   2  1974.4178    332.16  5          -999.99   -999.99     49.43
...

Setting up R and RStudio

With the recent rise of open source (as in free!) data analysis software, such as R and RStudio, it’s now easy for anyone with access to a computer to examine environmental data for themselves and come to their own conclusions.

  1. Download and install the R interpreter from https://cran.r-project.org
  2. Download and install RStudio from https://www.rstudio.com/products/rstudio/download/

Importing the Data From the NOAA FTP Site into R

Let’s use read.table to load the data from ESRL into a data frame in your R session, and use head to look at the first few rows.

mauna_loa_weekly <- read.table('ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_weekly_mlo.txt')
head(mauna_loa_weekly)
##     V1 V2 V3       V4     V5 V6      V7      V8    V9
## 1 1974  5 19 1974.380 333.29  7 -999.99 -999.99 50.31
## 2 1974  5 26 1974.399 332.94  6 -999.99 -999.99 50.06
## 3 1974  6  2 1974.418 332.16  5 -999.99 -999.99 49.43
## 4 1974  6  9 1974.437 332.16  7 -999.99 -999.99 49.63
## 5 1974  6 16 1974.456 332.27  7 -999.99 -999.99 49.99
## 6 1974  6 23 1974.475 331.71  7 -999.99 -999.99 49.73
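
The header comments in the file do not appear in the data frame because read.table treats # as a comment character by default. Here is a minimal sketch of that behaviour, using three rows copied from the file excerpt above instead of the FTP download. The names sample_lines and sample_weekly are only for illustration.

```r
# A few lines copied from the weekly file excerpt, including one comment line.
sample_lines <- c(
  "# (yr, mon, day, decimal)    (ppm)  #days       1 yr ago  10 yr ago  since 1800",
  "  1974   5  19  1974.3795    333.29  7          -999.99   -999.99     50.31",
  "  1974   5  26  1974.3986    332.94  6          -999.99   -999.99     50.06",
  "  1974   6   2  1974.4178    332.16  5          -999.99   -999.99     49.43"
)

# read.table() ignores everything after '#' (comment.char = "#" is the default),
# so only the three data rows are parsed.
sample_weekly <- read.table(text = sample_lines)

dim(sample_weekly)    # number of rows and columns
names(sample_weekly)  # default column names V1 ... V9
```

Because no header row is read, R falls back to the generic names V1 through V9 seen in the output above.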

Preparing the Data

We can filter out the decimal years and historical comparisons from the table and keep the year, month, day and carbon concentration observed. We only need the first, second, third, and fifth columns from the original table.

mauna_loa_weekly <- mauna_loa_weekly[, c(1, 2, 3, 5)]
head(mauna_loa_weekly)
##     V1 V2 V3     V5
## 1 1974  5 19 333.29
## 2 1974  5 26 332.94
## 3 1974  6  2 332.16
## 4 1974  6  9 332.16
## 5 1974  6 16 332.27
## 6 1974  6 23 331.71

Now let’s put names on the columns.

names(mauna_loa_weekly) <- c('year', 'month', 'day', 'co2ppm')
head(mauna_loa_weekly)
##   year month day co2ppm
## 1 1974     5  19 333.29
## 2 1974     5  26 332.94
## 3 1974     6   2 332.16
## 4 1974     6   9 332.16
## 5 1974     6  16 332.27
## 6 1974     6  23 331.71

In order to analyze this data over time, we should convert the year, month, and day columns into a data type that R understands as dates.

mauna_loa_weekly$date <- as.Date(paste(mauna_loa_weekly$year, mauna_loa_weekly$month, mauna_loa_weekly$day, sep = '-'), format = '%Y-%m-%d')
mauna_loa_weekly <- mauna_loa_weekly[, c('date', 'co2ppm')]
head(mauna_loa_weekly)
##         date co2ppm
## 1 1974-05-19 333.29
## 2 1974-05-26 332.94
## 3 1974-06-02 332.16
## 4 1974-06-09 332.16
## 5 1974-06-16 332.27
## 6 1974-06-23 331.71
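
Once the column is a real Date, date arithmetic works directly. As a quick sanity check, the sketch below rebuilds the first three dates from the output above and confirms they are one week apart. The variable names are illustrative.

```r
# Build dates the same way as above, from separate year, month, and day values.
first_dates <- as.Date(
  paste(c(1974, 1974, 1974), c(5, 5, 6), c(19, 26, 2), sep = "-"),
  format = "%Y-%m-%d"
)

# diff() on Date values returns the gap between consecutive observations.
gap_days <- as.numeric(diff(first_dates), units = "days")
gap_days
```

On the full table, table(diff(mauna_loa_weekly$date)) would give a quick count of any gaps that are not exactly seven days.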

Let’s take a look at a summary of the data.

summary(mauna_loa_weekly)
##       date                co2ppm       
##  Min.   :1974-05-19   Min.   :-1000.0  
##  1st Qu.:1985-01-09   1st Qu.:  345.2  
##  Median :1995-09-03   Median :  360.8  
##  Mean   :1995-09-03   Mean   :  353.2  
##  3rd Qu.:2006-04-26   3rd Qu.:  381.4  
##  Max.   :2016-12-18   Max.   :  408.7

Something looks wrong with the carbon dioxide concentration data. Why is the minimum -1000?

Notice the original file says (-999.99 = no data). The value -999.99 is a placeholder meaning that no measurement is available, and summary has rounded it to -1000.0 for display. We don’t want that value in our calculations, so in R, we use a special value called NA instead.

mauna_loa_weekly$co2ppm[mauna_loa_weekly$co2ppm == -999.99] <- NA
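
An optional variation is to tell read.table about the placeholder at import time through its na.strings argument. The sketch below uses rows copied from the file excerpt rather than the full download, and the object names are illustrative.

```r
# Three data rows copied from the weekly file excerpt.
sample_lines <- c(
  "  1974   5  19  1974.3795    333.29  7          -999.99   -999.99     50.31",
  "  1974   5  26  1974.3986    332.94  6          -999.99   -999.99     50.06",
  "  1974   6   2  1974.4178    332.16  5          -999.99   -999.99     49.43"
)

# Any field that reads exactly "-999.99" becomes NA while the file is parsed.
sample_weekly <- read.table(text = sample_lines, na.strings = "-999.99")

is.na(sample_weekly$V7)  # the "1 yr ago" column has no data in 1974
sample_weekly$V5         # the measured concentrations are untouched
```

This handles every column at once, which matters if the historical comparison columns are kept.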

Examining the Data

Let’s summarize again.

summary(mauna_loa_weekly)
##       date                co2ppm     
##  Min.   :1974-05-19   Min.   :326.9  
##  1st Qu.:1985-01-09   1st Qu.:345.6  
##  Median :1995-09-03   Median :361.0  
##  Mean   :1995-09-03   Mean   :363.6  
##  3rd Qu.:2006-04-26   3rd Qu.:381.5  
##  Max.   :2016-12-18   Max.   :408.7  
##                       NA's   :17

Now the values make sense, and the missing values are counted as NA. The lowest observed CO2 concentration in the Mauna Loa weekly record is 326.9 ppm, and the highest is 408.7 ppm. The unit ppm stands for parts per million. It’s a convenient way of expressing very small ratios: 1 ppm is equivalent to 0.0001%.
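
To make the unit concrete, here is a small sketch that converts ppm to percent. The helper name ppm_to_percent is made up for this example.

```r
# Parts per million to percent: divide by one million, then multiply by 100.
ppm_to_percent <- function(ppm) {
  ppm / 1e6 * 100
}

ppm_to_percent(1)                # a single part per million, as a percentage
ppm_to_percent(c(326.9, 408.7))  # the lowest and highest weekly values above
```

Even the highest weekly value in the record works out to only about four hundredths of one percent of the air sampled.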

Now we can plot carbon dioxide concentration over time using the date column as the x-axis to look at the data visually. The plot function in R is a fast way to examine the relationship between variables.

plot(
  mauna_loa_weekly$date,
  mauna_loa_weekly$co2ppm,
  type = 'l',
  xlab = 'Date',
  ylab = 'CO2 Concentration PPM',
  main = 'Mauna Loa Weekly Carbon Dioxide Concentration'
)

Quantifying the Trend

There’s very clearly a linear trend from the 326 ppm minimum in 1974 to the 408 ppm maximum in 2016. Let’s quantify that trend with a linear regression using lm, with CO2 concentration as the dependent variable and date as the independent variable.

trend <- lm(mauna_loa_weekly$co2ppm ~ mauna_loa_weekly$date)
summary(trend)
## 
## Call:
## lm(formula = mauna_loa_weekly$co2ppm ~ mauna_loa_weekly$date)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.3620 -2.0666  0.0539  2.0808  9.7247 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           3.193e+02  1.400e-01  2281.0   <2e-16 ***
## mauna_loa_weekly$date 4.713e-03  1.344e-05   350.7   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.829 on 2204 degrees of freedom
##   (17 observations deleted due to missingness)
## Multiple R-squared:  0.9824, Adjusted R-squared:  0.9824 
## F-statistic: 1.23e+05 on 1 and 2204 DF,  p-value: < 2.2e-16

Notice the p-value <2e-16 *** for date. This means that, if CO2 concentration had no linear relationship with date, the chance of seeing a slope this far from zero would be essentially zero. Also notice the R-squared value of 0.9824. This means that a straight line in time explains about 98% of the variance in the CO2 concentration. Now that we’ve demonstrated this linear relationship mathematically, let’s show it on the plot using abline.

plot(
  mauna_loa_weekly$date,
  mauna_loa_weekly$co2ppm,
  type = 'l',
  xlab = 'Date',
  ylab = 'CO2 Concentration PPM',
  main = 'Mauna Loa Weekly Carbon Dioxide Concentration'
)
abline(trend, col = 'dark blue')
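
The slope reported by summary(trend) is in ppm per day, because R stores a Date as the number of days since 1970-01-01. The sketch below retypes the two coefficients from the printed output, so it runs without the data, and converts the slope to a yearly rate. Using 365.25 days per year is an approximation chosen for this example.

```r
# Coefficients copied from the summary(trend) output.
intercept     <- 3.193e+02  # fitted ppm at day 0, which is 1970-01-01
slope_per_day <- 4.713e-03  # ppm per day

# Approximate yearly rate of increase.
slope_per_year <- slope_per_day * 365.25
slope_per_year  # roughly 1.7 ppm per year

# Fitted value on the last date in the weekly summary.
last_day    <- as.numeric(as.Date("2016-12-18"))  # days since 1970-01-01
fitted_last <- intercept + slope_per_day * last_day
fitted_last
```

With the model object in the session, coef(trend)[2] * 365.25 gives the same yearly rate without retyping anything.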

We have some initial results that are worth sharing, so let’s make them more presentable. The ggplot2 library makes good-looking plots in R. Let’s install ggplot2, along with the lazyeval package.

if(!require(lazyeval)) {
    install.packages('lazyeval')
    library(lazyeval)
}

if(!require(ggplot2)) {
    install.packages('ggplot2')
    library(ggplot2)
}

Now we can make a nicer looking version of our plot using ggplot.

ggplot(data = mauna_loa_weekly, aes(date, co2ppm)) +
  geom_line() +
  xlab('Date') +
  ylab('CO2 Concentration PPM') + 
  ggtitle('Mauna Loa Weekly Carbon Dioxide Concentration') +
  stat_smooth(method = lm, color = 'dark blue')

Examining Seasonality

So we can see that the long-term trend in CO2 concentration goes up in almost a straight line over time, dipping just below this trend in the 90s. But why is this variable going up and down along that line in such a regular fashion? It looks like it cycles each year, so it is probably seasonal variation. Let’s take a closer look at a couple of years.

To make it easier to deal with dates, we can use the lubridate package, and to make it easier to subset the data using these dates, we can use the dplyr package.

if(!require(lubridate)) {
    install.packages('lubridate')
    library(lubridate)
}

if(!require(dplyr)) {
    install.packages('dplyr')
    library(dplyr)
}

The subset function, which is part of base R, lets you take only the rows of a table you want to look at. The %>% pipe operator, which dplyr makes available, lets you pass a table through a chain of such filters. Let’s take a look at the first few rows of 2015.

mauna_loa_weekly %>% subset(year(date) == 2015) %>% head()
##            date co2ppm
## 2121 2015-01-04 399.85
## 2122 2015-01-11 400.16
## 2123 2015-01-18 399.51
## 2124 2015-01-25 400.20
## 2125 2015-02-01 400.23
## 2126 2015-02-08 399.95

The year 2015 started off at 399.85 ppm.
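
The same kind of row filtering can be sketched with base R alone, without lubridate or the pipe. The small table below is typed in from values that appear in this tutorial’s outputs, and the names obs and obs_2015 are illustrative.

```r
# A handful of weekly observations copied from the outputs in this tutorial.
obs <- data.frame(
  date   = as.Date(c("2014-04-27", "2014-09-14", "2015-01-04", "2015-12-27")),
  co2ppm = c(402.15, 394.85, 399.85, 402.07)
)

# format(date, "%Y") extracts the year as text, playing the role of year(date).
obs_2015 <- subset(obs, format(date, "%Y") == "2015")
obs_2015
```

The piped version used in this section reads left to right, which becomes more convenient as more filtering steps are chained together.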

We should also look at the last few rows using tail.

mauna_loa_weekly %>% subset(year(date) == 2015) %>% tail()
##            date co2ppm
## 2167 2015-11-22 400.37
## 2168 2015-11-29 400.80
## 2169 2015-12-06 401.31
## 2170 2015-12-13 402.35
## 2171 2015-12-20 402.60
## 2172 2015-12-27 402.07

It appears 2015 closed a little higher at 402.07 ppm.

We should also make note of the range of the values by looking at the summary.

mauna_loa_weekly %>% subset(year(date) == 2015) %>% summary()
##       date                co2ppm     
##  Min.   :2015-01-04   Min.   :397.2  
##  1st Qu.:2015-04-03   1st Qu.:399.2  
##  Median :2015-07-01   Median :400.8  
##  Mean   :2015-07-01   Mean   :400.8  
##  3rd Qu.:2015-09-28   3rd Qu.:402.4  
##  Max.   :2015-12-27   Max.   :404.1

The highest concentration observed in 2015 was 404.1 ppm, and the lowest observed was 397.2 ppm. The average or mean of all observations that year was 400.8 ppm.

Let’s look at all of this on a plot.

ggplot(data = mauna_loa_weekly %>% subset(year(date) == 2015), aes(date, co2ppm)) +
  geom_line() +
  xlab('Date') +
  ylab('CO2 Concentration PPM') + 
  ggtitle('Mauna Loa Weekly Carbon Dioxide Concentration')

It looks like the starting point of 399.85 ppm at the beginning of the year was close to the mean value of 400.8 ppm. From there, it went up to the maximum of 404.1 ppm around late spring or early summer in the northern hemisphere, then dropped steeply toward a minimum in the fall.

Let’s check when those maximum and minimum measurements were taken.

mauna_loa_weekly %>% subset(year(date) == 2015) %>% subset(co2ppm == max(co2ppm))
##            date co2ppm
## 2138 2015-05-03 404.13

The maximum value was observed on May 3rd.

mauna_loa_weekly %>% subset(year(date) == 2015) %>% subset(co2ppm == min(co2ppm))
##            date co2ppm
## 2159 2015-09-27  397.2

And the minimum value was observed on September 27th.

Let’s check the previous year.

ggplot(data = mauna_loa_weekly %>% subset(year(date) == 2014), aes(date, co2ppm)) +
  geom_line() +
  xlab('Date') +
  ylab('CO2 Concentration PPM') + 
  ggtitle('Mauna Loa Weekly Carbon Dioxide Concentration')

The pattern looks very similar.

mauna_loa_weekly %>% subset(year(date) == 2014) %>% subset(co2ppm %in% c(min(co2ppm), max(co2ppm)))
##            date co2ppm
## 2085 2014-04-27 402.15
## 2105 2014-09-14 394.85

The year 2014 reached a high of 402.15 on April 27th and a low of 394.85 on September 14th.
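
The size of the seasonal swing can be put into numbers by subtracting each year’s low from its high. This sketch uses only the four extreme values reported above, and the object names are illustrative.

```r
# Yearly highs and lows copied from the outputs above.
extremes <- data.frame(
  year   = c(2014, 2014, 2015, 2015),
  co2ppm = c(402.15, 394.85, 404.13, 397.20)
)

# Peak-to-trough difference within each year.
swing <- tapply(extremes$co2ppm, extremes$year, function(x) max(x) - min(x))
swing  # about 7.3 ppm in 2014 and about 6.9 ppm in 2015
```

Applied to the full weekly table, the same tapply call, with na.rm = TRUE added to max and min, would give the swing for every year in the record.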

To look at this seasonal trend in all years observed, let’s break the dates back up into year and day of year.

mauna_loa_weekly$year <- year(mauna_loa_weekly$date)
mauna_loa_weekly$yday <- yday(mauna_loa_weekly$date)
head(mauna_loa_weekly)
##         date co2ppm year yday
## 1 1974-05-19 333.29 1974  139
## 2 1974-05-26 332.94 1974  146
## 3 1974-06-02 332.16 1974  153
## 4 1974-06-09 332.16 1974  160
## 5 1974-06-16 332.27 1974  167
## 6 1974-06-23 331.71 1974  174
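
For readers without lubridate installed, the same two columns can be derived in base R with format. A minimal sketch on the first three dates from the output above, with illustrative variable names:

```r
dates <- as.Date(c("1974-05-19", "1974-05-26", "1974-06-02"))

# "%Y" is the four-digit year and "%j" is the day of the year (001 to 366).
year_number <- as.integer(format(dates, "%Y"))
day_of_year <- as.integer(format(dates, "%j"))

data.frame(date = dates, year = year_number, yday = day_of_year)
```

The yday values match the lubridate output shown above.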

And let’s plot each year with a different color.

ggplot(data = mauna_loa_weekly, aes(yday, co2ppm, colour = year, group = year)) +
  geom_line() +
  xlab('Day of Year') +
  ylab('CO2 Concentration PPM') + 
  ggtitle('Mauna Loa Weekly Carbon Dioxide Concentration') +
  scale_color_gradientn('Year', colors = rainbow(length(unique(mauna_loa_weekly$year))))

This seasonal cycle repeats each year, but it is outweighed by the long-term increasing trend.

Monthly Data

This cycle and trend might be easier to see on a monthly basis. Returning to the source data, ESRL publishes monthly averages of the Mauna Loa observations.

ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt

...
# CO2 expressed as a mole fraction in dry air, micromol/mol, abbreviated as ppm
#
#  (-99.99 missing data;  -1 no data for #daily means in month)
#
#            decimal     average   interpolated    trend    #days
#             date                             (season corr)
1958   3    1958.208      315.71      315.71      314.62     -1
1958   4    1958.292      317.45      317.45      315.29     -1
1958   5    1958.375      317.50      317.50      314.71     -1
1958   6    1958.458      -99.99      317.10      314.85     -1
1958   7    1958.542      315.86      315.86      314.98     -1
...

Let’s import that data as well. The monthly record at Mauna Loa goes back to 1958. Raw averages are provided, but for convenience, an interpolated column fills in the gaps in the data. We’ll take the year, month, and gap-filled CO2 concentrations, which are the first, second, and fifth columns.

mauna_loa_monthly <- read.table('ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt')
mauna_loa_monthly <- mauna_loa_monthly[, c(1, 2, 5)]
names(mauna_loa_monthly) = c('year', 'month', 'co2ppm')
mauna_loa_monthly$date <- as.Date(paste(mauna_loa_monthly$year, mauna_loa_monthly$month, '01', sep = '-'), format = '%Y-%m-%d')
summary(mauna_loa_monthly)
##       year          month            co2ppm           date           
##  Min.   :1958   Min.   : 1.000   Min.   :312.7   Min.   :1958-03-01  
##  1st Qu.:1972   1st Qu.: 4.000   1st Qu.:328.1   1st Qu.:1972-11-01  
##  Median :1987   Median : 7.000   Median :349.7   Median :1987-07-01  
##  Mean   :1987   Mean   : 6.506   Mean   :351.9   Mean   :1987-07-02  
##  3rd Qu.:2002   3rd Qu.: 9.000   3rd Qu.:373.1   3rd Qu.:2002-03-01  
##  Max.   :2016   Max.   :12.000   Max.   :407.7   Max.   :2016-11-01

ggplot(data = mauna_loa_monthly, aes(date, co2ppm)) +
  geom_line() +
  xlab('Date') +
  ylab('CO2 Concentration PPM') + 
  ggtitle('Mauna Loa Monthly Carbon Dioxide Concentration') +
  stat_smooth(method = lm, color = 'dark blue')
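
To see what the interpolated column buys us, the sketch below parses the five rows from the monthly file excerpt above and marks -99.99 as missing. The column names are made up for this example and are not taken from the file.

```r
# The five data rows shown in the monthly file excerpt.
monthly_lines <- c(
  "1958   3    1958.208      315.71      315.71      314.62     -1",
  "1958   4    1958.292      317.45      317.45      315.29     -1",
  "1958   5    1958.375      317.50      317.50      314.71     -1",
  "1958   6    1958.458      -99.99      317.10      314.85     -1",
  "1958   7    1958.542      315.86      315.86      314.98     -1"
)

sample_monthly <- read.table(
  text = monthly_lines,
  na.strings = "-99.99",  # the placeholder for a missing monthly average
  col.names = c("year", "month", "decimal_date", "average",
                "interpolated", "trend", "days")  # illustrative names
)

# Rows where the raw average is missing but a gap-filled value exists.
sample_monthly[is.na(sample_monthly$average),
               c("year", "month", "average", "interpolated")]
```

In this excerpt only June 1958 lacks a raw average, and the interpolated column supplies 317.10 ppm in its place, which is why the plot above has no breaks.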