About The Misleading Canonical Graph of 400,000 Years of CO2 and Global Temperature

Ivo Welch

21 February, 2022

Abstract

One of the most prominent graphs in climate activism, the correlation between CO2 and global temperature over the last 400,000 years based on the Vostok ice cores, is misleading when it is shown in order to bolster the case that CO2 increases have caused global warming. The graph intermingles two effects: temperature can predict CO2, and CO2 can predict temperature. Both effects are well known, and one does not preclude the other. When presented in a CO2-temperature context, this intermixing renders the graph deceptive. In truth, the prime problem of the analysis is the high autocorrelation of changes in CO2, which makes it difficult to pin down the role of CO2 shocks on temperature innovations. Put into econometric terms, the x variable is poorly identified and is itself endogenous most of the time. Perhaps most importantly, this joint 400,000-year history is not relevant today. Although the driving forces of CO2 were not well known for most of this long period, scientists know that humanity has injected about 130 ppm of CO2 into the atmosphere, unrelated to other factors, during the last 200 years.

1 Introduction

1.1 The Data Set

The source of the original data is the Vostok ice core data on temperature and CO2.¹ The data itself is stored locally in a file called temp-co2-solar-422k.csv. First, we need to read it.

options(digits=2, width=200)
library(formatR)
library(data.table)
dlvl <- read.csv("temp-co2-solar-422k.csv")
dlvl <- dlvl[complete.cases(dlvl),]
summary(dlvl)

##       year              temp           co2          solar    
##  Min.   :-414000   Min.   :-9.2   Min.   :182   Min.   :388  
##  1st Qu.:-311100   1st Qu.:-7.0   1st Qu.:204   1st Qu.:424  
##  Median :-208200   Median :-5.5   Median :225   Median :441  
##  Mean   :-208200   Mean   :-4.9   Mean   :227   Mean   :441  
##  3rd Qu.:-105300   3rd Qu.:-3.5   3rd Qu.:248   3rd Qu.:456  
##  Max.   :  -2400   Max.   : 3.2   Max.   :298   Max.   :497

cat("N= ",nrow(dlvl), "\n")

## N=  4117

The data are regularly spaced at 100-year intervals. With 4,117 observations and many ups and downs, we have a fairly large dataset. Statistical tools are well suited to uncovering reasonably reliable relationships in such large data sets.

1.2 The Canonical But Misleading Graph

The following is the “canonical” plot of concern. It is commonly shown to suggest to naive audiences that CO2 has caused global warming in the past.

with(dlvl, {
  plot(year/1000, scale(temp), col="blue", xlab="Relative Year, in Thousands", ylab="Z-Score",
       type="l", lwd=3, main="Misleading Association Graph of 400,000 Years")
  lines(year/1000, scale(co2), col="brown", lwd=3)
  lines(year/1000, scale(solar), col="orange")
  legend("topleft", col=c("brown","blue","orange"), lty=1, box.col="white", lwd=c(3,3,1), legend=c("CO2","Temp","Solar") )  ## include the (thin) solar line in the legend
  })

Clearly, CO2 and temperature move together. The associated verbal suggestion — sometimes explicit, sometimes left to the audience’s imagination — is often that CO2 determines temperature.² The visually implied X-Y association can also be graphed:

with(dlvl, {
  plot(co2, temp, xlab="CO2", ylab="Temperature", main="Misleading Association Graph of 400,000 Years, XY", cex=0.4)
  lines(co2,temp, col="gray")
  abline( lm(temp ~ co2), col="blue", lwd=2)
  })

The blue line is the best-fit OLS line, \(\text{temp}_t = -24 + 0.084*\text{CO2}_t\). The suggestion in both plots is that an increase of 100 ppm of CO2 (about 44% of the mean CO2 of 227 ppm in the data) predicts warming of 8.4°C. This is of course absurd.

This naive interpretation, namely that the graph shows that CO2 drove global warming, is misleading and therefore wrong.

Anyone who understands basic data analysis should understand that this plot is misleading as far as establishing a (causal) link from CO2 to temperature is concerned. Indeed, the only point of my analysis is to plead not to use this graph any longer. The plotted relationship is a classic example of a spurious relation. The textbook example is the association between ice cream sales and murders: both are higher in summer, and plots of ice cream sales against murders would look just like the two plots of CO2 and temperature above.

There are better ways to analyze the CO2, temperature, and solar data, shown below. They address two problems that the graph glosses over:

Could a third variable — such as trends, volcanos, solar radiation, or anything else — have caused (co-)variation in both CO2 and temperature?
Is CO2 causing warming or is warming causing CO2, or are both causing one another?

The remedy to the first problem is to work in changes of variables, not in levels of variables. The remedy to the second problem is to work with lead-lag associations. I am not the first to have noticed that temperature changes can also anticipate CO2 changes. However, some climate-change critics have jumped to the equally incorrect conclusion that such feedback effects then reject the hypothesis that CO2 drives temperature. Feedback effects do not exclude the hypothesis of interest, which is whether CO2 changes anticipate temperature changes. Section 3 below analyzes the two data series to disentangle both directions.

2 Models and Spurious Correlation

If the true model is \(y_t = a + b*x_t\), then it follows algebraically that \((\Delta y_t) = 0 + b*(\Delta x_t)\), where \(\Delta\) is the change (also called the first difference). A good test for uncovering many spurious correlations is to estimate regressions both in levels and in differences. If the coefficient \(b\) is not the same (or at least similar) in both regression models, it suggests that the level correlation is spurious. In such a case, we would have learned that \(y_t = a + b*x_t\) was not the correct model to begin with.
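The levels-versus-differences check can be tried out on simulated data. The following is a minimal sketch in base R and is not part of the original analysis: the series `x`, `y.spurious`, and `y.true`, the trend slopes, the seed, and the helper `slope()` are all made up for illustration.

```r
## Simulated data, NOT the Vostok series. All names and parameters are hypothetical.
set.seed(42)                               # arbitrary seed, for reproducibility only
n  <- 4000
tt <- seq_len(n)
x          <- 0.01 * tt + rnorm(n)         # x has a trend plus noise
y.spurious <- 0.02 * tt + rnorm(n)         # shares only the trend with x
y.true     <- 1 + 2 * x + rnorm(n)         # really is a + b*x + noise, with b = 2

## OLS slope of y on x
slope <- function(y, x) unname(coef(lm(y ~ x))[2])

b <- c(spurious.levels = slope(y.spurious, x),
       spurious.diffs  = slope(diff(y.spurious), diff(x)),
       true.levels     = slope(y.true, x),
       true.diffs      = slope(diff(y.true), diff(x)))
print(round(b, 3))
```

In the trend-only case, the levels slope is large (close to the ratio of the two trend slopes) while the slope in differences is near zero. When the levels model is actually true, both regressions recover a slope near 2.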

2.1 Defining Variable Changes

To work with changes, it is useful to define and work with “chg” and “lag” R functions.

chg <- function(x,...) { o <- x - shift(x,...); names(o) <- paste0("d",names(x)); o }
lag <- function(x,...) { o <- shift(x,...); names(o) <- paste0("l",names(x)); o }
   ## the above use data.table's shift function; note that this lag() masks stats::lag()

ds <- cbind(dlvl, chg(dlvl)); rownames(ds) <- NULL  ## combine levels and changes into one data set

print(head(ds))  ## show the output to make it easier to understand how this works

##      year temp co2 solar dyear  dtemp  dco2 dsolar
## 1 -414000 0.84 285   443    NA     NA    NA     NA
## 2 -413900 0.83 285   443   100 -0.010 -0.28   0.29
## 3 -413800 0.82 285   444   100 -0.009 -0.28   0.30
## 4 -413700 0.81 284   444   100 -0.009 -0.28   0.30
## 5 -413600 0.80 284   444   100 -0.008 -0.28   0.30
## 6 -413500 0.85 284   445   100  0.051 -0.28   0.30

Here are some basic background statistics on our data, both levels and changes:

print(summary( ds ))

##       year              temp           co2          solar         dyear         dtemp            dco2           dsolar     
##  Min.   :-414000   Min.   :-9.2   Min.   :182   Min.   :388   Min.   :100   Min.   :-1.67   Min.   :-13.0   Min.   :-1.28  
##  1st Qu.:-311100   1st Qu.:-7.0   1st Qu.:204   1st Qu.:424   1st Qu.:100   1st Qu.:-0.11   1st Qu.: -0.4   1st Qu.:-0.40  
##  Median :-208200   Median :-5.5   Median :225   Median :441   Median :100   Median :-0.01   Median : -0.1   Median :-0.01  
##  Mean   :-208200   Mean   :-4.9   Mean   :227   Mean   :441   Mean   :100   Mean   : 0.00   Mean   :  0.0   Mean   : 0.00  
##  3rd Qu.:-105300   3rd Qu.:-3.5   3rd Qu.:248   3rd Qu.:456   3rd Qu.:100   3rd Qu.: 0.12   3rd Qu.:  0.3   3rd Qu.: 0.41  
##  Max.   :  -2400   Max.   : 3.2   Max.   :298   Max.   :497   Max.   :100   Max.   : 1.92   Max.   :  6.0   Max.   : 1.47  
##                                                               NA's   :1     NA's   :1       NA's   :1       NA's   :1

Theoretically, it would be better not to work with plain changes, but with (log one plus) percent changes in CO2 (and also for temperature in Kelvin). Trust me that it matters little, except that the exposition is easier if I just use plain differences in CO2 as I do here, because the units are more familiar.
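To make the distinction between plain changes and (log one plus) percent changes concrete, here is a small base R sketch. The values in `co2.toy` are hypothetical and are not taken from the data set.

```r
## Toy illustration with hypothetical CO2 values (ppm), not the Vostok data.
co2.toy <- c(285.0, 284.7, 284.4, 290.0)

plain  <- diff(co2.toy)                  # plain change in ppm, as used in this note
pct    <- plain / head(co2.toy, -1)      # percent change relative to the prior value
logchg <- log1p(pct)                     # log(1 + percent change)

## log(1 + percent change) is the same thing as the change in log(CO2)
same.as.difflog <- isTRUE(all.equal(logchg, diff(log(co2.toy))))

print(cbind(plain, pct, logchg))
print(same.as.difflog)
```

For small changes, the log change is close to the percent change, which is just the plain change divided by the prior level. This is one plausible reason why the choice of transformation matters little when the level does not vary by orders of magnitude.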

2.2 Contemporaneous Regressions in Levels Vs. Changes

We first define a function that returns only the regression coefficients that we want to see, thereby removing a lot of R output clutter.

showcoef <- function( formula, controls= (~.), data=ds ) 
  coef(summary(lm( update(formula,controls),  data=data)))[,c(1,3)]

Here is the basic level regression, \(\text{temp}_t = a + b*\text{CO2}_t\):

showcoef( temp ~ co2 )

##             Estimate t value
## (Intercept)  -23.910    -139
## co2            0.084     111

Now, the regression in differences of \(\Delta\text{temp}_t = a + b*(\Delta\text{CO2}_t)\):

showcoef( dtemp ~ dco2 )

##             Estimate t value
## (Intercept) -0.00044    -0.1
## dco2         0.03310     6.3

The coefficient estimate of 0.03 is much smaller than 0.08, suggesting spurious level trend correlation in the prior regression.³

For the same reason that first differencing should yield the same coefficient if the model is reasonably correct, so should second differencing:

showcoef( chg(dtemp) ~ chg(dco2) )

##             Estimate t value
## (Intercept) -0.00032  -0.055
## chg(dco2)    0.01767   1.415

Again, even the change regression contains spurious trends. It is only this last changes-in-changes regression that is finally stable with respect to further differencing.

3 Granger-Sims Causality and Lead-Lag Associations

Note that the above regression inputs were still contemporaneous. They only solve the spurious correlation issue with respect to trends. They do not address the question of whether CO2 drives warming or vice-versa.

A better test is based on the idea that if CO2 really changes temperature, then (unexpected) changes in CO2 should anticipate (unexpected) changes in temperature. Econometricians call this Granger-Sims causality (GSC).⁴ GSC is a necessary but not a sufficient condition to read potential causality into data, even if there are no important omitted variables. If there is no GSC, then the data cannot show that CO2 causally influences temperature. (This will not be a problem in this data, however. The problems are elsewhere.) GSC tests may also be too weak in actual data to find an association that exists. (The obvious example is a case in which one has only 2 data points.) In such a case, a failure to find GSC means only that the data are inconclusive, not that there is no association.
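The logic of a Granger-Sims test can be sketched on simulated data in base R. This is an illustration under stated assumptions, not the analysis of this note: the series `x` and `y`, their coefficients, the seed, and the helper `gc.pvalue()` are hypothetical. By construction, lagged `x` helps predict `y`, but lagged `y` does not help predict `x`.

```r
## Simulated data, NOT the Vostok series: x leads y by construction, y does not lead x.
set.seed(7)                                            # arbitrary seed
n <- 2000
x <- as.numeric(arima.sim(list(ar = 0.8), n = n))      # persistent driver series
y <- numeric(n)
for (i in 2:n) y[i] <- 0.2 * y[i - 1] + 0.5 * x[i - 1] + rnorm(1)

d <- data.frame(y = y[-1], y.l1 = y[-n], x = x[-1], x.l1 = x[-n])

## p-value of the F-test that `other.lag` adds nothing beyond the series' own lag
gc.pvalue <- function(dep, own.lag, other.lag, data) {
  restricted   <- lm(reformulate(own.lag, response = dep), data = data)
  unrestricted <- lm(reformulate(c(own.lag, other.lag), response = dep), data = data)
  anova(restricted, unrestricted)[2, "Pr(>F)"]
}

p.x.to.y <- gc.pvalue("y", "y.l1", "x.l1", d)   # does lagged x help predict y?
p.y.to.x <- gc.pvalue("x", "x.l1", "y.l1", d)   # does lagged y help predict x?
print(c(x.to.y = p.x.to.y, y.to.x = p.y.to.x))
```

A tiny p-value in the first test says that lagged `x` anticipates `y`. As the text stresses, this is a necessary condition for reading causality into the data, not proof of it.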

Although the correct way to estimate the structure is Sims’ vector autoregressions (VARs), which we shall do in section 3.2, we start with simple OLS regressions in section 3.1. The estimates are similar.

3.1 Simple OLS Analysis

3.1.1 Preliminary Analysis: Predicting Changes in CO2

First, consider changes in CO2. The strongest association in the data, by far, is that they are highly autocorrelated:⁵

showcoef( dco2 ~ lag(dco2) )

##             Estimate t value
## (Intercept)  0.00021   0.031
## lag(dco2)    0.83836  98.586

# showcoef( dco2 ~ lag(dco2) + I(lag(dco2)*(abs(dco2)>1)) )  # large shocks are more autocorrelated than small ones.

This is stable. Shifting the same regression back by one century (lagged CO2 changes on twice-lagged CO2 changes) yields virtually the same estimate:

showcoef( lag(dco2) ~ lag(lag(dco2)) )

##                Estimate t value
## (Intercept)     0.00018   0.025
## lag(lag(dco2))  0.83831  98.555

The data inform us that once CO2 changes got going in a particular direction, they tended to continue in the same direction. CO2 changes had strong internal dynamics, almost random-walk like: whenever CO2 increased over 100 years, it strongly tended to increase over the next 100 years again, and by almost as much. Ergo, when a shock to CO2 occurs, it has long-term effects, far beyond a century. The half-life of shocks to changes in CO2 is about \(-\log(2)/\log(0.838)\approx3.9\) centuries.
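The half-life arithmetic can be checked directly. The sketch below assumes a simple AR(1) process for CO2 changes; the helper `half.life()` is a hypothetical name. Depending on whether the coefficient is entered as 0.83 or as the estimated 0.838, the half-life comes out between roughly 3.7 and 3.9 centuries.

```r
## Half-life of a shock under an AR(1) coefficient rho: solve rho^h = 1/2 for h.
half.life <- function(rho) -log(2) / log(rho)

rho.rounded  <- 0.83     # coefficient cut to two digits
rho.estimate <- 0.838    # the lag(dco2) estimate from the regression output above

hl <- c(rounded = half.life(rho.rounded), estimate = half.life(rho.estimate))
print(round(hl, 2))      # in periods; one period is 100 years in this data set

## Fraction of a shock to the CO2 change that remains after 1..6 centuries
print(round(rho.estimate^(1:6), 2))
```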

I am not a climate scientist. My subject expertise (unlike my data analysis expertise) is very limited. I do not understand why CO2 changes were so highly autocorrelated. Not shown, this seems not to be driven by temperature or solar forcing. Buffers likely play a role. Presumably, when a shock to CO2 occurs for whatever reason, it takes Earth centuries to undo it.

Empirically, the autocorrelation of CO2 is what will make it difficult to determine how CO2 changes influenced temperature changes — CO2 changes over the last 100 years tend to look very similar to CO2 changes from, say, 500 to 600 years ago.

3.1.2 Key Analysis: Predicting Changes in Temperature

The true variables of interest are changes in temperature, not changes in CO2. Rather than start with the simplest regression, the following regression already includes a host of controls, to be explained in a moment.

showcoef( dtemp ~ lag(dco2) , ~ . + lag(dsolar) + lag(dtemp) +
                       lag(temp) + lag(dtemp)*lag(temp) + lag(co2) + lag(solar) + dsolar )

##                      Estimate t value
## (Intercept)          -0.77886   -5.20
## lag(dco2)             0.03046    5.73
## lag(dsolar)           0.03540    0.37
## lag(dtemp)           -0.09593   -3.36
## lag(temp)            -0.02149   -6.22
## lag(co2)              0.00150    4.61
## lag(solar)            0.00076    3.39
## dsolar               -0.02708   -0.28
## lag(dtemp):lag(temp) -0.04711   -8.65

This regression suggests the following:

When solar radiation was high, temperature tended to increase (0.00076). Changes in solar radiation were fairly unimportant (-0.02708), suggesting a very slow response of temperature to solar radiation.
When lagged CO2 was high (0.0015) and when lagged CO2 increased recently (0.03046), temperature tended to increase. This is good evidence that changes in CO2 influenced future warming, both short-term and long-term. (The relation is robust to inclusion or exclusion of many control variables, such as the state variables included here, too.)
Earth has a strong thermostat: (a) when temperature was high, it tended to go down (-0.021); (b) when temperature recently has gone up, it tended to go down again (-0.096); and (c) temperature really wanted to go down again if both temperature was high and temperature had recently gone up (-0.047).

There are two interesting earth-science questions beyond my expertise related to this regression output.

The first concerns earth’s thermostat. It seems to work, even controlling for solar forcing and CO2. What is its cause? Will it continue?

The second concerns the magnitude of the coefficient estimate on lag(dco2). It’s still way too big. It suggests that on the margin, an increase of 250 ppm (about doubling) in CO2 predicts global warming of nearly \(0.03046\times250\approx7.6\)°C. Standard climate change models would suggest increases of less than half this much, about \(3\)°C. The 400,000-year association between lagged CO2 changes and temperature changes was far too strong. (This is even more worrisome because Archer suggests that about 3/4 of the CO2 disappears in the carbon cycle rapidly before it has much opportunity to drive the greenhouse effect.)

3.2 Improved Analysis: Vector Autoregressions (VARs)

The improved statistical estimation method is Sims’ vector autoregressions. They estimate equations for both \(\Delta \text{CO2}\) dynamics and \(\Delta \text{Temp}\) dynamics together, to better disentangle how the two influence one another (in innovations, too). The specification explicitly allows each variable to influence the other.⁶
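For concreteness, a one-lag version of such a system without intercepts (matching the type="none" specification used in the estimation below) can be written as follows. The coefficient names \(a_{11},\dots,a_{22}\) and the innovation terms \(u_t, v_t\) are notation introduced here for illustration only:

```latex
\[
\begin{aligned}
\Delta\text{CO2}_t  &= a_{11}\,\Delta\text{CO2}_{t-1} + a_{12}\,\Delta\text{temp}_{t-1} + u_t \\
\Delta\text{temp}_t &= a_{21}\,\Delta\text{CO2}_{t-1} + a_{22}\,\Delta\text{temp}_{t-1} + v_t
\end{aligned}
\]
```

In this notation, \(a_{21}\) is the coefficient of main interest (lagged CO2 changes predicting temperature changes), \(a_{12}\) captures the reverse direction, and \(a_{11}\) and \(a_{22}\) are the own-lag persistence terms.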

We first need to do some basic setup and create variables for the package.

library(vars)

## Loading required package: strucchange

## Loading required package: urca

# define some variables used later as controls
ds <- within(ds, {
  lagsolar <- lag(solar)
  lagco2 <- lag(co2)
  lagtemp <- lag(temp)
})
dvar <- ds[complete.cases(ds),]

dvar.mainseries <- subset( dvar, T, select=c(dco2,dtemp) );  ## the two key var variables

We begin with a one-lag VAR analysis. The format of the coefficient-test output is first the dependent variable, then the independent variables, then an indicator of the lag.

var1 <- VAR( dvar.mainseries, type="none", lag.max=1 )
print(var1)

## 
## VAR Estimation Results:
## ======================= 
## 
## Estimated coefficients for equation dco2: 
## ========================================= 
## Call:
## dco2 = dco2.l1 + dtemp.l1 
## 
##  dco2.l1 dtemp.l1 
##    0.837    0.046 
## 
## 
## Estimated coefficients for equation dtemp: 
## ========================================== 
## Call:
## dtemp = dco2.l1 + dtemp.l1 
## 
##  dco2.l1 dtemp.l1 
##    0.024    0.100

coeftest(var1)[,c(1,3)]

##                Estimate t value
## dco2:dco2.l1      0.837    98.0
## dco2:dtemp.l1     0.046     1.9
## dtemp:dco2.l1     0.024     4.5
## dtemp:dtemp.l1    0.100     6.4

This suggests (as before) that

Carbon-dioxide changes, \(\Delta\text{CO2}\), are highly autocorrelated (0.837). When CO2 has increased, it wants to continue to increase. When CO2 has decreased, it wants to continue to decrease. Of all the associations in this 400,000-year data, this is by far the strongest one. It is evidence of strong buffers and/or a strong CO2 feedback effect on Earth.
When temperature has recently gone up, \(\Delta\text{CO2}\) wants to go up just a little more (0.046).
When temperature has recently gone up, then the temperature wants to go up just a little more (0.1). (This disappears with better controls for the level of temperature interacted with recent temperature changes. The reason can be inferred from the accelerating shape of the XY graph above.)
The association of most interest to us: When CO2 has recently gone up, then the temperature wants to go up just a little more. This is what we found before, and the magnitude of the coefficient remains troubling. The coefficient is still far too large, suggesting a warming effect of about \(0.024\times250\approx6\)°C for a doubling of CO2. And this is even more disconcerting, because it is not even for a recent 1-50 year increase but for a 100-year lagged increase in CO2.

3.2.1 Expected Decay of Autocoefficients With Lag

The theory further predicts that the coefficient on lagged CO2 changes should decrease with the lag. A change in CO2 this century should have more ability to predict the temperature change over the next 100 years than the temperature change over a 100-year span five centuries from now. This is a quasi-placebo test. There should be very little association beyond the first one or two lags.

The following includes 10 lags of CO2 changes, i.e., the last ten centuries. The printouts are only for the coefficients on the \(\Delta\text{CO2}\) predictors in the \(\Delta\text{temp}\) prediction, although the analysis itself remains based on the full VAR.

var10 <- VAR( dvar.mainseries, type="none", lag.max=10)
## with controls: use VAR( dvar.mainseries, lag.max=10, exogen=subset( dvar, T, select=c(lagsolar, lagco2, lagtemp) ) )
var10

## 
## VAR Estimation Results:
## ======================= 
## 
## Estimated coefficients for equation dco2: 
## ========================================= 
## Call:
## dco2 = dco2.l1 + dtemp.l1 + dco2.l2 + dtemp.l2 + dco2.l3 + dtemp.l3 + dco2.l4 + dtemp.l4 + dco2.l5 + dtemp.l5 + dco2.l6 + dtemp.l6 + dco2.l7 + dtemp.l7 + dco2.l8 + dtemp.l8 + dco2.l9 + dtemp.l9 + dco2.l10 + dtemp.l10 
## 
##   dco2.l1  dtemp.l1   dco2.l2  dtemp.l2   dco2.l3  dtemp.l3   dco2.l4  dtemp.l4   dco2.l5  dtemp.l5   dco2.l6  dtemp.l6   dco2.l7  dtemp.l7   dco2.l8  dtemp.l8   dco2.l9  dtemp.l9  dco2.l10 dtemp.l10 
##   0.84864   0.05839   0.02070   0.01303  -0.03842   0.05003   0.00097   0.08456  -0.01525   0.00036   0.00038   0.09773  -0.11836   0.06611   0.06887   0.06901   0.03370   0.07269   0.01032   0.00906 
## 
## 
## Estimated coefficients for equation dtemp: 
## ========================================== 
## Call:
## dtemp = dco2.l1 + dtemp.l1 + dco2.l2 + dtemp.l2 + dco2.l3 + dtemp.l3 + dco2.l4 + dtemp.l4 + dco2.l5 + dtemp.l5 + dco2.l6 + dtemp.l6 + dco2.l7 + dtemp.l7 + dco2.l8 + dtemp.l8 + dco2.l9 + dtemp.l9 + dco2.l10 + dtemp.l10 
## 
##   dco2.l1  dtemp.l1   dco2.l2  dtemp.l2   dco2.l3  dtemp.l3   dco2.l4  dtemp.l4   dco2.l5  dtemp.l5   dco2.l6  dtemp.l6   dco2.l7  dtemp.l7   dco2.l8  dtemp.l8   dco2.l9  dtemp.l9  dco2.l10 dtemp.l10 
##    0.0219    0.1116   -0.0060   -0.2423    0.0157   -0.0257    0.0016   -0.0368   -0.0125    0.0140    0.0229    0.0134   -0.0163   -0.0053    0.0371    0.0278   -0.0116    0.0392    0.0081   -0.0460

The important coefficients (past changes in CO2 predicting the change in temperature) are now graphed:

plotvarcoef <- function( ctbl , vnm="CO2", yl=0.05 ) {
   plot( 1:nrow(ctbl), ctbl[,1], xlab=paste("100-Year Lag of",vnm,"Change"), ylab=paste("Coefficient on",vnm,"Change"), type="b", ylim=c(-yl,yl), main="Explaining 100-Year Ahead Temperature Changes")
   lines( 1:nrow(ctbl), ctbl[,1] + 1*ctbl[,2], col="gray", lty=2 )
   lines( 1:nrow(ctbl), ctbl[,1] - 1*ctbl[,2], col="gray", lty=2 )
   lines( 1:nrow(ctbl), ctbl[,1] + 2*ctbl[,2], col="gray", lty=3 )
   lines( 1:nrow(ctbl), ctbl[,1] - 2*ctbl[,2], col="gray", lty=3 )
   lines( c(0,20), c(0,0), lty=2, col="gray")
   points( 1, ctbl[1,1], col="blue", cex=2)
}

coef.dtemp <- coef(var10)$dtemp
coef.dtemp.dco2 <- coef.dtemp[grepl("dco2", rownames(coef.dtemp)),]
# print(coef.dtemp.dco2[,1])
plotvarcoef( coef.dtemp.dco2 )

The data analysis fails the placebo test.

Archer explains that the theory says that the coefficients should be decaying. More specifically, theory predicts a long-run coefficient of about 0.01 (in this sample, a 250 ppm increase should induce a 3°C increase). It should be split into about 0.0075 on lag0, 0.002 on lag1, and lower coefficients on further lags. The most recent CO2 change should be more powerful than more lagged CO2 changes.

If I were to claim that this data suggests that changes in CO2 have driven changes in temperature, then

why are CO2 changes from many centuries ago similarly powerful as the most recent CO2 changes?
why are the coefficient estimates so large?

The statistical reason for the first part of this mess is the high correlation among CO2 changes. When CO2 increased in the last 100 years, it also likely increased before. As far as the regression is concerned, many recently past CO2 changes look somewhat alike, and any of them could be credited with the power to predict temperature changes. 4,000 centuries should have been enough to uncover the relationship, but they just weren’t. The reason for the second part of this mess, the terribly high coefficient estimates, is a mystery to me.
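The general mechanism, that an autocorrelated predictor makes it hard to tell which lag matters, can be illustrated with a small simulation in base R. This is a sketch under made-up assumptions (series `x` and `y`, the coefficient of 1.0 on the first lag, the seed), not a reproduction of the VAR above. By construction, only the first lag of `x` affects `y`.

```r
## Simulated data, NOT the Vostok series. Only lag 1 of x matters, by construction.
set.seed(11)                                        # arbitrary seed
n   <- 4000
rho <- 0.84                                         # autocorrelation similar to that of dco2
x   <- as.numeric(arima.sim(list(ar = rho), n = n))
X   <- embed(x, 6)                                  # columns: x_t, x_{t-1}, ..., x_{t-5}
colnames(X) <- paste0("x.l", 0:5)
y   <- 1.0 * X[, "x.l1"] + rnorm(nrow(X))
d   <- data.frame(y = y, X)

## Lag 5 on its own looks like a strong predictor, although it has no effect
b.alone <- coef(lm(y ~ x.l5, data = d))[["x.l5"]]

## Only the joint regression attributes the effect to lag 1
fit.joint <- lm(y ~ x.l1 + x.l2 + x.l3 + x.l4 + x.l5, data = d)
b.joint   <- coef(fit.joint)[-1]

print(round(c(lag5.alone = b.alone, b.joint), 3))
```

In this clean simulation the joint regression sorts things out. In the actual data, with noisy proxies and unknown true coefficients, the same collinearity leaves the individual lag coefficients poorly pinned down.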

The data absolutely do not reject the hypothesis that changes in CO2 drive changes in global temperature. (I would go further and characterize this as “they hint at an association.”) The data are just not informative enough to identify this relationship cleanly. Thus, the use of the visual graph at the outset is not only misleading (for ignoring the reverse association), it is badly misleading.

4 Conclusion

4.1 Is This Evidence Against A Climate Role for CO2?

Absolutely not!

This is not evidence against the role of CO2 on global temperatures. It is only evidence that the theoretically predicted relation in this data set is difficult to uncover, because we have such strong autocorrelation of changes in CO2. We have absence of evidence in this graph, not evidence of absence in this data.

There are good reasons for this. In particular:

CO2 or global temperatures could be measured with too much noise. It’s not like we had satellites measuring CO2 and temperature for hundreds of thousands of years. The measurement comes from proxies and only in one place.
There are state variables (buffers) in the system that can obscure the relationship to the point where the graph is highly misleading.

As in all statistical analysis, theory and empirical identification of more variables can improve the estimated associations. Advances in knowledge could point either way. The inclusion of omitted control variables could bolster the case for a causal association of CO2 on future warming or it could undermine it. The answer to whether CO2 causes warming is beyond the analysis here (indeed, it is shown to be beyond any analysis feasible with merely CO2, temperature, and solar data, even using state information) and beyond the expertise of the author.

The question examined in this writeup was not whether CO2 causes warming, but whether the canonical graph is reasonably representative of the association in the data and the predicted association from the theory. The answer is a clear no. The canonical graph is highly misleading. It is not solid empirical evidence in favor of a role of CO2 in warming. There is evidence of unaccounted trends, omitted variables, and feedback effects. The graph is not even mildly representative of what can be gathered from a better analysis of the data, either. It is best not shown to unsuspecting audiences.

4.2 How Relevant Is This Data?

It is not very relevant at all.

Scientists have known for a long time that the graph reflects feedback effects and omitted variables. They are not evidence for or against a causal effect of CO2 on temperature. This data set — likely the best public dataset available at the moment covering this 400,000 year span — suggests that the empirical evidence of how CO2 influenced temperature in prehistoric times is insufficient in itself. It does not suggest that the relationship was not there, only that the inference must be based on other evidence. The canonical graph is misleading and should not be shown in order to bolster the case for the relationship.

My note has clarified how it is the autocorrelation in CO2 changes that makes reliable and proper inference from the data so difficult, even ignoring the misleading aspects of the presentation of the canonical graph. Thus, the attention on the canonical graph seems misplaced. The problem with the prehistoric data is that the causes of the CO2 increases are not known and are difficult to disentangle. In contrast, scientists are sure that it was humanity that has injected about 130 ppm of CO2 into the atmosphere over the last 200 years. Thus, the situation at hand today is entirely different.

5 Backmatter

5.1 Omitted Analysis

Sea-level does not seem to make much difference.
Further use of solar forcing controls does not make much difference.

5.2 Clarification

If there are no omitted obscuring variables, then:

Causation implies correlation.
Correlation does not imply causation.
**Absence of correlation implies absence of causation.**

5.3 References

Archer, David, The Long Thaw, 2016.

Leamer, Edward, Vector Autoregressions for Causal Inference?, Carnegie Papers.

Stock, James H. and Mark W. Watson, Vector Autoregressions, Working Paper.

Lorius, Claude et al., The Ice-Core Record: Climate Sensitivity and Future Greenhouse Warming, Nature, 1990.

Rahmstorf, Stefan, Cosmic Rays, Carbon Dioxide, and Climate, EOS, 2004.

Note that the references are dated, because the graph is. It continues to be prominently displayed, though.

¹ http://www.climatedata.info/proxies/data-downloads/ , itself originally from http://www.ncdc.noaa.gov/paleo/indexice.html and http://www1.ncdc.noaa.gov/pub/data/paleo/climate_forcing/orbital_variations/berger_insolation/ (solar forcing data, in W/m^2, at 65° north, mid-July).
² The orange line is solar heat hitting the planet, caused by astronomical variations. For the most part, the data show that it had relatively little influence.
³ The T-statistic is smaller in differences than in levels, but this is always the case and does not suggest a problem. Note that for 4,000 or so observations, a T-statistic of 5 is not that uncommon. It just means there is a good statistical relationship. It does not tell you whether the association itself is strong. For this judgment, you want to assess the magnitude of the coefficient multiplied by the spread in the variable itself.
⁴ GSC is still not conclusive proof of a causal role of CO2 on temperature (in the same sense that the weather forecast comes first but it does not cause the weather).
⁵ Oddly, large changes in CO2 are more autocorrelated than small changes. Adding I(lag(dco2)*(abs(dco2)>1)) yields a coefficient of 0.58 for the ordinary dco2 autocoefficient and 0.58+0.40 for the large-change dco2 autocoefficient. This contradicts the idea that exogenous shocks push CO2 away and the autocorrelation then primarily comes from buffers that slow down mean-reversion.
⁶ However, in this data set, it turns out that OLS regressions and VAR regressions yield almost the same results.