Introduction

A simultaneous equation model is a statistical model consisting of a set of simultaneous linear equations. Such models differ from regular regression models in that two or more dependent variables are determined jointly. A common use for these models is estimating supply and demand. For reasons explained in this article, using least squares regression to estimate the parameters of a set of supply and demand equations is not ideal. Instead, one can estimate the parameters of a simultaneous system of supply and demand equations using two-stage least squares (2SLS) estimation.

1. The Supply and Demand Functions

One of the very first concepts every student learns in their introductory economics course is the concept of supply and demand. It’s one of the cornerstones of modern economics. Let’s revisit this concept. Consider the following supply and demand functions

\[ \begin{aligned} \text{Supply:} & Q = \beta_{1}P + \epsilon_{s} \\ \text{Demand:} & Q = \alpha_{1}P + \alpha_{2}I + \epsilon_{d} \end{aligned} \]

In this simultaneous model, the variables \(P\) and \(Q\) are endogenous variables because their values are determined within the system. The variable \(I\), which denotes income in the demand equation, is an exogenous variable because its value is determined outside the system. Both \(P\) and \(Q\) are dependent variables and are random.

Random errors are added to the equations for the usual reasons and are assumed to have the usual least squares properties:

\[ \begin{aligned} E(\epsilon_{d}) = 0 && \text{var}(\epsilon_{d}) = \sigma_{d}^{2} && E(\epsilon_{s}) = 0 && \text{var}(\epsilon_{s}) = \sigma_{s}^{2} && \text{cov}(\epsilon_{d}, \epsilon_{s}) = 0 \end{aligned} \]

2. The Failure of Least Squares

One might think that estimating each of the supply and demand equations individually by least squares would give proper parameter estimates. Unfortunately, that is not the case. The problem is that the endogenous variable \(P\) in the supply equation is correlated with that equation’s error term \(\epsilon_{s}\), which I demonstrate in the Appendix if you’re curious. Least squares fails for the supply equation because the estimated relationship between \(Q\) and \(P\) gives price \(P\) credit for the effect of changes in the error term \(\epsilon_{s}\). This happens because we do not observe the change in the error term, only the change in \(P\) that accompanies it through their correlation. Because \(P\) and \(\epsilon_{s}\) are negatively correlated in this model, the least squares estimator \(b_{1}\) will understate the true parameter value \(\beta_{1}\).

2.1 The Identification Problem

In our supply and demand model:

  1. The parameters of the demand equation \(\alpha_{1}\) and \(\alpha_{2}\) cannot be consistently estimated by any estimation method.
  2. The slope of the supply equation \(\beta_{1}\) can be consistently estimated.

To make these statements clear, suppose that the level of income \(I\) changes. The demand curve shifts, and a new equilibrium price and quantity result.

Figure 1: The effect of income change

Figure 1 shows the demand curves \(d_{1}\), \(d_{2}\), and \(d_{3}\) and the equilibria at points \(a\), \(b\), and \(c\) for the three levels of income. As income changes, data on price \(P\) and quantity \(Q\) will be observed around the intersections of supply and demand. The random error terms \(\epsilon_{s}\) and \(\epsilon_{d}\) cause small shifts in the supply and demand curves, creating equilibrium observations on price and quantity that are scattered about the intersections at points \(a\), \(b\), and \(c\). The problem is that as income changes, the demand curve shifts but the supply curve does not, so the observations fall along the supply curve. No data values fall along any of the demand curves, so there is no way to estimate their slope! Any one of the demand curves passing through the equilibrium points could be correct, and the data give us no way to distinguish the true demand curve from the rest.

The problem lies in the model we are using: the supply function lacks a variable that would shift it relative to the demand curve. If we were to add such a variable to the supply equation, then each time that variable changed, the supply curve would shift while the demand curve stayed fixed. Shifting the supply curve against a fixed demand curve would create equilibrium observations along the demand curve, making it possible to estimate both the slope of the demand curve and the effect of income on demand.
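
To see this argument at work, here is a small simulation sketch in R. It assumes a hypothetical supply shifter \(W\) (think of an input cost) added to the supply equation, so supply becomes \(Q = \beta_{1}P + \beta_{2}W + \epsilon_{s}\) while demand is unchanged; all parameter values are made up. Regressing \(P\) and \(Q\) on both exogenous variables (the reduced form, introduced in Section 3) and taking ratios of the coefficients, a technique known as indirect least squares and used here only for illustration, recovers both slopes: \(W\), which moves only supply, traces out the demand slope \(\alpha_{1}\), and \(I\), which moves only demand, traces out the supply slope \(\beta_{1}\).

```r
# Simulation sketch with made-up parameters and a hypothetical supply shifter W
set.seed(1)
n <- 100000
beta1  <- 1;  beta2  <- -1   # supply: Q = beta1*P + beta2*W + e_s
alpha1 <- -1; alpha2 <- 2    # demand: Q = alpha1*P + alpha2*I + e_d

inc <- rnorm(n, mean = 5)    # income I (shifts demand only)
W   <- rnorm(n, mean = 3)    # hypothetical supply shifter (shifts supply only)
e_s <- rnorm(n)
e_d <- rnorm(n)

# Market equilibrium: set supply equal to demand, solve for P, then Q
P <- (alpha2 * inc - beta2 * W + e_d - e_s) / (beta1 - alpha1)
Q <- beta1 * P + beta2 * W + e_s

# Regress each endogenous variable on both exogenous variables
rf_P <- coef(lm(P ~ inc + W))
rf_Q <- coef(lm(Q ~ inc + W))

# Ratios of the reduced-form coefficients recover the structural slopes
alpha1_hat <- unname(rf_Q["W"] / rf_P["W"])      # demand slope, traced out by W
beta1_hat  <- unname(rf_Q["inc"] / rf_P["inc"])  # supply slope, traced out by I
round(c(alpha1_hat = alpha1_hat, beta1_hat = beta1_hat), 2)
```

Dropping \(W\) from the simulation leaves only the ratio for \(I\), which is the situation in Figure 1: the supply slope can still be recovered, but the demand slope cannot.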

2.2 Necessary Condition for Identification

In a system of \(M\) simultaneous equations, which jointly determine the values of \(M\) endogenous variables, at least \(M-1\) of the system’s variables must be omitted from an equation for estimation of its parameters to be possible. When estimation of an equation’s parameters is possible, the equation is said to be identified, and its parameters can be consistently estimated. However, if fewer than \(M-1\) variables are omitted from an equation, it is said to be unidentified, and its parameters cannot be consistently estimated.
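
The counting rule is simple enough to express as a tiny R function. The helper order_condition() below is hypothetical (not from any package); it counts how many of the system’s variables an equation omits and compares that count with \(M-1\).

```r
# Hypothetical helper (not from any package) for the necessary condition above.
# system_vars: all variables (endogenous and exogenous) in the system
# eq_vars:     variables that appear in the equation being checked
# M:           number of equations, i.e. number of endogenous variables
order_condition <- function(system_vars, eq_vars, M) {
  n_omitted <- length(setdiff(system_vars, eq_vars))
  n_omitted >= M - 1
}

# Section 1 model: P and Q are endogenous, I is exogenous
vars <- c("Q", "P", "I")
order_condition(vars, c("Q", "P"), M = 2)       # TRUE: supply omits I
order_condition(vars, c("Q", "P", "I"), M = 2)  # FALSE: demand omits nothing

# Truffles model from Section 5
tv <- c("Q", "P", "PS", "DI", "PF")
order_condition(tv, c("Q", "P", "PF"), M = 2)        # TRUE: supply omits PS, DI
order_condition(tv, c("Q", "P", "PS", "DI"), M = 2)  # TRUE: demand omits PF
```

Because the condition is only necessary, a TRUE result does not by itself guarantee that an equation is identified.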

3. The Reduced-Form Equation

Our supply and demand functions from Section 1 can be solved to express the endogenous variables \(P\) and \(Q\) as functions of the exogenous variable \(I\). This transformation of the model is called the reduced form of the structural equation system. First, let’s revisit our supply and demand models:

\[ \begin{aligned} \text{Supply:} & Q = \beta_{1}P + \epsilon_{s} \\ \text{Demand:} & Q = \alpha_{1}P + \alpha_{2}I + \epsilon_{d} \end{aligned} \]

To solve for the endogenous variables \(P\) and \(Q\), we set the supply and demand functions equal to each other to get

\[ \begin{aligned} \beta_{1}P + \epsilon_{s} = \alpha_{1}P + \alpha_{2}I + \epsilon_{d} \end{aligned} \]

Then we can solve for \(P\) using simple algebra

\[ \begin{aligned} \beta_{1}P - \alpha_{1}P &= \alpha_{2}I + \epsilon_{d} - \epsilon_{s} \\ P &= \frac{\alpha_{2}I + \epsilon_{d} - \epsilon_{s}}{\beta_{1} - \alpha_{1}} \\ &= \bigg[\frac{\alpha_{2}}{\beta_{1} - \alpha_{1}}\bigg]I + \frac{\epsilon_{d} - \epsilon_{s}}{\beta_{1} - \alpha_{1}} \\ &= \pi_{1}I + v_{1} \end{aligned} \]

Then solve for \(Q\) by substituting \(P\) into the supply equation

\[ \begin{aligned} Q &= \beta_{1}P + \epsilon_{s} \\ &= \beta_{1}\bigg[\frac{\alpha_{2}}{\beta_{1} - \alpha_{1}}I + \frac{\epsilon_{d} - \epsilon_{s}}{\beta_{1} - \alpha_{1}} \bigg] + \epsilon_{s} \\ &= \frac{\beta_{1}\alpha_{2}}{\beta_{1} - \alpha_{1}}I + \frac{\beta_{1}\epsilon_{d} - \alpha_{1}\epsilon_{s}}{\beta_{1} - \alpha_{1}} \\ &= \pi_{2}I + v_{2} \end{aligned} \]

The parameters \(\pi_{1}\) and \(\pi_{2}\) are called reduced-form parameters and the error terms \(v_1\) and \(v_2\) are called reduced-form errors.

The reduced-form equations can be estimated consistently by least squares. The explanatory variable \(I\) is determined outside the system, so it is uncorrelated with \(v_1\) and \(v_2\), which have the usual properties of zero mean, constant variance, and zero covariance across observations.
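
As a worked consequence of the reduced-form parameters \(\pi_{1} = \alpha_{2}/(\beta_{1} - \alpha_{1})\) and \(\pi_{2} = \beta_{1}\alpha_{2}/(\beta_{1} - \alpha_{1})\) (assuming \(\alpha_{2} \neq 0\)), their ratio recovers the supply slope, while the demand parameters are tied together by a single equation. This is the algebra behind the two statements in Section 2.1:

\[ \begin{aligned} \frac{\pi_{2}}{\pi_{1}} &= \frac{\beta_{1}\alpha_{2}/(\beta_{1} - \alpha_{1})}{\alpha_{2}/(\beta_{1} - \alpha_{1})} = \beta_{1} \\ \pi_{1} &= \frac{\alpha_{2}}{\beta_{1} - \alpha_{1}} \;\Rightarrow\; \alpha_{2} = \pi_{1}(\beta_{1} - \alpha_{1}) \end{aligned} \]

Once \(\beta_{1}\) is known, the reduced-form coefficients supply only one more equation for the two unknowns \(\alpha_{1}\) and \(\alpha_{2}\). Every pair \((\alpha_{1}, \alpha_{2})\) with \(\alpha_{2} = \pi_{1}(\beta_{1} - \alpha_{1})\) produces the same \(\pi_{1}\) and \(\pi_{2}\), so the reduced-form coefficients cannot distinguish among them.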

The reduced-form equations are important for economic analysis. They relate the equilibrium values of the endogenous variables to the exogenous variables. Thus, if income \(I\) increases by one unit, \(\pi_{1}\) is the expected change in price \(P\) after market adjustments lead to a new equilibrium for \(P\) and \(Q\), and \(\pi_{2}\) is the corresponding expected change in \(Q\). The estimated reduced-form equations can be used to predict the values of equilibrium price and quantity for different levels of income.
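
As a small numerical illustration, the R sketch below uses made-up structural parameters (\(\beta_{1} = 0.5\), \(\alpha_{1} = -1.5\), \(\alpha_{2} = 4\); assumptions, not estimates) to compute the reduced-form parameters and the expected equilibrium price and quantity at a few income levels, then checks that each equilibrium satisfies both structural equations.

```r
# Made-up structural parameters for the Section 1 model (illustration only)
beta1  <- 0.5   # supply slope
alpha1 <- -1.5  # demand slope
alpha2 <- 4     # effect of income on demand

# Reduced-form parameters implied by the structural ones
pi1 <- alpha2 / (beta1 - alpha1)          # change in equilibrium P per unit of I
pi2 <- beta1 * alpha2 / (beta1 - alpha1)  # change in equilibrium Q per unit of I

# Expected equilibrium (errors at their mean of zero) at several incomes
income <- c(5, 10, 15)
eq <- data.frame(I = income, P = pi1 * income, Q = pi2 * income)
eq  # P = 10, 20, 30 and Q = 5, 10, 15

# Both structural equations hold at each predicted equilibrium
all.equal(eq$Q, beta1 * eq$P)                   # supply
all.equal(eq$Q, alpha1 * eq$P + alpha2 * eq$I)  # demand
```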

4. Two-Stage Least Squares Estimation

Two-stage least squares estimation is the most widely used method for estimating the parameters of identified structural equations. Recall that we cannot apply least squares to estimate \(\beta_{1}\) because the endogenous variable \(P\) on the right-hand side of the supply equation is correlated with the error term \(\epsilon_{s}\).

The variable \(P\) is composed of two components: a systematic part \(E(P)\) (its expected value) and a random component \(v_{1}\), which is the reduced-form error. Thus, \(P\) can be expressed as

\[ \begin{aligned} P = E(P) + v_{1} = \pi_{1}I + v_{1} \end{aligned} \]

The reduced-form error \(v_{1}\) is what causes the problem: it is \(v_{1}\) that makes \(P\) correlated with the error term \(\epsilon_{s}\). Suppose, however, that we knew the value of \(\pi_{1}\). Then we could substitute this decomposition for \(P\) in the original supply equation:

\[ \begin{aligned} Q &= \beta_{1}\big[E(P) + v_{1}\big] + \epsilon_{s} \\ &= \beta_{1}E(P) + \big(\beta_{1}v_{1} + \epsilon_{s}\big) \end{aligned} \]

In this equation the explanatory variable \(E(P)\) is not random, so it is uncorrelated with the composite error \(\beta_{1}v_{1} + \epsilon_{s}\), and least squares would estimate \(\beta_{1}\) consistently. Unfortunately, we cannot use \(E(P)=\pi_{1}I\) in place of \(P\) because we do not know the value of \(\pi_{1}\). However, we can replace \(\pi_{1}\) with its least squares estimate \(\hat{\pi}_{1}\) from the reduced-form equation for \(P\). A consistent estimator for \(E(P)\) is

\[ \begin{aligned} \hat{P} = \hat{\pi}_{1}I \end{aligned} \]

Using \(\hat{P}\) in lieu of \(E(P)\), we obtain an equation whose parameter \(\beta_{1}\) can be estimated by least squares, where \(\epsilon_{s}^{*} = \beta_{1}(P - \hat{P}) + \epsilon_{s}\):

\[ \begin{aligned} Q = \beta_{1}\hat{P} + \epsilon_{s}^{*} \end{aligned} \]

To summarize the procedure:

  1. Least squares estimation of the reduced-form equation for \(P\) and calculation of its predicted value \(\hat{P}\).
  2. Least squares estimation of the structural equation in which the endogenous variable \(P\) on the right-hand side is replaced by its predicted value \(\hat{P}\).
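
The two steps can be sketched in R on simulated data from the Section 1 model. The parameter values (\(\beta_{1} = 1\), \(\alpha_{1} = -1\), \(\alpha_{2} = 2\)), the normal distributions, and the sample size are assumptions chosen for illustration. Both regressions omit the intercept because the model has none.

```r
set.seed(42)
n <- 50000
beta1 <- 1; alpha1 <- -1; alpha2 <- 2   # made-up structural parameters

inc <- rnorm(n)                          # exogenous income I
e_s <- rnorm(n)                          # supply error
e_d <- rnorm(n)                          # demand error

# Generate equilibrium data from the reduced form for P and the supply equation
P <- (alpha2 * inc + e_d - e_s) / (beta1 - alpha1)
Q <- beta1 * P + e_s

# Naive least squares on the supply equation: P is correlated with e_s
b_ols <- unname(coef(lm(Q ~ 0 + P)))

# Stage 1: least squares on the reduced form for P; keep the fitted values
P_hat <- fitted(lm(P ~ 0 + inc))

# Stage 2: least squares of Q on P_hat
b_2sls <- unname(coef(lm(Q ~ 0 + P_hat)))

round(c(ols = b_ols, two_stage = b_2sls), 3)
```

With these settings the least squares slope should land well below \(\beta_{1} = 1\) (its large-sample limit, from the formula in the Appendix, is \(1 - 0.5/1.5 \approx 0.67\)), while the two-stage estimate should be close to 1.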

4.1 The General Two-Stage Least Squares Estimation Procedure

In a system of \(M\) simultaneous equations, let \(y_{1},y_{2}, \ldots, y_{M}\) denote the endogenous variables, and let there be \(K\) exogenous variables denoted by \(x_{1}, x_{2}, \ldots , x_{K}\). Suppose the first structural equation within this system is

\[ \begin{aligned} y_{1} = \alpha_{2}y_{2} + \alpha_{3}y_{3} + \beta_{1}x_{1} + \beta_{2}x_{2} + \epsilon_{1} \end{aligned} \]

If this equation is identified, then its parameters can be estimated in two steps:

  1. Estimate the parameters of the reduced-form equations by least squares:

\[ \begin{aligned} y_{2} = \pi_{12}x_{1} + \pi_{22}x_{2} + \ldots + \pi_{K2}x_{K} + v_{2} \\ y_{3} = \pi_{13}x_{1} + \pi_{23}x_{2} + \ldots + \pi_{K3}x_{K} + v_{3} \end{aligned} \]

and obtain the predicted values

\[ \begin{aligned} \hat{y}_{2} = \hat{\pi}_{12}x_{1} + \hat{\pi}_{22}x_{2} + \ldots + \hat{\pi}_{K2}x_{K} \\ \hat{y}_{3} = \hat{\pi}_{13}x_{1} + \hat{\pi}_{23}x_{2} + \ldots + \hat{\pi}_{K3}x_{K} \end{aligned} \]

  2. Replace the endogenous variables \(y_{2}\) and \(y_{3}\) on the right-hand side of the structural equation by their predicted values, and estimate the resulting equation by least squares:

\[ \begin{aligned} y_{1} = \alpha_{2}\hat{y}_{2} + \alpha_{3}\hat{y}_{3} + \beta_{1}x_{1} + \beta_{2}x_{2} + \epsilon_{1}^{*} \end{aligned} \]
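
The general procedure can be written compactly with matrices. The sketch below defines a hypothetical helper, tsls_coef() (not a library function), that returns only the coefficient estimates. X holds the right-hand-side variables of the structural equation and Z holds all exogenous variables in the system, which must include the exogenous variables that appear in X. Projecting X on Z is step 1 (an exogenous column such as \(x_{1}\) is reproduced exactly, so only the endogenous columns change), and regressing \(y_{1}\) on the projected columns is step 2. For brevity, the simulation generates \(y_{2}\) from a made-up reduced form instead of solving a full system; all values are assumptions.

```r
# Hypothetical helper: 2SLS coefficient estimates for one structural equation.
# y: response vector; X: right-hand-side variables (endogenous and exogenous);
# Z: all exogenous variables in the system (must include the exogenous columns of X).
tsls_coef <- function(y, X, Z) {
  # Stage 1: fitted values from regressing every column of X on Z
  X_hat <- Z %*% solve(crossprod(Z), crossprod(Z, X))
  # Stage 2: least squares of y on the stage-1 fitted values
  drop(solve(crossprod(X_hat), crossprod(X_hat, y)))
}

# Simulated example: y1 = 0.5*y2 + 1.0*x1 + e1, where y2 is endogenous
set.seed(7)
n <- 20000
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
shock <- rnorm(n)                        # common shock makes y2 endogenous
y2 <- x1 + x2 + x3 + shock + rnorm(n)    # made-up reduced form for y2
e1 <- shock + rnorm(n)
y1 <- 0.5 * y2 + 1.0 * x1 + e1

X <- cbind(y2 = y2, x1 = x1)
Z <- cbind(x1 = x1, x2 = x2, x3 = x3)
b <- tsls_coef(y1, X, Z)
round(b, 3)  # should be close to y2 = 0.5, x1 = 1

# Same numbers from two explicit lm() stages (no intercept, matching the model)
y2_hat <- fitted(lm(y2 ~ 0 + x1 + x2 + x3))
coef(lm(y1 ~ 0 + y2_hat + x1))
```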

5. An Example

Let’s try to estimate the supply and demand for truffles (mmmm…truffles). The first few rows of our data look like this:

p q ps di pf
29.64 19.89 19.97 2.103 10.52
40.23 13.04 18.04 2.043 19.67
34.71 19.61 22.36 1.870 13.74
41.43 17.13 20.87 1.525 17.95
53.37 22.55 19.79 2.709 13.71
38.52 6.37 15.98 2.489 24.95

The data set consists of 5 variables and 30 observations. The variable abbreviations stand for:

var meaning
p Price
q Quantity traded
ps Price of a substitute for real truffles
di Per capita monthly disposable income of individuals
pf Price of a factor of production

Our supply and demand equations are

\[ \begin{aligned} \text{Supply:} & Q_{i} = \beta_{1} + \beta_{2}P_{i} + \beta_{3}PF_{i} + \epsilon_{si} \\ \text{Demand:} & Q_{i} = \alpha_{1} + \alpha_{2}P_{i} + \alpha_{3}PS_{i} + \alpha_{4}DI_{i} + \epsilon_{di} \end{aligned} \]

5.1 Identification

Before we proceed to the reduced-form equations, recall the necessary condition for identification: in a system of \(M\) equations, at least \(M-1\) variables must be omitted from each equation. Here we have \(M=2\) equations, so each equation must omit at least \(2-1=1\) variable. The supply equation omits two variables (\(PS\) and \(DI\)) and the demand equation omits one (\(PF\)), so both equations satisfy the condition.

5.2 Reduced-Form Equations

The reduced-form equations express the endogenous variables as functions of the exogenous variables. In this case, our endogenous variables are \(P\) and \(Q\), and our exogenous variables are \(PS\), \(DI\), and \(PF\). Thus, our reduced-form equations are

\[ \begin{aligned} Q_{i} = \pi_{11} + \pi_{12}PS_{i} + \pi_{13}DI_{i} + \pi_{14}PF_{i} + v_{i1} \\ P_{i} = \pi_{21} + \pi_{22}PS_{i} + \pi_{23}DI_{i} + \pi_{24}PF_{i} + v_{i2} \end{aligned} \]

5.2.1 Model Estimation: The Manual Way

There’s a “long way” of estimating our model and a very short way. Let’s start with the long way, as it helps to understand the two-stage least squares procedure we’ve discussed.

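# Assumes the truffles data frame (columns p, q, ps, di, pf) is already loaded.

# Step 1. Estimate the reduced-form equations by least squares
q.lm <- lm(q ~ ps + di + pf, data = truffles)
p.lm <- lm(p ~ ps + di + pf, data = truffles)

# Step 2. Replace P on the right-hand side of each structural equation
# with its fitted value from the reduced form, then apply least squares.
# Note: the coefficients are the 2SLS estimates, but lm() computes residuals
# with phat instead of p, so these standard errors are not valid 2SLS ones.
truffles$phat <- fitted(p.lm)
demand.lm <- lm(q ~ phat + ps + di, data = truffles)
supply.lm <- lm(q ~ phat + pf, data = truffles)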

If you’re curious, here is the output of the reduced-form model for \(\hat{Q}\)

## 
## Call:
## lm(formula = q ~ ps + di + pf, data = truffles)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.1814 -1.1390  0.2765  1.4595  4.4318 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7.8951     3.2434   2.434 0.022099 *  
## ps            0.6564     0.1425   4.605 9.53e-05 ***
## di            2.1672     0.7005   3.094 0.004681 ** 
## pf           -0.5070     0.1213  -4.181 0.000291 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.68 on 26 degrees of freedom
## Multiple R-squared:  0.6974, Adjusted R-squared:  0.6625 
## F-statistic: 19.97 on 3 and 26 DF,  p-value: 6.332e-07

and for \(\hat{P}\)

## 
## Call:
## lm(formula = p ~ ps + di + pf, data = truffles)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -20.4825  -3.5927   0.2801   4.5326  12.9210 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -32.5124     7.9842  -4.072 0.000387 ***
## ps            1.7081     0.3509   4.868 4.76e-05 ***
## di            7.6025     1.7243   4.409 0.000160 ***
## pf            1.3539     0.2985   4.536 0.000115 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.597 on 26 degrees of freedom
## Multiple R-squared:  0.8887, Adjusted R-squared:  0.8758 
## F-statistic: 69.19 on 3 and 26 DF,  p-value: 1.597e-12

Our equation for \(\hat{P}\) is

\[ \begin{aligned} \hat{P} = -32.51 + 1.71PS + 7.60DI + 1.35PF \end{aligned} \]
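
As a quick arithmetic check on this equation, plugging in the first row of the data listing above (\(PS = 19.97\), \(DI = 2.103\), \(PF = 10.52\)) gives the first-stage predicted price for that observation. The sketch uses the rounded coefficients printed in the output, so it only approximates the value stored in the fitted model.

```r
# Reduced-form coefficients for P, copied from the lm() output above
pi_hat <- c(intercept = -32.5124, ps = 1.7081, di = 7.6025, pf = 1.3539)

# First row of the truffles data shown earlier (with a 1 for the intercept)
row1 <- c(intercept = 1, ps = 19.97, di = 2.103, pf = 10.52)

phat1 <- sum(pi_hat * row1)
round(phat1, 2)  # 31.83, compared with an observed price of 29.64
```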

We’ve already plugged \(\hat{P}\) into our structural equations, so let’s see what our estimated supply and demand functions are. Our estimated demand function is

## 
## Call:
## lm(formula = q ~ phat + ps + di, data = truffles)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.1814 -1.1390  0.2765  1.4595  4.4318 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.27947    3.01383  -1.420 0.167505    
## phat        -0.37446    0.08956  -4.181 0.000291 ***
## ps           1.29603    0.19309   6.712 4.03e-07 ***
## di           5.01398    1.24141   4.039 0.000422 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.68 on 26 degrees of freedom
## Multiple R-squared:  0.6974, Adjusted R-squared:  0.6625 
## F-statistic: 19.97 on 3 and 26 DF,  p-value: 6.332e-07

Our estimated supply equation is

## 
## Call:
## lm(formula = q ~ phat + pf, data = truffles)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.0732 -0.9754  0.5228  1.8115  3.8940 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 20.03280    2.16570    9.25 7.36e-10 ***
## phat         0.33798    0.04412    7.66 3.07e-08 ***
## pf          -1.00091    0.14613   -6.85 2.33e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.652 on 27 degrees of freedom
## Multiple R-squared:  0.6924, Adjusted R-squared:  0.6696 
## F-statistic: 30.38 on 2 and 27 DF,  p-value: 1.226e-07

Thus, our estimated supply and demand equations are

\[ \begin{aligned} \widehat{\text{Supply}} & = 20.03 + 0.34\hat{P} - 1.00PF \\ \widehat{\text{Demand}} & = -4.28 - 0.37\hat{P} + 1.30PS + 5.01DI \end{aligned} \]
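
Because the system is simultaneous, predicting equilibrium price and quantity from these estimates means solving the two estimated equations together. The sketch below does this for one set of exogenous values. equilibrium() is a hypothetical helper, the coefficients are the rounded point estimates from the output above, and estimation error in them is ignored.

```r
# Point estimates from the two second-stage regressions above
sup <- c(const = 20.03280, p = 0.33798, pf = -1.00091)
dem <- c(const = -4.27947, p = -0.37446, ps = 1.29603, di = 5.01398)

# Hypothetical helper: solve the estimated supply and demand equations
# simultaneously for the equilibrium (Q, P) at given exogenous values
equilibrium <- function(ps, di, pf) {
  # Each row is written as Q - slope * P = remaining terms
  A <- rbind(c(1, -sup[["p"]]),
             c(1, -dem[["p"]]))
  b <- c(sup[["const"]] + sup[["pf"]] * pf,
         dem[["const"]] + dem[["ps"]] * ps + dem[["di"]] * di)
  setNames(solve(A, b), c("Q", "P"))
}

# Exogenous values from the first row of the data listing
res <- equilibrium(ps = 19.97, di = 2.103, pf = 10.52)
round(res, 2)
```

For that row this gives an equilibrium price of about 31.78 and quantity of about 20.25, versus observed values of 29.64 and 19.89.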

5.2.2 Model Estimation: The Easy Way

Luckily, the sem package on CRAN provides a two-stage least squares function, tsls(), that makes estimating our structural equations a breeze. You pass each structural equation as a formula, along with a one-sided formula listing the instruments (here, all the exogenous variables in the system), and it returns the 2SLS estimates for that equation.

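# install.packages("sem")  # run once if sem is not installed
library(sem)
# tsls(structural equation, ~ instruments, data); p on the right-hand side is endogenous
demand <- tsls(q ~ p + ps + di, ~ ps + di + pf, data = truffles)
supply <- tsls(q ~ p + pf, ~ ps + di + pf, data = truffles)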

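As you can see, the coefficient estimates are the same as in the manual approach, but the standard errors differ. The manual second-stage lm() computes its residuals using \(\hat{P}\) instead of the observed \(P\), so its standard errors are not valid 2SLS standard errors; the ones reported by tsls() are. Here is the estimated demand equation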

summary(demand)
## 
##  2SLS Estimates
## 
## Model Formula: q ~ p + ps + di
## 
## Instruments: ~ps + di + pf
## 
## Residuals:
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -14.8500  -2.5390   0.9025   0.0000   3.1160   7.5830 
## 
##               Estimate Std. Error  t value  Pr(>|t|)   
## (Intercept) -4.2794706  5.5438844 -0.77193 0.4471180   
## p           -0.3744591  0.1647517 -2.27287 0.0315350 * 
## ps           1.2960332  0.3551932  3.64881 0.0011601 **
## di           5.0139771  2.2835559  2.19569 0.0372352 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.92996 on 26 degrees of freedom

and our estimated supply equation

summary(supply)
## 
##  2SLS Estimates
## 
## Model Formula: q ~ p + pf
## 
## Instruments: ~ps + di + pf
## 
## Residuals:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -3.7830 -0.8530  0.2270  0.0000  0.7578  3.3480 
## 
##                Estimate  Std. Error   t value   Pr(>|t|)    
## (Intercept) 20.03280215  1.22311480  16.37851 1.5543e-15 ***
## p            0.33798157  0.02491956  13.56290 1.4344e-13 ***
## pf          -1.00090937  0.08252794 -12.12813 1.9456e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.4975853 on 27 degrees of freedom
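
Comparing these results with the manual ones, the coefficients agree but the standard errors do not. The manual second stage computes residuals with \(\hat{P}\), whereas 2SLS standard errors should use residuals computed with the observed \(P\). In the outputs above, each tsls() standard error equals the manual one scaled by the ratio of the residual standard errors (for price in the demand equation, \(0.08956 \times 4.92996 / 2.68 \approx 0.1647\)). The R sketch below applies that correction to a manual second stage on simulated data; the data-generating process and parameter values are made up for illustration.

```r
set.seed(3)
n <- 500
inc <- rnorm(n, mean = 5)   # demand shifter (income)
w   <- rnorm(n, mean = 3)   # supply shifter (plays the role of PF)
e_s <- rnorm(n); e_d <- rnorm(n)

# Made-up system: supply Q = 1 + P - w + e_s, demand Q = 10 - P + 2*inc + e_d
P <- (9 + 2 * inc + w + e_d - e_s) / 2
Q <- 1 + P - w + e_s
dat <- data.frame(Q, P, inc, w)

# Manual two stages for the supply equation
dat$P_hat <- fitted(lm(P ~ inc + w, data = dat))
stage2 <- lm(Q ~ P_hat + w, data = dat)

# Residuals computed with the observed P, not P_hat
X_obs <- cbind(1, dat$P, dat$w)
u <- dat$Q - drop(X_obs %*% coef(stage2))
sigma_2sls <- sqrt(sum(u^2) / df.residual(stage2))

# Rescale lm()'s standard errors by the ratio of residual standard errors
se_manual <- sqrt(diag(vcov(stage2)))
se_2sls <- se_manual * sigma_2sls / sigma(stage2)
cbind(estimate = coef(stage2), se_manual, se_2sls)
```

Applied to the truffles fits, the same rescaling should reproduce the standard errors reported by tsls(), as the ratio check above suggests.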

Conclusion

Unlike regular regression models, simultaneous equation models have two or more jointly determined dependent variables. They are a great tool for estimating supply and demand functions, because applying ordinary least squares to each equation separately gives biased and inconsistent estimators. The most common technique for estimating simultaneous equation models is two-stage least squares. The first stage estimates the reduced form, which expresses each endogenous variable as a function of the system’s exogenous variables, by least squares. The second stage replaces the endogenous variables on the right-hand side of each identified structural equation with their fitted values from the first stage and applies least squares again, giving consistent estimates of the structural parameters.

Appendix: An Algebraic Explanation of the Failure of Least Squares

Consider the following supply and demand functions:

\[ \begin{aligned} \text{Supply:} & Q = \beta_{1}P + \epsilon_{s} \\ \text{Demand:} & Q = \alpha_{1}P + \alpha_{2}I + \epsilon_{d} \end{aligned} \]

To explain why least squares fails, let’s first obtain the covariance between \(P\) and \(\epsilon_{s}\).

\[ \begin{aligned} \text{cov}(P, \epsilon_{s}) &= E\big\{[P-E(P)][\epsilon_{s} - E(\epsilon_{s})]\big\} \\ &= E(P\epsilon_{s}) && (\text{Since } E(\epsilon_{s}) = 0) \\ &= E\big[(\pi_{1}I + v_{1})\epsilon_{s}\big] && (\text{Substitute for } P) \\ &= E\Bigg[\frac{\epsilon_{d} - \epsilon_{s}}{\beta_{1} - \alpha_{1}}\epsilon_{s}\Bigg] && (\text{Since } I \text{ is exogenous}) \\ &= \frac{-E(\epsilon_{s}^{2})}{\beta_{1} - \alpha_{1}} && (\text{Since } \text{cov}(\epsilon_{d}, \epsilon_{s}) = 0) \\ &= \frac{-\sigma_{s}^{2}}{\beta_{1} - \alpha_{1}} < 0 && (\text{Since } \beta_{1} > 0 > \alpha_{1}) \end{aligned} \]

So, what effect does the negative covariance have on the least squares estimator? The least squares estimator of the supply equation, without an intercept, is

\[ \begin{aligned} b_{1} = \frac{\sum{P_{i}Q_{i}}}{\sum{P_{i}^{2}}} \end{aligned} \]

Plug in \(Q\) from the supply equation and simplify

\[ \begin{aligned} b_1 = \frac{\sum{P_{i}(\beta_{1}P_{i}+\epsilon_{si})}}{\sum{P_{i}^{2}}} = \beta_{1}+\sum{\Bigg(\frac{P_{i}}{\sum{P_{i}^{2}}}\Bigg)}\epsilon_{si} = \beta_{1} + \sum{h_{i}\epsilon_{si}} \end{aligned} \]

where \(h_{i} = P_{i}/\sum{P_{i}^{2}}\). The least squares estimator is biased because \(\epsilon_{s}\) and \(P\) are correlated, implying \(E(h_{i}\epsilon_{si}) \neq 0\).

In large samples there is still a similar failure. Multiply both sides of the supply equation by price \(P\), take expectations, and solve for \(\beta_{1}\).

\[ \begin{aligned} PQ = \beta_{1}P^{2} + P\epsilon_{s} \\ E(PQ)=\beta_{1}E(P^{2}) + E(P\epsilon_{s}) \\ \beta_{1} = \frac{E(PQ)}{E(P^{2})} - \frac{E(P\epsilon_{s})}{E(P^{2})} \end{aligned} \]

In large samples where \(N \to \infty\), sample analogs of expectations converge to the expectations. That is,

\[ \begin{aligned} \sum{Q_{i}P_{i}}/N \to E(PQ), && \sum{P_{i}^{2}}/N \to E\big(P^{2}\big) \end{aligned} \]

Because the covariance between \(P\) and \(\epsilon_{s}\) is negative,

\[ \begin{aligned} b_1=\frac{\sum{Q_{i}P_{i}/N}}{\sum{P_{i}^{2}/N}} \to \frac{E(PQ)}{E\big(P^{2}\big)}=\beta_{1} + \frac{E(P\epsilon_{s})}{E\big(P^{2}\big)}=\beta_{1}-\frac{\sigma_{s}^{2}/(\beta_{1}-\alpha_{1})}{E\big(P^{2}\big)} < \beta_{1} \end{aligned} \]

In large samples, the least squares estimator of the slope of the supply equation converges to a value less than \(\beta_{1}\).
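
The large-sample result can be checked numerically. The R sketch below simulates one large sample from the model with made-up values \(\beta_{1} = 1\), \(\alpha_{1} = -1\), \(\alpha_{2} = 1\), \(\sigma_{s} = \sigma_{d} = 1\), and income drawn from a normal distribution with mean 2 and variance 1 (so \(E(I^{2}) = 5\)); all of these are assumptions. It compares the sample covariance of \(P\) and \(\epsilon_{s}\) with \(-\sigma_{s}^{2}/(\beta_{1}-\alpha_{1}) = -0.5\), and the no-intercept least squares slope with the limit derived above, using \(E(P^{2}) = \pi_{1}^{2}E(I^{2}) + \text{var}(v_{1})\), which holds because \(I\) is independent of \(v_{1}\).

```r
set.seed(11)
n <- 1e6
beta1 <- 1; alpha1 <- -1; alpha2 <- 1   # made-up structural parameters
sigma_s <- 1; sigma_d <- 1

inc <- rnorm(n, mean = 2, sd = 1)       # exogenous income, so E(I^2) = 5
e_s <- rnorm(n, sd = sigma_s)
e_d <- rnorm(n, sd = sigma_d)
P <- (alpha2 * inc + e_d - e_s) / (beta1 - alpha1)
Q <- beta1 * P + e_s

# Sample covariance of P and e_s; theory gives -sigma_s^2 / (beta1 - alpha1) = -0.5
cov_sample <- cov(P, e_s)

# Least squares slope without an intercept: b1 = sum(P*Q) / sum(P^2)
b1 <- sum(P * Q) / sum(P^2)

# Theoretical limit: beta1 - [sigma_s^2 / (beta1 - alpha1)] / E(P^2)
pi1 <- alpha2 / (beta1 - alpha1)
EP2 <- pi1^2 * 5 + (sigma_d^2 + sigma_s^2) / (beta1 - alpha1)^2
plim_b1 <- beta1 - (sigma_s^2 / (beta1 - alpha1)) / EP2

round(c(cov_sample = cov_sample, b1 = b1, plim_b1 = plim_b1), 3)
```

With these values \(E(P^{2}) = 1.75\), so the limit is \(1 - 0.5/1.75 = 5/7 \approx 0.714\), below the true slope of 1.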