This week, we’re going to think about how unobservables might influence empirical results.

Interpretation and discussion

You’ll be assigned to groups where you’ll work to interpret the results below. Professor Rao will visit each group to clarify any questions. At the end, we’ll reconvene and Professor Rao will go over some of the key issues.

You’ll be using the omitted variable bias (OVB) sign theorem a lot in this exercise. As a reminder, the theorem is stated below.

OVB sign theorem: Given a causal system where X affects Y, and Z affects both X and Y, the sign of the bias from omitting Z is the product of two signs: the sign of the correlation between X and Z, and the sign of the correlation between Y and Z. That is, letting \(sign(x) = +1\) if \(x > 0\) and \(sign(x) = -1\) if \(x < 0\), and letting \(\hat\beta\) be the estimate of the true parameter \(\beta\) from the regression that omits Z,

\[sign(E[\hat{\beta} - \beta]) = sign(corr(X,Z)) \cdot sign(corr(Y,Z)).\]
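
To see the theorem at work, here is a minimal simulation sketch in R. Everything in it is an assumption made for illustration: the variable names (`x`, `y`, `z`), the coefficient values, and the sample size are made up and have nothing to do with the exercise data. It checks the theorem in one simulated case; it is not a proof.

```r
# Illustrative check of the OVB sign theorem on simulated data.
# All numbers below are hypothetical choices, not estimates from real data.
set.seed(42)
n <- 10000
true_beta <- -2                          # assumed true effect of x on y

z <- rnorm(n)                            # the variable we will omit
x <- 1.5 * z + rnorm(n)                  # z pushes x up
y <- true_beta * x + 6 * z + rnorm(n)    # z also pushes y up

short_fit <- lm(y ~ x)                   # omits z
long_fit  <- lm(y ~ x + z)               # controls for z

beta_short <- unname(coef(short_fit)["x"])
beta_long  <- unname(coef(long_fit)["x"])

# Sign of the bias in the short regression vs. the theorem's prediction
bias_sign      <- sign(beta_short - true_beta)
predicted_sign <- sign(cor(x, z)) * sign(cor(y, z))

cat("short regression estimate:", round(beta_short, 3), "\n")
cat("long regression estimate: ", round(beta_long, 3), "\n")
cat("sign of bias:", bias_sign, " predicted sign:", predicted_sign, "\n")
```

In this made-up setup both correlations are positive, so the theorem predicts an upward bias. The regression that omits `z` overstates the coefficient on `x`, while the regression that includes `z` lands close to `true_beta`.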

What happens when the price of a good falls?

Economists think we have the answer: price falls, quantity demanded increases (i.e., the Law of Demand). We typically model this by saying that quantity demanded is a decreasing function of the price, something like \(Q_d = \alpha + \beta P + \epsilon\) with \(\beta < 0\). A similar model of quantity supplied would have \(\beta > 0\), and in equilibrium \(Q_d = Q_s\).

But is that actually true?

You say that, but… 🤔 🤔 🤔

You work for a major retailer that operates in 5 cities in the US. Your task is to figure out the profit-maximizing price to charge for a store-brand product. First, though, you need to understand the relationship between price and quantity. You take some sales data from two years (properly randomly sampled and so on) and plot the relationship.

That’s odd… It seems like there’s a positive relationship between price and quantity sold? To investigate further, you run a regression like the one above.

summary(lm(quantities ~ prices, data = dfrm))
## 
## Call:
## lm(formula = quantities ~ prices, data = dfrm)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1399.46  -182.88   -19.88   177.46  1677.17 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1389.023     53.567   25.93   <2e-16 ***
## prices       100.010      7.515   13.31   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 350.9 on 998 degrees of freedom
## Multiple R-squared:  0.1507, Adjusted R-squared:  0.1499 
## F-statistic: 177.1 on 1 and 998 DF,  p-value: < 2.2e-16

Interesting… it looks like the coefficient on prices is positive, large in magnitude, and statistically significant. If this is right then you should increase prices to increase your sales! You know your boss will be skeptical of this though, so you dig in further to make sure you’re not missing something.

1. Digging in

First, let’s look at some summary stats on this data:

summary(dfrm)
##      prices        quantities          year         cities         
##  Min.   :4.290   Min.   : 954.7   Min.   :2017   Length:1000       
##  1st Qu.:5.630   1st Qu.:1831.4   1st Qu.:2017   Class :character  
##  Median :7.015   Median :2024.2   Median :2018   Mode  :character  
##  Mean   :6.974   Mean   :2086.5   Mean   :2018                     
##  3rd Qu.:8.293   3rd Qu.:2286.7   3rd Qu.:2018                     
##  Max.   :9.650   Max.   :4002.3   Max.   :2018
# Looks like cities is a character vector; what cities do we have here?
unique(dfrm$cities)
## [1] "Boulder, CO"       "South Bend, IN"    "San Francisco, CA" "Waterbury, VT"    
## [5] "Honolulu, HI"
# Ok, good to know!
  1. What does the regression table above say? Interpret the estimates of \(\alpha\) and \(\beta\).

  2. Does this support the Law of Demand? Why or why not?

  3. Explain your result in words. What might be driving the relationship you observe?

2. Thinking about unobservables
  1. You are concerned that there might be some time trends confounding your results, so you include year in the regression. Interpret the results below. What does the coefficient on year mean? Are the results consistent with year being an omitted variable?
summary(lm(quantities ~ prices + year, data = dfrm))
## 
## Call:
## lm(formula = quantities ~ prices + year, data = dfrm)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1381.68  -181.83   -18.84   186.33  1659.19 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -70809.738  44733.396  -1.583    0.114    
## prices         100.050      7.509  13.324   <2e-16 ***
## year            35.786     22.173   1.614    0.107    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 350.6 on 997 degrees of freedom
## Multiple R-squared:  0.1529, Adjusted R-squared:  0.1512 
## F-statistic:    90 on 2 and 997 DF,  p-value: < 2.2e-16
  1. As a final check, you include cities as a variable in the regression. It’s a character vector, but R is clever and converts the character vector into a set of indicator variables, one for each city, dropping one to avoid the dummy variable trap.
summary(lm(quantities ~ prices + cities, data = dfrm))
## 
## Call:
## lm(formula = quantities ~ prices + cities, data = dfrm)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1307.4  -176.5    -2.5   164.4  1740.4 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              2698.20     192.99  13.981  < 2e-16 ***
## prices                    -83.80      27.40  -3.059  0.00228 ** 
## citiesHonolulu, HI        348.00      64.57   5.390 8.81e-08 ***
## citiesSan Francisco, CA   181.40      43.46   4.174 3.25e-05 ***
## citiesSouth Bend, IN     -447.58      65.56  -6.827 1.51e-11 ***
## citiesWaterbury, VT      -218.69      44.02  -4.968 7.96e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 342.8 on 994 degrees of freedom
## Multiple R-squared:  0.1925, Adjusted R-squared:  0.1884 
## F-statistic: 47.38 on 5 and 994 DF,  p-value: < 2.2e-16
  1. Interpret each of the coefficient estimates in the results above, including the intercept. Are these results consistent with the Law of Demand?

  2. Are the results above consistent with cities being an omitted variable? If so, apply the OVB sign theorem to determine how the unobservable captured by cities must be correlated with prices and quantities. Give two examples of an unobservable like this.
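
The conversion of `cities` into indicator variables, mentioned before the last regression, can be inspected directly with `model.matrix()`. The sketch below uses a tiny hypothetical data frame called `toy` (not the exercise data; the prices are made up) to show which columns R creates and which city becomes the omitted reference category.

```r
# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  prices = c(5.0, 6.5, 8.0, 7.2, 9.1, 4.8),
  cities = c("Boulder, CO", "Honolulu, HI", "South Bend, IN",
             "Boulder, CO", "Honolulu, HI", "South Bend, IN"),
  stringsAsFactors = FALSE
)

# model.matrix() builds the design matrix that lm() would use for this formula.
design <- model.matrix(~ prices + cities, data = toy)
print(colnames(design))

# A character column is treated as a factor with sorted levels; the first
# level is the reference category and gets no indicator column of its own.
print(levels(factor(toy$cities)))

# To choose a different reference city, convert to a factor and relevel it.
toy$cities_f <- relevel(factor(toy$cities), ref = "Honolulu, HI")
design_releveled <- model.matrix(~ prices + cities_f, data = toy)
print(colnames(design_releveled))
```

Each city coefficient is measured relative to the reference city, and the intercept absorbs the reference city's level. That is why the regression output above has no row for the city whose name sorts first.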

3. Your questions/thoughts
  1. What questions are you left with after this exercise?