This week, we’re going to think about how unobservables might influence empirical results.
You’ll be assigned to groups where you’ll work to interpret the results below. Professor Rao will visit each group to clarify any questions. At the end, we’ll reconvene and Professor Rao will go over some of the key issues.
You’ll be using the OVB sign theorem a lot in this exercise. As a reminder, the OVB sign theorem is given below.
OVB sign theorem: Given a causal system where X affects Y, and Z affects both X and Y, the sign of the omitted variable bias is the product of two signs: the sign of the correlation between X and Z, and the sign of the correlation between Y and Z. That is, letting \(sign(x) = +1\) if \(x > 0\) and \(sign(x) = -1\) if \(x < 0\), and letting \(\hat\beta\) be the estimate of the true parameter \(\beta\),
\[sign(E[\hat{\beta} - \beta]) = sign(corr(X,Z)) \cdot sign(corr(Y,Z)).\]
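For reference, the theorem can be motivated by the standard omitted variable bias formula. What follows is a sketch under assumptions not stated in this handout: the true model is linear, \(\gamma\) denotes the effect of Z on Y, and \(\nu\) is an error term uncorrelated with X and Z.
\[Y = \alpha + \beta X + \gamma Z + \nu\]
If Y is regressed on X alone, then in large samples the estimate converges to
\[\hat{\beta} \to \beta + \gamma \cdot \frac{Cov(X,Z)}{Var(X)},\]
so that, because \(Var(X) > 0\),
\[sign(\hat{\beta} - \beta) = sign(\gamma) \cdot sign(Cov(X,Z)) = sign(\gamma) \cdot sign(corr(X,Z)).\]
The theorem replaces \(sign(\gamma)\) with \(sign(corr(Y,Z))\). The two usually agree, but they can differ when the effect of Z on Y that runs through X outweighs Z's direct effect, so the theorem is best read as a rule of thumb for cases where the direct effect dominates.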
Economists think we have the answer: when price falls, quantity demanded increases (i.e., the Law of Demand). We typically model this by saying that quantity demanded is a decreasing function of the price, something like \(Q_d = \alpha + \beta P + \epsilon\) with \(\beta < 0\). A similar model of quantity supplied would have \(\beta > 0\), and in equilibrium \(Q_d = Q_s\).
But is that actually true?
You say that, but… 🤔 🤔 🤔
You work for a major retailer that operates in five cities in the US. Your task is to figure out the profit-maximizing price to charge for a store-brand product. First, though, you need to understand the relationship between price and quantity. You take two years of sales data (properly randomly sampled and so on) and plot the relationship.
That’s odd… It seems like there’s a positive relationship between price and quantity sold? To investigate further, you run a regression like the one above.
summary(lm(quantities ~ prices, data = dfrm))
##
## Call:
## lm(formula = quantities ~ prices, data = dfrm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1399.46 -182.88 -19.88 177.46 1677.17
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1389.023 53.567 25.93 <2e-16 ***
## prices 100.010 7.515 13.31 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 350.9 on 998 degrees of freedom
## Multiple R-squared: 0.1507, Adjusted R-squared: 0.1499
## F-statistic: 177.1 on 1 and 998 DF, p-value: < 2.2e-16
Interesting… it looks like the coefficient on prices is positive, large in magnitude, and statistically significant. If this is right, then you should increase prices to increase your sales! You know your boss will be skeptical of this, though, so you dig in further to make sure you’re not missing something.
First, let’s look at some summary stats on this data:
summary(dfrm)
## prices quantities year cities
## Min. :4.290 Min. : 954.7 Min. :2017 Length:1000
## 1st Qu.:5.630 1st Qu.:1831.4 1st Qu.:2017 Class :character
## Median :7.015 Median :2024.2 Median :2018 Mode :character
## Mean :6.974 Mean :2086.5 Mean :2018
## 3rd Qu.:8.293 3rd Qu.:2286.7 3rd Qu.:2018
## Max. :9.650 Max. :4002.3 Max. :2018
# Looks like cities is a character vector; what cities do we have here?
unique(dfrm$cities)
## [1] "Boulder, CO" "South Bend, IN" "San Francisco, CA" "Waterbury, VT"
## [5] "Honolulu, HI"
# Ok, good to know!
What does the regression table above say? Interpret the estimates of \(\alpha\) and \(\beta\).
Does this support the Law of Demand? Why or why not?
Explain your result in words. What might be driving the relationship you observe?
Now include year in the regression. Interpret the results below. What does the coefficient on year mean? Are the results consistent with year being an omitted variable?
summary(lm(quantities ~ prices + year, data = dfrm))
##
## Call:
## lm(formula = quantities ~ prices + year, data = dfrm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1381.68 -181.83 -18.84 186.33 1659.19
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -70809.738 44733.396 -1.583 0.114
## prices 100.050 7.509 13.324 <2e-16 ***
## year 35.786 22.173 1.614 0.107
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 350.6 on 997 degrees of freedom
## Multiple R-squared: 0.1529, Adjusted R-squared: 0.1512
## F-statistic: 90 on 2 and 997 DF, p-value: < 2.2e-16
Now include cities as a variable in the regression. It’s a character vector, but R is clever and converts the character vector into a set of indicator variables, one for each city, dropping one to avoid the dummy variable trap.
summary(lm(quantities ~ prices + cities, data = dfrm))
##
## Call:
## lm(formula = quantities ~ prices + cities, data = dfrm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1307.4 -176.5 -2.5 164.4 1740.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2698.20 192.99 13.981 < 2e-16 ***
## prices -83.80 27.40 -3.059 0.00228 **
## citiesHonolulu, HI 348.00 64.57 5.390 8.81e-08 ***
## citiesSan Francisco, CA 181.40 43.46 4.174 3.25e-05 ***
## citiesSouth Bend, IN -447.58 65.56 -6.827 1.51e-11 ***
## citiesWaterbury, VT -218.69 44.02 -4.968 7.96e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 342.8 on 994 degrees of freedom
## Multiple R-squared: 0.1925, Adjusted R-squared: 0.1884
## F-statistic: 47.38 on 5 and 994 DF, p-value: < 2.2e-16
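To see how R turns a character column into indicator variables, it can help to inspect the design matrix directly. The sketch below uses a small made-up data frame (`toy` and its values are hypothetical, not taken from `dfrm`) and `model.matrix()`, which builds the same kind of matrix that `lm()` uses internally.

```r
# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  prices = c(5.0, 6.5, 8.0, 7.2),
  cities = c("Boulder, CO", "Honolulu, HI", "South Bend, IN", "Honolulu, HI")
)

# Build the design matrix for a formula with a character regressor.
# The character column is treated as a factor with alphabetically sorted levels.
X <- model.matrix(~ prices + cities, data = toy)

# One indicator column per city, except the first level, which is the baseline.
print(colnames(X))
print(X)
```

The first level in alphabetical order ("Boulder, CO" here) gets no column of its own, so each city coefficient is read as a difference relative to that baseline. If a different baseline is preferred, one option is to convert the column to a factor and call `relevel()` with the desired `ref` level before fitting.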
Interpret each of the coefficient estimates in the results above, including the intercept. Are these results consistent with the Law of Demand?
Are the results above consistent with cities being an omitted variable? If so, apply the OVB sign theorem to determine how the unobservable captured by cities must be correlated with prices and quantities. Give two examples of an unobservable like this.
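One way to check sign reasoning of this kind is a small simulation in which the omitted variable is known by construction. The sketch below is generic and is not a model of the retailer data: the variable names, coefficients, and seed are all assumptions chosen for illustration. It compares the sign of the bias in a regression that omits `z` with the sign predicted by the OVB sign theorem.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
n <- 5000

# Assumed data-generating process: z raises both x and y,
# while the true causal effect of x on y is negative.
z <- rnorm(n)
x <- 1.0 * z + rnorm(n)
beta_true <- -1
y <- beta_true * x + 3 * z + rnorm(n)

# "Short" regression omits z; "long" regression controls for it.
b_short <- unname(coef(lm(y ~ x))["x"])
b_long  <- unname(coef(lm(y ~ x + z))["x"])

# Sign of the realized bias versus the sign the theorem predicts.
bias_sign      <- sign(b_short - beta_true)
predicted_sign <- sign(cor(x, z)) * sign(cor(y, z))

cat("short-regression slope:", b_short, "\n")
cat("long-regression slope: ", b_long, "\n")
cat("bias sign:", bias_sign, " predicted sign:", predicted_sign, "\n")
```

Changing the sign of the coefficient on `z` in either the `x` or the `y` equation is a quick way to see how the direction of the bias responds.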