This week, we’re going to continue thinking about non-experimental study design and about interpreting the results of statistical analyses.

Discuss the following questions as a group. Professor Rao will visit each group to help clarify any questions you may have. Nominate a member of your group to communicate your thoughts/conclusions to the class as a whole when we reconvene.

Modeling the probability of air quality monitors being in a specific site

Grainger and Schreiber build a model of the probability that an air quality monitor is placed in a specific site. A “site” in their study is a 0.05 x 0.05 degree grid cell. In effect, they divide the country into a grid of small boxes and model the probability that an air quality monitor is placed in any given box.
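To make the idea of a grid cell concrete, here is a minimal sketch of how a latitude/longitude point could be assigned to a 0.05 x 0.05 degree box. The function name `grid_cell` and the (row, column) indexing scheme are illustrative assumptions, not the authors' actual procedure.

```python
import math

CELL_SIZE_DEG = 0.05  # grid resolution stated in the text


def grid_cell(lat: float, lon: float, cell_size: float = CELL_SIZE_DEG) -> tuple[int, int]:
    """Return a (row, col) index for the grid cell containing the point.

    Hypothetical indexing: row counts cells north of the equator,
    col counts cells east of the prime meridian (negative to the west).
    """
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))


# Two nearby points that fall in the same box share an index.
point_a = grid_cell(40.712, -74.006)
point_b = grid_cell(40.710, -74.010)
print(point_a, point_b, point_a == point_b)
```

One degree of latitude is roughly 111 km, so each box is about 5.5 km from north to south. The east-west width shrinks as you move away from the equator.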

They model this probability as a function of many factors: air pollution levels, the demographics of the box, home prices in the box, whether the county has been designated as “non-attainment” (has excess measured air pollution levels, in violation of the relevant regulations), and idiosyncratic spatial and time-varying effects (the “fixed effects”).
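One generic way to write down a model of this kind is sketched below. The notation is an assumption made for illustration, and the paper's exact specification may differ: $i$ indexes grid cells, $t$ indexes time periods, $c(i)$ is the county containing cell $i$, and $F$ is a link function (the identity for a linear probability model, the logistic function for a logit).

```latex
\Pr(\text{Monitor}_{it} = 1) = F\big(
    \beta_1 \,\text{Pollution}_{it}
  + \beta_2 \,\text{Demographics}_{it}
  + \beta_3 \,\text{HomePrice}_{it}
  + \beta_4 \,\text{NonAttainment}_{c(i)t}
  + \alpha_i + \gamma_t
\big)
```

Here $\alpha_i$ and $\gamma_t$ stand in for the spatial and time fixed effects, and each $\beta$ describes how the probability of a monitor being sited changes with the corresponding factor, holding the others fixed.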

A note on terminology: “indicator” variables are discrete variables which take only the values 0 or 1. For example, “Attainment” is an indicator variable which is 1 if the county is “in attainment” (not in violation) and 0 if the county is “in non-attainment” (in violation).
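As a small illustration, the sketch below builds an “Attainment” indicator from text labels. The status strings and the example county list are made up for illustration and are not data from the paper.

```python
def attainment_indicator(status: str) -> int:
    """Return 1 if the county is in attainment, 0 otherwise."""
    return 1 if status == "attainment" else 0


# Hypothetical statuses for four counties.
statuses = ["attainment", "non-attainment", "attainment", "attainment"]
indicators = [attainment_indicator(s) for s in statuses]

# The mean of an indicator variable is the share of cases where it equals 1.
share_in_attainment = sum(indicators) / len(indicators)
print(indicators, share_in_attainment)
```

One convenient property of indicators is that their average is a proportion: here, the share of counties in attainment.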

1. What data would you use to determine whether air quality monitor siting decisions are systematically biased?
  1. Grainger and Schreiber use remote sensing (satellite imaging) data to determine the prevalence of pollution at sites. What do you think are the advantages and disadvantages of using this kind of data?
  2. Suppose you couldn’t use remote sensing data. How else might you get representative samples of air pollution prevalence at sites? What are the advantages and disadvantages of your method?
2. Modeling the probability of putting air quality monitors in a specific site
  1. Suppose you had (reasonably accurate) measurements of the amount of a pollutant at each site, measured in units of parts per million (ppm). What are the advantages and disadvantages of using the raw measurements? What else might you use instead?
  2. Suppose you were in charge of siting air quality monitors, but due to limited funds you couldn’t place a monitor at every site. How would you decide which sites should get monitors and which sites shouldn’t?
3. What do Grainger and Schreiber find?

For these questions, refer to Figure 1 of the paper.

  1. What does the y-axis measure?
  2. Some of the 95% confidence intervals are near zero or include zero. How does this affect your interpretation of the estimates?
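As a reminder of how a 95% confidence interval is typically constructed, here is a minimal sketch using the normal approximation (estimate plus or minus 1.96 standard errors). The numbers are hypothetical and are not taken from the paper's Figure 1, and the paper may compute its intervals differently.

```python
Z_95 = 1.96  # normal critical value for a two-sided 95% interval


def confidence_interval_95(estimate: float, std_error: float) -> tuple[float, float]:
    """Return (lower, upper) bounds of an approximate 95% confidence interval."""
    half_width = Z_95 * std_error
    return (estimate - half_width, estimate + half_width)


def includes_zero(interval: tuple[float, float]) -> bool:
    """Return True if zero lies inside the interval."""
    lower, upper = interval
    return lower <= 0.0 <= upper


# Hypothetical estimates: one imprecise, one more precise.
wide = confidence_interval_95(0.02, 0.015)
narrow = confidence_interval_95(0.05, 0.01)
print(wide, includes_zero(wide))
print(narrow, includes_zero(narrow))
```

When discussing the figure, consider what it means for an interval to contain zero and how that differs from an interval that lies close to zero but excludes it.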