Simulation and inference

This is the second challenge question for the course. Remember that it’s meant to be challenging and to stretch your skills, but you can always ask for help or collaborate with your classmates. There is an article at the bottom which provides broader context for the questions we’re studying here.

How are air quality monitors sited?

Air quality is usually regulated (when it is regulated) based on readings from monitors which track ambient air pollution concentrations. Suppose air pollution policies are set at a national level, but sub-national units (e.g. states) are charged with monitoring and compliance. The sub-national regulators have an incentive to avoid placing monitors in the most polluted areas because “discovering” a hotspot could lead to penalties from the national regulator.

Economists call this kind of situation a principal-agent problem: a principal (in this case, the national regulator) acts through an agent (in this case, the sub-national regulator) to do something, but the agent has an interest in subverting that something. Principal-agent problems are common. Examples include

financial management (the client is the principal who wants returns, the manager is the agent who wants fee revenues),
home buying (the home buyer is the principal who wants the best home for the cheapest price, the realtor is the agent who wants transaction fees), and
wage employment (the firm owner is the principal who wants workers to exert high levels of effort, the workers are the agents who want to exert as little effort as possible).

You work for an NGO in a particular country, which has conducted a very extensive random sample of air quality in all 5 states for a year. You are trying to assess whether the principal-agent problem is an issue here, and if so how bad it is.

The data

The dataset NGO_measurements.csv contains annual average air quality measurements at 100,000 randomly sampled sites across 5 states (20,000 per state). The air quality measure is a composite of the concentrations of a number of air pollutants, including volatile organic compounds and particulate matter. Higher concentrations are associated with worse air quality.

In this dataset, the index column enumerates each site within a state, the state column identifies the state, and the AQI column is the air quality measurement. The table official_measurements.csv gives the annual average AQI in each state, calculated from the official monitor site measurements. There are 100 official monitor sites in each state.
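As a starting point, the NGO data can be grouped by state before any simulation. The following is a minimal sketch, assuming the file has a header row with columns named state and AQI as described above. The helper name group_aqi_by_state, the state labels A and B, and the sample values are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def group_aqi_by_state(csv_file):
    """Map each state to the list of AQI readings from its sites."""
    by_state = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # Assumed column names: "state" and "AQI".
        by_state[row["state"]].append(float(row["AQI"]))
    return dict(by_state)


# Tiny made-up sample in the assumed layout. For the real data, pass
# open("NGO_measurements.csv", newline="") instead of this StringIO object.
sample = io.StringIO(
    "index,state,AQI\n"
    "0,A,41.5\n"
    "1,A,39.0\n"
    "0,B,55.25\n"
)

by_state = group_aqi_by_state(sample)
print({state: len(values) for state, values in by_state.items()})
```

Keeping each state's readings in its own list makes it easy to draw samples state by state later on.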

Questions

1. Simulate the sampling distribution of the statewide annual average AQI calculated from 100 randomly-placed monitor sites for each state.
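One possible approach to question 1, sketched under stated assumptions: repeatedly draw 100 sites at random from a state's NGO measurements and record the average each time. The function name, the number of repetitions (1,000), and the synthetic stand-in data are illustrative choices, not part of the assignment. Sites are drawn without replacement on the assumption that monitors sit at distinct locations.

```python
import random
import statistics


def simulate_sampling_distribution(aqi_values, n_monitors=100, n_sims=1000, seed=0):
    """Return n_sims simulated statewide averages.

    Each average is computed from n_monitors sites drawn at random,
    without replacement, from the state's measurements.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(aqi_values, n_monitors))
        for _ in range(n_sims)
    ]


# Synthetic stand-in for one state's 20,000 NGO measurements (made-up values).
data_rng = random.Random(42)
state_aqi = [data_rng.gauss(50, 10) for _ in range(20_000)]

sim_means = simulate_sampling_distribution(state_aqi)
print(len(sim_means), round(statistics.fmean(sim_means), 2))
```

With the real data, the same function would be called once per state on that state's AQI values, giving five separate sampling distributions.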

2. Plot the simulated distribution of the statewide average AQI for each state as a histogram, and show all 5 states in a combined plot. Show each state’s official average (from official_measurements.csv) as a vertical line on its plot.

3. For each state, use the simulated distribution to calculate the probability of observing a statewide average at least as extreme as the official measurement.
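A hedged sketch of how such a probability could be computed from simulated averages. What counts as “extreme” is a modeling choice. The first function assumes a one-sided view (strategic siting would push the official average down, so only low values count). The second treats deviations in either direction as extreme. Function names and the input numbers are made up for illustration.

```python
import random
import statistics


def lower_tail_probability(sim_means, official_mean):
    """Share of simulated averages at or below the official average."""
    return sum(m <= official_mean for m in sim_means) / len(sim_means)


def two_sided_probability(sim_means, official_mean):
    """Share of simulated averages at least as far from the simulated center."""
    center = statistics.fmean(sim_means)
    gap = abs(official_mean - center)
    return sum(abs(m - center) >= gap for m in sim_means) / len(sim_means)


# Made-up inputs: 1,000 simulated averages and a hypothetical official value.
rng = random.Random(1)
sim_means = [rng.gauss(50, 1) for _ in range(1000)]
official_mean = 47.5

print(lower_tail_probability(sim_means, official_mean))
print(two_sided_probability(sim_means, official_mean))
```

If no simulated average is as extreme as the official one, the estimate is 0 only up to the resolution of the simulation. With 1,000 draws it is better read as “less than about 1 in 1,000”.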

4. Read the paper “Discrimination in Ambient Air Pollution Monitoring?” (skim the parts you don’t understand; focus your reading effort on the introduction, background, and discussion, but do read the whole paper).

a. Who is the principal and who is the agent in this study?

b. How did the researchers measure the true distribution of air pollution and detect strategic monitor siting behavior? How does this compare to the scenario you analyzed above? Which approach do you think is cheaper to implement?

c. In previous studies the researchers mention, what behaviors by local regulators cause bias in measured pollution levels? Do these behaviors cause downward or upward bias?

d. How do the researchers argue income and race affect monitor siting decisions in this study? How do these decisions then affect the amount of air pollution different groups in society are exposed to?