This is the second challenge question for the course. Remember that it’s meant to be challenging and to stretch your skills, but you can always ask for help or collaborate with your classmates. There is an article at the bottom which will help provide broader context for the questions we’re studying here.
Air quality is usually regulated (when it is regulated) based on readings from monitors which track ambient air pollution concentrations. Suppose air pollution policies are set at a national level, but sub-national units (e.g. states) are charged with monitoring and compliance. The sub-national regulators have an incentive to avoid placing monitors in the most polluted areas because “discovering” a hotspot could lead to penalties from the national regulator.
Economists call this kind of situation a principal-agent problem: a principal (in this case, the national regulator) relies on an agent (in this case, the sub-national regulator) to carry out a task, but the agent has an incentive to undermine that task. Principal-agent problems are common. Examples include
You work for an NGO in a particular country, which has conducted a very extensive random sample of air quality in all 5 states for a year. You are trying to assess whether the principal-agent problem is an issue here, and if so how bad it is.
The dataset NGO_measurements.csv contains the annual average air quality measurement at each of 100,000 randomly-sampled sites across 5 states (20,000 per state). Air quality is a composite measure of concentrations of a number of air pollutants, including volatile organic compounds and particulate matter. Higher concentrations are associated with worse air quality.
In this dataset, the column index enumerates each site within a state, state identifies the state, and AQI is the air quality measurement. The table official_measurements.csv shows the annual average AQI in each state, calculated from the official monitor site measurements. There are 100 official monitor sites throughout each state.
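One possible starting point, offered as a rough sketch rather than the intended solution, is to compare each state's official average with the average of the NGO's random sample, and to ask how surprising the official figure would be if the 100 monitors had been placed at random. The function name compare_to_official is hypothetical, and the sketch assumes that official_measurements.csv has one row per state with columns named state and AQI; the text above does not specify that file's layout, so adjust the names as needed.

```python
import numpy as np
import pandas as pd


def compare_to_official(ngo: pd.DataFrame, official: pd.DataFrame,
                        n_monitors: int = 100) -> pd.DataFrame:
    """Compare each state's official mean AQI with the NGO random-sample mean.

    ngo:      one row per sampled site, with columns state and AQI.
    official: ASSUMED layout, one row per state with columns state and AQI.
    """
    # Per-state mean and standard deviation from the NGO random sample.
    stats = ngo.groupby("state")["AQI"].agg(ngo_mean="mean", ngo_sd="std")

    # Attach the official state averages (assumed column names).
    official_mean = official.set_index("state")["AQI"].rename("official_mean")
    out = stats.join(official_mean)

    # Positive gap: the official monitors report cleaner air than the
    # random sample does (higher AQI means worse air quality here).
    out["gap"] = out["ngo_mean"] - out["official_mean"]

    # Standard error of a mean taken over n_monitors randomly placed sites.
    # The NGO mean's own sampling error is ignored, which seems reasonable
    # with 20,000 sites per state but is still an approximation.
    out["se"] = out["ngo_sd"] / np.sqrt(n_monitors)

    # Standardized distance of the official mean from the NGO mean.
    out["z"] = (out["official_mean"] - out["ngo_mean"]) / out["se"]
    return out.reset_index()
```

To use it, load the two files with pd.read_csv("NGO_measurements.csv") and pd.read_csv("official_measurements.csv") and pass the resulting tables to the function. A strongly negative z in a state would suggest that its official monitors sit in cleaner-than-typical locations, although this check alone does not establish why.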