(Based on DSS Materials and on Raghabendra Chattopadhyay and Esther Duflo. 2004. “Women as Policy Makers: Evidence from a Randomized Policy Experiment in India.” Econometrica, 72 (5): 1409–43.)
We will estimate the average causal effect of having a female politician on two policy outcomes. For this purpose, we will analyze data from an experiment conducted in India, where villages were randomly assigned to have a female council head. The dataset we will use is in a file called “india.csv”. The table below lists the names and descriptions of the variables in this dataset; the unit of observation is the village.
| Variable | Description |
|---|---|
| village | village identifier (“Gram Panchayat number _ village number”) |
| female | whether the village was assigned a female politician: 1=yes, 0=no |
| water | number of new (or repaired) drinking water facilities in the village since random assignment |
| irrigation | number of new (or repaired) irrigation facilities in the village since random assignment |
In this problem set, we will practice loading and making sense of data, and we will learn the basics of causal inference. We will also learn how to use R Markdown.
Answer: We can compute the average number of new or repaired drinking water facilities in villages that have a female politician and compare it to the average number in villages that have a male politician. Because villages were randomly assigned to have a female council head, this comparison estimates the average causal effect. The name of the estimator is the difference-in-means estimator.
## Coding answers here
mean(india$water[india$female==1])
## [1] 23.99074
Answer: The average number of new or repaired drinking water facilities in a village with a female politician is 24.0 facilities.
(Hint: we use [] to subset a variable; inside the square brackets, we specify the selection criterion. For example, we can use the relational operator == to set a logical test; only the observations for which the logical test is true will be extracted.)
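The subsetting described in this hint can be sketched on a small made-up data frame. The object `toy` and its values are hypothetical and only for illustration; they are not taken from india.csv.

```r
# Hypothetical toy data (NOT the values in india.csv)
toy <- data.frame(
  female = c(1, 0, 1, 0, 1),
  water  = c(10, 4, 20, 6, 30)
)

# The logical test returns TRUE/FALSE for each village
is_female <- toy$female == 1

# Inside [], the logical vector keeps only the TRUE observations
water_female <- toy$water[is_female]
water_female        # 10 20 30

# The mean is then computed over the selected observations only
mean(water_female)  # 20
```

Writing `toy$water[toy$female == 1]` in one step gives the same result; the intermediate object is there only to make the logical test visible.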
mean(india$water[india$female==0])
## [1] 14.73832
Answer: The average number of new or repaired drinking water facilities in a village with a male politician is 14.7 facilities.
mean(india$water[india$female==1])-mean(india$water[india$female==0])
## [1] 9.252423
Answer: The estimated average causal effect of having a female politician on the number of new (or repaired) drinking water facilities is an increase of about 9.3 facilities.
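The subtraction above, the mean outcome among treated villages minus the mean outcome among control villages, can be wrapped in a small helper. This is a minimal sketch: `diff_in_means` is a hypothetical function name, not part of the problem set, and the toy values are made up.

```r
# Hypothetical helper: mean outcome among treated units (treatment == 1)
# minus mean outcome among control units (treatment == 0)
diff_in_means <- function(outcome, treatment) {
  mean(outcome[treatment == 1]) - mean(outcome[treatment == 0])
}

# Made-up toy data (NOT the values in india.csv)
toy <- data.frame(
  female = c(1, 0, 1, 0),
  water  = c(12, 5, 18, 7)
)

# Treated mean is (12 + 18) / 2 = 15; control mean is (5 + 7) / 2 = 6
diff_in_means(toy$water, toy$female)  # 9
```

Assuming the data frame `india` is loaded, `diff_in_means(india$water, india$female)` performs the same computation as the line above, and replacing `india$water` with `india$irrigation` would give the corresponding estimate for the second outcome.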
ggplot(data = india, aes(x = water)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Answers: a) The distribution shown in this graph does not look bell-shaped. b) Approximately 0 villages have had 250 water facilities built or repaired since the beginning of the experiment.
(Hint: the histogram of a variable is the visual representation of its distribution. With ggplot2, as in the code above, a histogram is created by adding geom_histogram() to a ggplot() call. The only required arguments are the variable and the dataset.)
ggplot(data = india, aes(x = water, y = irrigation)) +
  geom_point() +
  geom_smooth(formula = 'y ~ x', method = 'lm', se = FALSE)
Answers: Positive. No.
(Hint: a scatter plot is the graphical representation of the relationship between two variables. The code in R to create a scatter plot with a fitted line is:
ggplot(data = dataset, aes(x = var_x, y = var_y)) +
  geom_point() +
  geom_smooth(formula = 'y ~ x', method = 'lm', se = FALSE)
It requires three arguments: (1) the name under which you saved the dataset; (2) the code identifying the variable to be plotted along the x-axis; and (3) the code identifying the variable to be plotted along the y-axis.)
cor(india$water,india$irrigation)
## [1] 0.4073307
Answers: No. It makes sense that politicians who prioritize building or repairing water systems would also prioritize building or repairing irrigation systems, because they ultimately prioritize building public infrastructure. I am a little surprised by the magnitude of this correlation: I would have expected it to be closer to 1, given that both are pieces of infrastructure. But it makes sense that some politicians would value irrigation more while others would value drinking water more.
(Hint: the function in R to compute a correlation coefficient is cor(). It requires two arguments, separated by a comma and in no particular order: the code identifying each of the two variables.)
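The "in no particular order" part of this hint can be checked on made-up numbers. The vectors `toy_water` and `toy_irrigation` below are hypothetical and are not taken from india.csv.

```r
# Made-up toy vectors (NOT the values in india.csv)
toy_water      <- c(2, 4, 6, 8, 10)
toy_irrigation <- c(1, 3, 2, 5, 4)

# Correlation computed with the arguments in both orders
r_xy <- cor(toy_water, toy_irrigation)
r_yx <- cor(toy_irrigation, toy_water)

r_xy                   # 0.8
all.equal(r_xy, r_yx)  # TRUE: the order of the arguments does not matter
```

By default `cor()` computes the Pearson correlation coefficient, which always lies between -1 and 1.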
Answer: A random sample
Answer: I would number the villages from 1 to the total number of villages in this experiment and use a random number generator that draws from that range. Then I would run the generator and sample the villages whose numbers are drawn.
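The procedure in this answer can be sketched with base R's `sample()`. The village count, the sample size, and the seed below are placeholders chosen for illustration; they are assumptions, not values from the experiment.

```r
# Placeholder values (assumptions, NOT taken from the experiment)
n_villages <- 300  # with the data loaded, nrow(india) could be used instead
n_sample   <- 30

# Assumed seed, set only so the draw is reproducible
set.seed(123)

# Draw village numbers at random, without replacement,
# so that no village can be selected twice
chosen <- sample(seq_len(n_villages), size = n_sample, replace = FALSE)

length(chosen)          # 30
anyDuplicated(chosen)   # 0, meaning no village was drawn twice
```

Each village has the same chance of being drawn, which is what makes the result a simple random sample.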