Problem Set 02 - Do Women Promote Different Policies than Men? (continuation)

Due: Feb 20

(Based on DSS Materials and on Raghabendra Chattopadhyay and Esther Duflo. 2004. “Women as Policy Makers: Evidence from a Randomized Policy Experiment in India.” Econometrica, 72 (5): 1409–43.)

We will estimate the average causal effect of having a female politician on two policy outcomes. For this purpose, we will analyze data from an experiment conducted in India, where villages were randomly assigned to have a female council head. The dataset we will use is in a file called “india.csv”. The table below shows the names and descriptions of the variables in this dataset, where the unit of observation is the village.

Variable	Description
village	village identifier (“Gram Panchayat number _ village number”)
female	whether the village was assigned a female politician: 1=yes, 0=no
water	number of new (or repaired) drinking water facilities in the village since random assignment
irrigation	number of new (or repaired) irrigation facilities in the village since random assignment

In this problem set, we will practice loading and making sense of data, and understanding the basics of causal inference. We will also learn how to use R Markdown.

1. Considering that the dataset we are analyzing comes from a randomized experiment, what can we compute to estimate the average causal effect of having a female politician on the number of new (or repaired) drinking water facilities? Please provide the name of the estimator. (1 point)

Answer: We can compute the average number of new (or repaired) drinking water facilities in villages assigned a female politician and subtract the average number in villages assigned a male politician. Because assignment was randomized, this difference is a valid estimate of the average causal effect. The name of the estimator is the difference-in-means estimator.
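The comparison of group averages described above can be sketched on a tiny made-up dataset. The data frame `toy` and its values are invented purely for illustration and are not taken from india.csv; only the column names mirror the table above.

```r
# Toy data, invented for illustration only (not the india.csv values)
toy <- data.frame(
  female = c(1, 1, 1, 0, 0, 0),
  water  = c(10, 20, 30, 5, 10, 15)
)

# Average outcome in each group
mean_treated <- mean(toy$water[toy$female == 1])  # villages assigned a female politician
mean_control <- mean(toy$water[toy$female == 0])  # villages assigned a male politician

# Difference in group averages: the estimate of the average causal effect
diff_in_means <- mean_treated - mean_control
diff_in_means
```

In this toy example the treated average is 20 and the control average is 10, so the estimate is 10 facilities.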

2. In this dataset, what is the average number of new (or repaired) drinking water facilities in villages with a female politician? Please answer with a full sentence. (1 point)

## Coding answers here
mean(india$water[india$female==1])

## [1] 23.99074

Answer: The average number of new or repaired drinking water facilities in a village with a female politician is 24.0 facilities.

(Hint: we use [] to subset a variable; inside the square brackets, we specify the selection criterion. For example, we can use the relational operator == to set a logical test; only the observations for which the logical test is true will be extracted.)
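As a small illustration of the hint, here is how a logical test inside [] selects observations. The vectors below are hypothetical and exist only to show the mechanics.

```r
# Hypothetical vectors, made up for illustration
water  <- c(12, 40, 7, 25)
female <- c(1, 0, 1, 0)

# The logical test returns TRUE/FALSE for each observation
is_female <- female == 1

# Only the observations where the test is TRUE are extracted
water_female <- water[is_female]
water_female

# The subset can be passed directly to mean()
mean(water[female == 1])
```

Here the test is TRUE for the first and third observations, so the subset is 12 and 7, and its mean is 9.5.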

3. What is the average number of new (or repaired) drinking water facilities in villages with a male politician? Please answer with a full sentence. (1 point)

mean(india$water[india$female==0])

## [1] 14.73832

Answer: The average number of new or repaired drinking water facilities in a village with a male politician is 14.7 facilities.

4. What is the estimated average causal effect of having a female politician on the number of new (or repaired) drinking water facilities? (2 points)

mean(india$water[india$female==1])-mean(india$water[india$female==0])

## [1] 9.252423

Answer: The estimated average causal effect of having a female politician is an increase of about 9.3 new (or repaired) drinking water facilities, on average.

5. Create a visualization of the distribution of the variable water.

a) Does the distribution of this variable look bell-shaped? (0.5 points)
b) Approximately how many villages in this experiment had about 250 new (or repaired) drinking water facilities since the randomization of politicians? (0.5 points)

ggplot(data = india, aes(x = water)) + geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Answers: a) No, the distribution of water does not look bell-shaped. b) Approximately 0 villages had about 250 new (or repaired) drinking water facilities since the beginning of the experiment.

(Hint: the histogram of a variable is the visual representation of its distribution. The function in R to create a histogram is ggplot() combined with geom_histogram(), as in the code above. The only required arguments are the variable and the dataset.)
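A reading taken from a histogram can be cross-checked by counting observations in a window around the value of interest. The sketch below uses a hypothetical vector and an arbitrary window of 10 units on either side; neither comes from india.csv.

```r
# Hypothetical outcome values, invented for illustration
water <- c(0, 5, 12, 30, 45, 80, 245, 255, 340)

# Count observations within 10 units of 250 (the window width is an arbitrary choice)
near_250 <- sum(water >= 240 & water <= 260)
near_250
```

With these made-up values, two observations (245 and 255) fall in the window. Applied to the real data, the same expression would use india$water in place of the toy vector.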

6. Create a visualization of the relationship between water and irrigation.

Does the linear relationship between these two variables look positive or negative? A positive/negative answer will suffice. (0.5 points)
Does the relationship between these two variables look strongly linear? A yes/no answer will suffice. (0.5 points)

ggplot(data = india, aes(x = water, y = irrigation)) + 
  geom_point() + 
  geom_smooth(formula = 'y ~ x', method = 'lm', se = FALSE)

Answers: Positive. No.

(Hint: a scatter plot is the graphical representation of the relationship between two variables. The function in R to create a scatter plot with the fitted line is:

ggplot(data = dataset, aes(x = var_x, y = var_y)) + 
  geom_point() + 
  geom_smooth(formula = 'y ~ x', method = 'lm', se = FALSE)

It requires three arguments: (1) the name under which you saved the dataset; (2) the code identifying the variable to be plotted along the x-axis; and (3) the code identifying the variable to be plotted along the y-axis.)

7. Compute the correlation between water and irrigation.

Are you surprised by the sign of the correlation? Provide your reason. (0.5 points)
And are you surprised by the absolute value of the correlation? Provide your reason. (0.5 points)

cor(india$water,india$irrigation)

## [1] 0.4073307

Answers: No, I am not surprised by the sign. It makes sense that politicians who prioritize building or repairing drinking water facilities would also prioritize building or repairing irrigation facilities, because both reflect a priority on public infrastructure. I am a little surprised by the absolute value of the correlation (about 0.41): I would have expected it to be closer to 1, given that both are pieces of infrastructure. Still, it makes sense that some politicians would value irrigation more while others would value drinking water more.

(Hint: the function in R to compute a correlation coefficient is cor(). It requires two arguments (separated by a comma) and in no particular order: the code identifying each of the two variables.)
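The point about argument order can be checked on a small example. The vectors `x` and `y` below are hypothetical and chosen only so the arithmetic is easy to follow.

```r
# Hypothetical vectors, made up for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# Correlation is symmetric, so the order of the arguments does not matter
r_xy <- cor(x, y)
r_yx <- cor(y, x)

r_xy
r_yx
```

Both calls return 0.8 here: a positive sign (the variables tend to move together) and an absolute value below 1 (the points do not fall exactly on a line).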

8. If we wanted to use the sample of villages in this dataset to infer the characteristics of all villages in India, we would have to make sure that the sample is _____________ of the population of all villages. (Please provide the missing word). (1 point)

Answer: Representative

9. What would have been the best way of selecting the villages for the sample to ensure that the statement above was true? (1 point)

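Answer: Random sampling. I would number every village in India from 1 to the total number of villages in the population, then use a random number generator to draw as many numbers as the desired sample size and include the villages whose numbers are drawn. This gives every village the same chance of being selected.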
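Drawing villages at random from a numbered list can be sketched with R's sample() function. The population size, sample size, and seed below are hypothetical placeholders, not figures from the study.

```r
# Arbitrary seed so the draw is reproducible
set.seed(30)

population_size <- 1000  # hypothetical number of villages in the population
sample_size     <- 50    # hypothetical number of villages to select

# Draw village numbers without replacement, so no village is selected twice
sampled_ids <- sample(1:population_size, size = sample_size, replace = FALSE)

head(sampled_ids)
```

Sampling without replacement (`replace = FALSE`) matches the idea that each village is either in the sample or not.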

POLI 30 D

Jayden Xia