Problem Set 02 - Do Women Promote Different Policies than Men? (continuation)

We will estimate the average causal effect of having a female politician on two policy outcomes. For this purpose, we will analyze data from an experiment conducted in India, where villages were randomly assigned to have a female council head. The dataset we will use is in a file called “india.csv”. The table below shows the names and descriptions of the variables in this dataset, where the unit of observation is the village.

Variable	Description
village	village identifier (“Gram Panchayat number _ village number”)
female	whether the village was assigned a female politician: 1=yes, 0=no
water	number of new (or repaired) drinking water facilities in the village since random assignment
irrigation	number of new (or repaired) irrigation facilities in the village since random assignment

1. Considering that the dataset we are analyzing comes from a randomized experiment, what can we compute to estimate the average causal effect of having a female politician on the number of new (or repaired) drinking water facilities? Please provide the name of the estimator. (1 point)

Because the dataset comes from a randomized experiment, we can estimate the average causal effect of having a female politician on the number of new (or repaired) drinking water facilities with the difference-in-means estimator: the mean number of new (or repaired) drinking water facilities in the treatment group minus the mean number in the control group. Here the treatment group consists of the villages assigned a female politician, and the control group consists of the villages without one.
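As a minimal sketch of the difference-in-means estimator, the code below uses a small made-up data frame called toy (hypothetical values, not taken from india.csv) with the same column names as the dataset. It also checks the result against the slope of a simple linear regression, which coincides with the difference in means when the treatment variable is binary.

```r
# Hypothetical toy data for illustration only; these are NOT values from india.csv.
toy <- data.frame(
  female = c(1, 1, 1, 0, 0, 0),
  water  = c(20, 30, 25, 10, 15, 20)
)

# Mean outcome in the treatment group (female == 1) and in the control group (female == 0).
mean_treated <- mean(toy$water[toy$female == 1])
mean_control <- mean(toy$water[toy$female == 0])

# Difference-in-means estimator: treatment mean minus control mean.
diff_in_means <- mean_treated - mean_control
diff_in_means

# Cross-check: with a binary treatment, the slope from a simple linear
# regression of the outcome on the treatment equals the difference in means.
slope <- unname(coef(lm(water ~ female, data = toy))["female"])
slope
```

With the real data, the same steps would be applied to india in place of toy.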

2. In this dataset, what is the average number of new (or repaired) drinking water facilities in villages with a female politician? Please answer with a full sentence. (1 point)

mean(india$water[india$female == 1])

## [2] 23.99074

3. What is the average number of new (or repaired) drinking water facilities in villages with a male politician? Please answer with a full sentence. (1 point)

mean(india$water[india$female == 0])

## [3] 14.73832

4. What is the estimated average causal effect of having a female politician on the number of new (or repaired) drinking water facilities? (2 points)

mean(india$water[india$female == 1]) - mean(india$water[india$female == 0])

## [4] 9.252423

5. Create a visualization of the distribution of the variable water.

ggplot(india, aes(x = water)) + geom_histogram()

Does this variable look bell-shaped distributed? (0.5 points)
Approximately how many villages in this experiment had about 250 new (or repaired) drinking water facilities since the randomization of politicians? (0.5 points)

6. Create a visualization of the relationship between water and irrigation.

Does the linear relationship between these two variables look positive or negative? A positive/negative answer will suffice. (0.5 points)
Does the relationship between these two variables look strongly linear? A yes/no answer will suffice. (0.5 points)

## Coding answers here

Answers: Answers here.

(Hint: a scatter plot is the graphical representation of the relationship between two variables. The function in R to create a scatter plot with the fitting line is:

ggplot(data = dataset, aes(x = var_x, y = var_y)) + 
  geom_point() + 
  geom_smooth(formula = 'y ~ x', method = 'lm', se = FALSE)

It requires three arguments: (1) the name under which you saved the dataset; (2) the code identifying the variable to be plotted along the x-axis; and (3) the code identifying the variable to be plotted along the y-axis.)
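The template in the hint could be filled in along the following lines. This is only a sketch: it assumes the ggplot2 package is installed, and it uses a made-up data frame called toy (hypothetical values, not from india.csv) so that it runs on its own.

```r
library(ggplot2)

# Hypothetical toy data for illustration only; these are NOT values from india.csv.
toy <- data.frame(
  water      = c(5, 10, 15, 20, 40),
  irrigation = c(1, 2, 2, 4, 7)
)

# Scatter plot with a fitted least-squares line and no confidence band.
p <- ggplot(data = toy, aes(x = water, y = irrigation)) +
  geom_point() +
  geom_smooth(formula = y ~ x, method = "lm", se = FALSE)

# Draw the plot.
print(p)
```

The slope of the fitted line indicates the sign of the linear relationship, and how tightly the points cluster around the line indicates how strongly linear it is.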

7. Compute the correlation between water and irrigation.

Are you surprised by the sign of the correlation? Provide your reason. (0.5 points)
And are you surprised by the absolute value of the correlation? Provide your reason. (0.5 points)

## Coding answers here

Answers: Answers here.

(Hint: the function in R to compute a correlation coefficient is cor(). It requires two arguments (separated by a comma) and in no particular order: the code identifying each of the two variables.)
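A small sketch of the hint, assuming the function meant is base R's cor() and using a made-up data frame called toy (hypothetical values, not from india.csv). It computes the correlation with the two variables in both orders to show that the order of the arguments does not change the result.

```r
# Hypothetical toy data for illustration only; these are NOT values from india.csv.
toy <- data.frame(
  water      = c(5, 10, 15, 20, 40),
  irrigation = c(1, 2, 2, 4, 7)
)

# Pearson correlation coefficient (the default method of cor()).
r_xy <- cor(toy$water, toy$irrigation)

# Same computation with the arguments swapped.
r_yx <- cor(toy$irrigation, toy$water)

r_xy
r_yx
```

The sign of the coefficient gives the direction of the linear relationship, and its absolute value (between 0 and 1) gives the strength.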

8. If we wanted to use the sample of villages in this dataset to infer the characteristics of all villages in India, we would have to make sure that the sample is _____________ of the population of all villages. (Please provide the missing word). (1 point)

Answer: Answers here.

9. What would have been the best way of selecting the villages for the sample to ensure that the statement above was true? (1 point)

Answer: Answers here.