# Problem Set 02 - Do Women Promote Different Policies than Men? (continuation)
(Based on DSS Materials and on Raghabendra Chattopadhyay and Esther Duflo. 2004. “Women as Policy Makers: Evidence from a Randomized Policy Experiment in India.” Econometrica, 72 (5): 1409–43.)
We will estimate the average causal effect of having a female politician on two policy outcomes. For this purpose, we will analyze data from an experiment conducted in India, where villages were randomly assigned to have a female council head. The dataset we will use is in a file called “india.csv”. The table below shows the names and descriptions of the variables in this dataset, where the unit of observation is the village.
| Variable | Description |
|---|---|
| village | village identifier (“Gram Panchayat number _ village number”) |
| female | whether the village was assigned a female politician: 1=yes, 0=no |
| water | number of new (or repaired) drinking water facilities in the village since random assignment |
| irrigation | number of new (or repaired) irrigation facilities in the village since random assignment |
In this problem set, we will practice loading, making sense of data, and understanding the basics of causal inference. We will also learn how to use R Markdown.
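As a minimal sketch of the loading step, the block below reads a small CSV and inspects it. The four rows are invented stand-ins with the same column names as the table above, not values from india.csv, and the name `india_toy` is hypothetical; with the real file, the call would presumably be `read.csv("india.csv")`.

```r
# Toy stand-in for india.csv: every value below is invented for illustration.
csv_text <- "village,female,water,irrigation
1_1,1,10,0
1_2,1,0,5
2_1,0,2,2
2_2,0,31,4"

# read.csv() accepts literal text through its `text` argument;
# with the real file you would pass the file name instead.
india_toy <- read.csv(text = csv_text)

head(india_toy)  # first rows of the data frame
dim(india_toy)   # number of rows (villages) and columns (variables)
str(india_toy)   # type of each variable
```

Checking `dim()` and `str()` right after loading is a quick way to confirm that each row is one village and that `female`, `water`, and `irrigation` were read as numbers.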
One way we can estimate the average causal effect of having a female politician on the number of new (or repaired) drinking water facilities is by using the difference-in-means estimator.
mean(india$water[india$female == 1])
## [1] 23.99074
23.99074
(Hint: we use [] to subset a variable; inside the square
brackets, we specify the selection criterion. For example, we can use
the relational operator == to set a logical test; only the
observations for which the logical test is true will be extracted.)
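The subsetting described in the hint can be sketched on two short vectors. The values are invented for illustration and only mimic the `water` and `female` columns.

```r
# Invented values standing in for india$water and india$female.
water  <- c(10, 0, 2, 31)
female <- c(1, 1, 0, 0)

# The logical test returns TRUE/FALSE for each observation.
is_female <- female == 1

# Square brackets keep only the observations where the test is TRUE.
water_female <- water[is_female]

print(is_female)           # TRUE TRUE FALSE FALSE
print(water_female)        # 10 0
print(mean(water_female))  # 5
```

Writing `water[female == 1]` directly is equivalent; the intermediate name `is_female` is only there to make the two steps visible.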
mean(india$water[india$female == 0])
## [1] 14.73832
14.73832
mean(india$water[india$female == 1]) - mean(india$water[india$female == 0])
## [1] 9.252423
9.252423
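The three computations above amount to one estimator: the mean outcome among treated villages minus the mean outcome among control villages. A minimal sketch follows, assuming a helper named `diff_in_means` (a hypothetical name, not part of the problem set) and invented toy values; the last line only re-checks the subtraction using the rounded group means reported above.

```r
# Hypothetical helper: difference-in-means estimator for a binary treatment.
diff_in_means <- function(outcome, treatment) {
  mean(outcome[treatment == 1]) - mean(outcome[treatment == 0])
}

# Invented toy values, not taken from india.csv.
water  <- c(20, 30, 10, 16)
female <- c(1, 1, 0, 0)

estimate <- diff_in_means(water, female)
print(estimate)  # 25 - 13 = 12

# Arithmetic check with the rounded group means printed above.
reported_diff <- 23.99074 - 14.73832
print(reported_diff)  # 9.25242
```

Because villages were randomly assigned a female council head, this difference can be read as an estimate of the average causal effect, here about 9.25 additional drinking water facilities.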
ggplot(india, aes(x=water))+geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
A. I made a histogram, and it does not appear to be bell-shaped. Rather, the distribution is right-skewed: most of the villages are concentrated at the low end, and the counts decrease as the number of facilities increases. B. Approximately 2 villages have values above 250 (±25).
(Hint: the histogram of a variable is the visual representation of its distribution. The function in R to create a histogram is ggplot() with geom_histogram(). The only required arguments are the variable and the dataset.)
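The `stat_bin()` message above suggests choosing a bin width explicitly. The sketch below shows one way to do that with `geom_histogram(binwidth = ...)`, assuming the ggplot2 package is installed. The data frame `toy` and its values are invented, and the width of 10 is an arbitrary choice for illustration, not a recommendation for the real data.

```r
library(ggplot2)

# Invented right-skewed counts standing in for india$water.
toy <- data.frame(water = c(0, 0, 1, 2, 2, 3, 5, 8, 13, 21, 40, 90))

# binwidth = 10 makes each bar span 10 facilities,
# replacing the default of 30 equal-width bins.
p <- ggplot(toy, aes(x = water)) +
  geom_histogram(binwidth = 10)

print(p)
```

With most values near zero and a few large ones, the bars pile up on the left and trail off to the right, which is the right-skewed shape rather than a bell shape.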
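# Scatter plot with a fitted line; placing irrigation on the x-axis is a choice, not given by the question
ggplot(data = india, aes(x = irrigation, y = water)) + geom_point() + geom_smooth(formula = 'y ~ x', method = 'lm', se = FALSE)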
a. The relationship between these two variables looks positive. b. No.
(Hint: a scatter plot is the graphical representation of the relationship between two variables. The function in R to create a scatter plot with the fitting line is:
ggplot(data = dataset, aes(x = var_x, y = var_y)) +
geom_point() +
geom_smooth(formula = 'y ~ x', method = 'lm', se = F)
It requires three arguments: (1) the name under which you saved the dataset; (2) the code identifying the variable to be plotted along the x-axis; and (3) the code identifying the variable to be plotted along the y-axis.)
cor(india$water,india$irrigation)
## [1] 0.4073307
0.4073307 a. No. The result adds up: politicians who prioritize building or repairing water systems would likely also prioritize building other public infrastructure. b. I am surprised by the absolute value, because I thought it would be closer to 1 since they are both public infrastructure.
(Hint: the function in R to compute a correlation coefficient is cor(). It requires two arguments (separated by a comma) and in no particular order: the code identifying each of the two variables.)
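To illustrate why the order of the two arguments does not matter, the sketch below computes a correlation both ways and also from its definition, the covariance divided by the product of the standard deviations. The vectors `x` and `y` are invented for illustration.

```r
# Invented values, chosen so the correlation works out to 0.8.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

r_xy <- cor(x, y)
r_yx <- cor(y, x)                       # same value: correlation is symmetric
r_manual <- cov(x, y) / (sd(x) * sd(y)) # definition of the Pearson correlation

print(r_xy)      # 0.8
print(r_yx)      # 0.8
print(r_manual)  # 0.8
```

The coefficient always lies between -1 and 1, so the value of about 0.41 reported above indicates a moderate positive linear relationship.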
Representative
Randomly selecting data from these villages lets us compare and contrast them. Any outliers can be quickly addressed to ensure the sample is truly representative of the data.