This homework is due by 5 PM on Friday, February 25. Please use this R Markdown template to report your code, output, and written answers in a single document. Please upload your compiled HTML or PDF to Brightspace. Comment your code. Report results in the correct units of measurement. Do not report more than two digits to the right of the decimal point.
Name: Meghla Srabon
TA: Alejandro
A researcher conducted a randomized field experiment assessing the extent to which the views of individuals living in suburban communities around Boston, Massachusetts, were affected by exposure to demographic change.
This exercise is based on: Enos, R. D. 2014. “Causal Effect of Intergroup Contact on Exclusionary Attitudes.” Proceedings of the National Academy of Sciences 111(10): 3699–3704.
Subjects in the experiment were individuals riding on the commuter rail line and were overwhelmingly white. Every morning, multiple trains pass through various stations in suburban communities that were used for this study. For pairs of trains leaving the same station at roughly the same time, one was randomly assigned to receive the treatment and one was designated as a control. By doing so all the benefits of randomization apply for this dataset.
The treatment in this experiment was the presence of two native Spanish-speaking ‘confederates’ (a term used in experiments to indicate that these individuals worked for the researcher, unbeknownst to the subjects) on the platform each morning prior to the train’s arrival. The presence of these confederates, who would appear as Hispanic foreigners to the subjects, was intended to simulate the kind of demographic change anticipated for the United States in coming years. For those individuals in the control group, no such confederates were present on the platform. The treatment was administered for 10 days. Participants were asked questions related to immigration policy both before the experiment started and after the experiment had ended. The names and descriptions of variables in the data set boston.csv are:
| Name | Description |
|---|---|
| age | Age of individual at time of experiment |
| male | Sex of individual, male (1) or female (0) |
| income | Income group in dollars (not exact income) |
| white | Indicator variable for whether individual identifies as white (1) or not (0) |
| college | Indicator variable for whether individual attended college (1) or not (0) |
| usborn | Indicator variable for whether individual is born in the US (1) or not (0) |
| treatment | Indicator variable for whether an individual was treated (1) or not (0) |
| ideology | Self-placement on ideology spectrum from Very Liberal (1) through Moderate (3) to Very Conservative (5) |
| numberim.pre | Policy opinion on question about support for increasing the number of immigrants allowed in the country from Increase (1) to Decrease (5) |
| numberim.post | Same question as above, asked later |
| remain.pre | Policy opinion on question about support for allowing the children of undocumented immigrants to remain in the country from Allow (1) to Not Allow (5) |
| remain.post | Same question as above, asked later |
| english.pre | Policy opinion on question about support for passing a law establishing English as the official language from Not Favor (1) to Favor (5) |
| english.post | Same question as above, asked later |
The benefit of randomly assigning individuals to the treatment or control groups is that the two groups should be similar, on average, in terms of their covariates. This is referred to as ‘covariate balance.’ Show that the treatment and control groups are balanced by comparing the proportions of white respondents and the proportion of respondents that went to college, in the treatment and control groups. Provide a brief interpretation of the results. Remember to set your working directory before you try to load data into R Studio.
# to find means and proportions (convert to a percentage as needed): mean(bes$education == 4, na.rm = TRUE)
# to create a subset of the sample that is above 60: bes2 <- subset(bes, bes$age > 60)
setwd("/Users/meghlasrabon/Desktop/POL-UA-850 Files")
getwd()
library(readr)
boston <- read_csv("boston.csv")
## New names:
## * `` -> ...1
## Rows: 123 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (16): ...1, X, age, male, income, white, college, usborn, treatment, ide...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
boston.treatment <- subset(boston, treatment == 1)
boston.ctrl <- subset(boston, treatment == 0)
summary(boston.treatment$white)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 1.0000 1.0000 0.8545 1.0000 1.0000
mean(boston.treatment$white)
## [1] 0.8545455
summary(boston.ctrl$white)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 1.0000 1.0000 0.9118 1.0000 1.0000
summary(boston.treatment$college)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 1.0000 1.0000 0.8182 1.0000 1.0000
summary(boston.ctrl$college)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 1.0000 1.0000 0.7941 1.0000 1.0000
# The medians are identical across the treatment and control groups, and the means differ only modestly. In the treatment group, 85% of respondents are white and 82% attended college. In the control group, 91% of respondents are white and 79% attended college. The gaps are about 6 percentage points for white and 2 percentage points for college, so the two groups are reasonably well balanced on these covariates.
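The balance check above can also be computed in one call per covariate. The following is a minimal sketch, assuming a small made-up data frame `toy` (hypothetical values, not the boston.csv data) that uses the same 0/1 coding:

```r
# Hypothetical toy data with the same 0/1 coding as boston.csv (values are made up)
toy <- data.frame(
  treatment = c(1, 1, 1, 1, 0, 0, 0, 0),
  white     = c(1, 1, 1, 0, 1, 1, 1, 1),
  college   = c(1, 1, 0, 1, 1, 0, 1, 1)
)

# The mean of a 0/1 indicator is a proportion, so tapply() gives the share of
# white and college-educated respondents within each treatment group
prop.white   <- tapply(toy$white,   toy$treatment, mean, na.rm = TRUE)
prop.college <- tapply(toy$college, toy$treatment, mean, na.rm = TRUE)

# The names "0" and "1" correspond to the control and treatment groups
round(prop.white, 2)
round(prop.college, 2)

# Difference in proportions (treatment minus control) for each covariate;
# values near zero suggest the groups are balanced on that covariate
balance.white   <- prop.white[["1"]] - prop.white[["0"]]
balance.college <- prop.college[["1"]] - prop.college[["0"]]
```

Replacing `toy` with `boston` should reproduce the group means reported by `summary()` above.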
Individuals in the experiment were asked a series of questions both at the beginning and the end of the experiment. One such question was “Do you think the number of immigrants from Mexico who are permitted to come to the United States to live should be increased, left the same, or decreased?” The response to this question prior to the experiment is in the variable numberim.pre. The response to this question after the experiment is in the variable numberim.post. In both cases the variable is coded on a 1 – 5 scale. Responses with values of 1 are inclusionary (‘pro-immigration’) and responses with values of 5 are exclusionary (‘anti-immigration’). Compute the average treatment effect on the change in attitudes about immigration. That is, how does the mean change in attitudes about immigration policy for those in the control group compare to those in the treatment group? Interpret the result (make sure to think about the units in which variables are measured).
R tips: First, create a variable (say we name that variable as var) that measures the difference in the numberim.post and numberim.pre variables. Second, calculate the mean change in attitudes about immigration policy (mean(var)) for those in the treatment group, before and after the experiment. Third, calculate the mean change in attitudes about immigration policy (mean(var)) for those in the control group, before and after the experiment. Fourth, compute the difference between these two values. Make sure you address the missing value problem in the data by setting na.rm=TRUE when you calculate the mean value of attitude changes.
setwd("/Users/meghlasrabon/Desktop/POL-UA-850 Files")
boston$change <- boston$numberim.post - boston$numberim.pre
treat.change <- mean(boston$change[boston$treatment == 1],
na.rm = TRUE)
ctrl.change <- mean(boston$change[boston$treatment == 0],
na.rm = TRUE)
treat.change
## [1] 0.1176471
ctrl.change
## [1] -0.1875
treat.change - ctrl.change
## [1] 0.3051471
The mean change in the treatment group is +0.12 points on the 5-point scale, meaning treated respondents became slightly more exclusionary on average. The mean change in the control group is -0.19 points, meaning control respondents became slightly more inclusionary. The estimated average treatment effect is the difference between the two, 0.31 points: on average, exposure to the confederates shifted attitudes 0.31 points in the exclusionary direction relative to the control group.
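The four steps in the R tips can be sketched compactly as follows. This is an illustrative example on a made-up data frame `toy` (hypothetical values, not the boston.csv data), with one missing post-survey response to show the effect of `na.rm = TRUE`:

```r
# Hypothetical toy data (made-up values); NA marks a missing post-survey response
toy <- data.frame(
  treatment     = c(1, 1, 1, 0, 0, 0),
  numberim.pre  = c(3, 2, 4, 3, 3, 2),
  numberim.post = c(4, 2, NA, 3, 2, 2)
)

# Step 1: change in attitude for each respondent (positive = more exclusionary)
toy$change <- toy$numberim.post - toy$numberim.pre

# Steps 2 and 3: mean change within each group, dropping missing values
mean.change <- tapply(toy$change, toy$treatment, mean, na.rm = TRUE)

# Step 4: estimated average treatment effect is the treatment group's
# mean change minus the control group's mean change
ate <- mean.change[["1"]] - mean.change[["0"]]
ate
```

Because both quantities are changes on the same 1 to 5 scale, the result is also expressed in scale points.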
Using two density histograms (not frequency histograms), show the distributions of income for the treatment group and the control group. Give each histogram a title. Add the mean of each distribution to each plot as a vertical colored line. How do the two distributions differ?
setwd("/Users/meghlasrabon/Desktop/POL-UA-850 Files")
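# Treatment group: density histogram of income with the mean as a vertical line
hist(boston$income[boston$treatment == 1], freq = FALSE, xlab = "Income (dollars)", col = "blue", main = "Distribution of Income: Treatment")
# The line is drawn in red so that it stands out against the blue bars
abline(v = mean(boston$income[boston$treatment == 1], na.rm = TRUE), col = "red", lwd = 2)
text(x = 100000, y = 11.5e-06, "mean")
# Control group: same plot for comparison
hist(boston$income[boston$treatment == 0], freq = FALSE, xlab = "Income (dollars)", col = "red", main = "Distribution of Income: Control")
abline(v = mean(boston$income[boston$treatment == 0], na.rm = TRUE), col = "blue", lwd = 2)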
Incomes in the control group are less widely spread than in the treatment group: most control respondents fall between 65,000 and 115,000 dollars. The treatment group is broadly similar, with most respondents near the center of the income distribution, so the two histograms do not differ substantially.
Using two density histograms (not frequency histograms), show the distributions of the changes in attitudes about immigration (numberim.post - numberim.pre) for the treatment group and the control group. Give your histograms titles. Add the mean of the distribution of to each plot as a vertical colored line. How do the two distributions differ?
setwd("/Users/meghlasrabon/Desktop/POL-UA-850 Files")
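# Control group: density histogram of the change in immigration attitudes
hist((boston$numberim.post - boston$numberim.pre)[boston$treatment == 0], freq = FALSE,
     xlab = "Change in Immigration Attitude (post - pre)", col = "red", main = "Distribution of Changes in Attitude: Control")
# na.rm = TRUE is required because some responses are missing; without it the mean is NA
abline(v = mean((boston$numberim.post - boston$numberim.pre)[boston$treatment == 0], na.rm = TRUE), col = "blue", lwd = 2)
# Treatment group: same plot, with its own mean line
hist((boston$numberim.post - boston$numberim.pre)[boston$treatment == 1], freq = FALSE,
     xlab = "Change in Immigration Attitude (post - pre)", col = "blue", main = "Distribution of Changes in Attitude: Treatment")
abline(v = mean((boston$numberim.post - boston$numberim.pre)[boston$treatment == 1], na.rm = TRUE), col = "red", lwd = 2)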
The distributions do not differ drastically. In both the treatment and control groups, most respondents show no change in their immigration attitudes, with a few outliers in both directions. This indicates that many respondents gave consistent answers before and after the experiment. The treatment mean (0.12) lies slightly above zero and the control mean (-0.19) slightly below it.
(I believe I have the right code for the vertical mean line however every time I run it, it gives me the error “plot.new has not been called yet.” Can I receive some points for the formatting?)
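The error “plot.new has not been called yet” generally means that a low-level plotting function such as `abline()` ran while no plot was open, which can happen when the `abline()` line is run on its own rather than together with the `hist()` call that creates the plot. Separately, `mean()` returns `NA` when missing values are present unless `na.rm = TRUE` is set. A minimal sketch with made-up values (`change.toy` is hypothetical, not the boston.csv data):

```r
# Hypothetical change scores (made-up values), including one missing value
change.toy <- c(-1, 0, 0, 0, 1, 2, NA, 0, -2, 1)

# Without na.rm = TRUE the mean is NA, so there is no position to draw a line at
mean(change.toy)

# Dropping missing values gives a usable mean
mean.toy <- mean(change.toy, na.rm = TRUE)

# hist() opens the plot; abline() must run after it, in the same chunk
hist(change.toy, freq = FALSE,
     xlab = "Change in attitude (post - pre)",
     main = "Distribution of Changes in Attitude: Toy Example")
abline(v = mean.toy, col = "blue", lwd = 2)
```

Running the whole chunk at once, rather than the `abline()` line alone, keeps both calls on the same plot.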
In this exercise, we examine cross-national differences in attitudes towards domestic violence and access to information. We explore the hypothesis that there is an association at an aggregate level between the extent to which individuals in a country have access to knowledge and new information, both through formal schooling and through the mass media, and their likelihood of condemning acts of intimate partner violence. This exercise is in part based on:
Pierotti, Rachel. (2013). “Increasing Rejection of Intimate Partner Violence: Evidence of Global Cultural Diffusion.” American Sociological Review, 78: 240-265.
We use data from the Demographic and Health Surveys, which are a set of over 300 nationally, regionally and residentially representative surveys that have been fielded in developing countries around the world, beginning in 1992. The surveys employ a stratified two-stage cluster design. In the first stage, enumeration areas (EA) are drawn from Census files. In the second stage, within each EA a sample of households is drawn from an updated list of households. In addition, the surveys have identical questionnaires and trainings for interviewers, enabling the data from one country to be directly compared with data collected in other countries. It is important to note that different groups of countries are surveyed every year.
In the study, the author used these data to show that “women with greater access to global cultural scripts through urban living, secondary education, or access to media were more likely to reject intimate partner violence.” The data set is in the csv file dhs_ipv.csv. The names and descriptions of variables are:
| Name | Description |
|---|---|
| beat_goesout | Percentage of women in each country that think a husband is justified to beat his wife if she goes out without telling him. |
| beat_burnfood | Percentage of women in each country that think a husband is justified to beat his wife if she burns his food. |
| no_media | Percentage of women in each country that rarely encounter a newspaper, radio, or television. |
| sec_school | Percentage of women in each country with secondary or higher education. |
| year | Year of the survey |
| region | Region of the world |
| country | Country |
Note that there are two indicators of attitudes towards domestic violence: beat_goesout and beat_burnfood. There are also two indicators of access to information: sec_school and no_media.
First, load the dhs_ipv.csv data set. How many observations are there in the dataset? How many variables? Use the head() function to inspect the first 10 rows in the data. What are the years in which surveys were administered in Bangladesh?
setwd("/Users/meghlasrabon/Desktop/POL-UA-850 Files")
library(readr)
dhs_ipv <- read_csv("dhs_ipv.csv")
## New names:
## * `` -> ...1
## Rows: 151 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): country, region
## dbl (6): ...1, beat_burnfood, beat_goesout, sec_school, no_media, year
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(dhs_ipv)
head(dhs_ipv)
## # A tibble: 6 × 8
## ...1 beat_burnfood beat_goesout sec_school no_media country year region
## <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
## 1 1 4.4 18.6 25.2 1.5 Albania 2008 Middle …
## 2 4 4.9 19.9 67.7 8.7 Armenia 2000 Middle …
## 3 5 2.1 10.3 67.6 2.2 Armenia 2005 Middle …
## 4 6 0.3 3.1 46 6.4 Armenia 2010 Middle …
## 5 7 12.1 42.5 74.6 7.4 Azerbaijan 2006 Middle …
## 6 8 NA NA 24 41.9 Bangladesh 2004 Asia
nrow(dhs_ipv)
## [1] 151
dhs_ipv$year[dhs_ipv$country=="Bangladesh"]
## [1] 2004 2007 2011
There are 151 observations in the dataset and 8 columns: the 7 variables described above plus an unnamed row-index column that read_csv renamed to ...1. Subsetting the year variable by country shows that the surveys were administered in Bangladesh in 2004, 2007, and 2011.
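The question asks for the first 10 rows, while `head()` shows only 6 by default. A minimal sketch of the relevant calls, using a made-up data frame `toy` (hypothetical values standing in for dhs_ipv):

```r
# Hypothetical toy data frame (made-up values) standing in for dhs_ipv
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  year    = rep(c(2000, 2004, 2008, 2012), times = 3)
)

# dim() returns the number of rows (observations) and columns (variables)
dim(toy)

# head() shows 6 rows by default; n = 10 requests the first 10
first.ten <- head(toy, n = 10)
nrow(first.ten)

# Survey years for one country, selected with a logical condition
toy$year[toy$country == "B"]
```

Applied to the real data, `head(dhs_ipv, n = 10)` and `dim(dhs_ipv)` would answer the first two parts of the question directly.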
Let’s examine the association between attitudes towards intimate partner violence and the two exposure to information variables in our data. First, use two scatterplots to examine the bivariate relationships between beat_goesout and no_media as well as between beat_goesout and sec_school. In both cases, treat beat_goesout as your outcome (Y) variable and construct your scatterplots accordingly. Give your plots titles. Label your x and y axes with meaningful labels. What do you learn from these plots about the study’s hypothesis?
# Scatterplot 1: media access (x) against approval of beating if wife goes out (y)
plot(dhs_ipv$no_media, dhs_ipv$beat_goesout, main = "Domestic Violence: Bivariate Relationship", ylab = "% of women in the country who support domestic violence when a woman goes out without informing her spouse", xlab = "% of women in the country with no media access")
# Scatterplot 2: secondary education (x) against the same outcome (y)
plot(dhs_ipv$sec_school, dhs_ipv$beat_goesout, main = "Domestic Violence: Bivariate Relationship", xlab = "% of women in the country with secondary education", ylab = "% of women in the country who support domestic violence when a woman goes out without informing her spouse")
The first scatterplot shows that in countries where a high percentage of women have no access to media, a high percentage of women also say that domestic violence is justified when a woman goes out without informing her spouse. This indicates a positive correlation.
In the second scatterplot, by contrast, countries with a high percentage of women with secondary education have a lower percentage of women who support domestic violence in cases where a woman goes out without informing her spouse. This indicates a negative correlation between the variables. There appear to be five outliers in this scatterplot: countries with high rates of female secondary education where a high percentage of women nonetheless say such violence is justified. Both patterns are consistent with the study's hypothesis that greater access to information is associated with greater rejection of intimate partner violence.
Repeat these scatterplots for beat_burnfood and no_media, as well as for beat_burnfood and sec_school. In both cases, treat beat_burnfood as your outcome (Y) variable and construct your scatterplots accordingly. Give your plots titles. Label your x and y axes with meaningful labels. What do you learn from these plots about the study’s hypothesis?
# Scatterplot 1: media access (x) against approval of beating if wife burns food (y)
plot(dhs_ipv$no_media, dhs_ipv$beat_burnfood, main = "Domestic Violence: Bivariate Relationship", xlab = "% of women in the country with no media access", ylab = "% of women in the country who support domestic violence when a wife burns her husband's food")
# Scatterplot 2: secondary education (x) against the same outcome (y)
plot(dhs_ipv$sec_school, dhs_ipv$beat_burnfood, main = "Domestic Violence: Bivariate Relationship", xlab = "% of women in the country with secondary education", ylab = "% of women in the country who support domestic violence when a wife burns her husband's food")
In the first scatterplot, countries where a high percentage of women have little to no access to media also have a high percentage of women who support domestic violence when a wife burns her husband’s food. This is a positive correlation. In the second scatterplot, countries where a low percentage of women have secondary education have a higher percentage of women who support domestic violence when a wife burns her husband’s food. As the share of women with secondary education increases, the percentage supporting domestic violence decreases. This indicates an inverse relationship (negative correlation). Both patterns are consistent with the study's hypothesis.
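The direction of each relationship can also be summarized numerically with a correlation coefficient. The following is a minimal sketch on a made-up data frame `toy` (hypothetical percentages, not the DHS data); the `use = "complete.obs"` argument drops rows with missing values, which the real data also contain:

```r
# Hypothetical country-level percentages (made-up values, not the DHS data)
toy <- data.frame(
  no_media      = c(5, 10, 20, 35, 50, NA),
  sec_school    = c(70, 60, 40, 25, 10, 30),
  beat_burnfood = c(3, 6, 12, 20, 30, 15)
)

# Pearson correlations, using only rows where both variables are observed
r.media  <- cor(toy$no_media,   toy$beat_burnfood, use = "complete.obs")
r.school <- cor(toy$sec_school, toy$beat_burnfood, use = "complete.obs")

# A positive value means the two percentages rise together;
# a negative value means one falls as the other rises
round(r.media, 2)
round(r.school, 2)
```

A correlation describes only the linear association across countries; it does not by itself show that media access or schooling causes the change in attitudes.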