Load the data set boston.csv. First set the working directory to the folder that contains the file, so that read.csv() can find it.
boston <- read.csv("boston.csv")
Operators
- `<-` Assignment operator, used to create new objects.
- `#` Used to comment code; R ignores everything that follows it on the line.
- `$` Used to access an element inside an object, such as a variable inside a dataframe (`data$variable`).
- `==` Evaluates whether two values are equal to each other. The output is a logical value, TRUE or FALSE (`3==3` returns TRUE). It is used in logical tests to evaluate whether the observations in a variable are equal to a particular value (`data$male==1`).
- `[]` Square brackets extract a selection of observations from a variable. We specify the selection criterion inside the square brackets (`mean(boston$age[boston$male == 1])`).
- `!is.na()` Used inside square brackets to keep only the non-missing observations when a variable contains NA (`mean(data$variable[!is.na(data$variable)])`). Many functions also accept the argument `na.rm = TRUE` for the same purpose.
Functions - Explore the data
- `read.csv("filename.csv")` reads CSV files.
- `head()` or `tail()` shows the first/last observations in a dataframe.
- `dim()` provides the dimensions of a dataframe (rows, then columns).
- `View()` opens a new tab with the contents of the dataset.
Modify your data
data$latinx <- ifelse(data$ethnic == "latino", 1, 0)
We use ifelse() to create new variables based on certain conditions. Here, latinx equals 1 when ethnic is "latino" and 0 otherwise.
subset_data <- data[data$state == "Florida", ]
Subset the data based on a condition (e.g., state == "Florida"), keeping all the variables.
subset_data <- data[data$state == "Florida", c("var1", "var2", "var3")]
Subset the data when you want to impose a condition and also keep only a few variables.
Functions - descriptives and visualizations
- `mean()` calculates the mean of a variable (`mean(data$variable)`).
- `median()` calculates the median of a variable (`median(data$variable)`).
- `sd()` calculates the standard deviation of a variable (`sd(data$variable)`).
- `var()` calculates the variance of a variable (`var(data$variable)`).
- `table()` creates the frequency table of a variable (`table(data$variable)`).
- `prop.table()` converts a frequency table into a table of proportions (`prop.table(table(data$variable))`).
- `hist()` creates the histogram of a variable (`hist(data$variable)`).
- `plot()` creates the scatter plot of two variables (`plot(data$variable1, data$variable2)`).
- `cor()` calculates the correlation coefficient between two variables (`cor(data$variable1, data$variable2)`).
Functions - Regression Analysis
- `lm()` fits a linear model. It requires a formula of the type `Y ~ X`, where Y identifies the outcome variable and X identifies the predictor: `lm(data$y_var ~ data$x_var)` or `lm(y_var ~ x_var, data = data)`.
- `summary(lm())` shows a summary of the linear model.
- `abline()` adds a straight line to a graph. To add the fitted line, we specify as the main argument the object that contains the output of the `lm()` function: `fit <- lm(Y ~ X); abline(fit)`.
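As a minimal sketch of the regression functions above, the following fits a linear model on a small data frame and adds the fitted line to a scatter plot. The object name `boston_head` is hypothetical, and it holds only two columns of the six rows printed by `head(boston)` later in this handout, so the estimates are purely illustrative and say nothing about the full data set.

```r
# Illustration only: six rows taken from the head(boston) output,
# not the full data set.
boston_head <- data.frame(
  numberim.pre  = c(5, 1, 1, 3, 3, 3),
  numberim.post = c(4, 2, 3, 3, 2, 3)
)

# Fit a linear model: outcome ~ predictor, variables looked up in `data`.
fit <- lm(numberim.post ~ numberim.pre, data = boston_head)

# Coefficients, standard errors, and R-squared.
summary(fit)

# Scatter plot of the two variables, then the fitted line on top of it.
plot(boston_head$numberim.pre, boston_head$numberim.post)
abline(fit)

# Correlation coefficient between the same two variables.
cor(boston_head$numberim.pre, boston_head$numberim.post)
```

Passing `data = boston_head` keeps the formula short and avoids repeating the data frame name for every variable.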
The names and descriptions of variables in the data set
| Name | Description |
|---|---|
| age | Age of individual at time of experiment |
| male | Sex of individual, male (1) or female (0) |
| income | Income group in dollars (not exact income) |
| white | Indicator variable for whether individual identifies as white (1) or not (0) |
| college | Indicator variable for whether individual attended college (1) or not (0) |
| usborn | Indicator variable for whether individual is born in the US (1) or not (0) |
| treatment | Indicator variable for whether an individual was treated (1) or not (0) |
| ideology | Self-placement on ideology spectrum from Very Liberal (1) through Moderate (3) to Very Conservative (5) |
| numberim.pre | Policy opinion on question about increasing the number of immigrants allowed in the country from Increased (1) to Decreased (5) |
| numberim.post | Same question as above, asked later |
| remain.pre | Policy opinion on question about allowing the children of undocumented immigrants to remain in the country from Allow (1) to Not Allow (5) |
| remain.post | Same question as above, asked later |
| english.pre | Policy opinion on question about passing a law establishing English as the official language from Not Favor (1) to Favor (5) |
| english.post | Same question as above, asked later |
First, let’s get a sense of this data. Use head() to take a quick look at it. What are its dimensions? Calculate the mean of a variable.
head(boston)
## age male income white college usborn treatment ideology numberim.pre
## 1 31 0 135000 1 1 1 1 3 5
## 2 34 0 105000 1 1 0 1 4 1
## 3 63 1 135000 1 1 1 1 2 1
## 4 45 1 300000 1 1 1 1 4 3
## 5 55 1 135000 1 1 1 0 2 3
## 6 37 0 87500 1 1 1 1 5 3
## numberim.post remain.pre remain.post english.pre english.post
## 1 4 2 3 4 4
## 2 2 5 5 3 3
## 3 3 1 1 1 1
## 4 3 4 4 4 4
## 5 2 1 1 4 2
## 6 3 5 5 5 5
dim(boston)
## [1] 115 14
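The remaining task, calculating the mean of a variable, could look like the sketch below. To stay self-contained it rebuilds three columns of the six rows printed by head() above under the hypothetical name `boston_head`; with the real data you would use `boston` directly, and the numbers would differ. The variable `older` is also hypothetical and only illustrates ifelse().

```r
# Illustration only: three columns of the six rows printed by head(boston).
boston_head <- data.frame(
  age       = c(31, 34, 63, 45, 55, 37),
  male      = c(0, 0, 1, 1, 1, 0),
  treatment = c(1, 1, 1, 1, 0, 1)
)

# Dimensions: number of rows, then number of columns.
dim(boston_head)

# Mean of a variable.
mean_age <- mean(boston_head$age)

# Mean of a variable for a selection of observations (here, men only).
mean_age_men <- mean(boston_head$age[boston_head$male == 1])

# New indicator variable (hypothetical): 1 if aged 45 or older, 0 otherwise.
boston_head$older <- ifelse(boston_head$age >= 45, 1, 0)

# Keep only the treated individuals and two of the variables.
treated <- boston_head[boston_head$treatment == 1, c("age", "male")]

mean_age
mean_age_men
treated
```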
Our goal is to calculate the average treatment effect (ATE) on the change in attitudes about the number of immigrants (numberim). The steps are:
1. Change: post - pre
2. Average change of the treatment group
3. Average change of the control group
4. Difference of means: treatment’s change - control’s change
Calculate the change in attitudes (post-pre)
boston$change <-
boston$numberim.post - boston$numberim.pre
Average change of the treatment group (treatment == 1)
treat.change <- mean(boston$change[boston$treatment == 1])
treat.change
## [1] 0.1176471
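A positive value means that attitudes in the treatment group became more exclusionary after the treatment.
Average change of the control group (treatment == 0)
ctrl.change <- mean(boston$change[boston$treatment == 0])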
ctrl.change
## [1] -0.1875
Difference of means (treatment’s change - control’s change)
treat.change - ctrl.change
## [1] 0.3051471
The change in the treatment group was 0.31 points more exclusionary than the change in the control group. We interpret this difference as the effect of the treatment: exposure to simulated demographic changes caused this increase in exclusionary attitudes.
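The four steps can also be wrapped in a small helper so that the same calculation is easy to repeat for other outcomes. This is only a sketch: the function name `ate_change` and its arguments are hypothetical, and the example data are the six rows printed by head(boston) earlier (stored under the hypothetical name `boston_head`), so the result will not match the 0.31 obtained from all 115 observations.

```r
# Hypothetical helper: difference in the mean (post - pre) change between
# the treatment group and the control group.
ate_change <- function(data, pre, post, treat = "treatment") {
  change <- data[[post]] - data[[pre]]               # step 1: post - pre
  treat.change <- mean(change[data[[treat]] == 1])   # step 2: treatment group
  ctrl.change  <- mean(change[data[[treat]] == 0])   # step 3: control group
  treat.change - ctrl.change                         # step 4: difference of means
}

# Illustration only: six rows taken from the head(boston) output.
boston_head <- data.frame(
  treatment     = c(1, 1, 1, 1, 0, 1),
  numberim.pre  = c(5, 1, 1, 3, 3, 3),
  numberim.post = c(4, 2, 3, 3, 2, 3)
)

ate_head <- ate_change(boston_head, "numberim.pre", "numberim.post")
ate_head
```

With the full data, a call such as `ate_change(boston, "numberim.pre", "numberim.post")` would follow the same four steps as the code above.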
Now calculate the ATE of the experiment again, but instead of the number of immigrants, use as the dependent variable the policy opinion on allowing the children of undocumented immigrants to remain in the country, from Allow (1) to Not Allow (5). The relevant variables are remain.pre and remain.post.
Follow the steps:
# 1. Calculate the change in attitudes (post-pre)
boston$change <-
boston$remain.post - boston$remain.pre
# 2. Compute the average change in attitude for the treatment group (`treatment == 1`)
treat.change <-
mean(boston$change[boston$treatment == 1])
treat.change
## [1] 0.2156863
# 3. Compute the average change in attitude for the control group (same, but now `treatment == 0`)
ctrl.change <-
mean(boston$change[boston$treatment == 0])
ctrl.change
## [1] -0.109375
# 4. Finally, compute the difference of the mean change between treatment group and control group
treat.change - ctrl.change
## [1] 0.3250613
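As a cross-check that ties this back to lm(): when the explanatory variable is a 0/1 indicator, the slope of a linear regression equals the difference in means between the two groups. The sketch below shows this on the six rows printed by head(boston) (hypothetical object name `boston_head`), so the value is illustrative and differs from the 0.33 computed on the full data.

```r
# Illustration only: six rows taken from the head(boston) output.
boston_head <- data.frame(
  treatment   = c(1, 1, 1, 1, 0, 1),
  remain.pre  = c(2, 5, 1, 4, 1, 5),
  remain.post = c(3, 5, 1, 4, 1, 5)
)

# Change in attitudes (post - pre).
boston_head$change <- boston_head$remain.post - boston_head$remain.pre

# Difference of means, computed as in the steps above.
diff_means <- mean(boston_head$change[boston_head$treatment == 1]) -
  mean(boston_head$change[boston_head$treatment == 0])

# Regression of the change on the treatment indicator.
fit <- lm(change ~ treatment, data = boston_head)
slope <- unname(coef(fit)["treatment"])

# The two numbers agree (up to floating-point rounding).
diff_means
slope
```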