Today’s Data

Load the data set boston.csv

boston <- read.csv("boston.csv")

1. R Cheat sheet


Set the working directory

setwd("~/Desktop/DSS") # example of setwd() for Mac
setwd("C:/user/Desktop/DSS") # example for Windows

Operators

<- Assignment operator used to create new objects
# used to comment code; R ignores everything that follows it
$ used to access an element inside an object, such as a variable inside a dataframe (data$variable)
== evaluates whether two values are equal to each other. The output is a logical value: TRUE or FALSE (3==3). It is used in the logical test to evaluate whether the observations in a variable are equal to a particular value (data$male==1).
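[] Square brackets can be used to extract a selection of observations from a variable. We specify the selection criterion inside the square brackets (mean(boston$age[boston$male == 1])).
!is.na() returns TRUE for observations that are not missing (NA). Use it inside [] to drop missing values before computing a statistic (mean(data$variable[!is.na(data$variable)])). Many functions also accept na.rm = TRUE, which ignores NAs (mean(data$variable, na.rm = TRUE)).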
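A minimal sketch of these operators on a small, made-up data frame. The name `toy` and its values are hypothetical and are not taken from `boston.csv`.

```r
# Hypothetical toy data: ages with one missing value, and a 0/1 indicator
toy <- data.frame(age  = c(30, 45, NA, 60),
                  male = c(1, 0, 1, 1))

toy$age                 # $ extracts one variable (a vector)
3 == 3                  # TRUE
toy$male == 1           # logical test on every observation: TRUE FALSE TRUE TRUE

# [] keeps only the observations that satisfy the condition inside the brackets
male_ages <- toy$age[toy$male == 1]    # 30 NA 60

mean(male_ages)                        # NA, because one value is missing
mean(male_ages[!is.na(male_ages)])     # 45, NAs dropped with !is.na()
mean(male_ages, na.rm = TRUE)          # 45, same result using na.rm
```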

Functions - Explore the data

Loading data:
- read.csv("filename.csv") reads CSV files
Exploring data:
- head() or tail() shows the first/last observations in a dataframe
- dim() provides the dimensions of a dataframe
- View() opens a new tab with contents of dataset
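A sketch of the loading and exploring functions that runs without `boston.csv`: it writes a tiny, made-up CSV to a temporary file and reads it back. The file contents and the name `dat` are hypothetical.

```r
# Write a tiny, hypothetical CSV to a temporary file so the example runs anywhere
path <- tempfile(fileext = ".csv")
writeLines(c("age,male,treatment",
             "31,0,1",
             "34,0,1",
             "63,1,1",
             "45,1,0"), path)

dat <- read.csv(path)   # read the CSV into a data frame

head(dat, n = 2)        # first 2 rows (head() shows 6 by default)
tail(dat, n = 2)        # last 2 rows
dim(dat)                # number of rows and columns: 4 3
# View(dat)             # opens a spreadsheet-style viewer; commented out for scripts
```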

Modify your data

- ifelse() creates a new variable based on a condition (data$latinx <- ifelse(data$ethnic == "latino", 1, 0)).
- Subset the data, keeping all the variables, based on a condition such as state == "Florida" (subset_data <- data[data$state == "Florida", ]).
- Subset the data when you want to impose a condition and also keep only a few variables (subset_data <- data[data$state == "Florida", c("var1", "var2", "var3")]).
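The same three operations on a made-up data frame. The column names mirror the cheat sheet (`ethnic`, `state`, `var1` to `var3`), but the values are hypothetical.

```r
# Hypothetical data frame with the column names used in the cheat sheet
data <- data.frame(ethnic = c("latino", "white", "latino", "black"),
                   state  = c("Florida", "Texas", "Florida", "Ohio"),
                   var1   = 1:4,
                   var2   = c(10, 20, 30, 40),
                   var3   = c("a", "b", "c", "d"))

# ifelse(condition, value if TRUE, value if FALSE)
data$latinx <- ifelse(data$ethnic == "latino", 1, 0)
data$latinx                  # 1 0 1 0

# Rows where state is Florida, all columns (the space after the comma means "all")
subset_all <- data[data$state == "Florida", ]

# Rows where state is Florida, only three columns
subset_few <- data[data$state == "Florida", c("var1", "var2", "var3")]
dim(subset_few)              # 2 3
```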

Functions - descriptives and visualizations

Measures of centrality
- mean() calculates the mean of a variable (mean(data$variable))
- median() calculates the median of a variable (median(data$variable))
Measures of dispersion
- sd() calculates the standard deviation of a variable (sd(data$variable))
- var() calculates the variance of a variable (var(data$variable))
Tables
- table() creates the frequency table of a variable (table(data$variable)).
- prop.table() converts a frequency table into a table of proportions (prop.table(table(data$variable))).
Plots
- hist() creates the histogram of a variable (hist(data$variable))
- plot() creates the scatter plot of two variables (plot(data$variable1, data$variable2))
Relationship between variables
- cor() calculates the correlation coefficient between two variables (cor(data$variable1, data$variable2))
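A quick check of these functions on small, made-up vectors whose results can be verified by hand (they are not from the lab data). The plotting calls are left as comments so the snippet prints only numbers.

```r
# Hypothetical vectors, chosen so the results are easy to check by hand
x <- c(2, 4, 4, 6, 9)
y <- c(1, 3, 3, 5, 8)      # y = x - 1, a perfect positive linear relationship

mean(x)                    # 5
median(x)                  # 4
var(x)                     # 7 (sample variance: divides by n - 1)
sd(x)                      # sqrt(7), about 2.65

freq <- table(x)           # counts: 2 -> 1, 4 -> 2, 6 -> 1, 9 -> 1
prop.table(freq)           # proportions: 0.2 0.4 0.2 0.2

cor(x, y)                  # 1
# hist(x)                  # histogram of x
# plot(x, y)               # scatter plot of x and y
```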

Functions - Regression Analysis

- lm() fits a linear model. It requires a formula of the form Y ~ X, where Y is the outcome variable and X is the explanatory variable: lm(data$y_var ~ data$x_var) or lm(y_var ~ x_var, data = data).
- summary(lm()) shows a summary of the linear model (coefficients, standard errors, R-squared).
- abline() adds a straight line to an existing graph. To add the fitted line, pass the object that stores the output of lm() as the main argument: fit <- lm(Y ~ X); abline(fit)
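A sketch of the regression workflow on made-up data (the names `reg_data`, `x`, and `y` are hypothetical). The values were built as y = 2 + 3x plus small residuals that sum to zero and are uncorrelated with x, so the estimated intercept and slope should be 2 and 3.

```r
# Hypothetical data: y = 2 + 3 * x plus residuals (1, -1, 0, -1, 1)
reg_data <- data.frame(x = c(1, 2, 3, 4, 5),
                       y = c(6, 7, 11, 13, 18))

fit <- lm(y ~ x, data = reg_data)   # same model as lm(reg_data$y ~ reg_data$x)
coef(fit)                           # intercept 2, slope 3
summary(fit)                        # coefficient table, R-squared, etc.

plot(reg_data$x, reg_data$y)        # scatter plot of the two variables
abline(fit)                         # add the fitted regression line
```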

2. Case: Causal effect

Field experiment assessing the extent to which the views of individuals living in suburban communities around Boston, Massachusetts, were affected by exposure to demographic change.
Enos, R. D. 2014. “Causal Effect of Intergroup Contact on Exclusionary Attitudes.”
The experiment
- Subjects: individuals riding the commuter rail line, who were overwhelmingly white.
  - Participants were asked questions related to immigration policy both before the experiment started and after the experiment had ended.
- Treatment: presence of two native Spanish-speaking ‘confederates’ on the platform each morning prior to the train’s arrival.
  - Why? To simulate demographic change in these communities.
  - Administered for 10 days.
- Control group: no such confederates were present on the platform.

The names and descriptions of variables in the data set

Name	Description
`age`	Age of individual at time of experiment
`male`	Sex of individual, male (1) or female (0)
`income`	Income group in dollars (not exact income)
`white`	Indicator variable for whether individual identifies as white (1) or not (0)
`college`	Indicator variable for whether individual attended college (1) or not (0)
`usborn`	Indicator variable for whether individual is born in the US (1) or not (0)
`treatment`	Indicator variable for whether an individual was treated (1) or not (0)
`ideology`	Self-placement on ideology spectrum from Very Liberal (1) through Moderate (3) to Very Conservative (5)
`numberim.pre`	Policy opinion on question about increasing the number of immigrants allowed in the country from Increased (1) to Decreased (5)
`numberim.post`	Same question as above, asked later
`remain.pre`	Policy opinion on question about allowing the children of undocumented immigrants to remain in the country from Allow (1) to Not Allow (5)
`remain.post`	Same question as above, asked later
`english.pre`	Policy opinion on question about passing a law establishing English as the official language from Not Favor (1) to Favor (5)
`english.post`	Same question as above, asked later

Always start by exploring the data

First, let’s get a sense of this data. Use head() to take a quick look at it. What are its dimensions? Calculate the mean of a variable.

head(boston)

##   age male income white college usborn treatment ideology numberim.pre
## 1  31    0 135000     1       1      1         1        3            5
## 2  34    0 105000     1       1      0         1        4            1
## 3  63    1 135000     1       1      1         1        2            1
## 4  45    1 300000     1       1      1         1        4            3
## 5  55    1 135000     1       1      1         0        2            3
## 6  37    0  87500     1       1      1         1        5            3
##   numberim.post remain.pre remain.post english.pre english.post
## 1             4          2           3           4            4
## 2             2          5           5           3            3
## 3             3          1           1           1            1
## 4             3          4           4           4            4
## 5             2          1           1           4            2
## 6             3          5           5           5            5

dim(boston)

## [1] 115  14

Average Treatment Effect

Our goal is to calculate the average treatment effect (ATE) on the change in attitudes about the number of immigrants (numberim.pre and numberim.post).

- Change: post - pre
- Average change of the treatment group
- Average change of the control group
- Difference of means: treatment’s change - control’s change
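The steps above can be written as one estimator. In this sketch the notation is an assumption: $Y_i$ is respondent $i$'s answer, $T_i$ is the `treatment` indicator, and $n_1$ and $n_0$ are the sizes of the treatment and control groups.

```latex
\Delta_i = Y_i^{\text{post}} - Y_i^{\text{pre}}, \qquad
\widehat{\text{ATE}} = \frac{1}{n_1} \sum_{i:\, T_i = 1} \Delta_i \;-\; \frac{1}{n_0} \sum_{i:\, T_i = 0} \Delta_i
```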
Calculate the change in attitudes (post-pre)

boston$change <- 
  boston$numberim.post - boston$numberim.pre

Compute change in attitude for the treatment group (treatment == 1)

treat.change <- 
  mean(boston$change[boston$treatment == 1])
treat.change

## [1] 0.1176471

On average, the treatment group became more exclusionary (a positive change means moving toward "Decreased" on the 1 to 5 scale).

Compute change in attitude for the control group (same, but with treatment == 0)

ctrl.change <- 
  mean(boston$change[boston$treatment == 0])
ctrl.change

## [1] -0.1875

Finally, compute the difference of the mean change between treatment group and control group

treat.change - ctrl.change

## [1] 0.3051471

The change in the treatment group was more exclusionary than the change in the control group by about 0.31 points on the 5-point scale. Because this is an experiment, we can interpret this difference causally: exposure to simulated demographic change caused this increase in exclusionary attitudes.
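The same difference in means can also be obtained with lm() from the cheat sheet: regressing the change on a 0/1 treatment indicator gives a slope equal to the difference between the group means. A sketch on made-up data (the name `toy` and its values are hypothetical, not the boston data):

```r
# Hypothetical pre/post scores for six respondents (not the boston data)
toy <- data.frame(treatment = c(1, 1, 1, 0, 0, 0),
                  pre       = c(3, 2, 4, 3, 3, 2),
                  post      = c(4, 3, 4, 3, 2, 2))
toy$change <- toy$post - toy$pre                        # 1 1 0 0 -1 0

treat.change <- mean(toy$change[toy$treatment == 1])    # 2/3
ctrl.change  <- mean(toy$change[toy$treatment == 0])    # -1/3
treat.change - ctrl.change                              # 1

# The slope on the 0/1 treatment indicator equals the difference in means
coef(lm(change ~ treatment, data = toy))["treatment"]   # 1
```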

3. Let’s Practice

Calculate the ATE of the experiment, but instead of using the number of immigrants as the dependent variable, use the policy opinion on allowing the children of undocumented immigrants to remain in the country, from Allow (1) to Not Allow (5):

remain.pre
remain.post

Follow the steps:

# 1. Calculate the change in attitudes (post-pre)
boston$change <- 
  boston$remain.post - boston$remain.pre

# 2. Compute change in attitude for the treatment group (`treatment == 1`)

treat.change <- 
  mean(boston$change[boston$treatment == 1])
treat.change

## [1] 0.2156863

# 3. Compute change in attitude for the control group (same, but with `treatment == 0`)
ctrl.change <- 
  mean(boston$change[boston$treatment == 0])
ctrl.change

## [1] -0.109375

# 4. Finally, compute the difference of the mean change between treatment group and control group
treat.change - ctrl.change

## [1] 0.3250613
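Since the steps are identical for every outcome, they can be wrapped in a small function. This is a sketch: `ate_change` and its arguments are hypothetical names, not part of the lab, and the example runs on made-up data so it does not need `boston.csv`.

```r
# Hypothetical helper: ATE on the change (post - pre) for any pair of columns
ate_change <- function(data, pre, post, treat = "treatment") {
  change  <- data[[post]] - data[[pre]]            # change in attitudes
  treated <- data[[treat]] == 1                    # TRUE for treated respondents
  mean(change[treated]) - mean(change[!treated])   # treatment minus control
}

# Made-up data: changes are 1, 1 (treated) and 0, -1 (control)
toy <- data.frame(treatment   = c(1, 1, 0, 0),
                  remain.pre  = c(2, 3, 4, 2),
                  remain.post = c(3, 4, 4, 1))
ate_change(toy, "remain.pre", "remain.post")       # 1 - (-0.5) = 1.5
```

With the lab data loaded, `ate_change(boston, "numberim.pre", "numberim.post")` and `ate_change(boston, "remain.pre", "remain.post")` perform the same computation as the step-by-step code above.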

Lab 3