Please use this R Markdown template to report your code, output, and written answers in a single document. Remember that you may not collaborate with others on exams. Please ask clarifying questions on CampusWire. Turn in your completed exam by uploading the compiled html or pdf file to Brightspace by 9:30 AM EST on Tuesday, March 1. Make sure to comment your code (using the # key). Report results in the correct units of measurement. Do not report more than two digits to the right of the decimal point.

Name: Meghla Srabon

TA: Alejandro

setwd("/Users/meghlasrabon/Desktop/POL-UA-850 Files")

Question 1: Causality

You want to estimate the causal effect of X (treatment) on Y (outcome) for the problems in the table below.

| Case | X | Y |
|------|---|---|
| A | University policy requiring Covid vax (y/n) | University Covid infection rate |
| B | Attending POL 850 lectures (y/n) | Final class grade |
| C | Increase in state Medicaid coverage (y/n) | State mortality rate |

Question 1.1 (6 points)

For each case in the table above, identify and define the two potential outcomes (\(Y_i(0)\) and \(Y_i(1)\)). Make sure to define the unit of observation (\(i\)).

Answer 1.1

For Case A, the unit of observation (\(i\)) is a university. \(Y_i(1)\) is university \(i\)’s COVID infection rate if it has a policy requiring COVID-19 vaccination, and \(Y_i(0)\) is the same university’s COVID infection rate if it has no such policy.

For Case B, the unit of observation is a student. \(Y_i(1)\) is student \(i\)’s final class grade if the student attends POL 850 lectures, and \(Y_i(0)\) is the same student’s final class grade if the student does not attend.

For Case C, the unit of observation is a state. \(Y_i(1)\) is state \(i\)’s mortality rate if the state increases Medicaid coverage, and \(Y_i(0)\) is the same state’s mortality rate if it does not increase Medicaid coverage.

Question 1.2 (5 points)

What is a confounder? Provide a brief definition of a confounding variable.

Answer 1.2

A confounder is a pre-treatment characteristic that affects both the treatment (X) and the outcome (Y) in a study. Confounders can distort the estimated causal effect of the treatment. For example, in the zero-sugar Coke versus regular Coke example we covered in class, individuals’ lifestyles and health habits differ before the study begins. Lifestyle is a confounder because it affects both which drink a person chooses (the treatment) and the resulting outcome. If people who drink zero-sugar Coke are more athletic and live healthier than those who drink regular Coke, the estimated treatment effect may be inaccurate.

Question 1.3 (6 Points)

For each case in the table above, identify a potential confounder, and explain how exactly that potential confounder might interfere with estimating causal effects. (Be specific about the consequences of the presence of the confounder.)

Answer 1.3

In Case A, a potential confounder is the presence of state-wide COVID mandates, since health regulation tends to be stricter in Northern states than in the South, and university policy is bound by state law. For example, a university in a historically Republican state such as Alabama may face lenient or no state vaccination requirements, which may go along with a general disregard for public safety among the student population and therefore a higher infection rate. Because state mandates affect both whether a university requires vaccination and its infection rate, comparing universities with and without the policy would mix the effect of the policy with the effect of the state environment. This makes it hard to tell whether the outcome is driven by the treatment or by preexisting factors.

In Case B, a potential confounder is the student’s prior experience in R. A student who has taken this course before or has a strong background in R is less likely to attend lectures, but is also likely to earn a higher final grade because of that skill level. Non-attendees would then score higher than they would without that experience, so a simple comparison of attendees and non-attendees would understate the effect of attending lectures.

In Case C, a potential confounder is the share of a state’s population with low socio-economic status. States with a lower average income depend more heavily on Medicaid and so may be more likely to increase coverage. These states are also likely to have higher mortality rates than states with a higher average income, whose residents can afford doctor’s visits without Medicaid coverage. A simple comparison could therefore make increased coverage look less effective at reducing mortality than it really is.
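To make the consequence of a confounder concrete, here is a minimal simulation sketch for Case B. Everything in it is an assumption chosen for illustration (the sample size, the attendance probabilities, the 5-point true effect, and the 15-point boost from prior R experience); none of it comes from real class data.

```r
# Hypothetical simulation: prior R experience confounds attendance and grades.
set.seed(850)                        # arbitrary seed for reproducibility
n <- 10000                           # assumed number of simulated students
experience <- rbinom(n, 1, 0.5)      # 1 = prior R experience (the confounder)

# Assumption: experienced students attend lectures less often (30% vs 80%).
attend <- rbinom(n, 1, ifelse(experience == 1, 0.3, 0.8))

# Assumption: attending adds 5 points and experience adds 15 points.
true_effect <- 5
grade <- 70 + true_effect * attend + 15 * experience + rnorm(n, sd = 5)

# Naive comparison of attendees and non-attendees, ignoring the confounder.
naive <- mean(grade[attend == 1]) - mean(grade[attend == 0])
naive
```

Under these assumptions the naive difference falls well below the true effect of 5, because the non-attendees are disproportionately experienced students.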

Question 1.4 (5 Points)

What is the best way to avoid confounding when studying causal effects? Explain why that strategy helps to avoid confounding using one of the cases above to illustrate your explanation.

Answer 1.4

The best way to avoid confounding when studying causal effects is to randomize treatment assignment, which breaks the link between pre-treatment characteristics and the treatment. For example, in Case B, instead of comparing students who chose to attend lectures with those who chose not to, we could randomly select names from the class roster and assign those students to attend, with the rest assigned not to attend. Because a random draw decides who attends, characteristics such as prior experience in R are balanced across the two groups on average, so a difference in final grades can be attributed to attendance. If we relied on students’ own choices instead, the estimate could be distorted by those pre-existing differences.
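The balancing property of randomization can be sketched with a small simulation for Case B. All quantities here are assumptions for illustration (a coin-flip assignment, a 5-point true effect of attendance, and a 15-point boost from a hypothetical prior-experience variable); this is not real class data.

```r
# Hypothetical simulation: random assignment balances a pre-treatment trait.
set.seed(850)                        # arbitrary seed for reproducibility
n <- 10000                           # assumed number of simulated students
experience <- rbinom(n, 1, 0.5)      # 1 = prior R experience (pre-treatment)

# Random assignment: a coin flip that ignores experience entirely.
attend <- rbinom(n, 1, 0.5)

# Assumption: attending adds 5 points and experience adds 15 points.
grade <- 70 + 5 * attend + 15 * experience + rnorm(n, sd = 5)

# Balance check: share of experienced students in each group should be similar.
balance <- mean(experience[attend == 1]) - mean(experience[attend == 0])

# Difference in means now recovers something close to the assumed effect of 5.
estimate <- mean(grade[attend == 1]) - mean(grade[attend == 0])
c(balance = balance, estimate = estimate)
```

Because assignment does not depend on experience, the two groups have nearly the same share of experienced students, and the difference in mean grades lands close to the assumed effect.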

Question 2

This exercise is based on the following article:

Green, Donald P., Tiffany C. Davenport, and Kolby Hanson (2019). “Are There Long-Term Effects of the Vietnam Draft on Political Attitudes or Behavior? Apparently Not.” Journal of Experimental Political Science. 6(2), 71-80.

This article examines the long-term effects of the Vietnam draft lottery on the political attitudes and behavior of the men who were eligible for the draft during the period of 1969–1971. Based on birth dates, the draft lottery was used to randomly select men who turned 19 prior to 1969, 1970 and 1971 to serve in the US army. While many of those selected did not eventually serve, the authors use a survey and publicly available information, such as voter registration, voter records, and partisan membership of eligible draftees to study whether being assigned to the draft has any long-term political effects.

The dataset survey.csv contains the following variables.

| Name | Description |
|------|-------------|
| draft | Whether a respondent was drafted (1) or not (0) |
| year | Birth year |
| ideology | Ideology score that takes values from 1 to 5 where 1 is “very conservative,” 3 is “moderate,” and 5 is “very liberal.” |
| state | Respondent’s state of residence |

Question 2.1 (10 points)

Load the dataset. Use the dim() and summary() functions to explore the data: how many observations does the dataset contain? Is there a variable with missing values (NA) and if so, which one? What birth years are represented in the dataset?

The dataset contains 675 observations and four variables. One variable has missing values: ideology (survey$ideology), with 15 NAs. The birth years represented in the dataset are 1950, 1951, and 1952.

setwd("/Users/meghlasrabon/Desktop/POL-UA-850 Files")

survey <- read.csv("survey.csv")

View(survey)

dim(survey)
## [1] 675   4
summary(survey)
##      draft            year         ideology        state          
##  Min.   :0.000   Min.   :1950   Min.   :1.000   Length:675        
##  1st Qu.:0.000   1st Qu.:1950   1st Qu.:2.000   Class :character  
##  Median :1.000   Median :1951   Median :3.000   Mode  :character  
##  Mean   :0.523   Mean   :1951   Mean   :3.047                     
##  3rd Qu.:1.000   3rd Qu.:1952   3rd Qu.:4.000                     
##  Max.   :1.000   Max.   :1952   Max.   :5.000                     
##                                 NA's   :15
summary(survey$year)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1950    1950    1951    1951    1952    1952
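As a complementary sketch, the same three questions (number of observations, which variables have missing values, and which birth years appear) can be answered directly rather than read off summary(). The data frame toy below is a made-up stand-in with the same column names as survey.csv; its values are hypothetical and are not the survey data.

```r
# Hypothetical stand-in for the survey data (values invented for illustration).
toy <- data.frame(
  draft    = c(1, 0, 1, 0, 1, 0),
  year     = c(1950, 1950, 1951, 1951, 1952, 1952),
  ideology = c(3, NA, 2, 4, NA, 5),
  state    = c("CO", "OR", "OR", "CO", "OR", "CO")
)

n_obs <- nrow(toy)                       # number of observations
na_counts <- colSums(is.na(toy))         # count of NAs in each column
birth_years <- sort(unique(toy$year))    # distinct birth years

n_obs
na_counts
birth_years
```

Applied to the real data, the same calls would be run on survey instead of toy.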

Question 2.2 (15 points)

Calculate the mean birth year for those who were drafted and those who were not drafted. Briefly report and interpret the results. Why do we care about mean birth year across these two groups?

Answer 2.2

setwd("/Users/meghlasrabon/Desktop/POL-UA-850 Files")

# Split the sample into drafted (treatment) and not drafted (control) respondents
drafted <- subset(survey, draft == 1)
notdrafted <- subset(survey, draft == 0)
mean(drafted$year)
## [1] 1951.088
mean(notdrafted$year)
## [1] 1950.925

The mean birth year is 1951.09 for those who were drafted and 1950.93 for those who were not, a difference of about 0.16 years. On average, those selected in the draft lottery were therefore slightly younger than those who were not, but the gap is very small. We care about this comparison because age is a key determinant of long-term political beliefs (for example, younger generations tend to be more liberal and progressive, while “Boomers” and older generations tend to hold more conservative beliefs), so a large age gap between the groups would make birth year a potential confounder. Since both groups are about the same age, the randomized lottery appears to have balanced this characteristic.
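The same balance check can be written in one step with tapply(), which computes a group mean for each value of draft. This is a sketch on a made-up data frame (toy and its values are hypothetical, not the survey data).

```r
# Hypothetical data: three drafted and three not-drafted respondents.
toy <- data.frame(
  draft = c(1, 1, 1, 0, 0, 0),
  year  = c(1950, 1951, 1952, 1950, 1950, 1951)
)

# Mean birth year within each draft group; names are "0" and "1".
year_means <- tapply(toy$year, toy$draft, mean)

# Difference in mean birth year (drafted minus not drafted).
year_gap <- year_means[["1"]] - year_means[["0"]]

year_means
year_gap
```

A gap close to zero would indicate that the two groups are balanced on birth year.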

Question 2.3 (15 points)

Use a frequency table to identify the states represented in the dataset. Then, create a new variable that is 0 if an individual’s state of residence was Colorado, and 1 if an individual’s state of residence was Oregon (hint: try the ifelse() function). Calculate the mean proportions of individuals whose state of residence was Oregon among those who were not drafted and those who were drafted. Briefly report and interpret the results. Why do we care about state of residence across these two groups?

Answer 2.3

setwd("/Users/meghlasrabon/Desktop/POL-UA-850 Files")

table(survey$state)
## 
##  CO  OR 
## 284 391
freq_table1 <- table(survey$state)
# Indicator for state of residence: 1 = Oregon ("OR"), 0 = Colorado ("CO")
survey$residence <- ifelse(survey$state == "OR", 1, 0)

prop.table(table(drafted$draft==1, drafted$state=="OR"))
##       
##            FALSE      TRUE
##   TRUE 0.4390935 0.5609065

prop.table(table(notdrafted$draft==0, notdrafted$state=="OR"))

The survey covers two states, Colorado (284 respondents) and Oregon (391 respondents). Of the 322 respondents who were not drafted, 193 lived in Oregon, a proportion of 0.60 of the not-drafted group. Of the 353 respondents who were drafted, 198 lived in Oregon, a proportion of 0.56 of the drafted group. The two proportions are similar. Since we’re measuring the effect of the draft on long-term political beliefs, state of residence matters because states differ in political affiliation; if one group were concentrated in one state, residence would be a potential confounder.
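Since the mean of a 0/1 indicator is a proportion, the share of Oregon residents in each draft group can also be computed with ifelse() and tapply(). The sketch below uses a made-up data frame (toy, the oregon column, and all values are hypothetical, not the survey data).

```r
# Hypothetical data: four drafted and four not-drafted respondents.
toy <- data.frame(
  draft = c(1, 1, 1, 1, 0, 0, 0, 0),
  state = c("OR", "OR", "CO", "OR", "CO", "CO", "OR", "CO")
)

# Indicator: 1 if the respondent lives in Oregon, 0 if in Colorado.
toy$oregon <- ifelse(toy$state == "OR", 1, 0)

# Proportion of Oregon residents within each draft group ("0" and "1").
prop_oregon <- tapply(toy$oregon, toy$draft, mean)
prop_oregon
```

Here the names "0" and "1" on the result refer to the not-drafted and drafted groups.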

Question 2.4 (15 points)

Events at an early stage of life can have long-lasting impacts on a person’s political perspectives. Estimate the sample average treatment effect of being drafted on a person’s ideology score, pooling all birth years. Briefly report and interpret the result.

Answer 2.4

setwd("/Users/meghlasrabon/Desktop/POL-UA-850 Files")

mean(drafted$ideology, na.rm=TRUE)
## [1] 3.034884
mean(notdrafted$ideology, na.rm=TRUE)
## [1] 3.060127
ideodiff <- (mean(drafted$ideology, na.rm=TRUE) - mean(notdrafted$ideology, na.rm=TRUE))

To estimate the sample average treatment effect, we subtract the mean outcome under the control condition from the mean outcome under the treatment condition. The estimate is 3.03 - 3.06 = -0.03 points on the five-point ideology scale, which is a very small difference. Since higher scores are more liberal, those who were drafted are very slightly more conservative than those who were not.

One possible explanation is that conservatives highly praise national military service for its value of patriotism, although an effect this small suggests that being drafted had little lasting impact on ideology.
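The difference-in-means calculation can be wrapped in a small helper so that missing values are handled the same way in both groups. This is a sketch: diff_in_means is a hypothetical helper name, and the toy vectors are made up rather than taken from the survey.

```r
# Hypothetical helper: mean outcome among treated minus mean among control,
# dropping missing outcomes in both groups.
diff_in_means <- function(outcome, treated) {
  mean(outcome[treated == 1], na.rm = TRUE) -
    mean(outcome[treated == 0], na.rm = TRUE)
}

# Made-up ideology scores (1 to 5) with one missing value.
ideology <- c(3, NA, 2, 4, 5, 3)
draft    <- c(1, 1, 1, 0, 0, 0)

sate <- diff_in_means(ideology, draft)
sate
```

With the real data, the equivalent call would be diff_in_means(survey$ideology, survey$draft).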

Question 2.5 (15 points)

Even if we assign the same treatment to all individuals in a sample, respondents’ characteristics can change its effect. In this question, we focus on the geographical heterogeneity. Estimate the sample average treatment effect on ideology by state (still pooling all years). Briefly report and interpret the results.

Answer 2.5

setwd("/Users/meghlasrabon/Desktop/POL-UA-850 Files")

Colorado <- subset(survey, survey$state=="CO")
Oregon <- subset(survey, survey$state=="OR")

COdrafted <- subset(Colorado, Colorado$draft==1)
COnotdrafted <- subset(Colorado, Colorado$draft==0)

ORdrafted <- subset(Oregon, Oregon$draft==1)
ORnotdrafted <- subset(Oregon, Oregon$draft==0)

na.omit(COdrafted$ideology)
##   [1] 3 4 5 3 1 1 2 3 4 3 4 4 4 2 3 5 3 5 2 4 5 2 4 4 5 3 5 2 3 1 3 3 3 3 3 2 2
##  [38] 3 2 2 3 2 2 3 4 2 3 4 4 3 4 2 2 3 4 5 4 4 1 3 3 3 3 4 3 3 3 3 3 3 2 2 3 3
##  [75] 2 3 3 4 3 3 3 3 3 2 2 2 2 2 3 4 4 3 2 3 2 4 2 2 3 3 3 3 5 4 3 3 4 3 3 1 4
## [112] 3 1 2 2 2 4 3 3 2 5 5 2 2 2 2 4 3 1 4 1 2 3 4 2 3 4 3 2 2 3 2 2 4 2 2 4 2
## [149] 2 2
## attr(,"na.action")
## [1]  11  13  86 117 137
## attr(,"class")
## [1] "omit"
na.omit(COnotdrafted$ideology)
##   [1] 3 2 1 3 3 2 1 1 3 2 2 2 5 2 3 3 2 1 2 3 3 2 1 4 3 3 2 1 3 4 1 5 3 1 4 3 3
##  [38] 4 2 4 3 2 4 4 5 3 3 2 2 4 4 3 4 3 1 3 3 2 2 4 2 3 1 3 4 4 3 3 4 5 3 3 2 2
##  [75] 2 3 4 2 3 5 4 3 4 3 3 5 3 2 4 3 4 3 2 5 2 3 1 2 4 2 4 4 3 4 3 3 4 4 4 2 3
## [112] 2 2 3 1 3 2 4 5 3 3 2 4 2 2 2 2
## attr(,"na.action")
## [1]  50 113
## attr(,"class")
## [1] "omit"
na.omit(ORnotdrafted$ideology)
##   [1] 1 3 4 3 2 4 3 3 4 3 3 2 3 4 3 2 2 2 3 2 2 3 3 2 3 3 3 2 2 2 5 3 4 2 4 4 3
##  [38] 1 3 4 3 3 5 3 2 5 2 2 3 2 4 5 2 1 3 4 5 3 3 5 5 4 4 5 3 2 4 1 5 5 5 3 5 2
##  [75] 3 1 1 3 4 3 5 5 3 2 3 3 5 2 3 3 4 4 5 3 3 5 4 3 4 3 2 4 3 4 2 3 3 3 3 5 4
## [112] 3 3 3 3 3 5 4 4 3 1 1 1 2 2 2 3 3 5 3 2 2 1 1 5 3 5 5 5 2 3 3 2 2 4 4 3 2
## [149] 3 2 4 4 4 4 4 5 5 3 4 2 2 1 4 4 4 3 3 2 4 4 5 3 3 4 3 3 3 4 3 3 2 3 4 5 3
## [186] 3 3 2 4
## attr(,"na.action")
## [1] 21 39 70 81
## attr(,"class")
## [1] "omit"
na.omit(ORdrafted$ideology)
##   [1] 2 3 2 3 4 2 3 3 1 3 4 4 2 3 2 3 3 2 2 3 5 4 3 4 3 5 3 4 4 3 2 2 3 5 4 4 2
##  [38] 4 4 3 2 2 5 4 2 3 5 3 3 2 4 4 3 1 5 3 4 4 4 4 4 1 5 1 3 4 5 3 3 3 5 1 4 3
##  [75] 2 3 3 3 5 5 3 2 2 3 4 3 1 3 3 3 1 3 2 3 4 3 1 5 3 3 4 3 4 2 3 3 2 2 3 3 4
## [112] 4 2 2 2 3 3 3 2 3 3 1 2 3 3 3 5 4 3 5 5 4 2 3 2 4 3 2 4 2 4 3 2 4 2 2 4 4
## [149] 2 5 4 4 2 4 3 3 3 2 3 5 1 3 4 2 4 5 5 3 4 3 2 3 2 4 3 4 4 3 1 2 5 1 4 5 3
## [186] 2 3 2 4 4 2 3 3 4
## attr(,"na.action")
## [1]  18  78 173 177
## attr(,"class")
## [1] "omit"
mean(COdrafted$ideology, na.rm=TRUE)
## [1] 2.926667
mean(COnotdrafted$ideology, na.rm=TRUE)
## [1] 2.88189
mean(ORdrafted$ideology, na.rm=TRUE)
## [1] 3.118557
mean(ORnotdrafted$ideology, na.rm=TRUE)
## [1] 3.179894
COdiff <- mean(COdrafted$ideology, na.rm=TRUE) - mean(COnotdrafted$ideology, na.rm=TRUE)
ORdiff <- mean(ORdrafted$ideology, na.rm=TRUE) - mean(ORnotdrafted$ideology, na.rm=TRUE)

After splitting by state, respondents in Colorado who were drafted had an average ideology score of 2.93, while those who were not drafted averaged 2.88. The estimated average treatment effect in Colorado is 0.04 points, meaning drafted respondents were very slightly more liberal.

Meanwhile, respondents in Oregon who were drafted averaged 3.12, while those who were not averaged 3.18. The estimated average treatment effect in Oregon is -0.06 points, meaning drafted respondents were very slightly more conservative. The effects point in opposite directions in the two states, but both are small. Overall, respondents in Oregon were more liberal than those in Colorado.
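The by-state estimates can also be computed without creating four separate subsets, by splitting the data on state and applying the same difference in means to each piece. This is a sketch on a made-up data frame (toy and its values are hypothetical, not the survey data).

```r
# Hypothetical data: two states, drafted and not-drafted respondents in each.
toy <- data.frame(
  state    = c("CO", "CO", "CO", "CO", "OR", "OR", "OR", "OR"),
  draft    = c(1, 1, 0, 0, 1, 1, 0, 0),
  ideology = c(3, 2, 2, NA, 3, 4, 5, 4)
)

# Split by state, then take drafted mean minus not-drafted mean in each state.
sate_by_state <- sapply(split(toy, toy$state), function(d) {
  mean(d$ideology[d$draft == 1], na.rm = TRUE) -
    mean(d$ideology[d$draft == 0], na.rm = TRUE)
})
sate_by_state
```

The result is a named vector with one estimate per state, which makes it easy to compare the direction and size of the effect across states.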

Question 2.6 (8 points)

A politician from a country in Asia is planning to use this study to discuss the likely effects of instituting a draft lottery in her country. Is this a valid approach for policy making? Discuss briefly.

Answer 2.6

I don’t believe this politician should rely on this study to institute a draft lottery in her country. The political makeup of most countries in Asia differs drastically from that of US institutions, so the results may not carry over. She would need to gather more evidence before implementing a lottery, because this study has a relatively small sample (675 respondents) drawn from only two states, Colorado and Oregon.

This study also disregards many other factors, such as citizens’ health status and education and the nation’s expenditures, which are important to consider before instituting a random draft lottery. The scope of the study is too narrow to be used in policy-making. While it can be used to gauge the effects of a draft lottery on political ideology, I believe it is not helpful for assessing the likely effects of instituting a lottery in other nations. When a world leader is facing an impending crisis, what citizens will think of it in due time is not the crux of the issue. Other crucial factors, such as military funding, warfare training measures, and the long-term economic and social effects on the country, must be considered, and this study does not cover them.