Submit your HTML output and your .Rmd file to Gauchospace by the deadline
Reminders
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
Please also remember that you will want to use the console to “try out” code to get it working. Once you get it working, copy the code that worked (not the results) over into a code chunk in your rmd. Remember that the code within your rmd file has to be self-contained and include all the steps – your rmd file will not “remember” what you did on your own in the console. When you click knit, it can only execute the code that was present in the rmd. Do not copy the results from your console into your RMD file. In addition, do not include large amounts of output in your writeup (i.e. don’t print full datasets to the screen).
Include both the code to get your answer and your answer in words.
It is best to work with small amounts of code at a time: get some code working, copy it into the rmd as a code chunk, write your text answer (outside the code chunk) if needed, and check that the file will still knit properly. Do not proceed to answer more questions until you get the first bit working. If you knit every time you try to write some new code, you’ll know where the error is (in the last thing you did!). This will save you huge headaches.
Although the questions break up each task for you into parts, remember that you might need to put a bunch of code together into a single chunk to make it work. For example, if you create a density plot in one part of a question, and want to add the mean value to it as a line in another part, you need these two commands to follow one another in the same chunk of code.
Some tips: Start early, work with friends in the class, use the discussion forum, come to class and section, go to office hours if you need to, read the textbook and other readings – do all these things and you’ll succeed! Good luck.
Read the article, “Social Pressure and Voter Turnout: Evidence from a Large-Scale Field Experiment” by Gerber, Green & Larimer, posted on Gauchospace this week.
(1a) The researchers have a main finding. For this main finding, what is their independent variable? What is their dependent variable? For these variables, state the researchers’ null and alternative hypotheses. Given this is an experiment, what’s another term for our independent variable in this context?
The main finding of this paper is that citizens are more likely to vote if their voting behavior is made public (an extrinsic motivation). The researchers conclude that voter turnout increases under social pressure. This “if everyone is doing it I should do it too” mentality has a substantial influence on voter turnout and political participation. The independent variable in this study is the type of mailing a household receives (4 different treatment mailings, compared with a control group). The dependent variable is voter turnout, that is, whether the people in each household voted. The null hypothesis is that there is no relationship between social pressure and voter turnout. The alternative hypothesis is that there is a relationship between voter turnout and social influences such as knowing whether other members of your community voted. Because this is an experiment, the independent variable is also called the treatment.
(1b) Examine Table 1. Describe what this table is showing the reader. After looking at this table, what variables would you tell the researchers they need to include in their multivariate regression as control variables?
Table 1 shows the relationship between treatment group assignment and the covariates for the 180,002 households that form the experimental sample. It lists the control group and each treatment group alongside the covariates: previous voting history (primaries and general elections), gender, age, and the number of registered voters in the household. These covariates are the variables the researchers should include as controls in their multivariate regression.
(1c) Examine Table 2. What is the average treatment effect of the “self” treatment? Show your calculation. Would you say this is a causal estimate – why or why not?
The average treatment effect of the “self” treatment is 4.8 percentage points: turnout was 34.5% among people who received the “self” treatment, which implies turnout of 29.7% in the control group (34.5% - 29.7% = 4.8). Since this is a randomized experiment, we can say that the treatment caused the difference in turnout between the “self” treatment group and the control group.
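The calculation behind this estimate can be sketched in R. The 34.5% treated rate and the 4.8-point effect are the figures quoted in the answer above; the control rate is the value those two figures imply, and the variable names are illustrative only.

```r
# Turnout rates expressed as proportions.
turnout_self    <- 0.345  # "self" treatment group, as quoted above
turnout_control <- 0.297  # control group, implied by 34.5% minus 4.8 points

# Average treatment effect = difference in mean outcomes between the groups.
ate_self <- turnout_self - turnout_control

# Report the effect in percentage points.
round(100 * ate_self, 1)
```

Because assignment to the mailing was randomized, this simple difference in means can be read as a causal effect without further adjustment.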
(1d) Examine Table 3. Interpret the first coefficient (top left cell) in terms of its statistical and substantive significance using the substantive meanings of the IV and DV. How does this coefficient change if the authors include covariates in their model? What does this tell us about their research findings? Would you say this is a causal estimate – why or why not?
We expect voter turnout to be 1.8 percentage points higher when a household receives the Civic Duty treatment than when it is in the control group. Although the difference between the control and treated groups is not large, it is statistically significant. These results are robust because the estimate stays the same when covariates are added to the model. Since this is a randomized experiment, and the estimate does not change when covariates are accounted for, we can say the relationship is causal.
One longstanding debate in the study of international relations concerns the question of whether individual political leaders can make a difference. Some emphasize that leaders with different ideologies and personalities can significantly affect the course of a nation. Others argue that political leaders are severely constrained by historical and institutional forces. Did individuals like Hitler, Mao, Roosevelt, and Churchill make a big difference? The difficulty of empirically testing these arguments stems from the fact that the change of leadership is not random and there are many confounding factors to be adjusted for.
In this exercise, we consider a natural experiment where the success or failure of assassination attempts is assumed to be essentially random. Each observation of the data set leaders.Rdata contains information about an assassination attempt. Below are the names and descriptions of variables in this leader assassination data set. The polity variable documents and quantifies the regime types of all countries in the world from 1800. The polity score is a 21-point scale ranging from −10 (hereditary monarchy) to 10 (consolidated democracy). In this data set, the result variable is a 10 category factor variable describing the result of each assassination attempt. These are the other variables:
| Variable Name | Description |
|---|---|
| country | country |
| year | year |
| leadername | the name of leader who was targeted |
| age | the age of targeted leader |
| politybefore | average polity score of country over 3 years prior to the attempt |
| polityafter | average polity score of country over the 3 years after the attempt |
| civilwarbefore | 1 if country is in civil war during 3 years prior to the attempt, 0 otherwise |
| civilwarafter | 1 if country is in civil war during 3 years after the attempt, 0 otherwise |
| interwarbefore | 1 if country is in international war during 3 years prior to the attempt, 0 otherwise |
| interwarafter | 1 if country is in international war during 3 years after the attempt, 0 otherwise |
| result | the result of assassination attempt |
As usual, set your working directory and load the data.
getwd()
## [1] "/Users/alexsefayan/Downloads"
load("corruption.Rdata")
load("leaders.Rdata")
(2a) Descriptive statistics: What is your sample size? How many assassination attempts are recorded in the data? How many countries experience at least one leader assassination attempt? (The unique() function, which returns a set of unique values from the input vector, may be useful here). What is the average number of such attempts (per year) among these countries?
dim(corruption)
## [1] 108 16
dim(leaders)
## [1] 250 11
#There are 250 assassination attempts recorded in the leaders.Rdata file (one row per attempt).
length(leaders$year)
## [1] 250
length(unique(leaders$result))
## [1] 10
#This counts the 10 categories of the result variable, not countries. The number of countries with at least one attempt is length(unique(leaders$country)).
mean(table(leaders$year))
## [1] 2.45098
#2.45 is the average number of attempts per year, taken over the years in which at least one attempt is recorded
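As a minimal sketch of the three counts this question asks for, the example below uses a small made-up data frame (`toy` and its values are hypothetical; only the column names `country` and `year` come from the variable list above).

```r
# Hypothetical toy data: one row per assassination attempt.
toy <- data.frame(
  country = c("A", "A", "B", "C", "C", "C"),
  year    = c(1900, 1905, 1900, 1910, 1910, 1920)
)

n_attempts  <- nrow(toy)                    # number of attempts (rows)
n_countries <- length(unique(toy$country))  # countries with at least one attempt
n_years     <- length(unique(toy$year))     # years with at least one attempt

# Average attempts per year, over the years that appear in the data.
# This equals mean(table(toy$year)).
attempts_per_year <- n_attempts / n_years
```

Counting unique values of `country` (rather than `result`) is what answers the "how many countries" part of the question.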
(2b) Create a new binary variable named success that is equal to 1 if a leader dies from the attack and to 0 if the leader survives. Store this new variable as part of the original data frame. What is the overall success rate of leader assassination? Does the result speak to the validity of the assumption that the success of assassination attempts is randomly determined?
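# Deaths are coded 1 and every other outcome 0. The third label is spelled "afer" exactly as it was
# typed when the output below was produced; if the level in the data is spelled "after", those rows are coded 0.
leaders$success <- ifelse(leaders$result=="dies between a day and a week"|leaders$result=="dies between a week and a month"|leaders$result=="dies within a day afer the attack"|leaders$result=="dies, timing unknown", 1,0)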
prop.table(table(leaders$success))
##
## 0 1
## 0.968 0.032
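The prop.table command displays the share of failed and successful assassination attempts: 96.8% of attempts failed and 3.2% succeeded. The overall rate alone does not tell us whether success is randomly determined. Random assignment does not require a 50-50 split like a coin flip; it requires that success be unrelated to the characteristics of the leader or country before the attempt. A low success rate is consistent with the idea that killing a leader is difficult and depends heavily on chance, but the assumption has to be checked by comparing pre-attempt characteristics across successful and failed attempts.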
(2c) Investigate whether the average polity score over 3 years prior to an assassination attempt differs on average between successful and failed attempts. Briefly interpret the results in light of the validity of the aforementioned assumption.
mean(leaders$politybefore[leaders$success=="0"])
## [1] -1.472452
mean(leaders$politybefore[leaders$success=="1"])
## [1] -2.916667
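Countries where the attempt succeeded had a lower average polity score over the 3 years before the attempt (-2.92) than countries where it failed (-1.47), a difference of about 1.4 points on the 21-point scale. If success were truly random, we would expect these averages to be similar. The gap is small relative to the scale and we have not tested whether it is statistically distinguishable from zero, so it is at most weak evidence against the randomness assumption.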
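A difference in group means can be checked formally with a two-sample t-test. The sketch below uses a made-up data frame (`toy` and all of its values are hypothetical; only the column names mirror the leaders data) and base R's `t.test()`.

```r
# Hypothetical toy data; values are made up for illustration only.
toy <- data.frame(
  success      = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1),
  politybefore = c(-7, -3, 2, 5, -6, 1, -8, -4, 3, -2)
)

# Group means of the prior polity score, by outcome of the attempt.
means <- tapply(toy$politybefore, toy$success, mean)
diff_means <- means["1"] - means["0"]

# Welch two-sample t-test of the difference in means.
tt <- t.test(politybefore ~ success, data = toy)
tt$p.value
```

A large p-value would mean the data cannot distinguish the two groups' prior polity scores, which is what the random-assignment assumption predicts.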
(2d) Repeat the same analysis as in the previous question, but this time using the country’s experience of civil and international war. Create a new binary variable in the data frame called warbefore. Code the variable such that it is equal to 1 if a country is in either civil or international war during the 3 years prior to an assassination attempt. Provide a brief interpretation of the result in terms of the natural experiment’s assumption of random assignment.
leaders$warbefore <- ifelse(leaders$interwarbefore=="1"|leaders$civilwarbefore=="1",1,0)
mean(leaders$warbefore[leaders$success=="1"])
## [1] 0.375
mean(leaders$warbefore[leaders$success=="0"])
## [1] 0.3677686
The share of attempts preceded by civil or international war is almost identical for successful attempts (37.5%) and failed attempts (36.8%). Prior war experience therefore does not appear to predict whether an attempt succeeds, which is consistent with the natural experiment’s assumption of random assignment.
(2e) You want to know what happens after these assassinations. Does successful leader assassination cause democratization? Does successful leader assassination cause international wars? Answer these questions by running regressions that control for past experience with these variables. Interpret the results and state your assumptions vis-a-vis causation.
model <- lm(leaders$polityafter~leaders$politybefore + leaders$success, data = leaders)
summary(model)
##
## Call:
## lm(formula = leaders$polityafter ~ leaders$politybefore + leaders$success,
## data = leaders)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.343 -1.095 -0.124 1.657 13.435
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.36168 0.24182 -1.496 0.136
## leaders$politybefore 0.83809 0.03611 23.210 <2e-16 ***
## leaders$success -0.48555 1.31976 -0.368 0.713
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.67 on 247 degrees of freedom
## Multiple R-squared: 0.6863, Adjusted R-squared: 0.6838
## F-statistic: 270.2 on 2 and 247 DF, p-value: < 2.2e-16
model1 <- lm(leaders$interwarafter~leaders$interwarbefore + leaders$success, data = leaders)
summary(model1)
##
## Call:
## lm(formula = leaders$interwarafter ~ leaders$interwarbefore +
## leaders$success, data = leaders)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.3421 -0.1046 -0.1046 -0.1046 0.8954
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.10460 0.02448 4.273 2.75e-05 ***
## leaders$interwarbefore 0.23748 0.05584 4.253 3.00e-05 ***
## leaders$success -0.03897 0.12396 -0.314 0.754
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3448 on 247 degrees of freedom
## Multiple R-squared: 0.06836, Adjusted R-squared: 0.06082
## F-statistic: 9.062 on 2 and 247 DF, p-value: 0.0001593
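Model: Holding assassination success constant, we expect a 0.838 increase in the polity score after the attempt for each one-point increase in the polity score before the attempt, and this relationship is statistically significant (p < 0.001). We expect a 0.486 decrease in the polity score after the attempt when an assassination attempt is successful, but this estimate is not statistically significant (p = 0.713), so we cannot conclude that successful assassination causes democratization.
Model 1: Controlling for successful assassination attempts, countries that were in an international war before the attempt are 23.7 percentage points more likely to be in an international war afterward (p < 0.001). We expect a 3.9 percentage point decrease in the probability of international war after a successful assassination attempt, but this estimate is not statistically significant (p = 0.754), so we cannot conclude that successful assassination causes international wars. Reading either success coefficient causally rests on the assumption that, once prior conditions are controlled for, the success of an attempt is as good as randomly assigned.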
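When reading numbers off a regression, it helps to pull them directly from the fitted model rather than retyping them. The sketch below uses a made-up data frame (`toy` and its values are hypothetical; the column names mirror the leaders data) and refers to columns by name in the formula, which is the usual style when `data =` is supplied.

```r
# Hypothetical toy data; the column names mirror the leaders data set.
toy <- data.frame(
  politybefore = c(-8, -5, -2, 0, 3, 6, 9, -7, 4, 1),
  success      = c(0, 1, 0, 0, 1, 0, 0, 1, 0, 0),
  polityafter  = c(-7, -6, -1, 1, 2, 7, 9, -8, 5, 0)
)

# With data = toy, columns are referenced by name in the formula.
fit <- lm(polityafter ~ politybefore + success, data = toy)

# Coefficient table: estimate, standard error, t value, p-value.
coef_table <- summary(fit)$coefficients

# Estimate and p-value for the success indicator.
coef_table["success", c("Estimate", "Pr(>|t|)")]
```

Writing the formula this way also gives shorter coefficient labels in the `summary()` output than the `leaders$...` form does.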
(2f) You want to know whether how old the person was when the assassination attempt occurred mattered in terms of the likelihood of the assassination being successful. Using the tools learned in this course, examine whether this is the case and interpret your results. What does this say about your assumption of random assignment?
model2 <- lm(leaders$success~leaders$age, data = leaders)
summary(model2)
##
## Call:
## lm(formula = leaders$success ~ leaders$age, data = leaders)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.04302 -0.03464 -0.03216 -0.02906 0.97094
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0485965 0.0510111 0.953 0.342
## leaders$age -0.0003101 0.0009299 -0.333 0.739
##
## Residual standard error: 0.1767 on 248 degrees of freedom
## Multiple R-squared: 0.0004481, Adjusted R-squared: -0.003582
## F-statistic: 0.1112 on 1 and 248 DF, p-value: 0.7391
We expect a 0.00031 decrease in the probability of a successful assassination attempt for each additional year of the leader’s age, or about 0.03 percentage points per year. This coefficient is not statistically significant (p = 0.739), and age explains almost none of the variation in success (R-squared = 0.0004). We therefore find no evidence that a leader’s age predicts whether an attempt succeeds, which is consistent with the assumption that success is as good as randomly assigned.
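Because `success` is a 0/1 outcome, the regression above is a linear probability model. A logistic regression is a common alternative that keeps predicted probabilities between 0 and 1. The sketch below fits both to a made-up data frame (`toy` and its values are hypothetical) and is not the method used in the analysis above.

```r
# Hypothetical toy data; values are made up for illustration only.
toy <- data.frame(
  age     = c(35, 42, 48, 51, 55, 58, 61, 64, 67, 70, 45, 53),
  success = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1)
)

# Linear probability model: the slope is the change in P(success) per year of age.
lpm <- lm(success ~ age, data = toy)

# Logistic regression: the slope is the change in the log-odds of success per year.
logit <- glm(success ~ age, data = toy, family = binomial)

# Predicted probabilities from the logistic model are bounded by 0 and 1.
p_hat <- predict(logit, type = "response")
```

With a predictor as weak as age appears to be here, both models would usually lead to the same substantive conclusion.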
In the paper, “The Causes of Corruption: A Cross-National Study,” Daniel Treisman examines possible determinants of perceived corruption. The data he uses (saved as a file named “corruption.Rdata”) is available on the course website. Below is the list of the variables.
| Variable Name | Description |
|---|---|
| TI98 | Transparency International’s annual index of “perceived corruption” for 1998, ranging from 0 (least corrupt) to 10 (most corrupt). |
| commonlaw | 1 if common law legal system, 0 if civil law legal system. |
| britcolony | 1 if UK or former British colony, 0 otherwise. |
| noncolony | 1 if never a colony, 0 otherwise. |
| pctprot | protestants as a % of total population. |
| elf | ethnolinguistic fractionalization index, measuring the probability as of 1960 that two randomly selected people from the given country will not belong to the same ethnolinguistic group (simply put, higher scores mean higher levels of fragmentation). |
| FMMexports93 | the proportion of exports comprising fuels, metals, and minerals in 1993, from the World Bank’s World Development Reports. |
As usual, set your working directory and load the data.
(3a) Descriptive statistics: Explore the perceived corruption variable. What kind of variable is this? What is this variable’s maximum and minimum? What is the mean level of perceived corruption? What is the standard deviation? Create a density plot to show how this variable is distributed and substantively interpret what this plot tells you about how corruption is distributed across countries. (Remember, when you have missing data you often need to include “na.rm=T” in your code).
min(corruption$TI98, na.rm = T)
## [1] 0
max(corruption$TI98, na.rm = T)
## [1] 8.6
mean(corruption$TI98, na.rm = T)
## [1] 5.112941
sd(corruption$TI98, na.rm = T)
## [1] 2.402766
plot(density(corruption$TI98, na.rm = T), main = "Corruption")
abline(v=mean(corruption$TI98, na.rm = T), lty=2, lwd=2, col = "blue" )
abline(v=median(corruption$TI98, na.rm = T), lty=2, lwd=2, col = "red" )
# Both reference lines are drawn dashed (lty = 2), so the legend uses the same line type.
legend("topright", legend = c("mean", "median"), lty = 2, lwd = 2, col = c("blue", "red"))
(3b) You want to understand what predicts perceived corruption across countries. Run a regression to predict corruption, using commonlaw, noncolony, the % of the population that’s protestant, ethnolinguistic fractionalization, and the export data you have. (Remember, variables are case sensitive in R.) Interpret your results in terms of statistical significance. Which factors seem to be associated with increased corruption? What is your model R-squared? Why does your degrees of freedom drop after running your model?
model3 <- lm(corruption$TI98 ~ corruption$commonlaw + corruption$noncolony + corruption$pctprot+ corruption$elf + corruption$FMMexports93, data = corruption)
summary(model3)
##
## Call:
## lm(formula = corruption$TI98 ~ corruption$commonlaw + corruption$noncolony +
## corruption$pctprot + corruption$elf + corruption$FMMexports93,
## data = corruption)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9217 -1.1433 0.1914 1.1566 4.6571
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.11419 0.46600 10.975 8.84e-16 ***
## corruption$commonlaw -1.46659 0.53685 -2.732 0.00833 **
## corruption$noncolony -1.31345 0.61489 -2.136 0.03690 *
## corruption$pctprot -0.04419 0.01035 -4.269 7.37e-05 ***
## corruption$elf 0.02441 0.00863 2.829 0.00641 **
## corruption$FMMexports93 0.01883 0.01141 1.650 0.10434
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.768 on 58 degrees of freedom
## (44 observations deleted due to missingness)
## Multiple R-squared: 0.5384, Adjusted R-squared: 0.4986
## F-statistic: 13.53 on 5 and 58 DF, p-value: 9.473e-09
All the independent variables, with the exception of the export data, are statistically significant at the .05 level. The percentage of protestants in the population has the smallest p-value and is negatively associated with corruption, as are common law and never having been a colony. Ethnolinguistic fractionalization and the export data are positively associated with corruption, although the export coefficient is not statistically significant (p = 0.104). The R-squared of .5384 means that 53.84% of the variance in corruption is explained by the independent variables. The degrees of freedom drop because 44 of the 108 observations are deleted due to missing values, leaving 64, and estimating 6 parameters (5 slopes plus the intercept) leaves 64 - 6 = 58 residual degrees of freedom.
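The drop in degrees of freedom follows from how `lm()` handles missing values: by default it drops any row with a missing value in a model variable, and the residual degrees of freedom are the rows used minus the coefficients estimated. The sketch below shows this on a made-up data frame (`toy` and its values are hypothetical) and then repeats the arithmetic with the numbers printed in the output above.

```r
# Hypothetical toy data with one missing value in each of three rows.
toy <- data.frame(
  y  = c(5.1, 3.2, 6.8, NA, 4.4, 7.0, 2.9, 5.5, 6.1, 3.8),
  x1 = c(1, 0, 1, 0, NA, 1, 0, 0, 1, 0),
  x2 = c(10, 40, 5, 20, 30, NA, 50, 15, 8, 35)
)

fit <- lm(y ~ x1 + x2, data = toy)

n_used  <- nobs(fit)          # rows with no missing values in y, x1, x2
k       <- length(coef(fit))  # intercept plus two slopes
df_left <- df.residual(fit)   # equals n_used - k

# Same arithmetic with the figures from the corruption model:
# 108 rows, 44 deleted for missingness, 6 coefficients.
df_corruption <- 108 - 44 - 6
```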
(3c) What happens if you add britcolony to your model? Why might you expect this change, given the relationship between the type of law a country uses and its colonial history?
model4 <- lm(corruption$TI98~corruption$commonlaw + corruption$noncolony + corruption$pctprot + corruption$elf + corruption$FMMexports93 + corruption$britcolony, data = corruption)
summary(model4)
##
## Call:
## lm(formula = corruption$TI98 ~ corruption$commonlaw + corruption$noncolony +
## corruption$pctprot + corruption$elf + corruption$FMMexports93 +
## corruption$britcolony, data = corruption)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8954 -0.9570 0.1568 1.1067 4.5880
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.292150 0.483479 10.946 1.23e-15 ***
## corruption$commonlaw -0.364621 1.007656 -0.362 0.7188
## corruption$noncolony -1.543111 0.636819 -2.423 0.0186 *
## corruption$pctprot -0.043730 0.010300 -4.246 8.13e-05 ***
## corruption$elf 0.022413 0.008721 2.570 0.0128 *
## corruption$FMMexports93 0.020430 0.011414 1.790 0.0788 .
## corruption$britcolony -1.245752 0.966152 -1.289 0.2025
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.758 on 57 degrees of freedom
## (44 observations deleted due to missingness)
## Multiple R-squared: 0.5515, Adjusted R-squared: 0.5043
## F-statistic: 11.68 on 6 and 57 DF, p-value: 1.722e-08
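Only 3 of the 6 variables present are statistically significant (p value < .05): noncolony, pctprot, and elf. Common law loses its significance once britcolony is added: its coefficient shrinks from -1.47 to -0.36 and its standard error nearly doubles (from 0.54 to 1.01). This is what we would expect if the two variables are highly correlated, since countries with a British colonial history tend to use common law and the model cannot easily separate their effects. The best descriptor for corruption is still the protestant population, which means that this result is robust to adding britcolony.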
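One common reason a coefficient loses significance when a related variable is added is collinearity: the two predictors carry nearly the same information, so the standard error on each grows. The simulation below is a sketch of that mechanism with made-up variables (`x1`, `x2`, `y`, and the 10% flip rate are all assumptions for illustration, not the corruption data).

```r
set.seed(1)  # reproducible simulated data
n  <- 200

# x1 is a binary predictor; x2 copies x1 except for roughly 10% of rows.
x1   <- rbinom(n, 1, 0.4)
flip <- rbinom(n, 1, 0.1)
x2   <- ifelse(flip == 1, 1 - x1, x1)

# The outcome depends on x1 only.
y <- 5 - 1.5 * x1 + rnorm(n)

fit_small <- lm(y ~ x1)       # x1 alone
fit_full  <- lm(y ~ x1 + x2)  # x1 plus its near-duplicate

se_small <- summary(fit_small)$coefficients["x1", "Std. Error"]
se_full  <- summary(fit_full)$coefficients["x1", "Std. Error"]

# Compare the correlation between predictors and the two standard errors.
c(correlation = cor(x1, x2), se_small = se_small, se_full = se_full)
```

The standard error on `x1` is larger in the second model even though the true effect of `x1` is unchanged, which is the same pattern seen for commonlaw once britcolony enters the model.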
(3d) Now let’s look at that relationship formally. If you were formerly a British colony, what kind of law would we predict you have? Run a regression and interpret the results. What is the R-squared on this model?
model5 <- lm(commonlaw~britcolony, data = corruption)
summary(model5)
##
## Call:
## lm(formula = commonlaw ~ britcolony, data = corruption)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.82353 -0.05797 -0.05797 0.17647 0.94203
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.05797 0.03535 1.64 0.104
## britcolony 0.76556 0.06153 12.44 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2937 on 101 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.6052, Adjusted R-squared: 0.6013
## F-statistic: 154.8 on 1 and 101 DF, p-value: < 2.2e-16
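If the country in question is a former British colony, the probability of that country having common law is 76.6 percentage points higher than for countries that were not under British colonial rule, and this association is statistically significant (p < 0.001). The model predicts a 5.8% probability of common law for countries that were never British colonies and an 82.4% probability (0.058 + 0.766) for former British colonies, so we would predict that a former British colony has common law. The R-squared of .6052 means that 60.52% of the variation in common law can be explained by whether or not the nation was a former British colony.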
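With a binary outcome and a single binary predictor, the regression coefficients are just group proportions: the intercept is the share with the outcome in the 0 group, and the intercept plus the slope is the share in the 1 group. The sketch below checks this on a made-up data frame (`toy` and its values are hypothetical; the column names mirror the corruption data).

```r
# Hypothetical toy data; values are made up for illustration only.
toy <- data.frame(
  britcolony = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0),
  commonlaw  = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1)
)

fit <- lm(commonlaw ~ britcolony, data = toy)

# Share of countries with common law in each group.
share_non  <- mean(toy$commonlaw[toy$britcolony == 0])
share_brit <- mean(toy$commonlaw[toy$britcolony == 1])

# Intercept matches share_non; intercept + slope matches share_brit.
pred_non  <- unname(coef(fit)["(Intercept)"])
pred_brit <- unname(coef(fit)["(Intercept)"] + coef(fit)["britcolony"])
```

Applied to the output above, the same logic gives 0.05797 for countries that were never British colonies and 0.05797 + 0.76556 = 0.82353 for former British colonies.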