getwd()
## [1] "C:/Users/Matthew01/Documents/PS15/ProblemSet1"
setwd("/Users/Matthew01/Documents/PS15/ProblemSet1/")

Problem 1 Q1. A researcher observes that individuals with higher incomes vote at a higher rate. He decides to publish a research article that says earning over $100k/year causes people to participate in elections at a higher rate. Would you like to be a co-author on this paper? Why or why not?

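I would not like to be a co-author on his paper because he is making a causal claim that his evidence cannot support. Observing that higher-income individuals vote at a higher rate shows only a correlation: some other factor, such as education, could raise both income and turnout. If he made a correlational claim instead of a causal one, I would sign on. For example, he could phrase his finding as "people earning over $100k/year tend to participate in elections at a higher rate." This shows that there is a relationship between the two, but it does not establish that one causes the other.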
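To illustrate why the correlation alone is not enough, here is a small R simulation under made-up assumptions: a hypothetical confounder (labeled `education`) raises both income and the probability of voting, while income itself has no effect on voting at all. All variable names and effect sizes are invented for illustration only.

```r
set.seed(15)
n <- 10000

# Hypothetical confounder (standardized); NOT real data
education <- rnorm(n)
# Income in $1,000s, driven partly by education
income <- 70 + 25 * education + rnorm(n, sd = 10)

# Probability of voting depends ONLY on education; income never enters
p_vote <- plogis(0.3 + 1.0 * education)
voted <- rbinom(n, size = 1, prob = p_vote)

# Compare turnout for those above and below $100k
high_earner <- income > 100
rate_high <- mean(voted[high_earner])
rate_low <- mean(voted[!high_earner])
round(c(over_100k = rate_high, under_100k = rate_low), 3)
```

Because income never enters the voting equation, any turnout gap between the two groups here comes entirely from the confounder.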

Q2. In your own words, provide definitions for endogeneity and exogeneity. Describe how randomization produces exogeneity, and discuss why endogeneity creates challenges for making causal inferences.

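An endogenous variable is an independent variable that is correlated with the error term. In other words, something left out of the model (and therefore sitting in the error term) influences both the independent variable and the outcome, so changes in the independent variable are partly driven by factors within the system being studied.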

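Exogeneity is the opposite: an independent variable is exogenous when it is uncorrelated with the error term, meaning the factors that determine the independent variable are unrelated to the other factors that affect the dependent variable.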

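Randomization produces exogeneity because the treatment is assigned by chance, not by anything about the subjects themselves. As a result, the treatment cannot be systematically related to the subjects' other characteristics, and on average the treatment and control groups are alike in everything except the treatment being measured.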
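A quick sketch of why random assignment yields exogeneity, using invented subject characteristics (`motivation`, `prior_gpa`): a coin-flip assignment leaves the two groups balanced on these traits, while letting subjects self-select on motivation does not.

```r
set.seed(15)
n <- 2000

# Hypothetical pre-existing characteristics of subjects
motivation <- rnorm(n)
prior_gpa <- rnorm(n, mean = 3, sd = 0.4)

# Random assignment: half TRUE, half FALSE, shuffled by chance
treated <- sample(rep(c(TRUE, FALSE), each = n / 2))
balance <- c(
  motivation_diff = mean(motivation[treated]) - mean(motivation[!treated]),
  gpa_diff = mean(prior_gpa[treated]) - mean(prior_gpa[!treated])
)
round(balance, 3)  # both differences should be close to zero

# Contrast: subjects choose the treatment based on their motivation
self_selected <- motivation > 0
selection_gap <- mean(motivation[self_selected]) - mean(motivation[!self_selected])
round(selection_gap, 3)  # large gap: treatment is now tied to motivation
```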

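Endogeneity creates challenges for making causal inferences because it can lead you to wrongly attribute an effect to the independent variable. In other words, a factor other than X may be causing the change in Y, so the estimated relationship between X and Y mixes the effect of X with the effect of that omitted factor.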
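The bias described above can be seen in a short simulation, assuming a hypothetical unobserved factor `u` that sits in the error term. When X is partly driven by `u`, the regression slope absorbs part of `u`'s effect on Y; when X is assigned independently of `u`, as randomization would do, the slope recovers the true effect (set here to an invented value of 2).

```r
set.seed(15)
n <- 10000
true_effect <- 2
u <- rnorm(n)  # hypothetical unobserved factor (ends up in the error term)

# Endogenous X: partly driven by u, which also drives Y
x_endog <- 0.8 * u + rnorm(n)
y_endog <- true_effect * x_endog + 3 * u + rnorm(n)

# Exogenous X: generated independently of u
x_exog <- rnorm(n)
y_exog <- true_effect * x_exog + 3 * u + rnorm(n)

# Estimated slopes from simple regressions that omit u
b_endog <- unname(coef(lm(y_endog ~ x_endog))[2])
b_exog <- unname(coef(lm(y_exog ~ x_exog))[2])
round(c(endogenous = b_endog, exogenous = b_exog), 2)
```

With these made-up numbers the endogenous slope lands well above 2, while the exogenous one stays close to it.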

Q3. What is external validity, and how might it undermine an experiment? In your view, does internal or external validity pose a bigger threat to experiments? Explain. (100 words max)

External validity is the extent to which the inferences drawn from a study can be generalized beyond it, to other populations, settings, and times.

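In my view, external validity poses the bigger threat to experiments. Experiments are often run in controlled, artificial settings with a particular pool of subjects, so if their findings are generalized too broadly, they may not hold in real-world environments that differ from the one in which the study was done. Randomization protects internal validity, but it does nothing to guarantee external validity.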

Q4. You decide to run an experiment to see whether going to section helps students learn. You randomly assign half of PS 15 to attend section and lecture, and to read the textbook. The other half of the class is assigned to read the textbook. At the end of the quarter, you give the entire class a test. You find that the students in the first group did much better than those in the second group. (120 words max)

4a. What could you call each group? The students attending lecture and section (and reading the textbook) are the treatment group. The students only reading the textbook are the control group.

4b. What is your independent variable (i.e., treatment), and what is your dependent variable (i.e., outcome)? The independent variable is attending lecture and section. The dependent variable is the test score.

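4c. Given this set up, list some factors you are controlling for. Both groups read the same textbook and take the same test at the same time, so those factors are held constant. Because assignment is random, pre-existing differences such as prior knowledge, motivation, and study habits are also balanced between the groups on average.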

4d. Can you say that attending section caused the students to do better on the test? Why or why not? Explain using the technical terms in the textbook.

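You cannot say that attending section alone caused the students to do better on the test. Because assignment was random, the treatment is exogenous, so the difference in scores can be attributed to the treatment rather than to pre-existing differences between the groups. However, the treatment bundled section with lecture, so the design cannot separate the effect of section from the effect of lecture; either one (or both) could be responsible for the improvement.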
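The bundling problem can be made concrete with a sketch in which the treatment group always gets both section and lecture. Under two hypothetical worlds, one where only section helps and one where only lecture helps (8 points in each case, an invented number), the difference in mean test scores comes out identical, so the data alone cannot tell the two explanations apart.

```r
set.seed(15)
n <- 400

# Random assignment, as in the design described above
treated <- sample(rep(c(TRUE, FALSE), each = n / 2))
attend_section <- treated  # treatment group gets section...
attend_lecture <- treated  # ...and lecture, always together
noise <- rnorm(n, sd = 5)

# Hypothetical world A: only section helps
score_a <- 70 + 8 * attend_section + 0 * attend_lecture + noise
# Hypothetical world B: only lecture helps
score_b <- 70 + 0 * attend_section + 8 * attend_lecture + noise

# Difference in mean scores (treatment minus control) in each world
diff_a <- mean(score_a[treated]) - mean(score_a[!treated])
diff_b <- mean(score_b[treated]) - mean(score_b[!treated])
c(world_a = diff_a, world_b = diff_b)
```

Separating the two effects would require another arm, for example a group assigned to lecture without section.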

4e. Can you say that this finding would also apply in courses with virtual sections? Why or why not? Explain using the technical terms in the textbook.

Applying this finding to virtual sections is a question of external validity. Because the experiment used in-person sections, the findings would not necessarily apply to virtual sections, which differ in setting and format. However, they could help guide an additional study of the relationship in virtual sections.

Q5. Imagine you are looking at the relationship between education and political participation. List some potential sources of endogeneity.

People who are more educated may live in wealthier areas, and factors associated with wealth may also affect political participation, confounding the relationship. Families who value education may also value political participation.
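One way to address a measurable confounder like wealth is to control for it in a regression. The sketch below invents data in which education has a true effect of 0.5 on a hypothetical participation index and wealth affects both variables; the naive slope is inflated, while adding `wealth` as a control brings it back near 0.5. Confounders that cannot be measured, such as family values, would remain a problem.

```r
set.seed(15)
n <- 10000

# Hypothetical confounder (standardized); NOT real data
wealth <- rnorm(n)
# Years of schooling, partly driven by wealth
education <- 12 + 2 * wealth + rnorm(n, sd = 2)
# Hypothetical participation index: true education effect is 0.5
participation <- 0.5 * education + 1.5 * wealth + rnorm(n)

# Naive regression omits wealth; adjusted regression controls for it
naive <- unname(coef(lm(participation ~ education))["education"])
adjusted <- unname(coef(lm(participation ~ education + wealth))["education"])
round(c(naive = naive, adjusted = adjusted), 2)
```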

setwd("/Users/Matthew01/Documents/PS15/ProblemSet1/")

Q1

load('presdata.Rdata')

Q2

dim(presdata)
## [1] 17  7

Q3

mean(presdata$vote)
## [1] 52.04586
min(presdata$vote)
## [1] 44.6231
max(presdata$vote)
## [1] 61.8126

Q4

mean(presdata$rdi4)
## [1] 2.666299
min(presdata$rdi4)
## [1] -0.59695
max(presdata$rdi4)
## [1] 6.03529
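Since Q3 and Q4 repeat the same three calls for each variable, a small helper can compute them together. `describe` is a hypothetical function written for this sketch, and the toy vector stands in for the real data; with `presdata` loaded, the commented lines show how it would apply. Base R's `summary()` and `range()` offer similar output.

```r
# Hypothetical helper: mean, minimum, and maximum of a numeric vector
describe <- function(x, na.rm = FALSE) {
  c(mean = mean(x, na.rm = na.rm),
    min  = min(x, na.rm = na.rm),
    max  = max(x, na.rm = na.rm))
}

# Toy vector for illustration (NOT the presdata values)
toy <- c(1, 2, 3, 6)
describe(toy)

# With the problem-set data loaded, the same call would be:
# describe(presdata$vote); describe(presdata$rdi4)
# or, for both columns at once:
# sapply(presdata[c("vote", "rdi4")], describe)
```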

Q5 and Q6

plot(presdata$rdi4, presdata$vote, ylab = "Incumbent Party's Vote Share", xlab = "Real Disposable Income", main = "The Relationship Between Incumbent Party's Vote Share and RDI", col = "blue")

Q7a

model1 <- lm(presdata$vote ~ presdata$rdi4, data=presdata)
plot(presdata$rdi4, presdata$vote, ylab = "Incumbent Party's Vote Share", xlab = "Real Disposable Income", main = "The Relationship Between Incumbent Party's Vote Share and RDI")
abline(model1, col = "red")

summary(model1)
## 
## Call:
## lm(formula = presdata$vote ~ presdata$rdi4, data = presdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.6842 -3.7406 -0.2731  2.6357  7.5002 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    45.9385     1.6919  27.152 3.62e-14 ***
## presdata$rdi4   2.2906     0.5342   4.288 0.000648 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.765 on 15 degrees of freedom
## Multiple R-squared:  0.5507, Adjusted R-squared:  0.5207 
## F-statistic: 18.38 on 1 and 15 DF,  p-value: 0.0006477
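For a single predictor, the OLS estimates reported by `summary(model1)` have a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The sketch below checks this on invented toy data (17 observations to mirror the data set's size, but NOT `presdata`) by comparing the hand-computed values with `lm()`.

```r
set.seed(15)
# Toy predictor and outcome (hypothetical values)
x <- runif(17, min = -1, max = 6)
y <- 46 + 2.3 * x + rnorm(17, sd = 3.8)

# Closed-form OLS estimates for one predictor
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# Compare with R's built-in least-squares fit
fit <- lm(y ~ x)
rbind(by_hand = c(intercept, slope), lm = unname(coef(fit)))
```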

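Q7b

There is a positive relationship between the dependent and independent variables: as real disposable income increases, so does the incumbent party's vote share. The estimated slope is 2.2906, so each one-unit increase in rdi4 is associated with an increase of about 2.29 percentage points in vote share (p < 0.001), and the model explains about 55% of the variation in vote share (R-squared = 0.5507).

Q7c

There is not enough information to make a causal claim about the relationship between these two factors.

Q7d

There is not enough evidence to claim he is correct. However, the graph above is consistent with his argument.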
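To make the Q7b interpretation concrete, the sketch below reuses the coefficients printed by `summary(model1)` (copied by hand, so they carry rounding). It checks that the fitted line passes through the point of means, recomputes the slope's t value, and builds the usual t-based 95% confidence interval, which should roughly match `confint(model1)`.

```r
# Coefficients copied from summary(model1) above
b0 <- 45.9385    # intercept
b1 <- 2.2906     # slope on rdi4
se_b1 <- 0.5342  # standard error of the slope

# Predicted incumbent vote share at a given rdi4 value
predict_vote <- function(rdi4) b0 + b1 * rdi4

# The OLS line passes through the sample means, so the prediction at
# mean(rdi4) = 2.666299 should match mean(vote) = 52.04586 up to rounding
at_mean <- predict_vote(2.666299)

# t statistic and 95% confidence interval for the slope (15 df)
t_stat <- b1 / se_b1
ci <- b1 + c(-1, 1) * qt(0.975, df = 15) * se_b1
round(c(at_mean = at_mean, t = t_stat, lower = ci[1], upper = ci[2]), 3)
```

Because the interval excludes zero, it agrees with the small p-value; it still says nothing about causation, which is the point of Q7c.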