library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# load dataset 
data <- read.csv("anes_timeseries_2020.csv")

#1: Select 2 interval/ratio level variables

  1. V201555 - HOW MANY GRANDPARENTS BORN OUTSIDE THE US

  2. V202023 - HOW MANY DAYS IN PAST WEEK DISCUSSED POLITICS WITH FAMILY OR FRIENDS

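# drop the negative codes listed below (ANES uses negative values for missing responses), then keep the two variables
datafinal <- data %>%
  filter(!V201555 %in% c(-9, -8), !V202023 %in% c(-9, -7, -6, -1)) %>%
  select(V201555, V202023)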

grandparents <- datafinal$V201555
days <- datafinal$V202023

#2: Choose 5 observations

five <- datafinal[1:5, ]

#3: Make a table of respondent number and variable values

colnames(five) <- c("grandparents", "days")
five
##   grandparents days
## 1            2    0
## 2            0    1
## 3            0    7
## 4            4    2
## 5            0    3

#4: Plotting my data

plot(five$days, five$grandparents, xlab="# of Days a Week Discuss Politics", ylab= "# of Grandparents Born Outside US")

#5. Draw the OLS line

plot(five$days, five$grandparents, xlab="# of Days a Week Discuss Politics", ylab= "# of Grandparents Born Outside US")
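# regress y (grandparents) on x (days) for the five plotted points so the line matches the axes
abline(lm(grandparents ~ days, data = five))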
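
As a check on the line drawn through the five points, the least-squares slope and intercept can be worked by hand from the five rows in the table above, taking days as x and grandparents as y to match the plot's axes. This is a sketch, not output from the original analysis; the symbols x, y, b_0 and b_1 are introduced here only for illustration.

$$
\bar{x} = \frac{0+1+7+2+3}{5} = 2.6, \qquad \bar{y} = \frac{2+0+0+4+0}{5} = 1.2
$$

$$
b_1 = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2} = \frac{-7.6}{29.2} \approx -0.26, \qquad b_0 = \bar{y} - b_1\bar{x} \approx 1.2 + 0.26 \times 2.6 \approx 1.88
$$

So for these five respondents the fitted line is roughly grandparents = 1.88 - 0.26 × days. With only five points it describes the sample and nothing more.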

#6. With entire dataset, regress one variable on the other

regress <- lm(grandparents ~ days, data = datafinal)
summary(regress)
## 
## Call:
## lm(formula = grandparents ~ days, data = datafinal)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0879 -1.0803 -1.0701  0.9223  2.9299 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.087927   0.038026  28.610   <2e-16 ***
## days        -0.002545   0.008278  -0.308    0.758    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.567 on 6745 degrees of freedom
## Multiple R-squared:  1.402e-05,  Adjusted R-squared:  -0.0001342 
## F-statistic: 0.09456 on 1 and 6745 DF,  p-value: 0.7585

#7. T-test: is the coefficient of the independent variable significantly different from zero at p = 0.05?

From the regression output in the previous question, the coefficient for days is -0.002545, with a t value of -0.308 and a p-value of 0.758. Because 0.758 > 0.05, the coefficient is not significantly different from zero at the 0.05 level.
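
The t value in the regression output is the coefficient estimate divided by its standard error. A quick check using the rounded values printed in the summary (so the last digit can differ slightly from R's -0.308):

$$
t = \frac{\hat{\beta}_{\text{days}}}{\mathrm{SE}(\hat{\beta}_{\text{days}})} = \frac{-0.002545}{0.008278} \approx -0.307
$$

With 6745 residual degrees of freedom, the two-sided 5% critical value is approximately 1.96. Since |t| is about 0.31, well below that threshold, the hand calculation agrees with the reported p-value.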

#8. Interpret results

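As noted in #7, we fail to reject the null hypothesis that the coefficient on days is zero. In other words, we cannot conclude from the data that there is a linear relationship between the number of grandparents born outside the US and the number of days in the past week a respondent discussed politics with family or friends. The multiple R-squared of 1.402e-05 points the same way: days explains essentially none of the variation in grandparents.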
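
One way to put a size on this result is an approximate 95% confidence interval for the slope, sketched here from the printed estimate and standard error using the large-sample critical value 1.96. This is an approximation added for illustration; calling `confint(regress)` would give the exact interval.

$$
-0.002545 \pm 1.96 \times 0.008278 \approx -0.002545 \pm 0.016225 \;\Rightarrow\; (-0.0188,\ 0.0137)
$$

The interval contains zero, consistent with failing to reject the null hypothesis. Its endpoints are also small: each additional day of discussion is associated with a change of less than about 0.02 in the expected number of grandparents born outside the US, in either direction.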