Part 2

Use Zelig to run a regression analysis of a new dataset.

Introduction

Most of this is already in the homework I uploaded last night, but I was still unsatisfied with some of the code I came up with, so I decided to redo a little more. I'm still getting the setx step wrong, but I will keep working on it.

Data

I am using the Pew Research Center's Jan. 3-10, 2018 Core Trends Survey, which covers social media use and attitudes towards the internet in 2018. It can be downloaded from http://www.pewinternet.org/dataset/jan-3-10-2018-core-trends-survey/ in multiple formats; I took the CSV. The dataset has 70 variables and 2,002 observations related to demographics, media consumption habits, and attitudes. One of the earlier questions asks whether respondents think the internet has been good or bad for society; another asks how many books the respondent has read over the past year. In addition to demographics like age, sex, income, and education level, there is also a question about political leanings/party affiliation.

I'm interested in whether there is a difference between Democrat/Republican/Independent attitudes towards the internet, and whether the number of books read in the past year has an effect on that. The independent variable PARTY is categorical, the dependent variable of interest PIAL11 is categorical (1 is good, 2 is bad; I recode it below to 1 and 0), and the other variable I'm interested in, BOOKS1, is a numeric count. Because the dependent variable is binary once recoded, I'm going to use a logistic regression to see if party affiliation affects attitudes towards the internet.
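To make the recoding step concrete, here is a minimal base R sketch on a made-up toy data frame (not the Pew file). The numeric codes 1 = Republican and 2 = Democrat match the recodes used in the analysis; treating code 3 as Independent is an assumption that should be checked against the survey codebook. Listing the factor levels explicitly controls which category becomes the reference level in the regression, and any code left out of `levels` becomes NA.

```r
# Toy data with survey-style numeric codes (made up for illustration)
toy <- data.frame(
  pial11 = c(1, 2, 3, 1, 8, 2),
  party  = c(1, 2, 3, 2, 9, 1),
  sex    = c(1, 2, 2, 1, 2, 1)
)

# Outcome: 1 (good) stays 1, 2 (bad) becomes 0, anything else becomes NA
toy$pial11 <- ifelse(toy$pial11 == 1, 1,
                     ifelse(toy$pial11 == 2, 0, NA))

# Party: the first level listed is the reference category.
# ASSUMPTION: code 3 means Independent; check the codebook.
toy$party <- factor(toy$party,
                    levels = c(3, 2, 1),
                    labels = c("independent", "democrat", "republican"))

# Sex: female is the reference category
toy$sex <- factor(toy$sex, levels = c(2, 1), labels = c("female", "male"))

str(toy)
```

With explicit levels, the coefficient names in a later model (for example `partydemocrat`) can be read directly against a known baseline.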

Results

library(readr)
pew <- read_csv("/Users/meredithpowers/Desktop/pew.csv")
## Parsed with column specification:
## cols(
##   .default = col_integer(),
##   usr = col_character(),
##   `pial11ao@` = col_character(),
##   weight = col_double(),
##   cellweight = col_double()
## )
## See spec(...) for full column specifications.
library(Zelig)
## Loading required package: survival
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.4.4
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Recode the outcome: 1 (good) stays 1, 2 (bad) becomes 0
pew <- pew %>%
  mutate(pial11 = sjmisc::rec(pial11, rec = "2=0;1=1"))
## Warning: package 'bindrcpp' was built under R version 3.4.4
head(pew)
# Label party codes: 2 = democrat, 1 = republican
pew <- pew %>%
  mutate(party = sjmisc::rec(party, rec = "2=democrat;1=republican"))
head(pew)
# Label sex codes: 2 = female, 1 = male
pew <- pew %>%
  mutate(sex = sjmisc::rec(sex, rec = "2=female;1=male"))
head(pew)
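# Logistic regression with party interacted with books read, sex, and age.
# Note: a*b in an R formula already expands to a + b + a:b, so the main
# effects listed first are redundant but harmless.
p5 <- zlogit$new()
p5$zelig(pial11 ~ party + sex + age + books1 + party*books1 + party*sex + party*age, data=pew)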
summary(p5, odds_ratios = TRUE)
## Model: 
## 
## Call:
## stats::glm(formula = pial11 ~ party + sex + age + books1 + party * 
##     books1 + party * sex + party * age, family = binomial("logit"), 
##     data = as.data.frame(.))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.2785   0.3656   0.5784   0.6832   0.9870  
## 
## Coefficients:
##                         Estimate (OR) Std. Error (OR) z value Pr(>|z|)    
## (Intercept)                  7.384808        2.096194   7.044 1.87e-12 ***
## partydemocrat                2.524225        1.427453   1.637  0.10155    
## partyrepublican              0.270697        0.140499  -2.518  0.01181 *  
## sexmale                      0.980279        0.184507  -0.106  0.91572    
## age                          0.986955        0.004575  -2.832  0.00462 ** 
## books1                       1.002074        0.003832   0.542  0.58804    
## partydemocrat:books1         1.023133        0.011734   1.994  0.04613 *  
## partyrepublican:books1       0.994850        0.006408  -0.802  0.42281    
## partydemocrat:sexmale        1.885090        0.657076   1.819  0.06894 .  
## partyrepublican:sexmale      2.328796        0.734607   2.680  0.00737 ** 
## partydemocrat:age            0.986367        0.008886  -1.524  0.12757    
## partyrepublican:age          1.017396        0.008287   2.117  0.03424 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1539.4  on 1631  degrees of freedom
## Residual deviance: 1477.4  on 1620  degrees of freedom
##   (370 observations deleted due to missingness)
## AIC: 1501.4
## 
## Number of Fisher Scoring iterations: 5

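This is a lot more interesting than the way I did it the first time, and the specific interactions are much clearer. The coefficients are reported as odds ratios, so values below 1 mean lower odds of saying the internet has been good for society. Age is still statistically significant (OR 0.987 per year, p = 0.005), although the per-year effect is small. Republican affiliation is now clearly associated with lower odds of a positive attitude (OR 0.27, p = 0.012), and it interacts significantly with sex (male in this case; OR 2.33, p = 0.007) and with age (OR 1.017, p = 0.034). Because the model includes interactions, each main effect describes the baseline group only (for example, the Republican odds ratio applies to women at age 0 who read no books), so the main effects and interactions need to be read together. Interestingly, what didn't really show in my previous attempt at analysis was the interaction between Democratic party affiliation and the number of books read in the past year (OR 1.023, p = 0.046). Neither reading alone (p = 0.588) nor being a Democrat alone (p = 0.102) is significant, but among Democrats, reading more books per year is associated with a more positive attitude towards the internet.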
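Since the model has interactions, the odds ratio for a subgroup is the product of the main-effect odds ratio and the matching interaction odds ratio (equivalently, the sum of the two log-odds coefficients). A small sketch of that arithmetic, with the values copied from the summary table above; the variable names are only for illustration, and these are point estimates with no uncertainty attached:

```r
# Odds ratios copied from the model summary
or <- c(
  books1      = 1.002074,
  dem_books1  = 1.023133,  # partydemocrat:books1
  sexmale     = 0.980279,
  rep_sexmale = 2.328796,  # partyrepublican:sexmale
  age         = 0.986955,
  rep_age     = 1.017396   # partyrepublican:age
)

# Per additional book, among Democrats
or_books_dem <- unname(or["books1"] * or["dem_books1"])

# Male vs. female, among Republicans
or_male_rep <- unname(or["sexmale"] * or["rep_sexmale"])

# Per additional year of age, among Republicans
or_age_rep <- unname(or["age"] * or["rep_age"])

# Multiplying odds ratios is the same as adding log-odds coefficients
stopifnot(isTRUE(all.equal(
  or_books_dem,
  exp(log(or[["books1"]]) + log(or[["dem_books1"]]))
)))

print(round(c(books_dem = or_books_dem,
              male_rep  = or_male_rep,
              age_rep   = or_age_rep), 3))
```

This gives roughly 1.025 per book for Democrats, 2.28 for Republican men relative to Republican women, and 1.004 per year of age for Republicans, which is why the age effect looks flat or slightly positive for Republicans even though the main effect of age is below 1.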

Remember that this data comes from January 2018, when Democrats were not in any real position of power in the federal government, and the Facebook/Cambridge Analytica election-influencing scandal had not yet hit the news (that wasn't until March 2018). I'm not sure why the number of books would have an effect on Democrats specifically. Maybe they are distracting fun books, or the kinds of books that provide historical context? Or maybe people who read more books spend less time on some of the more isolating or hostile parts of the internet? It's hard to say why it has an effect, but it's interesting that it does.

Graphs

I'm still really struggling with this; mostly I just get errors when using setx with factors. I will keep looking into it.
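Until the setx step is sorted out, one workaround is to skip Zelig for the graph and plot predicted probabilities from a plain `glm()` fit with `predict()`. The sketch below is not the Zelig approach and does not use the Pew data: it simulates a toy dataset with made-up effects, and the level name "independent" is an assumption. The key detail is that the factor in the prediction grid carries exactly the same levels as the factor used to fit the model.

```r
set.seed(1)

# Simulated stand-in data (made-up effects, NOT the Pew survey)
n <- 600
party_levels <- c("independent", "democrat", "republican")
toy <- data.frame(
  party  = factor(sample(party_levels, n, replace = TRUE),
                  levels = party_levels),
  books1 = rpois(n, lambda = 8)
)
eta <- 1 + 0.03 * toy$books1 * (toy$party == "democrat") -
  0.5 * (toy$party == "republican")
toy$pial11 <- rbinom(n, size = 1, prob = plogis(eta))

# Logistic regression with a party-by-books interaction
fit <- glm(pial11 ~ party * books1, data = toy,
           family = binomial("logit"))

# Prediction grid: every party at 0 to 30 books, same factor levels as the fit
grid <- expand.grid(
  party  = factor(party_levels, levels = party_levels),
  books1 = 0:30
)
grid$p_good <- predict(fit, newdata = grid, type = "response")

# One line per party: predicted probability of a positive attitude
plot(0, 0, type = "n", xlim = c(0, 30), ylim = c(0, 1),
     xlab = "Books read in the past year",
     ylab = "Predicted probability (internet is good)")
for (i in seq_along(party_levels)) {
  rows <- grid$party == party_levels[i]
  lines(grid$books1[rows], grid$p_good[rows], lty = i)
}
legend("bottomright", legend = party_levels,
       lty = seq_along(party_levels))
```

The same pattern should carry over to the real model by refitting it with `glm()` and adding sex and age to the grid at fixed values, at the cost of losing Zelig's simulated uncertainty intervals.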