Assignment

This homework has two different components (it is due on March 18):

Part 1: Take a look at the zelig-logit model help page and zelig-least-square model help page to gain a better understanding of how Zelig models work. Pay special attention to the contrast between the Zelig 5 syntax and the Zelig 4 compatibility syntax. Re-implement the computations described on pages 5.2-5.18 of the lecture slides, which used the Zelig 4 compatibility syntax, using the Zelig 5 syntax.

Part 2:Conduct a regression analysis (linear or logit) using Zelig and your own data set of choice. Make sure to do the estimation, simulation, and plotting, and write a few paragraphs to describe your procedure and results. Consult Imai et al. (2008) if necessary (It is located in Readings for Week #6). More information about Zelig can be found on its project home page The homework is due by the end of Sunday (Oct. 28).

Part 1: Zelig 5

Redoing the in-class example of Titanic passenger survival rates using Zelig 5 syntax.

library(radiant.data)

## Warning: package 'radiant.data' was built under R version 3.4.4

## Loading required package: magrittr

## Loading required package: ggplot2

## Warning: package 'ggplot2' was built under R version 3.4.4

## Loading required package: lubridate

## Warning: package 'lubridate' was built under R version 3.4.4

## 
## Attaching package: 'lubridate'

## The following object is masked from 'package:base':
## 
##     date

## Loading required package: tidyr

## Warning: package 'tidyr' was built under R version 3.4.4

## 
## Attaching package: 'tidyr'

## The following object is masked from 'package:magrittr':
## 
##     extract

## Loading required package: dplyr

## Warning: package 'dplyr' was built under R version 3.4.4

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:lubridate':
## 
##     intersect, setdiff, union

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

## 
## Attaching package: 'radiant.data'

## The following objects are masked from 'package:lubridate':
## 
##     month, wday

## The following object is masked from 'package:ggplot2':
## 
##     diamonds

data(titanic)
library(Zelig)

## Loading required package: survival

## 
## Attaching package: 'Zelig'

## The following object is masked from 'package:ggplot2':
## 
##     stat

data("titanic")
titanic <- titanic %>%
  mutate(surv = as.integer(survived)) %>% 
  mutate(survival = sjmisc::rec(surv, rec = "2=0; 1=1")) %>% 
  select(pclass, survived, survival, everything())

## Warning: package 'bindrcpp' was built under R version 3.4.4

head(titanic)

z5=zlogit$new()
z5$zelig(survival ~ age + sex * pclass + fare, data = titanic)

Age effect

Using Zelig 5 syntax to find the expected values for age and displaying the results using graph().

a.range = min(titanic$age):max(titanic$age)
z5$setrange(age=a.range)
z5$sim()
z5$graph()

Just like in class, the likelihood of survival decreases with age – there is some logic to this, in that the passengers/crew would have tried to get children on to lifeboats first (this idea is informed somewhat by historical knowledge and somewhat from the 90s James Cameron movie), while older people could have been more negatively affected by icy water, prolonged exposure to cold, and other factors that might hit an older person harder than a younger person.

Fare effect

Use sim function to find expected values for fare and graph function to display in a nice way.

f.range = min(titanic$fare):max(titanic$fare)
z5$setrange(fare=f.range)
z5$sim()
z5$graph()

As we discussed in class, it’s not really clear why survival rates don’t seem to have any strong relationship with passenger fare – given the classist nature of the Edwardian period during which the Titanic sank, you would think there would be a strong preference for higher fares (i.e. higher class passengers).

We can also look at sex differences in survival rates:

library(Zelig)
z5 <- zlogit$new()
z5$zelig(survival ~ age + sex*pclass + fare, data = titanic)
summary(z5)

## Model: 
## 
## Call:
## z5$zelig(formula = survival ~ age + sex * pclass + fare, data = titanic)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.0634  -0.6641  -0.4943   0.4336   2.4941  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)
## (Intercept)        4.8978215  0.6131092   7.988 1.37e-15
## age               -0.0387245  0.0068044  -5.691 1.26e-08
## sexmale           -3.8996177  0.5015659  -7.775 7.55e-15
## pclass2nd         -1.5923247  0.6024844  -2.643  0.00822
## pclass3rd         -4.1382715  0.5601819  -7.387 1.50e-13
## fare              -0.0009058  0.0020509  -0.442  0.65874
## sexmale:pclass2nd -0.0600076  0.6372949  -0.094  0.92498
## sexmale:pclass3rd  2.5019110  0.5479901   4.566 4.98e-06
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1409.99  on 1042  degrees of freedom
## Residual deviance:  931.45  on 1035  degrees of freedom
## AIC: 947.45
## 
## Number of Fisher Scoring iterations: 5
## 
## Next step: Use 'setx' method

z5$setx(sex="male")
z5$setx1(sex="female")
z5$sim()
z5$graph()

And we can also split male/female by classes.

Sex differences in 1st class:

z5=zlogit$new()
z5$zelig(survival ~ age + sex * pclass + fare, data = titanic)
z5$setx(sex="male",pclass="1st")
z5$setx1(sex="female",pclass="1st")
z5$sim()
z5$graph()

Sex differences in 2nd class:

z5=zlogit$new()
z5$zelig(survival ~ age + sex * pclass + fare, data = titanic)
z5$setx(sex="male",pclass="2nd")
z5$setx1(sex="female",pclass="2nd")
z5$sim()
z5$graph()

Sex differences in 3rd class:

z5=zlogit$new()
z5$zelig(survival ~ age + sex * pclass + fare, data = titanic)
z5$setx(sex="male",pclass="3rd")
z5$setx1(sex="female",pclass="3rd")
z5$sim()
z5$graph()

This is very cool and helpful, but it is even more convenient to put them all in one display, like we did in class with ggplot2.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The results and histograms look pretty much the same in Zelig 5 syntax as they did in the in-class exercise using zelig 4 syntax.

Part 2

Use Zelig to run a regression analysis of a new dataset.

Introduction

I am using the Pew Research Center’s Jan. 3-10, 2018 - Core Trends Survey, which is data on social media use and attitudes towards the internet in 2018. It can be downloaded from http://www.pewinternet.org/dataset/jan-3-10-2018-core-trends-survey/ in multiple formats; I took the CSV. This dataset has 70 variables and 2,002 observations related to demographics, media consumption habits, and attitudes. One of the earlier questions is about whether or not respondents think the internet has been good or bad for society; another question asks the number of books the respondent has read over the past year. In addition to demographics like age, sex, income, education levels, there is also a question about political leanings/party affiliation. I’m interested in whether or not there is a difference between Democrat/Republican/Independent attitudes towards the internet, and if the number of books read in the past year has an effect on that. The independent variable PARTY is categorical, the dependent variable of interest PIAL11 is categorical (1 is good, 2 is bad), and the other variable I’m interested in, BOOKS1, is an interval variable. Because the dependent is categorical, I’m going to use a logistic regression to see if party affiliation affects attitudes towards the internet.

Results

library(readr)
pew <- read_csv("/Users/meredithpowers/Desktop/pew.csv")
library(Zelig)
pew <- pew %>%
  mutate(pial11=sjmisc::rec(pew$pial11,rec="2=0;1=1"))
head(pew)

p5 <- zlogit$new()
p5$zelig(pial11 ~ party + sex + age + books1 + party*books1 + party*sex + party*age, data=pew)
summary(p5, odds_ratios = TRUE)

## Model: 
## 
## Call:
## stats::glm(formula = pial11 ~ party + sex + age + books1 + party * 
##     books1 + party * sex + party * age, family = binomial("logit"), 
##     data = as.data.frame(.))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2005   0.5015   0.5883   0.6613   0.9999  
## 
## Coefficients:
##              Estimate (OR) Std. Error (OR) z value Pr(>|z|)    
## (Intercept)      28.620942       13.314235   7.210 5.59e-13 ***
## party             0.730072        0.096520  -2.380   0.0173 *  
## sex               0.561857        0.128973  -2.512   0.0120 *  
## age               0.985129        0.005601  -2.635   0.0084 ** 
## books1            0.999397        0.005198  -0.116   0.9076    
## party:books1      1.001352        0.001621   0.835   0.4040    
## party:sex         1.123930        0.078485   1.673   0.0943 .  
## party:age         1.000993        0.001543   0.644   0.5197    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1539.4  on 1631  degrees of freedom
## Residual deviance: 1513.4  on 1624  degrees of freedom
##   (370 observations deleted due to missingness)
## AIC: 1529.4
## 
## Number of Fisher Scoring iterations: 4

Age, sex, and party affiliation has a statistically significant effect on the perception of whether the internet is good or bad for society. The number of books read in the past year seems to have no significant effect, and there is no significant interaction effect between any of the examined variables.

The effects of number of books read in the past year on attitudes towards the internet

b.range = min(pew$books1):max(pew$books1)
p5$setrange(books1=b.range)
p5$sim()
p5$graph()

## books1 chosen as the x-axis variable. Use the var argument to specify a different variable.

It’s not the most dramatic of slopes, but there is some correlation – there is a slight indication that as the number of books read in the past year increase, the attitudes towards the internet improves, but it is not significant (which is noted in the table above as well).

Difference in party affliation

p5=zelig(pial11 ~ party + sex + age + books1 + partybooks1 + partysex + party*age,model=“logit”,cite=F,data=pew) p5\(setx(party="1") p5\)setx1(party=“2”) p5\(sim() p5\)graph()

I am not sure what is happening here. I’m pretty sure I’m entering it just like I did for the Titanic dataset, and the variable party has multiple levels, so I’m not sure why it isn’t working. I really just wanted to see the graphs of the difference in party affiliation, but it’s not working for me right now.

Discussion

Unsurprisingly, age is the most significant predictor of attitudes towards the internet, with party afflilation and sex also being significant predictors. The amount a person reads in a year has no significant effect, though I thought that might have been interesting if there had been significant differences.

Homework 6