DATA 606 Data Project

Part 1 - Introduction

Parents transmit culture to their children. This transmission of culture may include political affiliation. Although many notable exceptions exist, it is a common observation that parents active in politics raise children with similar political sensibilities. Further, communication technologies have developed and spread throughout the last two decades, providing youth with a variety of social and political perspectives.

To examine data on political affiliation among families I used “Time Series Cumulative Data File (1948-2016)” from The American National Election Studies (www.electionstudies.org).
From the site:

“ANES Time Series studies have been conducted since 1948, typically through in-person interviewing, during years of biennial national elections. Topics cover voting behavior and the elections, together with questions on public opinion and attitudes of the electorate. In all Time Series studies, an interview is completed just after the election (the Post-election or “Post” interview); during years of Presidential elections an interview is also completed just before the election (the Pre-election or “Pre” interview).”

How faithfully has political affiliation been transmitted to children through the years in the survey? Are the individuals more or less likely to adhere to the affiliation of at least one parent in recent years?

Does parent political affiliation predict the affiliation of individual adults, and does this vary across different demographic factors?

The answer to these questions would be useful to campaigns and movements that seek to modify public policy through elections.

Part 2 - Data

The data were downloaded and examined. Preliminary cleaning follows, with variables renamed to ease analysis. Two files are created–one with data, the other with variable names and explanations. The following code chunk will not be evaluated but illustrates the initial process.

library(foreign)

# load data
data<-read.spss("C://Users/Stephen Jones/Documents/anes_timeseries_cdf.sav")

data<-as.data.frame(data)

variable<-c('VCF0004',
'VCF0301',
'VCF0302',
'VCF0303',
'VCF0305',
'VCF0306',
'VCF0307',
'VCF0308',
'VCF0309',
'VCF0101',
'VCF0102',
'VCF0103',
'VCF0104',
'VCF0105a',
'VCF0105b',
'VCF0106',
'VCF0107',
'VCF0108',
'VCF0109',
'VCF0110',
'VCF0111',
'VCF0112',
'VCF0113',
'VCF0846')

labels<-c('STUDY VARIABLE: Year of Study',
'PARTISANSHIP: Party Identification of Respondent- 7-point Scale',
'PARTISANSHIP: Party Identification of Respondent- Initial Party ID Response',
'PARTISANSHIP: Party Identification of Respondent- Summary 3-Category',
'PARTISANSHIP: Party Identification of Respondent- Strength of Partisanship',
'PARTISANSHIP: Party Identification of Respondents Father',
'PARTISANSHIP: Party Identification of Respondents Mother',
'PARTISANSHIP: Political Interest of Respondents Father',
'PARTISANSHIP: Political Interest of Respondents Mother',
'DEMOGRAPHICS: Respondent - Age',
'DEMOGRAPHICS: Respondent - Age Group',
'DEMOGRAPHICS: Respondent - Cohort',
'DEMOGRAPHICS: Respondent - Gender',
'DEMOGRAPHICS: Race-ethnicity summary, 7 categories',
'DEMOGRAPHICS: Race-ethnicity summary, 4 categories',
'DEMOGRAPHICS: Race summary, 3 categories',
'DEMOGRAPHICS: Respondent - Hispanic Origin Type',
'DEMOGRAPHICS: Respondent - Hispanic Origin',
'DEMOGRAPHICS: Respondent - Ethnicity',
'DEMOGRAPHICS: Respondent - Education 4-category',
'SAMPLE DESCRIPTION: Urbanism',
'SAMPLE DESCRIPTION: Census Region',
'SAMPLE DESCRIPTION: Political South/Nonsouth',
'RELIGIOSITY: Is Religion Important to Respondent')

key<-cbind(variable,labels)

#keep only the variables of interest
data<-data[variable]

#flag observations with a blank respondent party ID (dataC is not used below)
dataC<-data[which(data$VCF0301!= ''),]

#remove observations with a missing study year
dataCa<-data[which(!is.na(data$VCF0004)),]

#remove observations where the respondent party ID (7-point or initial response) is coded 0 (missing)
dataD<-dataCa[which(substring(dataCa$VCF0301,1,1)!='0'&substring(dataCa$VCF0302,1,1)!='0'),]

colnames(dataD)<-c('Year',
'PartyID7cat',
'PartyIDinit',
'PartyID3cat',
'PartyIDstr',
'FatherID',
'MotherID',
'FatherInt',
'MotherInt',
'Age',
'AgeGroup',
'Cohort',
'Gender',
'RaceEth7cat',
'RaceEth4cat',
'RaceEth3cat',
'HispLatType',
'HispLatOrigin',
'Ethnicity',
'Educ4cat',
'Urbanism',
'CensusRegion',
'SouthNonsouth',
'Religion')

new<-c('Year',
'PartyID7cat',
'PartyIDinit',
'PartyID3cat',
'PartyIDstr',
'FatherID',
'MotherID',
'FatherInt',
'MotherInt',
'Age',
'AgeGroup',
'Cohort',
'Gender',
'RaceEth7cat',
'RaceEth4cat',
'RaceEth3cat',
'HispLatType',
'HispLatOrigin',
'Ethnicity',
'Educ4cat',
'Urbanism',
'CensusRegion',
'SouthNonsouth',
'Religion')

key<-cbind(new,key)

#write to csv and upload to GitHub
write.csv(dataD,"C:/MSDS/Elections.csv")
write.csv(key,"C:/MSDS/Variables.csv")

The resulting data are uploaded to GitHub.

Data: https://raw.githubusercontent.com/sigmasigmaiota/elections/master/Elections.csv

Variables: https://github.com/sigmasigmaiota/elections/blob/master/Variables.csv

Data collection

Data were collected via survey, on the phone and face-to-face. Individuals were randomly selected, one from each household. The site contains more details: https://electionstudies.org/data-quality/

Cases

There are 45,760 observations in the preliminarily cleaned dataset, with each observation representing a survey respondent.

Variables

Variables in the data are listed in the table below.

key<-read.csv("https://raw.githubusercontent.com/sigmasigmaiota/elections/master/Variables.csv",header=TRUE)

#drop the row-number column, which read.csv names X
key$X<-NULL

library(DT)
datatable(key)

Dependent Variable

The response variable, discordance, is calculated based on two or more qualitative variables: political affiliation of the respondent and political affiliation of one or both parents.

The survey provides a 3-level collapsed variable for partisan identification for participants (Democrat, Independent, Republican); political affiliation for each parent in the survey is coded as Republican, Independent, Democrat, NA, Unknown, and Other.

Discordance is defined as a respondent partisan identity that differs from one or more parents’ partisan identity with parent identity limited to Democrat, Independent and Republican identities. If a parent identifies as NA, Unknown or Other, no discordance is found.
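
The definition can be illustrated on a few made-up cases. This is a minimal sketch, assuming the numeric coding 1 = Democrat, 2 = Independent, 3 = Republican for respondents and parents, with every other parent code treated as non-discordant. The helper `is_discordant` and the toy vectors are hypothetical and are not part of the analysis; unlike the analysis code, the helper also treats a missing parent code as non-discordant.

```r
# Hypothetical helper: 1 if the respondent differs from a parent whose
# code is 1 (Democrat), 2 (Independent) or 3 (Republican); 0 otherwise.
is_discordant <- function(respondent, parent) {
  as.integer(parent %in% 1:3 & respondent != parent)
}

# Toy codes (illustrative only); 9 stands for "don't know", 4 for "other"
respondent <- c(1, 3, 2, 1)
father     <- c(1, 1, 9, 3)
mother     <- c(1, 3, 2, 4)

disc_father <- is_discordant(respondent, father)
disc_mother <- is_discordant(respondent, mother)

# Discordant with at least one parent
any_discord <- as.integer(disc_father == 1 | disc_mother == 1)
print(data.frame(respondent, father, mother, any_discord))
```

The third toy respondent shows why the restriction matters: an Independent whose father is coded "don't know" is not counted as discordant with him.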

As a related note, parental political affiliation was obtained from respondents and not the parents themselves.

Independent Variable

The independent variables are year, age of respondent, gender of respondent, and region of the United States. In these data, year is limited to the six survey years with complete data (1968, 1972, 1976, 1980, 1988, 1992). Gender is limited to a dichotomous identity, male or female; region is a collapsed variable included in the dataset, recoded to a binary South or Nonsouth status.

Type of study

This is an observational study. Participants were randomly selected and interviewed over the phone or face-to-face.

Scope of inference - generalizability

The population of interest is composed of American voters. Although the sample is randomly selected, the exact composition of the pool from which the sample was drawn is unknown and likely varies through the years in the survey. Additionally, electronic data collection has changed the method by which data are collected in more recent responses. Findings from this analysis are not generalizable to the entire population of American voters, but form a basis for more exploration.

Scope of inference - causality

These data cannot be used to establish causal links. Political affiliation evolves through innumerable socioeconomic variables not quantified in these data.

This analysis explores the correlation between the variables and the calculated political discordant status as defined above.

Part 3 - Exploratory Data Analysis

Relevant summary statistics

Summary statistics follow:

data<-read.csv("https://raw.githubusercontent.com/sigmasigmaiota/elections/master/Elections.csv",header=TRUE)

summary(data)
##        X              Year                            PartyID7cat  
##  Min.   :12253   Min.   :1968   1. Strong Democrat          :8802  
##  1st Qu.:25320   1st Qu.:1980   2. Weak Democrat            :8692  
##  Median :36860   Median :1992   3. Independent - Democrat   :5836  
##  Mean   :36815   Mean   :1993   4. Independent - Independent:5888  
##  3rd Qu.:48435   3rd Qu.:2008   5. Independent - Republican :5016  
##  Max.   :59944   Max.   :2016   6. Weak Republican          :6034  
##                                 7. Strong Republican        :5492  
##                           PartyIDinit   
##  1. Republican                  :11526  
##  2. Independent                 :13273  
##  3. No preference; none; neither: 2869  
##  4. Other                       :  462  
##  5. Democrat                    :17495  
##  8. DK                          :   81  
##  9. NA; refused                 :   54  
##                              PartyID3cat   
##  1. Democrats (including leaners)  :23330  
##  2. Independents                   : 5888  
##  3. Republicans (including leaners):16542  
##                                            
##                                            
##                                            
##                                            
##                                           PartyIDstr   
##  0. DK; NA; other; refused to answer; no Pre IW:   68  
##  1. Independent or Apolitical                  : 5819  
##  2. Leaning Independent                        :10852  
##  3. Weak Partisan                              :14728  
##  4. Strong Partisan                            :14293  
##                                                        
##                                                        
##                                                    FatherID    
##  0. NA; no Pre IW; no Post IW; R had no                : 1030  
##  1. Democrat                                           : 6708  
##  2. Independent (some years also: shifted around)      : 1068  
##  3. Republican                                         : 3595  
##  4. Other; minor party; apolitical; never voted, didn't:  823  
##  9. DK (exc. 1988)                                     : 1597  
##  NA's                                                  :30939  
##                                                    MotherID    
##  0. NA; no Pre IW; no Post IW; R had no                :  917  
##  1. Democrat                                           : 6512  
##  2. Independent                                        : 1149  
##  3. Republican                                         : 3476  
##  4. Other; minor party; apolitical; never voted, didn't: 1097  
##  9. DK (exc. 1988)                                     : 1670  
##  NA's                                                  :30939  
##                                     FatherInt    
##  0. NA; R had no father/father substitute:  154  
##  1. Didn't pay much attention            : 1527  
##  2. Somewhat interested                  : 2127  
##  3. Very much interested                 : 2262  
##  9. DK                                   :  374  
##  NA's                                    :39316  
##                                                  
##                                     MotherInt          Age       
##  0. NA; R had no mother/mother substitute:   98   35     : 1012  
##  1. Didn't pay much attention            : 2882   32     :  947  
##  2. Somewhat interested                  : 2070   28     :  946  
##  3. Very much interested                 : 1093   37     :  940  
##  9. DK                                   :  301   33     :  938  
##  NA's                                    :39316   34     :  938  
##                                                   (Other):40039  
##        AgeGroup               Cohort                   Gender     
##  2. 25 - 34:9244   4. 1943 - 1958:14388   0. NA; no Pre IW:   40  
##  3. 35 - 44:8630   3. 1959 - 1974: 9086   1. Male         :20564  
##  4. 45 - 54:7387   5. 1927 - 1942: 8470   2. Female       :25145  
##  5. 55 - 64:6957   6. 1911 - 1926: 6181   3. Other (2016) :   11  
##  6. 65 - 74:5303   2. 1975 - 1990: 3847                           
##  1. 17 - 24:4686   7. 1895 - 1910: 2472                           
##  (Other)   :3553   (Other)       : 1316                           
##                                                        RaceEth7cat   
##  1. White non-Hispanic (1948-2012)                           :34510  
##  2. Black non-Hispanic (1948-2012)                           : 5682  
##  3. Asian or Pacific Islander, non-Hispanic (1966-2012)      :  553  
##  4. American Indian or Alaska Native non-Hispanic (1966-2012):  331  
##  5. Hispanic (1966-2012)                                     : 3864  
##  6. Other or multiple races, non-Hispanic (1968-2012)        :  556  
##  9. Missing                                                  :  264  
##                                    RaceEth4cat   
##  1. White non-Hispanic                   :34510  
##  2. Black non-Hispanic                   : 5682  
##  3. Hispanic                             : 3864  
##  4. Other or multiple races, non-Hispanic: 1440  
##  9. Missing, DK/REF/NA                   :  264  
##                                                  
##                                                  
##                 RaceEth3cat                               HispLatType   
##  1. White non-Hispanic:34510   7. No, not Hispanic              :33657  
##  2. Black non-Hispanic: 5682   1. Yes, Mexican-American, Chicano: 2017  
##  3. Other             : 5304   3. Yes, other Hispanic           : 1167  
##  9. Missing, DK/REF/NA:  264   2. Yes, Puerto Rican             :  414  
##                                9. NA if Hispanic                :  294  
##                                (Other)                          :  209  
##                                NA's                             : 8002  
##                                   HispLatOrigin  
##  0. Short-form 'new' Cross Section (1992):    4  
##  1. Yes, R is Hispanic                   : 3701  
##  2. No, R is not Hispanic                :33657  
##  8. DK                                   :  102  
##  9. NA                                   :  294  
##  NA's                                    : 8002  
##                                                  
##                                            Ethnicity    
##  877. 'American'; 'Just American'; none; neither: 7367  
##  290                                            : 3556  
##  862                                            : 2582  
##  190                                            : 2382  
##  180                                            : 1946  
##  (Other)                                        :12523  
##  NA's                                           :15404  
##                                                   Educ4cat    
##  0. DK; NA; no Pre IW; short-form 'new' Cross Section :  403  
##  1. Grade school or less (0-8 grades)                 : 3773  
##  2. High school (12 grades or fewer, incl. non-college:19123  
##  3. Some college (13 grades or more but no degree;    :11701  
##  4. College or advanced degree (no cases 1948)        :10760  
##                                                               
##                                                               
##                                                Urbanism    
##  0. NA; telephone (RDD) sample (2000)              :  796  
##  1. Central cities                                 : 7769  
##  2. Suburban areas                                 :11575  
##  3. Rural, small towns, outlying and adjacent areas:10539  
##  NA's                                              :15081  
##                                                            
##                                                            
##                                                    CensusRegion  
##  1. Northeast (CT, ME, MA, NH, NJ, NY, PA, RI, VT)       : 8314  
##  2. North Central (IL, IN, IA, KS, MI, MN, MO, NE, ND,   :11673  
##  3. South (AL, AR, DE, D.C., FL, GA, KY, LA, MD, MS, NC  :16619  
##  4. West (AK, AZ, CA, CO, HI, ID, MT, NV, NM, OR, UT, WA,: 9154  
##                                                                  
##                                                                  
##                                                                  
##      SouthNonsouth  
##  1. South   :14286  
##  2. Nonsouth:31474  
##                     
##                     
##                     
##                     
##                     
##                                                Religion    
##  0. NA; no Post IW; abbrev. telephone IW (1984, see: 2167  
##  1. Yes, important                                 :23676  
##  2. No, not important                              : 8148  
##  8. DK                                             :   95  
##  NA's                                              :11674  
##                                                            
## 

Complete data were found in the years 1968, 1972, 1976, 1980, 1988, and 1992; the following plot illustrates political affiliation of the respondents.

Political Affiliation: Visualizations

Respondents

library(ggplot2)

respond<-data[which(data$Year == 1968 | data$Year == 1972 | data$Year == 1976 | data$Year == 1980 | data$Year == 1988 | data$Year == 1992),]

#party ID of respondents
ggplot(respond,aes(PartyID3cat,fill=PartyID3cat))+
  geom_bar()+
  facet_grid(.~Year)+
  theme(legend.position = "none")+
  theme_classic()+
  labs(fill = "Party")+
  theme(axis.text.x = element_blank())+
  ggtitle("Political Affiliation, Respondents")

Maternal

mother<-data[which(data$MotherID != '' & substring(data$MotherID,1,1) != 0),]

#party ID of mothers
ggplot(mother,aes(MotherID,fill=MotherID))+
  geom_bar()+
  facet_grid(.~Year)+
  theme(legend.position = "none")+
  theme_classic()+
  labs(fill = "Party")+
  theme(axis.text.x = element_blank())+
  ggtitle("Political Affiliation, Mother")

Paternal

father<-data[which(data$FatherID != '' & substring(data$FatherID,1,1) != 0),]

#party ID of fathers
ggplot(father,aes(FatherID,fill=FatherID))+
  geom_bar()+
  facet_grid(.~Year)+
  theme(legend.position = "none")+
  theme_classic()+
  labs(fill = "Party")+
  theme(axis.text.x = element_blank())+
  ggtitle("Political Affiliation, Father")

The plots appear to align; there appear to be no major differences in proportions by year.

Analysis: Chi-squared

chi<-table(data$PartyID3cat,data$FatherID)

chisq.test(chi)
## 
##  Pearson's Chi-squared test
## 
## data:  chi
## X-squared = 3329.5, df = 10, p-value < 2.2e-16

As expected, there is a significant association between paternal affiliation and respondent partisan identity. Let’s check the maternal affiliation.

chi<-table(data$PartyID3cat,data$MotherID)

chisq.test(chi)
## 
##  Pearson's Chi-squared test
## 
## data:  chi
## X-squared = 3387.6, df = 10, p-value < 2.2e-16

Again, maternal affiliation is significantly associated with respondent partisanship.
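
To make the test statistic concrete, the sketch below computes a Pearson chi-squared statistic by hand on a small made-up table and compares it with `chisq.test()`. The counts are invented for illustration and are not ANES data. The same degrees-of-freedom rule, (rows - 1) x (columns - 1), gives the df = 10 reported above for a table with 3 respondent categories and 6 parent categories.

```r
# Toy 2 x 3 contingency table (made-up counts, not ANES data)
toy <- matrix(c(30, 10, 20,
                10, 25, 15),
              nrow = 2, byrow = TRUE,
              dimnames = list(respondent = c("Dem", "Rep"),
                              parent = c("Dem", "Ind", "Rep")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(toy), colSums(toy)) / sum(toy)

# Pearson statistic, degrees of freedom and p-value
x2 <- sum((toy - expected)^2 / expected)
df <- (nrow(toy) - 1) * (ncol(toy) - 1)
p_value <- pchisq(x2, df, lower.tail = FALSE)

# The by-hand statistic should match chisq.test()
fit <- chisq.test(toy)
print(c(by_hand = x2, chisq_test = unname(fit$statistic)))
```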

Discordance

Below, discordance is assessed by comparing participant affiliation with the father’s affiliation and the mother’s affiliation separately. An additional variable is created to denote discordance with at least one parent. A participant was deemed discordant with a parent only if that parent identified as a Democrat, Independent or Republican; any other parental political status was deemed non-discordant.

Participants were either Democrat (including those that simply indicated that they lean Democrat), Independent, or Republican (including those that indicated that they lean Republican). Father and mother affiliation were coded as Democrat, Independent, Republican, NA, Other, and DK (except 1988).

Respondents for whom both parents carried the same non-partisan code (NA, Other, or DK) were removed, leaving 10,497 cases.

#convert age variable and Year variables
data$Age <- as.numeric(as.character(data$Age))
data$Year <- as.factor(as.character(data$Year))

#respondent, paternal and maternal party affiliation codes
data$PID3num<-substring(data$PartyID3cat,1,1)
data$FIDnum<-substring(data$FatherID,1,1)
data$MIDnum<-substring(data$MotherID,1,1)

#create gender variable
data$Gen<-substring(data$Gender,1,1)
data$Gen <- as.factor(as.character(data$Gen))

#convert party codes to numeric
data$PID3num<-as.numeric(data$PID3num)
data$FIDnum<-as.numeric(data$FIDnum)
data$MIDnum<-as.numeric(data$MIDnum)

#a parent code counts toward discordance only when it is 1 (Democrat), 2 (Independent) or 3 (Republican)
validF<-data$FIDnum >= 1 & data$FIDnum <= 3
validM<-data$MIDnum >= 1 & data$MIDnum <= 3

#dichotomous discordance (1 = differs, 0 = otherwise): paternal, maternal and between parents
data$DiscordF<-ifelse(data$PID3num != data$FIDnum & validF,1,0)
data$DiscordM<-ifelse(data$PID3num != data$MIDnum & validM,1,0)
data$DiscordP<-ifelse(data$MIDnum != data$FIDnum & validF & validM,1,0)

data$anyDiscord<-ifelse(data$DiscordF == 1 | data$DiscordM == 1,1,0)

#keep only years in which parent political affiliation is submitted
respond2<-data[which(data$Year == 1968 | data$Year == 1972 | data$Year == 1976 | data$Year == 1980 | data$Year == 1988 | data$Year == 1992),]

#remove responses wherein mother and father share the same unknown (9), other (4) or NA (0) code
bothMissing<-respond2$FIDnum %in% c(0,4,9) & respond2$MIDnum == respond2$FIDnum
respond3<-respond2[!(bothMissing %in% TRUE),]

Discordance: visualizations

Paternal

#discordant affiliation with father by year
ggplot(respond2,aes(DiscordF,fill=factor(DiscordF)))+
  geom_bar()+
  theme(axis.text.x = element_blank())+
  facet_grid(.~Year)+
  ggtitle("Paternal Affiliation, Discordance by Year")+
  theme_classic() + 
  scale_x_discrete(breaks=c("0","1"),
        labels=c("No", "Yes"))+
  labs(fill = "Discord")

Maternal

#discordant affiliation with mother by year
ggplot(respond2,aes(DiscordM,fill=factor(DiscordM)))+
  geom_bar()+
  theme(axis.text.x = element_blank())+
  facet_grid(.~Year)+
  ggtitle("Maternal Affiliation, Discordance by Year")+
  theme_classic() + 
  scale_x_discrete(breaks=c("0","1"),
        labels=c("No", "Yes"))+
  labs(fill = "Discord")

Parents

#discordant affiliation between parents by year
ggplot(respond2,aes(DiscordP,fill=factor(DiscordP)))+
  geom_bar()+
  theme(axis.text.x = element_blank())+
  facet_grid(.~Year)+
  ggtitle("Between-Parent Affiliation, Discordance by Year")+
  theme_classic() + 
  scale_x_discrete(breaks=c("0","1"),
        labels=c("No", "Yes"))+
  labs(fill = "Discord")

Any

#discordant affiliation with at least one parent by year
ggplot(respond2,aes(anyDiscord,fill=factor(anyDiscord)))+
  geom_bar()+
  theme(axis.text.x = element_blank())+
  facet_grid(.~Year)+
  ggtitle("Discordance with at Least One Parent by Year")+
  theme_classic() + 
  scale_x_discrete(breaks=c("0","1"),
        labels=c("No", "Yes"))+
  labs(fill = "Discord")


Conditions for inference

Let’s look at the proportions of discordance overall:

table(respond3$anyDiscord)
## 
##    0    1 
## 6507 3990

About 38% of the sample’s respondents (3,990 of 10,497) have discordant political affiliation with one or more parents.

Let’s produce a table to view contrasts by year.

#confirm data type as factor
respond3$Year <- factor(respond3$Year)

xtabs(~anyDiscord + Year, data = respond3)
##           Year
## anyDiscord 1968 1972 1976 1980 1988 1992
##          0  876 1532 1186  843  897 1173
##          1  483  870  754  478  608  797
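
Raw counts are hard to compare across years with different sample sizes. As a small sketch, the counts from the table above can be re-entered by hand and converted to within-year proportions with `prop.table()`; the matrix `counts` is typed in from the output rather than taken from `respond3`.

```r
# Counts copied from the xtabs() output above
counts <- matrix(c(876, 1532, 1186, 843, 897, 1173,
                   483,  870,  754, 478, 608,  797),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(anyDiscord = c("0", "1"),
                                 Year = c("1968", "1972", "1976",
                                          "1980", "1988", "1992")))

# Share of each year's respondents in each discordance category
props <- prop.table(counts, margin = 2)
print(round(props, 3))
```

With `respond3` loaded, `prop.table(xtabs(~anyDiscord + Year, data = respond3), margin = 2)` would give the same table directly.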

Additionally, the code below illustrates the distribution of age in our sample to assess normality.

library("ggpubr")

ggdensity(respond3$Age, 
          main = "Density plot of age",
          xlab = "Age")

hist(respond3$Age)

The distribution is unimodal; with mean 45.1 and median 43, the distribution is right-skewed.

A Q-Q plot is produced below.

ggqqplot(respond3$Age)

Age appears to deviate from a normal distribution.

In order to check more formally, a Shapiro-Wilk test is performed on a random sample of 5000; the simulation is run 1000 times to get a feel for the distribution of the p-value.

library(stats)

ShapWilk<-function(x){
cases<-sample(1:nrow(respond3),x)
val<-shapiro.test(respond3$Age[cases])
return(val$p.value)
}

ShapWilk.sim<-replicate(1000,ShapWilk(5000))

hist(ShapWilk.sim,
     breaks = 50)

Age is not normally distributed (p < .05).
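
A histogram of p-values can be summarized by the share of replications that fall below .05. The sketch below shows that summary on synthetic data, since `respond3` is not available in a standalone example: the shifted gamma draws standing in for age, the helper `shap_p`, and the reduced count of 100 replications are all assumptions made for illustration.

```r
set.seed(606)

# Synthetic right-skewed "ages" (assumption: a shifted gamma stands in for respond3$Age)
ages <- 18 + rgamma(10000, shape = 3, scale = 9)

# p-value of a Shapiro-Wilk test on a random subsample of size n
shap_p <- function(x, n) {
  shapiro.test(sample(x, n))$p.value
}

# shapiro.test() accepts at most 5000 observations
p_values <- replicate(100, shap_p(ages, 5000))

# Share of replications that reject normality at the .05 level
reject_rate <- mean(p_values < 0.05)
print(reject_rate)
```

Applied to the simulation above, the equivalent summary would be `mean(ShapWilk.sim < 0.05)`.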

Part 4 - Inference

Let’s check suitability for inference.

We treat each observation as independent, assuming that each survey response represents an individual from a different family.

There are at least ten successes and failures for each year; the code below confirms this assertion.

#Discordance with at least one parent, 1968, 1976, 1980, 1988 and 1992
pol68 <- subset(respond3, Year == 1968)
pol76 <- subset(respond3, Year == 1976)
pol80 <- subset(respond3, Year == 1980)
pol88 <- subset(respond3, Year == 1988)
pol92 <- subset(respond3, Year == 1992)

prop68disc<-sum(pol68$anyDiscord == 1)/nrow(pol68)
prop76disc<-sum(pol76$anyDiscord == 1)/nrow(pol76)
prop80disc<-sum(pol80$anyDiscord == 1)/nrow(pol80)
prop88disc<-sum(pol88$anyDiscord == 1)/nrow(pol88)
prop92disc<-sum(pol92$anyDiscord == 1)/nrow(pol92)

cat("The proportion of discordance with at least one parent by year are:\n    1968:",round(prop68disc*100,0),"%\n    1976:",round(prop76disc*100,0),"%\n    1980:",round(prop80disc*100,0),"%\n    1988:",round(prop88disc*100,0),"%\n    1992:",round(prop92disc*100,0),"%\n")
## The proportion of discordance with at least one parent by year are:
##     1968: 36 %
##     1976: 39 %
##     1980: 36 %
##     1988: 40 %
##     1992: 40 %
#Discordance proportions, 1968
pol68.disc <- prop68disc*nrow(pol68)
pol68.notdisc <- (1-prop68disc)*nrow(pol68)
cat("Since",pol68.disc,"(discordant) &",pol68.notdisc,"(not discordant) are sufficiently large (> 10),\n\n the success-failure criterion is met for discordance in 1968.\n\n")
## Since 483 (discordant) & 876 (not discordant) are sufficiently large (> 10),
## 
##  the success-failure criterion is met for discordance in 1968.
#Discordance proportions, 1976
pol76.disc <- prop76disc*nrow(pol76)
pol76.notdisc <- (1-prop76disc)*nrow(pol76)
cat("Since",pol76.disc,"(discordant) &",pol76.notdisc,"(not discordant) are sufficiently large (> 10),\n\n the success-failure criterion is met for discordance in 1976.\n\n")
## Since 754 (discordant) & 1186 (not discordant) are sufficiently large (> 10),
## 
##  the success-failure criterion is met for discordance in 1976.
#Discordance proportions, 1980
pol80.disc <- prop80disc*nrow(pol80)
pol80.notdisc <- (1-prop80disc)*nrow(pol80)
cat("Since",pol80.disc,"(discordant) &",pol80.notdisc,"(not discordant) are sufficiently large (> 10),\n\n the success-failure criterion is met for discordance in 1980.\n\n")
## Since 478 (discordant) & 843 (not discordant) are sufficiently large (> 10),
## 
##  the success-failure criterion is met for discordance in 1980.
#Discordance proportions, 1988
pol88.disc <- prop88disc*nrow(pol88)
pol88.notdisc <- (1-prop88disc)*nrow(pol88)
cat("Since",pol88.disc,"(discordant) &",pol88.notdisc,"(not discordant) are sufficiently large (> 10),\n\n the success-failure criterion is met for discordance in 1988.\n\n")
## Since 608 (discordant) & 897 (not discordant) are sufficiently large (> 10),
## 
##  the success-failure criterion is met for discordance in 1988.
#Discordance proportions, 1992
pol92.disc <- prop92disc*nrow(pol92)
pol92.notdisc <- (1-prop92disc)*nrow(pol92)
cat("Since",pol92.disc,"(discordant) &",pol92.notdisc,"(not discordant) are sufficiently large (> 10),\n\n the success-failure criterion is met for discordance in 1992.\n\n")
## Since 797 (discordant) & 1173 (not discordant) are sufficiently large (> 10),
## 
##  the success-failure criterion is met for discordance in 1992.

With conditions for inference satisfied, the hypotheses for analysis become:

\(H_0\): The proportion of respondents discordant with at least one parent is the same in every survey year considered between 1968 and 1992.

\(H_A\): The proportion of discordance differs for at least one pairing of years between 1968 and 1992.

The inference command from package statsr is used below.

library(statsr)

#inference, discordance 1968
inference(y = factor(pol68$anyDiscord), data=pol68, statistic = "proportion", type = "ci", method = "theoretical", 
          success = 1)
## Single categorical variable, success: 1
## n = 1359, p-hat = 0.3554
## 95% CI: (0.33 , 0.3809)
#inference, discordance 1976
inference(y = factor(pol76$anyDiscord), data=pol76, statistic = "proportion", type = "ci", method = "theoretical", 
          success = 1)
## Single categorical variable, success: 1
## n = 1940, p-hat = 0.3887
## 95% CI: (0.367 , 0.4104)
#inference, discordance 1980
inference(y = factor(pol80$anyDiscord), data=pol80, statistic = "proportion", type = "ci", method = "theoretical", 
          success = 1)
## Single categorical variable, success: 1
## n = 1321, p-hat = 0.3618
## 95% CI: (0.3359 , 0.3878)
#inference, discordance 1988
inference(y = factor(pol88$anyDiscord), data=pol88, statistic = "proportion", type = "ci", method = "theoretical", 
          success = 1)
## Single categorical variable, success: 1
## n = 1505, p-hat = 0.404
## 95% CI: (0.3792 , 0.4288)
#inference, discordance 1992
inference(y = factor(pol92$anyDiscord), data=pol92, statistic = "proportion", type = "ci", method = "theoretical", 
          success = 1)
## Single categorical variable, success: 1
## n = 1970, p-hat = 0.4046
## 95% CI: (0.3829 , 0.4262)
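
The intervals above appear consistent with the usual normal-approximation (Wald) interval for a proportion, p-hat ± z × sqrt(p-hat (1 - p-hat) / n). The sketch below recomputes them from the counts printed earlier; that `inference()` uses exactly this formula is an assumption, supported only by the fact that the numbers agree to the printed precision.

```r
# Discordant counts and sample sizes taken from the output above
discordant <- c("1968" = 483, "1976" = 754, "1980" = 478, "1988" = 608, "1992" = 797)
n          <- c("1968" = 1359, "1976" = 1940, "1980" = 1321, "1988" = 1505, "1992" = 1970)

# Sample proportion, standard error and 95% critical value
p_hat <- discordant / n
se    <- sqrt(p_hat * (1 - p_hat) / n)
z     <- qnorm(0.975)

# Wald interval for each year
ci <- cbind(p_hat = p_hat, lower = p_hat - z * se, upper = p_hat + z * se)
print(round(ci, 4))
```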

The confidence intervals overlap for every pairing of years except one: the interval for 1968 (0.3300, 0.3809) lies entirely below the interval for 1992 (0.3829, 0.4262). For the remaining pairings among 1968, 1976, 1980, 1988, and 1992, there is no convincing evidence of a change in discordance. Because the 1968 and 1992 intervals do not overlap, the null hypothesis is rejected; there is convincing evidence that discordance differs for at least one pairing of years between 1968 and 1992.

Over the 24 years between 1968 and 1992, sample discordance varied from survey to survey, but a difference was detected only when 1968 and 1992 were compared.
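
Comparing intervals for overlap is an informal check. A direct two-sample test of proportions is one way to examine the 1968 and 1992 comparison more formally; the sketch below uses base R's `prop.test()` on the counts reported above. It is offered as a supplementary illustration rather than as part of the original analysis, and it does not adjust for the fact that several pairings of years were examined.

```r
# Discordant counts and sample sizes for 1968 and 1992, from the output above
x <- c(483, 797)
n <- c(1359, 1970)

# Two-sample test of equal proportions
# (chi-squared test with continuity correction by default)
test_68_92 <- prop.test(x, n)
print(test_68_92)
```

If every pairing of years were tested this way, the resulting p-values could be passed to `p.adjust()` to account for multiple comparisons.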

Binary Logistic Regression

Are year, age, gender and region predictors of discordance in these data?

A binary logistic regression was fit for any discordance on year, age, gender and region. Year is treated as a factor and age as a continuous variable.

#confirm variable type
respond3$Year <- factor(respond3$Year)
respond3$Age <- as.numeric(as.character(respond3$Age))

#run model
model1 <- glm(anyDiscord ~ Year + Age + Gen + SouthNonsouth, data = respond3, family = "binomial")

#call values in summary
summary(model1)
## 
## Call:
## glm(formula = anyDiscord ~ Year + Age + Gen + SouthNonsouth, 
##     family = "binomial", data = respond3)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1791  -0.9935  -0.8968   1.3365   1.7045  
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              -0.365946   0.088050  -4.156 3.24e-05 ***
## Year1972                  0.018539   0.071305   0.260  0.79487    
## Year1976                  0.137844   0.073871   1.866  0.06204 .  
## Year1980                  0.024237   0.081020   0.299  0.76483    
## Year1988                  0.217895   0.077771   2.802  0.00508 ** 
## Year1992                  0.218187   0.073390   2.973  0.00295 ** 
## Age                      -0.007492   0.001177  -6.363 1.98e-10 ***
## Gen2                     -0.175964   0.040711  -4.322 1.54e-05 ***
## SouthNonsouth2. Nonsouth  0.279377   0.046253   6.040 1.54e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 13896  on 10460  degrees of freedom
## Residual deviance: 13781  on 10452  degrees of freedom
##   (36 observations deleted due to missingness)
## AIC: 13799
## 
## Number of Fisher Scoring iterations: 4

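In the summary output, age, gender and region are statistically significant, as are the coefficients for 1988 and 1992; the coefficients for 1972, 1976 and 1980 are not significant at the 0.05 level. For each additional year of age, the log odds of political discordance with at least one parent decrease slightly, by .007; for females (Gen value of “2”), the log odds of discordance decrease by .176. For those outside the South (the coefficient is for the “2. Nonsouth” level, with South as the reference), the log odds of discordance increase by .279. Each year coefficient is a contrast with the reference year, the first level of the Year factor (1968 in these data): relative to that baseline, the log odds of discordance are higher by roughly .218 in both 1988 and 1992.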
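
Written out, the fitted model has the form below. This is a sketch: the symbols p, the betas and the indicator notation are introduced here for illustration, and the reference levels (the omitted first level of Year, taken here to be 1968, together with Gen = 1 and South) are inferred from the coefficient names in the output.

```latex
\[
\log\frac{p}{1-p}
  = \beta_0
  + \sum_{y \in \{1972,\,1976,\,1980,\,1988,\,1992\}} \beta_y\,\mathbf{1}[\text{Year}=y]
  + \beta_{\text{Age}}\,\text{Age}
  + \beta_{\text{Gen}}\,\mathbf{1}[\text{Gen}=2]
  + \beta_{\text{NS}}\,\mathbf{1}[\text{Nonsouth}]
\]

% Illustrative (hypothetical) respondent: 1992, age 40, Gen = 2, Nonsouth
\[
\hat{\eta} = -0.365946 + 0.218187 - 0.007492 \times 40 - 0.175964 + 0.279377 \approx -0.344,
\qquad
\hat{p} = \frac{1}{1 + e^{-\hat{\eta}}} \approx 0.415
\]
```

Here p is the probability of discordance with at least one parent. The worked respondent is hypothetical and only illustrates how the coefficients combine; the resulting probability of about 0.41 is close to the 1992 sample proportion reported earlier.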

It’s important to note that years in these data are considered to be discrete, categorical values and defined as a factor.

Confidence intervals are calculated based on standard errors:

#get confidence intervals based on standard errors
confint.default(model1)
##                                 2.5 %       97.5 %
## (Intercept)              -0.538520288 -0.193371671
## Year1972                 -0.121217544  0.158294618
## Year1976                 -0.006939765  0.282628610
## Year1980                 -0.134560041  0.183033248
## Year1988                  0.065467356  0.370322941
## Year1992                  0.074344911  0.362028864
## Age                      -0.009800024 -0.005184341
## Gen2                     -0.255756311 -0.096171872
## SouthNonsouth2. Nonsouth  0.188722127  0.370030930
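
The intervals from confint.default are Wald intervals: each estimate plus or minus a normal quantile times its standard error. As a check using the rounded values printed in the summary, the interval for Age can be reproduced by hand.

```latex
\[
\hat{\beta}_j \pm z_{0.975}\,\mathrm{SE}(\hat{\beta}_j), \qquad z_{0.975} \approx 1.96
\]
\[
\text{Age:}\quad -0.007492 \pm 1.96 \times 0.001177 \approx -0.007492 \pm 0.002307
\;\Rightarrow\; (-0.0098,\ -0.0052)
\]
```

An interval that excludes zero, as for Age, Gen2, Nonsouth, Year1988 and Year1992, corresponds to a coefficient that is significant at the 0.05 level.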

Just for fun: the code below tests the overall significance of year in the model using the function wald.test from the aod package; only terms 2 through 6 are tested, which correspond to the five Year coefficients in the list above.

library(aod)

wald.test(b = coef(model1), Sigma = vcov(model1), Terms = 2:6)
## Wald test:
## ----------
## 
## Chi-squared test:
## X2 = 20.5, df = 5, P(> X2) = 0.001

With a p-value of 0.001, the year coefficients are jointly significant: taken together, year contributes to the model.
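
The statistic reported by wald.test can be written as below. The notation is introduced here for illustration: the vector collects the five Year coefficients (terms 2 through 6) and the matrix is the matching 5 x 5 block of vcov(model1).

```latex
\[
W = \hat{\boldsymbol{\beta}}_{Y}^{\top}\,
    \bigl[\widehat{\mathrm{Var}}(\hat{\boldsymbol{\beta}}_{Y})\bigr]^{-1}
    \hat{\boldsymbol{\beta}}_{Y}
  \;\sim\; \chi^2_{5}
  \quad \text{under } H_0:\ \beta_{1972} = \beta_{1976} = \beta_{1980} = \beta_{1988} = \beta_{1992} = 0
\]
```

With W = 20.5 on 5 degrees of freedom, the null hypothesis that all Year coefficients are zero is rejected, which matches the reported p-value of 0.001.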

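The code below exponentiates the model coefficients to convert log odds into odds ratios. The intervals come from confint, which uses profile likelihood for a glm, so the bounds differ slightly from the exponentiated confint.default values.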

exp(cbind(OR = coef(model1), confint(model1)))
##                                 OR     2.5 %    97.5 %
## (Intercept)              0.6935403 0.5834296 0.8239517
## Year1972                 1.0187114 0.8860529 1.1718395
## Year1976                 1.1477970 0.9932881 1.3269393
## Year1980                 1.0245327 0.8740790 1.2008863
## Year1988                 1.2434567 1.0678135 1.4484764
## Year1992                 1.2438195 1.0774367 1.4366428
## Age                      0.9925358 0.9902441 0.9948257
## Gen2                     0.8386481 0.7743303 0.9083134
## SouthNonsouth2. Nonsouth 1.3223051 1.2079728 1.4481216

According to the model, the odds of being discordant are higher by a factor of 1.24 in 1988 and in 1992 than in the reference year; living outside the South increases the odds by a factor of 1.32, while being female or older decreases the odds of discordance.
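
Each odds ratio in the table is the exponential of the corresponding log-odds coefficient from the summary. The lines below reproduce a few of them by hand; the ten-year age comparison is an added illustration, not part of the original output.

```latex
\[
\mathrm{OR}_j = e^{\hat{\beta}_j}:
\qquad e^{0.218187} \approx 1.244 \ (\text{Year1992}),
\quad e^{0.279377} \approx 1.322 \ (\text{Nonsouth}),
\quad e^{-0.175964} \approx 0.839 \ (\text{Gen2}),
\quad e^{-0.007492} \approx 0.9925 \ (\text{Age})
\]
\[
\text{Ten additional years of age:}\quad e^{10 \times (-0.007492)} \approx 0.928
\]
```

An odds ratio above 1 indicates higher odds of discordance than in the reference group, and one below 1 indicates lower odds. The per-year age effect looks small, but over a decade it amounts to odds that are about 7% lower.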

Conclusion

Between the years of 1968 and 1992 there is a statistically significant difference in discordance; discordance also varies significantly by age, gender and region. When comparing the first and last years analyzed, there is an increase in the likelihood of diverging political disposition. There are numerous options for further research to add depth to this conclusion.

These results solely address discordant political affiliations between parents and respondents as I have defined them; a change in definition would yield different results. Analysis could be run with paternal affiliation and maternal affiliation exclusively. Respondents who indicated discordance between parents could be filtered out or introduced as an additional effect in the model. Further, data that characterize respondent political engagement would be an interesting consideration. Expanded regional coding, race and economic identities could be introduced in the model as well.

As electronic data collection methods evolve, more data would enhance the ability to sense shifts in political disposition across generations.

References

Data were collected by:

The American National Election Studies (www.electionstudies.org). These materials are based on work supported by the National Science Foundation under grant numbers SES 1444721, 2014-2017, the University of Michigan, and Stanford University, https://electionstudies.org/data-center/, “Time Series Cumulative Data File (1948-2016)”

The following site was used as a reference for logistic regression: https://stats.idre.ucla.edu/r/dae/logit-regression/

Stephen Jones

May 10, 2019