I chose data from the American National Election Studies (ANES). Link: https://electionstudies.org/data-center/

Data Preparation

library(foreign)

# load data
#read.spss has no header argument, so it is omitted here
data<-read.spss("C:/Users/Stephen Jones/Documents/anes_timeseries_cdf.sav")

data<-as.data.frame(data)

variable<-c('VCF0004',
'VCF0301',
'VCF0302',
'VCF0303',
'VCF0305',
'VCF0306',
'VCF0307',
'VCF0308',
'VCF0309',
'VCF0101',
'VCF0102',
'VCF0103',
'VCF0104',
'VCF0105a',
'VCF0105b',
'VCF0106',
'VCF0107',
'VCF0108',
'VCF0109',
'VCF0110',
'VCF0111',
'VCF0112',
'VCF0113',
'VCF0846')

labels<-c('STUDY VARIABLE: Year of Study',
'PARTISANSHIP: Party Identification of Respondent- 7-point Scale',
'PARTISANSHIP: Party Identification of Respondent- Initial Party ID Response',
'PARTISANSHIP: Party Identification of Respondent- Summary 3-Category',
'PARTISANSHIP: Party Identification of Respondent- Strength of Partisanship',
'PARTISANSHIP: Party Identification of Respondents Father',
'PARTISANSHIP: Party Identification of Respondents Mother',
'PARTISANSHIP: Political Interest of Respondents Father',
'PARTISANSHIP: Political Interest of Respondents Mother',
'DEMOGRAPHICS: Respondent - Age',
'DEMOGRAPHICS: Respondent - Age Group',
'DEMOGRAPHICS: Respondent - Cohort',
'DEMOGRAPHICS: Respondent - Gender',
'DEMOGRAPHICS: Race-ethnicity summary, 7 categories',
'DEMOGRAPHICS: Race-ethnicity summary, 4 categories',
'DEMOGRAPHICS: Race summary, 3 categories',
'DEMOGRAPHICS: Respondent - Hispanic Origin Type',
'DEMOGRAPHICS: Respondent - Hispanic Origin',
'DEMOGRAPHICS: Respondent - Ethnicity',
'DEMOGRAPHICS: Respondent - Education 4-category',
'SAMPLE DESCRIPTION: Urbanism',
'SAMPLE DESCRIPTION: Census Region',
'SAMPLE DESCRIPTION: Political South/Nonsouth',
'RELIGIOSITY: Is Religion Important to Respondent')

key<-cbind(variable,labels)

#keep only the variables of interest
data<-data[variable]

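#keep observations with a non-empty 7-point party ID (dataC is not used below)
dataC<-data[which(data$VCF0301!= ''),]

#remove observations with a missing study year
dataCa<-data[which(!is.na(data$VCF0004)),]

#remove observations where the respondent's 7-point or initial party ID label starts with 0 (the missing-type code)
#the filter is computed on dataCa so that it stays aligned with the rows being subset
dataD<-dataCa[which(substring(dataCa$VCF0301,1,1)!='0'&substring(dataCa$VCF0302,1,1)!='0'),]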

#descriptive column names, in the same order as the variable vector
new<-c('Year',
'PartyID7cat',
'PartyIDinit',
'PartyID3cat',
'PartyIDstr',
'FatherID',
'MotherID',
'FatherInt',
'MotherInt',
'Age',
'AgeGroup',
'Cohort',
'Gender',
'RaceEth7cat',
'RaceEth4cat',
'RaceEth3cat',
'HispLatType',
'HispLatOrigin',
'Ethnicity',
'Educ4cat',
'Urbanism',
'CensusRegion',
'SouthNonsouth',
'Religion')

colnames(dataD)<-new

key<-cbind(new,key)

#write to csv and upload to GitHub
write.csv(dataD,"C:/MSDS/Elections.csv")
write.csv(key,"C:/MSDS/Variables.csv")
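
By default, `write.csv` writes row names as an unnamed first column, which `read.csv` then imports as a column named `X`. The following is a minimal sketch of that behavior, assuming a toy data frame and temporary files as hypothetical stand-ins for the real data and output paths; `row.names = FALSE` is one way to avoid the extra column.

```r
# Toy stand-in for dataD; the values are illustrative only
toy <- data.frame(Year = c(1968, 1972),
                  PartyID3cat = c("1. Democrats", "3. Republicans"))

# Temporary files stand in for the real output paths
with_rn <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")

write.csv(toy, with_rn)                        # default: row names are written
write.csv(toy, without_rn, row.names = FALSE)  # row names are omitted

names(read.csv(with_rn))     # includes an extra "X" column
names(read.csv(without_rn))  # only the original columns
```

The files written in this section keep the default, which is why an `X` column appears when they are read back in later.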

The resulting data are uploaded to GitHub: https://raw.githubusercontent.com/sigmasigmaiota/elections/master/Elections.csv

Research question

Parents transmit culture to their children. How has political affiliation been transmitted through time? Are the respondents more or less likely to adhere to the affiliation of a parent in recent years?

Put another way, does parent political affiliation predict the affiliation of individual adults, and does this vary across different demographic factors?

Data collection

Data were collected via survey.

“The American National Election Studies (www.electionstudies.org). THE ANES GUIDE TO PUBLIC OPINION AND ELECTORAL BEHAVIOR. These materials are based on work supported by the National Science Foundation under grant numbers SES 1444721, 2014-2017, the University of Michigan, and Stanford University. Any opinions, findings and conclusions, or recommendations expressed in these materials are those of the author(s) and do not necessarily reflect the views of the funding organizations.”

Cases

There are 45760 observations in the preliminarily cleaned dataset.

Variables

Variables are listed in the table below.

key<-read.csv("https://raw.githubusercontent.com/sigmasigmaiota/elections/master/Variables.csv",header=TRUE)

#drop the row-name column that read.csv imports as "X" (column names are case-sensitive)
key$X<-NULL

library(DT)
datatable(key)

Type of study

This is an observational study.

Scope of inference

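The population of interest is composed of American voters. Because this is an observational study, the analysis can identify associations between parent and respondent affiliation but cannot establish causation.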

Data Source

I chose data from the American National Election Studies (ANES). Link: https://electionstudies.org/data-center/

Dependent Variable

The response variable is political affiliation; it is a qualitative variable.

Independent Variable

The independent variables are the political affiliations of the respondent's father and mother; both are qualitative variables.

Relevant summary statistics

Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

data<-read.csv("https://raw.githubusercontent.com/sigmasigmaiota/elections/master/Elections.csv",header=TRUE)

summary(data)
##        X              Year                            PartyID7cat  
##  Min.   :12253   Min.   :1968   1. Strong Democrat          :8802  
##  1st Qu.:25320   1st Qu.:1980   2. Weak Democrat            :8692  
##  Median :36860   Median :1992   3. Independent - Democrat   :5836  
##  Mean   :36815   Mean   :1993   4. Independent - Independent:5888  
##  3rd Qu.:48435   3rd Qu.:2008   5. Independent - Republican :5016  
##  Max.   :59944   Max.   :2016   6. Weak Republican          :6034  
##                                 7. Strong Republican        :5492  
##                           PartyIDinit   
##  1. Republican                  :11526  
##  2. Independent                 :13273  
##  3. No preference; none; neither: 2869  
##  4. Other                       :  462  
##  5. Democrat                    :17495  
##  8. DK                          :   81  
##  9. NA; refused                 :   54  
##                              PartyID3cat   
##  1. Democrats (including leaners)  :23330  
##  2. Independents                   : 5888  
##  3. Republicans (including leaners):16542  
##                                            
##                                            
##                                            
##                                            
##                                           PartyIDstr   
##  0. DK; NA; other; refused to answer; no Pre IW:   68  
##  1. Independent or Apolitical                  : 5819  
##  2. Leaning Independent                        :10852  
##  3. Weak Partisan                              :14728  
##  4. Strong Partisan                            :14293  
##                                                        
##                                                        
##                                                    FatherID    
##  0. NA; no Pre IW; no Post IW; R had no                : 1030  
##  1. Democrat                                           : 6708  
##  2. Independent (some years also: shifted around)      : 1068  
##  3. Republican                                         : 3595  
##  4. Other; minor party; apolitical; never voted, didn't:  823  
##  9. DK (exc. 1988)                                     : 1597  
##  NA's                                                  :30939  
##                                                    MotherID    
##  0. NA; no Pre IW; no Post IW; R had no                :  917  
##  1. Democrat                                           : 6512  
##  2. Independent                                        : 1149  
##  3. Republican                                         : 3476  
##  4. Other; minor party; apolitical; never voted, didn't: 1097  
##  9. DK (exc. 1988)                                     : 1670  
##  NA's                                                  :30939  
##                                     FatherInt    
##  0. NA; R had no father/father substitute:  154  
##  1. Didn't pay much attention            : 1527  
##  2. Somewhat interested                  : 2127  
##  3. Very much interested                 : 2262  
##  9. DK                                   :  374  
##  NA's                                    :39316  
##                                                  
##                                     MotherInt          Age       
##  0. NA; R had no mother/mother substitute:   98   35     : 1012  
##  1. Didn't pay much attention            : 2882   32     :  947  
##  2. Somewhat interested                  : 2070   28     :  946  
##  3. Very much interested                 : 1093   37     :  940  
##  9. DK                                   :  301   33     :  938  
##  NA's                                    :39316   34     :  938  
##                                                   (Other):40039  
##        AgeGroup               Cohort                   Gender     
##  2. 25 - 34:9244   4. 1943 - 1958:14388   0. NA; no Pre IW:   40  
##  3. 35 - 44:8630   3. 1959 - 1974: 9086   1. Male         :20564  
##  4. 45 - 54:7387   5. 1927 - 1942: 8470   2. Female       :25145  
##  5. 55 - 64:6957   6. 1911 - 1926: 6181   3. Other (2016) :   11  
##  6. 65 - 74:5303   2. 1975 - 1990: 3847                           
##  1. 17 - 24:4686   7. 1895 - 1910: 2472                           
##  (Other)   :3553   (Other)       : 1316                           
##                                                        RaceEth7cat   
##  1. White non-Hispanic (1948-2012)                           :34510  
##  2. Black non-Hispanic (1948-2012)                           : 5682  
##  3. Asian or Pacific Islander, non-Hispanic (1966-2012)      :  553  
##  4. American Indian or Alaska Native non-Hispanic (1966-2012):  331  
##  5. Hispanic (1966-2012)                                     : 3864  
##  6. Other or multiple races, non-Hispanic (1968-2012)        :  556  
##  9. Missing                                                  :  264  
##                                    RaceEth4cat   
##  1. White non-Hispanic                   :34510  
##  2. Black non-Hispanic                   : 5682  
##  3. Hispanic                             : 3864  
##  4. Other or multiple races, non-Hispanic: 1440  
##  9. Missing, DK/REF/NA                   :  264  
##                                                  
##                                                  
##                 RaceEth3cat                               HispLatType   
##  1. White non-Hispanic:34510   7. No, not Hispanic              :33657  
##  2. Black non-Hispanic: 5682   1. Yes, Mexican-American, Chicano: 2017  
##  3. Other             : 5304   3. Yes, other Hispanic           : 1167  
##  9. Missing, DK/REF/NA:  264   2. Yes, Puerto Rican             :  414  
##                                9. NA if Hispanic                :  294  
##                                (Other)                          :  209  
##                                NA's                             : 8002  
##                                   HispLatOrigin  
##  0. Short-form 'new' Cross Section (1992):    4  
##  1. Yes, R is Hispanic                   : 3701  
##  2. No, R is not Hispanic                :33657  
##  8. DK                                   :  102  
##  9. NA                                   :  294  
##  NA's                                    : 8002  
##                                                  
##                                            Ethnicity    
##  877. 'American'; 'Just American'; none; neither: 7367  
##  290                                            : 3556  
##  862                                            : 2582  
##  190                                            : 2382  
##  180                                            : 1946  
##  (Other)                                        :12523  
##  NA's                                           :15404  
##                                                   Educ4cat    
##  0. DK; NA; no Pre IW; short-form 'new' Cross Section :  403  
##  1. Grade school or less (0-8 grades)                 : 3773  
##  2. High school (12 grades or fewer, incl. non-college:19123  
##  3. Some college (13 grades or more but no degree;    :11701  
##  4. College or advanced degree (no cases 1948)        :10760  
##                                                               
##                                                               
##                                                Urbanism    
##  0. NA; telephone (RDD) sample (2000)              :  796  
##  1. Central cities                                 : 7769  
##  2. Suburban areas                                 :11575  
##  3. Rural, small towns, outlying and adjacent areas:10539  
##  NA's                                              :15081  
##                                                            
##                                                            
##                                                    CensusRegion  
##  1. Northeast (CT, ME, MA, NH, NJ, NY, PA, RI, VT)       : 8314  
##  2. North Central (IL, IN, IA, KS, MI, MN, MO, NE, ND,   :11673  
##  3. South (AL, AR, DE, D.C., FL, GA, KY, LA, MD, MS, NC  :16619  
##  4. West (AK, AZ, CA, CO, HI, ID, MT, NV, NM, OR, UT, WA,: 9154  
##                                                                  
##                                                                  
##                                                                  
##      SouthNonsouth  
##  1. South   :14286  
##  2. Nonsouth:31474  
##                     
##                     
##                     
##                     
##                     
##                                                Religion    
##  0. NA; no Post IW; abbrev. telephone IW (1984, see: 2167  
##  1. Yes, important                                 :23676  
##  2. No, not important                              : 8148  
##  8. DK                                             :   95  
##  NA's                                              :11674  
##                                                            
## 
library(ggplot2)

#keep the study years used in the plots below
respond<-data[which(data$Year %in% c(1968,1972,1976,1980,1988,1992)),]

#party ID of respondents
ggplot(respond,aes(PartyID3cat,fill=PartyID3cat))+
  geom_bar()+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  facet_grid(.~Year)+
  theme(legend.position = "none") 

#keep rows with a non-empty mother's party ID whose code is not 0 (NA)
mother<-data[which(data$MotherID != '' & substring(data$MotherID,1,1) != '0'),]

#party ID of respondents' mothers
ggplot(mother,aes(MotherID,fill=MotherID))+
  geom_bar()+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  facet_grid(.~Year)+
  theme(legend.position = "none") 

#keep rows with a non-empty father's party ID whose code is not 0 (NA)
father<-data[which(data$FatherID != '' & substring(data$FatherID,1,1) != '0'),]

#party ID of respondents' fathers
ggplot(father,aes(FatherID,fill=FatherID))+
  geom_bar()+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  facet_grid(.~Year)+
  theme(legend.position = "none") 

The three sets of plots show similar patterns; the relative sizes of the party categories do not appear to change much from year to year.
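
The bar charts above show raw counts, and sample sizes can differ across study years, so within-year proportions make the comparison more direct. The following is a minimal sketch on made-up data (the `toy` data frame and its values are assumptions, not ANES results) using base R's `table` and `prop.table`.

```r
# Made-up example data; not ANES results
toy <- data.frame(
  Year = c(1968, 1968, 1968, 1968, 1972, 1972, 1972, 1972),
  PartyID3cat = c("Dem", "Dem", "Ind", "Rep", "Dem", "Rep", "Rep", "Ind")
)

# Rows are years, columns are party categories
counts <- table(toy$Year, toy$PartyID3cat)

# margin = 1 divides each row by its total, giving within-year proportions
props <- prop.table(counts, margin = 1)
round(props, 2)
```

Applied to the real data, the same two calls on `Year` and `PartyID3cat` (or `MotherID`, `FatherID`) would give one row of proportions per study year.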

chi<-table(data$PartyID3cat,data$FatherID)

chisq.test(chi)
## 
##  Pearson's Chi-squared test
## 
## data:  chi
## X-squared = 3329.5, df = 10, p-value < 2.2e-16

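As expected, the test indicates a statistically significant association between paternal affiliation and respondent partisan identity. The 10 degrees of freedom reflect a 3-by-6 table, since the NA, Other and DK categories for fathers are included. Let’s check the maternal affiliation.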

chi<-table(data$PartyID3cat,data$MotherID)

chisq.test(chi)
## 
##  Pearson's Chi-squared test
## 
## data:  chi
## X-squared = 3387.6, df = 10, p-value < 2.2e-16

Again, maternal affiliation is significantly associated with respondent partisanship.
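
With a sample this large, even a weak association can produce a very small p-value, so an effect size helps gauge how strong the relationship is. The following is a hedged sketch on made-up counts (the matrix is illustrative, not ANES data) that computes Cramér's V from the `chisq.test` output, using only the three substantive party categories.

```r
# Made-up 3 x 3 table of counts (respondent party by parent party); not ANES data
tab <- matrix(c(60, 15, 10,
                20, 30, 20,
                10, 15, 60),
              nrow = 3, byrow = TRUE,
              dimnames = list(Respondent = c("Dem", "Ind", "Rep"),
                              Parent = c("Dem", "Ind", "Rep")))

test <- chisq.test(tab)

# Cramér's V = sqrt(chi-squared / (n * (min(rows, cols) - 1)))
n <- sum(tab)
v <- sqrt(unname(test$statistic) / (n * (min(dim(tab)) - 1)))
v
```

On the real data, one option would be to subset to rows where the parent code is 1, 2 or 3 and call `droplevels()` before tabulating, so that the NA, Other and DK categories do not contribute to the statistic.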

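#the leading digit of each label is the numeric category code
data$PID3num<-substring(data$PartyID3cat,1,1)
data$FIDnum<-substring(data$FatherID,1,1)
data$MIDnum<-substring(data$MotherID,1,1)

#difference in codes: respondent vs. father
data$DiscordF <- as.numeric(data$PID3num) - as.numeric(data$FIDnum)

#difference in codes: respondent vs. mother
data$DiscordM <- as.numeric(data$PID3num) - as.numeric(data$MIDnum)

#difference in codes: mother vs. father
data$DiscordP <- as.numeric(data$MIDnum) - as.numeric(data$FIDnum)

#recode any nonzero difference as 1 (discordant); 0 means the codes match
#note: parent codes 0, 4 and 9 (NA, Other, DK) never match a respondent code of 1-3
data$DiscordF[which(abs(data$DiscordF)>0)]<-1
data$DiscordM[which(abs(data$DiscordM)>0)]<-1
data$DiscordP[which(abs(data$DiscordP)>0)]<-1

#convert to factors for plotting
data$DiscordF<-as.factor(data$DiscordF)
data$DiscordM<-as.factor(data$DiscordM)
data$DiscordP<-as.factor(data$DiscordP)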

#keep the study years used in the plots below
respond2<-data[which(data$Year %in% c(1968,1972,1976,1980,1988,1992)),]

#discordant affiliation with father by year
ggplot(respond2,aes(DiscordF,fill=DiscordF))+
  geom_bar()+
  theme(axis.text.x = element_blank())+
  facet_grid(.~Year)+
  ggtitle("Paternal Affiliation, Discordance by Year")

#discordant affiliation with mother by year
ggplot(respond2,aes(DiscordM,fill=DiscordM))+
  geom_bar()+
  theme(axis.text.x = element_blank())+
  facet_grid(.~Year)+
  ggtitle("Maternal Affiliation, Discordance by Year")

#discordant affiliation between parents by year
ggplot(respond2,aes(DiscordP,fill=DiscordP))+
  geom_bar()+
  theme(axis.text.x = element_blank())+
  facet_grid(.~Year)+
  ggtitle("Between-Parent Affiliation, Discordance by Year")
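
The discordance flags above are built from the leading digit of each label, so parent codes outside 1 to 3 (0, 4 and 9) always count as discordant. One possible alternative, sketched here on made-up values with a hypothetical helper `leading_code`, treats those codes as missing and then summarizes the share of discordant respondents by year.

```r
# Hypothetical helper: leading digit of a label, kept only if it is 1, 2 or 3
leading_code <- function(x) {
  code <- suppressWarnings(as.numeric(substring(as.character(x), 1, 1)))
  ifelse(code %in% 1:3, code, NA)
}

# Made-up example rows; not ANES data
toy <- data.frame(
  Year = c(1968, 1968, 1968, 1972, 1972, 1972),
  PartyID3cat = c("1. Democrats", "3. Republicans", "2. Independents",
                  "1. Democrats", "3. Republicans", "1. Democrats"),
  FatherID = c("1. Democrat", "1. Democrat", "9. DK",
               "3. Republican", "3. Republican", "1. Democrat")
)

# 1 = respondent differs from father, 0 = same, NA = father code not substantive
toy$DiscordF <- as.numeric(leading_code(toy$PartyID3cat) != leading_code(toy$FatherID))

# Share of discordant respondents per year, ignoring missing values
share <- tapply(toy$DiscordF, toy$Year, mean, na.rm = TRUE)
share
```

A per-year share like this speaks more directly to the research question than count bars do, because it does not depend on how many respondents were interviewed in each year.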
