I chose data from the American National Election Studies (ANES). Link: https://electionstudies.org/data-center/
library(foreign)
# load data
data<-read.spss("C://Users/Stephen Jones/Documents/anes_timeseries_cdf.sav")
data<-as.data.frame(data)
variable<-c('VCF0004',
'VCF0301',
'VCF0302',
'VCF0303',
'VCF0305',
'VCF0306',
'VCF0307',
'VCF0308',
'VCF0309',
'VCF0101',
'VCF0102',
'VCF0103',
'VCF0104',
'VCF0105a',
'VCF0105b',
'VCF0106',
'VCF0107',
'VCF0108',
'VCF0109',
'VCF0110',
'VCF0111',
'VCF0112',
'VCF0113',
'VCF0846')
labels<-c('STUDY VARIABLE: Year of Study',
'PARTISANSHIP: Party Identification of Respondent- 7-point Scale',
'PARTISANSHIP: Party Identification of Respondent- Initial Party ID Response',
'PARTISANSHIP: Party Identification of Respondent- Summary 3-Category',
'PARTISANSHIP: Party Identification of Respondent- Strength of Partisanship',
'PARTISANSHIP: Party Identification of Respondents Father',
'PARTISANSHIP: Party Identification of Respondents Mother',
'PARTISANSHIP: Political Interest of Respondents Father',
'PARTISANSHIP: Political Interest of Respondents Mother',
'DEMOGRAPHICS: Respondent - Age',
'DEMOGRAPHICS: Respondent - Age Group',
'DEMOGRAPHICS: Respondent - Cohort',
'DEMOGRAPHICS: Respondent - Gender',
'DEMOGRAPHICS: Race-ethnicity summary, 7 categories',
'DEMOGRAPHICS: Race-ethnicity summary, 4 categories',
'DEMOGRAPHICS: Race summary, 3 categories',
'DEMOGRAPHICS: Respondent - Hispanic Origin Type',
'DEMOGRAPHICS: Respondent - Hispanic Origin',
'DEMOGRAPHICS: Respondent - Ethnicity',
'DEMOGRAPHICS: Respondent - Education 4-category',
'SAMPLE DESCRIPTION: Urbanism',
'SAMPLE DESCRIPTION: Census Region',
'SAMPLE DESCRIPTION: Political South/Nonsouth',
'RELIGIOSITY: Is Religion Important to Respondent')
key<-cbind(variable,labels)
#keep only the variables of interest
data<-data[variable]
#drop rows with a blank 7-point party ID (dataC is not used below)
dataC<-data[which(data$VCF0301!= ''),]
#drop rows with a missing study year
dataCa<-data[which(!is.na(data$VCF0004)),]
#drop rows where the respondent's own party ID (VCF0301 or VCF0302) is coded 0;
#parent party ID (VCF0306, VCF0307) is not filtered here
dataD<-dataCa[which(substring(dataCa$VCF0301,1,1)!='0'&substring(dataCa$VCF0302,1,1)!='0'),]
colnames(dataD)<-c('Year',
'PartyID7cat',
'PartyIDinit',
'PartyID3cat',
'PartyIDstr',
'FatherID',
'MotherID',
'FatherInt',
'MotherInt',
'Age',
'AgeGroup',
'Cohort',
'Gender',
'RaceEth7cat',
'RaceEth4cat',
'RaceEth3cat',
'HispLatType',
'HispLatOrigin',
'Ethnicity',
'Educ4cat',
'Urbanism',
'CensusRegion',
'SouthNonsouth',
'Religion')
#reuse the new column names for the variable key
new<-colnames(dataD)
key<-cbind(new,key)
#write to csv and upload to GitHub
write.csv(dataD,"C:/MSDS/Elections.csv")
write.csv(key,"C:/MSDS/Variables.csv")
The resulting data are uploaded to GitHub: https://raw.githubusercontent.com/sigmasigmaiota/elections/master/Elections.csv
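The row filter used in the cleaning step can be illustrated on a small stand-in data frame. This is a minimal sketch, not the actual ANES file: the `toy` values and the "0. DK; NA" label are invented for illustration, and only the leading-code logic mirrors the code above.

```r
# Toy stand-in for two ANES columns; all values are invented for illustration.
toy <- data.frame(
  VCF0004 = c(1968, 1972, NA, 1980),
  VCF0301 = c("1. Strong Democrat", "0. DK; NA",
              "7. Strong Republican", "4. Independent - Independent"),
  stringsAsFactors = FALSE
)

# Step 1: drop rows with a missing study year.
toy_year <- toy[!is.na(toy$VCF0004), ]

# Step 2: drop rows whose party ID label starts with the code "0".
# The logical index is built from toy_year itself, so its length
# matches the rows being subset.
keep <- substring(toy_year$VCF0301, 1, 1) != "0"
toy_clean <- toy_year[keep, ]

print(toy_clean)
```

Building the index from the same data frame that is being subset avoids misaligned rows if an earlier step has already removed observations.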
Parents transmit culture to their children. How has political affiliation been transmitted through time? Are the respondents more or less likely to adhere to the affiliation of a parent in recent years?
Put another way, does parent political affiliation predict the affiliation of individual adults, and does this vary across different demographic factors?
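One possible way to operationalize the question about change over time is a logistic regression of parent-child agreement on survey year. The sketch below runs on simulated data only; the variable `match_father`, the sample size, and the simulated downward trend are assumptions made for illustration and say nothing about the ANES results.

```r
# Simulated data only: nothing here is drawn from the ANES file.
set.seed(1)
years <- c(1968, 1972, 1976, 1980, 1988, 1992)
sim <- data.frame(Year = sample(years, 600, replace = TRUE))

# Hypothetical indicator: 1 if the respondent shares the father's party.
# The probability of a match is made to decline slowly with time.
sim$match_father <- rbinom(600, 1, plogis(1 - 0.03 * (sim$Year - 1968)))

# Logistic regression of agreement on years elapsed since 1968.
fit <- glm(match_father ~ I(Year - 1968), data = sim, family = binomial)

# A negative slope would suggest agreement becomes less likely over time.
print(summary(fit)$coefficients)
```

Demographic factors could be added as further predictors or interaction terms in the same model, which would address the second part of the question.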
Data were collected via survey.
“The American National Election Studies (www.electionstudies.org). THE ANES GUIDE TO PUBLIC OPINION AND ELECTORAL BEHAVIOR. These materials are based on work supported by the National Science Foundation under grant numbers SES 1444721, 2014-2017, the University of Michigan, and Stanford University. Any opinions, findings and conclusions, or recommendations expressed in these materials are those of the author(s) and do not necessarily reflect the views of the funding organizations.”
There are 45760 observations in the preliminarily cleaned dataset.
Variables are listed in the table below.
key<-read.csv("https://raw.githubusercontent.com/sigmasigmaiota/elections/master/Variables.csv",header=TRUE)
#drop the row-name column that write.csv added (read.csv names it X)
key$X<-NULL
library(DT)
datatable(key)
This is an observational study.
The population of interest is composed of American voters.
The data source is the American National Election Studies (ANES). Link: https://electionstudies.org/data-center/
The response variable is political affiliation; it is a qualitative variable.
The independent variables are the political affiliations of the respondent's father and mother; both are qualitative variables.
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
data<-read.csv("https://raw.githubusercontent.com/sigmasigmaiota/elections/master/Elections.csv",header=TRUE)
summary(data)
## X Year PartyID7cat
## Min. :12253 Min. :1968 1. Strong Democrat :8802
## 1st Qu.:25320 1st Qu.:1980 2. Weak Democrat :8692
## Median :36860 Median :1992 3. Independent - Democrat :5836
## Mean :36815 Mean :1993 4. Independent - Independent:5888
## 3rd Qu.:48435 3rd Qu.:2008 5. Independent - Republican :5016
## Max. :59944 Max. :2016 6. Weak Republican :6034
## 7. Strong Republican :5492
## PartyIDinit
## 1. Republican :11526
## 2. Independent :13273
## 3. No preference; none; neither: 2869
## 4. Other : 462
## 5. Democrat :17495
## 8. DK : 81
## 9. NA; refused : 54
## PartyID3cat
## 1. Democrats (including leaners) :23330
## 2. Independents : 5888
## 3. Republicans (including leaners):16542
##
##
##
##
## PartyIDstr
## 0. DK; NA; other; refused to answer; no Pre IW: 68
## 1. Independent or Apolitical : 5819
## 2. Leaning Independent :10852
## 3. Weak Partisan :14728
## 4. Strong Partisan :14293
##
##
## FatherID
## 0. NA; no Pre IW; no Post IW; R had no : 1030
## 1. Democrat : 6708
## 2. Independent (some years also: shifted around) : 1068
## 3. Republican : 3595
## 4. Other; minor party; apolitical; never voted, didn't: 823
## 9. DK (exc. 1988) : 1597
## NA's :30939
## MotherID
## 0. NA; no Pre IW; no Post IW; R had no : 917
## 1. Democrat : 6512
## 2. Independent : 1149
## 3. Republican : 3476
## 4. Other; minor party; apolitical; never voted, didn't: 1097
## 9. DK (exc. 1988) : 1670
## NA's :30939
## FatherInt
## 0. NA; R had no father/father substitute: 154
## 1. Didn't pay much attention : 1527
## 2. Somewhat interested : 2127
## 3. Very much interested : 2262
## 9. DK : 374
## NA's :39316
##
## MotherInt Age
## 0. NA; R had no mother/mother substitute: 98 35 : 1012
## 1. Didn't pay much attention : 2882 32 : 947
## 2. Somewhat interested : 2070 28 : 946
## 3. Very much interested : 1093 37 : 940
## 9. DK : 301 33 : 938
## NA's :39316 34 : 938
## (Other):40039
## AgeGroup Cohort Gender
## 2. 25 - 34:9244 4. 1943 - 1958:14388 0. NA; no Pre IW: 40
## 3. 35 - 44:8630 3. 1959 - 1974: 9086 1. Male :20564
## 4. 45 - 54:7387 5. 1927 - 1942: 8470 2. Female :25145
## 5. 55 - 64:6957 6. 1911 - 1926: 6181 3. Other (2016) : 11
## 6. 65 - 74:5303 2. 1975 - 1990: 3847
## 1. 17 - 24:4686 7. 1895 - 1910: 2472
## (Other) :3553 (Other) : 1316
## RaceEth7cat
## 1. White non-Hispanic (1948-2012) :34510
## 2. Black non-Hispanic (1948-2012) : 5682
## 3. Asian or Pacific Islander, non-Hispanic (1966-2012) : 553
## 4. American Indian or Alaska Native non-Hispanic (1966-2012): 331
## 5. Hispanic (1966-2012) : 3864
## 6. Other or multiple races, non-Hispanic (1968-2012) : 556
## 9. Missing : 264
## RaceEth4cat
## 1. White non-Hispanic :34510
## 2. Black non-Hispanic : 5682
## 3. Hispanic : 3864
## 4. Other or multiple races, non-Hispanic: 1440
## 9. Missing, DK/REF/NA : 264
##
##
## RaceEth3cat HispLatType
## 1. White non-Hispanic:34510 7. No, not Hispanic :33657
## 2. Black non-Hispanic: 5682 1. Yes, Mexican-American, Chicano: 2017
## 3. Other : 5304 3. Yes, other Hispanic : 1167
## 9. Missing, DK/REF/NA: 264 2. Yes, Puerto Rican : 414
## 9. NA if Hispanic : 294
## (Other) : 209
## NA's : 8002
## HispLatOrigin
## 0. Short-form 'new' Cross Section (1992): 4
## 1. Yes, R is Hispanic : 3701
## 2. No, R is not Hispanic :33657
## 8. DK : 102
## 9. NA : 294
## NA's : 8002
##
## Ethnicity
## 877. 'American'; 'Just American'; none; neither: 7367
## 290 : 3556
## 862 : 2582
## 190 : 2382
## 180 : 1946
## (Other) :12523
## NA's :15404
## Educ4cat
## 0. DK; NA; no Pre IW; short-form 'new' Cross Section : 403
## 1. Grade school or less (0-8 grades) : 3773
## 2. High school (12 grades or fewer, incl. non-college:19123
## 3. Some college (13 grades or more but no degree; :11701
## 4. College or advanced degree (no cases 1948) :10760
##
##
## Urbanism
## 0. NA; telephone (RDD) sample (2000) : 796
## 1. Central cities : 7769
## 2. Suburban areas :11575
## 3. Rural, small towns, outlying and adjacent areas:10539
## NA's :15081
##
##
## CensusRegion
## 1. Northeast (CT, ME, MA, NH, NJ, NY, PA, RI, VT) : 8314
## 2. North Central (IL, IN, IA, KS, MI, MN, MO, NE, ND, :11673
## 3. South (AL, AR, DE, D.C., FL, GA, KY, LA, MD, MS, NC :16619
## 4. West (AK, AZ, CA, CO, HI, ID, MT, NV, NM, OR, UT, WA,: 9154
##
##
##
## SouthNonsouth
## 1. South :14286
## 2. Nonsouth:31474
##
##
##
##
##
## Religion
## 0. NA; no Post IW; abbrev. telephone IW (1984, see: 2167
## 1. Yes, important :23676
## 2. No, not important : 8148
## 8. DK : 95
## NA's :11674
##
##
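The summary shows that several variables mix substantive answers with placeholder codes such as 0 and 9. A minimal sketch of recoding those to missing follows; the example values are invented, and treating the leading codes 0 and 9 as non-substantive is my assumption and would need to be checked against the ANES codebook for each variable.

```r
# Invented example values that mimic the ANES label style.
father <- factor(c("1. Democrat", "3. Republican", "9. DK (exc. 1988)",
                   "0. NA; no Pre IW", "2. Independent", NA))

# Assumption: leading codes 0 and 9 mark non-substantive answers.
code <- substring(as.character(father), 1, 1)

father_clean <- father
father_clean[code %in% c("0", "9")] <- NA

# Remove the now-empty placeholder levels.
father_clean <- droplevels(father_clean)

print(summary(father_clean))
```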
library(ggplot2)
respond<-data[which(data$Year %in% c(1968,1972,1976,1980,1988,1992)),]
#party ID of respondents
ggplot(respond,aes(PartyID3cat,fill=PartyID3cat))+
geom_bar()+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
facet_grid(.~Year)+
theme(legend.position = "none")
mother<-data[which(data$MotherID != '' & substring(data$MotherID,1,1) != '0'),]
#party ID of respondents' mothers
ggplot(mother,aes(MotherID,fill=MotherID))+
geom_bar()+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
facet_grid(.~Year)+
theme(legend.position = "none")
father<-data[which(data$FatherID != '' & substring(data$FatherID,1,1) != '0'),]
#party ID of respondents' fathers
ggplot(father,aes(FatherID,fill=FatherID))+
geom_bar()+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
facet_grid(.~Year)+
theme(legend.position = "none")
The respondent, maternal, and paternal plots appear to align; there appear to be no major differences in party proportions from year to year.
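Because the bar charts show counts, differences in sample size between years can hide or exaggerate differences in proportions. A small sketch of computing within-year shares directly is given here; the `toy` rows and shortened party labels are invented for illustration.

```r
# Invented rows for two survey years.
toy <- data.frame(
  Year = c(1968, 1968, 1968, 1968, 1992, 1992, 1992, 1992),
  PartyID3cat = c("Dem", "Dem", "Ind", "Rep", "Dem", "Rep", "Rep", "Ind")
)

# Counts by year (rows) and party (columns).
counts <- table(toy$Year, toy$PartyID3cat)

# margin = 1 divides each row by its total, giving shares within a year.
share_by_year <- prop.table(counts, margin = 1)

print(round(share_by_year, 2))
```

In ggplot2, a comparable picture can be drawn by mapping Year to the x axis and party to `fill`, then using `geom_bar(position = "fill")`.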
chi<-table(data$PartyID3cat,data$FatherID)
chisq.test(chi)
##
## Pearson's Chi-squared test
##
## data: chi
## X-squared = 3329.5, df = 10, p-value < 2.2e-16
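As expected, paternal affiliation is significantly associated with respondent partisan identity (X-squared = 3329.5, df = 10, p < 2.2e-16). Because this is an observational study, the test indicates an association rather than a causal effect. Let’s check the maternal affiliation.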
chi<-table(data$PartyID3cat,data$MotherID)
chisq.test(chi)
##
## Pearson's Chi-squared test
##
## data: chi
## X-squared = 3387.6, df = 10, p-value < 2.2e-16
Again, maternal affiliation is significantly associated with respondent partisanship (X-squared = 3387.6, df = 10, p < 2.2e-16).
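With samples this large, almost any association yields a very small p-value, so an effect size is a useful companion to the test. Below is a minimal sketch of Cramér's V computed from a chi-squared statistic; the 3 x 3 counts are invented and are not the ANES table.

```r
# Invented 3 x 3 counts (respondent party by father party).
tab <- matrix(c(60, 10, 20,
                15, 10, 15,
                20, 10, 55),
              nrow = 3, byrow = TRUE,
              dimnames = list(Respondent = c("Dem", "Ind", "Rep"),
                              Father = c("Dem", "Ind", "Rep")))

test <- chisq.test(tab)

# Cramér's V = sqrt(chi-squared / (n * (min(rows, cols) - 1))).
n <- sum(tab)
k <- min(nrow(tab), ncol(tab)) - 1
cramers_v <- sqrt(unname(test$statistic) / (n * k))

print(cramers_v)
```

Values near 0 indicate a weak association and values near 1 a strong one, regardless of sample size.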
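data$PID3num<-substring(data$PartyID3cat,1,1)
#father and mother codes (codes 0, 4, and 9 are kept, so NA, other, and DK answers count as discordant below)
data$FIDnum<-substring(data$FatherID,1,1)
data$MIDnum<-substring(data$MotherID,1,1)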
data$DiscordF <- as.numeric(data$PID3num) - as.numeric(data$FIDnum)
data$DiscordM <- as.numeric(data$PID3num) - as.numeric(data$MIDnum)
data$DiscordP <- as.numeric(data$MIDnum) - as.numeric(data$FIDnum)
data$DiscordF[which(abs(data$DiscordF)>0)]<-1
data$DiscordM[which(abs(data$DiscordM)>0)]<-1
data$DiscordP[which(abs(data$DiscordP)>0)]<-1
data$DiscordF<-as.factor(data$DiscordF)
data$DiscordM<-as.factor(data$DiscordM)
data$DiscordP<-as.factor(data$DiscordP)
respond2<-data[which(data$Year %in% c(1968,1972,1976,1980,1988,1992)),]
#discordant affiliation with father by year
ggplot(respond2,aes(DiscordF,fill=DiscordF))+
geom_bar()+
theme(axis.text.x = element_blank())+
facet_grid(.~Year)+
ggtitle("Paternal Affiliation, Discordance by Year")
#discordant affiliation with mother by year
ggplot(respond2,aes(DiscordM,fill=DiscordM))+
geom_bar()+
theme(axis.text.x = element_blank())+
facet_grid(.~Year)+
ggtitle("Maternal Affiliation, Discordance by Year")
#discordant affiliation between parents by year
ggplot(respond2,aes(DiscordP,fill=DiscordP))+
geom_bar()+
theme(axis.text.x = element_blank())+
facet_grid(.~Year)+
ggtitle("Between-Parent Affiliation, Discordance by Year")
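The discordance indicator compares leading numeric codes, so parent codes outside 1 to 3 would count as disagreement unless they are set aside first. The sketch below shows one way to handle that on invented rows; treating every parent code other than 1, 2, or 3 as missing is my assumption, not part of the analysis above.

```r
# Invented rows; labels mimic the ANES coding style.
toy <- data.frame(
  Year = c(1968, 1968, 1968, 1992, 1992, 1992),
  PartyID3cat = c("1. Democrats", "3. Republicans", "1. Democrats",
                  "2. Independents", "1. Democrats", "1. Democrats"),
  FatherID = c("1. Democrat", "1. Democrat", "9. DK",
               "3. Republican", "3. Republican", "0. NA"),
  stringsAsFactors = FALSE
)

# Leading digit of each label as a number.
pid <- as.numeric(substring(toy$PartyID3cat, 1, 1))
fid <- as.numeric(substring(toy$FatherID, 1, 1))

# Assumption: only father codes 1-3 are comparable with the respondent scale.
fid[!fid %in% 1:3] <- NA

# 1 = respondent differs from father, 0 = same, NA = not comparable.
toy$DiscordF <- as.integer(pid != fid)

# Share of discordant respondents per year, ignoring non-comparable rows.
discord_share <- tapply(toy$DiscordF, toy$Year, mean, na.rm = TRUE)
print(discord_share)
```

Comparing shares rather than counts across years makes the plots above easier to interpret when the number of respondents differs by year.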