This assignment required replicating the analysis in Martin Elff’s paper “Analysing the American National Election Study of 1948.” The data set includes 662 observations and 65 variables, which is a small sample by the standards of survey research. The underlying goal of this assignment is to conduct an analysis similar to Elff’s but WITHOUT using the “memisc” package (except to load the data initially). As for the election itself, it is well known that Truman pushed for civil rights and that the votes of Black citizens helped him win the election of 1948.
The first step is to extract the SPSS portable file (NES1948.POR) from the ZIP archive distributed with the “memisc” package and read it into R with read.spss() from the “foreign” package:
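library(memisc)   # used only to locate and unzip the data file bundled with memisc
options(digits=3)
# Extract the SPSS portable file from the ZIP archive shipped with memisc
nes1948.por<-UnZip("anes/NES1948.ZIP","NES1948.POR",package="memisc")
library(foreign)
# read.spss() returns a list by default (to.data.frame = FALSE)
nes1948<-read.spss(nes1948.por)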
This analysis requires the “dplyr” package. The class() function tells us what kind of object we are dealing with, and its output shows that read.spss() has returned a list. dplyr functions such as select() work on data frames rather than plain lists, which is why this list has to be converted into a data frame:
library(dplyr)
class(nes1948)
## [1] "list"
vote48<-data.frame(nes1948)
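A minimal sketch of what this conversion does, using a small hypothetical list (`nes_like`) in place of the real `nes1948` object: the list returned by read.spss() has one named element per variable, and data.frame() turns each equal-length element into a column. Passing `to.data.frame = TRUE` to read.spss() would do the same conversion at import time.

```r
# Hypothetical stand-in for the list returned by read.spss():
# one named element per SPSS variable, all of the same length
nes_like <- list(
  V480018 = factor(c("VOTED - FOR TRUMAN", "VOTED - FOR DEWEY", "DID NOT VOTE")),
  V480045 = factor(c("MALE", "FEMALE", "FEMALE"))
)
print(class(nes_like))        # "list"

# Each list element becomes one column; factors stay factors
nes_df <- data.frame(nes_like)
print(class(nes_df))          # "data.frame"
print(dim(nes_df))            # 3 rows, 2 columns
str(nes_df)
```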
As seen above, the converted data set is stored as “vote48” for clarity. Once the data set is loaded, the next step is to select the variables we are specifically interested in for this analysis:
vote48<-select(vote48,
V480018,
V480029,
V480030,
V480045,
V480046,
V480047,
V480048,
V480049,
V480050
)
The variables are then renamed with rename.vars() from the “gdata” package. NOTE: if Elff was trying to create reproducible work (as all respectable researchers aspire to), he should have noted that the “V” at the start of each variable name needs to be capitalized for the renaming to work, because read.spss() keeps the upper-case names from the SPSS file. He uses lower-case names in his paper, which do not work when trying to re-create his data without the “memisc” package.
library(gdata)
vote48<-rename.vars(vote48,from=c(
"V480018","V480029","V480030","V480045", "V480046","V480047","V480048","V480049","V480050"),
to=c("vote", "occupation", "unionized", "gender", "race", "age", "education", "totalncome", "religiousPref"))
##
## Changing in vote48
## From: V480018 V480029 V480030 V480045 V480046 V480047 V480048
## To: vote occupation unionized gender race age education
##
## From: V480049 V480050
## To: totalncome religiousPref
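The selection and renaming steps can also be done in base R, without dplyr or gdata. The sketch below uses a small hypothetical data frame (`toy`, with a hypothetical unwanted column `V480099`) and a hypothetical lookup vector (`name_map`); wrapping the old names in toupper() lets the lower-case names used in Elff's paper (e.g. `v480018`) match the upper-case names that read.spss() produces.

```r
# Hypothetical data frame with upper-case SPSS-style names, as read.spss() returns them
toy <- data.frame(V480018 = c("VOTED - FOR TRUMAN", "VOTED - FOR DEWEY"),
                  V480045 = c("MALE", "FEMALE"),
                  V480099 = c(1, 2),            # a column we do not want
                  stringsAsFactors = FALSE)

# Lookup from old names (written in lower case, as in Elff's paper) to new names
name_map <- c(v480018 = "vote", v480045 = "gender")
old_names <- toupper(names(name_map))           # "V480018" "V480045"

# Select the wanted columns, then rename them in the same order
vote_toy <- toy[, old_names]
names(vote_toy) <- unname(name_map)
print(names(vote_toy))
```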
Once the variables have been renamed, the Desc() function from the “DescTools” package summarises each of them and, with plotit = TRUE, draws a plot for each one. Desc() is the equivalent of what Elff refers to as a codebook, but it contains more detailed information about each variable. It is worth pointing out that Elff’s codebook contains metadata which he does not discuss at all. The plots have been added to make the data easier to understand. If I were doing this analysis on my own I would show just the charts, but I have included the tables in order to replicate Elff’s work:
library(DescTools)
Desc(vote48, plotit= TRUE)
##
## -------------------------------------------------------------------------
## 'data.frame': 662 obs. of 9 variables:
## 1 $ vote : Factor w/ 7 levels "VOTED - FOR TRUMAN",..: 1 2 1 2 1 2 2 1 2 1 ...
## 2 $ occupation : Factor w/ 12 levels "PROFESSIONAL, SEMI-PROFESSIONAL",..: 6 3 4 1 1 2 7 7 4 4 ...
## 3 $ unionized : Factor w/ 4 levels "YES","NO","DK",..: 1 2 2 2 2 2 2 2 1 1 ...
## 4 $ gender : Factor w/ 3 levels "MALE","FEMALE",..: 1 2 2 2 1 2 1 2 1 1 ...
## 5 $ race : Factor w/ 4 levels "WHITE","NEGRO",..: 1 1 1 1 1 1 1 1 1 1 ...
## 6 $ age : Factor w/ 7 levels "18-24","25-34",..: 3 3 2 3 2 3 4 5 2 2 ...
## 7 $ education : Factor w/ 4 levels "GRADE SCHOOL",..: 1 2 2 3 3 2 1 1 2 2 ...
## 8 $ totalncome : Factor w/ 8 levels "UNDER $500","$500-$999",..: 4 7 5 7 5 7 5 2 5 6 ...
## 9 $ religiousPref: Factor w/ 6 levels "PROTESTANT","CATHOLIC",..: 1 1 2 1 2 1 1 1 1 2 ...
##
## -------------------------------------------------------------------------
## 1 - vote (factor)
##
## length n NAs levels unique dupes
## 662 662 0 7 7 y
##
## level freq perc cumfreq cumperc
## 1 DID NOT VOTE 238 .360 238 .360
## 2 VOTED - FOR TRUMAN 212 .320 450 .680
## 3 VOTED - FOR DEWEY 178 .269 628 .949
## 4 VOTED - NA FOR WHOM 20 .030 648 .979
## 5 VOTED - FOR OTHER 11 .017 659 .995
## 6 NA WHETHER VOTED 2 .003 661 .998
## 7 VOTED - FOR WALLACE 1 .002 662 1.000
## -------------------------------------------------------------------------
## 2 - occupation (factor)
##
## length n NAs levels unique dupes
## 662 662 0 12 12 y
##
## level freq perc cumfreq cumperc
## 1 SKILLED AND SEMI-SKILLED 164 .248 164 .248
## 2 FARM OPERATORS AND MANAGERS 105 .159 269 .406
## 3 UNSKILLED, INCLUDING FARM AND SERVICE W 85 .128 354 .535
## 4 OTHER WHITE-COLLAR (CLERICAL, SALES, ET 79 .119 433 .654
## 5 SELF-EMPLOYED, MANAGERIAL, SUPERVISORY 73 .110 506 .764
## 6 PROFESSIONAL, SEMI-PROFESSIONAL 44 .066 550 .831
## 7 RETIRED, TOO OLD OR UNABLE TO WORK 38 .057 588 .888
## 8 HOUSEWIFE 28 .042 616 .931
## 9 NA 28 .042 644 .973
## 10 STUDENT 7 .011 651 .983
## 11 PROTECTIVE SERVICE 6 .009 657 .992
## 12 UNEMPLOYED 5 .008 662 1.000
## -------------------------------------------------------------------------
## 3 - unionized (factor)
##
## length n NAs levels unique dupes
## 662 662 0 4 4 y
##
## level freq perc cumfreq cumperc
## 1 NO 493 .745 493 .745
## 2 YES 150 .227 643 .971
## 3 NA 14 .021 657 .992
## 4 DK 5 .008 662 1.000
## -------------------------------------------------------------------------
## 4 - gender (factor)
##
## length n NAs levels unique dupes
## 662 662 0 3 3 y
##
## level freq perc cumfreq cumperc
## 1 FEMALE 357 .539 357 .539
## 2 MALE 302 .456 659 .995
## 3 NA 3 .005 662 1.000
## -------------------------------------------------------------------------
## 5 - race (factor)
##
## length n NAs levels unique dupes
## 662 662 0 4 3 y
##
## level freq perc cumfreq cumperc
## 1 WHITE 585 .884 585 .884
## 2 NEGRO 60 .091 645 .974
## 3 NA 17 .026 662 1.000
## 4 OTHER 0 .000 662 1.000
## -------------------------------------------------------------------------
## 6 - age (factor)
##
## length n NAs levels unique dupes
## 662 662 0 7 7 y
##
## level freq perc cumfreq cumperc
## 1 35-44 174 .263 174 .263
## 2 25-34 142 .215 316 .477
## 3 45-54 125 .189 441 .666
## 4 55-64 86 .130 527 .796
## 5 65 AND OVER 70 .106 597 .902
## 6 18-24 57 .086 654 .988
## 7 NA 8 .012 662 1.000
## -------------------------------------------------------------------------
## 7 - education (factor)
##
## length n NAs levels unique dupes
## 662 662 0 4 4 y
##
## level freq perc cumfreq cumperc
## 1 GRADE SCHOOL 292 .441 292 .441
## 2 HIGH SCHOOL 266 .402 558 .843
## 3 COLLEGE 100 .151 658 .994
## 4 NA 4 .006 662 1.000
## -------------------------------------------------------------------------
## 8 - totalncome (factor)
##
## length n NAs levels unique dupes
## 662 662 0 8 8 y
##
## level freq perc cumfreq cumperc
## 1 $2000-2999 185 .279 185 .279
## 2 $3000-3999 142 .215 327 .494
## 3 $1000-1999 110 .166 437 .660
## 4 $5000 AND OVER 84 .127 521 .787
## 5 $4000-4999 66 .100 587 .887
## 6 $500-$999 43 .065 630 .952
## 7 UNDER $500 25 .038 655 .989
## 8 NA 7 .011 662 1.000
## -------------------------------------------------------------------------
## 9 - religiousPref (factor)
##
## length n NAs levels unique dupes
## 662 662 0 6 6 y
##
## level freq perc cumfreq cumperc
## 1 PROTESTANT 460 .695 460 .695
## 2 CATHOLIC 140 .211 600 .906
## 3 JEWISH 25 .038 625 .944
## 4 NONE 18 .027 643 .971
## 5 OTHER 14 .021 657 .992
## 6 NA 5 .008 662 1.000
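The frequency tables that Desc() prints can be reproduced in base R. A minimal sketch, rebuilding the gender variable from the counts shown above; note that "NA" is an ordinary factor level (a value label from the SPSS file), not an R missing value, which is why Desc() reports 0 NAs for every variable.

```r
# Rebuild gender from the counts in the Desc() output above
gender <- factor(rep(c("FEMALE", "MALE", "NA"), times = c(357, 302, 3)))

freq <- sort(table(gender), decreasing = TRUE)   # most frequent level first, as in Desc()
perc <- prop.table(freq)

gender_tab <- data.frame(level   = names(freq),
                         freq    = as.vector(freq),
                         perc    = round(as.vector(perc), 3),
                         cumfreq = cumsum(as.vector(freq)),
                         cumperc = round(cumsum(as.vector(perc)), 3),
                         stringsAsFactors = FALSE)
print(gender_tab)

# "NA" is a level label, not a missing value
print(sum(is.na(gender)))   # 0
```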
Next, using the “car” package, I recoded the following variables into the same broader categories that Elff uses in his paper:
library(car)
# car::recode() is called explicitly because dplyr, loaded earlier, also has a recode() function
vote48$vote<-car::recode(vote48$vote,
  "'VOTED - FOR TRUMAN'='Truman'; 'VOTED - FOR DEWEY'='Dewey';
   c('VOTED - FOR WALLACE','VOTED - FOR OTHER')='Other'")
vote48$occupation<-car::recode(vote48$occupation,
  "c('PROFESSIONAL, SEMI-PROFESSIONAL','SELF-EMPLOYED, MANAGERIAL, SUPERVISORY')='Upper white collar';
   'OTHER WHITE-COLLAR (CLERICAL, SALES, ET'='Other white collar';
   c('SKILLED AND SEMI-SKILLED','PROTECTIVE SERVICE','UNSKILLED, INCLUDING FARM AND SERVICE W')='Blue collar';
   'FARM OPERATORS AND MANAGERS'='Farmer'")
# Read religiousPref explicitly and store the collapsed version in a new column, relig,
# which the tables and the model below use
vote48$relig<-car::recode(vote48$religiousPref,
  "'PROTESTANT'='Protestant'; 'CATHOLIC'='Catholic';
   c('JEWISH','OTHER','NONE')='Other,none'")
vote48$race<-car::recode(vote48$race, "'WHITE'='White'; 'NEGRO'='Black'")
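For comparison, the same kind of collapsing can be done in base R by assigning a named list to levels(): each list element names a new level and lists the old levels it absorbs. A minimal sketch on a hypothetical five-respondent vector (`vote_raw`), not the real data. Unlike car::recode(), which leaves unlisted values unchanged, this turns old levels that are not listed into NA, and the new levels follow the order of the list, which in turn fixes the reference category in later models.

```r
# Hypothetical raw responses using the original SPSS labels
vote_raw <- factor(c("VOTED - FOR TRUMAN", "VOTED - FOR DEWEY", "VOTED - FOR WALLACE",
                     "VOTED - FOR OTHER", "DID NOT VOTE"))

vote_new <- vote_raw
# Each list element is a new level; its value is the set of old levels merged into it
levels(vote_new) <- list(Truman = "VOTED - FOR TRUMAN",
                         Dewey  = "VOTED - FOR DEWEY",
                         Other  = c("VOTED - FOR WALLACE", "VOTED - FOR OTHER"))

print(table(vote_raw, vote_new, useNA = "ifany"))   # old-by-new check of the mapping
print(levels(vote_new))                             # "Truman" "Dewey" "Other"
```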
library(gmodels)
library(pander)
vote3<-filter(vote48, vote== "Truman"| vote == "Dewey" | vote == "Other")
vote3<-filter(vote3, race== "Black" | race == "White")
vote3<-filter(vote3, occupation == "Blue collar" | occupation == "Other white collar" | occupation == "Upper white collar" | occupation == "Farmer")
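The chained `|` conditions can be written more compactly with `%in%`, and the same `%in%` conditions can be passed to filter(). One thing to keep in mind: filtering does not remove factor levels, so empty levels such as "DID NOT VOTE" stay attached to the factor until droplevels() is applied. A minimal sketch on a hypothetical data frame (`toy`), with the levels set explicitly so that the result does not depend on the locale's sort order:

```r
# Hypothetical recoded data; levels are given explicitly
toy <- data.frame(
  vote = factor(c("DID NOT VOTE", "Dewey", "Truman", "Other", "Truman"),
                levels = c("DID NOT VOTE", "Dewey", "Other", "Truman")),
  race = factor(c("White", "Black", "White", "White", "NA"),
                levels = c("White", "Black", "NA"))
)

# Keep rows whose vote and race fall in the listed categories
kept <- subset(toy, vote %in% c("Truman", "Dewey", "Other") &
                    race %in% c("Black", "White"))
print(levels(kept$vote))   # "DID NOT VOTE" is still a level, with zero rows

kept <- droplevels(kept)   # drop the empty levels from every factor column
print(levels(kept$vote))   # "Dewey" "Other" "Truman"
```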
CrossTable(vote3$vote, vote3$race)
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 336
##
##
## | vote3$race
## vote3$vote | Black | White | Row Total |
## -------------|-----------|-----------|-----------|
## Dewey | 6 | 141 | 147 |
## | 0.143 | 0.007 | |
## | 0.041 | 0.959 | 0.438 |
## | 0.375 | 0.441 | |
## | 0.018 | 0.420 | |
## -------------|-----------|-----------|-----------|
## Other | 0 | 9 | 9 |
## | 0.429 | 0.021 | |
## | 0.000 | 1.000 | 0.027 |
## | 0.000 | 0.028 | |
## | 0.000 | 0.027 | |
## -------------|-----------|-----------|-----------|
## Truman | 10 | 170 | 180 |
## | 0.238 | 0.012 | |
## | 0.056 | 0.944 | 0.536 |
## | 0.625 | 0.531 | |
## | 0.030 | 0.506 | |
## -------------|-----------|-----------|-----------|
## Column Total | 16 | 320 | 336 |
## | 0.048 | 0.952 | |
## -------------|-----------|-----------|-----------|
##
##
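The second line of each cell is that cell's contribution to Pearson's chi-square statistic, (observed - expected)^2 / expected, where the expected count is row total * column total / N. The sketch below recomputes these contributions from the counts in the vote-by-race table above. It also shows that, with only 16 Black respondents in the subsample, the expected count in the Other/Black cell is below 1, so a chi-square test on this table should be read with caution.

```r
# Observed counts from the vote-by-race table above
obs <- matrix(c(  6, 141,
                  0,   9,
                 10, 170),
              nrow = 3, byrow = TRUE,
              dimnames = list(vote = c("Dewey", "Other", "Truman"),
                              race = c("Black", "White")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Each cell's contribution to Pearson's chi-square statistic
contrib <- (obs - expected)^2 / expected
print(round(contrib, 3))      # matches the second line of each CrossTable() cell
print(round(expected, 2))     # the Other/Black cell expects fewer than one voter
```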
CrossTable(vote3$vote, vote3$occupation)
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 336
##
##
## | vote3$occupation
## vote3$vote | Blue collar | Farmer | Other white collar | Upper white collar | Row Total |
## -------------|--------------------|--------------------|--------------------|--------------------|--------------------|
## Dewey | 36 | 14 | 31 | 66 | 147 |
## | 13.374 | 1.231 | 1.247 | 22.324 | |
## | 0.245 | 0.095 | 0.211 | 0.449 | 0.438 |
## | 0.240 | 0.326 | 0.534 | 0.776 | |
## | 0.107 | 0.042 | 0.092 | 0.196 | |
## -------------|--------------------|--------------------|--------------------|--------------------|--------------------|
## Other | 4 | 3 | 0 | 2 | 9 |
## | 0.000 | 2.966 | 1.554 | 0.034 | |
## | 0.444 | 0.333 | 0.000 | 0.222 | 0.027 |
## | 0.027 | 0.070 | 0.000 | 0.024 | |
## | 0.012 | 0.009 | 0.000 | 0.006 | |
## -------------|--------------------|--------------------|--------------------|--------------------|--------------------|
## Truman | 110 | 26 | 27 | 17 | 180 |
## | 10.935 | 0.381 | 0.533 | 17.882 | |
## | 0.611 | 0.144 | 0.150 | 0.094 | 0.536 |
## | 0.733 | 0.605 | 0.466 | 0.200 | |
## | 0.327 | 0.077 | 0.080 | 0.051 | |
## -------------|--------------------|--------------------|--------------------|--------------------|--------------------|
## Column Total | 150 | 43 | 58 | 85 | 336 |
## | 0.446 | 0.128 | 0.173 | 0.253 | |
## -------------|--------------------|--------------------|--------------------|--------------------|--------------------|
##
##
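The column proportions (the fourth line of each cell) carry the substantive result of this table: Truman won about 73% of the blue-collar voters in this subsample but only 20% of the upper white-collar voters. A minimal sketch that recomputes those shares and the overall Pearson statistic from the counts above; chisq.test() warns here because the expected counts in the Other row are small, so the warning is suppressed only for display.

```r
# Observed counts from the vote-by-occupation table above
obs <- matrix(c( 36, 14, 31, 66,
                  4,  3,  0,  2,
                110, 26, 27, 17),
              nrow = 3, byrow = TRUE,
              dimnames = list(vote = c("Dewey", "Other", "Truman"),
                              occupation = c("Blue collar", "Farmer",
                                             "Other white collar", "Upper white collar")))

# Share of each occupation group voting for each candidate (columns sum to 1)
col_prop <- prop.table(obs, margin = 2)
print(round(col_prop["Truman", ], 3))

# Overall Pearson chi-square test; small expected counts in the Other row trigger a warning
test <- suppressWarnings(chisq.test(obs))
print(test$statistic)   # sum of the per-cell contributions shown above
print(test$parameter)   # degrees of freedom: (3 - 1) * (4 - 1) = 6
```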
CrossTable(vote3$vote, vote3$relig)
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 336
##
##
## | vote3$relig
## vote3$vote | Catholic | Other,none | Protestant | Row Total |
## -------------|------------|------------|------------|------------|
## Dewey | 30 | 10 | 107 | 147 |
## | 3.217 | 1.364 | 2.813 | |
## | 0.204 | 0.068 | 0.728 | 0.438 |
## | 0.316 | 0.303 | 0.514 | |
## | 0.089 | 0.030 | 0.318 | |
## -------------|------------|------------|------------|------------|
## Other | 0 | 1 | 8 | 9 |
## | 2.545 | 0.015 | 1.059 | |
## | 0.000 | 0.111 | 0.889 | 0.027 |
## | 0.000 | 0.030 | 0.038 | |
## | 0.000 | 0.003 | 0.024 | |
## -------------|------------|------------|------------|------------|
## Truman | 65 | 22 | 93 | 180 |
## | 3.910 | 1.056 | 3.048 | |
## | 0.361 | 0.122 | 0.517 | 0.536 |
## | 0.684 | 0.667 | 0.447 | |
## | 0.193 | 0.065 | 0.277 | |
## -------------|------------|------------|------------|------------|
## Column Total | 95 | 33 | 208 | 336 |
## | 0.283 | 0.098 | 0.619 | |
## -------------|------------|------------|------------|------------|
##
##
CrossTable(vote3$vote, vote3$gender)
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 336
##
##
## | vote3$gender
## vote3$vote | MALE | FEMALE | Row Total |
## -------------|-----------|-----------|-----------|
## Dewey | 75 | 72 | 147 |
## | 0.017 | 0.018 | |
## | 0.510 | 0.490 | 0.438 |
## | 0.431 | 0.444 | |
## | 0.223 | 0.214 | |
## -------------|-----------|-----------|-----------|
## Other | 3 | 6 | 9 |
## | 0.592 | 0.636 | |
## | 0.333 | 0.667 | 0.027 |
## | 0.017 | 0.037 | |
## | 0.009 | 0.018 | |
## -------------|-----------|-----------|-----------|
## Truman | 96 | 84 | 180 |
## | 0.083 | 0.089 | |
## | 0.533 | 0.467 | 0.536 |
## | 0.552 | 0.519 | |
## | 0.286 | 0.250 | |
## -------------|-----------|-----------|-----------|
## Column Total | 174 | 162 | 336 |
## | 0.518 | 0.482 | |
## -------------|-----------|-----------|-----------|
##
##
vote3_mod<-glm(vote~occupation+race+relig+gender, family = binomial,data=vote3)
summary(vote3_mod)
##
## Call:
## glm(formula = vote ~ occupation + race + relig + gender, family = binomial,
## data = vote3)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.065 -0.982 0.659 0.859 1.856
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.548 0.644 2.40 0.01619 *
## occupationFarmer -0.159 0.397 -0.40 0.68883
## occupationOther white collar -1.287 0.336 -3.83 0.00013 ***
## occupationUpper white collar -2.333 0.336 -6.94 4e-12 ***
## raceWhite -0.130 0.594 -0.22 0.82647
## religOther,none 0.589 0.495 1.19 0.23415
## religProtestant -0.577 0.300 -1.92 0.05485 .
## genderFEMALE -0.034 0.253 -0.13 0.89314
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 460.53 on 335 degrees of freedom
## Residual deviance: 381.04 on 328 degrees of freedom
## AIC: 397
##
## Number of Fisher Scoring iterations: 4
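Because vote is a factor with three remaining categories, it helps to be explicit about what this binomial model estimates. For a factor response, glm() treats the first level as failure and every other level as success; after unused levels are dropped, the first level of vote here is Dewey, so the model contrasts Dewey with Truman and the minor candidates combined. The sketch below demonstrates the rule on a hypothetical six-voter factor (`y`) and shows how to state the outcome explicitly with a logical variable instead. It illustrates general glm() behaviour, not Elff's own specification.

```r
# Hypothetical six-voter response with the same levels as vote3$vote
y <- factor(c("Dewey", "Dewey", "Truman", "Truman", "Truman", "Other"),
            levels = c("Dewey", "Other", "Truman"))

# Factor response: success = "not the first level" (not Dewey), here 4 of 6
fit_factor <- glm(y ~ 1, family = binomial)

# Logical response: success = voted for Truman, here 3 of 6
truman <- y == "Truman"
fit_truman <- glm(truman ~ 1, family = binomial)

# An intercept-only logit recovers log(p / (1 - p)) of the success share
print(coef(fit_factor))   # log(4/2) = log(2)
print(coef(fit_truman))   # log(3/3) = 0

# Odds ratios are exp() of logit coefficients, e.g. exp(-2.333) for upper white collar
print(exp(-2.333))
```

On the same logic, exp(-2.333) is about 0.097: in this model, holding the other predictors constant, the odds of an upper white-collar respondent voting for someone other than Dewey are roughly a tenth of those of a blue-collar respondent.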
Thus, using the small sample from the 1948 election, I have successfully replicated Martin Elff’s analysis without using the “memisc” package. The inconsistencies in Elff’s paper, such as the lower-case variable names, should be noted as a point for further study when it comes to reproducible research.