Reproducing Research

Introduction

This assignment required replicating research from Martin Elff’s paper “Analysing the American National Election Study of 1948.” The data set includes 662 observations and 65 variables, which is a relatively small sample. The underlying goal of this assignment is to conduct an analysis similar to Elff’s but without using the “memisc” package (with the exception of loading the data initially). With regard to the election itself, it is well known that Truman pushed for civil rights and that the vote from Black citizens helped him win the election of 1948.

Reading the SPSS Data File

The first step in this analysis is to read the data from its original SPSS portable file, as follows:

library(memisc)
options(digits=3)
nes1948.por<-UnZip("anes/NES1948.ZIP","NES1948.POR",package="memisc")

library(foreign)
nes1948<-read.spss(nes1948.por)

Data Set

This analysis uses the “dplyr” package. The class() function reports what kind of object we are dealing with; here it shows that read.spss() returned a list, which is its default. The “dplyr” verbs operate on data frames rather than plain lists, so the list must first be converted into a data frame, as follows:

library(dplyr)
class(nes1948)

## [1] "list"

vote48<-data.frame(nes1948)
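
The conversion can be sketched on a made-up two-element list standing in for the real data (the names “toy” and “toy_df” are illustrative only). The commented line at the end is an alternative: read.spss() has a “to.data.frame” argument that returns a data frame directly. It assumes the same nes1948.por path as above and may differ in small details from the two-step conversion.

```r
# Toy stand-in for the list returned by read.spss(); values are made up.
toy <- list(V1 = factor(c("YES", "NO", "YES")),
            V2 = c(1, 2, 3))

class(toy)                 # a plain list
toy_df <- data.frame(toy)  # each list element becomes a column
class(toy_df)
dim(toy_df)

# Alternative (assumes the nes1948.por path defined earlier):
# vote48 <- foreign::read.spss(nes1948.por, to.data.frame = TRUE)
```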

Selecting Variables

As seen above, the data frame is named “vote48” for clarity. With the data loaded, the next step is to select the variables of interest for this analysis:

library(dplyr)
vote48<-data.frame(nes1948)

vote48<-select(vote48, 
             V480018,  
             V480029,
             V480030,
             V480045,
             V480046,
             V480047,
             V480048,
             V480049,
             V480050 
             )

Re-coding Variables

The variables are renamed using the “gdata” package. Note: if Elff’s work is meant to be reproducible (as all respectable researchers aspire to), the paper should have noted that the “V” at the start of each variable name must be capitalized for the renaming to work. He uses lower case in his paper, which does not work when re-creating his data without the “memisc” package.

library(gdata)
vote48<-rename.vars(vote48,from=c(
  "V480018","V480029","V480030","V480045", "V480046","V480047","V480048","V480049","V480050"), 
  to=c("vote", "occupation", "unionized", "gender", "race", "age", "education", "totalncome", "religiousPref"))

## 
## Changing in vote48                                                                    
## From: V480018 V480029    V480030   V480045 V480046 V480047 V480048  
## To:   vote    occupation unionized gender  race    age     education
##                               
## From: V480049    V480050      
## To:   totalncome religiousPref
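
The capitalization issue can be shown on a small made-up data frame. This is a base R sketch rather than the “gdata” call used above, and the object names (“toy”, “new_names”) are hypothetical. It checks that the expected columns exist before renaming them from a lookup vector.

```r
# Toy data frame with two of the original column names; values are made up.
toy <- data.frame(V480018 = c("A", "B"), V480046 = c("X", "Y"))

# Lookup vector: names are the old column names, values are the new ones.
new_names <- c(V480018 = "vote", V480046 = "race")

# Column names are case-sensitive, so a lower-case "v" finds nothing.
"v480018" %in% names(toy)   # FALSE
"V480018" %in% names(toy)   # TRUE

# Fail early if any expected column is missing, then rename by lookup.
stopifnot(all(names(new_names) %in% names(toy)))
names(toy)[match(names(new_names), names(toy))] <- unname(new_names)
names(toy)
```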

Graphs

Once the variables are renamed, the “Desc()” function from the “DescTools” package summarizes each one as a frequency table with an accompanying plot. “Desc()” is the equivalent of what Elff refers to as a codebook, but it contains more detailed information about each variable. It is worth noting that Elff’s codebook contains metadata that he does not discuss at all. The plots are included to make the data easier to understand. If this were my own analysis I would show only the charts, but I have included the tables to replicate Elff’s work:

library(DescTools)
Desc(vote48, plotit= TRUE)

## 
## -------------------------------------------------------------------------
## 'data.frame':    662 obs. of  9 variables:
##  1 $ vote         : Factor w/ 7 levels "VOTED - FOR TRUMAN",..: 1 2 1 2 1 2 2 1 2 1 ...
##  2 $ occupation   : Factor w/ 12 levels "PROFESSIONAL, SEMI-PROFESSIONAL",..: 6 3 4 1 1 2 7 7 4 4 ...
##  3 $ unionized    : Factor w/ 4 levels "YES","NO","DK",..: 1 2 2 2 2 2 2 2 1 1 ...
##  4 $ gender       : Factor w/ 3 levels "MALE","FEMALE",..: 1 2 2 2 1 2 1 2 1 1 ...
##  5 $ race         : Factor w/ 4 levels "WHITE","NEGRO",..: 1 1 1 1 1 1 1 1 1 1 ...
##  6 $ age          : Factor w/ 7 levels "18-24","25-34",..: 3 3 2 3 2 3 4 5 2 2 ...
##  7 $ education    : Factor w/ 4 levels "GRADE SCHOOL",..: 1 2 2 3 3 2 1 1 2 2 ...
##  8 $ totalncome   : Factor w/ 8 levels "UNDER $500","$500-$999",..: 4 7 5 7 5 7 5 2 5 6 ...
##  9 $ religiousPref: Factor w/ 6 levels "PROTESTANT","CATHOLIC",..: 1 1 2 1 2 1 1 1 1 2 ...
## 
## ------------------------------------------------------------------------- 
## 1 - vote (factor)
## 
##   length      n    NAs levels unique  dupes
##      662    662      0      7      7      y
## 
##                 level freq  perc cumfreq cumperc
## 1        DID NOT VOTE  238  .360     238    .360
## 2  VOTED - FOR TRUMAN  212  .320     450    .680
## 3   VOTED - FOR DEWEY  178  .269     628    .949
## 4 VOTED - NA FOR WHOM   20  .030     648    .979
## 5   VOTED - FOR OTHER   11  .017     659    .995
## 6    NA WHETHER VOTED    2  .003     661    .998
## 7 VOTED - FOR WALLACE    1  .002     662   1.000

## ------------------------------------------------------------------------- 
## 2 - occupation (factor)
## 
##   length      n    NAs levels unique  dupes
##      662    662      0     12     12      y
## 
##                                      level freq  perc cumfreq cumperc
## 1                 SKILLED AND SEMI-SKILLED  164  .248     164    .248
## 2              FARM OPERATORS AND MANAGERS  105  .159     269    .406
## 3  UNSKILLED, INCLUDING FARM AND SERVICE W   85  .128     354    .535
## 4  OTHER WHITE-COLLAR (CLERICAL, SALES, ET   79  .119     433    .654
## 5   SELF-EMPLOYED, MANAGERIAL, SUPERVISORY   73  .110     506    .764
## 6          PROFESSIONAL, SEMI-PROFESSIONAL   44  .066     550    .831
## 7       RETIRED, TOO OLD OR UNABLE TO WORK   38  .057     588    .888
## 8                                HOUSEWIFE   28  .042     616    .931
## 9                                       NA   28  .042     644    .973
## 10                                 STUDENT    7  .011     651    .983
## 11                      PROTECTIVE SERVICE    6  .009     657    .992
## 12                              UNEMPLOYED    5  .008     662   1.000

## ------------------------------------------------------------------------- 
## 3 - unionized (factor)
## 
##   length      n    NAs levels unique  dupes
##      662    662      0      4      4      y
## 
##   level freq  perc cumfreq cumperc
## 1    NO  493  .745     493    .745
## 2   YES  150  .227     643    .971
## 3    NA   14  .021     657    .992
## 4    DK    5  .008     662   1.000

## ------------------------------------------------------------------------- 
## 4 - gender (factor)
## 
##   length      n    NAs levels unique  dupes
##      662    662      0      3      3      y
## 
##    level freq  perc cumfreq cumperc
## 1 FEMALE  357  .539     357    .539
## 2   MALE  302  .456     659    .995
## 3     NA    3  .005     662   1.000

## ------------------------------------------------------------------------- 
## 5 - race (factor)
## 
##   length      n    NAs levels unique  dupes
##      662    662      0      4      3      y
## 
##   level freq  perc cumfreq cumperc
## 1 WHITE  585  .884     585    .884
## 2 NEGRO   60  .091     645    .974
## 3    NA   17  .026     662   1.000
## 4 OTHER    0  .000     662   1.000

## ------------------------------------------------------------------------- 
## 6 - age (factor)
## 
##   length      n    NAs levels unique  dupes
##      662    662      0      7      7      y
## 
##         level freq  perc cumfreq cumperc
## 1       35-44  174  .263     174    .263
## 2       25-34  142  .215     316    .477
## 3       45-54  125  .189     441    .666
## 4       55-64   86  .130     527    .796
## 5 65 AND OVER   70  .106     597    .902
## 6       18-24   57  .086     654    .988
## 7          NA    8  .012     662   1.000

## ------------------------------------------------------------------------- 
## 7 - education (factor)
## 
##   length      n    NAs levels unique  dupes
##      662    662      0      4      4      y
## 
##          level freq  perc cumfreq cumperc
## 1 GRADE SCHOOL  292  .441     292    .441
## 2  HIGH SCHOOL  266  .402     558    .843
## 3      COLLEGE  100  .151     658    .994
## 4           NA    4  .006     662   1.000

## ------------------------------------------------------------------------- 
## 8 - totalncome (factor)
## 
##   length      n    NAs levels unique  dupes
##      662    662      0      8      8      y
## 
##            level freq  perc cumfreq cumperc
## 1     $2000-2999  185  .279     185    .279
## 2     $3000-3999  142  .215     327    .494
## 3     $1000-1999  110  .166     437    .660
## 4 $5000 AND OVER   84  .127     521    .787
## 5     $4000-4999   66  .100     587    .887
## 6      $500-$999   43  .065     630    .952
## 7     UNDER $500   25  .038     655    .989
## 8             NA    7  .011     662   1.000

## ------------------------------------------------------------------------- 
## 9 - religiousPref (factor)
## 
##   length      n    NAs levels unique  dupes
##      662    662      0      6      6      y
## 
##        level freq  perc cumfreq cumperc
## 1 PROTESTANT  460  .695     460    .695
## 2   CATHOLIC  140  .211     600    .906
## 3     JEWISH   25  .038     625    .944
## 4       NONE   18  .027     643    .971
## 5      OTHER   14  .021     657    .992
## 6         NA    5  .008     662   1.000
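
The columns in these tables are simple to recompute by hand: “perc” is each frequency divided by the total, and “cumfreq” and “cumperc” are running sums. The sketch below rebuilds the “vote” table in base R from the frequencies printed above. It illustrates the arithmetic only and is not how “Desc()” is implemented.

```r
# Frequencies copied from the `vote` table above.
freq <- c("DID NOT VOTE" = 238, "VOTED - FOR TRUMAN" = 212,
          "VOTED - FOR DEWEY" = 178, "VOTED - NA FOR WHOM" = 20,
          "VOTED - FOR OTHER" = 11, "NA WHETHER VOTED" = 2,
          "VOTED - FOR WALLACE" = 1)

freq <- sort(freq, decreasing = TRUE)   # most frequent level first
vote_tab <- data.frame(
  level   = names(freq),
  freq    = as.vector(freq),
  perc    = as.vector(freq) / sum(freq),
  cumfreq = cumsum(as.vector(freq))
)
vote_tab$cumperc <- vote_tab$cumfreq / sum(freq)
print(vote_tab, digits = 3)
```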

Further Re-coding

Next, using the “car” package, I re-coded the following variables, following the approach in Elff’s paper:

library(car)

vote48$vote<-recode(vote48$vote, "'VOTED - FOR TRUMAN'='Truman'") 
vote48$vote<-recode(vote48$vote, "'VOTED - FOR DEWEY'='Dewey'")
vote48$vote<-recode(vote48$vote, "'VOTED - FOR WALLACE'= 'Other'")
vote48$vote<-recode(vote48$vote, "'VOTED - FOR OTHER'= 'Other'")

vote48$occupation<-recode(vote48$occupation, "'PROFESSIONAL, SEMI-PROFESSIONAL'= 'Upper white collar'")
vote48$occupation<-recode(vote48$occupation, "'SELF-EMPLOYED, MANAGERIAL, SUPERVISORY'= 'Upper white collar'")
vote48$occupation<-recode(vote48$occupation, "'OTHER WHITE-COLLAR (CLERICAL, SALES, ET'= 'Other white collar'")

vote48$occupation<-recode(vote48$occupation, "'SKILLED AND SEMI-SKILLED'= 'Blue collar'")
vote48$occupation<-recode(vote48$occupation, "'PROTECTIVE SERVICE'= 'Blue collar'")
vote48$occupation<-recode(vote48$occupation, "'UNSKILLED, INCLUDING FARM AND SERVICE W'= 'Blue collar'")
vote48$occupation<-recode(vote48$occupation, "'FARM OPERATORS AND MANAGERS'= 'Farmer'")

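# Read from the full column name; vote48$relig only resolves to religiousPref through partial matching
vote48$relig<-recode(vote48$religiousPref, "'PROTESTANT'= 'Protestant'")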
vote48$relig<-recode(vote48$relig, "'CATHOLIC'= 'Catholic'")
vote48$relig<-recode(vote48$relig, "'JEWISH'= 'Other,none'")
vote48$relig<-recode(vote48$relig, "'OTHER'= 'Other,none'")
vote48$relig<-recode(vote48$relig, "'NONE'= 'Other,none'")

vote48$race<-recode(vote48$race, "'WHITE'= 'White'")
vote48$race<-recode(vote48$race, "'NEGRO'= 'Black'")
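
For comparison, the same kind of collapsing can be sketched in base R by editing the factor levels through a lookup vector. This is an alternative shown on a made-up factor, not the “car” approach used above; “lookup” and “old” are hypothetical names. Labels missing from the lookup are left unchanged, and the resulting level order may differ from what recode() produces.

```r
# Toy factor using four of the original vote labels.
vote <- factor(c("VOTED - FOR TRUMAN", "VOTED - FOR DEWEY",
                 "VOTED - FOR WALLACE", "VOTED - FOR OTHER",
                 "VOTED - FOR TRUMAN"))

# Map old labels to new ones; two old labels share the new label "Other".
lookup <- c("VOTED - FOR TRUMAN"  = "Truman",
            "VOTED - FOR DEWEY"   = "Dewey",
            "VOTED - FOR WALLACE" = "Other",
            "VOTED - FOR OTHER"   = "Other")

old <- levels(vote)
# Assigning duplicate level names merges the corresponding levels.
levels(vote) <- ifelse(old %in% names(lookup), lookup[old], old)
table(vote)
```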

Cross-Tabulations

library(gmodels)
library(pander)
vote3<-filter(vote48, vote== "Truman"| vote == "Dewey" | vote == "Other")
vote3<-filter(vote3, race== "Black" | race == "White")
vote3<-filter(vote3, occupation == "Blue collar" | occupation == "Other white collar" | occupation == "Upper white collar" | occupation == "Farmer")
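
The chained comparisons in these filters can also be written with “%in%”. The base R sketch below uses made-up rows (all object names are hypothetical) and also shows that subsetting leaves unused factor levels attached until droplevels() is called.

```r
# Toy data; the labels mimic the recoded variables above.
# "NA" here is a text label, as in the tables above, not a missing value.
toy <- data.frame(
  vote = factor(c("Truman", "Dewey", "DID NOT VOTE", "Other", "Truman")),
  race = factor(c("White", "Black", "White", "NA", "Black"))
)

keep <- toy$vote %in% c("Truman", "Dewey", "Other") &
        toy$race %in% c("Black", "White")
toy3 <- toy[keep, ]

nlevels(toy3$vote)        # unused levels are still attached
toy3 <- droplevels(toy3)  # drop levels with no remaining rows
levels(toy3$vote)
```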


CrossTable(vote3$vote, vote3$race)

## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  336 
## 
##  
##              | vote3$race 
##   vote3$vote |     Black |     White | Row Total | 
## -------------|-----------|-----------|-----------|
##        Dewey |         6 |       141 |       147 | 
##              |     0.143 |     0.007 |           | 
##              |     0.041 |     0.959 |     0.438 | 
##              |     0.375 |     0.441 |           | 
##              |     0.018 |     0.420 |           | 
## -------------|-----------|-----------|-----------|
##        Other |         0 |         9 |         9 | 
##              |     0.429 |     0.021 |           | 
##              |     0.000 |     1.000 |     0.027 | 
##              |     0.000 |     0.028 |           | 
##              |     0.000 |     0.027 |           | 
## -------------|-----------|-----------|-----------|
##       Truman |        10 |       170 |       180 | 
##              |     0.238 |     0.012 |           | 
##              |     0.056 |     0.944 |     0.536 | 
##              |     0.625 |     0.531 |           | 
##              |     0.030 |     0.506 |           | 
## -------------|-----------|-----------|-----------|
## Column Total |        16 |       320 |       336 | 
##              |     0.048 |     0.952 |           | 
## -------------|-----------|-----------|-----------|
## 
##
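
Each cell in this output stacks five numbers. As a check on how to read them, the sketch below recomputes them in base R from the counts printed above. The chi-square contribution of a cell is (observed - expected)^2 / expected, where expected = row total * column total / table total. The object names are illustrative.

```r
# Counts copied from the vote-by-race table above.
tab <- matrix(c(6, 141,
                0,   9,
                10, 170),
              nrow = 3, byrow = TRUE,
              dimnames = list(vote = c("Dewey", "Other", "Truman"),
                              race = c("Black", "White")))

# Expected counts under independence of vote and race.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
chisq_contrib <- (tab - expected)^2 / expected

round(chisq_contrib, 3)        # Chi-square contribution
round(prop.table(tab, 1), 3)   # N / Row Total
round(prop.table(tab, 2), 3)   # N / Col Total
round(tab / sum(tab), 3)       # N / Table Total
```

Several expected counts in the “Black” column are small (below 1 for “Other”), so a chi-square test on this table should be read with caution.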

CrossTable(vote3$vote, vote3$occupation)

## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  336 
## 
##  
##              | vote3$occupation 
##   vote3$vote |        Blue collar |             Farmer | Other white collar | Upper white collar |          Row Total | 
## -------------|--------------------|--------------------|--------------------|--------------------|--------------------|
##        Dewey |                 36 |                 14 |                 31 |                 66 |                147 | 
##              |             13.374 |              1.231 |              1.247 |             22.324 |                    | 
##              |              0.245 |              0.095 |              0.211 |              0.449 |              0.438 | 
##              |              0.240 |              0.326 |              0.534 |              0.776 |                    | 
##              |              0.107 |              0.042 |              0.092 |              0.196 |                    | 
## -------------|--------------------|--------------------|--------------------|--------------------|--------------------|
##        Other |                  4 |                  3 |                  0 |                  2 |                  9 | 
##              |              0.000 |              2.966 |              1.554 |              0.034 |                    | 
##              |              0.444 |              0.333 |              0.000 |              0.222 |              0.027 | 
##              |              0.027 |              0.070 |              0.000 |              0.024 |                    | 
##              |              0.012 |              0.009 |              0.000 |              0.006 |                    | 
## -------------|--------------------|--------------------|--------------------|--------------------|--------------------|
##       Truman |                110 |                 26 |                 27 |                 17 |                180 | 
##              |             10.935 |              0.381 |              0.533 |             17.882 |                    | 
##              |              0.611 |              0.144 |              0.150 |              0.094 |              0.536 | 
##              |              0.733 |              0.605 |              0.466 |              0.200 |                    | 
##              |              0.327 |              0.077 |              0.080 |              0.051 |                    | 
## -------------|--------------------|--------------------|--------------------|--------------------|--------------------|
## Column Total |                150 |                 43 |                 58 |                 85 |                336 | 
##              |              0.446 |              0.128 |              0.173 |              0.253 |                    | 
## -------------|--------------------|--------------------|--------------------|--------------------|--------------------|
## 
##

CrossTable(vote3$vote, vote3$relig)

## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  336 
## 
##  
##              | vote3$relig 
##   vote3$vote |   Catholic | Other,none | Protestant |  Row Total | 
## -------------|------------|------------|------------|------------|
##        Dewey |         30 |         10 |        107 |        147 | 
##              |      3.217 |      1.364 |      2.813 |            | 
##              |      0.204 |      0.068 |      0.728 |      0.438 | 
##              |      0.316 |      0.303 |      0.514 |            | 
##              |      0.089 |      0.030 |      0.318 |            | 
## -------------|------------|------------|------------|------------|
##        Other |          0 |          1 |          8 |          9 | 
##              |      2.545 |      0.015 |      1.059 |            | 
##              |      0.000 |      0.111 |      0.889 |      0.027 | 
##              |      0.000 |      0.030 |      0.038 |            | 
##              |      0.000 |      0.003 |      0.024 |            | 
## -------------|------------|------------|------------|------------|
##       Truman |         65 |         22 |         93 |        180 | 
##              |      3.910 |      1.056 |      3.048 |            | 
##              |      0.361 |      0.122 |      0.517 |      0.536 | 
##              |      0.684 |      0.667 |      0.447 |            | 
##              |      0.193 |      0.065 |      0.277 |            | 
## -------------|------------|------------|------------|------------|
## Column Total |         95 |         33 |        208 |        336 | 
##              |      0.283 |      0.098 |      0.619 |            | 
## -------------|------------|------------|------------|------------|
## 
##

CrossTable(vote3$vote, vote3$gender)

## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  336 
## 
##  
##              | vote3$gender 
##   vote3$vote |      MALE |    FEMALE | Row Total | 
## -------------|-----------|-----------|-----------|
##        Dewey |        75 |        72 |       147 | 
##              |     0.017 |     0.018 |           | 
##              |     0.510 |     0.490 |     0.438 | 
##              |     0.431 |     0.444 |           | 
##              |     0.223 |     0.214 |           | 
## -------------|-----------|-----------|-----------|
##        Other |         3 |         6 |         9 | 
##              |     0.592 |     0.636 |           | 
##              |     0.333 |     0.667 |     0.027 | 
##              |     0.017 |     0.037 |           | 
##              |     0.009 |     0.018 |           | 
## -------------|-----------|-----------|-----------|
##       Truman |        96 |        84 |       180 | 
##              |     0.083 |     0.089 |           | 
##              |     0.533 |     0.467 |     0.536 | 
##              |     0.552 |     0.519 |           | 
##              |     0.286 |     0.250 |           | 
## -------------|-----------|-----------|-----------|
## Column Total |       174 |       162 |       336 | 
##              |     0.518 |     0.482 |           | 
## -------------|-----------|-----------|-----------|
## 
##

vote3_mod<-glm(vote~occupation+race+relig+gender, family = binomial,data=vote3)

summary(vote3_mod)

## 
## Call:
## glm(formula = vote ~ occupation + race + relig + gender, family = binomial, 
##     data = vote3)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.065  -0.982   0.659   0.859   1.856  
## 
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                     1.548      0.644    2.40  0.01619 *  
## occupationFarmer               -0.159      0.397   -0.40  0.68883    
## occupationOther white collar   -1.287      0.336   -3.83  0.00013 ***
## occupationUpper white collar   -2.333      0.336   -6.94    4e-12 ***
## raceWhite                      -0.130      0.594   -0.22  0.82647    
## religOther,none                 0.589      0.495    1.19  0.23415    
## religProtestant                -0.577      0.300   -1.92  0.05485 .  
## genderFEMALE                   -0.034      0.253   -0.13  0.89314    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 460.53  on 335  degrees of freedom
## Residual deviance: 381.04  on 328  degrees of freedom
## AIC: 397
## 
## Number of Fisher Scoring iterations: 4
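
When reading these coefficients, it helps to know how glm() codes a factor response under the binomial family: the first level is treated as “failure” and every other level as “success”. The sketch below fits an intercept-only model to the vote counts from the cross-tabulations above, assuming “Dewey” is the first level. Its deviance matches the null deviance of 460.53 reported above, which suggests that the fitted model contrasts Dewey against Truman and “Other” combined, so negative coefficients point toward Dewey.

```r
# Vote counts copied from the cross-tabulations above.
vote <- factor(rep(c("Dewey", "Other", "Truman"), times = c(147, 9, 180)),
               levels = c("Dewey", "Other", "Truman"))

# With family = binomial and a factor response, the first level is the
# "failure" and every other level counts as a "success".
fit0 <- glm(vote ~ 1, family = binomial)

p_not_dewey <- mean(vote != "Dewey")    # 189 / 336
coef(fit0)                              # log-odds of "not Dewey"
log(p_not_dewey / (1 - p_not_dewey))    # same quantity computed by hand
round(deviance(fit0), 2)                # compare with the null deviance above
```

To model Truman against Dewey only, one option would be to drop the “Other” voters before fitting and set the level order explicitly, for example with factor(vote, levels = c("Dewey", "Truman")).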

Conclusion

Thus, using the small sample from the 1948 election study, I have successfully replicated Martin Elff’s analysis without using the “memisc” package beyond the initial data loading. The inconsistencies in Elff’s paper, such as the lower-case variable names, should be noted as a point for further study when it comes to reproducible research.

References

http://cran.r-project.org/web/packages/memisc/vignettes/anes48.pdf