This is an unmarked, optional tutorial that shows the kind of thinking that goes into an exploratory data analysis.

The goal of this tutorial document is to walk through some of the common issues encountered in the early stages of an exploratory analysis on a set of data. It gives examples of common problem areas in:

This data is a modified version of data from the New Zealand Election Survey, deliberately modified to introduce problems that occur naturally in many data sets.

Step One. Learn something about the data set.

In this case, the New Zealand Election Survey takes place every three years as a postal survey of a sample of registered electors. Some sampled electors were part of a sample panel of people surveyed at the previous election as part of a longitudinal study, others were randomly chosen from the electoral roll. Those electors that were part of the longitudinal panel group were randomly selected in previous elections.

As well as survey results, the data set includes information from the electoral roll, and weighting values for adjusting results. The full NZES data set has been reduced to a selected group of variables, making 3101 observations of 107 variables.

Step Two. Contemplate some questions.

Examine the codebook (or, in this case, the appendix at the end of the document) to check out the variables.

For example, since New Zealand has a Mixed Member Proportional voting system, where people get to vote for both an electorate (local) representative and a nationwide political party, we might decide that it would be interesting to look at strategic voting under conditions where there are many political parties to choose from. We identify some relevant variables of interest in the data, and investigate the nature of the individual variables before we explore their interactions. The kind of variables they are is going to shape our question.

Read in the data

There are many different kinds of data files in the world. Each one has its own issues when being read in by R. In this case the data is saved as a .RData file, which can be read in by using the load() command.

As with most commands for reading in files, a piece of text needs to go inside the parentheses, in quotes, giving the path to the file from the working directory (the working directory is the folder that R is currently paying attention to). The easiest way to get an R Markdown (Rmd) document and the Console cooperating about this is to place the data file in the same folder as the R Markdown (Rmd) document, open the R Markdown (Rmd) document in RStudio so we are looking at its contents in the editing window, then in the RStudio Session menu use the Set Working Directory - To Source File Location command to make a common starting point. The code in the R Markdown (Rmd) document will then use the same working space regardless of whether we knit the document or run code chunks in the Console. In this case, if the selected_nzes2011.Rdata file is in the same folder as the R Markdown (Rmd) document, it can be read into R with the following command:

load("selected_nzes2011.Rdata")
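Before calling load(), it can help to confirm that R is looking in the folder we expect. The following is a minimal sketch and not part of the original analysis: the file name demo_nzes.Rdata and the object demo_data are made up for illustration, and the file is written to a temporary folder so the example runs anywhere.

```r
# Hypothetical stand-in for the real data file, written to a temporary folder
demo_path <- file.path(tempdir(), "demo_nzes.Rdata")
demo_data <- data.frame(jage = c(37L, 28L, NA),
                        jpartyvote = c("Labour", "Green", NA))
save(demo_data, file = demo_path)
rm(demo_data)                  # remove it so we can see load() bring it back

getwd()                        # the folder R is currently paying attention to
file_found <- file.exists(demo_path)
file_found                     # TRUE if the path points at a real file

# load() restores the saved objects and invisibly returns their names
loaded_names <- load(demo_path)
loaded_names
```

If file.exists() reports FALSE for the real data file, the working directory is the first thing to check.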

We also want to load the packages whose functions we want to use. For this particular analysis we will only need the dplyr package, but for your project you will likely need other packages as well, e.g. ggplot2.

library(dplyr)

Step Three. Prepare for the first question

As a first question, we might be interested in exploring the relationship between the party a person voted for, the party that was their favourite, and whether they believed that their vote makes a difference. In particular: are people who believe their vote makes a difference more likely to vote strategically for a party that is not their favourite? To answer this, we familiarise ourselves with the variables jpartyvote, jdiffvoting, and _singlefav. First we check the codebook (see Appendix), then we explore the data.

Viewing the entire dataset in the Data Viewer window by clicking on the data frame’s name in the Environment or running the View() command in the Console can be ineffective since the Data Viewer only shows the first 100 columns of the data frame.

Using the str() command on the entire dataset can be equally ineffective. However, we can subset the columns of interest and take a closer look at them. We can use a dplyr chain to select the variables of interest and investigate only their structure by adding str() at the end of the chain:

selected_nzes2011 %>% 
  select(jpartyvote, jdiffvoting, _singlefav) %>% 
  str()

If we try to run that line, we will get an error message about unexpected input or missing object.

We next need to diagnose where the problem lies – in the R code or in the data? The best way to troubleshoot this issue is to run each line of the dplyr chain one by one.

selected_nzes2011 

The first line runs without any errors, but the second line gives an error:

selected_nzes2011 %>% 
  select(jpartyvote, jdiffvoting, _singlefav)

We know that select() is a valid dplyr function, so that cannot be the problem. This means the problem might be the variable names. The issue is that R has rules about what variable names are legal (e.g. no spaces, starting with a letter) and when data is loaded, R will often fix variable names to make them legal. This happened to the _singlefav at the time of loading the data.
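The renaming rule can be seen in isolation with make.names(), which data.frame() and read.csv() apply by default through their check.names = TRUE argument. This is a small sketch on made-up names; we are not claiming this is exactly how the NZES file was prepared.

```r
# make.names() turns arbitrary strings into syntactically valid R names
original_names <- c("jpartyvote", "_singlefav", "my var")
fixed_names <- make.names(original_names)
fixed_names

# A data frame built with check.names = TRUE (the default) gets the same treatment
demo <- data.frame("_singlefav" = c("Labour", "Green"), check.names = TRUE)
demo_names <- names(demo)
demo_names
```

A name that starts with an underscore gets an X put in front of it, and a space is replaced by a dot.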

We could check this by looking through every single variable name in the data with the names() command.

names(selected_nzes2011)
##   [1] "Jelect"         "jblogel"        "jnewspaper"     "jnatradio"     
##   [5] "jtalkback"      "jdiscussp"      "jrallies"       "jpersuade"     
##   [9] "jpcmoney"       "jpcposter"      "jlablike"       "jnatlike"      
##  [13] "jgrnlike"       "jnzflike"       "jactlike"       "junflike"      
##  [17] "jmaolike"       "jmnplike"       "jmostlike"      "jmostlikex"    
##  [21] "jrepublic"      "jsphealth"      "jspedu"         "jspunemp"      
##  [25] "jspdefence"     "jspsuper"       "jspbusind"      "jsppolice"     
##  [29] "jspwelfare"     "jspenviro"      "jgovpdk"        "jgovplab"      
##  [33] "jgovpnat"       "jgovpgrn"       "jgovpnzf"       "jgovpact"      
##  [37] "jgovunf"        "jgovpmao"       "jgovpmnp"       "jnevervoteno"  
##  [41] "jnevervotelab"  "jnevervotenat"  "jnevervotegrn"  "jnevervotenzf" 
##  [45] "jnevervoteact"  "jnevervoteunf"  "jnevervotemao"  "jnevervotemnp" 
##  [49] "jnevervoteoth"  "jnevervoteothx" "jfirstpx"       "jsecondp"      
##  [53] "jage"           "jlanguage"      "jlanguagex"     "jrollsex"      
##  [57] "jhqual"         "jwkft"          "jwkpt"          "jwkun"         
##  [61] "jwkret"         "jwkdis"         "jwksch"         "jwkunpo"       
##  [65] "jwkunpi"        "jhhincome"      "jhhadults"      "jhhchn"        
##  [69] "jmarital"       "r_jind"         "jlablr"         "jnatlr"        
##  [73] "jgrnlr"         "jnzflr"         "jactlr"         "junflr"        
##  [77] "jmaolr"         "jmnplr"         "jslflr"         "jrelservices"  
##  [81] "jrelnone"       "jrelang"        "jrelpres"       "jrelcath"      
##  [85] "jrelmeth"       "jrelbap"        "jrellat"        "jrelrat"       
##  [89] "jrelfun"        "jrelothc"       "jrelnonc"       "jreligionx"    
##  [93] "jreligiousity"  "jethnicity_e"   "jethnicity_m"   "jethnicity_p"  
##  [97] "jethnicity_a"   "jethnicity_o"   "jethnicityx"    "jethnicmost"   
## [101] "jethnicmostx"   "jpartyvote"     "jelecvote"      "njptyvote"     
## [105] "njelecvote"     "jdiffvoting"    "X_singlefav"

However, when we have hundreds of column names, a useful tip is to search for only the likely candidates. We can search the names for a fragment of the name by using the grep("FRAGMENT", variable, value = TRUE) command, which in this case might be:

grep("singlefav", names(selected_nzes2011), value = TRUE)
## [1] "X_singlefav"

The value = TRUE argument, as described in the help for the grep() function, reports the matching character string, as opposed to the index number for that string.
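The difference between the two forms of output is easiest to see side by side. This sketch uses a short, made-up vector of names rather than the full data set:

```r
demo_names <- c("jpartyvote", "jdiffvoting", "X_singlefav", "jelecvote")

# Default: the positions of the names that contain the fragment
match_positions <- grep("vote", demo_names)
match_positions

# value = TRUE: the matching names themselves
match_values <- grep("vote", demo_names, value = TRUE)
match_values

# grepl() gives a TRUE/FALSE answer for every name
match_flags <- grepl("vote", demo_names)
match_flags
```

Note that "jdiffvoting" is not matched, because it contains "voti" rather than "vote"; a shorter fragment such as "vot" would catch it.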

We can now confirm that the variable is called X_singlefav, so that is how we should be referring to it.

selected_nzes2011 %>% 
  select(jpartyvote, jdiffvoting, X_singlefav) %>% 
  str()

These are all categorical data; however, they are recorded as characters (text strings) rather than factors.

An easy way of tabulating these data to see how many times each level occurs is to use the group_by() function along with the summarise() command:

selected_nzes2011 %>% 
  group_by(jpartyvote) %>% 
  summarise(count = n())
## # A tibble: 14 × 2
##       jpartyvote count
##            <chr> <int>
## 1            Act    29
## 2            ALC    10
## 3       Alliance     2
## 4  Another party     8
## 5   Conservative    74
## 6     Don't know    23
## 7          Green   348
## 8         Labour   749
## 9           Mana    62
## 10   Maori Party   128
## 11      National  1130
## 12      NZ First   216
## 13 United Future    14
## 14          <NA>   308

We can see that 23 people answered "Don't know". Since our question is about people who knew which party they voted for, we might want to exclude these observations from our analysis. We can do so by filtering them out.

selected_nzes2011 %>% 
  filter(jpartyvote != "Don't know") %>%
  group_by(jpartyvote) %>% 
  summarise(count = n())
## # A tibble: 12 × 2
##       jpartyvote count
##            <chr> <int>
## 1            Act    29
## 2            ALC    10
## 3       Alliance     2
## 4  Another party     8
## 5   Conservative    74
## 6          Green   348
## 7         Labour   749
## 8           Mana    62
## 9    Maori Party   128
## 10      National  1130
## 11      NZ First   216
## 12 United Future    14

Because there is a %>% at the end of the line, R knows to continue on to the next line, as with any other ‘to be continued’ symbol at the end of the line.

Note that adding the filter also got rid of the NA entries. NA (Not Available) is used to indicate blank entries: those observations for which there is no data recorded. It is always a good plan to be aware of NAs and deliberately include them in or exclude them from the analysis so that the final results are not surprising. In this case, since NA indicates that these people did not answer the question about which party they voted for, excluding them from the analysis makes sense.
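The reason the NAs disappear is that comparing NA with anything gives NA rather than TRUE or FALSE, and filter() keeps only the rows where the condition is TRUE. The sketch below shows the same logic in base R on a made-up vector, along with one way to keep the NAs on purpose if that were what we wanted:

```r
votes <- c("Labour", "Don't know", NA, "Green")

# The comparison is neither TRUE nor FALSE for the missing entry
keep <- votes != "Don't know"
keep

# Keeping only the TRUE positions drops both "Don't know" and the NA
kept_votes <- votes[which(keep)]
kept_votes

# To drop "Don't know" but deliberately keep the NAs, say so explicitly
keep_with_na <- is.na(votes) | votes != "Don't know"
kept_with_na <- votes[keep_with_na]
kept_with_na
```

The same condition, is.na(jpartyvote) | jpartyvote != "Don't know", could be used inside filter() if the NAs should stay in the data.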

We can similarly view the levels of the X_singlefav variable and the number of occurrences of each level:

selected_nzes2011 %>% 
  group_by(X_singlefav) %>% 
  summarise(count = n())
## # A tibble: 8 × 2
##     X_singlefav count
##           <chr> <int>
## 1           Act    33
## 2         Green   388
## 3        Labour  1043
## 4          Mana    47
## 5      National  1266
## 6      NZ First   138
## 7 United Future   128
## 8          <NA>    58

This set also has NA entries, but in this case we don’t want to get rid of anything but the NAs so we need to target them directly. NA entries need special targeting because they do not actually exist (they are different to the text "NA" or a variable saved with the name NA).

If we only wanted to find the NAs we would use the is.na() function with the name of the variable inside the parentheses.

However, since we want the entries that are not NAs, we can use the Not operator, !, to indicate “we want all the ones that are not NA”: !is.na(). Hence we can keep only the non-NA entries in our dplyr chain:

selected_nzes2011 %>% 
  filter(!is.na(X_singlefav)) %>%
  group_by(X_singlefav) %>% 
  summarise(count = n())
## # A tibble: 7 × 2
##     X_singlefav count
##           <chr> <int>
## 1           Act    33
## 2         Green   388
## 3        Labour  1043
## 4          Mana    47
## 5      National  1266
## 6      NZ First   138
## 7 United Future   128
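Why is.na() is needed, rather than an equality test, can be seen on a small made-up vector that contains both a true missing value and the text "NA":

```r
fav <- c("Labour", NA, "NA", "Green")

# Testing equality with NA never gives TRUE; every answer is NA
equals_na <- fav == NA
equals_na

# is.na() finds the genuinely missing entry, and only that one
missing_flags <- is.na(fav)
missing_flags

# The text "NA" is an ordinary string and survives the filter
not_missing <- fav[!is.na(fav)]
not_missing
```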

And remember that we can filter for multiple characteristics at once:

selected_nzes2011 %>% 
  filter(!is.na(X_singlefav), jpartyvote != "Don't know") %>%
  group_by(X_singlefav) %>% 
  summarise(count=n())
## # A tibble: 7 × 2
##     X_singlefav count
##           <chr> <int>
## 1           Act    29
## 2         Green   354
## 3        Labour   914
## 4          Mana    42
## 5      National  1172
## 6      NZ First   119
## 7 United Future   115

If we examine the categories in jdiffvoting, we can see that this variable contains both "Don't know" responses and NA entries.

selected_nzes2011 %>% 
  group_by(jdiffvoting) %>% 
  summarise(count = n())
## # A tibble: 7 × 2
##                                                         jdiffvoting count
##                                                               <chr> <int>
## 1                                                        Don't know    63
## 2                  Voting can make a big difference to what happens  1605
## 3 Voting can make a reasonable amount of difference to what happens   841
## 4                   Voting can make some difference to what happens   339
## 5                  Voting won't make any difference to what happens   119
## 6                 Voting won't make much difference to what happens   106
## 7                                                              <NA>    28

We need to decide how we want to handle these levels in our analysis.

Remember that our main question is about whether people vote for their favorite party or a different one. Hence a straightforward approach would be to first determine whether each observation in the data represents a person who voted for the same party as their favorite or for a different one. This requires creating a new variable with the mutate() function.

In creating this variable we want to evaluate if for a given observation the values in the jpartyvote and X_singlefav variables are the same, or different:

selected_nzes2011 <- selected_nzes2011 %>%
  mutate(sameparty = ifelse(jpartyvote == X_singlefav, "same", "different"))

This creates a new variable named sameparty that has the value "same" if jpartyvote is equal to X_singlefav, and "different" otherwise.

We can again check our work by exploring the groupings:

selected_nzes2011 %>% 
  group_by(jpartyvote, X_singlefav, sameparty) %>%
  summarise(count = n())
## Source: local data frame [82 x 4]
## Groups: jpartyvote, X_singlefav [?]
## 
##    jpartyvote   X_singlefav sameparty count
##         <chr>         <chr>     <chr> <int>
## 1         Act           Act      same    12
## 2         Act         Green different     1
## 3         Act      National different    14
## 4         Act United Future different     1
## 5         Act          <NA>      <NA>     1
## 6         ALC         Green different     1
## 7         ALC        Labour different     4
## 8         ALC      National different     2
## 9         ALC United Future different     3
## 10   Alliance        Labour different     1
## # ... with 72 more rows

We can see that for observations where jpartyvote equaled X_singlefav, the value "same" was recorded for the new variable sameparty, and the value "different" was recorded otherwise. If either jpartyvote or X_singlefav had an NA, R could not check for equality and hence NA was recorded for the sameparty variable as well.
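The way ifelse() passes NAs through can be checked on a few made-up values, chosen so that each case (same, different, and missing on either side) appears once:

```r
voted     <- c("Labour", "Green",    NA,      "Act")
favourite <- c("Labour", "National", "Green", NA)

# Where the comparison is NA, ifelse() returns NA instead of "same" or "different"
same <- ifelse(voted == favourite, "same", "different")
same

# useNA = "ifany" makes table() count the NAs instead of silently dropping them
same_counts <- table(same, useNA = "ifany")
same_counts
```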

To view and summarize the “same” entries we can use the following:

selected_nzes2011 %>% 
  group_by(jpartyvote, X_singlefav, sameparty) %>%
  summarise(count = n()) %>% 
  filter(sameparty == "same")
## Source: local data frame [7 x 4]
## Groups: jpartyvote, X_singlefav [7]
## 
##      jpartyvote   X_singlefav sameparty count
##           <chr>         <chr>     <chr> <int>
## 1           Act           Act      same    12
## 2         Green         Green      same   237
## 3        Labour        Labour      same   632
## 4          Mana          Mana      same    31
## 5      National      National      same  1004
## 6      NZ First      NZ First      same    82
## 7 United Future United Future      same     5

And to view and summarize the “different” entries we can use the following:

selected_nzes2011 %>% 
  group_by(jpartyvote, X_singlefav, sameparty) %>%
  summarise(count = n()) %>% 
  filter(sameparty == "different")
## Source: local data frame [59 x 4]
## Groups: jpartyvote, X_singlefav [59]
## 
##       jpartyvote   X_singlefav sameparty count
##            <chr>         <chr>     <chr> <int>
## 1            Act         Green different     1
## 2            Act      National different    14
## 3            Act United Future different     1
## 4            ALC         Green different     1
## 5            ALC        Labour different     4
## 6            ALC      National different     2
## 7            ALC United Future different     3
## 8       Alliance        Labour different     1
## 9       Alliance      National different     1
## 10 Another party         Green different     2
## # ... with 49 more rows

We can also check how we got any NAs we have by using the is.na() function:

selected_nzes2011 %>% 
  group_by(jpartyvote, X_singlefav, sameparty) %>%
  summarise(count = n()) %>% 
  filter(is.na(sameparty))
## Source: local data frame [16 x 4]
## Groups: jpartyvote, X_singlefav [16]
## 
##      jpartyvote   X_singlefav sameparty count
##           <chr>         <chr>     <chr> <int>
## 1           Act          <NA>      <NA>     1
## 2  Conservative          <NA>      <NA>     1
## 3    Don't know          <NA>      <NA>     7
## 4         Green          <NA>      <NA>     1
## 5        Labour          <NA>      <NA>    11
## 6   Maori Party          <NA>      <NA>     2
## 7      National          <NA>      <NA>     7
## 8      NZ First          <NA>      <NA>     2
## 9          <NA>           Act      <NA>     4
## 10         <NA>         Green      <NA>    32
## 11         <NA>        Labour      <NA>   121
## 12         <NA>          Mana      <NA>     4
## 13         <NA>      National      <NA>    92
## 14         <NA>      NZ First      <NA>    17
## 15         <NA> United Future      <NA>    12
## 16         <NA>          <NA>      <NA>    26

The checks show that the observations with NAs in the sameparty variable are going to be excluded from the analysis when we filter out the NAs in the jpartyvote and X_singlefav variables, so we don’t need to worry about them anymore.

Step Four. Prepare for the second question

As a second question, we might be interested in exploring the relationship between age of voters and how much they like the NZ First party. We become familiar with the variables jnzflike and jage in the codebook, then explore the data.

str(selected_nzes2011$jnzflike)
##  Factor w/ 12 levels "0","1","10","2",..: 1 1 4 10 4 11 NA NA 1 12 ...
str(selected_nzes2011$jage)
##  int [1:3101] 37 37 28 71 43 NA 59 68 64 70 ...

jnzflike is a factor variable; in fact it is ordinal, but by default the levels are listed in alphabetical order. Since this is a categorical variable, we can also summarize the occurrences of each level with group_by() and summarise() again:

selected_nzes2011 %>% 
  group_by(jnzflike) %>% 
  summarise(count = n())
## # A tibble: 13 × 2
##      jnzflike count
##        <fctr> <int>
## 1           0   622
## 2           1   298
## 3          10   134
## 4           2   266
## 5           3   227
## 6           4   162
## 7           5   544
## 8           6   165
## 9           7   138
## 10          8   107
## 11          9    81
## 12 Don't know   224
## 13         NA   133

While jnzflike is on a 0 to 10 scale, this variable also has a level labeled "Don't know", which is why R does not store it as a numeric variable.
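The alphabetical ordering of levels, with "10" landing between "1" and "2", can be reproduced on a small made-up factor. The second half of the sketch shows an optional step that is not part of the original analysis: supplying the levels explicitly so they follow the 0 to 10 scale.

```r
likes <- factor(c("0", "1", "2", "10", "Don't know"))

# Default levels are sorted as text, so "10" comes before "2"
default_levels <- levels(likes)
default_levels

# Optional: state the intended order of the levels ourselves
scale_levels  <- c(as.character(0:10), "Don't know")
likes_ordered <- factor(as.character(likes), levels = scale_levels)
levels(likes_ordered)
```

Levels that are listed but never observed (such as "3" here) are kept as empty levels, which is harmless for tabulation.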

jage, on the other hand, is an integer, with values that are whole numbers between 0 and infinity (or NA). For this variable we would want to take a look at numerical summaries such as means, medians, etc.

selected_nzes2011 %>% 
  summarise(agemean = mean(jage), agemedian = median(jage), agesd = sd(jage), 
            agemin = min(jage), agemax = max(jage))
##   agemean agemedian agesd agemin agemax
## 1      NA        NA   NaN     NA     NA

What went wrong? The reason why all of the results were reported as NAs is that there were some NA entries in the jage variable (people not reporting their age). Since it is not possible to take the average of a series of values that contain NAs, obtaining the numerical summaries requires that we exclude the NAs from the calculation.

Most numerical summary functions allow us to easily exclude NAs with the na.rm argument. See the help documentation for the median function for more information.

?median
## starting httpd help server ...
##  done
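As a minimal sketch of the na.rm argument on a few made-up ages (not the survey data):

```r
ages <- c(37L, 28L, NA, 71L)

# With an NA present, the default result is NA
mean_default <- mean(ages)
mean_default

# na.rm = TRUE removes the NAs before calculating
mean_clean   <- mean(ages, na.rm = TRUE)
median_clean <- median(ages, na.rm = TRUE)
range_clean  <- range(ages, na.rm = TRUE)

mean_clean
median_clean
range_clean
```

The same argument could be added to each function inside summarise(), for example mean(jage, na.rm = TRUE).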

An alternative approach is just to filter out the NAs first, and then ask for the numerical summaries:

selected_nzes2011 %>% 
  filter(!(is.na(jage))) %>%
  summarise(agemean = mean(jage), agemedian = median(jage), agesd = sd(jage), 
            agemin = min(jage), agemax = max(jage))
##    agemean agemedian   agesd agemin agemax
## 1 53.22328        54 17.5371     18    100

An age range of 18 to 100 is a reasonable age range for a voting age population, so there are no obvious errors in the data. If there were, we would need to decide if we should filter them out of the analysis.
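If the range had looked suspicious, one way to locate the offending entries is sketched below. The ages and the plausibility bounds (18 and 110) are made up for illustration; the right bounds are a judgment call for each data set.

```r
ages <- c(37L, 28L, 5L, 71L, NA, 150L)

# Hypothetical plausibility bounds for a voting-age population
min_age <- 18L
max_age <- 110L

# Flag values outside the bounds; !is.na() keeps missing ages from being flagged
suspicious <- !is.na(ages) & (ages < min_age | ages > max_age)

suspicious_rows   <- which(suspicious)
suspicious_values <- ages[suspicious]
suspicious_rows
suspicious_values
```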

Having gained some familiarity with the specific variables we are using, we next need to consider if there is additional work we should do on the data in investigating the question. There are a number of different approaches we might take. For example, we could consider if those that strongly like NZ First are older than those that strongly dislike NZ First, or we could consider if old people like NZ First more than young people.

Approach 1: Strongly liking and disliking NZ First and age

If we wanted to select only two of the possible levels in how much people like NZ First, we can filter for these specific levels. When interested in filtering for multiple values a variable can take, the %in% operator can come in handy:

selected_nzes2011 %>% 
  filter(jnzflike %in% c("0","10")) %>%
  group_by(jnzflike) %>% 
  summarise(count = n())
## # A tibble: 2 × 2
##   jnzflike count
##     <fctr> <int>
## 1        0   622
## 2       10   134

Remember that the jnzflike is not a numerical variable, hence we use the quotation marks around the values (even though they happen to be numbers).

This is an example of simplifying the analysis by considering only two levels of a categorical variable, as opposed to all possible levels.
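How %in% behaves, compared with chaining == tests together, can be seen on a small made-up factor. One difference worth knowing is that %in% answers FALSE for a missing value, whereas == answers NA:

```r
likes <- factor(c("0", "10", "5", "Don't know", NA, "0"))

# %in% asks "is this value one of the listed ones?" and never returns NA
in_extremes <- likes %in% c("0", "10")
in_extremes

# The equivalent written with == and | returns NA for the missing entry
by_equals <- likes == "0" | likes == "10"
by_equals

# Subsetting with the %in% result keeps only the two chosen levels
extremes <- as.character(likes[in_extremes])
extremes
```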

Approach 2: Age and liking for NZ First

We might also like to refine our question slightly, asking do people above retirement age (65 in New Zealand) like NZ First more than younger people. To do this we can turn the numeric age variable into a categorical variable based on whether people are 65 years or older or younger than 65. Once again we make use of the mutate() and ifelse() functions:

selected_nzes2011 <- selected_nzes2011 %>% 
  mutate(retiredage = ifelse(jage >= 65, "retired age", "working age"))
selected_nzes2011 %>% 
  group_by(retiredage) %>% 
  summarise(count = n())
## # A tibble: 3 × 2
##    retiredage count
##         <chr> <int>
## 1 retired age   876
## 2 working age  2156
## 3        <NA>    69

We can see that individuals in the dataset are now labeled as either "retired age" or "working age" or neither (NA), which we can easily filter out if need be.

This is an example of using a numerical threshold to convert a numerical variable to a categorical variable.
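The threshold logic can be checked on a few made-up ages that sit on either side of 65. The sketch also shows base R's cut() as an alternative; cut() is not used in the analysis above, but it extends more naturally if we later wanted more than two age groups.

```r
ages <- c(37L, 64L, 65L, 71L, NA)

# Same approach as above: a single threshold with ifelse()
by_ifelse <- ifelse(ages >= 65, "retired age", "working age")
by_ifelse

# Alternative: cut() with right = FALSE, so the intervals are [-Inf, 65) and [65, Inf)
by_cut <- cut(ages, breaks = c(-Inf, 65, Inf), right = FALSE,
              labels = c("working age", "retired age"))
by_cut
```

Both versions put a 65-year-old in "retired age" and leave a missing age as NA; cut() returns a factor rather than a character vector.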

For approach 2, we might also want to turn the scale of liking into numeric values, because at the moment we cannot easily get summary information from the data in factor form. For example, if we try to run the following command, we get an error saying “need numeric data”.

selected_nzes2011 %>% 
  group_by(retiredage) %>% 
  summarise(medlike = median(jnzflike))

We can change the type of data with functions of the form as.thingtochangeto(), but it is easy to go wrong with factors. For example, this is wrong:

selected_nzes2011 <- selected_nzes2011 %>% 
  mutate(numlikenzf = as.numeric(jnzflike))

We can see it has gone wrong if we use grouping to check our work (and it is a very good plan to check our work after converting factors).

selected_nzes2011 %>% 
  group_by(jnzflike, numlikenzf) %>% 
  summarise(count = n())
## Source: local data frame [13 x 3]
## Groups: jnzflike [?]
## 
##      jnzflike numlikenzf count
##        <fctr>      <dbl> <int>
## 1           0          1   622
## 2           1          2   298
## 3          10          3   134
## 4           2          4   266
## 5           3          5   227
## 6           4          6   162
## 7           5          7   544
## 8           6          8   165
## 9           7          9   138
## 10          8         10   107
## 11          9         11    81
## 12 Don't know         12   224
## 13         NA         NA   133

Factor entries have two parts: the text we see on the screen, and a numeric order (remember how 10 was coming between 1 and 2 because of the alphabetical order). When we say “turn this into a number”, R uses the numeric order in which it stores the values to do that conversion, as opposed to the names of the levels of the categorical variable. Hence, we need a conversion method that will use the text strings that label the levels, as opposed to the storage order of these levels. We can do this by first saving the variable as a character variable, and then turning it into a number:

selected_nzes2011 <- selected_nzes2011 %>% 
  mutate(numlikenzf = as.numeric(as.character(jnzflike)))
## Warning in evalq(as.numeric(as.character(structure(c(1L, 1L, 4L, 10L, 4L, :
## NAs introduced by coercion

The warning “NAs introduced by coercion” happens since the level "Don't know" cannot be turned into a number. But this should be fine for our purposes since we are interested in the numerical responses anyway.

selected_nzes2011 %>% 
  group_by(jnzflike, numlikenzf) %>% 
  summarise(count = n())
## Source: local data frame [13 x 3]
## Groups: jnzflike [?]
## 
##      jnzflike numlikenzf count
##        <fctr>      <dbl> <int>
## 1           0          0   622
## 2           1          1   298
## 3          10         10   134
## 4           2          2   266
## 5           3          3   227
## 6           4          4   162
## 7           5          5   544
## 8           6          6   165
## 9           7          7   138
## 10          8          8   107
## 11          9          9    81
## 12 Don't know         NA   224
## 13         NA         NA   133

Converting the factor to a character first ensures that the numerical values used in the labels of the levels of the categorical variable are used.
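The two conversions can be compared directly on a small made-up factor. suppressWarnings() is used here only to keep the example quiet; in real work the “NAs introduced by coercion” warning is a useful reminder to check what was lost.

```r
likes <- factor(c("0", "10", "2", "Don't know"))

# Wrong for our purposes: returns the position of each level, not its label
level_codes <- as.numeric(likes)
level_codes

# Right: go through the text labels first, then convert to numbers
label_values <- suppressWarnings(as.numeric(as.character(likes)))
label_values
```

Here "10" is stored as level 2 and "2" as level 3, so the direct conversion silently scrambles the scale, while the two-step conversion recovers 0, 10 and 2 and turns "Don't know" into NA.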

Now that we have cleaned up the data in a way that addresses the needs of the research questions we want to explore, we are ready to continue with our analysis.

Appendix: List of fields in example data

| Variable | Question | DataType |
|---|---|---|
| jactlike | A14: how much like Act | Factor |
| jactlr | A18: Act on left-right scale | chr |
| jage | Respondent’s age in years | int |
| jblogel | A6h: visit political blog for election | chr |
| jdiffvoting | A13: does voting make any difference to what happens | chr |
| jdiscussp | A11a: how often discussed politics with others | chr |
| Jelect | Electorate | int |
| jelecvote | C4: if cast electorate vote, for which party’s candidate | chr |
| jethnicity_a | F19a: ethnicity - Asian | chr |
| jethnicity_e | F19a: ethnicity - NZ European | chr |
| jethnicity_m | F19a: ethnicity - NZ Maori | chr |
| jethnicity_o | F19a: ethnicity - Other | chr |
| jethnicity_p | F19a: ethnicity - Pacific | chr |
| jethnicityx | F19ax: other ethnic group belonged to detail | chr |
| jethnicmost | F19b: ethnic group identified with most | chr |
| jethnicmostx | F19bx: other ethnic group identified with most | chr |
| jfirstpx | C10x: on election day other party most wanted to be in government | chr |
| jgovpact | C8: Act helped form the government after 2008 election | chr |
| jgovpdk | C8: can’t recall which parties formed the government after 2008 election | chr |
| jgovpgrn | C8: Greens helped form the government after 2008 election | chr |
| jgovplab | C8: Labour helped form the government after 2008 election | chr |
| jgovpmao | C8: Maori Party helped form the government after 2008 election | chr |
| jgovpmnp | C8: Mana Party helped form the government after 2008 election | chr |
| jgovpnat | C8: National helped form the government after 2008 election | chr |
| jgovpnzf | C8: NZ First helped form the government after 2008 election | chr |
| jgovunf | C8: United Future helped form the government after 2008 election | chr |
| jgrnlike | A14: how much like Greens | Factor |
| jgrnlr | A18: Greens on left-right scale | chr |
| jhhadults | F23a: number of adults in household | int |
| jhhchn | F23b: number of children in household | int |
| jhhincome | F22: household income between 1 April 2010 and 31 March 2011 | chr |
| jhqual | F8: highest formal educational qualification | chr |
| jlablike | A14: how much like Labour | Factor |
| jlablr | A18: Labour on left-right scale | chr |
| jlanguage | F3: main language spoken at your home | chr |
| jlanguagex | F3x: other main language spoken | chr |
| jmaolike | A14: how much like Maori Party | Factor |
| jmaolr | A18: Maori Party on left-right scale | chr |
| jmarital | F24: marital status | chr |
| jmnplike | A14: how much like Mana Party | Factor |
| jmnplr | A18: Mana Party on left-right scale | chr |
| jmostlike | A15: on election day which party liked most | chr |
| jmostlikex | A15x: other party liked most | chr |
| jnatlike | A14: how much like National | Factor |
| jnatlr | A18: National on left-right scale | chr |
| jnatradio | A10d: how often followed election news on Radio New Zealand: National | chr |
| jnevervoteact | C16: would never vote for Act | chr |
| jnevervotegrn | C16: would never vote for Greens | chr |
| jnevervotelab | C16: would never vote for Labour | chr |
| jnevervotemao | C16: would never vote for Maori Party | chr |
| jnevervotemnp | C16: would never vote for Mana Party | chr |
| jnevervotenat | C16: would never vote for National | chr |
| jnevervoteno | C16: no party for which you would never vote | chr |
| jnevervotenzf | C16: would never vote for NZ First | chr |
| jnevervoteoth | C16: would never vote for another party | chr |
| jnevervoteothx | C16: other party for which you would never vote | chr |
| jnevervoteunf | C16: would never vote for United Future | chr |
| jnewspaper | A10c: how often followed election news in newspaper | chr |
| jnzflike | A14: how much like NZ First | Factor |
| jnzflr | A18: NZ First on left-right scale | chr |
| jpartyvote | C3: if cast party vote, for which party | chr |
| jpcmoney | A11d: how often contributed money to a party or candidate | chr |
| jpcposter | A11e: how often put up party or candidate posters | chr |
| jpersuade | A11c: how often talk to anyone to persuade them how to vote | chr |
| jrallies | A11b: how often attended political meetings or rallies | chr |
| jrelang | F17: anglican | chr |
| jrelbap | F17: baptist | chr |
| jrelcath | F17: catholic | chr |
| jrelfun | F17: independent-fundamentalist-pentecostal church | chr |
| jreligionx | F17x: other religion detail | chr |
| jreligiousity | F18: how religious are you | chr |
| jrellat | F17: latter day saints | chr |
| jrelmeth | F17: methodist | chr |
| jrelnonc | F17: non-Christian | chr |
| jrelnone | F17: no religion | chr |
| jrelothc | F17: other Christian | chr |
| jrelpres | F17: presbyterian | chr |
| jrelrat | F17: ratana | chr |
| jrelservices | F16: apart from weddings, funerals, baptisms, how often do you attend religious services | chr |
| jrepublic | B1: should NZ become a republic or retain Queen as head of state | chr |
| jrollsex | Respondent’s gender from electoral roll | chr |
| jsecondp | C11: on election day which party overall was your second choice to be in government | chr |
| jslflr | A19: yourself on left-right scale | chr |
| jspbusind | B3f: should there be more or less public spending on business and industry | chr |
| jspdefence | B3d: should there be more or less public spending on defence | chr |
| jspedu | B3b: should there be more or less public spending on education | chr |
| jspenviro | B3i: should there be more or less public spending on the environment | chr |
| jsphealth | B3a: should there be more or less public spending on health | chr |
| jsppolice | B3g: should there be more or less public spending on police and law enforcement | chr |
| jspsuper | B3e: should there be more or less public spending on superannuation | chr |
| jspunemp | B3c: should there be more or less public spending on unemployment benefits | chr |
| jspwelfare | B3h: should there be more or less public spending on welfare benefits | chr |
| jtalkback | A10e: how often followed election news on talkback radio | chr |
| junflike | A14: how much like United Future | Factor |
| junflr | A18: United Future on left-right scale | chr |
| jwkdis | F9: disabled, unable to work | chr |
| jwkft | F9: working full-time for pay or other income | chr |
| jwkpt | F9: working part-time for pay or other income | chr |
| jwkret | F9: retired | chr |
| jwksch | F9: at school, university, or other educational institution | chr |
| jwkun | F9: unemployed, laid off, looking for work | chr |
| jwkunpi | F9: working unpaid within the home | chr |
| jwkunpo | F9: working unpaid outside the home | chr |
| njelecvote | Electorate Vote with nonvote | chr |
| njptyvote | Party Vote with nonvote | chr |
| r_jind | Respondent Industry Codes | chr |
| _singlefav | Calculated variable of most liked of major parties (Question A14) | chr |