Crosstab - a cross-tabulation function for use with survey data

Rationale

A function that aids the exploration of survey data through simple tabulations of respondent counts and proportions, including the ability to specify:

plus:

Whilst a host of tabulation functions are available in R (table; xtabs; CrossTable) none quite fit this bill, although ctab() from the package catspec comes closest. I have, therefore, developed the function crosstab(), taking ctab() as a starting point.

The main enhancements compared to ctab() are:

The code for the function can be downloaded from here

Implementation

1. Create some example data

ID <- seq(1:177)
Age <- sample(c("0-15", "16-29", "30-44", "45-64", "65+"), 177, replace = TRUE)
Sex <- sample(c("Male", "Female"), 177, replace = TRUE)
Country <- sample(c("England", "Wales", "Scotland", "N. Ireland"), 177, replace = TRUE)
Health <- sample(c("Poor", "Average", "Good"), 177, replace = TRUE)
Survey <- data.frame(Age, Sex, Country, Health)
head(Survey)
##     Age    Sex    Country  Health
## 1 16-29   Male   Scotland    Good
## 2   65+ Female      Wales Average
## 3  0-15   Male      Wales    Poor
## 4 16-29   Male N. Ireland Average
## 5 30-44 Female      Wales    Good
## 6 30-44 Female      Wales Average

2. Table 'types' supported by crosstab()

The basic table types supported by crosstab() are:

# Load function
source("http://pcwww.liv.ac.uk/~william/R/crosstab.r")
# Frequency count
crosstab(Survey, row.vars = "Age", col.vars = "Sex", type = "f")
##       Sex Female Male Sum
## Age                      
## 0-15          19   20  39
## 16-29         11   14  25
## 30-44         23   17  40
## 45-64         15   19  34
## 65+           20   19  39
## Sum           88   89 177
# Row percentages
crosstab(Survey, row.vars = "Age", col.vars = "Sex", type = "r")
##       Sex Female   Male    Sum
## Age                           
## 0-15       48.72  51.28 100.00
## 16-29      44.00  56.00 100.00
## 30-44      57.50  42.50 100.00
## 45-64      44.12  55.88 100.00
## 65+        51.28  48.72 100.00
# Column percentages
crosstab(Survey, row.vars = "Age", col.vars = "Sex", type = "c")
##       Sex Female   Male
## Age                    
## 0-15       21.59  22.47
## 16-29      12.50  15.73
## 30-44      26.14  19.10
## 45-64      17.05  21.35
## 65+        22.73  21.35
## Sum       100.00 100.00
# Joint percentages (sums to 100 within final two table dimensions)
crosstab(Survey, row.vars = c("Age", "Sex"), col.vars = "Health", type = "j")
##              Health Average   Good   Poor    Sum
## Age   Sex                                       
## 0-15  Female          15.38  15.38  17.95  48.72
##       Male            12.82  17.95  20.51  51.28
##       Sum             28.21  33.33  38.46 100.00
## 16-29 Female          12.00  12.00  20.00  44.00
##       Male            12.00  24.00  20.00  56.00
##       Sum             24.00  36.00  40.00 100.00
## 30-44 Female          15.00  25.00  17.50  57.50
##       Male            20.00  15.00   7.50  42.50
##       Sum             35.00  40.00  25.00 100.00
## 45-64 Female          17.65  14.71  11.76  44.12
##       Male            17.65  26.47  11.76  55.88
##       Sum             35.29  41.18  23.53 100.00
## 65+   Female          17.95  20.51  12.82  51.28
##       Male            15.38  25.64   7.69  48.72
##       Sum             33.33  46.15  20.51 100.00
# Total percentages (sums to 100 across entire table)
crosstab(Survey, row.vars = c("Age", "Sex"), col.vars = "Health", type = "t")
##              Health Average   Good   Poor    Sum
## Age   Sex                                       
## 0-15  Female           3.39   3.39   3.95  10.73
##       Male             2.82   3.95   4.52  11.30
##       Sum              6.21   7.34   8.47  22.03
## 16-29 Female           1.69   1.69   2.82   6.21
##       Male             1.69   3.39   2.82   7.91
##       Sum              3.39   5.08   5.65  14.12
## 30-44 Female           3.39   5.65   3.95  12.99
##       Male             4.52   3.39   1.69   9.60
##       Sum              7.91   9.04   5.65  22.60
## 45-64 Female           3.39   2.82   2.26   8.47
##       Male             3.39   5.08   2.26  10.73
##       Sum              6.78   7.91   4.52  19.21
## 65+   Female           3.95   4.52   2.82  11.30
##       Male             3.39   5.65   1.69  10.73
##       Sum              7.34  10.17   4.52  22.03
## Sum   Female          15.82  18.08  15.82  49.72
##       Male            15.82  21.47  12.99  50.28
##       Sum             31.64  39.55  28.81 100.00

3. Table marginals supported by crosstab()

The table margin options supported by crosstab() are:

# All margins...
crosstab(Survey, row.vars = c("Age", "Sex"), col.vars = "Health", type = "f")  #By default addmargins=TRUE
##              Health Average Good Poor Sum
## Age   Sex                                
## 0-15  Female              6    6    7  19
##       Male                5    7    8  20
##       Sum                11   13   15  39
## 16-29 Female              3    3    5  11
##       Male                3    6    5  14
##       Sum                 6    9   10  25
## 30-44 Female              6   10    7  23
##       Male                8    6    3  17
##       Sum                14   16   10  40
## 45-64 Female              6    5    4  15
##       Male                6    9    4  19
##       Sum                12   14    8  34
## 65+   Female              7    8    5  20
##       Male                6   10    3  19
##       Sum                13   18    8  39
## Sum   Female             28   32   28  88
##       Male               28   38   23  89
##       Sum                56   70   51 177
# No margins...
crosstab(Survey, row.vars = c("Age", "Sex"), col.vars = "Health", type = "f", 
    addmargins = FALSE)
##              Health Average Good Poor
## Age   Sex                            
## 0-15  Female              6    6    7
##       Male                5    7    8
## 16-29 Female              3    3    5
##       Male                3    6    5
## 30-44 Female              6   10    7
##       Male                8    6    3
## 45-64 Female              6    5    4
##       Male                6    9    4
## 65+   Female              7    8    5
##       Male                6   10    3
# Grand margins only...
crosstab(Survey, row.vars = c("Age", "Sex"), col.vars = "Health", type = "f", 
    subtotals = FALSE)
##                  Health                         
## Age     Sex      Average    Good    Poor     Sum
## 0-15    Female         6       6       7      19
##         Male           5       7       8      20
## 16-29   Female         3       3       5      11
##         Male           3       6       5      14
## 30-44   Female         6      10       7      23
##         Male           8       6       3      17
## 45-64   Female         6       5       4      15
##         Male           6       9       4      19
## 65+     Female         7       8       5      20
##         Male           6      10       3      19
## Sum                   56      70      51     177

4. Percentages and decimal places

# Calculate proportions rather than percentages...
crosstab(Survey, row.vars = "Age", col.vars = "Sex", type = "t", percentages = FALSE)
##       Sex Female Male  Sum
## Age                       
## 0-15        0.11 0.11 0.22
## 16-29       0.06 0.08 0.14
## 30-44       0.13 0.10 0.23
## 45-64       0.08 0.11 0.19
## 65+         0.11 0.11 0.22
## Sum         0.50 0.50 1.00
# Round output to 1 decimal place...
crosstab(Survey, row.vars = "Age", col.vars = "Sex", type = "t", percentages = FALSE, 
    dec.places = 1)
##       Sex Female Male Sum
## Age                      
## 0-15         0.1  0.1 0.2
## 16-29        0.1  0.1 0.1
## 30-44        0.1  0.1 0.2
## 45-64        0.1  0.1 0.2
## 65+          0.1  0.1 0.2
## Sum          0.5  0.5 1.0

5. Support for high dimension tables

There is no limit to the dimensions crosstab() can handle, either as 'rows' or 'columns'…

# Create a table with two row and two column variables
crosstab(Survey, row.vars = c("Age", "Sex"), col.vars = c("Health", "Country"), 
    type = "f", addmargins = FALSE)
##              Health  Average                              Good                              Poor                          
##              Country England N. Ireland Scotland Wales England N. Ireland Scotland Wales England N. Ireland Scotland Wales
## Age   Sex                                                                                                                 
## 0-15  Female               1          2        1     2       3          1        0     2       1          4        1     1
##       Male                 3          0        1     1       1          2        1     3       3          1        0     4
## 16-29 Female               0          0        3     0       1          1        0     1       1          2        1     1
##       Male                 0          1        2     0       0          3        2     1       1          0        2     2
## 30-44 Female               1          2        1     2       1          5        2     2       2          2        2     1
##       Male                 0          3        2     3       1          4        0     1       0          1        2     0
## 45-64 Female               1          4        0     1       2          2        0     1       0          0        2     2
##       Male                 2          2        0     2       3          1        1     4       0          0        2     2
## 65+   Female               3          1        2     1       3          2        2     1       2          1        1     1
##       Male                 1          2        2     1       1          3        4     2       0          1        0     2

…although in general results will display better if only one column variable is used, with all other dimensions being declared as row variables…

# Create a table with three row and one column variable
crosstab(Survey, row.vars = c("Age", "Sex", "Health"), col.vars = c("Country"), 
    type = "f", addmargins = FALSE)
##                      Country England N. Ireland Scotland Wales
## Age   Sex    Health                                           
## 0-15  Female Average               1          2        1     2
##              Good                  3          1        0     2
##              Poor                  1          4        1     1
##       Male   Average               3          0        1     1
##              Good                  1          2        1     3
##              Poor                  3          1        0     4
## 16-29 Female Average               0          0        3     0
##              Good                  1          1        0     1
##              Poor                  1          2        1     1
##       Male   Average               0          1        2     0
##              Good                  0          3        2     1
##              Poor                  1          0        2     2
## 30-44 Female Average               1          2        1     2
##              Good                  1          5        2     2
##              Poor                  2          2        2     1
##       Male   Average               0          3        2     3
##              Good                  1          4        0     1
##              Poor                  0          1        2     0
## 45-64 Female Average               1          4        0     1
##              Good                  2          2        0     1
##              Poor                  0          0        2     2
##       Male   Average               2          2        0     2
##              Good                  3          1        1     4
##              Poor                  0          0        2     2
## 65+   Female Average               3          1        2     1
##              Good                  3          2        2     1
##              Poor                  2          1        1     1
##       Male   Average               1          2        2     1
##              Good                  1          3        4     2
##              Poor                  0          1        0     2

…and crosstab() does flatten tables in precisely this way when suppressing subtotals…

crosstab(Survey, row.vars = c("Age", "Sex"), col.vars = c("Health", "Country"), 
    type = "f", subtotals = FALSE)
## Warning: WARNING: suppression of subtotals assumes only 1 col var; table
## flattened accordingly
##                            Country                                               
## Age     Sex      Health       England N. Ireland   Scotland      Wales        Sum
## 0-15    Female   Average            1          2          1          2          6
##                  Good               3          1          0          2          6
##                  Poor               1          4          1          1          7
##         Male     Average            3          0          1          1          5
##                  Good               1          2          1          3          7
##                  Poor               3          1          0          4          8
## 16-29   Female   Average            0          0          3          0          3
##                  Good               1          1          0          1          3
##                  Poor               1          2          1          1          5
##         Male     Average            0          1          2          0          3
##                  Good               0          3          2          1          6
##                  Poor               1          0          2          2          5
## 30-44   Female   Average            1          2          1          2          6
##                  Good               1          5          2          2         10
##                  Poor               2          2          2          1          7
##         Male     Average            0          3          2          3          8
##                  Good               1          4          0          1          6
##                  Poor               0          1          2          0          3
## 45-64   Female   Average            1          4          0          1          6
##                  Good               2          2          0          1          5
##                  Poor               0          0          2          2          4
##         Male     Average            2          2          0          2          6
##                  Good               3          1          1          4          9
##                  Poor               0          0          2          2          4
## 65+     Female   Average            3          1          2          1          7
##                  Good               3          2          2          1          8
##                  Poor               2          1          1          1          5
##         Male     Average            1          2          2          1          6
##                  Good               1          3          4          2         10
##                  Poor               0          1          0          2          3
## Sum                                38         53         39         47        177

6. Accessing/viewing the data object returned by crosstab()

crosstab() returns a data object of class 'crosstab', comprising up to three variants of the requested table:

NOTE: $table and $crosstab are of class 'table'; $crosstab.nosub is of class 'data.frame'

The function print.crosstab() automatically prints the version of the table requested by the user, using a suitably adapted presentational style. Alternatively users can access, manipulate and present the returned elements $table, $crosstab and $crosstab.nosub as they see fit.

The returned data object also contains a record of the argument values used to produce tabular output:

7. Bells & Whistles

(a) Default provision of counts AND %s for univariate tables

A feature retained from ctab()…

crosstab(Survey, row.vars = "Age")
##        
## Age      Count Total %
##   0-15   39.00   22.03
##   16-29  25.00   14.12
##   30-44  40.00   22.60
##   45-64  34.00   19.21
##   65+    39.00   22.03
##   Sum   177.00  100.00

(b) Ability to request multiple table types in invocation of crosstab()

crosstab() is designed to produce one type of table at a time; if you want multiple table types it is recommended that you simply invoke the function multiple times.

That said, the ability to create multiple tables types has been carried over from ctab(), provided it doesn't involve tables with suppressed subtotals or require table marginals (which are potentially computed differently for each table type)…

crosstab(Survey, row.vars = "Age", col.vars = "Sex", type = c("f", "c"), addmargins = FALSE)
##            Count       Column %      
##       Sex Female  Male   Female  Male
## Age                                  
## 0-15       19.00 20.00    21.59 22.47
## 16-29      11.00 14.00    12.50 15.73
## 30-44      23.00 17.00    26.14 19.10
## 45-64      15.00 19.00    17.05 21.35
## 65+        20.00 19.00    22.73 21.35

c) Output display format

A final carry over from ctab() is the ability, when requesting multiple table types, to change the output format ('style') from 'wide' (tables side by side) to 'long' (tables rows interleaved)…

# Style = 'wide' [default]
crosstab(Survey, row.vars = "Age", col.vars = "Sex", type = c("f", "t"), style = "wide", 
    addmargins = FALSE)
##            Count       Total %      
##       Sex Female  Male  Female  Male
## Age                                 
## 0-15       19.00 20.00   10.73 11.30
## 16-29      11.00 14.00    6.21  7.91
## 30-44      23.00 17.00   12.99  9.60
## 45-64      15.00 19.00    8.47 10.73
## 65+        20.00 19.00   11.30 10.73
# Style = 'long'
crosstab(Survey, row.vars = "Age", col.vars = "Sex", type = c("f", "t"), style = "long", 
    addmargins = FALSE)
##               Sex Female  Male
## Age                           
## 0-15  Count        19.00 20.00
##       Total %      10.73 11.30
## 16-29 Count        11.00 14.00
##       Total %       6.21  7.91
## 30-44 Count        23.00 17.00
##       Total %      12.99  9.60
## 45-64 Count        15.00 19.00
##       Total %       8.47 10.73
## 65+   Count        20.00 19.00
##       Total %      11.30 10.73