A function that aids the exploration of survey data through simple tabulations of respondent counts and proportions, including the ability to specify:
plus:
Whilst a host of tabulation functions are available in R (table; xtabs; CrossTable) none quite fit this bill, although ctab() from the package catspec comes closest. I have, therefore, developed the function crosstab(), taking ctab() as a starting point.
The main enhancements compared to ctab() are:
The code for the function can be downloaded from here
ID <- seq(1:177)
Age <- sample(c("0-15", "16-29", "30-44", "45-64", "65+"), 177, replace = TRUE)
Sex <- sample(c("Male", "Female"), 177, replace = TRUE)
Country <- sample(c("England", "Wales", "Scotland", "N. Ireland"), 177, replace = TRUE)
Health <- sample(c("Poor", "Average", "Good"), 177, replace = TRUE)
Survey <- data.frame(Age, Sex, Country, Health)
head(Survey)
## Age Sex Country Health
## 1 16-29 Male Scotland Good
## 2 65+ Female Wales Average
## 3 0-15 Male Wales Poor
## 4 16-29 Male N. Ireland Average
## 5 30-44 Female Wales Good
## 6 30-44 Female Wales Average
The basic table types supported by crosstab() are:
# Load function
source("http://pcwww.liv.ac.uk/~william/R/crosstab.r")
# Frequency count
crosstab(Survey, row.vars = "Age", col.vars = "Sex", type = "f")
## Sex Female Male Sum
## Age
## 0-15 19 20 39
## 16-29 11 14 25
## 30-44 23 17 40
## 45-64 15 19 34
## 65+ 20 19 39
## Sum 88 89 177
# Row percentages
crosstab(Survey, row.vars = "Age", col.vars = "Sex", type = "r")
## Sex Female Male Sum
## Age
## 0-15 48.72 51.28 100.00
## 16-29 44.00 56.00 100.00
## 30-44 57.50 42.50 100.00
## 45-64 44.12 55.88 100.00
## 65+ 51.28 48.72 100.00
# Column percentages
crosstab(Survey, row.vars = "Age", col.vars = "Sex", type = "c")
## Sex Female Male
## Age
## 0-15 21.59 22.47
## 16-29 12.50 15.73
## 30-44 26.14 19.10
## 45-64 17.05 21.35
## 65+ 22.73 21.35
## Sum 100.00 100.00
# Joint percentages (sums to 100 within final two table dimensions)
crosstab(Survey, row.vars = c("Age", "Sex"), col.vars = "Health", type = "j")
## Health Average Good Poor Sum
## Age Sex
## 0-15 Female 15.38 15.38 17.95 48.72
## Male 12.82 17.95 20.51 51.28
## Sum 28.21 33.33 38.46 100.00
## 16-29 Female 12.00 12.00 20.00 44.00
## Male 12.00 24.00 20.00 56.00
## Sum 24.00 36.00 40.00 100.00
## 30-44 Female 15.00 25.00 17.50 57.50
## Male 20.00 15.00 7.50 42.50
## Sum 35.00 40.00 25.00 100.00
## 45-64 Female 17.65 14.71 11.76 44.12
## Male 17.65 26.47 11.76 55.88
## Sum 35.29 41.18 23.53 100.00
## 65+ Female 17.95 20.51 12.82 51.28
## Male 15.38 25.64 7.69 48.72
## Sum 33.33 46.15 20.51 100.00
# Total percentages (sums to 100 across entire table)
crosstab(Survey, row.vars = c("Age", "Sex"), col.vars = "Health", type = "t")
## Health Average Good Poor Sum
## Age Sex
## 0-15 Female 3.39 3.39 3.95 10.73
## Male 2.82 3.95 4.52 11.30
## Sum 6.21 7.34 8.47 22.03
## 16-29 Female 1.69 1.69 2.82 6.21
## Male 1.69 3.39 2.82 7.91
## Sum 3.39 5.08 5.65 14.12
## 30-44 Female 3.39 5.65 3.95 12.99
## Male 4.52 3.39 1.69 9.60
## Sum 7.91 9.04 5.65 22.60
## 45-64 Female 3.39 2.82 2.26 8.47
## Male 3.39 5.08 2.26 10.73
## Sum 6.78 7.91 4.52 19.21
## 65+ Female 3.95 4.52 2.82 11.30
## Male 3.39 5.65 1.69 10.73
## Sum 7.34 10.17 4.52 22.03
## Sum Female 15.82 18.08 15.82 49.72
## Male 15.82 21.47 12.99 50.28
## Sum 31.64 39.55 28.81 100.00
The table margin options supported by crosstab() are:
all margins - addmargins=TRUE [the default]
Note that for tables of type 'row.pct' and 'col.pct' only the appropriate row or column table marginal is added.
no margins - addmargins=FALSE
grand margins only - subtotals=FALSE (not applicable to tables of type 'row.pct' or 'col.pct')
# All margins...
crosstab(Survey, row.vars = c("Age", "Sex"), col.vars = "Health", type = "f") #By default addmargins=TRUE
## Health Average Good Poor Sum
## Age Sex
## 0-15 Female 6 6 7 19
## Male 5 7 8 20
## Sum 11 13 15 39
## 16-29 Female 3 3 5 11
## Male 3 6 5 14
## Sum 6 9 10 25
## 30-44 Female 6 10 7 23
## Male 8 6 3 17
## Sum 14 16 10 40
## 45-64 Female 6 5 4 15
## Male 6 9 4 19
## Sum 12 14 8 34
## 65+ Female 7 8 5 20
## Male 6 10 3 19
## Sum 13 18 8 39
## Sum Female 28 32 28 88
## Male 28 38 23 89
## Sum 56 70 51 177
# No margins...
crosstab(Survey, row.vars = c("Age", "Sex"), col.vars = "Health", type = "f",
addmargins = FALSE)
## Health Average Good Poor
## Age Sex
## 0-15 Female 6 6 7
## Male 5 7 8
## 16-29 Female 3 3 5
## Male 3 6 5
## 30-44 Female 6 10 7
## Male 8 6 3
## 45-64 Female 6 5 4
## Male 6 9 4
## 65+ Female 7 8 5
## Male 6 10 3
# Grand margins only...
crosstab(Survey, row.vars = c("Age", "Sex"), col.vars = "Health", type = "f",
subtotals = FALSE)
## Health
## Age Sex Average Good Poor Sum
## 0-15 Female 6 6 7 19
## Male 5 7 8 20
## 16-29 Female 3 3 5 11
## Male 3 6 5 14
## 30-44 Female 6 10 7 23
## Male 8 6 3 17
## 45-64 Female 6 5 4 15
## Male 6 9 4 19
## 65+ Female 7 8 5 20
## Male 6 10 3 19
## Sum 56 70 51 177
# Calculate proportions rather than percentages...
crosstab(Survey, row.vars = "Age", col.vars = "Sex", type = "t", percentages = FALSE)
## Sex Female Male Sum
## Age
## 0-15 0.11 0.11 0.22
## 16-29 0.06 0.08 0.14
## 30-44 0.13 0.10 0.23
## 45-64 0.08 0.11 0.19
## 65+ 0.11 0.11 0.22
## Sum 0.50 0.50 1.00
# Round output to 1 decimal place...
crosstab(Survey, row.vars = "Age", col.vars = "Sex", type = "t", percentages = FALSE,
dec.places = 1)
## Sex Female Male Sum
## Age
## 0-15 0.1 0.1 0.2
## 16-29 0.1 0.1 0.1
## 30-44 0.1 0.1 0.2
## 45-64 0.1 0.1 0.2
## 65+ 0.1 0.1 0.2
## Sum 0.5 0.5 1.0
There is no limit to the dimensions crosstab() can handle, either as 'rows' or 'columns'…
# Create a table with two row and two column variables
crosstab(Survey, row.vars = c("Age", "Sex"), col.vars = c("Health", "Country"),
type = "f", addmargins = FALSE)
## Health Average Good Poor
## Country England N. Ireland Scotland Wales England N. Ireland Scotland Wales England N. Ireland Scotland Wales
## Age Sex
## 0-15 Female 1 2 1 2 3 1 0 2 1 4 1 1
## Male 3 0 1 1 1 2 1 3 3 1 0 4
## 16-29 Female 0 0 3 0 1 1 0 1 1 2 1 1
## Male 0 1 2 0 0 3 2 1 1 0 2 2
## 30-44 Female 1 2 1 2 1 5 2 2 2 2 2 1
## Male 0 3 2 3 1 4 0 1 0 1 2 0
## 45-64 Female 1 4 0 1 2 2 0 1 0 0 2 2
## Male 2 2 0 2 3 1 1 4 0 0 2 2
## 65+ Female 3 1 2 1 3 2 2 1 2 1 1 1
## Male 1 2 2 1 1 3 4 2 0 1 0 2
…although in general results will display better if only one column variable is used, with all other dimensions being declared as row variables…
# Create a table with three row and one column variable
crosstab(Survey, row.vars = c("Age", "Sex", "Health"), col.vars = c("Country"),
type = "f", addmargins = FALSE)
## Country England N. Ireland Scotland Wales
## Age Sex Health
## 0-15 Female Average 1 2 1 2
## Good 3 1 0 2
## Poor 1 4 1 1
## Male Average 3 0 1 1
## Good 1 2 1 3
## Poor 3 1 0 4
## 16-29 Female Average 0 0 3 0
## Good 1 1 0 1
## Poor 1 2 1 1
## Male Average 0 1 2 0
## Good 0 3 2 1
## Poor 1 0 2 2
## 30-44 Female Average 1 2 1 2
## Good 1 5 2 2
## Poor 2 2 2 1
## Male Average 0 3 2 3
## Good 1 4 0 1
## Poor 0 1 2 0
## 45-64 Female Average 1 4 0 1
## Good 2 2 0 1
## Poor 0 0 2 2
## Male Average 2 2 0 2
## Good 3 1 1 4
## Poor 0 0 2 2
## 65+ Female Average 3 1 2 1
## Good 3 2 2 1
## Poor 2 1 1 1
## Male Average 1 2 2 1
## Good 1 3 4 2
## Poor 0 1 0 2
…and crosstab() does flatten tables in precisely this way when suppressing subtotals…
crosstab(Survey, row.vars = c("Age", "Sex"), col.vars = c("Health", "Country"),
type = "f", subtotals = FALSE)
## Warning: WARNING: suppression of subtotals assumes only 1 col var; table
## flattened accordingly
## Country
## Age Sex Health England N. Ireland Scotland Wales Sum
## 0-15 Female Average 1 2 1 2 6
## Good 3 1 0 2 6
## Poor 1 4 1 1 7
## Male Average 3 0 1 1 5
## Good 1 2 1 3 7
## Poor 3 1 0 4 8
## 16-29 Female Average 0 0 3 0 3
## Good 1 1 0 1 3
## Poor 1 2 1 1 5
## Male Average 0 1 2 0 3
## Good 0 3 2 1 6
## Poor 1 0 2 2 5
## 30-44 Female Average 1 2 1 2 6
## Good 1 5 2 2 10
## Poor 2 2 2 1 7
## Male Average 0 3 2 3 8
## Good 1 4 0 1 6
## Poor 0 1 2 0 3
## 45-64 Female Average 1 4 0 1 6
## Good 2 2 0 1 5
## Poor 0 0 2 2 4
## Male Average 2 2 0 2 6
## Good 3 1 1 4 9
## Poor 0 0 2 2 4
## 65+ Female Average 3 1 2 1 7
## Good 3 2 2 1 8
## Poor 2 1 1 1 5
## Male Average 1 2 2 1 6
## Good 1 3 4 2 10
## Poor 0 1 0 2 3
## Sum 38 53 39 47 177
crosstab() returns a data object of class 'crosstab', comprising up to three variants of the requested table:
$table - the underlying cross-tab frequency counts [class: table]
$crosstab - the cross-tab formatted using requested style(frequency/pct) [class:table]
$crosstab.nosub - the cross-tab with subtotals suppressed class: data.frame
NOTE: $table and $crosstab are of class 'table'; $crosstab.nosub is of class 'data.frame'
The function print.crosstab() automatically prints the version of the table requested by the user, using a suitably adapted presentational style. Alternatively users can access, manipulate and present the returned elements $table, $crosstab and $crosstab.nosub as they see fit.
The returned data object also contains a record of the argument values used to produce tabular output:
$row.vars - the row variable(s) used to produce the tables
$col.vars - the column variable(s) used to produce the tables
$dec.places - the number of decimal places to which tabular output is rounded
$type - one or more table types (frequency; row.pct; col.pct; joint.pct; total.pct)
$style - style of table (long or wide) [See section 7 below for details]
$percentages - TRUE if proportions are reported instead as percentages
$addmargins - TRUE if table includes all appropriate marginals
$subtotals - FALSE if table includes only 'grand margins' (i.e. table excludes subtotals)
(a) Default provision of counts AND %s for univariate tables
A feature retained from ctab()…
crosstab(Survey, row.vars = "Age")
##
## Age Count Total %
## 0-15 39.00 22.03
## 16-29 25.00 14.12
## 30-44 40.00 22.60
## 45-64 34.00 19.21
## 65+ 39.00 22.03
## Sum 177.00 100.00
(b) Ability to request multiple table types in invocation of crosstab()
crosstab() is designed to produce one type of table at a time; if you want multiple table types it is recommended that you simply invoke the function multiple times.
That said, the ability to create multiple tables types has been carried over from ctab(), provided it doesn't involve tables with suppressed subtotals or require table marginals (which are potentially computed differently for each table type)…
crosstab(Survey, row.vars = "Age", col.vars = "Sex", type = c("f", "c"), addmargins = FALSE)
## Count Column %
## Sex Female Male Female Male
## Age
## 0-15 19.00 20.00 21.59 22.47
## 16-29 11.00 14.00 12.50 15.73
## 30-44 23.00 17.00 26.14 19.10
## 45-64 15.00 19.00 17.05 21.35
## 65+ 20.00 19.00 22.73 21.35
c) Output display format
A final carry over from ctab() is the ability, when requesting multiple table types, to change the output format ('style') from 'wide' (tables side by side) to 'long' (tables rows interleaved)…
# Style = 'wide' [default]
crosstab(Survey, row.vars = "Age", col.vars = "Sex", type = c("f", "t"), style = "wide",
addmargins = FALSE)
## Count Total %
## Sex Female Male Female Male
## Age
## 0-15 19.00 20.00 10.73 11.30
## 16-29 11.00 14.00 6.21 7.91
## 30-44 23.00 17.00 12.99 9.60
## 45-64 15.00 19.00 8.47 10.73
## 65+ 20.00 19.00 11.30 10.73
# Style = 'long'
crosstab(Survey, row.vars = "Age", col.vars = "Sex", type = c("f", "t"), style = "long",
addmargins = FALSE)
## Sex Female Male
## Age
## 0-15 Count 19.00 20.00
## Total % 10.73 11.30
## 16-29 Count 11.00 14.00
## Total % 6.21 7.91
## 30-44 Count 23.00 17.00
## Total % 12.99 9.60
## 45-64 Count 15.00 19.00
## Total % 8.47 10.73
## 65+ Count 20.00 19.00
## Total % 11.30 10.73