Surveys - Frequency distributions of multiple response questions with R

This primer employs data from the European Social Survey Round 11. Please find the questionnaire here. For variable naming conventions refer to the code book pp. 61ff.

With question C18, all 40156 survey participants were asked whether they would describe themselves as a member of a group that is discriminated against in their country of residence. Respondents who agreed were subsequently asked on what grounds their group is discriminated against (question C19). They could select all grounds that applied. Each possible answer is coded as a separate binary variable [variable name in square brackets] with the values “Marked” and “Not marked”:

Colour or race [dscrrce]
Nationality [dscrntn]
Religion [dscrrlg]
Language [dscrlng]
Ethnic group [dscretn]
Age [dscrage]
Gender [dscrgnd]
Sexuality [dscrsex]
Disability [dscrdsb]
Other [dscroth]

Our task is to compile these ten binary items (C19a…C19j) of a single multiple response question into one table that shows the grounds on which respondents feel discriminated against.
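The primer works on the full ESS data frame `df`, whose import is not shown. To follow the steps without the data, a toy data frame with the same coding may help. This sketch assumes, as the outputs below suggest, that the items are factors with the levels "Not marked" and "Marked"; the data frame `toy` and its values are invented and cover only three of the ten items.

```r
# Hypothetical toy data: 6 respondents, 3 of the 10 items, coded like the ESS items
lv <- c("Not marked", "Marked")
toy <- data.frame(
  dscrrce = factor(c("Marked", "Not marked", "Not marked", "Marked", "Not marked", "Not marked"), levels = lv),
  dscrgnd = factor(c("Marked", "Marked", "Not marked", "Not marked", "Not marked", "Not marked"), levels = lv),
  dscroth = factor(c("Not marked", "Not marked", "Marked", "Not marked", "Not marked", "Not marked"), levels = lv)
)
# Respondents 5 and 6 marked nothing, like respondents who were not asked C19
str(toy)
```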

A trivial approach

A trivial approach is to report the frequency table of each item separately:

# df is assumed to hold the ESS Round 11 data, items coded "Not marked"/"Marked"
# collect original variable names
mrnames <- c(
  "dscrrce","dscrntn","dscrrlg","dscrlng","dscretn",
  "dscrage","dscrgnd","dscrsex","dscrdsb","dscroth"
)
# provide frequency table for each answer 
sapply(df[,mrnames], table)

##            dscrrce dscrntn dscrrlg dscrlng dscretn dscrage dscrgnd dscrsex
## Not marked   39675   39575   39660   39964   39767   39772   39537   39767
## Marked         481     581     496     192     389     384     619     389
##            dscrdsb dscroth
## Not marked   39909   39521
## Marked         247     635
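If only the “Marked” counts are of interest, comparing the item columns with the string "Marked" and summing column-wise is a compact alternative to `table()`. With the primer's objects this would presumably read `colSums(df[, mrnames] == "Marked")`; the sketch below uses a small invented data frame (`df_small`, `items` are our names) so that it runs on its own.

```r
# Hypothetical miniature of the ESS items (values invented for illustration)
lv <- c("Not marked", "Marked")
df_small <- data.frame(
  dscrrce = factor(c("Marked", "Not marked", "Marked"), levels = lv),
  dscrntn = factor(c("Not marked", "Not marked", "Marked"), levels = lv)
)
items <- c("dscrrce", "dscrntn")
# Comparing a data frame with a string yields a logical matrix;
# colSums() counts the TRUE values per item
marked_counts <- colSums(df_small[, items] == "Marked", na.rm = TRUE)
marked_counts  # expected: dscrrce 2, dscrntn 1
```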

To relate the “Marked” answers to the relevant subpopulation, we select the respondents who marked at least one item (i.e. who were asked question C19 and gave a valid response). Their total count is:

# create temporary data frame for multiple response items
mr_df = df[,mrnames]
# select all cases with at least one Marked answer
mr_df = mr_df[rowSums(mr_df == "Marked") > 0,]
# save and check row count
n = nrow(mr_df)
n

## [1] 3053
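The filter keeps a row when its number of “Marked” answers is positive. The item totals above all add up to 40156, so missing values do not seem to occur here, but with other data they are a pitfall: without `na.rm = TRUE`, `rowSums()` returns `NA` for any row containing `NA`, and indexing with `NA` produces rows full of `NA`s. A minimal sketch on invented data, assuming that a missing item should count as not marked:

```r
lv <- c("Not marked", "Marked")
# Hypothetical items with missing values
toy <- data.frame(
  a = factor(c("Marked", NA, "Not marked", "Not marked"), levels = lv),
  b = factor(c("Not marked", "Marked", NA, "Not marked"), levels = lv)
)
# Number of "Marked" answers per respondent; NA items are ignored (treated as not marked)
k <- rowSums(toy == "Marked", na.rm = TRUE)
# Keep respondents with at least one marked item
sub <- toy[k > 0, ]
nrow(sub)
```

Here `k` is 1, 1, 0, 0, so the first two respondents are kept.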

Within this subpopulation, we tabulate each item again. This also shows how many of the respondents asked about the grounds of discrimination did not mark a given item:

# provide frequency table for each answer and
# save as multiple response (mr) table
mr_table = sapply(mrnames, function (x) table(mr_df[,x]))
# convert into data frame
mr_table = as.data.frame(mr_table)
# print simple table
mr_table

##            dscrrce dscrntn dscrrlg dscrlng dscretn dscrage dscrgnd dscrsex
## Not marked    2572    2472    2557    2861    2664    2669    2434    2664
## Marked         481     581     496     192     389     384     619     389
##            dscrdsb dscroth
## Not marked    2806    2418
## Marked         247     635
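Judging by the output, the items are factors, so `table()` reports both levels for every item. If the items had been imported as character vectors and one of them were never marked in the subsample, `table()` would return a single count for it and `sapply()` would fall back to returning a list. A defensive sketch, assuming the two labels used in this primer, fixes the levels and lets `vapply()` check the shape of each result (`mr_chr` and `tab` are hypothetical names):

```r
lv <- c("Not marked", "Marked")
# Hypothetical items stored as character; item b is never marked
mr_chr <- data.frame(
  a = c("Marked", "Not marked", "Marked"),
  b = c("Not marked", "Not marked", "Not marked"),
  stringsAsFactors = FALSE
)
# Tabulate each item against the fixed levels; vapply() insists on two integers per item
tab <- vapply(
  names(mr_chr),
  function(x) as.integer(table(factor(mr_chr[[x]], levels = lv))),
  integer(length(lv))
)
rownames(tab) <- lv
tab
```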

An improved version

We will build on this table to get a more detailed overview. First, we add a Total column: its “Not marked” row holds the number of respondents (n), its “Marked” row the number of marked answers they provided (sum).

# add totals (valid respondents (n), valid answers (sum))
mr_table$Total = c(n, rowSums(mr_table[2,]))
mr_table # print; comment out if not needed

##            dscrrce dscrntn dscrrlg dscrlng dscretn dscrage dscrgnd dscrsex
## Not marked    2572    2472    2557    2861    2664    2669    2434    2664
## Marked         481     581     496     192     389     384     619     389
##            dscrdsb dscroth Total
## Not marked    2806    2418  3053
## Marked         247     635  4413

Next, we transpose the table. Since t() returns a matrix, we convert the result back into a data frame.

# transpose, save as data frame again
mr_table = as.data.frame(t(mr_table))
mr_table # print; comment out if not needed

##         Not marked Marked
## dscrrce       2572    481
## dscrntn       2472    581
## dscrrlg       2557    496
## dscrlng       2861    192
## dscretn       2664    389
## dscrage       2669    384
## dscrgnd       2434    619
## dscrsex       2664    389
## dscrdsb       2806    247
## dscroth       2418    635
## Total         3053   4413

For each item, we calculate the share of “Marked” answers among all valid answers to that item. This base equals the number of respondents calculated above:

n

## [1] 3053

which equals the row-wise sum of “Marked” and “Not marked”:

mr_table[,"Marked"] + mr_table[,"Not marked"]

##  [1] 3053 3053 3053 3053 3053 3053 3053 3053 3053 3053 7466

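For every item this sum equals n; only the Total row (7466 = 3053 + 4413) adds respondents to answers and has no meaningful interpretation. The percentages can now be computed as follows: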

# add percentages
mr_table[,"% of Respondents"] = round(mr_table$Marked*100 / n, 1)
mr_table # print; comment out if not needed

##         Not marked Marked % of Respondents
## dscrrce       2572    481             15.8
## dscrntn       2472    581             19.0
## dscrrlg       2557    496             16.2
## dscrlng       2861    192              6.3
## dscretn       2664    389             12.7
## dscrage       2669    384             12.6
## dscrgnd       2434    619             20.3
## dscrsex       2664    389             12.7
## dscrdsb       2806    247              8.1
## dscroth       2418    635             20.8
## Total         3053   4413            144.5
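As formulas (the symbols are introduced here for illustration only): with n = 3053 respondents and m_j the “Marked” count of item j, the percentage column and its Total row are

```latex
p_j = 100 \cdot \frac{m_j}{n}, \qquad j = 1, \dots, 10

\sum_{j=1}^{10} p_j = 100 \cdot \frac{\sum_{j=1}^{10} m_j}{n}
                    = 100 \cdot \frac{4413}{3053} \approx 144.5
```

so the Total percentage divided by 100 is the average number of grounds reported per respondent (about 1.45).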

The table’s row names show the variable short names, which can now be replaced by their respective labels:

# replace item names
rownames(mr_table) = c(
  "colour or race",
  "nationality",
  "religion",
  "language",
  "ethnic group",
  "age",
  "gender",
  "sexuality",
  "disability",
  "other grounds",
  "Total"
)
mr_table # print; comment out if not needed

##                Not marked Marked % of Respondents
## colour or race       2572    481             15.8
## nationality          2472    581             19.0
## religion             2557    496             16.2
## language             2861    192              6.3
## ethnic group         2664    389             12.7
## age                  2669    384             12.6
## gender               2434    619             20.3
## sexuality            2664    389             12.7
## disability           2806    247              8.1
## other grounds        2418    635             20.8
## Total                3053   4413            144.5
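Assigning the labels by position relies on the rows still being in the original item order. A slightly more robust sketch looks the labels up by variable name with a named vector; the vector `labels` and the reordered table `tab` are our own constructions, using the labels listed above.

```r
# Named lookup vector: variable short name -> label
labels <- c(dscrrce = "colour or race", dscrntn = "nationality", dscrrlg = "religion",
            dscrlng = "language", dscretn = "ethnic group", dscrage = "age",
            dscrgnd = "gender", dscrsex = "sexuality", dscrdsb = "disability",
            dscroth = "other grounds", Total = "Total")
# Hypothetical table whose rows are NOT in the original item order
tab <- data.frame(Marked = c(619, 481, 4413),
                  row.names = c("dscrgnd", "dscrrce", "Total"))
# Index the lookup by the current row names, so every row keeps its own label
rownames(tab) <- unname(labels[rownames(tab)])
tab
```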

Delete the column “Not marked”, as it is redundant:

# remove "Not marked"
mr_table = mr_table[,2:3]
mr_table # print; comment out if not needed

##                Marked % of Respondents
## colour or race    481             15.8
## nationality       581             19.0
## religion          496             16.2
## language          192              6.3
## ethnic group      389             12.7
## age               384             12.6
## gender            619             20.3
## sexuality         389             12.7
## disability        247              8.1
## other grounds     635             20.8
## Total            4413            144.5
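The Total row exceeds 100% because respondents could mark several grounds: the percentages refer to respondents, not to answers. If a share of answers is wanted as well (that one sums to 100%), it can be derived from the same counts. The sketch below types the “Marked” counts in by hand from the table above so that it runs on its own; the names `pct_resp` and `pct_ans` are ours.

```r
# "Marked" counts copied from the output above (items in original order)
marked <- c(dscrrce = 481, dscrntn = 581, dscrrlg = 496, dscrlng = 192, dscretn = 389,
            dscrage = 384, dscrgnd = 619, dscrsex = 389, dscrdsb = 247, dscroth = 635)
n <- 3053                                        # respondents with at least one marked item
pct_resp <- round(marked * 100 / n, 1)           # base: respondents, as in the primer
pct_ans  <- round(marked * 100 / sum(marked), 1) # base: all 4413 marked answers
cbind(Marked = marked, pct_resp, pct_ans)
```

Because of rounding, the displayed `pct_ans` values need not add up to exactly 100.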

Finally, rank the items by their share to sharpen our message:

# get order of items, ignore Total
sort_order = order(mr_table[1:(nrow(mr_table)-1), "% of Respondents"])
# reverse order, add Total
sort_order = c(rev(sort_order), nrow(mr_table))
# rank items 
mr_table = mr_table[sort_order,]
mr_table # print; comment out if not needed

##                Marked % of Respondents
## other grounds     635             20.8
## gender            619             20.3
## nationality       581             19.0
## religion          496             16.2
## colour or race    481             15.8
## sexuality         389             12.7
## ethnic group      389             12.7
## age               384             12.6
## disability        247              8.1
## language          192              6.3
## Total            4413            144.5
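The reversal can also be expressed with `order(..., decreasing = TRUE)`. One difference concerns ties: `order()` is stable, so tied values keep their original relative order, which lists ethnic group before sexuality, whereas `rev()` above puts sexuality first. Both orderings are defensible for tied values. A sketch on the rounded percentages from the table above (the vector `pct` is typed in by hand):

```r
pct <- c(`colour or race` = 15.8, nationality = 19.0, religion = 16.2, language = 6.3,
         `ethnic group` = 12.7, age = 12.6, gender = 20.3, sexuality = 12.7,
         disability = 8.1, `other grounds` = 20.8)
# Descending order; ties (ethnic group, sexuality) keep their original relative order
ranked <- pct[order(pct, decreasing = TRUE)]
names(ranked)
```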

Now we can easily identify the top three grounds to feel discriminated against, namely other grounds (20.8%), gender (20.3%), and nationality (19.0%), while disability (8.1%) and language (6.3%) are the least frequently reported. On average, each respondent reported 1.45 grounds to feel discriminated against.
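The average is the number of marked answers divided by the number of respondents (4413 / 3053). The same value can be obtained directly from the item data by averaging the per-respondent counts, which additionally yields the distribution of the number of grounds. A sketch on invented data; with the primer's objects one would presumably apply it to `mr_df`.

```r
lv <- c("Not marked", "Marked")
# Hypothetical respondents who all marked at least one ground
toy <- data.frame(
  dscrrce = factor(c("Marked", "Not marked", "Marked"), levels = lv),
  dscrgnd = factor(c("Marked", "Marked", "Not marked"), levels = lv),
  dscroth = factor(c("Not marked", "Not marked", "Marked"), levels = lv)
)
k <- rowSums(toy == "Marked")  # number of grounds marked by each respondent
mean(k)                        # average number of grounds per respondent
table(k)                       # how many respondents marked 1, 2, ... grounds
round(4413 / 3053, 2)          # the primer's average, from the Total row counts
```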

This result can now be labeled and styled to your personal preferences, for example with the knitr and kableExtra packages:

# load table formatting packages
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), footnote()
# show formatted table
footnote(
  kable_styling(
    kable(mr_table,
          caption = paste0("Grounds to feel discriminated against (n=",n,")")
    ),
    font_size = 13, full_width = FALSE,
    bootstrap_options = c("hover", "condensed")
  ),
  c("ESS round 11, all countries
    Only respondents who describe themselves as being 
    a member of a group that is discriminated against"), general_title=""
)

Grounds to feel discriminated against (n=3053)

                Marked  % of Respondents
other grounds      635              20.8
gender             619              20.3
nationality        581              19.0
religion           496              16.2
colour or race     481              15.8
sexuality          389              12.7
ethnic group       389              12.7
age                384              12.6
disability         247               8.1
language           192               6.3
Total             4413             144.5

ESS round 11, all countries Only respondents who describe themselves as being a member of a group that is discriminated against
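For repeated use, the steps of the improved version could be bundled into one function. The sketch below is our own condensation, not the author's code; the function name `mr_freq()` and its arguments are hypothetical. It assumes the items are coded with the label "Marked", treats missing values as not marked, and ranks ties in their original order (unlike the `rev()` approach above).

```r
# Hypothetical helper: frequency table for a multiple response question
mr_freq <- function(data, items, labels = items, marked = "Marked", digits = 1) {
  # Logical matrix, one column per item (factors become character in as.matrix)
  is_marked <- as.matrix(data[, items, drop = FALSE]) == marked
  is_marked[is.na(is_marked)] <- FALSE            # assumption: NA counts as not marked
  # Keep respondents with at least one marked item
  is_marked <- is_marked[rowSums(is_marked) > 0, , drop = FALSE]
  n <- nrow(is_marked)
  if (n == 0) stop("no respondent marked any item")
  counts <- colSums(is_marked)
  totals <- c(counts, sum(counts))                # item counts plus Total row
  out <- data.frame(
    Marked = totals,
    pct = round(totals * 100 / n, digits),        # base: respondents
    row.names = c(labels, "Total")
  )
  names(out)[2] <- "% of Respondents"
  # Rank items by share (descending), keep Total last
  ord <- order(out[["% of Respondents"]][seq_along(items)], decreasing = TRUE)
  out <- out[c(ord, nrow(out)), ]
  attr(out, "n") <- n                             # number of respondents, e.g. for a caption
  out
}

# Usage on invented data
lv <- c("Not marked", "Marked")
toy <- data.frame(
  dscrrce = factor(c("Marked", "Not marked", "Not marked", "Not marked"), levels = lv),
  dscrgnd = factor(c("Marked", "Marked", "Not marked", "Not marked"), levels = lv)
)
res <- mr_freq(toy, c("dscrrce", "dscrgnd"), labels = c("colour or race", "gender"))
res
```

With the primer's data, a call such as `mr_freq(df, mrnames, labels = ...)` should reproduce the table above up to the order of tied items, but this has not been checked against the ESS file.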

Nils Mevenkamp | nils.mevenkamp@mci.edu

2025-05-15
