Introduction

This code through explores how to compute descriptive statistics for qualitative (categorical) data using packages in R. The major packages of concern include dplyr, gtsummary, ggplot2, scales, pander, tidyr and kableExtra.

Content Overview

Specifically, it focuses on how to compute frequency distributions and proportions (including within-group percentages), and how to explore relationships between variables using visualization.

Why You Should Care

Statistics and visualizations are useful for understanding relationships among or between variables. For example, a correlation coefficient can tell you the strength and direction of the association between two variables, while a regression coefficient estimates how much an outcome changes with a predictor (on its own, it does not establish the direction of causation). These statistics can also be shown graphically, for example with bar charts, histograms and pie charts.

Equipped with this knowledge in R, you will be able to explore and understand patterns of relationships and associations between and among variables in your dataset.

Learning Objectives

In this piece, you’ll learn how to do the following:

* How to compute frequency distributions and simple percentages;
* How to use bar charts to visualize categorical variables using counts and in-group proportions;
* How to put your table in a presentation-ready format using the gtsummary package.

Loading Packages and Data

Here, we’ll show how to create presentation-ready tables and compute in-group proportions. First, we load the packages and the sample dataset to be used.

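# LOAD PACKAGES

library(dplyr)      # data manipulation: select(), group_by(), summarise()
library(gtsummary)  # presentation-ready summary tables
library(magrittr)   # the %>% pipe
library(backports)
library(pander)
library(tidyr)
library(reshape2)
library(scales)     # percent labels for plot axes
library(ggplot2)    # graphs

library(readxl)     # read_excel()

# LOAD DATA (adjust the path to where the file is stored on your machine)

CodeData <- read_excel("C:/Users/seyin/OneDrive - Georgia State University/R Summer Class/Assignments/Code_through_Data.xlsx")

View(CodeData)      # opens the data viewer in an interactive session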

Further Exposition

This is based on the discussions (see reference list) on creating presentation-ready summary statistics tables, visualization using ggplot2, and data manipulation with the dplyr package.

Basic Example - Frequency Distribution and Proportions

A basic example shows how to compute a frequency table with percentages and present the result in a publishable format.

Using base R, without the gtsummary package, the table looks like this:

table(CodeData$Gender)

## 
## Female   Male 
##     12     14
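
To get percentages as well as counts in base R, the table can be passed to prop.table(). The sketch below is a minimal, self-contained illustration: the vector `gender` is a stand-in rebuilt from the counts printed above (12 Female, 14 Male), not the original CodeData object, and the other object names are chosen only for this example.

```r
# Stand-in for CodeData$Gender, rebuilt from the counts printed above
# (assumption: the real column holds exactly these two labels).
gender <- c(rep("Female", 12), rep("Male", 14))

counts <- table(gender)        # frequency distribution
props  <- prop.table(counts)   # each count divided by the total (26)
pct    <- round(100 * props)   # whole-number percentages

# Combine counts and percentages into one small summary table
freq_table <- data.frame(
  Gender  = names(counts),
  n       = as.vector(counts),
  percent = as.vector(pct)
)
print(freq_table)
```

The rounded percentages (46 and 54) agree with the n (%) values in the gtsummary table for Gender in this section.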

With the gtsummary package, the table looks cleaner and is close to publishable. The code is:

CodeData %>%
  select(Gender) %>%
  tbl_summary()

Characteristic	N = 26¹
Gender
Female	12 (46%)
Male	14 (54%)
¹ Statistics presented: n (%)

By adding a few more function calls, we can add a column for the number of non-missing observations (N) and make the variable labels bold.

CodeData %>%
  select(Gender) %>%
  tbl_summary() %>%
  add_n() %>%
  bold_labels()

Characteristic	N	N = 26¹
Gender	26
Female		12 (46%)
Male		14 (54%)
¹ Statistics presented: n (%)

We can work with as many variables as needed. In this case, two variables are used - Gender and Food choice.

CodeData %>%
  select(Gender, `Food choice`) %>%
  tbl_summary() %>%
  add_n() %>%
  bold_labels()

Characteristic	N	N = 26¹
Gender	26
Female		12 (46%)
Male		14 (54%)
Food choice	26
Amala		3 (12%)
Beans		8 (31%)
Eba		9 (35%)
Rice		6 (23%)
¹ Statistics presented: n (%)

Can you see the difference between the base R output and the gtsummary tables? With this, your tables are ready for presentation.

Visualizations - Graphs using ggplot2

A graph showing frequency counts

CodeData %>%
  ggplot(mapping = aes(x = `Food choice`)) +
  geom_bar()

We can add a title and a caption to the graph.

CodeData %>%
  ggplot(mapping = aes(x = `Food choice`)) +
  geom_bar() +
  labs(title = "Food preference by counts",
       caption = "Source: Adedotun's Code through")

We can also flip the coordinates so that the food choices run along the vertical axis and the counts along the horizontal axis.

CodeData %>%
  ggplot(mapping = aes(x = `Food choice`)) +
  geom_bar() +
  labs(title = "Food preference by counts",
       caption = "Source: Adedotun's Code through") +
  coord_flip()

We can then have the y-axis show each category's share of the total. The scales package helps with displaying percentages on the axis, as shown in this example.

ggplot(CodeData, aes(x = `Food choice`)) +
  # after_stat(count) replaces the deprecated ..count.. notation (needs ggplot2 >= 3.3.0)
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  xlab("Food choice") +
  scale_y_continuous(labels = scales::percent, name = "Proportion") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
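
The y aesthetic in the geom_bar() call divides each category's count by the total count. As a rough check of what the bars should show, the same arithmetic can be done by hand. The counts below are copied from the Food choice summary table earlier in this code through; the object names are illustrative, and sprintf() is used here only as a base R stand-in for the percent formatting that the scales package applies to the axis.

```r
# Counts per food choice, copied from the summary table shown earlier
food_counts <- c(Amala = 3, Beans = 8, Eba = 9, Rice = 6)

# Same calculation the plot performs: count / sum(count)
food_props <- food_counts / sum(food_counts)

# Format as whole-number percentages for display
food_labels <- sprintf("%.0f%%", 100 * food_props)
names(food_labels) <- names(food_counts)
print(food_labels)
```

The four proportions sum to 1, and the formatted values (12%, 31%, 35%, 23%) match the percentages in the summary table.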

Advanced Examples - In-Group Proportions (Stack and Dodge)

The same approach can be used for in-group proportions, which help in exploring a pattern within specific demographic groups. For example, we can look at food choice by gender.

# Count each Gender / Food choice combination, then convert to within-gender proportions
CD <- CodeData %>%
  group_by(Gender, `Food choice`) %>%
  summarise(n = n()) %>%        # result stays grouped by Gender
  mutate(prop = n / sum(n))     # so sum(n) is the total for each gender

ggplot(CD, aes(`Food choice`, prop, fill = Gender)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Food preference by Gender (Percentages)",
       caption = "Source: Adedotun's Code through") +
  scale_y_continuous(labels = scales::percent, name = "Proportion") +
  coord_flip()
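
The group_by() / summarise() / mutate() pipeline gives proportions within each gender because, after summarise(), the result is still grouped by Gender only. A base R cross-check of the same idea is sketched below. The data frame `toy` is hypothetical (the joint distribution of Gender and Food choice in CodeData is not shown in this code through), and prop.table() with margin = 1 is a swapped-in base R technique, not the dplyr approach used above.

```r
# Hypothetical data: NOT the real CodeData, just enough rows to illustrate
toy <- data.frame(
  Gender = c("Female", "Female", "Female", "Female",
             "Male", "Male", "Male", "Male", "Male"),
  Food   = c("Beans", "Beans", "Eba", "Rice",
             "Eba", "Eba", "Rice", "Beans", "Eba")
)

# Two-way frequency table: rows are genders, columns are food choices
xtab <- table(toy$Gender, toy$Food)

# margin = 1 divides each cell by its row total,
# giving proportions within each gender
in_group <- prop.table(xtab, margin = 1)
print(round(in_group, 2))

# Each gender's proportions sum to 1
print(rowSums(in_group))
```

If the row sums are all 1, the proportions are within-gender shares rather than shares of the whole sample, which is what the dodged bar chart is meant to display.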

The same summarised data can also be used to create component (stacked) bar plots.

# Same within-gender proportions as before
CD <- CodeData %>%
  group_by(Gender, `Food choice`) %>%
  summarise(n = n()) %>%
  mutate(prop = n / sum(n))

# position = "stack" places the genders on top of each other within each bar
ggplot(CD, aes(`Food choice`, prop, fill = Gender)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "Food preference by Gender (Percentages)",
       caption = "Source: Adedotun's Code through") +
  scale_y_continuous(labels = scales::percent, name = "Proportion") +
  coord_flip()

Further Resources

Learn more about gtsummary, dplyr and ggplot2 with the following:

Resource I ddsjoberg / gtsummary
Resource II Relative frequencies / proportions with dplyr
Resource III ggplot2 - Multi-group histogram with in-group proportions rather than frequency

Works Cited

This code through references and cites the following sources:

Boehmke, B. (2018). Categorical Data Descriptive Statistics
Sjoberg, D., Hannum, M., Whiting, K. (2020). Presentation-Ready Summary Tables with gtsummary

Qualitative Data Descriptives Using R

Adedotun Seyingbo

27 July 2020