Visualizations

R has excellent tools for visualizing data. Data visualization is the practice of communicating insights from data through visuals such as graphs, charts, maps, and tables.

In this tutorial, we look at tables and graphics in R. For graphics, we focus on the basic functionality of base R and ggplot2.

Tables

Frequency/Percentage Tables: One Categorical Variable

Here we use the Mid-Atlantic Wage Data from the ISLR2 package. To install a package in R, use the install.packages() function. You only need to install a package once. After installation, load the package into your R session with the library() function.

Let us install and load the ISLR2 package so that we have access to the Mid-Atlantic Wage Data.

# Install the ISLR2 package (do this only once)
# install.packages("ISLR2")
# Load the package
library(ISLR2)
# Load the Mid-Atlantic Wage Data
data(Wage)

# To read the R documentation for this data set, run: ?Wage

# Names of variables in the Mid-Atlantic Wage Data
names(Wage)

Frequency Table:

To create a frequency table in R, you can use the table() function, which counts how many observations fall in each category. Create frequency tables for the variables maritl and education from the Mid-Atlantic Wage Data.

# maritl
options(width = 300) # widen the console output so the table is not wrapped
tab1 <- table(Wage$maritl)
tab1
# education
options(width = 300)
tab2 <- table(Wage$education)
tab2

Percentage Table:

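To create a percentage table in R, you can combine the table() and prop.table() functions: table() counts the observations in each category, and prop.table() converts those counts to proportions, which we multiply by 100. Create percentage tables for the variables maritl and education from the Mid-Atlantic Wage Data.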

# maritl
options(width = 300)
tab1 <- table(Wage$maritl)
pct1 <- prop.table(tab1)*100
pct1
# education
options(width = 300)
tab2 <- table(Wage$education)
pct2 <- prop.table(tab2)*100
pct2
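
The percentages returned by prop.table() can carry many decimal places, so it may help to round them and confirm that they add up to 100. The sketch below uses a small made-up vector (toy_status is a hypothetical name, not part of the Wage data) so that the arithmetic is easy to check by hand.

```r
# Hypothetical toy data: 8 observations of a categorical variable
toy_status <- factor(c("Married", "Single", "Married", "Divorced",
                       "Married", "Single", "Single", "Married"))

counts <- table(toy_status)                 # frequency table
pct <- round(prop.table(counts) * 100, 1)   # percentages, one decimal place

counts
pct
sum(pct)  # the percentages should add up to 100
```

The same two steps apply unchanged to Wage$maritl or Wage$education.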

Frequency Tables: Two Categorical Variables

We can create a two-way table (also called a cross-tabulation or contingency table) using the table() function. To add row and column totals to it, we can use the addmargins() function. Create a two-way table with row and column totals using the variables maritl and education from the Mid-Atlantic Wage Data.

options(width = 300)
tab3 <- table(Wage$maritl, Wage$education)
addmargins(tab3)
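
A two-way table of counts is often easier to compare across rows once it is converted to row percentages. The following is a minimal sketch on made-up data (toy_marital and toy_degree are hypothetical vectors, not columns of Wage); it uses prop.table() with margin = 1 so that each row sums to 100.

```r
# Hypothetical toy data: two categorical variables with 6 observations each
toy_marital <- c("Married", "Married", "Single", "Single", "Single", "Married")
toy_degree  <- c("College", "HS", "HS", "HS", "College", "HS")

tab <- table(toy_marital, toy_degree)   # two-way frequency table
with_totals <- addmargins(tab)          # adds a "Sum" row and a "Sum" column
with_totals

# Row percentages: margin = 1 divides each cell by its row total
row_pct <- prop.table(tab, margin = 1) * 100
round(row_pct, 1)
```

Using margin = 2 instead would give column percentages.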

R Base Graphics

R has built-in plot functions such as:

  • Histogram: hist() function.

  • Density Plot: plot() function applied to the output of density().

  • Boxplot: boxplot() function.

  • Scatterplot: plot() function.

  • Bar Graph: barplot() function.

  • Pie Chart: pie() function.

Let’s continue to use the Mid-Atlantic Wage Data from the ISLR2 package.

Frequency Histogram: Wage

hist(Wage$wage, 
     xlab = "Class Boundaries (Wage)", 
     ylab = "Frequency", 
     main = "Frequency Histogram: Wage", 
     ylim = c(0, 1500), 
     freq = TRUE, 
     col = "green", 
     breaks = 6) # suggested number of bins; R picks "pretty" cut points near that number
grid()

Relative Frequency Histogram: Wage

hist(Wage$wage, 
     xlab = "Class Boundaries (Wage)", 
     ylab = "Relative Frequency", 
     main = "Relative Frequency Histogram: Wage", 
     ylim = c(0, 0.010), 
     freq = FALSE, # plot densities (bar areas sum to 1) instead of counts
     col = "green", 
     breaks = 6) # suggested number of bins; R picks "pretty" cut points near that number
grid()
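
With freq = FALSE, hist() draws densities rather than relative frequencies; the two differ by a factor of the bin width. The sketch below, on a made-up vector x (not the Wage data), calls hist() with plot = FALSE to inspect the numbers behind the bars and to recover the relative frequencies from them.

```r
# Hypothetical toy data (not the Wage data)
x <- c(1, 2, 2, 3, 5, 8, 13, 21, 34, 55)

# plot = FALSE returns the histogram's numbers without drawing it
h <- hist(x, breaks = 6, plot = FALSE)

h$breaks   # the cut points R actually chose
h$counts   # frequency in each bin (bar heights when freq = TRUE)
h$density  # bar heights when freq = FALSE

# Relative frequency of each bin = count / n = density * bin width
rel_freq  <- h$counts / length(x)
bin_width <- diff(h$breaks)
all.equal(rel_freq, h$density * bin_width)
sum(rel_freq)  # relative frequencies add up to 1
```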

Density: Wage

plot(density(Wage$wage), 
     col = "blue",
     xlab = "Wage", 
     main = "Density: Wage")
grid()
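
The curve drawn by plot(density(...)) is a kernel density estimate, so the area under it should be close to 1. As a rough check, the sketch below approximates that area on a made-up vector x (hypothetical data) and then redraws the curve with a larger bandwidth through the adjust argument.

```r
# Hypothetical toy data (not the Wage data)
x <- c(12, 15, 18, 21, 22, 25, 30, 34, 41, 58)

d <- density(x)   # Gaussian kernel with the default bandwidth
d$bw              # the bandwidth that was chosen

# Approximate the area under the curve with a Riemann sum
# over the equally spaced evaluation grid in d$x
area <- sum(d$y) * (d$x[2] - d$x[1])
area              # should be close to 1

# adjust = 2 doubles the bandwidth and gives a smoother curve
plot(density(x, adjust = 2), col = "blue", main = "Density with adjust = 2")
```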

Horizontal Boxplot: Wage

boxplot(Wage$wage, 
        xlab = "Wage", 
        ylab = "",
        main = "Boxplot", 
        horizontal = TRUE, 
        col = "green")
grid()

Vertical Boxplot: Wage

boxplot(Wage$wage, 
        xlab = "", 
        ylab = "Wage",
        main = "Boxplot", 
        horizontal = FALSE, 
        col = "green")
grid()
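
To see the numbers a boxplot is drawn from, base R provides boxplot.stats(). The sketch below applies it to a made-up vector x (hypothetical data) that contains one extreme value, so that the whisker ends and the outlier can be told apart.

```r
# Hypothetical toy data with one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)

bs <- boxplot.stats(x)
bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # points beyond 1.5 times the box length, drawn individually

boxplot(x, horizontal = TRUE, col = "green", main = "Boxplot of toy data")
```

Here the upper whisker stops at the largest value that is not flagged as an outlier, which is why it does not reach 50.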

Horizontal Boxplot: Wage vs Education

boxplot(Wage$wage ~ Wage$education, 
        xlab = "Wage", 
        ylab = "Education",
        main = "Boxplot: Wage vs Education", 
        horizontal = TRUE, 
        col = "blue")
grid()

Vertical Boxplot: Wage vs Education

boxplot(Wage$wage ~ Wage$education, 
        xlab = "Education", 
        ylab = "Wage",
        main = "Boxplot: Wage vs Education", 
        horizontal = FALSE, 
        col = "purple")
grid()
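
The grouped boxplots above repeat the data frame name in the formula. An alternative, sketched here on a made-up data frame (toy, pay, and group are hypothetical names), is to pass the data frame through the data argument so the formula can refer to columns directly. The value returned by boxplot() also holds the group labels and sizes.

```r
# Hypothetical toy data frame with a numeric and a categorical column
toy <- data.frame(
  pay   = c(10, 12, 14, 20, 25, 30),
  group = factor(c("A", "A", "A", "B", "B", "B"))
)

# data = toy lets the formula use column names without the toy$ prefix
b <- boxplot(pay ~ group, data = toy,
             xlab = "Group", ylab = "Pay",
             main = "Boxplot: Pay vs Group", col = "lightblue")

b$names  # group labels, in plotting order
b$n      # number of observations in each group
```

With the Wage data the equivalent call would be boxplot(wage ~ education, data = Wage).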

Scatter Plot: Wage vs Age

plot(Wage$age, Wage$wage,
     xlab = "Age", ylab = "Wage", main = "Mid-Atlantic Wage Data: Wage vs Age",
     pch = 19, col = "blue")
grid()

Horizontal Bar Graph: Education

frequency <- table(Wage$education)
barplot(frequency, 
        xlab = "Frequency", 
        ylab = "Education",
        horiz = TRUE,
        main = "Bar Graph: Education",
        col = "blue",
        xlim = c(0, 1200))

Vertical Bar Graph: Education

frequency <- table(Wage$education)
barplot(frequency, 
        xlab = "Education", 
        ylab = "Frequency",
        horiz = FALSE,
        main = "Bar Graph: Education",
        col = "blue",
        ylim = c(0, 1200))
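
Bars are easier to compare when they are ordered by height and labeled with their counts. The following sketch uses a made-up vector (toy_degree is hypothetical, not a column of Wage); it sorts the frequency table before plotting and uses the bar midpoints returned by barplot() to place the labels.

```r
# Hypothetical toy data: 6 observations of a categorical variable
toy_degree <- c("HS", "College", "HS", "Grad", "HS", "College")

frequency <- table(toy_degree)
sorted <- sort(frequency, decreasing = TRUE)  # tallest bar first

# barplot() returns the x positions of the bar midpoints
mids <- barplot(sorted,
                xlab = "Education", ylab = "Frequency",
                main = "Bar Graph: Education (sorted)",
                col = "blue", ylim = c(0, 4))

# Write each count just above its bar
text(x = mids, y = as.vector(sorted), labels = as.vector(sorted), pos = 3)
```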

Pie Chart: Education

frequency <- table(Wage$education)
pie(frequency, 
    col = rainbow(5),
    clockwise = TRUE, 
    main = "Pie Chart: Education")
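
A pie chart shows shares, so labeling each slice with its percentage can make it easier to read. The sketch below builds such labels on made-up data (toy_degree and slice_labels are hypothetical names) and passes them to the labels argument of pie().

```r
# Hypothetical toy data: 6 observations of a categorical variable
toy_degree <- c("HS", "College", "HS", "Grad", "HS", "College")

frequency <- table(toy_degree)
pct <- round(100 * prop.table(frequency))   # whole-number percentages

# Combine each category name with its percentage
slice_labels <- paste0(names(frequency), " (", pct, "%)")
slice_labels

pie(frequency,
    labels = slice_labels,
    col = rainbow(length(frequency)),
    clockwise = TRUE,
    main = "Pie Chart: Education (toy data)")
```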

ggplot2

ggplot2 is a plotting system developed by Hadley Wickham in 2005 and based on the Grammar of Graphics. It makes complicated graphs easier to create by composing them from simple components.

ggplot graphs are built layer by layer by adding new elements. The ggplot() function uses the following basic syntax for different types of graphs:

ggplot(<DATA>, mapping = aes(<MAPPINGS>)) +
  <GEOM_FUNCTION>()

  • DATA: Data set containing the variables to be used for plotting.

  • aes(): Short for “aesthetics”. A function that maps variables in the data to plot properties such as position, color, shape, and size.

  • GEOM_FUNCTION: Defines how the data is to be represented in the plot. Popular GEOM_FUNCTIONS include:

    • geom_point() for scatter plots.

    • geom_boxplot() for boxplots.

    • geom_histogram() for histograms.

    • geom_bar() for bar graphs.

  • To add a GEOM_FUNCTION to the plot, we use the + operator.

    • The + operator can also be used to add other layers such as labs() to the plot.

  • We can install and load the ggplot2 package on its own or as part of the tidyverse collection of packages.

Install and Load ggplot2

# Install ggplot2
# install.packages("ggplot2") # Do this once
# or install the entire tidyverse collection, which includes ggplot2
# install.packages("tidyverse") # Do this once
# Load ggplot2
library(ggplot2)
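
As one concrete instance of the ggplot() template described above, the sketch below fills in DATA, the aes() mappings, and a geom function to draw a scatter plot with geom_point(). The data frame toy and its columns are made up for illustration and are not part of the Wage data.

```r
library(ggplot2)

# Hypothetical toy data frame
toy <- data.frame(
  age    = c(22, 30, 35, 41, 48, 55),
  pay    = c(60, 85, 95, 110, 120, 105),
  degree = c("HS", "College", "HS", "College", "Grad", "Grad")
)

# DATA = toy, MAPPINGS = x, y and color, GEOM_FUNCTION = geom_point()
scatter <- ggplot(toy, mapping = aes(x = age, y = pay, color = degree)) +
  geom_point(size = 3) +
  labs(title = "Scatter Plot: Pay vs Age", x = "Age", y = "Pay") +
  theme_bw()
scatter
```

Swapping geom_point() for another geom function changes the type of graph while the data and labels stay the same. With the Wage data, the mapping would be aes(x = age, y = wage).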


Let’s continue to use the Mid-Atlantic Wage Data from the ISLR2 package.

Frequency Histogram: Wage

# Named wage_hist so the object does not share a name with base R's hist() function
wage_hist <- ggplot(Wage, aes(x = wage)) +
  geom_histogram(binwidth = 30, fill = "blue", color = "black") +
  labs(title = "Histogram: Wage", x = "Class Boundaries (Wage)", y = "Frequency") +
  theme_bw()
wage_hist

Relative Frequency Histogram: Wage

# after_stat(density) replaces the older ..density.. notation, which is deprecated as of ggplot2 3.4.0
rf_hist <- ggplot(Wage, aes(x = wage)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 20, fill = "brown", color = "blue") +
  labs(title = "Relative Frequency Histogram: Wage", x = "Class Boundaries (Wage)", y = "Relative Frequency") +
  theme_minimal()
rf_hist
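
With after_stat(density), the bar heights are densities, which equal relative frequencies only when the bin width is 1. If the heights themselves should be relative frequencies, one option is to divide each bin count by the total count inside after_stat(). This is a minimal sketch on made-up data (toy is hypothetical), assuming ggplot2 3.3.0 or later for after_stat().

```r
library(ggplot2)

# Hypothetical toy data (not the Wage data)
toy <- data.frame(x = c(1, 2, 2, 3, 5, 8, 13, 21, 34, 55))

# count / sum(count) gives the share of observations in each bin
rel <- ggplot(toy, aes(x = x)) +
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 binwidth = 10, boundary = 0,
                 fill = "brown", color = "blue") +
  labs(title = "Relative Frequency Histogram (toy data)",
       x = "Class Boundaries", y = "Relative Frequency") +
  theme_minimal()
rel

# layer_data() returns the computed bar heights; they should add up to 1
bars <- layer_data(rel)
sum(bars$y)
```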

Density: Wage

dens <- ggplot(Wage, aes(x = wage)) +
  geom_density(fill = "blue", color = "black") +
  labs(title = "Density: Wage", x = "Wage", y = "Density") +
  theme_minimal()
dens

Horizontal Boxplot: Wage

bxplt <- ggplot(Wage, aes(x = wage)) +
  geom_boxplot(fill = "lightblue", color = "black") +
  labs(title = "Horizontal Boxplot: Wage", x = "Wage") +
  theme_minimal()
bxplt

Vertical Boxplot: Wage

bxplt <- ggplot(Wage, aes(y = wage)) +
  geom_boxplot(fill = "lightblue", color = "black") +
  labs(title = "Vertical Boxplot: Wage", y = "Wage") +
  theme_minimal()
bxplt

Vertical Boxplot: Wage vs Education

bxplt <- ggplot(Wage, aes(x = education, y = wage)) +
  geom_boxplot(outlier.colour = "black") +
  labs(title = "Vertical Boxplot: Wage vs Education", x = "Education", y = "Wage")
bxplt

Horizontal Boxplot: Wage vs Education

bxplt <- ggplot(Wage, aes(x = wage, y = education)) +
  geom_boxplot(outlier.colour = "black") +
  labs(title = "Horizontal Boxplot: Wage vs Education", x = "Wage", y = "Education")
bxplt

  1. Southeast Missouri State University