R has excellent tools for visualizing data. Data visualization is the practice of conveying insights from data through visuals such as graphs, charts, maps, and tables.
In this tutorial, we look at tables and graphics in R. For the
graphics, we focus on the basic functionality of
base R and ggplot2.
Here we use the Mid-Atlantic Wage Data from the
ISLR2 package. To install packages in R, you can use the
install.packages() function. Remember that you only
need to install a package once. After installation, you can
load the package into your R session using the
library() function.
Let us install and load the
ISLR2 package so that we can have access to the
Mid-Atlantic Wage Data.
# Install the ISLR2 package (do this only once)
# install.packages("ISLR2")
# Load the package
library(ISLR2)
# Load the Mid-Atlantic Wage Data
data(Wage)
# To get the R Documentation regarding this data, run this code: ?Wage
# Names of variables in the Mid-Atlantic Wage Data
names(Wage)
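The install-once, load-every-session pattern described above can also be automated. The following is a minimal sketch, assuming a CRAN mirror is configured; load_or_install() is a hypothetical helper name used only for illustration and is not part of base R or ISLR2.

```r
# Hypothetical helper (not from any package): install a package only when it
# is missing, then attach it to the current session.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # assumes a CRAN mirror is configured
  }
  library(pkg, character.only = TRUE)
}

load_or_install("ISLR2")
data(Wage)

# Quick look at the data before tabulating or plotting
dim(Wage)  # number of rows and columns
str(Wage)  # type of each variable (maritl and education are factors)
```

Calling str() first is a cheap way to confirm which variables are factors, since those are the ones suited to table().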
Frequency Table:
To create a frequency table in R, you can use the
table() function. Create frequency tables for the
variables maritl (marital status) and education from the
Mid-Atlantic Wage Data.
# Widen the console output so each table prints on a single row
options(width = 300)
# maritl
tab1 <- table(Wage$maritl)
tab1
# education
tab2 <- table(Wage$education)
tab2
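The object returned by table() holds named counts, so it can be sorted or converted to a data frame for later use. A brief sketch, assuming the ISLR2 package is installed; the names edu_tab, edu_sorted, and edu_df are illustrative only.

```r
library(ISLR2)
data(Wage)

# Frequency table for education
edu_tab <- table(Wage$education)

# Sort categories from most to least frequent
edu_sorted <- sort(edu_tab, decreasing = TRUE)
edu_sorted

# Convert the table to a data frame with descriptive column names
edu_df <- as.data.frame(edu_tab, stringsAsFactors = FALSE)
names(edu_df) <- c("education", "count")
edu_df
```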
Percentage Table:
To create a percentage table in R, you can combine the
table() and prop.table() functions: prop.table() converts the counts
to proportions, which we then multiply by 100. Create
percentage tables for the variables maritl and
education from the Mid-Atlantic Wage Data.
# maritl
options(width = 300)
tab1 <- table(Wage$maritl)
pct1 <- prop.table(tab1)*100
pct1
# education
options(width = 300)
tab2 <- table(Wage$education)
pct2 <- prop.table(tab2)*100
pct2
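Raw percentages print with many decimal places, so rounding usually makes the table easier to read. A minimal sketch, assuming ISLR2 is installed; marital_pct is an illustrative name.

```r
library(ISLR2)
data(Wage)

# Percentage table for marital status
marital_pct <- prop.table(table(Wage$maritl)) * 100

# Round to one decimal place for display
round(marital_pct, 1)

# Sanity check: the unrounded percentages add up to 100
# (up to floating-point rounding)
sum(marital_pct)
```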
Two-Way Table:
To create a two-way table (also called a cross-tabulation
or contingency table), pass two variables to the
table() function. To add row and column totals to the
cross-tabulation table, we can use the addmargins()
function. Create a two-way table and include the row and column totals
using the variables maritl and education from
the Mid-Atlantic Wage Data.
options(width = 300)
tab3 <- table(Wage$maritl, Wage$education)
addmargins(tab3)
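The percentage-table idea carries over to two-way tables through the margin argument of prop.table(). A short sketch, assuming ISLR2 is installed; row_pct and col_pct are illustrative names.

```r
library(ISLR2)
data(Wage)

tab3 <- table(Wage$maritl, Wage$education)

# margin = 1: percentages within each row (each marital status sums to 100)
row_pct <- prop.table(tab3, margin = 1) * 100
round(row_pct, 1)

# margin = 2: percentages within each column (each education level sums to 100)
col_pct <- prop.table(tab3, margin = 2) * 100
round(col_pct, 1)
```

Row percentages answer "within a marital status, how is education distributed?", while column percentages answer the reverse question.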
R has built-in plot functions such as:
Histogram: hist() function.
Density Plot: plot() function applied to the output of density().
Boxplot: boxplot() function.
Scatterplot: plot() function.
Bar Graph: barplot() function.
Pie Chart: pie() function.
Let’s continue to use the Mid-Atlantic Wage Data from
the ISLR2 package.
hist(Wage$wage,
xlab = "Class Boundaries (Wage)",
ylab = "Frequency",
main = "Frequency Histogram: Wage",
ylim = c(0, 1500),
freq = TRUE,
col = "green",
breaks = 6) # suggested number of bars; R picks “pretty” cut points near that number
grid()
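# freq = FALSE rescales the bar heights to densities
# (relative frequency divided by bin width), so the bar areas sum to 1
hist(Wage$wage,
xlab = "Class Boundaries (Wage)",
ylab = "Relative Frequency",
main = "Relative Frequency Histogram: Wage",
ylim = c(0, 0.010),
freq = FALSE,
col = "green",
breaks = 6) # suggested number of bars; R picks “pretty” cut points near that number
grid()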
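To see what breaks and freq = FALSE actually do, it can help to inspect the object that hist() returns. The sketch below, assuming ISLR2 is installed, checks that the density bars have total area 1 and then draws bars whose heights are true relative frequencies; drawing them with barplot() is one possible approach chosen for illustration, and the names h, bin_width, and rel_freq are illustrative.

```r
library(ISLR2)
data(Wage)

# Compute the histogram without drawing it
h <- hist(Wage$wage, breaks = 6, plot = FALSE)

h$breaks                     # the "pretty" cut points R actually chose
bin_width <- diff(h$breaks)  # width of each bin

# With freq = FALSE the bar heights are densities: areas sum to 1
sum(h$density * bin_width)

# True relative frequencies: heights (not areas) sum to 1
rel_freq <- h$counts / sum(h$counts)
sum(rel_freq)

# Draw the relative frequencies as adjacent bars
barplot(rel_freq, names.arg = h$mids, space = 0, col = "green",
        xlab = "Class Midpoints (Wage)", ylab = "Relative Frequency",
        main = "Relative Frequency Histogram: Wage")
```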
plot(density(Wage$wage),
col = "blue",
xlab = "Wage",
main = "Density: Wage")
grid()
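A density curve is often easier to interpret when drawn over a histogram on the same scale. A minimal sketch, assuming ISLR2 is installed; the choice of 20 breaks and the name wage_density are illustrative.

```r
library(ISLR2)
data(Wage)

# Histogram on the density scale so it is comparable with the curve
hist(Wage$wage,
     freq = FALSE,
     breaks = 20,
     col = "lightgray",
     xlab = "Wage",
     main = "Histogram with Density Curve: Wage")

# Kernel density estimate, added on top of the histogram
wage_density <- density(Wage$wage)
lines(wage_density, col = "blue", lwd = 2)
```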
boxplot(Wage$wage,
xlab = "Wage",
ylab = "",
main = "Boxplot",
horizontal = TRUE,
col = "green")
grid()
boxplot(Wage$wage,
xlab = "",
ylab = "Wage",
main = "Boxplot",
horizontal = FALSE,
col = "green")
grid()
boxplot(Wage$wage ~ Wage$education,
xlab = "Wage",
ylab = "Education",
main = "Boxplot: Wage vs Education",
horizontal = TRUE,
col = "blue")
grid()
boxplot(Wage$wage ~ Wage$education,
xlab = "Education",
ylab = "Wage",
main = "Boxplot: Wage vs Education",
horizontal = FALSE,
col = "purple")
grid()
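The numbers behind each box can be retrieved from boxplot() itself. The sketch below, assuming ISLR2 is installed, uses the data argument so that the formula can refer to column names directly; bp is an illustrative name.

```r
library(ISLR2)
data(Wage)

# Compute the boxplot statistics without drawing the plot
bp <- boxplot(wage ~ education, data = Wage, plot = FALSE)

# One column per education level; rows are lower whisker, lower hinge,
# median, upper hinge, and upper whisker
bp$stats

bp$names  # group labels, in the same order as the columns of bp$stats
bp$n      # number of observations in each group
```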
plot(Wage$age, Wage$wage,
xlab = "Age", ylab = "Wage", main = "Mid-Atlantic Wage Data: Wage vs Age",
pch = 19, col = "blue")
grid()
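With 3000 points, the overall relationship between age and wage can be hard to see. One option is to add a smoothed trend line with lowess(); this is a sketch under that assumption rather than the only way to summarize the trend, and it assumes ISLR2 is installed.

```r
library(ISLR2)
data(Wage)

# Scatterplot in a lighter color so the trend line stands out
plot(Wage$age, Wage$wage,
     xlab = "Age", ylab = "Wage",
     main = "Mid-Atlantic Wage Data: Wage vs Age",
     pch = 19, col = "gray")

# Locally weighted smoother; returns x (sorted) and the smoothed y values
trend <- lowess(Wage$age, Wage$wage)
lines(trend, col = "red", lwd = 2)
grid()
```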
frequency <- table(Wage$education)
barplot(frequency,
xlab = "Frequency",
ylab = "Education",
horiz = TRUE,
main = "Bar Graph: Education",
col = "blue",
xlim = c(0, 1200))
frequency <- table(Wage$education)
barplot(frequency,
xlab = "Education",
ylab = "Frequency",
horiz = FALSE,
main = "Bar Graph: Education",
col = "blue",
ylim = c(0, 1200))
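barplot() invisibly returns the midpoint of each bar, which makes it possible to print the counts above the bars. A minimal sketch, assuming ISLR2 is installed; bar_mid is an illustrative name.

```r
library(ISLR2)
data(Wage)

frequency <- table(Wage$education)

# Keep the bar midpoints returned by barplot()
bar_mid <- barplot(frequency,
                   xlab = "Education",
                   ylab = "Frequency",
                   main = "Bar Graph: Education",
                   col = "blue",
                   ylim = c(0, 1200))

# Write each count just above its bar (pos = 3 means "above")
text(x = bar_mid, y = as.vector(frequency),
     labels = as.vector(frequency), pos = 3)
```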
frequency <- table(Wage$education)
pie(frequency,
col = rainbow(5),
clockwise = TRUE,
main = "Pie Chart: Education")
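Pie slices are easier to compare when each label also shows its percentage. A short sketch, assuming ISLR2 is installed; pct and slice_labels are illustrative names.

```r
library(ISLR2)
data(Wage)

frequency <- table(Wage$education)

# Percentage of observations in each education level
pct <- round(as.vector(prop.table(frequency)) * 100, 1)

# Labels of the form "<level> (<percent>%)"
slice_labels <- paste0(names(frequency), " (", pct, "%)")

pie(frequency,
    labels = slice_labels,
    col = rainbow(length(frequency)),
    clockwise = TRUE,
    main = "Pie Chart: Education")
```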
ggplot2 is a plotting system developed by
Hadley Wickham in 2005. It makes it easy to create
complicated graphs.
ggplot graphs are built layer by layer by adding new
elements. The ggplot() function uses the following basic
syntax for different types of graphs:
ggplot(<DATA>, mapping = aes(<MAPPINGS>)) +
<GEOM_FUNCTION>()
DATA: Data set containing the variables to be used
for plotting.
aes: Short for “aesthetics”. A function that maps
variables to plot characteristics such as position,
color, shape, and size.
GEOM_FUNCTION: Defines how the data is to be
represented in the plot. Popular GEOM_FUNCTIONS include:
geom_point() for scatter plots.
geom_boxplot() for boxplots.
geom_histogram() for histograms.
geom_bar() for bar graphs.
To add a GEOM_FUNCTION to the plot, we use the
+ operator.
The + operator can also be used to add other layers
such as labs() to the plot.
We can install and load the ggplot2 package on its own or as part of the
tidyverse collection of packages.
# Install ggplot2
# install.packages("ggplot2") # Do this once
# or you can install the entire tidyverse collection, which includes ggplot2
# install.packages("tidyverse") # Do this once
# Load ggplot2
library(ggplot2)
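As a concrete instance of the template above, the following sketch fills in DATA, MAPPINGS, and GEOM_FUNCTION to produce a scatter plot with geom_point(). It assumes both ISLR2 and ggplot2 are installed; the object name scatter is illustrative.

```r
library(ISLR2)
library(ggplot2)
data(Wage)

# DATA = Wage, MAPPINGS = age on x and wage on y, GEOM_FUNCTION = geom_point()
scatter <- ggplot(Wage, mapping = aes(x = age, y = wage)) +
  geom_point(color = "blue") +
  labs(title = "Mid-Atlantic Wage Data: Wage vs Age",
       x = "Age", y = "Wage") +
  theme_bw()

scatter
```

Each + adds one layer: the points, then the labels, then the theme.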
Let’s continue to use the Mid-Atlantic Wage Data from the
ISLR2 package.
# Named hist_plot to avoid confusion with the base R hist() function
hist_plot <- ggplot(Wage, aes(x = wage)) +
geom_histogram(binwidth = 30, fill = "blue", color = "black") +
labs(title = "Histogram: Wage", x = "Class Boundaries (Wage)", y = "Frequency") +
theme_bw()
hist_plot
# Older notation: ..density.. (deprecated as of ggplot2 3.4.0)
rf_hist <- ggplot(Wage, aes(x = wage)) +
geom_histogram(aes(y = ..density..), binwidth = 20, fill = "brown", color = "blue") +
labs(title = "Relative Frequency Histogram: Wage", x = "Class Boundaries (Wage)", y = "Relative Frequency") +
theme_minimal()
rf_hist
# Current notation: after_stat(density) (available since ggplot2 3.3.0)
rf_hist <- ggplot(Wage, aes(x = wage)) +
geom_histogram(aes(y = after_stat(density)), binwidth = 20, fill = "brown", color = "blue") +
labs(title = "Relative Frequency Histogram: Wage", x = "Class Boundaries (Wage)", y = "Relative Frequency") +
theme_minimal()
rf_hist
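The density scale used above makes the bar areas sum to 1. If bar heights that sum to 1 are wanted instead, one option is to compute the proportion inside after_stat(). This is a sketch assuming ggplot2 3.3.0 or later and an installed ISLR2; rf_true is an illustrative name.

```r
library(ISLR2)
library(ggplot2)
data(Wage)

# count / sum(count) gives the share of observations falling in each bin
rf_true <- ggplot(Wage, aes(x = wage)) +
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 binwidth = 20, fill = "brown", color = "blue") +
  labs(title = "Relative Frequency Histogram: Wage",
       x = "Class Boundaries (Wage)", y = "Relative Frequency") +
  theme_minimal()

rf_true
```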
dens <- ggplot(Wage, aes(x = wage)) +
geom_density(fill = "blue", color = "black") +
labs(title = "Density: Wage", x = "Wage", y = "Density") +
theme_minimal()
dens
bxplt <- ggplot(Wage, aes(x = wage)) +
geom_boxplot(fill = "lightblue", color = "black") +
labs(title = "Horizontal Boxplot: Wage", x = "Wage") +
theme_minimal()
bxplt
bxplt <- ggplot(Wage, aes(y = wage)) +
geom_boxplot(fill = "lightblue", color = "black") +
labs(title = "Vertical Boxplot: Wage", y = "Wage") +
theme_minimal()
bxplt
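# Vertical boxplots of wage by education level
bxplt <- ggplot(Wage, aes(x=education, y=wage)) +
geom_boxplot(outlier.colour="black") +
labs(title = "Vertical Boxplot: Wage vs Education", x = "Education", y = "Wage")
bxplt
# Horizontal boxplots of wage by education level
bxplt <- ggplot(Wage, aes(x=wage, y=education)) +
geom_boxplot(outlier.colour="black") +
labs(title = "Horizontal Boxplot: Wage vs Education", x = "Wage", y = "Education")
bxplt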
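geom_bar() was listed among the popular geoms earlier; it counts the observations in each category itself, so no separate table() call is needed. A minimal sketch, assuming ISLR2 and ggplot2 are installed; edu_bar is an illustrative name.

```r
library(ISLR2)
library(ggplot2)
data(Wage)

# geom_bar() tallies the rows in each education level and draws one bar per level
edu_bar <- ggplot(Wage, aes(x = education)) +
  geom_bar(fill = "blue", color = "black") +
  labs(title = "Bar Graph: Education", x = "Education", y = "Frequency") +
  theme_minimal()

edu_bar
```

Adding coord_flip() to the plot would give the horizontal version.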
Southeast Missouri State University, ethompson@semo.edu