# load data
data(CPS85, package = "mosaicData")
ggplot2 builds a graph in layers: you create a basic graph and then add detail to it so the reader can better understand the data. Only the ggplot() call and at least one geom are required. Everything else is optional and serves to make the data easier to read.
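The sketch below shows the two required pieces on their own: a ggplot() call with data and a mapping, plus a single geom. It then adds an optional layer to the saved plot object with `+`. It assumes the ggplot2 and mosaicData packages are installed. The object names `p` and `p_labeled` are illustrative only.

```r
# Minimal layered graph: data + aesthetic mapping + one geom
library(ggplot2)
data(CPS85, package = "mosaicData")

# The required pieces: ggplot() with data and a mapping, plus a geom
p <- ggplot(data = CPS85, mapping = aes(x = exper, y = wage)) +
  geom_point()

# Optional layers can be added to the saved plot object afterwards
p_labeled <- p + labs(x = "Years of Experience", y = "Hourly Wage")

# Draw the graph when working interactively
if (interactive()) print(p_labeled)
```

Storing the plot in a variable makes the layering explicit: each `+` returns a new plot object, and the original `p` is left unchanged.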
# specify dataset and mapping
library(ggplot2)
ggplot(data = CPS85,
       mapping = aes(x = exper, y = wage))
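The ggplot() function creates the frame that will hold the data. It also specifies the aesthetic mappings, set with aes(), which link variables to visual properties: here exper goes to the x axis and wage to the y axis. The graph starts out empty because we have only defined the axes. No geom has been added yet to draw the data.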
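As a quick check of why the first graph is empty, the sketch below inspects the plot's layers before and after a geom is added. It assumes ggplot2 and mosaicData are installed. The names `frame` and `with_points` are hypothetical.

```r
library(ggplot2)
data(CPS85, package = "mosaicData")

# Frame only: the mapping defines the axes, but no layer draws anything
frame <- ggplot(data = CPS85, mapping = aes(x = exper, y = wage))
length(frame$layers)  # 0: no geom yet

# The mapping is stored with the plot and inherited by any geom added later
with_points <- frame + geom_point()
length(with_points$layers)  # 1: the point layer

if (interactive()) print(with_points)
```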
# add points
ggplot(data = CPS85,
       mapping = aes(x = exper, y = wage)) +
  geom_point()
# delete outlier
library(dplyr)
plotdata <- filter(CPS85, wage < 40)
# redraw scatterplot
ggplot(data = plotdata,
       mapping = aes(x = exper, y = wage)) +
  geom_point()
# make points blue, larger, and semi-transparent
ggplot(data = plotdata,
       mapping = aes(x = exper, y = wage)) +
  geom_point(color = "cornflowerblue",
             alpha = .7,
             size = 3)
# add a line of best fit.
ggplot(data = plotdata,
       mapping = aes(x = exper, y = wage)) +
  geom_point(color = "cornflowerblue",
             alpha = .7,
             size = 3) +
  geom_smooth(method = "lm")
Geom functions place geometric objects such as points, lines, and bars on the graph. Drawing the points with geom_point() also makes outliers easy to spot. Here an extreme wage stands out, so observations with a wage of $40 or more are removed with dplyr's filter() before the plot is redrawn. Arguments to geom_point() change the appearance of every point, including its size, color, and transparency (alpha). Transparency helps reveal points that overlap. Adding geom_smooth() draws a line of best fit. With method = "lm" it fits a linear regression line, and the shaded band around it is a 95% confidence interval by default.
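The sketch below inspects the extreme observations before removing them. It uses base R's subset() as a stand-in for dplyr's filter(), which is a substitution for illustration only. It also sets `formula = y ~ x` explicitly, which matches what method = "lm" uses by default and silences the message geom_smooth() prints otherwise. It assumes ggplot2 and mosaicData are installed. `outliers` is a hypothetical name.

```r
library(ggplot2)
data(CPS85, package = "mosaicData")

# Look at the extreme value(s) spotted in the scatterplot before removing them
outliers <- CPS85[CPS85$wage >= 40, c("wage", "exper", "sector")]
print(outliers)

# Keep the remaining observations (base R counterpart of dplyr::filter)
plotdata <- subset(CPS85, wage < 40)

# Settings outside aes() apply to every point; geom_smooth() adds a
# linear fit with a shaded confidence band (se = TRUE is the default)
p <- ggplot(plotdata, aes(x = exper, y = wage)) +
  geom_point(color = "cornflowerblue", alpha = 0.7, size = 3) +
  geom_smooth(method = "lm", formula = y ~ x)

if (interactive()) print(p)
```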
# indicate sex using color
ggplot(data = plotdata,
       mapping = aes(x = exper,
                     y = wage,
                     color = sex)) +
  geom_point(alpha = .7,
             size = 3) +
  geom_smooth(method = "lm",
              se = FALSE,
              linewidth = 1.5)  # linewidth needs ggplot2 >= 3.4.0; older versions use size
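Besides the axes, variables can be mapped to color, shape, size, transparency, and other visual characteristics. Mapping color = sex inside aes() gives each sex its own point color. Because geom_smooth() inherits that mapping, each sex also gets its own regression line. Setting se = FALSE removes the confidence bands.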
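To show a second mapped characteristic, the sketch below maps sex to both color and shape. Adding shape is an illustration and is not part of the original plot. Because the grouping comes from aes(), geom_smooth() fits one line per level of sex. This assumes ggplot2 and mosaicData are installed.

```r
library(ggplot2)
data(CPS85, package = "mosaicData")
plotdata <- subset(CPS85, wage < 40)

# Mapped inside aes(): each sex gets its own color and shape,
# and geom_smooth() fits a separate line for each group
p <- ggplot(plotdata, aes(x = exper, y = wage, color = sex, shape = sex)) +
  geom_point(alpha = 0.7, size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

if (interactive()) print(p)
```

By contrast, a fixed value such as `color = "cornflowerblue"` written outside aes() is a setting. It applies to every point and creates no legend.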
# modify the x and y axes and specify the colors to be used
ggplot(data = plotdata,
       mapping = aes(x = exper,
                     y = wage,
                     color = sex)) +
  geom_point(alpha = .7,
             size = 3) +
  geom_smooth(method = "lm",
              se = FALSE,
              linewidth = 1.5) +  # linewidth needs ggplot2 >= 3.4.0
  scale_x_continuous(breaks = seq(0, 60, 10)) +
  scale_y_continuous(breaks = seq(0, 30, 5),
                     labels = scales::dollar) +
  scale_color_manual(values = c("indianred3",
                                "cornflowerblue"))
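The scale_ functions control how each mapping is displayed. Here scale_x_continuous() and scale_y_continuous() set the tick-mark positions (breaks) and format the y-axis labels as dollars. scale_color_manual() chooses the colors assigned to the levels of sex.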
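The sketch below shows what the dollar formatter produces for the chosen y-axis breaks. It also names the manual colors so each one is tied to a specific level of sex ("F" and "M" in CPS85). Without names, the colors depend on the order of the factor levels. Naming the values is a suggested variation, not what the plot above does. It assumes ggplot2 and mosaicData are installed.

```r
library(ggplot2)
data(CPS85, package = "mosaicData")
plotdata <- subset(CPS85, wage < 40)

# The y-axis breaks and the labels scales::dollar produces for them
y_breaks <- seq(0, 30, 5)
print(scales::dollar(y_breaks))

# Named values tie each color to a level of sex explicitly
p <- ggplot(plotdata, aes(x = exper, y = wage, color = sex)) +
  geom_point(alpha = 0.7, size = 3) +
  scale_x_continuous(breaks = seq(0, 60, 10)) +
  scale_y_continuous(breaks = y_breaks, labels = scales::dollar) +
  scale_color_manual(values = c(F = "indianred3", M = "cornflowerblue"))

if (interactive()) print(p)
```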
# reproduce plot for each level of job sector
ggplot(data = plotdata,
       mapping = aes(x = exper,
                     y = wage,
                     color = sex)) +
  geom_point(alpha = .7) +
  geom_smooth(method = "lm",
              se = FALSE) +
  scale_x_continuous(breaks = seq(0, 60, 10)) +
  scale_y_continuous(breaks = seq(0, 30, 5),
                     labels = scales::dollar) +
  scale_color_manual(values = c("indianred3",
                                "cornflowerblue")) +
  facet_wrap(~sector)
Faceting reproduces the graph for each level of a given variable. facet_wrap(~sector) draws one panel per job sector: if the data contain seven sectors, it creates seven panels. All panels share the same axes, scales, and colors.
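The sketch below counts the sectors present in the data and compares that count with the panels facet_wrap() creates. It also uses the `ncol` argument to arrange the panels. The choice of four columns is arbitrary. It assumes ggplot2 and mosaicData are installed.

```r
library(ggplot2)
data(CPS85, package = "mosaicData")
plotdata <- subset(CPS85, wage < 40)

# One panel is drawn per sector present in the data
n_panels <- length(unique(plotdata$sector))

# ncol controls how the panels are laid out in the grid
p <- ggplot(plotdata, aes(x = exper, y = wage, color = sex)) +
  geom_point(alpha = 0.7) +
  facet_wrap(~sector, ncol = 4)

# Each row of layer_data() records the panel its point is drawn in
print(table(layer_data(p, 1)$PANEL))

if (interactive()) print(p)
```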
# add informative labels
ggplot(data = plotdata,
       mapping = aes(x = exper,
                     y = wage,
                     color = sex)) +
  geom_point(alpha = .7) +
  geom_smooth(method = "lm",
              se = FALSE) +
  scale_x_continuous(breaks = seq(0, 60, 10)) +
  scale_y_continuous(breaks = seq(0, 30, 5),
                     labels = scales::dollar) +
  scale_color_manual(values = c("indianred3",
                                "cornflowerblue")) +
  facet_wrap(~sector) +
  labs(title = "Relationship between wages and experience",
       subtitle = "Current Population Survey",
       caption = "source: http://mosaic-web.org/",
       x = "Years of Experience",
       y = "Hourly Wage",
       color = "Gender")
Graphs need to be easy to read. The labs() function adds a title, subtitle, and caption. It also replaces the default axis and legend titles, which are the variable names, with descriptive labels. For example, color = "Gender" renames the color legend.
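Because labs() returns an object that can be added like any other layer, one set of labels can be kept in a variable and reused across plots. This is a convenience pattern, not something the tutorial above requires. The names `wage_labels`, `base`, `p_single`, and `p_faceted` are hypothetical. It assumes ggplot2 and mosaicData are installed.

```r
library(ggplot2)
data(CPS85, package = "mosaicData")
plotdata <- subset(CPS85, wage < 40)

# Keep the labels in one object so several plots can share them
wage_labels <- labs(title = "Relationship between wages and experience",
                    subtitle = "Current Population Survey",
                    caption = "source: http://mosaic-web.org/",
                    x = "Years of Experience",
                    y = "Hourly Wage",
                    color = "Gender")

base <- ggplot(plotdata, aes(x = exper, y = wage, color = sex)) +
  geom_point(alpha = 0.7)

# The same labels on an unfaceted and a faceted version of the plot
p_single <- base + wage_labels
p_faceted <- base + facet_wrap(~sector) + wage_labels

if (interactive()) print(p_faceted)
```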
# use a minimalist theme
ggplot(data = plotdata,
       mapping = aes(x = exper,
                     y = wage,
                     color = sex)) +
  geom_point(alpha = .6) +
  geom_smooth(method = "lm",
              se = FALSE) +
  scale_x_continuous(breaks = seq(0, 60, 10)) +
  scale_y_continuous(breaks = seq(0, 30, 5),
                     labels = scales::dollar) +
  scale_color_manual(values = c("indianred3",
                                "cornflowerblue")) +
  facet_wrap(~sector) +
  labs(title = "Relationship between wages and experience",
       subtitle = "Current Population Survey",
       caption = "source: http://mosaic-web.org/",
       x = "Years of Experience",
       y = "Hourly Wage",
       color = "Gender") +
  theme_minimal()
theme_minimal() gives the graph a clean appearance and is the last step in making it easy to read and appealing to the eye. Theme functions control the non-data elements of the graph, such as background colors, fonts, grid lines, and legend placement.
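A complete theme such as theme_minimal() can be fine-tuned with theme(), which adjusts individual non-data elements on top of it. The specific tweaks below are illustrative choices, not part of the original plot: a bottom legend, a bold title, and no minor grid lines. The sketch also shows theme_set(), which makes a theme the session default and returns the previous default so it can be restored. It assumes ggplot2 and mosaicData are installed.

```r
library(ggplot2)
data(CPS85, package = "mosaicData")
plotdata <- subset(CPS85, wage < 40)

# theme_minimal() sets the overall look; theme() then adjusts single elements
p <- ggplot(plotdata, aes(x = exper, y = wage, color = sex)) +
  geom_point(alpha = 0.6) +
  labs(title = "Relationship between wages and experience",
       x = "Years of Experience", y = "Hourly Wage", color = "Gender") +
  theme_minimal() +
  theme(legend.position = "bottom",                # move the legend below the plot
        plot.title = element_text(face = "bold"),  # bold title
        panel.grid.minor = element_blank())        # drop minor grid lines

if (interactive()) print(p)

# Make theme_minimal() the default for later plots, then restore the old default
old <- theme_set(theme_minimal())
theme_set(old)
```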