ggplot2 works with data frames, not individual vectors. All the data needed to make a plot is typically contained in the data frame supplied to ggplot() itself, or it can be supplied to the respective geoms.
## Getting the data used by all the plots
midwest <- read.csv("http://goo.gl/G1K41K")
head(midwest)
## PID county state area poptotal popdensity popwhite popblack
## 1 561 ADAMS IL 0.052 66090 1270.9615 63917 1702
## 2 562 ALEXANDER IL 0.014 10626 759.0000 7054 3496
## 3 563 BOND IL 0.022 14991 681.4091 14477 429
## 4 564 BOONE IL 0.017 30806 1812.1176 29344 127
## 5 565 BROWN IL 0.018 5836 324.2222 5264 547
## 6 566 BUREAU IL 0.050 35688 713.7600 35157 50
## popamerindian popasian popother percwhite percblack percamerindan
## 1 98 249 124 96.71206 2.5752761 0.1482826
## 2 19 48 9 66.38434 32.9004329 0.1788067
## 3 35 16 34 96.57128 2.8617170 0.2334734
## 4 46 150 1139 95.25417 0.4122574 0.1493216
## 5 14 5 6 90.19877 9.3728581 0.2398903
## 6 65 195 221 98.51210 0.1401031 0.1821340
## percasian percother popadults perchsd percollege percprof
## 1 0.37675897 0.18762294 43298 75.10740 19.63139 4.355859
## 2 0.45172219 0.08469791 6724 59.72635 11.24331 2.870315
## 3 0.10673071 0.22680275 9669 69.33499 17.03382 4.488572
## 4 0.48691813 3.69733169 19272 75.47219 17.27895 4.197800
## 5 0.08567512 0.10281014 3979 68.86152 14.47600 3.367680
## 6 0.54640215 0.61925577 23444 76.62941 18.90462 3.275891
## poppovertyknown percpovertyknown percbelowpoverty percchildbelowpovert
## 1 63628 96.27478 13.151443 18.01172
## 2 10529 99.08714 32.244278 45.82651
## 3 14235 94.95697 12.068844 14.03606
## 4 30337 98.47757 7.209019 11.17954
## 5 4815 82.50514 13.520249 13.02289
## 6 35107 98.37200 10.399635 14.15882
## percadultpoverty percelderlypoverty inmetro category
## 1 11.009776 12.443812 0 AAR
## 2 27.385647 25.228976 0 LHR
## 3 10.852090 12.697410 0 AAR
## 4 5.536013 6.217047 1 ALU
## 5 11.143211 19.200000 0 AAR
## 6 8.179287 11.008586 0 AAR
### Loading the ggplot2 package
library(ggplot2,quietly = FALSE)
## Warning: package 'ggplot2' was built under R version 3.4.3
##
## Attaching package: 'ggplot2'
## The following object is masked _by_ '.GlobalEnv':
##
## midwest
options(scipen = 999) ### turn off scientific notation like 1e-09
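As a small aside on the scipen setting, the sketch below shows its effect on how base R formats numbers. It only uses format() and options(), the two test values are arbitrary, and the previous setting is restored at the end.

```r
# Switch to R's default penalty, remembering what was set before
old_opts <- options(scipen = 0)
default_small <- format(1e-09)     # "1e-09"
default_large <- format(1000000)   # "1e+06"

# A large penalty makes R strongly prefer fixed notation
options(scipen = 999)
fixed_small <- format(1e-09)       # "0.000000001"
fixed_large <- format(1000000)     # "1000000"

# Restore whatever was in effect before this block ran
options(old_opts)
```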
## Init Ggplot ##
## let's initialise a basic ggplot based on midwest dataset.
ggplot(midwest, aes(x=area,y=poptotal)) ## area and poptotal are columns in midwest
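Also note that the aes() function is used to specify the X and Y axes. That’s because any information that is part of the source dataframe has to be specified inside the aes() function. The call above draws only a blank plot, since no geom has been added yet; adding geom_point() turns it into a scatterplot: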
ggplot(midwest, aes(x=area, y=poptotal)) + geom_point()
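To illustrate the point about where the data can be supplied, here is a minimal sketch that builds the same scatterplot twice: once with the data frame and aesthetics given to ggplot(), and once with both given to geom_point(). It assumes the copy of midwest that ships with ggplot2 (ggplot2::midwest) rather than the CSV download, and the names mw, p_global and p_local are purely illustrative.

```r
library(ggplot2)

mw <- ggplot2::midwest  # assumption: bundled copy with the same columns as the CSV

# Data and aesthetics supplied once, to ggplot() itself
p_global <- ggplot(mw, aes(x = area, y = poptotal)) + geom_point()

# Data and aesthetics supplied to the geom instead
p_local <- ggplot() +
  geom_point(data = mw, mapping = aes(x = area, y = poptotal))

# layer_data() returns the data a layer will draw
pts_global <- layer_data(p_global, 1)
pts_local  <- layer_data(p_local, 1)
```

Both layers should hold the same points; supplying data to a geom is mainly useful when different layers need different data frames.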
We can see that most of the points are concentrated in the bottom portion of the plot. Like geom_point(), there are many such layers that can be added to an existing plot; one example is geom_smooth(method = 'lm'). Since the method is set to lm (linear model), it draws a line of best fit.
g <- ggplot(midwest, aes(x=area, y=poptotal)) + geom_point() + geom_smooth(method = 'lm')
plot(g)
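As a rough check of what method = 'lm' does, the sketch below compares the line drawn by geom_smooth() with predictions from a model fitted directly with lm(). It assumes ggplot2's bundled midwest data, and the names fit, smooth_line and by_hand are illustrative.

```r
library(ggplot2)

mw <- ggplot2::midwest  # assumption: bundled copy of the dataset

p <- ggplot(mw, aes(x = area, y = poptotal)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Layer 2 is the smoother; its x and y columns trace the fitted line
smooth_line <- layer_data(p, 2)

# Fit the same linear model by hand and predict at the same x positions
fit <- lm(poptotal ~ area, data = mw)
by_hand <- unname(predict(fit, newdata = data.frame(area = smooth_line$x)))

# Largest disagreement between the two; expected to be floating-point noise
max_diff <- max(abs(smooth_line$y - by_hand))
```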
The line of best fit is in blue. Can you find out what other method options are available for geom_smooth()? (Note: see ?geom_smooth.) You might have noticed that the majority of points lie at the bottom of the chart, which doesn’t really look nice. So, let’s change the Y-axis limits to focus on the lower half.
There are two ways to control the X and Y limits: by deleting the points outside the range, or by zooming in. The first uses xlim() and ylim():
g <- ggplot(midwest, aes(x=area, y=poptotal)) + geom_point() + geom_smooth(method = 'lm') + xlim(0,0.1) +ylim(0,1000000)
plot(g)
## Warning: Removed 5 rows containing non-finite values (stat_smooth).
## Warning: Removed 5 rows containing missing values (geom_point).
Notice that the line of best fit became more horizontal compared to the original plot. This is because, when using xlim() and ylim(), points outside the specified range are deleted and are not considered while drawing the line of best fit. To zoom in without deleting points, use coord_cartesian() instead:
g <- ggplot(midwest, aes(x=area, y=poptotal)) + geom_point() + geom_smooth(method = 'lm')
g1 <- g + coord_cartesian(xlim = c(0,0.1), ylim = c(0,1000000)) ### Zoom in
plot(g1)
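The difference between the two approaches can be made concrete by reading the slope of the fitted line out of each plot. This is only a sketch: it assumes ggplot2's bundled midwest data, and line_slope() is a hypothetical helper written for this example, not part of ggplot2.

```r
library(ggplot2)

mw <- ggplot2::midwest  # assumption: bundled copy of the dataset

base <- ggplot(mw, aes(x = area, y = poptotal)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Hypothetical helper: recover the slope of the straight line in layer 2
line_slope <- function(plot) {
  d <- suppressWarnings(layer_data(plot, 2))  # silence "Removed rows" warnings
  unname(coef(lm(y ~ x, data = d))["x"])      # points lie on a line, so this is its slope
}

# Slope when the model is fitted to every county
slope_all <- unname(coef(lm(poptotal ~ area, data = mw))["area"])

# xlim()/ylim() drop out-of-range rows before the fit
slope_limits <- line_slope(base + xlim(0, 0.1) + ylim(0, 1000000))

# coord_cartesian() only changes the visible window
slope_zoom <- line_slope(base + coord_cartesian(xlim = c(0, 0.1), ylim = c(0, 1000000)))
```

Here slope_zoom should match slope_all, while slope_limits should differ because the line was refitted without the removed counties.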
I have stored this as g1. Let’s add the plot title and labels for the X and Y axes. This can be done in one go using the labs() function with the title, x and y arguments. Another option is to use ggtitle(), xlab() and ylab().
g <- ggplot(midwest, aes(x=area, y=poptotal)) + geom_point() + geom_smooth(method="lm")
g1 <- g + coord_cartesian(xlim=c(0,0.1), ylim=c(0, 1000000)) # zooms in
g1 + labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")
# Alternatively:
# g1 + ggtitle("Area Vs Population", subtitle="From midwest dataset") + xlab("Area") + ylab("Population")
ggplot(midwest, aes(x=area, y=poptotal)) +
geom_point() +
geom_smooth(method="lm") +
coord_cartesian(xlim=c(0,0.1), ylim=c(0, 1000000)) +
labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")
We will modify the aesthetics of the geoms and will change the color of the respective points and line to a static value.
ggplot(midwest, aes(x=area, y=poptotal)) +
geom_point(col="steelblue", size=3) + # Set static color and size for points
geom_smooth(method="lm", col="firebrick") + # change the color of line
coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) +
labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")
To make the color vary with another column of the source dataframe, it must be specified inside aes(). With aes(col=state), each point is colored based on the state it belongs to. Not just color, but size, shape, stroke (thickness of boundary) and fill (fill color) can be used to discriminate groupings.
gg <- ggplot(midwest, aes(x=area, y=poptotal)) +
geom_point(aes(col=state), size=3) + # Set color to vary based on state categories.
geom_smooth(method="lm", col="firebrick", size=2) +
coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) +
labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")
plot(gg)
gg + scale_color_brewer(palette = "Set1") ## change color palette
library(RColorBrewer)
display.brewer.all()
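The distinction between a static color and a mapped color can be seen in the data each layer ends up drawing. The following is a sketch under the assumption that ggplot2's bundled midwest data is used; the variable names are illustrative.

```r
library(ggplot2)

mw <- ggplot2::midwest  # assumption: bundled copy of the dataset

# Static colour: set outside aes(), so every point gets the same value
p_static <- ggplot(mw, aes(x = area, y = poptotal)) +
  geom_point(col = "steelblue", size = 3)

# Mapped colour: set inside aes(), so there is one colour per level of `state`
p_mapped <- ggplot(mw, aes(x = area, y = poptotal)) +
  geom_point(aes(col = state), size = 3)

static_colours <- unique(layer_data(p_static, 1)$colour)
mapped_colours <- unique(layer_data(p_mapped, 1)$colour)

n_states <- length(unique(mw$state))
```

static_colours should contain the single value "steelblue", while mapped_colours should contain as many distinct colours as there are states.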
Let’s see how to change the X and Y axis text and its location. This involves two aspects: breaks and labels. Step 1: Set the breaks. The breaks should be on the same scale as the X axis variable. Note that I am using scale_x_continuous() because the X axis variable is continuous. Had it been a date variable, scale_x_date() could be used. Like scale_x_continuous(), an equivalent scale_y_continuous() is available for the Y axis.
# Base plot
gg <- ggplot(midwest, aes(x=area, y=poptotal)) +
geom_point(aes(col=state), size=3) + # Set color to vary based on state categories.
geom_smooth(method="lm", col="firebrick", size=2) +
coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) +
labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")
# Change breaks
gg + scale_x_continuous(breaks=seq(0, 0.1, 0.01))
Step 2: Change the labels. You can optionally change the labels at the axis ticks. labels takes a vector of the same length as breaks. Let me demonstrate by setting the labels to the letters a to k (though this has no meaning in this context).
gg <- ggplot(midwest, aes(x=area, y=poptotal)) +
geom_point(aes(col=state), size=3) + # Set color to vary based on state categories.
geom_smooth(method="lm", col="firebrick", size=2) +
coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) +
labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")
# Change breaks + label
gg + scale_x_continuous(breaks=seq(0, 0.1, 0.01), labels = letters[1:11])
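Since labels must line up one-to-one with breaks, it can help to derive one from the other. The sketch below does that, and also shows the alternative of passing a function as labels. It assumes ggplot2's bundled midwest data; x_breaks, x_labels and the formatting function are illustrative choices.

```r
library(ggplot2)

mw <- ggplot2::midwest  # assumption: bundled copy of the dataset

x_breaks <- seq(0, 0.1, 0.01)              # 11 tick positions
x_labels <- letters[seq_along(x_breaks)]   # "a" to "k", one label per break

p_letters <- ggplot(mw, aes(x = area, y = poptotal)) +
  geom_point() +
  scale_x_continuous(breaks = x_breaks, labels = x_labels)

# Alternative: a function is applied to whatever breaks are in use,
# so the lengths cannot get out of step
p_formatted <- ggplot(mw, aes(x = area, y = poptotal)) +
  geom_point() +
  scale_x_continuous(breaks = x_breaks,
                     labels = function(b) sprintf("%.2f", b))
```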
gg <- ggplot(midwest, aes(x=area, y=poptotal)) +
geom_point(aes(col=state), size=3) + # Set color to vary based on state categories.
geom_smooth(method="lm", col="firebrick", size=2) +
coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) +
labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")
# Reverse X Axis Scale
gg + scale_x_reverse()
Instead of changing the theme components individually, we can change the entire theme using one of the pre-built themes. The help page ?theme_bw lists all the available built-in themes. This is commonly done in one of two ways:
* Use theme_set() to set the theme before drawing the ggplot. Note that this setting will affect all future plots.
* Draw the ggplot and then add the overall theme setting (e.g. theme_bw()).
# Base plot
gg <- ggplot(midwest, aes(x=area, y=poptotal)) +
geom_point(aes(col=state), size=3) + # Set color to vary based on state categories.
geom_smooth(method="lm", col="firebrick", size=2) +
coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) +
labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")
gg <- gg + scale_x_continuous(breaks=seq(0, 0.1, 0.01))
# method 1: Using theme_set()
theme_set(theme_classic()) # not run
gg
# method 2: Adding theme Layer itself.
gg + theme_bw() + labs(subtitle="BW Theme")
gg + theme_classic() + labs(subtitle="Classic Theme")
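Because theme_set() affects all future plots, it is often worth keeping the previous theme so it can be put back. A minimal sketch, relying on the fact that theme_set() returns the previously active theme invisibly; old_theme, p and p_bw are illustrative names, and the bundled midwest data is assumed.

```r
library(ggplot2)

p <- ggplot(ggplot2::midwest, aes(x = area, y = poptotal)) + geom_point()

# Keep the theme that was active before the global change
old_theme <- theme_set(theme_classic())

# ... any plot printed here without its own theme uses theme_classic() ...

# Restore the earlier theme once the global change is no longer wanted
theme_set(old_theme)

# A theme added as a layer affects only that one plot
p_bw <- p + theme_bw()
```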
Plot and axis titles and the axis text are part of the plot’s theme. Therefore, they can be modified using the theme() function. The theme() function accepts the four element_type() functions, namely element_text(), element_line(), element_rect() and element_blank(), as arguments. Since the plot and axis titles are textual components, element_text() is used to modify them.
Below, I have changed the size, color, face and line-height. The axis text can be rotated by changing the angle.
# Base Plot
gg <- ggplot(midwest, aes(x=area, y=poptotal)) +
geom_point(aes(col=state, size=popdensity)) +
geom_smooth(method="loess", se=FALSE) + xlim(c(0, 0.1)) + ylim(c(0, 500000)) +
labs(title="Area Vs Population", y="Population", x="Area", caption="Source: midwest")
# Modify theme components -------------------------------------------
gg + theme(plot.title=element_text(size=20,
face="bold",
family="American Typewriter",
color="tomato",
hjust=0.5,
lineheight=1.2), # title
plot.subtitle=element_text(size=15,
family="American Typewriter",
face="bold",
hjust=0.5), # subtitle
plot.caption=element_text(size=15), # caption
axis.title.x=element_text(vjust=10,
size=15), # X axis title
axis.title.y=element_text(size=15), # Y axis title
axis.text.x=element_text(size=10,
angle = 30,
vjust=.5), # X axis text
axis.text.y=element_text(size=10)) # Y axis text
## Warning: Removed 15 rows containing non-finite values (stat_smooth).
## Warning: Removed 15 rows containing missing values (geom_point).
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database
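The "font family not found" warning appears when the requested font is not installed on the machine. One way around it, sketched below, is to use a generic family name such as "serif", "sans" or "mono", which R's graphics devices generally map to an available font. The choice of "serif" is an assumption made for portability, not the original styling, and the bundled midwest data is assumed.

```r
library(ggplot2)

p <- ggplot(ggplot2::midwest, aes(x = area, y = poptotal)) +
  geom_point() +
  labs(title = "Area Vs Population", subtitle = "From midwest dataset",
       y = "Population", x = "Area")

# Title styling with a generic font family instead of an installed-font name
title_style <- element_text(size = 20, face = "bold", family = "serif",
                            color = "tomato", hjust = 0.5, lineheight = 1.2)

p_styled <- p +
  theme(plot.title = title_style,
        plot.subtitle = element_text(size = 15, family = "serif", hjust = 0.5),
        axis.text.x = element_text(size = 10, angle = 30, vjust = 0.5))
```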
Legend labels and category colors can be changed using the respective scale_aesthetic_manual() function (for example, scale_color_manual() for the color aesthetic). The new legend labels are supplied as a character vector to the labels argument. If you want to change the color of the categories, it can be assigned to the values argument, as shown in the example below.
gg <- ggplot(midwest, aes(x=area, y=poptotal)) +
geom_point(aes(col=state, size=popdensity)) +
geom_smooth(method="loess", se=FALSE) + xlim(c(0, 0.1)) + ylim(c(0, 500000)) +
labs(title="Area Vs Population", y="Population", x="Area", caption="Source: midwest")
gg + scale_color_manual(name="State",
labels = c("Illinois",
"Indiana",
"Michigan",
"Ohio",
"Wisconsin"),
values = c("IL"="blue",
"IN"="red",
"MI"="green",
"OH"="brown",
"WI"="orange"))
## Warning: Removed 15 rows containing non-finite values (stat_smooth).
## Warning: Removed 15 rows containing missing values (geom_point).
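When values is a named vector, each color is matched to a category by name rather than by position. The sketch below deliberately lists the states in a different order and then checks which color the Illinois points received. It assumes ggplot2's bundled midwest data; state_colours and the other names are illustrative.

```r
library(ggplot2)

mw <- ggplot2::midwest  # assumption: bundled copy of the dataset

# Named vector: each colour is tied to a state code, so order does not matter
state_colours <- c(WI = "orange", OH = "brown", MI = "green",
                   IN = "red", IL = "blue")

p <- ggplot(mw, aes(x = area, y = poptotal)) +
  geom_point(aes(col = state)) +
  scale_color_manual(name = "State", values = state_colours)

pts <- layer_data(p, 1)

# Number of points drawn in blue versus number of Illinois counties
n_blue <- sum(pts$colour == "blue")
n_il   <- sum(mw$state == "IL")
```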
The styling of the legend title, text, key and guide can also be adjusted. The legend’s key is a figure-like element, so it has to be set using the element_rect() function.
# Base Plot
gg <- ggplot(midwest, aes(x=area, y=poptotal)) +
geom_point(aes(col=state, size=popdensity)) +
geom_smooth(method="loess", se=FALSE) + xlim(c(0, 0.1)) + ylim(c(0, 500000)) +
labs(title="Area Vs Population", y="Population", x="Area", caption="Source: midwest")
gg + theme(legend.title = element_text(size=12, color = "firebrick"),
legend.text = element_text(size=10),
legend.key=element_rect(fill='springgreen')) +
guides(colour = guide_legend(override.aes = list(size=2, stroke=1.5)))
## Warning: Removed 15 rows containing non-finite values (stat_smooth).
## Warning: Removed 15 rows containing missing values (geom_point).
Let us use the mpg dataset to demonstrate faceting, that is, drawing multiple plots in one figure.
data(mpg, package="ggplot2") # load data
# mpg <- read.csv("http://goo.gl/uEeRGu") # alt data source
g <- ggplot(mpg, aes(x=displ, y=hwy)) +
geom_point() +
labs(title="hwy vs displ", caption = "Source: mpg") +
geom_smooth(method="lm", se=FALSE) +
theme_bw() # apply bw theme
plot(g)
We have a simple chart of highway mileage (hwy) against the engine displacement (displ) for the whole dataset. But what if you want to study how this relationship varies for different classes of vehicles?
facet_wrap() is used to break down a large plot into multiple small plots, one for each category. It takes a formula as the main argument; here, ~ class produces one panel per vehicle class, and nrow (or ncol) controls how the panels are wrapped into rows and columns. By default, all the plots share the same scale on both the X and Y axes. You can set them free with scales='free', but this makes it harder to compare between groups.
g <- ggplot(mpg, aes(x=displ, y=hwy)) +
geom_point() +
geom_smooth(method="lm", se=FALSE) +
theme_bw() # apply bw theme
# Facet wrap with common scales
g + facet_wrap( ~ class, nrow=3) + labs(title="hwy vs displ", caption = "Source: mpg", subtitle="Ggplot2 - Faceting - Multiple plots in one figure") # Shared scales
# Facet wrap with free scales
g + facet_wrap( ~ class, scales = "free") + labs(title="hwy vs displ", caption = "Source: mpg", subtitle="Ggplot2 - Faceting - Multiple plots in one figure with free scales") # Scales free
So, what do you infer from this? For one, most 2-seater cars have a higher engine displacement, while the minivan and compact vehicles are on the lower side. This is evident from where the points are placed along the X axis.
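The reading of the facets can be cross-checked numerically. The sketch below counts the panels facet_wrap() creates and ranks the vehicle classes by median engine displacement. The use of the median is an illustrative choice, not something the plots themselves compute.

```r
library(ggplot2)

data(mpg, package = "ggplot2")  # load data

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class, nrow = 3)

# Each row of the layer data records the panel it is drawn in
n_panels  <- length(unique(layer_data(p, 1)$PANEL))
n_classes <- length(unique(mpg$class))

# Median engine displacement per class, largest first
median_displ <- sort(tapply(mpg$displ, mpg$class, median), decreasing = TRUE)
top_class <- names(median_displ)[1]
```

There should be one panel per class, and the class at the top of median_displ is the one whose points sit furthest to the right.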
How to Change Plot background
# Base Plot
g <- ggplot(mpg, aes(x=displ, y=hwy)) +
geom_point() +
geom_smooth(method="lm", se=FALSE) +
theme_bw() # apply bw theme
# Change Plot Background elements -----------------------------------
g + theme(panel.background = element_rect(fill = 'khaki'),
panel.grid.major = element_line(colour = "burlywood", size=1.5),
panel.grid.minor = element_line(colour = "tomato",
size=.25,
linetype = "dashed"),
panel.border = element_blank(),
axis.line.x = element_line(colour = "darkorange",
size=1.5,
lineend = "butt"),
axis.line.y = element_line(colour = "darkorange",
size=1.5)) +
labs(title="Modified Background",
subtitle="How to Change Major and Minor grid, Axis Lines, No Border")
# Change Plot Margins -----------------------------------------------
g + theme(plot.background=element_rect(fill="salmon"),
plot.margin = unit(c(2, 2, 1, 1), "cm")) + # top, right, bottom, left
labs(title="Modified Background", subtitle="How to Change Plot Margin")
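Since the order of the four margin values is easy to mix up, ggplot2's margin() helper can be used to name each side explicitly. A short sketch, with plot_margin and p_margin as illustrative names:

```r
library(ggplot2)

p <- ggplot(ggplot2::mpg, aes(x = displ, y = hwy)) + geom_point()

# margin() names the four sides: t(op), r(ight), b(ottom), l(eft)
plot_margin <- margin(t = 2, r = 2, b = 1, l = 1, unit = "cm")

p_margin <- p +
  theme(plot.background = element_rect(fill = "salmon"),
        plot.margin = plot_margin) +
  labs(title = "Modified Background", subtitle = "How to Change Plot Margin")
```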
With this, we come to the end of the basic tutorial on geoms, aesthetics, labels, titles, colors and themes in ggplot2.