getwd() # print the current working directory
## [1] "C:/Users/noviyantisagala/OneDrive - Bina Nusantara University/WORK/Teaching Material/even21-22/DataMining&Visualization/Lab"
ls() # list the objects in the current workspace
## character(0)
#setwd(mydirectory) # change to mydirectory
setwd("C:/Users/noviyantisagala/Documents")
history() # display the last 25 commands
history(max.show=Inf) # display all previous commands
loadhistory(file="myfile") # default is ".Rhistory"
savehistory(file="myfile") # default is ".Rhistory"
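Because setwd() changes the working directory for the rest of the session, it can help to switch back afterwards. The following is a minimal sketch, assuming a temporary directory as a stand-in for your own project folder. It relies on setwd() invisibly returning the directory that was current before the change.

```r
# Remember where we started; setwd() returns the previous directory invisibly
old_dir <- setwd(tempdir())   # tempdir() is only a stand-in for a real folder

getwd()                       # now points at the temporary directory

# Switch back to the original working directory
setwd(old_dir)
```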
The base plotting system is the original plotting system for R. The basic model is sometimes referred to as the “artist’s palette” model. The idea is that you start with a blank canvas and build up from there.
In more R-specific terms, you typically start with the plot()
function (or a similar plot-creating function) to initiate a plot
and then annotate the plot with various annotation functions
(text(), lines(), points(), axis()).
The base plotting system is often the most convenient plotting system to use because it mirrors how we sometimes think of building plots and analyzing data. If we don’t have a completely well-formed idea of how we want to look at some data, often we’ll start by “throwing some data on the page” and then slowly add more information to it as our thought process evolves.
The core plotting and graphics engine in R is encapsulated in the following packages:
graphics: contains plotting functions for the “base” graphing system, including plot, hist, boxplot and many others.
grDevices: contains all the code implementing the various graphics devices, including X11, PDF, PostScript, PNG, etc.
The grDevices package contains the functionality for
sending plots to various output devices. The graphics
package contains the code for actually constructing and annotating
plots.
## Create the plot / draw canvas
with(cars, plot(speed, dist))
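To illustrate how grDevices and graphics divide the work, here is a small sketch that sends the same plot to a PDF file device instead of the screen. The file name is hypothetical and the file is written to a temporary directory. pdf() and dev.off() come from grDevices, while plot() and title() come from graphics.

```r
# Hypothetical output file in a temporary directory
out_file <- file.path(tempdir(), "cars-scatter.pdf")

# Open a PDF graphics device (grDevices); width and height are in inches
pdf(out_file, width = 6, height = 4)

# Construct and annotate the plot (graphics)
with(cars, plot(speed, dist))
title("Speed vs. Stopping distance")

# Close the device so the file is actually written to disk
invisible(dev.off())

file.exists(out_file)
```

Nothing appears on screen in this case, because all drawing commands between pdf() and dev.off() go to the file device.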
The downside of the base plotting system is that it’s difficult to
describe or translate a plot to others because there’s no clear
graphical language or grammar that can be used to communicate what
you’ve done. The only real way to describe what you’ve done in a base
plot is to just list the series of commands/functions that you’ve
executed, which is not a particularly compact way of communicating
things. This is one problem that the ggplot2 package
attempts to address.
Base graphics are used most commonly and are a very powerful system for creating data graphics. There are two phases to creating a base plot: initializing a new plot, and annotating (adding to) an existing plot.
Calling plot(x, y) or hist(x) will launch a
graphics device (if one is not already open) and draw a new plot on the
device. If the arguments to plot are not of some special
class, then the default method for plot is called;
this function has many arguments, letting you set the title, x
axis label, y axis label, etc.
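As a brief sketch of those arguments, the call below passes two numeric vectors, so the default plot method is used, and sets the title and axis labels directly. The label text is an illustrative choice; the units (mph and ft) are those documented in ?cars.

```r
# Labels are stored in variables only to keep the call readable
main_title <- "Speed vs. Stopping distance"
x_label <- "Speed (mph)"
y_label <- "Stopping distance (ft)"

# Two numeric vectors dispatch to the default plot method
plot(cars$speed, cars$dist, main = main_title, xlab = x_label, ylab = y_label)
```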
The base graphics system has many global parameters that can
be set and tweaked. These parameters are documented in ?par
and are used to control the global behavior of plots, such as the
margins, axis orientation, and other details. It wouldn’t hurt to try to
memorize at least part of this help page!
Another typical base plot is constructed with the following code.
data(cars)
## Create the plot / draw canvas
with(cars, plot(speed, dist))
## Add annotation
title("Speed vs. Stopping distance")
Base plot with title
Here is an example of a simple histogram made using the
hist() function in the graphics package. If
you run this code and your graphics window is not already open, it
should open once you call the hist() function.
library(datasets)
## Draw a new plot on the screen device
hist(airquality$Ozone)
Ozone levels in New York City
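The histogram can be refined a little with arguments to hist(). The sketch below is one possible variation: the value of breaks is an arbitrary choice (R treats it as a suggestion), and rug() adds a tick mark under the histogram for each observation. hist() also returns an object describing the bins, which is kept here in h.

```r
aq <- datasets::airquality

# Ask for roughly 20 bins and label the plot
h <- hist(aq$Ozone, breaks = 20,
          main = "Ozone levels in New York City", xlab = "Ozone (ppb)")

# Mark the individual observations along the x-axis
rug(aq$Ozone)

# Number of observations that were binned (missing values are dropped)
sum(h$counts)
```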
Boxplots can be made in R using the boxplot() function,
which takes as its first argument a formula. The formula has
the form y-axis ~ x-axis. Anytime you see a ~
in R, it’s a formula. Here, we are plotting ozone levels in New York
by month, and the right hand side of the ~
indicates the month variable. However, we first have to transform the
month variable into a factor before we can pass it to
boxplot(), or else boxplot() will treat the
month variable as continuous.
airquality <- transform(airquality, Month = factor(Month))
boxplot(Ozone ~ Month, airquality, xlab = "Month", ylab = "Ozone (ppb)")
Ozone levels by month in New York City
Each boxplot shows the median and the 25th and 75th percentiles of the data (the “box”). The “whiskers” extend to the most extreme data points that lie within 1.5 times the interquartile range (IQR) of the box. Any data points beyond the whiskers are indicated separately with circles.
In this case the monthly boxplots show some interesting features. First, the levels of ozone tend to be highest in July and August. Second, the variability of ozone is also highest in July and August. This phenomenon is common with environmental data where the mean and the variance are often related to each other.
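The numbers behind the boxes can be inspected directly. The sketch below uses boxplot.stats(), which returns the five values drawn for a box (lower whisker, lower hinge, median, upper hinge, upper whisker) along with any points beyond the whiskers. The hinges are close to, but not always identical to, the 25th and 75th percentiles. It then summarizes the median and standard deviation by month, which is one way to check the two observations above; July is used only as an example.

```r
aq <- datasets::airquality

# Values behind the July box: lower whisker, lower hinge, median,
# upper hinge, upper whisker
july <- boxplot.stats(aq$Ozone[aq$Month == 7])
july$stats
july$out    # points that would be drawn individually (may be empty)

# Median and standard deviation of ozone by month
ozone_median <- tapply(aq$Ozone, aq$Month, median, na.rm = TRUE)
ozone_sd <- tapply(aq$Ozone, aq$Month, sd, na.rm = TRUE)
round(cbind(median = ozone_median, sd = ozone_sd), 1)
```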
Here is a simple scatterplot made with the plot()
function.
with(airquality, plot(Wind, Ozone))
Scatterplot of wind and ozone in New York City
Generally, the plot() function takes two vectors of
numbers: one for the x-axis coordinates and one for the y-axis
coordinates. However, plot() is what’s called a generic
function in R, which means its behavior can change depending on
what kinds of data are passed to the function.
One thing to note here is that although we did not provide labels for the x- and the y-axis, labels were automatically created from the names of the variables (i.e. “Wind” and “Ozone”). This can be useful when you are making plots quickly, but it demands that you have useful descriptive names for your variables and R objects.
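To make the generic behavior of plot() concrete, here is a short sketch that passes four different kinds of objects to the same function. The choice of objects is illustrative; each call is dispatched to a different plot method.

```r
aq <- datasets::airquality

plot(aq$Wind, aq$Ozone)                  # two numeric vectors: scatterplot
plot(factor(aq$Month))                   # a factor: bar chart of counts per level
plot(density(aq$Wind))                   # a "density" object: density curve
plot(aq[, c("Ozone", "Wind", "Temp")])   # a data frame: scatterplot matrix
```

Each call draws a new plot, so on a screen device only the last one remains visible unless you step through them one at a time.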
Many base plotting functions share a set of global parameters. Here are a few key ones:
pch: the plotting symbol (default is open circle)
lty: the line type (default is solid line), can be dashed, dotted, etc.
lwd: the line width, specified as an integer multiple
col: the plotting color, specified as a number, string, or hex code; the colors() function gives you a vector of colors by name
xlab: character string for the x-axis label
ylab: character string for the y-axis label
The par() function is used to specify the global graphics parameters that affect all plots in an R session. These parameters can be overridden when they are specified as arguments to specific plotting functions.
las: the orientation of the axis labels on the plot
bg: the background color
mar: the margin size
oma: the outer margin size (default is 0 for all sides)
mfrow: number of plots per row, column (plots are filled row-wise)
mfcol: number of plots per row, column (plots are filled column-wise)
You can see the default values for global graphics parameters by calling the par() function and passing the name of the parameter in quotes.
par("lty")
## [1] "solid"
par("col")
## [1] "black"
par("pch")
## [1] 1
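These defaults can be overridden for a single call by passing the same parameter names as arguments. The sketch below is illustrative (the symbol, colors and line settings are arbitrary choices) and shows that the session-wide values reported by par() stay unchanged afterwards.

```r
aq <- datasets::airquality

# pch and col are overridden for this call only
plot(aq$Wind, aq$Temp, pch = 17, col = "steelblue", xlab = "Wind", ylab = "Temp")

# lty, lwd and col are overridden for this line only
abline(h = mean(aq$Temp), lty = 2, lwd = 2, col = "#CC0000")

# The session-wide defaults are unchanged
par("pch")
par("lty")
```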
Here are some more default values for global graphics parameters.
par("bg")
## [1] "white"
par("mar")
## [1] 5.1 4.1 4.1 2.1
par("mfrow")
## [1] 1 1
For the most part, you don’t have to modify these when making quick plots. However, you might need to tweak them when finalizing plots.
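When you do change global parameters, a common pattern is to save the old values and restore them afterwards. The following is a minimal sketch, assuming a two-panel layout with slightly tighter margins; the particular values are arbitrary. par() returns the previous values of the parameters it changes, so they can be passed back to par() later.

```r
aq <- datasets::airquality

# Two plots side by side with tighter margins; keep the old settings
old_par <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))

plot(aq$Wind, aq$Ozone, xlab = "Wind", ylab = "Ozone", main = "Ozone and Wind")
plot(aq$Temp, aq$Ozone, xlab = "Temp", ylab = "Ozone", main = "Ozone and Temperature")

# Restore the previous settings so later plots are not affected
par(old_par)
```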
The most basic base plotting function is plot(). The
plot() function makes a scatterplot, or other type of plot
depending on the class of the object being plotted. Calling
plot() will draw a plot on the screen device (and open the
screen device if not already open). After that, annotation functions can
be called to add to the already-made plot.
Some key annotation functions are
lines: add lines to a plot, given a vector of x values and a corresponding vector of y values (or a 2-column matrix); this function just connects the dots
points: add points to a plot
text: add text labels to a plot using specified x, y coordinates
title: add annotations to x, y axis labels, title, subtitle, outer margin
mtext: add arbitrary text to the margins (inner or outer) of the plot
axis: add axis ticks/labels
Here’s an example of creating a base plot and then adding some annotation. First we make the plot with the plot() function and then add a title to the top of the plot with the title() function.
library(datasets)
## Make the initial plot
with(airquality, plot(Wind, Ozone))
## Add a title
title(main = "Ozone and Wind in New York City")
Base plot with annotation
Here, I start with the same plot as above (although I add the title
right away using the main argument to plot())
and then annotate it by coloring blue the data points corresponding to
the month of May.
with(airquality, plot(Wind, Ozone, main = "Ozone and Wind in New York City"))
with(subset(airquality, Month == 5), points(Wind, Ozone, col = "blue"))
Base plot with annotation
The following plot colors the data points for the month of May blue and colors all of the other points red.
Notice that when constructing the initial plot, we use the option
type = "n" in the call to plot(). This is a
common paradigm: with type = "n", plot() draws everything in the plot
(axes, labels, title) except for the data points inside the plot window.
Then you can use annotation functions like points() to add data points. So
here, we create the plot without drawing the data points, then add the
blue points and then add the red points. Finally, we add a legend with
the legend() function explaining the meaning of the
different colors in the plot.
with(airquality, plot(Wind, Ozone, main = "Ozone and Wind in New York City", type = "n"))
with(subset(airquality, Month == 5), points(Wind, Ozone, col = "blue"))
with(subset(airquality, Month != 5), points(Wind, Ozone, col = "red"))
legend("topright", pch = 1, col = c("blue", "red"), legend = c("May", "Other Months"))
Base plot with multiple annotations
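The same picture can also be produced in a single plot() call, because col accepts a vector with one color per point. This is an alternative sketch, not the approach used above. The variable name point_col is introduced here for illustration.

```r
aq <- datasets::airquality

# One color per row: blue for May, red for every other month
point_col <- ifelse(aq$Month == 5, "blue", "red")

plot(aq$Wind, aq$Ozone, col = point_col, xlab = "Wind", ylab = "Ozone",
     main = "Ozone and Wind in New York City")
legend("topright", pch = 1, col = c("blue", "red"), legend = c("May", "Other Months"))
```

The type = "n" approach remains useful when the groups need different annotation functions or have to be drawn in a particular order.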
Plot Titles
Plot titles can be specified either directly to the plotting functions during the plot creation or by using the title() function (to add titles on an existing plot).
# Add titles
barplot(c(2,5), main="Main title",
xlab="X axis title",
ylab="Y axis title",
sub="Sub-title",
col.main="red", col.lab="blue", col.sub="black")
# Increase the size of titles
barplot(c(2,5), main="Main title",
xlab="X axis title",
ylab="Y axis title",
sub="Sub-title",
cex.main=2, cex.lab=1.7, cex.sub=1.2)
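The second route, adding titles to an existing plot with title(), can be sketched as follows. The bar heights and title text mirror the examples above, and the color and size values are arbitrary. barplot() returns the bar midpoints, which are kept in mids only to show that the plot was created first.

```r
# Create the plot without any titles
mids <- barplot(c(2, 5))

# Add the titles afterwards
title(main = "Main title", sub = "Sub-title",
      xlab = "X axis title", ylab = "Y axis title",
      col.main = "red", cex.main = 1.5)
```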
The legend() function can be used to add a legend to a plot. A simplified format is:
legend(x, y=NULL, legend, col)
x and y: the coordinates to be used for the legend. Keywords can also be used for x: “bottomright”, “bottom”, “bottomleft”, “left”, “topleft”, “top”, “topright”, “right” and “center”.
legend: the text of the legend
col: colors of lines and points beside the legend text
# Generate some data
x<-1:10; y1=x*x; y2=2*y1
# First line plot
plot(x, y1, type="b", pch=19, col="red", xlab="x", ylab="y")
# Add a second line
lines(x, y2, pch=18, col="blue", type="b", lty=2)
# Add legends
legend("topleft", legend=c("Line 1", "Line 2"),
col=c("red", "blue"), lty=1:2, cex=0.8)
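As a variation on the example above, the sketch below places the legend with explicit x and y coordinates instead of a keyword and adds pch so that the legend shows the point symbols as well as the line types. It also sets ylim to cover both series, since otherwise the axis range is taken from y1 alone and the upper part of the second line falls outside the plot. The coordinates (1, 200) are an arbitrary choice that puts the legend's top-left corner near the top left of the plot.

```r
# Same data as above
x <- 1:10
y1 <- x^2
y2 <- 2 * y1

# Make room for both series on the y-axis
plot(x, y1, type = "b", pch = 19, col = "red", xlab = "x", ylab = "y",
     ylim = range(y1, y2))
lines(x, y2, type = "b", pch = 18, col = "blue", lty = 2)

# Legend positioned by coordinates, showing both line type and symbol
lg <- legend(x = 1, y = 200, legend = c("Line 1", "Line 2"),
             col = c("red", "blue"), lty = 1:2, pch = c(19, 18), cex = 0.8)
```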
To add text to a plot in R, use the text() function to draw text inside the plotting area and the mtext() function to put text in one of the four margins of the plot.
A simplified format for text() is:
text(x, y, labels)
x and y: the coordinates of the text
labels: vector of text to be drawn
plot(cars[1:10,], pch=19)
text(cars[1:10,], row.names(cars[1:10,]),
cex=0.65, pos=1,col="red")
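mtext() works in a similar way, but positions text by margin side (1 = bottom, 2 = left, 3 = top, 4 = right) and by line within that margin rather than by x and y coordinates. The following is a minimal sketch; the wording of the margin text and the line, font and size values are illustrative choices.

```r
first_ten <- cars[1:10, ]
plot(first_ten, pch = 19)

# Bold text in the top margin, one line away from the plot region
mtext("First 10 rows of cars", side = 3, line = 1, font = 2)

# Smaller text in the right margin
mtext("Source: datasets::cars", side = 4, line = 0.5, cex = 0.8)
```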
The R function abline() can be used to add straight lines (vertical, horizontal or regression lines) to a graph.
A simplified format is:
abline(a=NULL, b=NULL, h=NULL, v=NULL, ...)
a, b: single values specifying the intercept and the slope of the line
h: the y-value(s) for horizontal line(s)
v: the x-value(s) for vertical line(s)
# Add horizontal and vertical lines
#++++++++++++++++++++++++++++++++++
plot(cars, pch=19)
abline(v=15, col="blue") # Add vertical line
# Add horizontal line, change line color, size and type
abline(h=60, col="red", lty=2, lwd=3)
# Fit regression line
#++++++++++++++++++++++++++++++++++
require(stats)
reg<-lm(dist ~ speed, data = cars)
coeff <- coefficients(reg)
# equation of the regression line; %+.1f prints the intercept with an explicit sign
eq <- sprintf("y = %.1f*x %+.1f", coeff[2], coeff[1])
plot(cars, main=eq, pch=18)
abline(reg, col="blue", lwd=2)
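The a and b arguments described above can be used instead of passing the fitted model object. This sketch draws the same regression line from its intercept and slope and adds dotted reference lines at the means of the two variables. A least-squares line always passes through the point of means, so the three lines cross at one point. The names fit and cf are introduced here for illustration.

```r
fit <- lm(dist ~ speed, data = cars)
cf <- coef(fit)

plot(cars, pch = 19)

# Same line as abline(fit), specified by intercept (a) and slope (b)
abline(a = cf[["(Intercept)"]], b = cf[["speed"]], col = "darkgreen", lwd = 2)

# Dotted reference lines at the mean speed and mean stopping distance
abline(v = mean(cars$speed), h = mean(cars$dist), lty = 3)
```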
Paul Murrell (2011). R Graphics, CRC Press.
Hadley Wickham (2009). ggplot2, Springer.
Deepayan Sarkar (2008). Lattice: Multivariate Data Visualization with R, Springer.