By the end of this lesson, you will be able to:
R has powerful plotting tools built directly into the base
language.
Today we will learn how to create two very common types of plots: scatterplots and line plots.
These skills form the foundation for data visualization in R.
High-level plotting functions start a new graph in the plotting window. plot() is a generic function: the method it actually runs depends on the class of the input data.
Examples include:
- plot()
- hist()
- boxplot()
- barplot()

Arguments can be passed to high-level functions to control appearance.
Common options include:
- xlab, ylab – Specify labels for the x and y axes
- main – Sets the main title of the plot
- type – Character indicating the plot type: "p" for points, "l" for lines, "b" for both
- col – Specifies colors
- pch – Defines the plotting symbol (e.g., 19 for filled circles)
- lwd – Controls line width
- lty – Specifies line type (e.g., 1 = solid, 2 = dashed)
- xlim, ylim – Define the limits for the x and y axes

Low-level plotting functions add features to the current plot without starting a new one:
- points() – Adds points
- lines() – Adds connected line segments
- abline() – Adds a straight line; use abline(v = value) for vertical lines and abline(h = value) for horizontal lines
- legend() – Adds a legend
- title() – Adds a main title, subtitle, xlab, and ylab after the plot has been created

par()
The par() function sets or queries graphical parameters. Settings changed with par() apply to all subsequent plots on the current graphics device until they are changed again.
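The sketch below combines several of the low-level functions above with par(), using the built-in mtcars dataset. The two-panel layout, the reference lines (mean mpg and wt = 3), and the highlighting of manual-transmission cars (am == 1) are illustrative choices, not part of the lesson itself.

```r
# par() returns the old values of anything it changes, so save them
old_par <- par(mfrow = c(1, 2))   # 1 row, 2 columns of plots

# Left panel: a high-level plot, then low-level additions
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per Gallon")
abline(h = mean(mtcars$mpg), lty = 2)   # horizontal line at the mean mpg
abline(v = 3, col = "gray")             # vertical line at wt = 3 (illustrative)
title(main = "Weight")                  # title added after the plot exists

# Right panel: overlay manual-transmission cars (am == 1) as filled red points
plot(mtcars$hp, mtcars$mpg, xlab = "Horsepower", ylab = "Miles per Gallon")
manual <- mtcars$am == 1
points(mtcars$hp[manual], mtcars$mpg[manual], pch = 19, col = "red")
legend("topright", legend = c("all cars", "manual (am = 1)"),
       pch = c(1, 19), col = c("black", "red"))
title(main = "Horsepower")

par(old_par)   # restore the original single-plot layout
```

Saving and restoring the old settings keeps the two-panel layout from leaking into later plots on the same device.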
We will begin by using a built-in dataset called
mtcars.
Each row represents a car model.
Each column represents a variable such as miles per gallon, horsepower,
or weight.
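As a quick sketch of the other high-level functions listed earlier, here they are applied to mtcars. The choice of variables (mpg and cyl) is only for illustration.

```r
dim(mtcars)   # 32 rows (car models), 11 columns (variables)

# hist(): distribution of one numeric variable
h <- hist(mtcars$mpg, xlab = "Miles per Gallon", main = "Histogram of mpg")

# boxplot(): compare a numeric variable across groups (formula y ~ group)
boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per Gallon")

# barplot(): bar heights from counts; table() counts cars per cylinder group
cyl_counts <- table(mtcars$cyl)   # 4: 11 cars, 6: 7 cars, 8: 14 cars
barplot(cyl_counts, xlab = "Cylinders", ylab = "Number of cars")
```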
df1 = read.csv("data-2_13_2.csv")
head(df1)
## first_name age gpa favorite_pet voted birthdate
## 1 Zachary 20 NA hamster 1 4/17/2004
## 2 Diane 18 2.1 dog 1 9/17/2006
## 3 Natasha 22 NA cat 0 9/13/2002
## 4 Wendy 19 2.9 parrot 0 7/1/2005
## 5 Ray 22 3.8 rabbit 0 5/6/2002
## 6 Kari 20 3.9 rabbit 0 9/20/2004
plot(df1$age, df1$voted)
In base R, plot(x, y) places:
- the first variable on the x-axis
- the second variable on the y-axis
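plot() also accepts a formula, which reverses the order: the response (y) comes first. A minimal sketch showing that both calls below draw the same scatterplot, apart from the default axis labels ("mtcars$wt" versus "wt"):

```r
# plot(x, y): x first, y second
plot(mtcars$wt, mtcars$mpg)

# plot(y ~ x, data = ...): y first, read as "mpg as a function of wt"
plot(mpg ~ wt, data = mtcars)
```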
plot(mtcars$wt, mtcars$mpg)
plot(
mtcars$wt,
mtcars$mpg,
xlab = "Weight (1000 lbs)",
ylab = "Miles per Gallon",
main = "Fuel Efficiency vs Car Weight"
)
Useful options:
- pch → point shape
- col → color
- cex → point size (relative to the default of 1)
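To choose a pch value, it helps to see all of the standard symbols at once. This sketch draws pch = 0 to 25 with each code printed underneath; for symbols 21 to 25, col sets the border and bg sets the fill. The colors are arbitrary.

```r
codes <- 0:25   # the 26 standard plotting symbols

plot(codes, rep(1, 26), pch = codes, cex = 2, col = "blue", bg = "yellow",
     ylim = c(0.5, 1.5), xlab = "pch value", ylab = "", yaxt = "n",
     main = "Plotting symbols (pch)")
text(codes, rep(0.8, 26), labels = codes, cex = 0.8)   # label each symbol
```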
plot(
mtcars$wt,
mtcars$mpg,
pch = 19,
col = "blue",
cex = 1.2,
xlab = "Weight (1000 lbs)",
ylab = "Miles per Gallon",
main = "Fuel Efficiency vs Car Weight"
)
Make a scatterplot with:
1. Horsepower (hp) on the x-axis
2. Miles per gallon (mpg) on the y-axis
3. A different color and point shape
plot(
mtcars$hp,
mtcars$mpg,
pch = 11,
col = "cyan",
cex = 1.2,
xlab = "Horsepower (hp)",
ylab = "Miles per Gallon (mpg)",
main = "Fuel Efficiency vs Horsepower"
)
Line plots connect points in the order they appear in the data.
They are best for:
- Time series
- Ordered data
- Showing trends
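Because type = "l" joins points in the order they appear, unsorted x values produce a zigzag. A small sketch with hypothetical data (y = x squared, recorded out of order) shows the problem and the fix with order():

```r
# Hypothetical data recorded out of order
x <- c(3, 1, 5, 2, 4)
y <- c(9, 1, 25, 4, 16)

# Lines follow the data order, so the line jumps back and forth
plot(x, y, type = "l", main = "Unsorted: lines zigzag")

# Sort both vectors by x before drawing the line
ord <- order(x)
plot(x[ord], y[ord], type = "l", main = "Sorted by x")
```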
year <- c(2000, 2005, 2010, 2015, 2020)
population <- c(282, 295, 309, 321, 331)
plot(year, population, type = "b")   # "b" = both points and lines
plot(year, population, type = "l")   # "l" = lines only
points(year, population)             # add the points on top of the line
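To compare plot types side by side, this sketch draws the same data with four values of type in a 2 × 2 grid. "o" (overplotted: points and lines drawn over each other) is an extra option beyond the three listed earlier.

```r
year <- c(2000, 2005, 2010, 2015, 2020)
population <- c(282, 295, 309, 321, 331)

old_par <- par(mfrow = c(2, 2))   # 2 rows, 2 columns of plots
for (plot_type in c("p", "l", "b", "o")) {
  plot(year, population, type = plot_type,
       main = paste0('type = "', plot_type, '"'))
}
par(old_par)                      # restore the previous layout
```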
plot(
year,
population,
type = "l",
lwd = 2,
col = "darkgreen",
xlab = "Year",
ylab = "Population (millions)",
main = "Population Growth Over Time"
)
plot(
year,
population,
type = "b",
pch = 19,
col = "purple",
lwd = 2,
xlab = "Year",
ylab = "Population (millions)",
main = "Population Growth Over Time"
)
Use scatterplots when:
Use line plots when:
Create your own line plot using the example data below.
These data represent test scores over five weeks.
Make a line plot that:
week <- c(1, 2, 3, 4, 5)
score <- c(70, 75, 78, 85, 90)
plot(
week,
score,
type = "b",
pch = 19,
col = "blue",
lwd = 2,
xlab = "Week",
ylab = "Score",
main = "Grades Over Time"
)
abline(lm(score ~ week), col = "red", lty = 2, lwd = 2)   # least-squares trend line
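A trend line drawn with hand-picked values of a and b may not match the data. A sketch of the alternative: fit a least-squares line with lm(), inspect its intercept and slope with coef(), and pass the fitted model straight to abline().

```r
week <- c(1, 2, 3, 4, 5)
score <- c(70, 75, 78, 85, 90)

fit <- lm(score ~ week)   # score = intercept + slope * week
coef(fit)                 # intercept 64.6, slope 5

plot(week, score, type = "b", pch = 19, col = "blue", lwd = 2,
     xlab = "Week", ylab = "Score", main = "Grades Over Time")
abline(fit, col = "red", lty = 2, lwd = 2)   # abline() reads the coefficients from fit
```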
For this homework, you will create:
You will also use at least ONE new base R plotting feature that we did not practice in class.
Choose at least ONE of the following new features (you can also find a different one if you’d like):
- abline() → adds a trend line
- grid() → adds background grid lines
- legend() → adds a legend to your plot

(You only need to choose ONE, but you may use more if you’d like.)
Use the mtcars dataset.
Create a scatterplot that:
- Uses at least one new feature (abline(), grid(), or legend())

plot(
mtcars$wt,
mtcars$mpg,
pch = 12,
col = "red",
cex = 1.2,
xlab = "Weight (1000 lbs)",
ylab = "Miles per Gallon (mpg)",
main = "Fuel Efficiency vs Car Weight"
)
grid()
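Calling grid() after plot() draws the grid lines on top of the points. One way to put the grid behind the points is plot()'s panel.first argument, which is evaluated after the axes are set up but before the points are drawn. A sketch:

```r
plot(mtcars$wt, mtcars$mpg,
     pch = 12, col = "red", cex = 1.2,
     xlab = "Weight (1000 lbs)", ylab = "Miles per Gallon (mpg)",
     main = "Fuel Efficiency vs Car Weight",
     panel.first = grid())   # grid drawn first, so the points sit on top
```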
Create your own small dataset (at least 5 points). Create a line plot that:
- Shows both points and lines (type = "b")
places <- c(1, 2, 3, 4, 5)
miles <- c(50, 55, 78, 82, 90)
plot(
places,
miles,
type = "b",
pch = 19,
col = "blue",
lwd = 2,
xlab = "Places",
ylab = "Miles",
main = "Distances By Routes"
)
abline(lm(miles ~ places), col = "red", lty = 2, lwd = 2)   # least-squares trend line
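When a plot shows both the data and a trend line, a legend tells the reader which is which. This sketch adds one with legend(); the legend text and the "topleft" position are illustrative choices. In pch, NA means "no symbol", so the trend-line entry shows only a dashed line.

```r
places <- c(1, 2, 3, 4, 5)
miles <- c(50, 55, 78, 82, 90)

plot(places, miles, type = "b", pch = 19, col = "blue", lwd = 2,
     xlab = "Places", ylab = "Miles", main = "Distances By Routes")

fit <- lm(miles ~ places)   # intercept 38.9, slope 10.7
abline(fit, col = "red", lty = 2, lwd = 2)

legend("topleft",
       legend = c("Observed miles", "Least-squares trend"),
       col = c("blue", "red"), lty = c(1, 2), lwd = 2, pch = c(19, NA))
```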