Commonly used packages are MASS and ggplot2; in 3142 we also use ISLR.
Base R plot examples
Bar plot Example:
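A minimal sketch of a bar plot, assuming the built-in mtcars data set as a stand-in (the data set and labels are illustrative choices, not taken from these notes). barplot() takes a vector or table of heights, so the counts are computed first with table():

```r
# Illustrative data: number of cars per cylinder count in the built-in mtcars data
cyl_counts <- table(mtcars$cyl)

# One bar per cylinder count; the bar height is the number of cars
barplot(cyl_counts, col = "steelblue",
        xlab = "Number of cylinders", ylab = "Number of cars",
        main = "Cars by cylinder count")
```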
Line Graph Example:
obs_line <- c(rnorm(500, 3, 1), rnorm(500,3,3))+(1:1000)/100
plot(1:length(obs_line),obs_line,"l", xlab = "Time", ylab = "Value")
Scatter plot Example:
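A minimal sketch of a scatter plot, again assuming the built-in mtcars data as illustrative input. Passing two numeric vectors to plot() draws points by default (type = "p"):

```r
# Illustrative data: car weight and fuel efficiency from the built-in mtcars data
car_weight <- mtcars$wt
car_mpg <- mtcars$mpg

# pch = 19 draws solid circles
plot(car_weight, car_mpg, pch = 19, col = "darkgreen",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel efficiency against weight")
```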
Boxplot Example:
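A minimal sketch of a boxplot, assuming the built-in mtcars data as illustrative input. The formula interface (response ~ group) draws one box per group:

```r
# One box of fuel efficiency (mpg) for each cylinder count (cyl)
# boxplot() also returns the summary statistics it drew, which we keep in bp
bp <- boxplot(mpg ~ cyl, data = mtcars, col = "lightblue",
              xlab = "Number of cylinders", ylab = "Miles per gallon",
              main = "Fuel efficiency by cylinder count")
```

The returned bp$stats matrix holds the five values drawn for each box (lower whisker, lower hinge, median, upper hinge, upper whisker).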
Further Line Graph Example:
plot(longley$Year, longley$Armed.Forces, type="l", lwd = 2, main="Armed Forces and Unemployment (1947-1962)", sub="Exploratory analysis",
xlab="Year", ylab="Number", ylim=c(0,600))
# points() with type="l" draws a line as plot() would, but adds it to the existing plot instead of starting a new one
points(longley$Year, longley$Unemployed, type="l", lwd = 2, col="red")
legend('bottomright', legend=c("Armed Forces","Unemployed"), col=c("black","red"), lty=c(1,1))
Curve Function Example:
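A minimal sketch of curve(), which plots an expression written in terms of x over an interval. The functions sin and cos are chosen purely for illustration:

```r
# Plot sin(x) on [0, 2*pi], evaluated at 200 points
# curve() invisibly returns the x and y values it plotted
res <- curve(sin(x), from = 0, to = 2 * pi, n = 200, lwd = 2, col = "blue",
             xlab = "x", ylab = "f(x)")

# add = TRUE overlays a second curve on the same axes
curve(cos(x), add = TRUE, lwd = 2, col = "red", lty = 2)
```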
Advanced curve example:
set.seed(1)
hist(rnorm(10000, -3, 1), breaks = 50, probability = TRUE, col = "lightblue", xlab = "", xlim = c(-8, 8), main = "Two empirical distributions")
hist(rnorm(10000, 3, 1), breaks = 50, probability = TRUE, col = "blue", add = TRUE) # add = TRUE draws on top of the existing plot
curve(dnorm(x, -3, 1), lwd = 3, add = TRUE, col = "red")
curve(dnorm(x, 3, 1), lwd = 3, add = TRUE, col = "red")
legend("topright", ncol = 1, cex = 1.5, legend=c("Sample 1", "Sample 2"), fill=c("lightblue", "blue"))
Graphical windows example
# Split the window into 3 rows and 2 columns of plots (filled column by column)
par(mfcol=c(3,2))
# First plot
plot(1:3,1:3,col="red",type="l", xlab= "", ylab= "")
plot(1:5,1:5,col="orange",type="p", xlab= "", ylab= "")
plot(1:7,1:7,col="blue",type="b", xlab= "", ylab= "")
plot(ts(rnorm(100)),col="purple", xlab= "", ylab= "")
curve(sqrt(x^2 - x^4), from = -1, to = 1, lwd = 2, xlab= "", ylab= "")
hist(rexp(100,1/2), main = "", xlab= "", ylab= "")
You can add straight lines to an existing plot with the abline() function. A line can be given in the form \(y=a+bx\) (intercept a, slope b), or as a horizontal (h) or vertical (v) line at a given position:
plot(0,0,"n", xlab="", ylab="")
abline(h=0.5, v=0, col = "purple", lwd = 4)
abline(a=1, b=1, col = "orange", lwd = 4)
Lines can also be added with segments(), which joins given pairs of points, or with lines(), which joins a sequence of points:
# create an empty plot
plot(0,0,"n", xlab="", ylab="")
# add a segment joining two points
segments(x0=-0.5, y0=0, x1=1, y1=-0.5, col = "red", lwd = 4)
# add two segments, joining three points: first argument is the 'x' values, second is the 'y' values
lines(c(-1,0,1), c(-1,0.2,1), col = "blue", lwd = 4)
Contour plots can be used to represent three-dimensional data, like a topographical map. The contour() function takes an x vector, a y vector and a matrix whose elements correspond to the z value for each pair of (x, y) coordinates.
x = seq(-pi, pi, length=50)
y = x
f = outer(x,y, function(x,y) cos(y)/(1+x^2))
contour(x,y,f)
contour(x,y,f,nlevels=45,add = TRUE)
The pairs() function creates a scatterplot matrix, with one scatterplot for every pair of variables in a data frame. It can also be restricted to a subset of the variables.
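A minimal sketch of both uses of pairs(), assuming the built-in iris data as a stand-in (the data set is an illustrative choice, not the one used in these notes):

```r
# Keep only the four numeric measurement columns of the built-in iris data
num_vars <- iris[, 1:4]

# Scatterplot matrix of all four variables, coloured by species
pairs(num_vars, col = as.integer(iris$Species), main = "All measurements")

# Scatterplot matrix of a subset of variables, chosen with a one-sided formula
pairs(~ Sepal.Length + Petal.Length + Petal.Width, data = iris,
      main = "Subset of measurements")
```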
ggplot2 Graphs
ggplot2 is a layering system: each layer of visualisation is added on top of the previous one.
* The data needs to be in a data frame.
* Put aes() in the initial ggplot() call if the mapping is the same for all layers.
* aes stands for aesthetic; it tells R which variables to map to what we see on the graph.
Examples:
data <- as.data.frame(Seatbelts)
ggplot(data) +
geom_histogram(aes(drivers), bins = 10) +
labs(x = "monthly car drivers killed or seriously injured",
title = "Histogram of UK - Jan 1969 to Dec 1984")
data$law <- factor(data$law, labels = c("0", "1"))
ggplot(data) +
geom_boxplot(aes(x = law, y = drivers)) +
labs(x = "Mandatory Seatbelts: NO(0) or YES(1)",
y = "Deaths & Serious Injuries")
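The point about placing aes() in the initial ggplot() call can be sketched as follows, assuming ggplot2 is installed. The choice of the kms and drivers columns of Seatbelts, and the names seatbelts_df and p, are illustrative. Both layers inherit the same x and y mapping, so it is written only once:

```r
library(ggplot2)

# Seatbelts is a multivariate time series, so convert it to a data frame first
seatbelts_df <- as.data.frame(Seatbelts)

# aes() in ggplot() is shared by every layer added afterwards
p <- ggplot(seatbelts_df, aes(x = kms, y = drivers)) +
  geom_point() +                          # layer 1: raw monthly observations
  geom_smooth(method = "lm", se = FALSE)  # layer 2: fitted straight line
print(p)
```

If a layer needs a different mapping, aes() can instead be given inside that geom, as in the histogram and boxplot examples above.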
Stat functions can be used with ggplot as well; stat_function() plots a given function over the range of x:
ggplot(data = data.frame(x = c(-3, 3)), aes(x)) +
stat_function(fun = dnorm, n = 100, args = list(mean = 0, sd = 1))
ggplot(midwest, aes(x=area, y=poptotal)) +
geom_point(aes(col=state, size=popdensity)) +
xlim(c(0, 0.1)) +
ylim(c(0, 500000)) +
theme(text = element_text(size=20)) # increase the size of all text, including axis labels
## Warning: Removed 15 rows containing missing values (geom_point).