R Notes: Scatter plots

Preparation

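Load the built-in mtcars data set and drop any rows with missing values.

# mtcars ships with base R (package datasets), so no extra package or attach() is needed
data = as.data.frame(na.omit(mtcars))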
names(data)

##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
## [11] "carb"

Simple scatter plot

Produce a simple scatter plot.

plot(data$wt, data$mpg, xlab="Car Weight", ylab="Miles Per Gallon", 
     pch=19)

Label each point with its car name.

plot(data$wt, data$mpg, xlab="Car Weight", ylab="Miles Per Gallon", 
     pch=19)
text(data$wt, data$mpg, row.names(data), cex=0.5, pos=4, col="red")
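Labelling every point can crowd the plot. As a minimal sketch of one alternative, the example below labels only a subset of points. The cutoff `wt > 5` and the names `cars` and `heavy` are illustrative assumptions, not part of the original notes.

```r
# Work on a local copy of the built-in data set
cars <- mtcars

# Assumption: "heavy" means wt > 5 (weight is recorded in 1000 lbs); the cutoff is arbitrary
heavy <- cars$wt > 5

plot(cars$wt, cars$mpg, xlab="Car Weight", ylab="Miles Per Gallon",
     pch=19)

# Label only the selected points; pos=2 places the text to the left of each point
text(cars$wt[heavy], cars$mpg[heavy], row.names(cars)[heavy],
     cex=0.7, pos=2, col="red")
```

Any logical vector of the same length as the data can be used in place of `heavy`, for example one that flags the highest-mpg cars.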

Add fit lines.

plot(data$wt, data$mpg, xlab="Car Weight", ylab="Miles Per Gallon", 
     pch=19)
abline(lm(mpg ~ wt, data=data), col="red")  # regression line (y~x) 
lines(lowess(data$wt, data$mpg), col="blue")  # lowess line (x,y)
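The two fit lines above can be told apart more easily with a legend, and storing the model first makes its coefficients available for inspection. The following is a sketch under those assumptions; the object name `fit` and the legend labels are illustrative.

```r
# Fit the linear model once and keep it, so it can be inspected and plotted
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))  # intercept and slope of the regression line

plot(mpg ~ wt, data = mtcars, xlab="Car Weight", ylab="Miles Per Gallon",
     pch=19)
abline(fit, col="red")                              # straight regression line
lines(lowess(mtcars$wt, mtcars$mpg), col="blue")    # locally weighted smoother

# Identify the two lines
legend("topright", legend=c("Linear fit", "Lowess"),
       col=c("red", "blue"), lty=1)
```

Comparing the two lines gives a quick visual check of whether a straight line is a reasonable summary of the relationship.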

Scatterplot matrices

Produce a basic scatterplot matrix.

pairs(~mpg + disp + drat + wt, data=data)

Produce a scatterplot matrix using the lattice package.

The lattice package provides options to condition the scatterplot matrix on a factor.

library(lattice)

super.sym = trellis.par.get("superpose.symbol")
splom(data[c(1,3,5,6)], groups=data$cyl, 
      panel=panel.superpose, 
      key=list(columns=3,
               points=list(pch=super.sym$pch[1:3], col=super.sym$col[1:3]),
               text=list(c("4 Cylinder","6 Cylinder","8 Cylinder"))))
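The key above is built by hand, so its labels must be kept in the same order as the groups. As a sketch of an alternative, the grouping variable can be turned into a labelled factor and the legend left to `auto.key`. The names `dta`, `cyl_f`, and `p` are illustrative assumptions.

```r
library(lattice)

# Variables to plot, selected by name rather than by column position
dta <- mtcars[c("mpg", "disp", "drat", "wt")]

# Labelled factor: the legend text is taken from these levels
cyl_f <- factor(mtcars$cyl, levels=c(4, 6, 8),
                labels=c("4 Cylinder", "6 Cylinder", "8 Cylinder"))

# auto.key builds a legend that matches the group symbols and colors
p <- splom(~dta, groups=cyl_f, auto.key=list(columns=3))
print(p)  # lattice plots are objects; print() draws them
```

The explicit `print()` matters inside functions and scripts, where lattice objects are not drawn automatically.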

Produce a scatterplot matrix using the gclus package.

The gclus package provides options to rearrange the variables so that those with higher correlations are closer to the principal diagonal. It can also color code the cells to reflect the size of the correlations.

library(gclus)

dta = data[c(1,3,5,6)]  # get data 
dta.r = abs(cor(dta))  # get correlations
dta.col = dmat.color(dta.r)  # get colors
# reorder variables so those with highest correlation are closest to the
# diagonal
dta.o = order.single(dta.r)
cpairs(dta, dta.o, panel.colors=dta.col, gap=0.5)
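To see which variable pairs drive the reordering, the absolute correlations can be listed directly. This is a base R sketch that does not use gclus and does not reproduce its ordering algorithm; the names `vars`, `r`, `idx`, and `ranked` are illustrative.

```r
# Same four variables as in the scatterplot matrix above
vars <- mtcars[c("mpg", "disp", "drat", "wt")]
r <- abs(cor(vars))  # absolute correlations, as passed to the gclus functions

# Row/column positions of the upper triangle: each pair appears once
idx <- which(upper.tri(r), arr.ind=TRUE)

ranked <- data.frame(var1=rownames(r)[idx[, "row"]],
                     var2=colnames(r)[idx[, "col"]],
                     abs_cor=r[idx])

# Strongest pairs first
ranked <- ranked[order(-ranked$abs_cor), ]
print(ranked)
```

Pairs near the top of this table are the ones expected to end up closest to the diagonal in the reordered matrix.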