Introduction

This is a R markdown document, to show you some of the basics to develop graphics in R. If you want the data set or more information, look at my Githubpage.

Packages

library(car)
library(knitr)
library(foreign)
library(gplots)
library(psych)

Reading the data

We use a data set from the Department of Psychology of the Univerity of Colorado about Interests of people. You can get it also here. It is a spss data, so we need the package foreign to read the data set and to convert it to a data frame.

interest <- as.data.frame(read.spss("/home/matti/R-for-Beginners/Datasets/interest.sav"))
#file.choose() a good help

We have to change the label of the gender, because the plots will look better with it:

interest$gender  <- recode(interest$gender, '1="female" ; 2="male" ')

Graphs for categorial variables

In this chapter we look for nominally and ordinal scaled variables.

Barplot with absolute frequency

We will create a barplot for the variable “education”. Befor that we use the table() function to build a vector:

educ<-  table(interest$educ)

barplot(educ)

Barplot with relative frequencys

Relative frequencys show which percentage have some specific education. Just the y axis ist changing.

barplot(prop.table(educ))

barplot(prop.table(educ), horiz = T ,las=1)

To connect two variables, here education and gender:

barplot(table(interest$educ, interest$gender), legend=T)

barplot(table(interest$educ, interest$gender), legend=T,beside=T)

Pie chart

What is about some delicious pie?

pie(table(interest$educ))

Graphs for metric variables

We build a barplot with the geometry variable:

barplot(interest$geometry)

Now a histogram:

hist(interest$geometry)

Now a histogram with 10 breaks and a suggested normal distribution:

mean  <-  mean(interest$geometry, na.rm=T)
sd  <-  sd(interest$geometry, na.rm = T)
hist(interest$geometry, breaks=10)
curve(dnorm(x, mean, sd)*length(interest$geometry), add=T)

Sometimes it is difficult to interpretate a histogram, because it depends on the breaks. So we build a density plot with a continous line:

plot(density(interest$geometry, na.rm = T))
curve(dnorm(x, mean, sd), add=T, lty=2)

Boxplots

Boxplots are important for explorative data analysis and show characteristics of the distribution. Here we use the age:

boxplot(interest$age)

You can also create some Boxplots together:

boxplot(data.frame(interest$vocab, interest$reading, interest$sentcomp))

To show some Boxplots with informations about some groups (e.g. for women and men) you can use the ~ sign. With the identify() you can click on the Boxplots, for example to get to know extreme values.

boxplot(interest$age ~  interest$gender)

#identify(interest$gender, interest$age)

Barplot for means

We want to visualize the mean of education seperated for men and women. To do that we integrate the tapply() function in barplot():

barplot(tapply(interest$educ, interest$gender, mean, na.rm=T))

To integrate more means of variables we also use the tapply function:

barplot(tapply(interest$educ, data.frame(interest$gender, interest$age), mean, na.rm=T, na.omit=T), legend=T, xlab="age", ylab="education")

#for easier interpretations add: beside=TRUE

Error bars

To build an error bar, you can use the package gplots and the plotmeans() function. This function build confidence intervals and with p=0.95 we set it for the 95% confidence interval:

plotmeans(interest$educ~ interest$gender, p=0.95)

Cats-eye plot

If you want to compare the confidence intervalss of some variables use the error.bars() function of the psych package. Here we compare geometry and reading:

geo.read <- na.omit(data.frame(interest$geometry, interest$reading))
error.bars(geo.read, within=F,alpha=0.95, eyes=T)

To compare means of several groups use the error.bars.by() function. But you have to build data.frames:

educ <- as.data.frame(na.omit(interest$educ))
gender  <- as.data.frame(na.omit(interest$gender))
error.bars.by(educ, gender, by.var=T, ylim=c(11,13), legend=1, eyes=T, alpha=0.95)

Create own Error bars

To create your own error bars use the arrow() function and build arrows around the means of the barplot with a distance of the standard deviation. We do this for the first block in the barplot:

des.read  <-  describeBy(interest$reading,interest$gender, mat=T)
plot.read  <- barplot(des.read$mean, names.arg = des.read$group1)
arrows(plot.read[1], des.read$mean[1]-des.read$mean[1], plot.read[1], des.read$mean[1]+des.read$mean[1], code=3, angle=90)

To do this for the second or maybe more blocks in the barplot define the i in the following command. It allows you to get all columns in the table of des.read:

plot.read  <- barplot(des.read$mean, names.arg = des.read$group1)
for(i in 1:nrow(des.read)){
  arrows(plot.read[i], des.read$mean[i]-des.read$mean[i], 
         plot.read[i], des.read$mean[i]+des.read$mean[i], 
         code=3, angle=90)
}

Q-Q-Plot

When we are interested in a normal distribution histograms can`t help us much. Better are qqplots. The nearer the points are at the line the better is the normal distribution. qqnorm() is in the car package:

qqnorm(interest$reading)

qqPlot() is in the car package and shows you the confidence interval. If the point lays near the interval it is normal distributed:

qqPlot(interest$reading)

Scatterplot

To get the connection between two variables use scatterplots with two vectors:

plot(interest$reading, interest$geometry)

Adding a Lowess-curve

A Lowess curve is a nonparametric curve that rebuilds the connection between two variables, but we need varaibles without missing values:

read.geo <- na.omit(data.frame(interest$reading, interest$geometry))
plot(interest$reading, interest$geometry)
lines(lowess(read.geo[,1],read.geo[,2]))

Adding a regression line

To get a regression line use the abline() function combinated with a lm(). With identify() you can again search for extreme values:

plot(interest$reading, interest$geometry)
abline(lm(interest$reading~interest$geometry))

The plot() function

The output of the plot() function changes with the input. Here a table to visualise it:

plot-function Diagram
plot(vector) Index-diagram
plot(factor) Histogram
plot(vector,vector) scatterplot
plot(factor, vector) boxplot
plot(factor, factor) mosaic-diagram

Edit graphs

To change the standard setting use the par() function to visit the possibilities for changing something:

par()
help(par)

You can add more things in a plot() function. Look here:

Adding Change
pch=x type of points in a scatterplot
ces=x size of points or lettering
lty=x type of line
lwd=x size of the line
las=1 numbers on the y axis are shown horizontal
xlab=“x”, ylab=“x” the names for the axes
xlim=c(min, max), ylim=c(min,max) the scale of the axes
main=“text” heading
sub=“text” subheading
col=x the color (colors() gives a full list)

Adding elements

To add something, write it directly under the plot() function, than it will be added to this graph:

Command Element
abline() straight line
arrows() arrows
axis() lettering axes
curve(….,add=TRUE) functioncurves
legend() legend
lines() lines
points() points
symbols() symbols
text() text

To get more informations use the help(abline) function.