Histogram in R using base graphics

First read data from clipboard into an object in R:

# examscores=read.delim("clipboard")
# previous line commented out for demo

Or read data from a .csv file:

examscores=read.csv("P:\\Class\\KEEP DA\\admin\\datasets-masterfile\\examscoresDOS.csv")
# examscores=read.csv(file.choose())
# if you use the previous line instead, it opens a file-browsing dialog box so you can pick the .csv file
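
If you want to check that read.csv() gives you what you expect before pointing it at the real file, here is a minimal sketch. It writes a tiny stand-in file to a temporary location and reads it back. The three scores, the temporary file, and the name demo_scores are made up for illustration; only the column name scores matches the data used in this document.

```r
# Stand-in data: three made-up scores, NOT the real exam data
demo_file <- tempfile(fileext = ".csv")
write.csv(data.frame(scores = c(72, 85, 91)), demo_file, row.names = FALSE)

# Read it back the same way as the real file
demo_scores <- read.csv(demo_file)

str(demo_scores)     # structure: one column named scores
nrow(demo_scores)    # number of rows read

unlink(demo_file)    # remove the temporary file
```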

If you want to see the data you just loaded, type the object name, examscores, at an R prompt:

# examscores
# previous line commented out, to avoid 54 lines of numbers in this document when printed!
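
If you only want a quick look rather than every row, head(), str(), and summary() are usually enough. The sketch below uses a stand-in data frame (examscores_demo, a made-up name) built by repeating each bin midpoint by its bin count from the hist() output in this document, so it has the right number of rows but NOT the real individual scores.

```r
# Stand-in data: bin midpoints repeated by their counts (not the real scores)
examscores_demo <- data.frame(
  scores = rep(c(25, 45, 55, 65, 75, 85, 95),
               times = c(1, 1, 2, 7, 20, 18, 5))
)

head(examscores_demo)             # first six rows only
str(examscores_demo)              # column names, types, and row count
summary(examscores_demo$scores)   # min, quartiles, mean, max
```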

(Full disclosure: the remainder of this document is cribbed from: http://www.r-bloggers.com/basics-of-histograms/)

To use base graphics to make a histogram of these data, type:

# hist(examscores$scores)
# previous line commented out, because we're going to do it again, with something added. Read on.

If you want to see how the hist() function binned and counted your data, you can save the histogram as an object in R and then look at the contents of that object:

histinfo=hist(examscores$scores)

histinfo

## $breaks
## [1]  20  30  40  50  60  70  80  90 100
## 
## $counts
## [1]  1  0  1  2  7 20 18  5
## 
## $density
## [1] 0.001852 0.000000 0.001852 0.003704 0.012963 0.037037 0.033333 0.009259
## 
## $mids
## [1] 25 35 45 55 65 75 85 95
## 
## $xname
## [1] "examscores$scores"
## 
## $equidist
## [1] TRUE
## 
## attr(,"class")
## [1] "histogram"
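
The pieces of that object can be pulled out with $, which also makes it easy to see where the density values come from. The sketch below is an illustration under assumptions: demo_scores is stand-in data rebuilt from the midpoints and counts printed above (not the real scores), and plot=FALSE is used so that nothing is drawn.

```r
# Stand-in data rebuilt from the printed $mids and $counts (not the real scores)
mids   <- c(25, 35, 45, 55, 65, 75, 85, 95)
counts <- c(1, 0, 1, 2, 7, 20, 18, 5)
demo_scores <- rep(mids, times = counts)

# plot = FALSE returns the histogram object without drawing it
h <- hist(demo_scores, breaks = seq(20, 100, by = 10), plot = FALSE)

h$counts          # scores per bin
sum(h$counts)     # total number of scores

# Each density is the bin's count divided by (total count * bin width)
bin_width <- diff(h$breaks)
all.equal(h$density, h$counts / (sum(h$counts) * bin_width))
```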

These data are already binned the way we want them (bin width = 10, first bin starting at 20), with the vertical axis displaying the count within each bin rather than the probability density. If we wanted to change either of these, we’d set the breaks and freq arguments (described in the help file for the hist() function, which you can open with ?hist):

hist(examscores$scores, breaks=c(15,25,35,45,55,65,75,85,95,105), freq=FALSE)
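
Two things are worth checking when you set breaks and freq=FALSE yourself: the bar areas should add up to 1, and the breaks have to cover the whole range of the data or hist() stops with an error. Here is a rough sketch of both checks, again using stand-in data (demo_scores) rather than the real scores; the too-narrow breaks are chosen on purpose to trigger the error.

```r
# Stand-in data: bin midpoints repeated by their counts (not the real scores)
demo_scores <- rep(c(25, 45, 55, 65, 75, 85, 95),
                   times = c(1, 1, 2, 7, 20, 18, 5))

# Same shifted breaks as above, written with seq(); nothing is drawn
shifted <- hist(demo_scores, breaks = seq(15, 105, by = 10), plot = FALSE)
shifted$counts

# With density on the vertical axis, the total bar area is 1
total_area <- sum(shifted$density * diff(shifted$breaks))
total_area

# Breaks that start above the smallest score make hist() stop with an error;
# tryCatch() captures the message instead of halting the script
too_narrow <- tryCatch(
  hist(demo_scores, breaks = seq(40, 100, by = 10), plot = FALSE),
  error = function(e) conditionMessage(e)
)
too_narrow
```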

You can also specify breaks using a function like seq(), which can create a sequence of equally-spaced values. Let’s use that to return the histogram to bin start = 20 (and also set the type of vertical axis back to frequency count):

hist(examscores$scores, breaks=seq(20,100,by=10), freq=TRUE)
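
To see what seq() is handing to hist(), you can run it on its own. The second half of this sketch shows one possible way to build breaks from the data instead of typing the end points; the five scores and the names bin_width and auto_breaks are made up for illustration.

```r
# The break points used above
seq(20, 100, by = 10)

# The same values, specified by how many break points you want
seq(20, 100, length.out = 9)

# One way to derive breaks from the data: round the minimum down and the
# maximum up to the nearest multiple of the bin width
demo_scores <- c(23, 58, 71, 88, 96)   # made-up scores
bin_width <- 10
auto_breaks <- seq(floor(min(demo_scores) / bin_width) * bin_width,
                   ceiling(max(demo_scores) / bin_width) * bin_width,
                   by = bin_width)
auto_breaks
```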

Now let’s see how to adjust the main title, axis labels, scale ranges, and fill color:

hist(examscores$scores, main="Distribution of Exam Scores in Astronomy 101", xlab="exam score (%)", xlim=c(0,100),  ylim=c(0, 22), col="lightgray")
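
The upper limit of 22 in ylim was picked by eye to leave a little room above the tallest bar. If you would rather compute it, one option is to get the counts first with plot=FALSE. This is only a sketch: demo_scores is stand-in data, and the headroom of 2 counts is an arbitrary choice, not a rule.

```r
# Stand-in data: bin midpoints repeated by their counts (not the real scores)
demo_scores <- rep(c(25, 45, 55, 65, 75, 85, 95),
                   times = c(1, 1, 2, 7, 20, 18, 5))
score_breaks <- seq(20, 100, by = 10)

# First pass: compute the bins without drawing anything
h <- hist(demo_scores, breaks = score_breaks, plot = FALSE)

# Tallest bar plus an arbitrary headroom of 2 counts
y_top <- max(h$counts) + 2

# Second pass: draw the histogram with the computed vertical range
hist(demo_scores, breaks = score_breaks,
     main = "Distribution of Exam Scores in Astronomy 101",
     xlab = "exam score (%)", xlim = c(0, 100), ylim = c(0, y_top),
     col = "lightgray")
```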

Now try to add horizontal grid lines and minor ticks:

hist(examscores$scores, main="Distribution of Exam Scores in Astronomy 101", xlab="exam score (%)", xlim=c(0,100),  ylim=c(0, 22), col="lightgray")
axis(4, labels=FALSE, col = "lightgray", lty=2, tck=1)
# side 4 is the right-hand axis; labels=FALSE keeps axis() from printing tick labels there
# lty=2 makes the line type dashed (use 1 for solid)
# tck=1 extends each tick across the whole plot, which gives full gridlines; by default tck is NA and R draws short tick marks (tcl = -0.5) on the axis instead
library(Hmisc)

## Loading required package: grid
## Loading required package: lattice
## Loading required package: survival
## Loading required package: splines
## Loading required package: Formula
## 
## Attaching package: 'Hmisc'
## 
## The following objects are masked from 'package:base':
## 
##     format.pval, round.POSIXt, trunc.POSIXt, units

minor.tick(nx=2, ny=5, tick.ratio=0.5)
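
One thing to notice: because axis() is called after hist(), the dashed lines are drawn on top of the bars. If you would rather have the grid sit behind the bars, here is a base-graphics sketch that draws the histogram, adds the lines with abline(), and then redraws the bars over them with add=TRUE. It uses stand-in data (demo_scores) and leaves out the Hmisc minor ticks; it is an alternative to try, not the method used above.

```r
# Stand-in data: bin midpoints repeated by their counts (not the real scores)
demo_scores <- rep(c(25, 45, 55, 65, 75, 85, 95),
                   times = c(1, 1, 2, 7, 20, 18, 5))
score_breaks <- seq(20, 100, by = 10)

# 1. Draw the histogram to set up the axes and plotting region
hist(demo_scores, breaks = score_breaks,
     main = "Distribution of Exam Scores in Astronomy 101",
     xlab = "exam score (%)", xlim = c(0, 100), ylim = c(0, 22),
     col = "lightgray")

# 2. Add dashed horizontal lines at the left axis's tick locations
grid_at <- axTicks(2)
abline(h = grid_at, col = "lightgray", lty = 2)

# 3. Redraw the bars on top so the lines sit behind them
hist(demo_scores, breaks = score_breaks, col = "lightgray", add = TRUE)
```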

PCT

Friday, June 06, 2014