Creating a histogram

To create a visual representation of an interval variable, we use a histogram. For your data project, you’ll want to use the histogram() function, which is part of the lattice package. For the examples on this page, you’ll also need to load the openintro package, which supplies the data.

require(lattice)
require(openintro)

The ageAtMar data set (part of the openintro package) contains the age at first marriage for a sample of 5,534 US women. Let’s look at the default histogram for this variable.

histogram(ageAtMar$age)

By looking at the label for the y-axis, we can infer that the default histogram is a relative frequency histogram. The help documentation for the function tells us that we can specify three different types with the type argument: “percent”, “count”, or “density”.
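To see how the three types relate to one another, here is a minimal base R sketch that computes all three from the same set of bins. The vector ages is made up for illustration (it is not the ageAtMar data), and hist() with plot = FALSE is used here only as a stand-in to show the arithmetic behind each type.

```r
# Made-up ages for illustration only; this is NOT the ageAtMar data
ages <- c(18, 19, 21, 22, 22, 23, 24, 26, 29, 35)

# Bin the values into 5-year bins without drawing anything
h <- hist(ages, breaks = seq(15, 35, by = 5), plot = FALSE)

counts  <- h$counts                     # "count": number of observations in each bin
percent <- 100 * counts / sum(counts)   # "percent": share of all observations in each bin
density <- h$density                    # "density": proportion per unit of the x-axis

sum(percent)                            # the percents add up to 100
sum(density * diff(h$breaks))           # the bar areas add up to 1
```

The shape of the histogram is the same for all three types when the bins are equally wide; only the scale on the y-axis changes.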

Frequency Histograms

Let’s create a frequency histogram first.

histogram(ageAtMar$age, type = "count",  #Specify the variable and type of histogram
          main = "US Women and Marriage",  #Create a title for the chart
          xlab = "Age at first marriage in years",  #Label the x-axis. Include units!
          col = "gold2")

You can see from the graph that the mean age is somewhere near 23, since that is where the histogram will “balance”.
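The “balance” idea can be checked with a little arithmetic: the distances of the values below the mean exactly cancel the distances of the values above it. The sketch below uses a small made-up vector (not the ageAtMar data) so the numbers are easy to follow.

```r
# Made-up ages for illustration only; this is NOT the ageAtMar data
ages <- c(18, 20, 21, 22, 23, 25, 32)

balance_point <- mean(ages)           # 23 for this toy vector
deviations <- ages - balance_point    # signed distance of each value from the mean

sum(deviations)                       # 0: the two sides cancel, so the data "balance" here
```

For the real data, running mean(ageAtMar$age) gives the exact value to compare against your visual estimate.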

Relative Frequency Histograms

For a relative frequency histogram, you can use the default, or you can specify the type. So a relative frequency histogram for the same data is created as follows.

histogram(ageAtMar$age, type = "percent",  #Specify the variable and type of histogram
          main = "US Women and Marriage",  #Create a title for the chart
          xlab = "Age at first marriage in years",  #Label the x-axis. Include units!
          col = "darkorchid2")

Changing the bin widths

If you want to change the number of bins in the histogram, there are a couple of ways to do this. The first is to use the breaks argument to tell R roughly how many bins you’d like.

histogram(ageAtMar$age, type = "percent",  #Specify the variable and type of histogram
          main = "US Women and Marriage",  #Create a title for the chart
          xlab = "Age at first marriage in years",  #Label the x-axis. Include units!
          col = "darkslategray2",
          breaks = 25)

Notice that this did not give us exactly 25 bins. It’s more of a guideline. If you want the bins to be created at a specific set of points, you can specify them exactly using either a sequence or a vector.
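To see why the number is only a guideline, you can ask base R’s hist() for 25 bins and count what you actually get. This is a sketch under two assumptions: the ages vector is made up (it is not the ageAtMar data), and histogram() hands a single-number breaks value to hist() in the way its help page describes.

```r
# Made-up ages for illustration only; this is NOT the ageAtMar data
ages <- c(12, 17, 19, 21, 22, 23, 24, 26, 30, 43)

# Ask for 25 bins, but do not draw the plot
h <- hist(ages, breaks = 25, plot = FALSE)

n_bins <- length(h$breaks) - 1   # number of bins R actually used
n_bins
h$breaks                         # the "round" break points R settled on
```

R treats the requested number as a suggestion and picks evenly spaced, round break points close to it, which is why the count of bins can differ from what you asked for.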

mybins2 <- seq(10, 50, 2)  #This creates a vector of numbers that starts at 10 and goes to 50 in increments of 2
mybins10 <- c(0, 10, 20, 30, 40, 50)  #This creates the vector of numbers as shown.
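Before plotting, it can help to check how many bins a set of break points produces and whether the break points span the data. The sketch below is one way to do that in base R. The helper covers_data() and the ages vector are hypothetical names introduced here for illustration; ages is made up and is not the ageAtMar data.

```r
mybins2  <- seq(10, 50, by = 2)
mybins10 <- c(0, 10, 20, 30, 40, 50)

length(mybins2) - 1     # 20 bins, each 2 years wide
length(mybins10) - 1    # 5 bins, each 10 years wide
diff(mybins10)          # the width of each bin

# Hypothetical helper: TRUE if the break points span every value in x
covers_data <- function(breaks, x) {
  min(breaks) <= min(x, na.rm = TRUE) && max(breaks) >= max(x, na.rm = TRUE)
}

# Made-up ages for illustration only; this is NOT the ageAtMar data
ages <- c(12, 17, 19, 21, 22, 23, 24, 26, 30, 43)

covers_data(mybins2, ages)    # TRUE
covers_data(mybins10, ages)   # TRUE
```

A vector of n break points always gives n - 1 bins. If the break points do not span the data, base R’s hist() stops with an error, so a check like this is worth running on your own variable first.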

Drawing the histograms

histogram(ageAtMar$age, type = "percent",  #Specify the variable and type of histogram
          main = "US Women and Marriage",  #Create a title for the chart
          xlab = "Age at first marriage in years",  #Label the x-axis. Include units!
          col = "darkslategray3",
          breaks = mybins2)

histogram(ageAtMar$age, type = "percent",  #Specify the variable and type of histogram
          main = "US Women and Marriage",  #Create a title for the chart
          xlab = "Age at first marriage in years",  #Label the x-axis. Include units!
          col = "darkslategray4",
          breaks = mybins10)