To create a visual representation of an interval variable, we use a histogram. For your data project you’ll want to use the histogram() function, which is part of the lattice package. For the examples on this page, you’ll also want to require the openintro package.
require(lattice)
require(openintro)
The ageAtMar data set (part of the openintro package) contains the age at first marriage for a sample of 5,534 US women. Let’s look at the default histogram for this variable.
histogram(ageAtMar$age)
By looking at the label for the y-axis, we can infer that the default histogram is a relative frequency histogram. The help documentation for the function tells us that we can use the type argument to specify one of three types: “percent”, “count”, or “density”.
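The three types are rescalings of the same bin counts. The sketch below shows how they relate. It uses base R's hist() with plot = FALSE instead of lattice, and a small made-up vector (ages is hypothetical, not the ageAtMar data), so treat it as an illustration only.

```r
# Hypothetical toy data, not the ageAtMar sample
ages <- c(18, 19, 21, 22, 22, 23, 24, 26, 27, 31)

# Compute the bins without drawing anything
h <- hist(ages, breaks = seq(15, 35, 5), plot = FALSE)

counts  <- h$counts                    # "count": observations per bin
percent <- 100 * counts / sum(counts)  # "percent": share of all observations
density <- h$density                   # "density": scaled so bar areas sum to 1

sum(percent)                   # the percentages add up to 100
sum(density * diff(h$breaks))  # the bar areas (height x width) add up to 1
```

When all bins have the same width, the three histograms have the same shape and differ only in the scale of the y-axis.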
Let’s create a frequency histogram first.
histogram(ageAtMar$age, type = "count", #Specify the variable and type of histogram
main = "US Women and Marriage", #Create a title for the chart
xlab = "Age at first marriage in years", #Label the x-axis. Include units!
col = "gold2")
You can see from the graph that the mean age is somewhere near 23, since that is where the histogram will “balance”.
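The "balance" idea can be checked numerically: deviations below the mean exactly cancel deviations above it. The sketch below uses a made-up vector (ages is hypothetical) so the arithmetic is easy to follow. For the real data, mean(ageAtMar$age) gives the exact value to compare with the visual estimate.

```r
# Hypothetical toy ages, chosen only to illustrate the idea
ages <- c(18, 21, 22, 23, 25, 29)

m <- mean(ages)  # the balance point of the data
m

# Deviations below the mean cancel the deviations above it
sum(ages - m)

# The median is the middle value instead; the two differ when data are skewed
median(ages)
```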
For a relative frequency histogram, you can use the default, or you can specify the type. So a relative frequency histogram for the same data is created as follows.
histogram(ageAtMar$age, type = "percent", #Specify the variable and type of histogram
main = "US Women and Marriage", #Create a title for the chart
xlab = "Age at first marriage in years", #Label the x-axis. Include units!
col = "darkorchid2")
If you want to change the number of bins in the histogram, there are a couple of ways to do this. The first is to specify the number of bins you’d like R to try using the breaks argument.
histogram(ageAtMar$age, type = "percent", #Specify the variable and type of histogram
main = "US Women and Marriage", #Create a title for the chart
xlab = "Age at first marriage in years", #Label the x-axis. Include units!
col = "darkslategray2",
breaks = 25)
Notice that this did not give us exactly 25 bins. It’s more of a guideline. If you want the bins to be created at a specific set of points, you can specify them exactly using either a sequence or a vector.
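As an aside on why a single number is only a guideline: base R's hist() passes it to pretty(), which picks round breakpoints close to the requested count. The lattice help page says breaks accepts the same values as hist(), but that lattice follows exactly this path is an assumption here. The vector x is made up for illustration.

```r
# Hypothetical toy data, not the ageAtMar sample
x <- c(10, 18, 21, 22, 23, 25, 29, 43)

# A single number is turned into "pretty" (round) breakpoints
suggested <- pretty(range(x), n = 25)
suggested

# Number of bins actually produced, which need not equal 25
length(suggested) - 1

# hist() uses the same breakpoints when given breaks = 25
h <- hist(x, breaks = 25, plot = FALSE)
all.equal(h$breaks, suggested)
```

With that in mind, the two sets of exact breakpoints are defined next.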
mybins2 <- seq(10,50,2) #This creates a vector of numbers that starts at 10 and goes to 50 in increments of 2
mybins10 <- c(0, 10, 20, 30, 40, 50) #This creates the vector of numbers exactly as listed.
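Before plotting, it can help to confirm how many bins a set of breakpoints defines and whether it spans the data. The sketch below is one way to do that. covers() is a hypothetical helper written for this example, not part of lattice, and the test values passed to it are made up.

```r
mybins2  <- seq(10, 50, 2)
mybins10 <- c(0, 10, 20, 30, 40, 50)

# n breakpoints define n - 1 bins
length(mybins2) - 1   # 20 bins
length(mybins10) - 1  # 5 bins

# Width of the bins in each set
unique(diff(mybins2))   # 2
unique(diff(mybins10))  # 10

# Hypothetical helper: TRUE when every value of x lies within the breakpoints
covers <- function(x, breaks) {
  min(breaks) <= min(x) && max(x) <= max(breaks)
}

covers(c(12, 30, 45), mybins2)  # made-up values; use ageAtMar$age for the real check
covers(c(5, 30, 45), mybins2)   # 5 is below the first breakpoint, so FALSE
```

Base R's hist() stops with an error when the breakpoints do not span the data, so a check like this can catch the problem early.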
Drawing the histograms with each set of bins:
histogram(ageAtMar$age, type = "percent", #Specify the variable and type of histogram
main = "US Women and Marriage", #Create a title for the chart
xlab = "Age at first marriage in years", #Label the x-axis. Include units!
col = "darkslategray3",
breaks = mybins2)
histogram(ageAtMar$age, type = "percent", #Specify the variable and type of histogram
main = "US Women and Marriage", #Create a title for the chart
xlab = "Age at first marriage in years", #Label the x-axis. Include units!
col = "darkslategray4",
breaks = mybins10)