More on basic graphs in R

Experimenting with histograms

Let’s create a vector that we will use for experimenting with graphics:

x <- c(2, 8, 10, 1, 2, 12, 10, 7, 7, 7, 8,
       3, 3, 3, 3, 5, 6, 5)
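Before plotting, it can help to look at the data as a frequency table, because the histograms below group exactly these values into bins. A small sketch using base R's table():

```r
x <- c(2, 8, 10, 1, 2, 12, 10, 7, 7, 7, 8,
       3, 3, 3, 3, 5, 6, 5)
length(x)   # number of observations
table(x)    # how many times each distinct value occurs
```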

First, let’s plot a basic histogram and change its color:

hist(x, col = "deepskyblue")

Multiple colors are usually used in bar charts, which show categorical variables rather than quantitative ones: each color corresponds to one category, so different categories are drawn as bars of different colors. In a histogram every bin shows the same kind of quantity, the absolute frequency of one quantitative variable. That is why histograms are normally drawn in a single color.
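For contrast, here is a minimal bar chart of a categorical variable where each category gets its own color. The vector fruit and the chosen colors are hypothetical, made up only for illustration:

```r
# A hypothetical categorical variable, made up for illustration
fruit <- c("apple", "banana", "apple", "cherry", "banana", "apple")
counts <- table(fruit)   # frequencies; categories in alphabetical order
# one color per category: apple, banana, cherry
barplot(counts, col = c("tomato", "gold", "darkred"),
        main = "One color per category")
```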

We can change the border color as well:

# make the borders of the bins dark blue
hist(x, col = "deepskyblue", border = "darkblue")

Add a title:

hist(x, col = "deepskyblue", main = "My histogram")

We can specify the limits of the horizontal or vertical axis of a graph. For example, we can set the x-axis to cover values from -5 to 15:

# xlim - limits of the x-axis
hist(x, col = "deepskyblue", 
     main = "My histogram",
     xlim = c(-5, 15))
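Setting xlim only changes the visible part of the axis; the bins themselves are still computed from the data. A quick check, as a sketch: hist() returns its values invisibly even when it draws the graph, so we can compare the breaks with and without xlim (the object names are arbitrary):

```r
x <- c(2, 8, 10, 1, 2, 12, 10, 7, 7, 7, 8,
       3, 3, 3, 3, 5, 6, 5)
# values behind the default histogram, without drawing it
h_default <- hist(x, plot = FALSE)
# draw with wider x-axis limits and keep the returned values
h_wide <- hist(x, col = "deepskyblue",
               main = "My histogram",
               xlim = c(-5, 15))
# TRUE: xlim changes the axis, not the bins
identical(h_default$breaks, h_wide$breaks)
```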

The same can be done for the y-axis; let’s set it from 0 to 8:

# ylim - limits of the y-axis
hist(x, col = "deepskyblue", 
     main = "My histogram",
     xlim = c(-5, 15),
     ylim = c(0, 8))
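Instead of guessing the upper limit of the y-axis, one option is to take it from the counts behind the histogram, so that the tallest bar always fits. A sketch; the name top and the extra margin of 1 are arbitrary choices:

```r
x <- c(2, 8, 10, 1, 2, 12, 10, 7, 7, 7, 8,
       3, 3, 3, 3, 5, 6, 5)
# counts behind the histogram, computed without drawing it
h <- hist(x, plot = FALSE)
top <- max(h$counts)   # height of the tallest bar
hist(x, col = "deepskyblue",
     main = "My histogram",
     xlim = c(-5, 15),
     ylim = c(0, top + 1))
```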

When can setting these limits be useful? When we want to compare two or more distributions. To compare the ranges of values or the spread of several variables, it is more convenient to plot their histograms with the same x-axis limits:

par(mfrow=c(1, 2))
hist(x, col = "deepskyblue", 
     main = "My histogram",
     xlim = c(-5, 15))
hist(log(x^2), col = "deepskyblue", 
     main = "My histogram",
     xlim = c(-5, 15))
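Rather than typing the limits by hand, here is a sketch of one way to choose them: take the range of the bin boundaries of both variables, so that no bar is cut off. The name y for the transformed variable and the name common are assumptions made for this example:

```r
x <- c(2, 8, 10, 1, 2, 12, 10, 7, 7, 7, 8,
       3, 3, 3, 3, 5, 6, 5)
y <- log(x^2)
# compute the bins of both variables without drawing
hx <- hist(x, plot = FALSE)
hy <- hist(y, plot = FALSE)
# common limits that cover the bins of both histograms
common <- range(c(hx$breaks, hy$breaks))
par(mfrow = c(1, 2))
hist(x, col = "deepskyblue", main = "x", xlim = common)
hist(y, col = "deepskyblue", main = "log(x^2)", xlim = common)
par(mfrow = c(1, 1))   # back to one graph per device
```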

Note. The line par(mfrow=c(1, 2)) arranges several graphs in a grid of rows and columns. Here we want one row and two columns, so we type c(1, 2): as usual, the number of rows goes first and the number of columns second. Compare the code above with the following code for a grid of two rows and two columns:

par(mfrow=c(2, 2))
hist(x, col = "deepskyblue", 
     main = "My histogram",
     xlim = c(-5, 15))
hist(log(x^2), col = "deepskyblue", 
     main = "My histogram",
     xlim = c(-5, 15))
hist(x-2, col = "deepskyblue", 
     main = "My histogram",
     xlim = c(-5, 15))
hist(log(x^2)-2, col = "deepskyblue", 
     main = "My histogram",
     xlim = c(-5, 15))
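Two related details, in a minimal sketch: par() returns the previous settings invisibly, which makes it easy to go back to one graph per device afterwards, and the argument mfcol works like mfrow but fills the grid column by column instead of row by row. The name old_par and the panel titles are arbitrary:

```r
x <- c(2, 8, 10, 1, 2, 12, 10, 7, 7, 7, 8,
       3, 3, 3, 3, 5, 6, 5)
# par() returns the old settings, so we can restore them later
old_par <- par(mfcol = c(2, 2))
# with mfcol the grid is filled column by column:
# top left, bottom left, top right, bottom right
hist(x, col = "deepskyblue", main = "1")
hist(x - 2, col = "deepskyblue", main = "2")
hist(x + 2, col = "deepskyblue", main = "3")
hist(x * 2, col = "deepskyblue", main = "4")
par(old_par)   # restore the previous layout
```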

We can also change the label of the x-axis:

# xlab - text for x-axis
# ylab - the same for y-axis (if needed)
hist(x, col = "deepskyblue", 
     main = "My histogram",
     xlim = c(-5, 15),
     ylim = c(0, 8),
     xlab = "Sample")
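The y-axis label works the same way; the text "Absolute frequency" below is just an example. Since hist() also returns its values when it draws, this sketch checks that xlab only changes the text on the graph, not the stored name of the variable:

```r
x <- c(2, 8, 10, 1, 2, 12, 10, 7, 7, 7, 8,
       3, 3, 3, 3, 5, 6, 5)
h <- hist(x, col = "deepskyblue",
          main = "My histogram",
          xlim = c(-5, 15),
          ylim = c(0, 8),
          xlab = "Sample",
          ylab = "Absolute frequency")
h$xname   # still "x": xlab changes only the axis text
```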

Some special arguments for histograms

At the first seminar we discussed that by default R uses left-open (right-closed) intervals for assigning values to bins, like this: (a, b]. So, if we have two intervals (5, 7] and (7, 9], the value 7 goes to the former, not to the latter. If we want to change this, we can add the option right = FALSE and get right-open (left-closed) intervals instead: [a, b).

Let’s compare two histograms for the same data, but with different intervals:

par(mfrow=c(1, 2))
hist(x, col = "deepskyblue", 
     main = "My histogram")
hist(x, col = "deepskyblue", 
     main = "My histogram",
     right = FALSE)
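To see which bins are affected, we can compare the counts behind the two graphs. This sketch uses the option plot = FALSE, which returns the values instead of drawing the graph; the object names are arbitrary:

```r
x <- c(2, 8, 10, 1, 2, 12, 10, 7, 7, 7, 8,
       3, 3, 3, 3, 5, 6, 5)
# default: right-closed bins (a, b]
closed_right <- hist(x, plot = FALSE)$counts
# right = FALSE: left-closed bins [a, b)
closed_left <- hist(x, plot = FALSE, right = FALSE)$counts
closed_right   # 3 4 3 5 2 1
closed_left    # 1 6 2 4 2 3
```

The breaks are the same in both cases (0, 2, ..., 12); only values lying exactly on a boundary, such as 2, 6, 8 and 10, move to the neighbouring bin.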

If you are interested in the values behind a histogram, you can ask R to return them instead of plotting the graph:

# plot = FALSE - return the histogram as a list of values instead of drawing it
hist(x, plot = FALSE)
## $breaks
## [1]  0  2  4  6  8 10 12
## 
## $counts
## [1] 3 4 3 5 2 1
## 
## $density
## [1] 0.08333333 0.11111111 0.08333333 0.13888889 0.05555556 0.02777778
## 
## $mids
## [1]  1  3  5  7  9 11
## 
## $xname
## [1] "x"
## 
## $equidist
## [1] TRUE
## 
## attr(,"class")
## [1] "histogram"

This option is not unique to histograms: some other plotting functions, for example boxplot() and barplot(), also have a plot argument, but they return different information. For histograms, hist() returns a list with these elements:

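  • breaks: boundaries of the bins (here a bin from 0 to 2, from 2 to 4, etc.)
  • counts: absolute frequencies for each bin
  • density: density values for each bin, equal to counts divided by (number of observations × bin width), so that the total area of the bars is 1
  • mids: midpoints of the bins (here 1 is the middle of the bin from 0 to 2, etc.)
  • xname: the name of the plotted variable as text (used in the default title and x-axis label)
  • equidist: whether all bins have equal widths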
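The relations between these elements can be checked directly. A sketch, where $ extracts one element of the returned list and the name widths is arbitrary:

```r
x <- c(2, 8, 10, 1, 2, 12, 10, 7, 7, 7, 8,
       3, 3, 3, 3, 5, 6, 5)
h <- hist(x, plot = FALSE)
widths <- diff(h$breaks)   # bin widths (all equal to 2 here)
# density = counts / (number of observations * bin width)
all.equal(h$density, h$counts / (length(x) * widths))
# mids lie halfway between neighbouring breaks
all.equal(h$mids, (head(h$breaks, -1) + tail(h$breaks, -1)) / 2)
# the total area of the bars is 1
sum(h$density * widths)
```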

To access a particular element, we can use $:

# get counts
hist(x, plot = FALSE)$counts
## [1] 3 4 3 5 2 1
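Storing the result once and then extracting several elements avoids recomputing it. A short sketch: the object names h and b are arbitrary, and the second part illustrates that boxplot() also accepts plot = FALSE, returning a different list:

```r
x <- c(2, 8, 10, 1, 2, 12, 10, 7, 7, 7, 8,
       3, 3, 3, 3, 5, 6, 5)
# compute the histogram values once and reuse them
h <- hist(x, plot = FALSE)
# midpoint of the bin with the highest count (the bin from 6 to 8)
h$mids[which.max(h$counts)]
# boxplot() with plot = FALSE returns the statistics
# the box is drawn from instead of drawing it
b <- boxplot(x, plot = FALSE)
b$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
```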