In this lesson, we consider ways to organize and summarize one categorical variable. This allows us to see the distribution of the variable. The distribution of a variable is the set of values it can take, together with how often each value occurs.
A frequency table lists each category of data and the frequency (count) for that category.
| Defect | Frequency |
|---|---|
| A | \(10\) |
| B | \(1\) |
| C | \(1\) |
| D | \(2\) |
| E | \(2\) |
| F | \(1\) |
| G | \(3\) |
| Total | \(20\) |
It’s a good idea to add up the frequency column to make sure that it sums to the number of observations. In the case of the above example, the frequency column sums to 20, as it should.
To construct a frequency table in R, we first need to enter the data:
defect <- c("A","A","B","C","D","D","E","A","F","G","G","A","E","A","A","A","A","A","A","G")
Next, we can use the table function. It is convenient to save the frequency table in an object; let’s call it defect.freq.
> defect.freq <- table(defect)
> defect.freq
defect
A B C D E F G
10 1 1 2 2 1 3
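The check suggested earlier, that the frequency column sums to the number of observations, can also be done in R. The following is a minimal sketch using the same defect vector and only the base functions sum and length.

```r
# Defect data from the example above
defect <- c("A","A","B","C","D","D","E","A","F","G","G","A","E","A","A","A","A","A","A","G")
defect.freq <- table(defect)

# Total of the frequency column and the number of observations
sum(defect.freq)
length(defect)

# TRUE when the table accounts for every observation
sum(defect.freq) == length(defect)
```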
Often, rather than being concerned with the frequency with which categories occur, we want to know the relative frequency (proportion) of the categories. The relative frequency is found by dividing the frequency by the total number of observations.
Example 2: Construct a relative frequency table of the defect data.
To construct a relative frequency table, we divide each of the frequencies by the number of observations.
| Defect | Relative Frequency |
|---|---|
| A | \(10/20=0.50\) |
| B | \(1/20=0.05\) |
| C | \(1/20=0.05\) |
| D | \(2/20=0.10\) |
| E | \(2/20=0.10\) |
| F | \(1/20=0.05\) |
| G | \(3/20=0.15\) |
| Total | 1 |
It’s a good idea to add up the relative frequency column to make sure that it sums to 1. In this example, the relative frequency column sums to 1, as it should.
In R, we can find the relative frequencies by first using the length function, which gives us the number of observations:
> length(defect)
[1] 20
We then divide the frequency table by this number. It is convenient to save the relative frequency table in an object; let’s call it defect.relfreq.
> defect.relfreq <- defect.freq/length(defect)
> defect.relfreq
defect
A B C D E F G
0.50 0.05 0.05 0.10 0.10 0.05 0.15
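As an alternative sketch, base R's prop.table function performs the same division in one step. The object name defect.prop is introduced here only for illustration; the comparison uses all.equal because the proportions are floating-point numbers.

```r
# Defect data from the example above
defect <- c("A","A","B","C","D","D","E","A","F","G","G","A","E","A","A","A","A","A","A","G")
defect.freq <- table(defect)

# Relative frequencies by explicit division, as in the lesson
defect.relfreq <- defect.freq / length(defect)

# Same result using prop.table, which divides each count by the table total
defect.prop <- prop.table(defect.freq)
defect.prop

# The relative frequencies should sum to 1 (up to rounding error)
all.equal(sum(defect.prop), 1)
```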
A barchart puts the categories on the horizontal axis and the frequency (or relative frequency) on the vertical axis. Rectangles of equal width are placed over each category with height equal to the corresponding frequency (or relative frequency). In Example 1, we would put a rectangle of height 10 over the category A since the frequency in that category is equal to 10.
Example 3: Construct a barchart of the data in Example 1
In R, we can construct a barchart using the barplot command on our frequency table or on our relative frequency table.
> barplot(defect.freq,xlab="Defect Type",ylab="Frequency",main="Barplot Example")
> barplot(defect.relfreq,xlab="Defect Type",ylab="Relative Frequency",main="Barplot Example")
The main= argument sets the main title for the barchart. The xlab= argument sets the label for the x axis and the ylab= argument sets the label for the y axis.
A pie chart is a circle divided into slices, one per category. The central angle of each slice is equal to 360° multiplied by the relative frequency of that category, so the area of a slice is proportional to the relative frequency.
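To make the 360° rule concrete, here is a small sketch that computes the angle of each slice for the defect data. The name slice.angle is hypothetical and used only for this illustration; the pie function does this calculation internally, so the step is not needed to draw the chart.

```r
# Defect data from the example above
defect <- c("A","A","B","C","D","D","E","A","F","G","G","A","E","A","A","A","A","A","A","G")
defect.relfreq <- table(defect) / length(defect)

# Angle of each slice in degrees: 360 times the relative frequency
slice.angle <- 360 * defect.relfreq
slice.angle

# The angles together make up the full circle
sum(slice.angle)
```

For example, category A has relative frequency 0.50, so its slice covers half of the circle.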
Example 4: Construct a piechart of the data in Example 1
In R, we can construct a piechart using the pie command on our frequency table.
> pie(defect.freq,main="Pie Chart Example")
The main= argument sets the main title for the piechart.
In all of the tables and graphs above, the main thing to look for is the most frequently occurring and the least frequently occurring categories. For example, in the defect data we can see that the most common type of defect is type A and the least common types are types B, C and F.
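The most and least common categories can also be read off the frequency table in R. The sketch below is one possible way to do it, assuming the same defect vector; the names most.common and least.common are introduced only for illustration. Comparing against max and min returns every tied category, which matters here because three categories share the smallest count.

```r
# Defect data from the example above
defect <- c("A","A","B","C","D","D","E","A","F","G","G","A","E","A","A","A","A","A","A","G")
defect.freq <- table(defect)

# Categories whose count equals the largest count
most.common <- names(defect.freq)[defect.freq == max(defect.freq)]
most.common

# Categories whose count equals the smallest count (all ties are kept)
least.common <- names(defect.freq)[defect.freq == min(defect.freq)]
least.common
```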
Example 5: An experiment was conducted to measure and compare the effectiveness of various feed supplements on the weight of chickens. The data are available in R as the built-in data frame chickwts. Here are the first six observations:
weight feed
1 179 horsebean
2 160 horsebean
3 136 horsebean
4 227 horsebean
5 217 horsebean
6 168 horsebean
Answer: First, create the frequency table of the feed variable:
> feed.freq <- table(chickwts$feed)
> feed.freq
casein horsebean linseed meatmeal soybean sunflower
12 10 12 11 14 12
Next, create the relative frequency table:
> feed.relfreq <- table(chickwts$feed)/length(chickwts$feed)
> feed.relfreq
casein horsebean linseed meatmeal soybean sunflower
0.1690141 0.1408451 0.1690141 0.1549296 0.1971831 0.1690141
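Relative frequencies with seven decimal places can be hard to compare at a glance. As an optional sketch, they can be converted to percentages and rounded; this assumes the built-in chickwts data frame, and the name feed.pct is hypothetical.

```r
# chickwts ships with R in the datasets package, so no file needs to be read
feed.freq <- table(chickwts$feed)

# Relative frequencies as percentages, rounded to one decimal place
feed.pct <- round(100 * prop.table(feed.freq), 1)
feed.pct
```

Rounded percentages may not sum to exactly 100, so the unrounded relative frequencies are the ones to use for checking that the column sums to 1.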
Create the barchart:
> barplot(feed.freq,xlab="Type of Feed",ylab="Frequency",main="Barchart for Chickwts Data")
Create the piechart:
> pie(feed.freq,main="Piechart for Chickwts Data")