Introduction

In this lesson, we consider ways to organize and summarize one categorical variable. This allows us to see the distribution of the variable. The distribution of a variable describes the values it can take on and how often each value occurs.

Construct a frequency table

A frequency table lists each category of the data along with its frequency (count).

Example 1: A company has determined that there are seven possible defects (A, B, C, D, E, F, G) for its product lines. A random sample of 20 defects found in the past quarter was taken, with the following results:
A, A, B, C, D, D, E, A, F, G, G, A, E, A, A, A, A, A, A, G
  • What are the observational units in this example?
  • What are the variable(s) and are they categorical or quantitative?
  • Construct a frequency table for the type of defects.

Answer:

  • The observational units are the 20 sampled defects.
  • The variable is the type of defect, which is categorical.
  • To construct a frequency table, we list the defect types (categories) and count the occurrences of each to determine its frequency.
Defect Frequency
A \(10\)
B \(1\)
C \(1\)
D \(2\)
E \(2\)
F \(1\)
G \(3\)
Total \(20\)

It’s a good idea to add up the frequency column to make sure that it sums to the number of observations. In the case of the above example, the frequency column sums to 20, as it should.

To construct a frequency table in R, we first need to enter the data:

defect <- c("A","A","B","C","D","D","E","A","F","G","G","A","E","A","A","A","A","A","A","G")

Next, we can use the table command. It is convenient to save the frequency table in an object; let’s call it defect.freq.

> defect.freq <- table(defect)
> defect.freq

defect
 A  B  C  D  E  F  G 
10  1  1  2  2  1  3 
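The sum check suggested for the hand-built table can also be run in R. The following is a minimal sketch that re-enters the sample so it runs on its own; it only uses the base R functions sum and length.

```r
# Re-enter the sample from Example 1 so this sketch is self-contained
defect <- c("A","A","B","C","D","D","E","A","F","G","G","A","E","A","A","A","A","A","A","G")
defect.freq <- table(defect)

# The frequencies should add up to the number of observations
sum(defect.freq)
length(defect)

# TRUE when the two totals agree
sum(defect.freq) == length(defect)
```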

Construct a relative frequency table

Often, rather than being concerned with the frequency with which categories occur, we want to know the relative frequency (proportion) of the categories. The relative frequency is found by dividing the frequency by the total number of observations.

Example 2: Construct a relative frequency table of the defect data.

To construct a relative frequency table, we divide each of the frequencies by the number of observations.

Defect Relative Frequency
A \(10/20=0.50\)
B \(1/20=0.05\)
C \(1/20=0.05\)
D \(2/20=0.10\)
E \(2/20=0.10\)
F \(1/20=0.05\)
G \(3/20=0.15\)
Total 1

It’s a good idea to add up the relative frequency column to make sure that it sums to 1. In this example, the relative frequency column sums to 1, as it should.

In R, we can find the relative frequencies by dividing the frequency table by the number of observations, which the length command gives us:

> length(defect)

[1] 20

It is convenient to save the relative frequency table in an object; let’s call it defect.relfreq.

> defect.relfreq <- defect.freq/length(defect)
> defect.relfreq

defect
   A    B    C    D    E    F    G 
0.50 0.05 0.05 0.10 0.10 0.05 0.15 
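The check that the relative frequencies sum to 1 can be sketched in R as well. The snippet below also shows prop.table, a base R function that divides each count by the total, as one possible alternative to dividing by length(defect). The object name defect.prop is made up for this illustration.

```r
# Re-enter the sample from Example 1 so this sketch is self-contained
defect <- c("A","A","B","C","D","D","E","A","F","G","G","A","E","A","A","A","A","A","A","G")
defect.freq <- table(defect)
defect.relfreq <- defect.freq / length(defect)

# The relative frequencies should sum to 1;
# all.equal() tolerates small floating-point rounding differences
all.equal(sum(defect.relfreq), 1)

# Alternative: prop.table() divides each count by the table total
# (defect.prop is a hypothetical name used only in this sketch)
defect.prop <- prop.table(defect.freq)
all.equal(as.vector(defect.prop), as.vector(defect.relfreq))
```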

Construct a barchart

A barchart puts the categories on the horizontal axis and the frequency (or relative frequency) on the vertical axis. Rectangles of equal width are placed over each category with height equal to the corresponding frequency (or relative frequency). In Example 1, we would put a rectangle of height 10 over the category A since the frequency in that category is equal to 10.

Example 3: Construct a barchart of the data in Example 1

In R, we can construct a barchart using the barplot command on our frequency table or on our relative frequency table.

> barplot(defect.freq,xlab="Defect Type",ylab="Frequency",main="Barplot Example")

> barplot(defect.relfreq,xlab="Defect Type",ylab="Relative Frequency",main="Barplot Example")

The main= argument sets the main title for the barchart. The xlab= argument sets the label for the x axis and the ylab= argument sets the label for the y axis.

Construct a piechart

A pie chart is a circle divided into slices, one for each category. The central angle of each slice is equal to 360° multiplied by the relative frequency of that category, so categories that occur more often get larger slices.
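As a worked illustration of the 360° rule, the slice angles for the defect data can be computed directly. This is only a sketch of the arithmetic, and the name slice.angle is made up for this illustration. For category A, the angle is 360 × 0.50 = 180 degrees, which is half of the circle.

```r
# Re-enter the sample from Example 1 so this sketch is self-contained
defect <- c("A","A","B","C","D","D","E","A","F","G","G","A","E","A","A","A","A","A","A","G")
defect.relfreq <- table(defect) / length(defect)

# Angle of each slice in degrees: 360 times the relative frequency
# (slice.angle is a hypothetical name used only in this sketch)
slice.angle <- 360 * defect.relfreq
slice.angle

# The angles should add up to the full circle, 360 degrees
sum(slice.angle)
```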

Example 4: Construct a piechart of the data in Example 1

In R, we can construct a piechart using the pie command on our frequency table.

> pie(defect.freq,main="Pie Chart Example")

The main= argument sets the main title for the piechart.

What to look for

In all of the tables and graphs above, the main thing to look for is the most frequently occurring and the least frequently occurring categories. For example, in the defect data we can see that the most common type of defect is type A and the least common types are types B, C and F.
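The most and least common categories can also be picked out of the frequency table in R rather than read off by eye. The following is one possible sketch using base R; the names most.common and least.common are made up for this illustration. Ties are all reported, which matters here because three defect types share the smallest count.

```r
# Re-enter the sample from Example 1 so this sketch is self-contained
defect <- c("A","A","B","C","D","D","E","A","F","G","G","A","E","A","A","A","A","A","A","G")
defect.freq <- table(defect)
counts <- as.vector(defect.freq)

# Categories whose count equals the largest count
most.common <- names(defect.freq)[counts == max(counts)]
most.common

# Categories whose count equals the smallest count (all ties are kept)
least.common <- names(defect.freq)[counts == min(counts)]
least.common

# Sorting the table from most to least frequent makes the ordering easy to read
sort(defect.freq, decreasing = TRUE)
```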

Another example

Example 5: An experiment was conducted to measure and compare the effectiveness of various feed supplements on the weight of chickens. The data are stored in the R data set chickwts. Here are the first six observations in the data set:

  weight      feed
1    179 horsebean
2    160 horsebean
3    136 horsebean
4    227 horsebean
5    217 horsebean
6    168 horsebean
  • What are the observational units in this study?
  • What are the variables and are they quantitative or categorical?
  • Create a frequency table, a relative frequency table, a barplot and a pie chart for type of feed in this study.
  • Write a short summary of type of feed.

Answer:

  • The observational units are the chicks.
  • The two variables are weight (quantitative) and type of feed (categorical).
  • First, create the frequency table. Recall the syntax we use: datasetname$variablename.

> feed.freq <- table(chickwts$feed)
> feed.freq


   casein horsebean   linseed  meatmeal   soybean sunflower 
       12        10        12        11        14        12 

Next, create the relative frequency table.

> feed.relfreq <- table(chickwts$feed)/length(chickwts$feed)
> feed.relfreq


   casein horsebean   linseed  meatmeal   soybean sunflower 
0.1690141 0.1408451 0.1690141 0.1549296 0.1971831 0.1690141 

Create the barchart.

> barplot(feed.freq,xlab="Type of Feed",ylab="Frequency",main="Barchart for Chickwts Data")

Create the piechart.

> pie(feed.freq,main="Piechart for Chickwts Data")

  • The chickwts study used six different types of feed: casein, horsebean, linseed, meatmeal, soybean and sunflower. The numbers of chicks assigned to each type of feed are all about the same, with the most chicks (14) assigned to soybean feed and the fewest chicks (10) assigned to horsebean feed.
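The figures quoted in a summary like this one can be checked against the data. The sketch below assumes the chickwts data set that ships with R, in which feed is stored as a factor, and uses only base R functions.

```r
# Frequency table for type of feed (chickwts ships with R)
feed.freq <- table(chickwts$feed)

# Number of distinct feed types
nlevels(chickwts$feed)

# Total number of chicks in the study
sum(feed.freq)

# Smallest and largest group sizes
range(as.vector(feed.freq))
```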