Scenario

A local cafe recorded the drink orders of its top 20 customers. We’ll treat these 20 orders as a tiny sample and write a quick, readable summary for a manager who just wants the gist: are people ordering hot or iced, and which sizes are most common?

We will keep things intro-level: one variable at a time, using frequency tables, relative frequencies, and basic bar charts.

The Data

We have two categorical variables: CoffeeTemp is either “Hot” or “Iced”, and CoffeeSize is “S”, “M”, or “L” (for small, medium, or large).

CoffeeTemp <- c("Hot","Hot","Hot","Iced","Hot","Iced","Hot","Iced","Iced","Iced","Iced","Iced","Hot","Hot","Hot","Iced","Iced","Hot","Hot","Iced")
CoffeeSize <- c("S","S","S","S","M","M","M","M","L","L","L","L","S","S","M","L","L","L","S","L")

These are simply the recorded categories for each of the 20 orders. There are no prices, add-ons, or other details; just temperature and size.
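
Before summarizing, it can help to confirm that the two vectors were typed in as intended. The following is a minimal sketch of such a check using only base R; the check is an addition for illustration and not part of the cafe's records.

```r
CoffeeTemp <- c("Hot","Hot","Hot","Iced","Hot","Iced","Hot","Iced","Iced","Iced",
                "Iced","Iced","Hot","Hot","Hot","Iced","Iced","Hot","Hot","Iced")
CoffeeSize <- c("S","S","S","S","M","M","M","M","L","L",
                "L","L","S","S","M","L","L","L","S","L")

# Both vectors should describe the same 20 orders
length(CoffeeTemp)
length(CoffeeSize)

# Only the expected category labels should appear
sort(unique(CoffeeTemp))
sort(unique(CoffeeSize))

# TRUE here would mean at least one order is missing a value
anyNA(CoffeeTemp) || anyNA(CoffeeSize)
```

If a label were misspelled (for example "hot" instead of "Hot"), it would show up as an extra category in the unique() output.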

Coffee Temperature

We start by counting how many Hot vs Iced orders there were and then convert those counts to relative frequencies (proportions).

table(CoffeeTemp)
## CoffeeTemp
##  Hot Iced 
##   10   10
prop.table(table(CoffeeTemp))
## CoffeeTemp
##  Hot Iced 
##  0.5  0.5

In this small snapshot, hot and iced are split evenly (10 Hot, 10 Iced), so exactly 50% of orders are hot and 50% are iced. For a manager, this suggests stocking comparably for both options, at least for the most frequent customers.
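
The proportions above can also be expressed as percentages and drawn as a bar chart. The sketch below uses base R's barplot() so that it runs without extra packages; the name TempPercent is introduced here only for illustration.

```r
CoffeeTemp <- c("Hot","Hot","Hot","Iced","Hot","Iced","Hot","Iced","Iced","Iced",
                "Iced","Iced","Hot","Hot","Hot","Iced","Iced","Hot","Hot","Iced")

# Relative frequencies expressed as percentages
TempPercent <- 100 * prop.table(table(CoffeeTemp))
TempPercent

# Base R bar plot of the percentages
barplot(TempPercent,
        xlab = "Coffee temperature",
        ylab = "Percent of orders",
        ylim = c(0, 100))
```

Fixing the y-axis at 0 to 100 is a design choice: it keeps the two bars in proportion to the whole sample rather than exaggerating small differences.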

Coffee Size

Next, we look at cup sizes by themselves.

table(CoffeeSize)
## CoffeeSize
## L M S 
## 8 5 7
prop.table(table(CoffeeSize))
## CoffeeSize
##    L    M    S 
## 0.40 0.25 0.35

Large is the most common size (8 of 20 orders, or 40%), followed by Small (7 of 20, or 35%). Medium is the least common (5 of 20, or 25%). We display this information in the following bar plot:

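# Load ggplot2, which provides ggplot() and geom_bar()
library(ggplot2)

# Reformatting CoffeeSize as a data frame, with the sizes ordered S, M, L
CoffeeDataSet <- data.frame(CoffeeSize = factor(CoffeeSize, levels = c("S","M","L")))

# Bar plot of counts
ggplot(CoffeeDataSet, aes(x = CoffeeSize)) +
  geom_bar() +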
  scale_x_discrete(labels = c(S = "Small", M = "Medium", L = "Large")) +
  labs(x = "Coffee size", y = "Count") +
  theme_minimal()
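
The factor levels used for the plot also control the order in which table() lists categories. Here is a small sketch, assuming we want the frequency table in S, M, L order rather than the alphabetical L, M, S order shown earlier; SizeOrdered is a name introduced here for illustration.

```r
CoffeeSize <- c("S","S","S","S","M","M","M","M","L","L",
                "L","L","S","S","M","L","L","L","S","L")

# Order the categories by cup size instead of alphabetically
SizeOrdered <- factor(CoffeeSize, levels = c("S", "M", "L"))

# Counts and proportions now follow the S, M, L order
table(SizeOrdered)
prop.table(table(SizeOrdered))
```

The counts themselves are unchanged; only the order of the columns differs.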

In Conclusion

Going on this data alone, we see that orders are split evenly between hot and iced coffee, and that each of the extreme sizes (small and large) was ordered slightly more often than the middle size (medium). Of course, this data set is quite small (only 20 customers) and quite contrived: it records nothing about other types of orders, and it makes no attempt to include anyone beyond the cafe's 20 most frequent coffee drinkers. Follow-up questions that we might want to ask include: Do size preferences differ for hot vs iced? Is this enough data to draw any larger conclusions? What other variables should we take into account?
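
The first follow-up question could be explored with a two-way table. This rough sketch goes one step beyond the one-variable summaries above. It assumes the two vectors are in the same order, so that position i in each describes the same order; TempBySize is a name introduced here for illustration.

```r
CoffeeTemp <- c("Hot","Hot","Hot","Iced","Hot","Iced","Hot","Iced","Iced","Iced",
                "Iced","Iced","Hot","Hot","Hot","Iced","Iced","Hot","Hot","Iced")
CoffeeSize <- factor(c("S","S","S","S","M","M","M","M","L","L",
                       "L","L","S","S","M","L","L","L","S","L"),
                     levels = c("S", "M", "L"))

# Counts for each temperature/size combination
TempBySize <- table(CoffeeTemp, CoffeeSize)
TempBySize

# Proportions within each temperature (each row sums to 1)
prop.table(TempBySize, margin = 1)
```

With only 10 orders per temperature, any pattern in such a table would be tentative at best, which ties back to the question of whether this is enough data.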