Datasets and Charts

rm(list = ls())
library(datasetsICR)
library(ggplot2)
data(customers)

head(customers, 10)
   Channel Region Fresh  Milk Grocery Frozen Detergents_Paper Delicassen
1        2      3 12669  9656    7561    214             2674       1338
2        2      3  7057  9810    9568   1762             3293       1776
3        2      3  6353  8808    7684   2405             3516       7844
4        1      3 13265  1196    4221   6404              507       1788
5        2      3 22615  5410    7198   3915             1777       5185
6        2      3  9413  8259    5126    666             1795       1451
7        2      3 12126  3199    6975    480             3140        545
8        2      3  7579  4956    9426   1669             3321       2566
9        1      3  5963  3648    6192    425             1716        750
10       2      3  6006 11093   18881   1159             7425       2098
table(customers$Region)

  1   2   3 
 77  47 316 
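
The raw counts are easier to compare as shares of the total. The following is a minimal sketch that uses only the counts printed by table() above; the names region_counts, total_customers, and region_share are hypothetical and not part of the original analysis.

```r
# Region counts copied from the table() output above
region_counts <- c("1" = 77, "2" = 47, "3" = 316)

# Total number of customers across all three regions
total_customers <- sum(region_counts)

# Share of customers in each region, in percent, rounded to one decimal place
region_share <- round(100 * prop.table(region_counts), 1)

print(total_customers)
print(region_share)
```

With the dataset loaded, the same shares could be computed directly with prop.table(table(customers$Region)).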
# Treat Region as categorical so each region gets its own discrete color
ggplot(data = customers,
       mapping = aes(x = Frozen, y = Fresh, color = factor(Region))) +
  geom_point() +
  scale_x_log10() +  # log scale on the x-axis only; the y-axis stays linear
  labs(title = "The correlation of Fresh, Frozen, and Region",
       subtitle = "Frozen and fresh measured by region",
       caption = "Source: {customers} dataset",
       x = "Frozen products",
       y = "Fresh products",
       color = "Region")

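  1. Each point is a customer, colored by region. At first glance many customers appear to have more frozen than fresh products, but the x-axis (Frozen) is log-scaled while the y-axis (Fresh) is linear, so distances along the two axes are not directly comparable; in the ten rows printed above, Fresh exceeds Frozen for every customer. Frozen also spans a smaller range: only one customer has a Fresh value above 100,000, and no customer has a Frozen value above 100,000. Most customers are in region 3 (316 of 440), and region 2 has the fewest, with only 47.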
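
A comparison of Fresh and Frozen can be checked numerically rather than read off the chart, where the two axes use different scales. The sketch below is only an illustration: it uses the ten rows printed by head() above, not the full dataset, and the names first_ten, frozen_exceeds_fresh, n_frozen_larger, and max_values are hypothetical.

```r
# First ten rows of Fresh and Frozen, copied from the head() output above
first_ten <- data.frame(
  Fresh  = c(12669, 7057, 6353, 13265, 22615, 9413, 12126, 7579, 5963, 6006),
  Frozen = c(214, 1762, 2405, 6404, 3915, 666, 480, 1669, 425, 1159)
)

# Flag customers whose Frozen value is larger than their Fresh value
frozen_exceeds_fresh <- first_ten$Frozen > first_ten$Fresh

# Count how many of the ten customers are flagged
n_frozen_larger <- sum(frozen_exceeds_fresh)

# Largest value in each column
max_values <- sapply(first_ten, max)

print(n_frozen_larger)
print(max_values)
```

To run the same check on every customer, replace first_ten with customers once the dataset is loaded; the result for the full data may differ from these ten rows.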