TidyTuesday

Load the weekly Data

Dowload the weekly data and make available in the tt object.

tt <- tt_load("2021-03-30")
## 
##  Downloading file 1 of 5: `ulta.csv`
##  Downloading file 2 of 5: `sephora.csv`
##  Downloading file 3 of 5: `allShades.csv`
##  Downloading file 4 of 5: `allNumbers.csv`
##  Downloading file 5 of 5: `allCategories.csv`

Getting the data into csvs:

write.csv(tt$allShades, "allShades.csv")
write.csv(tt$allNumbers, "allNumbers.csv")
write.csv(tt$allCategories, "allCategories.csv")

Let’s rename shade to lightness, since it is already scaled in such a manner:

allShades <- read_csv("allShades.csv") %>%
  mutate(shade = lightness)
allNumbers <- read_csv("allNumbers.csv") %>%
  mutate(shade = lightness)
allCategories <- read_csv("allCategories.csv") %>%
  mutate(shade = lightness)

Prof. Hardin found a way to represent the individual products as their actual listed product color. We will save this for later!

shade_colors <- allShades %>%
  pull(hex, hex)

allShades %>%
  filter(brand == "ULTA") %>% 
  ggplot() +
  geom_jitter(aes(x = shade, y = hue, fill = hex), size = 7, pch = 21) +
  scale_fill_manual(values = shade_colors) + 
  theme_minimal() + 
  theme(legend.position = "none")

Here we can visualize the actual colors of the foundations in the dataset. Off the bat, we recognize a higher count for lighter hued foundations, when compared to the darker hued foundations.

#(Un)Naturally Named

allCategories %>%
  group_by(brand) %>%
  mutate(num_prod = n()) %>%
  filter(num_prod > 150) %>%
  ungroup() %>%
  mutate(name = tolower(name)) %>%
  mutate(nature = case_when(
    # looking at whether the product name includes any
    # of the following words
    str_detect(name, "natural") ~ "nat word",
    str_detect(name, "nature") ~ "nat word",
    str_detect(name, "nude") ~ "nat word",
    str_detect(name, "naked") ~ "nat word",
    str_detect(name, "neutral") ~ "nat word",
    TRUE ~ "no nat word"
  )) %>%
  # get the mean shade for both the nat and no-nat group
  group_by(nature) %>%
  mutate(med_shade = median(shade), mean_shade = mean(shade)) %>%
  ungroup() %>%
  ggplot() +
    # show the distribution of shades
    geom_boxplot(aes(x = shade, y = brand)) +
    geom_jitter(aes(x = shade, y = brand, fill = hex), size = 2, pch = 21, height = 0.1) +
  # plot the mean shade for both groups
  geom_vline(aes(xintercept = mean_shade, group = nature)) +
  # color by the product color
  scale_fill_manual(values = shade_colors) + 
  theme_minimal() + 
  theme(legend.position = "none") +
  facet_wrap(~ nature, ncol = 1)

The visualization reveals a trend in the data that products that are deemed ‘natural’, ‘naked’, or a variant of the word are represented as lighter hues, as compared to products without such words in their name. This data is for the top 10 brands, or more specifically, the brands with the most products.