Dowload the weekly data and make available in the tt object.
tt <- tt_load("2021-03-30")
##
## Downloading file 1 of 5: `ulta.csv`
## Downloading file 2 of 5: `sephora.csv`
## Downloading file 3 of 5: `allShades.csv`
## Downloading file 4 of 5: `allNumbers.csv`
## Downloading file 5 of 5: `allCategories.csv`
Getting the data into csvs:
write.csv(tt$allShades, "allShades.csv")
write.csv(tt$allNumbers, "allNumbers.csv")
write.csv(tt$allCategories, "allCategories.csv")
Let’s rename shade to lightness, since it is already scaled in such a manner:
allShades <- read_csv("allShades.csv") %>%
mutate(shade = lightness)
allNumbers <- read_csv("allNumbers.csv") %>%
mutate(shade = lightness)
allCategories <- read_csv("allCategories.csv") %>%
mutate(shade = lightness)
Prof. Hardin found a way to represent the individual products as their actual listed product color. We will save this for later!
shade_colors <- allShades %>%
pull(hex, hex)
allShades %>%
filter(brand == "ULTA") %>%
ggplot() +
geom_jitter(aes(x = shade, y = hue, fill = hex), size = 7, pch = 21) +
scale_fill_manual(values = shade_colors) +
theme_minimal() +
theme(legend.position = "none")
Here we can visualize the actual colors of the foundations in the dataset. Off the bat, we recognize a higher count for lighter hued foundations, when compared to the darker hued foundations.
#(Un)Naturally Named
allCategories %>%
group_by(brand) %>%
mutate(num_prod = n()) %>%
filter(num_prod > 150) %>%
ungroup() %>%
mutate(name = tolower(name)) %>%
mutate(nature = case_when(
# looking at whether the product name includes any
# of the following words
str_detect(name, "natural") ~ "nat word",
str_detect(name, "nature") ~ "nat word",
str_detect(name, "nude") ~ "nat word",
str_detect(name, "naked") ~ "nat word",
str_detect(name, "neutral") ~ "nat word",
TRUE ~ "no nat word"
)) %>%
# get the mean shade for both the nat and no-nat group
group_by(nature) %>%
mutate(med_shade = median(shade), mean_shade = mean(shade)) %>%
ungroup() %>%
ggplot() +
# show the distribution of shades
geom_boxplot(aes(x = shade, y = brand)) +
geom_jitter(aes(x = shade, y = brand, fill = hex), size = 2, pch = 21, height = 0.1) +
# plot the mean shade for both groups
geom_vline(aes(xintercept = mean_shade, group = nature)) +
# color by the product color
scale_fill_manual(values = shade_colors) +
theme_minimal() +
theme(legend.position = "none") +
facet_wrap(~ nature, ncol = 1)
The visualization reveals a trend in the data that products that are deemed ‘natural’, ‘naked’, or a variant of the word are represented as lighter hues, as compared to products without such words in their name. This data is for the top 10 brands, or more specifically, the brands with the most products.