Hi there! I’ve included this summary for grader convenience so you don’t have to scan the sephora file. As you’ll notice, this data set lacks numerical data. This is a shortcoming on my part, and I have been searching for a new data set. I chose this data set over the ‘rocket league’ set because it is ultimately more legible for grading. I would love to redo this assignment in my spare time with the new data set, just to make sure I’ve learned the material correctly. In the meantime, let’s tell a story.
# read.csv() is base R, so no extra package is needed to load the file
sephora <- read.csv("sephora.csv")
summary(sephora)
##     brand             product              url            description
##  Length:4371        Length:4371        Length:4371        Length:4371
##  Class :character   Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
##     imgSrc             imgAlt              name             specific
##  Length:4371        Length:4371        Length:4371        Length:4371
##  Class :character   Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
To make sense of these columns, it helps to understand the context in which the data was collected.
In March 2021, journalist Ofunne Amaka with The Pudding magazine released an article titled “The Naked Truth”, which sought to expose how the names of women’s complexion products revealed bias in the beauty industry.
‘Foundation’ is a common product in a makeup routine that is typically applied to the face and neck to reduce the appearance of blemishes, fine lines, acne and other scars. Despite this product being a crucial step in most people’s makeup routines, brands have made an odd choice in their naming conventions.
The article found that, across the products surveyed, a whopping 1,302 contain food- and drink-related names for shades marketed towards those with tan, beige or darker skin. While these names may not seem offensive at first glance, they can read as dehumanizing or fetishizing.
Here’s a quick glance at the brands that were surveyed (the 6,816 figure is the total across brands; the sephora file loaded above has 4,371 rows across 62 brands):
library(ggplot2)
ggplot(sephora, aes(factor(1), fill = brand)) + geom_bar()
# table() is base R; it counts how many rows the file lists for each brand
table(sephora$brand)
##
## AMOREPACIFIC                Anastasia Beverly Hills
## 5                           54
## Antonym                     Armani Beauty
## 7                           169
## bareMinerals                beautyblender
## 252                         40
## BECCA Cosmetics             Benefit Cosmetics
## 48                          38
## Bite Beauty                 Black Up
## 32                          1
## Bobbi Brown                 boscia
## 183                         4
## Charlotte Tilbury           CLINIQUE
## 76                          201
## COOLA                       Dior
## 3                           155
## Dr. Dennis Gross Skincare   Dr. Jart+
## 2                           4
## Erborian                    Estée Lauder
## 4                           125
## FENTY BEAUTY by Rihanna     Givenchy
## 200                         56
## Gucci                       Guerlain
## 40                          44
## Hourglass                   HUDA BEAUTY
## 104                         69
## ILIA                        IT Cosmetics
## 36                          91
## Josie Maran                 Jouer Cosmetics
## 11                          50
## KEVYN AUCOIN                Koh Gen Do
## 46                          10
## Kosas                       KVD Vegan Beauty
## 16                          160
## La Mer                      Lancôme
## 39                          104
## Laura Mercier               LAWLESS
## 182                         17
## lilah b.                    MAKE UP FOR EVER
## 5                           170
## Marc Jacobs Beauty          MILK MAKEUP
## 3                           63
## NARS                        Natasha Denona
## 125                         83
## NUDESTIX                    Origins
## 25                          30
## PAT McGRATH LABS            Perricone MD
## 36                          8
## Pretty Vulgar               Rare Beauty by Selena Gomez
## 8                           48
## rms beauty                  SEPHORA COLLECTION
## 32                          229
## Sephora Favorites           Shiseido
## 34                          49
## Smashbox                    surratt beauty
## 114                         14
## tarte                       TOM FORD
## 202                         79
## Too Faced                   Urban Decay
## 121                         111
## Wander Beauty               Yves Saint Laurent
## 24                          80
library(ggplot2)
ggplot(sephora, aes(product)) + geom_bar()
These brands range from low-end to high-end in price and can be found in major retailers such as Walmart, Sephora, Ulta, Target, etc.
The first visualization serves to show a ‘rainbow’ (haha) of options from which consumers can choose. These brands represent major players in the beauty industry and have some of the highest profit margins regardless of the product. However, while these brands are quite popular and make up a good portion of the beauty industry, they are not all-encompassing.
Not all brands are created equal! While some brands offer a wide range of shades for their complexion products, others have limited options for consumers. As seen in the brand table (shown for convenience, given that the bar chart of products is more difficult to read due to the length of the product names), the number of complexion products differs greatly from brand to brand. As you might imagine, this creates an issue of inclusivity for people who fall outside of a specific shade range.
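The brand table above counts rows per brand, which can blend two things together: how many products a brand sells and how many shades each product comes in. Here is a minimal sketch of how the two could be separated, assuming each row of the file is one shade of one product (the summary output alone doesn’t confirm this). The helper `count_by_brand` and the toy data frame are made up for illustration; only the `brand` and `product` column names come from the file.

```r
# Hypothetical helper: rows per brand and distinct products per brand,
# sorted so the brands with the most rows come first.
count_by_brand <- function(df) {
  rows_per_brand <- table(df$brand)
  products_per_brand <- tapply(df$product, df$brand,
                               function(p) length(unique(p)))
  out <- data.frame(
    brand = names(rows_per_brand),
    n_rows = as.integer(rows_per_brand),
    n_products = as.integer(products_per_brand[names(rows_per_brand)]),
    stringsAsFactors = FALSE
  )
  out[order(-out$n_rows, out$brand), ]
}

# Toy stand-in data (placeholder values, not taken from sephora.csv)
toy <- data.frame(
  brand = c("Brand A", "Brand A", "Brand A", "Brand B"),
  product = c("Foundation X", "Foundation X", "Concealer Y", "Foundation Z"),
  stringsAsFactors = FALSE
)

brand_counts <- count_by_brand(toy)
print(brand_counts)
```

With the real file loaded, `count_by_brand(sephora)` should give the same kind of table, and comparing `n_rows` with `n_products` would hint at whether a brand’s count comes from many products or from a deep shade range.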
Now that we have some context behind this data set, what kinds of things would be interesting to know? I believe we should ask:
What is the ratio of brands that use ‘food and drink’ based names to those that don’t?
Given the hex code of the deepest shade, how many brands offer shades five or more shades below or above the deepest shade?
How many of each product type exist in a given brand? (partly answered by the brand table above)
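As a starting point for the first question, here is a rough sketch of how the share of brands with food- or drink-based shade names might be estimated. It rests on two assumptions I haven’t verified: that the `name` column holds the shade name, and that a short keyword list is a fair stand-in for the article’s categories. The word list and both helper functions are hypothetical.

```r
# Illustrative keyword list only; this is NOT the list used in the article.
food_words <- c("honey", "caramel", "mocha", "espresso", "almond")

# Flag names that contain any keyword as a whole word, ignoring case.
flag_food_names <- function(shade_names, words) {
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  grepl(pattern, shade_names, ignore.case = TRUE, perl = TRUE)
}

# Share of brands with at least one flagged name.
share_of_brands_with_food_names <- function(df, words) {
  is_food <- flag_food_names(df$name, words)
  brand_uses_food <- tapply(is_food, df$brand, any)
  mean(brand_uses_food)
}

# Toy stand-in data (placeholder values, not taken from sephora.csv)
toy_shades <- data.frame(
  brand = c("Brand A", "Brand A", "Brand B", "Brand B"),
  name = c("Warm Honey", "Ivory", "Porcelain", "Sand"),
  stringsAsFactors = FALSE
)

food_share <- share_of_brands_with_food_names(toy_shades, food_words)
print(food_share)
```

On the real data the call would be `share_of_brands_with_food_names(sephora, food_words)`. Keyword matching like this will miss names outside the list and can flag words used in a non-food sense, so the result is only a first estimate.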