The cosmetics industry is a 50 billion-dollar-a-year business in the US alone. Cosmetics sales are even used as an indicator for economic recessions (i.e., the “lipstick index”).
You are working for a cosmetics start-up that plans to launch a new line of organic products. The launch will focus on the following cosmetics categories: fragrances, makeup, and skincare. Your task is to find the average prices for four competitor brands across these three categories.
You will be using data collected from Sephora’s global store in your analysis. However, there are some additional requirements before you can present the results:
| New category | Existing categories to reclassify |
|---|---|
| Fragrance | Perfume, Cologne |
| Makeup | Eye Palettes, Lipstick, Highlighter, Foundation, Mascara, Eyeliner, Makeup, Eyeshadow |
| Skincare | Moisturizers, Face Serums, Face Wash & Cleansers, Face Masks, Face Primer, Body Lotions & Body Oils, Lotions & Oils |
Currently, it is unclear who the primary competitors are. Find the top four brands by the combined number of fragrance, makeup, and skincare products produced.
Exclude any products containing the toxic ingredient “toluene” from your analysis. Assume products with no ingredient information do not contain “toluene”.
Product prices should be compared in USD. Below are the conversions for currencies found in the dataset:
The datasets available are listed below:
Source: Kaggle
Source: Kaggle
# Use this cell to begin your analysis, and add as many as you would like!
###---Libraries
library(dplyr) #For transformations
library(tidyr) #For pivoting data
library(stringr) #For string extraction
library(knitr) #For table printing
###---Load Data
cosmetics <- read.csv('datasets/cosmetics.csv')
ingredients <- read.csv("datasets/ingredients.txt", sep="\t", header=T)
cos_cat <- read.csv("datasets/cosmetics_categories.csv")
###---Clean Data
cosmetics <- cosmetics %>%
mutate(brand_lower = tolower(brand),
Local_Amount = as.numeric(str_extract(price, "\\d+")),
Currency = str_extract(price, "EUR|GBP|USD|YEN"),
USD_Amount = case_when(
Currency == "USD" ~ Local_Amount,
Currency == "GBP" ~ 1.42 * Local_Amount,
Currency == "EUR" ~ 1.22 * Local_Amount,
Currency == "YEN" ~ 0.01 * Local_Amount,
TRUE ~ NA_real_
)
)
###---Limit Data to fragrence, makeup, and skin care
brand_prices <- cosmetics %>%
right_join(cos_cat, by = c("category" = "sub_category"))
###---Terminate toluene with extreme prejudice
ingredients_toluene <- ingredients %>%
filter(grepl('toluene', ingredients, ignore.case = TRUE))
brand_prices <- brand_prices %>%
anti_join(ingredients_toluene, by = c('id' = 'product_id'))
###---Filter to top 4 brands
top_4_brands <- brand_prices %>%
group_by(brand_lower) %>%
summarise(Total_Products = n()) %>%
arrange(desc(Total_Products)) %>%
slice_head(n=4) %>%
pull(brand_lower)
brand_prices <- brand_prices %>%
filter(brand_lower %in% top_4_brands)
###---Return average price by brand and category
brand_prices <- brand_prices %>%
group_by(Brand = brand_lower,
broad_category) %>%
summarise(mcost = round(mean(USD_Amount),2)) %>%
pivot_wider(Brand,
names_from = broad_category,
values_from = mcost)
kable(brand_prices, caption = "Average Cost By Brand And Category")
| Brand | Fragrance | Makeup | Skincare |
|---|---|---|---|
| clinique | 58.00 | 24.08 | 36.90 |
| dior | 92.85 | 39.12 | 88.08 |
| sephora collection | 17.00 | 16.83 | 10.04 |
| tom ford | 182.70 | 61.35 | 74.86 |