Cosmetics Brand Analysis

The cosmetics industry is a 50 billion-dollar-a-year business in the US alone. Cosmetics sales are even used as an indicator for economic recessions (i.e., the “lipstick index”).

You are working for a cosmetics start-up that plans to launch a new line of organic products. The launch will focus on the following cosmetics categories: fragrances, makeup, and skincare. Your task is to find the average prices for four competitor brands across these three categories.

You will be using data collected from Sephora’s global store in your analysis. However, there are some additional requirements before you can present the results:

New category Existing categories to reclassify
Fragrance Perfume, Cologne
Makeup Eye Palettes, Lipstick, Highlighter, Foundation, Mascara, Eyeliner, Makeup, Eyeshadow
Skincare Moisturizers, Face Serums, Face Wash & Cleansers, Face Masks, Face Primer, Body Lotions & Body Oils, Lotions & Oils

The datasets available are listed below:

datasets/cosmetics.csv - Catalogue of cosmetics products, including brand, category, price, and other characteristics

Source: Kaggle

  • id: The product id.
  • brand: The brand.
  • category: The category of product.
  • name: The name of the product.
  • size: The size of the product.
  • price: The price of the product (in various currencies).
  • rating: The consumer rating of the product.
  • how_to_use: Instructions for the product.
  • online_only: Whether or not the product is online-exclusive.
  • limited_edition: Whether or not the product is limited edition.

datasets/ingredients.txt - A tab-delimited file containing the product IDs and ingredients for products in the cosmetics dataset

Source: Kaggle

  • product_id: The product id (corresponding to the id column in the cosmetics dataset).
  • ingredients: The list of ingredients in the product.

datasets/ingredients.txt - A tab-delimited file containing the product IDs and ingredients for products in the cosmetics dataset

  • sub_category: The sub categories that exist in the cosmetics CSV that need to be recategorized.
  • broad_category: The corresponding broader category to which the sub_category belongs to and needs to be recategorized as.
# Use this cell to begin your analysis, and add as many as you would like!

###---Libraries
    library(dplyr) #For transformations
    library(tidyr) #For pivoting data
    library(stringr) #For string extraction
    library(knitr) #For table printing

###---Load Data
    cosmetics <- read.csv('datasets/cosmetics.csv')
    ingredients <- read.csv("datasets/ingredients.txt", sep="\t", header=T)
    cos_cat <- read.csv("datasets/cosmetics_categories.csv")

###---Clean Data
    cosmetics <- cosmetics %>%
        mutate(brand_lower = tolower(brand),
               Local_Amount = as.numeric(str_extract(price, "\\d+")),
                Currency = str_extract(price, "EUR|GBP|USD|YEN"),
               USD_Amount = case_when(
                   Currency == "USD" ~ Local_Amount,
                   Currency == "GBP" ~ 1.42 * Local_Amount,
                   Currency == "EUR" ~ 1.22 * Local_Amount,
                   Currency == "YEN" ~ 0.01 * Local_Amount,
                   TRUE ~ NA_real_
               )
               )

###---Limit Data to fragrence, makeup, and skin care
    brand_prices <- cosmetics %>%
        right_join(cos_cat, by = c("category" = "sub_category"))

###---Terminate toluene with extreme prejudice
    ingredients_toluene <- ingredients %>%
        filter(grepl('toluene', ingredients, ignore.case = TRUE))

    brand_prices <- brand_prices %>%
        anti_join(ingredients_toluene, by = c('id' = 'product_id'))

###---Filter to top 4 brands
    top_4_brands <- brand_prices %>%
        group_by(brand_lower) %>%
        summarise(Total_Products = n()) %>%
        arrange(desc(Total_Products)) %>%
        slice_head(n=4) %>%
        pull(brand_lower)

    brand_prices <- brand_prices %>%
        filter(brand_lower %in% top_4_brands)

###---Return average price by brand and category
    brand_prices <- brand_prices %>%
        group_by(Brand = brand_lower,
                broad_category) %>%
        summarise(mcost = round(mean(USD_Amount),2)) %>%
        pivot_wider(Brand,
                   names_from = broad_category,
                   values_from = mcost)
kable(brand_prices, caption = "Average Cost By Brand And Category")
Average Cost By Brand And Category
Brand Fragrance Makeup Skincare
clinique 58.00 24.08 36.90
dior 92.85 39.12 88.08
sephora collection 17.00 16.83 10.04
tom ford 182.70 61.35 74.86