Datacamp coding challenge: Cosmetic Brand Analysis

Cosmetics Brand Analysis

The cosmetics industry is a 50 billion-dollar-a-year business in the US alone. Cosmetics sales are even used as an indicator for economic recessions (i.e., the “lipstick index”).

You are working for a cosmetics start-up that plans to launch a new line of organic products. The launch will focus on the following cosmetics categories: fragrances, makeup, and skincare. Your task is to find the average prices for four competitor brands across these three categories.

You will be using data collected from Sephora’s global store in your analysis. However, there are some additional requirements before you can present the results:

The data available is unprocessed as received from Sephora. The start-up requests the categories as follows:

New category	Existing categories to reclassify
Fragrance	Perfume, Cologne
Makeup	Eye Palettes, Lipstick, Highlighter, Foundation, Mascara, Eyeliner, Makeup, Eyeshadow
Skincare	Moisturizers, Face Serums, Face Wash & Cleansers, Face Masks, Face Primer, Body Lotions & Body Oils, Lotions & Oils

Currently, it is unclear who the primary competitors are. Find the top four brands by the combined number of fragrance, makeup, and skincare products produced.
Exclude any products containing the toxic ingredient “toluene” from your analysis. Assume products with no ingredient information do not contain “toluene”.
Product prices should be compared in USD. Below are the conversions for currencies found in the dataset:
- 1 EUR = 1.22 USD
- 1 GBP = 1.42 USD
- 1 Yen = 0.01 USD

The datasets available are listed below:

datasets/cosmetics.csv - Catalogue of cosmetics products, including brand, category, price, and other characteristics

Source: Kaggle

id: The product id.
brand: The brand.
category: The category of product.
name: The name of the product.
size: The size of the product.
price: The price of the product (in various currencies).
rating: The consumer rating of the product.
how_to_use: Instructions for the product.
online_only: Whether or not the product is online-exclusive.
limited_edition: Whether or not the product is limited edition.

datasets/ingredients.txt - A tab-delimited file containing the product IDs and ingredients for products in the cosmetics dataset

Source: Kaggle

product_id: The product id (corresponding to the id column in the cosmetics dataset).
ingredients: The list of ingredients in the product.

datasets/ingredients.txt - A tab-delimited file containing the product IDs and ingredients for products in the cosmetics dataset

sub_category: The sub categories that exist in the cosmetics CSV that need to be recategorized.
broad_category: The corresponding broader category to which the sub_category belongs to and needs to be recategorized as.

# Use this cell to begin your analysis, and add as many as you would like!

###---Libraries
    library(dplyr) #For transformations
    library(tidyr) #For pivoting data
    library(stringr) #For string extraction
    library(knitr) #For table printing

###---Load Data
    cosmetics <- read.csv('datasets/cosmetics.csv')
    ingredients <- read.csv("datasets/ingredients.txt", sep="\t", header=T)
    cos_cat <- read.csv("datasets/cosmetics_categories.csv")

###---Clean Data
    cosmetics <- cosmetics %>%
        mutate(brand_lower = tolower(brand),
               Local_Amount = as.numeric(str_extract(price, "\\d+")),
                Currency = str_extract(price, "EUR|GBP|USD|YEN"),
               USD_Amount = case_when(
                   Currency == "USD" ~ Local_Amount,
                   Currency == "GBP" ~ 1.42 * Local_Amount,
                   Currency == "EUR" ~ 1.22 * Local_Amount,
                   Currency == "YEN" ~ 0.01 * Local_Amount,
                   TRUE ~ NA_real_
               )
               )

###---Limit Data to fragrence, makeup, and skin care
    brand_prices <- cosmetics %>%
        right_join(cos_cat, by = c("category" = "sub_category"))

###---Terminate toluene with extreme prejudice
    ingredients_toluene <- ingredients %>%
        filter(grepl('toluene', ingredients, ignore.case = TRUE))

    brand_prices <- brand_prices %>%
        anti_join(ingredients_toluene, by = c('id' = 'product_id'))

###---Filter to top 4 brands
    top_4_brands <- brand_prices %>%
        group_by(brand_lower) %>%
        summarise(Total_Products = n()) %>%
        arrange(desc(Total_Products)) %>%
        slice_head(n=4) %>%
        pull(brand_lower)

    brand_prices <- brand_prices %>%
        filter(brand_lower %in% top_4_brands)

###---Return average price by brand and category
    brand_prices <- brand_prices %>%
        group_by(Brand = brand_lower,
                broad_category) %>%
        summarise(mcost = round(mean(USD_Amount),2)) %>%
        pivot_wider(Brand,
                   names_from = broad_category,
                   values_from = mcost)

kable(brand_prices, caption = "Average Cost By Brand And Category")

Average Cost By Brand And Category
Brand	Fragrance	Makeup	Skincare
clinique	58.00	24.08	36.90
dior	92.85	39.12	88.08
sephora collection	17.00	16.83	10.04
tom ford	182.70	61.35	74.86