Introduction/Essay (a): The topic of the data, any variables included, what kind of variables they are, where the data came from and how you cleaned it up (be detailed and specific, using proper terminology where appropriate). Be sure to explain why you chose this topic and dataset – what meaning does it have for you?

Dataset: Makeup Shades Dataset. Source: https://www.kaggle.com/shivamb/makeup-shades-dataset

In my final project, I am using the Makeup Shades Dataset. The original author compiled a list of beauty brands in the US, Nigeria, India, and Japan that several sources considered to be “best sellers” in their home countries. In May 2018, they visited each brand’s website, found the liquid foundation line that had the largest number of shades available at the time of sampling, and recorded the hex color value of each color swatch shown for that product. They then used Adobe Photoshop to extract the lightness value of each color using the CIE Lab color model.

I will review this dataset to explore:

What brands have the most foundation shades?
Which makeup brands are the best sellers by country and/or by type of company?
How complex are the beauty brands’ foundation shades?

The dataset includes the following variables: “brand” (categorical), “brand_short” (categorical), “product” (categorical), “product_short” (categorical), “hex” (categorical), “H” (continuous), “S” (continuous), “V” (continuous), “L” (integer/discrete), and “group” (an integer code for a categorical variable). Variables that I would like to define further include the following:

Hex: The hexadecimal color code for a particular shade (Note: this does not contain the leading # symbol)
H: Hue, in degrees (0-360)
S: Saturation (0-1)
V: Value (0-1)
L: Lightness, from the CIE Lab color model (0-100)
Group: Country or brand category (coded 0-7)
- 0: Fenty Beauty’s PRO FILT’R Foundation Only
- 1: Make Up For Ever’s Ultra HD Foundation Only
- 2: US Best Sellers
- 3: BIPOC-recommended Brands with BIPOC Founders
- 4: BIPOC-recommended Brands with White Founders
- 5: Nigerian Best Sellers
- 6: Japanese Best Sellers
- 7: Indian Best Sellers
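To show how the H, S, V, and L columns relate to the “hex” codes, here is a minimal sketch in base R. The hex_to_hsvl() helper is hypothetical and not part of the original project. It uses grDevices::rgb2hsv() for H, S, and V (with hue converted to degrees) and grDevices::convertColor() for CIE L*. For the first two Maybelline shades in the dataset (“f3cfb3” and “ffe3c2”), the H, S, and V results match the dataset’s values. The L values may differ by a few points, because the original author extracted lightness in Adobe Photoshop, and the color settings used there are not documented.

```r
# Sketch: derive H, S, V, and L from a hex code with base R (grDevices).
# hex_to_hsvl() is a hypothetical helper, not part of the original project.
library(grDevices)

hex_to_hsvl <- function(hex) {
  # The "hex" column has no leading "#", so add it back before converting
  rgb <- col2rgb(paste0("#", hex))          # 3 x n integer matrix (0-255)
  hsv <- unname(rgb2hsv(rgb))               # rows: hue, saturation, value (0-1)
  lab <- convertColor(t(rgb) / 255,         # n x 3 matrix of sRGB values (0-1)
                      from = "sRGB", to = "Lab")
  data.frame(
    hex = hex,
    H = round(hsv[1, ] * 360),              # hue in degrees
    S = round(hsv[2, ], 2),                 # saturation, 0-1
    V = round(hsv[3, ], 2),                 # value, 0-1
    L = round(unname(lab[, 1]))             # CIE L* lightness, 0-100 (approximate)
  )
}

# First two Maybelline shades from the dataset
shade_values <- hex_to_hsvl(c("f3cfb3", "ffe3c2"))
shade_values
```

With the project data, the same call works on the whole column, for example hex_to_hsvl(shades$hex). Rows with malformed hex codes would make col2rgb() stop with an error.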

I cleaned the dataset before loading it into R because it contained a number of symbols that I was concerned would cause errors when loading. I removed symbols from the makeup “brand” column and verified the “hex” values in the original dataset (some of the hex values showed up as expressions).
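The cleaning above was done on the original file before loading it. Below is a minimal sketch of how the same checks could be scripted in R. The helpers clean_brand() and is_valid_hex() are hypothetical. The sketch assumes that a valid code is exactly six hexadecimal characters with no leading “#”, matching the “hex” column. The value “1.00E+05” is a made-up example of a hex code that a spreadsheet might turn into an expression.

```r
# Sketch of scripted cleaning checks. clean_brand() and is_valid_hex() are
# hypothetical helpers, not functions from the original project.
clean_brand <- function(x) {
  x <- gsub("[^[:alnum:] ]", "", x)   # remove anything that is not a letter, digit, or space
  trimws(gsub(" +", " ", x))          # collapse repeated spaces and trim the ends
}

is_valid_hex <- function(x) {
  grepl("^[0-9A-Fa-f]{6}$", x)        # exactly six hex digits, no leading "#"
}

clean_brand(c("L'Oreal", "Elsa's Pro", "Make Up For Ever"))
# returns "LOreal" "Elsas Pro" "Make Up For Ever"

# "1.00E+05" is a made-up example of a code mangled into an expression
is_valid_hex(c("f3cfb3", "1.00E+05", "ffe3c2"))
# returns TRUE FALSE TRUE
```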

I decided to use this dataset for my final project because I have always been really into beauty products and makeup. Over the last few years, the range of foundation shades has really expanded, and these new options have been great for women of color. I thought this dataset would offer an interesting look inside the major brands that have been a part of this increase.

Essay (b): Incorporate background research about this topic. This background information will include information you find in an article, website, or book. Please source this background information within the essay or if you have multiple sources, include a bibliography. I am not particular about the format of this bibliography. If you need help finding articles, I am happy to help you and/or show you how to search the MC Library Database.

Sin (2021) recently completed her dissertation, “Colorism Toward the Black, Indigenous and People of Color (BIPOC) Community in the Makeup Industry.” She shares her own history as a woman of color wearing makeup, including using a lighter shade. Her study found that consumers are becoming more aware of the issue and want more variety, starting a movement that has resulted, and can continue to result, in companies increasing their shade offerings. Ahssen and Driver (2018) published an article in Fashion Network on December 3, 2018, a few months after this dataset was collected. They reported a 28% increase in new foundation products from August 2017 to July 2018, sparked by the trend toward a more natural, “second skin” look. During this period, color was the area where brands began innovating: more than 330 new shades were launched between August 2017 and July 2018, around 100 more than in the previous year. It is more evident than ever that BIPOC not only set beauty trends but also make up a large part of the market share with our dollars, yet BIPOC are still excluded from advertising and marketing. It is important that brands understand the value of diversity and inclusion and, more specifically, that their products meet the needs of the people who use them (Brown, 2021).

Sources:

Ahssen, S., & Driver, R. (2018). Sales of foundation boosted by expanded shade offering. Fashion Network. https://us.fashionnetwork.com/news/Sales-of-foundation-boosted-by-expanded-shade-offering,1041805.html

Brown, D. (2021). What Diversity Looks like in Foundation and the Beauty Industry? Essence. https://www.essence.com/beauty/what-diversity-looks-like-in-foundation-and-the-beauty-industry/

Sin, P. P. (2021). Colorism Toward the Black, Indigenous and People of Color (BIPOC) Community in the Makeup Industry (Doctoral dissertation). https://cache.kzoo.edu/handle/10920/39131

Load in the Dataset

library(readr)
shades <- read_csv("shades.csv")

## Rows: 625 Columns: 10

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): brand, brand_short, product, product_short, hex
## dbl (5): H, S, V, L, group

## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Summary of the Dataset

I ran an initial overall summary of the dataset in RStudio using str() and summary().

str(shades)

## spec_tbl_df [625 × 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ brand        : chr [1:625] "Maybelline" "Maybelline" "Maybelline" "Maybelline" ...
##  $ brand_short  : chr [1:625] "mb" "mb" "mb" "mb" ...
##  $ product      : chr [1:625] "Fit Me" "Fit Me" "Fit Me" "Fit Me" ...
##  $ product_short: chr [1:625] "fmf" "fmf" "fmf" "fmf" ...
##  $ hex          : chr [1:625] "f3cfb3" "ffe3c2" "ffe0cd" "ffd3be" ...
##  $ H            : num [1:625] 26 32 23 19 18 20 28 24 26 20 ...
##  $ S            : num [1:625] 0.26 0.24 0.2 0.25 0.3 0.29 0.31 0.33 0.38 0.38 ...
##  $ V            : num [1:625] 0.95 1 1 1 0.74 0.92 0.98 0.89 0.89 0.7 ...
##  $ L            : num [1:625] 86 92 91 88 65 80 87 77 77 60 ...
##  $ group        : num [1:625] 2 2 2 2 2 2 2 2 2 2 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   brand = col_character(),
##   ..   brand_short = col_character(),
##   ..   product = col_character(),
##   ..   product_short = col_character(),
##   ..   hex = col_character(),
##   ..   H = col_double(),
##   ..   S = col_double(),
##   ..   V = col_double(),
##   ..   L = col_double(),
##   ..   group = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

summary(shades)

##     brand           brand_short          product          product_short     
##  Length:625         Length:625         Length:625         Length:625        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##      hex                  H               S                V         
##  Length:625         Min.   : 4.00   Min.   :0.1000   Min.   :0.2000  
##  Class :character   1st Qu.:23.00   1st Qu.:0.3500   1st Qu.:0.6900  
##  Mode  :character   Median :26.00   Median :0.4400   Median :0.8400  
##                     Mean   :25.31   Mean   :0.4595   Mean   :0.7795  
##                     3rd Qu.:29.00   3rd Qu.:0.5600   3rd Qu.:0.9100  
##                     Max.   :45.00   Max.   :1.0000   Max.   :1.0000  
##                     NA's   :12      NA's   :12       NA's   :12      
##        L             group      
##  Min.   :11.00   Min.   :0.000  
##  1st Qu.:55.00   1st Qu.:2.000  
##  Median :71.00   Median :3.000  
##  Mean   :65.92   Mean   :3.472  
##  3rd Qu.:79.00   3rd Qu.:5.000  
##  Max.   :95.00   Max.   :7.000  
##
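The summary shows 12 missing values in each of H, S, and V. Below is a minimal sketch for locating those rows, using made-up stand-in data. With the project data, `df` would be replaced by `shades`. readr’s problems(shades) may also show whether any of these values failed to parse when the file was read.

```r
# Sketch: find rows with missing values. `df` is made-up stand-in data;
# with the project data, replace `df` with `shades`.
df <- data.frame(
  brand = c("Brand A", "Brand A", "Brand B", "Brand B"),
  H     = c(26, NA, 24, 20),
  S     = c(0.26, NA, 0.30, 0.29),
  V     = c(0.95, NA, 0.80, 0.92),
  L     = c(86, 92, 70, 80)
)

# Number of missing values in each column
na_counts <- colSums(is.na(df))
na_counts

# Rows with at least one missing value
missing_rows <- df[!complete.cases(df), ]
missing_rows

# How many incomplete rows each brand contributes
table(missing_rows$brand)
```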

library("table1")

## 
## Attaching package: 'table1'

## The following objects are masked from 'package:base':
## 
##     units, units<-

Cross Tabulation

Next, I ran a cross-tabulation of brand by group to see how many shades each brand has in each group category.

## cross tabulation brand * group

table(shades$brand,shades$group)

##                   
##                     0  1  2  3  4  5  6  7
##   Addiction         0  0  0  0  0  0 17  0
##   bareMinerals      0  0 29  0  0  0  0  0
##   Beauty Bakerie    0  0  0 30  0  0  0  0
##   Bharat and Doris  0  0  0  0  0  0  0  7
##   Black Opal        0  0  0 12  0  0  0  0
##   Black Up          0  0  0 18  0  0  0  0
##   Blue Heaven       0  0  0  0  0  0  0  2
##   Bobbi Brown       0  0  0  0 30  0  0  0
##   Colorbar          0  0  0  0  0  0  0  3
##   Covergirl Olay    0  0 12  0  0  0  0  0
##   Dior              0  0  0  0  0  0  6  0
##   Elsas Pro         0  0  0  0  0 11  0  0
##   Estee Lauder      0  0 42  0  0  0  0  0
##   Fenty            40  0  0  0  0  0  0  0
##   Hegai and Ester   0  0  0  0  0 10  0  0
##   House of Tara     0  0  0  0  0 11  0  0
##   Iman              0  0  0  8  0  0  0  0
##   IPSA              0  0  0  0  0  0  6  0
##   Kate              0  0  0  0  0  0  6  0
##   Kuddy             0  0  0  0  0  5  0  0
##   Lakme             0  0  0  0  0  0  0  4
##   Lancome           0  0  0  0 40  0  0  0
##   Laws of Nature    0  0  0 17  0  0  0  0
##   LOreal            0  0 22  0  0  0  0 14
##   Lotus Herbals     0  0  0  0  0  0  0  4
##   MAC               0  0  0  0 42  0  0  0
##   Make Up For Ever  0 40  0  0  0  0  0  0
##   Maybelline        0  0 40  0  0  0  0 14
##   NARS              0  0  0  0  0  0 13  0
##   Nykaa             0  0  0  0  0  0  0  5
##   Olivia            0  0  0  0  0  0  0  4
##   Revlon            0  0 22  0  0  0  0  0
##   RMK               0  0  0  0  0  0  9  0
##   Shiseido          0  0  0  0  0  0  6  0
##   Shu Uemera        0  0  0  0  0  0 11  0
##   Trim and Prissy   0  0  0  0  0 13  0  0
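To answer “What brands have the most foundation shades?” directly, the rows can be counted per brand and then sorted. Each row is one shade, so table() on the brand column gives shade counts. Brands such as LOreal and Maybelline appear in both group 2 and group 7 in the cross-tabulation, so their totals combine both groups. The sketch below uses made-up stand-in data. With the project data, `toy$brand` would become `shades$brand`.

```r
# Sketch: count shades (rows) per brand and sort from most to fewest.
# `toy` is made-up stand-in data; with the project data, use shades$brand.
toy <- data.frame(
  brand = c("Brand A", "Brand A", "Brand B", "Brand A", "Brand B", "Brand C"),
  group = c(2, 2, 5, 7, 5, 6)
)

shade_counts <- sort(table(toy$brand), decreasing = TRUE)
shade_counts

# The brands with the most shades (up to five)
head(shade_counts, 5)
```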

Statistical Analysis (Pick one correlation, outlier analysis, histogram, boxplots, or linear or multiple regression analysis)

library("dplyr")

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Histogram of Best Selling Make-up Brands by Group

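To analyze the data further, I used histograms to look at the “group” variable. Each row in the dataset is one shade, so the histograms show the number of shades in each group rather than the number of brands.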

country <- shades$group
hist(country,
     main = "Best Selling Make-up Brands by Group",
     xlab = "Group",
     breaks = seq(-0.5, 7.5, by = 1),  # one bin centered on each group code (0-7)
     border = "red",
     col = "orange"
)

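country <- shades$group
hist(country,
     main = "Best Selling Make-up Brands by Group",
     xlab = "Group",
     breaks = seq(-0.5, 7.5, by = 1),  # one bin centered on each group code (0-7)
     border = "red",
     col = "orange",
     prob = TRUE)                      # plot densities instead of counts

lines(density(country))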

Ranked by the number of shades in the dataset, the countries are the US (167 shades), Japan (74), India (57), and Nigeria (50). I also need to consider the larger categories that do not represent a specific country, such as groups 3 (BIPOC-recommended Brands with BIPOC Founders, 85 shades) and 4 (BIPOC-recommended Brands with White Founders, 112 shades). “BIPOC” stands for Black, Indigenous, and People of Color.
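“Group” is a code for categories rather than a measured quantity, so a bar chart of counts with readable labels may be a clearer alternative to a histogram. Below is a minimal sketch. The short labels are abbreviations of the group definitions listed earlier. `group_codes` is made-up stand-in data; with the project data, use shades$group.

```r
# Sketch: bar chart of counts for a categorical code. `group_codes` is
# made-up stand-in data; with the project data, use shades$group instead.
group_labels <- c("Fenty", "Make Up For Ever", "US", "BIPOC founders",
                  "White founders", "Nigeria", "Japan", "India")
group_codes <- c(2, 2, 2, 5, 6, 6, 7, 0, 1, 3, 4, 4)

# levels = 0:7 keeps every group, even one with no rows
group_counts <- table(factor(group_codes, levels = 0:7, labels = group_labels))
group_counts

barplot(group_counts,
        main = "Number of Foundation Shades by Group",
        ylab = "Number of shades",
        las = 2,                      # rotate the group labels so they fit
        col = "orange", border = "red")
```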

Filter group by country best sellers (US, Nigeria, Japan, and India)

I decided to filter the dataset to focus specifically on the beauty brands that are the best sellers in four countries: the US, Nigeria, Japan, and India.

shades_country <- shades %>%
  filter(group %in% c(2, 5, 6, 7))   # keep only the four country best-seller groups

Mutate group to country name (United States, Nigeria, Japan, and India)

shades_country <- shades_country %>%
  mutate(group = case_when(
    group == 2 ~ "United States",
    group == 5 ~ "Nigeria",
    group == 6 ~ "Japan",
    group == 7 ~ "India"
  ))

shades_country <- shades_country %>%
  mutate(group = factor(group, levels= rev(c("United States", "Nigeria", "Japan", "India"))))

Explore both quantitative and categorical variables with simple plots to determine what you want to focus on for your final visualization. Incorporate at least one type of comparison – of factors within a variable or of different variables.

Scatterplot matrix

I conducted additional analysis by plotting the variables Hue (H), Saturation (S), Value (V), and Lightness (L) against each other to visualize the relationships between them.

pairs(~H + S + V + L, data = shades_country)

Scatterplot matrix (Factor Variable)

I then plotted Hue (H), Saturation (S), Value (V), and Lightness (L) against each other again, this time coloring the points by the factor variable “group” (country), to visualize the relationships between the variables for each country.

H - The three charts below H show no correlation.

S - The chart above S shows no correlation, and the two charts below it show a moderate negative correlation.

V - The first chart above V shows a weak positive correlation, the second chart above it shows a weak negative correlation, and the chart below it shows a strong positive correlation.

L - The first chart above L shows a weak positive correlation, the second shows a weak negative correlation, and the third shows a strong positive correlation.

pairs(~H + S + V + L, col = factor(shades_country$group), pch = 19, data = shades_country)
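The correlations described above are read visually from the scatterplot matrix. As a sketch, cor() can put numbers on them. The example uses the first six rows shown in the str() output. use = "complete.obs" drops rows with missing values, which matters for the full data because 12 rows are missing H, S, and V. Pearson’s r measures only linear association, and six rows are far too few for a real conclusion, so this only shows the mechanics.

```r
# Sketch: numeric correlations to accompany the scatterplot matrix.
# hsvl holds the first six rows shown in the str() output; with the project
# data, use shades_country[, c("H", "S", "V", "L")] instead.
hsvl <- data.frame(
  H = c(26, 32, 23, 19, 18, 20),
  S = c(0.26, 0.24, 0.20, 0.25, 0.30, 0.29),
  V = c(0.95, 1.00, 1.00, 1.00, 0.74, 0.92),
  L = c(86, 92, 91, 88, 65, 80)
)

# Pearson correlation matrix; "complete.obs" drops rows with any NA
r_mat <- cor(hsvl, use = "complete.obs")
round(r_mat, 2)

# Significance test for one pair, e.g. Value vs. Lightness
cor.test(hsvl$V, hsvl$L)
```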

library ("ggplot2")

Scatter plot to see whether there is a relationship between Hue and Saturation, with points colored by brand (both variables are log-transformed).

ggplot(shades_country, aes(x = log(H), y = log(S))) +
  geom_point(aes(color = factor(brand)))

## Warning: Removed 12 rows containing missing values (geom_point).

Scatter plot to see whether there is a relationship between Value and Lightness, with points colored by brand (both variables are log-transformed).

ggplot(shades_country, aes(x = log(V), y = log(L))) +
  geom_point(aes(color = factor(brand)))

## Warning: Removed 12 rows containing missing values (geom_point).

Plot one or more various visualizations we have discussed throughout the course, which may or may not include GIS information. You could also use Tableau for your visualization, but be sure to include the link to your Tableau visualization in your Markdown File. During your exploration, keep a running commentary in the Markdown text area of what you are doing and why you are doing it.

Link to Tableau Dashboards:

https://public.tableau.com/views/DATA110_FinalProject/MakeupBrandsbyGroup?:language=en-US&:display_count=n&:origin=viz_share_link

Essay (c): What the visualization represents, any interesting patterns or surprises that arise within the visualization, and anything that could have been shown that you could not get to work or that you wished you could have included.

The visualization for this project was created in Tableau. I added a column with the “group” names to the dataset to make it easier to create visualizations in Tableau. I created separate dashboards for “Makeup Brands by Group”, “Hex Count by Makeup Brand”, and “HSVL by Makeup Brand”. The dashboards combine multiple worksheets and continue to explore the following questions:

What brands have the most foundation shades?
Which makeup brands are the best sellers by country and/or by type of company?
How complex are the beauty brands’ foundation shades?
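The “group” name column described above was added for Tableau. Below is a minimal sketch of doing the same in R with a named lookup vector. `toy` is made-up stand-in data. The output file name shades_with_group_names.csv is hypothetical, and the sketch writes it to a temporary folder.

```r
# Sketch: add a readable group-name column and export it for Tableau.
# Labels follow the group definitions; `toy` and the file name are hypothetical.
group_names <- c(
  "0" = "Fenty Beauty's PRO FILT'R Foundation Only",
  "1" = "Make Up For Ever's Ultra HD Foundation Only",
  "2" = "US Best Sellers",
  "3" = "BIPOC-recommended Brands with BIPOC Founders",
  "4" = "BIPOC-recommended Brands with White Founders",
  "5" = "Nigerian Best Sellers",
  "6" = "Japanese Best Sellers",
  "7" = "Indian Best Sellers"
)

toy <- data.frame(brand = c("Fenty", "Maybelline", "NARS"), group = c(0, 2, 6))

# Look up each numeric code by its character form ("0", "2", ...)
toy$group_name <- unname(group_names[as.character(toy$group)])

out_file <- file.path(tempdir(), "shades_with_group_names.csv")
write.csv(toy, out_file, row.names = FALSE)
```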

In the “Makeup Brands by Group” dashboard, I used a treemap and a horizontal bar chart. The treemap shows that, overall, United States brands are the top best sellers, followed by BIPOC-recommended Brands with White Founders.

The “Hex Count by Makeup Brand” dashboard is used to show the range of colors within each brand. The brand with the widest range (the largest number of hex colors) is Maybelline. I created a bubble chart to show the brands with the widest selection; the other top brands are MAC, Estee Lauder, Fenty, Lancome, and Make Up For Ever. I also created a bar chart that shows the breakdown of brands by the “group” and “brand” variables to further show the impact of the best-selling countries.

The final dashboard I created was “HSVL by Makeup Brand” (Hue, Saturation, Value, and Lightness by Makeup Brand). Although we saw a positive correlation involving Value earlier, the bar chart does not show Value having a large impact across brands. However, Hue and Lightness appear to be a large part of what beauty brands focus on in their foundation shade colors. This seems consistent with the expansion of shade ranges.
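One rough way to put a number on how wide each brand’s shade range is to look at the spread of Lightness (L) per brand. Below is a minimal sketch. `toy` is made-up stand-in data; with the project data, use `shades`. The range of L only captures light-to-dark coverage, not undertones, which relate more to Hue and Saturation.

```r
# Sketch: summarize the Lightness (L) range for each brand.
# `toy` is made-up stand-in data; with the project data, use `shades`.
toy <- data.frame(
  brand = c("Brand A", "Brand A", "Brand A", "Brand B", "Brand B"),
  L     = c(30, 60, 90, 70, 80)
)

n_shades <- tapply(toy$L, toy$brand, function(x) sum(!is.na(x)))
min_L    <- tapply(toy$L, toy$brand, min, na.rm = TRUE)
max_L    <- tapply(toy$L, toy$brand, max, na.rm = TRUE)

lightness_spread <- data.frame(
  brand    = names(min_L),
  n_shades = as.vector(n_shades),
  min_L    = as.vector(min_L),
  max_L    = as.vector(max_L),
  range_L  = as.vector(max_L - min_L)   # light-to-dark coverage
)

# Brands with the widest lightness range first
lightness_spread[order(lightness_spread$range_L, decreasing = TRUE), ]
```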

One thing I wish this dataset provided was multiple years of data. This would have allowed a comparison of how the beauty industry has changed and could have shown the increase in foundation colors over time.

DATA 110 Final Project

Jannety Mosley

12/9/2021