Summary

These data reflect information that has been reported to the California Safe Cosmetics Program (CSCP) in the California Department of Public Health (CDPH). The primary purpose of the CSCP is to collect information on hazardous and potentially hazardous ingredients in cosmetic products sold in California and to make this information available to the public. For all cosmetic products sold in California, the California Safe Cosmetics Act (“the Act”) requires the manufacturer, packer, and/or distributor named on the product label to provide to the CSCP a list of all cosmetic products that contain any ingredients known or suspected to cause cancer, birth defects, or other developmental or reproductive harm.

The data has been filtered and cleaned to keep only one variant of each product that was sold in multiple colors. Products sold in several colors usually share the same ingredient composition and chemicals, so counting every shade separately could skew the data.
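The cleaning step itself is not shown in this document. Below is a minimal sketch of one way to do it. It assumes the color/scent/flavor variant is held in the `CSFId` and `CSF` columns, and it keeps the first row of each group of rows that are identical apart from those columns. `CSFId` appears in the code below, but the `CSF` column and the toy rows are hypothetical.

```r
library(data.table)

# Hypothetical toy rows: one lipstick reported in two shades, plus one nail polish
dt <- data.table(
  CDPHId       = c("1", "1", "2"),
  ProductName  = c("Lipstick", "Lipstick", "Nail Polish"),
  CSFId        = c("10", "11", NA),
  CSF          = c("Red", "Pink", NA),
  ChemicalName = c("Titanium dioxide", "Titanium dioxide", "Toluene")
)

# Compare rows on every column except the color/scent/flavor ones and keep
# the first occurrence of each group
key_cols <- setdiff(names(dt), c("CSFId", "CSF"))
deduped  <- unique(dt, by = key_cols)

deduped
```

The two lipstick shades collapse into a single row. The "Red" shade is kept because it appears first, which leaves two rows in total.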

Source: Chemicals in Cosmetics by California Department of Public Health

healthdata.gov

Qualitative Analysis

In the state of California, there were 31,626 different cosmetic products identified in the retail consumer market, sold by 2,237 different brands.

Each product is assigned to one of 13 primary categories:

# Point R at the folder that holds the CSV (adjust this path on your machine)
setwd("/Users/melodykuzu/OneDrive - Loyola University Maryland/ELMBA - Melody/Data Visualiztion/DataSets")

library(data.table)
library(ggplot2)
library(scales)
library(RColorBrewer)
library(ggthemes)
library(plyr)

filename <- "chemicals-in-cosmetics1.csv"
df <- fread(filename)

# ID and name columns are labels, not quantities, so store them as text
id_cols <- c("ChemicalId", "CDPHId", "CSFId", "PrimaryCategoryId", "CompanyId",
             "SubCategoryId", "CasId", "BrandName", "ChemicalName")
df[, (id_cols) := lapply(.SD, as.character), .SDcols = id_cols]

# Number of products per primary category (columns: PrimaryCategory, freq)
df_category <- plyr::count(df, "PrimaryCategory")
df_category <- df_category[order(df_category$freq, decreasing = TRUE), ]
df_category
##                      PrimaryCategory  freq
## 6    Makeup Products (non-permanent) 15365
## 7                      Nail Products  5790
## 11                Skin Care Products  4455
## 2                      Bath Products  2022
## 4  Hair Care Products (non-coloring)  1122
## 12              Sun-Related Products   844
## 5             Hair Coloring Products   625
## 3                         Fragrances   402
## 9             Personal Care Products   336
## 8              Oral Hygiene Products   302
## 10                  Shaving Products   168
## 13      Tattoos and Permanent Makeup   156
## 1                      Baby Products    39
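As a quick consistency check, the 13 category counts printed above add up to the 31,626 products quoted earlier. Here is a minimal sketch, with the counts copied by hand from the output:

```r
# Category counts copied from the df_category output above
category_freq <- c(
  "Makeup Products (non-permanent)"   = 15365,
  "Nail Products"                     = 5790,
  "Skin Care Products"                = 4455,
  "Bath Products"                     = 2022,
  "Hair Care Products (non-coloring)" = 1122,
  "Sun-Related Products"              = 844,
  "Hair Coloring Products"            = 625,
  "Fragrances"                        = 402,
  "Personal Care Products"            = 336,
  "Oral Hygiene Products"             = 302,
  "Shaving Products"                  = 168,
  "Tattoos and Permanent Makeup"      = 156,
  "Baby Products"                     = 39
)

length(category_freq)  # 13 categories
sum(category_freq)     # 31626 products
```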

Below are the Top 5 Product Categories:

top_category <- df_category$PrimaryCategory[1:5]
top_category
## [1] "Makeup Products (non-permanent)"   "Nail Products"                    
## [3] "Skin Care Products"                "Bath Products"                    
## [5] "Hair Care Products (non-coloring)"

Below are all chemical names and the number of products containing each chemical:

df_chem <- count(df, "ChemicalName")
df_chem <- df_chem[order(df_chem$freq, decreasing = TRUE),]
df_chem
##                                                       ChemicalName  freq
## 72                                                Titanium dioxide 25898
## 21                                                        Cocamide  1014
## 63                                                         Retinol   698
## 67     Silica, crystalline (airborne particles of respirable size)   468
## 71                                                            Talc   457
## 74                                                    Trade Secret   395
## 78                                                       Vitamin A   290
## 14                                        Butylated hydroxyanisole   283
## 64                                                         Retinyl   280
## 76                                                 Triethanolamine   266
## 44                                                            Mica   252
## 18                                                    Carbon black   246
## 45                      Mineral oils, untreated and mildly treated   116
## 22                                                          Coffee   101
## 9                                                        Aloe Vera    98
## 28                                                       Estragole    92
## 33                                                    Formaldehyde    65
## 12                                                    Benzophenone    62
## 43                                                   Methyleugenol    58
## 73                                                         Toluene    43
## 70                                                         Styrene    41
## 35                                           Ginkgo biloba extract    39
## 1                                                      1,4-Dioxane    35
## 56                                                      Phenacetin    32
## 17                                                        Caffeine    31
## 31                                                        Ethylene    20
## 26                                                  Diethanolamine    19
## 20                                                       Coal tars    18
## 4                                                     Acetaldehyde    15
## 58                                                    Progesterone    15
## 13                                                    beta-Myrcene    13
## 53                                                   Oil Orange SS    13
## 11                                                      Avobenzone     9
## 29                                  Ethanol in alcoholic beverages     8
## 47                                             N-Methylpyrrolidone     8
## 49                                        N,N-Dimethyl-p-toluidine     8
## 52                                                  o-Phenylphenol     8
## 59                                                       Propylene     8
## 61                                                          Quartz     7
## 10                                                         Aspirin     6
## 34                                            Genistein (purified)     6
## 51                                o-Phenylenediamine and its salts     6
## 23                                      Di-n-butyl phthalate (DBP)     5
## 40                                                    Lead acetate     5
## 60                                                        Pulegone     5
## 66                                                Selenium sulfide     5
## 8                                          All-trans retinoic acid     4
## 41                                                        Methanol     4
## 46                                                     Musk xylene     4
## 19                                 Chromium (hexavalent compounds)     3
## 37                Isopropyl alcohol manufacture using strong acids     3
## 38                                                   Lauramide DEA     3
## 75                                            Trichloroacetic acid     3
## 3  2,4-Hexadienal (89% trans, trans isomer; 11% cis, trans isomer)     2
## 5                                       Acetic acid, retinyl ester     2
## 7                                                       Acrylamide     2
## 30                                                  Ethyl acrylate     2
## 36                                          Goldenseal root powder     2
## 42                                                Methylene glycol     2
## 50                                               Nickel (Metallic)     2
## 54                                                      Permethrin     2
## 57                                                      Polygeenan     2
## 62                             Quinoline and its strong acid salts     2
## 2                             2,2-Bis(bromomethyl)-1,3-propanediol     1
## 6                                             Acetylsalicylic acid     1
## 15                                               C.I. Acid Red 114     1
## 16                                   Cadmium and cadmium compounds     1
## 24                                             Dichloroacetic acid     1
## 25               Diethanolamides of the fatty acids of coconut oil     1
## 27                                          Distillates (coal tar)     1
## 32                                          Extract of coffee bean     1
## 39                                        Lauramide diethanolamine     1
## 48                                         N-Nitrosodiethanolamine     1
## 55                                                     Phenacemide     1
## 65                                                         Safrole     1
## 68                                                  Sodium Bromate     1
## 69                                                  Spironolactone     1
## 77                                                   Vinyl acetate     1

Below are the top 6 most common chemicals:

# Names of the five most common chemicals (used later to filter the top brands)
top_chem <- df_chem$ChemicalName[1:5]

# The same frequency table computed on the column vector; head() shows the top 6
ChemicalNameCount <- plyr::count(df$ChemicalName)
ChemicalNameCount <- ChemicalNameCount[order(ChemicalNameCount$freq, decreasing = TRUE), ]
head(ChemicalNameCount)
##                                                              x  freq
## 72                                            Titanium dioxide 25898
## 21                                                    Cocamide  1014
## 63                                                     Retinol   698
## 67 Silica, crystalline (airborne particles of respirable size)   468
## 71                                                        Talc   457
## 74                                                Trade Secret   395

Below are the top 5 brands that sell the largest breadth of products:

## [1] "The Body Shop"            "Gelish"                  
## [3] "Revlon"                   "Anastasia Beverly Hills" 
## [5] "Victoria's Secret Beauty"
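The code that produced this list is not shown. Below is a minimal sketch of how such a ranking could be computed. It assumes brands are ranked by how many rows (products) they have in the cleaned data, which is also how `BrandNameCount` is built below. `brands` is a hypothetical stand-in for `df$BrandName`.

```r
# Hypothetical stand-in for df$BrandName
brands <- c("Brand A", "Brand B", "Brand A", "Brand C", "Brand B", "Brand A")

# Count rows per brand, sort from most to least, and keep the top names
brand_counts <- sort(table(brands), decreasing = TRUE)
top_n_brands <- head(names(brand_counts), 2)

top_n_brands  # "Brand A" "Brand B"
```

With the real data, `head(..., 5)` would return the five names above, provided the ranking really is by row count.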

Products with “Trade Secret” listed as their hazardous chemical (by category):

#--- Frequency tables used in the rest of the analysis ----#

# Products per product name
ProductNameCount <- plyr::count(df$ProductName)
ProductNameCount <- ProductNameCount[order(ProductNameCount$freq, decreasing = TRUE), ]

# Categories for all products
PrimaryCategoryCount <- plyr::count(df$PrimaryCategory)
PrimaryCategoryCount <- PrimaryCategoryCount[order(PrimaryCategoryCount$freq, decreasing = TRUE), ]

# Products per brand
BrandNameCount <- plyr::count(df$BrandName)
BrandNameCount <- BrandNameCount[order(BrandNameCount$freq, decreasing = TRUE), ]

# Chemical count for all products
ChemicalCountCount <- plyr::count(df$ChemicalCount)
ChemicalCountCount <- ChemicalCountCount[order(ChemicalCountCount$freq, decreasing = TRUE), ]

# Products with "Trade Secret" as the chemical
TS <- df[ChemicalName == "Trade Secret"]

# Products with "Trade Secret" as the chemical, by category
df_TS <- plyr::count(TS$PrimaryCategory)
df_TS <- df_TS[order(df_TS$freq, decreasing = TRUE), ]
df_TS
##                                    x freq
## 5             Hair Coloring Products  205
## 4  Hair Care Products (non-coloring)   75
## 6    Makeup Products (non-permanent)   42
## 9                 Skin Care Products   38
## 2                      Bath Products   18
## 10              Sun-Related Products    7
## 3                         Fragrances    5
## 8             Personal Care Products    3
## 1                      Baby Products    1
## 7                      Nail Products    1
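These category counts add up to the 395 "Trade Secret" entries in the chemical table, and hair coloring products alone account for just over half of them. Here is a minimal sketch of that arithmetic, with the counts copied from the output above:

```r
# Trade Secret counts by category, copied from the df_TS output above
ts_freq <- c(
  "Hair Coloring Products"            = 205,
  "Hair Care Products (non-coloring)" = 75,
  "Makeup Products (non-permanent)"   = 42,
  "Skin Care Products"                = 38,
  "Bath Products"                     = 18,
  "Sun-Related Products"              = 7,
  "Fragrances"                        = 5,
  "Personal Care Products"            = 3,
  "Baby Products"                     = 1,
  "Nail Products"                     = 1
)

total_ts <- sum(ts_freq)                        # 395 entries in total
ts_share <- round(100 * ts_freq / total_ts, 1)  # percent of all Trade Secret entries
ts_share["Hair Coloring Products"]              # 51.9
```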

Below is the total number of products by chemical count (the number of hazardous chemicals reported for each product):

head(ChemicalCountCount, 10)
##   x  freq
## 2 1 27971
## 3 2  2848
## 1 0   497
## 4 3   230
## 5 4    67
## 6 5     5
## 9 8     4
## 8 7     3
## 7 6     1
ChemicalCountCount$n <- as.numeric(ChemicalCountCount$freq)

Product Categories

Top 5 Categories and Other

The first pie chart displays the top five categories of products sold in California, which together account for roughly 91% of the data. Makeup products make up almost half of the data (48.58%). This category consists mostly of pigmented facial products, such as foundation, lipstick, eyeshadow, mascara, and blush. Makeup is followed by nail products, skin care products, bath products, and hair care products.

#---- pie chart: top 5 categories and other -----# 

library(dplyr)
  count.data <- data.frame(
    Category = c("Makeup Products","Nail Products","Skin Care Products","Bath Products","Hair Care Products","Other"),
    n = c(15365, 5790, 4455, 2022, 1122, 2872),
    prop = c(48.58, 18.31, 14.09, 6.39, 3.55, 9.08)
  )
  count.data
##             Category     n  prop
## 1    Makeup Products 15365 48.58
## 2      Nail Products  5790 18.31
## 3 Skin Care Products  4455 14.09
## 4      Bath Products  2022  6.39
## 5 Hair Care Products  1122  3.55
## 6              Other  2872  9.08
  count.data <- count.data %>%
    arrange((Category)) %>%
    mutate(lab.ypos = cumsum(prop) - 0.5*prop)
  count.data
##             Category     n  prop lab.ypos
## 1      Bath Products  2022  6.39    3.195
## 2 Hair Care Products  1122  3.55    8.165
## 3    Makeup Products 15365 48.58   34.230
## 4      Nail Products  5790 18.31   67.675
## 5              Other  2872  9.08   81.370
## 6 Skin Care Products  4455 14.09   92.955
  mycols <- c("#66FF33", "#FF9966", "#FFCC99", "#FF66CC", "#CCC444", "#0000FF", "#CCFFFF", "#993333")
  
  count.data$Category = factor(count.data$Category)
  ggplot(data=count.data, aes(x = "", y = n, fill = Category)) +
    geom_bar(stat = "identity", position="fill", color="black") +
    ggtitle("Top 5 Categories") +
    coord_polar(theta="y", start = 0)+
    geom_text(aes(x=1.1,label = prop), size=4, color="black", position=position_fill(vjust=0.5)) +
    scale_fill_manual(values = mycols) +
    theme_void() + 
    theme(plot.title = element_text(hjust = 0.5, face="bold", size=20))
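The `n` and `prop` values in `count.data` are typed in by hand. Below is a minimal sketch of deriving them from the category counts instead. "Other" is everything outside the top five, and each share is taken over all 31,626 products.

```r
# Top five category counts (from df_category) and the overall total
top5  <- c("Makeup Products" = 15365, "Nail Products" = 5790,
           "Skin Care Products" = 4455, "Bath Products" = 2022,
           "Hair Care Products" = 1122)
total <- 31626

# Everything outside the top five is grouped as "Other"
n    <- c(top5, Other = total - sum(top5))
prop <- round(100 * n / total, 2)

prop
sum(prop[names(top5)])  # combined share of the top five, about 91%
```

This reproduces the hand-entered percentages (48.58, 18.31, 14.09, 6.39, 3.55 and 9.08), and the top five together account for 90.92%.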

Other Categories

The “Other” category accounts for 9.08% of the total data. Within it, sun-related products are the most common category (2.67% of all products), while baby products are the least common (0.12%).

#____donut other categories----#
  
  library(dplyr)
  count.other.data <- data.frame(
    Category = c("Sun-Related Products","Hair-Coloring Products","Fragrances","Personal Care Products","Oral Hygiene Products","Shaving Products", "Tattoos and Permanent Makeup Products", "Baby Products"),
    n = c(844, 625, 402, 336, 302, 168, 156, 39),
    prop = c(2.67, 1.98, 1.27, 1.06, .95, .53, .49, .12)
  )
  count.other.data
##                                Category   n prop
## 1                  Sun-Related Products 844 2.67
## 2                Hair-Coloring Products 625 1.98
## 3                            Fragrances 402 1.27
## 4                Personal Care Products 336 1.06
## 5                 Oral Hygiene Products 302 0.95
## 6                      Shaving Products 168 0.53
## 7 Tattoos and Permanent Makeup Products 156 0.49
## 8                         Baby Products  39 0.12
  count.other.data <- count.other.data %>%
    arrange((Category)) %>%
    mutate(lab.ypos = cumsum(prop) - 0.5*prop)
  count.other.data
##                                Category   n prop lab.ypos
## 1                         Baby Products  39 0.12    0.060
## 2                            Fragrances 402 1.27    0.755
## 3                Hair-Coloring Products 625 1.98    2.380
## 4                 Oral Hygiene Products 302 0.95    3.845
## 5                Personal Care Products 336 1.06    4.850
## 6                      Shaving Products 168 0.53    5.645
## 7                  Sun-Related Products 844 2.67    7.245
## 8 Tattoos and Permanent Makeup Products 156 0.49    8.825
  mycols <- c("#66FF33", "#FF9966", "#FFCC99", "#FF66CC", "#CCC444", "#0000FF", "#CCFFFF", "#993333")

  
  ggplot(data=count.other.data, aes(x = 2, y = n, fill = Category)) +
    geom_bar(stat = "identity", position="fill", color="black") +
    ggtitle("Other Categories") +
    coord_polar(theta="y", start = 0)+
    geom_text(aes(x=1.9,label = prop), size=4, color="black", position=position_fill(vjust=0.5)) +
    scale_fill_manual(values = mycols) +
    theme_void() +
    theme(plot.title = element_text(hjust = 0.5, face="bold", size=20)) +
    xlim(.2,2.5)
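Because `position = "fill"` rescales the bars, the slice sizes in this chart are shares of the "Other" group (2,872 products). The printed labels, by contrast, are shares of all 31,626 products. Here is a minimal sketch comparing the two:

```r
# Counts for the eight smaller categories (from count.other.data)
other_n <- c("Sun-Related" = 844, "Hair Coloring" = 625, "Fragrances" = 402,
             "Personal Care" = 336, "Oral Hygiene" = 302, "Shaving" = 168,
             "Tattoos and Permanent Makeup" = 156, "Baby" = 39)

share_of_all   <- round(100 * other_n / 31626, 2)         # matches the chart labels
share_of_other <- round(100 * other_n / sum(other_n), 2)  # matches the slice sizes

data.frame(share_of_all, share_of_other)
```

For example, sun-related products are 2.67% of all products but about 29.39% of the "Other" slice.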

Chemicals

Top 4 Chemicals

This visualization shows which hazardous chemical ingredients each of the top five brands uses in its products that contain at least one of the four most commonly seen chemicals (Titanium Dioxide, Cocamide, Retinol, and Silica).

We found that all five brands use titanium dioxide in the formulation of their products. For Revlon and Anastasia Beverly Hills, it is the only one of these common hazardous chemicals found in their products. Gelish (a nail polish brand) also has silica in about 15% of its products, The Body Shop uses cocamide in about 7%, and Victoria's Secret uses retinol in about 9%.

A plausible explanation for why the largest brands use titanium dioxide in most of their products is that it is an inexpensive, cost-effective ingredient, although not necessarily the best choice for the consumer.

#--- pie chart --# 

library(tidyverse)  # attaches dplyr, tidyr, tibble and ggplot2

# Percentage of each brand's products (among those containing one of the four
# chemicals) that contain each chemical, entered by hand
BS        <- c(7.04, 0, 0, 92.96)
VS        <- c(0, 8.90, 0, 91.10)
Gel       <- c(0, 0, 14.89, 85.11)
Anastasia <- c(0, 0, 0, 100)
Revlon    <- c(0, 0, 0, 100)

data <- cbind(BS, VS, Gel, Anastasia, Revlon)
colnames(data) <- c("The Body Shop", "Victoria's Secret", "Gelish", "Anastasia Beverly Hills", "Revlon")
rownames(data) <- c("Cocamide", "Retinol", "Silica", "Titanium Dioxide")

data %>%
  as.data.frame() %>%
  rownames_to_column("Chemical") %>%
  # one row per brand/chemical pair
  pivot_longer(-Chemical, names_to = "Brand", values_to = "Percent") %>%
  ggplot(aes(x = "", y = Percent, fill = Chemical)) +
  geom_col() +
  coord_polar("y", start = 0) +
  facet_wrap(~Brand) +
  labs(title = "Most Used Hazardous Chemicals of Top 5 Brands", fill = "Chemical Name") +
  theme(plot.title = element_text(hjust = 0.1, face = "bold", size = 20))
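The percentages in this chart are typed in by hand. Below is a minimal sketch of computing per-brand shares like these from brand/chemical rows with base R's `prop.table()`. The toy rows are made up. The real input would presumably be the `TB2` table built in the next section, but that is an assumption.

```r
# Hypothetical toy rows: one brand/chemical pair per product
rows <- data.frame(
  BrandName    = c("Brand X", "Brand X", "Brand X", "Brand X", "Brand Y"),
  ChemicalName = c("Cocamide", "Titanium dioxide", "Titanium dioxide",
                   "Titanium dioxide", "Titanium dioxide")
)

# Cross-tabulate, then turn each brand's row into percentages that sum to 100
counts  <- table(rows$BrandName, rows$ChemicalName)
percent <- round(100 * prop.table(counts, margin = 1), 2)

percent
```

Here Brand X comes out as 25% cocamide and 75% titanium dioxide. `t(percent)` would give the chemicals-by-brand layout used in `data`.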

Based on the counts above, the four most frequently observed chemicals in these products, with the number of products containing each, are Titanium Dioxide (25,898), Cocamide (1,014), Retinol (698), and Silica (468).

#---- bar chart ----#  

head(df_chem, 11)
##                                                   ChemicalName  freq
## 72                                            Titanium dioxide 25898
## 21                                                    Cocamide  1014
## 63                                                     Retinol   698
## 67 Silica, crystalline (airborne particles of respirable size)   468
## 71                                                        Talc   457
## 74                                                Trade Secret   395
## 78                                                   Vitamin A   290
## 14                                    Butylated hydroxyanisole   283
## 64                                                     Retinyl   280
## 76                                             Triethanolamine   266
## 44                                                        Mica   252
df_chem$freq <- as.numeric(df_chem$freq)

# Horizontal bars for the 11 most frequent entries (10 named chemicals plus
# "Trade Secret"), sorted so the largest count is at the top
ggplot(df_chem[1:11, ], aes(x = reorder(ChemicalName, freq), y = freq)) +
  geom_bar(stat = "identity", colour = "white", fill = "#912473") +
  geom_text(aes(label = freq), hjust = -0.1) +                 # count just past each bar
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +  # room for the labels
  coord_flip() +
  labs(title = "Top 10 Most Common Chemicals", x = "Chemical Name", y = "Product Count") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 15))
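To put the first bar in perspective, titanium dioxide appears in roughly 82% of the 31,626 product entries, about 25 times as often as cocamide, the runner-up. Here is a minimal sketch of the arithmetic, with counts from `df_chem`:

```r
tio2     <- 25898  # Titanium dioxide
cocamide <- 1014   # Cocamide, the second most common chemical
total    <- 31626  # all product entries in the cleaned data

round(100 * tio2 / total, 1)  # percent of entries containing titanium dioxide
round(tio2 / cocamide, 1)     # how many times more common than cocamide
```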

Categories of Top 5 Brands

Given that there are 2,237 unique brands, this visualization displays the five brands that had the largest product breadth available for consumers to purchase in the state of California: Anastasia Beverly Hills, Gelish, Revlon, The Body Shop, and Victoria's Secret Beauty.

The majority of Anastasia Beverly Hills' products fall in the Makeup Products category. However, the brand holds only about a quarter of the Top 5's makeup products, competing with Revlon, The Body Shop, and Victoria's Secret.

Gelish's products fall mainly into the Nail Products category, which includes nail polish.

The Body Shop sells products in a wider range of categories, including skin care, hair care, personal care, makeup, bath, and baby products. It is the only Top 5 brand to sell baby products.

Victoria's Secret is the only Top 5 brand to sell shaving products, and it holds the majority of the Top 5's fragrance, skin care, and sun-related products.

Revlon is the only Top 5 brand to sell hair coloring products. It also accounts for roughly a quarter of the Top 5's makeup products.

# The top 5 brands by product breadth, as listed above
top_brand <- c("The Body Shop", "Gelish", "Revlon",
               "Anastasia Beverly Hills", "Victoria's Secret Beauty")

# Rows for the top 5 brands only
TB <- df[BrandName %in% top_brand]
unique(TB$BrandName)
## [1] "The Body Shop"            "Gelish"                  
## [3] "Revlon"                   "Anastasia Beverly Hills" 
## [5] "Victoria's Secret Beauty"
# Keep only products containing one of the five most common chemicals (top_chem)
TB2 <- TB[ChemicalName %in% top_chem]

unique(TB2$ChemicalName)
## [1] "Titanium dioxide"                                           
## [2] "Cocamide"                                                   
## [3] "Retinol"                                                    
## [4] "Silica, crystalline (airborne particles of respirable size)"

# -- What do the top 5 brands sell? -- #
# Stacked bars scaled to 1: each bar shows the brand mix within a category
ggplot(TB2, aes(x = PrimaryCategory, fill = BrandName)) +
  geom_bar(position = "fill") +
  labs(title = "What do the top 5 brands sell?", x = "Primary Category",
       y = "Share of products", fill = "Brands") +
  theme(panel.background = element_blank()) +
  coord_flip()

Chemical Count by Category of Total Products Sold in California

Out of all the products sold in California, 27,971 had exactly one hazardous chemical. Only 497 products contained no hazardous chemical, and 3,158 products had two or more.
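The 3,158 figure is the sum of every row with a chemical count of 2 or more in the `ChemicalCountCount` table. Here is a minimal sketch of that sum, with the counts copied from the output:

```r
# Products per chemical count, copied from the ChemicalCountCount output above
chem_count <- c("0" = 497, "1" = 27971, "2" = 2848, "3" = 230, "4" = 67,
                "5" = 5, "6" = 1, "7" = 3, "8" = 4)

counts <- as.integer(names(chem_count))
sum(chem_count)               # 31626 products in total
sum(chem_count[counts >= 2])  # 3158 products with two or more chemicals
```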

#---Chemical Count  by Category---#

M <- ggplot(df, aes(x = ChemicalCount, fill = PrimaryCategory))

# Stacked bars: number of products at each chemical count, split by category
M +
  geom_bar(position = "stack") +
  labs(title = "Chemical Count by Category", y = "# of Products Sold in California",
       x = "Chemical Count", fill = "Primary Category") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 22))

Conclusion

The global cosmetics market was valued at $380.2 billion in 2019, and the revenue of the U.S. cosmetics industry was estimated at about $49.2 billion in the same year.

NEWS: The Toxic-Free Cosmetics Act, also known as Assembly Bill 2762, makes California the first state to ban 24 ingredients from beauty and personal-care products.

To conclude my analysis of the chemicals found in the everyday products that consumers constantly purchase: even with heavy regulation of cosmetics in states such as California, consumers should still be cautious when purchasing retail products.