| Name: “Ferhiwot Kidane” |
| Date: “08-17-22” |
| Topic: “Sephora Beauty Skincare Product Rating and Product Analysis” |
| Dataset: “Sephora Website Data” |
| Source: “Kaggle.com” |
| Data dictionary can be found at: https://www.kaggle.com/datasets/raghadalharbi/all-products-available-on-sephora-website |
The variables in the full data set include: id, brand, category, name, size, rating, number of reviews, love, price, value price, URL, marketing flags, ingredients, online only, exclusive, limited edition, and limited time offer.
This data set contains 9,168 observations and 21 variables, with both quantitative and categorical values.
Quantitative variables include price, rating, love, number of reviews, and value price.
Categorical variables include brand, name, category, and ingredients.
# We will begin by loading the packages we need for this project.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(dplyr) # to wrangle our data
library(ggplot2) # for plotting our data
library(RColorBrewer) #adding some color
## Warning: package 'RColorBrewer' was built under R version 4.1.3
library(readr)
options(readr.show_col_types = FALSE)
sephora_website_dataset <- read_csv("sephora_website_dataset.csv")
sephora_website_dataset
## # A tibble: 9,168 × 21
## id brand categ…¹ name size rating numbe…² love price value…³ URL
## <dbl> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 2218774 Acqua D… Fragra… Blu … 5 x … 4 4 3002 66 75 http…
## 2 2044816 Acqua D… Cologne Colo… 0.7 … 4.5 76 2700 66 66 http…
## 3 1417567 Acqua D… Perfume Aran… 5 oz… 4.5 26 2600 180 180 http…
## 4 1417617 Acqua D… Perfume Mirt… 2.5 … 4.5 23 2900 120 120 http…
## 5 2218766 Acqua D… Fragra… Colo… 5 x … 3.5 2 943 72 80 http…
## 6 1417609 Acqua D… Perfume Fico… 5 oz… 4.5 79 2600 180 180 http…
## 7 1638832 Acqua D… Perfume Rosa… 3.4 … 4.5 79 5000 210 210 http…
## 8 1284462 Acqua D… Cologne Colo… 1.7 … 5 13 719 120 120 http…
## 9 2221588 Acqua D… Body M… Peon… 1.7o… 4 5 800 58 58 http…
## 10 2221596 Acqua D… Perfume Rosa… 1.7o… 3 5 2100 58 58 http…
## # … with 9,158 more rows, 10 more variables: MarketingFlags <lgl>,
## # MarketingFlags_content <chr>, options <chr>, details <chr>,
## # how_to_use <chr>, ingredients <chr>, online_only <dbl>, exclusive <dbl>,
## # limited_edition <dbl>, limited_time_offer <dbl>, and abbreviated variable
## # names ¹category, ²number_of_reviews, ³value_price
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
Contest of the Doctor Brands: Skincare Edition
I selected nine variables from this data set and then used the filter() function from the dplyr package to keep five doctor-owned brands sold at Sephora. These brands carry many different products to work with, such as face washes, face serums, face masks, sunscreen, and other specialty products.
Questions
I want to see if there is a relationship between price point and product size.
I would also like to see if there is a relationship between price point and brand rating.
Which facial product had the highest rating, and what does it cost?
Which product was the most popular? A frequency table can answer this.
I will also filter the data to see which brand had the best-rated face serum, face wash, and moisturizer.
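One way to approach the best-rated question is a grouped ranking. The sketch below is only an illustration on a made-up data frame: the `toy` rows, brand names, and ratings are hypothetical and are not taken from the Sephora data. It keeps the top-rated product in each category with dplyr's slice_max().

```r
library(dplyr)

# Hypothetical toy data; brands and ratings are invented for illustration only.
toy <- data.frame(
  brand    = c("Brand A", "Brand B", "Brand A", "Brand B"),
  category = c("Face Serums", "Face Serums", "Moisturizers", "Moisturizers"),
  rating   = c(4.5, 4.0, 3.5, 5.0)
)

# Keep the highest-rated row within each category (ties are kept by default).
best_rated <- toy %>%
  group_by(category) %>%
  slice_max(rating, n = 1) %>%
  ungroup()

best_rated
```

The same pattern should carry over to the filtered Sephora data by swapping `toy` for the real tibble, assuming the column names match.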
BOD <- sephora_website_dataset %>%
select(brand, category, price, name, number_of_reviews, love, rating, size, value_price) %>%
filter(brand %in% c("Dr Roebuck's", "Dr. Barbara Sturm", "Dr. Brandt Skincare", "Dr. Dennis Gross Skincare", "Dr. Jart+")) %>%
filter(category %in% c("Face Wash & Cleansers", "Face Serums", "Face Masks", "Toners", "Moisturizers")) %>%
group_by(category)
BOD
## # A tibble: 105 × 9
## # Groups: category [5]
## brand category price name numbe…¹ love rating size value…²
## <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 Dr Roebuck's Moisturizers 45 No W… 236 5300 4 1.69… 45
## 2 Dr Roebuck's Face Masks 28 Ulur… 28 4200 4.5 1.69… 28
## 3 Dr Roebuck's Face Serums 60 Perk… 37 2200 4 1 oz… 60
## 4 Dr Roebuck's Face Masks 28 Tama… 15 3000 4 1.69… 28
## 5 Dr Roebuck's Face Masks 28 Iceb… 8 1600 4 1.69… 28
## 6 Dr Roebuck's Face Serums 60 True… 22 1900 4.5 1 oz… 60
## 7 Dr Roebuck's Face Wash & Clea… 25 Noos… 37 1300 4 3.38… 25
## 8 Dr Roebuck's Moisturizers 45 Stok… 3 272 4 1.69… 45
## 9 Dr Roebuck's Face Wash & Clea… 25 Kibo… 6 772 4 3.38… 25
## 10 Dr Roebuck's Face Serums 60 Surf… 16 782 4 1 oz… 60
## # … with 95 more rows, and abbreviated variable names ¹number_of_reviews,
## # ²value_price
## # ℹ Use `print(n = ...)` to see more rows
The filtered data set (BOD) did not contain any missing values. I also found that the full data set contains binary variables coded as 0 and 1, such as online_only, exclusive, limited_edition, and limited_time_offer.
sum(is.na(BOD))
## [1] 0
(sum(is.na(BOD)) / prod(dim(BOD))) * 100
## [1] 0
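A single total can hide where missing values sit, so a per-column count is often a useful companion check. The sketch below uses a small made-up data frame (the `toy` values are hypothetical, not the Sephora data) to show both the per-column count and the overall percentage.

```r
# Hypothetical toy data with two missing values; not the Sephora data.
toy <- data.frame(
  price  = c(45, NA, 60),
  rating = c(4.0, 4.5, NA),
  brand  = c("Brand A", "Brand B", "Brand C")
)

# Count of missing values in each column.
na_per_column <- colSums(is.na(toy))

# Share of all cells that are missing, as a percentage.
na_percent <- mean(is.na(toy)) * 100

na_per_column
na_percent
```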
var.test(BOD$value_price, BOD$price) #As the p value is greater than 0.05, there is no evidence to suggest that the variances are unequal.
##
## F test to compare two variances
##
## data: BOD$value_price and BOD$price
## F = 0.99926, num df = 104, denom df = 104, p-value = 0.997
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.6789316 1.4707262
## sample estimates:
## ratio of variances
## 0.999261
varttest <- t.test(BOD$value_price, BOD$price, var.equal = TRUE)
head(varttest) #I prefer this format/view
## $statistic
## t
## 0.006690631
##
## $parameter
## df
## 208
##
## $p.value
## [1] 0.9946681
##
## $conf.int
## [1] -22.37379 22.52617
## attr(,"conf.level")
## [1] 0.95
##
## $estimate
## mean of x mean of y
## 84.75238 84.67619
##
## $null.value
## difference in means
## 0
Chart: Value Price vs Price: A two-sample t-test tests the null hypothesis that two samples come from distributions with the same mean (i.e., the means are not different). I ran the test on the value price (x) and price (y) variables for all the doctor brands that I selected from this dataset. The mean of x was 84.75238 ($84.75) and the mean of y was 84.67619 ($84.68), a difference of 0.07619048. With a p-value of 0.995, there is no evidence that the two means differ. My chart shows most of the distribution scattered between $50 and $100 on both the x-axis and the y-axis.
pairedtwosampletest <- t.test(BOD$value_price, BOD$price, paired= TRUE)
head(pairedtwosampletest) #I prefer this format/view
## $statistic
## t
## 1.421062
##
## $parameter
## df
## 104
##
## $p.value
## [1] 0.1582902
##
## $conf.int
## [1] -0.0301304 0.1825114
## attr(,"conf.level")
## [1] 0.95
##
## $estimate
## mean difference
## 0.07619048
##
## $null.value
## mean difference
## 0
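Because value price and price are recorded for the same product, the paired test works on the per-product differences rather than on two independent samples. The sketch below uses made-up prices (hypothetical values, not from the Sephora data) to show that a paired t-test is the same as a one-sample t-test on the differences, and how it compares with the unpaired version.

```r
# Hypothetical prices for five products; invented for illustration only.
value_price <- c(30, 45, 60, 28, 75)
price       <- c(30, 40, 60, 25, 75)

# Paired test: each product is compared with itself.
paired <- t.test(value_price, price, paired = TRUE)

# Equivalent one-sample test on the per-product differences.
one_sample <- t.test(value_price - price, mu = 0)

# Unpaired (Welch) test ignores the pairing between the two columns.
unpaired <- t.test(value_price, price)

c(paired = paired$p.value, one_sample = one_sample$p.value, unpaired = unpaired$p.value)
```

When the two columns are strongly related, as prices for the same product tend to be, the paired version is usually the more sensitive of the two.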
Chi-Square Testing: Two Categorical Variables
H0: The variables are not associated, i.e., they are independent. (Null Hypothesis)
H1: The variables are associated, i.e., they are dependent. (Alternative Hypothesis)
If the p-value is above 0.05, we fail to reject H0: the data do not provide enough evidence to conclude that the variables are related. If the p-value is below 0.05, we reject H0 and conclude that the variables are associated. The p-value measures the evidence against independence, not the strength of the relationship.
I used the chisq.test() function to test brand vs category, brand vs ingredients, and brand vs marketing flag content. The only pair that showed a statistically significant association was Brand and Marketing Flags Content.
BOD2 <- sephora_website_dataset %>%
select(brand, category, price, name, MarketingFlags_content, details, ingredients, rating) %>%
filter(brand %in% c("Dr Roebuck's", "Dr. Barbara Sturm", "Dr. Brandt Skincare", "Dr. Dennis Gross Skincare", "Dr. Jart+")) %>%
filter(category %in% c("Face Serums", "Toners", "Moisturizers")) %>%
filter(rating > 4.0)
BOD2
## # A tibble: 26 × 8
## brand category price name Marke…¹ details ingre…² rating
## <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 Dr Roebuck's Face Serums 60 True Blu… exclus… "What … "-Hyal… 4.5
## 2 Dr Roebuck's Toners 28 Lifesave… exclus… "What … "-Glyc… 4.5
## 3 Dr. Barbara Sturm Moisturizers 230 Face Cre… 0 "What … "-Skul… 4.5
## 4 Dr. Barbara Sturm Face Serums 145 Anti-Pol… 0 "What … "-Hyal… 5
## 5 Dr. Barbara Sturm Face Serums 310 Night Se… online… "What … "-Extr… 4.5
## 6 Dr. Barbara Sturm Face Serums 300 Darker S… 0 "What … "-Hyal… 5
## 7 Dr. Barbara Sturm Moisturizers 215 Clarifyi… 0 "What … "-Comp… 4.5
## 8 Dr. Barbara Sturm Face Serums 55 Clarifyi… 0 "What … "-Comp… 5
## 9 Dr. Barbara Sturm Moisturizers 205 Face Cre… 0 "What … "-Purs… 4.5
## 10 Dr. Barbara Sturm Moisturizers 230 Brighten… 0 "What … "-Extr… 5
## # … with 16 more rows, and abbreviated variable names ¹MarketingFlags_content,
## # ²ingredients
## # ℹ Use `print(n = ...)` to see more rows
Chi-Square Tests
chiqSB <- chisq.test(BOD2$brand, BOD2$MarketingFlags_content)
## Warning in chisq.test(BOD2$brand, BOD2$MarketingFlags_content): Chi-squared
## approximation may be incorrect
chisq.test(BOD2$brand, BOD2$ingredients)
## Warning in chisq.test(BOD2$brand, BOD2$ingredients): Chi-squared approximation
## may be incorrect
##
## Pearson's Chi-squared test
##
## data: BOD2$brand and BOD2$ingredients
## X-squared = 104, df = 100, p-value = 0.3721
chisq.test(BOD2$brand, BOD2$category)
## Warning in chisq.test(BOD2$brand, BOD2$category): Chi-squared approximation may
## be incorrect
##
## Pearson's Chi-squared test
##
## data: BOD2$brand and BOD2$category
## X-squared = 9.7144, df = 8, p-value = 0.2856
The same hypotheses apply to the brand vs marketing flag test: H0 says the variables are independent, and H1 says they are associated. A p-value above 0.05 means we fail to reject H0, while a p-value below 0.05 means we reject H0 in favor of an association. The p-value does not measure how strong that association is.
chiqSB
##
## Pearson's Chi-squared test
##
## data: BOD2$brand and BOD2$MarketingFlags_content
## X-squared = 31.06, df = 12, p-value = 0.001929
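The only pair of variables in my data set with a statistically significant association was Brand and Marketing Flags Content (p-value = 0.001929). Because R warned that the chi-squared approximation may be incorrect, which is common when only 26 products are spread over many cells, this result should be read with caution.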
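The "Chi-squared approximation may be incorrect" warning typically appears when some expected cell counts fall below 5. The sketch below uses a made-up 2 x 2 table (hypothetical counts, not the Sephora data) to show how to inspect the expected counts and two alternatives that do not rely on the large-sample approximation: a simulated p-value and Fisher's exact test.

```r
# Hypothetical brand-by-flag counts; invented for illustration only.
tab <- matrix(
  c(8, 1,
    2, 6),
  nrow = 2, byrow = TRUE,
  dimnames = list(brand = c("Brand A", "Brand B"),
                  flag  = c("none", "exclusive"))
)

# Expected counts under independence; counts below 5 are what trigger the warning.
expected <- suppressWarnings(chisq.test(tab))$expected

# Two alternatives that do not rely on the large-sample approximation.
set.seed(1)  # makes the simulated p-value reproducible
simulated <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
exact     <- fisher.test(tab)

expected
c(simulated = simulated$p.value, exact = exact$p.value)
```

For larger tables such as brand by marketing flag, the simulated p-value is often the more practical of the two options.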
correlationSB <- table(BOD2$brand, BOD2$MarketingFlags_content, BOD2$name)
correlationSB
## , , = Alpha Beta® Exfoliating Moisturizer
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 0 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 1 0 0 0
## Dr. Jart+ 0 0 0 0
##
## , , = Alpha Beta® Pore Perfecting & Refining Serum
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 0 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 1 0 0 0
## Dr. Jart+ 0 0 0 0
##
## , , = Anti-Pollution Drops
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 1 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 0 0 0 0
##
## , , = Brightening Face Cream
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 1 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 0 0 0 0
##
## , , = Ceramidin™ Cream
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 0 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 1 0 0 0
##
## , , = Cicapair™ Tiger Grass Calming Gel Cream
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 0 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 0 1 0 0
##
## , , = Cicapair™ Tiger Grass Cream
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 0 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 0 1 0 0
##
## , , = Cicapair™ Tiger Grass Serum
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 0 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 0 1 0 0
##
## , , = Clarifying Face Cream
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 1 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 0 0 0 0
##
## , , = Clarifying Spot Treatment
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 1 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 0 0 0 0
##
## , , = Dark Spots No More® Triple Acid Spot Minimizing Concentrate
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 0 0 0 0
## Dr. Brandt Skincare 0 1 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 0 0 0 0
##
## , , = Darker Skin Tones Face Cream
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 1 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 0 0 0 0
##
## , , = Darker Skin Tones Hyaluronic Serum
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 1 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 0 0 0 0
##
## , , = Do Not Age with Dr. Brandt Transforming Pearl Serum
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 0 0 0 0
## Dr. Brandt Skincare 1 0 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 0 0 0 0
##
## , , = Face Cream Light
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 1 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 0 0 0 0
##
## , , = Face Cream Rich
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 1 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 0 0 0 0
##
## , , = Hyaluronic Marine Hydration Booster
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 0 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 0 0 1 0
## Dr. Jart+ 0 0 0 0
##
## , , = Lifesaver Skin Brightening Toner
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 1 0
## Dr. Barbara Sturm 0 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 0 0 0 0
##
## , , = Night Serum
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 0 0 0 1
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 0 0 0 0
##
## , , = Peptidin™ Firming Serum with Energy Peptides
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 0 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 1 0 0 0
##
## , , = Peptidin™ Radiance Serum with Energy Peptides
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 0 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 1 0 0 0
##
## , , = Stress Repair Face Cream with Niacinamide
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 0 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 1 0 0 0
## Dr. Jart+ 0 0 0 0
##
## , , = Stress Rescue Super Serum with Niacinamide
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 0 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 1 0 0 0
## Dr. Jart+ 0 0 0 0
##
## , , = Teatreement™ Moisturizer
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 0 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 0 1 0 0
##
## , , = Teatreement™ Toner
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 0 0
## Dr. Barbara Sturm 0 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 0 1 0 0
##
## , , = True Blue Hydrating Serum
##
##
## 0 exclusive exclusive · online only online only
## Dr Roebuck's 0 0 1 0
## Dr. Barbara Sturm 0 0 0 0
## Dr. Brandt Skincare 0 0 0 0
## Dr. Dennis Gross Skincare 0 0 0 0
## Dr. Jart+ 0 0 0 0
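A three-way table prints one slice per product, which makes the brand-level pattern hard to see. A two-way table of brand by flag is more compact. The sketch below is a minimal illustration on made-up rows (the `toy` brands and flags are hypothetical, not the Sephora data).

```r
# Hypothetical rows; brand and flag values are invented for illustration only.
toy <- data.frame(
  brand = c("Brand A", "Brand A", "Brand B", "Brand B", "Brand B"),
  flag  = c("exclusive", "0", "0", "0", "online only")
)

# Two-way frequency table: one row per brand, one column per flag.
flag_counts <- table(toy$brand, toy$flag)

# Row proportions show the share of each brand's products carrying each flag.
flag_share <- prop.table(flag_counts, margin = 1)

flag_counts
round(flag_share, 2)
```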
Plot
Dr. Jart+: Winner (price point, reviews, marketing, etc.). Dr. Jart+ had ratings of 4.0 to 4.5 stars, an affordable price point for the everyday consumer, and many customers who left product reviews. I would consider Dr. Jart+ to be a top-trusted brand based on these findings and variables. If you look at my frequency table, you will see that several Dr. Jart+ products carry the “exclusive” marketing flag.
BODplot <- ggplot(BOD, aes(x = rating, y = price, size = number_of_reviews, color = brand)) +
  geom_point(alpha = 0.9) +
  scale_size(range = c(.1, 9), name = "Customers Who Left a Review") +
  labs(title = "Contest of the Doctors: Skincare Edition") +
  ylab("Price (in USD)") +
  xlab("Ratings out of 5 Stars")
BODplot
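A plot of rating against price can be complemented with a numeric measure of the relationship. The sketch below uses made-up price and rating pairs (hypothetical values, not the Sephora data) and Spearman's rank correlation, which is one reasonable choice when ratings move in half-star steps.

```r
# Hypothetical price and rating pairs; invented for illustration only.
price  <- c(18, 28, 45, 60, 145, 230)
rating <- c(4.0, 4.5, 4.0, 4.5, 5.0, 4.5)

# Spearman's rank correlation works on ranks, so it tolerates the stepped ratings.
# exact = FALSE avoids the warning about ties when computing the p-value.
rho_test <- cor.test(price, rating, method = "spearman", exact = FALSE)

rho_test$estimate
rho_test$p.value
```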
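BOD2bar <- barplot(table(BOD2$brand, BOD2$MarketingFlags_content),
  main = "Chi-Square Test: Relationship Between Brand and Marketing Flag",
  xlab = "Marketing Flag",      # the table's columns (flags) become the bars
  ylab = "Number of Products",  # bar segments are stacked by brand
  legend.text = TRUE)           # label each stacked segment with its brand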
BOD3 <- sephora_website_dataset %>%
select(brand, category, price, value_price, name) %>%
filter(brand %in% c("Dr. Jart+")) %>%
filter(category %in% c( "Face Serums")) %>%
filter(price < 50) %>%
filter(value_price < 50)
BOD3
## # A tibble: 8 × 5
## brand category price value_price name
## <chr> <chr> <dbl> <dbl> <chr>
## 1 Dr. Jart+ Face Serums 46 46 Cicapair™ Tiger Grass Serum
## 2 Dr. Jart+ Face Serums 18 18 Focuspot™ Micro Tip™ Patches
## 3 Dr. Jart+ Face Serums 18 18 Focuspot™ Blemish Micro Tip™ Patch
## 4 Dr. Jart+ Face Serums 48 48 Peptidin™ Radiance Serum with Energy …
## 5 Dr. Jart+ Face Serums 18 18 Focuspot™ Dark Spot Micro Tip™ Patch
## 6 Dr. Jart+ Face Serums 18 18 Focuspot™ Line & Wrinkle Micro Tip™ P…
## 7 Dr. Jart+ Face Serums 48 48 Peptidin™ Firming Serum with Energy P…
## 8 Dr. Jart+ Face Serums 18 18 Focuspot™ Dark Circle Micro Tip™ Patch
var.test(BOD3$value_price, BOD3$price) #As the p value is greater than 0.05, there is no evidence to suggest that the variances are unequal.
##
## F test to compare two variances
##
## data: BOD3$value_price and BOD3$price
## F = 1, num df = 7, denom df = 7, p-value = 1
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.2002038 4.9949092
## sample estimates:
## ratio of variances
## 1
varttest3 <- t.test(BOD3$value_price, BOD3$price, var.equal = TRUE)
t.test(BOD3$value_price, BOD3$price)
##
## Welch Two Sample t-test
##
## data: BOD3$value_price and BOD3$price
## t = 0, df = 14, p-value = 1
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -16.29393 16.29393
## sample estimates:
## mean of x mean of y
## 29 29
# plot() draws the chart and returns NULL, so there is nothing to assign or print.
plot(x = BOD$value_price, y = BOD$price,
  xlab = "Value Price in USD",
  ylab = "Actual Price in USD",
  main = "Value Price vs Actual Price")
Summary
• Zero Missing Values
• Variance, Two-Sample T-test
• Chi-Squared test
• Dr. Jart+: Winner – Price point, reviews, marketing, etc.
• Chart: Chi-Squared
• Chart: Value Price vs Price: No significance.
The plot shows the customer ratings on the x-axis and the price on the y-axis. The dot colors represent the five doctor brands, as you can see in the legend to the right. Below that, a second legend shows how the size of each dot reflects the number of customers who left a review for that product. Dr. Jart+ had a 4.5-star rating and received the most customer reviews.
The data dictionary was very useful in defining my variables. I was able to locate it on the Kaggle website but still needed to further define some variables, like “love”. The data dictionary defines “love” as “The number of people loving the product”. If you follow the Sephora product URLs provided in this data set, you will find a heart-shaped button that represents the “love” variable. This button is similar to the like button on many social platforms.