For this task, I explored the Sephora dataset and created three interactive visualizations: a pie chart of the top 10 brands by product count, a bar chart of the top 10 brands by total number of “Loves,” and a horizontal bar chart showing the top 10 product categories. I chose these plots because they each highlight different aspects of the data—brand presence, customer engagement, and product variety—and together they offer a more complete picture of trends on the Sephora website.
I initially experimented with scatter and box plots but found them visually cluttered and harder to interpret, especially with so many brands and categories involved. Switching to simpler chart types like bar charts and pie charts made it much easier to communicate insights clearly. I also learned how useful plotly can be for adding interactivity to static charts, and how to customize hover tooltips to make them more informative for the viewer.
Overall, this exercise helped me better understand the importance of choosing the right type of visualization for the data and the audience. I’m more confident now in using dplyr to summarize data and in using plotly to create engaging and informative interactive charts in R.
Which brands and product categories are the most prominent and popular on the Sephora website?
library(tidyverse)
library(plotly)
# Load the dataset
sephora <- read_csv("sephora_website_dataset.csv")
Do the following:
Make a plot. Any kind of plot will do (though it might be easiest to work with geom_point()).
Make the plot interactive with ggplotly().
Make sure the hovering tooltip is more informative than the default.
Good luck and have fun!
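The ggplotly() route described in the steps above can be sketched as follows. This is a minimal illustration, not the solution used later in this document: the `toy` data frame is hypothetical stand-in data, and its column names (`brand`, `price`, `rating`) are assumptions rather than confirmed columns of the Sephora file. The idea is to build the tooltip string in the `text` aesthetic and then ask ggplotly() to show only that string.

```r
library(ggplot2)
library(plotly)

# Hypothetical toy data standing in for a few product rows;
# the column names are assumptions made for illustration only.
toy <- data.frame(
  brand  = c("Brand A", "Brand B", "Brand C"),
  price  = c(25, 48, 62),
  rating = c(4.5, 4.0, 3.5)
)

# Build the tooltip string in the `text` aesthetic.
# "<br>" starts a new line inside the hover box.
p <- ggplot(toy, aes(x = price, y = rating,
                     text = paste0("Brand: ", brand,
                                   "<br>Price: $", price,
                                   "<br>Rating: ", rating))) +
  geom_point()

# tooltip = "text" replaces the default x/y readout with the custom string.
interactive_p <- ggplotly(p, tooltip = "text")
interactive_p
```

Putting `text` in the top-level `aes()` keeps the tooltip definition next to the other mappings; the same pattern should carry over to the real data by swapping `toy` for `sephora` and adjusting the column names.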
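# Count products per brand and keep the 10 largest brands
brand_counts <- sephora %>%
  filter(!is.na(brand)) %>%
  count(brand) %>%
  slice_max(n, n = 10)  # top_n() is superseded in current dplyr

# Pie chart of each brand's share of products among the top 10
plot_ly(brand_counts,
        labels = ~brand,
        values = ~n,
        type = 'pie') %>%
  layout(title = "Top 10 Brands by Product Count")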
# Filter missing data
loved_brands <- sephora %>%
  filter(!is.na(`love`), !is.na(brand)) %>%
  group_by(brand) %>%
  summarise(Total_Loves = sum(`love`, na.rm = TRUE)) %>%
  arrange(desc(Total_Loves)) %>%
  slice_head(n = 10)

# Make interactive bar chart
plot_ly(loved_brands,
        x = ~reorder(brand, Total_Loves),
        y = ~Total_Loves,
        type = "bar",
        marker = list(color = 'hotpink')) %>%
  layout(title = "Top 10 Brands by Total Loves",
         xaxis = list(title = "Brand"),
         yaxis = list(title = "Total Loves"))
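The bar chart above relies on plotly's default hover text. One way to make the tooltip more informative when calling plot_ly() directly is the `hovertemplate` argument, sketched below under stated assumptions: `toy_loves` is hypothetical data that only mimics the shape of `loved_brands`, and the template wording is a suggestion, not part of the original analysis.

```r
library(plotly)

# Hypothetical stand-in for loved_brands (same column names, made-up values).
toy_loves <- data.frame(
  brand       = c("Brand A", "Brand B", "Brand C"),
  Total_Loves = c(120000, 85000, 43000)
)

fig <- plot_ly(toy_loves,
               x = ~reorder(brand, Total_Loves),
               y = ~Total_Loves,
               type = "bar",
               marker = list(color = 'hotpink'),
               # %{x} and %{y} are filled in per bar; ":," adds thousands
               # separators, and <extra></extra> hides the trace-name box.
               hovertemplate = "<b>%{x}</b><br>Total Loves: %{y:,}<extra></extra>") %>%
  layout(title = "Top 10 Brands by Total Loves",
         xaxis = list(title = "Brand"),
         yaxis = list(title = "Total Loves"))
fig
```

The same `hovertemplate` argument could be added to the category chart, with `%{x}` and `%{y}` swapped because that chart is horizontal.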
category_counts <- sephora %>%
  filter(!is.na(category)) %>%
  count(category, sort = TRUE) %>%
  slice_max(n, n = 10)

plot_ly(category_counts,
        x = ~n,
        y = ~reorder(category, n),
        type = "bar",
        orientation = 'h',
        marker = list(color = 'lightblue')) %>%
  layout(title = "Top 10 Categories",
         xaxis = list(title = "Number of Products"),
         yaxis = list(title = "Category"))
Install the {flexdashboard} package and create a new R Markdown file in your project by going to File > New File… > R Markdown… > From Template > Flexdashboard.
Using the documentation for {flexdashboard} online, create a basic dashboard that shows a plot (static or interactive) in at least three chart areas. Play with the layout if you’re feeling brave.
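As an alternative to the menu route, the template can also be created from the console. The sketch below assumes the {flexdashboard} and {rmarkdown} packages are available; `dashboard.Rmd` is a hypothetical file name, and rmarkdown::draft() will stop with an error if a file of that name already exists.

```r
# install.packages("flexdashboard")  # run once if the package is not installed

# Create a new dashboard file from the package's built-in template.
# "dashboard.Rmd" is a hypothetical name; choose any name you like.
rmarkdown::draft("dashboard.Rmd",
                 template = "flex_dashboard",
                 package  = "flexdashboard",
                 edit     = FALSE)

# Knit the dashboard to HTML (requires pandoc, which ships with RStudio).
rmarkdown::render("dashboard.Rmd")
```

The generated file contains three chart areas (Chart A, Chart B, and Chart C), each with an empty code chunk, so the three plots from this document can be pasted into them one per area.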