Hello Regork! We are a group of data scientists that are focusing on identifying certain traits in your Cosmetics department, and in-turn suggest your team ideas on how sales and revenue can be increased. We will help you narrow down to the product, identify your target customers and present you with strategies to better market your products to them.
We will use the complete datasets of the transactions and promotions, products and demographics, put together useful permutations and combinations of datasets and provide you with valuable analysis for your cosmetics department.
We will perform qualitative and quantitative data analysis and help you visualize product level data alongside demographics and promotions. We will also help you market your products using unique strategies which will help increase sales and profits. We will also help identify manufacturers to help them better handle demand and supply.
Below are the packages required to be able to replicate/ customize this report.
install.packages("completejourney", repos="https://cran.r-project.org/")
## package 'completejourney' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\K VAMSI KRISHNA\AppData\Local\Temp\RtmpqCd8wl\downloaded_packages
install.packages("DT", repos="https://cran.r-project.org/")
## package 'DT' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\K VAMSI KRISHNA\AppData\Local\Temp\RtmpqCd8wl\downloaded_packages
library(completejourney)#Retail shopping transactions for 2469 households
library(dplyr) # transforming (joining, summarizing, etc.) data
library(ggplot2)#visualizing data
library(stringr)# evaluating regular expression to match patterns
library(lubridate)#to work with dates and times
library(DT)# to display dataframes as tables on HTML pages
library(scales)# to wrap format around axis labels
all_transactions<-get_transactions()# loading complete datasets to a dataframe
all_promotions<-get_promotions()# loading complete datasets to a dataframe
Y<-all_transactions%>%
inner_join(products)%>%
inner_join(coupons)%>%
inner_join(campaigns)%>% #Joining multiple dataframes
filter(str_detect(department, regex("COSMETIC", ignore_case = TRUE)))%>%
group_by(Month = month(transaction_timestamp,label=TRUE),product_category)%>%
summarise(total_sales =sum(sales_value))# Filtering and summarizing by sales value.
## Joining, by = "product_id"
## Joining, by = "product_id"
## Joining, by = c("household_id", "campaign_id")
## `summarise()` has grouped output by 'Month'. You can override using the
## `.groups` argument.
ggplot(data=Y, aes(x= Month, y=total_sales, group = product_category, colour = product_category))+
geom_line()+geom_point()+
labs(title = "Figure 1: Types of Cosmetics sold throughout the year",subtitle = " Monthwise Total Sales",x = "Months",y = "Total sales of Cosmetics")
The plot above shows the total sales of cosmetics over the course of the year differentiated by product category. In terms of total sales, we can depict that Makeup and Treatment are the most popular product categories and the other four products are far behind. One interesting aspect to observe is that the fragrances sales shoot up from the November month to the December month. Its probably because of the christmas eve when most people purchase scents and fragrances to gift their loved ones.
frag_dec<- products%>%
inner_join(all_transactions, by = "product_id") %>%
group_by(Month = month(transaction_timestamp, label=TRUE)) %>%
filter(str_detect(department, regex("COSMETIC", ignore_case = TRUE))) %>%
filter(str_detect(product_category, regex("FRAGRANCES", ignore_case = TRUE))) %>%
filter(str_detect(Month, regex("Dec", ignore_case = TRUE)))%>%
group_by(product_type)%>%
summarize(Total_sales= sum(sales_value))%>%
arrange(desc(Total_sales))
ggplot(data = frag_dec, aes(x = product_type, y = Total_sales)) +
geom_bar(stat="Identity", position="dodge", color = "black", fill="red")+
coord_cartesian(xlim = c(1,4), ylim = c(0,350))+
labs(title = "Figure 2: Total Sales of Fragrances in December",
x = "Fragrance Type",
y = "Value in $ of sales")+
scale_x_discrete(label = wrap_format(10))
We wanted to analyze the spike we saw in the first graph for the Monthly sales posted on all product categories. The spike is from designer fragrances mainly. If we put aside the miscellaneous category, PARFUMS DE COUER and REVLON fragrances have a very low sales value. We should come up with a way to promote these products as well.
quant_sales_in_cosmetics_dept <- products%>%
filter(str_detect(department, regex("COSMETIC", ignore_case = TRUE)))%>%
inner_join(all_transactions, by = "product_id") %>%
inner_join(demographics,by ="household_id")%>%#Joining transactions and demographics data
group_by(Age = age) %>%
group_by(Age, Categories=product_category)%>%
summarise(total_quant = sum(quantity)) %>% # grouping by age and category and summarizing sales by quantity
arrange(desc(total_quant))
## `summarise()` has grouped output by 'Age'. You can override using the `.groups`
## argument.
datatable(quant_sales_in_cosmetics_dept, caption='Table 1: Age Wise Sales in Cosmetics Department')# converting a dataframe to an HTML table
ggplot(data = quant_sales_in_cosmetics_dept, aes(x = Age, fill= Categories , y = total_quant)) +
geom_bar(stat="Identity", position="dodge")+
coord_cartesian(xlim = c(1,7), ylim = c(0,800))+
labs(title = "Figure 3: AGE WISE SALES IN COSMETICS DEPARTMENT",
subtitle = "Age Groups, Categories vs Total Quantity",
x = "Age group",
y = "# of Cosmetics Sold by Product category")
It is evident from the graph above that Age Groups 35-44 and 45-54 bought the highest number of MAKEUP AND TREATMENT ITEMS. There are two ways of looking at this. One is a positive aspect, which says that our target audience can continue to be the age group of 35-54 and we can try to market several other products types to them. However, the fragrances are not being purchased by a lot of age groups. Its best to come up with a discounts on fragrances to be able to increase sales.
Makeup_and_treatment<-products%>%
filter(str_detect(department, regex("COSMETIC", ignore_case = TRUE)))%>%
filter(str_detect(product_category, regex("MAKEUP AND TREATMENT", ignore_case = TRUE)))%>%
inner_join(all_transactions, by = "product_id") %>%
inner_join(demographics,by ="household_id")%>% #Joining transactions and demographics data
group_by(Age = age) %>%
group_by(Age, Categories=product_category, Type=product_type)%>%
summarise(total_quant = sum(quantity)) %>% # Grouping by type and category and summarizing the sales in terms of quantity
arrange(desc(total_quant)) %>%
slice(1:5)# Getting the top 5 products purchased by age group
## `summarise()` has grouped output by 'Age', 'Categories'. You can override using
## the `.groups` argument.
ggplot(data = Makeup_and_treatment, aes(x = Age , fill=Type, y = total_quant)) +
geom_bar(stat="Identity", position="dodge")+
coord_cartesian(xlim = c(1,7), ylim = c(0,200))+
labs(title = "Figure 4: AGE-WISE SALES OF MAKEUP AND TREATMENT CATEGORY",
subtitle = "Age Groups, MAKEUP AND TREATMENT",
x = "Age group",
y = "# of MAKEUP AND TREATMENT ITEMS Sold")
If we observe this graph carefully, we can clearly see that the same age group has higher purchases, but the top 5 products each of the group purchases are mostly inclined towards maybelline and covergirl products. To further analyze, we will deep dive to see which package size is being sold most. This will allow to give feedback to manufacturers on which products to manufacture more inorder to tackle demand and supply. At the same time, the sizes not being sold much can have reduced manufacturing and focus on other sizes which are being sold in larger quantities.
Product_sizes<- products%>%
filter(str_detect(department, regex("COSMETIC", ignore_case = TRUE)))%>%
filter(str_detect(product_category, regex("MAKEUP AND TREATMENT", ignore_case = TRUE)))%>%
filter(str_detect(product_type, regex("COVERGIRL|MAYBELLINE", ignore_case = TRUE)))%>%#Filtering to only analyze covergirl and maybelline products
filter(!is.na(package_size))%>%
inner_join(all_transactions, by = "product_id") %>%
group_by(product_type, package_size, manufacturer_id)%>%# Included manufacturer ID to help them predict demand and supply.
summarise(Volume = sum(quantity))%>%
arrange(desc(Volume))
## `summarise()` has grouped output by 'product_type', 'package_size'. You can
## override using the `.groups` argument.
datatable(Product_sizes, caption='Table 2: Most Sold Product Sizes in Cosmetics Department: Makeup and treatment - Maybelline and Covergirl')
Great, now we have the manufacturer IDs. We can now go ahead by communicating this result with the manufacturers of maybelline and covergirl to help them predict sales for the next year. This will help the stores also manage their supply chain in a better way. Also, if we observe carefully, the maybelline product is mostly sold in that 1 CT package size. Covergirl on the other hand has different package sizes being sold. So, these sizes are something the manufacturers can specifically focus on.