Business Understanding
I’m a jewelry entrepreneur and analyst studying jewelry sales trends. The questions I will answer are the following:
1. What are the most popular items?
2. Which gender buys the most jewelry?
3. On which days and in which months do customers buy the most?
4. Which product types sell over the years?
This analysis covers online jewelry sales, using purchases gathered over three years.
Data Understanding
I found this data on Kaggle: <https://www.kaggle.com/datasets/mkechinov/ecommerce-purchase-history-from-jewelry-store>
## Rows: 95,911
## Columns: 13
## $ `Order datetime` <chr> "2018-12-01 11:40:29 UTC", "2018-12-01 …
## $ `Order ID` <dbl> 1.92472e+18, 1.92490e+18, 1.92551e+18, …
## $ `Product ID` <dbl> 1.84220e+18, 1.80683e+18, 1.84221e+18, …
## $ `Quantity of SKU in the order` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ ...5 <dbl> 1.80683e+18, 1.80683e+18, 1.80683e+18, …
## $ `Category ID` <chr> "jewelry.earring", NA, "jewelry.pendant…
## $ `Brand ID` <dbl> 0, NA, 1, 0, 0, 1, 0, 0, 2, 1, 1, 1, 1,…
## $ `Price in USD` <dbl> 561.51, 212.14, 54.66, 88.90, 417.67, 1…
## $ `User ID` <dbl> 1.51592e+18, 1.51592e+18, 1.51592e+18, …
## $ `Product Gender` <chr> NA, NA, "f", "f", NA, NA, NA, NA, "f", …
## $ `Main Color` <chr> "red", "yellow", "white", "red", "red",…
## $ `Main metal` <chr> "gold", "gold", "gold", "gold", "gold",…
## $ `Main gem` <chr> "diamond", NA, "sapphire", "diamond", "…
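The `<chr>` type on `Order datetime` in the output above means the timestamps are still stored as text. As a minimal sketch of how such strings can be turned into dates, weekdays and months, the example below parses two sample values with base R and lubridate. The first value is copied from the output above; the second is made up for illustration, and the variable names are hypothetical.

```r
library(lubridate)

# Sample timestamps in the same format as `Order datetime`
# (first value from the output above, second value invented for the example)
order_datetime <- c("2018-12-01 11:40:29 UTC", "2019-07-15 08:05:00 UTC")

# strptime-style parsing ignores the trailing " UTC" text after the format is matched
parsed <- as.POSIXct(order_datetime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Derive the calendar date, weekday and month
order_date  <- as.Date(parsed, tz = "UTC")
order_wday  <- wday(order_date, label = TRUE)
order_month <- month(order_date, label = TRUE)

print(data.frame(order_datetime, order_date, order_wday, order_month))
```

Parsing the full timestamp keeps the time of day available if it is needed later, whereas cutting the string to its first ten characters keeps only the date.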
Data Preparation
This data required quite a bit of preparation, using many dplyr verbs, including the select, mutate and group_by functions.
Here is my data preparation code.
#Load the packages used below
library(dplyr)
library(lubridate)
#Create the data copy
jewelry.copy3 <- jewelry.3
#Rename the columns name
colnames(jewelry.copy3) <- c("date", "order_ID", "product_ID", "quantity", "x","products", "brand_ID", "price", "user_ID", "gender","color", "main_metal", "gem")
#Change the date format
jewelry.copy3<-jewelry.copy3 %>% mutate(date2=substr(date, 1,10))
#Deselect the unnecessary columns
jew1 <- jewelry.copy3 %>% select(-order_ID, -date, -product_ID, -brand_ID, -x, -user_ID)
#Switch the columns
jew1 <- jew1 %>% select(products, quantity, everything())
#Create a new date column
jew1 <- jew1 %>% mutate(date3=as.Date(date2))
#Create the weekday column
jew1 <- jew1 %>% mutate(weekday=wday(date3,label=TRUE))
#Create the monthly column
jew1 <- jew1 %>% mutate(monthly=month(date3,label=TRUE))
I also recoded the variable products into a new variable called product_type.
#Rename the product names
jew1$product_type[jew1$products=="jewelry.earring"] <-"Earrings"
jew1$product_type[jew1$products=="jewelry.bracelet"] <-"Bracelet"
jew1$product_type[jew1$products=="jewelry.ring"] <-"Ring"
jew1$product_type[jew1$products=="jewelry.pendant"] <-"Pendant"
jew1$product_type[jew1$products=="jewelry.brooch"] <-"Brooch"
jew1$product_type[jew1$products=="jewelry.necklace"] <-"Necklace"
jew1$product_type[jew1$products=="electronics.clocks"] <-"Clock"
jew1$product_type[jew1$products=="jewelry.stud"] <-"Earrings"
jew1$product_type[jew1$products=="jewelry.souvenir"] <-"Souvenir"
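The nine assignments above can also be written as a single lookup. The sketch below is one possible alternative, not the code used for this report: it builds a named vector with the same code-to-label pairs and applies it inside mutate. The `toy` tibble and the `type_lookup` name are hypothetical stand-ins for jew1.

```r
library(dplyr)

# Lookup from raw category codes to display names (same pairs as the assignments above)
type_lookup <- c(
  "jewelry.earring"    = "Earrings",
  "jewelry.bracelet"   = "Bracelet",
  "jewelry.ring"       = "Ring",
  "jewelry.pendant"    = "Pendant",
  "jewelry.brooch"     = "Brooch",
  "jewelry.necklace"   = "Necklace",
  "electronics.clocks" = "Clock",
  "jewelry.stud"       = "Earrings",
  "jewelry.souvenir"   = "Souvenir"
)

# Toy data standing in for jew1; only the products column matters here
toy <- tibble(products = c("jewelry.ring", "jewelry.stud", NA, "jewelry.pendant"))

# Indexing the named vector by code returns the label; NA or unknown codes stay NA
toy <- toy %>% mutate(product_type = unname(type_lookup[products]))

print(toy)
```

Keeping the pairs in one vector makes it easier to see that both jewelry.earring and jewelry.stud map to "Earrings".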
#Select and deselect columns
jew1 <- jew1 %>% select(product_type, quantity, everything())
jew1 <- jew1 %>% select(-products)
jew1 <- jew1 %>% select(-date2)
I made a table of gender.
## # A tibble: 3 × 2
## gender count
## <chr> <int>
## 1 f 47379
## 2 m 364
## 3 <NA> 48168
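The code behind this table is not shown in the report. A minimal sketch of how a table with the same shape could be produced is given below, assuming a gender column like the one in jew1. The `toy` tibble contains invented values.

```r
library(dplyr)

# Toy data standing in for jew1 (values invented for the example)
toy <- tibble(gender = c("f", "f", NA, "m", NA, "f"))

# count() tallies rows per value and keeps NA as its own group
gender_table <- toy %>% count(gender, name = "count")

print(gender_table)
```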
Calculate average order price (USD)
## [1] 362.2152
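The code that produced this average is not shown. One plausible reading, sketched below, is the mean of the price column across all rows with missing prices dropped; whether missing prices were actually dropped is an assumption. The three prices are the first three values from the data preview, and the `toy` tibble is hypothetical.

```r
library(dplyr)

# Toy data: the first three prices from the data preview plus one missing value
toy <- tibble(price = c(561.51, 212.14, 54.66, NA))

# Mean price per row, ignoring missing prices (assumption: NAs are dropped)
avg_price <- mean(toy$price, na.rm = TRUE)

print(round(avg_price, 2))
```

Note that this averages over rows. If an order can span several rows, an average per order would first need the prices summed by order ID.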
Data Modeling
jew1 %>% group_by(date3,product_type) %>% summarize(count=n()) %>% ggplot(aes(date3, count, group=product_type))+geom_line(aes(color=product_type))+labs(title="Sales of Jewelry Over Time by Type", x="time")+theme_classic()
table_total <-jew1 %>% group_by(product_type, weekday) %>% summarise(count=n())
table_total %>% ggplot(aes(product_type,count,fill=weekday))+ geom_col(show.legend = FALSE)+ scale_fill_brewer(palette = "Paired")+facet_wrap(~weekday,scales="free_y")+coord_flip()+theme_classic()
jew1 %>% group_by(monthly,gem) %>% summarize(count=n()) %>% ggplot(aes(monthly, count, group=gem))+geom_line(aes(color=gem))+labs(title="Sales of Gemstones Types Monthly", x="monthly")+theme_classic()
jew1 %>% group_by(monthly,main_metal) %>% summarize(count=n()) %>% ggplot(aes(monthly, count, group=main_metal))+geom_line(aes(color=main_metal))+labs(title="Sales of Metals Types Monthly", x="monthly")+theme_classic()
jew1 %>% group_by(monthly,product_type) %>% summarize(count=n()) %>% ggplot(aes(monthly, count, group=product_type))+geom_point(aes(color=product_type))+labs(title="Sales of Product Types Monthly", x="monthly")+theme_classic()
Evaluation
Evaluating a model means selecting appropriate metrics, given what we know about product types, main metal, prices and time, and comparing how close the predicted values are to the actual ones. According to the plots above, earrings and rings are the best-selling items on every day of the week. As a jewelry entrepreneur, I would recommend starting to focus on other items such as pendants, necklaces and bracelets. Clocks almost never sell. To grow the company, I would suggest investing in more advertising, marketing campaigns and promotions.
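The claim about the best-selling items on each weekday can be checked with a table as well as a plot. The sketch below is one way to do it, assuming product_type and weekday columns like those in jew1. The `toy` tibble holds invented rows, so its result says nothing about the real data.

```r
library(dplyr)

# Toy data standing in for jew1 (rows invented for the example)
toy <- tibble(
  product_type = c("Ring", "Ring", "Ring", "Earrings",
                   "Earrings", "Earrings", "Pendant", "Ring"),
  weekday      = c("Mon", "Mon", "Mon", "Mon",
                   "Tue", "Tue", "Tue", "Tue")
)

# Count sales per weekday and product type, then keep the top type per weekday
top_by_day <- toy %>%
  count(weekday, product_type, name = "count") %>%
  group_by(weekday) %>%
  slice_max(count, n = 1, with_ties = FALSE) %>%
  ungroup()

print(top_by_day)
```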
Deployment
From the analysis above, a few things are clear:
-Gold was the most purchased jewelry metal type, making up the overwhelming majority of all purchases.
-The top months for orders occur in the second half of the year, from July to December. November holds the top spot for total sales, likely due to the holiday/Christmas gifting season.
-Earrings, rings and pendants are the top 3 jewelry types purchased.
-Diamonds, fianit (cubic zirconia) and topaz are the top 3 gemstones purchased.
-The average order amount is $362.22.
-Sales doubled from 2019 to 2020 and again from 2020 to 2021, indicating healthy growth for the company.
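The year-over-year growth in the last point can be computed directly from the date column. The sketch below shows one way to do it, assuming a date3 column like the one in jew1. The dates in `toy` are invented so that the counts double each year and do not come from the real data.

```r
library(dplyr)

# Toy data standing in for jew1 (dates invented: 1, 2 and 4 rows per year)
toy <- tibble(date3 = as.Date(c(
  "2019-03-01",
  "2020-05-10", "2020-11-20",
  "2021-01-05", "2021-06-30", "2021-09-09", "2021-12-24"
)))

# Count rows per year, then divide each year's count by the previous year's
yearly <- toy %>%
  mutate(year = as.integer(format(date3, "%Y"))) %>%
  count(year, name = "orders") %>%
  mutate(growth = orders / dplyr::lag(orders))

print(yearly)
```

A growth value of 2 means the count doubled compared with the previous year; the first year has no previous year, so its growth is NA.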
Ethical Consideration
Data ethics is crucial for protecting privacy rights, ensuring fairness and appropriateness, and promoting trust. Using data fairly, responsibly and respectfully has a significant impact on individuals, communities and society. I noticed two ethical considerations in my online jewelry sales analysis project. The first is about privacy rights, which also makes the data less reliable. When customers accept “terms and conditions”, “cookies” or tracking pop-ups from the companies collecting the data, they might not know that their purchase information would be used in a project like this one. The second is about how representative the data is. Among all jewelry purchases where the product gender was specified and was not unisex, women's jewelry made up the overwhelming majority, and there were also a lot of null values in the data.
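The representativeness concern can be put into numbers using the gender table from the Data Preparation section. The short calculation below uses only the three counts reported there; the variable names are hypothetical.

```r
# Counts taken from the gender table in this report
gender_counts <- c(f = 47379, m = 364, missing = 48168)

# Total number of rows
total <- sum(gender_counts)

# Share of rows with no product gender recorded
share_missing <- gender_counts[["missing"]] / total

# Share of women's jewelry among rows where gender is recorded
share_f_known <- gender_counts[["f"]] /
  (gender_counts[["f"]] + gender_counts[["m"]])

print(round(100 * share_missing, 1))  # 50.2
print(round(100 * share_f_known, 1))  # 99.2
```

About half of the rows have no product gender at all, so any conclusion about which gender buys the most jewelry rests on the other half of the data.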
In my opinion, prioritizing data ethics can lead to better business outcomes in the long term, as well as ensuring that data and analytics are used in ways that are fair, transparent, and accountable.