Wardrobe Economics: Examining Sales, Categories, and Geography for Products (Clothes and Accessories). This comprehensive collection of data opens the door to a wealth of insights waiting to be discovered. Gain a profound understanding of consumer preferences and buying behavior as you immerse yourself in the intricate details of sales transactions, diverse product categories, and the geographical spread of fashion trends.

This walkthrough explains, step by step, how to build visualizations in R with the Wardrobe Product Detail dataset.

1 Data Preparation

1.1 Prerequisites

1.1.1 Importing Libraries

library(dplyr)
library(lubridate)
library(ggplot2)
library(tidyr)

1.1.2 Importing Dataset

wardrobe <- read.csv("Sales_Product_Details.csv")
wardrobe

Let's inspect the first rows of the data with head().

head(wardrobe)

From the inspection above, we can see what the data consist of. The full structure, shown with str(), is:

str(wardrobe)

#> 'data.frame':    30 obs. of  13 variables:
#>  $ Date               : int  20210601 20210602 20210603 20210604 20210605 20210606 20210607 20210608 20210609 20210610 ...
#>  $ Customer_ID        : int  98 92 92 99 66 97 45 81 47 24 ...
#>  $ Product_ID         : int  321 261 264 251 251 304 357 258 260 263 ...
#>  $ Quantity           : int  1 4 1 3 1 3 2 1 3 3 ...
#>  $ Unit_Price         : num  117.3 32.3 36.2 29.9 41.8 ...
#>  $ Sales_Revenue      : num  117.3 129.1 36.2 89.7 41.8 ...
#>  $ Product_Description: chr  "Cycling Jerseys" "Casual Shirts" "Casual Shirts" "Jeans" ...
#>  $ Product_Category   : chr  "Sports" "Menswear" "Menswear" "Menswear" ...
#>  $ Product_Line       : chr  "Tops" "Tops" "Tops" "Trousers" ...
#>  $ Raw_Material       : chr  "Fabrics" "Cotton" "Cotton" "Cotton" ...
#>  $ Region             : chr  "York" "Worcester" "Worcester" "Winchester" ...
#>  $ Latitude           : num  54 52.2 52.2 51.1 51.1 ...
#>  $ Longitude          : num  -1.08 -2.22 -2.22 -1.31 -1.31 ...

1.2 Data Processing

Data processing starts by converting the columns that have an incorrect data type to the correct type, using the lubridate and dplyr libraries, and saving the result under a new name.

wardrobe_clean <- wardrobe %>%
  mutate(Date = ymd(Date),
         Customer_ID = as.character(Customer_ID),
         Product_ID = as.character(Product_ID),
         Product_Category = as.factor(Product_Category),
         Product_Line = as.factor(Product_Line),
         Raw_Material = as.factor(Raw_Material))
head(wardrobe_clean)
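To check what each conversion does, the same mutate() pattern can be tried on a small data frame. This is only a sketch: the `toy` data frame below is an assumption built from the first four rows printed in the str() output, not the full dataset.

```r
library(dplyr)
library(lubridate)

# Toy data: first four rows as printed in the str() output (illustration only)
toy <- data.frame(
  Date = c(20210601L, 20210602L, 20210603L, 20210604L),
  Customer_ID = c(98L, 92L, 92L, 99L),
  Product_Category = c("Sports", "Menswear", "Menswear", "Menswear")
)

toy_clean <- toy %>%
  mutate(
    Date = ymd(Date),                               # integer yyyymmdd -> Date
    Customer_ID = as.character(Customer_ID),        # an identifier, not a quantity
    Product_Category = as.factor(Product_Category)  # limited set of labels
  )

# Each column should now report the intended type
str(toy_clean)
```

Identifiers are stored as character so that they are never averaged or summed by accident, while columns with a small set of repeated labels become factors.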

Next, make sure there are no duplicate rows and no missing values.

sum(duplicated(wardrobe_clean))

#> [1] 0

There are no duplicate rows.

Then check for missing values in each column.

colSums(is.na(wardrobe_clean))

#>                Date         Customer_ID          Product_ID            Quantity 
#>                   0                   0                   0                   0 
#>          Unit_Price       Sales_Revenue Product_Description    Product_Category 
#>                   0                   0                   0                   0 
#>        Product_Line        Raw_Material              Region            Latitude 
#>                   0                   0                   0                   0 
#>           Longitude 
#>                   0

There are no missing values in our data.
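Our data needed no repair, but it may help to see what the two checks report when problems do exist and one possible way to handle them. The sketch below uses a made-up `toy` data frame with one repeated row and one missing price; dropping rows with distinct() and drop_na() is an assumption about the desired handling, not a step from the original analysis.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with one duplicated row and one missing value
toy <- data.frame(
  Product_ID = c("321", "261", "261", "264"),
  Unit_Price = c(117.3, 32.3, 32.3, NA)
)

sum(duplicated(toy))   # number of rows that repeat an earlier row
colSums(is.na(toy))    # number of missing values per column

toy_fixed <- toy %>%
  distinct() %>%   # keep a single copy of each row
  drop_na()        # drop rows that contain any NA

toy_fixed
```

Whether to drop or impute missing values depends on the analysis; dropping is simply the shortest option to show here.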

2 Exploratory Data Analysis

Let's look at the data again with summary() to see how each variable is distributed.

summary(wardrobe_clean)

#>       Date            Customer_ID         Product_ID           Quantity    
#>  Min.   :2021-06-01   Length:30          Length:30          Min.   :1.000  
#>  1st Qu.:2021-06-08   Class :character   Class :character   1st Qu.:1.000  
#>  Median :2021-06-15   Mode  :character   Mode  :character   Median :2.000  
#>  Mean   :2021-06-15                                         Mean   :2.067  
#>  3rd Qu.:2021-06-22                                         3rd Qu.:3.000  
#>  Max.   :2021-06-30                                         Max.   :4.000  
#>    Unit_Price     Sales_Revenue    Product_Description    Product_Category
#>  Min.   : 21.97   Min.   : 21.97   Length:30           Accessories: 2     
#>  1st Qu.: 32.39   1st Qu.: 36.77   Class :character    Menswear   :13     
#>  Median : 36.19   Median : 79.26   Mode  :character    Sports     : 2     
#>  Mean   : 40.50   Mean   : 79.69                       Womenswear :13     
#>  3rd Qu.: 44.34   3rd Qu.:113.76                                          
#>  Max.   :117.31   Max.   :175.49                                          
#>    Product_Line    Raw_Material    Region             Latitude    
#>  Leathers: 1    Cashmere : 4    Length:30          Min.   :50.26  
#>  Shoes   : 1    Cotton   :15    Class :character   1st Qu.:51.06  
#>  Tops    :23    Fabrics  : 1    Mode  :character   Median :52.19  
#>  Trousers: 5    Leather  : 4                       Mean   :52.24  
#>                 Polyester: 2                       3rd Qu.:53.68  
#>                 Wool     : 4                       Max.   :53.96  
#>    Longitude     
#>  Min.   :-5.051  
#>  1st Qu.:-2.647  
#>  Median :-1.490  
#>  Mean   :-2.270  
#>  3rd Qu.:-1.353  
#>  Max.   :-1.080

Our data contain wardrobe sales transactions from 1 June to 30 June 2021. To see how the values are distributed, we can use exploratory visualization, that is, quick plots made to get to know the data. A histogram, created with the function hist(), shows the frequency distribution of a numeric variable.

hist(wardrobe_clean$Sales_Revenue)

According to the histogram, the most frequent sales revenue per transaction is below 50.

Let's count the transactions in each product category.
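To make the bins of a histogram explicit, hist() accepts a `breaks` argument and returns the counts per bin. The sketch below is an illustration only: the `revenue` vector holds the five Sales_Revenue values printed in the str() output, and the bin width of 50 is an assumption.

```r
# Sales_Revenue of the first five rows as printed in the str() output
revenue <- c(117.3, 129.1, 36.2, 89.7, 41.8)

h <- hist(revenue,
          breaks = seq(0, 200, by = 50),   # assumed bin width of 50
          main = "Distribution of Sales Revenue",
          xlab = "Sales Revenue",
          col = "#b8d5e6")

# Number of transactions that fall into each bin
h$counts
```

Reading `h$counts` next to the plot avoids guessing bar heights by eye.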

table(wardrobe_clean$Product_Category)

#> 
#> Accessories    Menswear      Sports  Womenswear 
#>           2          13           2          13

plot(wardrobe_clean$Product_Category)

The chart shows that Womenswear and Menswear have the same number of transactions (13 each).

plot(wardrobe_clean$Raw_Material)

The chart shows that cotton is the most common raw material (15 of the 30 transactions).
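Counts are easier to compare when they are sorted and expressed as shares. As a sketch, the `category` vector below is rebuilt from the counts printed by table() above rather than read from the file; the variable names are assumptions.

```r
# Rebuild the category column from the counts printed by table() above
category <- factor(rep(c("Accessories", "Menswear", "Sports", "Womenswear"),
                       times = c(2, 13, 2, 13)))

counts <- sort(table(category), decreasing = TRUE)   # largest category first
shares <- round(prop.table(counts) * 100, 1)         # percentage of all transactions
shares

barplot(counts,
        main = "Transactions per Product Category",
        ylab = "Number of transactions",
        col = "#b8d5e6")
```

The same two lines work for Raw_Material or any other factor column.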

3 Explanatory Visualization

Explanatory visualization is the stage of creating visualizations to present our data. At this stage, therefore, we will create visualizations that are both attractive and informative.

Let's create a neater and more interesting visualization using the ggplot2 library. First, let's try to improve the bar chart from the Exploratory Data Analysis section above.

Let's create a data frame with the average sales revenue of each product category. We can use the functions group_by(), summarise(), and ungroup(), which together give the same result as the base R function aggregate().

wardrobe_revenue <- wardrobe_clean %>% 
  group_by(Product_Category) %>% 
  summarise(avg_revenue = mean(Sales_Revenue)) %>% 
  ungroup()
wardrobe_revenue
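The equivalence between the dplyr pipeline and aggregate() can be checked on a small example. This is a sketch under assumptions: `toy` contains only the first four rows printed in the str() output, and `by_dplyr` and `by_base` are illustrative names.

```r
library(dplyr)

# Toy data: first four rows as printed in the str() output (illustration only)
toy <- data.frame(
  Product_Category = c("Sports", "Menswear", "Menswear", "Menswear"),
  Sales_Revenue = c(117.3, 129.1, 36.2, 89.7)
)

# dplyr version, as used in this document
by_dplyr <- toy %>%
  group_by(Product_Category) %>%
  summarise(avg_revenue = mean(Sales_Revenue)) %>%
  ungroup()

# Base R version with the formula interface of aggregate()
by_base <- aggregate(Sales_Revenue ~ Product_Category, data = toy, FUN = mean)
names(by_base)[2] <- "avg_revenue"

by_dplyr
by_base
```

Both tables list one row per category with the same averages; the dplyr version returns a tibble, the base version a plain data frame.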

ggplot(data = wardrobe_revenue, mapping = aes(x = reorder(Product_Category, avg_revenue), y = avg_revenue ))+
  geom_col(mapping = aes(fill = avg_revenue)) + 
  scale_fill_gradient(low = "#b8d5e6", high = "#0a7e8c") +
  labs(title = "Average Revenue by Product Category",
       x = "Product Category",
       y = "Average Revenue") +
  theme_minimal() +
  theme(legend.position = "none")

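The chart above shows that, by average sales revenue, the Accessories product category ranks highest. Note that this is the average revenue per transaction, not the number of items sold: Accessories accounts for only 2 of the 30 transactions.

Next, let's create a data frame with the average unit price of each product category, again using group_by(), summarise(), and ungroup().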
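Because an average can rest on very few rows, it can help to report the number of transactions and units sold next to it. The sketch below is illustrative: `toy` holds only the first four rows printed in the str() output, and the column names `n_transactions` and `units_sold` are assumptions.

```r
library(dplyr)

# Toy data: first four rows as printed in the str() output (illustration only)
toy <- data.frame(
  Product_Category = c("Sports", "Menswear", "Menswear", "Menswear"),
  Quantity = c(1, 4, 1, 3),
  Sales_Revenue = c(117.3, 129.1, 36.2, 89.7)
)

category_overview <- toy %>%
  group_by(Product_Category) %>%
  summarise(
    n_transactions = n(),               # how many rows support the average
    units_sold = sum(Quantity),         # what "most sold" would actually measure
    avg_revenue = mean(Sales_Revenue),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_revenue))

category_overview
```

In this toy example Sports has the highest average revenue from a single transaction, while Menswear sold more units, which is why the two measures should be read separately.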

wardrobe_price <- wardrobe_clean %>% 
  group_by(Product_Category) %>% 
  summarise(avg_price = mean(Unit_Price)) %>% 
  ungroup()
wardrobe_price

Then we can plot the chart.

ggplot(data = wardrobe_price, mapping = aes(x = reorder(Product_Category,avg_price), y = avg_price)) +
  geom_col(mapping = aes(fill = avg_price)) + 
  scale_fill_gradient(low = "#b8d5e6", high = "#0a7e8c") +
  labs(title = "Average Price of Product Category",
       x = "Product Category",
       y = "Average Price") +
  theme_minimal() +
  theme(legend.position = "none")

The chart above ranks the product categories by average unit price, not by the number of items sold. Sports ranks highest, with an average price of 69.63.

Now we can build the same chart for each individual product.
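The revenue and price charts share the same recipe: group, average, then draw an ordered bar chart. One way to avoid repeating it is a small helper function. The sketch below is not part of the original analysis: `plot_avg()` and its arguments are hypothetical names, and `toy` contains only the first four rows printed in the str() output.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: averages `value` within each level of `group`
# and draws the bars ordered from lowest to highest average.
plot_avg <- function(data, group, value, title) {
  avg_tbl <- data %>%
    group_by({{ group }}) %>%
    summarise(avg = mean({{ value }}), .groups = "drop")

  ggplot(avg_tbl, aes(x = reorder({{ group }}, avg), y = avg)) +
    geom_col(aes(fill = avg)) +
    scale_fill_gradient(low = "#b8d5e6", high = "#0a7e8c") +
    labs(title = title, x = NULL, y = "Average") +
    theme_minimal() +
    theme(legend.position = "none")
}

# Toy data: first four rows as printed in the str() output (illustration only)
toy <- data.frame(
  Product_Category = c("Sports", "Menswear", "Menswear", "Menswear"),
  Unit_Price = c(117.3, 32.3, 36.2, 29.9)
)

p_price <- plot_avg(toy, Product_Category, Unit_Price,
                    "Average Price of Product Category")
p_price
```

The `{{ }}` operator passes unquoted column names through to dplyr and ggplot2, so the same function could be called with Sales_Revenue or with Product_Description.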

wardrobe_product <- wardrobe_clean %>% 
  group_by(Product_Description) %>% 
  summarise(avg_price = mean(Unit_Price)) %>% 
  ungroup()
wardrobe_product

ggplot(data = wardrobe_product, mapping = aes(x = avg_price, y = reorder(Product_Description, avg_price))) +
  geom_col(mapping = aes(fill = avg_price)) + 
  scale_fill_gradient(low = "#b8d5e6", high = "#0a7e8c")+
  labs(title = "Average Price of Product",
       x = "Average Price",
       y = NULL) +
  theme_minimal() +
  theme(legend.position = "none")

The chart above shows that, by average unit price, Cycling Jerseys rank highest among all products.
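A ranking such as a "top N by average price" can also be extracted as a table instead of being read off the chart. The sketch below uses dplyr's slice_max(); the `toy` data frame holds only the first four rows printed in the str() output, and `n = 2` is chosen just to fit this small example.

```r
library(dplyr)

# Toy data: first four rows as printed in the str() output (illustration only)
toy <- data.frame(
  Product_Description = c("Cycling Jerseys", "Casual Shirts", "Casual Shirts", "Jeans"),
  Unit_Price = c(117.3, 32.3, 36.2, 29.9)
)

top_products <- toy %>%
  group_by(Product_Description) %>%
  summarise(avg_price = mean(Unit_Price), .groups = "drop") %>%
  slice_max(avg_price, n = 2)   # keep the 2 highest averages, highest first

top_products
```

With the full data, `n = 5` would return the five products with the highest average price.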

4 Summary

Based on the explanation above, we can conclude:

- By average sales revenue, the leading product category is Accessories, which has an average price of 42.65. By average price, however, the leading category is Sports, with an average price of 69.63.
- The top 5 products by average price are:
  1. Cycling Jerseys
  2. Underwear
  3. Sweats
  4. Belts
  5. Pyjamas

Data Visualization Preference Wardrobe Consumer

Lela Novi

2023-10-23