Wardrobe Economics: Examining Sales, Categories, and Geography for Products (Clothes and Accessories). This dataset records sales transactions, product categories, and the geographical spread of sales, so it can be used to study consumer preferences and buying behavior.
Below we explain, step by step, how to build visualizations in R with the Wardrobe Product Detail dataset.
#DATA PREPARATION
##1. Prerequisites
Importing Libraries
library(dplyr)
library(lubridate)
library(ggplot2)
library(tidyr)
Importing Dataset
wardrobe <- read.csv("Sales_Product_Details.csv")
wardrobe
#> Date Customer_ID Product_ID Quantity Unit_Price Sales_Revenue
#> 1 20210601 98 321 1 117.30602 117.30602
#> 2 20210602 92 261 4 32.27240 129.08961
#> 3 20210603 92 264 1 36.19336 36.19336
#> 4 20210604 99 251 3 29.91340 89.74021
#> 5 20210605 66 251 1 41.84343 41.84343
#> 6 20210606 97 304 3 49.88752 149.66257
#> 7 20210607 45 357 2 35.41602 70.83203
#> 8 20210608 81 258 1 29.08421 29.08421
#> 9 20210609 47 260 3 44.49808 133.49423
#> 10 20210610 24 263 3 38.49740 115.49219
#> 11 20210611 10 265 4 27.04896 108.19582
#> 12 20210612 45 260 3 28.54090 85.62270
#> 13 20210613 55 260 1 34.74291 34.74291
#> 14 20210614 44 286 3 27.02857 81.08571
#> 15 20210615 97 291 1 34.79245 34.79245
#> 16 20210616 31 265 4 43.87154 175.48615
#> 17 20210617 47 274 1 51.96824 51.96824
#> 18 20210618 47 276 4 33.93177 135.72710
#> 19 20210619 98 280 3 41.41250 124.23751
#> 20 20210620 34 273 1 38.51622 38.51622
#> 21 20210621 90 336 1 21.96581 21.96581
#> 22 20210622 12 293 2 38.71884 77.43768
#> 23 20210623 9 285 3 36.19046 108.57138
#> 24 20210624 66 276 1 54.99430 54.99430
#> 25 20210625 89 277 2 50.79596 101.59191
#> 26 20210626 32 278 2 47.43332 94.86663
#> 27 20210627 15 288 1 50.00262 50.00262
#> 28 20210628 56 262 1 33.47094 33.47094
#> 29 20210629 13 286 1 32.74551 32.74551
#> 30 20210630 91 291 1 31.87911 31.87911
#> Product_Description Product_Category Product_Line Raw_Material Region
#> 1 Cycling Jerseys Sports Tops Fabrics York
#> 2 Casual Shirts Menswear Tops Cotton Worcester
#> 3 Casual Shirts Menswear Tops Cotton Worcester
#> 4 Jeans Menswear Trousers Cotton Winchester
#> 5 Shorts Womenswear Trousers Cotton Winchester
#> 6 Belts Accessories Leathers Leather Wells
#> 7 Ties Accessories Tops Leather Wakefield
#> 8 Polo Shirts Menswear Tops Cotton Wakefield
#> 9 Tshirts Womenswear Tops Cotton Wakefield
#> 10 Formal Shirts Womenswear Tops Wool Winchester
#> 11 Formal Shirts Menswear Tops Wool Wakefield
#> 12 Polo Shirts Menswear Tops Cotton Wakefield
#> 13 Formal Shirts Menswear Tops Cotton York
#> 14 Knitwear Womenswear Tops Cashmere Wells
#> 15 Knitwear Menswear Tops Cashmere Wells
#> 16 Suits Menswear Tops Wool Wakefield
#> 17 Sweats Womenswear Tops Polyester Wakefield
#> 18 Shorts Womenswear Tops Cotton Wakefield
#> 19 Pants Womenswear Trousers Cotton York
#> 20 Pants Womenswear Trousers Cotton Wakefield
#> 21 GolfShoes Sports Shoes Leather Truro
#> 22 Dress Womenswear Tops Polyester Truro
#> 23 Coats Womenswear Tops Cotton Truro
#> 24 Underwear Womenswear Tops Cotton Winchester
#> 25 Pyjamas Womenswear Tops Cotton Truro
#> 26 Pyjamas Menswear Tops Cotton Truro
#> 27 Pants Menswear Trousers Leather Worcester
#> 28 Formal Shirts Menswear Tops Wool York
#> 29 Knitwear Womenswear Tops Cashmere Wells
#> 30 Knitwear Menswear Tops Cashmere Wells
#> Latitude Longitude
#> 1 53.95833 -1.080278
#> 2 52.19200 -2.220000
#> 3 52.19200 -2.220000
#> 4 51.06320 -1.308000
#> 5 51.06320 -1.308000
#> 6 51.20900 -2.647000
#> 7 53.68000 -1.490000
#> 8 53.68000 -1.490000
#> 9 53.68000 -1.490000
#> 10 51.06320 -1.308000
#> 11 53.68000 -1.490000
#> 12 53.68000 -1.490000
#> 13 53.95833 -1.080278
#> 14 51.20900 -2.647000
#> 15 51.20900 -2.647000
#> 16 53.68000 -1.490000
#> 17 53.68000 -1.490000
#> 18 53.68000 -1.490000
#> 19 53.95833 -1.080278
#> 20 53.68000 -1.490000
#> 21 50.26000 -5.051000
#> 22 50.26000 -5.051000
#> 23 50.26000 -5.051000
#> 24 51.06320 -1.308000
#> 25 50.26000 -5.051000
#> 26 50.26000 -5.051000
#> 27 52.19200 -2.220000
#> 28 53.95833 -1.080278
#> 29 51.20900 -2.647000
#> 30 51.20900 -2.647000
Let's inspect the first rows of our data with head():
head(wardrobe)
#> Date Customer_ID Product_ID Quantity Unit_Price Sales_Revenue
#> 1 20210601 98 321 1 117.30602 117.30602
#> 2 20210602 92 261 4 32.27240 129.08961
#> 3 20210603 92 264 1 36.19336 36.19336
#> 4 20210604 99 251 3 29.91340 89.74021
#> 5 20210605 66 251 1 41.84343 41.84343
#> 6 20210606 97 304 3 49.88752 149.66257
#> Product_Description Product_Category Product_Line Raw_Material Region
#> 1 Cycling Jerseys Sports Tops Fabrics York
#> 2 Casual Shirts Menswear Tops Cotton Worcester
#> 3 Casual Shirts Menswear Tops Cotton Worcester
#> 4 Jeans Menswear Trousers Cotton Winchester
#> 5 Shorts Womenswear Trousers Cotton Winchester
#> 6 Belts Accessories Leathers Leather Wells
#> Latitude Longitude
#> 1 53.95833 -1.080278
#> 2 52.19200 -2.220000
#> 3 52.19200 -2.220000
#> 4 51.06320 -1.308000
#> 5 51.06320 -1.308000
#> 6 51.20900 -2.647000
Next, str() shows the structure of the data, that is, the number of rows and the type of each column:
str(wardrobe)
#> 'data.frame': 30 obs. of 13 variables:
#> $ Date : int 20210601 20210602 20210603 20210604 20210605 20210606 20210607 20210608 20210609 20210610 ...
#> $ Customer_ID : int 98 92 92 99 66 97 45 81 47 24 ...
#> $ Product_ID : int 321 261 264 251 251 304 357 258 260 263 ...
#> $ Quantity : int 1 4 1 3 1 3 2 1 3 3 ...
#> $ Unit_Price : num 117.3 32.3 36.2 29.9 41.8 ...
#> $ Sales_Revenue : num 117.3 129.1 36.2 89.7 41.8 ...
#> $ Product_Description: chr "Cycling Jerseys" "Casual Shirts" "Casual Shirts" "Jeans" ...
#> $ Product_Category : chr "Sports" "Menswear" "Menswear" "Menswear" ...
#> $ Product_Line : chr "Tops" "Tops" "Tops" "Trousers" ...
#> $ Raw_Material : chr "Fabrics" "Cotton" "Cotton" "Cotton" ...
#> $ Region : chr "York" "Worcester" "Worcester" "Winchester" ...
#> $ Latitude : num 54 52.2 52.2 51.1 51.1 ...
#> $ Longitude : num -1.08 -2.22 -2.22 -1.31 -1.31 ...
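##2. Data Processing
Data processing starts by converting the columns that have the wrong data type into the correct one, using the lubridate and dplyr libraries, and saving the result under a new name. Date is stored as an integer and becomes a date, the ID columns become character, and the category columns become factors.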
wardrobe_clean <- wardrobe %>%
  mutate(Date = ymd(Date),
         Customer_ID = as.character(Customer_ID),
         Product_ID = as.character(Product_ID),
         Product_Category = as.factor(Product_Category),
         Product_Line = as.factor(Product_Line),
         Raw_Material = as.factor(Raw_Material))
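To make the type conversions concrete, here is a minimal sketch on three rows copied from the printed data above. It uses base R's as.Date() as an assumed stand-in for lubridate's ymd(), so it runs without extra packages; the name `sample_rows` is made up for this illustration.

```r
# Three rows copied from the printed data above (hypothetical name: sample_rows)
sample_rows <- data.frame(
  Date = c(20210601L, 20210602L, 20210603L),
  Customer_ID = c(98L, 92L, 92L),
  Product_Category = c("Sports", "Menswear", "Menswear")
)

# Integer yyyymmdd -> Date: convert to character, then parse with an explicit format
sample_rows$Date <- as.Date(as.character(sample_rows$Date), format = "%Y%m%d")

# IDs are labels, not quantities, so store them as character
sample_rows$Customer_ID <- as.character(sample_rows$Customer_ID)

# Categories repeat a small set of values, so store them as a factor
sample_rows$Product_Category <- as.factor(sample_rows$Product_Category)

str(sample_rows)
```

Once Date is a real date, functions such as range() and summary() treat it as a point in time instead of an eight-digit number.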
Then make sure there are no duplicates or missing values. First, count the duplicated rows:
sum(duplicated(wardrobe_clean))
#> [1] 0
There are no duplicated rows.
Then check for missing values in each column:
colSums(is.na(wardrobe_clean))
#> Date Customer_ID Product_ID Quantity
#> 0 0 0 0
#> Unit_Price Sales_Revenue Product_Description Product_Category
#> 0 0 0 0
#> Product_Line Raw_Material Region Latitude
#> 0 0 0 0
#> Longitude
#> 0
There are no missing values in our data.
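Because the real data is already clean, both checks return zero. As a sketch of what they report when something is wrong, the toy data frame below (invented for this example, not part of the dataset) contains one repeated row and one missing value, followed by one possible cleanup.

```r
# Toy data (hypothetical): row 3 repeats row 2, and one Quantity is missing
toy <- data.frame(
  Product_ID = c("321", "261", "261", "251"),
  Quantity = c(1, 4, 4, NA)
)

n_dup <- sum(duplicated(toy))   # rows identical to an earlier row
n_na <- colSums(is.na(toy))     # missing values per column

# One possible cleanup: drop repeated rows, then rows with any NA
toy_clean <- toy[!duplicated(toy), ]
toy_clean <- toy_clean[complete.cases(toy_clean), ]

n_dup
n_na
toy_clean
```

Whether dropping rows is the right cleanup depends on the analysis; it is shown here only because it is the simplest option.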
Let's check our data again with summary() to see how each column is distributed:
summary(wardrobe_clean)
#> Date Customer_ID Product_ID Quantity
#> Min. :2021-06-01 Length:30 Length:30 Min. :1.000
#> 1st Qu.:2021-06-08 Class :character Class :character 1st Qu.:1.000
#> Median :2021-06-15 Mode :character Mode :character Median :2.000
#> Mean :2021-06-15 Mean :2.067
#> 3rd Qu.:2021-06-22 3rd Qu.:3.000
#> Max. :2021-06-30 Max. :4.000
#> Unit_Price Sales_Revenue Product_Description Product_Category
#> Min. : 21.97 Min. : 21.97 Length:30 Accessories: 2
#> 1st Qu.: 32.39 1st Qu.: 36.77 Class :character Menswear :13
#> Median : 36.19 Median : 79.26 Mode :character Sports : 2
#> Mean : 40.50 Mean : 79.69 Womenswear :13
#> 3rd Qu.: 44.34 3rd Qu.:113.76
#> Max. :117.31 Max. :175.49
#> Product_Line Raw_Material Region Latitude
#> Leathers: 1 Cashmere : 4 Length:30 Min. :50.26
#> Shoes : 1 Cotton :15 Class :character 1st Qu.:51.06
#> Tops :23 Fabrics : 1 Mode :character Median :52.19
#> Trousers: 5 Leather : 4 Mean :52.24
#> Polyester: 2 3rd Qu.:53.68
#> Wool : 4 Max. :53.96
#> Longitude
#> Min. :-5.051
#> 1st Qu.:-2.647
#> Median :-1.490
#> Mean :-2.270
#> 3rd Qu.:-1.353
#> Max. :-1.080
Our data contains wardrobe sales from June 1 to June 30, 2021, as the Date column in the summary above shows. We can look at frequencies with exploratory visualization, that is, visualization whose purpose is to get to know our data. To see a frequency distribution, we can draw a histogram with the hist() function.
hist(wardrobe_clean$Sales_Revenue)
The histogram shows that low sales revenue values are the most frequent: the tallest bar is at the lower end of the range, below 50.
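To read the histogram as numbers instead of bars, hist() can return the bin counts without drawing anything. The sketch below uses the 30 Sales_Revenue values copied from the printed data above; the bin width of 20 is an assumption chosen for this example.

```r
# Sales_Revenue values copied from the printed data above
revenue <- c(117.30602, 129.08961, 36.19336, 89.74021, 41.84343, 149.66257,
             70.83203, 29.08421, 133.49423, 115.49219, 108.19582, 85.62270,
             34.74291, 81.08571, 34.79245, 175.48615, 51.96824, 135.72710,
             124.23751, 38.51622, 21.96581, 77.43768, 108.57138, 54.99430,
             101.59191, 94.86663, 50.00262, 33.47094, 32.74551, 31.87911)

# plot = FALSE returns the bins and counts instead of drawing the chart
h <- hist(revenue, breaks = seq(20, 180, by = 20), plot = FALSE)

# One row per bin: lower edge, upper edge, number of transactions
bins <- data.frame(from = head(h$breaks, -1), to = h$breaks[-1], count = h$counts)
bins

# The bin with the most transactions
bins[which.max(bins$count), ]
```

With these bins, the most populated one is 20 to 40, with 9 of the 30 transactions.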
Let's count the number of transactions in each product category:
table(wardrobe_clean$Product_Category)
#>
#> Accessories Menswear Sports Womenswear
#> 2 13 2 13
plot(wardrobe_clean$Product_Category)
From the chart, Womenswear and Menswear have the same number of transactions (13 each), far more than Accessories and Sports (2 each).
plot(wardrobe_clean$Raw_Material)
From the chart, cotton is the most common raw material.
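Calling plot() on a factor draws the bars in alphabetical order of the levels. As a small sketch, the counts can be sorted first so that the ranking is visible at a glance; the vector below is rebuilt from the Raw_Material counts in the summary() output above.

```r
# Raw_Material counts copied from the summary() output above
raw_material <- rep(
  c("Cashmere", "Cotton", "Fabrics", "Leather", "Polyester", "Wool"),
  times = c(4, 15, 1, 4, 2, 4)
)

# Frequency table, sorted from most to least common
counts <- sort(table(raw_material), decreasing = TRUE)
counts

# Share of each material among all transactions
shares <- round(prop.table(counts), 2)
shares
```

Passing `counts` to barplot() would draw the bars in this sorted order. Cotton accounts for 15 of the 30 transactions, which is half of them.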
##Explanatory Visualization##
Explanatory visualization is the stage where we create visualizations to present our data to others. At this stage we therefore aim for charts that are both attractive and informative.
Let's create a neater and more interesting visualization using the ggplot2 library. First, let's try to improve the bar chart from the exploratory visualization above.
Let's create a data frame with the average sales revenue of each product category. We can use the functions group_by(), summarise(), and ungroup(), which give the same result as the base R function aggregate().
wardrobe_revenue <- wardrobe_clean %>%
  group_by(Product_Category) %>%
  summarise(avg_revenue = mean(Sales_Revenue)) %>%
  ungroup()
wardrobe_revenue
#> # A tibble: 4 × 2
#> Product_Category avg_revenue
#> <fct> <dbl>
#> 1 Accessories 110.
#> 2 Menswear 71.8
#> 3 Sports 69.6
#> 4 Womenswear 84.4
ggplot(data = wardrobe_revenue, mapping = aes(x = Product_Category, y = avg_revenue)) +
  geom_col(mapping = aes(fill = avg_revenue)) +
  scale_fill_gradient(low = "#b8d5e6", high = "#0a7e8c")
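The conclusion from the chart above is that the Accessories category has the highest average sales revenue per transaction (about 110), while Sports has the lowest (about 69.6). Note that this is revenue per transaction, not the number of items sold.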
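The text above says that the group_by(), summarise(), ungroup() chain matches aggregate(). A minimal sketch of the aggregate() version follows, using only the first seven rows of the printed data so that it stays short; the name `first_rows` is made up for this illustration.

```r
# First seven rows of the printed data above (hypothetical name: first_rows)
first_rows <- data.frame(
  Product_Category = c("Sports", "Menswear", "Menswear", "Menswear",
                       "Womenswear", "Accessories", "Accessories"),
  Sales_Revenue = c(117.30602, 129.08961, 36.19336, 89.74021,
                    41.84343, 149.66257, 70.83203)
)

# Formula interface: "mean of Sales_Revenue for each Product_Category"
avg_by_category <- aggregate(Sales_Revenue ~ Product_Category,
                             data = first_rows, FUN = mean)

# Rename the value column to match the name used in the dplyr version
names(avg_by_category)[2] <- "avg_revenue"
avg_by_category
```

Since both Accessories transactions are among these seven rows, the Accessories average here (about 110.25) agrees with the full table. The other categories differ because most of their rows are left out of this sample.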
Then let's create a data frame with the average unit price of each product category. We can again use the functions group_by(), summarise(), and ungroup().
wardrobe_price <- wardrobe_clean %>%
  group_by(Product_Category) %>%
  summarise(avg_price = mean(Unit_Price)) %>%
  ungroup()
wardrobe_price
#> # A tibble: 4 × 2
#> Product_Category avg_price
#> <fct> <dbl>
#> 1 Accessories 42.7
#> 2 Menswear 35.3
#> 3 Sports 69.6
#> 4 Womenswear 40.9
Then we can draw the chart:
ggplot(data = wardrobe_price, mapping = aes(x = Product_Category, y = avg_price)) +
  geom_col(mapping = aes(fill = avg_price)) +
  scale_fill_gradient(low = "#b8d5e6", high = "#0a7e8c")
The conclusion from the chart above is that the Sports category has the highest average unit price (about 69.6), while Menswear has the lowest (about 35.3).
Now we can make the same kind of chart for every product:
wardrobe_product <- wardrobe_clean %>%
  group_by(Product_Description) %>%
  summarise(avg_price = mean(Unit_Price)) %>%
  ungroup()
wardrobe_product
#> # A tibble: 18 × 2
#> Product_Description avg_price
#> <chr> <dbl>
#> 1 Belts 49.9
#> 2 Casual Shirts 34.2
#> 3 Coats 36.2
#> 4 Cycling Jerseys 117.
#> 5 Dress 38.7
#> 6 Formal Shirts 33.4
#> 7 GolfShoes 22.0
#> 8 Jeans 29.9
#> 9 Knitwear 31.6
#> 10 Pants 43.3
#> 11 Polo Shirts 28.8
#> 12 Pyjamas 49.1
#> 13 Shorts 37.9
#> 14 Suits 43.9
#> 15 Sweats 52.0
#> 16 Ties 35.4
#> 17 Tshirts 44.5
#> 18 Underwear 55.0
ggplot(data = wardrobe_product, mapping = aes(x = avg_price, y = Product_Description)) +
  geom_col(mapping = aes(fill = avg_price)) +
  scale_fill_gradient(low = "#b8d5e6", high = "#0a7e8c") +
  labs(title = "Average Price Of Product",
       x = "Average Price",
       y = NULL) +
  theme_minimal() +
  theme(legend.position = "none")
The conclusion from the chart above is that Cycling Jerseys have the highest average price (about 117), while GolfShoes have the lowest (about 22).
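With 18 products listed alphabetically, the highest and lowest prices are easier to find once the table is sorted. The sketch below ranks the products with base R's order(); the values are the rounded averages copied from the tibble above, and the name `product_prices` is made up for this illustration.

```r
# Rounded averages copied from the tibble above (hypothetical name: product_prices)
product_prices <- data.frame(
  Product_Description = c("Belts", "Casual Shirts", "Coats", "Cycling Jerseys",
                          "Dress", "Formal Shirts", "GolfShoes", "Jeans",
                          "Knitwear", "Pants", "Polo Shirts", "Pyjamas",
                          "Shorts", "Suits", "Sweats", "Ties", "Tshirts",
                          "Underwear"),
  avg_price = c(49.9, 34.2, 36.2, 117, 38.7, 33.4, 22.0, 29.9, 31.6,
                43.3, 28.8, 49.1, 37.9, 43.9, 52.0, 35.4, 44.5, 55.0)
)

# Sort rows from the most to the least expensive product
ranked <- product_prices[order(product_prices$avg_price, decreasing = TRUE), ]

head(ranked, 3)   # three most expensive products
tail(ranked, 1)   # least expensive product
```

The three most expensive products are Cycling Jerseys, Underwear, and Sweats, and the least expensive is GolfShoes.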