Data Visualisation

Wardrobe Economics: Examining Sales, Categories, and Geography for Products (Clothes and Accessories). This data set records sales transactions, product categories, and the geographical spread of clothing and accessory sales. It can be used to study consumer preferences and buying behaviour.

This walkthrough explains, step by step, how to build visualisations in R with the Wardrobe Product Detail data set.

# DATA PREPARATION

## 1. Prerequisites

Importing Libraries

library(dplyr)
library(lubridate)
library(ggplot2)
library(tidyr)

Importing Dataset

wardrobe <- read.csv("Sales_Product_Details.csv")
wardrobe

#>        Date Customer_ID Product_ID Quantity Unit_Price Sales_Revenue
#> 1  20210601          98        321        1  117.30602     117.30602
#> 2  20210602          92        261        4   32.27240     129.08961
#> 3  20210603          92        264        1   36.19336      36.19336
#> 4  20210604          99        251        3   29.91340      89.74021
#> 5  20210605          66        251        1   41.84343      41.84343
#> 6  20210606          97        304        3   49.88752     149.66257
#> 7  20210607          45        357        2   35.41602      70.83203
#> 8  20210608          81        258        1   29.08421      29.08421
#> 9  20210609          47        260        3   44.49808     133.49423
#> 10 20210610          24        263        3   38.49740     115.49219
#> 11 20210611          10        265        4   27.04896     108.19582
#> 12 20210612          45        260        3   28.54090      85.62270
#> 13 20210613          55        260        1   34.74291      34.74291
#> 14 20210614          44        286        3   27.02857      81.08571
#> 15 20210615          97        291        1   34.79245      34.79245
#> 16 20210616          31        265        4   43.87154     175.48615
#> 17 20210617          47        274        1   51.96824      51.96824
#> 18 20210618          47        276        4   33.93177     135.72710
#> 19 20210619          98        280        3   41.41250     124.23751
#> 20 20210620          34        273        1   38.51622      38.51622
#> 21 20210621          90        336        1   21.96581      21.96581
#> 22 20210622          12        293        2   38.71884      77.43768
#> 23 20210623           9        285        3   36.19046     108.57138
#> 24 20210624          66        276        1   54.99430      54.99430
#> 25 20210625          89        277        2   50.79596     101.59191
#> 26 20210626          32        278        2   47.43332      94.86663
#> 27 20210627          15        288        1   50.00262      50.00262
#> 28 20210628          56        262        1   33.47094      33.47094
#> 29 20210629          13        286        1   32.74551      32.74551
#> 30 20210630          91        291        1   31.87911      31.87911
#>    Product_Description Product_Category Product_Line Raw_Material     Region
#> 1      Cycling Jerseys           Sports         Tops      Fabrics       York
#> 2        Casual Shirts         Menswear         Tops       Cotton  Worcester
#> 3        Casual Shirts         Menswear         Tops       Cotton  Worcester
#> 4                Jeans         Menswear     Trousers       Cotton Winchester
#> 5               Shorts       Womenswear     Trousers       Cotton Winchester
#> 6                Belts      Accessories     Leathers      Leather      Wells
#> 7                 Ties      Accessories         Tops      Leather  Wakefield
#> 8          Polo Shirts         Menswear         Tops       Cotton  Wakefield
#> 9              Tshirts       Womenswear         Tops       Cotton  Wakefield
#> 10       Formal Shirts       Womenswear         Tops         Wool Winchester
#> 11       Formal Shirts         Menswear         Tops         Wool  Wakefield
#> 12         Polo Shirts         Menswear         Tops       Cotton  Wakefield
#> 13       Formal Shirts         Menswear         Tops       Cotton       York
#> 14            Knitwear       Womenswear         Tops     Cashmere      Wells
#> 15            Knitwear         Menswear         Tops     Cashmere      Wells
#> 16               Suits         Menswear         Tops         Wool  Wakefield
#> 17              Sweats       Womenswear         Tops    Polyester  Wakefield
#> 18              Shorts       Womenswear         Tops       Cotton  Wakefield
#> 19               Pants       Womenswear     Trousers       Cotton       York
#> 20               Pants       Womenswear     Trousers       Cotton  Wakefield
#> 21           GolfShoes           Sports        Shoes      Leather      Truro
#> 22               Dress       Womenswear         Tops    Polyester      Truro
#> 23               Coats       Womenswear         Tops       Cotton      Truro
#> 24           Underwear       Womenswear         Tops       Cotton Winchester
#> 25             Pyjamas       Womenswear         Tops       Cotton      Truro
#> 26             Pyjamas         Menswear         Tops       Cotton      Truro
#> 27               Pants         Menswear     Trousers      Leather  Worcester
#> 28       Formal Shirts         Menswear         Tops         Wool       York
#> 29            Knitwear       Womenswear         Tops     Cashmere      Wells
#> 30            Knitwear         Menswear         Tops     Cashmere      Wells
#>    Latitude Longitude
#> 1  53.95833 -1.080278
#> 2  52.19200 -2.220000
#> 3  52.19200 -2.220000
#> 4  51.06320 -1.308000
#> 5  51.06320 -1.308000
#> 6  51.20900 -2.647000
#> 7  53.68000 -1.490000
#> 8  53.68000 -1.490000
#> 9  53.68000 -1.490000
#> 10 51.06320 -1.308000
#> 11 53.68000 -1.490000
#> 12 53.68000 -1.490000
#> 13 53.95833 -1.080278
#> 14 51.20900 -2.647000
#> 15 51.20900 -2.647000
#> 16 53.68000 -1.490000
#> 17 53.68000 -1.490000
#> 18 53.68000 -1.490000
#> 19 53.95833 -1.080278
#> 20 53.68000 -1.490000
#> 21 50.26000 -5.051000
#> 22 50.26000 -5.051000
#> 23 50.26000 -5.051000
#> 24 51.06320 -1.308000
#> 25 50.26000 -5.051000
#> 26 50.26000 -5.051000
#> 27 52.19200 -2.220000
#> 28 53.95833 -1.080278
#> 29 51.20900 -2.647000
#> 30 51.20900 -2.647000

Let’s inspect the first six rows of our data with head().

head(wardrobe)

#>       Date Customer_ID Product_ID Quantity Unit_Price Sales_Revenue
#> 1 20210601          98        321        1  117.30602     117.30602
#> 2 20210602          92        261        4   32.27240     129.08961
#> 3 20210603          92        264        1   36.19336      36.19336
#> 4 20210604          99        251        3   29.91340      89.74021
#> 5 20210605          66        251        1   41.84343      41.84343
#> 6 20210606          97        304        3   49.88752     149.66257
#>   Product_Description Product_Category Product_Line Raw_Material     Region
#> 1     Cycling Jerseys           Sports         Tops      Fabrics       York
#> 2       Casual Shirts         Menswear         Tops       Cotton  Worcester
#> 3       Casual Shirts         Menswear         Tops       Cotton  Worcester
#> 4               Jeans         Menswear     Trousers       Cotton Winchester
#> 5              Shorts       Womenswear     Trousers       Cotton Winchester
#> 6               Belts      Accessories     Leathers      Leather      Wells
#>   Latitude Longitude
#> 1 53.95833 -1.080278
#> 2 52.19200 -2.220000
#> 3 52.19200 -2.220000
#> 4 51.06320 -1.308000
#> 5 51.06320 -1.308000
#> 6 51.20900 -2.647000

To see which columns the data consists of and how each one is stored, check its structure with str():

str(wardrobe)

#> 'data.frame':    30 obs. of  13 variables:
#>  $ Date               : int  20210601 20210602 20210603 20210604 20210605 20210606 20210607 20210608 20210609 20210610 ...
#>  $ Customer_ID        : int  98 92 92 99 66 97 45 81 47 24 ...
#>  $ Product_ID         : int  321 261 264 251 251 304 357 258 260 263 ...
#>  $ Quantity           : int  1 4 1 3 1 3 2 1 3 3 ...
#>  $ Unit_Price         : num  117.3 32.3 36.2 29.9 41.8 ...
#>  $ Sales_Revenue      : num  117.3 129.1 36.2 89.7 41.8 ...
#>  $ Product_Description: chr  "Cycling Jerseys" "Casual Shirts" "Casual Shirts" "Jeans" ...
#>  $ Product_Category   : chr  "Sports" "Menswear" "Menswear" "Menswear" ...
#>  $ Product_Line       : chr  "Tops" "Tops" "Tops" "Trousers" ...
#>  $ Raw_Material       : chr  "Fabrics" "Cotton" "Cotton" "Cotton" ...
#>  $ Region             : chr  "York" "Worcester" "Worcester" "Winchester" ...
#>  $ Latitude           : num  54 52.2 52.2 51.1 51.1 ...
#>  $ Longitude          : num  -1.08 -2.22 -2.22 -1.31 -1.31 ...

## 2. Data Processing

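Data processing starts by converting the columns that have the wrong data type to the correct one, using the lubridate and dplyr libraries, and saving the result under a new name. Date is stored as an integer and the ID columns as numbers, while the category columns are plain text.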

# Convert columns that were read with the wrong type
wardrobe_clean <- wardrobe %>%
  mutate(Date = ymd(Date),                        # integer yyyymmdd -> Date
         Customer_ID = as.character(Customer_ID), # IDs are labels, not numbers
         Product_ID = as.character(Product_ID),
         Product_Category = as.factor(Product_Category),
         Product_Line = as.factor(Product_Line),
         Raw_Material = as.factor(Raw_Material))
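To make the type conversion easier to follow, here is a minimal sketch in base R on a small toy data frame. The `toy` data frame and the helper names `col_classes` and `n_bad_dates` are assumptions for illustration; `as.Date()` with the `"%Y%m%d"` format is used here as a base R stand-in for `ymd()`, not as the method used above.

```r
# Toy data frame mirroring the integer Date and ID columns shown above
toy <- data.frame(
  Date = c(20210601L, 20210602L, 20210603L),
  Customer_ID = c(98L, 92L, 92L),
  Product_Category = c("Sports", "Menswear", "Menswear")
)

# Base R stand-in for lubridate::ymd() on yyyymmdd integers
toy$Date <- as.Date(as.character(toy$Date), format = "%Y%m%d")
toy$Customer_ID <- as.character(toy$Customer_ID)
toy$Product_Category <- as.factor(toy$Product_Category)

# Confirm that each column now has the intended class
col_classes <- sapply(toy, class)
print(col_classes)

# A date that failed to parse would become NA, so count them
n_bad_dates <- sum(is.na(toy$Date))
print(n_bad_dates)
```

Checking the classes right after the conversion is a cheap way to catch a column that was silently left unchanged.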

Next, make sure there are no duplicate rows and no missing values. Start with duplicates:

sum(duplicated(wardrobe_clean))

#> [1] 0

There are no duplicate rows.

Then check for missing values in each column:

colSums(is.na(wardrobe_clean))

#>                Date         Customer_ID          Product_ID            Quantity 
#>                   0                   0                   0                   0 
#>          Unit_Price       Sales_Revenue Product_Description    Product_Category 
#>                   0                   0                   0                   0 
#>        Product_Line        Raw_Material              Region            Latitude 
#>                   0                   0                   0                   0 
#>           Longitude 
#>                   0

There are no missing values in our data.
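Because both checks return zero here, it can be hard to see what a problem would look like. The sketch below runs the same checks on a deliberately dirty toy data frame. The `toy` values (including the `NA`) are made up for illustration and are not rows from the wardrobe data.

```r
# Deliberately dirty toy data: row 2 repeats row 1, row 4 has a missing value
toy <- data.frame(
  Product_ID = c("261", "261", "264", "251"),
  Quantity   = c(4, 4, 1, NA)
)

# Number of rows that exactly repeat an earlier row
n_duplicates <- sum(duplicated(toy))
print(n_duplicates)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# One possible clean-up: keep the first copy of each row and drop incomplete rows
toy_clean <- toy[!duplicated(toy) & complete.cases(toy), ]
print(toy_clean)
```

Dropping incomplete rows is only one option. Depending on the analysis, filling in or flagging missing values may be more appropriate.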

Exploratory Data Analysis

Let’s look at our data again with summary() to see how each column is distributed.

summary(wardrobe_clean)

#>       Date            Customer_ID         Product_ID           Quantity    
#>  Min.   :2021-06-01   Length:30          Length:30          Min.   :1.000  
#>  1st Qu.:2021-06-08   Class :character   Class :character   1st Qu.:1.000  
#>  Median :2021-06-15   Mode  :character   Mode  :character   Median :2.000  
#>  Mean   :2021-06-15                                         Mean   :2.067  
#>  3rd Qu.:2021-06-22                                         3rd Qu.:3.000  
#>  Max.   :2021-06-30                                         Max.   :4.000  
#>    Unit_Price     Sales_Revenue    Product_Description    Product_Category
#>  Min.   : 21.97   Min.   : 21.97   Length:30           Accessories: 2     
#>  1st Qu.: 32.39   1st Qu.: 36.77   Class :character    Menswear   :13     
#>  Median : 36.19   Median : 79.26   Mode  :character    Sports     : 2     
#>  Mean   : 40.50   Mean   : 79.69                       Womenswear :13     
#>  3rd Qu.: 44.34   3rd Qu.:113.76                                          
#>  Max.   :117.31   Max.   :175.49                                          
#>    Product_Line    Raw_Material    Region             Latitude    
#>  Leathers: 1    Cashmere : 4    Length:30          Min.   :50.26  
#>  Shoes   : 1    Cotton   :15    Class :character   1st Qu.:51.06  
#>  Tops    :23    Fabrics  : 1    Mode  :character   Median :52.19  
#>  Trousers: 5    Leather  : 4                       Mean   :52.24  
#>                 Polyester: 2                       3rd Qu.:53.68  
#>                 Wool     : 4                       Max.   :53.96  
#>    Longitude     
#>  Min.   :-5.051  
#>  1st Qu.:-2.647  
#>  Median :-1.490  
#>  Mean   :-2.270  
#>  3rd Qu.:-1.353  
#>  Max.   :-1.080

Our data contains wardrobe sales from 1 to 30 June 2021. Exploratory visualization means plotting the data in order to get to know it. To see the frequency distribution of a numeric column, we can draw a histogram with the hist() function.

hist(wardrobe_clean$Sales_Revenue)

The most frequent sales revenue values are at the low end, under 50, although revenue ranges up to 175.49 and the median is 79.26.
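The default histogram has no title and picks its own bins. As a small sketch of how this could be controlled, the code below sets explicit breaks and labels. It uses only the first ten Sales_Revenue values from the printed data, and the bin width of 50 and the label text are choices made for this example.

```r
# First ten Sales_Revenue values from the printed data above
revenue <- c(117.30602, 129.08961, 36.19336, 89.74021, 41.84343,
             149.66257, 70.83203, 29.08421, 133.49423, 115.49219)

# Explicit bins of width 50 make the bars easy to read off
h <- hist(revenue,
          breaks = seq(0, 200, by = 50),
          main = "Sales revenue per transaction",
          xlab = "Sales revenue",
          col = "#b8d5e6")

# Number of transactions in each bin: (0,50], (50,100], (100,150], (150,200]
print(h$counts)
```

Storing the result of hist() gives access to the bin counts, which is handy for checking what the bars actually show.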

Let’s count the number of transactions in each product category.

table(wardrobe_clean$Product_Category)

#> 
#> Accessories    Menswear      Sports  Womenswear 
#>           2          13           2          13

plot(wardrobe_clean$Product_Category)

The chart shows that Womenswear and Menswear have the same number of transactions (13 each), far more than Accessories and Sports (2 each).
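Counts are easier to compare as percentages. The sketch below rebuilds the category counts from the table() output above (2, 13, 2, 13) and converts them to shares with prop.table(). The variable names `category`, `counts`, and `shares` are assumptions for this example.

```r
# Rebuild the category column from the counts printed above
category <- factor(rep(c("Accessories", "Menswear", "Sports", "Womenswear"),
                       times = c(2, 13, 2, 13)))

# Frequency table and percentage share of each category
counts <- table(category)
shares <- round(100 * prop.table(counts), 1)
print(shares)

# Bar chart with the categories sorted from most to least frequent
barplot(sort(counts, decreasing = TRUE),
        main = "Transactions per product category",
        ylab = "Number of transactions",
        col = "#b8d5e6")
```

Sorting the bars before plotting makes the ranking visible at a glance, which the default alphabetical order does not.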

plot(wardrobe_clean$Raw_Material)

The chart shows that cotton is the most common raw material, with 15 of the 30 transactions.

## Explanatory Visualization

Explanatory visualization is the stage where we create charts to present our data to others. At this stage the goal is a display that is both attractive and informative.

Let’s create a neater and more interesting visualization using the ggplot2 library. First, let’s try to improve the bar chart in the Exploratory Data Analysis section above.

Let’s create a data frame of the average sales revenue for each product category. We can use the functions group_by(), summarise(), and ungroup(), which together give the same result as the base R function aggregate().

wardrobe_revenue <- wardrobe_clean %>% 
  group_by(Product_Category) %>% 
  summarise(avg_revenue = mean(Sales_Revenue)) %>% 
  ungroup()
wardrobe_revenue

#> # A tibble: 4 × 2
#>   Product_Category avg_revenue
#>   <fct>                  <dbl>
#> 1 Accessories            110. 
#> 2 Menswear                71.8
#> 3 Sports                  69.6
#> 4 Womenswear              84.4
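For comparison, the same kind of grouped average can be sketched with base R's aggregate(). The toy data frame below holds only the first six rows of the printed data, so its averages differ from the full-data result above. The names `toy` and `avg_by_category` are assumptions for this example.

```r
# First six rows of the printed data (category and revenue only)
toy <- data.frame(
  Product_Category = c("Sports", "Menswear", "Menswear",
                       "Menswear", "Womenswear", "Accessories"),
  Sales_Revenue = c(117.30602, 129.08961, 36.19336,
                    89.74021, 41.84343, 149.66257)
)

# Formula interface: average Sales_Revenue within each Product_Category
avg_by_category <- aggregate(Sales_Revenue ~ Product_Category,
                             data = toy, FUN = mean)

# Rename the result column to match the dplyr version
names(avg_by_category)[2] <- "avg_revenue"
print(avg_by_category)
```

The dplyr pipeline reads more naturally as a sequence of steps, while aggregate() needs no extra packages. Which one to use is mostly a matter of taste.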

ggplot(data = wardrobe_revenue, mapping = aes(x = Product_Category, y = avg_revenue)) +
  geom_col(mapping = aes(fill = avg_revenue)) + 
  scale_fill_gradient(low = "#b8d5e6", high = "#0a7e8c")

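The chart shows that Accessories has the highest average sales revenue per transaction (about 110), even though it has only 2 transactions. Note that this measures average revenue, not the number of items sold.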
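The chart can be made easier to read by sorting the bars and adding a title and value labels. The sketch below is one possible way to do this, not the only one. It rebuilds `wardrobe_revenue` from the rounded averages printed above, and the title and axis text are assumptions for this example.

```r
library(ggplot2)

# Rounded averages copied from the printed tibble above
wardrobe_revenue <- data.frame(
  Product_Category = c("Accessories", "Menswear", "Sports", "Womenswear"),
  avg_revenue = c(110, 71.8, 69.6, 84.4)
)

# reorder() sorts the categories by descending average revenue
p <- ggplot(data = wardrobe_revenue,
            mapping = aes(x = reorder(Product_Category, -avg_revenue),
                          y = avg_revenue)) +
  geom_col(mapping = aes(fill = avg_revenue)) +
  geom_text(mapping = aes(label = avg_revenue), vjust = -0.5) +  # value on top of each bar
  scale_fill_gradient(low = "#b8d5e6", high = "#0a7e8c") +
  labs(title = "Average Sales Revenue by Product Category",
       x = NULL,
       y = "Average sales revenue") +
  theme_minimal() +
  theme(legend.position = "none")  # the fill repeats the bar height, so no legend is needed
p
```

Sorting the bars puts the highest category first, so the reader does not have to compare bar heights by eye.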

Then let’s create a data frame of the average unit price for each product category. Again we can use group_by(), summarise(), and ungroup().

wardrobe_price <- wardrobe_clean %>% 
  group_by(Product_Category) %>% 
  summarise(avg_price = mean(Unit_Price)) %>% 
  ungroup()
wardrobe_price

#> # A tibble: 4 × 2
#>   Product_Category avg_price
#>   <fct>                <dbl>
#> 1 Accessories           42.7
#> 2 Menswear              35.3
#> 3 Sports                69.6
#> 4 Womenswear            40.9

Then we can draw the chart.

ggplot(data = wardrobe_price, mapping = aes(x = Product_Category, y = avg_price)) +
  geom_col(mapping = aes(fill = avg_price)) + 
  scale_fill_gradient(low = "#b8d5e6", high = "#0a7e8c")

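The chart shows that Sports has the highest average unit price (about 69.6), followed by Accessories (42.7), Womenswear (40.9), and Menswear (35.3). This is a statement about price, not about how many items were sold.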
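Average price and units sold answer different questions, so it helps to look at them side by side. The sketch below does this for the first six rows of the printed data only, so the numbers differ from the full-data averages above. The names `toy`, `units_sold`, `avg_price`, and `comparison` are assumptions for this example.

```r
# First six rows of the printed data
toy <- data.frame(
  Product_Category = c("Sports", "Menswear", "Menswear",
                       "Menswear", "Womenswear", "Accessories"),
  Quantity = c(1, 4, 1, 3, 1, 3),
  Unit_Price = c(117.30602, 32.27240, 36.19336,
                 29.91340, 41.84343, 49.88752)
)

# Total units sold and average unit price per category
units_sold <- aggregate(Quantity ~ Product_Category, data = toy, FUN = sum)
avg_price  <- aggregate(Unit_Price ~ Product_Category, data = toy, FUN = mean)

# Combine both measures into one table
comparison <- merge(units_sold, avg_price, by = "Product_Category")
names(comparison) <- c("Product_Category", "units_sold", "avg_price")
print(comparison)

# The category that sells the most units need not be the most expensive one
top_units <- as.character(comparison$Product_Category[which.max(comparison$units_sold)])
top_price <- as.character(comparison$Product_Category[which.max(comparison$avg_price)])
print(c(most_units = top_units, highest_price = top_price))
```

In this small sample Menswear sells the most units while Sports has the highest average price, which is why "most sold" and "most expensive" should be reported separately.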

Now we can chart the average price of every product.

wardrobe_product <- wardrobe_clean %>% 
  group_by(Product_Description) %>% 
  summarise(avg_price = mean(Unit_Price)) %>% 
  ungroup()
wardrobe_product

#> # A tibble: 18 × 2
#>    Product_Description avg_price
#>    <chr>                   <dbl>
#>  1 Belts                    49.9
#>  2 Casual Shirts            34.2
#>  3 Coats                    36.2
#>  4 Cycling Jerseys         117. 
#>  5 Dress                    38.7
#>  6 Formal Shirts            33.4
#>  7 GolfShoes                22.0
#>  8 Jeans                    29.9
#>  9 Knitwear                 31.6
#> 10 Pants                    43.3
#> 11 Polo Shirts              28.8
#> 12 Pyjamas                  49.1
#> 13 Shorts                   37.9
#> 14 Suits                    43.9
#> 15 Sweats                   52.0
#> 16 Ties                     35.4
#> 17 Tshirts                  44.5
#> 18 Underwear                55.0

ggplot(data = wardrobe_product, mapping = aes(x = avg_price, y = Product_Description)) +
  geom_col(mapping = aes(fill = avg_price)) + 
  scale_fill_gradient(low = "#b8d5e6", high = "#0a7e8c")+
  labs(title = "Average Price Of Product",
       x = "Average Price",
       y = NULL) +
  theme_minimal() +
  theme(legend.position = "none")

The chart shows that Cycling Jerseys has by far the highest average price (about 117), while GolfShoes has the lowest (about 22). Most other products fall between roughly 29 and 55.
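With 18 products on one chart, a short ranked table can complement the plot. The sketch below picks the five most expensive products in base R, using the rounded averages printed above. The name `top5` is an assumption for this example.

```r
# Rounded average prices copied from the printed tibble above
wardrobe_product <- data.frame(
  Product_Description = c("Belts", "Casual Shirts", "Coats", "Cycling Jerseys",
                          "Dress", "Formal Shirts", "GolfShoes", "Jeans",
                          "Knitwear", "Pants", "Polo Shirts", "Pyjamas",
                          "Shorts", "Suits", "Sweats", "Ties",
                          "Tshirts", "Underwear"),
  avg_price = c(49.9, 34.2, 36.2, 117, 38.7, 33.4, 22.0, 29.9, 31.6,
                43.3, 28.8, 49.1, 37.9, 43.9, 52.0, 35.4, 44.5, 55.0)
)

# Sort by descending average price and keep the first five rows
top5 <- head(wardrobe_product[order(-wardrobe_product$avg_price), ], 5)
print(top5)
```

The same ordering idea can be applied before plotting, so that the bars appear from the most to the least expensive product.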


Lela Novi

2023-10-23
