This is an exploratory data analysis (EDA) project that analyzes and visualizes the Superstore data set. The objective is to identify trends and patterns in current retail sales and to determine which market sectors are running at a loss and which are highly profitable.

LOADING LIBRARIES

library(ggplot2)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v tibble  3.1.4     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1
## v purrr   0.3.4
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()


READING DATASET

store <- read.csv("SampleSuperstore.csv")
View(store)

str(store)
## 'data.frame':    9994 obs. of  13 variables:
##  $ Ship.Mode   : chr  "Second Class" "Second Class" "Second Class" "Standard Class" ...
##  $ Segment     : chr  "Consumer" "Consumer" "Corporate" "Consumer" ...
##  $ Country     : chr  "United States" "United States" "United States" "United States" ...
##  $ City        : chr  "Henderson" "Henderson" "Los Angeles" "Fort Lauderdale" ...
##  $ State       : chr  "Kentucky" "Kentucky" "California" "Florida" ...
##  $ Postal.Code : int  42420 42420 90036 33311 33311 90032 90032 90032 90032 90032 ...
##  $ Region      : chr  "South" "South" "West" "South" ...
##  $ Category    : chr  "Furniture" "Furniture" "Office Supplies" "Furniture" ...
##  $ Sub.Category: chr  "Bookcases" "Chairs" "Labels" "Tables" ...
##  $ Sales       : num  262 731.9 14.6 957.6 22.4 ...
##  $ Quantity    : int  2 3 2 5 2 7 4 6 3 5 ...
##  $ Discount    : num  0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
##  $ Profit      : num  41.91 219.58 6.87 -383.03 2.52 ...
summary(store)
##   Ship.Mode           Segment            Country              City          
##  Length:9994        Length:9994        Length:9994        Length:9994       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##     State            Postal.Code       Region            Category        
##  Length:9994        Min.   : 1040   Length:9994        Length:9994       
##  Class :character   1st Qu.:23223   Class :character   Class :character  
##  Mode  :character   Median :56431   Mode  :character   Mode  :character  
##                     Mean   :55190                                        
##                     3rd Qu.:90008                                        
##                     Max.   :99301                                        
##  Sub.Category           Sales              Quantity        Discount     
##  Length:9994        Min.   :    0.444   Min.   : 1.00   Min.   :0.0000  
##  Class :character   1st Qu.:   17.280   1st Qu.: 2.00   1st Qu.:0.0000  
##  Mode  :character   Median :   54.490   Median : 3.00   Median :0.2000  
##                     Mean   :  229.858   Mean   : 3.79   Mean   :0.1562  
##                     3rd Qu.:  209.940   3rd Qu.: 5.00   3rd Qu.:0.2000  
##                     Max.   :22638.480   Max.   :14.00   Max.   :0.8000  
##      Profit         
##  Min.   :-6599.978  
##  1st Qu.:    1.729  
##  Median :    8.666  
##  Mean   :   28.657  
##  3rd Qu.:   29.364  
##  Max.   : 8399.976

DATA PREPARATION AND CLEANING

## [1] FALSE
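
The chunk that produced the `FALSE` above is not shown, and `store_1`, which every plot below uses, is never defined in the visible code. The following is a minimal sketch of what such a step might look like. It assumes the output came from a missing-value check and that `store_1` is simply the checked copy of `store`; both are assumptions, and the small data frame is a hypothetical stand-in for the real file.

```r
# Toy stand-in for `store` (hypothetical values; column names match the real data)
store <- data.frame(
  Ship.Mode = c("Second Class", "Standard Class", "Standard Class"),
  Sales     = c(261.96, 957.58, 22.37),
  Profit    = c(41.91, -383.03, 2.52)
)

# Assumption: the hidden chunk tested for missing values;
# FALSE means no NA was found anywhere in the data frame
has_missing <- any(is.na(store))
print(has_missing)

# Assumption: store_1 is the cleaned copy that the plots refer to
store_1 <- store
```

With the real data, the toy data frame would be replaced by the `store` object read from `SampleSuperstore.csv`.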

VISUALIZATIONS

1) Sales vs Quantity

In the graph below, we see that most sales come from the Standard Class shipment mode.

#Let's analyze patterns
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
## 1) Sales vs Quantity
sale_v_quantity <- ggplot(data = store_1, aes(x = Quantity, y = Sales, fill = Ship.Mode)) +
  geom_bar(stat = "identity")

ggplotly(sale_v_quantity)

2) Sales vs Profit

In the graph below, we observe that most of the profits and losses come from the Standard Class shipment mode. No higher range of profit is visible in this plot.

sale_v_profit <- ggplot(data = store_1, aes(x = Sales, y = Profit, color = Ship.Mode)) + 
  geom_point()

ggplotly(sale_v_profit)

3) Sales vs Discount

Let’s see how sales are affected when discounts are offered.

sale_v_discount <- ggplot() + 
  geom_point(data = store_1, aes(x = Discount, y = Sales, color = Ship.Mode))

ggplotly(sale_v_discount)

It is evident from the graph that discounts attract more sales, mostly for Standard Class shipments.

4) Profit vs Discount

Let’s see whether redeemed discounts led to profits.

profit_v_discount <- ggplot() +
  geom_bar(data = store_1, aes(x = Discount, y = Profit, fill = Ship.Mode), stat = "identity")

ggplotly(profit_v_discount)

Here we clearly see that as more discounts are offered and redeemed, profits fall. Products with no discount show a high range of profits, but as the discount increases we see more and more losses and hardly any profit.
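
The relationship between discount and profit can also be checked numerically rather than read off the stacked bars. The sketch below sums profit at each discount level using base R's `aggregate()`. The data frame is a hypothetical stand-in for `store_1`, so the numbers are illustrative only.

```r
# Toy stand-in for `store_1` (hypothetical values; column names match the real data)
store_1 <- data.frame(
  Discount = c(0, 0, 0.2, 0.2, 0.45, 0.8),
  Profit   = c(41.91, 219.58, 2.52, 14.17, -383.03, -120.50)
)

# Total profit at each distinct discount level
profit_by_discount <- aggregate(Profit ~ Discount, data = store_1, FUN = sum)

# Order by discount so the trend can be read top to bottom
profit_by_discount <- profit_by_discount[order(profit_by_discount$Discount), ]
print(profit_by_discount)
```

Run against the real data, a table like this would show directly at which discount level total profit turns negative.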

Now let’s see whether the same pattern holds across the other segments.

# Plot for sub-category vs profit
category_v_profit <- ggplot() + 
  geom_bar(data = store_1, aes(x = Sub.Category, y = Profit, fill = Region), stat = "identity") + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

ggplotly(category_v_profit)

We see the largest losses in the Binders sub-category, mainly in the Central region. The Machines and Tables sub-categories also show losses.
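
To list the loss-making combinations explicitly, profit can be summed per sub-category and region and then filtered to negative totals. This is a minimal sketch in base R on a hypothetical stand-in for `store_1`; the values are invented for illustration and are not taken from the data set.

```r
# Toy stand-in for `store_1` (hypothetical values; column names match the real data)
store_1 <- data.frame(
  Sub.Category = c("Binders", "Binders", "Tables", "Machines", "Chairs", "Binders"),
  Region       = c("Central", "Central", "East", "South", "West", "West"),
  Profit       = c(-50, -30, -20, -10, 40, 25)
)

# Total profit for every sub-category / region pair
profit_by_group <- aggregate(Profit ~ Sub.Category + Region, data = store_1, FUN = sum)

# Keep only the pairs with a net loss, largest loss first
losses <- profit_by_group[profit_by_group$Profit < 0, ]
losses <- losses[order(losses$Profit), ]
print(losses)
```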

category_v_sales <- ggplot() + 
  geom_bar(data = store_1, aes(x = Category, y = Sales, fill = Region), stat = "identity") + 
  theme(axis.text.x = element_text(angle = 90 , vjust = 0.5, hjust = 1))

ggplotly(category_v_sales)

Technology accounts for the most sales, followed by Furniture and Office Supplies. Most sales come from the West and East regions.

category_v_profit_1 <- ggplot() + 
  geom_bar(data = store_1, aes(x = Category, y = Profit, fill = Region), stat = "identity") + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

ggplotly(category_v_profit_1)

The Furniture category incurs larger losses than the Technology and Office Supplies categories. Because sales in this category range from low to high, it still shows a profit.

# Sales vs profit, colored by category

sales_v_profit <- ggplot() + 
  geom_point(data = store_1, aes(x = Sales , y = Profit, color = Category))

ggplotly(sales_v_profit)

From the graphs above, we conclude that the sales-to-profit ratio is the same in every category, regardless of how the data are grouped.
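
The ratio can be computed per category instead of judged from the scatter plot. The sketch below totals sales and profit by category and divides one by the other. It runs on a hypothetical stand-in for `store_1`, and the `Margin` column name is introduced here for illustration only.

```r
# Toy stand-in for `store_1` (hypothetical values; column names match the real data)
store_1 <- data.frame(
  Category = c("Furniture", "Furniture", "Technology", "Technology", "Office Supplies"),
  Sales    = c(200, 300, 400, 100, 250),
  Profit   = c(20, -10, 60, 15, 50)
)

# Total sales and total profit per category
totals <- aggregate(cbind(Sales, Profit) ~ Category, data = store_1, FUN = sum)

# Profit earned per unit of sales (hypothetical column name)
totals$Margin <- totals$Profit / totals$Sales
print(totals)
```

If the margins on the real data differ noticeably between categories, the conclusion above would need to be qualified.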

CONCLUSION

Offering more discounts on Same Day shipments may increase sales and profits. Discounts should be based on sales and should not exceed a certain range; otherwise, unnecessary discounts on low sales can lead to huge losses. The Binders and Machines sub-categories need more attention in order to strengthen these weak areas. The Office Supplies and Furniture categories do not seem to do well in the Central region.