Data-driven decision-making

Data-driven decision-making (DDDM) is defined as using facts, metrics, and data to guide strategic business decisions that align with your goals, objectives, and initiatives. Various of Data-driven decision-making process has been defined by Industry experts. There is no universal DDDM process but most of the DDDM follows the almost same process. The method used in this project is Ask - Prepare - Process - Analyze - Share - Act ( APPASA )

ASK

Introduction

The online Retail is a UK based seller of occasional gifts which operates in Various country across the globe, many customers of company are wholesellers. This case study has been done to answer following questions

  1. Which are Top 5 country based on Order volume ?

  2. Which are Top products in various country ?

  3. Which products generate most sales in Top country ?

  4. What time products are sold most in across Country ?

  5. Which day products are sold most in across Country ?

  6. Which month products are sold most in across Country ?

And to help making a data driven decision making using (APPASA) in managing inventory, generating more revenue and as well as in tracking various KPI.

Stakeholder

Primary Stakeholder : Yeah ! thats me as I as I invested My time in doing EDA

Secondary Stakeholder : Thats you as you are reading my portfolio project

Prepare

Data Set & Metadata

The dataset used in this case study is historical transactions happened between 2009 and 2010. It is available at UCI repository can be downloaded from here. and it was made available by Dr Daqing Chen.

This data set includes

InvoiceNo: Invoice number. Nominal

StockCode: Product (item) code. Nominal

Description: Product (item) name. Nominal

Quantity: The quantities of each product (item) per transaction. Numeric

InvoiceDate: Invoice Date and time. Numeric, the day and time when each transaction was generated.

UnitPrice: Unit price. Numeric

CustomerID: Customer number. Nominal

Country: Country name. Nominal

ROCCC analysis

  1. Reliable

  2. Original

  3. Comprehensive

  4. Current

  5. Cited

Data set checks yes in all criteria of ROCCC analysis.

Installing Required Packages

The packages used in this data analysis are : tidyverse, dplyr, skimr, janitor, lubridate, readxl

and installed using : install.packages(“package_name”)

Loading Required Packages

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(here)
## here() starts at C:/Users/mayan/Documents/instacart
library(dplyr)
library(skimr)
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(lubridate)
## Loading required package: timechange
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(readr)
library(readxl)

PROCESS

Uploading data Into Workbench

Online_Retail_1_ <- read_excel("~/Data/Instacart/Online Retail (1).xlsx")
online_retail_II <- read_excel("~/Data/Instacart/online_retail_II.xlsx")
#Checking Column Names and looking for Inconsistency
str(Online_Retail_1_)
## tibble [541,909 × 8] (S3: tbl_df/tbl/data.frame)
##  $ InvoiceNo  : chr [1:541909] "536365" "536365" "536365" "536365" ...
##  $ StockCode  : chr [1:541909] "85123A" "71053" "84406B" "84029G" ...
##  $ Description: chr [1:541909] "WHITE HANGING HEART T-LIGHT HOLDER" "WHITE METAL LANTERN" "CREAM CUPID HEARTS COAT HANGER" "KNITTED UNION FLAG HOT WATER BOTTLE" ...
##  $ Quantity   : num [1:541909] 6 6 8 6 6 2 6 6 6 32 ...
##  $ InvoiceDate: POSIXct[1:541909], format: "2010-12-01 08:26:00" "2010-12-01 08:26:00" ...
##  $ UnitPrice  : num [1:541909] 2.55 3.39 2.75 3.39 3.39 7.65 4.25 1.85 1.85 1.69 ...
##  $ CustomerID : num [1:541909] 17850 17850 17850 17850 17850 ...
##  $ Country    : chr [1:541909] "United Kingdom" "United Kingdom" "United Kingdom" "United Kingdom" ...
#Checking Column Names and looking for Inconsistency
str(online_retail_II)
## tibble [525,461 × 8] (S3: tbl_df/tbl/data.frame)
##  $ Invoice    : chr [1:525461] "489434" "489434" "489434" "489434" ...
##  $ StockCode  : chr [1:525461] "85048" "79323P" "79323W" "22041" ...
##  $ Description: chr [1:525461] "15CM CHRISTMAS GLASS BALL 20 LIGHTS" "PINK CHERRY LIGHTS" "WHITE CHERRY LIGHTS" "RECORD FRAME 7\" SINGLE SIZE" ...
##  $ Quantity   : num [1:525461] 12 12 12 48 24 24 24 10 12 12 ...
##  $ InvoiceDate: POSIXct[1:525461], format: "2009-12-01 07:45:00" "2009-12-01 07:45:00" ...
##  $ Price      : num [1:525461] 6.95 6.75 6.75 2.1 1.25 1.65 1.25 5.95 2.55 3.75 ...
##  $ Customer ID: num [1:525461] 13085 13085 13085 13085 13085 ...
##  $ Country    : chr [1:525461] "United Kingdom" "United Kingdom" "United Kingdom" "United Kingdom" ...

Inconsistency found in column names between Online_Retail_1_ And online_retail_II

#Changing Column names in Both Online_Retail_1_ And online_retail_II & Converting into New Data Frame
Online_Retail_1_v1 <- Online_Retail_1_ %>% rename(invoice_number = InvoiceNo, stock_code = StockCode, description = Description, quantity = Quantity, invoice_date = InvoiceDate, unit_price = UnitPrice, customer_id = CustomerID, country = Country)
online_retail_II_v1 <- online_retail_II %>% rename(invoice_number = Invoice, stock_code = StockCode, description = Description, quantity = Quantity, invoice_date = InvoiceDate, unit_price = Price, customer_id = `Customer ID`, country = Country)
#Checking Column New data Frame and looking for Incosistancy
str(Online_Retail_1_v1)
## tibble [541,909 × 8] (S3: tbl_df/tbl/data.frame)
##  $ invoice_number: chr [1:541909] "536365" "536365" "536365" "536365" ...
##  $ stock_code    : chr [1:541909] "85123A" "71053" "84406B" "84029G" ...
##  $ description   : chr [1:541909] "WHITE HANGING HEART T-LIGHT HOLDER" "WHITE METAL LANTERN" "CREAM CUPID HEARTS COAT HANGER" "KNITTED UNION FLAG HOT WATER BOTTLE" ...
##  $ quantity      : num [1:541909] 6 6 8 6 6 2 6 6 6 32 ...
##  $ invoice_date  : POSIXct[1:541909], format: "2010-12-01 08:26:00" "2010-12-01 08:26:00" ...
##  $ unit_price    : num [1:541909] 2.55 3.39 2.75 3.39 3.39 7.65 4.25 1.85 1.85 1.69 ...
##  $ customer_id   : num [1:541909] 17850 17850 17850 17850 17850 ...
##  $ country       : chr [1:541909] "United Kingdom" "United Kingdom" "United Kingdom" "United Kingdom" ...
#Checking Column New data Frame and looking for Incosistancy
str(online_retail_II_v1)
## tibble [525,461 × 8] (S3: tbl_df/tbl/data.frame)
##  $ invoice_number: chr [1:525461] "489434" "489434" "489434" "489434" ...
##  $ stock_code    : chr [1:525461] "85048" "79323P" "79323W" "22041" ...
##  $ description   : chr [1:525461] "15CM CHRISTMAS GLASS BALL 20 LIGHTS" "PINK CHERRY LIGHTS" "WHITE CHERRY LIGHTS" "RECORD FRAME 7\" SINGLE SIZE" ...
##  $ quantity      : num [1:525461] 12 12 12 48 24 24 24 10 12 12 ...
##  $ invoice_date  : POSIXct[1:525461], format: "2009-12-01 07:45:00" "2009-12-01 07:45:00" ...
##  $ unit_price    : num [1:525461] 6.95 6.75 6.75 2.1 1.25 1.65 1.25 5.95 2.55 3.75 ...
##  $ customer_id   : num [1:525461] 13085 13085 13085 13085 13085 ...
##  $ country       : chr [1:525461] "United Kingdom" "United Kingdom" "United Kingdom" "United Kingdom" ...

Making a new Data Frame by Combining Rows of Online_Retail_1_v1 & online_retail_II_v1

online_shop <- bind_rows(Online_Retail_1_v1, online_retail_II_v1)

Inspecting New Data Frame online_shop

str(online_shop)
## tibble [1,067,370 × 8] (S3: tbl_df/tbl/data.frame)
##  $ invoice_number: chr [1:1067370] "536365" "536365" "536365" "536365" ...
##  $ stock_code    : chr [1:1067370] "85123A" "71053" "84406B" "84029G" ...
##  $ description   : chr [1:1067370] "WHITE HANGING HEART T-LIGHT HOLDER" "WHITE METAL LANTERN" "CREAM CUPID HEARTS COAT HANGER" "KNITTED UNION FLAG HOT WATER BOTTLE" ...
##  $ quantity      : num [1:1067370] 6 6 8 6 6 2 6 6 6 32 ...
##  $ invoice_date  : POSIXct[1:1067370], format: "2010-12-01 08:26:00" "2010-12-01 08:26:00" ...
##  $ unit_price    : num [1:1067370] 2.55 3.39 2.75 3.39 3.39 7.65 4.25 1.85 1.85 1.69 ...
##  $ customer_id   : num [1:1067370] 17850 17850 17850 17850 17850 ...
##  $ country       : chr [1:1067370] "United Kingdom" "United Kingdom" "United Kingdom" "United Kingdom" ...
#Checking Column names
colnames(online_shop)
## [1] "invoice_number" "stock_code"     "description"    "quantity"      
## [5] "invoice_date"   "unit_price"     "customer_id"    "country"
#Number Of Row
nrow(online_shop)
## [1] 1067370
dim(online_shop)
## [1] 1067370       8
#Checking First 6 Rows of Data
head(online_shop)
## # A tibble: 6 × 8
##   invoice_…¹ stock…² descr…³ quant…⁴ invoice_date        unit_…⁵ custo…⁶ country
##   <chr>      <chr>   <chr>     <dbl> <dttm>                <dbl>   <dbl> <chr>  
## 1 536365     85123A  WHITE …       6 2010-12-01 08:26:00    2.55   17850 United…
## 2 536365     71053   WHITE …       6 2010-12-01 08:26:00    3.39   17850 United…
## 3 536365     84406B  CREAM …       8 2010-12-01 08:26:00    2.75   17850 United…
## 4 536365     84029G  KNITTE…       6 2010-12-01 08:26:00    3.39   17850 United…
## 5 536365     84029E  RED WO…       6 2010-12-01 08:26:00    3.39   17850 United…
## 6 536365     22752   SET 7 …       2 2010-12-01 08:26:00    7.65   17850 United…
## # … with abbreviated variable names ¹​invoice_number, ²​stock_code, ³​description,
## #   ⁴​quantity, ⁵​unit_price, ⁶​customer_id
#Checking Summary Of Data Frame
summary(online_shop)
##  invoice_number      stock_code        description           quantity        
##  Length:1067370     Length:1067370     Length:1067370     Min.   :-80995.00  
##  Class :character   Class :character   Class :character   1st Qu.:     1.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :     3.00  
##                                                           Mean   :     9.94  
##                                                           3rd Qu.:    10.00  
##                                                           Max.   : 80995.00  
##                                                                              
##   invoice_date                      unit_price         customer_id    
##  Min.   :2009-12-01 07:45:00.00   Min.   :-53594.36   Min.   :12346   
##  1st Qu.:2010-07-09 09:46:00.00   1st Qu.:     1.25   1st Qu.:13975   
##  Median :2010-12-07 15:28:00.00   Median :     2.10   Median :15255   
##  Mean   :2011-01-02 21:13:27.81   Mean   :     4.65   Mean   :15325   
##  3rd Qu.:2011-07-22 10:23:00.00   3rd Qu.:     4.15   3rd Qu.:16797   
##  Max.   :2011-12-09 12:50:00.00   Max.   : 38970.00   Max.   :18287   
##                                                       NA's   :243007  
##    country         
##  Length:1067370    
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 
#Viewing Some Data
glimpse(online_shop)
## Rows: 1,067,370
## Columns: 8
## $ invoice_number <chr> "536365", "536365", "536365", "536365", "536365", "5363…
## $ stock_code     <chr> "85123A", "71053", "84406B", "84029G", "84029E", "22752…
## $ description    <chr> "WHITE HANGING HEART T-LIGHT HOLDER", "WHITE METAL LANT…
## $ quantity       <dbl> 6, 6, 8, 6, 6, 2, 6, 6, 6, 32, 6, 6, 8, 6, 6, 3, 2, 3, …
## $ invoice_date   <dttm> 2010-12-01 08:26:00, 2010-12-01 08:26:00, 2010-12-01 0…
## $ unit_price     <dbl> 2.55, 3.39, 2.75, 3.39, 3.39, 7.65, 4.25, 1.85, 1.85, 1…
## $ customer_id    <dbl> 17850, 17850, 17850, 17850, 17850, 17850, 17850, 17850,…
## $ country        <chr> "United Kingdom", "United Kingdom", "United Kingdom", "…

Filtering, Cleaning & Adding new Columns

#Checking Null values
colSums(is.na(online_shop))
## invoice_number     stock_code    description       quantity   invoice_date 
##              0              0           4382              0              0 
##     unit_price    customer_id        country 
##              0         243007              0
#Droping Null values
online_shop <- drop_na(online_shop)
#Checking Null values again
online_shop %>% is.na %>% colSums
## invoice_number     stock_code    description       quantity   invoice_date 
##              0              0              0              0              0 
##     unit_price    customer_id        country 
##              0              0              0
#Converting country & description to Factor
online_shop <- online_shop %>% mutate(description = as.factor(description), country = as.factor(country))
#Saving new Column date_time
online_shop$date_time <- strptime(online_shop$invoice_date, "%Y-%m-%d %H:%M:%S")
#Saving new Column month
online_shop$month <- format(as.Date(online_shop$date_time), "%m")
#Saving new Column  month_str
online_shop$month_str <- format(as.Date(online_shop$date_time), "%B")
#Saving new Column date
online_shop$date <- format(as.Date(online_shop$date_time), "%d")
#Saving new Column time_hms
online_shop$time_hms <- format(online_shop$date_time, "%H:%M:%S")
#Saving new Column time_hm
online_shop$time_hm <- format(online_shop$date_time, "%H:%M")
#Saving new Column time_h
online_shop$time_h <- format(online_shop$date_time, "%H")
#Saving new Column Day
online_shop$day <- format(as.Date(online_shop$invoice_date), "%A")
#Saving new Column total_sales
online_shop$total_sales <- online_shop$quantity * online_shop$unit_price
#Viewing Some Data
glimpse(online_shop)
## Rows: 824,363
## Columns: 17
## $ invoice_number <chr> "536365", "536365", "536365", "536365", "536365", "5363…
## $ stock_code     <chr> "85123A", "71053", "84406B", "84029G", "84029E", "22752…
## $ description    <fct> "WHITE HANGING HEART T-LIGHT HOLDER", "WHITE METAL LANT…
## $ quantity       <dbl> 6, 6, 8, 6, 6, 2, 6, 6, 6, 32, 6, 6, 8, 6, 6, 3, 2, 3, …
## $ invoice_date   <dttm> 2010-12-01 08:26:00, 2010-12-01 08:26:00, 2010-12-01 0…
## $ unit_price     <dbl> 2.55, 3.39, 2.75, 3.39, 3.39, 7.65, 4.25, 1.85, 1.85, 1…
## $ customer_id    <dbl> 17850, 17850, 17850, 17850, 17850, 17850, 17850, 17850,…
## $ country        <fct> United Kingdom, United Kingdom, United Kingdom, United …
## $ date_time      <dttm> 2010-12-01 08:26:00, 2010-12-01 08:26:00, 2010-12-01 0…
## $ month          <chr> "12", "12", "12", "12", "12", "12", "12", "12", "12", "…
## $ month_str      <chr> "December", "December", "December", "December", "Decemb…
## $ date           <chr> "01", "01", "01", "01", "01", "01", "01", "01", "01", "…
## $ time_hms       <chr> "08:26:00", "08:26:00", "08:26:00", "08:26:00", "08:26:…
## $ time_hm        <chr> "08:26", "08:26", "08:26", "08:26", "08:26", "08:26", "…
## $ time_h         <chr> "08", "08", "08", "08", "08", "08", "08", "08", "08", "…
## $ day            <chr> "Wednesday", "Wednesday", "Wednesday", "Wednesday", "We…
## $ total_sales    <dbl> 15.30, 20.34, 22.00, 20.34, 20.34, 15.30, 25.50, 11.10,…

ANALYZE

Calculating Number Of Order

#Number Of Order by Country
Number_Of_order <- online_shop %>% count(country) %>% ggplot() + geom_col(aes(x = country, y = n))+ theme(axis.text.x = element_text(angle = 90)) + labs(title = "Number Of Order By country")
Number_Of_order # Viewing Plot

As it is clear from the plot that maximum order is placed from United Kingdom (UK) and no country is closer to UK in volume of order.

#Number of Order without UK
Number_of_order_without_uk <- online_shop %>% filter(country != "United Kingdom") %>% group_by(country) %>% summarise(number_of_product_bought = n()) %>% arrange(desc(number_of_product_bought)) %>% ggplot(aes(x = country, y = number_of_product_bought, fill = country)) + geom_col(show.legend = FALSE) + theme(axis.text.x = element_text(angle = 90)) + labs(title = " Number of Order (Without UK)")
Number_of_order_without_uk

From the plot Number_of_order_without_uk it is visible that volume of order across country is varying too much.

#Ploting Top 10 country by Number of Order Without UK
Number_of_order_without_uk_top_10 <- online_shop %>% filter(country != "United Kingdom") %>% group_by(country) %>% summarise(number_of_product_bought = n()) %>% arrange(desc(number_of_product_bought)) %>% head(10) %>% ggplot(aes(x = country, y = number_of_product_bought, fill = country)) + geom_col(show.legend = FALSE) + theme(axis.text.x = element_text(angle = 90)) + labs(title = " TOP 10 Country by Number of Order (Without UK)")
Number_of_order_without_uk_top_10 #Viewing plot

From the plot Number_of_order_without_uk_top_10 Order volume is still varying among Top 10 country by order volume and most of the country has order number below 5000.

#Ploting Top 5 country by Number of Order Without UK
Number_of_order_without_uk_top_5 <- online_shop %>% filter(country != "United Kingdom") %>% group_by(country) %>% summarise(number_of_product_bought = n()) %>% arrange(desc(number_of_product_bought)) %>% head(5) %>% ggplot() + geom_col(aes(x = country, y = number_of_product_bought, fill = country))+ theme(axis.text.x = element_text(angle = 90)) + labs(title = " TOP 5 Country by Number of Order (Without UK)")
Number_of_order_without_uk_top_5

As UK has most order, so considering Top 4 From Top 5 country by order for rest of analysis. while considering it is highly probable that next order will be placed from these Country First.

Top products By Number Of Order

#Ploting Top 10 products
Top_10_products <- online_shop %>% group_by(description) %>%  summarise(number_of_product_bought = n()) %>% arrange(desc(number_of_product_bought)) %>% head(10) %>% ggplot(aes(x = number_of_product_bought, y = description, fill = description)) + geom_col(show.legend = FALSE) + theme(axis.text.x = element_text(angle = 90)) + labs(title = "Top 10 Products ")
Top_10_products

#Ploting Top 10 products Without UK
Top_10_products_without_uk <- online_shop %>% filter(country != "United Kingdom") %>% group_by(description) %>%  summarise(number_of_product_bought = n()) %>% arrange(desc(number_of_product_bought)) %>% head(10) %>% ggplot(aes(x = number_of_product_bought, y = description, fill = description)) + geom_col(show.legend = FALSE) + theme(axis.text.x = element_text(angle = 90)) +  labs(title = "Top 10 Products Without UK ")
Top_10_products_without_uk

#Ploting Top Products in Uk
Top_products_uk <- online_shop %>% filter(country == "United Kingdom") %>% group_by(description) %>%  summarise(number_of_product_bought = n()) %>% arrange(desc(number_of_product_bought)) %>% head(10) %>% ggplot(aes(x = number_of_product_bought, y = description, fill = description)) + geom_col(show.legend = FALSE) + theme(axis.text.x = element_text(angle = 90)) + labs(title = "Top Products In United Kingdom ")
Top_products_uk

#Ploting Top Products in Germany
Top_products_germany <- online_shop %>% filter(country == "Germany") %>% group_by(description) %>%  summarise(number_of_product_bought = n()) %>% arrange(desc(number_of_product_bought)) %>% head(10) %>% ggplot(aes(x = number_of_product_bought, y = description, fill = description)) + geom_col(show.legend = FALSE)  + theme(axis.text.x = element_text(angle = 90)) + labs(title = "Top Products In Germany ")
Top_products_germany

#Ploting Top Products in France
Top_products_france <- online_shop %>% filter(country == "France") %>% group_by(description) %>%  summarise(number_of_product_bought = n()) %>% arrange(desc(number_of_product_bought)) %>% head(10) %>% ggplot(aes(x = number_of_product_bought, y = description, fill = description)) + geom_col(show.legend = FALSE) + theme(axis.text.x = element_text(angle = 90)) + labs(title = "Top Products In France ")
Top_products_france

#Ploting Top Products in EIRE
Top_products_eire <- online_shop %>% filter(country == "EIRE") %>% group_by(description) %>%  summarise(number_of_product_bought = n()) %>% arrange(desc(number_of_product_bought)) %>% head(10) %>% ggplot(aes(x = number_of_product_bought, y = description, fill = description)) + geom_col(show.legend = FALSE) + theme(axis.text.x = element_text(angle = 90)) + labs(title = "Top Products In EIRE ")
Top_products_eire

#Ploting Top Products in Netherlands
Top_products_netherlands <- online_shop %>% filter(country == "Netherlands") %>% group_by(description) %>%  summarise(number_of_product_bought = n()) %>% arrange(desc(number_of_product_bought)) %>% head(10) %>% ggplot(aes(x = number_of_product_bought, y = description, fill = description)) + geom_col(show.legend = FALSE) + theme(axis.text.x = element_text(angle = 90)) + labs(title = "Top Products In Netherlands ")
Top_products_netherlands

BESTSELLING Products By Total Sales

#Ploting Top Products By Revenue
Bestselling_products <- online_shop %>% group_by(description) %>% summarise(total_sales_sum = sum(total_sales)) %>% arrange(desc(total_sales_sum)) %>% head(10) %>% ggplot(aes(x = total_sales_sum, y = description, fill = description)) + geom_col(show.legend = FALSE) + theme(axis.text.x = element_text(angle = 90)) + labs(title = "Bestselling products by Total Sales")
Bestselling_products

#Ploting Top Products By Revenue Without UK
Bestselling_products_without_uk <- online_shop %>% filter(country != "United Kingdom") %>% group_by(description) %>% summarise(total_sales_sum = sum(total_sales)) %>% arrange(desc(total_sales_sum)) %>% head(10) %>% ggplot(aes(x = total_sales_sum, y = description, fill = description)) + geom_col(show.legend = FALSE)  + theme(axis.text.x = element_text(angle = 90)) + labs(title = "Bestselling products by Total Sales Without UK")
Bestselling_products_without_uk

#Ploting Top Products By Revenue In UK
Bestselling_products_uk <- online_shop %>% filter(country == "United Kingdom") %>% group_by(description) %>% summarise(total_sales_sum = sum(total_sales)) %>% arrange(desc(total_sales_sum)) %>% head(10) %>% ggplot(aes(x = total_sales_sum, y = description, fill = description)) + geom_col(show.legend = FALSE)  + theme(axis.text.x = element_text(angle = 90))  + labs(title = "Bestselling products by Total Sales In UK")
Bestselling_products_uk

#Ploting Top Products By Revenue In Germany
Bestselling_products_germany <- online_shop %>% filter(country == "Germany") %>%  group_by(description) %>% summarise(total_sales_sum = sum(total_sales)) %>% arrange(desc(total_sales_sum)) %>% head(10) %>% ggplot(aes(x = total_sales_sum, y = description, fill = description)) + geom_col(show.legend = FALSE) + theme(axis.text.x = element_text(angle = 90)) + labs(title = "Bestselling products by Total Sales In Germany")
Bestselling_products_germany

#Ploting Top Products By Revenue In France
Bestselling_products_france <- online_shop %>% filter(country == "France") %>%  group_by(description) %>% summarise(total_sales_sum = sum(total_sales)) %>% arrange(desc(total_sales_sum)) %>% head(10) %>% ggplot(aes(x = total_sales_sum, y = description, fill = description)) + geom_col(show.legend = FALSE) + theme(axis.text.x = element_text(angle = 90)) + labs(title = "Bestselling products by total Sales In France")
Bestselling_products_france

#Ploting Top Products By Revenue In EIRE
Bestselling_products_eire <- online_shop %>% filter(country == "EIRE") %>%  group_by(description) %>% summarise(total_sales_sum = sum(total_sales)) %>% arrange(desc(total_sales_sum)) %>% head(10) %>% ggplot(aes(x = total_sales_sum, y = description, fill = description)) + geom_col(show.legend = FALSE) + theme(axis.text.x = element_text(angle = 90)) + labs(title = "Bestselling products by Total Sales In EIRE")
Bestselling_products_eire

#Ploting Top Products By Revenue In Netherlands
Bestselling_products_netherlands <- online_shop %>% filter(country == "Netherlands") %>%  group_by(description) %>% summarise(total_sales_sum = sum(total_sales)) %>% arrange(desc(total_sales_sum)) %>% head(10) %>% ggplot(aes(x = total_sales_sum, y = description, fill = description)) + geom_col(show.legend = FALSE) + theme(axis.text.x = element_text(angle = 90)) + labs(title = "Bestselling products by Total Sales In Netherlands")
Bestselling_products_netherlands

Products Sold By Time

#Sales By Time
Sales_by_time <- ggplot(data = online_shop, aes(x=time_h)) + geom_histogram(stat = "count", fill = "steelblue") +labs(title = "Sales By Time")
## Warning in geom_histogram(stat = "count", fill = "steelblue"): Ignoring unknown
## parameters: `binwidth`, `bins`, and `pad`
Sales_by_time

#Sales By Time Without UK
Sales_by_time_without_uk <- online_shop %>% filter(country != "United Kingdom")%>% ggplot(aes(x = time_h))+ geom_histogram(stat = "count", fill = "darkorange") + labs(title = "Sales by Time Without Uk")
## Warning in geom_histogram(stat = "count", fill = "darkorange"): Ignoring
## unknown parameters: `binwidth`, `bins`, and `pad`
Sales_by_time_without_uk

#Sales By Time In UK
Sales_by_time_uk <- online_shop %>% filter(country == "United Kingdom")%>% ggplot(aes(x = time_h))+ geom_histogram(stat = "count", fill = "darkorange") +labs(title = "Sales By Time In UK")
## Warning in geom_histogram(stat = "count", fill = "darkorange"): Ignoring
## unknown parameters: `binwidth`, `bins`, and `pad`
Sales_by_time_uk

#Sales By Time In Germany
Sales_by_time_germany <- online_shop %>% filter(country == "Germany") %>% ggplot(aes(x = time_h))+ geom_histogram(stat = "count", fill = "steelblue") +labs(title = "Sales by Time In Germany")
## Warning in geom_histogram(stat = "count", fill = "steelblue"): Ignoring unknown
## parameters: `binwidth`, `bins`, and `pad`
Sales_by_time_germany

#Sales By Time In France
Sales_by_time_france <- online_shop %>% filter(country == "France") %>% ggplot(aes(x = time_h))+ geom_histogram(stat = "count", fill = "darkorange") +labs(title = "Sales by Time In France")
## Warning in geom_histogram(stat = "count", fill = "darkorange"): Ignoring
## unknown parameters: `binwidth`, `bins`, and `pad`
Sales_by_time_france

#Sales By Time In EIRE
Sales_by_time_eire <- online_shop %>% filter(country == "EIRE") %>% ggplot(aes(x = time_h))+ geom_histogram(stat = "count", fill = "steelblue") +labs(title = "Sales by Time In EIERE")
## Warning in geom_histogram(stat = "count", fill = "steelblue"): Ignoring unknown
## parameters: `binwidth`, `bins`, and `pad`
Sales_by_time_eire

#Sales By Time In Netherlands
Sales_by_time_netherlands <-online_shop %>% filter(country == "Netherlands") %>% ggplot(aes(x = time_h))+ geom_histogram(stat = "count", fill = "darkorange") +labs(title = "Sales by Time In Netherlands")
## Warning in geom_histogram(stat = "count", fill = "darkorange"): Ignoring
## unknown parameters: `binwidth`, `bins`, and `pad`
Sales_by_time_netherlands

Total Sales By Month

#Total Monthly Sales
Monthly_sales <- online_shop %>% group_by(month_str) %>% summarise(total_montly_sales = sum(total_sales)) %>% ggplot(aes(x = ordered(month_str, month.name), y =total_montly_sales)) + geom_col( fill = "steelblue")+ theme(axis.text.x = element_text(angle = 90)) +labs(title = "Total Monthly Sales")
Monthly_sales

#Total Monthly Sales Without UK
Monthly_sales_without_uk <-online_shop %>% filter(country != "United Kingdom")%>% group_by(month_str) %>% summarise(total_montly_sales = sum(total_sales)) %>% ggplot(aes(x = ordered(month_str, month.name), y =total_montly_sales)) + geom_col( fill = "darkorange")+ theme(axis.text.x = element_text(angle = 90)) +labs(title = "Total Monthly Sales Without UK ")
Monthly_sales_without_uk

#Monthly Sales In UK
Monthly_sales_in_uk <-online_shop %>% filter(country == "United Kingdom")%>% group_by(month_str) %>% summarise(total_montly_sales = sum(total_sales)) %>% ggplot(aes(x = ordered(month_str, month.name), y =total_montly_sales)) + geom_col( fill = "darkorange")+ theme(axis.text.x = element_text(angle = 90)) +labs(title = "Total Monthly Sales In UK")
Monthly_sales_in_uk

#Total Monthly Sales In Germany
Monthly_sales_in_germany <-online_shop %>% filter(country == "Germany")%>% group_by(month_str) %>% summarise(total_montly_sales = sum(total_sales)) %>% ggplot(aes(x = ordered(month_str, month.name), y =total_montly_sales)) + geom_col( fill = "steelblue")+ theme(axis.text.x = element_text(angle = 90)) +labs(title = "Total Monthly Sales In Germany ")
Monthly_sales_in_germany

#Total Monthly Sales In France
Monthly_sales_in_france <-online_shop %>% filter(country == "France") %>% group_by(month_str) %>% summarise(total_montly_sales = sum(total_sales)) %>% ggplot(aes(x = ordered(month_str, month.name), y =total_montly_sales)) + geom_col( fill = "darkorange")+ theme(axis.text.x = element_text(angle = 90)) +labs(title = "Total Monthly Sales In France ")
Monthly_sales_in_france

#Total Monthly Sales In EIRE
Monthly_sales_in_eire <-online_shop %>% filter(country == "EIRE")%>% group_by(month_str) %>% summarise(total_montly_sales = sum(total_sales)) %>% ggplot(aes(x = ordered(month_str, month.name), y =total_montly_sales)) + geom_col( fill = "steelblue")+ theme(axis.text.x = element_text(angle = 90)) +labs(title = "Total Monthly Sales In EIRE")
Monthly_sales_in_eire

#Total Monthly Sales In Netherlands
Monthly_sales_in_netherlands <-online_shop %>% filter(country == "Netherlands")%>% group_by(month_str) %>% summarise(total_montly_sales = sum(total_sales)) %>% ggplot(aes(x = ordered(month_str, month.name), y =total_montly_sales)) + geom_col( fill = "darkorange")+ theme(axis.text.x = element_text(angle = 90)) +labs(title = "Total Monthly Sales In Netherlands ")
Monthly_sales_in_netherlands

Total say By Day

#converting day into Ordered Factor Vector
online_shop$day <- factor(online_shop$day, levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))
#Total Sales By Day
Total_sales_by_day <-online_shop %>% group_by(day) %>% summarise(total_montly_sales = sum(total_sales)) %>% ggplot(aes(x = day, y = total_montly_sales)) + geom_col( fill = "steelblue")+ theme(axis.text.x = element_text(angle = 90)) + labs(title = "Total Sales By Day ")
Total_sales_by_day

#Total Sales By Day Without UK
Total_sales_by_day_without_uk <-online_shop %>% filter(country != "United Kingdom") %>% group_by(day) %>% summarise(total_montly_sales = sum(total_sales)) %>% ggplot(aes(x = day, y = total_montly_sales)) + geom_col( fill = "darkorange")+ theme(axis.text.x = element_text(angle = 90)) + labs(title = "Total Sales By Day Without UK ")
Total_sales_by_day_without_uk

#Total Sales By Day In UK
Total_sales_by_day_in_uk <-online_shop %>% filter(country == "United Kingdom") %>% group_by(day) %>% summarise(total_montly_sales = sum(total_sales)) %>% ggplot(aes(x = day, y = total_montly_sales)) + geom_col( fill = "darkorange")+ theme(axis.text.x = element_text(angle = 90)) + labs(title = "Total Sales By Day In Uk")
Total_sales_by_day_in_uk

#Total Sales By Day In Germany
Total_sales_by_day_in_germany <-online_shop %>% filter(country == "Germany") %>% group_by(day) %>% summarise(total_montly_sales = sum(total_sales)) %>% ggplot(aes(x = day, y = total_montly_sales)) + geom_col( fill = "steelblue")+ theme(axis.text.x = element_text(angle = 90)) + labs(title = "Total Sales By Day In Germany ")
Total_sales_by_day_in_germany

#Total Sales By Day In France
Total_sales_by_day_in_france <-online_shop %>% filter(country == "France") %>% group_by(day) %>% summarise(total_montly_sales = sum(total_sales)) %>% ggplot(aes(x = day, y = total_montly_sales)) + geom_col( fill = "darkorange")+ theme(axis.text.x = element_text(angle = 90)) + labs(title = "Total Sales By Day In France ")
Total_sales_by_day_in_france

#Total Sales By Day In EIRE
Total_sales_by_day_in_eire <-online_shop %>% filter(country == "EIRE") %>% group_by(day) %>% summarise(total_montly_sales = sum(total_sales)) %>% ggplot(aes(x = day, y = total_montly_sales)) + geom_col( fill = "steelblue")+ theme(axis.text.x = element_text(angle = 90)) + labs(title = "Total Sales By Day In EIRE")
Total_sales_by_day_in_eire

#Total Sales By Day In Netherlands
Total_sales_by_day_in_netherlands <-online_shop %>% filter(country == "Netherlands") %>% group_by(day) %>% summarise(total_montly_sales = sum(total_sales)) %>% ggplot(aes(x = day, y = total_montly_sales)) + geom_col( fill = "darkorange")+ theme(axis.text.x = element_text(angle = 90)) + labs(title = "Total Sales By Day In Netherlands ")
Total_sales_by_day_in_netherlands

Top Customer By Total Sales

#Converting customer_id to factor
online_shop <- online_shop %>% mutate(customer_id = as.factor(customer_id))
#Top Customer By Total Sales
Top_customer <-online_shop %>% group_by(customer_id) %>%  summarise(total_purchase = sum(total_sales)) %>% arrange(desc(total_purchase)) %>% head(10) %>% ggplot(aes(x = customer_id, y = total_purchase)) + geom_col( fill = "darkorange")+ theme(axis.text.x = element_text(angle = 90)) + labs(title = "Top customer By Purchase")
Top_customer

#Top Customer By Total Sales Without UK
Top_customer_without_uk <-online_shop %>% filter(country != "United Kingdom") %>% group_by(customer_id) %>%  summarise(total_purchase = sum(total_sales)) %>% arrange(desc(total_purchase)) %>% head(10) %>% ggplot(aes(x = customer_id, y = total_purchase)) + geom_col( fill = "steelblue")+ theme(axis.text.x = element_text(angle = 90)) + labs(title = "Top customer By Purchase Excuding UK")
Top_customer_without_uk

#Top Customer By Total Sales In UK
Top_customer_in_uk <-online_shop %>% filter(country == "United Kingdom") %>% group_by(customer_id) %>%  summarise(total_purchase = sum(total_sales)) %>% arrange(desc(total_purchase)) %>% head(10) %>% ggplot(aes(x = customer_id, y = total_purchase)) + geom_col( fill = "steelblue")+ theme(axis.text.x = element_text(angle = 90)) + labs(title = "Top customer By Purchase In UK")
Top_customer_in_uk

#Top Customer By Total Sales In Germany
Top_customer_in_germany <-online_shop %>% filter(country == "Germany") %>% group_by(customer_id) %>%  summarise(total_purchase = sum(total_sales)) %>% arrange(desc(total_purchase)) %>% head(10) %>% ggplot(aes(x = customer_id, y = total_purchase)) + geom_col( fill = "darkorange")+ theme(axis.text.x = element_text(angle = 90)) + labs(title = "Top customer By Purchase In Germany")
Top_customer_in_germany

#Top Customer By Total Sales In France
Top_customer_in_france <-online_shop %>% filter(country == "France") %>% group_by(customer_id) %>%  summarise(total_purchase = sum(total_sales)) %>% arrange(desc(total_purchase)) %>% head(10) %>% ggplot(aes(x = customer_id, y = total_purchase)) + geom_col( fill = "steelblue")+ theme(axis.text.x = element_text(angle = 90)) + labs(title = "Top customer By Purchase In France")
Top_customer_in_france

#Top Customer By Total Sales In EIRE
Top_customer_in_eire <-online_shop %>% filter(country == "EIRE") %>% group_by(customer_id) %>%  summarise(total_purchase = sum(total_sales)) %>% arrange(desc(total_purchase)) %>% head(10) %>% ggplot(aes(x = customer_id, y = total_purchase)) + geom_col( fill = "darkorange")+ theme(axis.text.x = element_text(angle = 90))+ labs(title = "Top customer By Purchase In EIRE")
Top_customer_in_eire

#Top Customer By Total Sales In Netherlands
Top_customer_in_netherlands <-online_shop %>% filter(country == "Netherlands") %>% group_by(customer_id) %>%  summarise(total_purchase = sum(total_sales)) %>% arrange(desc(total_purchase)) %>% head(10) %>% ggplot(aes(x = customer_id, y = total_purchase)) + geom_col( fill = "steelblue")+ theme(axis.text.x = element_text(angle = 90)) + labs(title = "Top customer By Purchase In Netherlands")
Top_customer_in_netherlands

Conclusion

The majority of sales for an online retail come from customers located in the United Kingdom. The top five countries with over 5000 sales per year are the United Kingdom, Germany, EIRE, France, and the Netherlands. The products that bring in the most revenue may not necessarily the ones with the highest number of sales. Most sales take place between 8 AM and 4 PM, with a similar pattern across top 5 countries. There is a seasonal increase in sales from September to December, with an additional spike in Eire in January. The lowest sales volume is observed on the weekends, particularly on Saturdays and Sundays.

Based on analysis my key recommendation are :

  1. Online retail should focus expanding the customer base in the United Kingdom, as it is the top source of sales and revenue.

  2. Creating a promotional discount and offer on products which have more sales but generate less profit by to increase profitability.

  3. Create a marketing plan to take advantage of peak sales hour (between 8 AM and 4 PM)

  4. Creating a Flash sales campaign for weekdays particularly on Saturdays and Sundays to increase sales.

Future Scope

Further analysis Regression, K-Means Clustering can be done to further enhance the result.

Share & Act

I feel happy to share this out here, as you are reading this, any suggestion of improvement is highly valuable to me, write me at mayankmecs@gmail.com

Data-driven decision-making

This data analysis follows : Ask - Prepare - Process - Analyze - Share - Act.