1.0 Introduction

The visualization below will be of an interactive dashboard for an e-commerce business. The main objective is to provide a snapshot view of the business whilst allowing for the user to carry out some analysis made easier by visualizations. The dashboard will focus on some key statistics that are important for the analysis of business performance Of which, the insights can be used for strategy planning, marketing purposes or just general monitoring and understanding of the health of business.

The source of the data was obtained from from Kaggle and was provided by Olist - a brazilian e-commerce company. The data contains ecommerce data of orders amde at the Olist Store and contains samples of over 100k orders from 2016 - 2018 made at multiple marketplaces in Brazil.



2.0 Data and Design Challenges

2.1 Original Data Set does not allow easy analysis

The original data set is in a form that does not allow for easy analysis. It is a huge and fragmented dataset which has many data points that are not categories or cleaned. As a result, it was confusing and difficult to understand. To solve this, the data has to be first transformed and processed so that proper visualization can be carried out. Once we are able to create meaningful visualization plots, it will be easier to create visualizations that can help in the understanding


2.2 No view of overall performance and performance over time

The main issue of the data set is that it does not provide any valuable insights as the data is static and contains merely order information. It does not provide any indictors on the health and performance of the ecommerce business, nor is there any vision on the performance over time.

Hence, the key purpose of the dashboard is to resolve the issues mentioned above. The dasboard will contain interactive charts that will allow users to adjust the data based on different timeframes, and zoom in to specific periods if need be.

It will also includes key statistic that provides a quick view of health and performance of the business.These statistics include:

In addition, the dashboard will also showcase different visualizations such as:


3.0 Proposed Design

The proposed design that will improve the above conditions is as follows:



4.0 Step-by-Step Instructions

4.1 Load R packages

Load the following R packages:

  • shiny
  • shinydashboard
  • dplyr
  • magrittr
  • plotly
  • ggplot2
  • tidyverse
  • scales
  • DT
packages = c('shiny','shinydashboard','tidyverse','dplyr', 'magrittr', 'plotly', 'ggplot2', 'scales', 'DT')

for (p in packages){
  if (!require(p,character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)

}


4.2 Processing and load the Data Set

The data is divided into multiple datasets with related foreign and primary keys as per the schema above. To better create a dashboard that contains only the necessary information required as per the sketch, a consolidated excel file was created containing the customer, order and product focused data and will serve as the primary data for this visualization.

Additionally, data wrangling was carried out to include additional data points which would be useful for the visualization:

  • In the original dataset, there are 72 different product categories, some of which appear to be overlapping. Hence, to make the data more readable, these 72 different product categories were reclassified into 22 larger category class as follows:

  • An additional column depicting the number of days a delivery takes place was also added.

The final data set will contain the following columns:

##  [1] "order_id"                      "order_item_id"                
##  [3] "product_id"                    "product_category_name"        
##  [5] "product_class"                 "order_status"                 
##  [7] "order_purchase_timestamp"      "order_approved_at"            
##  [9] "order_delivered_customer_date" "order_estimated_delivery_date"
## [11] "Days_to_Delivery"              "seller_id"                    
## [13] "shipping_limit_date"           "price"                        
## [15] "freight_value"                 "customer_id"                  
## [17] "customer_unique_id"            "customer_city"                
## [19] "customer_state"                "customer_zip_code_prefix"     
## [21] "geolocation_lat"               "geolocation_lng"

The dataset is then loaded and initialized

combined_df <- read_csv('Combined Dataset_v2.csv')


4.3 Preparing the Shiny Dashboard Temmplate

A shiny dashboard minmally has three parts: a header, sidebar and a body. To create a blank dashboard, the following code is used to get the UI and server set up similar to a normal shiny app. In addition, a title can be added for the dashboard

ui <- dashboardPage(
  dashboardHeader(title = "Olist Performance"),
  dashboardSidebar(),
  dashboardBody()
)

server <- function(input, output) {}

shinyApp(ui, server)


4.4 Adding Control widgets into the sidebar

To make this dashboard more interactive and provide greater insights, control widgets will be added as shown in the sketch above. The control widgets will allow the user to alter the data visualization according to their inputs. A total of 3 different control widgets will be added:

  • A dateRangeInput - to adjust the dates that will be considered in the visualization. For this visualization, this date range will be related to the delivery date as that is when a sale is considered complete.
  • Radio buttons - to change the properties of Gross Merchandise Value (GMV) time series line chart that is displayed. It will allow the chart to be visualizaed on a cumulative, daily and logarithmic scale.
  • A Slider - to adjust the size of the bins in the histogram depicting the number of days it takes to deliver an item from when the user purchases an item.
ui <- dashboardPage(
  dashboardHeader(title = "Olist Performance"),
  dashboardSidebar(
    dateRangeInput('dateRange',
      label = 'Date Range',
      start = as.Date('2016-10-11'), end = as.Date('2018-09-03')),
      radioButtons("GraphType", "GMV Graph Type:",
                   c("Cumulative" = "cum",
                     "Daily" = "day",
                     "Logarithmic" = "Log_Graph")
                   ),
      sliderInput(inputId = "bins",
        label = "Bin Size of Histogram:",
            min = 1,
            max = 15,
            value = 5)
  ),
  dashboardBody()
)

server <- function(input, output) {}

shinyApp(ui, server)

4.5 Adding Key Stats into the dashboard Body

The first part of the dashboard will showcase 3 key statistics that can give a flavor of the business health and performance that is doing in a single glance. The 3 key statistics are:

  • Gross Merchendise Value (GMV) - An important figure in e-commerce, it shows the value of merchandise sold (and delivered) less returns and not including shipping costs. As information on returns is not available in the data-set, we will assume that GMV is the sum of all the prices for orderers that have been delivered successfully. This figure will be cumulative as of selected date range.
  • Average Order Value (AOV) - Average order value shows the average order value of customers. It is obtained by dividing the GMV by the total number of order during the selected daterange.
  • Active 30 (A30) users - A30 users refer to the number of users who are active in the past 30 days. In this case, a user is considered active if he/she makes a purchase. The reference date is taken as the upper limit of the date range input.

The 3 key stats will be placed into valueBoxes which will then be placed into a fluidRow. We use valueBoxOutput as it will allow for the figures to dynamically change according to the date range selected.

On the server side, we will also create new dataframes which will allow for the filtering of data within the date range respectively.

The following code will be used for the UI and server section respectively.

## For UI: 

dashboardBody(
    fluidRow(
      valueBoxOutput("GMV_fig"), ## valuebox for GMV
      valueBoxOutput("AOV_fig"), ## valuebox For AOV
      valueBoxOutput("A30_user") ## valuebox For A30 user
      ),


## for server:

delivered_df <- combined_df %>%
  filter(order_status == "delivered")   ## filter for delivered orders

 output$GMV_fig <- renderValueBox({  ## output for GMV 
    deliveredToDate <- delivered_df %>%
      filter(order_delivered_customer_date <= input$dateRange[2] & order_delivered_customer_date >= input$dateRange[1]) ## filter for date range
    valueBox(
      paste0("$", prettyNum(ceiling(sum(deliveredToDate$price)/1000), big.mark = ",")),
      "Total GMV ('000s)",
      color = "yellow",
      icon = icon("dollar-sign"))
  })
    
  output$AOV_fig <- renderValueBox({  ## output for AOV
    deliveredToDate <- delivered_df %>%
      filter(order_delivered_customer_date <= input$dateRange[2] & order_delivered_customer_date >= input$dateRange[1]) ## filter for date range
    valueBox(
      paste0("$", round(sum(deliveredToDate$price)/length(unique(deliveredToDate$order_id)),digits = 2)),
      "Average Order Value",
      color = "olive",
      icon = icon("shopping-basket")
    )
    
  })
  
    output$A30_user <- renderValueBox({  ## output for A30
    UserToDate <- combined_df %>%
      filter(order_purchase_timestamp <= input$dateRange[2] & order_purchase_timestamp >= (input$dateRange[2] - 30)) ## filter for date range
    valueBox(
      prettyNum(length(unique(UserToDate$customer_id)),big.mark = ","),
      "A30 Users",
      color = "blue",
      icon = icon("user-alt")
    )
    
  })


4.6 Adding a time series chart of GMV into the dashboard Body

Aside from the key stats, a time series line chart is also added for visualization of GMV over time. To give different views for more insights, the user will be able to change the graph properties using the radio buttons included in the sidebar. The 3 views are:

  • Cumulative : For an analysis of the actual amount to date between a time period. Can be used for calculations such as percentage change
  • Daily : To view the daily performance over time
  • Logarithmic: To better understand the rate of change in GMV over time.

For this chart, we will place it into a box and place it into a fluid row. The width will be set to 12 to fill up an entire row in the body.The box will also be collapsible, and by default will be collapsed to make the dashboard neater.

In addition plotly will be used in this visualization to allow more interactivity for users. A plotly render will allow users to zoom in on certain parts of the graph among other actions.

To carry out this visualization, a new dataframe will be created summarizing the total GMV by date. Additionally, it will also allow for a filtering of dateRange.

The following code will be used for the UI and server section respectively:

##For UI:

fluidRow(
  box(
     title = "Gross Merchandise Value (GMV)", background = "black", solidHeader = TRUE, width = 12, collapsible = TRUE, collapsed = TRUE,
     plotlyOutput("GMV_Plot", height = 200))
)

##For Server:

 output$GMV_Plot <- renderPlotly({
      
      dailyGMV <- delivered_df %>%
      filter(order_delivered_customer_date <= input$dateRange[2] & order_delivered_customer_date >= input$dateRange[1]) %>% ## to filter by date
      group_by(order_delivered_customer_date) %>%
      summarize(price_sum = sum(price)) ## Group and summarize GMV by date
      
      
      ## filters for graph properties
      
      if (input$GraphType %in% c("cum","Log_Graph")) {
        GMV <- cumsum(dailyGMV$price_sum)/1000
      }
      else if (input$GraphType == "day") {
        GMV <- (dailyGMV$price_sum)/1000  
      }
      
      
      ## creating the initial plot
      p <- ggplot(dailyGMV, aes(x= order_delivered_customer_date, y= GMV, group=1,
                                text = paste('Date : ', as.Date(order_delivered_customer_date),  
                                             "<br>GMV: $ ", prettyNum(GMV * 1000, big.mark = ",")))) + ## to adjust tooltip labels
        
            geom_line(color = "dodgerblue4") +
        
        ##For log scale
            (if (input$GraphType == "Log_Graph") { 
              scale_y_log10()}
            )+
            labs(x = "Date", y = "GMV ($'000)" ) + ##adjusting the axis titles
            theme(plot.background = element_rect(fill = "ghostwhite"),
                  panel.background = element_rect(fill = "ghostwhite")) ## changing the color of the plot
      
      ggplotly(p, tooltip = c("text")) ##plotting the graph using plotly and chaiging the tooltip

    })


4.7 Adding a data table with breakdown of GMV by Category class into dashboard Body

A data table providing a breakdown of GMV by Category will also be added. The data table will be added to a box that will be inserted into a fluid row beneath the line chart with a column inside containing the data table. as it will share the same row as the next chart, the width is limited to 6. The box will likewise be collapsable and collapsed by default for neatness.

By using the DT library to visualize this table, it will give users the interactivity to sort the data according to his/her preferences. It will also allow for the user to search for a specific category with the search bar, and will also allow user to adjust the number of results displayed.

This data table will allow users to get better insights into how each class is performing by GMV and if any categories might require more marketing and promotional efforts.

To carry out this visualization, a new dataframe will be created summarizing the total GMV by product category classification. Additionally, it will also allow for a filtering of dateRange.

The following code will be used for the UI and server section respectively:

## For UI

fluidRow(
  box(
    title = "GMV by Categories", background = "light-blue", solidHeader = TRUE, width = 6, collapsible = TRUE, collapsed = TRUE,
    column(12, (DT::dataTableOutput("topCat")))  
    ),
    
    
## For server

output$topCat <- DT::renderDataTable(
      
      (dailyGMVbyCat <- delivered_df %>%
        filter(order_delivered_customer_date <= input$dateRange[2] & order_delivered_customer_date >= input$dateRange[1]) %>% ##allow for filtering of dates
        group_by("Category" = product_class) %>%
        summarize("GMV ($ '000s)" = ceiling(sum(price)/1000))), ## summarize GMV by product category classification
      rownames = FALSE, ##remove index
      options = list(lengthMenu = c(5,10), pageLength =5)  #let user choose between 5 or 10 entries per page. Default is at 5 for neatness
    )

4.8 Adding a histogram depicting the frequency that an item is delivered in x number of days from order

Another important performance measure to look at will be the time taken for an order delivery to be completed from when the user places his/her order.

To do so, we will include a histogram as a data visualization of the information above.

For this chart, we will place it into a box and place it into the same fluid row as the data table above. The width will be set to 6 to fill up the rest of the row in the body. The box will also be collapsible, and by default will be collapsed to make the dashboard neater.

The visualization will also allow for the selection of bin sizes for the user to adjust the granuality for which he/she would like to carry out analysis with. For example, by setting the bin size to 50, you can easily see how many orders take less than or more than 50 days to be delivered.

In addition plotly will be used in this visualization to allow more interactivity for users. A plotly render will allow users to zoom in on certain parts of the graph among other actions.

To carry out this visualization, a new dataframe will be created to also allow for a filtering of dateRange.

The following code will be used for the UI and server section respectively:

# For UI:

box(
          title = "Days to Delivery", background = "orange", solidHeader = TRUE, width = 6, collapsible = TRUE, collapsed = TRUE,
          plotlyOutput("deliveryTime")
        )

# For Server:

output$deliveryTime <- renderPlotly({
      
      deliverydays <- delivered_df %>%
        filter(order_delivered_customer_date <= input$dateRange[2] & order_delivered_customer_date >= input$dateRange[1])
      
      #to create the histogram
        p <- ggplot(deliverydays, aes(x = Days_to_Delivery)) +
        geom_histogram(binwidth = input$bins)+
        xlim(0,210) +         #limit set to longest delivery time with buffer
        labs(x = "Days to Delivery", y = "# of Orders" ) +
            theme(plot.background = element_rect(fill = "ghostwhite"),
                  panel.background = element_rect(fill = "ghostwhite"))
      
      # plot histogram using plotly
      ggplotly(p)

4.9 Combining all parts to form the final code

Although the preparation of the visualization was done all at one go, I had split up the code different features above for easier and clearer explanation. In actuality, the entire code for the dashboard should be a combined version of all the smaller parts above and will be as follows:

packages = c('shiny','shinydashboard','tidyverse','dplyr', 'magrittr', 'plotly', 'ggplot2', 'scales', 'DT')

for (p in packages){
  if (!require(p,character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}

combined_df <- read_csv('Combined Dataset_v2.csv')

library(shiny)
library(shinydashboard)

ui <- dashboardPage(
  dashboardHeader(title = "Olist Performance"),
  
  dashboardSidebar(
      dateRangeInput('dateRange',
      label = 'Date Range',
      start = as.Date('2016-10-11'), end = as.Date('2018-09-03')),
      radioButtons("GraphType", "GMV Graph Type:",
                   c("Cumulative" = "cum",
                     "Daily" = "day",
                     "Logarithmic" = "Log_Graph")
                   ),
      sliderInput(inputId = "bins",
        label = "Bin Size of Histogram:",
            min = 1,
            max = 15,
            value = 5)
  ),
  
  dashboardBody(
    fluidRow(
      valueBoxOutput("GMV_fig"),
      valueBoxOutput("AOV_fig"),
      valueBoxOutput("A30_user")
      ),
  
    fluidRow(
        box(
         title = "Gross Merchandise Value (GMV)", background = "black", solidHeader = TRUE, width = 12, collapsible = TRUE, collapsed = TRUE,
         plotlyOutput("GMV_Plot", height = 200))
    ),
    fluidRow(
        box(
          title = "GMV by Categories", background = "light-blue", solidHeader = TRUE, width = 6, collapsible = TRUE, collapsed = TRUE,
          column(12, (DT::dataTableOutput("topCat")))
          ),
        box(
          title = "Days to Delivery", background = "orange", solidHeader = TRUE, width = 6, collapsible = TRUE, collapsed = TRUE,
          plotlyOutput("deliveryTime")
        )
    )
  )
  )

server <- function(input, output){


delivered_df <- combined_df %>%
  filter(order_status == "delivered")

  
  output$GMV_fig <- renderValueBox({
    deliveredToDate <- delivered_df %>%
      filter(order_delivered_customer_date <= input$dateRange[2] & order_delivered_customer_date >= input$dateRange[1])
    valueBox(
      paste0("$", prettyNum(ceiling(sum(deliveredToDate$price)/1000), big.mark = ",")),
      "Total GMV ('000s)",
      color = "yellow",
      icon = icon("dollar-sign"))
  })
    
  output$AOV_fig <- renderValueBox({
    deliveredToDate <- delivered_df %>%
      filter(order_delivered_customer_date <= input$dateRange[2] & order_delivered_customer_date >= input$dateRange[1])
    valueBox(
      paste0("$", round(sum(deliveredToDate$price)/length(unique(deliveredToDate$order_id)),digits = 2)),
      "Average Order Value",
      color = "olive",
      icon = icon("shopping-basket")
    )
    
  })
  
    output$A30_user <- renderValueBox({
    UserToDate <- combined_df %>%
      filter(order_purchase_timestamp <= input$dateRange[2] & order_purchase_timestamp >= (input$dateRange[2] - 30))
    valueBox(
      prettyNum(length(unique(UserToDate$customer_id)),big.mark = ","),
      "A30 Users",
      color = "blue",
      icon = icon("user-alt")
    )
    
  })
    
    
    output$GMV_Plot <- renderPlotly({
      
      dailyGMV <- delivered_df %>%
        filter(order_delivered_customer_date <= input$dateRange[2] & order_delivered_customer_date >= input$dateRange[1]) %>%
        group_by(order_delivered_customer_date) %>%
        summarize(price_sum = sum(price))
      
      
      if (input$GraphType %in% c("cum","Log_Graph")) {
        GMV <- cumsum(dailyGMV$price_sum)/1000
      }
      else if (input$GraphType == "day") {
        GMV <- (dailyGMV$price_sum)/1000
      }
      
      p <- ggplot(dailyGMV, aes(x= order_delivered_customer_date, y= GMV, group=1,
                                text = paste('Date : ', as.Date(order_delivered_customer_date),
                                             "<br>GMV: $ ", prettyNum(GMV * 1000, big.mark = ",")))) + 
        
            geom_line(color = "dodgerblue4") +
            (if (input$GraphType == "Log_Graph") { 
              scale_y_log10()}
            )+
            labs(x = "Date", y = "GMV ($'000)" ) +
            theme(plot.background = element_rect(fill = "ghostwhite"),
                  panel.background = element_rect(fill = "ghostwhite"))
      
      ggplotly(p, tooltip = c("text"))

    })
    
    
    
    output$topCat <- DT::renderDataTable(
      
      (dailyGMVbyCat <- delivered_df %>%
        filter(order_delivered_customer_date <= input$dateRange[2] & order_delivered_customer_date >= input$dateRange[1]) %>%
        group_by("Category" = product_class) %>%
        summarize("GMV ($ '000s)" = ceiling(sum(price)/1000))),
      rownames = FALSE,
      options = list(lengthMenu = c(5,10), pageLength =5)
    )
    
    output$deliveryTime <- renderPlotly({
      
      deliverydays <- delivered_df %>%
        filter(order_delivered_customer_date <= input$dateRange[2] & order_delivered_customer_date >= input$dateRange[1])
      
        p <- ggplot(deliverydays, aes(x = Days_to_Delivery)) +
        geom_histogram(binwidth = input$bins)+
        xlim(0,210) + 
        labs(x = "Days to Delivery", y = "# of Orders" ) +
            theme(plot.background = element_rect(fill = "ghostwhite"),
                  panel.background = element_rect(fill = "ghostwhite"))
      
      ggplotly(p)
      
    })
                         
      
  
  
}

shinyApp(ui, server)

5.0 Final Visualization

You can find the published version of the dashboard at : https://shauncwh.shinyapps.io/Assignment5Dashboard/

An additional screen shot of the dashboard is as follows:

Alternatively, you may allow for the dashboard to be run in RStudio by allowing for runtime: Shiny located in the YAML documentation at the top of this markdown document

Shiny applications not supported in static R Markdown documents



5.1 Description of Visualization and Insights

The Visualization dashboard offers insights into the health and performance of the e-commerce business from differing lenses for different uses with different priorities and responsibilities. Examples of insights and their uses include:

  • The GMV of the business has increased rapidly over the years with a cumulative growth of $13.2mn between October’16 and September’18. However, looking at the logarithmic scale, we observe that the growth was the fastest in the initial stages (oct’16), with a slight boost in Jan’17. However, the rate of growth has since slowed down. Although this is common when businesses grow, the business should still take note and try to continuously increase its conversion and sales.

  • Home/Household Products are the best-selling products (in terms of GMV) for this e-commerce site. Hence, the business should focus on further growing and milking the sales of products from this category. One such way would be to increase the diversity of home/household products and sellers that are focused on this category. Engaging in any promotion in price might not be useful as this is already a mature category where willing users are less sensitive to price.

  • On the other hand, categories such as art, security and festivals seem to lag behind in sales. Should the ecommerce want to try and balance their sales portfolio more, they could look at marketing strategies targeted at boosting sales in these categories. For example, they could look at cross selling or promotions such as free shipping vouchers which, according to industry knowledge, is the most attractive offer to users.

  • Most orders are fulfilled within a week to 2 weeks which is quite a fair delivery time for ecommerce orders. However, it should also be noted that some orders take up to or more than a month to be fulfilled. These would negatively affect the user experience and efforts should be carried out to improve this.

These are just a few of the insights that can be obtained from the dashboard. The dashboard is dynamic which allows for more views and tabs to be added if necessary to serve a wider audience.