The visualization below will be of an interactive dashboard for an e-commerce business. The main objective is to provide a snapshot view of the business whilst allowing for the user to carry out some analysis made easier by visualizations. The dashboard will focus on some key statistics that are important for the analysis of business performance Of which, the insights can be used for strategy planning, marketing purposes or just general monitoring and understanding of the health of business.
The source of the data was obtained from from Kaggle and was provided by Olist - a brazilian e-commerce company. The data contains ecommerce data of orders amde at the Olist Store and contains samples of over 100k orders from 2016 - 2018 made at multiple marketplaces in Brazil.
2.1 Original Data Set does not allow easy analysis
The original data set is in a form that does not allow for easy analysis. It is a huge and fragmented dataset which has many data points that are not categories or cleaned. As a result, it was confusing and difficult to understand. To solve this, the data has to be first transformed and processed so that proper visualization can be carried out. Once we are able to create meaningful visualization plots, it will be easier to create visualizations that can help in the understanding
2.2 No view of overall performance and performance over time
The main issue of the data set is that it does not provide any valuable insights as the data is static and contains merely order information. It does not provide any indictors on the health and performance of the ecommerce business, nor is there any vision on the performance over time.
Hence, the key purpose of the dashboard is to resolve the issues mentioned above. The dasboard will contain interactive charts that will allow users to adjust the data based on different timeframes, and zoom in to specific periods if need be.
It will also includes key statistic that provides a quick view of health and performance of the business.These statistics include:
In addition, the dashboard will also showcase different visualizations such as:
The proposed design that will improve the above conditions is as follows:
Load the following R packages:
packages = c('shiny','shinydashboard','tidyverse','dplyr', 'magrittr', 'plotly', 'ggplot2', 'scales', 'DT')
for (p in packages){
if (!require(p,character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
The data is divided into multiple datasets with related foreign and primary keys as per the schema above. To better create a dashboard that contains only the necessary information required as per the sketch, a consolidated excel file was created containing the customer, order and product focused data and will serve as the primary data for this visualization.
Additionally, data wrangling was carried out to include additional data points which would be useful for the visualization:
The final data set will contain the following columns:
## [1] "order_id" "order_item_id"
## [3] "product_id" "product_category_name"
## [5] "product_class" "order_status"
## [7] "order_purchase_timestamp" "order_approved_at"
## [9] "order_delivered_customer_date" "order_estimated_delivery_date"
## [11] "Days_to_Delivery" "seller_id"
## [13] "shipping_limit_date" "price"
## [15] "freight_value" "customer_id"
## [17] "customer_unique_id" "customer_city"
## [19] "customer_state" "customer_zip_code_prefix"
## [21] "geolocation_lat" "geolocation_lng"
The dataset is then loaded and initialized
combined_df <- read_csv('Combined Dataset_v2.csv')
A shiny dashboard minmally has three parts: a header, sidebar and a body. To create a blank dashboard, the following code is used to get the UI and server set up similar to a normal shiny app. In addition, a title can be added for the dashboard
ui <- dashboardPage(
dashboardHeader(title = "Olist Performance"),
dashboardSidebar(),
dashboardBody()
)
server <- function(input, output) {}
shinyApp(ui, server)
The first part of the dashboard will showcase 3 key statistics that can give a flavor of the business health and performance that is doing in a single glance. The 3 key statistics are:
The 3 key stats will be placed into valueBoxes which will then be placed into a fluidRow. We use valueBoxOutput as it will allow for the figures to dynamically change according to the date range selected.
On the server side, we will also create new dataframes which will allow for the filtering of data within the date range respectively.
The following code will be used for the UI and server section respectively.
## For UI:
dashboardBody(
fluidRow(
valueBoxOutput("GMV_fig"), ## valuebox for GMV
valueBoxOutput("AOV_fig"), ## valuebox For AOV
valueBoxOutput("A30_user") ## valuebox For A30 user
),
## for server:
delivered_df <- combined_df %>%
filter(order_status == "delivered") ## filter for delivered orders
output$GMV_fig <- renderValueBox({ ## output for GMV
deliveredToDate <- delivered_df %>%
filter(order_delivered_customer_date <= input$dateRange[2] & order_delivered_customer_date >= input$dateRange[1]) ## filter for date range
valueBox(
paste0("$", prettyNum(ceiling(sum(deliveredToDate$price)/1000), big.mark = ",")),
"Total GMV ('000s)",
color = "yellow",
icon = icon("dollar-sign"))
})
output$AOV_fig <- renderValueBox({ ## output for AOV
deliveredToDate <- delivered_df %>%
filter(order_delivered_customer_date <= input$dateRange[2] & order_delivered_customer_date >= input$dateRange[1]) ## filter for date range
valueBox(
paste0("$", round(sum(deliveredToDate$price)/length(unique(deliveredToDate$order_id)),digits = 2)),
"Average Order Value",
color = "olive",
icon = icon("shopping-basket")
)
})
output$A30_user <- renderValueBox({ ## output for A30
UserToDate <- combined_df %>%
filter(order_purchase_timestamp <= input$dateRange[2] & order_purchase_timestamp >= (input$dateRange[2] - 30)) ## filter for date range
valueBox(
prettyNum(length(unique(UserToDate$customer_id)),big.mark = ","),
"A30 Users",
color = "blue",
icon = icon("user-alt")
)
})
Aside from the key stats, a time series line chart is also added for visualization of GMV over time. To give different views for more insights, the user will be able to change the graph properties using the radio buttons included in the sidebar. The 3 views are:
For this chart, we will place it into a box and place it into a fluid row. The width will be set to 12 to fill up an entire row in the body.The box will also be collapsible, and by default will be collapsed to make the dashboard neater.
In addition plotly will be used in this visualization to allow more interactivity for users. A plotly render will allow users to zoom in on certain parts of the graph among other actions.
To carry out this visualization, a new dataframe will be created summarizing the total GMV by date. Additionally, it will also allow for a filtering of dateRange.
The following code will be used for the UI and server section respectively:
##For UI:
fluidRow(
box(
title = "Gross Merchandise Value (GMV)", background = "black", solidHeader = TRUE, width = 12, collapsible = TRUE, collapsed = TRUE,
plotlyOutput("GMV_Plot", height = 200))
)
##For Server:
output$GMV_Plot <- renderPlotly({
dailyGMV <- delivered_df %>%
filter(order_delivered_customer_date <= input$dateRange[2] & order_delivered_customer_date >= input$dateRange[1]) %>% ## to filter by date
group_by(order_delivered_customer_date) %>%
summarize(price_sum = sum(price)) ## Group and summarize GMV by date
## filters for graph properties
if (input$GraphType %in% c("cum","Log_Graph")) {
GMV <- cumsum(dailyGMV$price_sum)/1000
}
else if (input$GraphType == "day") {
GMV <- (dailyGMV$price_sum)/1000
}
## creating the initial plot
p <- ggplot(dailyGMV, aes(x= order_delivered_customer_date, y= GMV, group=1,
text = paste('Date : ', as.Date(order_delivered_customer_date),
"<br>GMV: $ ", prettyNum(GMV * 1000, big.mark = ",")))) + ## to adjust tooltip labels
geom_line(color = "dodgerblue4") +
##For log scale
(if (input$GraphType == "Log_Graph") {
scale_y_log10()}
)+
labs(x = "Date", y = "GMV ($'000)" ) + ##adjusting the axis titles
theme(plot.background = element_rect(fill = "ghostwhite"),
panel.background = element_rect(fill = "ghostwhite")) ## changing the color of the plot
ggplotly(p, tooltip = c("text")) ##plotting the graph using plotly and chaiging the tooltip
})
A data table providing a breakdown of GMV by Category will also be added. The data table will be added to a box that will be inserted into a fluid row beneath the line chart with a column inside containing the data table. as it will share the same row as the next chart, the width is limited to 6. The box will likewise be collapsable and collapsed by default for neatness.
By using the DT library to visualize this table, it will give users the interactivity to sort the data according to his/her preferences. It will also allow for the user to search for a specific category with the search bar, and will also allow user to adjust the number of results displayed.
This data table will allow users to get better insights into how each class is performing by GMV and if any categories might require more marketing and promotional efforts.
To carry out this visualization, a new dataframe will be created summarizing the total GMV by product category classification. Additionally, it will also allow for a filtering of dateRange.
The following code will be used for the UI and server section respectively:
## For UI
fluidRow(
box(
title = "GMV by Categories", background = "light-blue", solidHeader = TRUE, width = 6, collapsible = TRUE, collapsed = TRUE,
column(12, (DT::dataTableOutput("topCat")))
),
## For server
output$topCat <- DT::renderDataTable(
(dailyGMVbyCat <- delivered_df %>%
filter(order_delivered_customer_date <= input$dateRange[2] & order_delivered_customer_date >= input$dateRange[1]) %>% ##allow for filtering of dates
group_by("Category" = product_class) %>%
summarize("GMV ($ '000s)" = ceiling(sum(price)/1000))), ## summarize GMV by product category classification
rownames = FALSE, ##remove index
options = list(lengthMenu = c(5,10), pageLength =5) #let user choose between 5 or 10 entries per page. Default is at 5 for neatness
)
Another important performance measure to look at will be the time taken for an order delivery to be completed from when the user places his/her order.
To do so, we will include a histogram as a data visualization of the information above.
For this chart, we will place it into a box and place it into the same fluid row as the data table above. The width will be set to 6 to fill up the rest of the row in the body. The box will also be collapsible, and by default will be collapsed to make the dashboard neater.
The visualization will also allow for the selection of bin sizes for the user to adjust the granuality for which he/she would like to carry out analysis with. For example, by setting the bin size to 50, you can easily see how many orders take less than or more than 50 days to be delivered.
In addition plotly will be used in this visualization to allow more interactivity for users. A plotly render will allow users to zoom in on certain parts of the graph among other actions.
To carry out this visualization, a new dataframe will be created to also allow for a filtering of dateRange.
The following code will be used for the UI and server section respectively:
# For UI:
box(
title = "Days to Delivery", background = "orange", solidHeader = TRUE, width = 6, collapsible = TRUE, collapsed = TRUE,
plotlyOutput("deliveryTime")
)
# For Server:
output$deliveryTime <- renderPlotly({
deliverydays <- delivered_df %>%
filter(order_delivered_customer_date <= input$dateRange[2] & order_delivered_customer_date >= input$dateRange[1])
#to create the histogram
p <- ggplot(deliverydays, aes(x = Days_to_Delivery)) +
geom_histogram(binwidth = input$bins)+
xlim(0,210) + #limit set to longest delivery time with buffer
labs(x = "Days to Delivery", y = "# of Orders" ) +
theme(plot.background = element_rect(fill = "ghostwhite"),
panel.background = element_rect(fill = "ghostwhite"))
# plot histogram using plotly
ggplotly(p)
Although the preparation of the visualization was done all at one go, I had split up the code different features above for easier and clearer explanation. In actuality, the entire code for the dashboard should be a combined version of all the smaller parts above and will be as follows:
packages = c('shiny','shinydashboard','tidyverse','dplyr', 'magrittr', 'plotly', 'ggplot2', 'scales', 'DT')
for (p in packages){
if (!require(p,character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
combined_df <- read_csv('Combined Dataset_v2.csv')
library(shiny)
library(shinydashboard)
ui <- dashboardPage(
dashboardHeader(title = "Olist Performance"),
dashboardSidebar(
dateRangeInput('dateRange',
label = 'Date Range',
start = as.Date('2016-10-11'), end = as.Date('2018-09-03')),
radioButtons("GraphType", "GMV Graph Type:",
c("Cumulative" = "cum",
"Daily" = "day",
"Logarithmic" = "Log_Graph")
),
sliderInput(inputId = "bins",
label = "Bin Size of Histogram:",
min = 1,
max = 15,
value = 5)
),
dashboardBody(
fluidRow(
valueBoxOutput("GMV_fig"),
valueBoxOutput("AOV_fig"),
valueBoxOutput("A30_user")
),
fluidRow(
box(
title = "Gross Merchandise Value (GMV)", background = "black", solidHeader = TRUE, width = 12, collapsible = TRUE, collapsed = TRUE,
plotlyOutput("GMV_Plot", height = 200))
),
fluidRow(
box(
title = "GMV by Categories", background = "light-blue", solidHeader = TRUE, width = 6, collapsible = TRUE, collapsed = TRUE,
column(12, (DT::dataTableOutput("topCat")))
),
box(
title = "Days to Delivery", background = "orange", solidHeader = TRUE, width = 6, collapsible = TRUE, collapsed = TRUE,
plotlyOutput("deliveryTime")
)
)
)
)
server <- function(input, output){
delivered_df <- combined_df %>%
filter(order_status == "delivered")
output$GMV_fig <- renderValueBox({
deliveredToDate <- delivered_df %>%
filter(order_delivered_customer_date <= input$dateRange[2] & order_delivered_customer_date >= input$dateRange[1])
valueBox(
paste0("$", prettyNum(ceiling(sum(deliveredToDate$price)/1000), big.mark = ",")),
"Total GMV ('000s)",
color = "yellow",
icon = icon("dollar-sign"))
})
output$AOV_fig <- renderValueBox({
deliveredToDate <- delivered_df %>%
filter(order_delivered_customer_date <= input$dateRange[2] & order_delivered_customer_date >= input$dateRange[1])
valueBox(
paste0("$", round(sum(deliveredToDate$price)/length(unique(deliveredToDate$order_id)),digits = 2)),
"Average Order Value",
color = "olive",
icon = icon("shopping-basket")
)
})
output$A30_user <- renderValueBox({
UserToDate <- combined_df %>%
filter(order_purchase_timestamp <= input$dateRange[2] & order_purchase_timestamp >= (input$dateRange[2] - 30))
valueBox(
prettyNum(length(unique(UserToDate$customer_id)),big.mark = ","),
"A30 Users",
color = "blue",
icon = icon("user-alt")
)
})
output$GMV_Plot <- renderPlotly({
dailyGMV <- delivered_df %>%
filter(order_delivered_customer_date <= input$dateRange[2] & order_delivered_customer_date >= input$dateRange[1]) %>%
group_by(order_delivered_customer_date) %>%
summarize(price_sum = sum(price))
if (input$GraphType %in% c("cum","Log_Graph")) {
GMV <- cumsum(dailyGMV$price_sum)/1000
}
else if (input$GraphType == "day") {
GMV <- (dailyGMV$price_sum)/1000
}
p <- ggplot(dailyGMV, aes(x= order_delivered_customer_date, y= GMV, group=1,
text = paste('Date : ', as.Date(order_delivered_customer_date),
"<br>GMV: $ ", prettyNum(GMV * 1000, big.mark = ",")))) +
geom_line(color = "dodgerblue4") +
(if (input$GraphType == "Log_Graph") {
scale_y_log10()}
)+
labs(x = "Date", y = "GMV ($'000)" ) +
theme(plot.background = element_rect(fill = "ghostwhite"),
panel.background = element_rect(fill = "ghostwhite"))
ggplotly(p, tooltip = c("text"))
})
output$topCat <- DT::renderDataTable(
(dailyGMVbyCat <- delivered_df %>%
filter(order_delivered_customer_date <= input$dateRange[2] & order_delivered_customer_date >= input$dateRange[1]) %>%
group_by("Category" = product_class) %>%
summarize("GMV ($ '000s)" = ceiling(sum(price)/1000))),
rownames = FALSE,
options = list(lengthMenu = c(5,10), pageLength =5)
)
output$deliveryTime <- renderPlotly({
deliverydays <- delivered_df %>%
filter(order_delivered_customer_date <= input$dateRange[2] & order_delivered_customer_date >= input$dateRange[1])
p <- ggplot(deliverydays, aes(x = Days_to_Delivery)) +
geom_histogram(binwidth = input$bins)+
xlim(0,210) +
labs(x = "Days to Delivery", y = "# of Orders" ) +
theme(plot.background = element_rect(fill = "ghostwhite"),
panel.background = element_rect(fill = "ghostwhite"))
ggplotly(p)
})
}
shinyApp(ui, server)
You can find the published version of the dashboard at : https://shauncwh.shinyapps.io/Assignment5Dashboard/
An additional screen shot of the dashboard is as follows:
Alternatively, you may allow for the dashboard to be run in RStudio by allowing for runtime: Shiny located in the YAML documentation at the top of this markdown document
The Visualization dashboard offers insights into the health and performance of the e-commerce business from differing lenses for different uses with different priorities and responsibilities. Examples of insights and their uses include:
The GMV of the business has increased rapidly over the years with a cumulative growth of $13.2mn between October’16 and September’18. However, looking at the logarithmic scale, we observe that the growth was the fastest in the initial stages (oct’16), with a slight boost in Jan’17. However, the rate of growth has since slowed down. Although this is common when businesses grow, the business should still take note and try to continuously increase its conversion and sales.
Home/Household Products are the best-selling products (in terms of GMV) for this e-commerce site. Hence, the business should focus on further growing and milking the sales of products from this category. One such way would be to increase the diversity of home/household products and sellers that are focused on this category. Engaging in any promotion in price might not be useful as this is already a mature category where willing users are less sensitive to price.
On the other hand, categories such as art, security and festivals seem to lag behind in sales. Should the ecommerce want to try and balance their sales portfolio more, they could look at marketing strategies targeted at boosting sales in these categories. For example, they could look at cross selling or promotions such as free shipping vouchers which, according to industry knowledge, is the most attractive offer to users.
Most orders are fulfilled within a week to 2 weeks which is quite a fair delivery time for ecommerce orders. However, it should also be noted that some orders take up to or more than a month to be fulfilled. These would negatively affect the user experience and efforts should be carried out to improve this.
These are just a few of the insights that can be obtained from the dashboard. The dashboard is dynamic which allows for more views and tabs to be added if necessary to serve a wider audience.