Lately I have been looking at the Kaggle Retail Data Analytics set. Businesses large and small have a need to make sense of their data.

Dashboards are popular tools for visualizing business data. Using R’s Shiny package we can make a dashboard of a sorts–though one where we don’t need to worry about keeping the data up to date. As it happens, this is a highly effective way to visualize the Retail Analytics dataset.

The goal here is to compare the performance of the 45 stores in the dataset.



Take a look at the finished application here: https://connorh982.shinyapps.io/Retail_Compare_Final/

Full Code on github



Code behind the app

Shiny apps have two components: a ui and a server function.

The ui governs the layout of what the user sees, buttons they can click, and plots they are presented with.

The server meanwhile is what runs the R code to generate plots, tables etc.

The two communicate using two lists:

Here is the barebones of the app being used here, with the server code removed for conciseness (this code is described in the next section). We load in the librarys used, the data, then we set up the ui and server sections. Finally we run the app.

The ui code is governing the layout, in this case organizing the visuals into tabs.

library(shiny)
library(ggplot2)
library(data.table)
library(scales)
library(magrittr)
library(DT)
library(tidyr)


sales <- read.csv("https://raw.githubusercontent.com/connorH982/projects/master/sales%20data-set.csv")
sales$Store<-as.factor(sales$Store)
sales$Date <- sales$Date %>% as.Date("%d/%m/%Y")
storesDat <- read.csv("https://raw.githubusercontent.com/connorH982/projects/master/stores%20data-set.csv")
storesDat$Store<-as.factor(storesDat$Store)

ui <- fluidPage(
  
  tabsetPanel(
    tabPanel("Weekly sales over time",
             sidebarLayout(position = "left",
                           
                           sidebarPanel(
                             selectInput(inputId = "selection", 
                                         label = "Select a Store's Sales to visualize", choices = c(1:length(unique(sales$Store)),"All"))
                           ),
                           mainPanel(
                             
                             plotOutput("sales")
                             
                           )
             )
    ),
    tabPanel("Total yearly sales by store",
             p("Table is sorted by 2012 (descending total sales)"),
             dataTableOutput("YearlySales")
    ),
    tabPanel("Store size and type",
             plotOutput("characteristics"))
  )
  
)

server <- function(input, output, session) {
  output$sales <- renderPlot({
    if(input$selection=="All"){
      # If all stores are selected
      # "sales" plot
    }
    else{
      # If individual stores are selected
      # "sales" plot
    }
  })
  
  output$YearlySales <-DT::renderDataTable({
    # "YearlySales" plot
  })
  
  output$characteristics <- renderPlot({
    # "characteristics" plot
  })
}

shinyApp(ui, server)

Code behind the visualizations

The data we are using comes from two sets: “stores data-set.csv”, “sales data-set.csv”

The two sets tell us the following. We have 45 separate stores, of varying sizes and types (stores data-set.csv). We also know their weekly sales over 3 years.

First prepare the data, then on to the visualizations…

    library(shiny)
    library(ggplot2)
    library(data.table)
    library(scales)
    library(magrittr)
    library(DT)
    library(tidyr)
    sales <- read.csv("https://raw.githubusercontent.com/connorH982/projects/master/sales%20data-set.csv")
    sales$Store<-as.factor(sales$Store)
    sales$Date <- sales$Date %>% as.Date("%d/%m/%Y")
    storesDat <- read.csv("https://raw.githubusercontent.com/connorH982/projects/master/stores%20data-set.csv")
    storesDat$Store<-as.factor(storesDat$Store)

‘Weekly sales over time’

if(input$selection=="All"){
  
      ggplot(data = sales,aes(x=sales$Date,y=sales$Weekly_Sales)) +
      geom_smooth(color = "dodgerblue4", aes(group = as.factor(sales$Store)), size = 0.1)+
      xlab("Date")+ylab("Weekly Sales")+ggtitle("Weekly Sales for all Stores (45)")+scale_x_date(
      labels =     labelsdate_format("%Y")) + geom_smooth(color = "dark red", size =1,
      aes(x=sales$Date,y=sales$Weekly_Sales,group = NULL))
}

    else{
      
      subData <- sales[sales$Store==input$selection,]
      
      ggplot(data = subData,aes(x=subData$Date,y=subData$Weekly_Sales))+ 
      geom_smooth(color = "dark red",     size=0.1) +
      geom_point(color = "dodgerblue4", size = 0.2, shape = 18)+
      xlab("Date")+ylab("Weekly Sales")+ggtitle(paste("Weekly Sales for Store", ... input$selection))+
      scale_x_date(labels = date_format("%Y"))
    }

This code block is dependent on user input, where they can choose to plot weekly sales over time for one or all stores. If they choose to visualize one store, then a the data is subset by that store. A scatter plot (geom_point) is generated that includes a smoothed conditional means line to show overall behavior (geom_smooth).

If “All” is selected by the user, then smoothed lines are generated showing the conditional means of the individual stores in blue (with a 95% prediciton interval). A red line showing the overall behavior of all stores is also generated.

‘Total yearly sales by store’

    ##Prepare data, cut data by year, convert to wide format while summing up yearly revenue

    SaleDT <-aggregate(sales$Weekly_Sales, by = list(sales$Store,cut.Date(sales$Date, 
    x = "years")),sum)%>%spread(2,3)
    colnames(SaleDT) <- c("Store","2010","2011","2012")
    Type <-storesDat[,-3]
    SaleDT <- merge(Type,SaleDT, by = "Store")
    rownames(SaleDT) <- paste("Store",rownames(SaleDT))
    SaleDT <- SaleDT[order(SaleDT$`2012`,decreasing = TRUE),-1]
    SaleDT$Type<-as.factor(SaleDT$Type)
    
    ##Display using Datatable
    
    #Set up colors used in the heatmap
    brks <- quantile(SaleDT[,-1], probs = seq(.05, .95, .05), na.rm = TRUE)
    clrs <- round(seq(255, 150, length.out = length(brks) + 1), 0) %>%
    {paste0("rgb(", ., ",", ., ",255)")}
    
    #Set up the datatable
    datatable(SaleDT) %>% formatCurrency(2:4) %>% formatStyle(c("2010","2011","2012"), 
    backgroundColor =styleInterval(brks, clrs),backgroundPosition = 'center')%>%
    formatStyle("Type",backgroundColor = styleEqual(levels(SaleDT$Type),c("tomato","seagreen","goldenrod")))

Arguably the most interesting plot though also the most complicated to implement. This code block makes extensive use of two packages: tidyvr and DT.

Here is what the data we are interested in currently looks like:

Store Date Weekly_Sales
1 05/02/2010 24924.50
1 12/02/2010 46039.49
1 19/02/2010 41595.55
1 26/02/2010 19403.54
1 05/03/2010 21827.90

What we want is to show yearly sales by each store. First prepare the data using tidyvr’s spread function and base R’s aggreate function. We aggregate the data by slicing it into sections by year, which gives us the following columns: Store, Year, and sum of sales given year and store.

This is “long” format data, so for presentation we spread the data using tidyvr by year and yearly revenue. We are left with Stores names as “rows” and 3 new columns for years, with the cells containing the yearly revenue. We also perform a few simple steps the make the data more presentable: merging the store “Types” from the “stores data-set”, removing Stores as a column and instead renaming the rows “Store 1”,“Store 2”, etc.

    SaleDT <-aggregate(sales$Weekly_Sales, by = list(sales$Store,cut.Date(sales$Date, 
    x = "years")),sum)%>%spread(2,3)

The second code section is all focused on making the data more presentable using the DT package, which is designed to implement datatables in Shiny. First set up breaks and colors for a heatmap (I went with a light blue theme). Second we set up the visuals:

  • convert SaleDT to a datatable
  • format the yearly revenue as currency
  • For year columns: set up the “heatmap” based on the breaks and colors set up previously
  • For the Type column: color these based on values used in the next visual for consistency
    ##Display using Datatable
    
    #Set up colors used in the heatmap
    brks <- quantile(SaleDT[,-1], probs = seq(.05, .95, .05), na.rm = TRUE)
    clrs <- round(seq(255, 150, length.out = length(brks) + 1), 0) %>%
    {paste0("rgb(", ., ",", ., ",255)")}
    
    #Set up the datatable
    datatable(SaleDT) %>% formatCurrency(2:4) %>% formatStyle(c("2010","2011","2012"), 
    backgroundColor =styleInterval(brks, clrs),backgroundPosition = 'center')%>%
    formatStyle("Type",backgroundColor = styleEqual(levels(SaleDT$Type),c("tomato","seagreen","goldenrod")))

‘Store size and type’

We set up a barplot that gives us the size of each of the 45 stores colored by their type. This is nothing particularly complicated compared to the other visuals, though we can immediately see a pattern in the data that combines with the data table in the previous tab.

  ggplot(data = storesDat)+geom_bar(aes(x = storesDat$Store,y = storesDat$Size, fill = storesDat$Type),
  stat = "identity")+xlab("Store #")+ylab("Store Size")+guides(fill = guide_legend(title = "Store Type"))+
  scale_fill_manual(values = c("tomato","seagreen","goldenrod"))