10 December 2017

Analytical Goal

This presentation explains the idea behind the PredictingSales interactive document, which contains the Shiny app.

To build a linear regression model, we need to know how each variable in the data set relates to the response variable: is the relationship linear, is it non-linear, or are the two uncorrelated?

Hence the need for a simple Shiny app in which the user can select the variable to regress the response variable on, see the data visualization, and evaluate the model.
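The idea above can be sketched as a minimal Shiny app. This is only an illustration under stated assumptions, not the PredictingSales app itself: the data frame `orders` is simulated stand-in data (only the column names are taken from this presentation), and `fit_one` is a hypothetical helper name.

```r
library(shiny)

# Toy stand-in data (simulated, NOT the real data set).
# Only the column names follow the ones used in this presentation.
set.seed(1)
n <- 60
orders <- data.frame(
  Non.urgent.order = runif(n, 40, 440),
  Urgent.order     = runif(n, 75, 225)
)
orders$Target.ttl.order <- orders$Non.urgent.order + orders$Urgent.order +
  rnorm(n, sd = 10)

response   <- "Target.ttl.order"
predictors <- setdiff(names(orders), response)

# Hypothetical helper: simple linear regression of the response
# on one chosen predictor.
fit_one <- function(data, predictor, response = "Target.ttl.order") {
  lm(reformulate(predictor, response = response), data = data)
}

ui <- fluidPage(
  titlePanel("Single-predictor regression"),
  sidebarLayout(
    sidebarPanel(
      selectInput("predictor", "Predictor:", choices = predictors)
    ),
    mainPanel(
      plotOutput("scatter"),
      verbatimTextOutput("model_summary")
    )
  )
)

server <- function(input, output) {
  # Refit the model whenever the user picks another predictor.
  model <- reactive(fit_one(orders, input$predictor))

  output$scatter <- renderPlot({
    plot(orders[[input$predictor]], orders[[response]],
         xlab = input$predictor, ylab = response)
    abline(model(), col = "blue")  # fitted regression line
  })

  output$model_summary <- renderPrint(summary(model()))
}

app <- shinyApp(ui, server)
# Launch interactively with: runApp(app)
```

Keeping the model fit in a plain function outside the server makes it easy to check the regression on its own, without starting the app.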

Data Source

The data for this Shiny app demo come from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/machine-learning-databases/00409/).

The data come from a real Brazilian logistics company. The data set covers 60 days and comprises various types of orders and other entities that make up the business.
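A possible way to load a local copy of the data is sketched below. The helper `read_orders`, the file path, and the `;` delimiter are assumptions, not details given in this presentation; the column names are the ones that appear in the output shown in this document.

```r
# Column names as they appear in the output shown in this document.
order_cols <- c("Week", "Day", "Non.urgent.order", "Urgent.order",
                "Order.type.A", "Order.type.B", "Order.type.C",
                "Fiscal.orders", "Traffic.contrl.orders",
                "BnkOrder1", "BnkOrder2", "BnkOrder3", "Target.ttl.order")

# Hypothetical helper: read a local copy of the data file and apply
# the short column names above.
# `sep = ";"` is an assumption about the file's delimiter; adjust if needed.
read_orders <- function(path, sep = ";") {
  read.csv(path, sep = sep, header = TRUE, col.names = order_cols)
}

# Usage (the path is hypothetical):
# orders <- read_orders("orders.csv")
# head(orders, 2)
# summary(orders)
```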

A glimpse of the data:

##   Week Day Non.urgent.order Urgent.order Order.type.A Order.type.B
## 1    1   4          316.307      223.270       61.543      175.586
## 2    1   5          128.633       96.042       38.058       56.037
##   Order.type.C Fiscal.orders Traffic.contrl.orders BnkOrder1 BnkOrder2
## 1      302.448             0                 65556     44914    188411
## 2      130.580             0                 40419     21399     89461
##   BnkOrder3 Target.ttl.order
## 1     14793          539.577
## 2      7679          224.675

Data Summary

##       Week            Day        Non.urgent.order  Urgent.order   
##  Min.   :1.000   Min.   :2.000   Min.   : 43.65   Min.   : 77.37  
##  1st Qu.:2.000   1st Qu.:3.000   1st Qu.:125.35   1st Qu.:100.89  
##  Median :3.000   Median :4.000   Median :151.06   Median :113.11  
##  Mean   :3.017   Mean   :4.033   Mean   :172.55   Mean   :118.92  
##  3rd Qu.:4.000   3rd Qu.:5.000   3rd Qu.:194.61   3rd Qu.:132.11  
##  Max.   :5.000   Max.   :6.000   Max.   :435.30   Max.   :223.27  
##   Order.type.A     Order.type.B     Order.type.C    Fiscal.orders    
##  Min.   : 21.83   Min.   : 25.12   Min.   : 74.37   Min.   :  0.000  
##  1st Qu.: 39.46   1st Qu.: 74.92   1st Qu.:113.63   1st Qu.:  1.243  
##  Median : 47.17   Median : 99.48   Median :127.99   Median :  7.832  
##  Mean   : 52.11   Mean   :109.23   Mean   :139.53   Mean   : 77.396  
##  3rd Qu.: 58.46   3rd Qu.:132.17   3rd Qu.:160.11   3rd Qu.: 20.361  
##  Max.   :118.18   Max.   :267.34   Max.   :302.45   Max.   :865.000  
##  Traffic.contrl.orders   BnkOrder1        BnkOrder2        BnkOrder3    
##  Min.   :11992         Min.   :  3452   Min.   : 16411   Min.   : 7679  
##  1st Qu.:34994         1st Qu.: 20130   1st Qu.: 50681   1st Qu.:12610  
##  Median :44312         Median : 32528   Median : 67181   Median :18012  
##  Mean   :44504         Mean   : 46641   Mean   : 79401   Mean   :23115  
##  3rd Qu.:52112         3rd Qu.: 45119   3rd Qu.: 94788   3rd Qu.:31048  
##  Max.   :71772         Max.   :210508   Max.   :188411   Max.   :73839  
##  Target.ttl.order
##  Min.   :129.4   
##  1st Qu.:238.2   
##  Median :288.0   
##  Mean   :300.9   
##  3rd Qu.:334.2   
##  Max.   :616.5

Visualizing the correlation between variables.

  • Are any of the variables correlated with the response variable, Target.ttl.order? How strong or weak is the correlation?
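One way to answer these questions numerically is to compute the correlation of every predictor with the response and rank the results. The sketch below uses simulated toy data (not the real data set), so the printed values say nothing about the actual orders; only the column names are borrowed from this presentation.

```r
# Toy stand-in data (simulated, NOT the real data set).
set.seed(42)
n <- 60
toy <- data.frame(
  Non.urgent.order = runif(n, 40, 440),
  Fiscal.orders    = runif(n, 0, 100)  # unrelated to the response in this toy setup
)
toy$Target.ttl.order <- 1.2 * toy$Non.urgent.order + rnorm(n, sd = 20)

response   <- "Target.ttl.order"
predictors <- setdiff(names(toy), response)

# Pearson correlation of each predictor with the response.
cors <- sapply(toy[predictors], function(x) cor(x, toy[[response]]))

# Rank predictors by the absolute strength of the correlation.
cors_sorted <- cors[order(abs(cors), decreasing = TRUE)]
print(round(cors_sorted, 2))

# Pairwise scatter plots as a visual check for non-linear patterns,
# which a correlation coefficient alone can miss.
pairs(toy)
```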

Building models

We use the Shiny App - PredictSales to visualise the relationship between the chosen predictor and the response and to look into the model diagnostics. Next, we build multiple regression models by adding one predictor at a time and compare them with an analysis of variance.

## Analysis of Variance Table
## 
## Model 1: Target.ttl.order ~ Non.urgent.order
## Model 2: Target.ttl.order ~ Non.urgent.order + Order.type.B
## Model 3: Target.ttl.order ~ Non.urgent.order + Order.type.B + Order.type.C
## Model 4: Target.ttl.order ~ Non.urgent.order + Order.type.B + Order.type.C + 
##     BnkOrder2
## Model 5: Target.ttl.order ~ Non.urgent.order + Order.type.B + Order.type.C + 
##     BnkOrder2 + Urgent.order
##   Res.Df   RSS Df Sum of Sq        F    Pr(>F)    
## 1     58 60004                                    
## 2     57 35560  1   24444.9 218.7901 < 2.2e-16 ***
## 3     56 11949  1   23610.6 211.3232 < 2.2e-16 ***
## 4     55 11652  1     297.1   2.6596    0.1087    
## 5     54  6033  1    5618.5  50.2880 2.927e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
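A table of this kind comes from comparing nested models, where each row tests whether the predictor added in that model reduces the residual sum of squares (RSS) by more than chance would explain. In the table above, for example, adding BnkOrder2 (row 4) gives p = 0.1087, which is not significant at the 0.05 level, while the other additions are. The following sketch shows the mechanics on simulated toy data (not the real data set); the effect sizes in it are invented for illustration.

```r
# Toy stand-in data (simulated, NOT the real data set).
set.seed(7)
n <- 60
toy <- data.frame(
  Non.urgent.order = runif(n, 40, 440),
  Order.type.B     = runif(n, 25, 270),
  BnkOrder2        = runif(n, 16000, 190000)  # pure noise in this toy setup
)
toy$Target.ttl.order <- toy$Non.urgent.order + 0.5 * toy$Order.type.B +
  rnorm(n, sd = 15)

# Build nested models by adding one predictor at a time.
m1 <- lm(Target.ttl.order ~ Non.urgent.order, data = toy)
m2 <- update(m1, . ~ . + Order.type.B)
m3 <- update(m2, . ~ . + BnkOrder2)

# Sequential F-tests: each row compares a model with the one before it.
comparison <- anova(m1, m2, m3)
print(comparison)
```

Because the models are nested, the RSS can only stay the same or decrease from one row to the next; the F-test asks whether the decrease is large relative to the remaining residual variance.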