Initial Processing

Load data, clean it up, and do some brief analytics

As a first pass, I load the data, drop the null columns, extra the month and the month name, and do a quick glance of the data. I see that the classes of the data were passed into R as intended, and the distribution of the data seems fairly normal.

orders <- data.table(read.xlsx2(file = "XLS_takehome_NA.xlsx", 
                                sheetName = "Orders",
                                stringsAsFactors = F,
                                colClasses = c("Date", "Date", "character", "numeric")))

## clean up data
orders <- orders[, 1:4][order(dateordered)]
orders[, month := month(dateordered)]
orders$monthName <- month.name[ orders$month]

## descriptives
sapply(orders, class)
##  dateordered datereturned  orderstatus       orders        month 
##       "Date"       "Date"  "character"    "numeric"    "numeric" 
##    monthName 
##  "character"
summary(orders)
##   dateordered          datereturned        orderstatus       
##  Min.   :2016-08-01   Min.   :2016-08-25   Length:213        
##  1st Qu.:2016-09-08   1st Qu.:2016-11-13   Class :character  
##  Median :2016-10-19   Median :2016-12-11   Mode  :character  
##  Mean   :2016-10-18   Mean   :2016-12-05                     
##  3rd Qu.:2016-11-28   3rd Qu.:2017-01-05                     
##  Max.   :2016-12-31   Max.   :2017-01-28                     
##                       NA's   :153                            
##      orders          month       monthName        
##  Min.   : 1.00   Min.   : 8.0   Length:213        
##  1st Qu.: 1.00   1st Qu.: 9.0   Class :character  
##  Median : 4.00   Median :10.0   Mode  :character  
##  Mean   : 5.13   Mean   :10.1                     
##  3rd Qu.: 7.00   3rd Qu.:11.0                     
##  Max.   :36.00   Max.   :12.0                     
## 

Aggregation I

At first glance, one may think that the rate of returns is rising– the sheer number of returns has increased over time. However in order to answer the first question, I did an aggregation to get the return rate by month. I did so by calculating the number of orders returned per the number of orders placed by the month in which that order was placed. According to these trends, the return rate has been steadily falling since August.

I believe that the return rate calculated here is indicative of how the company was performing or the quality of the mattress at the the time of sale. The plot, despite being out of order, shows the return rates over time.

returnRate <- orders[, .( orders = sum(orders[orderstatus == "complete"], na.rm = T),
                          returns = sum(orders[orderstatus == "returned"], na.rm = T),
                          returnRate = sum(orders[orderstatus == "returned"], na.rm = T)/
                           sum(orders[orderstatus == "complete"], na.rm = T)),
                     by = .(monthName, month)]
## clean up
returnRate[, returnRate := round(returnRate,2)]
print(returnRate)
##    monthName month orders returns returnRate
## 1:    August     8    139      10       0.07
## 2: September     9    134       9       0.07
## 3:   October    10    170      10       0.06
## 4:  November    11    214      12       0.06
## 5:  December    12    376      19       0.05
## plot
plot_ly(data = returnRate, 
        x = ~monthName, 
        y =  ~returnRate, 
        mode = 'lines', 
        colors = "Set1")
## No trace type specified:
##   Based on info supplied, a 'bar' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#bar

Aggregation II

As a gut check, I wanted to aggregate the data in a different way. I aggregated the data by the date of return, which shows the sheer volume of mattresses being sold and returned by month, regardless of their sale date. While I believed this calculation to be less valuable, I created it to rest assured. Again, we can see that, although the number of returns is rising, the rate is not as telling when ignoring the month in which the order was placed.

returnsByMonth <- orders[, .(returns = sum(orders[orderstatus == "returned"], na.rm = T)),
                         by = monthName]

ordersByMonth <- orders[, .(orders = sum(orders[orderstatus == "complete"], na.rm = T)),
                        by = monthName]
## join
setkey(returnsByMonth, monthName)
setkey(ordersByMonth, monthName)
returnsByMonth <- ordersByMonth[returnsByMonth]

## clean up
returnsByMonth[!is.na(monthName), returnRate := returns/orders, by = monthName]
returnsByMonth <- returnsByMonth[c(1, 5, 4, 3, 2)]
print(returnsByMonth)
##    monthName orders returns returnRate
## 1:    August    139      10    0.07194
## 2: September    134       9    0.06716
## 3:   October    170      10    0.05882
## 4:  November    214      12    0.05607
## 5:  December    376      19    0.05053
## plot
plot_ly(returnsByMonth, x = ~monthName) %>%
  add_trace(y = ~orders, name = 'Orders',mode = 'lines') %>%
  add_trace(y = ~returns, name = 'Returns', mode = 'lines') 
## No trace type specified:
##   Based on info supplied, a 'bar' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#bar
## No trace type specified:
##   Based on info supplied, a 'bar' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#bar

Analysis

Some examples of factors highlighted by the first aggregation include:
- Changing (bettering) quality of product over time
- Changing (bettering) customer experience at the point of sale (whether customers are being matched to the right product (mattress) at the point of sale)

Some examples of factors highlighted by the second aggregation include:
- Changes to consumer behavior in reaction to new competitors
- Changes to post-sale responsiveness of customer support
- Other factors not associated with date of point-of-sale

SQL Query

Using a package in R, I created and ran a SQL query to get the number of completed orders by date. If the audience also wanted to see an aggregation by month, I would add either a MONTH or a DATEPART to the query.

casper_orders <- orders

sqldf("SELECT SUM(orders) as totalOrders, dateordered 
      FROM casper_orders 
      WHERE orderstatus = 'complete' 
      GROUP BY dateordered;")
##     totalOrders dateordered
## 1             4  2016-08-01
## 2             4  2016-08-02
## 3             8  2016-08-03
## 4             6  2016-08-04
## 5             1  2016-08-05
## 6             1  2016-08-06
## 7             5  2016-08-07
## 8             4  2016-08-08
## 9             9  2016-08-09
## 10            4  2016-08-10
## 11           12  2016-08-11
## 12            8  2016-08-12
## 13            4  2016-08-13
## 14            6  2016-08-14
## 15            3  2016-08-15
## 16            1  2016-08-16
## 17            3  2016-08-17
## 18            5  2016-08-18
## 19            5  2016-08-19
## 20            3  2016-08-20
## 21            3  2016-08-21
## 22            2  2016-08-22
## 23            4  2016-08-23
## 24            6  2016-08-24
## 25            3  2016-08-25
## 26            5  2016-08-26
## 27            3  2016-08-27
## 28            7  2016-08-28
## 29            3  2016-08-29
## 30            3  2016-08-30
## 31            4  2016-08-31
## 32            3  2016-09-01
## 33            6  2016-09-02
## 34            2  2016-09-03
## 35            8  2016-09-04
## 36           10  2016-09-05
## 37            7  2016-09-06
## 38            5  2016-09-07
## 39            9  2016-09-08
## 40            4  2016-09-09
## 41            3  2016-09-10
## 42            1  2016-09-11
## 43            2  2016-09-12
## 44            5  2016-09-13
## 45            2  2016-09-14
## 46           10  2016-09-15
## 47            1  2016-09-16
## 48            6  2016-09-17
## 49            2  2016-09-18
## 50            3  2016-09-19
## 51            4  2016-09-20
## 52            5  2016-09-21
## 53            4  2016-09-22
## 54            2  2016-09-23
## 55            7  2016-09-24
## 56            3  2016-09-25
## 57            3  2016-09-26
## 58            4  2016-09-27
## 59            2  2016-09-28
## 60            6  2016-09-29
## 61            5  2016-09-30
## 62            8  2016-10-01
## 63            2  2016-10-02
## 64            3  2016-10-03
## 65            3  2016-10-04
## 66            6  2016-10-05
## 67            8  2016-10-06
## 68            6  2016-10-07
## 69            4  2016-10-08
## 70            7  2016-10-09
## 71            6  2016-10-10
## 72            7  2016-10-11
## 73            6  2016-10-12
## 74            5  2016-10-13
## 75            5  2016-10-14
## 76            6  2016-10-15
## 77            3  2016-10-16
## 78            6  2016-10-17
## 79            5  2016-10-18
## 80            6  2016-10-19
## 81            8  2016-10-20
## 82            5  2016-10-21
## 83           10  2016-10-22
## 84            7  2016-10-23
## 85            2  2016-10-24
## 86            7  2016-10-25
## 87            7  2016-10-26
## 88            2  2016-10-27
## 89            7  2016-10-28
## 90            1  2016-10-29
## 91            6  2016-10-30
## 92            6  2016-10-31
## 93            7  2016-11-01
## 94            2  2016-11-02
## 95            6  2016-11-03
## 96            8  2016-11-04
## 97            5  2016-11-05
## 98            3  2016-11-07
## 99            2  2016-11-08
## 100           2  2016-11-09
## 101          10  2016-11-10
## 102           9  2016-11-11
## 103           4  2016-11-12
## 104           4  2016-11-13
## 105           7  2016-11-14
## 106           6  2016-11-15
## 107           6  2016-11-16
## 108          11  2016-11-17
## 109          11  2016-11-18
## 110           6  2016-11-19
## 111           8  2016-11-20
## 112           6  2016-11-21
## 113           5  2016-11-22
## 114           5  2016-11-23
## 115          13  2016-11-24
## 116           5  2016-11-25
## 117           5  2016-11-26
## 118           4  2016-11-27
## 119           9  2016-11-28
## 120           9  2016-11-29
## 121          36  2016-11-30
## 122          32  2016-12-01
## 123          15  2016-12-02
## 124          12  2016-12-03
## 125          11  2016-12-04
## 126          11  2016-12-05
## 127           7  2016-12-06
## 128           9  2016-12-07
## 129           5  2016-12-08
## 130          14  2016-12-09
## 131          14  2016-12-10
## 132           9  2016-12-11
## 133          13  2016-12-12
## 134          10  2016-12-13
## 135          15  2016-12-14
## 136          18  2016-12-15
## 137          13  2016-12-16
## 138           8  2016-12-17
## 139          10  2016-12-18
## 140          11  2016-12-19
## 141          13  2016-12-20
## 142           8  2016-12-21
## 143          10  2016-12-22
## 144          12  2016-12-23
## 145           4  2016-12-24
## 146           8  2016-12-25
## 147          12  2016-12-26
## 148          13  2016-12-27
## 149          23  2016-12-28
## 150          14  2016-12-29
## 151          14  2016-12-30
## 152           8  2016-12-31