This is the analysis on Order Frequency for Diapers and Essentials Bundle for user orders since Jan 1, 2015.

Getting data from Redshift

First we need to get data out from Redshift. The Redshift query gets bundle order frequency at user level. Frequency is calculated as time time elapsed between first and last order, divided by total number of orders:

#Getting Data from Redshift
#SELECT
#  user_id,
#  product_type,
#  count(o.id),
#  CASE WHEN (count(o.id) - 1) = 0
#    THEN NULL
#  ELSE (max(charge_date) - min(charge_date)) / (count(o.id)-1)  END AS order_freq
#FROM orders o
#WHERE product_type SIMILAR TO '%Bundle%' AND
#      charge_date IS NOT NULL
#      AND state NOT SIMILAR TO '(%fraud%|%cancel%|%cart%)'
#      AND charge_date >= '2012-01-01'
#GROUP BY 1, 2

Feed data into R for analysis

Now we have the data, let’s feed it into R for plotting and analysis.

#Read in data
df = read.csv(file = '/Users/gmu/Documents/data/order_freq/essentials_order_freq5.csv', stringsAsFactors = F)
##include only diaper and essentials subscriptions
df_sub = subset(df,  product_type %in% c('Diaper Bundle', 'Essentials Bundle') )
#exclude users with multiple bundle orders in one day
df_sub = subset(df_sub,  order_freq > 0 )
#transform data for plotting
df_sub2 = df_sub %>%
  group_by(product_type, order_freq) %>%
  summarise(n=n()) %>%
  mutate(freq = n / sum(n))

Summary stats – Users order Diaper Bundle more frequently than Essentials

Median Diaper OF = 5 weeks and Median Essentials OF = 6 weeks

#Getting order frequency stats
by(df_sub$order_freq, df_sub$product_type, summary, na.rm = T)
## df_sub$product_type: Diaper Bundle
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   28.00   36.00   52.05   50.00 1617.00 
## -------------------------------------------------------- 
## df_sub$product_type: Essentials Bundle
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   30.00   44.00   65.02   72.00 1503.00

As we can see, the order frequency for both Diapers and Essentials Bundle are right skewed the average (mean) much higher than median:

For Diaper Bundles: Avg order frequency is 52 days (7 weeks), but median order frequency is 36 days (5 weeks)

For Essentials Bundles: Avg order frequency is longer than Diaper at 65 days (9 weeks), and median order frequency is 44 days (6 weeks).

The bigger gap between mean and media order frequency for essentials bundle also suggest that essentials bundles OF has longer right tail (more users has longer order freq) than Diaper bundles as shown below.

Essentials Bundle OF are more right skewed than Diapers, indicating more users order Essentials Bundles less frequently than Diapers

#Plot order frequency density by bundle type
p1 = ggplot(df_sub2,aes(x=order_freq,y=freq,fill=product_type)) + 
  geom_bar(stat="identity", position='identity', alpha=.2) +
  xlim(c(0,400)) +
  xlab('Order Frequency') +
  ylab('Order Frequency Distribution %')+
  ggtitle('Order Frequency in Days by Product Type')
ggplotly(p1)

Diaper OF peaks at 28 days, while Essentials OF concentrates at 28 as well as 42 days.

To zoom in on the chart, we can see that Diapers bundle OF peaks around 28 days (4 weeks), while Essentials bundle has dual mode, which peaks at 28 days (4 weeks) and 42 days (6 weeks).

#Zoom in Diaper and Essentials OF Plot
p2 = ggplot(df_sub2,aes(x=order_freq,y=freq,fill=product_type)) + 
  geom_bar(stat="identity", position='identity', alpha=.2) +
  xlim(c(0,120))+
  xlab('Order Frequency') +
  ylab('Order Frequency Distribution %')+
  ggtitle('Order Frequency in Days by Product Type')
ggplotly(p2)

50% of Diaper Bundlers order every 4-6 weeks, and half of Essentials Bundlers order every 5-10 weeks

The boxplot chart shows that 50% Diaper Bundlers has OF between 28-50 days (4-7 weeks), and 50% of Essentials Bundle OF is between 30-72 days (4-10 weeks)

#Boxplot remove outlier
bound = boxplot.stats(df$order_freq)$stats

p3 = ggplot(df_sub, aes(x=product_type, y=order_freq, fill=product_type)) + 
  geom_boxplot(na.rm = T) + 
  guides(fill=FALSE) + 
  coord_flip(ylim = c(0,bound*1.1) )
ggplotly(p3)

Conclusion

In summary, Essentials Bundle users order their bundles less freuqently than Diaper bundles, with half of them order every 4-10 weeks while Diaper bundlers orders every 4-7 weeks.

Since the order frequency data is heavily skewed towards longer OF, MEDIAN OF is a better data point to describe OF for each bundle: In general Diaper bundlers order their bundles every 5 weeks and Essentials Bundlers every 6 weeks.