R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
orders <- read.csv("C:/Users/nanl/Desktop/ANLY699/orders.csv")
products <- read.csv("C:/Users/nanl/Desktop/ANLY699/products.csv")
order_products_prior <- read.csv("C:/Users/nanl/Desktop/ANLY699/order_products__prior.csv")
order_products_train <- read.csv("C:/Users/nanl/Desktop/ANLY699/order_products__train.csv")
departments <- read.csv("C:/Users/nanl/Desktop/ANLY699/departments.csv")
aisles <- read.csv("C:/Users/nanl/Desktop/ANLY699/aisles.csv")

At what hours of the day do consumers place the most orders?

# geom_bar() counts rows per x value by default, so stat = "count" is not needed
ggplot(data = orders, aes(x = order_hour_of_day)) +
  geom_bar(fill = "red") +
  xlab("Hour of day") +
  ylab("Number of orders") +
  ggtitle("Number of orders vs. hour of day")
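
The plot shows the overall shape of the day; to read off the busiest hours exactly, the same counts can be tabulated and sorted. The following is a minimal sketch under stated assumptions: `toy_orders` is a small made-up data frame standing in for `orders`, and `orders_by_hour` and `n_orders` are names chosen here for illustration.

```r
library(dplyr)

# Hypothetical stand-in for `orders`; only the column used here is included.
toy_orders <- data.frame(order_hour_of_day = c(10, 10, 15, 10, 15, 9))

# Count orders per hour and sort so that the busiest hour comes first.
orders_by_hour <- toy_orders %>%
  count(order_hour_of_day, sort = TRUE, name = "n_orders")

orders_by_hour
```

With the real data, replacing `toy_orders` with `orders` should give one row per hour of the day, busiest first.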

How are orders distributed across the days of the week?

# geom_bar() counts rows per x value by default, so stat = "count" is not needed
ggplot(data = orders, aes(x = order_dow)) +
  geom_bar(fill = "red") +
  xlab("Day of week") +
  ylab("Number of orders") +
  ggtitle("Number of orders vs. day of week")

How many days do consumers wait before ordering again?

# geom_bar() counts rows per x value by default, so stat = "count" is not needed
ggplot(data = orders, aes(x = days_since_prior_order)) +
  geom_bar(fill = "red") +
  xlab("Days since previous order") +
  ylab("Number of orders") +
  ggtitle("Number of orders vs. days since previous order")
## Warning: Removed 206209 rows containing non-finite values (stat_count).
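
The warning about 206209 removed rows means that many orders have no value for `days_since_prior_order`. One plausible reading, which is an assumption and not something the output above confirms, is that these are each customer's first order, for which no previous order exists. A minimal sketch of counting and dropping these rows explicitly before plotting, using a made-up `toy_orders` data frame in place of `orders` (the names `n_missing` and `repeat_orders` are also illustrative):

```r
library(dplyr)

# Hypothetical stand-in for `orders`: NA marks an order with no prior order.
toy_orders <- data.frame(
  user_id = c(1, 1, 1, 2, 2),
  days_since_prior_order = c(NA, 7, 30, NA, 14)
)

# How many rows would ggplot2 silently drop?
n_missing <- sum(is.na(toy_orders$days_since_prior_order))

# Keep only orders that have a previous order to measure against.
repeat_orders <- toy_orders %>%
  filter(!is.na(days_since_prior_order))

n_missing
nrow(repeat_orders)
```

Plotting `repeat_orders` instead of the full table makes the exclusion visible in the code and not only in a warning.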

Distribution of orders per customer

# count(user_id) groups and tallies in one step; n is the number of orders per customer
orders %>%
  count(user_id) %>%
  ggplot(aes(x = n)) +
  geom_bar(fill = "red") +
  xlab("Total orders per customer") +
  ylab("Number of customers") +
  ggtitle("Number of customers vs. total orders per customer")
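
A numeric summary can complement the plot when describing how many orders a typical customer places. The sketch below assumes a made-up `toy_orders` data frame in place of `orders`; `orders_per_customer` and `n_orders` are names introduced here for illustration.

```r
library(dplyr)

# Hypothetical stand-in for `orders`: one row per order.
toy_orders <- data.frame(user_id = c(1, 1, 1, 2, 2, 3, 3, 3, 3))

# One row per customer with that customer's total number of orders.
orders_per_customer <- toy_orders %>%
  count(user_id, name = "n_orders")

# Minimum, quartiles, mean and maximum of orders per customer.
summary(orders_per_customer$n_orders)
```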