
Part 2: Specific Questions

This part of the report is directed to product managers throughout the client’s company. The goal is to give them the information they need to act on the specific questions they posed. Plan your communication accordingly.

For this report, use all of the data provided for the month. Any issues you notice with the data (Part 3) can be reported to the engineering team to resolve.

Q1

Question

During the first week of the month, what were the 10 most viewed products? Show the results in a table with the product’s identifier, category, and number of views.

Answer

##          product_id category count
## 1  gjcsELW14TF4dxot    shirt   214
## 2  wvMbusrgZjBr04Ho    pants   213
## 3  CIZkDqF9GheJ1mhT     coat   211
## 4  Nx6fZQAusZtV1JVN    shirt   209
## 5  UDawXFXMc4Sn5RB9    shirt   203
## 6  T2IDb82DS8ecGoJc    shirt   198
## 7  I9vhjbvj5rD4A2Wk    pants   197
## 8  XTmI0uGhAVgmdypt    shirt   196
## 9  QxYq5yX0hNbPy5QH    shirt   195
## 10 yyEeOsvUazwVtH6B    shirt   195
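
For readers who want to reproduce this kind of table, the following is a minimal base R sketch, not the code that generated the result above. The `views` and `products` tables, the `time` column, and the dates are all invented for illustration; only `product_id`, `category`, and `count` appear in the report itself.

```r
# Toy stand-ins for the real tables; every value here is invented.
views <- data.frame(
  product_id = c("A", "A", "B", "A", "B", "C"),
  time = as.POSIXct(c("2020-01-01 09:00:00", "2020-01-03 12:00:00",
                      "2020-01-05 08:30:00", "2020-01-07 23:59:59",
                      "2020-01-08 00:00:00", "2020-01-20 10:00:00"),
                    tz = "UTC")
)
products <- data.frame(
  product_id = c("A", "B", "C"),
  category   = c("shirt", "pants", "coat")
)

# First week as a half-open interval: day 1 inclusive, day 8 exclusive.
start <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")
end   <- as.POSIXct("2020-01-08 00:00:00", tz = "UTC")
week1 <- views[views$time >= start & views$time < end, ]

# Count views per product, attach the category, sort, keep the top 10.
counts <- as.data.frame(table(product_id = week1$product_id),
                        responseName = "count", stringsAsFactors = FALSE)
top <- merge(counts, products, by = "product_id")
top <- top[order(-top$count), c("product_id", "category", "count")]
top10 <- head(top, 10)
print(top10)
```

The half-open interval is a deliberate choice in this sketch: it keeps a view logged at 23:59:59 on day 7 and drops one logged at exactly midnight on day 8.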

Q2

Question

During the whole month, what were the 10 most viewed products for each category? Show the results in separate tables by category, including only the product’s identifier and the number of views.

Answer

Shirt
## # A tibble: 10 x 2
##    product_id       count
##    <chr>            <int>
##  1 T2IDb82DS8ecGoJc   748
##  2 GgBnL6Gx07tK8VrF   714
##  3 oaTfgLKNy3efmBL6   685
##  4 U8WJMwXa3Sb2vRR0   681
##  5 fprdTtl7TGcmFFUh   678
##  6 v6rCnGI4bllVLA3p   678
##  7 dhS5UCHe5VF7lYBL   677
##  8 9HmaGQzI6b1ayJcL   674
##  9 4DQewZYVwU9H4Acv   670
## 10 GFdFcxNxmh9isgbY   666
Pants
## # A tibble: 10 x 2
##    product_id       count
##    <chr>            <int>
##  1 WNIGeNa97YnOpAdw   661
##  2 Zh8mTIQ1dBtUKlwU   650
##  3 hUuAPXDMWPjFK8XE   645
##  4 Cu5V1RJBS2v1QO2c   637
##  5 m8sSxzMEBzgSZHR5   637
##  6 V9poxltYd3UFTOmP   636
##  7 4pPqOvA5nnWA85WG   633
##  8 mQLZIH7cioPHdpy4   633
##  9 KjCVeq5KL2ihjuPr   630
## 10 nOJryj3XEq4ZL9OW   629
Coat
## # A tibble: 10 x 2
##    product_id       count
##    <chr>            <int>
##  1 EqKAFYosFdW1Pifo   695
##  2 sKNpoakq96XF7dv6   681
##  3 gevXa0ZpHKbiD2qK   661
##  4 EIYARupNVVHG93XY   653
##  5 Udjqm2TqXATY1AXZ   652
##  6 A8miBlolQ84S98qH   643
##  7 lGeofqOvFo0u3ZcJ   642
##  8 xTqbcfoBuKC02GGt   642
##  9 RBjQIkr1qBnpOL6z   641
## 10 rfjMEqCigHn0dBpz   641
Shoes
## # A tibble: 10 x 2
##    product_id       count
##    <chr>            <int>
##  1 LQv262onJ6CMQV3V   701
##  2 uXClWK2bruOsmV1u   700
##  3 57XAlBm7ISzPKLMo   693
##  4 2TNPbNJ2D2LQRLUX   675
##  5 8CuIuVvu9tWZotSE   663
##  6 9RylmoAfWeianhHM   662
##  7 GnKoiyttZFZAzmQn   653
##  8 1u8pkz2FiAKtkZJF   648
##  9 CuGQAYoSaQWiyAkO   645
## 10 C9QithXAFKZ50gD6   644
Hat
## # A tibble: 10 x 2
##    product_id       count
##    <chr>            <int>
##  1 hTQCSZpdaNEZMvhY   666
##  2 bJkcF4WYOfvws3qd   644
##  3 BOMyBJR1eqUShjDM   643
##  4 uBcSuyl4Qnl5sx8f   627
##  5 tBbGJFMpvB7Jw2yH   624
##  6 pY41R0877u9G95z4   623
##  7 W7AWmeOCRO8O7zqG   620
##  8 WcJzZvenCnI439HI   616
##  9 nX4rqev3NEaPwvqL   614
## 10 a3NEL240yJrFzqK8   608
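
One possible way to build per-category tables like these is sketched below in base R. It assumes the views have already been joined to a `category` column; the data and the cutoff `n` are hypothetical, and the report's own tables were likely produced with different tooling.

```r
# Invented view log, already carrying the product category.
views <- data.frame(
  product_id = c("s1", "s1", "s1", "s2", "s2", "s3", "p1", "p1", "p2"),
  category   = c(rep("shirt", 6), rep("pants", 3))
)

# One row per view, so summing a column of ones gives the view count.
counts <- aggregate(count ~ product_id + category,
                    data = transform(views, count = 1L), FUN = sum)

# Split by category, then sort and truncate each piece separately.
n <- 2  # the report uses 10; 2 keeps the toy output short
top_by_category <- lapply(split(counts, counts$category), function(d) {
  d <- d[order(-d$count), c("product_id", "count")]
  head(d, n)
})
print(top_by_category$shirt)
```

Note that `head()` cuts ties arbitrarily at the boundary, which may or may not be the behavior wanted for a top-10 list.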

Q3

Question

What was the total revenue for each category of product during the month? Show the results in a single table sorted in decreasing order.

Answer

## # A tibble: 5 x 2
##   category  revenue
##   <chr>       <dbl>
## 1 shirt    6956670.
## 2 coat     6028708.
## 3 shoes    3515731.
## 4 pants     823091.
## 5 hat       155830.
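
As an illustration of the aggregation behind a table like this, here is a small base R sketch. It assumes revenue is price times quantity per transaction line; the `price` and `quantity` columns and all values are hypothetical, since the report does not show how revenue was defined.

```r
# Invented transaction lines.
transactions <- data.frame(
  category = c("shirt", "coat", "shirt", "hat"),
  price    = c(20, 100, 25, 10),
  quantity = c(2, 1, 1, 3)
)

# Assumption: revenue per line is price times quantity.
transactions$revenue <- transactions$price * transactions$quantity

# Total per category, sorted from highest to lowest revenue.
by_category <- aggregate(revenue ~ category, data = transactions, FUN = sum)
by_category <- by_category[order(by_category$revenue, decreasing = TRUE), ]
print(by_category)
```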

Q4

Question

Among customers with at least one transaction, show the average, median, and standard deviation of the customers’ monthly spending on the site.

Answer

## Average monthly spending: $ 494.81
## 
## Median monthly spending: $ 257.56
## 
## Standard deviation monthly spending: $ 799.08
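
The statistics above are computed over per-customer monthly totals, not over individual transactions. A minimal sketch of that two-step calculation, using invented data and a hypothetical `revenue` column:

```r
# Invented transactions; customers with no purchases simply do not appear.
transactions <- data.frame(
  customer_id = c("a", "a", "b", "c"),
  revenue     = c(10, 30, 20, 60)
)

# Step 1: total spending per customer for the month.
spend <- aggregate(revenue ~ customer_id, data = transactions, FUN = sum)

# Step 2: summary statistics across customers.
stats <- c(mean   = mean(spend$revenue),
           median = median(spend$revenue),
           sd     = sd(spend$revenue))
print(round(stats, 2))
```

Because the starting table contains only transactions, the "at least one transaction" condition is satisfied without an explicit filter in this sketch.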

Q5

Question

What is the percentage distribution of spending by gender? Show the amount of revenue and the percentage.

Answer

## # A tibble: 2 x 3
##   gender    spend percentage
##   <chr>     <dbl>      <dbl>
## 1 F      8785185.      0.503
## 2 M      8694846.      0.497
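
Note that the `percentage` column above is expressed as a proportion (0.503 corresponds to 50.3%). The sketch below shows one way such a split could be computed and reported on a 0 to 100 scale; the data are invented and the column names other than `gender` are assumptions.

```r
# Invented per-transaction spending with the customer's gender attached.
spend <- data.frame(
  gender  = c("F", "M", "F", "M"),
  revenue = c(30, 20, 30, 20)
)

# Total revenue per gender, then each group's share of the overall total.
by_gender <- aggregate(revenue ~ gender, data = spend, FUN = sum)
by_gender$percentage <- round(100 * by_gender$revenue / sum(by_gender$revenue), 1)
print(by_gender)
```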

Q6

Question

Using linear regression, what is the effect of an extra ten thousand dollars of income on monthly spending for a customer while adjusting for age, gender, and region?

Answer

## $ 10.58
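
The figure above is the income coefficient scaled to a $10,000 change. The following sketch shows that scaling with `lm()` on simulated data; the variable names, region labels, and the simulated relationship are all assumptions made for illustration, and the result it prints is unrelated to the client's data.

```r
set.seed(1)
n <- 200

# Simulated customers; the true effect is built in as $10 per $10,000 of income.
customers <- data.frame(
  income = runif(n, 20000, 150000),
  age    = sample(18:80, n, replace = TRUE),
  gender = sample(c("F", "M"), n, replace = TRUE),
  region = sample(c("Northeast", "South", "West"), n, replace = TRUE)
)
customers$spend <- 100 + 0.001 * customers$income + rnorm(n, sd = 5)

# Adjust for age, gender, and region by including them as covariates.
fit <- lm(spend ~ income + age + gender + region, data = customers)

# The coefficient is per dollar of income, so multiply by 10,000.
effect_per_10k <- unname(coef(fit)["income"]) * 10000
cat("Effect of an extra $10,000 of income: $", round(effect_per_10k, 2), "\n")
```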

Q7

Question

Among customers who viewed at least one product, how many made at least one purchase during the month? Show the total number and the percentage of customers with a view.

Answer

##                                         Criteria Number Percentage
## 1:                      Made at least 1 purchase  35327       52.5
## 2: Total customers who viewed at least 1 product  67345      100.0
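
The percentage above is the number of viewers who also purchased divided by the number of viewers (35327 / 67345, about 52.5%). A small sketch of the same set logic, using invented customer identifiers:

```r
# Invented identifiers: customers seen in the view log and in the transactions.
viewers <- unique(c("a", "b", "b", "c", "d"))
buyers  <- unique(c("a", "c", "e"))

# Restrict to viewers first, so a buyer with no recorded view ("e") is not counted.
n_viewers <- length(viewers)
n_buyers  <- sum(viewers %in% buyers)
pct       <- round(100 * n_buyers / n_viewers, 1)

cat(n_buyers, "of", n_viewers, "viewers purchased:", pct, "%\n")
```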

Q8

Question

Now let’s look at the viewing habits in different age groups, including 18-34, 35-49, 50-64, and 65+. Within each group, what were the mean, median, and standard deviation for the number of unique products viewed per customer?

Answer

## For age group 18-34, the mean number of unique products viewed is 89.2, the median is 41, and the standard deviation is 127.
## For age group 35-49, the mean number of unique products viewed is 94.5, the median is 42, and the standard deviation is 140.5.
## For age group 50-64, the mean number of unique products viewed is 87.3, the median is 41, and the standard deviation is 122.2.
## For age group 65+, the mean number of unique products viewed is 68.3, the median is 40, and the standard deviation is 82.9.
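
A calculation like this has two parts: counting distinct products per customer, then binning customers by age. The base R sketch below illustrates both with invented data; the table layout and the `age` column are assumptions, and it is not the code behind the figures above.

```r
# Invented customers and view log.
customers <- data.frame(
  customer_id = c("a", "b", "c", "d", "e"),
  age         = c(20, 30, 40, 55, 70)
)
views <- data.frame(
  customer_id = c("a", "a", "a", "b", "b", "b", "b", "c", "d", "d", "e", "e"),
  product_id  = c("p1", "p1", "p2", "p1", "p2", "p3", "p4", "p1", "p2", "p3", "p1", "p1")
)

# Drop repeat views of the same product, then count products per customer.
uniq <- unique(views[, c("customer_id", "product_id")])
n_products <- tapply(uniq$product_id, uniq$customer_id, length)
per_customer <- data.frame(customer_id = names(n_products),
                           n_products  = as.vector(n_products))
per_customer <- merge(per_customer, customers, by = "customer_id")

# right = FALSE makes the bins [18,35), [35,50), [50,65), [65,Inf).
per_customer$age_group <- cut(per_customer$age,
                              breaks = c(18, 35, 50, 65, Inf), right = FALSE,
                              labels = c("18-34", "35-49", "50-64", "65+"))

# Mean, median, and standard deviation within each age group.
stats <- t(sapply(split(per_customer$n_products, per_customer$age_group),
                  function(x) c(mean = mean(x), median = median(x), sd = sd(x))))
print(round(stats, 1))
```

In this toy data most groups contain a single customer, so their standard deviation is `NA`; with real data each group would hold many customers.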

Q9

Question

What is the correlation between a user’s total page views and total spending? For customers without a transaction, include their spending as zero.

Answer

## [1] 0.8156881
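
The key step in this question is keeping customers who viewed but never bought and setting their spending to zero before correlating. A minimal sketch with invented per-customer totals (the `views` and `spend` column names are assumptions):

```r
# Invented per-customer totals; "b" and "d" viewed pages but bought nothing.
views_per_customer <- data.frame(customer_id = c("a", "b", "c", "d"),
                                 views       = c(10, 20, 30, 40))
spend_per_customer <- data.frame(customer_id = c("a", "c"),
                                 spend       = c(5, 25))

# Left join keeps every viewer; missing spending becomes NA, then zero.
both <- merge(views_per_customer, spend_per_customer,
              by = "customer_id", all.x = TRUE)
both$spend[is.na(both$spend)] <- 0

# Pearson correlation between total views and total spending.
r <- cor(both$views, both$spend)
print(r)
```

An inner join would silently drop the non-buyers and generally give a different correlation, which is why the left join matters here.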

Q10

Question

Which customer purchased the largest number of coats? In the event of a tie, include all of the users who reached this value. Show their identifiers and total volume.

Answer

## # A tibble: 1 x 2
##   customer_id  coats
##   <chr>        <int>
## 1 cnwsiHuMZvd1    27
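
To handle ties as the question requires, filter on the maximum rather than taking the first row of a sorted table. The sketch below assumes coat volume is the sum of a hypothetical `quantity` column; the data are invented, and the report does not state whether volume counts units or transactions.

```r
# Invented purchase lines.
purchases <- data.frame(
  customer_id = c("a", "a", "b", "c", "a"),
  category    = c("coat", "coat", "coat", "coat", "shirt"),
  quantity    = c(1, 2, 3, 1, 5)
)

# Total coats per customer (assumption: volume means summed quantity).
coat_lines <- purchases[purchases$category == "coat", ]
totals <- aggregate(quantity ~ customer_id, data = coat_lines, FUN = sum)
names(totals)[2] <- "coats"

# Keep every customer who reaches the maximum, so ties are all reported.
top_buyers <- totals[totals$coats == max(totals$coats), ]
print(top_buyers)
```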