Question 1

The following analysis was done in R, where the data was read in through the provided spreadsheet saved as “shopify.csv”. Below is code to read in the dataset and to output column summaries.

dataset <- read.csv("shopify.csv")
summary(dataset)
##     order_id       shop_id          user_id       order_amount   
##  Min.   :   1   Min.   :  1.00   Min.   :607.0   Min.   :    90  
##  1st Qu.:1251   1st Qu.: 24.00   1st Qu.:775.0   1st Qu.:   163  
##  Median :2500   Median : 50.00   Median :849.0   Median :   284  
##  Mean   :2500   Mean   : 50.08   Mean   :849.1   Mean   :  3145  
##  3rd Qu.:3750   3rd Qu.: 75.00   3rd Qu.:925.0   3rd Qu.:   390  
##  Max.   :5000   Max.   :100.00   Max.   :999.0   Max.   :704000  
##   total_items       payment_method      created_at       
##  Min.   :   1.000   Length:5000        Length:5000       
##  1st Qu.:   1.000   Class :character   Class :character  
##  Median :   2.000   Mode  :character   Mode  :character  
##  Mean   :   8.787                                        
##  3rd Qu.:   3.000                                        
##  Max.   :2000.000
  1. The problem with the calculated value of $3145.13 is that this value is calculated across 100 different shops as opposed to each individual shop. We can filter the rows with order_amount greater than $1100 using the R command:
dataset[dataset$order_amount > 10000,c("order_id", "shop_id", "user_id", "order_amount", "total_items")]
##      order_id shop_id user_id order_amount total_items
## 16         16      42     607       704000        2000
## 61         61      42     607       704000        2000
## 161       161      78     990        25725           1
## 491       491      78     936        51450           2
## 494       494      78     983        51450           2
## 512       512      78     967        51450           2
## 521       521      42     607       704000        2000
## 618       618      78     760        51450           2
## 692       692      78     878       154350           6
## 1057     1057      78     800        25725           1
## 1105     1105      42     607       704000        2000
## 1194     1194      78     944        25725           1
## 1205     1205      78     970        25725           1
## 1260     1260      78     775        77175           3
## 1363     1363      42     607       704000        2000
## 1385     1385      78     867        25725           1
## 1420     1420      78     912        25725           1
## 1437     1437      42     607       704000        2000
## 1453     1453      78     812        25725           1
## 1530     1530      78     810        51450           2
## 1563     1563      42     607       704000        2000
## 1603     1603      42     607       704000        2000
## 2154     2154      42     607       704000        2000
## 2271     2271      78     855        25725           1
## 2298     2298      42     607       704000        2000
## 2453     2453      78     709        51450           2
## 2493     2493      78     834       102900           4
## 2496     2496      78     707        51450           2
## 2513     2513      78     935        51450           2
## 2549     2549      78     861        25725           1
## 2565     2565      78     915        77175           3
## 2691     2691      78     962        77175           3
## 2774     2774      78     890        25725           1
## 2819     2819      78     869        51450           2
## 2822     2822      78     814        51450           2
## 2836     2836      42     607       704000        2000
## 2907     2907      78     817        77175           3
## 2923     2923      78     740        25725           1
## 2970     2970      42     607       704000        2000
## 3086     3086      78     910        25725           1
## 3102     3102      78     855        51450           2
## 3152     3152      78     745        25725           1
## 3168     3168      78     927        51450           2
## 3333     3333      42     607       704000        2000
## 3404     3404      78     928        77175           3
## 3441     3441      78     982        25725           1
## 3706     3706      78     828        51450           2
## 3725     3725      78     766        77175           3
## 3781     3781      78     889        25725           1
## 4041     4041      78     852        25725           1
## 4057     4057      42     607       704000        2000
## 4080     4080      78     946        51450           2
## 4193     4193      78     787        77175           3
## 4312     4312      78     960        51450           2
## 4413     4413      78     756        51450           2
## 4421     4421      78     969        77175           3
## 4506     4506      78     866        25725           1
## 4585     4585      78     997        25725           1
## 4647     4647      42     607       704000        2000
## 4716     4716      78     818        77175           3
## 4869     4869      42     607       704000        2000
## 4883     4883      42     607       704000        2000
## 4919     4919      78     823        25725           1

Looking at the results, we notice that the shops with shop_id 42 and 78 are the only shops that have orders greater than $1100. Customers who order from the shop with shop_id 42 typically order in large quantities, while the shop with shop_id 78 sells sneakers for $25725 a pair. These two outliers significantly increase the average order value across the 100 different shops and thus this method of calculating the AOV is an unreliable measure to use.


b. The metric I would use to report for this dataset is the average order value on a per shop_id basis. In R, we can aggregate the mean order amount by shop_id with the following R command:

aggregate(dataset[,c("order_amount"),drop=FALSE], by=list(shop_id=dataset$shop_id), FUN=mean)
##     shop_id order_amount
## 1         1     308.8182
## 2         2     174.3273
## 3         3     305.2500
## 4         4     258.5098
## 5         5     290.3111
## 6         6     383.5085
## 7         7     218.0000
## 8         8     241.0435
## 9         9     234.0000
## 10       10     332.3019
## 11       11     356.7347
## 12       12     352.6981
## 13       13     345.3968
## 14       14     242.0000
## 15       15     308.9423
## 16       16     270.1463
## 17       17     332.0755
## 18       18     342.5882
## 19       19     320.9062
## 20       20     251.5577
## 21       21     308.6957
## 22       22     273.7500
## 23       23     317.6727
## 24       24     320.7273
## 25       25     232.9167
## 26       26     341.2245
## 27       27     334.8704
## 28       28     320.3721
## 29       29     331.6207
## 30       30     295.0714
## 31       31     268.9787
## 32       32     189.9762
## 33       33     376.2750
## 34       34     234.2400
## 35       35     328.0000
## 36       36     254.8000
## 37       37     340.2083
## 38       38     390.8571
## 39       39     268.0000
## 40       40     295.1667
## 41       41     254.0000
## 42       42  235101.4902
## 43       43     333.9138
## 44       44     262.1538
## 45       45     269.3103
## 46       46     347.4419
## 47       47     259.1489
## 48       48     242.7750
## 49       49     279.9057
## 50       50     403.5455
## 51       51     361.8043
## 52       52     316.9268
## 53       53     214.1176
## 54       54     276.6400
## 55       55     327.7500
## 56       56     218.1892
## 57       57     296.7736
## 58       58     254.9492
## 59       59     358.9667
## 60       60     350.2340
## 61       61     344.4400
## 62       62     308.8372
## 63       63     264.9655
## 64       64     272.1860
## 65       65     330.8148
## 66       66     312.8868
## 67       67     272.6216
## 68       68     254.6383
## 69       69     264.1833
## 70       70     343.0678
## 71       71     323.0303
## 72       72     309.5652
## 73       73     335.6897
## 74       74     306.0000
## 75       75     240.7619
## 76       76     321.0714
## 77       77     280.8000
## 78       78   49213.0435
## 79       79     328.4815
## 80       80     299.6667
## 81       81     384.0000
## 82       82     349.7857
## 83       83     248.7857
## 84       84     342.3051
## 85       85     329.2571
## 86       86     277.5000
## 87       87     292.2692
## 88       88     355.5200
## 89       89     379.1475
## 90       90     403.2245
## 91       91     325.9259
## 92       92     162.8571
## 93       93     214.4746
## 94       94     297.7778
## 95       95     318.7692
## 96       96     330.0000
## 97       97     324.0000
## 98       98     245.3621
## 99       99     339.4444
## 100     100     213.6750


c. From the results of the command in part b above, we see that most shops have an average order value in the $200-$400 range. If we aggregate by total_items instead of order_amount, we see that most customers order 1-2 pairs of shoes per order.

aggregate(dataset$total_items, by=list(shop_id=dataset$shop_id), FUN=mean)
##     shop_id          x
## 1         1   1.954545
## 2         2   1.854545
## 3         3   2.062500
## 4         4   2.019608
## 5         5   2.044444
## 6         6   2.050847
## 7         7   1.946429
## 8         8   1.826087
## 9         9   1.983051
## 10       10   2.245283
## 11       11   1.938776
## 12       12   1.754717
## 13       13   2.158730
## 14       14   2.086207
## 15       15   2.019231
## 16       16   1.731707
## 17       17   1.886792
## 18       18   2.196078
## 19       19   1.968750
## 20       20   1.980769
## 21       21   2.173913
## 22       22   1.875000
## 23       23   2.036364
## 24       24   2.290909
## 25       25   1.791667
## 26       26   1.938776
## 27       27   1.981481
## 28       28   1.953488
## 29       29   2.034483
## 30       30   1.928571
## 31       31   2.085106
## 32       32   1.880952
## 33       33   2.175000
## 34       34   1.920000
## 35       35   2.000000
## 36       36   1.960000
## 37       37   2.395833
## 38       38   2.057143
## 39       39   2.000000
## 40       40   1.833333
## 41       41   2.152542
## 42       42 667.901961
## 43       43   1.844828
## 44       44   1.820513
## 45       45   1.896552
## 46       46   2.093023
## 47       47   1.787234
## 48       48   2.075000
## 49       49   2.169811
## 50       50   2.090909
## 51       51   1.934783
## 52       52   2.170732
## 53       53   1.911765
## 54       54   2.080000
## 55       55   1.916667
## 56       56   1.864865
## 57       57   2.018868
## 58       58   1.847458
## 59       59   2.016667
## 60       60   1.978723
## 61       61   2.180000
## 62       62   1.930233
## 63       63   1.948276
## 64       64   2.046512
## 65       65   2.148148
## 66       66   1.943396
## 67       67   2.081081
## 68       68   1.872340
## 69       69   2.016667
## 70       70   1.983051
## 71       71   1.969697
## 72       72   1.934783
## 73       73   2.034483
## 74       74   2.000000
## 75       75   1.880952
## 76       76   2.071429
## 77       77   1.800000
## 78       78   1.913043
## 79       79   1.814815
## 80       80   2.066667
## 81       81   2.169492
## 82       82   1.976190
## 83       83   1.928571
## 84       84   2.237288
## 85       85   1.914286
## 86       86   2.134615
## 87       87   1.961538
## 88       88   2.020000
## 89       89   1.934426
## 90       90   2.265306
## 91       91   2.037037
## 92       92   1.809524
## 93       93   1.881356
## 94       94   2.222222
## 95       95   1.897436
## 96       96   2.156863
## 97       97   2.000000
## 98       98   1.844828
## 99       99   1.740741
## 100     100   1.925000

These are reasonable numbers which would not have been obvious if the AOV was calculated over all 100 shops.


Question 2


a.

select count(orderid) 
from orders o 
inner join shippers s on s.shipperid = o.shipperid
where s.shipperid = 1

Output: 54


b.

select top 1 productname
from ((products p
inner join orderdetails od on od.productid = p.productid)
inner join orders o on o.orderid = od.orderid)
inner join customers c on c.customerid = o.customerid
where country = 'Germany'
group by productname
order by count(productname) desc

Output: Gorgonzola Telino


c. 

select top 1 e.lastname
from (employees e 
inner join orders o on e.employeeid = o.employeeid)
group by e.lastname
order by count(orderid) desc

Output: Peacock