Operation Analytics - Casino
Import Libraries
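The chunk that loads the packages is not shown in this output. As a rough sketch, the functions called later suggest the package list below; this is an assumption based on where `datatable()`, `corrplot()`, `Desc()`, `stargazer()` and `%>%` usually come from, not the author's actual setup.

```r
# Assumed package list, inferred from the functions used later in the report
pkgs <- c("DT", "corrplot", "DescTools", "stargazer", "magrittr")

# Find out which of them are not installed, without stopping on the first failure
not_installed <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(not_installed) > 0) {
  message("Not installed: ", paste(not_installed, collapse = ", "))
}

# Attach the ones that are available
for (p in setdiff(pkgs, not_installed)) {
  library(p, character.only = TRUE)
}
```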
Import Data
Explore Data
datatable(dataCasino)
str(dataCasino)
## Classes 'tbl_df', 'tbl' and 'data.frame': 5000 obs. of 9 variables:
## $ ID : chr "Player 1" "Player 2" "Player 3" "Player 4" ...
## $ Slots : int 1013 68 148 63 92 658 358 139 673 65 ...
## $ BJ : int 6190 23 0 17 44 0 359 0 279 15 ...
## $ Craps : int 4276 23 0 28 18 0 172 0 103 10 ...
## $ Bac : int 868 12 0 9 10 0 103 0 103 12 ...
## $ Bingo : int 0 0 0 0 0 106 0 0 0 0 ...
## $ Poker : int 0 28 0 23 26 0 258 0 161 26 ...
## $ Other : int 0 53 0 52 60 0 422 0 546 69 ...
## $ Total Spend: int 12348 207 148 193 250 764 1671 139 1866 196 ...
There are 5000 observations of 9 variables. The first column holds the player ID (in running order), the next 7 columns hold the amount spent on each type of game, and the last column holds each player's total spend.
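A quick sanity check is to compare `Total Spend` with the sum of the seven game columns. The sketch below uses only the first ten players, with values transcribed from the `str()` output above; the data frame name `first10` and the column name `Total` are illustrative.

```r
# First ten rows, copied from the str() output above
first10 <- data.frame(
  Slots = c(1013, 68, 148, 63, 92, 658, 358, 139, 673, 65),
  BJ    = c(6190, 23, 0, 17, 44, 0, 359, 0, 279, 15),
  Craps = c(4276, 23, 0, 28, 18, 0, 172, 0, 103, 10),
  Bac   = c(868, 12, 0, 9, 10, 0, 103, 0, 103, 12),
  Bingo = c(0, 0, 0, 0, 0, 106, 0, 0, 0, 0),
  Poker = c(0, 28, 0, 23, 26, 0, 258, 0, 161, 26),
  Other = c(0, 53, 0, 52, 60, 0, 422, 0, 546, 69),
  Total = c(12348, 207, 148, 193, 250, 764, 1671, 139, 1866, 196)
)

games <- c("Slots", "BJ", "Craps", "Bac", "Bingo", "Poker", "Other")

# Difference between the reported total and the sum of the game columns
gap <- first10$Total - rowSums(first10[, games])
gap
```

For these ten rows the gap is never larger than 1 in absolute value, which would be consistent with the game columns having been rounded independently of the total. That explanation is a guess; the same check on the full `dataCasino` would show whether the pattern holds.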
Correlation Clusters
Let’s take a look at the correlation clusters. Bingo forms a cluster of its own, and Poker and Other form a second. The remaining games (Slots, Bac, Craps and BJ) fall into the third cluster. The correlation between BJ, Bac and Craps, in particular, is very high, which suggests that people tend to play these games together. Bingo, on the other hand, has low correlations with the other games, suggesting that people who play Bingo tend to stick to Bingo.
cor(dataCasino[,2:9]) %>% corrplot(order = "hclust", addrect = 3, method ="number", tl.cex=0.9)
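To make the grouping behind the rectangles explicit, the same idea can be sketched in base R: turn correlations into distances, cluster hierarchically, and cut the tree into three groups. The data below are simulated and purely illustrative, and the use of `1 - cor` with complete linkage is an assumption about how the `"hclust"` ordering works by default.

```r
set.seed(42)
n <- 500

# Hypothetical spend drivers: one shared by table games, one by Poker/Other
table_play <- rexp(n, rate = 1 / 200)
card_play  <- rexp(n, rate = 1 / 50)

toy <- data.frame(
  BJ    = table_play + rnorm(n, sd = 20),
  Craps = table_play + rnorm(n, sd = 20),
  Poker = card_play + rnorm(n, sd = 5),
  Other = card_play + rnorm(n, sd = 5),
  Bingo = rexp(n, rate = 1 / 10)   # independent of everything else
)

# Highly correlated variables get a small distance
d  <- as.dist(1 - cor(toy))
hc <- hclust(d, method = "complete")

# Cut the dendrogram into three clusters
groups <- cutree(hc, k = 3)
split(names(groups), groups)
```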
Visualisation
Let’s take a quick look at the distributions of the variables. They contain outliers and are highly right-skewed: every variable's mean is well above its median, and most games have a large share of players who spend nothing on them.
Desc(dataCasino$Slots, main = "Slots", plotit = TRUE)
## -------------------------------------------------------------------------
## Slots
##
## length n NAs unique 0s mean meanCI
## 5'000 5'000 0 992 625 291.77 282.71
## 100.0% 0.0% 12.5% 300.84
##
## .05 .10 .25 median .75 .90 .95
## 0.00 0.00 62.00 104.00 507.00 794.00 935.05
##
## range sd vcoef mad IQR skew kurt
## 1'861.00 326.86 1.12 154.19 445.00 1.27 1.01
##
## lowest : 0 (625), 9, 10, 11, 13 (2)
## highest: 1'673, 1'697, 1'753, 1'854, 1'861
Desc(dataCasino$BJ, main = "BJ", plotit = TRUE)
## -------------------------------------------------------------------------
## BJ
##
## length n NAs unique 0s mean meanCI
## 5'000 5'000 0 735 2'084 283.29 258.36
## 100.0% 0.0% 41.7% 308.22
##
## .05 .10 .25 median .75 .90 .95
## 0.00 0.00 0.00 24.00 190.00 404.10 1'768.35
##
## range sd vcoef mad IQR skew kurt
## 7'294.00 899.17 3.17 35.58 190.00 4.52 20.87
##
## lowest : 0 (2'084), 3, 4 (3), 5 (2), 7 (3)
## highest: 6'778, 6'839, 7'052, 7'104, 7'294
Desc(dataCasino$Craps, main = "Craps", plotit = TRUE)
## -------------------------------------------------------------------------
## Craps
##
## length n NAs unique 0s mean meanCI
## 5'000 5'000 0 619 2'128 267.63 241.74
## 100.0% 0.0% 42.6% 293.52
##
## .05 .10 .25 median .75 .90 .95
## 0.00 0.00 0.00 16.00 117.50 265.10 1'942.65
##
## range sd vcoef mad IQR skew kurt
## 7'251.00 933.88 3.49 23.72 117.50 4.44 19.45
##
## lowest : 0 (2'128), 2, 3, 4 (4), 5 (7)
## highest: 6'749, 6'869, 7'059, 7'111, 7'251
Desc(dataCasino$Bac, main = "Bac", plotit = TRUE)
## -------------------------------------------------------------------------
## Bac
##
## length n NAs unique 0s mean meanCI
## 5'000 5'000 0 438 2'180 82.07 75.22
## 100.0% 0.0% 43.6% 88.92
##
## .05 .10 .25 median .75 .90 .95
## 0.00 0.00 0.00 7.00 35.00 122.00 620.05
##
## range sd vcoef mad IQR skew kurt
## 2'254.00 247.14 3.01 10.38 35.00 4.20 18.09
##
## lowest : 0 (2'180), 1, 2 (7), 3 (11), 4 (31)
## highest: 1'752, 1'807, 1'893, 1'990, 2'254
Desc(dataCasino$Bingo, main = "Bingo", plotit = TRUE)
## -------------------------------------------------------------------------
## Bingo
##
## length n NAs unique 0s mean meanCI
## 5'000 5'000 0 135 4'506 10.09 9.20
## 100.0% 0.0% 90.1% 10.97
##
## .05 .10 .25 median .75 .90 .95
## 0.00 0.00 0.00 0.00 0.00 0.00 100.05
##
## range sd vcoef mad IQR skew kurt
## 213.00 31.94 3.17 0.00 0.00 3.12 8.67
##
## lowest : 0 (4'506), 5, 17, 23, 24
## highest: 172 (2), 175, 180, 184, 213
Desc(dataCasino$Poker, main = "Poker", plotit = TRUE)
## -------------------------------------------------------------------------
## Poker
##
## length n NAs unique 0s mean meanCI
## 5'000 5'000 0 373 2'395 54.59 51.66
## 100.0% 0.0% 47.9% 57.53
##
## .05 .10 .25 median .75 .90 .95
## 0.00 0.00 0.00 11.00 27.00 214.00 260.00
##
## range sd vcoef mad IQR skew kurt
## 914.00 105.86 1.94 16.31 27.00 2.78 9.90
##
## lowest : 0 (2'395), 1, 3, 4 (4), 5 (7)
## highest: 749, 812, 828, 860, 914
Desc(dataCasino$Other, main = "Other", plotit = TRUE)
## -------------------------------------------------------------------------
## Other
##
## length n NAs unique 0s mean meanCI
## 5'000 5'000 0 596 2'278 132.97 126.94
## 100.0% 0.0% 45.6% 139.00
##
## .05 .10 .25 median .75 .90 .95
## 0.00 0.00 0.00 34.50 74.00 536.00 610.05
##
## range sd vcoef mad IQR skew kurt
## 1'025.00 217.51 1.64 51.15 74.00 1.57 1.04
##
## lowest : 0 (2'278), 3, 4, 6, 7
## highest: 899 (2), 901, 962, 972, 1'025
Desc(dataCasino$`Total Spend`, main = "Total Spend", plotit = TRUE)
## -------------------------------------------------------------------------
## Total Spend
##
## length n NAs unique 0s mean meanCI
## 5'000 5'000 0 1'597 0 1'122.42 1'060.70
## 100.0% 0.0% 0.0% 1'184.14
##
## .05 .10 .25 median .75 .90 .95
## 87.00 113.00 185.00 336.50 757.00 2'185.20 6'740.10
##
## range sd vcoef mad IQR skew kurt
## 15'569.00 2'226.22 1.98 292.81 572.00 3.66 13.49
##
## lowest : 13, 16, 18, 25, 27
## highest: 14'330, 14'333, 14'394, 14'568, 15'582
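The eight `Desc()` calls above can also be condensed into one table of the figures that matter most here: share of zeros, mean, median and skewness. The sketch below uses base R only and simulated data; the helper names `skewness` and `summarise_spend` are illustrative, and the simple moment-based skewness may differ slightly from the value `Desc()` reports.

```r
# Moment-based sample skewness (no small-sample correction)
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# One row per column: percentage of zeros, mean, median, skewness
summarise_spend <- function(df) {
  t(sapply(df, function(x) c(
    zero_pct = mean(x == 0) * 100,
    mean     = mean(x),
    median   = median(x),
    skew     = skewness(x)
  )))
}

# Simulated zero-inflated, right-skewed spend (not the casino data)
set.seed(1)
toy <- data.frame(
  GameA = rexp(1000, rate = 1 / 300) * rbinom(1000, 1, 0.9),
  GameB = rexp(1000, rate = 1 / 10)  * rbinom(1000, 1, 0.1)
)

res <- summarise_spend(toy)
round(res, 2)
```

On the real data the call would be `summarise_spend(dataCasino[, 2:9])`.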
We are interested in the impact of removing Bingo. Note that Bingo contributes only 0.9% of total spend. The bulk of total spend comes from Slots, BJ and Craps, which together account for about 75%.
apply(dataCasino[,2:9],2,sum)
## Slots BJ Craps Bac Bingo Poker
## 1458871 1416453 1338136 410343 50432 272961
## Other Total Spend
## 664869 5612088
(bingoPct <- sum(dataCasino$Bingo) / sum(dataCasino$`Total Spend`) * 100)
## [1] 0.8986317
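The same calculation extends to every game. This sketch starts from the column sums printed above, so it runs without the data set; the variable names are illustrative.

```r
# Column sums copied from the apply() output above
totals <- c(Slots = 1458871, BJ = 1416453, Craps = 1338136, Bac = 410343,
            Bingo = 50432, Poker = 272961, Other = 664869)
total_spend <- 5612088

# Each game's share of total spend, in percent
share <- totals / total_spend * 100
round(sort(share, decreasing = TRUE), 1)

# Combined share of the three largest games
top3 <- sum(share[c("Slots", "BJ", "Craps")])
top3
```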
Simple Linear Regression
fit <- lm(`Total Spend` ~ Bingo*Slots + Bingo*BJ + Bingo*Craps, data=dataCasino)
stargazer(fit, type = "text",
          dep.var.labels.include = TRUE, column.labels = c("Linear", "LinearInteraction"))
##
## ================================================
## Dependent variable:
## ----------------------------
## `Total Spend`
## Linear
## ------------------------------------------------
## Bingo 0.296
## (0.347)
##
## Slots 1.685***
## (0.013)
##
## BJ 1.061***
## (0.009)
##
## Craps 1.017***
## (0.009)
##
## Bingo:Slots -0.006***
## (0.001)
##
## Bingo:BJ
##
##
## Bingo:Craps
##
##
## Constant 79.622***
## (4.834)
##
## ------------------------------------------------
## Observations 5,000
## R2 0.988
## Adjusted R2 0.988
## Residual Std. Error 248.364 (df = 4994)
## F Statistic 79,330.240*** (df = 5; 4994)
## ================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
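The empty rows for Bingo:BJ and Bingo:Craps, together with the 5 model degrees of freedom, are consistent with `lm()` being unable to estimate those two terms and returning `NA` for them. One possible cause is that no player spends on both Bingo and BJ (or Craps), so the product column is zero everywhere. That is an assumption: it matches the sample rows shown earlier but has not been checked on the full data. The sketch below reproduces the effect on simulated data.

```r
set.seed(1)
n <- 200

# Hypothetical players: the first half play Bingo and never BJ, the rest the reverse
is_bingo <- rep(c(TRUE, FALSE), each = n / 2)

toy <- data.frame(
  Bingo = ifelse(is_bingo, rpois(n, 100), 0),
  Slots = rpois(n, 300),
  BJ    = ifelse(is_bingo, 0, rpois(n, 250))
)
toy$Total <- toy$Bingo + toy$Slots + toy$BJ + rnorm(n, sd = 5)

# Bingo * BJ is zero for every row, so its coefficient cannot be estimated
fit_toy <- lm(Total ~ Bingo * Slots + Bingo * BJ, data = toy)
coef(fit_toy)

# Number of coefficients actually estimated (including the intercept)
fit_toy$rank
```

On the real data, `sum(dataCasino$Bingo > 0 & dataCasino$BJ > 0)` would confirm or rule out this explanation.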