Station Casinos is the leading provider of gaming and entertainment to the local residents of Las Vegas, Nevada. As the largest non-union gaming company in the country, Station Casinos owns and operates 18 casino and/or hotel properties in the Las Vegas area.
Currently, the Station Casinos management team is interested in gathering data and transforming it into usable information. This can assist the decision-making process and help analyze the results of a variety of operational and marketing initiatives in the organization.
The objective of this report is to provide an exploratory analysis of the data gathered per game on a selected day, in order to understand the typical patterns in players' behavior in terms of game preferences, frequency, and dollars spent. This information will give the management team insight into which players should receive comps because of the value they add to the casino.
The bar chart below shows that Slots, BJ, and Craps are the games with the highest revenue on the selected day. Together they account for about 75% of revenue (see Table 1). On the other hand, Bac, Poker, and Bingo perform poorly: each contributes less than 8% of the day's revenue, with Bingo ranked last in both the bar chart and Table 1.
library(tidyverse)  # read_csv, gather, dplyr verbs, ggplot2
library(knitr)      # kable
library(ggpubr)     # ggarrange, annotate_figure, text_grob
library(GGally)     # ggpairs
data <- read_csv("file.csv", skip_empty_rows = TRUE)
# Reshape to long format: one row per player and game
data1 <- data %>%
  gather(Games, Amount, "Slots", "BJ", "Craps", "Bac", "Bingo", "Poker", "Other") %>%
  dplyr::select(-`Total Spend`, -X1)
# Total amount per game, sorted from highest to lowest
data3 <- data1 %>%
  group_by(Games) %>% summarize(Total = sum(Amount, na.rm = TRUE)) %>%
  as.data.frame() %>% arrange(desc(Total))
p <- ggplot(data = data3, aes(x = reorder(Games, -Total), y = Total)) +
  geom_bar(stat = "identity", fill = "#ffcc00", color = "#235eb5") +
  geom_text(aes(label = Total), vjust = -0.5, color = "#0b1623", size = 12, fontface = "bold") +
  labs(x = "Games", y = "Total Amount Bet") + ggtitle("Graph 1: Total Revenue Amount by Game in a Day") +
  theme_bw() + theme(plot.title = element_text(hjust = 0.5, size = 30), axis.text.x = element_text(size = 30), axis.text.y = element_text(size = 25), axis.title.y = element_text(size = 15), axis.title.x = element_text(size = 15)) +
  scale_y_continuous(labels = function(x) format(x, big.mark = ",", scientific = FALSE))
p
datat <- data3 %>% mutate(Percentage = round(Total / sum(Total) * 100, 2))
colnames(datat) <- c("Games", "Revenue", "%")
kable(datat)
Games | Revenue | % |
---|---|---|
Slots | 1458871 | 26.00 |
BJ | 1416453 | 25.24 |
Craps | 1338136 | 23.84 |
Other | 664869 | 11.85 |
Bac | 410343 | 7.31 |
Poker | 272961 | 4.86 |
Bingo | 50432 | 0.90 |
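As a quick check of the shares quoted above, the following sketch recomputes the percentages and a running cumulative share directly from the Table 1 figures. The data frame `table1` is typed in by hand for illustration; it is not an object from the original analysis.

```r
# Revenue figures copied from Table 1 (hand-entered for illustration)
table1 <- data.frame(
  Games   = c("Slots", "BJ", "Craps", "Other", "Bac", "Poker", "Bingo"),
  Revenue = c(1458871, 1416453, 1338136, 664869, 410343, 272961, 50432)
)

# Share of total revenue per game, and running total from largest to smallest
table1$Share      <- round(table1$Revenue / sum(table1$Revenue) * 100, 2)
table1$Cumulative <- cumsum(table1$Share)

print(table1)
```

The third row of the `Cumulative` column gives the combined share of the three largest games.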
Graph 2 shows the frequency distribution of the amount wagered for each game. It illustrates that around 80% of players (about 4,000) bet between $0 and $100 on every game except Slots. The other 20% bet between $100 and $6,000.
hist <- geom_histogram(binwidth = 500, boundary = 0, color = "#0b1623", fill = "#ffcc00")  # binwidth overrides bins, so bins is omitted
axis <- labs(x = "Amount Wagered", y = "# Players")
theme <- theme_minimal()
slots <-ggplot(data = data, aes(x= Slots)) + ggtitle("Slots") + hist + axis +theme
BJ <-ggplot(data = data, aes(x= BJ)) + ggtitle("BJ") + hist + axis +theme
Craps <-ggplot(data = data, aes(x= Craps)) + ggtitle("Craps") + hist + axis +theme
Bac <-ggplot(data = data, aes(x= Bac)) + ggtitle("Bac") + hist + axis +theme
Bingo <-ggplot(data = data, aes(x= Bingo)) + ggtitle("Bingo") + hist + axis +theme
Poker <-ggplot(data = data, aes(x= Poker)) + ggtitle("Poker") + hist + axis +theme
Other <-ggplot(data = data, aes(x= Other)) + ggtitle("Other") + hist + axis +theme
figure1 <- ggarrange(slots, BJ, Craps, Bac, Bingo, Poker, Other + rremove("x.text"),ncol = 2, nrow = 5)
annotate_figure(figure1, top = text_grob("Graph 2: Wagered Amount per Player and Game", color = "Black", face = "bold", size = 20))
Graph 2 does not give a complete picture of the players who bet higher dollar amounts. To gain better insight into the high-value clients, the frequency distribution charts are generated individually to show the number of players who bet more than $500.
The frequency table for Slots indicates that 1,089 clients bet between $500 and $1,000 and 183 bet between $1,000 and $2,000 (see graph and frequency table below).
datan <- data%>%filter(Slots > 500)
hist <- geom_histogram(binwidth = 100, boundary = 0, color = "#0b1623", fill = "#ffcc00")
axis <- labs(x = "Amount Wagered", y = "# Players")
theme <- theme_minimal()
slots <-ggplot(data = datan, aes(x= Slots)) + ggtitle("Slots") + hist + axis +theme
slots
br = seq(0,2000,by=250)
ranges = paste(head(br,-1), br[-1], sep=" - ")
freq = hist(data$Slots, breaks=br, include.lowest=TRUE, plot=FALSE)
frame1 <- data.frame(range = ranges, frequency = freq$counts)
fram1 <- frame1%>%mutate(Percentage = round(frequency/5000*100,3))
kable(fram1)
range | frequency | Percentage |
---|---|---|
0 - 250 | 3052 | 61.04 |
250 - 500 | 676 | 13.52 |
500 - 750 | 687 | 13.74 |
750 - 1000 | 402 | 8.04 |
1000 - 1250 | 126 | 2.52 |
1250 - 1500 | 39 | 0.78 |
1500 - 1750 | 15 | 0.30 |
1750 - 2000 | 3 | 0.06 |
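The same binning steps are repeated for every game, so they could be wrapped in a small helper. The sketch below is one possible way to do that; `freq_table` and the toy vector `wagers` are hypothetical names introduced here, not part of the original analysis. It uses `cut()` with `include.lowest = TRUE`, which bins the same way as the `hist()` call above (right-closed intervals, with the lowest value included in the first bin).

```r
# Hypothetical helper: bin a numeric vector and report counts and percentages
freq_table <- function(x, breaks) {
  x <- x[!is.na(x)]
  bins <- cut(x, breaks = breaks, include.lowest = TRUE, right = TRUE)
  counts <- as.vector(table(bins))
  data.frame(
    range      = paste(head(breaks, -1), breaks[-1], sep = " - "),
    frequency  = counts,
    Percentage = round(counts / length(x) * 100, 2)
  )
}

# Invented toy data standing in for one game's column
wagers <- c(0, 10, 120, 250, 260, 499, 500, 750, 1999)
freq_table(wagers, breaks = seq(0, 2000, by = 250))
```

Dividing by `length(x)` instead of a hard-coded 5000 keeps the percentages correct if the number of players changes.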
In the case of BJ, 245 clients bet more than $2,000 (see graph and frequency table below).
datan1 <- data%>%filter(BJ > 500)
hist <- geom_histogram(binwidth = 500, boundary = 0, color = "#0b1623", fill = "#ffcc00")
axis <- labs(x = "Amount Wagered", y = "# Players")
theme <- theme_minimal()
BJ <-ggplot(data = datan1, aes(x= BJ)) + ggtitle("BJ") + hist + axis +theme
BJ
br = seq(0,8000,by=500)
ranges = paste(head(br,-1), br[-1], sep=" - ")
freq = hist(data$BJ, breaks=br, include.lowest=TRUE, plot=FALSE)
frame1 <- data.frame(range = ranges, frequency = freq$counts)
fram1 <- frame1%>%mutate(Percentage = round(frequency/5000*100,3))
kable(fram1)
range | frequency | Percentage |
---|---|---|
0 - 500 | 4658 | 93.16 |
500 - 1000 | 82 | 1.64 |
1000 - 1500 | 3 | 0.06 |
1500 - 2000 | 12 | 0.24 |
2000 - 2500 | 20 | 0.40 |
2500 - 3000 | 28 | 0.56 |
3000 - 3500 | 42 | 0.84 |
3500 - 4000 | 44 | 0.88 |
4000 - 4500 | 32 | 0.64 |
4500 - 5000 | 32 | 0.64 |
5000 - 5500 | 22 | 0.44 |
5500 - 6000 | 9 | 0.18 |
6000 - 6500 | 9 | 0.18 |
6500 - 7000 | 4 | 0.08 |
7000 - 7500 | 3 | 0.06 |
7500 - 8000 | 0 | 0.00 |
The frequency table for Craps indicates that 249 clients bet more than $2,000 (see graph and frequency table below).
datan2 <- data%>%filter(Craps > 500)
hist <- geom_histogram(binwidth = 500, boundary = 0, color = "#0b1623", fill = "#ffcc00")
axis <- labs(x = "Amount Wagered", y = "# Players")
theme <- theme_minimal()
Craps <-ggplot(data = datan2, aes(x= Craps)) + ggtitle("Craps") + hist + axis +theme
Craps
br = seq(0,8000,by=500)
ranges = paste(head(br,-1), br[-1], sep=" - ")
freq = hist(data$Craps, breaks=br, include.lowest=TRUE, plot=FALSE)
frame1 <- data.frame(range = ranges, frequency = freq$counts)
fram1 <- frame1%>%mutate(Percentage = round(frequency/5000*100,3))
kable(fram1)
range | frequency | Percentage |
---|---|---|
0 - 500 | 4673 | 93.46 |
500 - 1000 | 67 | 1.34 |
1000 - 1500 | 1 | 0.02 |
1500 - 2000 | 10 | 0.20 |
2000 - 2500 | 12 | 0.24 |
2500 - 3000 | 18 | 0.36 |
3000 - 3500 | 37 | 0.74 |
3500 - 4000 | 40 | 0.80 |
4000 - 4500 | 59 | 1.18 |
4500 - 5000 | 36 | 0.72 |
5000 - 5500 | 23 | 0.46 |
5500 - 6000 | 11 | 0.22 |
6000 - 6500 | 5 | 0.10 |
6500 - 7000 | 5 | 0.10 |
7000 - 7500 | 3 | 0.06 |
7500 - 8000 | 0 | 0.00 |
For Bac, 145 clients bet between $1,000 and $2,000, and only one client bet more than $2,000 (see graph and frequency table below).
datan3 <- data%>%filter(Bac > 500)
hist <- geom_histogram(binwidth = 100, boundary = 0, color = "#0b1623", fill = "#ffcc00")
axis <- labs(x = "Amount Wagered", y = "# Players")
theme <- theme_minimal()
Bac <-ggplot(data = datan3, aes(x= Bac)) + ggtitle("Bac") + hist + axis +theme
Bac
br = seq(0,2500,by=500)
ranges = paste(head(br,-1), br[-1], sep=" - ")
freq = hist(data$Bac, breaks=br, include.lowest=TRUE, plot=FALSE)
frame1 <- data.frame(range = ranges, frequency = freq$counts)
fram1 <- frame1%>%mutate(Percentage = round(frequency/5000*100,3))
kable(fram1)
range | frequency | Percentage |
---|---|---|
0 - 500 | 4721 | 94.42 |
500 - 1000 | 133 | 2.66 |
1000 - 1500 | 124 | 2.48 |
1500 - 2000 | 21 | 0.42 |
2000 - 2500 | 1 | 0.02 |
For Bingo, the table and graph indicate that no player bet more than $500; the highest wager falls in the $200 - $250 range (see graph and frequency table below). About 90% of clients bet $50 or less, and 95% bet $100 or less.
datan4 <- data%>%filter(Bingo > 25)
hist <- geom_histogram(binwidth = 25, boundary = 0, color = "#0b1623", fill = "#ffcc00")
axis <- labs(x = "Amount Wagered", y = "# Players")
theme <- theme_minimal()
Bingo <-ggplot(data = datan4, aes(x= Bingo)) + ggtitle("Bingo") + hist + axis +theme
Bingo
br = seq(0,500,by=50)
ranges = paste(head(br,-1), br[-1], sep=" - ")
freq = hist(data$Bingo, breaks=br, include.lowest=TRUE, plot=FALSE)
frame1 <- data.frame(range = ranges, frequency = freq$counts)
fram1 <- frame1%>%mutate(Percentage = round(frequency/5000*100,3))
kable(fram1)
range | frequency | Percentage |
---|---|---|
0 - 50 | 4526 | 90.52 |
50 - 100 | 224 | 4.48 |
100 - 150 | 219 | 4.38 |
150 - 200 | 30 | 0.60 |
200 - 250 | 1 | 0.02 |
250 - 300 | 0 | 0.00 |
300 - 350 | 0 | 0.00 |
350 - 400 | 0 | 0.00 |
400 - 450 | 0 | 0.00 |
450 - 500 | 0 | 0.00 |
In the case of Poker, only 56 of the 5,000 clients wagered between $500 and $1,000, and no client bet more than $1,000 (see graph and frequency table below).
datan5 <- data%>%filter(Poker > 100)
hist <- geom_histogram(binwidth = 100, boundary = 0, color = "#0b1623", fill = "#ffcc00")
axis <- labs(x = "Amount Wagered", y = "# Players")
theme <- theme_minimal()
Poker <-ggplot(data = datan5, aes(x= Poker)) + ggtitle("Poker") + hist + axis +theme
Poker
br = seq(0,1000,by=100)
ranges = paste(head(br,-1), br[-1], sep=" - ")
freq = hist(data$Poker, breaks=br, include.lowest=TRUE, plot=FALSE)
frame1 <- data.frame(range = ranges, frequency = freq$counts)
fram1 <- frame1%>%mutate(Percentage = round(frequency/5000*100,3))
kable(fram1)
range | frequency | Percentage |
---|---|---|
0 - 100 | 4007 | 80.14 |
100 - 200 | 418 | 8.36 |
200 - 300 | 440 | 8.80 |
300 - 400 | 57 | 1.14 |
400 - 500 | 22 | 0.44 |
500 - 600 | 31 | 0.62 |
600 - 700 | 17 | 0.34 |
700 - 800 | 4 | 0.08 |
800 - 900 | 3 | 0.06 |
900 - 1000 | 1 | 0.02 |
For the Other games, the table indicates that 76.8% of players bet no more than $100 and about 12% bet between $500 and $1,000 (see graph and frequency table below).
datan6 <- data%>%filter(Other > 100)
hist <- geom_histogram(binwidth = 100, boundary = 0, color = "#0b1623", fill = "#ffcc00")
axis <- labs(x = "Amount Wagered", y = "# Players")
theme <- theme_minimal()
Other <-ggplot(data = datan6, aes(x= Other)) + ggtitle("Other") + hist + axis +theme
Other
br = seq(0,1200,by=100)
ranges = paste(head(br,-1), br[-1], sep=" - ")
freq = hist(data$Other, breaks=br, include.lowest=TRUE, plot=FALSE)
frame1 <- data.frame(range = ranges, frequency = freq$counts)
fram1 <- frame1%>%mutate(Percentage = round(frequency/5000*100,3))
kable(fram1)
range | frequency | Percentage |
---|---|---|
0 - 100 | 3838 | 76.76 |
100 - 200 | 25 | 0.50 |
200 - 300 | 68 | 1.36 |
300 - 400 | 175 | 3.50 |
400 - 500 | 285 | 5.70 |
500 - 600 | 330 | 6.60 |
600 - 700 | 173 | 3.46 |
700 - 800 | 80 | 1.60 |
800 - 900 | 22 | 0.44 |
900 - 1000 | 3 | 0.06 |
1000 - 1100 | 1 | 0.02 |
1100 - 1200 | 0 | 0.00 |
The frequency distributions for each game show that the casino's high-value clients play Slots, BJ, Craps, and Bac. In BJ and Craps, roughly 245 clients each bet more than $2,000; these are the clients with the largest budgets and the most value for the casino.
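Counts of players above a given wager can also be computed directly, without reading them off the frequency tables. The sketch below shows one way to do this on invented toy data; the data frame `toy`, its values, and the `threshold` variable are assumptions for illustration only.

```r
# Toy data frame standing in for the per-player wager table (values invented)
toy <- data.frame(
  Slots = c(50, 800, 1200, 0),
  BJ    = c(0, 2500, 100, 4000),
  Craps = c(3000, 0, 0, 2100)
)

# Number of players whose wager on each game exceeds the threshold
threshold <- 2000
high_value <- colSums(toy > threshold, na.rm = TRUE)
high_value
```

Applied to the real data, the same expression over the seven game columns would give the per-game counts of players above $2,000.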
First, each player's dollars per game were converted to a percentage of that player's total spend. The players were then grouped into 6 clusters using k-means. The table below displays the average percentage per game for each cluster. For example, cluster 6 is characterized by players who bet mainly on Slots (on average 92% of their spend), but their wallet size averages only about $105. On the other hand, players in clusters 1 and 2 have more betting capacity, with average total spend of $2,024 and $9,992 respectively. Players in clusters 1 and 2 also split their budget between Slots, BJ, Craps, and Bac.
Clusters 4 and 3 are the groups where players wager mostly on Slots (about 50%) and Other games (about 15%), with average spend of $673 and $459 respectively.
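# Drop the row-index column; keep the seven games and Total Spend
test <- data %>% dplyr::select(-X1)
# Convert each game's dollars to a share of the player's total spend
# (a Total Spend of 0 would produce NaN and would need handling first)
for (var in 1:7) {
  test[[var]] <- test[[var]] / test[['Total Spend']]
}
set.seed(20)  # fix the random start so the clusters are reproducible
clusters <- kmeans(test, 6)
clustercenters <- as.data.frame(round(clusters$centers, digits = 2))
clustercenters <- clustercenters %>% mutate(cluster = row_number())
kable(clustercenters)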
Slots | BJ | Craps | Bac | Bingo | Poker | Other | Total Spend | cluster |
---|---|---|---|---|---|---|---|---|
0.34 | 0.15 | 0.10 | 0.05 | 0.00 | 0.10 | 0.25 | 2023.99 | 1 |
0.10 | 0.38 | 0.41 | 0.11 | 0.00 | 0.00 | 0.00 | 9992.17 | 2 |
0.50 | 0.09 | 0.07 | 0.03 | 0.08 | 0.07 | 0.16 | 458.95 | 3 |
0.49 | 0.11 | 0.09 | 0.05 | 0.05 | 0.08 | 0.14 | 673.30 | 4 |
0.37 | 0.15 | 0.10 | 0.05 | 0.01 | 0.09 | 0.24 | 210.70 | 5 |
0.92 | 0.02 | 0.01 | 0.01 | 0.00 | 0.01 | 0.03 | 104.72 | 6 |
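The two steps described above (converting dollars to shares, then clustering) can be tried on a small invented data set. This is only a sketch under stated assumptions: the toy values, the two-game layout, and `k = 2` are chosen for illustration and do not come from the casino data.

```r
set.seed(20)  # k-means starts from random centers, so fix the seed

# Invented toy wagers for two games plus the row total
toy <- data.frame(
  Slots = c(90, 95, 100, 1000, 1200, 900),
  BJ    = c(10,  5,   0, 4000, 3800, 4100)
)
toy$Total <- toy$Slots + toy$BJ

# Convert each game's dollars to a share of the player's total
shares <- toy
shares$Slots <- toy$Slots / toy$Total
shares$BJ    <- toy$BJ / toy$Total

# Cluster on the shares plus total spend (k = 2 for this toy example)
fit <- kmeans(shares, centers = 2, nstart = 10)
round(fit$centers, 2)
table(fit$cluster)
```

Because `Total` is left in dollars while the shares lie between 0 and 1, the total dominates the distance calculation in this sketch, which is worth keeping in mind when interpreting clusters built this way.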
The following graph displays scatterplots for each pair of variables, with the 6 clusters shown in different colors. For example, the scatterplot of Craps against BJ shows cluster 2 as clearly distinct from the others: these players spend most of their budget on BJ and Craps and do not play Bingo, Poker, or Other games.
The uniqueness of cluster 2 also appears in the scatterplots of Total Spend against BJ, Slots, Craps, and Bac. As described earlier, this cluster consists of players who spend around $10,000 (cluster average $9,992) and do not play Bingo, Poker, or Other games. Cluster 2 therefore captures the casino's high-value players. As a result, the loyalty program should offer better-value comps to players in cluster 2 in order to incentivize them to bet more frequently.
data10 <- test%>%mutate(clusters = as.factor(clusters$cluster))
ggpairs(data10, columns = 1:8, aes(colour = clusters), legend = c(1,1), title="Clusters: k = 6",
lower=list(continuous="points"),
upper=list(continuous="blank"), switch="both") +
theme(legend.position = "bottom")
The graph below shows these scatterplots, in which the high-value players of cluster 2 stand out.
data10 <- test%>%mutate(clusters = as.factor(clusters$cluster))
ggpairs(data10, columns = c(1:4, 8), aes(colour = clusters), legend = c(1,1), title="Clusters: k = 6 (Slots, BJ, Craps, Bac and Total Spend)",
lower=list(continuous="points"),
upper=list(continuous="blank"), switch="both") +
theme(legend.position = "bottom")
The objective of the casino management team is to identify and match each player to a specific group, defined by the games they play and how much they wager. These groupings should be used to differentiate the type and value of the comps offered to players. This report has applied segmentation analysis, in which the high-value players were identified using k-means cluster analysis. The main insights of the segmentation analysis are:
• Cluster 2 represents the players whose gambling budget is about $10,000 per day (cluster average $9,992). An essential characteristic of this group is its choice of games: these players bet only on Slots, BJ, Craps, and Bac, which are the games where the casino has a greater and more controllable ‘edge’. It is also important to highlight that these clients do not play Bingo, Poker, or Other games.
• Cluster 1 is also an important group, as it is formed by players whose wallet size is around $2,000 in a day.
• In clusters 3 and 4, players wager mostly on Slots and Other games, with average total spend of about $460 and $670 respectively. Their gambling budget is not especially high, but it is not low either.
• Clusters 5 and 6 have the lowest average spend across all the games, at less than $250 in a day.
The casino considers matching the value of the comps to the value of the guest to be a critical strategy in the company loyalty program. Hence, according to these insights, players in clusters 1 and 2 should be provided with high-value comps to reward their loyalty and retain them as clients. Players in clusters 3 and 4 could also be offered comps to encourage more frequent and larger wagers. However, players in clusters 5 and 6 should not be offered high-value comps, as their gambling budget is around $210 per day or less (the cluster 5 average; cluster 6 averages about $105).
It is important to note that this analysis was performed on sample data for a single day. To obtain a broader view and better insight for deciding the value of the comps, it is necessary to use data that covers a longer time period.
library(RColorBrewer) # for brewer.pal
palette(brewer.pal(8, "Set1"))
test <- data%>%dplyr::select(-X1)
for (var in 1:7) {
test[[var]] <- test[[var]] / test[['Total Spend']]
}
sdata <- scale(test)
km <- kmeans(sdata, 6, nstart = 1000)
pairs(test, col = km$cluster)
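The `scale()` step in the code above matters because k-means relies on Euclidean distance, so a column measured in dollars can outweigh columns measured as shares. The sketch below illustrates the effect of `scale()` on invented values; the data frame `toy` and its column names are assumptions for illustration only.

```r
# Invented toy values: two share columns and one dollar column
toy <- data.frame(
  slots_share = c(0.9, 0.1, 0.5, 0.3),
  bj_share    = c(0.1, 0.9, 0.5, 0.7),
  total_spend = c(100, 9000, 500, 2000)
)

# Spread of each column before scaling: total_spend dwarfs the shares
sapply(toy, sd)

# After scale(), every column has mean 0 and standard deviation 1
scaled <- scale(toy)
round(colMeans(scaled), 10)
apply(scaled, 2, sd)
```

With all columns on the same footing, the game mix and the total spend contribute comparably to the cluster assignment.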