Abstract

Euchre is a trick-taking card game that is played by 4 players on 2 teams. For each round of Euchre, each player gets dealt 5 cards that range from 9 to ace. Once the cards are dealt, the players determine what the trump suit should be, depending on the cards in hand. A nine (weakest rank) of the trump suit is more powerful than the ace (strongest rank) of a non-trump suit. When a player is selecting trump, the goal is for their team to win the majority of 5 tricks.

When deciding to bid trump, there is no straightforward process for doing so. Therefore, this project will determine what factors contribute to creating a good hand that is worthy to bid on. Since there are no datasets online that will achieve this goal, 300,000 Euchre rounds were simulated Monte Carlo style. Information about the hand and the call that should have been made based on the results have been recorded. Multiclass machine learning algorithms will be used to determine if a suit in hand should be passed on, bid trump, or be bid and played alone.


How to play Euchre

Because there are slightly different versions of Euchre, this is how Euchre will be played in this simulation.

  • There is no farming so no players can take cards that are buried.

  • Stick the dealer will be enforced which means the dealer must bid trump if the other 3 players pass.

  • Loners will start to the left of the dealer, not starting left of the player who made the bid.

  • Benny is not played.

  • Canadian Loner is not played.


Part 1: Dealing

The game starts with 4 players (two players on each team). The players sit at a square table and the members of each team face each other. In a 1st person perspective, I would be at the south part of the table. My partner would be at the north part. The members of the opposing team would sit to the left and right of me. Each player gets dealt 5 cards that can take the value ace, king, queen, jack, ten, or nine. Nobody can see anyone’s cards. Three cards will not be seen for that specific round and one card will be turned up in the center of the table. Each player takes turns being the dealer.


Part 2: Deciding to pick up the turned-up card

Going in clockwise direction, starting with the player sitting left of the dealer, each player will decide whether to pass or make the dealer pick up the turned up card. If that card gets picked up, the dealer will have to include it in their hand and have to discard a weaker card in their hand. The suit of the turned up card would become trump for that round. If the first 3 players pass, the decision goes to the dealer. They can pick up the card or pass. If they pass, the card is turned down and will not be used for that round. That suit would then be unable to be bid trump for that round.

TLDR. Each player can decide to make the suit of the turned up card trump and that is it for this phase. If all players agree to not bid the suit trump, they can decide on one of the three other suits.

It is important to know that a player can opt to play the round alone without the help of their partner. If they win all 5 tricks, their team can win a lot more points. A player can announce that they are going alone when they order trump.


Part 2.5. Deciding Trump

This step occurs if the turned up card is not picked up.

Going in clockwise direction, starting with the player sitting left of the dealer, each player will decide on which suit to make trump for that round. They can choose any suit, except for the suit that was recently turned down. If a player has a lot of cards of the same suit, they can opt to make that suit trump. If the first 3 players pass on bidding trump, the decision goes to the dealer who then is required to bid trump. Even if they have a bad hand, they have to choose a suit to be trump. However, some games don’t play with this rule. In that case, if everyone passes, the cards get redealt and the entire bidding process gets repeated. It will keep repeating until someone bids trump.

Again, a player can announce if they will be playing the hand alone when they choose a suit to be trump.

Parts 2 and 2.5 are what this research is based on. Part 3 (specified below) is not part of this research, but is hard-coded for the simulation. Furthermore, individual models will be built for parts 2 and 2.5. The best model for part 2 will have no influence on the best model for part 2.5 and vice versa.

An important caveat to know that unlike in the real game, the analyzed hand has total control of what suit to bid trump on. In most cases, the analyzed hand will not have the opportunity to bid trump because someone already did. For the sake of this research, the analyzed hand has total control because it makes the target variable easier to interpret.


Part 3. Playing cards

Starting with the player left of the dealer, in clockwise order, each player takes one card from their hand and plays it on the table. The suit of the first card that gets played is the “led suit.” The led suit can be a trump suit or a non-trump suit. Every player must play a card of the led suit if they have a card of that suit in hand. If a player does not have that suit, they can choose to play a trump card on it to try to win the trick (if the led suit isn’t a trump card). If possible, players can also play a “loser card” (low-value, off-suit card) if they don’t have the led suit in hand. This is usually done if there’s no chance they can win that trick with any of the cards in hand, or if their partner is already guaranteed to win the trick.

The player that puts the highest value card on the table wins the trick. The winning card must either be of the led suit (if no trump cards got played) or the most powerful trump card played. A high-rank, non-trump card that is a non-led suit is useless and carries as much value as a low-rank, non-trump card that is a non-led suit. The winner of the previous series goes first in the next trick-taking series (if there’s cards left). This process continues for all 5 trick-taking series that get played.

Once all cards are played, the round is over and the entire process repeats, starting from part 1, if neither team has a winning score yet.


Scoring

If the team that chooses trump wins 3 or 4 tricks, they win 1 point, regardless of if a member went alone or not.

If the team that chooses trump wins all 5 tricks, they win 2 points. This happens when both members play the round.

If the team that chooses trump wins all 5 tricks and only one player from the team plays that round, they win 4 points.

If the team that chooses trump wins less than 3 tricks, the opposing team wins 2 points (so it’s important to be careful when bidding trump since it’s risky).

The first team that wins at least 10 points wins the game.


Card power hierarchy

For this scenario, let’s assume that hearts are trump and clubs are led for a series. Starting from strongest to weakest, here are the card values for this specific series:

Jack of hearts (also known as Right Bower)

Jack of diamonds (also known as Left Bower) – Jack of the non-trump suit that is same color as the trump suit. It is not a diamond, but a heart for all rounds that hearts are trump.

Ace of hearts

King of hearts

Queen of hearts

Ten of hearts

Nine of hearts

Ace of clubs

King of clubs

Queen of clubs

Jack of clubs

Ten of clubs

Nine of clubs

Every other card not mentioned above


How Python Simulation made Euchre games

Baseline bidding method

For this project, south has the sole ability to bid trump for a given suit. This is not the case in real life, but for the sake of comparing results, south will have the full option of bidding trump.

Here is how the default selection method chooses trump:

When a card gets turned up at the beginning of the round, the summary data for that suit gets analyzed using the logic below. If it meets the specified criteria, it is bid trump.

If that card gets turned down, the strongest suit (the suit with the highest calculated strength) is used and gets analyzed.


The logic used for bidding on the turned up suit is similar to bidding on the other 3 suits. The logic is provided below.

GO ALONE if the hand satisfies these conditions:

  • There are 4 or more trump cards of a certain suit and the Left or Right Bower is present in hand

  • OR There are 3 or more trump cards of a certain suit, the Left or Right Bower is present in hand, and the non-trump cards are either king or ace rank

  • AND An opponent will not be picking up a jack

ORDER TRUMP if the hand satisfies these conditions:

  • There are at least 2 trump cards in hand, the Right Bower is present, and there are at least 2 non-trump cards of king or ace rank

  • OR There are at least 3 trump cards with the Right Bower present

  • OR There are at least 4 trump cards present

Otherwise — PASS


Determining the correct call based on simulation results

For each round of Euchre simulated, the simulation records how many tricks are won by south (the player analyzed) and the total tricks won by the player’s team. If prompted, the simulation runs the scenario of south playing the hand alone and counts the number of tricks south wins while going alone.

  • If the total number of tricks won by the team is less than 3, or the south player wins less than 2 tricks while playing with the partner and less than 2 tricks while going alone, the correct call is set to PASS.

  • If the total number of tricks won by the team is at least 3 AND either south is responsible for winning at least 1 trick while playing the hand with their partner and at least 2 while playing alone OR south is responsible for winning at least 2 tricks while playing with their partner and at least 1 while playing alone, the correct call is set to ORDER TRUMP.

  • If south wins all 5 tricks while going alone OR south wins 4 tricks without the Right Bower or with 4 trump cards in hand AND does not receive any help from their partner AND an opponent does not pick up a jack, the correct call is set to ORDER TRUMP ALONE.


Process logic for playing cards with all 4 players

First

  • If the Right Bower is in hand, play the Right Bower

  • Else if the hand has the expected strongest trump card that hasn’t been played yet, lead with that card

  • Else if at least 4 trump cards have been played and there are no non-trump aces in hand, but there are trump cards in hand, lead with the strongest available trump card

  • Else if the partner made the bid and there are no non-trump aces in hand, but there are trump cards in hand, lead with the strongest available trump card

  • Otherwise, lead with the strongest available non-trump card

Second

  • If the second player has a card of the suit that was led, they must play it
    • If the second player has a card stronger than the first card, play the strongest available card of the led suit
    • Otherwise, play the weakest available card of the led suit
  • If the second player has trump cards (excluding the expected strongest card that hasn’t been played), play those trump cards
    • If the second player made the bid and there are mid-value trump cards (queen, king, or ace) in hand, play the lowest available trump card from that list
    • Otherwise, play that highest available trump card that is not the expected strongest card
  • Else play the weakest card. Ideally, play a lower-value card of a suit that is the only card of that suit in hand

Third

  • If the third player has a card of the suit that was led, they must play it
    • If the third player has a card of the led suit that is stronger than the first 2 played cards and the partner is not guaranteed to win the trick, play the strongest available card of the led suit
    • Otherwise, play the weakest available card of the led suit
  • Else if the third player has a trump card and the first player’s card is not currently winning or if the first card is not a non-trump ace, play a trump card (preferably a mid-value trump card if there’s one available, or the weakest available trump card)
  • Else play the weakest card. Ideally, play a lower-value card of a suit that is the only card of that suit in hand

Fourth

  • If the fourth player has a card of the suit that was led, they must play it

    • If the fourth player has a card of the led suit that can win the trick that is currently not being won by the fourth player’s team, play the weakest available card of the led suit that can win the trick
    • Otherwise, play the weakest available card of the led suit
  • Else if the fourth player has a trump card that can win a trick that is currently not being won by the fourth player’s team, play the weakest available trump card that can win the fourth player the trick

  • Else play the weakest card. Ideally, play a lower-value card of a suit that is the only card of that suit in hand


Process logic for playing cards when a player is going alone

First

  • If the Right Bower is present, play the Right Bower (fishes for trump cards and guarantees a free trick)

  • Else if all the cards stronger than the strongest trump card in hand have been played, play the strongest available trump card in hand (also fishes for trump and guarantees a free trick)

  • Else if at least 3 total trump cards have been played and the strongest card in hand is a trump and there are no non-trump aces in hand, play the strongest available trump card in hand

  • Else play the strongest non-trump card in hand

Second

  • If the second player has a card in hand of the suit that was led, play a card of that suit
    • If the second player has a card of the led suit that is stronger than the first card, play the strongest available card of the led suit
    • Otherwise, play the weakest available card of the led suit
  • Else if the second player has trump cards (excluding the expected strongest card), and would not be trumping a partner’s played ace, play a trump card
    • If the first player made the bid, win the trick by playing the weakest available trump card
    • Otherwise, play a strong trump card (but not the expected strongest card)
  • Else play the lowest value card in hand

Third

  • If the third player has a card of the suit that was led, play a card of that suit

    • If the player has a card of the led suit that can win the trick that otherwise would not be won by the team, play the weakest available card of the led suit that would win the trick
    • Otherwise, play the weakest available card of the led suit
  • Else If the third player has a trump card that would win the trick for the team that otherwise would not be won, play the weakest available trump card that would win the trick

  • Else play the lowest value card in hand


Machine Learning Tactic

There will be machine learning models that get tested on each of the two parts of the bidding process. There will be models built on whether the turned up card gets picked up. The modeling process will be repeated on if the best suit in hand should be bid trump in the scenario the turned up card gets turned down.


Libraries

library(dplyr)
library(ggplot2)
library(rms)
library(fastDummies)
library(ROCR)
library(rpart)
library(randomForest)
library(kableExtra)
library(tidytext)
library(nnet)
library(FNN)
library(caret)
library(mda)
library(stringr)


Data Dictionary

When the datasets were made, they include the following columns:

Suit - Suit that gets selected as trump for that round - will remove due to no predictability.

Right_Bower - Indicator that the most powerful card is in the hand. This guarantees 1 trick. This card is the jack of the trump suit.

Left_Bower - Indicator that the second most powerful card is in the hand. This card is the jack of the suit that is the same color as the trump suit (jack of diamonds is the Left Bower if hearts are trump).

TAce - Indicator that the 3rd most powerful card is in the hand. This card is the ace of the trump suit.

TKing - Indicator that the 4th most powerful card is in the hand. This card is the king of the trump suit.

TQueen - Indicator that the 5th most powerful card is in the hand. This card is the queen of the trump suit.

TTen - Indicator that the 6th most powerful card is in the hand. This card is the ten of the trump suit.

TNine - Indicator that the 7th most powerful card is in the hand. This card is the nine of the trump suit.

Ace_count - Counts the number of non-trump ace cards in hand.

King_count - Counts the number of non-trump king cards in hand.

Queen_count - Counts the number of non-trump queen cards in hand.

Jack_count - Counts the number of non-trump jack cards in hand. These jack cards are a different color than the trump suit.

Ten_count - Counts the number of non-trump ten cards in hand.

Nine_count - Counts the number of non-trump nine cards in hand.

Count_Trump - Counts the number of cards in hand that are the trump suit.

Strength - An algorithm to calculate the strength of hand. Will remove due to potential multicollinearity and difficulty interpreting in real-life Euchre game.

Non_Trump_Suit_Count - Counts the total number of different suits in hand, excluding the trump suit.

Non_Trump_Turned_Down_Color_Ind - Indicator for if the trump suit selected is the same color of the suit that was turned up and passed on being trump.

Start_Order - The order in which the analyzed player starts in the first trick-taking series in a round. Will get removed due to Broad_Start_Order being a simpler variable that can process loners better.

Dealer - Indicator for if the analyzed player was the dealer for the round.

Strongest_Card - Value of the strongest card in hand when trump is selected.

Broad_Start_Order - A more broad description of when the analyzed player starts in the first trick-taking series in a round. It has the following values: First (when the starting order is 1), Middle (when the starting order is 2 or 3), or Last (when the starting order is 4).

ind_picked_up - Indicates what card (if any) the analyzed player picks up.

p_picked_up - Indicates what card (if any) the analyzed player’s partner picks up.

ind_picked_up - Indicates what card (if any) the analyzed player’s opponent picks up.

team_ind_tricks_won - Counts the number of tricks the analyzed player wins when playing a simulated round with their partner (ranges from 0-5).

team_tricks_won - Counts the total number of tricks the analyzed player and their partner win when playing a simulated round together (ranges from 0-5).

alone_tricks_won - Counts the number of tricks the analyzed player wins when playing a simulated round alone without their partner (ranges from 0-5).

predicted_call - The hard coded predicted decision based on the player’s hand. This will get treated as the baseline prediction when comparing model performances.

correct_call - The correct decision a player should make based on the scores from “team_ind_tricks_won”, “team_tricks_won”, and “alone_tricks_won”. This is the target variable for this project.


Part 1: Bidding on the Turned Up Suit

Load in Euchre dataframe simulated in Python

euchre_turn_up = read.csv('euchre_turn_up_stats_train.csv')
head(euchre_turn_up) %>%
  kbl() %>%
  kable_styling()
Suit Right_Bower Left_Bower TAce TKing TQueen TTen TNine Ace_count King_count Queen_count Jack_count Ten_count Nine_count Count_Trump Strength Non_Trump_Suit_Count Non_Trump_Turned_Down_Color_Ind Start_Order Dealer Strongest_Card Broad_Start_Order ind_picked_up p_picked_up opp_picked_up team_ind_tricks_won team_tricks_won alone_tricks_won predicted_call correct_call
Spades 0 1 0 0 1 0 0 0 0 1 1 1 0 2 30 3 0 1 0 12 First None None Ten 1 2 0 Pass Pass
Clubs 0 1 0 1 1 1 0 0 1 0 0 0 0 4 44 1 0 4 1 12 Last Ten None None 4 5 4 Order Trump Alone Order Trump
Spades 1 0 0 0 0 0 0 2 0 0 0 0 2 1 27 3 0 3 0 13 Middle None None Ten 2 2 0 Pass Pass
Diamonds 1 0 0 0 0 1 0 2 0 0 0 0 1 2 34 2 0 2 0 13 Middle None King None 2 3 3 Order Trump Order Trump
Hearts 1 1 0 0 1 0 0 0 0 0 1 1 0 3 39 2 0 1 0 13 First None None Ace 2 4 2 Order Trump Order Trump
Diamonds 0 1 0 0 1 0 0 0 1 0 1 1 0 2 31 2 0 4 1 12 Last Queen None None 1 3 1 Pass Pass


Exploratory Data Analysis

summary(euchre_turn_up)
##      Suit            Right_Bower       Left_Bower          TAce      
##  Length:100000      Min.   :0.0000   Min.   :0.0000   Min.   :0.000  
##  Class :character   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.000  
##  Mode  :character   Median :0.0000   Median :0.0000   Median :0.000  
##                     Mean   :0.2227   Mean   :0.2202   Mean   :0.224  
##                     3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.000  
##                     Max.   :1.0000   Max.   :1.0000   Max.   :1.000  
##      TKing            TQueen            TTen            TNine      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.000  
##  Mean   :0.2243   Mean   :0.2217   Mean   :0.2229   Mean   :0.222  
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.000  
##    Ace_count      King_count      Queen_count       Jack_count    
##  Min.   :0.00   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.00   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1.00   Median :1.0000   Median :1.0000   Median :0.0000  
##  Mean   :0.65   Mean   :0.6478   Mean   :0.6334   Mean   :0.4115  
##  3rd Qu.:1.00   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :3.00   Max.   :3.0000   Max.   :3.0000   Max.   :2.0000  
##    Ten_count        Nine_count      Count_Trump       Strength    
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.000   Min.   : 7.00  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:1.000   1st Qu.:23.00  
##  Median :0.0000   Median :0.0000   Median :1.000   Median :28.00  
##  Mean   :0.5866   Mean   :0.5129   Mean   :1.558   Mean   :28.17  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:2.000   3rd Qu.:33.00  
##  Max.   :3.0000   Max.   :3.0000   Max.   :5.000   Max.   :55.00  
##  Non_Trump_Suit_Count Non_Trump_Turned_Down_Color_Ind  Start_Order  
##  Min.   :0.000        Min.   :0                       Min.   :1.00  
##  1st Qu.:2.000        1st Qu.:0                       1st Qu.:1.75  
##  Median :2.000        Median :0                       Median :2.50  
##  Mean   :2.285        Mean   :0                       Mean   :2.50  
##  3rd Qu.:3.000        3rd Qu.:0                       3rd Qu.:3.25  
##  Max.   :3.000        Max.   :0                       Max.   :4.00  
##      Dealer     Strongest_Card  Broad_Start_Order  ind_picked_up     
##  Min.   :0.00   Min.   : 2.00   Length:100000      Length:100000     
##  1st Qu.:0.00   1st Qu.: 8.00   Class :character   Class :character  
##  Median :0.00   Median :11.00   Mode  :character   Mode  :character  
##  Mean   :0.25   Mean   :10.19                                        
##  3rd Qu.:0.25   3rd Qu.:12.00                                        
##  Max.   :1.00   Max.   :13.00                                        
##  p_picked_up        opp_picked_up      team_ind_tricks_won team_tricks_won
##  Length:100000      Length:100000      Min.   :0.00        Min.   :0.000  
##  Class :character   Class :character   1st Qu.:0.00        1st Qu.:1.000  
##  Mode  :character   Mode  :character   Median :1.00        Median :2.000  
##                                        Mean   :1.28        Mean   :2.488  
##                                        3rd Qu.:2.00        3rd Qu.:4.000  
##                                        Max.   :5.00        Max.   :5.000  
##  alone_tricks_won predicted_call     correct_call      
##  Min.   :0.000    Length:100000      Length:100000     
##  1st Qu.:0.000    Class :character   Class :character  
##  Median :0.000    Mode  :character   Mode  :character  
##  Mean   :1.084                                         
##  3rd Qu.:2.000                                         
##  Max.   :5.000

All the numerical values have the correct distributions associated. There can only be 0 or 1 of each trump card in a hand because only 1 suit can be trump. Also, the non-trump cards all have values that range from 0-3 with the exception of jacks. This is because there can be 3 non-trump suits. There can only be 2 non-trump jacks because of the Right Bower and the Left Bower, which means that 2 jack values are always trump.

There can only be 5 cards dealt to a hand which is why the “Count_Trump” values range from 0-5. In addition, there are 4 suits in a card game and one of them will be trump. Therefore, the maximum number of non-trump suits in a hand is 3. This value can be as low as 0 if a player only has trump cards.

All the other binary variables all have values that are either 0 or 1.

All this information concludes that this dataset is perfectly clean.


Outcome variable and predicted outcome variable

Here are the values that make up the target variable “correct_call” and the predicted target variable “predicted_call.”

Order Trump Alone - When a player has a hand with many cards that are the same suit as the turned up card, they can opt to “go alone.” If they win all 5 tricks, their team wins 4 points. If they win 3-4 tricks, their team gets 1 point. If they win less than 3 tricks, the other team wins 2 points. A player should “Order Trump Alone” when they believe they can win all the tricks without help from their partner.

Order Trump - When a player has a hand with a decent number of cards that are the same suit as the turned up card, they can opt to “order trump.” They play the round with their partner and can win 2 points if they win all 5 tricks, and 1 point if they win 3-4 tricks. If they win less than 3 tricks, the opposing team gets 2 points.

Pass - A player will “pass” if they simply don’t have a good hand that would win them a lot of tricks. They can also pass if they are starting first and have a better hand of a suit that wasn’t turned up. This is because they will have the opportunity to bid the suit they want if everyone passes on the turned up suit.


For simplicity, the target variables will get ordinal numerical imputations

Order Trump Alone - 2

Order Trump - 1

Pass - 0

euchre_turn_up$predicted_call = replace(euchre_turn_up$predicted_call, euchre_turn_up$predicted_call == "Pass", 0)
euchre_turn_up$predicted_call = replace(euchre_turn_up$predicted_call, euchre_turn_up$predicted_call == "Order Trump", 1)
euchre_turn_up$predicted_call = replace(euchre_turn_up$predicted_call, euchre_turn_up$predicted_call == "Order Trump Alone", 2)
euchre_turn_up$correct_call = replace(euchre_turn_up$correct_call, euchre_turn_up$correct_call == "Pass", 0)
euchre_turn_up$correct_call = replace(euchre_turn_up$correct_call, euchre_turn_up$correct_call == "Order Trump", 1)
euchre_turn_up$correct_call = replace(euchre_turn_up$correct_call, euchre_turn_up$correct_call == "Order Trump Alone", 2)
euchre_turn_up$ind_picked_up[euchre_turn_up$ind_picked_up == ''] = 'None'
euchre_turn_up$p_picked_up[euchre_turn_up$p_picked_up == ''] = 'None'
euchre_turn_up$opp_picked_up[euchre_turn_up$opp_picked_up == ''] = 'None'


View number of records that are associated with each output

table(euchre_turn_up$correct_call)
## 
##     0     1     2 
## 63405 32228  4367

The minority of records (~37%) are good enough to make a bid on.


View handmade predictions

table(euchre_turn_up$correct_call, euchre_turn_up$predicted_call, dnn=c('True', 'Pred'))
##     Pred
## True     0     1     2
##    0 61980  1266   159
##    1 24460  6099  1669
##    2   974  1053  2340

From this data, we can see that the vast majority of records are getting classified as zero. The handmade predictions, based on intuition, are not aggressive at bidding trump at all. It looks like there is a lot of room for improvement here.


Change data type of columns for plotting

euchre_turn_up[,names(euchre_turn_up)] = lapply(euchre_turn_up[,names(euchre_turn_up)], as.factor)
euchre_turn_up$Strength = as.integer(euchre_turn_up$Strength)


Convert character variables to factors

euchre_turn_up$ind_picked_up = factor(euchre_turn_up$ind_picked_up, levels = c('None', 'Nine', 'Ten', 'Queen', 'King', 'Ace', 'Jack'))
euchre_turn_up$p_picked_up = factor(euchre_turn_up$p_picked_up, levels = c('None', 'Nine', 'Ten', 'Queen', 'King', 'Ace', 'Jack'))
euchre_turn_up$opp_picked_up = factor(euchre_turn_up$opp_picked_up, levels = c('None', 'Nine', 'Ten', 'Queen', 'King', 'Ace', 'Jack'))
euchre_turn_up$Broad_Start_Order = factor(euchre_turn_up$Broad_Start_Order, levels = c('First', 'Middle', 'Last'))


Show relationships between different variables and the correct call

for(column in names(euchre_turn_up[,2:25])){
  target = 'correct_call'
  column_name = column
  target_name = target
  
  
  if(class(euchre_turn_up[,column])=='integer' | class(euchre_turn_up[,column])=='numeric'){
    column = euchre_turn_up[, column]
    target = euchre_turn_up[, target]
    df = data.frame(column, target)
    p = ggplot(df, aes(x = target, y = column)) + 
      geom_boxplot() +
      labs(x = target_name, y = column_name) +
      coord_flip()
    print(p)
  }
  else{
    column = euchre_turn_up[, column]
    target = euchre_turn_up[, target]
    df = data.frame(column, target)
    p = ggplot(df, aes(x = column, fill = target)) + 
      ggtitle(paste('Proportion of', column_name, 'in each class')) +
      geom_bar(position = position_fill(reverse=TRUE)) +
      labs(x = column_name, y =' proportion', fill = target_name) +
      coord_flip()

    print(p)
  }
}

# end plots

Right Bower - Better hands significantly tend to have the Right Bower more frequently than not.

Left Bower - Better hands tend to have the Left Bower more frequently than not.

TAce - Better hands tend to have the trump ace more frequently than not.

TKing - Better hands tend to have the trump king more frequently than not.

TQueen - Better hands tend to have the trump queen more frequently than not.

TTen - Better hands tend to have the trump ten more frequently than not.

TNine - Better hands tend to have the trump nine more frequently than not.

Ace_count - Better hands tend to have at least one non-trump ace.

King_count - Worse hands slightly tend to have more non-trump kings.

Queen_count - Worse hands tend to have more non-trump queens.

Jack_Count - Worse hands tend to have more non-trump jacks.

Ten_count - Worse hands significantly tend to have more non-trump tens.

Nine_count - Worse hands significantly tend to have more non-trump nines.

Count_Trump - Having more trump cards is significantly more frequent in better hands.

Strength - Hands that go alone tend to be stronger while hands that pass tend to have lower strength. This variable is the sum of the assigned strength values of all cards in hand. It is difficult to calculate in a real-life game, so it will get removed.

Non_Trump_Suit_Count - Better hands tend to have less non-trump suits in hand.

Non_Trump_Turned_Down_Ind - Since someone always has to pick up the turned up card in this simulation, this column has no meaning and will be removed.

Start_Order - Players that initially start last tend to be more successful. Starting 1st or 3rd tends to be the most unfavorable (probably because neither the player nor their partner is dealing for that round).

Dealer - Players that are the dealer for a round tend to be more successful.

Strongest_Card - Hands where the strongest card is 12 or 13 (Left or Right Bower present) seem to be indicative of positive results.

Broad_Start_Order - Hands that start last tend to be more successful while hands that start first tend to be the least successful.

ind_picked_up - The main player picking up the turned up card generally yields a more favorable outcome, especially if it’s a jack. There is a concern with multicollinearity with this variable as it double-counts the card picked up in hand.

p_picked_up - Not very telling.

opp_picked_up - An opponent picking up the turned up card generally yields a more unfavorable outcome, especially if it’s a jack.

Since there is a similar impact when picking up each non-jack card, every non-jack card will be grouped together and a new set of variables will get tested.


Assign broad picked up card values

euchre_turn_up = euchre_turn_up %>%
  mutate(broad_ind_pick_up = case_when(ind_picked_up == 'Jack' ~ 'Jack', 
                                       ind_picked_up == 'None' ~ 'None', 
                                       TRUE ~ 'Other'),
         broad_p_pick_up = case_when(p_picked_up == 'Jack' ~ 'Jack', 
                                       p_picked_up == 'None' ~ 'None', 
                                       TRUE ~ 'Other'),
         broad_opp_pick_up = case_when(opp_picked_up == 'Jack' ~ 'Jack', 
                                       opp_picked_up == 'None' ~ 'None', 
                                       TRUE ~ 'Other'))

euchre_turn_up$broad_ind_pick_up = factor(euchre_turn_up$broad_ind_pick_up, levels = c('None', 'Other', 'Jack'))
euchre_turn_up$broad_p_pick_up = factor(euchre_turn_up$broad_p_pick_up, levels = c('None', 'Other', 'Jack'))
euchre_turn_up$broad_opp_pick_up = factor(euchre_turn_up$broad_opp_pick_up, levels = c('None', 'Other', 'Jack'))


Convert columns to easier-to-process classes

euchre_turn_up[,names(euchre_turn_up)] = lapply(euchre_turn_up[,names(euchre_turn_up)], as.factor)
euchre_turn_up$Strength = as.integer(euchre_turn_up$Strength)


Show relationships between new broad picked up columns and the correct call to make

for(column in names(euchre_turn_up[,31:33])){
  target = 'correct_call'
  column_name = column
  target_name = target
  
  
  if(class(euchre_turn_up[,column])=='integer' | class(euchre_turn_up[,column])=='numeric'){
    column = euchre_turn_up[, column]
    target = euchre_turn_up[, target]
    df = data.frame(column, target)
    p = ggplot(df, aes(x = target, y = column)) + 
      geom_boxplot() +
      labs(x = target_name, y = column_name) +
      coord_flip()
    print(p)
  }
  else{
    column = euchre_turn_up[, column]
    target = euchre_turn_up[, target]
    df = data.frame(column, target)
    p = ggplot(df, aes(x = column, fill = target)) + 
      ggtitle(paste('Proportion of', column_name, 'in each class')) +
      geom_bar(position = position_fill(reverse=TRUE)) +
      labs(x = column_name, y =' proportion', fill = target_name) +
      coord_flip()

    print(p)
  }
}

# end plots

These visualizations are easier to follow. If the analyzed person or their partner picks up a jack, it yields better results for them. The opponents are also more likely to win a round if they pick up a jack or another card.

Because broad_ind_picked_up creates multicollinearity, it will get removed from the dataset.


Remove irrelevant columns

The following columns were removed: Suit, Strength, Non_Trump_Turned_Down_Color_Ind, Start_Order, ind_pick_up, p_pick_up, opp_pick_up, and broad_ind_picked_up

euchre_turn_up = euchre_turn_up[, -c(1, 16, 18:19, 23:25, 31)]


Create new dataframe that removes variables that are involved with the target variable (they create leakage)

Variables removed: team_ind_tricks_won, team_tricks_won, alone_tricks_won, predicted_call

Also, numerical variables are converted back to integers.

euchre_turn_up[,names(euchre_turn_up[,-c(18, 22:25)])] = lapply(euchre_turn_up[,names(euchre_turn_up[,-c(18, 22:25)])], as.character)
euchre_turn_up[,names(euchre_turn_up[,-c(18, 22:25)])] = lapply(euchre_turn_up[,names(euchre_turn_up[,-c(18, 22:25)])], as.integer)
euchre_turn_up1 = euchre_turn_up[, -c(19:22)]


Split euchre dataset into training and validation data to build models

80% of the dataset will be for training while 20% will be for validation.

set.seed(1)
sample_index = sample(nrow(euchre_turn_up1), nrow(euchre_turn_up1) * 0.80)
euchre_turn_up_train = euchre_turn_up1[sample_index, ]
euchre_turn_up_test = euchre_turn_up1[-sample_index, ]
euchre_turn_up_results = euchre_turn_up[, 19:23] # dataframe that only has result columns
results_train = euchre_turn_up_results[sample_index, ]
results_test = euchre_turn_up_results[-sample_index, ]
euchre_all_test_turn_up = euchre_turn_up[-sample_index, ]


Create functions that will be used to measure success of models

Here is what the functions do:

build_results - This predicts the correct call from a built model and creates a dataframe that stores the predictions. The other functions use the dataframe made from this.

find_loss_cost - This calculates the points that were failed to be earned when the model makes a call that is different than what should have been called. For each record, the potential number of points won (given the correct call was made) is calculated. The possible numbers in this field are 0 (because it is impossible to win any points when no bid should be made), 1, 2, and 4. Another field is added that calculates the points earned based off the model prediction. The numbers could be -2 (for hands that predicted 1 or 2 but were actually 0). They can also be 0 (predicted class is 0), 1, 2, and 4. A field gets made that calculates the absolute difference. The average number in this field is the loss cost value. There is also a feature included that records the average loss cost value at every misclassification type.

Get_Macro_AUC - This calculates the area under the curve for each of the 3 classes. The average is calculated from these 3 classes.

Macro_F1_Score_Compute - This calculates the F1 score for each of the 3 predictor classes. The average is calculated from these 3 classes.

Model_Accuracy - This measures the number of records that were correctly classified divided by the total number of records.

Get_Euchre_Call_Percent - This takes the number of records that are predicted as 1s and 2s but are actually 0s. This number gets divided by the total number of records predicted as 1s and 2s. The output is the percent of bids made that get euchred.

Get_Missed_Call_Percent - This takes the number of records that are predicted as 0s but are actually 1s and 2s. This number gets divided by the total number of records predicted as 0s. Essentially, this is the proportion of records that don’t get bid trump when they actually should have.

binary_F1Score - This calculates the F1 score for binary prediction classes.

build_lm1_metrics - This assesses results provided from a binary logistic regression model.

build_lm2_metrics - This assesses results provided from a second version of a binary logistic regression model.

Build_metrics_table - This uses a provided model and results table to create a new dataframe that measures the calculated metrics defined above.

Build_metrics_table_rslt_tbl - This uses a provided results table to create a new dataframe that measures the calculated metrics defined above (except Macro AUC).

build_results = function(results_df = results_test, model){
  if(model$call[[1]] == 'orm'){
    x = predict(model, newdata = euchre_turn_up_test, type = 'fitted.ind')
    results_df$lp_score = predict(model, newdata = euchre_turn_up_test, type = 'lp')
    results_df = results_df %>%
      mutate(adj_model_bid = case_when(lp_score<=cutoff_1~0, lp_score<=cutoff_2~1, TRUE~2))
  }
  else {
    x = predict(model, newdata = euchre_turn_up_test, type = 'prob')
  }
  x = as.data.frame(x)
  names(x)[1] = 'p_zero'
  names(x)[2] = 'p_one'
  names(x)[3] = 'p_two'
  results_df = cbind(results_df, x)
  results_df = results_df %>%
    mutate(best = pmax(p_zero, p_one, p_two),
           model_bid = case_when(p_zero == best~0, p_one == best~1, TRUE~2))
  assign(paste0('results_', deparse(substitute(model))), results_df, envir=.GlobalEnv)
}

find_loss_cost = function(df, test_column, return_avg_cost = TRUE, save_ind_costs = FALSE){
  df = df %>%
    mutate(potential_points_earned = case_when((team_tricks_won >= 3 & team_tricks_won < 5 & alone_tricks_won <5)~1, 
                                               (team_tricks_won == 5 & alone_tricks_won != 5)~2,
                                               (alone_tricks_won == 5)~4, TRUE~0),
           points_won = case_when((get(test_column) == 1 & team_tricks_won <3)~-2,
                                  (get(test_column) == 2 & alone_tricks_won <3)~-2,
                                  (get(test_column) == 1 & team_tricks_won >= 3 & team_tricks_won < 5)~1,
                                  (get(test_column) == 2 & alone_tricks_won >= 3 & alone_tricks_won < 5)~1,
                                  (get(test_column) == 1 & team_tricks_won == 5)~2,
                                  (get(test_column) == 2 & alone_tricks_won == 5)~4, TRUE~0),
           loss_cost = abs(points_won - potential_points_earned))
  if(save_ind_costs == TRUE){
    p0_a1 = mean(df[(df[, test_column] == 0 & df$correct_call == 1),]$loss_cost)
    p0_a2 = mean(df[(df[, test_column] == 0 & df$correct_call == 2),]$loss_cost)
    p1_a0 = mean(df[(df[, test_column] == 1 & df$correct_call == 0),]$loss_cost)
    p1_a2 = mean(df[(df[, test_column] == 1 & df$correct_call == 2),]$loss_cost)
    p2_a0 = mean(df[(df[, test_column] == 2 & df$correct_call == 0),]$loss_cost)
    p2_a1 = mean(df[(df[, test_column] == 2 & df$correct_call == 1),]$loss_cost)
    assign('p0_a1', p0_a1, envir = .GlobalEnv)
    assign('p0_a2', p0_a2, envir = .GlobalEnv)
    assign('p1_a0', p1_a0, envir = .GlobalEnv)
    assign('p1_a2', p1_a2, envir = .GlobalEnv)
    assign('p2_a0', p2_a0, envir = .GlobalEnv)
    assign('p2_a1', p2_a1, envir = .GlobalEnv)
  }
  
  if(return_avg_cost == TRUE){
    return(mean(df$loss_cost))
  }
}

Get_Macro_AUC = function(df){
  p_pass = df$p_zero
  p_order_trump = df$p_one
  p_go_alone = df$p_two
  bid.dummy = dummy_cols(df,
                           select_columns = "correct_call",
                           remove_first_dummy = FALSE,
                           remove_selected_columns = TRUE)
  pred0 = prediction(p_pass, bid.dummy$correct_call_0)
  perf0 = performance(pred0, "tpr", "fpr")
  #plot(perf1, colorize=F, print.cutoffs.at=seq(.1, by=.1))
  AUC_0 = slot(performance(pred0, "auc"), "y.values")[[1]]
  
  pred1 = prediction(p_order_trump, bid.dummy$correct_call_1)
  perf1 = performance(pred1, "tpr", "fpr")
  #plot(perf1, colorize=F, print.cutoffs.at=seq(.1, by=.1))
  AUC_1 = slot(performance(pred1, "auc"), "y.values")[[1]]
  
  pred2 = prediction(p_go_alone, bid.dummy$correct_call_2)
  perf2 = performance(pred2, "tpr", "fpr")
  #plot(perf1, colorize=F, print.cutoffs.at=seq(.1, by=.1))
  AUC_2 = slot(performance(pred2, "auc"), "y.values")[[1]]
  
  return((AUC_0+AUC_1+AUC_2)/3)
}

Macro_F1_Score_Compute = function(correct_vector, predicted_vector){
  matrix = table(correct_vector, predicted_vector)
  F1_0 = matrix[1]/(matrix[1]+matrix[2]+matrix[3]+matrix[4]+matrix[7])
  F1_1 = matrix[5]/(matrix[2]+matrix[5]+matrix[8]+matrix[4]+matrix[6])
  F1_2 = matrix[9]/(matrix[3]+matrix[6]+matrix[9]+matrix[7]+matrix[8])
  
  return((F1_0+F1_1+F1_2)/3)
}

Model_Accuracy = function(correct_vector, predicted_vector){
  matrix = table(correct_vector, predicted_vector)
  num_correct = matrix[1]+matrix[5]+matrix[9]
  total = matrix[1]+matrix[2]+matrix[3]+matrix[4]+matrix[5]+matrix[6]+matrix[7]+matrix[8]+matrix[9]
  return(num_correct/total)
}
  
Get_Euchre_Call_Percent = function(correct_vector, predicted_vector){
  matrix = table(correct_vector, predicted_vector)
  num_euchred = matrix[4]+matrix[7]
  total = matrix[4]+matrix[5]+matrix[6]+matrix[7]+matrix[8]+matrix[9]
  return(num_euchred/total)
}

Get_Missed_Call_Percent = function(correct_vector, predicted_vector){
  matrix = table(correct_vector, predicted_vector)
  missed_calls = matrix[2] + matrix[3]
  total = matrix[1] + matrix[2] + matrix[3]
  return(missed_calls/total)
}

binary_F1Score = function(matrix){
  precision = matrix[4]/(matrix[4]+matrix[3])
  recall = matrix[4]/(matrix[4]+matrix[2])
  score = round((2*precision*recall)/(precision+recall), 4)
}

build_lm1_metrics = function(model){
  logr1_results = euchre_all_test_turn_up
  logr1_results$correct_call1 = ifelse(logr1_results$correct_call == 0, 0, 1)
  logr1_results$model_prediction = ifelse(predict(model, newdata = logr1_results, type =
                                                         'response') <.5, 0, 1)
  logr1_results$model_adj_prediction = ifelse(predict(model, newdata = logr1_results,
                                                           type = 'response') < cutoff_1_lm, 0, 1)
  matrix = table(logr1_results$correct_call1, logr1_results$model_prediction, dnn= c("True", "Pred"))

  matrix_adj = table(logr1_results$correct_call1, logr1_results$model_adj_prediction, dnn = c("True", "Pred"))
  
  Name = deparse(substitute(model))
  Accuracy = round((matrix[1]+matrix[4])/sum(matrix), 4)
  F1_Score = binary_F1Score(matrix)
  Accuracy_adj = round((matrix_adj[1]+matrix_adj[4])/sum(matrix_adj), 4)
  F1_Score_adj = binary_F1Score(matrix_adj)
  BIC = BIC(model)
  AIC = AIC(model)
  
  model_df = data.frame(Name, Accuracy, F1_Score, Accuracy_adj, F1_Score_adj, BIC, AIC)
  assign('model_df', model_df, envir=.GlobalEnv)
  if(exists('lm1_models')){
    lm1_models = rbind(lm1_models, model_df)
  } else {
    lm1_models = model_df
  }
  assign('lm1_models', lm1_models, envir=.GlobalEnv)
  rm(model_df)
}

build_lm2_metrics = function(model){
  logr2_results = euchre_all_test_turn_up
  logr2_results$correct_call2 = ifelse(logr2_results$correct_call == 0, 0, 1)
  logr2_results$model_prediction = ifelse(predict(model, newdata = logr2_results, type =
                                                         'response') <.5, 0, 1)
  logr2_results$model_adj_prediction = ifelse(predict(model, newdata = logr2_results,
                                                           type = 'response') < cutoff_2_lm, 0, 1)
  matrix = table(logr2_results$correct_call2, logr2_results$model_prediction, dnn= c("True", "Pred"))

  matrix_adj = table(logr2_results$correct_call2, logr2_results$model_adj_prediction, dnn = c("True", "Pred"))
  
  Name = deparse(substitute(model))
  Accuracy = round((matrix[1]+matrix[4])/sum(matrix), 4)
  F1_Score = binary_F1Score(matrix)
  Accuracy_adj = round((matrix_adj[1]+matrix_adj[4])/sum(matrix_adj), 4)
  F1_Score_adj = binary_F1Score(matrix_adj)
  BIC = BIC(model)
  AIC = AIC(model)
  
  model_df = data.frame(Name, Accuracy, F1_Score, Accuracy_adj, F1_Score_adj, BIC, AIC)
  assign('model_df', model_df, envir=.GlobalEnv)
  if(exists('lm2_models')){
    lm2_models = rbind(lm2_models, model_df)
  } else {
    lm2_models = model_df
  }
  assign('lm2_models', lm2_models, envir=.GlobalEnv)
  rm(model_df)
}

Build_metrics_table = function(model){
  model_name = deparse(substitute(model))
  results_table = get(paste0('results_', model_name))
  df = data.frame(model_name = model_name)
  
  df$loss_cost = find_loss_cost(results_table, 'model_bid')
  df$macro_auc = Get_Macro_AUC(results_table)
  df$macro_f1 = Macro_F1_Score_Compute(results_table$correct_call,
                                       results_table$model_bid)
  df$accuracy = Model_Accuracy(results_table$correct_call,
                               results_table$model_bid)
  df$p_euchred_calls = Get_Euchre_Call_Percent(results_table$correct_call,
                                               results_table$model_bid)
  df$p_missed_calls = Get_Missed_Call_Percent(results_table$correct_call,
                                              results_table$model_bid)
    
  if(model$call[[1]] == 'orm'){
    df$adj_loss_cost = find_loss_cost(results_table, 'adj_model_bid')
    df$adj_macro_f1 = Macro_F1_Score_Compute(results_table$correct_call,
                                             results_table$adj_model_bid)
    df$adj_accuracy = Model_Accuracy(results_table$correct_call,
                                     results_table$adj_model_bid)
    df$adj_p_euchred_calls = Get_Euchre_Call_Percent(results_table$correct_call,
                                                     results_table$adj_model_bid)
    df$adj_p_missed_calls = Get_Missed_Call_Percent(results_table$correct_call,
                                                    results_table$adj_model_bid)
    df$BIC = BIC(model)
    df$AIC = AIC(model)
    df$num_predictors = length(model$assign)
  }
  if(model$call[[1]] == 'multinom'){
    df$BIC = BIC(model)
    df$AIC = AIC(model)
  }
  assign(paste0(deparse(substitute(model)), '_stats'), df, envir=.GlobalEnv)
}

Build_metrics_table_rslt_tbl = function(results_table, table_name, result_variable_name){
  df = data.frame(model_name = table_name)
  
  df$loss_cost = find_loss_cost(results_table, result_variable_name)
  df$macro_f1 = Macro_F1_Score_Compute(results_table$correct_call,
                                       results_table[,result_variable_name])
  df$accuracy = Model_Accuracy(results_table$correct_call,
                               results_table[,result_variable_name])
  df$p_euchred_calls = Get_Euchre_Call_Percent(results_table$correct_call,
                                               results_table[,result_variable_name])
  df$p_missed_calls = Get_Missed_Call_Percent(results_table$correct_call,
                                              results_table[,result_variable_name])
  assign(paste0(table_name, '_stats'), df, envir=.GlobalEnv)
}


Create and determine the best models

There are 6 of many algorithms that have been proven to work well with multiclass outputs:

ordinal regression - Through the rms library, the orm (ordinal regression model) function is great for multiclass predictions when the target variable is ordinal and the predictor variables are nominal. This is the case here as the best hands are classified as 2s while the worst hands are classified as 0s, in an ordinal fashion. The orm function outputs a linear predictor value that can be referred as a log odds value. The function can also make predictions based on the probability that a record could be in a class. For this analysis, both prediction methods will be explored. This model is set up so each variable measured has one predictor coefficient and 2 intercepts which measure two predicted classes.

multinomial regression - This is a form of logistic regression used when the outcome variable is not necessarily ordinal. It can be used through the multinom function though the nnet library. The algorithm creates separate coefficient values for each predicted class, except for a class which is set as the default. In this case, there will be two sets of coefficients used to create two logit values for the classes that will be converted to p-values. The p-value of the class not calculated is 1 - the sum of the p-values of the other two classes. The class with the highest p-value gets predicted.

classification tree - This is the most interpretative algorithm that can be best used when playing Euchre in real-time. It can be used through the rpart function from the rpart library. The algorithm predicts what call to make based on a few conditions and breaks up the tree into different leafs. The classification decision is based on which classifier has the most training records in a specific leaf.

random forest - This is similar to classification trees, but this algorithm trains on many trees and the result gets averaged together. It can be used though the randomForest function from the randomForest library. Although useless to interpret in real-time, the results of this algorithm can measure variable importance. This is useful to determine which variables directly make a hand good enough to be classified as a 1 or 2.

k nearest neighbors - This algorithm takes a defined number of observations that are most similar to a to-be-predicted observation and uses those similar observations to predict the outcome class of the new observation. The highest number of a specific outcome class found in the defined “most similar” number of observations will be assigned to the new observation. Tiebreakers are randomly done. This is done using the fnn library.

discriminant analysis - This method finds a linear combination of variables that will then be used to separate all the different outcome classes, similar to clustering. An observed record will get classified based on which group it fits closest to. This is done using the mda library.


In addition, logistic regression (used to predict a binary outcome) will be used twice and together to predict the 3 different classes.

two-step logistic regression - This method will use a logistic regression algorithm to classify the dataset as 0s or 1s (pass or pick it up). From there, the variables classified as 1s will go through a separate logistic regression algorithm to classify the records as 1s or 2s (pick it up or pick it up & go alone). The 2 algorithms chosen will be be assessed on how well they perform on their own. Once they are chosen, they will be integrated together to make multiclass predictions. This is done using the stats library to create multiple GLMs, using the binomial family.


Ordinal Regression

Using the orm function from rms, results can be predicted in a few ways. They can be based on which class has the highest p-value. They can also be based on different cutoff values that get manually set for the linear predictor scores. This analysis will measure if cutoffs based on probability are more effective than manually setting cutoff scores.


Approximate the 2 cutoff values

These created cutoff values will be the linear predictor logit scores that will split the prediction classes. The goal is to fit the data so the estimated proportion of data that falls into each class matches the proportion of each class in the training data. This method is fine to use because the high volume of training data observations are similar to any new data the model will analyze.

To approximate the logit cutoff values, the entire training dataset was used. 80% of the data was sampled and the proportion of records that were of class values 0 and 1 were derived for that sampled dataset. The first cutoff value was derived by taking the quantile of the proportion of records classified as 0 for the linear predictor scores. The second cutoff value was derived by taking the quantile of the proportion of records classified as 0 or 1 for the linear predictor scores.

For instance, if 30% of the random data was class 0 and 60% of the random data was class 1, the estimated cutoffs would be the lp scores at the 30th percentile and the 90th percentile.

This process gets repeated 100 times so 100 values get assigned for each cutoff score iteration. The final cutoff scores are the averages of each iteration.

These values will be applied to all the orm models created because the lp distributions will be similar and this is the most convenient way of making cutoffs.


Create cutoff estimates that best fit the data

cutoff_1_list = c()
cutoff_2_list = c()
for(i in 1:100){
  sample_index = sample(nrow(euchre_turn_up1), nrow(euchre_turn_up1) * 0.80)
  euchre_turn_up_train_sample = euchre_turn_up1[sample_index, ]
  full.orm = orm(correct_call ~ ., data = euchre_turn_up_train_sample)
  pass_proportion = nrow(euchre_turn_up_train_sample[euchre_turn_up_train_sample$correct_call==0,])/nrow(euchre_turn_up_train_sample)
  order_trump_proportion = nrow(euchre_turn_up_train_sample[euchre_turn_up_train_sample$correct_call==1,])/nrow(euchre_turn_up_train_sample)
  y = predict(full.orm)
  sample_cutoff_1 = quantile(y, pass_proportion)
  sample_cutoff_2 = quantile(y, pass_proportion+order_trump_proportion)
  cutoff_1_list = append(cutoff_1_list, sample_cutoff_1)
  cutoff_2_list = append(cutoff_2_list, sample_cutoff_2)
}
cutoff_1 = mean(cutoff_1_list)
cutoff_2 = mean(cutoff_2_list)


Ordinal Regression with all predictors

(full.orm = orm(correct_call ~ ., data = euchre_turn_up_train))
## Logistic (Proportional Odds) Ordinal Regression Model
## 
## orm(formula = correct_call ~ ., data = euchre_turn_up_train)
## 
##                         Model Likelihood               Discrimination    Rank Discrim.    
##                               Ratio Test                      Indexes          Indexes    
## Obs         80000    LR chi2    49161.54    R2                  0.578    rho     0.663    
##  0          50718    d.f.             23    R2(23,80000)        0.459                     
##  1          25793    Pr(> chi2)  <0.0001    R2(23,56927.4)      0.578                     
##  2           3489    Score chi2 44323.33    |Pr(Y>=median)-0.5| 0.324                     
## Distinct Y      3    Pr(> chi2)  <0.0001                                                  
## Median Y        1                                                                         
## max |deriv| 2e-08                                                                         
## 
##                          Coef    S.E.         Wald Z Pr(>|Z|)
## y>=1                      0.7826   48119.0991   0.00 1.0000  
## y>=2                     -3.6082   48119.0991   0.00 0.9999  
## Right_Bower               0.3406   23674.3938   0.00 1.0000  
## Left_Bower               -0.3306   23674.3938   0.00 1.0000  
## TAce                     -0.9273   23674.3938   0.00 1.0000  
## TKing                    -1.2541   23674.3938   0.00 1.0000  
## TQueen                   -1.4377   23674.3938   0.00 1.0000  
## TTen                     -1.5749   23674.3938   0.00 0.9999  
## TNine                    -1.6018   23674.3938   0.00 0.9999  
## Ace_count                -1.4481   23674.3938   0.00 1.0000  
## King_count               -2.5292   23674.3938   0.00 0.9999  
## Queen_count              -2.7266   23674.3938   0.00 0.9999  
## Jack_count               -2.8028   23674.3938   0.00 0.9999  
## Ten_count                -2.8034   23674.3938   0.00 0.9999  
## Nine_count               -2.8360   23674.3938   0.00 0.9999  
## Count_Trump               0.3763       0.0000    Inf <0.0001 
## Non_Trump_Suit_Count     -0.3505       0.0180 -19.49 <0.0001 
## Dealer                    9.4671 1483070.5563   0.00 1.0000  
## Strongest_Card           -0.1123       0.0097 -11.55 <0.0001 
## Broad_Start_Order=Middle  0.0059       0.0299   0.20 0.8438  
## Broad_Start_Order=Last    0.7109 1485914.8915   0.00 1.0000  
## broad_p_pick_up=Other    10.4625  116239.6818   0.00 0.9999  
## broad_p_pick_up=Jack     10.8244  116239.6818   0.00 0.9999  
## broad_opp_pick_up=Other   9.2504  116239.6818   0.00 0.9999  
## broad_opp_pick_up=Jack    8.6882  116239.6818   0.00 0.9999

It is interesting to see that the coefficients for “Left_Bower” and “Strongest_Card” are negative. It is easily assumed that having a higher “strongest card” and Left Bower would only be beneficial (as seen from the visualizations). Since all the predictors are present, some coefficient values could be inaccurate. In order to fix this, let’s build orm models with less predictors.


Ordinal Regression with top performing individual predictors

The first reduced-variable orm model will be built based on the top variables that perform the best individually, based on chi-squared score. All of the well-performing variables will be included in this new model.

top_ind_predictors = data.frame()
for(i in 1:21){
  if(i == 19){
    next # ignoring result column
  }
  df = euchre_turn_up_train[,c(i,19)]
  (one.orm = orm(correct_call ~ ., data = df))
  Score_chi2 = one.orm$stats[8]
  var_name = colnames(df[1])
  table = data.frame(var_name, Score_chi2)
  top_ind_predictors = rbind(top_ind_predictors, table)
}
rm(table)

top_ind_predictors = top_ind_predictors %>%
  arrange(desc(Score_chi2))

top_ind_predictors %>%
  kbl() %>%
  kable_styling()
var_name Score_chi2
Score13 Count_Trump 29055.03458
Score16 Strongest_Card 15444.89631
Score14 Non_Trump_Suit_Count 11647.60653
Score Right_Bower 10263.71938
Score17 Broad_Start_Order 9721.72836
Score15 Dealer 9263.51219
Score19 broad_opp_pick_up 8145.14729
Score12 Nine_count 5546.20975
Score1 Left_Bower 4399.60442
Score11 Ten_count 4362.15285
Score2 TAce 3244.10786
Score9 Queen_count 2586.04151
Score10 Jack_count 2310.25255
Score3 TKing 2165.48415
Score4 TQueen 1629.28222
Score5 TTen 1407.38608
Score6 TNine 1327.22427
Score8 King_count 1290.30396
Score7 Ace_count 815.29724
Score18 broad_p_pick_up 58.28208

The top 10 predictors (Score_chi2 > 4000) will be used to build this model since there is a good cutoff between the chi-squared score of the 10th best predictor and the 11th best predictor.

best_predictors = top_ind_predictors[1:10,]$var_name
df = euchre_turn_up_train %>%
  select(matches(best_predictors), 'correct_call')
(top.pred.orm = orm(correct_call~., data = df))
## Logistic (Proportional Odds) Ordinal Regression Model
## 
## orm(formula = correct_call ~ ., data = df)
## 
##                         Model Likelihood               Discrimination    Rank Discrim.    
##                               Ratio Test                      Indexes          Indexes    
## Obs         80000    LR chi2    41889.45    R2                  0.513    rho     0.623    
##  0          50718    d.f.             12    R2(12,80000)        0.408                     
##  1          25793    Pr(> chi2)  <0.0001    R2(12,56927.4)      0.521                     
##  2           3489    Score chi2 38130.63    |Pr(Y>=median)-0.5| 0.304                     
## Distinct Y      3    Pr(> chi2)  <0.0001                                                  
## Median Y        1                                                                         
## max |deriv| 0.002                                                                         
## 
##                          Coef    S.E.   Wald Z Pr(>|Z|)
## y>=1                     -1.1848 0.0947 -12.52 <0.0001 
## y>=2                     -5.1219 0.0949 -53.94 <0.0001 
## Count_Trump               1.1761 0.0156  75.32 <0.0001 
## Strongest_Card           -0.0073 0.0082  -0.88 0.3767  
## Non_Trump_Suit_Count     -0.2878 0.0172 -16.76 <0.0001 
## Right_Bower               1.2523 0.0314  39.91 <0.0001 
## Broad_Start_Order=Middle  0.0102 0.0287   0.35 0.7229  
## Broad_Start_Order=Last   -0.2743 0.0384  -7.14 <0.0001 
## Dealer                    0.0000 0.0000    Inf <0.0001 
## broad_opp_pick_up=Other  -1.1539 0.0273 -42.20 <0.0001 
## broad_opp_pick_up=Jack   -1.6909 0.0468 -36.14 <0.0001 
## Nine_count               -0.4884 0.0171 -28.61 <0.0001 
## Left_Bower                0.7492 0.0256  29.31 <0.0001 
## Ten_count                -0.4681 0.0153 -30.65 <0.0001

These coefficients generally look a lot better although it is still surprising to see the Strongest_Card variable have a negative coefficient.


BIC stepwise regression

It would also be interesting to look at the orm models that are optimized for BIC scoring because BIC measures model performance but also penalizes models for being too complex. Forward steps (where a new variable is added per iteration to improve BIC the most) and backward steps (where a variable is removed per iteration to improve BIC) will be tested to create two additional orm models.


Create forward regression model

tot_pred = top_ind_predictors[,1]
not_used_pred_list = (top_ind_predictors[,1])
used_pred_list = top_ind_predictors[0,1]
stop_ind = 0

for(i in 1:(nrow(top_ind_predictors))){
  for(var in not_used_pred_list){
    df = euchre_turn_up_train %>%
      select(matches(used_pred_list), matches(var), 'correct_call')
    fwd_orm = orm(correct_call~., data = df)
    
    if(!exists('use_var')){
      use_var = var
    }
    if(!exists('Best.fwd.orm')){
      Best.fwd.orm = fwd_orm
    }
    if(!exists('Best.fwd.orm.iter')){
      Best.fwd.orm.iter = fwd_orm
    }
    if(BIC(fwd_orm)<BIC(Best.fwd.orm.iter)){
      Best.fwd.orm.iter = fwd_orm
      use_var = var
    }
  }
  used_pred_list = c(used_pred_list, use_var)
  not_used_pred_list = tot_pred[!tot_pred %in% used_pred_list]
  if(BIC(Best.fwd.orm.iter)<BIC(Best.fwd.orm)){
    Best.fwd.orm = Best.fwd.orm.iter
    stop_ind = 0
  } else{
    stop_ind = stop_ind + 1
  }
  rm(Best.fwd.orm.iter)
  rm(use_var)
  rm(df)
  if(stop_ind == 3){
    break
  }
}
rm(var)
rm(fwd_orm)
rm(tot_pred)
rm(not_used_pred_list)
rm(used_pred_list)
rm(stop_ind)
Best.fwd.orm
## Logistic (Proportional Odds) Ordinal Regression Model
## 
## orm(formula = correct_call ~ ., data = df)
## 
##                         Model Likelihood               Discrimination    Rank Discrim.    
##                               Ratio Test                      Indexes          Indexes    
## Obs         80000    LR chi2    49157.80    R2                  0.578    rho     0.663    
##  0          50718    d.f.             15    R2(15,80000)        0.459                     
##  1          25793    Pr(> chi2)  <0.0001    R2(15,56927.4)      0.578                     
##  2           3489    Score chi2 44303.39    |Pr(Y>=median)-0.5| 0.324                     
## Distinct Y      3    Pr(> chi2)  <0.0001                                                  
## Median Y        1                                                                         
## max |deriv| 2e-08                                                                         
## 
##                         Coef    S.E.   Wald Z Pr(>|Z|)
## y>=1                    -3.1061 0.1078 -28.82 <0.0001 
## y>=2                    -7.4973 0.1090 -68.77 <0.0001 
## Count_Trump              1.6010 0.0205  78.21 <0.0001 
## Ace_count                1.3652 0.0165  82.76 <0.0001 
## broad_opp_pick_up=Other -0.9322 0.0246 -37.82 <0.0001 
## broad_opp_pick_up=Jack  -1.4941 0.0462 -32.31 <0.0001 
## Right_Bower              1.9270 0.0411  46.84 <0.0001 
## Left_Bower               1.2563 0.0331  37.99 <0.0001 
## Non_Trump_Suit_Count    -0.3508 0.0180 -19.51 <0.0001 
## TAce                     0.6598 0.0281  23.48 <0.0001 
## King_count               0.2842 0.0158  17.94 <0.0001 
## broad_p_pick_up=Other    0.2821 0.0274  10.29 <0.0001 
## broad_p_pick_up=Jack     0.6446 0.0467  13.80 <0.0001 
## TKing                    0.3338 0.0260  12.83 <0.0001 
## Strongest_Card          -0.1116 0.0097 -11.50 <0.0001 
## TQueen                   0.1502 0.0254   5.90 <0.0001 
## Queen_count              0.0869 0.0162   5.37 <0.0001

It looks like 13 of 20 predictors were used to build this model. The coefficients look pretty good for the most part.


Create backward regression model

tot_pred = top_ind_predictors[,1]
not_used_pred_list = top_ind_predictors[0,1]
used_pred_list = tot_pred
stop_ind = 0

for(i in 1:nrow(top_ind_predictors)-2){
  for(var in used_pred_list){
    df = euchre_turn_up_train %>%
      select(matches(used_pred_list), 'correct_call') %>%
      select(-matches(var))
    bwd_orm = orm(correct_call~., data = df)
    if(!exists('use_var')){
      use_var = var
    }
    if(!exists('Best.bwd.orm')){
      Best.bwd.orm = bwd_orm
    }
    if(!exists('Best.bwd.orm.iter')){
      Best.bwd.orm.iter = bwd_orm
    }
    if(BIC(bwd_orm)<BIC(Best.bwd.orm.iter)){
      Best.bwd.orm.iter = bwd_orm
      use_var = var
    }
  }
  not_used_pred_list = c(not_used_pred_list, use_var)
  used_pred_list = tot_pred[!tot_pred %in% not_used_pred_list]
  if(BIC(Best.bwd.orm.iter)<=BIC(Best.bwd.orm)){
    Best.bwd.orm = Best.bwd.orm.iter
    stop_ind = 0
  } else {
    stop_ind = stop_ind + 1
  }
  rm(Best.bwd.orm.iter)
  rm(use_var)
  if(stop_ind == 3){
    break
    }
}
rm(var)
rm(df)
rm(bwd_orm)
rm(tot_pred)
rm(not_used_pred_list)
rm(used_pred_list)
rm(stop_ind)
Best.bwd.orm
## Logistic (Proportional Odds) Ordinal Regression Model
## 
## orm(formula = correct_call ~ ., data = df)
## 
##                         Model Likelihood               Discrimination    Rank Discrim.    
##                               Ratio Test                      Indexes          Indexes    
## Obs         80000    LR chi2    49161.50    R2                  0.578    rho     0.663    
##  0          50718    d.f.             18    R2(18,80000)        0.459                     
##  1          25793    Pr(> chi2)  <0.0001    R2(18,56927.4)      0.578                     
##  2           3489    Score chi2 44323.33    |Pr(Y>=median)-0.5| 0.324                     
## Distinct Y      3    Pr(> chi2)  <0.0001                                                  
## Median Y        1                                                                         
## max |deriv| 2e-08                                                                         
## 
##                         Coef    S.E.   Wald Z Pr(>|Z|)
## y>=1                    14.5453 0.2776  52.40 <0.0001 
## y>=2                    10.1545 0.2676  37.95 <0.0001 
## Strongest_Card          -0.1123 0.0097 -11.55 <0.0001 
## Non_Trump_Suit_Count    -0.3505 0.0180 -19.48 <0.0001 
## broad_opp_pick_up=Other -0.9248 0.0251 -36.87 <0.0001 
## broad_opp_pick_up=Jack  -1.4869 0.0465 -31.99 <0.0001 
## Nine_count              -3.5529 0.0463 -76.67 <0.0001 
## Left_Bower              -0.6713 0.0299 -22.47 <0.0001 
## Ten_count               -3.5203 0.0456 -77.19 <0.0001 
## TAce                    -1.2680 0.0343 -36.96 <0.0001 
## Queen_count             -3.4435 0.0452 -76.25 <0.0001 
## Jack_count              -3.5197 0.0463 -75.97 <0.0001 
## TKing                   -1.5947 0.0382 -41.73 <0.0001 
## TQueen                  -1.7784 0.0410 -43.37 <0.0001 
## TTen                    -1.9155 0.0430 -44.50 <0.0001 
## TNine                   -1.9424 0.0443 -43.85 <0.0001 
## King_count              -3.2461 0.0449 -72.37 <0.0001 
## Ace_count               -2.1650 0.0432 -50.07 <0.0001 
## broad_p_pick_up=Other    0.2903 0.0279  10.41 <0.0001 
## broad_p_pick_up=Jack     0.6523 0.0470  13.89 <0.0001

It looks like a presumably important variable (“Count_Trump”) was removed. It is also surprising to see that “Ace_count” got such a strong negative coefficient. 16 of 20 predictors were used.


Assess performance on all the created orm models

The best performing model will be selected as the represented orm model.

model_name loss_cost macro_auc macro_f1 accuracy p_euchred_calls p_missed_calls adj_loss_cost adj_macro_f1 adj_accuracy adj_p_euchred_calls adj_p_missed_calls BIC AIC num_predictors
full.orm 0.4282 0.8994877 0.5250265 0.78330 0.2274303 0.1636364 0.42515 0.5544073 0.78115 0.2522942 0.1459957 77599.11 77366.86 20
Best.fwd.orm 0.4279 0.8994423 0.5260054 0.78335 0.2280385 0.1633463 0.42430 0.5537159 0.78110 0.2518823 0.1455691 77512.52 77354.60 13
Best.bwd.orm 0.4285 0.8994904 0.5249015 0.78315 0.2277795 0.1636731 0.42515 0.5544073 0.78115 0.2522942 0.1459957 77542.70 77356.90 16
top.pred.orm 0.4473 0.8763030 0.4879293 0.76395 0.2533758 0.1783877 0.44725 0.5199093 0.76690 0.2735823 0.1613939 84747.00 84616.95 10

Since the forward ordinal regression model performs well, uses less variables, and the coefficients make better sense, it will be the orm model to use. In addition, the customized cutoff classification method will be used. Although it is not as accurate, it has a better F1 and loss cost score.

euchre_turn_up.orm = Best.fwd.orm


Multinomial Regression

Create full multinomial model

(full.multinom = multinom(correct_call ~ ., data = euchre_turn_up_train))
## # weights:  75 (48 variable)
## initial  value 87888.983093 
## iter  10 value 44016.006179
## iter  20 value 43701.130424
## iter  30 value 41396.975409
## iter  40 value 40058.898295
## iter  50 value 38092.853206
## iter  60 value 38062.884601
## iter  70 value 38060.915583
## iter  80 value 38060.849066
## final  value 38060.848010 
## converged
## Call:
## multinom(formula = correct_call ~ ., data = euchre_turn_up_train)
## 
## Coefficients:
##   (Intercept) Right_Bower Left_Bower      TAce       TKing     TQueen
## 1  -0.2432768    1.475023  0.7496752 0.2433769 -0.08458379 -0.2647023
## 2  -2.0670061    2.183275  1.4703311 0.2100716 -0.29180477 -0.6188985
##         TTen      TNine Ace_count King_count Queen_count Jack_count  Ten_count
## 1 -0.4182088 -0.4614212 0.5726357  -0.439412   -0.587868 -0.6409888 -0.6525986
## 2 -0.7885171 -0.7677397 0.5121475  -1.703349   -2.351846 -2.6862310 -2.7312901
##   Nine_count Count_Trump Non_Trump_Suit_Count    Dealer Strongest_Card
## 1 -0.7073115    1.239159           -0.1821937 0.1022599    -0.09797345
## 2 -2.7711791    1.396718           -1.4882292 1.2593147     0.06065642
##   Broad_Start_OrderMiddle Broad_Start_OrderLast broad_p_pick_upOther
## 1              0.03662162             0.1022599            0.5642888
## 2             -0.34952341             1.2593147            3.1129580
##   broad_p_pick_upJack broad_opp_pick_upOther broad_opp_pick_upJack
## 1           0.8771725             -0.6675251             -1.119473
## 2           4.0376074              1.1267887            -11.603675
## 
## Residual Deviance: 76121.7 
## AIC: 76201.7

This is more difficult to interpret compared to the orm model. However, the coefficients for the variables can be interpreted the same. Unlike the orm model, there are separate coefficients for each predicted class, except one. These scores are converted to p-values for each class that add to 1.


It is interesting that most of the positive and negative coefficients for the “2” class generally go in the same direction but have a larger magnitude compared to the “1” class for most variables. It can be seen by the “broad_opp_pick_upJack” coefficients that if an opponent picks up a jack, a class “2” will receive a very low score compared to class “1”. This makes sense because nobody should ever go alone if it is known that the opposing team has the Right Bower. It’s great that the model catches that.


All these coefficients seem to make reasonable sense.


Regression with top 10 individual orm predictors

For curiosity, let’s look at how well a multinomial model performs with the 10 best predictors from the individual orm models.

df = euchre_turn_up_train %>%
  select(matches(best_predictors), 'correct_call')
(top.pred.multinom = multinom(correct_call~., data = df))
## # weights:  42 (26 variable)
## initial  value 87888.983093 
## iter  10 value 46910.026662
## iter  20 value 46570.202466
## iter  30 value 41903.498162
## iter  40 value 41719.255241
## iter  50 value 41716.082880
## final  value 41715.278919 
## converged
## Call:
## multinom(formula = correct_call ~ ., data = df)
## 
## Coefficients:
##   (Intercept) Count_Trump Strongest_Card Non_Trump_Suit_Count Right_Bower
## 1   -1.800959    1.071656     0.03325242           -0.1357145    1.106095
## 2   -7.060153    2.020462     0.23220520           -1.0904826    1.751544
##   Broad_Start_OrderMiddle Broad_Start_OrderLast     Dealer
## 1              0.03712149            -0.1808843 -0.1808843
## 2             -0.27584584            -0.2456710 -0.2456710
##   broad_opp_pick_upOther broad_opp_pick_upJack Nine_count Left_Bower  Ten_count
## 1              -1.163473             -1.600546 -0.4476403  0.5899219 -0.4002415
## 2              -1.794084             -9.201308 -1.3824789  1.3501582 -1.3602851
## 
## Residual Deviance: 83430.56 
## AIC: 83478.56

These coefficients seem to make reasonable sense, although it is surprising to see the “Dealer” variable receive 2 negative coefficients.


Regression with step models

# Create null multinomial model as baseline 
null.multinom = multinom(correct_call ~ 1, data = euchre_turn_up_train)

fwd.multinom = step(null.multinom, scope = list(lower = null.multinom, upper = full.multinom), direction = "forward", k = log(nrow(euchre_turn_up_train)))

bwd.multinom = step(full.multinom, direction = "backward", k = log(nrow(euchre_turn_up_train)))


View forward multinomial model

## Call:
## multinom(formula = correct_call ~ Count_Trump + Ace_count + broad_opp_pick_up + 
##     Strongest_Card + Non_Trump_Suit_Count + Right_Bower + Left_Bower + 
##     King_count + TAce + broad_p_pick_up + TKing + Queen_count + 
##     TQueen + Broad_Start_Order, data = euchre_turn_up_train)
## 
## Coefficients:
##   (Intercept) Count_Trump Ace_count broad_opp_pick_upOther
## 1    -3.01267    1.465333  1.239399              -1.244588
## 2   -13.22054    3.346933  3.241943              -1.381645
##   broad_opp_pick_upJack Strongest_Card Non_Trump_Suit_Count Right_Bower
## 1             -1.695867    -0.09661892           -0.1831319    1.910321
## 2            -13.998998     0.06215444           -1.4896401    2.956934
##   Left_Bower King_count      TAce broad_p_pick_upOther broad_p_pick_upJack
## 1   1.186175  0.2272574 0.6804841          -0.01457359           0.2993197
## 2   2.245739  1.0260594 0.9870027           0.60358577           1.5304310
##       TKing Queen_count    TQueen Broad_Start_OrderMiddle Broad_Start_OrderLast
## 1 0.3540897  0.07897937 0.1741913              0.03716103           -0.35696207
## 2 0.4847287  0.37796406 0.1585006             -0.34930858            0.02608959
## 
## Residual Deviance: 76133.83 
## AIC: 76201.83

It looks like 14 of the 20 variables are being used in this model. The coefficients look good.


View backward multinomial model

## Call:
## multinom(formula = correct_call ~ Right_Bower + Left_Bower + 
##     TAce + TKing + TQueen + TTen + TNine + Ace_count + King_count + 
##     Queen_count + Jack_count + Ten_count + Nine_count + Count_Trump + 
##     Non_Trump_Suit_Count + Dealer + Strongest_Card + Broad_Start_Order + 
##     broad_p_pick_up + broad_opp_pick_up, data = euchre_turn_up_train)
## 
## Coefficients:
##   (Intercept) Right_Bower Left_Bower      TAce       TKing     TQueen
## 1  -0.2432768    1.475023  0.7496752 0.2433769 -0.08458379 -0.2647023
## 2  -2.0670061    2.183275  1.4703311 0.2100716 -0.29180477 -0.6188985
##         TTen      TNine Ace_count King_count Queen_count Jack_count  Ten_count
## 1 -0.4182088 -0.4614212 0.5726357  -0.439412   -0.587868 -0.6409888 -0.6525986
## 2 -0.7885171 -0.7677397 0.5121475  -1.703349   -2.351846 -2.6862310 -2.7312901
##   Nine_count Count_Trump Non_Trump_Suit_Count    Dealer Strongest_Card
## 1 -0.7073115    1.239159           -0.1821937 0.1022599    -0.09797345
## 2 -2.7711791    1.396718           -1.4882292 1.2593147     0.06065642
##   Broad_Start_OrderMiddle Broad_Start_OrderLast broad_p_pick_upOther
## 1              0.03662162             0.1022599            0.5642888
## 2             -0.34952341             1.2593147            3.1129580
##   broad_p_pick_upJack broad_opp_pick_upOther broad_opp_pick_upJack
## 1           0.8771725             -0.6675251             -1.119473
## 2           4.0376074              1.1267887            -11.603675
## 
## Residual Deviance: 76121.7 
## AIC: 76201.7

It looks like all 20 variables are being used in this model, so it is the same as the full model.


View performance of all multinomial models

model_name loss_cost macro_auc macro_f1 accuracy p_euchred_calls p_missed_calls BIC AIC
full.multinom 0.41885 0.9029942 0.5541177 0.78645 0.2251364 0.1642793 76573.29 76201.70
fwd.multinom 0.41880 0.9028693 0.5545199 0.78670 0.2256310 0.1635657 76517.69 76201.83
bwd.multinom 0.41885 0.9029942 0.5541177 0.78645 0.2251364 0.1642793 76573.29 76201.70
top.pred.multinom 0.43700 0.8799953 0.5143080 0.76970 0.2390928 0.1818182 83701.51 83478.56

The forward step multinomial model has the best results and looks to be simple compared to the other models, so it will be used.

euchre_turn_up.multinom = fwd.multinom


Decision Trees

Decision trees are the most interesting because unlike all the created models provided, decision trees are very interpretative and provide a clear action of what call to make based on a list of conditions.

It is the most practical model to use in a real-life Euchre game.


Create initial decision tree

default.large.tree = rpart(correct_call ~., data = euchre_turn_up_train, method = "class", control = rpart.control(cp = .001))
rpart.plot::rpart.plot(default.large.tree)

Prune the tree to make an easier to follow model

printcp(default.large.tree)
## 
## Classification tree:
## rpart(formula = correct_call ~ ., data = euchre_turn_up_train, 
##     method = "class", control = rpart.control(cp = 0.001))
## 
## Variables actually used in tree construction:
## [1] Ace_count         broad_opp_pick_up Count_Trump       Dealer           
## [5] Right_Bower       Strongest_Card   
## 
## Root node error: 29282/80000 = 0.36602
## 
## n= 80000 
## 
##           CP nsplit rel error  xerror      xstd
## 1  0.1625231      0   1.00000 1.00000 0.0046530
## 2  0.0670036      1   0.83748 0.83748 0.0044535
## 3  0.0241104      3   0.70347 0.70347 0.0042235
## 4  0.0238713      4   0.67936 0.68844 0.0041936
## 5  0.0184072      5   0.65549 0.65549 0.0041249
## 6  0.0078205      6   0.63708 0.63708 0.0040845
## 7  0.0032443      8   0.62144 0.62144 0.0040491
## 8  0.0026410     10   0.61495 0.61522 0.0040347
## 9  0.0013319     13   0.60703 0.60703 0.0040155
## 10 0.0011270     15   0.60436 0.60590 0.0040129
## 11 0.0010928     16   0.60324 0.60553 0.0040120
## 12 0.0010416     20   0.59866 0.60515 0.0040111
## 13 0.0010000     22   0.59658 0.60395 0.0040082
default.pruned.tree = rpart(correct_call ~., data = euchre_turn_up_train, method = "class", control = rpart.control(cp = 0.0013319)) # nsplit = 13
rpart.plot::rpart.plot(default.pruned.tree)

This tree gives a great list of conditions that should be satisfied when making a bid for the hand a player has (when evaluating the turned up suit as trump).


The player should PASS if:

There are 0 trump cards in hand

OR

There is 1 trump card in hand
AND
There are less than 2 non-trump aces in hand
AND/OR
An opponent is the dealer

OR

There are 2 trump cards in hand
AND
An opponent is the dealer
AND
The Right Bower is NOT present in hand
AND/OR
There are no non-trump aces in hand

OR

There are 2 trump cards in hand
AND
An opponent is NOT the dealer
AND
The strongest card in hand is weaker than the Left Bower
AND
There are no non-trump aces in hand


The player should order trump if:

There is 1 trump card in hand
AND
There are at least 2 non-trump aces in hand
AND
An opponent is NOT the dealer

OR

There are 2 trump cards in hand
AND
An opponent is the dealer
AND
There is at least one non-trump ace in hand
AND
The Right Bower is present in hand

OR

There are 2 trump cards in hand
AND
An opponent is NOT the dealer
AND
The strongest card is at least as strong as the Left Bower
AND/OR
There is at least one non-trump ace in hand

OR

There are 3 trump cards in hand

OR

There are 4-5 trump cards in hand
AND
The Right Bower is NOT present in hand
AND
There are no non-trump aces in hand


The player should go alone if:

There are 4-5 trump cards in hand
AND
The Right Bower is present in hand
AND/OR
There is at least 1 non-trump ace in hand


Create Weighted Tree

To create this weighted tree, loss costs will be assigned to each misclassification category. These loss cost values get calculated by using the default tree model to predict what call should be made. The difference between the points won off the correct call and the points won off the predicted call are calculated as the loss cost. The average loss cost for each misclassification category (predicted 0s actual 1s), (predicted 1s actual 2s), etc. get stored as the parameter values used when creating the new tree.

# Create average loss cost values for each misclassification category
build_results(model = default.pruned.tree)
find_loss_cost(results_default.pruned.tree, 'model_bid', return_avg_cost = FALSE, save_ind_costs = TRUE)
weighted.large.tree = rpart(correct_call ~., data = euchre_turn_up_train, method = "class",
                      parms = list(loss = matrix(c(0, p0_a1, p0_a2, p1_a0, 0, p1_a2, p2_a0, p2_a1, 0), nrow = 3)), control = rpart.control(cp = .001))
rpart.plot::rpart.plot(weighted.large.tree)

Prune the tree to make an easier to follow model

printcp(weighted.large.tree)
## 
## Classification tree:
## rpart(formula = correct_call ~ ., data = euchre_turn_up_train, 
##     method = "class", parms = list(loss = matrix(c(0, p0_a1, 
##         p0_a2, p1_a0, 0, p1_a2, p2_a0, p2_a1, 0), nrow = 3)), 
##     control = rpart.control(cp = 0.001))
## 
## Variables actually used in tree construction:
## [1] Ace_count            broad_opp_pick_up    broad_p_pick_up     
## [4] Count_Trump          Left_Bower           Non_Trump_Suit_Count
## [7] Right_Bower          Strongest_Card       TAce                
## 
## Root node error: 36503/80000 = 0.45629
## 
## n= 80000 
## 
##           CP nsplit rel error  xerror      xstd
## 1  0.2207321      0   1.00000 1.44706 0.0067417
## 2  0.0420518      1   0.77927 0.99181 0.0057997
## 3  0.0251422      2   0.73722 1.00004 0.0058519
## 4  0.0036742      5   0.66179 0.79402 0.0052466
## 5  0.0032609      9   0.64709 0.86054 0.0054285
## 6  0.0031544     11   0.64057 0.83164 0.0053536
## 7  0.0029234     13   0.63426 0.80997 0.0052907
## 8  0.0018421     15   0.62842 0.76864 0.0051659
## 9  0.0014856     18   0.62242 0.79213 0.0052612
## 10 0.0011812     20   0.61945 0.79971 0.0052986
## 11 0.0011021     21   0.61826 0.79456 0.0052831
## 12 0.0010929     23   0.61606 0.79353 0.0052797
## 13 0.0010000     25   0.61387 0.79059 0.0052703
weighted.pruned.tree = rpart(correct_call ~., data = euchre_turn_up_train, method = "class",
                      parms = list(loss = matrix(c(0, p0_a1, p0_a2, p1_a0, 0, p1_a2, p2_a0, p2_a1, 0), nrow = 3)), control = rpart.control(cp = 0.0029234)) # nsplit = 13
rpart.plot::rpart.plot(weighted.pruned.tree)

This tree looks to be slightly more complex than the pruned default tree. There seems to be slightly more splits in this tree.

This tree also gives a great list of conditions that should be satisfied when making a bid for the turned up suit.


The player should PASS if:

There are 0-1 trump cards in hand

OR

There are 2 trump cards in hand
AND
An opponent is the dealer
AND
There are no non-trump aces in hand
AND/OR
The Right Bower is NOT present in hand

OR

There are 2 trump cards in hand
AND
An opponent is NOT the dealer
AND
The strongest card in hand is weaker than the Left Bower
AND
There are no non-trump aces in hand

OR

There are 2 trump cards in hand
AND
An opponent is NOT the dealer
AND
The strongest card in hand is weaker than the Left Bower
AND
There is exactly 1 non-trump ace in hand

OR

There are 3 trump cards in hand
AND
An opponent is the dealer
AND
There are no non-trump aces in hand
AND
The Right Bower is NOT present in hand


The player should order trump if:

There are 2 trump cards in hand
AND
An opponent is the dealer
AND
There is at least one non-trump ace in hand
AND
The Right Bower is present in hand

OR

There are 2 trump cards in hand
AND
An opponent is NOT the dealer
AND
The strongest card in hand is at least as strong as the Left Bower
AND/OR
There are at least 2 non-trump aces in hand

OR

There are 3 trump cards in hand
AND
There are no non-trump aces in hand
AND
The Right Bower is present in hand
AND/OR
An opponent is NOT the dealer

OR

There are 3 trump cards in hand
AND
There is at least one non-trump ace in hand
AND
The strongest card in hand is weaker than the Left Bower

OR

There are 3 trump cards in hand
AND
There is exactly one non-trump ace in hand
AND
The strongest card in hand is at least as strong as the Left Bower
AND
There are 2 distinct non-trump suits in hand


The player should go alone if:

There are 3 trump cards in hand
AND
There is exactly one non-trump ace in hand
AND
The strongest card in hand is at least as strong as the Left Bower
AND
There are less than 2 distinct non-trump suits in hand

OR

There are 3 trump cards in hand
AND
There are at least two non-trump aces in hand
AND
The strongest card in hand is at least as strong as the Left Bower

OR

There are 4-5 trump cards in hand


These trees provide good recommendations for what decisions to make, but they are by no means perfect. There are definitely exceptions that just happen to get grouped in a specific leaf. For example, although it isn’t explicitly shown in the tree models, nobody should ever go alone if an opponent is picking up a jack, regardless of how good their hand is.

It is also important to note that for the pink leaf scenarios, bidding trump is recommended in dire situations (when the opposing team has 9 points and only needs one point left to score). A little less than half of the records in the pink leafs are successful bids, so it would not hurt to bid trump in these scenarios. This is assuming a scenario where an opponent is not picking up a jack card.

On the flip side, the green leaf scenarios should only “order trump alone” if the player’s team has 7 points or less. This is because if a team has 8 points, both players can play the round and can win 2 points, which would win the game.

Not all the leafs should necessarily be followed all the time. For example, if there are 4 weak trump cards in hand, or an opponent is picking up a jack, maybe it would make more sense to order trump NOT alone instead of going alone.

Finally, if someone is starting first and they have a good hand (using the turned up suit), but they have a better hand using another suit, they should opt to PASS. This is because they will have the first opportunity to bid for a suit that wasn’t turned up. If another player makes a bid on the turned up suit, it’s okay since they already have a decent hand for it.


Evaluate results for the decision trees

model_name loss_cost macro_auc macro_f1 accuracy p_euchred_calls p_missed_calls
default.large.tree 0.42940 0.8700445 0.5260445 0.78290 0.2318841 0.1632224
default.pruned.tree 0.43205 0.8613537 0.5214663 0.78130 0.2425771 0.1575563
weighted.large.tree 0.43225 0.8541628 0.5242543 0.77475 0.2159686 0.1815236
weighted.pruned.tree 0.43450 0.8510773 0.5274358 0.77495 0.2318014 0.1725363

Although the default pruned tree looks to be the better model compared to the weighted pruned tree, it doesn’t seem to have better classification logic compared to the weighted tree. For instance, the leaf in the default pruned tree that bids trump when there is one trump card seems like an awful decision in most cases. Furthermore, that model is very picky on classifying 2s while the weighted model does a better job of doing so.

Since this model is for the first part of the bidding process, it would make sense to go with a less aggressive model (lower euchre % and higher missed call %) as players will have the chance to bid again on a different suit. Therefore, the weighted pruned tree model will be used.

euchre_turn_up.tree = weighted.pruned.tree


Random Forest

Ideally, this model can be used to determine which predictors are the most important and if the predictions align with the other created models.

A random forest uses many decision trees and combines the outputs to reach a singular result.

euchre_turn_up.rf = randomForest(correct_call ~ ., data = euchre_turn_up_train, ntree = 300, mtry = 5, nodesize = 1, importance = TRUE)
varImpPlot(euchre_turn_up.rf, type = 2)

The random forest importance plot shows that the top 6 most important predictors are the count of trump cards, the strongest card value, the count of non-trump aces, what the opponent picks up as the dealer, the count of distinct non-trump suits, and whether the Right Bower is present. This is consistent with the predictors selected in other models.


Show random forest model performance

model_name loss_cost macro_auc macro_f1 accuracy p_euchred_calls p_missed_calls
euchre_turn_up.rf 0.4263 0.8765961 0.5437038 0.7813 0.2337309 0.1673021

From a glance, the result metrics don’t seem to be vastly different than the other models. Therefore, it can be fair to assume that the insights found from the random forest model can be applied to the other models.


K Nearest Neighbors

In order to build K nearest neighbors models, all the data needs to be numerical. In addition, it is important that all variables are normalized. This way, all variables carry the same amount of weight when making predictions.

# Create dummy variables for categorical fields
euchre_turn_up_train_w_dummies = dummy_cols(euchre_turn_up_train, select_columns = c("Broad_Start_Order", "broad_p_pick_up", "broad_opp_pick_up"), remove_selected_columns = TRUE)

euchre_turn_up_test_w_dummies = dummy_cols(euchre_turn_up_test, select_columns = c("Broad_Start_Order", "broad_p_pick_up", "broad_opp_pick_up"), remove_selected_columns = TRUE)

# Normalize all predictor variables so each value is between zero and one
euchre_turn_up_train_w_dummies.norm = euchre_turn_up_train_w_dummies
euchre_turn_up_test_w_dummies.norm = euchre_turn_up_test_w_dummies

cols = colnames(euchre_turn_up_train_w_dummies.norm[,-18])
for(i in cols){
  euchre_turn_up_train_w_dummies.norm[[i]] = (euchre_turn_up_train_w_dummies.norm[[i]] - min(euchre_turn_up_train_w_dummies[[i]]))/(max(euchre_turn_up_train_w_dummies[[i]]) - min(euchre_turn_up_train_w_dummies[[i]]))
  euchre_turn_up_test_w_dummies.norm[[i]] = (euchre_turn_up_test_w_dummies.norm[[i]] - min(euchre_turn_up_train_w_dummies[[i]]))/(max(euchre_turn_up_train_w_dummies[[i]]) - min(euchre_turn_up_train_w_dummies[[i]]))
}


K Nearest Neighbors with all variables

# Create dataframe with unique observations to prevent excess number of ties
distinct_euchre_turn_up_train_w_dummies.norm = unique(euchre_turn_up_train_w_dummies.norm)

lg_loss_cost.df = data.frame(k=seq(26, 50, 1),
                          loss_cost = rep(0, 25))
# Look for which K value performs the best. Look for K values between 26-50 to use
for(i in 1:25){
  knn.pred = knn(distinct_euchre_turn_up_train_w_dummies.norm[,-18],
                 euchre_turn_up_test_w_dummies.norm[,-18],
                 cl = distinct_euchre_turn_up_train_w_dummies.norm$correct_call,
                 k=i+25)
  result.iter = results_test
  result.iter$model_bid = knn.pred
  lg_loss_cost.df[i,2] = find_loss_cost(result.iter, 'model_bid') # Record loss cost of model when testing for each K
}
# Use the average lowest loss cost value to get the best K value
best_full_k = lg_loss_cost.df %>%
  mutate(min_score = min(loss_cost)) %>%
  filter(loss_cost == min_score) %>%
  mutate(min_k = min(k)) %>%
  filter(k == min_k)
best_full_k = best_full_k$k

# Build KNN model with the best performing K value
euchre_turn_up.full.knn = knn(train = distinct_euchre_turn_up_train_w_dummies.norm[,-18],
                              test = euchre_turn_up_test_w_dummies.norm[,-18],
                              cl = distinct_euchre_turn_up_train_w_dummies.norm$correct_call,
                              k=best_full_k)

paste0('The best K for the full KNN model is ', best_full_k)
## [1] "The best K for the full KNN model is 29"


K Nearest Neighbors with the 6 most important variables from the random forest model

# Use the top 6 rf variables
top_rf_var = c('Count_Trump', 'Strongest_Card', 'Ace_count', 'Non_Trump_Suit_Count', 'Right_Bower', 'broad_opp_pick_up')
euchre_turn_up_train_w_dummies.norm_subset = euchre_turn_up_train_w_dummies.norm %>%
  select(matches(top_rf_var), 'correct_call')
euchre_turn_up_test_w_dummies.norm_subset = euchre_turn_up_test_w_dummies.norm %>%
  select(matches(top_rf_var), 'correct_call')

distinct_euchre_turn_up_train_w_dummies.norm_subset = unique(euchre_turn_up_train_w_dummies.norm_subset)

sm_loss_cost.df = data.frame(k=seq(26, 50, 1),
                             loss_cost = rep(0, 25))
# Look for which K value performs the best. Look for K values between 26-50 to use
for(i in 1:25){
  knn.pred = knn(distinct_euchre_turn_up_train_w_dummies.norm_subset[,-9],
                 euchre_turn_up_test_w_dummies.norm_subset[,-9],
                 cl = distinct_euchre_turn_up_train_w_dummies.norm_subset$correct_call,
                 k=i+25)
  result.iter = results_test
  result.iter$model_bid = knn.pred
  sm_loss_cost.df[i,2] = find_loss_cost(result.iter, 'model_bid') # Record loss cost of model when testing for each K
}
# Use the average lowest loss cost value to get the best K value
best_small_k = sm_loss_cost.df %>%
  mutate(min_score = min(loss_cost)) %>%
  filter(loss_cost == min_score) %>%
  mutate(min_k = min(k)) %>%
  filter(k == min_k)
best_small_k = best_small_k$k

# Build KNN model with the best performing K value
euchre_turn_up.small.knn = knn(train = distinct_euchre_turn_up_train_w_dummies.norm_subset[,-9],
                              test = euchre_turn_up_test_w_dummies.norm_subset[,-9],
                              cl = distinct_euchre_turn_up_train_w_dummies.norm_subset$correct_call,
                              k=best_small_k)

paste0('The best K for the small KNN model is ', best_small_k)
## [1] "The best K for the small KNN model is 41"


Evaluate performance of KNN models

model_name loss_cost macro_f1 accuracy p_euchred_calls p_missed_calls
euchre_turn_up.full.knn 0.4400 0.4556910 0.76735 0.2296521 0.1810106
euchre_turn_up.small.knn 0.5588 0.3771277 0.66905 0.4022754 0.1993133

The full KNN model performs a lot better, so that will be used.

turn_up_best_k = best_full_k


Discriminant Analysis

The two types of discriminant analysis tested will be mixture discriminant analysis (MDA) and flexible discriminant analysis (FDA).

MDA assumes that classes have a Gaussian mixture of sub classes. This means that there could be clusters that belong to the same class that are scattered in multiple different groups. This algorithm does a great job at identifying multiple areas in which a class has a high frequency of observations and predicts based on those patterns.

FDA is similar to linear discriminant analysis, but it allows for non-linear combinations, making predictions more flexible. This is beneficial for variables that have non-linear relationships within each group.

Discriminant analysis will be applied using the top 6 most important random forest predictors. This is for more capable model processing and so no NAs get included in subscripted assignments.

euchre_turn_up_train_subset = euchre_turn_up_train %>%
  select(matches(top_rf_var), 'correct_call')


euchre_turn_up.mda = mda(correct_call ~., data = euchre_turn_up_train_subset)
euchre_turn_up.fda = fda(correct_call ~., data = euchre_turn_up_train_subset)


Evaluate performance of discriminant analysis models

model_name loss_cost macro_f1 accuracy p_euchred_calls p_missed_calls
euchre_turn_up.mda 0.47375 0.5185298 0.7497 0.3035525 0.1653019
euchre_turn_up.fda 0.42340 0.5365013 0.7791 0.2263044 0.1701671

The FDA model has better results and seems to fit the data better compared to the MDA model. Therefore, FDA will be used as the discriminant analysis model.

euchre_turn_up.da = euchre_turn_up.fda


Two-Step Logistic Regression

Every model made so far has been a multiclass model. Let’s see how well 2 integrated logistic regression models perform. This is a two-step process where the first logistic regression model will assume a binomial distribution and predict whether an observation is a 0 or 1. The predicted 1s are then filtered and another logistic regression model will assume another binomial distribution and predict whether those observations are 1s or 2s.

Like the orm models, the cutoffs will either be decided by the class with the highest probability or a simulated cutoff value for both classes. Whichever cutoff method yields the better results for each best performing logistic regression model will be used.


Create the first cutoff value

This is based on the distribution of hands classified as zero. Similar to making the orm cutoffs, data is randomly selected and the proportion of records classified as 0s are recorded. The cutoff value assigned will be Nth percentile of the logit made on a logistic regression model with all variables. N represents the proportion of records classified as 0s. The process is repeated 100 times and the average cutoff score is used.

cutoff_1_list_lm = c()
for(i in 1:100){
  sample_index = sample(nrow(euchre_turn_up1), nrow(euchre_turn_up1) * 0.80) # Take 80% of data to test
  euchre_turn_up_train_sample = euchre_turn_up1[sample_index, ]
  full.logr1 = glm(correct_call ~ ., data = euchre_turn_up_train_sample, family = binomial)
  pass_proportion = nrow(euchre_turn_up_train_sample[euchre_turn_up_train_sample$correct_call==0,])/nrow(euchre_turn_up_train_sample)
  y = predict(full.logr1, type = 'response')
  sample_cutoff_1_lm = quantile(y, pass_proportion)
  cutoff_1_list_lm = append(cutoff_1_list_lm, sample_cutoff_1_lm)
  
}
cutoff_1_lm = mean(cutoff_1_list_lm)


Create the second cutoff value

Using the 1st full logistic regression model and the 1st cutoff value that was calculated above, the data is filtered so only the records predicted as 1s are used to build the 2nd cutoff value. This 2nd cutoff value is created using the same process as above.

cutoff_2_list_lm = c()
full.logr1 = glm(correct_call ~ ., data = euchre_turn_up1, family = binomial)
euchre_turn_up2 = euchre_turn_up1
euchre_turn_up2$prediction = predict(full.logr1, type = 'response', newdata = euchre_turn_up2)
for(i in 1:100){
  sample_index = sample(nrow(euchre_turn_up2), nrow(euchre_turn_up2) * 0.80)
  euchre_turn_up_train_sample = euchre_turn_up2[sample_index, ]
  euchre_turn_up_train_sample = euchre_turn_up_train_sample[euchre_turn_up_train_sample$prediction >= cutoff_1_lm, ]
  euchre_turn_up_train_sample = subset(euchre_turn_up_train_sample, select=-c(prediction))
  full.logr2 = glm(correct_call ~ ., data = euchre_turn_up_train_sample, family = binomial)
  loner_proportion = nrow(euchre_turn_up_train_sample[euchre_turn_up_train_sample$correct_call==2,])/nrow(euchre_turn_up_train_sample)
  y = predict(full.logr2, type = 'response')
  sample_cutoff_2_lm = quantile(y, loner_proportion)
  cutoff_2_list_lm = append(cutoff_2_list_lm, sample_cutoff_2_lm)
}
cutoff_2_lm = mean(cutoff_2_list_lm)


Create first logistic regression models

These models will be made using forward and backward stepwise regression, using AIC and BIC as the performance metrics.

euchre_turn_up_train1 = euchre_turn_up_train
euchre_turn_up_train1$correct_call1 = ifelse(euchre_turn_up_train1$correct_call == 0, 0, 1) # Assign new variable that converts 2s into 1s

lm1.full = glm(correct_call1 ~. - correct_call, family = binomial, data = euchre_turn_up_train1)

null.lm1 = glm(correct_call1 ~ 1, data = euchre_turn_up_train1)

fwd.lm1.bic = step(null.lm1, scope = list(lower = null.lm1, upper = lm1.full), direction = "forward", k = log(nrow(euchre_turn_up_train1)))

bwd.lm1.bic = step(lm1.full, direction = "backward", k = log(nrow(euchre_turn_up_train1)))

fwd.lm1.aic = step(null.lm1, scope = list(lower = null.lm1, upper = lm1.full), direction = "forward")

bwd.lm1.aic = step(lm1.full, direction = "backward")
Name Accuracy F1_Score Accuracy_adj F1_Score_adj BIC AIC
lm1.full 0.8164 0.7372 0.8162 0.7483 64043.66 63857.87
fwd.lm1.bic 0.8110 0.7188 0.8147 0.7456 66749.05 66591.12
bwd.lm1.bic 0.8160 0.7374 0.8164 0.7487 64019.18 63861.26
fwd.lm1.aic 0.8117 0.7202 0.8152 0.7463 66755.93 66579.43
bwd.lm1.aic 0.8164 0.7375 0.8166 0.7489 64033.43 63856.92

The bwd.lm1.bic model with customized cutoff appears to be the best option here due its great results and simplicity (lowest BIC).

bwd.lm1.bic
## 
## Call:  glm(formula = correct_call1 ~ Right_Bower + Left_Bower + TAce + 
##     TKing + TQueen + TTen + TNine + Ace_count + King_count + 
##     Queen_count + Non_Trump_Suit_Count + Strongest_Card + broad_p_pick_up + 
##     broad_opp_pick_up, family = binomial, data = euchre_turn_up_train1)
## 
## Coefficients:
##            (Intercept)             Right_Bower              Left_Bower  
##               -3.27500                 3.49413                 2.75166  
##                   TAce                   TKing                  TQueen  
##                2.21404                 1.87284                 1.68397  
##                   TTen                   TNine               Ace_count  
##                1.52686                 1.48304                 1.27490  
##             King_count             Queen_count    Non_Trump_Suit_Count  
##                0.23861                 0.08143                -0.20970  
##         Strongest_Card    broad_p_pick_upOther     broad_p_pick_upJack  
##               -0.11021                 0.37200                 0.69343  
## broad_opp_pick_upOther   broad_opp_pick_upJack  
##               -0.89119                -1.36359  
## 
## Degrees of Freedom: 79999 Total (i.e. Null);  79983 Residual
## Null Deviance:       105100 
## Residual Deviance: 63830     AIC: 63860

The 14 variables and their respective coefficients seem to make reasonable sense for the most part.

euchre_turn_up.lm1 = bwd.lm1.bic


Preprocess the data so a new dataframe is created using only the observations predicted as 1s

euchre_turn_up_train1 = euchre_turn_up_train
euchre_turn_up_train1$correct_call1 = ifelse(euchre_turn_up_train1$correct_call == 0, 0, 1) # Preprocess data to create new variable that classifies 2s as 1s so the outcome is binary

# Make predictions using the first logistic regression model
logr1_results = euchre_all_test_turn_up
logr1_results$model_bid = ifelse(predict(euchre_turn_up.lm1, newdata = logr1_results, type = 'response') < cutoff_1_lm, 0, 1)
euchre_turn_up_train1$prediction1 = ifelse(predict(euchre_turn_up.lm1, type = 'response') < cutoff_1_lm, 0, 1) # Define predicted variables using custom-made cutoff value

# Create new dataframe that has predicted 1s from model
euchre_turn_up_train2 = euchre_turn_up_train1[euchre_turn_up_train1$prediction1 != 0, ]
euchre_turn_up_train2 = subset(euchre_turn_up_train2, select=-c(correct_call1, prediction1))
euchre_turn_up_train2$correct_call2 = ifelse(euchre_turn_up_train2$correct_call == 2, 1, 0)


Create second logistic regression models

These models will also be made using forward and backward stepwise regression, using AIC and BIC as the performance metrics.

lm2.full = glm(correct_call2 ~. - correct_call, family = binomial, data = euchre_turn_up_train2)

null.lm2 = glm(correct_call2 ~ 1, data = euchre_turn_up_train2)

fwd.lm2.bic = step(null.lm2, scope = list(lower = null.lm2, upper = lm2.full), direction = "forward", k = log(nrow(euchre_turn_up_train2)))

bwd.lm2.bic = step(lm2.full, direction = "backward", k = log(nrow(euchre_turn_up_train2)))

fwd.lm2.aic = step(null.lm2, scope = list(lower = null.lm2, upper = lm2.full), direction = "forward")

bwd.lm2.aic = step(lm2.full, direction = "backward")
Name Accuracy F1_Score Accuracy_adj F1_Score_adj BIC AIC
lm2.full 0.6628 0.1461 0.6617 0.1409 12789.548 12615.583
fwd.lm2.bic 0.6492 0.0781 0.6478 0.0707 6147.910 6015.364
bwd.lm2.bic 0.6626 0.1456 0.6617 0.1409 12789.814 12615.849
fwd.lm2.aic 0.6491 0.0775 0.6478 0.0710 6159.771 6010.658
bwd.lm2.aic 0.6626 0.1456 0.6617 0.1409 12789.814 12615.849

The lm2.full model with the default 0.5 cutoff will be used due to it having the best F1 score and accuracy when comparing all the models. There are a couple other models that have similar performance metrics with slightly fewer variables, but to a negligible extent.

lm2.full
## 
## Call:  glm(formula = correct_call2 ~ . - correct_call, family = binomial, 
##     data = euchre_turn_up_train2)
## 
## Coefficients:
##             (Intercept)              Right_Bower               Left_Bower  
##               2.009e+11                3.207e+00                3.202e+00  
##                    TAce                    TKing                   TQueen  
##               2.410e+00                2.218e+00                2.048e+00  
##                    TTen                    TNine                Ace_count  
##               2.023e+00                2.091e+00                2.144e+00  
##              King_count              Queen_count               Jack_count  
##               8.392e-01                3.306e-01                4.036e-02  
##               Ten_count               Nine_count              Count_Trump  
##               5.977e-03                       NA                       NA  
##    Non_Trump_Suit_Count                   Dealer           Strongest_Card  
##              -1.305e+00               -2.009e+11                1.900e-01  
## Broad_Start_OrderMiddle    Broad_Start_OrderLast     broad_p_pick_upOther  
##              -3.917e-01                       NA               -2.009e+11  
##     broad_p_pick_upJack   broad_opp_pick_upOther    broad_opp_pick_upJack  
##              -2.009e+11               -2.009e+11               -2.009e+11  
## 
## Degrees of Freedom: 29262 Total (i.e. Null);  29242 Residual
## Null Deviance:       21310 
## Residual Deviance: 12570     AIC: 12620

Some of these variables and their respective coefficients seem to be reasonable. However, there are some coefficients that do not look right or are missing. This is most likely due to the fact that it can be more difficult for this model to differentiate between a good hand and a great hand. This could explain the lackluster performance metrics shown above.

euchre_turn_up.lm2 = lm2.full


Combine the results of the two logistic regression models and evaluate the result

logr2_results = logr1_results[logr1_results$model_bid != 0, ]
logr1_results = logr1_results[logr1_results$model_bid == 0, ]

logr2_results$model_bid = ifelse(predict(euchre_turn_up.lm2, newdata = logr2_results, type = 'response') < 0.5, 1, 2)


turn_up_results_two_lms = rbind(logr1_results[, c('correct_call', 'team_ind_tricks_won', 'team_tricks_won', 'alone_tricks_won', 'model_bid')], logr2_results[, c('correct_call', 'team_ind_tricks_won', 'team_tricks_won', 'alone_tricks_won', 'model_bid')])

Build_metrics_table_rslt_tbl(turn_up_results_two_lms, 'euchre_turn_up.2lm', 'model_bid')

euchre_turn_up.2lm_stats %>%
  kbl() %>%
  kable_styling()
model_name loss_cost macro_f1 accuracy p_euchred_calls p_missed_calls
euchre_turn_up.2lm 0.42555 0.5594429 0.78725 0.2505139 0.1451626

This method also does a good job with classification.


Create dataframe that makes predictions using all 7 of the best algorithms analyzed

The goal is to compare how each of the best algorithms perform and to determine if combining all these models together would yield significantly better results.

# Build logistic regression results
euchre_2lm_prediction1 = euchre_all_test_turn_up
euchre_2lm_prediction1$two_lm_bid = ifelse(predict(euchre_turn_up.lm1, newdata = euchre_all_test_turn_up, type = 'response') < cutoff_1_lm, 0, 1)
euchre_2lm_prediction2 = euchre_2lm_prediction1[euchre_2lm_prediction1$two_lm_bid == 1,]
euchre_2lm_prediction1 = euchre_2lm_prediction1[euchre_2lm_prediction1$two_lm_bid == 0,]
euchre_2lm_prediction2$two_lm_bid = ifelse(predict(euchre_turn_up.lm2, newdata = euchre_2lm_prediction2, type = 'response') < 0.5, 1, 2)
euchre_test_predictions = rbind(euchre_2lm_prediction1, euchre_2lm_prediction2)

# Build ordinal regression results
euchre_test_predictions$orm_bid = predict(euchre_turn_up.orm, newdata = euchre_test_predictions, type = 'lp')
euchre_test_predictions = euchre_test_predictions %>%
  mutate(orm_bid = case_when(orm_bid<=cutoff_1~0, orm_bid<=cutoff_2~1, TRUE~2))

# Build multinomial results
euchre_test_predictions$multinom_bid = as.numeric(predict(euchre_turn_up.multinom, newdata = euchre_test_predictions))-1

# Build decision tree results
euchre_test_predictions$tree_bid = as.numeric(predict(euchre_turn_up.tree, newdata = euchre_test_predictions, type = 'class'))-1

# Build random forest results
euchre_test_predictions$random_forest_bid = as.numeric(predict(euchre_turn_up.rf, newdata = euchre_test_predictions, type = 'class'))-1

# Build K nearest neighbors results
euchre_test_predictions_w_dummies = dummy_cols(euchre_test_predictions, select_columns = c("Broad_Start_Order", "broad_ind_pick_up", "broad_p_pick_up", "broad_opp_pick_up"), remove_selected_columns = TRUE)

euchre_test_predictions_w_dummies = euchre_test_predictions_w_dummies %>% 
  select(-matches('_bid'), -matches('_tricks_won'), -'predicted_call')


cols = colnames(euchre_test_predictions_w_dummies[,-18])
for(i in cols){
  euchre_test_predictions_w_dummies[[i]] = (euchre_test_predictions_w_dummies[[i]] - min(euchre_turn_up_train_w_dummies[[i]]))/(max(euchre_turn_up_train_w_dummies[[i]]) - min(euchre_turn_up_train_w_dummies[[i]]))
}
euchre_test_predictions$knn_bid = as.numeric(knn(train = distinct_euchre_turn_up_train_w_dummies.norm[,-18],
                                      test = euchre_test_predictions_w_dummies[,-18],
                                      cl=distinct_euchre_turn_up_train_w_dummies.norm$correct_call,
                                      k=turn_up_best_k))-1

# Build discriminant analysis results
euchre_test_predictions$da_bid = as.numeric(predict(euchre_turn_up.da, newdata = euchre_test_predictions))-1

# Take average score of all model predictions for an observation and assign value as ensemble bid
euchre_test_predictions$top_bid = round((euchre_test_predictions$two_lm_bid +
                                          euchre_test_predictions$orm_bid +
                                          euchre_test_predictions$multinom_bid +
                                          euchre_test_predictions$tree_bid +
                                          euchre_test_predictions$random_forest_bid + 
                                           euchre_test_predictions$knn_bid + 
                                           euchre_test_predictions$da_bid)/7, 0)


Create results table to see which top algorithms perform the best

model_name loss_cost macro_f1 accuracy p_euchred_calls p_missed_calls
euchre_turn_up.Intuition 0.49380 0.4137758 0.70355 0.1068951 0.2919824
euchre_turn_up.2lm 0.42555 0.5594429 0.78725 0.2505139 0.1451626
euchre_turn_up.orm 0.42430 0.5537159 0.78110 0.2518823 0.1455691
euchre_turn_up.multinom 0.41880 0.5545199 0.78670 0.2256310 0.1635657
euchre_turn_up.tree 0.42900 0.5194332 0.76590 0.1914933 0.2044602
euchre_turn_up.rf 0.42620 0.5438582 0.78140 0.2338661 0.1669659
euchre_turn_up.knn 0.43970 0.4580087 0.76810 0.2296048 0.1805596
euchre_turn_up.da 0.42340 0.5365013 0.77910 0.2263044 0.1701671
euchre_turn_up.Ensemble 0.41735 0.5592172 0.78865 0.2311300 0.1568642

From a glance, it is obvious that every single model above has drastic improvement over the default personal method of bidding trump. The practical tree model that can be used in a real-life Euchre game has the worst accuracy out of all the models, but even that still performs way better than the default method.

The ensemble model that combines all 7 models performs the best, but not by much. In comparison to each individual model, it has the best accuracy and the best loss cost.

The ordinal regression, multinomial regression, and 2-step logistic regression models all perform the best, although they are not very interpretative. Any of these models can be integrated in a Euchre computer application to decide if, what, and how to bid trump.


Model testing on new data

For this next test, a new Euchre dataset will be used to assess the performance of the selected models on new, unseen data. Since this data is likely similar and probably has some matching observations to the training data, the model that performs the best on the training data will be used. The tree model will also be used because it is interpretive.

When comparing the model performances on the new test data, there will be 3 methods tested.

  1. The personal method (as a baseline)

  2. The decision tree model (it is interpretive and can be used in real-life Euchre games)

  3. The multinomial model (out of the 7 individual models, it has the second highest F1 and accuracy scores, and the best loss cost score. It is also simpler to use.)


Read in new validation data with 50,000 generated hands to test selected models on unseen data

new_euchre_turn_up = read.csv('euchre_turn_up_stats_test.csv')

Prepare and modify target and predictor variables

new_euchre_turn_up$predicted_call = replace(new_euchre_turn_up$predicted_call, new_euchre_turn_up$predicted_call == "Pass", 0)
new_euchre_turn_up$predicted_call = replace(new_euchre_turn_up$predicted_call, new_euchre_turn_up$predicted_call == "Order Trump", 1)
new_euchre_turn_up$predicted_call = replace(new_euchre_turn_up$predicted_call, new_euchre_turn_up$predicted_call == "Order Trump Alone", 2)
new_euchre_turn_up$correct_call = replace(new_euchre_turn_up$correct_call, new_euchre_turn_up$correct_call == "Pass", 0)
new_euchre_turn_up$correct_call = replace(new_euchre_turn_up$correct_call, new_euchre_turn_up$correct_call == "Order Trump", 1)
new_euchre_turn_up$correct_call = replace(new_euchre_turn_up$correct_call, new_euchre_turn_up$correct_call == "Order Trump Alone", 2)

new_euchre_turn_up$predicted_call = as.factor(new_euchre_turn_up$predicted_call)
new_euchre_turn_up$correct_call = as.factor(new_euchre_turn_up$correct_call)

Preprocess the data the same way the training data was

# Create the broad picked up variables that were used in the training set and convert characters to factors

new_euchre_turn_up$ind_picked_up[new_euchre_turn_up$ind_picked_up == ''] = 'None'
new_euchre_turn_up$p_picked_up[new_euchre_turn_up$p_picked_up == ''] = 'None'
new_euchre_turn_up$opp_picked_up[new_euchre_turn_up$opp_picked_up == ''] = 'None'

new_euchre_turn_up = new_euchre_turn_up %>%
  mutate(broad_ind_pick_up = case_when(ind_picked_up == 'Jack' ~ 'Jack', 
                                       ind_picked_up == 'None' ~ 'None', 
                                       TRUE ~ 'Other'),
         broad_p_pick_up = case_when(p_picked_up == 'Jack' ~ 'Jack', 
                                       p_picked_up == 'None' ~ 'None', 
                                       TRUE ~ 'Other'),
         broad_opp_pick_up = case_when(opp_picked_up == 'Jack' ~ 'Jack', 
                                       opp_picked_up == 'None' ~ 'None', 
                                       TRUE ~ 'Other'))

# Convert characters to factors
new_euchre_turn_up$broad_ind_pick_up = factor(new_euchre_turn_up$broad_ind_pick_up, levels = c('None', 'Other', 'Jack'))
new_euchre_turn_up$broad_p_pick_up = factor(new_euchre_turn_up$broad_p_pick_up, levels = c('None', 'Other', 'Jack'))
new_euchre_turn_up$broad_opp_pick_up = factor(new_euchre_turn_up$broad_opp_pick_up, levels = c('None', 'Other', 'Jack'))
new_euchre_turn_up$Broad_Start_Order = factor(new_euchre_turn_up$Broad_Start_Order, levels = c('First', 'Middle', 'Last'))

Remove variables that were not used at all in model building

keep_variables = names(euchre_turn_up)
new_euchre_turn_up = new_euchre_turn_up %>%
  select(all_of(keep_variables))


Predict the target variable classes using the two selected models

# tree
new_euchre_turn_up$tree_pred = predict(euchre_turn_up.tree, newdata = new_euchre_turn_up, type = 'class')

# multinomial
new_euchre_turn_up$multinom_pred = predict(euchre_turn_up.multinom, newdata = new_euchre_turn_up, type = 'class')


Evaluate the performances of the 3 classification methods on the new data

Build_metrics_table_rslt_tbl(new_euchre_turn_up, 'euchre_turn_up.Intuition_test', 'predicted_call')
Build_metrics_table_rslt_tbl(new_euchre_turn_up, 'euchre_turn_up.tree_test', 'tree_pred')
Build_metrics_table_rslt_tbl(new_euchre_turn_up, 'euchre_turn_up.multinom_test', 'multinom_pred')

euchre_turn_up_validation_stats = bind_rows(euchre_turn_up.Intuition_test_stats,
                                            euchre_turn_up.tree_test_stats,
                                            euchre_turn_up.multinom_test_stats)
euchre_turn_up_validation_stats %>%
  kbl() %>%
  kable_styling()
model_name loss_cost macro_f1 accuracy p_euchred_calls p_missed_calls
euchre_turn_up.Intuition_test 0.49890 0.4167841 0.70168 0.1076852 0.2933679
euchre_turn_up.tree_test 0.42896 0.5242913 0.76744 0.1816973 0.2044650
euchre_turn_up.multinom_test 0.42418 0.5483185 0.78270 0.2239818 0.1679318

The multinomial model yields the best results, whereas the tree lags a little behind. However, the tree still does way better than the personal method. This means that I can win more Euchre games if I make bids using the tree model, instead of my own bidding method. This also means that a computer that makes bids using the multinomial model will be very difficult to beat if it gets half-decent cards.


Bidding on the Turned Up Suit Conclusion

Because the new data is similar to the training data, it makes sense that the model results are all similar. Since the data here will not be vastly different compared to what is seen in a real-life Euchre game, it can be concluded that these results are valid. Therefore, these machine learning models are able to be applied in Euchre games.

From this research, it has been determined that machine learning can be used to improve the 1st part of the bidding process of a Euchre game. This is both at a practical level and at a more complex level. Furthermore, it is shown that deciding to pick up a card relies on a lot of variables, but mainly these 6: the count of trump cards, the strongest card in hand, the count of non-trump aces, the count of distinct non-trump suits in hand, whether the Right Bower is present in hand, and the card value an opponent would potentially pick up if prompted.



Part 2: Bidding on a Non-Turned Up Suit

Load in Euchre dataframe simulated in Python

euchre_turn_down = read.csv('euchre_make_trump_stats_train.csv')
head(euchre_turn_down) %>%
  kbl() %>%
  kable_styling()
Suit Right_Bower Left_Bower TAce TKing TQueen TTen TNine Ace_count King_count Queen_count Jack_count Ten_count Nine_count Count_Trump Strength Non_Trump_Suit_Count Non_Trump_Turned_Down_Color_Ind Start_Order Dealer Strongest_Card Broad_Start_Order ind_picked_up p_picked_up opp_picked_up team_ind_tricks_won team_tricks_won alone_tricks_won predicted_call correct_call
Diamonds 0 1 0 0 0 1 0 0 0 2 1 0 0 2 31 2 0 1 0 12 First None None None 1 3 2 Pass Order Trump
Spades 1 0 0 1 0 0 1 0 1 1 0 0 0 3 39 1 1 4 1 13 Last None None None 4 4 4 Order Trump Order Trump
Clubs 0 1 1 0 0 0 1 1 0 0 0 0 1 3 37 2 1 3 0 12 Middle None None None 3 3 3 Pass Order Trump
Hearts 0 1 1 0 0 0 0 1 0 0 0 1 1 2 32 2 1 2 0 12 Middle None None None 2 4 3 Pass Order Trump
Clubs 0 1 0 0 0 1 0 0 0 1 2 0 0 2 30 2 0 1 0 12 First None None None 1 3 3 Pass Order Trump
Clubs 0 1 0 1 0 1 0 0 0 0 1 0 1 3 34 1 0 4 1 12 Last None None None 1 1 0 Pass Pass


Exploratory Data Analysis

summary(euchre_turn_down)
##      Suit            Right_Bower       Left_Bower          TAce       
##  Length:100000      Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  Class :character   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Mode  :character   Median :0.0000   Median :0.0000   Median :0.0000  
##                     Mean   :0.4491   Mean   :0.2389   Mean   :0.3868  
##                     3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:1.0000  
##                     Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##      TKing            TQueen            TTen            TNine      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.000  
##  Mean   :0.3151   Mean   :0.2825   Mean   :0.2816   Mean   :0.276  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.000  
##    Ace_count        King_count      Queen_count       Jack_count    
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.4466   Mean   :0.5192   Mean   :0.5477   Mean   :0.1473  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :3.0000   Max.   :3.0000   Max.   :3.0000   Max.   :2.0000  
##    Ten_count        Nine_count      Count_Trump      Strength    
##  Min.   :0.0000   Min.   :0.0000   Min.   :1.00   Min.   :12.00  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:2.00   1st Qu.:28.00  
##  Median :0.0000   Median :0.0000   Median :2.00   Median :32.00  
##  Mean   :0.5559   Mean   :0.5533   Mean   :2.23   Mean   :32.41  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:3.00   3rd Qu.:37.00  
##  Max.   :3.0000   Max.   :3.0000   Max.   :5.00   Max.   :55.00  
##  Non_Trump_Suit_Count Non_Trump_Turned_Down_Color_Ind  Start_Order  
##  Min.   :0.000        Min.   :0.0000                  Min.   :1.00  
##  1st Qu.:2.000        1st Qu.:0.0000                  1st Qu.:1.75  
##  Median :2.000        Median :0.0000                  Median :2.50  
##  Mean   :2.028        Mean   :0.3808                  Mean   :2.50  
##  3rd Qu.:2.000        3rd Qu.:1.0000                  3rd Qu.:3.25  
##  Max.   :3.000        Max.   :1.0000                  Max.   :4.00  
##      Dealer     Strongest_Card  Broad_Start_Order  ind_picked_up     
##  Min.   :0.00   Min.   : 7.00   Length:100000      Length:100000     
##  1st Qu.:0.00   1st Qu.:11.00   Class :character   Class :character  
##  Median :0.00   Median :12.00   Mode  :character   Mode  :character  
##  Mean   :0.25   Mean   :11.86                                        
##  3rd Qu.:0.25   3rd Qu.:13.00                                        
##  Max.   :1.00   Max.   :13.00                                        
##  p_picked_up        opp_picked_up      team_ind_tricks_won team_tricks_won
##  Length:100000      Length:100000      Min.   :0.000       Min.   :0.000  
##  Class :character   Class :character   1st Qu.:1.000       1st Qu.:2.000  
##  Mode  :character   Mode  :character   Median :2.000       Median :3.000  
##                                        Mean   :1.921       Mean   :2.963  
##                                        3rd Qu.:3.000       3rd Qu.:4.000  
##                                        Max.   :5.000       Max.   :5.000  
##  alone_tricks_won predicted_call     correct_call      
##  Min.   :0.000    Length:100000      Length:100000     
##  1st Qu.:0.000    Class :character   Class :character  
##  Median :2.000    Mode  :character   Mode  :character  
##  Mean   :1.776                                         
##  3rd Qu.:3.000                                         
##  Max.   :5.000

All the numerical values have the correct distributions associated. There can only be 0 or 1 of each trump card in a hand because only 1 suit can be trump. Also, the non-trump cards all have values that range from 0-3 with the exception of jacks. This is because there can be 3 non-trump suits. There can only be 2 non-trump jacks because of the Right Bower and the Left Bower, which means that 2 jack values are always trump.

There can only be 5 cards dealt to a hand and the algorithm picks the best suit in hand, which is why the “Count_Trump” values range from 1-5. In addition, there are 4 suits in a card game and one of them will be trump. Therefore, the maximum number of non-trump suits in a hand is 3. This value can be as low as 0 if a player only has trump cards.

All the other binary variables all have values that are either 0 or 1.

All this information concludes that this dataset is perfectly clean.


Outcome variable and predicted outcome variable

Here are the values that make up the target variable “correct_call” and the predicted target variable “predicted_call.”

Order Trump Alone - When a player has a hand with many cards that are the same suit, they can opt to “go alone.” If they win all 5 tricks, their team wins 4 points. If they win 3-4 tricks, their team gets 1 point. If they win less than 3 tricks, the other team wins 2 points. A player should “Order Trump Alone” when they believe they can win all the tricks without help from their partner.

Order Trump - When a player has a hand with a decent number of cards that are the same suit, they can opt to “order trump.” They play the round with their partner and can win 2 points if they win all 5 tricks, and 1 point if they win 3-4 tricks. If they win less than 3 tricks, the opposing team gets 2 points.

Pass - A player will “pass” if they simply don’t have a good hand that would win them a lot of tricks.


For simplicity, the target variables will get ordinal numerical imputations

Order Trump Alone - 2

Order Trump - 1

Pass - 0

euchre_turn_down$predicted_call = replace(euchre_turn_down$predicted_call, euchre_turn_down$predicted_call == "Pass", 0)
euchre_turn_down$predicted_call = replace(euchre_turn_down$predicted_call, euchre_turn_down$predicted_call == "Order Trump", 1)
euchre_turn_down$predicted_call = replace(euchre_turn_down$predicted_call, euchre_turn_down$predicted_call == "Order Trump Alone", 2)
euchre_turn_down$correct_call = replace(euchre_turn_down$correct_call, euchre_turn_down$correct_call == "Pass", 0)
euchre_turn_down$correct_call = replace(euchre_turn_down$correct_call, euchre_turn_down$correct_call == "Order Trump", 1)
euchre_turn_down$correct_call = replace(euchre_turn_down$correct_call, euchre_turn_down$correct_call == "Order Trump Alone", 2)
euchre_turn_down$ind_picked_up[euchre_turn_down$ind_picked_up == ''] = 'None'
euchre_turn_down$p_picked_up[euchre_turn_down$p_picked_up == ''] = 'None'
euchre_turn_down$opp_picked_up[euchre_turn_down$opp_picked_up == ''] = 'None'


View number of records that are associated with each output

table(euchre_turn_down$correct_call)
## 
##     0     1     2 
## 41438 51215  7347

The majority of records (~59%) are good enough to make a bid on.


View handmade predictions

table(euchre_turn_down$correct_call, euchre_turn_down$predicted_call, dnn=c('True', 'Pred'))
##     Pred
## True     0     1     2
##    0 38226  3049   163
##    1 32909 15558  2748
##    2  1656  2244  3447

From this data, we can see that the vast majority of records are getting classified as zero. The handmade predictions, based on intuition, are not aggressive at bidding trump at all. It looks like there is a lot of room for improvement here.


Change data type of columns for plotting

euchre_turn_down[,names(euchre_turn_down)] = lapply(euchre_turn_down[,names(euchre_turn_down)], as.factor)
euchre_turn_down$Strength = as.integer(euchre_turn_down$Strength)


Convert character variables to factors

euchre_turn_down$ind_picked_up = factor(euchre_turn_down$ind_picked_up, levels = c('None', 'Nine', 'Ten', 'Queen', 'King', 'Ace', 'Jack'))
euchre_turn_down$p_picked_up = factor(euchre_turn_down$p_picked_up, levels = c('None', 'Nine', 'Ten', 'Queen', 'King', 'Ace', 'Jack'))
euchre_turn_down$opp_picked_up = factor(euchre_turn_down$opp_picked_up, levels = c('None', 'Nine', 'Ten', 'Queen', 'King', 'Ace', 'Jack'))
euchre_turn_down$Broad_Start_Order = factor(euchre_turn_down$Broad_Start_Order, levels = c('First', 'Middle', 'Last'))


Show relationships between different variables and the correct call

for(column in names(euchre_turn_down[,2:25])){
  target = 'correct_call'
  column_name = column
  target_name = target
  
  
  if(class(euchre_turn_down[,column])=='integer' | class(euchre_turn_down[,column])=='numeric'){
    column = euchre_turn_down[, column]
    target = euchre_turn_down[, target]
    df = data.frame(column, target)
    p = ggplot(df, aes(x = target, y = column)) + 
      geom_boxplot() +
      labs(x = target_name, y = column_name) +
      coord_flip()
    print(p)
  }
  else{
    column = euchre_turn_down[, column]
    target = euchre_turn_down[, target]
    df = data.frame(column, target)
    p = ggplot(df, aes(x = column, fill = target)) + 
      ggtitle(paste('Proportion of', column_name, 'in each class')) +
      geom_bar(position = position_fill(reverse=TRUE)) +
      labs(x = column_name, y =' proportion', fill = target_name) +
      coord_flip()

    print(p)
  }
}

# end plots

Right Bower - Better hands significantly tend to have the Right Bower more frequently than not.

Left Bower - Better hands tend to have the Left Bower more frequently than not.

TAce - Better hands slightly tend to have the trump ace more frequently than not.

TKing - Better hands slightly tend to have the trump king more frequently than not.

TQueen - Better hands slightly tend to have the trump queen more frequently than not.

TTen - Better hands slightly tend to have the trump ten more frequently than not.

TNine - Better hands slightly tend to have the trump nine more frequently than not.

Ace_count - Better hands tend to have at least one non-trump ace.

King_count - Worse hands slightly tend to have more non-trump kings.

Queen_count - Worse hands tend to have more non-trump queens.

Jack_Count - Better hands tend to have at least one non-trump jack in hand.

Ten_count - Worse hands significantly tend to have more non-trump tens.

Nine_count - Worse hands significantly tend to have more non-trump nines.

Count_Trump - Having more trump cards is significantly more frequent in better hands.

Strength - Hands that go alone tend to be stronger while hands that pass tend to have lower strength. This variable is the sum of the assigned strength values of all cards in hand. It is difficult to calculate in a real-life game, so it will get removed.

Non_Trump_Suit_Count - Better hands tend to have less non-trump suits in hand.

Non_Trump_Turned_Down_Ind - There seems to be no effect on bidding a trump suit that is the same color as the turned down suit.

Start_Order - There is no apparent relationship between the initial start order and how well the respective hand performs.

Dealer - Being the dealer does not seem to make an impact on the round outcome.

Strongest_Card - Hands where the strongest card is 12 or 13 (Left or Right Bower present) seem to be indicative of positive results.

Broad_Start_Order - The initial broad starting order doesn’t seem to impact the round results. However, starting last tends to provide a slight advantage.

ind_picked_up - No cards were picked up in this simulation, so this variable is irrelevant.

p_picked_up - No cards were picked up in this simulation, so this variable is irrelevant.

opp_picked_up - No cards were picked up in this simulation, so this variable is irrelevant.


Remove irrelevant columns

The following columns were removed: Suit, Strength, Start_Order, ind_pick_up, p_pick_up, opp_pick_up,

euchre_turn_down = euchre_turn_down[, -c(1, 16, 19, 23:25)]


Create new dataframe that removes variables that are involved with the target variable (they create leakage)

Variables removed: team_ind_tricks_won, team_tricks_won, alone_tricks_won, predicted_call

Also, numerical variables are converted back to integers.

euchre_turn_down[,names(euchre_turn_down[,-c(19, 23:24)])] = lapply(euchre_turn_down[,names(euchre_turn_down[,-c(19, 23:24)])], as.character)
euchre_turn_down[,names(euchre_turn_down[,-c(19, 23:24)])] = lapply(euchre_turn_down[,names(euchre_turn_down[,-c(19, 23:24)])], as.integer)
euchre_turn_down1 = euchre_turn_down[, -c(20:23)]


Split euchre dataset into training and validation data to build models

80% of the dataset will be for training while 20% will be for validation.

set.seed(1)
sample_index = sample(nrow(euchre_turn_down1), nrow(euchre_turn_down1) * 0.80)
euchre_turn_down_train = euchre_turn_down1[sample_index, ]
euchre_turn_down_test = euchre_turn_down1[-sample_index, ]
euchre_turn_down_results = euchre_turn_down[, 20:24] # dataframe that only has result columns
results_train = euchre_turn_down_results[sample_index, ]
results_test = euchre_turn_down_results[-sample_index, ]
euchre_all_test_turn_down = euchre_turn_down[-sample_index, ]


Create functions that will be used to measure success of models

Here is what the functions do:

build_results - This predicts the correct call from a built model and creates a dataframe that stores the predictions. The other functions use the dataframe made from this.

find_loss_cost - This calculates the points that were failed to be earned when the model makes a call that is different than what should have been called. For each record, the potential number of points won (given the correct call was made) is calculated. The possible numbers in this field are 0 (because it is impossible to win any points when no bid should be made), 1, 2, and 4. Another field is added that calculates the points earned based off the model prediction. The numbers could be -2 (for hands that predicted 1 or 2 but were actually 0). They can also be 0 (predicted class is 0), 1, 2, and 4. A field gets made that calculates the absolute difference. The average number in this field is the loss cost value. There is also a feature included that records the average loss cost value at every misclassification type.

Get_Macro_AUC - This calculates the area under the curve for each of the 3 classes. The average is calculated from these 3 classes.

Macro_F1_Score_Compute - This calculates the F1 score for each of the 3 predictor classes. The average is calculated from these 3 classes.

Model_Accuracy - This measures the number of records that were correctly classified divided by the total number of records.

Get_Euchre_Call_Percent - This takes the number of records that are predicted as 1s and 2s but are actually 0s. This number gets divided by the total number of records predicted as 1s and 2s. The output is the percent of bids made that get euchred.

Get_Missed_Call_Percent - This takes the number of records that are predicted as 0s but are actually 1s and 2s. This number gets divided by the total number of records predicted as 0s. Essentially, this is the proportion of records that don’t get bid trump when they actually should have.

binary_F1Score - This calculates the F1 score for binary prediction classes.

build_lm1_metrics - This assesses results provided from a binary logistic regression model.

build_lm2_metrics - This assesses results provided from a second version of a binary logistic regression model.

Build_metrics_table - This uses a provided model and results table to create a new dataframe that measures the calculated metrics defined above.

Build_metrics_table_rslt_tbl - This uses a provided results table to create a new dataframe that measures the calculated metrics defined above (except Macro AUC).

build_results = function(results_df = results_test, model){
  if(model$call[[1]] == 'orm'){
    x = predict(model, newdata = euchre_turn_down_test, type = 'fitted.ind')
    results_df$lp_score = predict(model, newdata = euchre_turn_down_test, type = 'lp')
    results_df = results_df %>%
      mutate(adj_model_bid = case_when(lp_score<=cutoff_1~0, lp_score<=cutoff_2~1, TRUE~2))
  }
  else {
    x = predict(model, newdata = euchre_turn_down_test, type = 'prob')
  }
  x = as.data.frame(x)
  names(x)[1] = 'p_zero'
  names(x)[2] = 'p_one'
  names(x)[3] = 'p_two'
  results_df = cbind(results_df, x)
  results_df = results_df %>%
    mutate(best = pmax(p_zero, p_one, p_two),
           model_bid = case_when(p_zero == best~0, p_one == best~1, TRUE~2))
  assign(paste0('results_', deparse(substitute(model))), results_df, envir=.GlobalEnv)
}

find_loss_cost = function(df, test_column, return_avg_cost = TRUE, save_ind_costs = FALSE){
  df = df %>%
    mutate(potential_points_earned = case_when((team_tricks_won >= 3 & team_tricks_won < 5 & alone_tricks_won <5)~1, 
                                               (team_tricks_won == 5 & alone_tricks_won != 5)~2,
                                               (alone_tricks_won == 5)~4, TRUE~0),
           points_won = case_when((get(test_column) == 1 & team_tricks_won <3)~-2,
                                  (get(test_column) == 2 & alone_tricks_won <3)~-2,
                                  (get(test_column) == 1 & team_tricks_won >= 3 & team_tricks_won < 5)~1,
                                  (get(test_column) == 2 & alone_tricks_won >= 3 & alone_tricks_won < 5)~1,
                                  (get(test_column) == 1 & team_tricks_won == 5)~2,
                                  (get(test_column) == 2 & alone_tricks_won == 5)~4, TRUE~0),
           loss_cost = abs(points_won - potential_points_earned))
  if(save_ind_costs == TRUE){
    p0_a1 = mean(df[(df[, test_column] == 0 & df$correct_call == 1),]$loss_cost)
    p0_a2 = mean(df[(df[, test_column] == 0 & df$correct_call == 2),]$loss_cost)
    p1_a0 = mean(df[(df[, test_column] == 1 & df$correct_call == 0),]$loss_cost)
    p1_a2 = mean(df[(df[, test_column] == 1 & df$correct_call == 2),]$loss_cost)
    p2_a0 = mean(df[(df[, test_column] == 2 & df$correct_call == 0),]$loss_cost)
    p2_a1 = mean(df[(df[, test_column] == 2 & df$correct_call == 1),]$loss_cost)
    assign('p0_a1', p0_a1, envir = .GlobalEnv)
    assign('p0_a2', p0_a2, envir = .GlobalEnv)
    assign('p1_a0', p1_a0, envir = .GlobalEnv)
    assign('p1_a2', p1_a2, envir = .GlobalEnv)
    assign('p2_a0', p2_a0, envir = .GlobalEnv)
    assign('p2_a1', p2_a1, envir = .GlobalEnv)
  }
  
  if(return_avg_cost == TRUE){
    return(mean(df$loss_cost))
  }
}

Get_Macro_AUC = function(df){
  p_pass = df$p_zero
  p_order_trump = df$p_one
  p_go_alone = df$p_two
  bid.dummy = dummy_cols(df,
                           select_columns = "correct_call",
                           remove_first_dummy = FALSE,
                           remove_selected_columns = TRUE)
  pred0 = prediction(p_pass, bid.dummy$correct_call_0)
  perf0 = performance(pred0, "tpr", "fpr")
  #plot(perf1, colorize=F, print.cutoffs.at=seq(.1, by=.1))
  AUC_0 = slot(performance(pred0, "auc"), "y.values")[[1]]
  
  pred1 = prediction(p_order_trump, bid.dummy$correct_call_1)
  perf1 = performance(pred1, "tpr", "fpr")
  #plot(perf1, colorize=F, print.cutoffs.at=seq(.1, by=.1))
  AUC_1 = slot(performance(pred1, "auc"), "y.values")[[1]]
  
  pred2 = prediction(p_go_alone, bid.dummy$correct_call_2)
  perf2 = performance(pred2, "tpr", "fpr")
  #plot(perf1, colorize=F, print.cutoffs.at=seq(.1, by=.1))
  AUC_2 = slot(performance(pred2, "auc"), "y.values")[[1]]
  
  return((AUC_0+AUC_1+AUC_2)/3)
}

Macro_F1_Score_Compute = function(correct_vector, predicted_vector){
  matrix = table(correct_vector, predicted_vector)
  F1_0 = matrix[1]/(matrix[1]+matrix[2]+matrix[3]+matrix[4]+matrix[7])
  F1_1 = matrix[5]/(matrix[2]+matrix[5]+matrix[8]+matrix[4]+matrix[6])
  F1_2 = matrix[9]/(matrix[3]+matrix[6]+matrix[9]+matrix[7]+matrix[8])
  
  return((F1_0+F1_1+F1_2)/3)
}

Model_Accuracy = function(correct_vector, predicted_vector){
  matrix = table(correct_vector, predicted_vector)
  num_correct = matrix[1]+matrix[5]+matrix[9]
  total = matrix[1]+matrix[2]+matrix[3]+matrix[4]+matrix[5]+matrix[6]+matrix[7]+matrix[8]+matrix[9]
  return(num_correct/total)
}
  
Get_Euchre_Call_Percent = function(correct_vector, predicted_vector){
  matrix = table(correct_vector, predicted_vector)
  num_euchred = matrix[4]+matrix[7]
  total = matrix[4]+matrix[5]+matrix[6]+matrix[7]+matrix[8]+matrix[9]
  return(num_euchred/total)
}

Get_Missed_Call_Percent = function(correct_vector, predicted_vector){
  matrix = table(correct_vector, predicted_vector)
  missed_calls = matrix[2] + matrix[3]
  total = matrix[1] + matrix[2] + matrix[3]
  return(missed_calls/total)
}

binary_F1Score = function(matrix){
  precision = matrix[4]/(matrix[4]+matrix[3])
  recall = matrix[4]/(matrix[4]+matrix[2])
  score = round((2*precision*recall)/(precision+recall), 4)
}

build_lm1_metrics = function(model){
  logr1_results = euchre_all_test_turn_down
  logr1_results$correct_call1 = ifelse(logr1_results$correct_call == 0, 0, 1)
  logr1_results$model_prediction = ifelse(predict(model, newdata = logr1_results, type =
                                                         'response') <.5, 0, 1)
  logr1_results$model_adj_prediction = ifelse(predict(model, newdata = logr1_results,
                                                           type = 'response') < cutoff_1_lm, 0, 1)
  matrix = table(logr1_results$correct_call1, logr1_results$model_prediction, dnn= c("True", "Pred"))

  matrix_adj = table(logr1_results$correct_call1, logr1_results$model_adj_prediction, dnn = c("True", "Pred"))
  
  Name = deparse(substitute(model))
  Accuracy = round((matrix[1]+matrix[4])/sum(matrix), 4)
  F1_Score = binary_F1Score(matrix)
  Accuracy_adj = round((matrix_adj[1]+matrix_adj[4])/sum(matrix_adj), 4)
  F1_Score_adj = binary_F1Score(matrix_adj)
  BIC = BIC(model)
  AIC = AIC(model)
  
  model_df = data.frame(Name, Accuracy, F1_Score, Accuracy_adj, F1_Score_adj, BIC, AIC)
  assign('model_df', model_df, envir=.GlobalEnv)
  if(exists('lm1_models')){
    lm1_models = rbind(lm1_models, model_df)
  } else {
    lm1_models = model_df
  }
  assign('lm1_models', lm1_models, envir=.GlobalEnv)
  rm(model_df)
}

build_lm2_metrics = function(model){
  logr2_results = euchre_all_test_turn_down
  logr2_results$correct_call2 = ifelse(logr2_results$correct_call == 0, 0, 1)
  logr2_results$model_prediction = ifelse(predict(model, newdata = logr2_results, type =
                                                         'response') <.5, 0, 1)
  logr2_results$model_adj_prediction = ifelse(predict(model, newdata = logr2_results,
                                                           type = 'response') < cutoff_2_lm, 0, 1)
  matrix = table(logr2_results$correct_call2, logr2_results$model_prediction, dnn= c("True", "Pred"))

  matrix_adj = table(logr2_results$correct_call2, logr2_results$model_adj_prediction, dnn = c("True", "Pred"))
  
  Name = deparse(substitute(model))
  Accuracy = round((matrix[1]+matrix[4])/sum(matrix), 4)
  F1_Score = binary_F1Score(matrix)
  Accuracy_adj = round((matrix_adj[1]+matrix_adj[4])/sum(matrix_adj), 4)
  F1_Score_adj = binary_F1Score(matrix_adj)
  BIC = BIC(model)
  AIC = AIC(model)
  
  model_df = data.frame(Name, Accuracy, F1_Score, Accuracy_adj, F1_Score_adj, BIC, AIC)
  assign('model_df', model_df, envir=.GlobalEnv)
  if(exists('lm2_models')){
    lm2_models = rbind(lm2_models, model_df)
  } else {
    lm2_models = model_df
  }
  assign('lm2_models', lm2_models, envir=.GlobalEnv)
  rm(model_df)
}

Build_metrics_table = function(model){
  model_name = deparse(substitute(model))
  results_table = get(paste0('results_', model_name))
  df = data.frame(model_name = model_name)
  
  df$loss_cost = find_loss_cost(results_table, 'model_bid')
  df$macro_auc = Get_Macro_AUC(results_table)
  df$macro_f1 = Macro_F1_Score_Compute(results_table$correct_call,
                                       results_table$model_bid)
  df$accuracy = Model_Accuracy(results_table$correct_call,
                               results_table$model_bid)
  df$p_euchred_calls = Get_Euchre_Call_Percent(results_table$correct_call,
                                               results_table$model_bid)
  df$p_missed_calls = Get_Missed_Call_Percent(results_table$correct_call,
                                              results_table$model_bid)
    
  if(model$call[[1]] == 'orm'){
    df$adj_loss_cost = find_loss_cost(results_table, 'adj_model_bid')
    df$adj_macro_f1 = Macro_F1_Score_Compute(results_table$correct_call,
                                             results_table$adj_model_bid)
    df$adj_accuracy = Model_Accuracy(results_table$correct_call,
                                     results_table$adj_model_bid)
    df$adj_p_euchred_calls = Get_Euchre_Call_Percent(results_table$correct_call,
                                                     results_table$adj_model_bid)
    df$adj_p_missed_calls = Get_Missed_Call_Percent(results_table$correct_call,
                                                    results_table$adj_model_bid)
    df$BIC = BIC(model)
    df$AIC = AIC(model)
    df$num_predictors = length(model$assign)
  }
  if(model$call[[1]] == 'multinom'){
    df$BIC = BIC(model)
    df$AIC = AIC(model)
  }
  assign(paste0(deparse(substitute(model)), '_stats'), df, envir=.GlobalEnv)
}

Build_metrics_table_rslt_tbl = function(results_table, table_name, result_variable_name){
  df = data.frame(model_name = table_name)
  
  df$loss_cost = find_loss_cost(results_table, result_variable_name)
  df$macro_f1 = Macro_F1_Score_Compute(results_table$correct_call,
                                       results_table[,result_variable_name])
  df$accuracy = Model_Accuracy(results_table$correct_call,
                               results_table[,result_variable_name])
  df$p_euchred_calls = Get_Euchre_Call_Percent(results_table$correct_call,
                                               results_table[,result_variable_name])
  df$p_missed_calls = Get_Missed_Call_Percent(results_table$correct_call,
                                              results_table[,result_variable_name])
  assign(paste0(table_name, '_stats'), df, envir=.GlobalEnv)
}


Create and determine the best models

There are 6 of many algorithms that have been proven to work well with multiclass outputs:

ordinal regression - Through the rms library, the orm (ordinal regression model) function is great for multiclass predictions when the target variable is ordinal and the predictor variables are nominal. This is the case here as the best hands are classified as 2s while the worst hands are classified as 0s, in an ordinal fashion. The orm function outputs a linear predictor value that can be referred as a log odds value. The function can also make predictions based on the probability that a record could be in a class. For this analysis, both prediction methods will be explored. This model is set up so each variable measured has one predictor coefficient and 2 intercepts which measure two predicted classes.

multinomial regression - This is a form of logistic regression used when the outcome variable is not necessarily ordinal. It can be used through the multinom function though the nnet library. The algorithm creates separate coefficient values for each predicted class, except for a class which is set as the default. In this case, there will be two sets of coefficients used to create two logit values for the classes that will be converted to p-values. The p-value of the class not calculated is 1 - the sum of the p-values of the other two classes. The class with the highest p-value gets predicted.

classification tree - This is the most interpretative algorithm that can be best used when playing Euchre in real-time. It can be used through the rpart function from the rpart library. The algorithm predicts what call to make based on a few conditions and breaks up the tree into different leafs. The classification decision is based on which classifier has the most training records in a specific leaf.

random forest - This is similar to classification trees, but this algorithm trains on many trees and the result gets averaged together. It can be used though the randomForest function from the randomForest library. Although useless to interpret in real-time, the results of this algorithm can measure variable importance. This is useful to determine which variables directly make a hand good enough to be classified as a 1 or 2.

k nearest neighbors - This algorithm takes a defined number of observations that are most similar to a to-be-predicted observation and uses those similar observations to predict the outcome class of the new observation. The highest number of a specific outcome class found in the defined “most similar” number of observations will be assigned to the new observation. Tiebreakers are randomly done. This is done using the fnn library.

discriminant analysis - This method finds a linear combination of variables that will then be used to separate all the different outcome classes, similar to clustering. An observed record will get classified based on which group it fits closest to. This is done using the mda library.


In addition, logistic regression (used to predict a binary outcome) will be used twice and together to predict the 3 different classes.

two-step logistic regression - This method will use a logistic regression algorithm to classify the dataset as 0s or 1s (pass or bid). From there, the variables classified as 1s will go through a separate logistic regression algorithm to classify the records as 1s or 2s (bid or bid & go alone). The 2 algorithms chosen will be be assessed on how well they perform on their own. Once they are chosen, they will be integrated together to make multiclass predictions. This is done using the stats library to create multiple GLMs, using the binomial family.


Ordinal Regression

Using the orm function from rms, results can be predicted in a few ways. They can be based on which class has the highest p-value. They can also be based on different cutoff values that get manually set for the linear predictor scores. This analysis will measure if cutoffs based on probability are more effective than manually setting cutoff scores.


Approximate the 2 cutoff values

These created cutoff values will be the linear predictor logit scores that will split the prediction classes. The goal is to fit the data so the estimated proportion of data that falls into each class matches the proportion of each class in the training data. This method is fine to use because the high volume of training data observations are similar to any new data the model will analyze.

To approximate the logit cutoff values, the entire training dataset was used. 80% of the data was sampled and the proportion of records that were of class values 0 and 1 were derived for that sampled dataset. The first cutoff value was derived by taking the quantile of the proportion of records classified as 0 for the linear predictor scores. The second cutoff value was derived by taking the quantile of the proportion of records classified as 0 or 1 for the linear predictor scores.

For instance, if 30% of the random data was class 0 and 60% of the random data was class 1, the estimated cutoffs would be the lp scores at the 30th percentile and the 90th percentile.

This process gets repeated 100 times so 100 values get assigned for each cutoff score iteration. The final cutoff scores are the averages of each iteration.

These values will be applied to all the orm models created because the lp distributions will be similar and this is the most convenient way of making cutoffs.


Create cutoff estimates that best fit the data

cutoff_1_list = c()
cutoff_2_list = c()
for(i in 1:100){
  sample_index = sample(nrow(euchre_turn_down1), nrow(euchre_turn_down1) * 0.80)
  euchre_turn_down_train_sample = euchre_turn_down1[sample_index, ]
  full.orm = orm(correct_call ~ ., data = euchre_turn_down_train_sample)
  pass_proportion = nrow(euchre_turn_down_train_sample[euchre_turn_down_train_sample$correct_call==0,])/nrow(euchre_turn_down_train_sample)
  order_trump_proportion = nrow(euchre_turn_down_train_sample[euchre_turn_down_train_sample$correct_call==1,])/nrow(euchre_turn_down_train_sample)
  y = predict(full.orm)
  sample_cutoff_1 = quantile(y, pass_proportion)
  sample_cutoff_2 = quantile(y, pass_proportion+order_trump_proportion)
  cutoff_1_list = append(cutoff_1_list, sample_cutoff_1)
  cutoff_2_list = append(cutoff_2_list, sample_cutoff_2)
}
cutoff_1 = mean(cutoff_1_list)
cutoff_2 = mean(cutoff_2_list)


Ordinal Regression with all predictors

(full.orm = orm(correct_call ~ ., data = euchre_turn_down_train))
## Logistic (Proportional Odds) Ordinal Regression Model
## 
## orm(formula = correct_call ~ ., data = euchre_turn_down_train)
## 
##                         Model Likelihood               Discrimination    Rank Discrim.    
##                               Ratio Test                      Indexes          Indexes    
## Obs         80000    LR chi2    40933.23    R2                  0.480    rho     0.615    
##  0          33135    d.f.             20    R2(20,80000)        0.400                     
##  1          41012    Pr(> chi2)  <0.0001    R2(20,63506)        0.475                     
##  2           5853    Score chi2 37175.63    |Pr(Y>=median)-0.5| 0.274                     
## Distinct Y      3    Pr(> chi2)  <0.0001                                                  
## Median Y        2                                                                         
## max |deriv| 7e-05                                                                         
## 
##                                 Coef    S.E.       Wald Z Pr(>|Z|)
## y>=1                            -4.6625 39247.4678   0.00 0.9999  
## y>=2                            -8.9756 39247.4678   0.00 0.9998  
## Right_Bower                      4.4236  7849.4936   0.00 0.9996  
## Left_Bower                       4.0601  7849.4936   0.00 0.9996  
## TAce                             3.5862  7849.4936   0.00 0.9996  
## TKing                            3.3355  7849.4936   0.00 0.9997  
## TQueen                           3.2291  7849.4936   0.00 0.9997  
## TTen                             3.1354  7849.4936   0.00 0.9997  
## TNine                            3.0097  7849.4936   0.00 0.9997  
## Ace_count                        1.2911  7849.4936   0.00 0.9999  
## King_count                       0.4047  7849.4936   0.00 1.0000  
## Queen_count                      0.1293  7849.4936   0.00 1.0000  
## Jack_count                      -0.0430  7849.4936   0.00 1.0000  
## Ten_count                       -0.0118  7849.4936   0.00 1.0000  
## Nine_count                      -0.0022  7849.4936   0.00 1.0000  
## Count_Trump                     -1.5316     0.0000   -Inf <0.0001 
## Non_Trump_Suit_Count            -0.5710     0.0165 -34.65 <0.0001 
## Non_Trump_Turned_Down_Color_Ind  0.0970     0.0169   5.72 <0.0001 
## Dealer                           0.0029     0.0000    Inf <0.0001 
## Strongest_Card                   0.0779     0.0171   4.55 <0.0001 
## Broad_Start_Order=Middle        -0.0687     0.0197  -3.49 0.0005  
## Broad_Start_Order=Last           0.0824     0.0227   3.64 0.0003

It is interesting to see that the coefficient for “Count_Trump” is negative. Since all the predictors are present, some coefficient values could be inaccurate. In order to fix this, let’s build orm models with less predictors.


Ordinal Regression with top performing individual predictors

The first reduced-variable orm model will be built based on the top variables that perform the best individually, based on chi-squared score. All of the well-performing variables will be included in this new model.

top_ind_predictors = data.frame()
for(i in 1:19){
  df = euchre_turn_down_train[,c(i,20)]
  (one.orm = orm(correct_call ~ ., data = df))
  Score_chi2 = one.orm$stats[8]
  var_name = colnames(df[1])
  table = data.frame(var_name, Score_chi2)
  top_ind_predictors = rbind(top_ind_predictors, table)
}
rm(table)

top_ind_predictors = top_ind_predictors %>%
  arrange(desc(Score_chi2))

top_ind_predictors %>%
  kbl() %>%
  kable_styling()
var_name Score_chi2
Score13 Count_Trump 22688.52728
Score17 Strongest_Card 11417.13940
Score14 Non_Trump_Suit_Count 11410.31920
Score Right_Bower 6840.32410
Score12 Nine_count 4918.86022
Score11 Ten_count 4848.71872
Score1 Left_Bower 4135.71407
Score9 Queen_count 3307.32506
Score7 Ace_count 2893.31754
Score8 King_count 727.15353
Score5 TTen 601.27662
Score4 TQueen 521.65101
Score6 TNine 462.71515
Score10 Jack_count 192.99151
Score3 TKing 109.20604
Score18 Broad_Start_Order 35.91298
Score16 Dealer 31.44481
Score2 TAce 31.15586
Score15 Non_Trump_Turned_Down_Color_Ind 16.29697

The top 9 predictors (Score_chi2 > 2000) will be used to build this model since there is a good cutoff between the chi-squared score of the 9th best predictor and the 10th best predictor.

best_predictors = top_ind_predictors[1:9,]$var_name
df = euchre_turn_down_train %>%
  select(matches(best_predictors), 'correct_call')
(top.pred.orm = orm(correct_call~., data = df))
## Logistic (Proportional Odds) Ordinal Regression Model
## 
## orm(formula = correct_call ~ ., data = df)
## 
##                         Model Likelihood               Discrimination    Rank Discrim.    
##                               Ratio Test                      Indexes          Indexes    
## Obs         80000    LR chi2    39886.55    R2                  0.471    rho     0.609    
##  0          33135    d.f.              9    R2(9,80000)         0.393                     
##  1          41012    Pr(> chi2)  <0.0001    R2(9,63506)         0.466                     
##  2           5853    Score chi2 36038.61    |Pr(Y>=median)-0.5| 0.273                     
## Distinct Y      3    Pr(> chi2)  <0.0001                                                  
## Median Y        2                                                                         
## max |deriv| 3e-05                                                                         
## 
##                      Coef    S.E.   Wald Z Pr(>|Z|)
## y>=1                 -4.5040 0.1848 -24.37 <0.0001 
## y>=2                 -8.7346 0.1861 -46.93 <0.0001 
## Count_Trump           1.3867 0.0174  79.60 <0.0001 
## Strongest_Card        0.2147 0.0156  13.73 <0.0001 
## Non_Trump_Suit_Count -0.5283 0.0163 -32.35 <0.0001 
## Right_Bower           0.7349 0.0375  19.58 <0.0001 
## Nine_count           -0.2903 0.0159 -18.24 <0.0001 
## Ten_count            -0.3032 0.0159 -19.04 <0.0001 
## Left_Bower            0.5681 0.0248  22.90 <0.0001 
## Queen_count          -0.1627 0.0159 -10.25 <0.0001 
## Ace_count             0.9898 0.0169  58.64 <0.0001

These coefficients make a lot more sense.


BIC stepwise regression

It would also be interesting to look at the orm models that are optimized for BIC scoring because BIC measures model performance, but also penalizes models for being too complex. Forward steps (where a new variable is added per iteration to improve BIC the most) and backward steps (where a variable is removed per iteration to improve BIC) will be tested to create two additional orm models.


Create forward regression model

tot_pred = top_ind_predictors[,1]
not_used_pred_list = (top_ind_predictors[,1])
used_pred_list = top_ind_predictors[0,1]
stop_ind = 0

for(i in 1:(nrow(top_ind_predictors))){
  for(var in not_used_pred_list){
    df = euchre_turn_down_train %>%
      select(matches(used_pred_list), matches(var), 'correct_call')
    fwd_orm = orm(correct_call~., data = df)
    
    if(!exists('use_var')){
      use_var = var
    }
    if(!exists('Best.fwd.orm')){
      Best.fwd.orm = fwd_orm
    }
    if(!exists('Best.fwd.orm.iter')){
      Best.fwd.orm.iter = fwd_orm
    }
    if(BIC(fwd_orm)<BIC(Best.fwd.orm.iter)){
      Best.fwd.orm.iter = fwd_orm
      use_var = var
    }
  }
  used_pred_list = c(used_pred_list, use_var)
  not_used_pred_list = tot_pred[!tot_pred %in% used_pred_list]
  if(BIC(Best.fwd.orm.iter)<BIC(Best.fwd.orm)){
    Best.fwd.orm = Best.fwd.orm.iter
    stop_ind = 0
  } else{
    stop_ind = stop_ind + 1
  }
  rm(Best.fwd.orm.iter)
  rm(use_var)
  rm(df)
  if(stop_ind == 3){
    break
  }
}
rm(var)
rm(fwd_orm)
rm(tot_pred)
rm(not_used_pred_list)
rm(used_pred_list)
rm(stop_ind)
Best.fwd.orm
## Logistic (Proportional Odds) Ordinal Regression Model
## 
## orm(formula = correct_call ~ ., data = df)
## 
##                         Model Likelihood               Discrimination    Rank Discrim.    
##                               Ratio Test                      Indexes          Indexes    
## Obs         80000    LR chi2    40930.55    R2                  0.480    rho     0.615    
##  0          33135    d.f.             15    R2(15,80000)        0.400                     
##  1          41012    Pr(> chi2)  <0.0001    R2(15,63506)        0.475                     
##  2           5853    Score chi2 37152.05    |Pr(Y>=median)-0.5| 0.274                     
## Distinct Y      3    Pr(> chi2)  <0.0001                                                  
## Median Y        2                                                                         
## max |deriv| 7e-05                                                                         
## 
##                                 Coef    S.E.   Wald Z Pr(>|Z|)
## y>=1                            -4.7073 0.1910 -24.65 <0.0001 
## y>=2                            -9.0196 0.1916 -47.07 <0.0001 
## Count_Trump                      1.7084 0.0223  76.57 <0.0001 
## Ace_count                        1.3025 0.0159  81.86 <0.0001 
## Strongest_Card                   0.0766 0.0171   4.48 <0.0001 
## Non_Trump_Suit_Count            -0.5688 0.0164 -34.64 <0.0001 
## King_count                       0.4161 0.0145  28.64 <0.0001 
## TNine                           -0.2197 0.0239  -9.21 <0.0001 
## Left_Bower                       0.8273 0.0323  25.60 <0.0001 
## Right_Bower                      1.1929 0.0467  25.52 <0.0001 
## TAce                             0.3579 0.0251  14.24 <0.0001 
## Queen_count                      0.1407 0.0144   9.80 <0.0001 
## TKing                            0.1067 0.0239   4.47 <0.0001 
## Broad_Start_Order=Middle        -0.0687 0.0197  -3.49 0.0005  
## Broad_Start_Order=Last           0.0854 0.0227   3.77 0.0002  
## Non_Trump_Turned_Down_Color_Ind  0.1002 0.0168   5.96 <0.0001 
## TTen                            -0.0945 0.0238  -3.98 <0.0001

It looks like 14 of 19 predictors were used to build this model. The coefficients look pretty good with no apparent surprises.


Create backward regression model

tot_pred = top_ind_predictors[,1]
not_used_pred_list = top_ind_predictors[0,1]
used_pred_list = tot_pred
stop_ind = 0

for(i in 1:nrow(top_ind_predictors)-2){
  for(var in used_pred_list){
    df = euchre_turn_down_train %>%
      select(matches(used_pred_list), 'correct_call') %>%
      select(-matches(var))
    bwd_orm = orm(correct_call~., data = df)
    if(!exists('use_var')){
      use_var = var
    }
    if(!exists('Best.bwd.orm')){
      Best.bwd.orm = bwd_orm
    }
    if(!exists('Best.bwd.orm.iter')){
      Best.bwd.orm.iter = bwd_orm
    }
    if(BIC(bwd_orm)<BIC(Best.bwd.orm.iter)){
      Best.bwd.orm.iter = bwd_orm
      use_var = var
    }
  }
  not_used_pred_list = c(not_used_pred_list, use_var)
  used_pred_list = tot_pred[!tot_pred %in% not_used_pred_list]
  if(BIC(Best.bwd.orm.iter)<=BIC(Best.bwd.orm)){
    Best.bwd.orm = Best.bwd.orm.iter
    stop_ind = 0
  } else {
    stop_ind = stop_ind + 1
  }
  rm(Best.bwd.orm.iter)
  rm(use_var)
  if(stop_ind == 3){
    break
  }
}
rm(var)
rm(df)
rm(bwd_orm)
rm(tot_pred)
rm(not_used_pred_list)
rm(used_pred_list)
rm(stop_ind)
Best.bwd.orm
## Logistic (Proportional Odds) Ordinal Regression Model
## 
## orm(formula = correct_call ~ ., data = df)
## 
##                         Model Likelihood               Discrimination    Rank Discrim.    
##                               Ratio Test                      Indexes          Indexes    
## Obs         80000    LR chi2    40933.23    R2                  0.480    rho     0.615    
##  0          33135    d.f.             17    R2(17,80000)        0.400                     
##  1          41012    Pr(> chi2)  <0.0001    R2(17,63506)        0.475                     
##  2           5853    Score chi2 37175.63    |Pr(Y>=median)-0.5| 0.274                     
## Distinct Y      3    Pr(> chi2)  <0.0001                                                  
## Median Y        2                                                                         
## max |deriv| 7e-05                                                                         
## 
##                                 Coef    S.E.   Wald Z Pr(>|Z|)
## y>=1                             9.7972 0.3854  25.42 <0.0001 
## y>=2                             5.4841 0.3813  14.38 <0.0001 
## Strongest_Card                   0.0779 0.0171   4.55 <0.0001 
## Non_Trump_Suit_Count            -0.5710 0.0165 -34.65 <0.0001 
## Nine_count                      -2.8941 0.0469 -61.70 <0.0001 
## Ten_count                       -2.9038 0.0468 -62.02 <0.0001 
## Left_Bower                      -0.3635 0.0306 -11.88 <0.0001 
## Queen_count                     -2.7627 0.0465 -59.45 <0.0001 
## Ace_count                       -1.6009 0.0455 -35.15 <0.0001 
## King_count                      -2.4872 0.0458 -54.25 <0.0001 
## TTen                            -1.2882 0.0467 -27.61 <0.0001 
## TQueen                          -1.1945 0.0468 -25.55 <0.0001 
## TNine                           -1.4139 0.0467 -30.26 <0.0001 
## Jack_count                      -2.9349 0.0511 -57.49 <0.0001 
## TKing                           -1.0881 0.0453 -24.03 <0.0001 
## Broad_Start_Order=Middle        -0.0687 0.0197  -3.49 0.0005  
## Broad_Start_Order=Last           0.0853 0.0227   3.77 0.0002  
## TAce                            -0.8374 0.0381 -21.97 <0.0001 
## Non_Trump_Turned_Down_Color_Ind  0.0970 0.0169   5.72 <0.0001

It looks like a presumably important variable (“Count_Trump”) was removed. It is also surprising to see that “Ace_count” got such a strong negative coefficient. 16 of 19 predictors were used.


Assess performance on all the created orm models

The best performing model will be selected as the represented orm model.

model_name loss_cost macro_auc macro_f1 accuracy p_euchred_calls p_missed_calls adj_loss_cost adj_macro_f1 adj_accuracy adj_p_euchred_calls adj_p_missed_calls BIC AIC num_predictors
full.orm 0.51450 0.8352157 0.4530990 0.69365 0.2244714 0.2864837 0.49440 0.4831168 0.68535 0.2155363 0.2996114 103145.3 102940.9 19
Best.fwd.orm 0.51395 0.8352204 0.4534260 0.69395 0.2243527 0.2861724 0.49445 0.4830315 0.68545 0.2154513 0.2994899 103091.5 102933.6 14
Best.bwd.orm 0.51450 0.8352157 0.4530990 0.69365 0.2244714 0.2864837 0.49440 0.4831168 0.68535 0.2155363 0.2996114 103111.4 102934.9 16
top.pred.orm 0.52780 0.8320577 0.4404326 0.68730 0.2343537 0.2869059 0.50200 0.4813592 0.68265 0.2214159 0.3031047 104067.8 103965.6 9

The forward ordinal regression model will be used. The results are good and that model uses less variables. Also, the variables used and their respective coefficients make the most logical sense from an intuitive standpoint. The cutoffs will be determined by the simulated calculated cutoffs, not the traditional way, because it yields a better F1 and loss cost score.

euchre_turn_down.orm = Best.fwd.orm


Multinomial Regression

Create full multinomial model

(full.multinom = multinom(correct_call ~ ., data = euchre_turn_down_train))
## # weights:  66 (42 variable)
## initial  value 87888.983093 
## iter  10 value 62081.871307
## iter  20 value 60482.291652
## iter  30 value 56148.074673
## iter  40 value 50976.887926
## iter  50 value 50459.810208
## iter  50 value 50459.810057
## iter  50 value 50459.810057
## final  value 50459.810057 
## converged
## Call:
## multinom(formula = correct_call ~ ., data = euchre_turn_down_train)
## 
## Coefficients:
##   (Intercept) Right_Bower Left_Bower      TAce       TKing     TQueen
## 1  -0.5226135    1.021231  0.6370279 0.1817394 -0.08832868 -0.1826430
## 2  -1.1939440    2.067370  1.2817992 0.2882629 -0.20203414 -0.4177617
##         TTen      TNine Ace_count King_count Queen_count Jack_count  Ten_count
## 1 -0.2806741 -0.4029804  0.216046 -0.5474272  -0.7347954 -0.7828974 -0.8296594
## 2 -0.5460320 -0.7557642  1.073741 -0.7499333  -1.6241853 -2.0425492 -2.1729436
##   Nine_count Count_Trump Non_Trump_Suit_Count Non_Trump_Turned_Down_Color_Ind
## 1 -0.8197062   0.8853721           -0.3046495                      0.08720317
## 2 -2.1696892   1.7158397           -1.8839881                      0.21914240
##        Dealer Strongest_Card Broad_Start_OrderMiddle Broad_Start_OrderLast
## 1 0.070266957     0.06734436             -0.01410918           0.070266957
## 2 0.000285261     0.01414089             -0.25032447           0.000285261
## 
## Residual Deviance: 100919.6 
## AIC: 100991.6

This is more difficult to interpret compared to the orm model. However, the coefficients for the variables can be interpreted the same. Unlike the orm model, there are separate coefficients for each predicted class, except one. These scores are converted to p-values for each class that add to 1.


It is interesting that most of the positive and negative coefficients for the “2” class generally go in the same direction, but have a larger magnitude compared to the “1” class for most variables.


All these coefficients seem to make reasonable sense.


Regression with top 9 individual orm predictors

For curiosity, let’s look at how well a multinomial model performs with the 9 best predictors from the individual orm models.

df = euchre_turn_down_train %>%
  select(matches(best_predictors), 'correct_call')
(top.pred.multinom = multinom(correct_call~., data = df))
## # weights:  33 (20 variable)
## initial  value 87888.983093 
## iter  10 value 61070.204021
## iter  20 value 55572.273583
## iter  30 value 51058.012394
## final  value 51057.913168 
## converged
## Call:
## multinom(formula = correct_call ~ ., data = df)
## 
## Coefficients:
##   (Intercept) Count_Trump Strongest_Card Non_Trump_Suit_Count Right_Bower
## 1   -5.319032    1.268082      0.2592896           -0.2731482   0.5967809
## 2   -8.818741    2.412294      0.2241454           -1.7753448   1.6757864
##   Nine_count  Ten_count Left_Bower Queen_count Ace_count
## 1 -0.2136365 -0.2272708  0.4559736  -0.1338117 0.8112436
## 2 -0.9916218 -0.9972707  1.1839681  -0.4571094 2.1821640
## 
## Residual Deviance: 102115.8 
## AIC: 102155.8

These coefficients seem to make reasonable sense.


Regression with step models

# Create null multinomial model as baseline 
null.multinom = multinom(correct_call ~ 1, data = euchre_turn_down_train)

fwd.multinom = step(null.multinom, scope = list(lower = null.multinom, upper = full.multinom), direction = "forward", k = log(nrow(euchre_turn_down_train)))

bwd.multinom = step(full.multinom, direction = "backward", k = log(nrow(euchre_turn_down_train)))


View forward multinomial model

## Call:
## multinom(formula = correct_call ~ Count_Trump + Ace_count + Strongest_Card + 
##     Non_Trump_Suit_Count + King_count + Left_Bower + Right_Bower + 
##     TAce + Queen_count + TNine + Broad_Start_Order + TKing + 
##     Non_Trump_Turned_Down_Color_Ind + TTen, data = euchre_turn_down_train)
## 
## Coefficients:
##   (Intercept) Count_Trump Ace_count Strongest_Card Non_Trump_Suit_Count
## 1   -4.623885    1.524013  1.036525     0.06746622           -0.3074876
## 2  -11.960990    3.448159  3.220185     0.01642931           -1.8865848
##   King_count Left_Bower Right_Bower      TAce Queen_count      TNine
## 1  0.2731963  0.8273509    1.209835 0.3642325  0.08564733 -0.2199302
## 2  1.3968677  1.7105617    2.490292 0.7053151  0.52153657 -0.3379475
##   Broad_Start_OrderMiddle Broad_Start_OrderLast      TKing
## 1              -0.0142728          0.1405805588 0.09400719
## 2              -0.2504507          0.0003329501 0.21519192
##   Non_Trump_Turned_Down_Color_Ind       TTen
## 1                      0.08248926 -0.0971728
## 2                      0.21289843 -0.1270702
## 
## Residual Deviance: 100925.5 
## AIC: 100989.5

It looks like 14 of the 19 variables are being used in this model. The coefficients look good.


View backward multinomial model

## Call:
## multinom(formula = correct_call ~ Right_Bower + Left_Bower + 
##     TAce + TKing + TQueen + TTen + TNine + Ace_count + King_count + 
##     Queen_count + Jack_count + Ten_count + Nine_count + Count_Trump + 
##     Non_Trump_Suit_Count + Non_Trump_Turned_Down_Color_Ind + 
##     Dealer + Strongest_Card + Broad_Start_Order, data = euchre_turn_down_train)
## 
## Coefficients:
##   (Intercept) Right_Bower Left_Bower      TAce       TKing     TQueen
## 1  -0.5226135    1.021231  0.6370279 0.1817394 -0.08832868 -0.1826430
## 2  -1.1939440    2.067370  1.2817992 0.2882629 -0.20203414 -0.4177617
##         TTen      TNine Ace_count King_count Queen_count Jack_count  Ten_count
## 1 -0.2806741 -0.4029804  0.216046 -0.5474272  -0.7347954 -0.7828974 -0.8296594
## 2 -0.5460320 -0.7557642  1.073741 -0.7499333  -1.6241853 -2.0425492 -2.1729436
##   Nine_count Count_Trump Non_Trump_Suit_Count Non_Trump_Turned_Down_Color_Ind
## 1 -0.8197062   0.8853721           -0.3046495                      0.08720317
## 2 -2.1696892   1.7158397           -1.8839881                      0.21914240
##        Dealer Strongest_Card Broad_Start_OrderMiddle Broad_Start_OrderLast
## 1 0.070266957     0.06734436             -0.01410918           0.070266957
## 2 0.000285261     0.01414089             -0.25032447           0.000285261
## 
## Residual Deviance: 100919.6 
## AIC: 100991.6

It looks like all 19 variables are being used in this model, so it is the same as the full model.


View performance of all multinomial models

model_name loss_cost macro_auc macro_f1 accuracy p_euchred_calls p_missed_calls BIC AIC
full.multinom 0.50520 0.8433329 0.4799202 0.69675 0.2264656 0.2813356 101326.1 100991.6
fwd.multinom 0.50550 0.8433110 0.4792450 0.69640 0.2261209 0.2821280 101286.8 100989.5
bwd.multinom 0.50520 0.8433329 0.4799202 0.69675 0.2264656 0.2813356 101326.1 100991.6
top.pred.multinom 0.51745 0.8388128 0.4710149 0.68910 0.2367353 0.2829452 102341.6 102155.8

The forward step multinomial model will be used since it contains less variables and yields nearly similar results to the full model.

euchre_turn_down.multinom = fwd.multinom


Decision Trees

Decision trees are the most interesting because unlike all the created models provided, decision trees are very interpretative and provide a clear action of what call to make based on a list of conditions.

It is the most practical model to use in a real-life Euchre game.


Create initial decision tree

default.large.tree = rpart(correct_call ~., data = euchre_turn_down_train, method = "class", control = rpart.control(cp = .001))
rpart.plot::rpart.plot(default.large.tree)

Prune the tree to make an easier to follow model

printcp(default.large.tree)
## 
## Classification tree:
## rpart(formula = correct_call ~ ., data = euchre_turn_down_train, 
##     method = "class", control = rpart.control(cp = 0.001))
## 
## Variables actually used in tree construction:
## [1] Ace_count            Broad_Start_Order    Count_Trump         
## [4] King_count           Left_Bower           Non_Trump_Suit_Count
## [7] Right_Bower          Strongest_Card       TAce                
## 
## Root node error: 38988/80000 = 0.48735
## 
## n= 80000 
## 
##           CP nsplit rel error  xerror      xstd
## 1  0.1480969      0   1.00000 1.00000 0.0036261
## 2  0.1363753      1   0.85190 0.85190 0.0035747
## 3  0.0158254      2   0.71553 0.71553 0.0034573
## 4  0.0129399      3   0.69970 0.69970 0.0034390
## 5  0.0088489      5   0.67382 0.67382 0.0034070
## 6  0.0072843      6   0.66497 0.66508 0.0033955
## 7  0.0045997      7   0.65769 0.65769 0.0033856
## 8  0.0027701     10   0.64389 0.64779 0.0033719
## 9  0.0026034     11   0.64112 0.64202 0.0033637
## 10 0.0018980     13   0.63591 0.63591 0.0033549
## 11 0.0018724     15   0.63212 0.63484 0.0033534
## 12 0.0016415     16   0.63025 0.63363 0.0033516
## 13 0.0012889     17   0.62860 0.63007 0.0033464
## 14 0.0010000     22   0.62214 0.62409 0.0033375
default.pruned.tree = rpart(correct_call ~., data = euchre_turn_down_train, method = "class", control = rpart.control(cp = 0.0026034)) # nsplit = 11
rpart.plot::rpart.plot(default.pruned.tree)

This tree gives a great list of conditions that should be satisfied when making a bid for presumably the best suit in hand (excluding the turned down suit).


The player should PASS if:

There are 0 trump cards in hand

OR

There is 1 trump card in hand
AND
That trump card is weaker than the Left Bower
AND/OR
There are no non-trump aces in hand

OR

There are 2 trump cards in hand
AND
The strongest card in hand is weaker than the Left Bower
AND
There are no non-trump aces in hand

OR

There are 2 trump cards in hand
AND
The strongest card in hand is weaker than the Left Bower
AND
There is at least 1 non-trump ace in hand
AND
The trump ace is NOT in hand

OR

There are 2 trump cards in hand
AND
The strongest card is at least as strong as the Left Bower
AND
There are no non-trump aces in hand
AND
All 3 non-trump suits are in hand


The player should order trump if:

There are 1-2 trump cards in hand
AND
The strongest card in hand is at least as strong as the Left Bower
AND
There is at least one non-trump ace in hand

OR

There are 2 trump cards in hand
AND
The strongest card in hand is weaker than the Left Bower
AND
There is at least 1 non-trump ace in hand
AND
The trump ace is in hand

OR

There are 2 trump cards in hand
AND
The strongest card in hand is at least as strong as the Left Bower
AND
There are no non-trump aces in hand
AND
All 3 distinct non-trump suits are NOT present in hand

OR

There are 3 trump cards in hand

OR

There are 4-5 trump cards in hand
AND
The Right Bower is NOT present in hand
AND
There are no non-trump aces in hand


The player should go alone if:

There are 4-5 trump cards in hand
AND
The Right Bower is present in hand
AND/OR
There is at least 1 non-trump ace in hand


Create Weighted Tree

To create this weighted tree, loss costs will be assigned to each misclassification category. These loss cost values get calculated by using the default tree model to predict what call should be made. The difference between the points won off the correct call and the points won off the predicted call are calculated as the loss cost. The average loss cost for each misclassification category (predicted 0s actual 1s), (predicted 1s actual 2s), etc. get stored as the parameter values used when creating the new tree.

# Create average loss cost values for each misclassification category
build_results(model = default.pruned.tree)
find_loss_cost(results_default.pruned.tree, 'model_bid', return_avg_cost = FALSE, save_ind_costs = TRUE)
weighted.large.tree = rpart(correct_call ~., data = euchre_turn_down_train, method = "class",
                      parms = list(loss = matrix(c(0, p0_a1, p0_a2, p1_a0, 0, p1_a2, p2_a0, p2_a1, 0), nrow = 3)), control = rpart.control(cp = .001))
rpart.plot::rpart.plot(weighted.large.tree)

Prune the tree to make an easier to follow model

printcp(weighted.large.tree)
## 
## Classification tree:
## rpart(formula = correct_call ~ ., data = euchre_turn_down_train, 
##     method = "class", parms = list(loss = matrix(c(0, p0_a1, 
##         p0_a2, p1_a0, 0, p1_a2, p2_a0, p2_a1, 0), nrow = 3)), 
##     control = rpart.control(cp = 0.001))
## 
## Variables actually used in tree construction:
## [1] Ace_count            Count_Trump          Non_Trump_Suit_Count
## [4] Right_Bower          Strongest_Card      
## 
## Root node error: 67663/80000 = 0.84579
## 
## n= 80000 
## 
##          CP nsplit rel error  xerror      xstd
## 1 0.3564977      0   1.00000 1.30673 0.0038871
## 2 0.0334991      1   0.64350 0.79534 0.0035425
## 3 0.0271022      3   0.57650 0.57763 0.0030401
## 4 0.0133086      4   0.54940 0.58923 0.0031200
## 5 0.0022809      5   0.53609 0.60408 0.0031878
## 6 0.0018598     11   0.52127 0.62335 0.0032615
## 7 0.0011662     12   0.51941 0.61661 0.0032448
## 8 0.0010000     15   0.51591 0.60619 0.0032156
weighted.pruned.tree = rpart(correct_call ~., data = euchre_turn_down_train, method = "class",
                      parms = list(loss = matrix(c(0, p0_a1, p0_a2, p1_a0, 0, p1_a2, p2_a0, p2_a1, 0), nrow = 3)), control = rpart.control(cp = 0.0011662)) # nsplit = 12
rpart.plot::rpart.plot(weighted.pruned.tree)

This tree looks to be a little more complex than the pruned default tree. In fact, there only seems to be one additional split.

This tree also gives a great list of conditions that should be satisfied when making a bid for the presumed best suit in hand.


The player should PASS if:

There are 0 trump cards in hand

OR

There is 1 trump card in hand
AND
That trump card is at least as strong as the Left Bower
AND
There is exactly 1 non-trump ace in hand

OR

There are 1-2 trump cards in hand
AND
The strongest card in hand is weaker than the Left Bower

OR

There are 1-2 trump cards in hand
AND
The strongest card in hand is at least as strong as the Left Bower
AND
There are no non-trump aces in hand

OR

There are 3 trump cards in hand
AND
There are no non-trump aces in hand
AND
The strongest card in hand is weaker than the trump ace


The player should order trump if:

There is 1 trump card in hand
AND
That trump card is at least as strong as the Left Bower
AND
There are at least 2 non-trump aces in hand

OR

There are 2 trump cards in hand
AND
The strongest card is at least as strong as the Left Bower
AND
There is at least 1 non-trump ace in hand

OR

There are 3 trump cards in hand
AND
There are no non-trump aces in hand
AND
The strongest card in hand is at least as strong as the trump ace

OR

There are 3 trump cards in hand
AND
There is at least one non-trump ace in hand
AND
There are less than 2 distinct non-trump suits in hand
AND
The Right Bower is NOT present in hand

OR

There are 3 trump cards in hand
AND
There are 2 distinct non-trump suits in hand
AND
There is exactly 1 non-trump ace in hand


The player should go alone if:

There are 3 trump cards in hand
AND
There is at least 1 non-trump ace in hand
AND
There are less than 2 distinct non-trump suits in hand
AND
The Right Bower is present in hand

OR

There are 3 trump cards in hand
AND
There are 2 non-trump aces in hand

OR

There are 4-5 trump cards in hand


These trees provide good recommendations for what decisions to make, but they are by no means perfect. There are definitely exceptions that just happen to get grouped in a specific leaf.

It is also important to note that for the pink leaf scenarios, bidding trump is recommended in dire situations (when the opposing team has 9 points and only needs one point left to score). A little less than half of the records in the pink leafs are successful bids, so it would not hurt to bid trump in these scenarios. It would not be a good idea give a team that only needs one point an opportunity to bid trump.

On the flip side, the green leaf scenarios should only “order trump alone” if the player’s team has 7 points or less. This is because if a team has 8 points, both players can play the round and can win 2 points, which would win the game.

Finally, not all the leafs should necessarily be followed all the time. For example, if there are 4 weak trump cards in hand, maybe it would make more sense to order trump NOT alone instead of going alone.


Evaluate results for the decision trees

model_name loss_cost macro_auc macro_f1 accuracy p_euchred_calls p_missed_calls
default.large.tree 0.52210 0.8177085 0.4753822 0.69420 0.2375942 0.2735407
default.pruned.tree 0.55445 0.7953158 0.4441396 0.68235 0.2665998 0.2414307
weighted.large.tree 0.50860 0.8164407 0.4756771 0.67410 0.2188739 0.3273610
weighted.pruned.tree 0.49890 0.7977183 0.4712634 0.66840 0.1818368 0.3611791

The weighted pruned tree yields a much better loss cost compared to the default pruned tree. Although the weighted pruned tree has a leaf where a player could make a bid with 1 trump card, the additional logic for that leaf makes it reliable to do so. Furthermore, the sorting process is also more lenient on classifying 2s and there’s a lower risk of getting euchred while using this model. Therefore, the weighted pruned tree model will be used.

euchre_turn_down.tree = weighted.pruned.tree


Random Forest

Ideally, this model can be used to determine which predictors are the most important and if the predictions align with the other created models.

A random forest uses many decision trees and combines the outputs to reach a singular result.

euchre_turn_down.rf = randomForest(correct_call ~ ., data = euchre_turn_down_train, ntree = 300, mtry = 5, nodesize = 1, importance = TRUE)
varImpPlot(euchre_turn_down.rf, type = 2)

The random forest importance plot shows that the top 6 most important predictors are the count of trump cards, the strongest card value, the count of distinct non-trump suits, the count of non-trump aces, whether the Right Bower is present, and the order in which the player will be starting in. This is consistent with the predictors selected in other models.


Show random forest model performance

model_name loss_cost macro_auc macro_f1 accuracy p_euchred_calls p_missed_calls
euchre_turn_down.rf 0.5285 0.8178694 0.4677295 0.68895 0.2492563 0.2691247

From a glance, the result metrics don’t seem to be vastly different than the other models. Therefore, it can be fair to assume that the insights found from the random forest model can be applied to the other models.


K Nearest Neighbors

In order to build K nearest neighbors models, all the data needs to be numerical. In addition, it is important that all variables are normalized. This way, all variables carry the same amount of weight when making predictions.

# Create dummy variables for categorical fields
euchre_turn_down_train_w_dummies = dummy_cols(euchre_turn_down_train, select_columns = "Broad_Start_Order", remove_selected_columns = TRUE)

euchre_turn_down_test_w_dummies = dummy_cols(euchre_turn_down_test, select_columns = "Broad_Start_Order", remove_selected_columns = TRUE)

# Normalize all predictor variables so each value is between zero and one
euchre_turn_down_train_w_dummies.norm = euchre_turn_down_train_w_dummies
euchre_turn_down_test_w_dummies.norm = euchre_turn_down_test_w_dummies

cols = colnames(euchre_turn_down_train_w_dummies.norm[,-19])
for(i in cols){
  euchre_turn_down_train_w_dummies.norm[[i]] = (euchre_turn_down_train_w_dummies.norm[[i]] - min(euchre_turn_down_train_w_dummies[[i]]))/(max(euchre_turn_down_train_w_dummies[[i]]) - min(euchre_turn_down_train_w_dummies[[i]]))
  euchre_turn_down_test_w_dummies.norm[[i]] = (euchre_turn_down_test_w_dummies.norm[[i]] - min(euchre_turn_down_train_w_dummies[[i]]))/(max(euchre_turn_down_train_w_dummies[[i]]) - min(euchre_turn_down_train_w_dummies[[i]]))
}


K Nearest Neighbors with all variables

# Create dataframe with unique observations to prevent excess number of ties
distinct_euchre_turn_down_train_w_dummies.norm = unique(euchre_turn_down_train_w_dummies.norm)

lg_loss_cost.df = data.frame(k=seq(26, 50, 1),
                          loss_cost = rep(0, 25))
# Look for which K value performs the best. Look for K values between 26-50 to use
for(i in 1:25){
  knn.pred = knn(distinct_euchre_turn_down_train_w_dummies.norm[,-19],
                 euchre_turn_down_test_w_dummies.norm[,-19],
                 cl = distinct_euchre_turn_down_train_w_dummies.norm$correct_call,
                 k=i+25)
  result.iter = results_test
  result.iter$model_bid = knn.pred
  lg_loss_cost.df[i,2] = find_loss_cost(result.iter, 'model_bid') # Record loss cost of model when testing for each K
}
# Use the average lowest loss cost value to get the best K value
best_full_k = lg_loss_cost.df %>%
  mutate(min_score = min(loss_cost)) %>%
  filter(loss_cost == min_score) %>%
  mutate(min_k = min(k)) %>%
  filter(k == min_k)
best_full_k = best_full_k$k

# Build KNN model with the best performing K value
euchre_turn_down.full.knn = knn(train = distinct_euchre_turn_down_train_w_dummies.norm[,-19],
                              test = euchre_turn_down_test_w_dummies.norm[,-19],
                              cl = distinct_euchre_turn_down_train_w_dummies.norm$correct_call,
                              k=best_full_k)

paste0('The best K for the full KNN model is ', best_full_k)
## [1] "The best K for the full KNN model is 42"


K Nearest Neighbors with the 6 most important variables from the random forest model

# Use the top 6 rf variables
top_rf_var = c('Count_Trump', 'Strongest_Card', 'Ace_count', 'Non_Trump_Suit_Count', 'Right_Bower', 'Broad_Start_Order')
euchre_turn_down_train_w_dummies.norm_subset = euchre_turn_down_train_w_dummies.norm %>%
  select(matches(top_rf_var), 'correct_call')
euchre_turn_down_test_w_dummies.norm_subset = euchre_turn_down_test_w_dummies.norm %>%
  select(matches(top_rf_var), 'correct_call')

distinct_euchre_turn_down_train_w_dummies.norm_subset = unique(euchre_turn_down_train_w_dummies.norm_subset)

sm_loss_cost.df = data.frame(k=seq(26, 50, 1),
                             loss_cost = rep(0, 25))
# Look for which K value performs the best. Look for K values between 26-50 to use
for(i in 1:25){
  knn.pred = knn(distinct_euchre_turn_down_train_w_dummies.norm_subset[,-9],
                 euchre_turn_down_test_w_dummies.norm_subset[,-9],
                 cl = distinct_euchre_turn_down_train_w_dummies.norm_subset$correct_call,
                 k=i+25)
  result.iter = results_test
  result.iter$model_bid = knn.pred
  sm_loss_cost.df[i,2] = find_loss_cost(result.iter, 'model_bid') # Record loss cost of model when testing for each K
}
# Use the average lowest loss cost value to get the best K value
best_small_k = sm_loss_cost.df %>%
  mutate(min_score = min(loss_cost)) %>%
  filter(loss_cost == min_score) %>%
  mutate(min_k = min(k)) %>%
  filter(k == min_k)
best_small_k = best_small_k$k

# Build KNN model with the best performing K value
euchre_turn_down.small.knn = knn(train = distinct_euchre_turn_down_train_w_dummies.norm_subset[,-9],
                              test = euchre_turn_down_test_w_dummies.norm_subset[,-9],
                              cl = distinct_euchre_turn_down_train_w_dummies.norm_subset$correct_call,
                              k=best_small_k)

paste0('The best K for the small KNN model is ', best_small_k)
## [1] "The best K for the small KNN model is 48"


Evaluate performance of KNN models

model_name loss_cost macro_f1 accuracy p_euchred_calls p_missed_calls
euchre_turn_down.full.knn 0.5568 0.3778704 0.66720 0.2442944 0.3034674
euchre_turn_down.small.knn 0.6212 0.3381127 0.58585 0.2489328 0.4312650

The full KNN model performs a lot better, so that will be used.

turn_down_best_k = best_full_k


Discriminant Analysis

The two types of discriminant analysis tested will be mixture discriminant analysis (MDA) and flexible discriminant analysis (FDA).

MDA assumes that classes have a Gaussian mixture of sub classes. This means that there could be clusters that belong to the same class that are scattered in multiple different groups. This algorithm does a great job at identifying multiple areas in which a class has a high frequency of observations and predicts based on those patterns.

FDA is similar to linear discriminant analysis, but it allows for non-linear combinations, making predictions more flexible. This is beneficial for variables that have non-linear relationships within each group.

Discriminant analysis will be applied using the top 6 most important random forest predictors. This is for more capable model processing and so no NAs get included in subscripted assignments.

euchre_turn_down_train_subset = euchre_turn_down_train %>%
  select(matches(top_rf_var), 'correct_call')


euchre_turn_down.mda = mda(correct_call ~., data = euchre_turn_down_train_subset)
euchre_turn_down.fda = fda(correct_call ~., data = euchre_turn_down_train_subset)


Evaluate performance of discriminant analysis models

model_name loss_cost macro_f1 accuracy p_euchred_calls p_missed_calls
euchre_turn_down.mda 0.52920 0.4488105 0.68050 0.2483786 0.2783768
euchre_turn_down.fda 0.53105 0.4611099 0.68645 0.2433362 0.2798668

Although it is close, the FDA model seems to be the better choice due to a better F1 score and better accuracy. Also, the natural distribution of the target variable seems to be a better fit for FDA (there are no scattered clusters because the target variable appears to be ordinal). Therefore, FDA will be used as the discriminant analysis model.

euchre_turn_down.da = euchre_turn_down.fda


Two-Step Logistic Regression

Every model made so far has been a multiclass model. Let’s see how well 2 integrated logistic regression models perform. This is a two-step process where the first logistic regression model will assume a binomial distribution and predict whether an observation is a 0 or 1. The predicted 1s are then filtered and another logistic regression model will assume another binomial distribution and predict whether those observations are 1s or 2s.

Like the orm models, the cutoffs will either be decided by the class with the highest probability or a simulated cutoff value for both classes. Whichever cutoff method yields the better results for each best performing logistic regression model will be used.


Create the first cutoff value

This is based on the distribution of hands classified as zero. Similar to making the orm cutoffs, data is randomly selected and the proportion of records classified as 0s are recorded. The cutoff value assigned will be Nth percentile of the logit made on a logistic regression model with all variables. N represents the proportion of records classified as 0s. The process is repeated 100 times and the average cutoff score is used.

cutoff_1_list_lm = c()
for(i in 1:100){
  sample_index = sample(nrow(euchre_turn_down1), nrow(euchre_turn_down1) * 0.80) # Take 80% of data to test
  euchre_turn_down_train_sample = euchre_turn_down1[sample_index, ]
  full.logr1 = glm(correct_call ~ ., data = euchre_turn_down_train_sample, family = binomial)
  pass_proportion = nrow(euchre_turn_down_train_sample[euchre_turn_down_train_sample$correct_call==0,])/nrow(euchre_turn_down_train_sample)
  y = predict(full.logr1, type = 'response')
  sample_cutoff_1_lm = quantile(y, pass_proportion)
  cutoff_1_list_lm = append(cutoff_1_list_lm, sample_cutoff_1_lm)
  
}
cutoff_1_lm = mean(cutoff_1_list_lm)


Create the second cutoff value

Using the 1st full logistic regression model and the 1st cutoff value that was calculated above, the data is filtered so only the records predicted as 1s are used to build the 2nd cutoff value. This 2nd cutoff value is created using the same process as above.

cutoff_2_list_lm = c()
full.logr1 = glm(correct_call ~ ., data = euchre_turn_down1, family = binomial)
euchre_turn_down2 = euchre_turn_down1
euchre_turn_down2$prediction = predict(full.logr1, type = 'response', newdata = euchre_turn_down2)
for(i in 1:100){
  sample_index = sample(nrow(euchre_turn_down2), nrow(euchre_turn_down2) * 0.80)
  euchre_turn_down_train_sample = euchre_turn_down2[sample_index, ]
  euchre_turn_down_train_sample = euchre_turn_down_train_sample[euchre_turn_down_train_sample$prediction >= cutoff_1_lm, ]
  euchre_turn_down_train_sample = subset(euchre_turn_down_train_sample, select=-c(prediction))
  full.logr2 = glm(correct_call ~ ., data = euchre_turn_down_train_sample, family = binomial)
  loner_proportion = nrow(euchre_turn_down_train_sample[euchre_turn_down_train_sample$correct_call==2,])/nrow(euchre_turn_down_train_sample)
  y = predict(full.logr2, type = 'response')
  sample_cutoff_2_lm = quantile(y, loner_proportion)
  cutoff_2_list_lm = append(cutoff_2_list_lm, sample_cutoff_2_lm)
}
cutoff_2_lm = mean(cutoff_2_list_lm)


Create first logistic regression models

These models will be made using forward and backward stepwise regression, using AIC and BIC as the performance metrics.

euchre_turn_down_train1 = euchre_turn_down_train
euchre_turn_down_train1$correct_call1 = ifelse(euchre_turn_down_train1$correct_call == 0, 0, 1) # Assign new variable that converts 2s into 1s

lm1.full = glm(correct_call1 ~. - correct_call, family = binomial, data = euchre_turn_down_train1)

null.lm1 = glm(correct_call1 ~ 1, data = euchre_turn_down_train1)

fwd.lm1.bic = step(null.lm1, scope = list(lower = null.lm1, upper = lm1.full), direction = "forward", k = log(nrow(euchre_turn_down_train1)))

bwd.lm1.bic = step(lm1.full, direction = "backward", k = log(nrow(euchre_turn_down_train1)))

fwd.lm1.aic = step(null.lm1, scope = list(lower = null.lm1, upper = lm1.full), direction = "forward")

bwd.lm1.aic = step(lm1.full, direction = "backward")
Name Accuracy F1_Score Accuracy_adj F1_Score_adj BIC AIC
lm1.full 0.7528 0.7953 0.7502 0.7869 78303.23 78136.02
fwd.lm1.bic 0.7494 0.7949 0.7476 0.7832 83339.94 83191.30
bwd.lm1.bic 0.7524 0.7948 0.7502 0.7870 78276.61 78137.26
fwd.lm1.aic 0.7494 0.7950 0.7473 0.7826 83348.56 83190.63
bwd.lm1.aic 0.7526 0.7949 0.7506 0.7872 78282.26 78133.63

The bwd.lm1.bic model with the default 0.5 cutoff appears to be the best option here due to its decent performance and simplicity (lowest BIC).

bwd.lm1.bic
## 
## Call:  glm(formula = correct_call1 ~ Right_Bower + Left_Bower + TAce + 
##     TKing + TQueen + TTen + TNine + Ace_count + King_count + 
##     Queen_count + Non_Trump_Suit_Count + Non_Trump_Turned_Down_Color_Ind + 
##     Broad_Start_Order, family = binomial, data = euchre_turn_down_train1)
## 
## Coefficients:
##                     (Intercept)                      Right_Bower  
##                        -4.00826                          3.00827  
##                      Left_Bower                             TAce  
##                         2.54185                          2.00913  
##                           TKing                           TQueen  
##                         1.69341                          1.58485  
##                            TTen                            TNine  
##                         1.48441                          1.35936  
##                       Ace_count                       King_count  
##                         1.10593                          0.30411  
##                     Queen_count             Non_Trump_Suit_Count  
##                         0.09709                         -0.35528  
## Non_Trump_Turned_Down_Color_Ind          Broad_Start_OrderMiddle  
##                         0.08808                         -0.02156  
##           Broad_Start_OrderLast  
##                         0.13658  
## 
## Degrees of Freedom: 79999 Total (i.e. Null);  79985 Residual
## Null Deviance:       108500 
## Residual Deviance: 78110     AIC: 78140

The 13 variables and their respective coefficients seem to make reasonable sense.

euchre_turn_down.lm1 = bwd.lm1.bic


Preprocess the data so a new dataframe is created using only the observations predicted as 1s

euchre_turn_down_train1 = euchre_turn_down_train
euchre_turn_down_train1$correct_call1 = ifelse(euchre_turn_down_train1$correct_call == 0, 0, 1) # Preprocess data to create new variable that classifies 2s as 1s so the outcome is binary

# Make predictions using the first logistic regression model
logr1_results = euchre_all_test_turn_down
logr1_results$model_bid = ifelse(predict(euchre_turn_down.lm1, newdata = logr1_results, type = 'response') < 0.5, 0, 1)
euchre_turn_down_train1$prediction1 = ifelse(predict(euchre_turn_down.lm1, type = 'response') < 0.5, 0, 1) # Define predicted variables using default cutoff value

# Create new dataframe that has predicted 1s from model
euchre_turn_down_train2 = euchre_turn_down_train1[euchre_turn_down_train1$prediction1 != 0, ]
euchre_turn_down_train2 = subset(euchre_turn_down_train2, select=-c(correct_call1, prediction1))
euchre_turn_down_train2$correct_call2 = ifelse(euchre_turn_down_train2$correct_call == 2, 1, 0)


Create second logistic regression models

These models will also be made using forward and backward stepwise regression, using AIC and BIC as the performance metrics.

lm2.full = glm(correct_call2 ~. - correct_call, family = binomial, data = euchre_turn_down_train2)

null.lm2 = glm(correct_call2 ~ 1, data = euchre_turn_down_train2)

fwd.lm2.bic = step(null.lm2, scope = list(lower = null.lm2, upper = lm2.full), direction = "forward", k = log(nrow(euchre_turn_down_train2)))

bwd.lm2.bic = step(lm2.full, direction = "backward", k = log(nrow(euchre_turn_down_train2)))

fwd.lm2.aic = step(null.lm2, scope = list(lower = null.lm2, upper = lm2.full), direction = "forward")

bwd.lm2.aic = step(lm2.full, direction = "backward")
Name Accuracy F1_Score Accuracy_adj F1_Score_adj BIC AIC
lm2.full 0.4561 0.1313 0.4434 0.0923 23542.59 23384.01
fwd.lm2.bic 0.4308 0.0521 0.4208 0.0190 13605.05 13455.27
bwd.lm2.bic 0.4560 0.1308 0.4434 0.0925 23513.15 23381.00
fwd.lm2.aic 0.4313 0.0537 0.4207 0.0188 13612.13 13453.55
bwd.lm2.aic 0.4561 0.1313 0.4434 0.0921 23520.99 23380.03

The bwd.lm2.aic model with the default 0.5 cutoff will be used because it has the best results. It has similar results to the full model while using slightly less variables.

bwd.lm2.aic
## 
## Call:  glm(formula = correct_call2 ~ Right_Bower + Left_Bower + TAce + 
##     TKing + TQueen + TTen + TNine + Ace_count + King_count + 
##     Queen_count + Jack_count + Non_Trump_Suit_Count + Non_Trump_Turned_Down_Color_Ind + 
##     Broad_Start_Order, family = binomial, data = euchre_turn_down_train2)
## 
## Coefficients:
##                     (Intercept)                      Right_Bower  
##                        -8.78502                          3.43071  
##                      Left_Bower                             TAce  
##                         3.04931                          2.48249  
##                           TKing                           TQueen  
##                         2.23700                          2.10962  
##                            TTen                            TNine  
##                         2.06240                          1.97066  
##                       Ace_count                       King_count  
##                         2.32091                          1.16776  
##                     Queen_count                       Jack_count  
##                         0.45142                          0.09104  
##            Non_Trump_Suit_Count  Non_Trump_Turned_Down_Color_Ind  
##                        -1.61739                          0.13126  
##         Broad_Start_OrderMiddle            Broad_Start_OrderLast  
##                        -0.23736                         -0.12309  
## 
## Degrees of Freedom: 49514 Total (i.e. Null);  49499 Residual
## Null Deviance:       35820 
## Residual Deviance: 23350     AIC: 23380

These 14 variables and their respective coefficients also seem to be reasonable.

euchre_turn_down.lm2 = bwd.lm2.aic


Combine the results of the two logistic regression models and evaluate the result

logr2_results = logr1_results[logr1_results$model_bid != 0, ]
logr1_results = logr1_results[logr1_results$model_bid == 0, ]

logr2_results$model_bid = ifelse(predict(euchre_turn_down.lm2, newdata = logr2_results, type = 'response') < 0.5, 1, 2)


turn_down_results_two_lms = rbind(logr1_results[, c('correct_call', 'team_ind_tricks_won', 'team_tricks_won', 'alone_tricks_won', 'model_bid')], logr2_results[, c('correct_call', 'team_ind_tricks_won', 'team_tricks_won', 'alone_tricks_won', 'model_bid')])

Build_metrics_table_rslt_tbl(turn_down_results_two_lms, 'euchre_turn_down.2lm', 'model_bid')

euchre_turn_down.2lm_stats %>%
  kbl() %>%
  kable_styling()
model_name loss_cost macro_f1 accuracy p_euchred_calls p_missed_calls
euchre_turn_down.2lm 0.5064 0.4787872 0.69665 0.2287024 0.2786365

This method also does a good job with classification.


Create dataframe that makes predictions using all 7 of the best algorithms analyzed

The goal is to compare how each of the best algorithms perform and to determine if combining all these models together would yield significantly better results.

# Build logistic regression results
euchre_2lm_prediction1 = euchre_all_test_turn_down
euchre_2lm_prediction1$two_lm_bid = ifelse(predict(euchre_turn_down.lm1, newdata = euchre_all_test_turn_down, type = 'response') < 0.5, 0, 1)
euchre_2lm_prediction2 = euchre_2lm_prediction1[euchre_2lm_prediction1$two_lm_bid == 1,]
euchre_2lm_prediction1 = euchre_2lm_prediction1[euchre_2lm_prediction1$two_lm_bid == 0,]
euchre_2lm_prediction2$two_lm_bid = ifelse(predict(euchre_turn_down.lm2, newdata = euchre_2lm_prediction2, type = 'response') < 0.5, 1, 2)
euchre_test_predictions = rbind(euchre_2lm_prediction1, euchre_2lm_prediction2)

# Build ordinal regression results
euchre_test_predictions$orm_bid = predict(euchre_turn_down.orm, newdata = euchre_test_predictions, type = 'lp')
euchre_test_predictions = euchre_test_predictions %>%
  mutate(orm_bid = case_when(orm_bid<=cutoff_1~0, orm_bid<=cutoff_2~1, TRUE~2))

# Build multinomial results
euchre_test_predictions$multinom_bid = as.numeric(predict(euchre_turn_down.multinom, newdata = euchre_test_predictions))-1

# Build decision tree results
euchre_test_predictions$tree_bid = as.numeric(predict(euchre_turn_down.tree, newdata = euchre_test_predictions, type = 'class'))-1

# Build random forest results
euchre_test_predictions$random_forest_bid = as.numeric(predict(euchre_turn_down.rf, newdata = euchre_test_predictions, type = 'class'))-1

# Build K nearest neighbors results
euchre_test_predictions_w_dummies = dummy_cols(euchre_test_predictions, select_columns = "Broad_Start_Order", remove_selected_columns = TRUE)

euchre_test_predictions_w_dummies = euchre_test_predictions_w_dummies %>% 
  select(-matches('_bid'), -matches('_tricks_won'), -'predicted_call')


cols = colnames(euchre_test_predictions_w_dummies[,-19])
for(i in cols){
  euchre_test_predictions_w_dummies[[i]] = (euchre_test_predictions_w_dummies[[i]] - min(euchre_turn_down_train_w_dummies[[i]]))/(max(euchre_turn_down_train_w_dummies[[i]]) - min(euchre_turn_down_train_w_dummies[[i]]))
}
euchre_test_predictions$knn_bid = as.numeric(knn(train = distinct_euchre_turn_down_train_w_dummies.norm[,-19],
                                      test = euchre_test_predictions_w_dummies[,-19],
                                      cl=distinct_euchre_turn_down_train_w_dummies.norm$correct_call,
                                      k=turn_down_best_k))-1

# Build discriminant analysis results
euchre_test_predictions$da_bid = as.numeric(predict(euchre_turn_down.da, newdata = euchre_test_predictions))-1

# Take average score of all model predictions for an observation and assign value as ensemble bid
euchre_test_predictions$top_bid = round((euchre_test_predictions$two_lm_bid +
                                          euchre_test_predictions$orm_bid +
                                          euchre_test_predictions$multinom_bid +
                                          euchre_test_predictions$tree_bid +
                                          euchre_test_predictions$random_forest_bid + 
                                           euchre_test_predictions$knn_bid + 
                                           euchre_test_predictions$da_bid)/7, 0)


Create results table to see which top algorithms perform the best

model_name loss_cost macro_f1 accuracy p_euchred_calls p_missed_calls
euchre_turn_down.Intuition 0.58200 0.3766271 0.57465 0.1157116 0.4738177
euchre_turn_down.2lm 0.50640 0.4787872 0.69665 0.2287024 0.2786365
euchre_turn_down.orm 0.49445 0.4830315 0.68545 0.2154513 0.2994899
euchre_turn_down.multinom 0.50550 0.4792450 0.69640 0.2261209 0.2821280
euchre_turn_down.tree 0.49855 0.4680213 0.66480 0.1729051 0.3698565
euchre_turn_down.rf 0.52865 0.4675626 0.68890 0.2493902 0.2687500
euchre_turn_down.knn 0.55570 0.3788823 0.66750 0.2443962 0.3034556
euchre_turn_down.da 0.53105 0.4611099 0.68645 0.2433362 0.2798668
euchre_turn_down.Ensemble 0.50475 0.4812781 0.69640 0.2245416 0.2857143

From a glance, it is obvious that every single model above has drastic improvement over the default personal method of bidding trump. The practical tree model that can be used in a real-life Euchre game has the worst accuracy out of all the models, but even that still performs way better than the default method.

The ensemble model that combines all 7 models performs well, but does not stand out. In comparison to each individual model, it has the 3rd best lost cost score, is tied for the 2nd best accuracy, and has the 2nd best F1 score.

The ordinal regression, multinomial regression, and 2-step logistic regression models all perform the best, although they are not very interpretative. Any of these models can be integrated in a Euchre computer application to decide if, what, and how to bid trump.


Model testing on new data

For this next test, a new Euchre dataset will be used to assess the performance of the selected models on new, unseen data. Since this data is likely similar and probably has some matching observations to the training data, the model that performs the best on the training data will be used. The tree model will also be used because it is interpretive.

When comparing the model performances on the new test data, there will be 3 methods tested.

  1. The personal method (as a baseline)

  2. The decision tree model (it is interpretive and can be used in real-life Euchre games)

  3. The orm model (out of the 7 individual models, it has the highest F1 score and the best loss cost. It is also simpler to use.)


Read in new validation data with 50,000 generated hands to test selected models on unseen data

new_euchre_turn_down = read.csv('euchre_make_trump_stats_test.csv')

Prepare and modify target and predictor variables

new_euchre_turn_down$predicted_call = replace(new_euchre_turn_down$predicted_call, new_euchre_turn_down$predicted_call == "Pass", 0)
new_euchre_turn_down$predicted_call = replace(new_euchre_turn_down$predicted_call, new_euchre_turn_down$predicted_call == "Order Trump", 1)
new_euchre_turn_down$predicted_call = replace(new_euchre_turn_down$predicted_call, new_euchre_turn_down$predicted_call == "Order Trump Alone", 2)
new_euchre_turn_down$correct_call = replace(new_euchre_turn_down$correct_call, new_euchre_turn_down$correct_call == "Pass", 0)
new_euchre_turn_down$correct_call = replace(new_euchre_turn_down$correct_call, new_euchre_turn_down$correct_call == "Order Trump", 1)
new_euchre_turn_down$correct_call = replace(new_euchre_turn_down$correct_call, new_euchre_turn_down$correct_call == "Order Trump Alone", 2)

new_euchre_turn_down$predicted_call = as.factor(new_euchre_turn_down$predicted_call)
new_euchre_turn_down$correct_call = as.factor(new_euchre_turn_down$correct_call)

Preprocess the data the same way the training data was

# Convert characters to factors
new_euchre_turn_down$Broad_Start_Order = factor(new_euchre_turn_down$Broad_Start_Order, levels = c('First', 'Middle', 'Last'))

Remove variables that were not used at all in model building

keep_variables = names(euchre_turn_down)
new_euchre_turn_down = new_euchre_turn_down %>%
  select(all_of(keep_variables))


Predict the target variable classes using the two selected models

# tree
new_euchre_turn_down$tree_pred = predict(euchre_turn_down.tree, newdata = new_euchre_turn_down, type = 'class')

# ordinal
new_euchre_turn_down$orm_pred = predict(euchre_turn_down.orm, newdata = new_euchre_turn_down, type = 'lp')
new_euchre_turn_down = new_euchre_turn_down %>%
  mutate(orm_pred = case_when(orm_pred<=cutoff_1~0, orm_pred<=cutoff_2~1, TRUE~2))


Evaluate the performances of the 3 classification methods on the new data

Build_metrics_table_rslt_tbl(new_euchre_turn_down, 'euchre_turn_down.Intuition_test', 'predicted_call')
Build_metrics_table_rslt_tbl(new_euchre_turn_down, 'euchre_turn_down.tree_test', 'tree_pred')
Build_metrics_table_rslt_tbl(new_euchre_turn_down, 'euchre_turn_down.orm_test', 'orm_pred')

euchre_turn_down_validation_stats = bind_rows(euchre_turn_down.Intuition_test_stats,
                                            euchre_turn_down.tree_test_stats,
                                            euchre_turn_down.orm_test_stats)
euchre_turn_down_validation_stats %>%
  kbl() %>%
  kable_styling()
model_name loss_cost macro_f1 accuracy p_euchred_calls p_missed_calls
euchre_turn_down.Intuition_test 0.59072 0.3719740 0.57026 0.1238074 0.4777543
euchre_turn_down.tree_test 0.50578 0.4593965 0.66090 0.1757638 0.3743168
euchre_turn_down.orm_test 0.50120 0.4792387 0.68320 0.2182368 0.3050954

The ordinal regression model yields the best results, whereas the tree lags a little behind. However, the tree still does way better than the personal method. This means that I can win more Euchre games if I make bids using the tree model, instead of my own bidding method. This also means that a computer that makes bids using the ordinal regression model will be very difficult to beat if it gets half-decent cards.


Bidding on a Non-Turned Up Suit Conclusion

Because the new data is similar to the training data, it makes sense that the model results are all similar. Since the data here will not be vastly different compared to what is seen in a real-life Euchre game, it can be concluded that these results are valid. Therefore, these machine learning models are able to be applied in Euchre games.

From this research, it has been determined that machine learning can be used to improve the 2nd part of the bidding process of a Euchre game. This is both at a practical level and at a more complex level. Furthermore, it is shown that making a bid on the best suit in hand relies on a lot of variables, but mainly these 6: the count of trump cards, the strongest card value in hand, the count of distinct non-trump suits in hand, the count of non-trump aces, whether the Right Bower is present in hand, and the order in which a player will be starting the trick-taking round.