Coordinating Instructions
- Available Data
- Key Terms
EDA and Data Cleaning
Exloring Correlations in Data
- Creating Probability Vectors in the Data
Determine Original Odds of Teleportation for Teleported Passengers
- Create Odds Functions to Apply to Existing Passengers
Determine Odds for Existing Passengers
Conclusions

Coordinating Instructions

Background The Spaceship Titanic was an interstellar passenger liner launched a month ago. With almost 13,000 passengers on board, the vessel set out on its maiden voyage transporting emigrants from our solar system to three newly habitable exoplanets orbiting nearby stars. While rounding Alpha Centauri en route to its first destination—the torrid 55 Cancri E—the unwary Spaceship Titanic collided with a spacetime anomaly hidden within a dust cloud. Sadly, it met a similar fate as its namesake from 1000 years before. Though the ship stayed intact, almost half of the passengers were transported to an alternate dimension!

Objective To help rescue crews and retrieve the lost passengers, you are challenged to predict which passengers were transported by the anomaly using records recovered from the spaceship’s damaged computer system. Help save them and change history!

NAs I mostly ignored them, and did not attempt to discover what information the NA data might disclose upon investigation. That would’ve take too long. ### Packages - tidyverse - stringr - scales - knitr

Available Data

All datasets obtained from Kaggle
The train_df dataset provides personal records for about 8700 passengers whose teleportation status is known.
The test_df dataset provides personal records for the remaining ~4300 passengers whose teleportation status is unknown. The key task is to project which of these passengers was likely teleported.

test_df %>% 
  head(n = 10) %>% 
  kable()

PassengerId	HomePlanet	CryoSleep	Cabin	Destination	Age	VIP	RoomService	FoodCourt	ShoppingMall	Spa	VRDeck	Name
0013_01	Earth	TRUE	G/3/S	TRAPPIST-1e	27	FALSE	0	0	0	0	0	Nelly Carsoning
0018_01	Earth	FALSE	F/4/S	TRAPPIST-1e	19	FALSE	0	9	0	2823	0	Lerome Peckers
0019_01	Europa	TRUE	C/0/S	55 Cancri e	31	FALSE	0	0	0	0	0	Sabih Unhearfus
0021_01	Europa	FALSE	C/1/S	TRAPPIST-1e	38	FALSE	0	6652	0	181	585	Meratz Caltilter
0023_01	Earth	FALSE	F/5/S	TRAPPIST-1e	20	FALSE	10	0	635	0	0	Brence Harperez
0027_01	Earth	FALSE	F/7/P	TRAPPIST-1e	31	FALSE	0	1615	263	113	60	Karlen Ricks
0029_01	Europa	TRUE	B/2/P	55 Cancri e	21	FALSE	0	NA	0	0	0	Aldah Ainserfle
0032_01	Europa	TRUE	D/0/S	TRAPPIST-1e	20	FALSE	0	0	0	0	0	Acrabi Pringry
0032_02	Europa	TRUE	D/0/S	55 Cancri e	23	FALSE	0	0	0	0	0	Dhena Pringry
0033_01	Earth	FALSE	F/7/S	55 Cancri e	24	FALSE	0	639	0	0	0	Eliana Delazarson

Key Terms

PassengerId: A unique Id for each passenger. Each Id takes the form gggg_pp where gggg indicates a group the passenger is travelling with and pp is their number within the group. People in a group are often family members, but not always.
HomePlanet: The planet the passenger departed from, typically their planet of permanent residence.
CryoSleep: Indicates whether the passenger elected to be put into suspended animation for the duration of the voyage. Passengers in cryosleep are confined to their cabins.
Cabin: The cabin number where the passenger is staying. Takes the form deck/num/side, where side can be either P for Port or S for Starboard.
Destination: The planet the passenger will be debarking to.
Age: The age of the passenger.
VIP: Whether the passenger has paid for special VIP service during the voyage.
RoomService, FoodCourt, ShoppingMall, Spa, VRDeck: Amount the passenger has billed at each of the Spaceship Titanic’s many luxury amenities.
Name: The first and last names of the passenger.
Transported: Whether the passenger was transported to another dimension. This is the target, the column you are trying to predict.
PassengerId: Id for each passenger in the test set.
Transported: The target. For each passenger, predict either True or False. (I refer to this as “teleported” throughout this project, also I only predict likelihood)

EDA and Data Cleaning

Addressing Problematic NAs

Addresing the NAs in the categories for amounts billed will help with some of the math later on. We address each dataframes separately to keep them uniform

test_df <- test_df %>% 
  mutate(RoomService = replace_na(test_df$RoomService, 0),
         FoodCourt = replace_na(test_df$FoodCourt, 0),
         ShoppingMall = replace_na(test_df$ShoppingMall, 0),
         Spa = replace_na(test_df$Spa, 0),
         VRDeck = replace_na(test_df$VRDeck))

train_df <- train_df %>% 
  mutate(RoomService = replace_na(train_df$RoomService, 0),
         FoodCourt = replace_na(train_df$FoodCourt, 0),
         ShoppingMall = replace_na(train_df$ShoppingMall, 0),
         Spa = replace_na(train_df$Spa, 0),
         VRDeck = replace_na(train_df$VRDeck))

Separating the Group number out of the PassengerId

This operation will allow us to group by group number should we need to later on; just seems like something we’ll need. I ended up forgetting to use this to determine likelihood of teleportation by assigned group number or passenger number. Oops.

# first make a Tibble that just separates the PassengerId into Group# and Passenger#
train_group_numbers <- as.tibble(str_split(string = train_df$PassengerId, pattern ="_", simplify = T))
test_group_numbers <- as.tibble(str_split(string = test_df$PassengerId, pattern ="_", simplify = T))
#change column names
colnames(test_group_numbers)[1] <- "GroupNum"
colnames(train_group_numbers)[1] <- "GroupNum"
colnames(test_group_numbers)[2] <- "PsngrNum"
colnames(train_group_numbers)[2] <- "PsngrNum"
#Add these columns to the original dataframes
test_df <- test_df %>% 
  mutate(GroupNum = test_group_numbers$GroupNum,
         PsngrNum = test_group_numbers$PsngrNum)
train_df <- train_df %>% 
  mutate(GroupNum = train_group_numbers$GroupNum,
         PsngrNum = train_group_numbers$PsngrNum)
# reorder to put the new columns by the PassengerId
test_df <- test_df %>% 
  select(PassengerId, GroupNum, PsngrNum, Name, HomePlanet:VRDeck)
train_df <- train_df %>% 
  select(PassengerId, GroupNum, PsngrNum, Name, Transported, HomePlanet:VRDeck)

Separating the data within the Cabin vector

This operation will parse out deck assignment and ship side assignment from the Cabin vector. Perhaps where the passenger was sleeping (if they were) made a difference in their likelihood of being teleported

# first make a Tibble that just separates the Cabin assignments into Deck/Number/Side
train_cabin_assignments <- as.tibble(str_split(string = train_df$Cabin, pattern = "/", simplify = T))
test_cabin_assignments <- as.tibble(str_split(string = test_df$Cabin, pattern = "/", simplify = T))
#change column names
colnames(test_cabin_assignments)[1] <- "Deck"
colnames(train_cabin_assignments)[1] <- "Deck"
colnames(test_cabin_assignments)[2] <- "CabinNumber"
colnames(train_cabin_assignments)[2] <- "CabinNumber"
colnames(test_cabin_assignments)[3] <- "CabinSide"
colnames(train_cabin_assignments)[3] <- "CabinSide"
#Add these columns to the original dataframes
test_df <- test_df %>% 
  mutate(Deck = test_cabin_assignments$Deck,
         CabinNumber = test_cabin_assignments$CabinNumber,
         CabinSide = test_cabin_assignments$CabinSide)
train_df <- train_df %>% 
  mutate(Deck = train_cabin_assignments$Deck,
         CabinNumber = train_cabin_assignments$CabinNumber,
         CabinSide = train_cabin_assignments$CabinSide)
# reorder to put the new columns by the PassengerId
test_df <- test_df %>% 
  select(PassengerId:Cabin, Deck, CabinNumber, CabinSide, Destination:VRDeck)
train_df <- train_df %>% 
  select(PassengerId:Cabin, Deck, CabinNumber, CabinSide, Destination:VRDeck)

Mutate a total debt column

This opertion totals the spending of each passenger; perhaps big spenders were spared teleportation.

# First replace NAs with 0s
train_df <- train_df %>% 
  mutate(VRDeck = replace_na(VRDeck, 0))
test_df <- test_df %>% 
  mutate(VRDeck = replace_na(VRDeck, 0))
# Try to mutate a total column
train_df <- train_df %>% 
  group_by(PassengerId) %>% 
  mutate(Total_Debt = rowSums(across(RoomService:VRDeck)))
test_df <- test_df %>% 
  group_by(PassengerId) %>% 
  mutate(Total_Debt = rowSums(across(RoomService:VRDeck)))

Exloring Correlations in Data

ggplot(train_df, aes(HomePlanet, fill = Transported))+
  geom_bar(position = "dodge", stat = "count")+ 
stat_count(geom = "label", colour = "white", size = 3.5,
aes(label = ..count..),position = position_dodge(width = .9), show.legend = FALSE)+
  labs(title = "Teleported Status by Home Planet",
       x = "Home Planet",
       y = "Totals")

Majority of Europans and Martians were teleported

ggplot(train_df, aes(Deck, fill = Transported))+
  geom_bar(position = "dodge", stat = "count")+ 
stat_count(geom = "label", colour = "white", size = 2.5,
aes(label = ..count..),position = position_dodge(width = .9), show.legend = FALSE)+
  labs(title = "Teleported Status by Deck Assignment",
       x = "Deck Assignment",
       y = "Totals")

Decks B, C, and G had a majority get teleported

ggplot(train_df, aes(CabinSide, fill = Transported))+
  geom_bar(position = "dodge", stat = "count")+ 
stat_count(geom = "label", colour = "white", size = 3.5,
aes(label = ..count..),position = position_dodge(width = .9), show.legend = FALSE)+
  labs(title = "Teleported Status by Side assignment",
       x = "Side Assignment",
       y = "Totals")

Majority assigned to Starboard side were teleported

ggplot(train_df, aes(CryoSleep, fill = Transported))+
  geom_bar(position = "dodge", stat = "count")+ 
stat_count(geom = "label", colour = "white", size = 3.5,
aes(label = ..count..),position = position_dodge(width = .9), show.legend = FALSE)+
  labs(title = "Teleported Status by Cryosleep Status",
       x = "Cryo Sleep status",
       y = "Totals")

Majority of those in Cryosleep were teleported

This operation will bin ages into groups to make the age variable more manageable

# make a column that bins ages in increments of 10 (0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79)
train_df$AgeBins <- cut(train_df$Age, breaks = c(0, 9, 19, 29, 39, 49, 59, 69, 79),
                        labels = c("0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79"))
test_df$AgeBins <- cut(test_df$Age, breaks = c(0, 9, 19, 29, 39, 49, 59, 69, 79),
                        labels = c("0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79"))
# Then put them back in the order that is easier to read
train_df <- train_df %>% 
  select(PassengerId:Age, AgeBins, VIP:Total_Debt)
test_df <- test_df %>% 
  select(PassengerId:Age, AgeBins, VIP:Total_Debt)

ggplot(train_df, aes(AgeBins, fill = Transported))+
  geom_bar(position = "dodge", stat = "count")+ 
stat_count(geom = "label", colour = "white", size = 3.5,
aes(label = ..count..),position = position_dodge(width = .9), show.legend = FALSE)+
  labs(title = "Teleported Status by Age Range",
       x = "Age Range",
       y = "Totals")

0-19 and 40-49 year olds were most likely to be teleported

ggplot(train_df, aes(Destination, fill = Transported))+
  geom_bar(position = "dodge", stat = "count")+ 
stat_count(geom = "label", colour = "white", size = 3.5,
aes(label = ..count..),position = position_dodge(width = .9), show.legend = FALSE)+
  labs(title = "Teleported Status by Destination",
       x = "Destination",
       y = "Totals")

Majority bound for 55 Cancri e were Teleported

ggplot(train_df, aes(VIP, fill = Transported))+
  geom_bar(position = "dodge", stat = "count")+ 
stat_count(geom = "label", color = "white", size = 3.5,
aes(label = ..count..), position = position_dodge(width = 0.9), show.legend = FALSE)+
  labs(title = "Teleported Status by VIP Status",
       x = "VIP Status",
       y = "Totals")

Slight majority of NON-VIP passengers teleported, majority VIP not

# some weird scaling issues with this chart
train_df %>% 
  group_by(Transported) %>% 
  summarise(Average_Spent = mean(Total_Debt)) %>% 
  mutate(Average_Spent = dollar(Average_Spent, accuracy = 1e-02)) %>% 
  ggplot(aes(Transported, Average_Spent, fill = Transported))+
  geom_col()+ 
  geom_label(aes(label = Average_Spent), color = "black", fill = "white", show.legend = FALSE, nudge_y = -0.1)+
  labs(title = "Teleportation Status by Average Spent",
       x = "Teleportation Status",
       y = "Average Spent")

Those who were teleported spent on average $1100 less than those who were not

Creating Probability Vectors in the Data

Odds of being teleported by CryoSleep

this operation creates a dataframe of multiple vectors specific to CryoSleep

Cryo_Odds <- train_df %>% 
  group_by(CryoSleep) %>% 
  mutate(Cryo_Total = n(),
         Cryo_Transported = sum(Transported == TRUE),
         Cryo_Remained = sum(Transported == FALSE)) %>% 
  ungroup() %>% 
  select(PassengerId, CryoSleep, Transported, Cryo_Total, Cryo_Transported, Cryo_Remained) %>% 
  group_by(CryoSleep) %>% 
  mutate(Cryo_odds = Cryo_Transported / Cryo_Total)
Cryo_Odds %>% 
  head(n = 10) %>% 
  kable()

PassengerId	CryoSleep	Transported	Cryo_Total	Cryo_Transported	Cryo_Remained	Cryo_odds
0001_01	FALSE	FALSE	5439	1789	3650	0.3289208
0002_01	FALSE	TRUE	5439	1789	3650	0.3289208
0003_01	FALSE	FALSE	5439	1789	3650	0.3289208
0003_02	FALSE	FALSE	5439	1789	3650	0.3289208
0004_01	FALSE	TRUE	5439	1789	3650	0.3289208
0005_01	FALSE	TRUE	5439	1789	3650	0.3289208
0006_01	FALSE	TRUE	5439	1789	3650	0.3289208
0006_02	TRUE	TRUE	3037	2483	554	0.8175831
0007_01	FALSE	TRUE	5439	1789	3650	0.3289208
0008_01	TRUE	TRUE	3037	2483	554	0.8175831

# do the same to add the new column to the original
train_df <- train_df %>% 
  group_by(CryoSleep) %>% 
  mutate(Cryo_Total = n(),
         Cryo_Transported = sum(Transported == TRUE),
         Cryo_Remained = sum(Transported == FALSE)) %>% 
  mutate(Cryo_odds = Cryo_Transported / Cryo_Total)

This operation simplifies the cryosleep data

Cryo_Odds_Simplified <- Cryo_Odds %>% 
  ungroup() %>% 
  select(CryoSleep, Cryo_odds) %>% 
  group_by(CryoSleep) %>% 
  summarise(Cryo_odds = mean(Cryo_odds, trim = 3)) %>% 
  mutate(across(Cryo_odds, round, 3))
kable(Cryo_Odds_Simplified, 
      col.names = c("CryoSleep", "Odds"),
      align = "cc")

CryoSleep	Odds
FALSE	0.329
TRUE	0.818
NA	0.488

Visualizing CryoSleep data

ggplot(Cryo_Odds_Simplified, aes(CryoSleep, Cryo_odds))+ 
  geom_col(aes(fill = CryoSleep, show.legend = FALSE)) +
  geom_label(aes(label = Cryo_odds, vjust = 1.5))+
  labs(title = "Odds of Teleportation by Cryosleep Status",
       x = "Cryosleep Status",
       y = "Odds of Teleportation")+
  theme(axis.text.y = element_text(size = 1.5))

Those in cryosleep are more likely to be teleported

Odds of being teleported by HomePlanet

Planet_Odds <- train_df %>% 
  group_by(HomePlanet) %>% 
  mutate(Planet_Total = n(),
         Planet_Transported = sum(Transported == TRUE),
         Planet_Remained = sum(Transported == FALSE)) %>% 
  ungroup() %>% 
  select(PassengerId, HomePlanet, Transported, Planet_Total, Planet_Transported, Planet_Remained) %>% 
  group_by(HomePlanet) %>% 
  mutate(Planet_odds = Planet_Transported / Planet_Total)
Planet_Odds %>% 
  slice_sample(n = 3) %>% 
  kable()

PassengerId	HomePlanet	Transported	Planet_Total	Planet_Transported	Planet_Remained	Planet_odds
4661_02	Earth	FALSE	4602	1951	2651	0.4239461
6225_01	Earth	FALSE	4602	1951	2651	0.4239461
8319_01	Earth	FALSE	4602	1951	2651	0.4239461
7117_01	Europa	TRUE	2131	1404	727	0.6588456
4704_01	Europa	FALSE	2131	1404	727	0.6588456
9085_03	Europa	TRUE	2131	1404	727	0.6588456
5772_01	Mars	FALSE	1759	920	839	0.5230244
6253_01	Mars	FALSE	1759	920	839	0.5230244
4082_01	Mars	FALSE	1759	920	839	0.5230244
4078_01	NA	FALSE	201	103	98	0.5124378
6623_01	NA	TRUE	201	103	98	0.5124378
6954_02	NA	FALSE	201	103	98	0.5124378

This operation simplifies the new data

Planet_Odds_Simplified <- Planet_Odds %>% 
  ungroup() %>% 
  select(HomePlanet, Planet_odds) %>% 
  group_by(HomePlanet) %>% 
  summarise(Planet_odds = mean(Planet_odds, trim = 3)) %>% 
  mutate(across(Planet_odds, round, 3))
kable(Planet_Odds_Simplified, 
      col.names = c("Home Planet", "Odds"),
      align = "cc")

Home Planet	Odds
Earth	0.424
Europa	0.659
Mars	0.523
NA	0.512

# do the same to add the new column to the original
train_df <- train_df %>% 
  group_by(HomePlanet) %>% 
  mutate(Planet_Total = n(),
         Planet_Transported = sum(Transported == TRUE),
         Planet_Remained = sum(Transported == FALSE)) %>% 
  mutate(Planet_odds = Planet_Transported / Planet_Total)

ggplot(Planet_Odds_Simplified, aes(HomePlanet, Planet_odds))+ 
  geom_col(aes(fill = HomePlanet, show.legend = FALSE)) +
  geom_label(aes(label = Planet_odds, vjust = 1.5))+
  labs(title = "Odds of Teleportation by Home Planet",
       x = "Home Planet",
       y = "Odds of Teleportation")+
  theme(axis.text.y = element_text(size = 1.5))

Europans are more likely to be teleported

Odds of Being teleported by Deck assignment

Deck_Odds <- train_df %>% 
  group_by(Deck) %>% 
  mutate(Deck_Total = n(),
         Deck_Transported = sum(Transported == TRUE),
         Deck_Remained = sum(Transported == FALSE)) %>% 
  ungroup() %>% 
  select(PassengerId, Deck, Deck_Total, Transported, Deck_Transported, Deck_Remained) %>% 
  group_by(Deck) %>% 
  mutate(Deck_odds = Deck_Transported / Deck_Total)
Deck_Odds %>% 
  slice_sample(n = 1) %>% 
  kable(align = "ccccccc")

PassengerId	Deck	Deck_Total	Transported	Deck_Transported	Deck_Remained	Deck_odds
6827_02	A	256	TRUE	127	129	0.4960938
6590_01	B	779	FALSE	572	207	0.7342747
1452_04	C	747	FALSE	508	239	0.6800535
1357_01	D	478	TRUE	207	271	0.4330544
1316_02	E	876	FALSE	313	563	0.3573059
4145_01	F	2794	FALSE	1229	1565	0.4398712
1615_03	G	2559	TRUE	1321	1238	0.5162173
1071_01	T	5	FALSE	1	4	0.2000000
0310_01	NA	199	FALSE	100	99	0.5025126

# do the same to add the new column to the original
train_df <- train_df %>%
  group_by(Deck) %>% 
  mutate(Deck_Total = n(),
         Deck_Transported = sum(Transported == TRUE),
         Deck_Remained = sum(Transported == FALSE)) %>% 
  mutate(Deck_odds = Deck_Transported / Deck_Total)

This operation simplifies the data

Deck_Odds_Simplified <- Deck_Odds %>% 
  ungroup() %>% 
  select(Deck, Deck_odds) %>% 
  group_by(Deck) %>% 
  summarise(Deck_odds = mean(Deck_odds, trim = 3)) %>% 
  mutate(across(Deck_odds, round, 3))
kable(Deck_Odds_Simplified,
      align = "cc")

Deck	Deck_odds
A	0.496
B	0.734
C	0.680
D	0.433
E	0.357
F	0.440
G	0.516
T	0.200
NA	0.503

ggplot(Deck_Odds_Simplified, aes(Deck, Deck_odds))+ 
  geom_col(aes(fill = Deck, show.legend = FALSE)) +
  geom_label(aes(label = Deck_odds, vjust = 1.5))+
  labs(title = "Odds of Teleportation by Deck Assignment",
       x = "Deck Assignment",
       y = "Odds of Teleportation")+
  theme(axis.text.y = element_text(size = 1.5))

Those assigned to decks B, C, and G are most likely to be teleported

Odds of being transported by vessel Cabin side assignment

CabinSide_Odds <- train_df %>% 
  group_by(CabinSide) %>% 
  mutate(CabinSide_Total = n(),
         CabinSide_Transported = sum(Transported == TRUE),
         CabinSide_Remained = sum(Transported == FALSE)) %>% 
  ungroup() %>% 
  select(PassengerId, CabinSide, Transported, CabinSide_Total, CabinSide_Transported, CabinSide_Remained) %>% 
  group_by(CabinSide) %>% 
  mutate(CabinSide_odds = CabinSide_Transported / CabinSide_Total)
CabinSide_Odds %>% 
  slice_sample(n = 3) %>% 
  kable(col.names = c("PassengerID", "CabinSide", "Transported", "Total", "Teleported", "Remained", "Odds"),
        align = "ccccccc")

PassengerID	CabinSide	Transported	Total	Teleported	Remained	Odds
4336_03		TRUE	199	100	99	0.5025126
8770_03		FALSE	199	100	99	0.5025126
1718_01		TRUE	199	100	99	0.5025126
7679_01	P	FALSE	4206	1898	2308	0.4512601
9179_02	P	FALSE	4206	1898	2308	0.4512601
6976_01	P	TRUE	4206	1898	2308	0.4512601
0728_01	S	TRUE	4288	2380	1908	0.5550373
3542_01	S	TRUE	4288	2380	1908	0.5550373
0762_01	S	FALSE	4288	2380	1908	0.5550373

train_df <- train_df %>% 
  group_by(CabinSide) %>% 
  mutate(CabinSide_Total = n(),
         CabinSide_Transported = sum(Transported == TRUE),
         CabinSide_Remained = sum(Transported == FALSE)) %>% 
  mutate(CabinSide_odds = CabinSide_Transported / CabinSide_Total)

CabinSide_Odds_Simplified <- CabinSide_Odds %>% 
  ungroup() %>% 
  select(CabinSide, CabinSide_odds) %>% 
  group_by(CabinSide) %>% 
  summarise(CabinSide_odds = mean(CabinSide_odds, trim = 3)) %>% 
  mutate(across(CabinSide_odds, round, 3))
CabinSide_Odds_Simplified[1,1] <- NA
kable(CabinSide_Odds_Simplified,
      col.names = c("CabinSide", "Odds"),
      align = "cc")

CabinSide	Odds
NA	0.503
P	0.451
S	0.555

ggplot(CabinSide_Odds_Simplified, aes(CabinSide, CabinSide_odds))+ 
  geom_col(aes(fill = CabinSide, show.legend = FALSE)) +
  geom_label(aes(label = CabinSide_odds, vjust = 1.5))+
  labs(title = "Odds of Teleportation by Ship Side Assignment",
       x = "Ship Side Assignment",
       y = "Odds of Teleportation")+
  theme(axis.text.y = element_text(size = 1.5))

Passengers assigned to the starboard side are more likely to be teleported

Odds of being teleported by VIP status

VIP_Odds <- train_df %>% 
  group_by(VIP) %>% 
  mutate(VIP_Total = n(),
         VIP_Transported = sum(Transported == TRUE),
         VIP_Remained = sum(Transported == FALSE)) %>% 
  ungroup() %>% 
  select(PassengerId, VIP, Transported, VIP_Total, VIP_Transported, VIP_Remained) %>% 
  group_by(VIP) %>% 
  mutate(VIP_odds = VIP_Transported / VIP_Total)
VIP_Odds %>% 
  slice_sample(n = 3) %>% 
  kable(align = "ccccccc",
        col.names = c("PassengerID", "VIP", "Transported", "Total", "Teleported", "Remained", "Odds"))

PassengerID	VIP	Transported	Total	Teleported	Remained	Odds
6599_01	FALSE	TRUE	8291	4198	4093	0.5063322
2683_01	FALSE	TRUE	8291	4198	4093	0.5063322
8570_01	FALSE	FALSE	8291	4198	4093	0.5063322
1816_01	TRUE	TRUE	199	76	123	0.3819095
1742_01	TRUE	FALSE	199	76	123	0.3819095
4543_01	TRUE	FALSE	199	76	123	0.3819095
8948_01	NA	FALSE	203	104	99	0.5123153
1509_01	NA	FALSE	203	104	99	0.5123153
8710_01	NA	FALSE	203	104	99	0.5123153

train_df <- train_df %>% 
  group_by(VIP) %>% 
  mutate(VIP_Total = n(),
         VIP_Transported = sum(Transported == TRUE),
         VIP_Remained = sum(Transported == FALSE)) %>% 
  mutate(VIP_odds = VIP_Transported / VIP_Total)

VIP_Odds_Simplified <- VIP_Odds %>% 
  ungroup() %>% 
  select(VIP, VIP_odds) %>% 
  group_by(VIP) %>% 
  summarise(VIP_odds = mean(VIP_odds, trim = 3)) %>% 
  mutate(across(VIP_odds, round, 3))
kable(VIP_Odds_Simplified,
      align = "cc")

VIP	VIP_odds
FALSE	0.506
TRUE	0.382
NA	0.512

ggplot(VIP_Odds_Simplified, aes(VIP, VIP_odds))+ 
  geom_col(aes(fill = VIP, show.legend = FALSE)) +
  geom_label(aes(label = VIP_odds, vjust = 1.5))+
  labs(title = "Odds of Teleportation by VIP Status",
       x = "VIP Status",
       y = "Odds of Teleportation")+
  theme(axis.text.y = element_text(size = 1.5))

The peasants are more likely to be teleported, obviously.

Odds of being teleported by AgeBins

AgeBins_Odds <- train_df %>% 
  group_by(AgeBins) %>% 
  mutate(AgeBins_Total = n(),
        AgeBins_Transported = sum(Transported == TRUE),
         AgeBins_Remained = sum(Transported == FALSE)) %>% 
  ungroup() %>% 
  select(PassengerId, Age, AgeBins, Transported, AgeBins_Total, AgeBins_Transported, AgeBins_Remained) %>% 
  group_by(AgeBins) %>% 
  mutate(AgeBins_odds = AgeBins_Transported / AgeBins_Total)
AgeBins_Odds %>% 
  slice_sample(n = 1) %>% 
  kable(align = "cccccccc",
        col.names = c("PassengerID","Age", "AgeBin", "Transported", "Total", "Teleported", "Remained", "Odds"))

PassengerID	Age	AgeBin	Transported	Total	Teleported	Remained	Odds
8226_01	3	0-9	TRUE	501	339	162	0.6766467
0598_01	19	10-19	FALSE	1479	788	691	0.5327924
5758_01	22	20-29	TRUE	2762	1296	1466	0.4692252
4017_02	30	30-39	TRUE	1735	796	939	0.4587896
8032_02	40	40-49	TRUE	1048	531	517	0.5066794
3382_02	56	50-59	TRUE	557	275	282	0.4937163
1238_02	61	60-69	TRUE	208	99	109	0.4759615
3831_01	73	70-79	FALSE	46	20	26	0.4347826
5714_03	0	NA	TRUE	357	234	123	0.6554622

train_df <- train_df %>% 
  group_by(VIP) %>% 
  mutate(AgeBins_Total = n(),
         AgeBins_Transported = sum(Transported == TRUE),
         AgeBins_Remained = sum(Transported == FALSE)) %>% 
  mutate(AgeBins_odds = AgeBins_Transported / AgeBins_Total)

AgeBins_Odds_Simplified <- AgeBins_Odds %>% 
  ungroup() %>% 
  select(AgeBins, AgeBins_odds) %>% 
  group_by(AgeBins) %>% 
  summarise(AgeBins_odds = mean(AgeBins_odds, trim = 3)) %>% 
  mutate(across(AgeBins_odds, round, 3))
kable(AgeBins_Odds_Simplified,
      align = "cc",
      col.names = c("AgeBins", "Odds"))

AgeBins	Odds
0-9	0.677
10-19	0.533
20-29	0.469
30-39	0.459
40-49	0.507
50-59	0.494
60-69	0.476
70-79	0.435
NA	0.655

ggplot(AgeBins_Odds_Simplified, aes(AgeBins, AgeBins_odds))+ 
  geom_col(aes(fill = AgeBins, show.legend = FALSE)) +
  geom_label(aes(label = AgeBins_odds, vjust = 1.5))+
  labs(title = "Odds of Teleportation by Age",
       x = "Age",
       y = "Odds of Teleportation")+
  theme(axis.text.y = element_text(size = 1.5))

Children, teenagers, and the adults who act like them are more likely to be teleported.

Odds of being teleported by Destination

Destination_Odds <- train_df %>% 
  group_by(Destination) %>% 
  mutate(Destination_Total = n(),
        Destination_Transported = sum(Transported == TRUE),
         Destination_Remained = sum(Transported == FALSE)) %>% 
  ungroup() %>% 
  select(PassengerId, Destination, Transported, Destination_Total, Destination_Transported, Destination_Remained) %>% 
  group_by(Destination) %>% 
  mutate(Destination_odds = Destination_Transported / Destination_Total)
Destination_Odds %>% 
  slice_sample(n = 3) %>% 
  kable(align = "ccccccc",
        col.names = c("PassengerID", "Destination", "Transported", "Total", "Teleported", "Remained", "Odds"))

PassengerID	Destination	Transported	Total	Teleported	Remained	Odds
6744_03	55 Cancri e	FALSE	1800	1098	702	0.6100000
8004_01	55 Cancri e	TRUE	1800	1098	702	0.6100000
0221_01	55 Cancri e	FALSE	1800	1098	702	0.6100000
0781_01	PSO J318.5-22	FALSE	796	401	395	0.5037688
6120_01	PSO J318.5-22	TRUE	796	401	395	0.5037688
5108_01	PSO J318.5-22	TRUE	796	401	395	0.5037688
7020_01	TRAPPIST-1e	TRUE	5915	2787	3128	0.4711750
2558_01	TRAPPIST-1e	TRUE	5915	2787	3128	0.4711750
3103_01	TRAPPIST-1e	FALSE	5915	2787	3128	0.4711750
6889_01	NA	FALSE	182	92	90	0.5054945
0504_03	NA	TRUE	182	92	90	0.5054945
5859_01	NA	TRUE	182	92	90	0.5054945

# do the same to add the new column to the original
train_df <- train_df %>% 
  group_by(Destination) %>% 
  mutate(Destination_Total = n(),
         Destination_Transported = sum(Transported == TRUE),
         Destination_Remained = sum(Transported == FALSE)) %>% 
  mutate(Destination_odds = Destination_Transported / Destination_Total)

Destination_Odds_Simplified <- Destination_Odds %>% 
  ungroup() %>% 
  select(Destination, Destination_odds) %>% 
  group_by(Destination) %>% 
  summarise(Destination_odds = mean(Destination_odds, trim = 3)) %>% 
  mutate(across(Destination_odds, round, 3))
kable(Destination_Odds_Simplified,
      align = "cc")

Destination	Destination_odds
55 Cancri e	0.610
PSO J318.5-22	0.504
TRAPPIST-1e	0.471
NA	0.505

ggplot(Destination_Odds_Simplified, aes(Destination, Destination_odds))+ 
  geom_col(aes(fill = Destination, show.legend = FALSE)) +
  geom_label(aes(label = Destination_odds, vjust = 1.5))+
  labs(title = "Odds of Teleportation by Destination",
       x = "Destination",
       y = "Odds of Teleportation")+
  theme(axis.text.y = element_text(size = 1.5))

Passengers bound for 55 Cancri e and PSO J318.5-22 are more likely to be teleported

Odds of being teleported by Total_Debt

This operation bins the total debt figures into manageable chunks

train_df$DebtBins <- cut(train_df$Total_Debt, breaks = c(-1, 1000, 5000, 10000, 20000, 30000, 100000), labels = c("0-1000", "1-5000", "5-10000", "5-20000", "20-30000", "30000+"))

test_df$DebtBins <- cut(test_df$Total_Debt, breaks = c(-1, 1000, 5000, 10000, 20000, 30000, 100000),labels = c("0-1000", "1-5000", "5-10000", "5-20000", "20-30000", "30000+"))

this operation creates a df just for binned debt totals and their odds

DebtBins_Odds <- train_df %>% 
  group_by(DebtBins) %>% 
  mutate(DebtBins_Total = n(),
        DebtBins_Transported = sum(Transported == TRUE),
         DebtBins_Remained = sum(Transported == FALSE)) %>% 
  ungroup() %>% 
  select(PassengerId, DebtBins, Transported, DebtBins_Total, DebtBins_Transported, DebtBins_Remained) %>% 
  group_by(DebtBins) %>% 
  mutate(DebtBins_odds = DebtBins_Transported / DebtBins_Total)
DebtBins_Odds %>% 
  slice_sample(n = 1) %>% 
  kable(align = "ccccccc",
        col.names = c("PassengerID", "DebtBin", "Transported", "Total", "Teleported", "Remained", "Odds"))

PassengerID	DebtBin	Transported	Total	Teleported	Remained	Odds
4933_04	0-1000	TRUE	5860	3520	2340	0.6006826
1561_01	1-5000	FALSE	2196	653	1543	0.2973588
7833_01	5-10000	FALSE	456	150	306	0.3289474
2589_03	5-20000	FALSE	150	48	102	0.3200000
8970_01	20-30000	FALSE	27	5	22	0.1851852
8556_01	30000+	FALSE	4	2	2	0.5000000

train_df <- train_df %>% 
  group_by(DebtBins) %>% 
  mutate(DebtBins_Total = n(),
         DebtBins_Transported = sum(Transported == TRUE),
         DebtBins_Remained = sum(Transported == FALSE)) %>% 
  mutate(DebtBins_odds = DebtBins_Transported / DebtBins_Total)

DebtBins_Odds_Simplified <- DebtBins_Odds %>% 
  ungroup() %>% 
  select(DebtBins, DebtBins_odds) %>% 
  group_by(DebtBins) %>% 
  summarise(DebtBins_odds = mean(DebtBins_odds, trim = 3)) %>% 
  mutate(across(DebtBins_odds, round, 3))
kable(DebtBins_Odds_Simplified,
      align = "cc",
      col.names = c("DebBin", "Odds"))

DebBin	Odds
0-1000	0.601
1-5000	0.297
5-10000	0.329
5-20000	0.320
20-30000	0.185
30000+	0.500

ggplot(DebtBins_Odds_Simplified, aes(DebtBins, DebtBins_odds))+ 
  geom_col(aes(fill = DebtBins, show.legend = FALSE)) +
  geom_label(aes(label = DebtBins_odds, vjust = 1.5))+
  labs(title = "Odds of Teleportation by Debt Amount",
       x = "Debt Amount",
       y = "Odds of Teleportation")+
  theme(axis.text.y = element_text(size = 1.5))

Those who spent the least and those who spent the most were more likely to be teleported

this operation reorganzied the data for consumption

train_df <- train_df %>% 
  select(PassengerId:HomePlanet, Planet_Total:Planet_odds, CryoSleep:Cryo_odds, Cabin:CabinSide, Deck_Total:CabinSide_odds, Destination, Destination_Total:Destination_odds, Age:AgeBins, AgeBins_Total:AgeBins_odds, VIP, VIP_Total:VIP_odds, RoomService:Total_Debt, DebtBins:DebtBins_odds)

Determine Original Odds of Teleportation for Teleported Passengers

This operation gives a snapshot of the range of odds

train_total_odds <- train_df %>%
  group_by(PassengerId) %>% 
  mutate(Odds = (Planet_odds * Cryo_odds * Deck_odds *CabinSide_odds * Destination_odds * AgeBins_odds * VIP_odds *DebtBins_odds)) %>%
  select(PassengerId, Odds, Transported, GroupNum:Name, HomePlanet:DebtBins_odds)
summary(train_total_odds$Odds)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0005464 0.0020066 0.0028990 0.0047252 0.0062640 0.0211128

To actually be Teleported, a passenger has to have at least a 0.00069 chance to be teleported at all

Train_Mean_Min_Max_Odds <- train_total_odds %>% 
  group_by(Transported) %>% 
  summarise(MeanOdds = mean(Odds),
            MaxOdds = max(Odds),
            MinOdds = min(Odds),
            SDOdds = sd(Odds))
kable(Train_Mean_Min_Max_Odds,
      align = "ccccc")

Transported	MeanOdds	MaxOdds	MinOdds	SDOdds
FALSE	0.0027124	0.0139331	0.0005464	0.0017756
TRUE	0.0067090	0.0211128	0.0006870	0.0049713

Create Odds Functions to Apply to Existing Passengers

AgeBins Function this function will assign odds based on a passenger’s age range

age_bins_function <- function(x) {
  case_when(x == "0-9" ~ 0.677,
            x == "10-19" ~ 0.533,
            x == "20-29" ~ 0.469,
            x == "30-39" ~ 0.459,
            x == "40-49" ~ 0.507,
            x == "50-59" ~ 0.494,
            x == "60-69" ~ 0.476,
            x == "70-79" ~ 0.435,
            x == "NA" ~ 0.655)
}

this operation adds the vector created by the function into the test dataframe

test_age_bins <- as.vector(lapply(test_df$AgeBins, age_bins_function))
# add it as a column to test_df
test_df$AgeBinsOdds <- test_age_bins

CabinSide Function

CabinSide_function <- function(x) {
  case_when(x == NA ~ 0.503,
            x == "P" ~ 0.451,
            x == "S" ~ 0.555)
}

test_CabinSide <- as.vector(lapply(test_df$CabinSide, CabinSide_function))
# add it as a column to test_df
test_df$CabinSideOdds <- test_CabinSide
# for some reason, the newly created columns have to be assigned as doubles after being created
test_df <- test_df %>% 
  mutate(CabinSideOdds = as.double(CabinSideOdds),
         AgeBinsOdds = as.double(AgeBinsOdds))

CryoSleep Function

CryoSleep_function <- function(x) {
  case_when(x == "FALSE" ~ 0.329,
            x == "TRUE" ~ 0.451,
            x == NA ~ 0.488)
}

test_CryoSleep <- as.vector(lapply(test_df$CryoSleep, CryoSleep_function))
# add it as a column to test_df
test_df$CryoSleepOdds <- test_CryoSleep
# For some reason all these columns must be transformed back into doubles
test_df <- test_df %>% 
  mutate(CabinSideOdds = as.double(CabinSideOdds),
         AgeBinsOdds = as.double(AgeBinsOdds),
         CryoSleepOdds = as.double(CryoSleepOdds))

DebtBins Function

DebtBins_function <- function(x) {
  case_when(x == "0-1000" ~ 0.601,
            x == "1-5000" ~ 0.297,
            x == "5-10000" ~ 0.329,
            x == "5-2000" ~ 0.320,
            x == "20-30000" ~ 0.185,
            x == "30000+" ~ 0.500)
}

test_DebtBins <- as.vector(lapply(test_df$DebtBins, DebtBins_function))
# add it as a column to test_df
test_df$DebtBinsOdds <- test_DebtBins
# For some reason all these columns must be transformed back into doubles
test_df <- test_df %>% 
  mutate(CabinSideOdds = as.double(CabinSideOdds),
         AgeBinsOdds = as.double(AgeBinsOdds),
         CryoSleepOdds = as.double(CryoSleepOdds),
         DebtBinsOdds = as.double(DebtBinsOdds))

Destination Function

Destination_function <- function(x) {
  case_when(x == "55 Cancri e" ~ 0.610,
            x == "PSO J318.5-22" ~ 0.504,
            x == "TRAPPIST-1e" ~ 0.471,
            x == NA ~ 0.505)
}

test_Destination <- as.vector(lapply(test_df$Destination, Destination_function))
# add it as a column to test_df
test_df$DestinationOdds <- test_Destination
# For some reason all these columns must be transformed back into doubles
test_df <- test_df %>% 
  mutate(CabinSideOdds = as.double(CabinSideOdds),
         AgeBinsOdds = as.double(AgeBinsOdds),
         CryoSleepOdds = as.double(CryoSleepOdds),
         DebtBinsOdds = as.double(DebtBinsOdds),
         DestinationOdds = as.double(DestinationOdds))

HomePlanet Function

HomePlanet_function <- function(x) {
  case_when(x == "Earth" ~ 0.424,
            x == "Europa" ~ 0.659,
            x == "Mars" ~ 0.523,
            x == NA ~ 0.512)
}

test_HomePlanet <- as.vector(lapply(test_df$HomePlanet, HomePlanet_function))
# add it as a column to test_df
test_df$HomePlanetOdds <- test_HomePlanet
# For some reason all these columns must be transformed back into doubles
test_df <- test_df %>% 
  mutate(CabinSideOdds = as.double(CabinSideOdds),
         AgeBinsOdds = as.double(AgeBinsOdds),
         CryoSleepOdds = as.double(CryoSleepOdds),
         DebtBinsOdds = as.double(DebtBinsOdds),
         DestinationOdds = as.double(DestinationOdds),
         HomePlanetOdds = as.double(HomePlanetOdds))

Deck Function

Deck_function <- function(x) {
  case_when(x == "A" ~ 0.496,
            x == "B" ~ 0.734,
            x == "C" ~ 0.680,
            x == "D" ~ 0.433,
            x == "E" ~ 0.357,
            x == "F" ~ 0.440,
            x == "G" ~ 0.516,
            x == "T" ~ 0.200,
            x == NA ~ 0.503)
}

test_Deck <- as.vector(lapply(test_df$Deck, Deck_function))
# add it as a column to test_df
test_df$DeckOdds <- test_Deck
# For some reason all these columns must be transformed back into doubles
test_df <- test_df %>% 
  mutate(CabinSideOdds = as.double(CabinSideOdds),
         AgeBinsOdds = as.double(AgeBinsOdds),
         CryoSleepOdds = as.double(CryoSleepOdds),
         DebtBinsOdds = as.double(DebtBinsOdds),
         DestinationOdds = as.double(DestinationOdds),
         HomePlanetOdds = as.double(HomePlanetOdds),
         DeckOdds = as.double(DeckOdds))

VIP function

VIP_function <- function(x) {
  case_when(x == FALSE ~ 0.506,
            x == TRUE ~ 0.382,
            x == NA ~ 0.512)
}

test_VIP <- as.vector(lapply(test_df$VIP, VIP_function))
# add it as a column to test_df
test_df$VIPOdds <- unlist(test_VIP)
# For some reason all these columns must be transformed back into doubles
test_df <- test_df %>% 
  mutate(CabinSideOdds = as.double(CabinSideOdds),
         AgeBinsOdds = as.double(AgeBinsOdds),
         CryoSleepOdds = as.double(CryoSleepOdds),
         DebtBinsOdds = as.double(DebtBinsOdds),
         DestinationOdds = as.double(DestinationOdds),
         HomePlanetOdds = as.double(HomePlanetOdds),
         DeckOdds = as.double(DeckOdds),
         VIPOdds = as.double(VIPOdds))

# this operation reorganized the data in the test df
test_df <- test_df %>% 
  select(PassengerId:HomePlanet, HomePlanetOdds, CryoSleep, CryoSleepOdds, Cabin, CabinNumber, Deck, DeckOdds, CabinSide, CabinSideOdds, Destination, DestinationOdds, Age:AgeBins, AgeBinsOdds, VIP, VIPOdds, RoomService:DebtBins, DebtBinsOdds)

Determine Odds for Existing Passengers

To be teleported at all, a passenger’s odds must be at least 0.0007

test_total_odds <- test_df %>%
  ungroup() %>% 
  group_by(PassengerId) %>% 
  mutate(Odds = (HomePlanetOdds * CryoSleepOdds * DeckOdds *CabinSideOdds * DestinationOdds * AgeBinsOdds * VIPOdds * DebtBinsOdds)) 
summary(test_total_odds$Odds)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0007  0.0017  0.0027  0.0032  0.0039  0.0120     659

Determining how many of those between 0.0005 - 0.001 were actually teleported

Using the train_df data to determine what percentage of those with particular odds were actually teleported. This figure will be applied to those from the test df

train_total_odds %>% 
  filter(Odds >= 0.0005 & Odds <= 0.001) %>% 
  ggplot(aes(x = Transported, fill = Transported))+
  geom_bar(stat = "count")+
  stat_count(geom = "label", color = "white", size = 3.5,
aes(label = ..count..), position = position_dodge(width = 0.9), show.legend = FALSE)+
  labs(title = "Total Teleported with Odds between 0.0005 and 0.001",
       x = "Teleportation Status",
       y = "Total Teleported")+
  theme(axis.text.y = element_text(size = 1.5))

Approximatley 21% chance of teleportation

Determining how many of those between 0.001- 0.005 were actually teleported

train_total_odds %>% 
  filter(Odds > 0.001 & Odds <= 0.005) %>% 
  ggplot(aes(x = Transported, fill = Transported))+
  geom_bar(stat = "count")+
  stat_count(geom = "label", color = "white", size = 3.5,
aes(label = ..count..), position = position_dodge(width = 0.9), show.legend = FALSE)+
  labs(title = "Total Teleported with Odds between 0.001 and 0.005",
       x = "Teleportation Status",
       y = "Total Teleported")+
  theme(axis.text.y = element_text(size = 1.5))

Approximately 34% chance of teleportation

Determining how many of those between 0.005- 0.01 were actually teleported

train_total_odds %>% 
  filter(Odds > 0.005 & Odds <= 0.01) %>% 
  ggplot(aes(x = Transported, fill = Transported))+
  geom_bar(stat = "count")+
  stat_count(geom = "label", color = "white", size = 3.5,
aes(label = ..count..), position = position_dodge(width = 0.9), show.legend = FALSE)+
  labs(title = "Total Teleported with Odds between 0.005 and 0.01",
       x = "Teleportation Status",
       y = "Total Teleported")+
  theme(axis.text.y = element_text(size = 1.5))

Approximately 76% chance of teleportation

Determine how many of those between 0.01- 0.05 were actually teleported

train_total_odds %>% 
  filter(Odds > 0.01 & Odds <= 0.05) %>% 
  ggplot(aes(x = Transported, fill = Transported))+
  geom_bar(stat = "count")+
  stat_count(geom = "label", color = "white", size = 3.5,
aes(label = ..count..), position = position_dodge(width = 0.9), show.legend = FALSE)+
  labs(title = "Total Teleported with Odds between 0.01 and 0.05",
       x = "Teleportation Status",
       y = "Total Teleported")+
  theme(axis.text.y = element_text(size = 1.5))

Approximately 99% chance of teleportation

Conclusions

This operation creates a function that will give a vector of liklihoods based on a passenger’s overall odds of teleportation

likelihood_function <- function(x) {
  case_when((x >- 0.0005 & x <= 0.001) ~ 21,
            (x > 0.001 & x <= 0.005) ~ 34,
            (x > 0.005 & x <= 0.01) ~ 76,
            (x > 0.01 & x <= 0.05) ~ 99)
}

test_Estimates <- as.vector(lapply(test_total_odds$Odds, likelihood_function))
# add it as a column to test_df
test_total_odds$Percent_Likelihood <- test_Estimates
# reorder columns to make checking easier
test_total_odds <- test_total_odds %>% 
  select(PassengerId, Odds, Percent_Likelihood, GroupNum:DebtBinsOdds)
# For some reason all these columns must be transformed back into doubles
test_total_odds <- test_total_odds %>% 
  mutate(Percent_Likelihood = as.double(Percent_Likelihood))

This operation simplifies for consumption

FINAL <- test_total_odds

passenger_likelihood <- FINAL %>% 
  select(PassengerId, Name, Percent_Likelihood)

passenger_likelihood %>% 
  slice_sample(n = 10) %>% 
  head(n = 12) %>% 
  kable(align = "ccc")

PassengerId	Name	Percent_Likelihood
0013_01	Nelly Carsoning	34
0018_01	Lerome Peckers	34
0019_01	Sabih Unhearfus	76
0021_01	Meratz Caltilter	34
0023_01	Brence Harperez	34
0027_01	Karlen Ricks	21
0029_01	Aldah Ainserfle	76
0032_01	Acrabi Pringry	34
0032_02	Dhena Pringry	76
0033_01	Eliana Delazarson	34
0037_01	Vivia Rickson	34
0040_01	Antino Pinoffent	34

sample visualiztion of simplified data

passenger_likelihood %>% 
  mutate(Percent_Likelihood = as.factor(Percent_Likelihood)) %>% 
ggplot(aes(Percent_Likelihood, fill = Percent_Likelihood))+
  geom_bar()+
  stat_count(geom = "text", color = "light blue", size = 3.5, aes(label = ..count..), position = position_stack(vjust = 0.5))+
  labs(title = "Total Possible Teleported Passengers by Likelihood",
       x = "Percent Likelihood",
       y = "Total Possible Teleported")+
  theme(axis.text.y = element_text(size = 1.5))

Spaceship Titanic

MAJ Owens

2022-06-26