Survivor
Background on Survivor
Survivor is a television show that has been around for decades. Each season, 16 to 20 contestants battle to outwit, outlast, and outplay the others and claim the million-dollar prize by becoming the Sole Survivor. The contestants face the elements every day out in the wild. They have to build a shelter in which they will stay for the duration of the competition. They also have to ration their food, which is usually rice, and their water. The contestants also compete in challenges: tribal challenges, group challenges, and individual challenges. The rewards for the different challenges can be comfort, food, experiences, or immunity from being voted out. It can be a brutal time out in the wild for many contestants, but those who can outwit, outlast, and outplay will find themselves very close to winning a million dollars.
I am interested in looking at data from the last 42 seasons to better understand how to become a Sole Survivor. I want insight into the characteristics that Sole Survivors share, and into what they tend to have in common during the game.
I have multiple data sets that I will analyze. The Castaways data set contains season and demographic information about each castaway. The Castaways details data set contains unique information for each castaway. The challenge results data contains immunity and reward challenge results. The hidden idol data contains the history of hidden immunity idols, including who found them, on which day they were found, and on which day they were played. The idol number increments for each idol the castaway finds during the game. Lastly, the season summary data contains summary details of each season of Survivor, including the winner, runners-up, and location.
Data Dictionary
Some of the more important variables to know:
| Variable | Description |
|---|---|
| Gender | Male or Female |
| Age | Age of the castaway (contestant) |
| Season | Season of Survivor (1-42) |
| Total_Votes_received | Number of votes cast against the castaway |
| Immunity_idols_won | Number of immunity idols won |
| Sole_Survivor | Dummy variable indicating whether the castaway won their season |
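A dummy variable like Sole_Survivor can be derived from a text column. Here is a minimal sketch, assuming the castaways data has a `result` column holding values such as "Sole Survivor" (as in the winners table later in this report). The toy rows and the castaway names are made up for illustration.

```r
# Hypothetical toy rows; the real data has one row per castaway per season
castaways <- data.frame(
  castaway = c("A", "B", "C"),
  result = c("Sole Survivor", "Runner-up", NA),
  stringsAsFactors = FALSE
)

# %in% returns FALSE (not NA) when result is missing,
# so a castaway with no recorded result is coded as 0
castaways$Sole_Survivor <- as.integer(castaways$result %in% "Sole Survivor")
castaways
```

Coding a missing result as 0 is a choice, not a requirement. Using `==` instead of `%in%` would keep the missing value as NA.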
Summary Statistics
What is the average age of a contestant on Survivor?
## [1] 33.41026
What is the age range of contestants that have been on Survivor?
## [1] 18 75
This tells us the youngest contestant was 18 and the oldest was 75. Wow, 75! Good for them!
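The two numbers above come from a mean and a range over the age column. Here is a small sketch of that calculation on made-up data, assuming a data frame named `castaways` with an `age` column (both names are assumptions here).

```r
# Hypothetical toy data, not the real castaways table
castaways <- data.frame(
  castaway = c("A", "B", "C", "D"),
  age = c(18, 27, 41, NA)
)

# na.rm = TRUE drops missing ages instead of returning NA
mean_age <- mean(castaways$age, na.rm = TRUE)
age_range <- range(castaways$age, na.rm = TRUE)

mean_age
age_range
```

If the table has one row per castaway per season, anyone who played more than once is counted once per appearance, which may pull the average slightly.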
In which countries has Survivor been played, and how many times?
| country | Total Number of Seasons |
|---|---|
| Australia | 1 |
| Brazil | 2 |
| Cambodia | 2 |
| China | 1 |
| Fiji | 11 |
| Gabon | 1 |
| Guatemala | 1 |
| Islands | 1 |
| Kenya | 1 |
| Malaysia | 1 |
| Nicaragua | 6 |
| Palau | 2 |
| Panama | 3 |
| Philippines | 4 |
| Polynesia | 1 |
| Samoa | 2 |
| Thailand | 1 |
| Vanuatu | 1 |
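A table like this one can be built by counting rows per country. Here is a rough sketch in base R, assuming a `season_summary` data frame with one row per season and a `country` column. The five rows below are toy values, not the real season-to-country mapping.

```r
# Hypothetical toy rows; not the real season-to-country mapping
season_summary <- data.frame(
  season = 1:5,
  country = c("Fiji", "Panama", "Fiji", "Samoa", "Fiji"),
  stringsAsFactors = FALSE
)

# Count seasons per country, then sort from most to fewest
by_country <- as.data.frame(
  table(country = season_summary$country),
  responseName = "total_seasons"
)
by_country <- by_country[order(-by_country$total_seasons), ]
by_country
```

As a quick sanity check on the real table above, the counts add up to 42, one per season.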
Descriptive Analysis
What is the age of each Sole Survivor over the last 42 seasons? (Winners)
From this chart, we can see how the winners' ages differ from season to season. It looks like the winner of season 17 was pretty old. Good for them!
How many male vs female Sole Survivors have there been?
There are a lot more male winners than female winners. There is also an NA category because the data includes Season 42, which is currently airing, so no winner is available for that season yet.
What does the split between female and male winners look like over the years?
I am curious to see if there is any span of years where males or females consistently won.
| season | result | gender |
|---|---|---|
| 1 | Sole Survivor | Male |
| 2 | Sole Survivor | Female |
| 3 | Sole Survivor | Male |
| 4 | Sole Survivor | Female |
| 5 | Sole Survivor | Male |
| 6 | Sole Survivor | Female |
| 7 | Sole Survivor | Female |
| 8 | Sole Survivor | NA |
| 9 | Sole Survivor | Male |
| 10 | Sole Survivor | Male |
| 11 | Sole Survivor | Female |
| 12 | Sole Survivor | Male |
| 13 | Sole Survivor | Male |
| 14 | Sole Survivor | Male |
| 15 | Sole Survivor | Male |
| 16 | Sole Survivor | Female |
| 17 | Sole Survivor | Male |
| 18 | Sole Survivor | Male |
| 19 | Sole Survivor | Female |
| 20 | Sole Survivor | Female |
| 21 | Sole Survivor | Male |
| 22 | Sole Survivor | Male |
| 23 | Sole Survivor | Female |
| 24 | Sole Survivor | NA |
| 25 | Sole Survivor | Female |
| 26 | Sole Survivor | Male |
| 27 | Sole Survivor | Male |
| 28 | Sole Survivor | Male |
| 29 | Sole Survivor | Female |
| 30 | Sole Survivor | Male |
| 31 | Sole Survivor | Male |
| 32 | Sole Survivor | Female |
| 33 | Sole Survivor | Male |
| 34 | Sole Survivor | Female |
| 35 | Sole Survivor | Male |
| 36 | Sole Survivor | Male |
| 37 | Sole Survivor | Male |
| 38 | Sole Survivor | Male |
| 39 | Sole Survivor | Male |
| 40 | Sole Survivor | Male |
| 41 | Sole Survivor | Female |
This table shows whether the Sole Survivor was male or female each season. We can see that from season 35 through season 40 the winners were consistently male.
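Instead of scanning the table by eye, the streaks can be found with run-length encoding. The sketch below transcribes the gender column from the table above (seasons 1 to 41) and uses base R's `rle()`. The variable names are my own.

```r
# Winner gender by season, transcribed from the table above (seasons 1 to 41)
gender <- c(
  "Male", "Female", "Male", "Female", "Male", "Female", "Female", NA, "Male", "Male",
  "Female", "Male", "Male", "Male", "Male", "Female", "Male", "Male", "Female", "Female",
  "Male", "Male", "Female", NA, "Female", "Male", "Male", "Male", "Female", "Male",
  "Male", "Female", "Male", "Female", "Male", "Male", "Male", "Male", "Male", "Male",
  "Female"
)

# Tally winners by gender, keeping the NA entries visible
counts <- table(gender, useNA = "ifany")

# Run-length encoding: each run is a streak of consecutive identical values
runs <- rle(gender)
ends <- cumsum(runs$lengths)          # last season of each streak
starts <- ends - runs$lengths + 1     # first season of each streak

longest <- which.max(runs$lengths)
data.frame(
  gender = runs$values[longest],
  first_season = starts[longest],
  last_season = ends[longest],
  length = runs$lengths[longest]
)
```

On this table the tally is 25 male winners, 14 female winners, and 2 NA, and the longest streak is six male winners in a row from season 35 to season 40.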
Do most Sole Survivors win immunity idols during their season?
Wow! So it seems really important to win immunity idols during your season, because all of the winners have won at least one and a lot seem to have won 3 or more.
Where do most Sole Survivors tend to be from?
| state | result |
|---|---|
| California | 8 |
| New Jersey | 4 |
| New York | 3 |
| Pennsylvania | 3 |
| Iowa | 2 |
| Massachusetts | 2 |
| Texas | 2 |
| Utah | 2 |
| Alabama | 1 |
| Arkansas | 1 |
| D.C. | 1 |
| Florida | 1 |
| Idaho | 1 |
| Kansas | 1 |
| Kentucky | 1 |
| Maine | 1 |
| North Carolina | 1 |
| Ohio | 1 |
| Ontario | 1 |
| Rhode Island | 1 |
| South Carolina | 1 |
| Tennessee | 1 |
| Washington | 1 |
It looks like the most Sole Survivors have come from California with 8, then New Jersey with 4, followed by New York and Pennsylvania with 3 each.
Twitter Scraping
We know from above that immunity idols are common among the Sole Survivors. Also, as a Survivor fan, I know that idols are a big deal, so I am going to see whether people are already talking about immunity idols on Twitter for the season that is currently airing.
I scraped 500 tweets from Twitter that had the #Survivor hashtag for this analysis. Of the 500 I scraped, I wanted to find out what percent mentioned “immunity” or “idol” in the tweet.
((Survivor_Tweets %>%
  select(text) %>%
  filter(str_detect(text, '\\b(immunity|idol)\\b')) %>%
  summarise(text = n())) / 500) * 100
##   text
## 1  2.2
Only 2.2% of the 500 tweets I collected that use the Survivor hashtag mention immunity or idol. Interesting. Maybe it is too soon in the season for people to be really amped about them.
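One thing worth checking is how strict the pattern is. As written, it is case-sensitive and the trailing word boundary means "idols" does not count as "idol". The sketch below compares it with a looser pattern on a few made-up tweets. The tweets and the looser pattern are my own illustration, not part of the scraped sample.

```r
library(stringr)

# Hypothetical tweets, made up for illustration only
tweets <- c(
  "Can't believe she played her Idol at tribal! #Survivor",
  "Two idols in one episode #Survivor",
  "that immunity challenge was brutal #Survivor",
  "What a blindside tonight #Survivor"
)

# Same pattern as above: case-sensitive, singular "idol" only
strict <- str_detect(tweets, "\\b(immunity|idol)\\b")

# Looser variant: ignores case and also accepts "idols"
loose <- str_detect(tweets, regex("\\b(immunity|idols?)\\b", ignore_case = TRUE))

mean(strict) * 100  # percent matched by the strict pattern
mean(loose) * 100   # percent matched by the looser pattern

# 2.2% of 500 tweets corresponds to this many matching tweets
matching_tweets <- round(0.022 * 500)
matching_tweets
```

On these four toy tweets the strict pattern matches one and the looser pattern matches three, so the 2.2% above (11 of 500 tweets) may undercount idol talk.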
Decision Tree
The top of the tree shows the proportion of castaways who won: 5% of all castaways. 88% of castaways have 3 or more votes against them, with a winning probability of 3%. The other 12% have fewer than 3 votes against them, with a winning probability of 21%. If the number of immunity idols found is greater than 4, the winning probability is 38%.
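The first split can be checked against the root of the tree. Weighting each branch's winning probability by its share of castaways should give back the overall win rate of about 5%. The variable names below are my own. The numbers are the ones quoted above.

```r
# Shares and winning probabilities read off the first split of the tree
share_high_votes <- 0.88  # castaways with 3 or more votes against them
p_win_high_votes <- 0.03
share_low_votes <- 0.12   # castaways with fewer than 3 votes against them
p_win_low_votes <- 0.21

# Law of total probability: weighted average of the two branches
overall_win_rate <- share_high_votes * p_win_high_votes +
  share_low_votes * p_win_low_votes

round(overall_win_rate, 3)
```

This works out to 0.0264 + 0.0252 = 0.0516, which rounds to the 5% shown at the top of the tree.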
Conclusion
If you are older than 28 but younger than 46, win 4 or more immunity idols, and receive fewer than 3 votes against you, you have a winning probability of 62%.
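For anyone who wants to try a tree like this themselves, here is a rough sketch using the rpart package. This is an assumption on my part about tooling, and the data below is simulated, so the splits and probabilities it produces will not match the ones reported above. The column names and the rule used to simulate winners are hypothetical.

```r
library(rpart)  # a recommended package that ships with standard R installs

set.seed(42)
n <- 500

# Simulated castaways; none of these values come from the real data
castaways <- data.frame(
  age = sample(18:75, n, replace = TRUE),
  total_votes_received = rpois(n, 5),
  immunity_idols_won = rpois(n, 1)
)

# Hypothetical rule so the tree has some signal to find
p_win <- ifelse(castaways$total_votes_received < 3, 0.25, 0.03)
castaways$sole_survivor <- factor(rbinom(n, 1, p_win), levels = c(0, 1))

# Classification tree predicting whether a castaway wins
fit <- rpart(
  sole_survivor ~ age + total_votes_received + immunity_idols_won,
  data = castaways,
  method = "class"
)

# Predicted class probabilities for one hypothetical castaway
new_castaway <- data.frame(
  age = 30,
  total_votes_received = 2,
  immunity_idols_won = 4
)
win_prob <- predict(fit, new_castaway, type = "prob")
win_prob
```

The column labeled "1" in the result is the predicted winning probability for that castaway, which is how a statement like the one in the conclusion can be read off a fitted tree.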
Looks like I’ll wait a few years before I re-apply! LOL.
Note: If you want to explore Survivor data further, there is an R package for it!