The data

The objective of this study is to examine whether a passenger of the Titanic survived or died based only on their Age and Fare. We use the Titanic dataset[1], taking the full original sample available on Kaggle: 891 records and 12 attributes. Some observations have exactly the same values for Age and Fare, so we added a small amount of noise to keep overlapping points distinguishable in the plot.
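The noise step can be sketched as follows. This is a minimal illustration, not the code used for the figure: the helper `add_jitter`, the uniform noise distribution, the `scale` of 0.3, and the three sample passengers are all assumptions made for the example.

```python
import random


def add_jitter(values, scale, seed=0):
    """Return a copy of `values` with uniform noise in [-scale, scale] added.

    `add_jitter` is a hypothetical helper; the noise distribution and the
    scale used for the original figure are not stated in the text.
    """
    rng = random.Random(seed)  # seeded so the plot is reproducible
    return [v + rng.uniform(-scale, scale) for v in values]


# Illustrative values only: three passengers sharing the same Age and Fare,
# which would be drawn on top of each other without jitter.
ages = [22.0, 22.0, 22.0]
fares = [7.25, 7.25, 7.25]

jittered_ages = add_jitter(ages, scale=0.3, seed=1)
jittered_fares = add_jitter(fares, scale=0.3, seed=2)

# The jittered coordinates are what gets plotted; the original values
# should still be used for any numerical analysis.
points = list(zip(jittered_ages, jittered_fares))
```

The noise should stay small relative to the axis range, so that points separate visually without suggesting differences in Age or Fare that are not in the data.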

Although the pattern is not clear-cut, we can observe a tendency toward survival among passengers with a high Fare. Passengers aged 5 or younger also tended to survive, while passengers aged 70 or older tended to die, with some outliers such as the 80-year-old passenger at the right of the figure. Another interesting observation is that one of the three points with Fares above 500 was not a wealthy passenger: she was the personal maid of one of the passengers, who is shown as a star next to her in the figure.

This study was carried out to fulfill a peer assignment task. Considering more variables and using an approach such as logistic regression would likely be a better way to address this problem.

[1] https://www.kaggle.com/c/titanic