Titanic Kaggle competition

Pablo Adames
April 6, 2020

Kaggle Kernels

The work was done in a Kaggle kernel running a Jupyter notebook.

  • Created this tutorial as a kernel in my Kaggle account

  • Kaggle uses Docker containers to sandbox notebooks

    • 2 (fast) CPUs
    • A free NVIDIA Tesla P100 attached to your notebook (a data-centre GPU worth about US$7,000)
    • A free TPU v3-8 attached to your notebook
  • I/O

    • input is read from “../input/”
    • output is written to the current folder (accessible after committing a new version; see the R sketch after this list)
  • Submissions (still done from the command line)

  • Installing packages (enable Internet in kernel settings)
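
For example, an R kernel reads the competition data from that input folder and writes outputs to the working directory. A minimal sketch (the titanic subfolder name is an assumption):

# Competition data is mounted read-only under ../input/
train <- read.csv("../input/titanic/train.csv", stringsAsFactors = FALSE)
test  <- read.csv("../input/titanic/test.csv",  stringsAsFactors = FALSE)

# Files written to the working directory become downloadable kernel output
write.csv(data.frame(PassengerId = test$PassengerId, Survived = 0),
          "submission.csv", row.names = FALSE)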

Data

The rules:

  • A fixed proportion of the data goes into the train and test sets
  • A submission says nothing about how the predictions were made; it must simply contain one for every test passenger
  • This forces you to impute missing values in the test set
  • 11 submissions are allowed per 24-hour period

Data Exploration

| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.2500 | | S |
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.9250 | | S |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.0500 | | S |
| 6 | 0 | 3 | Moran, Mr. James | male | NA | 0 | 0 | 330877 | 8.4583 | | Q |
| 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2 | 3 | 1 | 349909 | 21.0750 | | S |
| 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27 | 0 | 2 | 347742 | 11.1333 | | S |
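
A preview like the table above can be reproduced from the first rows of the training data (assuming the train data frame read earlier):

# Structure and first nine passengers of the training set
str(train)
head(train, 9)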

Continuous variable distribution

[Figure: distributions of the continuous variables]
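
A sketch of how such a plot could be generated, assuming Age and Fare are the continuous variables shown:

# Histograms of the continuous variables; hist() drops NA values
par(mfrow = c(1, 2))
hist(train$Age,  main = "Age",  xlab = "Years")
hist(train$Fare, main = "Fare", xlab = "Fare")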

Categorical variables

  • Sex for gender (two levels; encoding sketched after this list):

    • Female (0)
    • Male (1)
  • Pclass for the passenger category (three levels):

    • 1st
    • 2nd
    • 3rd
  • Survived

    • Perished (0): 196 (59.6%)
    • Survived (1): 133 (40.4%)
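
A minimal sketch of this encoding, assuming the raw character columns read from train.csv:

# Encode Sex as 0/1 and Pclass as a three-level factor
train$Sex    <- ifelse(train$Sex == "female", 0, 1)
train$Pclass <- factor(train$Pclass, levels = c(1, 2, 3))

# Class balance of the outcome, as counts and percentages
table(train$Survived)
round(100 * prop.table(table(train$Survived)), 1)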

Irrelevant variables

  • PassengerId
  • Name (as entered)
  • Ticket

Pre-processing

  • Only numerical variables
    • centering
    • scaling
    • normalization
    • imputation

The same procedure is applied to both the training and test sets.

Imputation is done with KNN, averaging the k = 10 nearest-neighbour vectors.
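
A minimal sketch of this pipeline, assuming caret's preProcess (num_cols is a hypothetical vector naming the numeric predictor columns):

library(caret)

# Learn centering, scaling, and KNN imputation (k = 10) on the training set only
pre <- preProcess(train[, num_cols],
                  method = c("center", "scale", "knnImpute"),
                  k = 10)

# Apply the same fitted transform to both sets
train_pp <- predict(pre, train[, num_cols])
test_pp  <- predict(pre, test[, num_cols])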

Models

Model families commonly used for binary classification problems like this one in Kaggle competitions:

  • trees (T)
  • Bayesian predictors (B)
  • support vector machines (S)
  • generalized linear models (G)
  • their ensembles

The seven models fitted here (family codes in parentheses; see the caret sketch after this list):

  1. Logistic model trees (G+T)
  2. Bayesian generalized linear model (B+G)
  3. Generalized linear model (G)
  4. XGBoost (T + gradient boosting)
  5. Random forest (T + bagging)
  6. SVM with linear kernel (S)
  7. Least-squares SVM with radial basis function kernel (S + least squares)
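
A sketch of how one of these models could be fitted, assuming caret with 10-fold cross-validation (the resampling scheme is an assumption; the method names come from caret's model list):

library(caret)

# caret needs a factor outcome for classification; train_pp and test_pp
# come from the pre-processing sketch above
train_pp$Survived <- factor(train$Survived, labels = c("Perished", "Survived"))

ctrl <- trainControl(method = "cv", number = 10)

# Other caret methods for the list above: "bayesglm", "glm", "xgbTree",
# "rf", "svmLinear", "lssvmRadial"
fit  <- train(Survived ~ ., data = train_pp, method = "LMT", trControl = ctrl)
pred <- predict(fit, newdata = test_pp)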

Results

| # | Model | Score |
|---|-------|-------|
| 1 | Logistic model trees | 0.79425 |
| 4 | XGBoost | 0.77990 |
| 5 | Random forest | 0.77990 |
| 7 | Least-squares SVM, RBF kernel | 0.77511 |
| 6 | SVM with linear kernel | 0.76555 |
| 2 | Bayesian generalized linear model | 0.76076 |
| 3 | Generalized linear model | 0.76076 |

(The # column refers to the model numbering above.)

Downloading results from the kernel

[Figure: downloading the result files from the kernel]

Submitting from the Kaggle command line

$ kaggle competitions submit -c titanic -f results/glm_default.csv -m "typo fixed"
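
Past submissions and their scores can then be listed with the same CLI:

$ kaggle competitions submissions -c titanic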

Conclusions

  • Fun
  • An excellent way to learn how to compete in real Kaggle events
  • Very hard to score high
  • Feature engineering and hyperparameter optimization are the natural next steps
  • The ensemble of trees and logistic regression (logistic model trees) was the best model