Report: Sweetviz and Pandas Profiling

Pandas Profiling

Titanic Dataset

Overview

Dataset statistics

Number of variables: 12
Number of observations: 891
Missing cells: 866
Missing cells (%): 8.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 83.7 KiB
Average record size in memory: 96.1 B
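
The overview figures can be cross-checked against the per-variable counts reported further down (Age 177, Cabin 687, and Embarked 2 missing values). The following is a minimal sketch of that arithmetic, not the profiler's own code; the variable names are illustrative.

```python
# Figures copied from the report: overview counts and per-variable missing counts.
n_variables = 12
n_observations = 891
missing_by_variable = {"Age": 177, "Cabin": 687, "Embarked": 2}

total_cells = n_variables * n_observations
missing_cells = sum(missing_by_variable.values())
missing_pct = 100 * missing_cells / total_cells

# 83.7 KiB is itself a rounded figure, so the per-record size derived from it
# only approximately matches the 96.1 B shown in the report.
avg_record_bytes = 83.7 * 1024 / n_observations

print(missing_cells)
print(round(missing_pct, 1))
print(round(avg_record_bytes, 1))
```

The missing-cell percentage is taken over all cells (variables times observations), not over rows.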

Variable types

Numeric: 5
Categorical: 7

Alerts

Name has a high cardinality: 891 distinct values (High cardinality)
Ticket has a high cardinality: 681 distinct values (High cardinality)
Cabin has a high cardinality: 147 distinct values (High cardinality)
Pclass is highly correlated with Fare and 1 other field (High correlation)
Fare is highly correlated with Pclass (High correlation)
Survived is highly correlated with Sex (High correlation)
Sex is highly correlated with Survived (High correlation)
SibSp is highly correlated with Parch (High correlation)
Parch is highly correlated with SibSp (High correlation)
Embarked is highly correlated with Pclass (High correlation)
Age has 177 (19.9%) missing values (Missing)
Cabin has 687 (77.1%) missing values (Missing)
PassengerId is uniformly distributed (Uniform)
Name is uniformly distributed (Uniform)
Ticket is uniformly distributed (Uniform)
Cabin is uniformly distributed (Uniform)
PassengerId has unique values (Unique)
Name has unique values (Unique)
SibSp has 608 (68.2%) zeros (Zeros)
Parch has 678 (76.1%) zeros (Zeros)
Fare has 15 (1.7%) zeros (Zeros)

Reproduction

Analysis started: 2022-09-21 19:26:54.013630
Analysis finished: 2022-09-21 19:26:56.385991
Duration: 2.37 seconds
Software version: pandas-profiling v3.3.0
Download configuration: config.json

Variables

PassengerId
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 891
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 446
Minimum: 1
Maximum: 891
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 45.5
Q1: 223.5
Median: 446
Q3: 668.5
95-th percentile: 846.5
Maximum: 891
Range: 890
Interquartile range (IQR): 445

Descriptive statistics

Standard deviation: 257.353842
Coefficient of variation (CV): 0.5770265516
Kurtosis: -1.2
Mean: 446
Median Absolute Deviation (MAD): 223
Skewness: 0
Sum: 397386
Variance: 66231
Monotonicity: Strictly increasing
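
These PassengerId statistics are what one would expect if the column is simply the integers 1 to 891. That is an assumption, though the reported minimum, maximum, distinct count, and strict monotonicity all point to it. A short sketch with the standard library reproduces the figures under that assumption:

```python
import statistics

# Assumption: PassengerId is the sequence 1..891, as the reported minimum,
# maximum, distinct count, and strict monotonicity suggest.
ids = range(1, 892)

mean = statistics.mean(ids)
variance = statistics.variance(ids)   # sample variance (n - 1 denominator)
std = statistics.stdev(ids)
cv = std / mean                       # coefficient of variation

# "inclusive" interpolates between the observed minimum and maximum.
q1, median, q3 = statistics.quantiles(ids, n=4, method="inclusive")

print(sum(ids), mean, variance, round(std, 6), round(cv, 6))
print(q1, median, q3, q3 - q1)
```

The coefficient of variation is the standard deviation divided by the mean, and the interquartile range is Q3 minus Q1.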
Histogram with fixed size bins (bins=50)
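Common values
Value | Count | Frequency (%)
1 | 1 | 0.1%
599 | 1 | 0.1%
588 | 1 | 0.1%
589 | 1 | 0.1%
590 | 1 | 0.1%
591 | 1 | 0.1%
592 | 1 | 0.1%
593 | 1 | 0.1%
594 | 1 | 0.1%
595 | 1 | 0.1%
Other values (881) | 881 | 98.9%

Minimum 10 values
Value | Count | Frequency (%)
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 1 | 0.1%
4 | 1 | 0.1%
5 | 1 | 0.1%
6 | 1 | 0.1%
7 | 1 | 0.1%
8 | 1 | 0.1%
9 | 1 | 0.1%
10 | 1 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
891 | 1 | 0.1%
890 | 1 | 0.1%
889 | 1 | 0.1%
888 | 1 | 0.1%
887 | 1 | 0.1%
886 | 1 | 0.1%
885 | 1 | 0.1%
884 | 1 | 0.1%
883 | 1 | 0.1%
882 | 1 | 0.1%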

Survived
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB
Value counts: 0 (549); 1 (342)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 891
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 549 | 61.6%
1 | 342 | 38.4%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
0549
61.6%
1342
38.4%

Most occurring characters

ValueCountFrequency (%)
0549
61.6%
1342
38.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number891
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0549
61.6%
1342
38.4%

Most occurring scripts

ValueCountFrequency (%)
Common891
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0549
61.6%
1342
38.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII891
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0549
61.6%
1342
38.4%

Pclass
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB
Value counts: 3 (491); 1 (216); 2 (184)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 891
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 1
3rd row: 3
4th row: 1
5th row: 3

Common Values

Value | Count | Frequency (%)
3 | 491 | 55.1%
1 | 216 | 24.2%
2 | 184 | 20.7%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
3491
55.1%
1216
24.2%
2184
 
20.7%

Most occurring characters

ValueCountFrequency (%)
3491
55.1%
1216
24.2%
2184
 
20.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number891
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3491
55.1%
1216
24.2%
2184
 
20.7%

Most occurring scripts

ValueCountFrequency (%)
Common891
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3491
55.1%
1216
24.2%
2184
 
20.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII891
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3491
55.1%
1216
24.2%
2184
 
20.7%

Name
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 891
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB
Value counts: Braund, Mr. Owen Harris (1); Boulos, Mr. Hanna (1); Frolicher-Stehli, Mr. Maxmillian (1); Gilinski, Mr. Eliezer (1); Murdlin, Mr. Joseph (1); Other values (886): 886

Length

Max length: 82
Median length: 52
Mean length: 26.96520763
Min length: 12

Characters and Unicode

Total characters: 24026
Distinct characters: 60
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 891
Unique (%): 100.0%

Sample

1st row: Braund, Mr. Owen Harris
2nd row: Cumings, Mrs. John Bradley (Florence Briggs Thayer)
3rd row: Heikkinen, Miss. Laina
4th row: Futrelle, Mrs. Jacques Heath (Lily May Peel)
5th row: Allen, Mr. William Henry

Common Values

Value | Count | Frequency (%)
Braund, Mr. Owen Harris | 1 | 0.1%
Boulos, Mr. Hanna | 1 | 0.1%
Frolicher-Stehli, Mr. Maxmillian | 1 | 0.1%
Gilinski, Mr. Eliezer | 1 | 0.1%
Murdlin, Mr. Joseph | 1 | 0.1%
Rintamaki, Mr. Matti | 1 | 0.1%
Stephenson, Mrs. Walter Bertram (Martha Eustis) | 1 | 0.1%
Elsbury, Mr. William James | 1 | 0.1%
Bourke, Miss. Mary | 1 | 0.1%
Chapman, Mr. John Henry | 1 | 0.1%
Other values (881) | 881 | 98.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
mr521
 
14.4%
miss182
 
5.0%
mrs129
 
3.6%
william64
 
1.8%
john44
 
1.2%
master40
 
1.1%
henry35
 
1.0%
george24
 
0.7%
james24
 
0.7%
charles23
 
0.6%
Other values (1515)2538
70.0%

Most occurring characters

ValueCountFrequency (%)
2735
 
11.4%
r1958
 
8.1%
e1703
 
7.1%
a1657
 
6.9%
i1325
 
5.5%
n1304
 
5.4%
s1297
 
5.4%
M1128
 
4.7%
l1067
 
4.4%
o1008
 
4.2%
Other values (50)8844
36.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter15446
64.3%
Uppercase Letter3645
 
15.2%
Space Separator2735
 
11.4%
Other Punctuation1899
 
7.9%
Close Punctuation144
 
0.6%
Open Punctuation144
 
0.6%
Dash Punctuation13
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r1958
12.7%
e1703
11.0%
a1657
10.7%
i1325
8.6%
n1304
8.4%
s1297
8.4%
l1067
 
6.9%
o1008
 
6.5%
t667
 
4.3%
h517
 
3.3%
Other values (16)2943
19.1%
Uppercase Letter
ValueCountFrequency (%)
M1128
30.9%
A250
 
6.9%
J215
 
5.9%
H203
 
5.6%
S180
 
4.9%
C172
 
4.7%
E166
 
4.6%
W143
 
3.9%
B140
 
3.8%
L129
 
3.5%
Other values (15)919
25.2%
Other Punctuation
ValueCountFrequency (%)
.892
47.0%
,891
46.9%
"106
 
5.6%
'9
 
0.5%
/1
 
0.1%
Space Separator
ValueCountFrequency (%)
2735
100.0%
Close Punctuation
ValueCountFrequency (%)
)144
100.0%
Open Punctuation
ValueCountFrequency (%)
(144
100.0%
Dash Punctuation
ValueCountFrequency (%)
-13
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin19091
79.5%
Common4935
 
20.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
r1958
 
10.3%
e1703
 
8.9%
a1657
 
8.7%
i1325
 
6.9%
n1304
 
6.8%
s1297
 
6.8%
M1128
 
5.9%
l1067
 
5.6%
o1008
 
5.3%
t667
 
3.5%
Other values (41)5977
31.3%
Common
ValueCountFrequency (%)
2735
55.4%
.892
 
18.1%
,891
 
18.1%
)144
 
2.9%
(144
 
2.9%
"106
 
2.1%
-13
 
0.3%
'9
 
0.2%
/1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII24026
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2735
 
11.4%
r1958
 
8.1%
e1703
 
7.1%
a1657
 
6.9%
i1325
 
5.5%
n1304
 
5.4%
s1297
 
5.4%
M1128
 
4.7%
l1067
 
4.4%
o1008
 
4.2%
Other values (50)8844
36.8%

Sex
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB
Value counts: male (577); female (314)

Length

Max length: 6
Median length: 4
Mean length: 4.704826038
Min length: 4

Characters and Unicode

Total characters: 4192
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: male
2nd row: female
3rd row: female
4th row: female
5th row: male

Common Values

Value | Count | Frequency (%)
male | 577 | 64.8%
female | 314 | 35.2%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
male577
64.8%
female314
35.2%

Most occurring characters

ValueCountFrequency (%)
e1205
28.7%
m891
21.3%
a891
21.3%
l891
21.3%
f314
 
7.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter4192
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e1205
28.7%
m891
21.3%
a891
21.3%
l891
21.3%
f314
 
7.5%

Most occurring scripts

ValueCountFrequency (%)
Latin4192
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1205
28.7%
m891
21.3%
a891
21.3%
l891
21.3%
f314
 
7.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII4192
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e1205
28.7%
m891
21.3%
a891
21.3%
l891
21.3%
f314
 
7.5%

Age
Real number (ℝ≥0)

MISSING

Distinct: 88
Distinct (%): 12.3%
Missing: 177
Missing (%): 19.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.69911765
Minimum: 0.42
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0.42
5-th percentile: 4
Q1: 20.125
Median: 28
Q3: 38
95-th percentile: 56
Maximum: 80
Range: 79.58
Interquartile range (IQR): 17.875

Descriptive statistics

Standard deviation: 14.52649733
Coefficient of variation (CV): 0.4891221855
Kurtosis: 0.1782741536
Mean: 29.69911765
Median Absolute Deviation (MAD): 9
Skewness: 0.3891077823
Sum: 21205.17
Variance: 211.0191247
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
24 | 30 | 3.4%
22 | 27 | 3.0%
18 | 26 | 2.9%
28 | 25 | 2.8%
30 | 25 | 2.8%
19 | 25 | 2.8%
21 | 24 | 2.7%
25 | 23 | 2.6%
36 | 22 | 2.5%
29 | 20 | 2.2%
Other values (78) | 467 | 52.4%
(Missing) | 177 | 19.9%

Minimum 10 values
Value | Count | Frequency (%)
0.42 | 1 | 0.1%
0.67 | 1 | 0.1%
0.75 | 2 | 0.2%
0.83 | 2 | 0.2%
0.92 | 1 | 0.1%
1 | 7 | 0.8%
2 | 10 | 1.1%
3 | 6 | 0.7%
4 | 10 | 1.1%
5 | 4 | 0.4%

Maximum 10 values
Value | Count | Frequency (%)
80 | 1 | 0.1%
74 | 1 | 0.1%
71 | 2 | 0.2%
70.5 | 1 | 0.1%
70 | 2 | 0.2%
66 | 1 | 0.1%
65 | 3 | 0.3%
64 | 2 | 0.2%
63 | 2 | 0.2%
62 | 4 | 0.4%

SibSp
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 7
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5230078563
Minimum: 0
Maximum: 8
Zeros: 608
Zeros (%): 68.2%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 3
Maximum: 8
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.102743432
Coefficient of variation (CV): 2.108464374
Kurtosis: 17.88041973
Mean: 0.5230078563
Median Absolute Deviation (MAD): 0
Skewness: 3.695351727
Sum: 466
Variance: 1.216043077
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common values
Value | Count | Frequency (%)
0 | 608 | 68.2%
1 | 209 | 23.5%
2 | 28 | 3.1%
4 | 18 | 2.0%
3 | 16 | 1.8%
8 | 7 | 0.8%
5 | 5 | 0.6%

Minimum values
Value | Count | Frequency (%)
0 | 608 | 68.2%
1 | 209 | 23.5%
2 | 28 | 3.1%
3 | 16 | 1.8%
4 | 18 | 2.0%
5 | 5 | 0.6%
8 | 7 | 0.8%

Maximum values
Value | Count | Frequency (%)
8 | 7 | 0.8%
5 | 5 | 0.6%
4 | 18 | 2.0%
3 | 16 | 1.8%
2 | 28 | 3.1%
1 | 209 | 23.5%
0 | 608 | 68.2%

Parch
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 7
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3815937149
Minimum: 0
Maximum: 6
Zeros: 678
Zeros (%): 76.1%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8060572211
Coefficient of variation (CV): 2.112344071
Kurtosis: 9.778125179
Mean: 0.3815937149
Median Absolute Deviation (MAD): 0
Skewness: 2.749117047
Sum: 340
Variance: 0.6497282437
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common values
Value | Count | Frequency (%)
0 | 678 | 76.1%
1 | 118 | 13.2%
2 | 80 | 9.0%
5 | 5 | 0.6%
3 | 5 | 0.6%
4 | 4 | 0.4%
6 | 1 | 0.1%

Minimum values
Value | Count | Frequency (%)
0 | 678 | 76.1%
1 | 118 | 13.2%
2 | 80 | 9.0%
3 | 5 | 0.6%
4 | 4 | 0.4%
5 | 5 | 0.6%
6 | 1 | 0.1%

Maximum values
Value | Count | Frequency (%)
6 | 1 | 0.1%
5 | 5 | 0.6%
4 | 4 | 0.4%
3 | 5 | 0.6%
2 | 80 | 9.0%
1 | 118 | 13.2%
0 | 678 | 76.1%

Ticket
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 681
Distinct (%): 76.4%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB
Value counts: 347082 (7); CA. 2343 (7); 1601 (7); 3101295 (6); CA 2144 (6); Other values (676): 858

Length

Max length: 18
Median length: 17
Mean length: 6.750841751
Min length: 3

Characters and Unicode

Total characters: 6015
Distinct characters: 35
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 547
Unique (%): 61.4%

Sample

1st row: A/5 21171
2nd row: PC 17599
3rd row: STON/O2. 3101282
4th row: 113803
5th row: 373450

Common Values

Value | Count | Frequency (%)
347082 | 7 | 0.8%
CA. 2343 | 7 | 0.8%
1601 | 7 | 0.8%
3101295 | 6 | 0.7%
CA 2144 | 6 | 0.7%
347088 | 6 | 0.7%
S.O.C. 14879 | 5 | 0.6%
382652 | 5 | 0.6%
LINE | 4 | 0.4%
PC 17757 | 4 | 0.4%
Other values (671) | 834 | 93.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
pc60
 
5.3%
c.a27
 
2.4%
a/517
 
1.5%
ca14
 
1.2%
ston/o12
 
1.1%
212
 
1.1%
sc/paris9
 
0.8%
w./c9
 
0.8%
soton/o.q8
 
0.7%
3470827
 
0.6%
Other values (709)955
84.5%

Most occurring characters

ValueCountFrequency (%)
3746
12.4%
1689
11.5%
2594
9.9%
7490
8.1%
4464
 
7.7%
6422
 
7.0%
0406
 
6.7%
5387
 
6.4%
9328
 
5.5%
8282
 
4.7%
Other values (25)1207
20.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number4808
79.9%
Uppercase Letter652
 
10.8%
Other Punctuation295
 
4.9%
Space Separator239
 
4.0%
Lowercase Letter21
 
0.3%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
C151
23.2%
O100
15.3%
P98
15.0%
A82
12.6%
S74
11.3%
N40
 
6.1%
T36
 
5.5%
W16
 
2.5%
Q15
 
2.3%
I11
 
1.7%
Other values (6)29
 
4.4%
Decimal Number
ValueCountFrequency (%)
3746
15.5%
1689
14.3%
2594
12.4%
7490
10.2%
4464
9.7%
6422
8.8%
0406
8.4%
5387
8.0%
9328
6.8%
8282
 
5.9%
Lowercase Letter
ValueCountFrequency (%)
a6
28.6%
s5
23.8%
r4
19.0%
i4
19.0%
l1
 
4.8%
e1
 
4.8%
Other Punctuation
ValueCountFrequency (%)
.197
66.8%
/98
33.2%
Space Separator
ValueCountFrequency (%)
239
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common5342
88.8%
Latin673
 
11.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
C151
22.4%
O100
14.9%
P98
14.6%
A82
12.2%
S74
11.0%
N40
 
5.9%
T36
 
5.3%
W16
 
2.4%
Q15
 
2.2%
I11
 
1.6%
Other values (12)50
 
7.4%
Common
ValueCountFrequency (%)
3746
14.0%
1689
12.9%
2594
11.1%
7490
9.2%
4464
8.7%
6422
7.9%
0406
7.6%
5387
7.2%
9328
6.1%
8282
 
5.3%
Other values (3)534
10.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII6015
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3746
12.4%
1689
11.5%
2594
9.9%
7490
8.1%
4464
 
7.7%
6422
 
7.0%
0406
 
6.7%
5387
 
6.4%
9328
 
5.5%
8282
 
4.7%
Other values (25)1207
20.1%

Fare
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 248
Distinct (%): 27.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.20420797
Minimum: 0
Maximum: 512.3292
Zeros: 15
Zeros (%): 1.7%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 7.225
Q1: 7.9104
Median: 14.4542
Q3: 31
95-th percentile: 112.07915
Maximum: 512.3292
Range: 512.3292
Interquartile range (IQR): 23.0896

Descriptive statistics

Standard deviation: 49.6934286
Coefficient of variation (CV): 1.543072528
Kurtosis: 33.39814088
Mean: 32.20420797
Median Absolute Deviation (MAD): 6.9042
Skewness: 4.78731652
Sum: 28693.9493
Variance: 2469.436846
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
8.05 | 43 | 4.8%
13 | 42 | 4.7%
7.8958 | 38 | 4.3%
7.75 | 34 | 3.8%
26 | 31 | 3.5%
10.5 | 24 | 2.7%
7.925 | 18 | 2.0%
7.775 | 16 | 1.8%
7.2292 | 15 | 1.7%
0 | 15 | 1.7%
Other values (238) | 615 | 69.0%

Minimum 10 values
Value | Count | Frequency (%)
0 | 15 | 1.7%
4.0125 | 1 | 0.1%
5 | 1 | 0.1%
6.2375 | 1 | 0.1%
6.4375 | 1 | 0.1%
6.45 | 1 | 0.1%
6.4958 | 2 | 0.2%
6.75 | 2 | 0.2%
6.8583 | 1 | 0.1%
6.95 | 1 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
512.3292 | 3 | 0.3%
263 | 4 | 0.4%
262.375 | 2 | 0.2%
247.5208 | 2 | 0.2%
227.525 | 4 | 0.4%
221.7792 | 1 | 0.1%
211.5 | 1 | 0.1%
211.3375 | 3 | 0.3%
164.8667 | 2 | 0.2%
153.4625 | 3 | 0.3%

Cabin
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct: 147
Distinct (%): 72.1%
Missing: 687
Missing (%): 77.1%
Memory size: 7.1 KiB
Value counts: C23 C25 C27 (4); G6 (4); B96 B98 (4); C22 C26 (3); D (3); Other values (142): 186

Length

Max length: 15
Median length: 3
Mean length: 3.588235294
Min length: 1

Characters and Unicode

Total characters: 732
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 101
Unique (%): 49.5%

Sample

1st row: C85
2nd row: C123
3rd row: E46
4th row: G6
5th row: C103

Common Values

Value | Count | Frequency (%)
C23 C25 C27 | 4 | 0.4%
G6 | 4 | 0.4%
B96 B98 | 4 | 0.4%
C22 C26 | 3 | 0.3%
D | 3 | 0.3%
F33 | 3 | 0.3%
E101 | 3 | 0.3%
F2 | 3 | 0.3%
B20 | 2 | 0.2%
E67 | 2 | 0.2%
Other values (137) | 173 | 19.4%
(Missing) | 687 | 77.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
c234
 
1.7%
c274
 
1.7%
g64
 
1.7%
b964
 
1.7%
b984
 
1.7%
f4
 
1.7%
c254
 
1.7%
f333
 
1.3%
e1013
 
1.3%
f23
 
1.3%
Other values (151)201
84.5%

Most occurring characters

ValueCountFrequency (%)
272
 
9.8%
C71
 
9.7%
B64
 
8.7%
161
 
8.3%
359
 
8.1%
651
 
7.0%
545
 
6.1%
437
 
5.1%
837
 
5.1%
34
 
4.6%
Other values (9)201
27.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number460
62.8%
Uppercase Letter238
32.5%
Space Separator34
 
4.6%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
272
15.7%
161
13.3%
359
12.8%
651
11.1%
545
9.8%
437
8.0%
837
8.0%
734
7.4%
933
7.2%
031
6.7%
Uppercase Letter
ValueCountFrequency (%)
C71
29.8%
B64
26.9%
D34
14.3%
E33
13.9%
A15
 
6.3%
F13
 
5.5%
G7
 
2.9%
T1
 
0.4%
Space Separator
ValueCountFrequency (%)
34
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common494
67.5%
Latin238
32.5%

Most frequent character per script

Common
ValueCountFrequency (%)
272
14.6%
161
12.3%
359
11.9%
651
10.3%
545
9.1%
437
7.5%
837
7.5%
34
6.9%
734
6.9%
933
6.7%
Latin
ValueCountFrequency (%)
C71
29.8%
B64
26.9%
D34
14.3%
E33
13.9%
A15
 
6.3%
F13
 
5.5%
G7
 
2.9%
T1
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII732
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
272
 
9.8%
C71
 
9.7%
B64
 
8.7%
161
 
8.3%
359
 
8.1%
651
 
7.0%
545
 
6.1%
437
 
5.1%
837
 
5.1%
34
 
4.6%
Other values (9)201
27.5%

Embarked
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 2
Missing (%): 0.2%
Memory size: 7.1 KiB
Value counts: S (644); C (168); Q (77)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 889
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: S
2nd row: C
3rd row: S
4th row: S
5th row: S

Common Values

Value | Count | Frequency (%)
S | 644 | 72.3%
C | 168 | 18.9%
Q | 77 | 8.6%
(Missing) | 2 | 0.2%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
s644
72.4%
c168
 
18.9%
q77
 
8.7%

Most occurring characters

ValueCountFrequency (%)
S644
72.4%
C168
 
18.9%
Q77
 
8.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter889
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S644
72.4%
C168
 
18.9%
Q77
 
8.7%

Most occurring scripts

ValueCountFrequency (%)
Latin889
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
S644
72.4%
C168
 
18.9%
Q77
 
8.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII889
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S644
72.4%
C168
 
18.9%
Q77
 
8.7%

Interactions

[25 interaction plots; images not preserved]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
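
The definition above can be sketched in plain Python as follows. This is an illustrative implementation, not the profiler's own code; the helper names and the toy data are hypothetical, and ties receive average ranks.

```python
import statistics

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical toy data: y = x**3 is monotonic in x but not linear.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
rho = spearman(x, y)
r = pearson(x, y)
print(rho, r)
```

On this toy data the rank correlation is perfect while Pearson's r stays below 1, which is the sense in which ρ catches nonlinear monotonic relationships.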

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
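
A minimal sketch of this calculation, together with a check of the invariance property mentioned above. The function name and the toy data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Sample covariance divided by the product of sample standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)

# Hypothetical toy data, roughly linear.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r_original = pearson_r(x, y)

# Shifting and rescaling each variable separately (positive scale factors)
# leaves r unchanged.
x_moved = [10 * a + 3 for a in x]
y_moved = [0.5 * b - 7 for b in y]
r_transformed = pearson_r(x_moved, y_moved)

# Exact lines with very different slopes both give r = 1.
r_steep = pearson_r(x, [100 * a for a in x])
r_shallow = pearson_r(x, [0.01 * a for a in x])

print(r_original, r_transformed, r_steep, r_shallow)
```

Note that a negative scale factor would flip the sign of r, so the invariance holds up to sign.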

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
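
The pair-counting description corresponds to the variant usually called tau-a. A small sketch follows, with a hypothetical function name and toy data. Library implementations commonly report tau-b instead, which adjusts the denominator for ties, so results can differ on tied data.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1      # both variables move in the same direction
        elif product < 0:
            discordant += 1      # the variables move in opposite directions
        # product == 0 means a tie in x or y; it counts toward neither.
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Hypothetical toy data: 7 concordant and 3 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]
tau = kendall_tau_a(x, y)
print(tau)
```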

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
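
As a rough illustration, the bias-corrected statistic can be computed from a contingency table as below. This is a sketch of the commonly cited form of the correction, not necessarily the exact code path the profiler uses; the function name and the toy data are hypothetical.

```python
import math
from collections import Counter

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)

    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    r, k = len(row_totals), len(col_totals)
    # The correction shrinks phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Hypothetical toy data: one perfectly associated pair, one independent pair.
v_perfect = cramers_v_corrected(["a"] * 50 + ["b"] * 50, ["u"] * 50 + ["v"] * 50)
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["u", "v", "u", "v"] * 25)
print(v_perfect, v_independent)
```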

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
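
One common way to obtain a nullity correlation is to replace each column with a 0/1 "is present" indicator and correlate the indicators. The sketch below assumes that approach; the function name and the toy columns are hypothetical, and `None` stands in for a missing value.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 'is present' indicators of two columns."""
    a = [0 if v is None else 1 for v in col_a]
    b = [0 if v is None else 1 for v in col_b]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        # A fully present (or fully missing) column has no nullity variation.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)

# Hypothetical toy columns, not taken from the dataset.
age = [22.0, None, 26.0, None, 35.0, 54.0]
cabin = ["C85", None, None, None, "C123", "E46"]
fare = [7.25, 71.28, 7.92, 8.05, 53.1, 51.86]

age_cabin = nullity_correlation(age, cabin)
age_fare = nullity_correlation(age, fare)
print(age_cabin, age_fare)
```

A value near +1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and columns with no missing values cannot be scored.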

Sample

First rows

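(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S
5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q
6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S
7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S
8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S
9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C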

Last rows

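(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
881 | 882 | 0 | 3 | Markun, Mr. Johann | male | 33.0 | 0 | 0 | 349257 | 7.8958 | NaN | S
882 | 883 | 0 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22.0 | 0 | 0 | 7552 | 10.5167 | NaN | S
883 | 884 | 0 | 2 | Banfield, Mr. Frederick James | male | 28.0 | 0 | 0 | C.A./SOTON 34068 | 10.5000 | NaN | S
884 | 885 | 0 | 3 | Sutehall, Mr. Henry Jr | male | 25.0 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | NaN | S
885 | 886 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39.0 | 0 | 5 | 382652 | 29.1250 | NaN | Q
886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S
887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S
888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S
889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C
890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q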

Deepchecks report

Full Suite

SWEETVIZ REPORT

Associations
[Only including dataset "Train"]
Squares are categorical associations (uncertainty coefficient & correlation ratio) from 0 to 1. The uncertainty coefficient is asymmetrical (i.e. ROW LABEL values indicate how much they PROVIDE INFORMATION to each LABEL at the TOP).

Circles are the symmetrical numerical correlations (Pearson's) from -1 to 1. The trivial diagonal is intentionally left blank for clarity.
Associations
[Only including dataset "Test"]
Squares are categorical associations (uncertainty coefficient & correlation ratio) from 0 to 1. The uncertainty coefficient is asymmetrical (i.e. ROW LABEL values indicate how much they PROVIDE INFORMATION to each LABEL at the TOP).

Circles are the symmetrical numerical correlations (Pearson's) from -1 to 1. The trivial diagonal is intentionally left blank for clarity.
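
The asymmetry of the uncertainty coefficient (Theil's U) can be seen in a small sketch. The definition used here is the usual entropy-based one, U(X|Y) = (H(X) - H(X|Y)) / H(X); the function names and toy data are hypothetical and this is not Sweetviz's own code.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy H(X) in nats."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())

def conditional_entropy(x, y):
    """H(X | Y): remaining uncertainty in x once y is known."""
    n = len(x)
    y_counts = Counter(y)
    joint = Counter(zip(x, y))
    return -sum((c / n) * math.log(c / y_counts[b]) for (_, b), c in joint.items())

def uncertainty_coefficient(x, y):
    """U(X | Y): fraction of the uncertainty in x that y removes (0 to 1)."""
    h_x = entropy(x)
    if h_x == 0:
        return 1.0  # x is constant, so there is nothing left to explain
    return (h_x - conditional_entropy(x, y)) / h_x

# Hypothetical toy data: the fine-grained variable fully determines the coarse
# one, but knowing the coarse label only halves the uncertainty about the fine one.
fine = [1, 2, 3, 4, 1, 2, 3, 4]
coarse = ["lo", "lo", "hi", "hi", "lo", "lo", "hi", "hi"]

u_coarse_given_fine = uncertainty_coefficient(coarse, fine)
u_fine_given_coarse = uncertainty_coefficient(fine, coarse)
print(u_coarse_given_fine, u_fine_given_coarse)
```

The same effect shows up in the report's Survived panel, where Pclass is listed with 0.06 in one direction and 0.09 in the other.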
Statistic | Train | Test
ROWS | 891 | 418
DUPLICATES | 0 | 0
RAM | 322.6 kb | 148.3 kb
FEATURES | 12 | 11
CATEGORICAL | 6 | 5
NUMERICAL | 3 | 3
TEXT | 3 | 3

Sweetviz version: 2.1.4

Created & maintained by Francois Bertrand
Graphic design by Jean-Francois Hains
Survived
VALUES: 891 (100%)
MISSING: ---
DISTINCT: 2 (<1%)

TOP CATEGORIES
Survived | Count | %
0 | 549 | 62%
1 | 342 | 38%
ALL | 891 | 100%

CATEGORICAL ASSOCIATIONS (UNCERTAINTY COEFFICIENT, 0 to 1)

Survived PROVIDES INFORMATION ON...
Sex: 0.23
Pclass: 0.06
SibSp: 0.03
Embarked: 0.02
Parch: 0.02

THESE FEATURES GIVE INFORMATION ON Survived:
Sex: 0.23
Pclass: 0.09
SibSp: 0.03
Embarked: 0.03
Parch: 0.02

NUMERICAL ASSOCIATIONS (CORRELATION RATIO, 0 to 1)

Survived CORRELATION RATIO WITH...
Fare: 0.26
Age: 0.01
PassengerId: 0.01
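
The correlation ratio between a categorical and a numerical feature is usually defined as the square root of the share of total variance explained by the group means. The sketch below assumes that definition; whether Sweetviz reports exactly this form is an assumption, and the function name and toy data are hypothetical.

```python
import math

def correlation_ratio(categories, values):
    """Eta: sqrt(between-group sum of squares / total sum of squares)."""
    groups = {}
    for c, v in zip(categories, values):
        groups.setdefault(c, []).append(v)

    grand_mean = sum(values) / len(values)
    between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    total = sum((v - grand_mean) ** 2 for v in values)
    return math.sqrt(between / total) if total > 0 else 0.0

# Hypothetical toy data, not taken from the dataset.
survived = [0, 0, 0, 1, 1, 1]
fare = [7.0, 8.0, 9.0, 30.0, 50.0, 70.0]
eta = correlation_ratio(survived, fare)

# Extremes: identical group means give 0, no within-group spread gives 1.
eta_none = correlation_ratio(["a", "a", "b", "b"], [1, 3, 1, 3])
eta_full = correlation_ratio(["a", "a", "b", "b"], [1, 1, 5, 5])
print(eta, eta_none, eta_full)
```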
1. PassengerId
Statistic | Train | Test
VALUES | 891 (100%) | 418 (100%)
MISSING | --- | ---
DISTINCT | 891 (100%) | 418 (100%)
ZEROES | --- | ---
MAX | 891 | 1,309
95% | 846 | 1,288
Q3 | 668 | 1,205
MEDIAN | 446 | 1,100
AVG | 446 | 1,100
Q1 | 224 | 996
5% | 46 | 913
MIN | 1 | 892
RANGE | 890 | 417
IQR | 445 | 208
STD | 257 | 121
VAR | 66,231 | 14,595
KURT. | -1.20 | -1.20
SKEW | 0.00 | 0.00
SUM | 397k | 460k

NUMERICAL ASSOCIATIONS (PEARSON, -1 to 1)
Age: 0.04
Fare: 0.01

CATEGORICAL ASSOCIATIONS (CORRELATION RATIO, 0 to 1)
SibSp: 0.09
Parch: 0.07
Sex: 0.04
Pclass: 0.04
Embarked: 0.03
Survived: 0.01
MOST FREQUENT VALUES

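Value | Train count | Train % | Test count | Test %
1 | 1 | 0.1% | 0 | 0.0%
599 | 1 | 0.1% | 0 | 0.0%
588 | 1 | 0.1% | 0 | 0.0%
589 | 1 | 0.1% | 0 | 0.0%
590 | 1 | 0.1% | 0 | 0.0%
591 | 1 | 0.1% | 0 | 0.0%
592 | 1 | 0.1% | 0 | 0.0%
593 | 1 | 0.1% | 0 | 0.0%
594 | 1 | 0.1% | 0 | 0.0%
595 | 1 | 0.1% | 0 | 0.0%
596 | 1 | 0.1% | 0 | 0.0%
597 | 1 | 0.1% | 0 | 0.0%
598 | 1 | 0.1% | 0 | 0.0%
600 | 1 | 0.1% | 0 | 0.0%
586 | 1 | 0.1% | 0 | 0.0%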
SMALLEST VALUES

1
1
0.1%
0
0.0%
2
1
0.1%
0
0.0%
3
1
0.1%
0
0.0%
4
1
0.1%
0
0.0%
5
1
0.1%
0
0.0%
6
1
0.1%
0
0.0%
7
1
0.1%
0
0.0%
8
1
0.1%
0
0.0%
9
1
0.1%
0
0.0%
10
1
0.1%
0
0.0%
11
1
0.1%
0
0.0%
12
1
0.1%
0
0.0%
13
1
0.1%
0
0.0%
14
1
0.1%
0
0.0%
15
1
0.1%
0
0.0%
LARGEST VALUES

891
1
0.1%
0
0.0%
890
1
0.1%
0
0.0%
889
1
0.1%
0
0.0%
888
1
0.1%
0
0.0%
887
1
0.1%
0
0.0%
886
1
0.1%
0
0.0%
885
1
0.1%
0
0.0%
884
1
0.1%
0
0.0%
883
1
0.1%
0
0.0%
882
1
0.1%
0
0.0%
881
1
0.1%
0
0.0%
880
1
0.1%
0
0.0%
879
1
0.1%
0
0.0%
878
1
0.1%
0
0.0%
877
1
0.1%
0
0.0%
2
Age
             Train        Test
VALUES:      714 (80%)    332 (79%)
MISSING:     177 (20%)    86 (21%)
DISTINCT:    88 (10%)     79 (19%)
ZEROES:      ---          ---
MAX          80.0         76.0
95%          56.0         57.0
Q3           38.0         39.0
AVG          29.7         30.3
MEDIAN       28.0         27.0
Q1           20.1         21.0
5%           4.0          8.0
MIN          0.4          0.2
RANGE        79.6         75.8
IQR          17.9         18.0
STD          14.5         14.2
VAR          211          201
KURT.        0.178        0.084
SKEW         0.389        0.457
SUM          21,205       10,050
NUMERICAL ASSOCIATIONS (PEARSON, -1 to 1)
Fare           0.10
PassengerId    0.04

CATEGORICAL ASSOCIATIONS (CORRELATION RATIO, 0 to 1)
Pclass      0.37
Embarked    0.25
SibSp       0.23
Parch       0.20
Sex         0.02
Survived    0.01
MOST FREQUENT VALUES
Value    Train count    Train %    Test count    Test %
24.0     30             4.2%       17            5.1%
22.0     27             3.8%       16            4.8%
18.0     26             3.6%       13            3.9%
28.0     25             3.5%       7             2.1%
30.0     25             3.5%       15            4.5%
19.0     25             3.5%       4             1.2%
21.0     24             3.4%       17            5.1%
25.0     23             3.2%       11            3.3%
36.0     22             3.1%       9             2.7%
29.0     20             2.8%       10            3.0%
35.0     18             2.5%       5             1.5%
32.0     18             2.5%       6             1.8%
26.0     18             2.5%       12            3.6%
27.0     18             2.5%       12            3.6%
31.0     17             2.4%       6             1.8%

SMALLEST VALUES
Value    Train count    Train %    Test count    Test %
0.42     1              0.1%       0             0.0%
0.67     1              0.1%       0             0.0%
0.75     2              0.3%       1             0.3%
0.83     2              0.3%       1             0.3%
0.92     1              0.1%       1             0.3%
1.0      7              1.0%       3             0.9%
2.0      10             1.4%       2             0.6%
3.0      6              0.8%       1             0.3%
4.0      10             1.4%       0             0.0%
5.0      4              0.6%       1             0.3%
6.0      3              0.4%       3             0.9%
7.0      3              0.4%       1             0.3%
8.0      4              0.6%       2             0.6%
9.0      8              1.1%       2             0.6%
10.0     2              0.3%       2             0.6%

LARGEST VALUES
Value    Train count    Train %    Test count    Test %
80.0     1              0.1%       0             0.0%
74.0     1              0.1%       0             0.0%
71.0     2              0.3%       0             0.0%
70.5     1              0.1%       0             0.0%
70.0     2              0.3%       0             0.0%
66.0     1              0.1%       0             0.0%
65.0     3              0.4%       0             0.0%
64.0     2              0.3%       3             0.9%
63.0     2              0.3%       2             0.6%
62.0     4              0.6%       1             0.3%
61.0     3              0.4%       2             0.6%
60.0     4              0.6%       3             0.9%
59.0     2              0.3%       1             0.3%
58.0     5              0.7%       1             0.3%
57.0     2              0.3%       3             0.9%
3
Sex
             Train         Test
VALUES:      891 (100%)    418 (100%)
MISSING:     ---           ---
DISTINCT:    2 (<1%)       2 (<1%)

TOP CATEGORIES
Value     Train count    Train %    Test count    Test %    Survived count    Survived %
male      577            65%        266           64%       109               19%
female    314            35%        152           36%       233               74%
ALL       891            100%       418           100%      342               38%
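The last percentage in each category row reads as the share of that category's Train rows with Survived = 1, rather than a share of all survivors. A short sketch that recomputes it from the counts in the Sex panel (the dictionary layout and function name are illustrative only):

```python
# Train counts per category, copied from the Sex panel of this report.
sex_table = {
    "male": {"train": 577, "survived": 109},
    "female": {"train": 314, "survived": 233},
}


def survival_rates(table):
    """Fraction of survivors within each category."""
    return {k: v["survived"] / v["train"] for k, v in table.items()}


rates = survival_rates(sex_table)
overall = (
    sum(v["survived"] for v in sex_table.values())
    / sum(v["train"] for v in sex_table.values())
)

for category, rate in rates.items():
    print(f"{category}: {rate:.1%}")
print(f"ALL: {overall:.1%}")
```

Rounded to whole percentages, these match the 19%, 74% and 38% shown for male, female and ALL.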
CATEGORICAL ASSOCIATIONS (UNCERTAINTY COEFFICIENT, 0 to 1)

Sex PROVIDES INFORMATION ON...
Survived    0.23
Parch       0.04
SibSp       0.03
Embarked    0.01
Pclass      0.01

THESE FEATURES GIVE INFORMATION ON Sex:
Survived    0.23
Parch       0.05
SibSp       0.04
Embarked    0.01
Pclass      0.01

NUMERICAL ASSOCIATIONS (CORRELATION RATIO, 0 to 1)

Sex CORRELATION RATIO WITH...
Fare           0.18
PassengerId    0.04
Age            0.02
4
Pclass
             Train         Test
VALUES:      891 (100%)    418 (100%)
MISSING:     ---           ---
DISTINCT:    3 (<1%)       3 (<1%)

TOP CATEGORIES
Value    Train count    Train %    Test count    Test %    Survived count    Survived %
3        491            55%        218           52%       119               24%
1        216            24%        107           26%       136               63%
2        184            21%        93            22%       87                47%
ALL      891            100%       418           100%      342               38%
CATEGORICAL ASSOCIATIONS (UNCERTAINTY COEFFICIENT, 0 to 1)

Pclass PROVIDES INFORMATION ON...
Embarked    0.10
Survived    0.09
SibSp       0.04
Sex         0.01
Parch       0.01

THESE FEATURES GIVE INFORMATION ON Pclass:
Embarked    0.07
Survived    0.06
SibSp       0.04
Sex         0.01
Parch       0.01

NUMERICAL ASSOCIATIONS (CORRELATION RATIO, 0 to 1)

Pclass CORRELATION RATIO WITH...
Fare           0.59
Age            0.37
PassengerId    0.04
5
Name
             Train         Test
VALUES:      891 (100%)    418 (100%)
MISSING:     ---           ---
DISTINCT:    891 (100%)    418 (100%)

Train count    Train %    Test count    Test %    Value
1              <1%        -             -         Braund, Mr. Owen Harris
1              <1%        -             -         Boulos, Mr. Hanna
1              <1%        -             -         Frolicher-Stehli, Mr. Maxmillian
1              <1%        -             -         Gilinski, Mr. Eliezer
1              <1%        -             -         Murdlin, Mr. Joseph
1              <1%        -             -         Rintamaki, Mr. Matti
1              <1%        -             -         Stephenson, Mrs. Walter Bertram (Martha Eustis)
884            >99%       418           100%      (Other)
Train count    Train %    Test count    Test %    Value
1              <1%        -             -         Braund, Mr. Owen Harris
1              <1%        -             -         Boulos, Mr. Hanna
1              <1%        -             -         Frolicher-Stehli, Mr. Maxmillian
1              <1%        -             -         Gilinski, Mr. Eliezer
1              <1%        -             -         Murdlin, Mr. Joseph
1              <1%        -             -         Rintamaki, Mr. Matti
1              <1%        -             -         Stephenson, Mrs. Walter Bertram (Martha Eustis)
1              <1%        -             -         Elsbury, Mr. William James
1              <1%        -             -         Bourke, Miss. Mary
1              <1%        -             -         Chapman, Mr. John Henry
1              <1%        -             -         Van Impe, Mr. Jean Baptiste
1              <1%        -             -         Leitch, Miss. Jessie Wills
1              <1%        -             -         Johnson, Mr. Alfred
1              <1%        -             -         Duff Gordon, Sir. Cosmo Edmund ("Mr Morgan")
1              <1%        -             -         Taussig, Miss. Ruth
1              <1%        -             -         Jacobsohn, Mrs. Sidney Samuel (Amy Frances Christy)
1              <1%        -             -         Slabenoff, Mr. Petco
1              <1%        -             -         Harrington, Mr. Charles H
1              <1%        -             -         Torber, Mr. Ernst William
1              <1%        -             -         Homer, Mr. Harry ("Mr E Haven")
1              <1%        -             -         Lindell, Mr. Edvard Bengtsson
1              <1%        -             -         Karaic, Mr. Milan
1              <1%        -             -         Daniel, Mr. Robert Williams
1              <1%        -             -         Laroche, Mrs. Joseph (Juliette Marie Louise Lafargue)
1              <1%        -             -         Shutes, Miss. Elizabeth W
1              <1%        -             -         Andersson, Mrs. Anders Johan (Alfrida Konstantia Brogren)
1              <1%        -             -         Jarvis, Mr. John Denzil
1              <1%        -             -         Paulner, Mr. Uscher
1              <1%        -             -         Murphy, Miss. Margaret Jane
1              <1%        -             -         Harris, Mr. George
1              <1%        -             -         de Messemaeker, Mrs. Guillaume Joseph (Emma)
1              <1%        -             -         Morrow, Mr. Thomas Rowan
1              <1%        -             -         Sivic, Mr. Husein
1              <1%        -             -         Norman, Mr. Robert Douglas
1              <1%        -             -         Simmons, Mr. John
1              <1%        -             -         Meanwell, Miss. (Marion Ogden)
1              <1%        -             -         Davies, Mr. Alfred J
1              <1%        -             -         Stoytcheff, Mr. Ilia
1              <1%        -             -         Palsson, Mrs. Nils (Alma Cornelia Berglund)
1              <1%        -             -         Doharr, Mr. Tannous
1              <1%        -             -         Jonsson, Mr. Carl
1              <1%        -             -         Appleton, Mrs. Edward Dale (Charlotte Lamson)
1              <1%        -             -         Ross, Mr. John Hugo
1              <1%        -             -         Flynn, Mr. John Irwin ("Irving")
1              <1%        -             -         Kelly, Miss. Mary
1              <1%        -             -         Rush, Mr. Alfred George John
1              <1%        -             -         Patchett, Mr. George
1              <1%        -             -         Garside, Miss. Ethel
1              <1%        -             -         Silvey, Mrs. William Baird (Alice Munger)
1              <1%        -             -         Caram, Mrs. Joseph (Maria Elias)
1              <1%        -             -         Jussila, Mr. Eiriik
1              <1%        -             -         Christy, Miss. Julie Rachel
1              <1%        -             -         Thayer, Mrs. John Borland (Marian Longstreth Morris)
1              <1%        -             -         Downton, Mr. William James
1              <1%        -             -         Jardin, Mr. Jose Neto
1              <1%        -             -         Horgan, Mr. John
1              <1%        -             -         Cook, Mr. Jacob
1              <1%        -             -         Hegarty, Miss. Hanora "Nora"
1              <1%        -             -         Foo, Mr. Choong
1              <1%        -             -         Baclini, Miss. Eugenie
831            93%        418           100%      (Other)
6
SibSp
             Train         Test
VALUES:      891 (100%)    418 (100%)
MISSING:     ---           ---
DISTINCT:    7 (<1%)       7 (2%)

TOP CATEGORIES
Value    Train count    Train %    Test count    Test %    Survived count    Survived %
0        608            68%        283           68%       210               35%
1        209            23%        110           26%       112               54%
2        28             3%         14            3%        13                46%
4        18             2%         4             <1%       3                 17%
3        16             2%         4             <1%       4                 25%
8        7              <1%        2             <1%       0                 0%
5        5              <1%        1             <1%       0                 0%
ALL      891            100%       418           100%      342               38%
CATEGORICAL ASSOCIATIONS (UNCERTAINTY COEFFICIENT, 0 to 1)

SibSp PROVIDES INFORMATION ON...
Parch       0.18
Sex         0.04
Pclass      0.04
Survived    0.03
Embarked    0.03

THESE FEATURES GIVE INFORMATION ON SibSp:
Parch       0.15
Pclass      0.04
Sex         0.03
Survived    0.03
Embarked    0.02

NUMERICAL ASSOCIATIONS (CORRELATION RATIO, 0 to 1)

SibSp CORRELATION RATIO WITH...
Age            0.23
Fare           0.21
PassengerId    0.09
7
Parch
             Train         Test
VALUES:      891 (100%)    418 (100%)
MISSING:     ---           ---
DISTINCT:    7 (<1%)       8 (2%)

TOP CATEGORIES
Value    Train count    Train %    Test count    Test %    Survived count    Survived %
0        678            76%        324           78%       233               34%
1        118            13%        52            12%       65                55%
2        80             9%         33            8%        40                50%
5        5              <1%        1             <1%       1                 20%
3        5              <1%        3             <1%       3                 60%
4        4              <1%        2             <1%       0                 0%
6        1              <1%        1             <1%       0                 0%
9        0              0%         2             <1%
ALL      891            100%       418           100%      342               38%
CATEGORICAL ASSOCIATIONS (UNCERTAINTY COEFFICIENT, 0 to 1)

Parch PROVIDES INFORMATION ON...
SibSp       0.15
Sex         0.05
Survived    0.02
Embarked    0.02
Pclass      0.01

THESE FEATURES GIVE INFORMATION ON Parch:
SibSp       0.18
Sex         0.04
Survived    0.02
Embarked    0.02
Pclass      0.01

NUMERICAL ASSOCIATIONS (CORRELATION RATIO, 0 to 1)

Parch CORRELATION RATIO WITH...
Fare           0.26
Age            0.20
PassengerId    0.07
8
Ticket
             Train         Test
VALUES:      891 (100%)    418 (100%)
MISSING:     ---           ---
DISTINCT:    681 (76%)     363 (87%)

Train count    Train %    Test count    Test %    Value
7              <1%        -             -         347082
7              <1%        4             <1%       CA. 2343
7              <1%        1             <1%       1601
6              <1%        1             <1%       3101295
6              <1%        2             <1%       CA 2144
6              <1%        -             -         347088
5              <1%        2             <1%       S.O.C. 14879
847            95%        408           98%       (Other)
Train count    Train %    Test count    Test %    Value
7              <1%        -             -         347082
7              <1%        4             <1%       CA. 2343
7              <1%        1             <1%       1601
6              <1%        1             <1%       3101295
6              <1%        2             <1%       CA 2144
6              <1%        -             -         347088
5              <1%        2             <1%       S.O.C. 14879
5              <1%        1             <1%       382652
4              <1%        -             -         LINE
4              <1%        1             <1%       PC 17757
4              <1%        -             -         17421
4              <1%        1             <1%       349909
4              <1%        -             -         113760
4              <1%        1             <1%       4133
4              <1%        2             <1%       113781
4              <1%        1             <1%       W./C. 6608
4              <1%        -             -         2666
4              <1%        2             <1%       19950
4              <1%        3             <1%       347077
3              <1%        -             -         C.A. 31921
3              <1%        -             -         PC 17572
3              <1%        1             <1%       C.A. 34651
3              <1%        -             -         363291
3              <1%        -             -         F.C.C. 13529
3              <1%        -             -         345773
3              <1%        -             -         248727
3              <1%        1             <1%       24160
3              <1%        -             -         29106
3              <1%        1             <1%       SC/Paris 2123
3              <1%        -             -         35273
3              <1%        -             -         371110
3              <1%        -             -         230080
3              <1%        1             <1%       PC 17760
3              <1%        -             -         239853
3              <1%        -             -         PC 17582
3              <1%        -             -         347742
3              <1%        -             -         110152
3              <1%        -             -         13502
3              <1%        -             -         110413
3              <1%        1             <1%       PC 17755
2              <1%        1             <1%       PC 17558
2              <1%        -             -         237736
2              <1%        -             -         17474
2              <1%        1             <1%       PC 17758
2              <1%        1             <1%       PP 9549
2              <1%        -             -         S.O./P.P. 3
2              <1%        -             -         P/PP 3381
2              <1%        -             -         PC 17485
2              <1%        1             <1%       2668
2              <1%        -             -         2627
2              <1%        -             -         PC 17604
2              <1%        1             <1%       2653
2              <1%        -             -         2665
2              <1%        -             -         113798
2              <1%        -             -         31027
2              <1%        -             -         2908
2              <1%        2             <1%       W./C. 6607
2              <1%        -             -         WE/P 5735
2              <1%        -             -         35281
2              <1%        -             -         113789
695            78%        384           92%       (Other)
9
Fare
             Train         Test
VALUES:      891 (100%)    417 (>99%)
MISSING:     ---           1 (<1%)
DISTINCT:    248 (28%)     169 (40%)
ZEROES:      15 (2%)       2 (<1%)
MAX          512           512
95%          112           152
Q3           31            32
AVG          32            36
MEDIAN       14            14
Q1           8             8
5%           7             7
MIN          0             0
RANGE        512           512
IQR          23.1          23.6
STD          49.7          55.9
VAR          2,469         3,126
KURT.        33.4          17.9
SKEW         4.79          3.69
SUM          28,694        14,857
NUMERICAL ASSOCIATIONS (PEARSON, -1 to 1)
Age            0.10
PassengerId    0.01

CATEGORICAL ASSOCIATIONS (CORRELATION RATIO, 0 to 1)
Pclass      0.59
Embarked    0.28
Parch       0.26
Survived    0.26
SibSp       0.21
Sex         0.18
MOST FREQUENT VALUES
Value       Train count    Train %    Test count    Test %
8.05        43             4.8%       17            4.1%
13.0        42             4.7%       17            4.1%
7.8958      38             4.3%       11            2.6%
7.75        34             3.8%       21            5.0%
26.0        31             3.5%       19            4.6%
10.5        24             2.7%       11            2.6%
7.925       18             2.0%       5             1.2%
7.775       16             1.8%       10            2.4%
7.2292      15             1.7%       9             2.2%
0.0         15             1.7%       2             0.5%
26.55       15             1.7%       7             1.7%
7.8542      13             1.5%       8             1.9%
8.6625      13             1.5%       8             1.9%
7.25        13             1.5%       5             1.2%
7.225       12             1.3%       9             2.2%

SMALLEST VALUES
Value       Train count    Train %    Test count    Test %
0.0         15             1.7%       2             0.5%
4.0125      1              0.1%       0             0.0%
5.0         1              0.1%       0             0.0%
6.2375      1              0.1%       0             0.0%
6.4375      1              0.1%       2             0.5%
6.45        1              0.1%       0             0.0%
6.4958      2              0.2%       1             0.2%
6.75        2              0.2%       0             0.0%
6.8583      1              0.1%       0             0.0%
6.95        1              0.1%       1             0.2%
6.975       2              0.2%       0             0.0%
7.0458      1              0.1%       0             0.0%
7.05        7              0.8%       2             0.5%
7.0542      2              0.2%       0             0.0%
7.125       4              0.4%       0             0.0%

LARGEST VALUES
Value       Train count    Train %    Test count    Test %
512.3292    3              0.3%       1             0.2%
263.0       4              0.4%       2             0.5%
262.375     2              0.2%       5             1.2%
247.5208    2              0.2%       1             0.2%
227.525     4              0.4%       1             0.2%
221.7792    1              0.1%       3             0.7%
211.5       1              0.1%       4             1.0%
211.3375    3              0.3%       1             0.2%
164.8667    2              0.2%       2             0.5%
153.4625    3              0.3%       0             0.0%
151.55      4              0.4%       2             0.5%
146.5208    2              0.2%       1             0.2%
135.6333    3              0.3%       1             0.2%
134.5       2              0.2%       3             0.7%
133.65      2              0.2%       0             0.0%
10
Cabin
             Train        Test
VALUES:      204 (23%)    91 (22%)
MISSING:     687 (77%)    327 (78%)
DISTINCT:    147 (16%)    76 (18%)

Train count    Train %    Test count    Test %    Value
4              2%         2             2%        C23 C25 C27
4              2%         1             1%        G6
4              2%         -             -         B96 B98
3              1%         1             1%        C22 C26
3              1%         1             1%        D
3              1%         1             1%        F33
3              1%         -             -         E101
180            88%        85            93%       (Other)
Train count    Train %    Test count    Test %    Value
4              2%         2             2%        C23 C25 C27
4              2%         1             1%        G6
4              2%         -             -         B96 B98
3              1%         1             1%        C22 C26
3              1%         1             1%        D
3              1%         1             1%        F33
3              1%         -             -         E101
3              1%         1             1%        F2
2              <1%        -             -         B20
2              <1%        -             -         E67
2              <1%        -             -         C125
2              <1%        -             -         E24
2              <1%        -             -         B49
2              <1%        -             -         B77
2              <1%        -             -         D35
2              <1%        2             2%        C78
2              <1%        -             -         C93
2              <1%        -             -         C65
2              <1%        3             3%        B57 B59 B63 B66
2              <1%        -             -         B5
2              <1%        -             -         E121
2              <1%        1             1%        B51 B53 B55
2              <1%        -             -         B18
2              <1%        -             -         C124
2              <1%        -             -         C126
2              <1%        -             -         B35
2              <1%        -             -         E44
2              <1%        -             -         C92
2              <1%        -             -         C68
2              <1%        -             -         D20
2              <1%        -             -         B22
2              <1%        -             -         E25
2              <1%        -             -         D36
2              <1%        -             -         E8
2              <1%        -             -         C83
2              <1%        -             -         C2
2              <1%        -             -         D17
2              <1%        -             -         D26
2              <1%        -             -         D33
2              <1%        -             -         F G73
2              <1%        -             -         E33
2              <1%        -             -         B28
2              <1%        -             -         C52
2              <1%        -             -         C123
2              <1%        2             2%        F4
2              <1%        1             1%        B58 B60
1              <1%        -             -         A23
1              <1%        -             -         D9
1              <1%        -             -         A20
1              <1%        -             -         D50
1              <1%        1             1%        D28
1              <1%        1             1%        D19
1              <1%        -             -         C47
1              <1%        -             -         E17
1              <1%        1             1%        B41
1              <1%        -             -         A26
1              <1%        -             -         E68
1              <1%        -             -         A10
1              <1%        -             -         A24
1              <1%        2             2%        C101
87             43%        70            77%       (Other)
11
Embarked
             Train         Test
VALUES:      889 (>99%)    418 (100%)
MISSING:     2 (<1%)       ---
DISTINCT:    3 (<1%)       3 (<1%)

TOP CATEGORIES
Value    Train count    Train %    Test count    Test %    Survived count    Survived %
S        644            72%        270           65%       217               34%
C        168            19%        102           24%       93                55%
Q        77             9%         46            11%       30                39%
ALL      889            100%       418           100%      342               38%
CATEGORICAL ASSOCIATIONS (UNCERTAINTY COEFFICIENT, 0 to 1)

Embarked PROVIDES INFORMATION ON...
Pclass      0.07
Survived    0.03
SibSp       0.02
Parch       0.02
Sex         0.01

THESE FEATURES GIVE INFORMATION ON Embarked:
Pclass      0.10
SibSp       0.03
Survived    0.02
Parch       0.02
Sex         0.01

NUMERICAL ASSOCIATIONS (CORRELATION RATIO, 0 to 1)

Embarked CORRELATION RATIO WITH...
Fare           0.28
Age            0.25
PassengerId    0.03