1 Introduction

The EDA Report provides exploratory data analysis information on data.frame objects and on objects that inherit from data.frame.

1.1 Information of Dataset

The dataset that generated the EDA Report is a ‘data.frame’ object. It consists of 80 observations and 18 variables.

1.2 Information of Variables

The variable information for the dataset that generated the EDA Report is shown in the following table:
Information of Variables
variables types missing_count missing_percent unique_count unique_rate
sex factor 0 0 2 0.0250
age numeric 0 0 40 0.5000
edu factor 0 0 4 0.0500
job factor 0 0 16 0.2000
adm_reason factor 0 0 7 0.0875
time_inj_adm numeric 0 0 66 0.8250
hosp_days numeric 0 0 67 0.8375
inj_level character 0 0 2 0.0250
inj_type factor 0 0 5 0.0625
cause factor 0 0 7 0.0875
surg character 0 0 2 0.0250
ulcer factor 0 0 2 0.0250
ot_adm numeric 0 0 34 0.4250
ot_dis numeric 0 0 50 0.6250
barthel_tot_adm numeric 0 0 43 0.5375
barthel_tot_dis numeric 0 0 64 0.8000
barthel_diff numeric 0 0 67 0.8375
ot_diff numeric 0 0 55 0.6875

The target variable of the data is ‘ulcer’, and the data type of the variable is factor.
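The unique_rate column in the table above appears to be unique_count divided by the number of observations (80). The following minimal sketch checks that reading against three rows of the table; the function name `unique_rate` is an illustrative assumption, not part of the report.

```python
# Number of observations, as stated in Section 1.1.
N_OBS = 80

# unique_count values copied from three rows of the table above.
unique_counts = {"sex": 2, "age": 40, "hosp_days": 67}


def unique_rate(unique_count, n_obs):
    """Share of distinct values among all observations (assumed definition)."""
    return unique_count / n_obs


rates = {name: unique_rate(count, N_OBS) for name, count in unique_counts.items()}

for name, rate in rates.items():
    # Prints sex: 0.0250, age: 0.5000, hosp_days: 0.8375
    print(f"{name}: {rate:.4f}")
```

The printed values agree with the unique_rate entries for the same three variables.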

1.3 About EDA Report

EDA reports provide information and visualizations that support the EDA process. In particular, they provide a variety of information for understanding the relationship between the target variable and the other variables of interest.

2 Univariate Analysis

2.1 Descriptive Statistics

edaData

18 Variables   80 Observations

sex
n missing distinct
80 0 2
 Value          F     M
 Frequency     25    55
 Proportion 0.312 0.688
 

age
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
80 0 40 0.999 39.55 17.34 19.00 20.00 26.75 37.50 50.00 61.20 64.05
lowest : 18 19 20 21 22 , highest: 63 64 65 68 78
edu
n missing distinct
80 0 4
 Value           None   Primary Secondary  Tertiary
 Frequency         17        22        33         8
 Proportion     0.212     0.275     0.412     0.100
 

job
n missing distinct
80 0 16
Bus Driver (2, 0.025), Business (2, 0.025), Cooking (1, 0.012), Development (2, 0.025), Electricity Worker (1, 0.012), Farmer (25, 0.312), Gold Smith (3, 0.038), House Wife (12, 0.150), Hydro Power (1, 0.012), Labor (3, 0.038), Manager (4, 0.050), Meat Shop/Student (1, 0.012), Mechanic (6, 0.075), Security Guard (2, 0.025), Student (13, 0.162), Tailoring (2, 0.025)
adm_reason
n missing distinct
80 0 7
 Value             Medical          Rehab Rehab, Surgery  Rehab, Wounds
 Frequency               1             50              1              7
 Proportion          0.012          0.625          0.012          0.088
                                                        
 Value             Surgery     UTI, Rehab         Wounds
 Frequency               1              1             19
 Proportion          0.012          0.012          0.238
 

time_inj_adm
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
80 0 66 1 494 800.7 4.95 12.00 23.00 57.50 299.25 1093.60 2274.45
lowest : 0 1 4 5 8 , highest: 2184 3993 4383 6224 6575
hosp_days
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
80 0 67 1 64.27 61.81 3.95 7.80 27.75 46.47 91.00 133.40 147.55
lowest : 1 2 3 4 5 , highest: 145 196 222 227 430
inj_level
n missing distinct
80 0 2
 Value          C Not C
 Frequency     25    55
 Proportion 0.312 0.688
 

inj_type
n missing distinct
80 0 5
 Value               Cord Compression            Cord Contusion
 Frequency                          1                         5
 Proportion                     0.012                     0.062
                                                               
 Value                       Fracture                    Lesion
 Frequency                         58                         2
 Proportion                     0.725                     0.025
                                     
 Value      Spondy/Dislocation/Sublux
 Frequency                         14
 Proportion                     0.175
 

cause
n missing distinct
80 0 7
 Value                  Diving                 EQ      External Load
 Frequency                   1                  4                  5
 Proportion              0.012              0.050              0.062
                                                                    
 Value                    Fall       NonTraumatic Transport Accident
 Frequency                  57                  3                  8
 Proportion              0.712              0.038              0.100
                              
 Value          Unknown Trauma
 Frequency                   2
 Proportion              0.025
 

surg
n missing distinct
80 0 2
 Value        N   Y
 Frequency   16  64
 Proportion 0.2 0.8
 

ulcer
n missing distinct
80 0 2
 Value        N   Y
 Frequency   40  40
 Proportion 0.5 0.5
 

ot_adm
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
80 0 34 0.981 8.57 10.1 0.00 0.00 0.00 5.50 13.13 22.10 31.05
lowest : 0.000 0.870 1.000 2.000 3.000 , highest: 23.245 27.500 31.000 32.000 33.000
ot_dis
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
80 0 50 0.996 21.94 10.6 3.85 6.80 17.75 23.93 29.62 33.00 33.00
lowest : 0.0 1.0 4.0 5.0 7.0 , highest: 29.5 30.0 31.0 32.0 33.0
barthel_tot_adm
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
80 0 43 0.995 19.51 21.82 0.00 0.00 3.50 15.00 26.13 46.30 70.02
lowest : 0.00 2.00 4.00 4.99 5.00 , highest: 65.00 70.00 70.45 80.00 88.00
barthel_tot_dis
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
80 0 64 1 59.62 29.26 13.00 17.00 44.99 66.37 79.70 92.00 95.05
lowest : 0 4 5 13 15 , highest: 93 95 96 97 99
barthel_diff
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
80 0 67 1 40.11 29.03 0.950 3.367 18.000 42.000 57.750 71.060 76.200
lowest : -4.000000 -3.000000 0.000000 1.000000 1.616667
highest: 76.000000 80.000000 85.000000 86.000000 94.000000

ot_diff
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
80 0 55 0.999 13.37 12.4 -4.923 0.171 7.000 13.373 20.492 28.000 32.000
lowest : -32.000 -14.525 -6.450 -5.920 -4.870 , highest: 26.000 28.000 30.000 32.000 33.000

2.2 Normality Test of Numerical Variables

2.2.1 Statistics and Visualization of (Sample) Data

[ age ]

normality test : Shapiro-Wilk normality test

statistic : 0.95095, p-value : 0.00390743


skewness and kurtosis
type skewness kurtosis
original 0.4732953 2.474590
log transformation -0.1640139 2.026507
sqrt transformation 0.1453309 2.135596



[ time_inj_adm ]

normality test : Shapiro-Wilk normality test

 statistic : 0.43604,  p-value : 5.69515E-16

skewness and kurtosis
type skewness kurtosis
original 3.764415 17.214578
log transformation NaN NaN
sqrt transformation 2.325111 8.504959



[ hosp_days ]

normality test : Shapiro-Wilk normality test

 statistic : 0.75993,  p-value : 4.13639E-10

skewness and kurtosis
type skewness kurtosis
original 2.7325198 14.350205
log transformation -0.9921099 3.980849
sqrt transformation 0.7564683 4.352185



[ ot_adm ]

normality test : Shapiro-Wilk normality test

 statistic : 0.83929,  p-value : 7.19264E-08

skewness and kurtosis
type skewness kurtosis
original 1.1185215 3.403993
log transformation NaN NaN
sqrt transformation 0.1674149 1.864309



[ ot_dis ]

normality test : Shapiro-Wilk normality test

 statistic : 0.8966,  p-value : 8.77511E-06

skewness and kurtosis
type skewness kurtosis
original -0.821819 2.727018
log transformation NaN NaN
sqrt transformation -1.662823 5.352706



[ barthel_tot_adm ]

normality test : Shapiro-Wilk normality test

 statistic : 0.81507,  p-value : 1.29E-08

skewness and kurtosis
type skewness kurtosis
original 1.5277701 4.790740
log transformation NaN NaN
sqrt transformation 0.2809185 2.500305



[ barthel_tot_dis ]

normality test : Shapiro-Wilk normality test

 statistic : 0.94618,  p-value : 0.00209685

skewness and kurtosis
type skewness kurtosis
original -0.5452264 2.373288
log transformation NaN NaN
sqrt transformation -1.2937495 4.498863



[ barthel_diff ]

normality test : Shapiro-Wilk normality test

 statistic : 0.96475,  p-value : 0.0264468

skewness and kurtosis
type skewness kurtosis
original -0.0644089 2.050000
log transformation NaN NaN
sqrt transformation -0.8070250 2.812075



[ ot_diff ]

normality test : Shapiro-Wilk normality test

 statistic : 0.95412,  p-value : 0.00597379

skewness and kurtosis
type skewness kurtosis
original -0.8401841 5.075336
log transformation NaN NaN
sqrt transformation -0.7922443 3.296517
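
Several variables above report NaN for the log transformation. The report does not say why. One plausible reading, offered here as an assumption, is that those variables contain zero or negative values, for which the logarithm is undefined. For example, Section 2.1 lists 0 as the lowest time_inj_adm value and -4 as the lowest barthel_diff value, whereas age and hosp_days are strictly positive and do get finite results. The sketch below uses a moment-based skewness, which is also an assumption about the estimator, and returns NaN explicitly because Python's `math.log` raises an error on non-positive input.

```python
import math


def skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5 (assumed estimator)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_skewness(values):
    """Skewness after a natural-log transform, or NaN if the log is undefined."""
    # math.log raises ValueError for values <= 0, so NaN is returned
    # explicitly to mirror what the report shows.
    if any(v <= 0 for v in values):
        return float("nan")
    return skewness([math.log(v) for v in values])


# Hypothetical strictly positive sample, not taken from the report's data.
positive_only = [1, 2, 4, 8, 16]

# The five lowest time_inj_adm values listed in Section 2.1.
with_zero = [0, 1, 4, 5, 8]

print(log_skewness(positive_only))  # finite, because every value is positive
print(log_skewness(with_zero))      # nan, because log(0) is undefined
```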



3 Relationship Between Variables

3.1 Correlation Coefficient

3.1.1 Correlation Coefficient by Variable Combination

Table of correlation coefficients (absolute value of 0.5 or more)
Variable1 Variable2 Correlation Coefficient
barthel_diff barthel_tot_dis 0.6515863
ot_diff ot_dis 0.6018759
ot_diff ot_adm -0.5860965
ot_diff time_inj_adm -0.5618024
ot_adm time_inj_adm 0.5611902
barthel_tot_dis ot_dis 0.5178000
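
The table above keeps negative coefficients such as -0.5860965, so the cutoff appears to apply to the absolute value, and the rows appear to be sorted by decreasing absolute value. The sketch below illustrates that kind of filter. It assumes Pearson correlation, which the report does not state, and it uses hypothetical toy columns (`a`, `b`, `d`) rather than the report's data; the function names are illustrative as well.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def strong_pairs(columns, threshold=0.5):
    """Return (name1, name2, r) for every pair with |r| >= threshold."""
    pairs = []
    for (name1, x), (name2, y) in combinations(columns.items(), 2):
        r = pearson(x, y)
        if abs(r) >= threshold:
            pairs.append((name1, name2, r))
    # Strongest relationships first, regardless of sign.
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Hypothetical toy columns for illustration only.
toy = {
    "a": [1, 2, 3, 4, 5],
    "b": [5, 4, 3, 2, 1],  # perfectly negatively correlated with a
    "d": [3, 1, 5, 1, 5],  # only weakly correlated with a and b
}

result = strong_pairs(toy)
for name1, name2, r in result:
    print(name1, name2, round(r, 4))
```

Only the pair (a, b) survives the filter here: its coefficient is negative, but its absolute value exceeds the threshold.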

3.1.2 Correlation Plot of Numerical Variables

4 Target based Analysis

4.1 Grouped Descriptive Statistics

4.1.1 Grouped Numerical Variables

[ age ]

* Distribution of age
Distribution of age
total Y N
n 80.00 40.00 40.00
NA 0.00 0.00 0.00
mean 39.55 38.07 41.02
sd 15.23 13.78 16.60
se(mean) 1.70 2.18 2.62
IQR 23.25 22.50 21.50
skewness 0.48 0.26 0.54
kurtosis -0.48 -1.05 -0.45
0% 18.00 19.00 18.00
1% 18.79 19.00 18.39
5% 19.00 19.95 19.00
10% 20.00 20.90 19.00
20% 23.00 22.00 24.80
25% 26.75 26.00 29.00
30% 30.00 28.70 32.80
40% 35.60 33.80 36.00
50% 37.50 37.50 37.50
60% 40.80 40.80 41.20
70% 48.00 43.90 49.30
75% 50.00 48.50 50.50
80% 54.00 50.80 55.40
90% 61.20 58.10 64.10
95% 64.05 61.10 68.50
99% 78.00 63.00 78.00
100% 78.00 63.00 78.00
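
The se(mean) and IQR rows in the table above are consistent with the usual definitions: the standard error is sd divided by the square root of n, and the IQR is the 75% quantile minus the 25% quantile. The following sketch re-derives both from the rounded values printed for age; the helper names are illustrative assumptions.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


def iqr(q25, q75):
    """Interquartile range: 75% quantile minus 25% quantile."""
    return q75 - q25


# (sd, n, 25% quantile, 75% quantile) copied from the age table above.
age_rows = {
    "total": (15.23, 80, 26.75, 50.00),
    "Y": (13.78, 40, 26.00, 48.50),
    "N": (16.60, 40, 29.00, 50.50),
}

for group, (sd, n, q25, q75) in age_rows.items():
    # Prints se(mean) 1.70, 2.18, 2.62 and IQR 23.25, 22.50, 21.50
    print(f"{group}: se(mean)={standard_error(sd, n):.2f}, IQR={iqr(q25, q75):.2f}")
```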


* Visualization of Distribution



[ time_inj_adm ]

* Distribution of time_inj_adm
Distribution of time_inj_adm
total Y N
n 80.00 40.00 40.00
NA 0.00 0.00 0.00
mean 494.00 904.93 83.07
sd 1,205.67 1,605.66 141.54
se(mean) 134.80 253.88 22.38
IQR 276.25 822.08 42.75
skewness 3.84 2.57 2.73
kurtosis 15.22 6.13 7.38
0% 0.00 0.00 1.00
1% 0.79 5.85 2.17
5% 4.95 25.45 4.00
10% 12.00 30.60 7.70
20% 18.80 44.40 12.80
25% 23.00 59.50 15.75
30% 28.40 109.20 17.70
40% 37.20 168.00 19.60
50% 57.50 241.00 26.50
60% 136.80 328.20 35.40
70% 241.00 653.17 52.00
75% 299.25 881.58 58.50
80% 475.60 1,121.19 102.00
90% 1,093.60 2,364.90 228.00
95% 2,274.45 4,475.05 392.27
99% 6,297.71 6,438.11 590.42
100% 6,575.00 6,575.00 641.00


* Visualization of Distribution



[ hosp_days ]

* Distribution of hosp_days
Distribution of hosp_days
total Y N
n 80.00 40.00 40.00
NA 0.00 0.00 0.00
mean 64.27 88.35 40.20
sd 64.96 80.98 28.33
se(mean) 7.26 12.80 4.48
IQR 63.25 89.75 38.25
skewness 2.79 2.14 0.70
kurtosis 12.17 7.09 -0.09
0% 1.00 1.00 2.00
1% 1.00 1.00 2.39
5% 3.95 3.85 4.90
10% 7.80 8.90 5.90
20% 16.60 31.20 12.60
25% 27.75 35.00 17.25
30% 30.40 41.70 26.70
40% 37.80 46.97 30.20
50% 46.47 74.50 35.50
60% 55.80 95.40 42.40
70% 75.30 119.60 50.30
75% 91.00 124.75 55.50
80% 97.40 133.80 60.20
90% 133.40 150.10 81.20
95% 147.55 222.25 94.50
99% 269.63 350.83 106.44
100% 430.00 430.00 108.00


* Visualization of Distribution



[ ot_adm ]

* Distribution of ot_adm
Distribution of ot_adm
total Y N
n 80.00 40.00 40.00
NA 0.00 0.00 0.00
mean 8.57 10.72 6.42
sd 9.45 10.59 7.69
se(mean) 1.06 1.68 1.22
IQR 13.13 14.75 9.94
skewness 1.14 0.91 1.22
kurtosis 0.51 -0.12 0.58
0% 0.00 0.00 0.00
1% 0.00 0.00 0.00
5% 0.00 0.00 0.00
10% 0.00 0.00 0.00
20% 0.00 0.00 0.00
25% 0.00 1.00 0.00
30% 1.00 2.40 0.61
40% 3.00 5.60 1.00
50% 5.50 10.00 3.50
60% 9.24 10.87 5.80
70% 11.00 13.97 9.00
75% 13.13 15.75 9.94
80% 15.05 17.84 11.20
90% 22.10 31.10 20.10
95% 31.05 33.00 22.05
99% 33.00 33.00 25.74
100% 33.00 33.00 27.50


* Visualization of Distribution



[ ot_dis ]

* Distribution of ot_dis
Distribution of ot_dis
total Y N
n 80.00 40.00 40.00
NA 0.00 0.00 0.00
mean 21.94 20.10 23.78
sd 9.59 8.60 10.26
se(mean) 1.07 1.36 1.62
IQR 11.88 10.30 10.91
skewness -0.84 -0.64 -1.21
kurtosis -0.21 -0.35 0.46
0% 0.00 0.00 0.00
1% 0.00 1.56 0.00
5% 3.85 4.00 0.95
10% 6.80 7.00 4.90
20% 14.82 13.26 18.59
25% 17.75 15.75 21.34
30% 19.50 16.94 22.38
40% 22.67 19.89 23.94
50% 23.93 22.75 26.15
60% 25.54 23.68 28.60
70% 27.78 24.07 32.00
75% 29.62 26.05 32.25
80% 31.20 27.29 33.00
90% 33.00 30.10 33.00
95% 33.00 32.05 33.00
99% 33.00 33.00 33.00
100% 33.00 33.00 33.00


* Visualization of Distribution



[ barthel_tot_adm ]

* Distribution of barthel_tot_adm
Distribution of barthel_tot_adm
total Y N
n 80.00 40.00 40.00
NA 0.00 0.00 0.00
mean 19.51 22.85 16.16
sd 21.29 23.02 19.11
se(mean) 2.38 3.64 3.02
IQR 22.63 21.94 17.50
skewness 1.56 1.24 2.06
kurtosis 1.99 0.69 5.07
0% 0.00 0.00 0.00
1% 0.00 0.00 0.00
5% 0.00 0.00 0.00
10% 0.00 0.00 0.00
20% 2.00 2.00 2.00
25% 3.50 4.75 2.00
30% 5.00 7.10 4.00
40% 10.00 14.20 6.00
50% 15.00 17.06 11.00
60% 18.55 21.80 15.35
70% 23.00 26.15 19.00
75% 26.13 26.69 19.50
80% 27.37 34.00 25.40
90% 46.30 64.10 39.10
95% 70.02 70.93 44.35
99% 81.68 80.00 80.98
100% 88.00 80.00 88.00


* Visualization of Distribution



[ barthel_tot_dis ]

* Distribution of barthel_tot_dis
Distribution of barthel_tot_dis
total Y N
n 80.00 40.00 40.00
NA 0.00 0.00 0.00
mean 59.62 55.48 63.76
sd 25.79 24.38 26.79
se(mean) 2.88 3.86 4.24
IQR 34.71 38.38 34.20
skewness -0.56 -0.41 -0.81
kurtosis -0.59 -0.69 -0.21
0% 0.00 0.00 4.00
1% 3.16 5.85 4.39
5% 13.00 15.95 12.60
10% 17.00 18.80 16.60
20% 34.40 31.00 45.87
25% 44.99 34.25 46.75
30% 47.41 38.10 51.37
40% 56.20 55.00 59.40
50% 66.37 59.63 73.78
60% 72.03 67.00 77.35
70% 76.95 70.98 80.00
75% 79.70 72.63 80.95
80% 80.00 74.86 83.60
90% 92.00 80.20 92.40
95% 95.05 93.10 96.05
99% 97.42 95.00 98.22
100% 99.00 95.00 99.00


* Visualization of Distribution



[ barthel_diff ]

* Distribution of barthel_diff
Distribution of barthel_diff
total Y N
n 80.00 40.00 40.00
NA 0.00 0.00 0.00
mean 40.11 32.63 47.60
sd 25.20 23.57 24.81
se(mean) 2.82 3.73 3.92
IQR 39.75 33.45 28.13
skewness -0.07 0.33 -0.51
kurtosis -0.93 -0.51 -0.56
0% -4.00 -4.00 2.00
1% -3.21 -3.61 2.00
5% 0.95 -0.15 3.44
10% 3.37 0.90 5.53
20% 13.00 11.31 23.40
25% 18.00 13.75 37.87
30% 25.70 18.00 41.96
40% 38.80 25.60 45.75
50% 42.00 33.50 52.46
60% 47.82 40.65 54.80
70% 53.99 42.43 63.30
75% 57.75 47.20 66.00
80% 63.20 48.26 69.39
90% 71.06 62.89 71.94
95% 76.20 74.30 76.50
99% 87.68 83.05 90.88
100% 94.00 85.00 94.00


* Visualization of Distribution



[ ot_diff ]

* Distribution of ot_diff
Distribution of ot_diff
total Y N
n 80.00 40.00 40.00
NA 0.00 0.00 0.00
mean 13.37 9.38 17.36
sd 11.31 11.20 10.05
se(mean) 1.26 1.77 1.59
IQR 13.49 11.75 13.27
skewness -0.86 -1.44 -0.25
kurtosis 2.29 3.98 -0.99
0% -32.00 -32.00 0.00
1% -18.19 -25.18 0.00
5% -4.92 -6.85 0.18
10% 0.17 -4.98 2.80
20% 4.00 3.80 7.60
25% 7.00 5.50 10.86
30% 8.11 7.00 12.36
40% 11.18 8.66 15.20
50% 13.37 10.25 19.00
60% 17.37 13.19 20.85
70% 20.00 15.30 23.15
75% 20.49 17.25 24.12
80% 21.33 19.00 26.40
90% 28.00 20.00 30.20
95% 32.00 20.05 32.00
99% 32.21 27.71 32.61
100% 33.00 32.00 33.00


* Visualization of Distribution



4.1.2 Grouped Categorical Variables

[ sex ]

* Frequency Table
sex : frequency
N Y Sum
F 12 13 25
M 28 27 55
Sum 40 40 80


* Relative Frequency Table (%)
sex : ratio
N Y Sum
F 30 32.5 31.25
M 70 67.5 68.75
Sum 100 100.0 100.00
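
The relative frequencies above are column percentages: each count is divided by the total of its ulcer group (N or Y), and the Sum column is each row total divided by the overall total. A minimal sketch that reproduces the sex table from the frequency counts follows; the function name `column_percentages` is an illustrative assumption.

```python
# Counts copied from the sex frequency table: level -> {ulcer group: count}.
sex_counts = {
    "F": {"N": 12, "Y": 13},
    "M": {"N": 28, "Y": 27},
}


def column_percentages(counts):
    """Convert a level-by-group count table into column percentages."""
    groups = sorted({g for row in counts.values() for g in row})
    col_totals = {g: sum(row[g] for row in counts.values()) for g in groups}
    grand_total = sum(col_totals.values())
    table = {}
    for level, row in counts.items():
        pct = {g: 100 * row[g] / col_totals[g] for g in groups}
        # "Sum" is the share of this level in the whole dataset.
        pct["Sum"] = 100 * sum(row.values()) / grand_total
        table[level] = pct
    return table


sex_pct = column_percentages(sex_counts)
for level, pct in sex_pct.items():
    # Prints F 30.0 32.5 31.25 and M 70.0 67.5 68.75
    print(level, pct["N"], pct["Y"], pct["Sum"])
```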


* Visualization of Frequency



[ edu ]

* Frequency Table
edu : frequency
N Y Sum
None 5 12 17
Primary 14 8 22
Secondary 18 15 33
Tertiary 3 5 8
Sum 40 40 80


* Relative Frequency Table (%)
edu : ratio
N Y Sum
None 12.5 30.0 21.25
Primary 35.0 20.0 27.50
Secondary 45.0 37.5 41.25
Tertiary 7.5 12.5 10.00
Sum 100.0 100.0 100.00


* Visualization of Frequency



[ job ]

* Frequency Table
job : frequency
N Y Sum
Bus Driver 1 1 2
Business 0 2 2
Cooking 1 0 1
Development 0 2 2
Electricity Worker 0 1 1
Farmer 14 11 25
Gold Smith 2 1 3
House Wife 8 4 12
Hydro Power 1 0 1
Labor 1 2 3
Manager 2 2 4
Meat Shop/Student 0 1 1
Mechanic 1 5 6
Security Guard 2 0 2
Student 7 6 13
Tailoring 0 2 2
Sum 40 40 80


* Relative Frequency Table (%)
job : ratio
N Y Sum
Bus Driver 2.5 2.5 2.50
Business 0.0 5.0 2.50
Cooking 2.5 0.0 1.25
Development 0.0 5.0 2.50
Electricity Worker 0.0 2.5 1.25
Farmer 35.0 27.5 31.25
Gold Smith 5.0 2.5 3.75
House Wife 20.0 10.0 15.00
Hydro Power 2.5 0.0 1.25
Labor 2.5 5.0 3.75
Manager 5.0 5.0 5.00
Meat Shop/Student 0.0 2.5 1.25
Mechanic 2.5 12.5 7.50
Security Guard 5.0 0.0 2.50
Student 17.5 15.0 16.25
Tailoring 0.0 5.0 2.50
Sum 100.0 100.0 100.00


* Visualization of Frequency



[ adm_reason ]

* Frequency Table
adm_reason : frequency
N Y Sum
Medical 0 1 1
Rehab 37 13 50
Rehab, Surgery 1 0 1
Rehab, Wounds 0 7 7
Surgery 1 0 1
UTI, Rehab 1 0 1
Wounds 0 19 19
Sum 40 40 80


* Relative Frequency Table (%)
adm_reason : ratio
N Y Sum
Medical 0.0 2.5 1.25
Rehab 92.5 32.5 62.50
Rehab, Surgery 2.5 0.0 1.25
Rehab, Wounds 0.0 17.5 8.75
Surgery 2.5 0.0 1.25
UTI, Rehab 2.5 0.0 1.25
Wounds 0.0 47.5 23.75
Sum 100.0 100.0 100.00


* Visualization of Frequency



[ inj_type ]

* Frequency Table
inj_type : frequency
N Y Sum
Cord Compression 1 0 1
Cord Contusion 4 1 5
Fracture 28 30 58
Lesion 0 2 2
Spondy/Dislocation/Sublux 7 7 14
Sum 40 40 80


* Relative Frequency Table (%)
inj_type : ratio
N Y Sum
Cord Compression 2.5 0.0 1.25
Cord Contusion 10.0 2.5 6.25
Fracture 70.0 75.0 72.50
Lesion 0.0 5.0 2.50
Spondy/Dislocation/Sublux 17.5 17.5 17.50
Sum 100.0 100.0 100.00


* Visualization of Frequency



[ cause ]

* Frequency Table
cause : frequency
N Y Sum
Diving 0 1 1
EQ 1 3 4
External Load 2 3 5
Fall 30 27 57
NonTraumatic 1 2 3
Transport Accident 6 2 8
Unknown Trauma 0 2 2
Sum 40 40 80


* Relative Frequency Table (%)
cause : ratio
N Y Sum
Diving 0.0 2.5 1.25
EQ 2.5 7.5 5.00
External Load 5.0 7.5 6.25
Fall 75.0 67.5 71.25
NonTraumatic 2.5 5.0 3.75
Transport Accident 15.0 5.0 10.00
Unknown Trauma 0.0 5.0 2.50
Sum 100.0 100.0 100.00


* Visualization of Frequency



4.2 Grouped Relationship Between Variables

4.2.1 Grouped Correlation Coefficient

Table of correlation coefficients (absolute value of 0.5 or more)
ulcer Variable1 Variable2 Correlation Coefficient
N barthel_diff barthel_tot_dis 0.7281184
N ot_diff ot_dis 0.7132916
N barthel_tot_dis ot_dis 0.6654641
N barthel_tot_adm ot_adm 0.5019441
Y ot_diff ot_adm -0.6898645
Y ot_diff time_inj_adm -0.6449242
Y ot_adm time_inj_adm 0.6297616
Y barthel_diff barthel_tot_dis 0.5397576
Y barthel_tot_dis barthel_tot_adm 0.5065119

4.2.2 Grouped Correlation Plot of Numerical Variables

- Grouped Correlation Case of (ulcer == N)

- Grouped Correlation Case of (ulcer == Y)