Narrative: Data description
Narrative: Intro to EDA results
Let’s look at descriptive statistics for our dataset:
         vars     n    mean      sd  median trimmed     mad   min      max
carat       1 53940    0.80    0.47    0.70    0.73    0.47   0.2     5.01
cut*        2 53940    3.90    1.12    4.00    4.04    1.48   1.0     5.00
color*      3 53940    3.59    1.70    4.00    3.55    1.48   1.0     7.00
clarity*    4 53940    4.05    1.65    4.00    3.91    1.48   1.0     8.00
depth       5 53940   61.75    1.43   61.80   61.78    1.04  43.0    79.00
table       6 53940   57.46    2.23   57.00   57.32    1.48  43.0    95.00
price       7 53940 3932.80 3989.44 2401.00 3158.99 2475.94 326.0 18823.00
x           8 53940    5.73    1.12    5.70    5.66    1.38   0.0    10.74
y           9 53940    5.73    1.14    5.71    5.66    1.36   0.0    58.90
z          10 53940    3.54    0.71    3.53    3.49    0.85   0.0    31.80
            range  skew kurtosis    se
carat        4.81  1.12     1.26  0.00
cut*         4.00 -0.72    -0.40  0.00
color*       6.00  0.19    -0.87  0.01
clarity*     7.00  0.55    -0.39  0.01
depth       36.00 -0.08     5.74  0.01
table       52.00  0.80     2.80  0.01
price    18497.00  1.62     2.18 17.18
x           10.74  0.38    -0.62  0.00
y           58.90  2.43    91.20  0.00
z           31.80  1.52    47.08  0.00
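The less familiar columns in the table above (trimmed, mad, skew, kurtosis, se) can be made concrete with a small sketch. The layout resembles the output of R's psych::describe, so the Python code below assumes that function's defaults: a 10% trimmed mean, a MAD scaled by 1.4826, and moment-based skewness and excess kurtosis computed with the sample standard deviation. The function name `describe` and the `toy` values are hypothetical and are not taken from the dataset.

```python
import math
import statistics


def describe(values, trim=0.1):
    """Summarise one numeric variable (illustrative sketch, assumed defaults)."""
    xs = sorted(values)
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)  # sample SD (n - 1 denominator)
    median = statistics.median(xs)

    # Trimmed mean: drop floor(n * trim) observations from each tail.
    k = int(n * trim)
    trimmed = statistics.fmean(xs[k:n - k])

    # MAD: median absolute deviation from the median, scaled by 1.4826
    # so that it is comparable to the SD for normally distributed data.
    mad = 1.4826 * statistics.median([abs(x - median) for x in xs])

    # Moment-based skewness and excess kurtosis, standardised by the sample SD.
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / sd ** 3
    kurtosis = m4 / sd ** 4 - 3

    return {
        "n": n, "mean": mean, "sd": sd, "median": median,
        "trimmed": trimmed, "mad": mad,
        "min": xs[0], "max": xs[-1], "range": xs[-1] - xs[0],
        "skew": skew, "kurtosis": kurtosis,
        "se": sd / math.sqrt(n),  # standard error of the mean
    }


# Hypothetical toy values with one large outlier (not from the dataset).
toy = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0, 20.0]
stats = describe(toy)
```

The se column follows directly from sd and n: for price, 3989.44 / sqrt(53940) is approximately 17.18, which matches the table. Rows marked with an asterisk appear to be categorical variables summarised through their integer codes, so their means and skew values should be read with caution.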
Now, let’s examine each variable of interest individually.
Variable Price is … Descriptive statistics for ‘Price’:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    326     950    2400    3930    5320   18800
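A summary of this shape can be reproduced as in the minimal Python sketch below. It assumes the quartiles use linear interpolation between order statistics (the default in R's quantile, type 7), which corresponds to `method="inclusive"` in Python's `statistics.quantiles`. The function name `six_number_summary` and the `toy_prices` values are hypothetical.

```python
import statistics


def six_number_summary(values):
    """Min, quartiles, mean and max for one numeric variable (sketch)."""
    # method="inclusive" interpolates linearly between order statistics.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


# Hypothetical right-skewed toy prices (not from the dataset).
toy_prices = [300, 500, 1000, 2000, 4000, 9000]
summary = six_number_summary(toy_prices)
```

In the Price summary above the mean (3930) sits well above the median (2400), which is consistent with the positive skew of 1.62 reported for price in the descriptive table.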
Finally, let’s examine the price distribution across the dataset visually: