Narrative: Data description

Narrative: Intro to EDA results

Let’s look at summary descriptive statistics for our dataset:

         vars     n    mean      sd  median trimmed     mad   min      max
carat       1 53940    0.80    0.47    0.70    0.73    0.47   0.2     5.01
cut*        2 53940    3.90    1.12    4.00    4.04    1.48   1.0     5.00
color*      3 53940    3.59    1.70    4.00    3.55    1.48   1.0     7.00
clarity*    4 53940    4.05    1.65    4.00    3.91    1.48   1.0     8.00
depth       5 53940   61.75    1.43   61.80   61.78    1.04  43.0    79.00
table       6 53940   57.46    2.23   57.00   57.32    1.48  43.0    95.00
price       7 53940 3932.80 3989.44 2401.00 3158.99 2475.94 326.0 18823.00
x           8 53940    5.73    1.12    5.70    5.66    1.38   0.0    10.74
y           9 53940    5.73    1.14    5.71    5.66    1.36   0.0    58.90
z          10 53940    3.54    0.71    3.53    3.49    0.85   0.0    31.80
            range  skew kurtosis    se
carat        4.81  1.12     1.26  0.00
cut*         4.00 -0.72    -0.40  0.00
color*       6.00  0.19    -0.87  0.01
clarity*     7.00  0.55    -0.39  0.01
depth       36.00 -0.08     5.74  0.01
table       52.00  0.80     2.80  0.01
price    18497.00  1.62     2.18 17.18
x           10.74  0.38    -0.62  0.00
y           58.90  2.43    91.20  0.00
z           31.80  1.52    47.08  0.00
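
A table of this shape could be produced with a call along the following lines. This is only a sketch under two assumptions that the text does not state: that the data are the `diamonds` data frame shipped with the ggplot2 package (the row count and variable names match), and that the table comes from `describe()` in the psych package (the column layout matches). In `describe()` output, an asterisk after a variable name marks a categorical variable that was converted to numeric codes, so the statistics for `cut*`, `color*` and `clarity*` describe level codes rather than measurements.

```r
# Assumption: the dataset is ggplot2's `diamonds` and the table was
# generated by psych::describe(); neither is confirmed by the report.
library(ggplot2)  # provides the diamonds data frame
library(psych)    # provides describe()

# Convert to a plain data frame before computing per-variable statistics.
desc <- describe(as.data.frame(diamonds))
print(desc)
```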

Now, let’s examine each variable of interest individually.

Variable Price is … Descriptive statistics for ‘Price’:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    326     950    2400    3930    5320   18800 
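
The six-number summary above has the layout of base R's `summary()` applied to a numeric vector. A minimal sketch, again assuming the data are ggplot2's `diamonds`, is shown below. The printed values may be rounded differently depending on the R version and the `digits` setting.

```r
# Assumption: the dataset is ggplot2's `diamonds`.
library(ggplot2)

# Minimum, quartiles, mean and maximum of the price column.
price_summary <- summary(diamonds$price)
print(price_summary)
```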

Finally, let’s examine the distribution of price across the dataset visually:

plot of chunk VisualPrice
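
The figure itself is not reproduced here, and the report does not say what kind of plot the `VisualPrice` chunk draws. One plausible way to view the price distribution is a histogram, sketched below. The choice of `geom_histogram()`, the bin count and the axis labels are illustrative assumptions, not the original chunk's code.

```r
# Assumption: the dataset is ggplot2's `diamonds`; the plot type and
# the bin count are illustrative choices, not taken from the report.
library(ggplot2)

price_hist <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(bins = 50) +
  labs(x = "Price", y = "Number of diamonds")

print(price_hist)
```

Given the positive skew reported for price in the summary table (1.62), a histogram like this would be expected to show a long right tail.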