Assignment 14
Diamond Sizes
We have data about 53,940 diamonds. Only 126 are larger than 2.5 carats, which represents 0.23% of all diamonds. The distribution of the remainder is shown below:
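The counts quoted above can be reproduced with a few lines of R. This is a minimal sketch that assumes the data is the `diamonds` dataset shipped with ggplot2; the variable names are illustrative and not taken from the original report.

```r
library(ggplot2)  # assumed source of the diamonds dataset

# Total number of diamonds in the dataset
n_total <- nrow(diamonds)

# Number of diamonds strictly larger than 2.5 carats
n_large <- sum(diamonds$carat > 2.5)

# Share of large diamonds, as a percentage rounded to two decimals
pct_large <- round(100 * n_large / n_total, 2)

cat(n_total, "diamonds,", n_large, "above 2.5 carats,", pct_large, "percent\n")
```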
Figure 1 shows the distribution of diamond sizes under 2.5 carats.
The distribution is strongly right-skewed, with most diamonds falling below 1 carat. The most striking feature is the sharp spike just under 0.5 carats, followed by smaller peaks near 1.0 and 1.5 carats, likely reflecting round-number cutting preferences. Frequency drops off rapidly after each peak, and diamonds above 2 carats are extremely rare.
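A distribution plot of this kind could be drawn as follows. This is only a sketch: the 0.01-carat bin width, the frequency polygon, and the axis labels are assumptions, since the report does not say how the figure was produced. A narrow bin width is what makes spikes at particular carat values visible.

```r
library(dplyr)
library(ggplot2)

# Keep the diamonds at or below 2.5 carats (the "remainder")
smaller <- diamonds |>
  filter(carat <= 2.5)

# Frequency polygon with narrow bins (assumed bin width of 0.01 carats)
size_plot <- ggplot(smaller, aes(x = carat)) +
  geom_freqpoly(binwidth = 0.01) +
  labs(x = "Carat", y = "Number of diamonds")

size_plot
```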
This section explores how diamond size varies by cut, color, and clarity.
# Cut
| cut | average_carat | median_carat | largest_carat | number_of_diamonds |
|---|---|---|---|---|
| Fair | 1.05 | 1 | 5.01 | 1610 |
| Good | 0.849 | 0.82 | 3.01 | 4906 |
| Very Good | 0.806 | 0.71 | 4 | 12082 |
| Premium | 0.892 | 0.86 | 4.01 | 13791 |
| Ideal | 0.703 | 0.54 | 3.5 | 21551 |
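The summary shows that diamond size varies by cut. Ideal diamonds are the smallest on average (mean 0.703 carats, median 0.54), while Fair diamonds are the largest (mean 1.05 carats, median 1), and the single largest diamond (5.01 carats) is a Fair cut.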
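A per-cut summary with these columns could be computed with dplyr as sketched below. The column names follow the table, but the code itself is an assumption about how the summary was built, using the `diamonds` dataset from ggplot2.

```r
library(dplyr)
library(ggplot2)  # assumed source of the diamonds dataset

# One row per cut, with size statistics and a count
size_by_cut <- diamonds |>
  group_by(cut) |>
  summarise(
    average_carat = mean(carat),
    median_carat = median(carat),
    largest_carat = max(carat),
    number_of_diamonds = n()
  )

size_by_cut
```

Replacing `cut` in `group_by()` with `color` or `clarity` gives the corresponding summaries for the other two grades.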
# Color
| color | average_carat | median_carat | largest_carat | number_of_diamonds |
|---|---|---|---|---|
| D | 0.658 | 0.53 | 3.4 | 6775 |
| E | 0.658 | 0.53 | 3.05 | 9797 |
| F | 0.737 | 0.7 | 3.01 | 9542 |
| G | 0.771 | 0.7 | 3.01 | 11292 |
| H | 0.912 | 0.9 | 4.13 | 8304 |
| I | 1.03 | 1 | 4.01 | 5422 |
| J | 1.16 | 1.11 | 5.01 | 2808 |
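The summary shows that diamond size rises steadily across color grades, from an average of 0.658 carats for D and E to 1.16 carats for J. This spread is wider than the one across cuts, where the averages range from 0.703 to 1.05 carats.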
# Clarity
| clarity | average_carat | median_carat | largest_carat | number_of_diamonds |
|---|---|---|---|---|
| I1 | 1.28 | 1.12 | 5.01 | 741 |
| SI2 | 1.08 | 1.01 | 3.04 | 9194 |
| SI1 | 0.850 | 0.76 | 2.57 | 13065 |
| VS2 | 0.764 | 0.63 | 3.51 | 12258 |
| VS1 | 0.727 | 0.57 | 2.59 | 8171 |
| VVS2 | 0.596 | 0.44 | 2.07 | 5066 |
| VVS1 | 0.503 | 0.39 | 2.31 | 3655 |
| IF | 0.505 | 0.35 | 2.29 | 1790 |
The summary shows that lower clarity groups include more large diamonds, while higher clarity groups tend to have smaller diamonds. Average size falls from 1.28 carats for I1 to about 0.50 carats for VVS1 and IF.
## Largest 20 Diamonds
| carat | cut | color | clarity | price |
|---|---|---|---|---|
| 5.01 | Fair | J | I1 | 18018 |
| 4.5 | Fair | J | I1 | 18531 |
| 4.13 | Fair | H | I1 | 17329 |
| 4.01 | Premium | I | I1 | 15223 |
| 4.01 | Premium | J | I1 | 15223 |
| 4 | Very Good | I | I1 | 15984 |
| 3.67 | Premium | I | I1 | 16193 |
| 3.65 | Fair | H | I1 | 11668 |
| 3.51 | Premium | J | VS2 | 18701 |
| 3.5 | Ideal | H | I1 | 12587 |
| 3.4 | Fair | D | I1 | 15964 |
| 3.24 | Premium | H | I1 | 12300 |
| 3.22 | Ideal | I | I1 | 12545 |
| 3.11 | Fair | J | I1 | 9823 |
| 3.05 | Premium | E | I1 | 10453 |
| 3.04 | Very Good | I | SI2 | 15354 |
| 3.04 | Premium | I | SI2 | 18559 |
| 3.02 | Fair | I | I1 | 10577 |
| 3.01 | Premium | I | I1 | 8040 |
| 3.01 | Premium | F | I1 | 9925 |
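The 20 largest diamonds in this dataset are all above 3 carats. Most of them (17 of 20) have I1 clarity, but they vary in cut and color, and their prices range from 8,040 to 18,701. The most expensive of the 20 is a 3.51-carat VS2 diamond rather than the 5.01-carat one, which shows that carat size is important but is not the only factor that affects diamond value.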
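One way to pull out the 20 largest diamonds is sketched below. It assumes the `diamonds` dataset from ggplot2 and sorts by carat alone, so the order of diamonds tied at the same carat value is whatever the original row order gives; the report may have handled ties differently.

```r
library(dplyr)
library(ggplot2)  # assumed source of the diamonds dataset

# Sort from largest to smallest, keep the descriptive columns, take the top 20
largest_20 <- diamonds |>
  arrange(desc(carat)) |>
  select(carat, cut, color, clarity, price) |>
  head(20)

largest_20
```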
## 5 Rows of Diamonds
| carat | cut | color | clarity | depth | table | price | x | y | z |
|---|---|---|---|---|---|---|---|---|---|
| 0.23 | Ideal | E | SI2 | 61.5 | 55 | 326 | 3.95 | 3.98 | 2.43 |
| 0.21 | Premium | E | SI1 | 59.8 | 61 | 326 | 3.89 | 3.84 | 2.31 |
| 0.23 | Good | E | VS1 | 56.9 | 65 | 327 | 4.05 | 4.07 | 2.31 |
| 0.29 | Premium | I | VS2 | 62.4 | 58 | 334 | 4.20 | 4.23 | 2.63 |
| 0.31 | Good | J | SI2 | 63.3 | 58 | 335 | 4.34 | 4.35 | 2.75 |
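A table like this can be rendered from R with `knitr::kable()`. The sketch below assumes the rows shown are the first five rows of the `diamonds` dataset from ggplot2 and that knitr is installed; the object name `first_rows` is illustrative.

```r
library(ggplot2)  # assumed source of the diamonds dataset
library(knitr)    # provides kable() for markdown tables

# Take the first five rows and render them as a pipe table
first_rows <- head(diamonds, 5)
kable(first_rows)
```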