PART 1: SUMMARY DATA: COLOR AND CLARITY
# A tibble: 56 × 5
# Groups:   color [7]
   color clarity     N    freq   pct
   <ord> <ord>   <int>   <dbl> <dbl>
 1 D     I1         42 0.00620     1
 2 D     SI2      1370 0.202      20
 3 D     SI1      2083 0.307      31
 4 D     VS2      1697 0.250      25
 5 D     VS1       705 0.104      10
 6 D     VVS2      553 0.0816      8
 7 D     VVS1      252 0.0372      4
 8 D     IF         73 0.0108      1
 9 E     I1        102 0.0104      1
10 E     SI2      1713 0.175      17
# ℹ 46 more rows
Yes, the data makes sense. The table is organized by each combination of color and clarity, and N is the number of diamonds that fall into that combination. The freq and pct columns give each clarity grade's share of the diamonds within its color group, as a proportion and as a rounded percentage. Within each color category, relatively few diamonds have the lowest or the highest clarity, and most fall in the middle grades. This makes sense, as there will probably be more average diamonds than very good or very poor ones.
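The summary described above can be sketched as follows. This is a minimal reconstruction, not the original chunk: it assumes the data is the `diamonds` dataset shipped with ggplot2 (the counts in the table are consistent with it), and the object name `color_clarity` is hypothetical.

```r
library(dplyr)
library(ggplot2)  # assumed source of the diamonds dataset

# Count diamonds for every color/clarity combination, then compute
# each clarity grade's share within its color group.
color_clarity <- diamonds %>%
  group_by(color, clarity) %>%
  summarise(N = n(), .groups = "drop_last") %>%  # result stays grouped by color
  mutate(
    freq = N / sum(N),          # proportion within the color group
    pct  = round(freq * 100, 0) # same share as a whole-number percentage
  )

print(color_clarity)
```

Because the result is still grouped by color after `summarise()`, `sum(N)` in the `mutate()` step is the total for that color rather than for the whole dataset, which is why freq sums to 1 within each color.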
PART 2: STACKED AND DODGED BAR CHARTS
PART 3: PRACTICE USING PIPES (dplyr) TO SUMMARIZE DATA: LENGTH, WIDTH AND CLARITY
# A tibble: 8 × 6
  clarity     N length_mean width_mean   freq   pct
  <ord>   <int>       <dbl>      <dbl>  <dbl> <dbl>
1 I1        741        6.76       6.71 0.0137     1
2 SI2      9194        6.40       6.40 0.170     17
3 SI1     13065        5.89       5.89 0.242     24
4 VS2     12258        5.66       5.66 0.227     23
5 VS1      8171        5.57       5.58 0.151     15
6 VVS2     5066        5.22       5.23 0.0939     9
7 VVS1     3655        4.96       4.98 0.0678     7
8 IF       1790        4.97       4.99 0.0332     3
Yes, the data makes sense. The table displays the mean length and mean width of the diamonds in each clarity category, along with the count and share of diamonds in that category. Most of the diamonds fall into the lower to medium clarity categories (SI2 through VS1). Within each clarity category the mean length and mean width are nearly identical. It looks like the better the clarity, the smaller the diamonds: mean length falls from 6.76 for I1 to about 4.97 for VVS1 and IF. This could be because those who buy diamonds prefer good clarity over size. Another reason could be that the intensive polishing process reduces the size of the diamond.
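A pipe that produces a table of this shape might look like the sketch below. It assumes the ggplot2 `diamonds` dataset and assumes that length and width correspond to its `x` and `y` columns (both in mm). The object name `clarity_size` is hypothetical.

```r
library(dplyr)
library(ggplot2)  # assumed source of the diamonds dataset

# One row per clarity grade: count, mean length, mean width,
# and the grade's share of all diamonds.
clarity_size <- diamonds %>%
  group_by(clarity) %>%
  summarise(
    N           = n(),
    length_mean = mean(x),  # assumption: x is the length column
    width_mean  = mean(y)   # assumption: y is the width column
  ) %>%
  mutate(
    freq = N / sum(N),          # proportion of the whole dataset
    pct  = round(freq * 100, 0) # whole-number percentage
  )

print(clarity_size)
```

Here `summarise()` drops the only grouping variable, so freq and pct are shares of the whole dataset rather than shares within a group.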
PART 4: SCATTERPLOT WITH THE AVERAGE LENGTH AND WIDTH AND CLARITY
PART 5: LEGEND AND GUIDES
PART 6: DATA LABELS VS LEGEND
PART 7: INTERPRETATION
It looks like the better the clarity, the smaller the diamonds. It also appears that the diamond sizes are clustered closely together. This could be because those who buy diamonds prefer good clarity over size. Another reason could be that the intensive polishing process reduces the size of the diamond.