In this assignment, I used the breast cancer dataset, a freely available dataset that can be downloaded from common data repositories such as Kaggle, GitHub, and UCI. The dataset is used to predict a patient's diagnosis, that is, whether he or she is positive for the disease. The data is open because it is available free of charge from these sources. I downloaded it from a link on the Kaggle website.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(cancer, aes(x=area_mean, fill = as.factor(diagnosis))) + geom_density(alpha = 0.5) + labs(x="area mean", y="Density", title = "Area Density")
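The `stat_bin()` message above appears when a histogram is drawn with the default of 30 bins. A minimal sketch of how the bin width could be set explicitly is given here; the data frame is a simulated stand-in for `cancer` (its values are made up for illustration), and both the choice of `area_mean` as the histogram variable and the bin width of 100 are assumptions, not the settings used in the assignment.

```r
library(ggplot2)

# Simulated stand-in for the assignment's `cancer` data frame.
# These values are invented for illustration only.
set.seed(1)
cancer <- data.frame(
  area_mean = c(rnorm(50, mean = 460, sd = 130),
                rnorm(50, mean = 980, sd = 370)),
  diagnosis = rep(c("B", "M"), each = 50)
)

# Passing binwidth explicitly replaces the default of 30 bins,
# so ggplot2 no longer prints the `stat_bin()` message.
p <- ggplot(cancer, aes(x = area_mean, fill = as.factor(diagnosis))) +
  geom_histogram(binwidth = 100, alpha = 0.5, position = "identity") +
  labs(x = "area mean", y = "Count", fill = "diagnosis",
       title = "Area Histogram")
print(p)
```

A suitable bin width depends on the spread of the variable, so it is worth trying a few values and comparing the resulting plots.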
In this section, I used the subset() function in R to extract a subset of the dataset. The new subset, named ‘newdata’, contains the 14 observations that satisfy the rule applied, which selects records with a radius mean greater than 22. The table of the new subset shown below was produced with kable.
|  | id | diagnosis | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave.points_mean | symmetry_mean | fractal_dimension_mean | radius_se | texture_se | perimeter_se | area_se | smoothness_se | compactness_se | concavity_se | concave.points_se | symmetry_se | fractal_dimension_se | radius_worst | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | concave.points_worst | symmetry_worst | fractal_dimension_worst | X |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 83 | 8611555 | M | 25.22 | 24.91 | 171.5 | 1878 | 0.10630 | 0.2665 | 0.3339 | 0.18450 | 0.1829 | 0.06782 | 0.8973 | 1.4740 | 7.382 | 120.00 | 0.008166 | 0.05693 | 0.05730 | 0.02030 | 0.01065 | 0.005893 | 30.00 | 33.62 | 211.7 | 2562 | 0.1573 | 0.6076 | 0.6476 | 0.2867 | 0.2355 | 0.10510 | NA |
| 109 | 86355 | M | 22.27 | 19.67 | 152.8 | 1509 | 0.13260 | 0.2768 | 0.4264 | 0.18230 | 0.2556 | 0.07039 | 1.2150 | 1.5450 | 10.050 | 170.00 | 0.006515 | 0.08668 | 0.10400 | 0.02480 | 0.03112 | 0.005037 | 28.40 | 28.01 | 206.8 | 2360 | 0.1701 | 0.6997 | 0.9608 | 0.2910 | 0.4055 | 0.09789 | NA |
| 123 | 865423 | M | 24.25 | 20.20 | 166.2 | 1761 | 0.14470 | 0.2867 | 0.4268 | 0.20120 | 0.2655 | 0.06877 | 1.5090 | 3.1200 | 9.807 | 233.00 | 0.023330 | 0.09806 | 0.12780 | 0.01822 | 0.04547 | 0.009875 | 26.02 | 23.99 | 180.9 | 2073 | 0.1696 | 0.4244 | 0.5803 | 0.2248 | 0.3222 | 0.08009 | NA |
| 165 | 8712289 | M | 23.27 | 22.04 | 152.1 | 1686 | 0.08439 | 0.1145 | 0.1324 | 0.09702 | 0.1801 | 0.05553 | 0.6642 | 0.8561 | 4.603 | 97.85 | 0.004910 | 0.02544 | 0.02822 | 0.01623 | 0.01956 | 0.003740 | 28.01 | 28.22 | 184.2 | 2403 | 0.1228 | 0.3583 | 0.3948 | 0.2346 | 0.3589 | 0.09187 | NA |
| 181 | 873592 | M | 27.22 | 21.87 | 182.1 | 2250 | 0.10940 | 0.1914 | 0.2871 | 0.18780 | 0.1800 | 0.05770 | 0.8361 | 1.4810 | 5.820 | 128.70 | 0.004631 | 0.02537 | 0.03109 | 0.01241 | 0.01575 | 0.002747 | 33.12 | 32.85 | 220.8 | 3216 | 0.1472 | 0.4034 | 0.5340 | 0.2688 | 0.2856 | 0.08082 | NA |
| 203 | 878796 | M | 23.29 | 26.67 | 158.9 | 1685 | 0.11410 | 0.2084 | 0.3523 | 0.16200 | 0.2200 | 0.06229 | 0.5539 | 1.5600 | 4.667 | 83.16 | 0.009327 | 0.05121 | 0.08958 | 0.02465 | 0.02175 | 0.005195 | 25.12 | 32.68 | 177.0 | 1986 | 0.1536 | 0.4167 | 0.7892 | 0.2733 | 0.3198 | 0.08762 | NA |
| 213 | 8810703 | M | 28.11 | 18.47 | 188.5 | 2499 | 0.11420 | 0.1516 | 0.3201 | 0.15950 | 0.1648 | 0.05525 | 2.8730 | 1.4760 | 21.980 | 525.60 | 0.013450 | 0.02772 | 0.06389 | 0.01407 | 0.04783 | 0.004476 | 28.11 | 18.47 | 188.5 | 2499 | 0.1142 | 0.1516 | 0.3201 | 0.1595 | 0.1648 | 0.05525 | NA |
| 237 | 88299702 | M | 23.21 | 26.97 | 153.5 | 1670 | 0.09509 | 0.1682 | 0.1950 | 0.12370 | 0.1909 | 0.06309 | 1.0580 | 0.9635 | 7.247 | 155.80 | 0.006428 | 0.02863 | 0.04497 | 0.01716 | 0.01590 | 0.003053 | 31.01 | 34.51 | 206.0 | 2944 | 0.1481 | 0.4126 | 0.5820 | 0.2593 | 0.3103 | 0.08677 | NA |
| 340 | 89812 | M | 23.51 | 24.27 | 155.1 | 1747 | 0.10690 | 0.1283 | 0.2308 | 0.14100 | 0.1797 | 0.05506 | 1.0090 | 0.9245 | 6.462 | 164.10 | 0.006292 | 0.01971 | 0.03582 | 0.01301 | 0.01479 | 0.003118 | 30.67 | 30.73 | 202.4 | 2906 | 0.1515 | 0.2678 | 0.4819 | 0.2089 | 0.2593 | 0.07738 | NA |
| 353 | 899987 | M | 25.73 | 17.46 | 174.2 | 2010 | 0.11490 | 0.2363 | 0.3368 | 0.19130 | 0.1956 | 0.06121 | 0.9948 | 0.8509 | 7.222 | 153.10 | 0.006369 | 0.04243 | 0.04266 | 0.01508 | 0.02335 | 0.003385 | 33.13 | 23.58 | 229.3 | 3234 | 0.1530 | 0.5937 | 0.6451 | 0.2756 | 0.3690 | 0.08815 | NA |
| 370 | 9012000 | M | 22.01 | 21.90 | 147.2 | 1482 | 0.10630 | 0.1954 | 0.2448 | 0.15010 | 0.1824 | 0.06140 | 1.0080 | 0.6999 | 7.561 | 130.20 | 0.003978 | 0.02821 | 0.03576 | 0.01471 | 0.01518 | 0.003796 | 27.66 | 25.80 | 195.0 | 2227 | 0.1294 | 0.3885 | 0.4756 | 0.2432 | 0.2741 | 0.08574 | NA |
| 462 | 911296202 | M | 27.42 | 26.27 | 186.9 | 2501 | 0.10840 | 0.1988 | 0.3635 | 0.16890 | 0.2061 | 0.05623 | 2.5470 | 1.3060 | 18.650 | 542.20 | 0.007650 | 0.05374 | 0.08055 | 0.02598 | 0.01697 | 0.004558 | 36.04 | 31.37 | 251.2 | 4254 | 0.1357 | 0.4256 | 0.6833 | 0.2625 | 0.2641 | 0.07427 | NA |
| 504 | 915143 | M | 23.09 | 19.83 | 152.1 | 1682 | 0.09342 | 0.1275 | 0.1676 | 0.10030 | 0.1505 | 0.05484 | 1.2910 | 0.7452 | 9.635 | 180.20 | 0.005753 | 0.03356 | 0.03976 | 0.02156 | 0.02201 | 0.002897 | 30.79 | 23.87 | 211.5 | 2782 | 0.1199 | 0.3625 | 0.3794 | 0.2264 | 0.2908 | 0.07277 | NA |
| 522 | 91762702 | M | 24.63 | 21.60 | 165.5 | 1841 | 0.10300 | 0.2106 | 0.2310 | 0.14710 | 0.1991 | 0.06739 | 0.9915 | 0.9004 | 7.050 | 139.90 | 0.004989 | 0.03212 | 0.03571 | 0.01597 | 0.01879 | 0.004760 | 29.92 | 26.93 | 205.7 | 2642 | 0.1342 | 0.4188 | 0.4658 | 0.2475 | 0.3157 | 0.09671 | NA |
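The subsetting step described above can be sketched as follows. This is a minimal illustration, not the exact code from the assignment: the data frame is a four-row, four-column stand-in for `cancer`, where the first two rows copy values from the table above and the last two rows (including their ids) are made up so that the filter has something to exclude.

```r
# Miniature stand-in for the `cancer` data frame.
# Rows 1-2 copy values from the table above; rows 3-4 are hypothetical.
cancer <- data.frame(
  id          = c(8611555, 86355, 1001, 1002),
  diagnosis   = c("M", "M", "B", "B"),
  radius_mean = c(25.22, 22.27, 12.50, 14.10),
  area_mean   = c(1878, 1509, 480, 610)
)

# Keep only the observations whose radius mean exceeds 22.
newdata <- subset(cancer, radius_mean > 22)

# Number of rows that satisfy the rule (2 in this toy example).
nrow(newdata)

# Render the subset as a table if knitr is available.
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(newdata))
}
```

Note that subset() keeps the original row names, which is why the first column of the table above shows the positions of the selected rows in the full dataset rather than 1 to 14.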
While completing this assignment, I came across several interesting things. I learned how to apply code folding to hide or show the code in an R Markdown file. I also used the ggplot function to produce two plots that help interpret the data. Finally, I applied my R programming skills to subset the dataset and present the result as a formatted table, as required in the exercise. This was the only task I found difficult, since I had to try several subsets before finding the most suitable one.