Assignment 2: Breast Cancer Data

In this assignment, I used the breast cancer dataset, a freely available dataset that can be downloaded from common data repositories such as Kaggle, GitHub, and UCI. The dataset is used to predict a patient's diagnosis, that is, whether he or she tests positive for the disease. The data is open because it is available free of charge from these sources. I downloaded it from a link on the Kaggle website.

Histogram

## Warning: package 'ggplot2' was built under R version 3.6.1
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
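
The `stat_bin()` message above means that ggplot2 fell back to its default of 30 bins because no bin width was given. The histogram code itself is not shown in this report, so the following is only a minimal sketch of how an explicit `binwidth` could be set. It assumes the plotted variable is `radius_mean`, and it uses a small simulated data frame, `cancer_demo` (a hypothetical stand-in, not the real data), so that it runs on its own.

```r
library(ggplot2)

# Simulated stand-in for the real dataset (hypothetical values, illustration only)
set.seed(1)
cancer_demo <- data.frame(
  radius_mean = c(rnorm(60, mean = 12, sd = 2), rnorm(40, mean = 17, sd = 3)),
  diagnosis   = rep(c("B", "M"), times = c(60, 40))
)

# Passing binwidth explicitly avoids the `stat_bin()` default-bins message
hist_plot <- ggplot(cancer_demo, aes(x = radius_mean, fill = diagnosis)) +
  geom_histogram(binwidth = 1, alpha = 0.7, position = "identity") +
  labs(x = "radius mean", y = "Count", title = "Radius Mean Histogram")

print(hist_plot)
```

A bin width of 1 is an arbitrary choice for the simulated values; with the real data, the width should be picked to suit the range of the variable being plotted.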

Area Density Diagram

ggplot(cancer, aes(x = area_mean, fill = as.factor(diagnosis))) +
  geom_density(alpha = 0.5) +
  labs(x = "area mean", y = "Density", title = "Area Density")

Kable table

In this section, I used the subset() function in R to extract a subset of the dataset. The new subset, named ‘newdata’, contains the 14 observations that satisfy the filtering rule, which keeps only the rows with a radius mean greater than 22. The table below shows the new subset, formatted with kable.
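
The filtering step described above can be sketched roughly as follows. The exact call used in the assignment is not shown, so this is an assumption about its shape. The data frame `cancer_demo` and its four rows are hypothetical stand-ins for the real data; only `subset()`, the `radius_mean > 22` rule, the name `newdata`, and `kable()` come from the text.

```r
library(knitr)

# Small stand-in for the real dataset (hypothetical rows, illustration only)
cancer_demo <- data.frame(
  id          = c(1, 2, 3, 4),
  diagnosis   = c("M", "B", "M", "B"),
  radius_mean = c(25.2, 13.1, 22.5, 11.8)
)

# Keep only the rows whose radius_mean exceeds 22
newdata <- subset(cancer_demo, radius_mean > 22)

# Render the subset as a formatted table
kable(newdata)
```

`subset()` keeps the original row names, which is why the first column of the table below holds row numbers from the full dataset rather than a running count.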

## Warning: package 'kableExtra' was built under R version 3.6.3
row id diagnosis radius_mean texture_mean perimeter_mean area_mean smoothness_mean compactness_mean concavity_mean concave.points_mean symmetry_mean fractal_dimension_mean radius_se texture_se perimeter_se area_se smoothness_se compactness_se concavity_se concave.points_se symmetry_se fractal_dimension_se radius_worst texture_worst perimeter_worst area_worst smoothness_worst compactness_worst concavity_worst concave.points_worst symmetry_worst fractal_dimension_worst X
83 8611555 M 25.22 24.91 171.5 1878 0.10630 0.2665 0.3339 0.18450 0.1829 0.06782 0.8973 1.4740 7.382 120.00 0.008166 0.05693 0.05730 0.02030 0.01065 0.005893 30.00 33.62 211.7 2562 0.1573 0.6076 0.6476 0.2867 0.2355 0.10510 NA
109 86355 M 22.27 19.67 152.8 1509 0.13260 0.2768 0.4264 0.18230 0.2556 0.07039 1.2150 1.5450 10.050 170.00 0.006515 0.08668 0.10400 0.02480 0.03112 0.005037 28.40 28.01 206.8 2360 0.1701 0.6997 0.9608 0.2910 0.4055 0.09789 NA
123 865423 M 24.25 20.20 166.2 1761 0.14470 0.2867 0.4268 0.20120 0.2655 0.06877 1.5090 3.1200 9.807 233.00 0.023330 0.09806 0.12780 0.01822 0.04547 0.009875 26.02 23.99 180.9 2073 0.1696 0.4244 0.5803 0.2248 0.3222 0.08009 NA
165 8712289 M 23.27 22.04 152.1 1686 0.08439 0.1145 0.1324 0.09702 0.1801 0.05553 0.6642 0.8561 4.603 97.85 0.004910 0.02544 0.02822 0.01623 0.01956 0.003740 28.01 28.22 184.2 2403 0.1228 0.3583 0.3948 0.2346 0.3589 0.09187 NA
181 873592 M 27.22 21.87 182.1 2250 0.10940 0.1914 0.2871 0.18780 0.1800 0.05770 0.8361 1.4810 5.820 128.70 0.004631 0.02537 0.03109 0.01241 0.01575 0.002747 33.12 32.85 220.8 3216 0.1472 0.4034 0.5340 0.2688 0.2856 0.08082 NA
203 878796 M 23.29 26.67 158.9 1685 0.11410 0.2084 0.3523 0.16200 0.2200 0.06229 0.5539 1.5600 4.667 83.16 0.009327 0.05121 0.08958 0.02465 0.02175 0.005195 25.12 32.68 177.0 1986 0.1536 0.4167 0.7892 0.2733 0.3198 0.08762 NA
213 8810703 M 28.11 18.47 188.5 2499 0.11420 0.1516 0.3201 0.15950 0.1648 0.05525 2.8730 1.4760 21.980 525.60 0.013450 0.02772 0.06389 0.01407 0.04783 0.004476 28.11 18.47 188.5 2499 0.1142 0.1516 0.3201 0.1595 0.1648 0.05525 NA
237 88299702 M 23.21 26.97 153.5 1670 0.09509 0.1682 0.1950 0.12370 0.1909 0.06309 1.0580 0.9635 7.247 155.80 0.006428 0.02863 0.04497 0.01716 0.01590 0.003053 31.01 34.51 206.0 2944 0.1481 0.4126 0.5820 0.2593 0.3103 0.08677 NA
340 89812 M 23.51 24.27 155.1 1747 0.10690 0.1283 0.2308 0.14100 0.1797 0.05506 1.0090 0.9245 6.462 164.10 0.006292 0.01971 0.03582 0.01301 0.01479 0.003118 30.67 30.73 202.4 2906 0.1515 0.2678 0.4819 0.2089 0.2593 0.07738 NA
353 899987 M 25.73 17.46 174.2 2010 0.11490 0.2363 0.3368 0.19130 0.1956 0.06121 0.9948 0.8509 7.222 153.10 0.006369 0.04243 0.04266 0.01508 0.02335 0.003385 33.13 23.58 229.3 3234 0.1530 0.5937 0.6451 0.2756 0.3690 0.08815 NA
370 9012000 M 22.01 21.90 147.2 1482 0.10630 0.1954 0.2448 0.15010 0.1824 0.06140 1.0080 0.6999 7.561 130.20 0.003978 0.02821 0.03576 0.01471 0.01518 0.003796 27.66 25.80 195.0 2227 0.1294 0.3885 0.4756 0.2432 0.2741 0.08574 NA
462 911296202 M 27.42 26.27 186.9 2501 0.10840 0.1988 0.3635 0.16890 0.2061 0.05623 2.5470 1.3060 18.650 542.20 0.007650 0.05374 0.08055 0.02598 0.01697 0.004558 36.04 31.37 251.2 4254 0.1357 0.4256 0.6833 0.2625 0.2641 0.07427 NA
504 915143 M 23.09 19.83 152.1 1682 0.09342 0.1275 0.1676 0.10030 0.1505 0.05484 1.2910 0.7452 9.635 180.20 0.005753 0.03356 0.03976 0.02156 0.02201 0.002897 30.79 23.87 211.5 2782 0.1199 0.3625 0.3794 0.2264 0.2908 0.07277 NA
522 91762702 M 24.63 21.60 165.5 1841 0.10300 0.2106 0.2310 0.14710 0.1991 0.06739 0.9915 0.9004 7.050 139.90 0.004989 0.03212 0.03571 0.01597 0.01879 0.004760 29.92 26.93 205.7 2642 0.1342 0.4188 0.4658 0.2475 0.3157 0.09671 NA

Observations

While completing this assignment, I came across several interesting things. I learned how to apply the code folding technique to hide or show the code in an R Markdown file. I also used the ggplot function to plot two diagrams that help interpret the data. Finally, I applied my R programming skills to subset the dataset and produce a formatted table, as the exercise required. This was the only task I found difficult, since I had to try several subsets before finding the most suitable one.