Assignment

Assignment 2: Breast Cancer Data

In this assignment, I used the breast cancer dataset. It is a freely available dataset that can be downloaded from common data repository sites such as Kaggle, GitHub and UCI. The dataset is used to predict a patient's diagnosis, that is, whether he or she tests positive for the disease. The data is open because anyone can obtain it from these sources at no cost. I downloaded my copy from a link on the Kaggle website.

Histogram

## Warning: package 'ggplot2' was built under R version 3.6.1

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
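The `stat_bin()` message above appears when geom_histogram() is called without a bin setting, so ggplot2 falls back to 30 bins. The following is a minimal sketch of how an explicit binwidth could be set. The report does not show which variable the histogram used, so radius_mean, the binwidth of 1, the colours, and the simulated toy data are all assumptions for illustration.

```r
# Toy stand-in for the real data: simulated values, NOT the actual dataset.
set.seed(1)
cancer <- data.frame(radius_mean = rnorm(100, mean = 14, sd = 3.5))

# Draw the plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # An explicit binwidth replaces the default of 30 bins and
  # silences the stat_bin() message.
  p <- ggplot(cancer, aes(x = radius_mean)) +
    geom_histogram(binwidth = 1, fill = "steelblue", colour = "white") +
    labs(x = "Radius mean", y = "Count", title = "Radius Mean Histogram")
  print(p)
}
```

A smaller binwidth shows more detail but makes the plot noisier; a value near the spread of the data divided by 20 to 30 is a common starting point.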

Area Density Diagram

library(ggplot2)
# Overlaid density curves of area_mean, one per diagnosis class
ggplot(cancer, aes(x = area_mean, fill = as.factor(diagnosis))) +
  geom_density(alpha = 0.5) +
  labs(x = "area mean", y = "Density", title = "Area Density")

Kable table

In this section, I used R's subset() function to extract a subset of the dataset. The new subset, named ‘newdata’, contains the 14 observations that satisfy the filter rule, which keeps only rows with a radius mean (radius_mean) greater than 22. The table of the new subset shown below was produced using kable.
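The filtering step described above can be sketched as follows. This is a minimal illustration, not the exact code behind the report: the small data frame is a toy stand-in whose ids and values are made up, and the caption text is an assumption. Only the column names, the name newdata, and the radius_mean > 22 rule come from the report.

```r
# Toy stand-in for the breast cancer data (illustrative values only).
cancer <- data.frame(
  id          = c(1, 2, 3, 4),
  diagnosis   = c("M", "B", "M", "B"),
  radius_mean = c(25.22, 13.54, 22.27, 9.50)
)

# Keep only the rows whose radius_mean exceeds 22.
newdata <- subset(cancer, radius_mean > 22)

# Render a formatted table when knitr is installed;
# otherwise fall back to the plain data frame print.
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(newdata, caption = "Observations with radius_mean > 22"))
} else {
  print(newdata)
}
```

subset() evaluates the condition inside the data frame, so column names can be written without the cancer$ prefix. Changing the threshold in the condition is enough to try a different subset.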

## Warning: package 'kableExtra' was built under R version 3.6.3

	id	diagnosis	radius_mean	texture_mean	perimeter_mean	area_mean	smoothness_mean	compactness_mean	concavity_mean	concave.points_mean	symmetry_mean	fractal_dimension_mean	radius_se	texture_se	perimeter_se	area_se	smoothness_se	compactness_se	concavity_se	concave.points_se	symmetry_se	fractal_dimension_se	radius_worst	texture_worst	perimeter_worst	area_worst	smoothness_worst	compactness_worst	concavity_worst	concave.points_worst	symmetry_worst	fractal_dimension_worst	X
83	8611555	M	25.22	24.91	171.5	1878	0.10630	0.2665	0.3339	0.18450	0.1829	0.06782	0.8973	1.4740	7.382	120.00	0.008166	0.05693	0.05730	0.02030	0.01065	0.005893	30.00	33.62	211.7	2562	0.1573	0.6076	0.6476	0.2867	0.2355	0.10510	NA
109	86355	M	22.27	19.67	152.8	1509	0.13260	0.2768	0.4264	0.18230	0.2556	0.07039	1.2150	1.5450	10.050	170.00	0.006515	0.08668	0.10400	0.02480	0.03112	0.005037	28.40	28.01	206.8	2360	0.1701	0.6997	0.9608	0.2910	0.4055	0.09789	NA
123	865423	M	24.25	20.20	166.2	1761	0.14470	0.2867	0.4268	0.20120	0.2655	0.06877	1.5090	3.1200	9.807	233.00	0.023330	0.09806	0.12780	0.01822	0.04547	0.009875	26.02	23.99	180.9	2073	0.1696	0.4244	0.5803	0.2248	0.3222	0.08009	NA
165	8712289	M	23.27	22.04	152.1	1686	0.08439	0.1145	0.1324	0.09702	0.1801	0.05553	0.6642	0.8561	4.603	97.85	0.004910	0.02544	0.02822	0.01623	0.01956	0.003740	28.01	28.22	184.2	2403	0.1228	0.3583	0.3948	0.2346	0.3589	0.09187	NA
181	873592	M	27.22	21.87	182.1	2250	0.10940	0.1914	0.2871	0.18780	0.1800	0.05770	0.8361	1.4810	5.820	128.70	0.004631	0.02537	0.03109	0.01241	0.01575	0.002747	33.12	32.85	220.8	3216	0.1472	0.4034	0.5340	0.2688	0.2856	0.08082	NA
203	878796	M	23.29	26.67	158.9	1685	0.11410	0.2084	0.3523	0.16200	0.2200	0.06229	0.5539	1.5600	4.667	83.16	0.009327	0.05121	0.08958	0.02465	0.02175	0.005195	25.12	32.68	177.0	1986	0.1536	0.4167	0.7892	0.2733	0.3198	0.08762	NA
213	8810703	M	28.11	18.47	188.5	2499	0.11420	0.1516	0.3201	0.15950	0.1648	0.05525	2.8730	1.4760	21.980	525.60	0.013450	0.02772	0.06389	0.01407	0.04783	0.004476	28.11	18.47	188.5	2499	0.1142	0.1516	0.3201	0.1595	0.1648	0.05525	NA
237	88299702	M	23.21	26.97	153.5	1670	0.09509	0.1682	0.1950	0.12370	0.1909	0.06309	1.0580	0.9635	7.247	155.80	0.006428	0.02863	0.04497	0.01716	0.01590	0.003053	31.01	34.51	206.0	2944	0.1481	0.4126	0.5820	0.2593	0.3103	0.08677	NA
340	89812	M	23.51	24.27	155.1	1747	0.10690	0.1283	0.2308	0.14100	0.1797	0.05506	1.0090	0.9245	6.462	164.10	0.006292	0.01971	0.03582	0.01301	0.01479	0.003118	30.67	30.73	202.4	2906	0.1515	0.2678	0.4819	0.2089	0.2593	0.07738	NA
353	899987	M	25.73	17.46	174.2	2010	0.11490	0.2363	0.3368	0.19130	0.1956	0.06121	0.9948	0.8509	7.222	153.10	0.006369	0.04243	0.04266	0.01508	0.02335	0.003385	33.13	23.58	229.3	3234	0.1530	0.5937	0.6451	0.2756	0.3690	0.08815	NA
370	9012000	M	22.01	21.90	147.2	1482	0.10630	0.1954	0.2448	0.15010	0.1824	0.06140	1.0080	0.6999	7.561	130.20	0.003978	0.02821	0.03576	0.01471	0.01518	0.003796	27.66	25.80	195.0	2227	0.1294	0.3885	0.4756	0.2432	0.2741	0.08574	NA
462	911296202	M	27.42	26.27	186.9	2501	0.10840	0.1988	0.3635	0.16890	0.2061	0.05623	2.5470	1.3060	18.650	542.20	0.007650	0.05374	0.08055	0.02598	0.01697	0.004558	36.04	31.37	251.2	4254	0.1357	0.4256	0.6833	0.2625	0.2641	0.07427	NA
504	915143	M	23.09	19.83	152.1	1682	0.09342	0.1275	0.1676	0.10030	0.1505	0.05484	1.2910	0.7452	9.635	180.20	0.005753	0.03356	0.03976	0.02156	0.02201	0.002897	30.79	23.87	211.5	2782	0.1199	0.3625	0.3794	0.2264	0.2908	0.07277	NA
522	91762702	M	24.63	21.60	165.5	1841	0.10300	0.2106	0.2310	0.14710	0.1991	0.06739	0.9915	0.9004	7.050	139.90	0.004989	0.03212	0.03571	0.01597	0.01879	0.004760	29.92	26.93	205.7	2642	0.1342	0.4188	0.4658	0.2475	0.3157	0.09671	NA

Observations

Completing this assignment taught me several things. I learned how to apply code folding to hide or show the code in an R Markdown file. I used the ggplot function to draw two plots that help interpret the data. I also used my R programming skills to subset the dataset and present the result as a formatted table, as the exercise required. This was the only task I found difficult, since I had to try several subsets before finding the most suitable one.