R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document will be generated that includes both the content and the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)

##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Including Plots

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

# For example, in the first row:
# Area: 15.26
# Perimeter: 14.84
# Compactness: 0.871
# Length of Kernel: 5.763
# Width of Kernel: 3.312
# Asymmetry Coefficient: 2.221
# Length of Kernel Groove: 5.22
# Type of Wheat, i.e. kama, rosa, canadian

seeds_dataset <- read.delim("~/seeds_dataset.txt", header=FALSE)
View(seeds_dataset)

Scale the dataset

scaled_data <- scale(seeds_dataset)

Remove rows with missing values

seeds_dataset <- na.omit(seeds_dataset)
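As a minimal sketch of what na.omit() does here, the toy data frame below (hypothetical values, not read from the seeds file) has one row with a missing entry, and na.omit() drops that entire row.

```r
# Toy data frame (hypothetical values) with one missing entry in row 2
toy <- data.frame(V1 = c(15.26, 14.88, 14.29),
                  V2 = c(14.84, NA, 14.09))

# na.omit() removes every row that contains at least one NA
toy_clean <- na.omit(toy)

nrow(toy)        # 3 rows before cleaning
nrow(toy_clean)  # 2 rows after cleaning
```

Because whole rows are removed, it can be worth comparing nrow() before and after to see how many observations were lost.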

View the cleaned dataset

View(seeds_dataset)

Summary of the cleaned dataset

summary(seeds_dataset)

Scale the cleaned dataset

scaled_data <- scale(seeds_dataset)
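To illustrate why scaling matters before computing distances, the sketch below applies scale() to a small made-up matrix (the column names and values are assumptions for illustration only). Each column ends up with mean 0 and standard deviation 1, so no variable dominates because of its units.

```r
# Toy matrix (hypothetical values): two columns on very different scales
x <- cbind(area = c(10, 15, 20), compactness = c(0.80, 0.85, 0.90))

# scale() centers each column to mean 0 and rescales it to standard deviation 1
xs <- scale(x)

colMeans(xs)      # both approximately 0
apply(xs, 2, sd)  # both approximately 1
```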

Calculate the distance matrix

de <- dist(scale(seeds_dataset[-8])) # Standardizing; the class label V8 is left out so it does not influence the clusters
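As a small sanity check on what dist() returns, the sketch below uses two made-up points (hypothetical values). By default dist() computes the Euclidean distance between rows.

```r
# Two toy observations (hypothetical values)
p <- rbind(a = c(0, 0), b = c(3, 4))

# dist() defaults to the Euclidean distance between rows
d_toy <- dist(p)
as.numeric(d_toy)  # 5, i.e. sqrt(3^2 + 4^2)
```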

Hierarchical clustering of the dataset

he <- hclust(de)

Plot the dendrogram

plot(he)
plot(he, hang=-0.1, labels=seeds_dataset[['V8']], cex=0.5) # cex to decrease the font size

Cut the tree into 3 clusters, as there are three categories of wheat seed

clus3e <- cutree(he, 3)
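The clustering and cutting steps can be sketched on a tiny made-up data set (hypothetical values chosen so the groups are obvious). This is only an illustration of hclust() followed by cutree(), not a rerun of the seeds analysis.

```r
# Toy data (hypothetical values): three well-separated groups on one variable
toy <- matrix(c(1, 1.1, 1.2, 5, 5.1, 5.2, 9, 9.1, 9.2), ncol = 1)

h_toy <- hclust(dist(toy))      # complete linkage is the hclust() default
groups <- cutree(h_toy, k = 3)  # cut the tree into 3 clusters

table(groups)  # three clusters of three observations each
```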

Create confusion matrix

cm <- table(clus3e, seeds_dataset$V8)
cm

Error <- 100 * (1 - sum(diag(cm)) / sum(cm))
Error
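One caveat with the diagonal-based error is that cutree() numbers clusters arbitrarily, so cluster 1 need not correspond to wheat type 1. The sketch below uses a made-up confusion matrix (hypothetical counts) to compare the diagonal formula with a majority-class variant, which is a swapped-in alternative and not part of the original analysis.

```r
# Toy confusion matrix (hypothetical counts): rows are clusters, columns are true classes.
# Cluster 1 mostly matches class 2 and cluster 2 mostly matches class 1.
cm_toy <- matrix(c( 2, 18,  1,
                   17,  1,  2,
                    1,  1, 17),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(cluster = 1:3, class = 1:3))

# Diagonal-based error: assumes cluster i corresponds to class i
err_diag <- 100 * (1 - sum(diag(cm_toy)) / sum(cm_toy))

# Majority-class error: each cluster is credited with its most frequent class
err_major <- 100 * (1 - sum(apply(cm_toy, 1, max)) / sum(cm_toy))

err_diag   # about 66.7, inflated because the cluster numbers are permuted
err_major  # about 13.3, the permutation no longer matters
```

The majority-class variant can assign two clusters to the same class, so it is best read as an optimistic bound and checked against the printed table.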

library(cluster)
sil <- silhouette(clus3e, de)
plot(sil) # Plot the silhouettes; a silhouette coefficient close to 1 indicates a better fit than lower values
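To give a feel for the silhouette scale, the sketch below computes silhouette widths for a made-up data set with two tight, well-separated groups (hypothetical values). It assumes the cluster package is installed, as in the code above.

```r
library(cluster)  # provides silhouette()

# Toy data (hypothetical values): two tight, well-separated groups
toy <- matrix(c(1, 1.1, 1.2, 9, 9.1, 9.2), ncol = 1)
d_toy <- dist(toy)
cl_toy <- cutree(hclust(d_toy), k = 2)

# One row per observation: assigned cluster, nearest other cluster, silhouette width
sil_toy <- silhouette(cl_toy, d_toy)
mean(sil_toy[, "sil_width"])  # close to 1 for well-separated clusters
```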

set.seed(1234)
d <- dist(scale(seeds_dataset[-8])) # drop the class label V8 (column 8) before clustering
methds <- c('complete', 'single', 'average')
avgS <- matrix(NA, ncol=3, nrow=5, dimnames=list(2:6, methds))

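for (k in 2:6) {
  for (m in seq_along(methds)) {
    h <- hclust(d, method=methds[m]) # cluster with the m-th linkage method
    cl <- cutree(h, k)               # cut the tree into k clusters
    s <- silhouette(cl, d)           # silhouette widths for this solution
    avgS[k-1, m] <- mean(s[, 3])     # average silhouette width
  }
}
avgS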
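Once a matrix like avgS is filled in, the best combination of cluster count and linkage method is the cell with the largest average silhouette width. The sketch below shows one way to look it up, using a made-up matrix (hypothetical values and a reduced range of k) with the same layout as avgS.

```r
# Toy matrix (hypothetical values) with the same layout as avgS:
# rows are the number of clusters k, columns are linkage methods
avg_toy <- matrix(c(0.40, 0.35, 0.42,
                    0.45, 0.20, 0.47,
                    0.38, 0.15, 0.41),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(2:4, c("complete", "single", "average")))

# Locate the cell holding the largest average silhouette width
best <- which(avg_toy == max(avg_toy), arr.ind = TRUE)
best_k <- rownames(avg_toy)[best[1, "row"]]
best_method <- colnames(avg_toy)[best[1, "col"]]

best_k       # "3"
best_method  # "average"
```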

Seeds_dataset Hierarchical Clustering

Tooba Maryam

2024-02-24