Distribution of human gene lengths

First look: boxplot

There are some major outliers with respect to gene length.
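
A minimal sketch of this first look, assuming simulated data rather than the real gene lengths. The vector sim.length and its log-normal parameters are hypothetical stand-ins chosen only to give a right-skewed distribution. The block draws a boxplot and counts how many points fall beyond the whiskers.

```r
# Hypothetical stand-in for the gene length data: simulated, right-skewed values.
set.seed(1)
sim.length <- round(rlnorm(20000, meanlog = 7.7, sdlog = 0.6))

# Boxplot of all values; points drawn beyond the whiskers are flagged as outliers.
boxplot(sim.length, horizontal = TRUE, xlab = "Gene length")

# boxplot.stats() returns the flagged points in its $out component.
n.outliers <- length(boxplot.stats(sim.length)$out)
n.outliers
```

With strongly right-skewed data, the flagged points sit almost entirely in the upper tail, which is why the box itself looks squashed against the left edge of the plot.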

Distribution of human gene lengths

Second look: without outliers


Compare mean to median
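
The comparison can be sketched as follows, again on simulated data. Both sim.length and the rule used to drop outliers (anything beyond the boxplot whiskers) are assumptions made for illustration; the slides do not state how the outliers were removed.

```r
# Hypothetical stand-in for the gene length data (simulated, right-skewed).
set.seed(1)
sim.length <- round(rlnorm(20000, meanlog = 7.7, sdlog = 0.6))

# Mean and median of the full data set.
mean.all <- mean(sim.length)
median.all <- median(sim.length)

# Assumed outlier rule: keep only values between the boxplot whisker ends.
whiskers <- boxplot.stats(sim.length)$stats[c(1, 5)]
trimmed <- sim.length[sim.length >= whiskers[1] & sim.length <= whiskers[2]]
mean.trimmed <- mean(trimmed)
median.trimmed <- median(trimmed)

c(mean = mean.all, median = median.all)
c(mean = mean.trimmed, median = median.trimmed)
```

The mean is pulled toward the long upper tail, so it moves much more than the median when the largest values are dropped.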

Distribution of human gene lengths

3rd look: log of gene length

Taking the log of gene length makes the distribution look more “normal”. This will be discussed further in future weeks.
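
A rough sketch of the effect of the log transform, using simulated values in place of the real data. The vector sim.length is hypothetical, and base 10 is an arbitrary choice here since the slide does not say which base was used.

```r
# Hypothetical stand-in for the gene length data (simulated, right-skewed).
set.seed(1)
sim.length <- round(rlnorm(20000, meanlog = 7.7, sdlog = 0.6))
log.length <- log10(sim.length)

# Side-by-side histograms: raw scale on the left, log scale on the right.
par(mfrow = c(1, 2))
hist(sim.length, breaks = 50, main = "Raw", xlab = "Gene length")
hist(log.length, breaks = 50, main = "Log scale", xlab = "log10(gene length)")
par(mfrow = c(1, 1))

# Crude asymmetry check: gap between mean and median in units of the SD.
gap.raw <- (mean(sim.length) - median(sim.length)) / sd(sim.length)
gap.log <- (mean(log.length) - median(log.length)) / sd(log.length)
c(raw = gap.raw, log = gap.log)
```

A gap close to zero on the log scale means the mean and median nearly coincide, as they would for a symmetric distribution.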

Random sample of 100 genes
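
Drawing a single random sample can be sketched like this. The population vector sim.length is simulated and hypothetical; with the real data, gene.length2 would take its place.

```r
# Hypothetical stand-in for the population of gene lengths.
set.seed(1)
sim.length <- round(rlnorm(20000, meanlog = 7.7, sdlog = 0.6))

# Draw 100 genes at random, without replacement.
one.sample <- sample(sim.length, size = 100, replace = FALSE)

# The sample mean estimates the population mean but will not match it exactly.
sample.mean <- mean(one.sample)
population.mean <- mean(sim.length)
c(sample = sample.mean, population = population.mean)
```

Rerunning the sample() call without resetting the seed gives a different sample and a different sample mean each time, which is the variation the sampling distribution describes.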

Random sample of ten

# Take 10,000 random samples of 100 genes each and store each sample mean
results100 <- numeric(10000)
for (i in 1:10000) {
  temporarySample <- sample(gene.length2, size = 100, replace = FALSE)
  results100[i] <- mean(temporarySample)
}

Plot the sample means in a histogram. The histogram shows the sampling distribution of the sample mean. Note: your results won’t be completely identical to Figure 4.1-3, because 10,000 random samples is not a large enough number of iterations to obtain the true sampling distribution with extreme accuracy.

par(mfrow = c(1, 1))
hist(results100, breaks = 50, right = FALSE, col = "firebrick")
# Thick vertical line marks the mean of gene.length2, the data being sampled
abline(v = mean(gene.length2), lwd = 4)
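
To see how sample size changes the sampling distribution, the same procedure can be run for samples of 10 and of 100 genes. This is a sketch on simulated data: sim.length, the helper sample.means(), and the reduced count of 2,000 iterations are all assumptions made to keep the example self-contained and quick to run.

```r
# Hypothetical stand-in for the population of gene lengths.
set.seed(1)
sim.length <- round(rlnorm(20000, meanlog = 7.7, sdlog = 0.6))

# Hypothetical helper: means of n.iter random samples, each of size n.
sample.means <- function(x, n, n.iter = 2000) {
  replicate(n.iter, mean(sample(x, size = n, replace = FALSE)))
}

results10 <- sample.means(sim.length, n = 10)
results100 <- sample.means(sim.length, n = 100)

# Plot both sampling distributions on a shared x-axis for comparison.
par(mfrow = c(2, 1))
x.range <- range(results10, results100)
hist(results10, breaks = 50, xlim = x.range, main = "n = 10", xlab = "Sample mean")
abline(v = mean(sim.length), lwd = 4)
hist(results100, breaks = 50, xlim = x.range, main = "n = 100", xlab = "Sample mean")
abline(v = mean(sim.length), lwd = 4)
par(mfrow = c(1, 1))

# Spread of the sample means for each sample size.
c(sd10 = sd(results10), sd100 = sd(results100))
```

Both histograms centre near the population mean, but the means from the larger samples cluster much more tightly around it.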