Joshua L. Britt
Let's take a random sample of size n = 100 from a normal distribution with mean 20 and standard deviation 5, and sort it:
s = rnorm(100, mean = 20, sd = 5)
sort(s)
## [1] 4.231 5.657 8.522 11.200 11.502 11.527 13.045 13.413 13.515 13.816
## [11] 14.478 14.748 15.104 15.178 15.428 15.687 15.731 15.857 16.238 16.405
## [21] 16.423 16.624 16.800 16.951 17.348 17.516 17.640 17.661 17.684 17.705
## [31] 17.736 17.778 17.850 17.922 17.947 18.041 18.165 18.339 18.385 18.408
## [41] 18.435 18.510 18.798 18.823 18.825 18.961 19.008 19.090 19.679 19.769
## [51] 19.772 19.835 19.887 19.982 20.245 20.682 20.691 20.737 21.016 21.093
## [61] 21.234 21.258 21.267 21.366 21.377 21.579 21.622 21.712 22.019 22.304
## [71] 22.305 22.464 22.792 22.792 22.821 22.883 23.058 23.331 23.377 23.504
## [81] 23.643 23.720 23.727 23.966 24.144 24.460 24.495 24.771 25.106 25.118
## [91] 25.121 25.182 25.288 26.397 26.526 27.370 29.034 29.546 29.971 32.309
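`rnorm()` draws fresh pseudo-random numbers on every run, so re-running the code above gives a different sample than the one printed. Here is a minimal sketch of making a draw reproducible with `set.seed()`. The seed 42 is an arbitrary assumption and will not reproduce the exact values shown above.

```r
set.seed(42)  # arbitrary seed chosen for illustration
s1 = rnorm(100, mean = 20, sd = 5)

set.seed(42)  # resetting the same seed replays the same random stream
s2 = rnorm(100, mean = 20, sd = 5)

identical(s1, s2)  # TRUE
```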
summary(s)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.23 17.50 19.80 19.70 22.80 32.30
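For a numeric vector, `summary()` reports the five quantiles from `quantile()` (type 7 by default) with the `mean()` inserted in the middle. The sketch below recomputes those pieces on a seeded stand-in sample; the seed is an arbitrary assumption. In recent versions of R, `summary()` keeps full precision and rounds only when printing.

```r
set.seed(1)  # arbitrary seed; this sample stands in for s above
s = rnorm(100, mean = 20, sd = 5)

q = quantile(s)                      # 0%, 25%, 50%, 75%, 100%
by_hand = c(q[1:3], mean(s), q[4:5]) # Min, 1st Qu., Median, Mean, 3rd Qu., Max

# Compare with the values stored by summary(), ignoring names
all.equal(unname(by_hand), as.numeric(summary(s)))
```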
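Now we can look at the frequency distribution. Note that `breaks = 5` is only a suggestion: `hist()` chooses "pretty" break points, which here give seven classes of width 5.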
h = hist(s, breaks = 5, right = FALSE, plot = FALSE)  # compute (but don't draw) the histogram first, so h$counts is available for ylim below
h
## $breaks
## [1] 0 5 10 15 20 25 30 35
##
## $counts
## [1] 1 2 9 42 34 11 1
##
## $intensities
## [1] 0.002 0.004 0.018 0.084 0.068 0.022 0.002
##
## $density
## [1] 0.002 0.004 0.018 0.084 0.068 0.022 0.002
##
## $mids
## [1] 2.5 7.5 12.5 17.5 22.5 27.5 32.5
##
## $xname
## [1] "s"
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
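The sketch below checks where those break points come from and the basic bookkeeping of the `histogram` object. When `breaks` is a single number, `hist()` computes the breaks as `pretty(range(x), n = breaks, min.n = 1)`. The seeded sample is a stand-in for `s`, with an arbitrary seed.

```r
set.seed(1)  # arbitrary seed; this sample stands in for s above
s = rnorm(100, mean = 20, sd = 5)
h = hist(s, breaks = 5, right = FALSE, plot = FALSE)

# A single number for breaks is passed to pretty() as a target class count
all.equal(h$breaks, pretty(range(s), n = 5, min.n = 1))

# One count per class, and every observation is counted exactly once
length(h$counts) == length(h$breaks) - 1
sum(h$counts) == length(s)

# mids are the class midpoints
all.equal(h$mids, (head(h$breaks, -1) + tail(h$breaks, -1)) / 2)
```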
h = hist(s, col = "steelblue2", labels = TRUE, breaks = 5, right = FALSE,
    ylim = c(0, max(h$counts) + 5))  # leave room above the bars for the count labels
Let's inspect the bins directly by cutting the sample at the same break points and tabulating:
bins = table(cut(s, breaks = h$breaks, right = FALSE))  # left-closed classes [a, b), as in hist(right = FALSE)
bins
##
## [0,5) [5,10) [10,15) [15,20) [20,25) [25,30) [30,35)
## 1 2 9 42 34 11 1
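A sketch confirming that tabulating with `cut()` reproduces `h$counts`, on a seeded stand-in sample (the seed is arbitrary). It passes `h$breaks` to `cut()` and adds `include.lowest = TRUE`, which closes the top class the way `hist()` does by default. Without it, a value exactly equal to the largest break would be left out.

```r
set.seed(1)  # arbitrary seed; this sample stands in for s above
s = rnorm(100, mean = 20, sd = 5)
h = hist(s, breaks = 5, right = FALSE, plot = FALSE)

# Left-closed classes [a, b); include.lowest = TRUE closes the top class,
# matching hist()'s default include.lowest = TRUE
bins = table(cut(s, breaks = h$breaks, right = FALSE, include.lowest = TRUE))
bins

all(as.vector(bins) == h$counts)  # same counts as hist()
```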
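Next, let's look at the density histogram (`freq = FALSE`). Most introductory statistics texts call this a relative frequency histogram. Strictly, its bar heights are the relative frequency divided by the class width, so it is the bar areas that equal the relative frequencies.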
h = hist(s, col = "steelblue2", labels = TRUE, breaks = 5, right = FALSE, freq = FALSE,
    ylim = c(0, max(h$density) + 0.03))
The relation between the frequency and density histograms is this: the area of each bar (base times height) is the relative frequency of its class.
\[ \text{Base}\cdot\text{Height}=\frac{\text{Part}}{\text{Whole}}=rf \]
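To see why, let \(n\) be the sample size (the Whole), \(f_i\) the count in class \(i\) (the Part), and \(w_i\) the width of that class (the Base). `hist()` sets each bar height to \(f_i/(n\,w_i)\). It follows that:

\[ \text{Height}_i=\frac{f_i}{n\,w_i}\quad\Longrightarrow\quad w_i\cdot\text{Height}_i=\frac{f_i}{n}=rf_i,\qquad \sum_i rf_i=\frac{1}{n}\sum_i f_i=1 \]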
So multiplying each density by the class width gives Part/Whole for each class:
rf = (h$breaks[2] - h$breaks[1]) * h$density
rf
## [1] 0.01 0.02 0.09 0.42 0.34 0.11 0.01
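Each density is the class count divided by n times the class width, so these relative frequencies are just the counts divided by the sample size. The sketch below checks this numerically on a seeded stand-in sample. The seed is an arbitrary assumption, so the values differ from those above.

```r
set.seed(1)  # arbitrary seed; this sample stands in for s above
s = rnorm(100, mean = 20, sd = 5)
h = hist(s, breaks = 5, right = FALSE, plot = FALSE)

rf = diff(h$breaks) * h$density      # Base * Height for each class

all.equal(rf, h$counts / length(s))  # equals Part / Whole
all.equal(sum(rf), 1)                # relative frequencies sum to 1
```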
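The relative frequencies 0.01 + 0.02 + 0.09 + 0.42 + 0.34 + 0.11 + 0.01 sum to 1, as they must. Multiplying both sides of the relation by the Whole (the sample size) recovers the Part, which is the count in each class:
\[ \text{Part}=\text{Base}\cdot\text{Height}\cdot\text{Whole} \]
So I can now recover the frequencies from the densities: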
freq = (h$breaks[2] - h$breaks[1]) * h$density * length(s)
freq
## [1] 1 2 9 42 34 11 1
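The code above uses the first class width, `h$breaks[2] - h$breaks[1]`, as the base for every class. That works here only because all classes have width 5 (`$equidist` is `TRUE`). With unequal classes, each bar needs its own base, which `diff(h$breaks)` supplies. The sketch below uses hypothetical percentile-based breaks and an arbitrary seed.

```r
set.seed(1)  # arbitrary seed; this sample stands in for s above
s = rnorm(100, mean = 20, sd = 5)

# Hypothetical unequal classes, cut at the 0th, 10th, 50th, 90th and 100th percentiles
br = unname(quantile(s, c(0, 0.1, 0.5, 0.9, 1)))
h2 = hist(s, breaks = br, right = FALSE, plot = FALSE)

h2$equidist   # the classes have different widths, so this is FALSE

# Part = Base * Height * Whole, with each class using its own base
freq2 = diff(h2$breaks) * h2$density * length(s)
all.equal(freq2, as.numeric(h2$counts))
```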
The last part of this exercise is to overlay a kernel density estimate on the histogram:
h = hist(s, col = "steelblue2", labels = TRUE, breaks = 5, right = FALSE, freq = FALSE,
    ylim = c(0, max(h$density) + 0.03))
lines(density(s), col = "black", lwd = 2)
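`density()` returns a kernel density estimate whose curve encloses an area of approximately 1. That is why it lines up with the histogram drawn with `freq = FALSE`. To overlay it on a frequency (count) histogram instead, the curve has to be rescaled by n times the class width. The sketch below shows both steps on a seeded stand-in sample, assuming an arbitrary seed and equal class widths.

```r
set.seed(1)  # arbitrary seed; this sample stands in for s above
s = rnorm(100, mean = 20, sd = 5)
d = density(s)  # Gaussian kernel, default bandwidth rule "nrd0"

# Trapezoid-rule area under the estimated curve; should be very close to 1
area = sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
area

# Overlay on a count histogram: scale the density by n * class width
h = hist(s, breaks = 5, right = FALSE, plot = FALSE)
w = h$breaks[2] - h$breaks[1]   # assumes equal class widths
y = d$y * length(s) * w
plot(h, col = "steelblue2", ylim = c(0, max(h$counts, y)))
lines(d$x, y, lwd = 2)
```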