How Frequency and Density are connected

Joshua L. Britt

Introduction


Let's take a random sample of size n = 100 from a normal distribution with mean 20 and standard deviation 5.

s = rnorm(100, mean = 20, sd = 5)
sort(s)
##   [1]  4.231  5.657  8.522 11.200 11.502 11.527 13.045 13.413 13.515 13.816
##  [11] 14.478 14.748 15.104 15.178 15.428 15.687 15.731 15.857 16.238 16.405
##  [21] 16.423 16.624 16.800 16.951 17.348 17.516 17.640 17.661 17.684 17.705
##  [31] 17.736 17.778 17.850 17.922 17.947 18.041 18.165 18.339 18.385 18.408
##  [41] 18.435 18.510 18.798 18.823 18.825 18.961 19.008 19.090 19.679 19.769
##  [51] 19.772 19.835 19.887 19.982 20.245 20.682 20.691 20.737 21.016 21.093
##  [61] 21.234 21.258 21.267 21.366 21.377 21.579 21.622 21.712 22.019 22.304
##  [71] 22.305 22.464 22.792 22.792 22.821 22.883 23.058 23.331 23.377 23.504
##  [81] 23.643 23.720 23.727 23.966 24.144 24.460 24.495 24.771 25.106 25.118
##  [91] 25.121 25.182 25.288 26.397 26.526 27.370 29.034 29.546 29.971 32.309
summary(s)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.23   17.50   19.80   19.70   22.80   32.30

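Now we can take a look at the frequency distribution. Note that breaks = 5 is only a suggestion: R chooses round break points, which here yields seven classes of width 5.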

h = hist(s, breaks = 5, right = F, plot = F)  # compute the histogram without drawing it; h must exist before h$counts can be used below (col and labels are dropped here because they are ignored when plot = F)
h
## $breaks
## [1]  0  5 10 15 20 25 30 35
## 
## $counts
## [1]  1  2  9 42 34 11  1
## 
## $intensities
## [1] 0.002 0.004 0.018 0.084 0.068 0.022 0.002
## 
## $density
## [1] 0.002 0.004 0.018 0.084 0.068 0.022 0.002
## 
## $mids
## [1]  2.5  7.5 12.5 17.5 22.5 27.5 32.5
## 
## $xname
## [1] "s"
## 
## $equidist
## [1] TRUE
## 
## attr(,"class")
## [1] "histogram"
h = hist(s, col = "Steelblue2", labels = T, breaks = 5, right = F, ylim = c(0, 
    max(h$counts) + 5))

[Figure: frequency histogram of s, with the count printed above each bar]

Inspect the bins by tabulating the sample over the same class intervals:

bins = table(cut(s, breaks = h$breaks, right = F))  # h$breaks already holds the equally spaced class limits

bins
## 
##   [0,5)  [5,10) [10,15) [15,20) [20,25) [25,30) [30,35) 
##       1       2       9      42      34      11       1
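
As a quick check, the sketch below regenerates a sample and confirms that tabulating with cut() over h$breaks reproduces h$counts. The seed (42) is an arbitrary assumption, not the seed behind the output above, so the individual numbers will differ. It also illustrates why breaks = 5 does not give exactly five classes: for a single number, hist() hands the suggestion to pretty(), which picks round break points.

```r
set.seed(42)  # arbitrary seed (assumption); the original sample was not seeded
s <- rnorm(100, mean = 20, sd = 5)

# Compute the histogram without drawing it
h <- hist(s, breaks = 5, right = FALSE, plot = FALSE)

# breaks = 5 is a suggestion; the actual break points come from pretty()
suggested <- pretty(range(s), n = 5, min.n = 1)

# Tabulate the sample over the same left-closed class intervals
bins <- table(cut(s, breaks = h$breaks, right = FALSE))

same_breaks <- isTRUE(all.equal(h$breaks, suggested))
same_counts <- all(h$counts == as.vector(bins))
print(same_breaks)
print(same_counts)
```

If both checks print TRUE, the table and the histogram object describe the same frequency distribution.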

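OK, let's inspect the density histogram, a close relative of the relative frequency histogram found in most introductory statistics texts. In a density histogram it is the area of each bar, not its height, that equals the relative frequency of the class.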

h = hist(s, col = "Steelblue2", labels = T, breaks = 5, right = F, freq = F, 
    ylim = c(0, max(h$density) + 0.03))

[Figure: density histogram of s, with the density printed above each bar]

The relation between the frequency and density distributions is that the area of each bar equals the relative frequency (rf) of its class:

\[ \text{Base}\cdot \text{Height}=\frac{\text{Part}}{\text{Whole}}=rf \]

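Here Base is the class width, Height is the density, Part is the class frequency, and Whole is the sample size. Let's look at Part/Whole for each class: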

rf = (h$breaks[2] - h$breaks[1]) * h$density
rf
## [1] 0.01 0.02 0.09 0.42 0.34 0.11 0.01
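
As a worked instance using the output above, take the class [15, 20): its base is 5, its height (density) is 0.084, and it holds 42 of the 100 observations, so

\[ 5\cdot 0.084=0.42=\frac{42}{100} \]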

The relative frequencies 0.01, 0.02, 0.09, 0.42, 0.34, 0.11, 0.01 sum to 1. Multiplying both sides of the relation above by Whole gives the frequency of each class:

\[ \text{Part}=\text{Base}\cdot \text{Height}\cdot \text{Whole} \]

I'll now recover the frequencies in R:

freq = (h$breaks[2] - h$breaks[1]) * h$density * length(s)
freq
## [1]  1  2  9 42 34 11  1
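
The calculation above uses h$breaks[2] - h$breaks[1] as the base, which works because every class here has the same width. A minimal sketch of a more general version, assuming unequal class widths, uses diff(h$breaks) to get one width per class. The break points and the seed below are made up for illustration and are not part of the original analysis.

```r
set.seed(1)  # arbitrary seed (assumption)
s <- rnorm(100, mean = 20, sd = 5)

# Hypothetical unequal class limits, wide enough to cover the whole sample
brks <- c(-20, 10, 15, 18, 20, 22, 25, 30, 60)
h <- hist(s, breaks = brks, right = FALSE, plot = FALSE)

width <- diff(h$breaks)                 # Base: one width per class
rf    <- width * h$density              # Base * Height = Part / Whole
freq  <- width * h$density * length(s)  # Part = Base * Height * Whole

print(sum(rf))
print(all.equal(freq, as.numeric(h$counts)))
```

With unequal widths the bar heights are no longer proportional to the counts, but the areas still are, which is the reason the density scale is defined this way.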

The last part of this exercise is to overlay the density histogram with a kernel density estimate of the sample.

h = hist(s, col = "Steelblue2", labels = T, breaks = 5, right = F, freq = F, 
    ylim = c(0, max(h$density) + 0.03))
lines(density(s), col = "black", lwd = 2)

[Figure: density histogram of s with the kernel density curve overlaid in black]
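
The curve and the bars can share one vertical axis because both are on the density scale: the bar areas sum to 1, and the area under the curve returned by density() is approximately 1. A rough numerical check follows, assuming an arbitrary seed and a trapezoid-rule approximation of the area (neither is from the original analysis).

```r
set.seed(7)  # arbitrary seed (assumption)
s <- rnorm(100, mean = 20, sd = 5)

h <- hist(s, breaks = 5, right = FALSE, plot = FALSE)
d <- density(s)  # kernel density estimate evaluated on a grid of x values

# Total area of the histogram bars: sum of Base * Height
bar_area <- sum(diff(h$breaks) * h$density)

# Trapezoid-rule approximation of the area under the density curve
curve_area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

print(bar_area)
print(curve_area)
```

The curve area falls slightly short of 1 because density() evaluates the estimate over a finite range around the data.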