M. Drew LaMar
September 10, 2018
“While nothing is more uncertain than a single life, nothing is more certain than the average duration of a thousand lives.”
- Elizur Wright
Definition: The frequency distribution of a variable is the number of occurrences of all values of that variable in the data.
Definition: The relative frequency distribution of a variable is the fraction of occurrences of all values of that variable in the data or population.
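As a minimal sketch of these two definitions, the R snippet below tabulates a small made-up categorical variable. The vector `activity` and its values are hypothetical and are not data from the lecture. `table()` gives the frequency distribution and `prop.table()` converts it to a relative frequency distribution.

```r
# Hypothetical categorical data (an assumption for illustration only)
activity <- c("walking", "fishing", "walking", "sleeping", "fishing", "walking")

# Frequency distribution: number of occurrences of each value
freqDist <- table(activity)
freqDist

# Relative frequency distribution: fraction of occurrences of each value
relFreqDist <- prop.table(freqDist)
relFreqDist
```

The relative frequencies always sum to 1, which is what makes them comparable across data sets of different sizes.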
Question: What type of plot represents the frequency (relative frequency) distribution for a discrete variable?
Answer: Bar plot
Definition: A bar plot uses the height of rectangular bars to display the frequency distribution (or relative frequency distribution) of a categorical variable.
Death by tiger
Question: What type of plot represents the frequency distribution for a continuous variable?
Answer: Histogram (which is still a bar plot, actually)
Definition: A histogram for a frequency distribution uses the height of rectangular bars to display the frequency distribution of a numerical variable.
Definition: A histogram for a relative frequency distribution uses the area of rectangular bars to display the relative frequency distribution of a numerical variable.
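The difference between height and area can be checked numerically. The sketch below uses a small made-up numerical vector `y` (hypothetical, not the salmon data) and calls `hist()` with `plot = FALSE`, so that only the computed bins are returned. The `counts` component holds the bar heights of the frequency histogram, and the `density` component holds the bar heights of the relative frequency histogram.

```r
# Hypothetical numerical data (an assumption for illustration only)
y <- c(1.2, 1.7, 1.8, 2.1, 2.2, 2.4, 2.6, 3.1)

# Compute the histogram bins without drawing anything
h <- hist(y, breaks = seq(1, 3.5, by = 0.5), right = FALSE, plot = FALSE)

h$counts    # heights in a frequency histogram
h$density   # heights in a relative frequency histogram

# Area of each bar = height * width = proportion of observations in the bin
binWidth <- diff(h$breaks)
areas <- h$density * binWidth
areas
sum(areas)  # the areas add up to 1
```

Each bar's area equals the proportion of observations in its bin, so the heights in `h$density` depend on the bin width while the areas do not.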
Question: Which variable is the explanatory variable and which is the response variable?
Answer: Neither
Load and show the data:
salmonSizeData <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f2_5SalmonBodySize.csv"))
head(salmonSizeData)
  year   sex oceanAgeYears lengthMm massKg
1 1996 FALSE             3      513  3.090
2 1996 FALSE             3      513  2.909
3 1996 FALSE             3      525  3.056
4 1996 FALSE             3      501  2.690
5 1996 FALSE             3      513  2.876
6 1996 FALSE             3      501  2.978
Plot in a histogram:
histObj <- hist(salmonSizeData$massKg,
                right = FALSE,
                breaks = seq(1, 4, by = 0.5),
                col = "firebrick")
seq(1,4,by=0.5)
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0
Question: What would the height of the second bar from the left be for a relative frequency distribution, given that its current height is 136 and we have 228 fish?
\[ Area = Proportion \]
\[ Area = Height \times width \]
\[ Proportion = Height \times 0.5 \]
\[ 136/228 = Height \times 0.5 \]
\[ Height = 2\times 136/228 \]
\[ Height = 1.1929825 \]
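The same arithmetic can be reproduced in R as a quick check. The variable names below are made up for illustration; the numbers 136, 228, and 0.5 are the bar count, total number of fish, and bin width given above.

```r
barCount <- 136   # count in the second bin
nFish    <- 228   # total number of fish
binWidth <- 0.5   # width of each bin in kg

# Area = Proportion, and Area = Height * width, so Height = Proportion / width
proportion <- barCount / nFish
height <- proportion / binWidth
height
```

If the `histObj` object from the earlier `hist()` call is available, its `density` component should hold the same value in position 2, since `hist()` returns the relative frequency heights there.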
Question: What happens with smaller bin width (say width of 0.1)?
hist(salmonSizeData$massKg,
     right = FALSE,
     breaks = seq(1, 4, by = 0.1),
     col = "firebrick",
     freq = FALSE)
Definition: The population mean \( \mu \) is the sum of all the observations in the population divided by \( N \), the number of observations in the population (assuming it is finite, for now).
\[ \mu = \frac{1}{N}\sum_{i=1}^{N}Y_{i}\, \]
Definition: The sample mean \( \overline{Y} \) is the sum of all the observations in the sample divided by \( n \), the number of sample observations.
\[ \overline{Y} = \frac{1}{n}\sum_{i=1}^{n}Y_{i}\, \]
Question: Is the population mean \( \mu \) a parameter or an estimate? What about the sample mean?
Note that every observation has equal weight (i.e. \( \frac{1}{n} \)), so any outliers can strongly affect the mean. It is a very democratic statistic - equal representation!
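A minimal sketch of the equal-weight idea, using a made-up sample `y` (hypothetical values, not from the lecture): the mean is computed by hand as the sum divided by \( n \), compared with R's `mean()`, and then recomputed after adding a single outlier.

```r
# Hypothetical sample (an assumption for illustration only)
y <- c(2, 3, 3, 4, 5)
n <- length(y)

# Sample mean from the definition: every observation gets weight 1/n
manualMean <- sum(y) / n
manualMean
all.equal(manualMean, mean(y))

# Add one outlier and recompute: the mean shifts noticeably
yOutlier <- c(y, 50)
mean(yOutlier)
```

Here one extreme value moves the mean from 3.4 to about 11.17, even though the other five observations are unchanged.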
Definition: The population median is the middle measurement of the set of all observations in the population (again, assume the population is finite for now).
Definition: The sample median is the middle measurement of the set of all observations in the sample.
How do you compute the median? W&S version:
Look at special cases of \( n=3 \) and \( n=4 \)!!!
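The two special cases can be worked through with a short sketch. It uses the usual rule, which is assumed here to match the W&S version: sort the observations, then take the observation in position \( (n+1)/2 \) if \( n \) is odd, or average the observations in positions \( n/2 \) and \( n/2 + 1 \) if \( n \) is even. The function name `sampleMedian` and the example values are hypothetical.

```r
# Sketch of the usual median rule (assumed to match the W&S version)
sampleMedian <- function(y) {
  ySorted <- sort(y)
  n <- length(ySorted)
  if (n %% 2 == 1) {
    # n odd: the single middle observation
    ySorted[(n + 1) / 2]
  } else {
    # n even: average of the two middle observations
    (ySorted[n / 2] + ySorted[n / 2 + 1]) / 2
  }
}

sampleMedian(c(7, 1, 4))       # n = 3: middle of 1, 4, 7 is 4
sampleMedian(c(7, 1, 4, 10))   # n = 4: average of 4 and 7 is 5.5
```

Both results agree with R's built-in `median()` on the same inputs.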