M. Drew LaMar
September 15, 2021
“I believe that we do not know anything for certain, but everything probably.”
- Christiaan Huygens
Make sure you read the book for the following discussions
Question: Why is this important to know?
My point here is that you are responsible for all book material, even if we don't cover it in lecture!
Measures | R commands |
---|---|
\( \overline{Y} \) | mean |
\( s^2 \) | var |
\( s \) | sd |
\( IQR \) | IQR \( ^* \) |
Multiple | summary |
\( ^* \) Note that `IQR` can use different algorithms, because there are different ways to calculate quantiles (for curious souls, see `?quantile`). To match the algorithm in W&S, you should use `IQR(___, type=5)`. The default type in R is `type=7`. For the HW, either version is acceptable.
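To see how much the choice of quantile algorithm can matter, the sketch below compares the default `type=7` with `type=5` on a small made-up vector. The values 1 through 10 are purely illustrative and are not course data.

```r
# Hypothetical data: the integers 1 to 10 (not from the course datasets)
y <- 1:10

# Default quantile algorithm in R (type = 7)
iqrDefault <- IQR(y)

# Algorithm that, per the note above, matches W&S (type = 5)
iqrWS <- IQR(y, type = 5)

# The quartiles behind each IQR
print(quantile(y, probs = c(0.25, 0.75), type = 7))
print(quantile(y, probs = c(0.25, 0.75), type = 5))

print(c(type7 = iqrDefault, type5 = iqrWS))
```

For this vector the two versions give 4.5 and 5, so small differences between your answer and the book's are expected.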
summary(mydata)
breadth
Min. : 1.00
1st Qu.: 3.00
Median : 8.00
Mean :11.88
3rd Qu.:17.00
Max. :62.00
From this output, the IQR would be \( 17 - 3 = 14 \) (3rd Qu. minus 1st Qu.).
Definition: The sampling distribution is the probability distribution of all values for an estimate that we might obtain when we sample a population.
Definition: The standard error of an estimate is the standard deviation of the estimate’s sampling distribution.
Definition: The standard error of the mean is given by
\[ \sigma_{\overline{Y}} = \frac{\sigma}{\sqrt{n}} \] where \( \sigma \) is the population standard deviation and \( n \) is the sample size. Since \( \sigma \) is rarely known, the approximate standard error of the mean uses the sample standard deviation \( s \) instead: \[ \mathrm{SE}_{\overline{Y}} = \frac{s}{\sqrt{n}} \]
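The definitions above can be checked with a small simulation. This is only a sketch: the normal population (mean 10, standard deviation 4), the sample size, the number of repeated samples, and the seed are all assumptions chosen for illustration.

```r
set.seed(1)  # arbitrary seed so the run is reproducible

# Assumed population: normal with mean 10 and standard deviation 4
sigma <- 4
n <- 25            # size of each sample
nSamples <- 10000  # number of repeated samples

# Draw many samples and record the mean of each one;
# these means approximate the sampling distribution of the mean
sampleMeans <- replicate(nSamples, mean(rnorm(n, mean = 10, sd = sigma)))

# Standard deviation of the simulated sampling distribution
simulatedSE <- sd(sampleMeans)

# Standard error of the mean from the formula sigma / sqrt(n)
theoreticalSE <- sigma / sqrt(n)

# Approximate standard error from one single sample, using s in place of sigma
oneSample <- rnorm(n, mean = 10, sd = sigma)
estimatedSE <- sd(oneSample) / sqrt(n)

print(c(simulated = simulatedSE,
        theoretical = theoreticalSE,
        estimated = estimatedSE))
```

The simulated and theoretical values should agree closely, while the single-sample estimate will vary more from run to run because it relies on \( s \).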
Definition: A confidence interval is a range of values surrounding the sample estimate that is likely to contain the population parameter.
Definition: A 95% confidence interval provides a most-plausible range for a parameter. Values lying within the interval are most plausible, whereas those outside are less plausible, based on the data.
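As a rough illustration, a 95% confidence interval for a mean can be computed in two ways. The sketch below uses a made-up sample. The "2 SE" rule of thumb is a quick approximation. The t-based version is shown for comparison and is an addition here, not something these slides define.

```r
# Hypothetical sample (made-up values, not course data)
y <- c(4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.3, 4.4)

n <- length(y)
yBar <- mean(y)
se <- sd(y) / sqrt(n)  # approximate standard error of the mean

# Rough 95% confidence interval: the "2 SE" rule of thumb
ciRough <- c(yBar - 2 * se, yBar + 2 * se)

# 95% confidence interval based on the t distribution with n - 1 df
tCrit <- qt(0.975, df = n - 1)
ciT <- c(yBar - tCrit * se, yBar + tCrit * se)

# t.test() reports the same t-based interval
ciFromTTest <- as.numeric(t.test(y)$conf.int)

print(rbind(rough = ciRough, tBased = ciT, tTest = ciFromTTest))
```

With a sample this small the t-based interval is somewhat wider than the 2 SE interval. The two get closer as \( n \) grows.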
Read and inspect the data.
locustData <- read.csv("../..//Datasets/chapter02/chap02f1_2locustSerotonin.csv")
head(locustData)
serotoninLevel treatmentTime
1 5.3 0
2 4.6 0
3 4.5 0
4 4.3 0
5 4.2 0
6 3.6 0
str(locustData)
'data.frame': 30 obs. of 2 variables:
$ serotoninLevel: num 5.3 4.6 4.5 4.3 4.2 3.6 3.7 3.3 12.1 18 ...
$ treatmentTime : int 0 0 0 0 0 0 0 0 0 0 ...
First, calculate the statistics by group needed for the error bars: the mean and standard error. Here, `tapply` is used to obtain each quantity by treatment group.
meanSerotonin <- tapply(locustData$serotoninLevel,
                        locustData$treatmentTime,
                        mean)
sdSerotonin <- tapply(locustData$serotoninLevel,
                      locustData$treatmentTime,
                      sd)
nSerotonin <- tapply(locustData$serotoninLevel,
                     locustData$treatmentTime,
                     length)
seSerotonin <- sdSerotonin / sqrt(nSerotonin)
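The same by-group calculation can be written more compactly by wrapping the standard error in a small function. The sketch below is an alternative, not the approach used in these slides: `standardError` is a hypothetical helper, and `toyData` is a made-up stand-in for `locustData` so that the example runs on its own.

```r
# Toy data frame standing in for locustData (values are made up)
toyData <- data.frame(
  serotoninLevel = c(5.3, 4.6, 4.5, 6.1, 7.4, 8.0, 9.2, 12.5, 10.3),
  treatmentTime  = c(0, 0, 0, 1, 1, 1, 2, 2, 2)
)

# Hypothetical helper: standard error of the mean of a numeric vector
standardError <- function(y) sd(y) / sqrt(length(y))

# One tapply call per statistic, grouped by treatment time
meanByGroup <- tapply(toyData$serotoninLevel, toyData$treatmentTime, mean)
seByGroup   <- tapply(toyData$serotoninLevel, toyData$treatmentTime, standardError)

print(cbind(mean = meanByGroup, se = seByGroup))
```

Passing a user-defined function to `tapply` avoids keeping separate vectors for the standard deviation and the group sizes.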
Draw the strip chart and then add the error bars.
\[ \bar{Y} \pm SE_{\bar{Y}} \]
offsetAmount <- 0.2
stripchart(serotoninLevel ~ treatmentTime,
           data = locustData,
           method = "jitter",
           vertical = TRUE)
segments(1:3 + offsetAmount,
         meanSerotonin - seSerotonin,
         1:3 + offsetAmount,
         meanSerotonin + seSerotonin)
points(1:3 + offsetAmount,
       meanSerotonin,
       pch = 16,
       cex = 1.2)
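Different error bars!!! Always state which one you are plotting: \[ \begin{array}{ll} \bar{Y} \pm s & \text{(standard deviation)} \\ \bar{Y} \pm \mathrm{SE}_{\bar{Y}} & \text{(standard error)} \\ \bar{Y} \pm 2 \times \mathrm{SE}_{\bar{Y}} & \text{(approximate 95\% confidence interval)} \end{array} \]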
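To make the difference between these error bars concrete, the sketch below computes all three half-widths for a small and a large simulated sample. The population (normal, mean 10, standard deviation 4), the sample sizes, the seed, and the helper `halfWidths` are all assumptions made for illustration.

```r
set.seed(2)  # arbitrary seed so the run is reproducible

# Hypothetical helper: half-widths of the three kinds of error bars
halfWidths <- function(y) {
  se <- sd(y) / sqrt(length(y))
  c(sd = sd(y), se = se, twoSE = 2 * se)
}

# Two samples from the same assumed population, differing only in size
small <- halfWidths(rnorm(10, mean = 10, sd = 4))
large <- halfWidths(rnorm(1000, mean = 10, sd = 4))

print(rbind(n10 = small, n1000 = large))
```

The standard deviation describes the spread of the data and stays near the population value at both sample sizes, whereas the standard error describes the precision of the mean and shrinks as \( n \) grows.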