Is the average of our sample larger than 180 in a statistically significant way? In other terms, can we reject the null hypothesis that the sample is drawn from a population whose mean equals 180, in favor of the alternative that the population mean is larger than 180?
Put differently: can the difference between the sample mean and the reference population mean (180) be explained by random sampling variation, or does it show that the actual population mean is larger than 180?
Consider several samples, starting with the first one:
sample1 <- c(170, 181, 190, 187, 185)
Let’s find the sample mean:
mean(sample1)
## [1] 182.6
It’s larger than 180. But is the difference significant? First, make
a guess. Then use the t.test function to get the p-value and
obtain the answer:
t.test(sample1, mu=180, alternative = 'greater')
##
## One Sample t-test
##
## data: sample1
## t = 0.74869, df = 4, p-value = 0.2478
## alternative hypothesis: true mean is greater than 180
## 95 percent confidence interval:
## 175.1966 Inf
## sample estimates:
## mean of x
## 182.6
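The t statistic reported above is \(t = (\bar x - \mu_0)/(s/\sqrt{n})\), where \(s\) is the sample standard deviation and \(\mu_0 = 180\); the one-sided p-value is the probability that a t-distributed variable with \(n-1\) degrees of freedom is at least this large. A minimal sketch that recomputes both by hand and compares them with t.test (the variable names are illustrative only):

```r
sample1 <- c(170, 181, 190, 187, 185)
mu0 <- 180
n <- length(sample1)

# t statistic: distance from mu0 measured in standard errors
se <- sd(sample1) / sqrt(n)
t_manual <- (mean(sample1) - mu0) / se

# One-sided p-value for H1: mu > mu0 (upper tail of t with n - 1 df)
p_manual <- pt(t_manual, df = n - 1, lower.tail = FALSE)
print(c(t = t_manual, p.value = p_manual))

# The built-in test should give the same numbers
res <- t.test(sample1, mu = mu0, alternative = "greater")
print(c(t = unname(res$statistic), p.value = res$p.value))
```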
Can we conclude that the difference is statistically significant?
Now repeat the same steps with the following samples. Try to guess the result before you proceed, and explain which factors you take into account in your guesses.
sample2 <- c(183, 182, 183.1, 182.3, 182.6)
sample3 <- c(190, 180, 210, 190, 200)
sample4 <- c(170, 181, 190, 187, 185, 170, 181, 190, 187, 185, 170, 181, 190, 187, 185, 170, 181, 190, 187, 185, 170, 181, 190, 187, 185)
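To compare your guesses with the actual results, one possible sketch runs the same one-sided test on all four samples and collects the quantities that drive the outcome: sample size, sample mean, standard deviation, t statistic and p-value. The names samples and summary_table are illustrative, not part of the exercise.

```r
samples <- list(
  sample1 = c(170, 181, 190, 187, 185),
  sample2 = c(183, 182, 183.1, 182.3, 182.6),
  sample3 = c(190, 180, 210, 190, 200),
  sample4 = rep(c(170, 181, 190, 187, 185), times = 5)  # sample1 repeated 5 times
)

# One row per sample: the same test as for sample1
summary_table <- t(sapply(samples, function(x) {
  res <- t.test(x, mu = 180, alternative = "greater")
  c(n = length(x), mean = mean(x), sd = sd(x),
    t = unname(res$statistic), p.value = res$p.value)
}))
print(round(summary_table, 4))
```

Samples 1, 2 and 4 all have mean 182.6, so comparing their rows shows how the spread and the sample size, and not only the distance from 180, determine the p-value.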
We use the package languageR, which accompanies the book by Baayen,
and the dataset durationsOnt from this package (Pluymaekers
et al., 2005).
install.packages("languageR")
library(languageR)
data(durationsOnt)
durationsOnt
Calculate the mean duration of the nasal /n/ (variable
DurationPrefixNasal):
# YOUR CODE HERE
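A minimal sketch, assuming languageR is installed. The na.rm = TRUE argument is only a precaution in case the column contains missing values; it has no effect otherwise.

```r
library(languageR)
data(durationsOnt)

# Mean duration of the nasal in the prefix
nasal_mean <- mean(durationsOnt$DurationPrefixNasal, na.rm = TRUE)
print(nasal_mean)
```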
Visualize the distribution of this variable with a histogram.
# YOUR CODE HERE
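One possible sketch with base R graphics, again assuming languageR is installed. The number of bins (breaks = 20, which hist treats only as a suggestion) and the dashed line at the sample mean are arbitrary choices.

```r
library(languageR)
data(durationsOnt)

x <- durationsOnt$DurationPrefixNasal
h <- hist(x,
          breaks = 20,  # suggested number of bins; adjust as needed
          main = "Distribution of DurationPrefixNasal",
          xlab = "DurationPrefixNasal")
abline(v = mean(x, na.rm = TRUE), lty = 2)  # dashed line at the sample mean
```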
Suppose that previous research on similar recordings had found
a mean of 0.053. Is the mean observed for the new sample significantly
smaller than 0.053? State your null hypothesis and an alternative, then
use t.test to get the p-value and answer the question. Use the 5%
significance level.
# YOUR CODE HERE
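A sketch of one way to set up this test, assuming languageR is installed. The hypotheses in the comments are one natural formulation of the question, and the decision rule simply compares the p-value with 0.05.

```r
library(languageR)
data(durationsOnt)

# H0: mu = 0.053 (same mean nasal duration as in the earlier research)
# H1: mu < 0.053 (the nasal is shorter on average in the new recordings)
res <- t.test(durationsOnt$DurationPrefixNasal, mu = 0.053, alternative = "less")
print(res)

alpha <- 0.05
if (res$p.value < alpha) {
  cat("p-value below 0.05: reject H0 in favor of H1\n")
} else {
  cat("p-value not below 0.05: cannot reject H0\n")
}
```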
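Let us check that the t-test works as expected, i.e. that when the null hypothesis is true, it gives false positive results in at most 5% of cases at the 5% significance level. To do so, consider the following simulation: we repeatedly draw samples from a population whose mean we know and test the null hypothesis that the mean equals this true value.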
set.seed(42)
population <- c(12, 15, 5, 13, 24, 3, 7, 14, 16, 17, 21, 14.3)
sample_size <- 20
number_of_samples <- 10000
population_mean <- mean(population)  # H0 is true by construction
# For each simulated sample, test H0: mu = population_mean and keep the p-value
p_values <- replicate(
  number_of_samples,
  t.test(
    sample(population, size = sample_size, replace = TRUE),
    mu = population_mean,
    alternative = "less"
  )$p.value
)
# Share of samples in which H0 is (wrongly) rejected at the 5% level
print(paste("False positive rate:", mean(p_values < .05)))
## [1] "False positive rate: 0.0422"
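The same check can be repeated for the other two alternatives. A sketch that wraps the simulation in a helper function (false_positive_rate is a hypothetical name introduced here) and uses 5000 samples per alternative to keep it fast; the estimated rates should be close to 0.05 in each case.

```r
set.seed(42)
population <- c(12, 15, 5, 13, 24, 3, 7, 14, 16, 17, 21, 14.3)
population_mean <- mean(population)

# Hypothetical helper: share of simulated samples for which the t-test
# rejects the (true) null hypothesis at the 5% level
false_positive_rate <- function(alternative, sample_size = 20, number_of_samples = 5000) {
  p_values <- replicate(number_of_samples, t.test(
    sample(population, size = sample_size, replace = TRUE),
    mu = population_mean,
    alternative = alternative
  )$p.value)
  mean(p_values < 0.05)
}

rates <- sapply(c("less", "greater", "two.sided"), false_positive_rate)
print(rates)
```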
Now a magician claims that he has psychokinetic abilities. To test this, we perform the following experiment. We have a box with five balls, each with a number on it: 1, 3, 4, 7, 8. We select a random ball, record its number, put the ball back in the box, and repeat the process several times. Thus we obtain a sample of numbers. The magician uses his abilities to make the average of these numbers as large as possible.
Find the population mean, i.e. the average of the numbers in the box.
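A one-line sketch; box is an illustrative name for the vector of numbers on the balls.

```r
box <- c(1, 3, 4, 7, 8)  # numbers on the five balls
pop_mean <- mean(box)
print(pop_mean)  # 4.6
```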
Assume that in some experiment we obtained the following sample: 8, 3, 3, 1, 1, 7, 8, 7. Its mean (find it) is larger than the population mean. The magician says:
Now you see? I have psychokinetic abilities! I said I would increase the mean, and it is indeed larger than the population mean.
Does it look convincing?
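To judge this, one can compare the observed mean with the population mean and estimate by simulation how often an ordinary sample of the same size (8 draws, no magic involved) has a mean at least as large. A minimal sketch; the seed and the number of simulated samples are arbitrary choices.

```r
box <- c(1, 3, 4, 7, 8)
observed <- c(8, 3, 3, 1, 1, 7, 8, 7)
obs_mean <- mean(observed)
print(obs_mean)  # 4.75, slightly above the population mean

# Share of no-magic samples of size 8 whose mean is at least obs_mean
set.seed(42)
null_means <- replicate(10000, mean(sample(box, size = length(observed), replace = TRUE)))
share_as_large <- mean(null_means >= obs_mean)
print(share_as_large)
```

A large share means that a mean like this one is an everyday outcome even without any psychokinesis.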
Now assume that you obtained a sample mean equal to 8 with a sample size of 5. Does it look convincing now?
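Here no simulation is needed: with 5 draws, a mean of 8 is possible only if every draw is the ball with 8, which without magic has probability \((1/5)^5\). A quick check:

```r
box <- c(1, 3, 4, 7, 8)
# A mean of 8 with n = 5 requires that all five draws are the ball "8"
p_all_eights <- mean(box == 8)^5  # (1/5)^5
print(p_all_eights)  # 0.00032
```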
State the null hypothesis and the alternative for this experiment.
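One way to formalize them, writing \(\mu\) for the expected number on a ball drawn in the magician's presence and \(\mu_0\) for the population mean of the numbers in the box (notation introduced here for illustration):

\[
H_0:\ \mu = \mu_0 \quad \text{(no effect: each ball is equally likely)}, \qquad H_1:\ \mu > \mu_0 .
\]

The alternative is one-sided because the magician claims to increase the average, not merely to change it.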
As the magician claims that he is trying to make the sample mean as large as possible, we use the following decision rule: we choose some value \(\bar x_{crit}\) and claim that the magician has magical abilities if the sample average is equal to \(\bar x_{crit}\) or larger. Let \(\bar x_{crit}=7\). Assume that the magician doesn’t have magical abilities. Simulate 10000 samples of size 5 from our box. How many times would you reject the null hypothesis, i.e. claim that the magician indeed has magical abilities?
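A minimal sketch of this simulation, assuming that without magic every ball is equally likely on each draw; the seed is arbitrary.

```r
box <- c(1, 3, 4, 7, 8)  # balls in the box
n <- 5                   # sample size
x_crit <- 7              # decision threshold for the sample mean
set.seed(42)             # arbitrary seed, for reproducibility only

# Under H0: draw with replacement, uniformly, 10000 times
xbars <- replicate(10000, mean(sample(box, size = n, replace = TRUE)))

rejections <- sum(xbars >= x_crit)
print(rejections)
print(rejections / length(xbars))  # estimated type I error rate
```

For comparison, enumerating all \(5^5 = 3125\) equally likely samples shows that exactly 62 of them have a mean of 7 or more, so about \(62/3125 \approx 0.0198\) of the simulated samples, roughly 198 out of 10000, should lead to rejection.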
Plot the distribution of \(\bar x\) for the previous subproblem.
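Since \(\bar x\) can only take a few discrete values (multiples of 1/5 here), a bar chart of relative frequencies shows its distribution without binning artifacts. A sketch that regenerates the simulated means with the same arbitrary seed; the sums are simulated first so that equal means are represented by identical numbers.

```r
box <- c(1, 3, 4, 7, 8)
n <- 5
set.seed(42)
sums <- replicate(10000, sum(sample(box, size = n, replace = TRUE)))
xbars <- sums / n

# Relative frequency of each attainable sample mean
freq <- table(xbars) / length(xbars)
barplot(freq, xlab = "sample mean", ylab = "relative frequency",
        main = "Distribution of the sample mean under H0 (n = 5)")
```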
Find the smallest value \(\bar x_{crit}\) such that, if you use it, the probability of making a type I error is less than 0.05. Use the quantile function.
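Because \(\bar x\) is discrete, the 95th percentile alone is not always a valid threshold: the rule "reject if \(\bar x \ge\) percentile" can still reject more than 5% of the time. The sketch below swaps the simulation for exact enumeration of all \(5^5 = 3125\) equally likely samples (feasible because the box is small), uses quantile to locate the 95th percentile, and then checks the type I error rate of each attainable threshold at or above it.

```r
box <- c(1, 3, 4, 7, 8)
n <- 5

# Exact null distribution: all 5^5 equally likely ordered samples
all_samples <- expand.grid(rep(list(box), n))
sums <- rowSums(all_samples)  # integer-valued, so comparisons are exact
xbars <- sums / n

# 95th percentile of the null distribution of the sample mean
q95 <- quantile(xbars, 0.95, names = FALSE)
print(q95)

# Type I error rate of the rule "xbar >= c" for each attainable c >= q95
candidate_sums <- sort(unique(sums[xbars >= q95]))
type1 <- sapply(candidate_sums, function(s) mean(sums >= s))
print(data.frame(x_crit = candidate_sums / n, type1_error = type1))

# Smallest threshold whose type I error rate is below 0.05
x_crit <- candidate_sums[which(type1 < 0.05)[1]] / n
print(x_crit)
```

With this box the 95th percentile is 6.6, but the rule \(\bar x \ge 6.6\) rejects with probability \(167/3125 \approx 0.053\); the next attainable value, 6.8, gives \(112/3125 \approx 0.036\).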
Assume that we obtained the following data: 7, 8, 4, 8, 8. Find the corresponding p-value. Would you reject the null hypothesis?
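Under the null hypothesis the p-value is the probability of a sample mean at least as large as the observed one, here \(P(\bar x \ge 7)\). A sketch that computes it both exactly, by enumerating all 3125 equally likely samples (swapped in because the box is small), and by simulation with an arbitrary seed:

```r
box <- c(1, 3, 4, 7, 8)
observed <- c(7, 8, 4, 8, 8)
n <- length(observed)
obs_sum <- sum(observed)  # compare sums (integers) to avoid rounding issues

# Exact p-value: share of all 5^5 equally likely samples with sum >= observed
all_sums <- rowSums(expand.grid(rep(list(box), n)))
p_exact <- mean(all_sums >= obs_sum)

# Monte Carlo p-value from 10000 simulated samples
set.seed(42)
sim_sums <- replicate(10000, sum(sample(box, size = n, replace = TRUE)))
p_sim <- mean(sim_sums >= obs_sum)

print(c(exact = p_exact, simulated = p_sim))
```

The exact value is \(62/3125 \approx 0.0198\); compare it with the 5% significance level to make the decision.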