Do we see significant difference?

Is it true that average of our sample is larger than 180 in statistically significant way? In other terms, can we reject null hypothesis that sample is obtained from the population with population mean that is equal to 180 in favor of an alternative that population mean is larger than 180?

In other words, can we assume that the difference between sample mean and reference population mean (180) is due to randomness, or it shows that the actual population mean is larger than 180?

Consider several samples:

sample1 <- c(170, 181, 190, 187, 185)

Let’s find sample mean:

mean(sample1)
## [1] 182.6

It’s larger than 180. But is it significant difference? First, give your guess. Then, use t.test function to get p-value and obtain answer:

t.test(sample1, mu=180, alternative = 'greater')
## 
##  One Sample t-test
## 
## data:  sample1
## t = 0.74869, df = 4, p-value = 0.2478
## alternative hypothesis: true mean is greater than 180
## 95 percent confidence interval:
##  175.1966      Inf
## sample estimates:
## mean of x 
##     182.6

Can we conclude that the difference is statistically significant?

Now repeat the same steps with the following samples. Try to guess the result before you proceed, and explain which factors you take into account in your guesses.

sample2 <- c(183, 182, 183.1, 182.3, 182.6)
sample3 <- c(190, 180, 210, 190, 200)
sample4 <- c(170, 181, 190, 187, 185, 170, 181, 190, 187, 185, 170, 181, 190, 187, 185, 170, 181, 190, 187, 185, 170, 181, 190, 187, 185)

Working with data

We use package languageR that accompanies book by Baayen and use dataset durationsOnt from this package (Pluymaekers et al., 2005).

install.packages("languageR")
## Installing package into '/Users/user/Library/R/3.6/library'
## (as 'lib' is unspecified)
## 
## The downloaded binary packages are in
##  /var/folders/h2/9nyrt4p55kq6pdvqg02_zmj40000gn/T//RtmpBkpkYE/downloaded_packages
library(languageR)
data(durationsOnt)
durationsOnt

Calculate the mean length of the n (variable DurationPrefixNasal):

# YOUR CODE HERE

Visualize the distribution of this variable with histogram.

# YOUR CODE HERE

Suppose that previous research of similar recordings had resulted in a mean of 0.053. Is the mean observed for the new sample significantly smaller than 0.053? State your null hypothesis and an alternative, then use t.test to get p-value and answer the question. Use 5% significance level.

# YOUR CODE HERE

Simulations

Let us test that t-test work as expected, i.e. it gives false positive results in 5% times or less. To do it, consider the following simulation.

set.seed(42)
population <- c(12, 15, 5, 13, 24, 3, 7, 14, 16, 17, 21, 14.3)
sample_size <- 20
number_of_samples <- 10000
population_mean <- mean(population)
false_positives <- 0
p_values <- replicate(
  number_of_samples,
  t.test(
    sample(population, size = sample_size, replace = TRUE),
    mu = population_mean,
    alternative = "less"
  )$p.value
)


print(paste("False positive rate:", mean(p_values < .05)))
## [1] "False positive rate: 0.0422"

The magician and the mean

Now the magician claims that he has psychokinetic abilities. To test it, we perform the following experiment. We have a box with five balls, each ball has a number on it: 1, 3, 4, 7, 8. Again we select random ball, record the number, then put the ball again in the box, and repeat the process several times. Thus we obtain a sample of numbers. The magician uses his abilities to make average of these numbers to be as large as possible.

Population mean

Find population mean, i.e. average of numbers in the box.

Sample mean vs. population mean

Assume that at some experiment we obtained the following sample: 8, 3, 3, 1, 1, 7, 8, 7. Its mean (find it) is larger than the population mean. The magician says:

Now you see? I has psychokinetic abilities! I said I will increase the mean and it is indeed larger than the population mean.

Does it look convincing?

More results

Assume that you obtained sample mean that is equal to 8 and your sample size is 5. Does it look convincing now?

Hypotheses

State null hypothesis and an alternative in this experiment.

The decision rule

As the magician claims that he is trying to make the sample mean as large as possible, we use the following decision rule: we choose some value \(\bar x_{crit}\) and will claim that the magician has magical abilities if sample average is equal to \(\bar x_{crit}\) or larger. Let \(\bar x_{crit}=7\). Assume that the magician doesn’t have magical abilities. Simulate 10000 sampling from our box, sample size = 5. How many times would you reject null hypothesis, i.e. claim that the magician indeed has magical abilities?

Average distribution

Plot the distribution of \(\bar x\) for the previous subproblem.

Rejection region

Find such smallest value \(\bar x_{crit}\) that if you use this, the probability to make type I error would be less than 0.05. Use quantile function.

p-value

Assume that we obtained the following data: 7, 8, 4, 8, 8. Find the corresponding p-value. Would you reject null hypothesis?