What does Vasishth mean when he states that the population mean is not a random variable? (p. 61)
For something to be a random variable, it must be able to take on multiple possible values with associated probabilities, like the outcomes of a coin toss or a roll of a die. The population mean is not a random variable in this sense. Vasishth therefore asserts that the population mean is not a random variable: it is a single fixed value, the average of the entire population, and it cannot vary.
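A minimal illustration of the distinction, using an arbitrary toy population: the population mean is one fixed number, whereas the sample mean takes a different value on every draw.
population <- c(4, 7, 9, 12, 15, 18, 21, 23, 26, 30) # toy population
mean(population) # the population mean: a single fixed value
replicate(3, mean(sample(population, size = 5))) # sample means: a random variable, different in every sample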
Read chapters 3.6 and 3.7. How do you change the \(\alpha\)-level in the t.test() function?
In R, you adjust the \(\alpha\)-level in t.test() through the conf.level argument, which specifies the confidence level \(1 - \alpha\) as a value between 0 and 1. The default is conf.level = 0.95, corresponding to \(\alpha = 0.05\).
# https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/t.test
# Create 'sample' object
sample <- rnorm(11, mean = 60, sd = 4)
# Set the confidence level via the conf.level argument of t.test(). If no value is provided, t.test() defaults to conf.level = 0.95
t.test(sample)$conf.int
## [1] 56.46104 63.62816
## attr(,"conf.level")
## [1] 0.95
t.test(sample, conf.level = 0.99)$conf.int
## [1] 54.9474 65.1418
## attr(,"conf.level")
## [1] 0.99
During the lecture we collectively guessed my height. Describe two ways to ensure that your collective guess is closer to my true height.
1/ Increase the number of guesses: collecting estimates from more (and more varied) people helps average out individual errors and biases. By averaging over more guessers, you get a more reliable estimate of the true height, reducing the impact of outliers and random error (a simulation sketch follows after the bonus point below).
2/ Exclude outliers: data points that deviate strongly from the rest, such as unusually high or low guesses compared to the majority, can be removed to reduce their influence on the collective guess. Such outliers may arise from measurement error, misunderstanding the task, or deliberate exaggeration.
Bonus / Minimize the influence of third variables by making them more uniform across guessers. For example, in a lecture hall with seats at varying distances from the person whose height is being guessed, observers close by have a clearer view and can estimate more accurately, while those farther away may struggle. Likewise, make sure all guessers share the same understanding of the units of measurement.
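A small simulation sketch of both suggestions (the true height, the spread of the guesses, and the outliers below are all made up):
set.seed(1)
true_height <- 180 # hypothetical true height in cm
guesses_small <- rnorm(10, mean = true_height, sd = 8)  # few guessers
guesses_large <- rnorm(200, mean = true_height, sd = 8) # many guessers
abs(mean(guesses_small) - true_height) # error of the collective guess
abs(mean(guesses_large) - true_height) # typically smaller with more guessers
# Suggestion 2: reduce the influence of outliers before averaging
guesses_out <- c(guesses_large, 120, 260) # add two wild outliers
abs(mean(guesses_out) - true_height)
abs(mean(guesses_out, trim = 0.05) - true_height) # a trimmed mean is less affected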
Given that \(\bar{x} = 5\), \(SD = 2\) and the sample size is 50, what is the 90% CI?
The 90% confidence interval is approximately (4.5348, 5.4652), rounded to four decimal places.
# Given data
x_bar <- 5 # Sample mean
SD <- 2 # Standard deviation
n <- 50 # Sample size
confidence_level <- 0.90 # Confidence level
# Cumulative probability up to the upper critical value, i.e. 1 - alpha/2 (here 0.95)
mp <- (1 - (1 - confidence_level) / 2)
# Calculate standard error
standard_error <- SD / sqrt(n)
# Find the critical value for the desired confidence level
critical_value <- qnorm(mp) # critical_value is the z-score corresponding to the desired confidence level.
# Calculate the margin of error
margin_of_error <- critical_value * standard_error
# Calculate the confidence interval
lower_bound <- x_bar - margin_of_error # Calculate the lower bound of the Confidence Interval (CI)
upper_bound <- x_bar + margin_of_error # Calculate the upper bound of the Confidence Interval (CI)
lower_bound; upper_bound
## [1] 4.534765
## [1] 5.465235
What is the probability that 0 (hypothesis about the true population mean) is located within this CI?
The probability that the hypothesised value 0 is located within this CI is 0, because 0 does not fall within the interval (4.5348, 5.4652) (bounds rounded to four decimal places).
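This can be verified directly from the lower_bound and upper_bound objects computed above:
# Check whether the hypothesised value 0 lies inside the interval
0 >= lower_bound & 0 <= upper_bound
## [1] FALSE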
What happens with the CI if the sample size increases?
As sample size increases, the precision of the estimate improves, resulting in a narrower confidence interval. This reflects a higher level of confidence in the proximity of the sample mean to the true population mean. In essence, a larger sample size provides more information about the population, leading to a more reliable and precise estimation of the population mean through the confidence interval.
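A quick sketch of this shrinking, using the same summary statistics as above (\(SD = 2\), 90% confidence) but increasing the sample size: the margin of error scales with \(1/\sqrt{n}\).
# Margin of error of a 90% CI for increasing sample sizes (SD fixed at 2)
n_values <- c(50, 200, 1000, 5000)
qnorm(0.95) * 2 / sqrt(n_values) # shrinks as n grows, so the CI gets narrower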
How big is the CI if we have data from the full population?
If you have data from the full population, the confidence interval would technically have a width of zero, because there is no uncertainty about the population parameter. Confidence intervals are used to estimate a population parameter (like the mean) based on a sample. Since you have information about every individual in the population, you already know the population mean with certainty. There is no sampling error or uncertainty involved, so the confidence interval collapses to a single point: the actual population mean itself.
Assume \(\alpha = .5\). What is the probability that \(\mu\) is located within the estimated CI, when we keep repeating the experiment? Check your answer with some simulation based on: \(\mu = 10\), \(\sigma = 3\) and \(N=50\).
As the number of simulations increases, the proportion of intervals that contain \(\mu\) converges to 0.5. This is logical: with \(\alpha = .5\) the confidence level is \(1 - \alpha = 0.5\), which means that in repeated experiments the interval contains the population mean 50% of the time.
# Parameters
mu <- 10 # True population mean
sigma <- 3 # Population standard deviation
N <- 50 # Sample size
alpha <- 0.5 # alpha (α) is the significance level
confidence_level <- 1 - alpha # Convert alpha to the confidence level (here also 0.5)
num_simulations <- 10000 # Number of simulations
# Perform simulations
num_within_ci <- replicate(num_simulations, {
# Generate random sample from normal distribution
sample_data <- rnorm(N, mu, sigma)
# Calculate sample mean
sample_mean <- mean(sample_data)
# Calculate margin of error
mp <- (1 - (1 - confidence_level) / 2)
margin_of_error <- qnorm(mp) * (sigma / sqrt(N))
# Calculate confidence interval
lower_bound <- sample_mean - margin_of_error
upper_bound <- sample_mean + margin_of_error
# Check if true mean falls within confidence interval
if (lower_bound <= mu && mu <= upper_bound) {
return(1)
} else {
return(0)
}
})
# Calculate probability
probability_within_ci <- mean(num_within_ci)
probability_within_ci
## [1] 0.4955
# Create a bar plot comparing counts of 0s and 1s
barplot(table(num_within_ci),
main = "Comparison of simulations outside and inside the confidence interval",
xlab = "Value",
ylab = "Count",
col = c("red", "blue"),
legend.text = c("Outside CI", "Inside CI"))
A 95% confidence interval has a ?% chance of describing the sample mean: A) 95% B) 100%
A confidence interval is always centred on the sample mean, so it contains (describes) the sample mean with certainty; the 95% refers to how often such intervals capture the population mean in repeated sampling, not the sample mean. The answer is therefore 'B' (100%).
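A quick check with arbitrary simulated data: the proportion of samples whose mean lies inside its own 95% CI is always 1.
# For 1000 simulated samples, check whether each sample mean lies inside its own 95% CI
mean(replicate(1000, {
  x <- rnorm(20, mean = 5, sd = 2)
  ci <- t.test(x)$conf.int
  ci[1] <= mean(x) & mean(x) <= ci[2]
}))
## [1] 1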
For the same data, a 90% CI will be wider than a 95% CI. A) True B) False
The higher the confidence level, the larger the difference between the lower and upper bound, i.e. the wider the interval. A 90% CI is therefore narrower than a 95% CI, so 'B' (False) is the answer.
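This can be checked on a single (arbitrary) sample with t.test():
x <- rnorm(30, mean = 10, sd = 3) # arbitrary sample
diff(t.test(x, conf.level = 0.90)$conf.int) # width of the 90% CI
diff(t.test(x, conf.level = 0.95)$conf.int) # width of the 95% CI: always wider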
In the lecture we showed that p-values are uniformly distributed given that \(H_{0}\) is true. Now describe the distribution of p-values if \(H_{0}\) is false. Give an explanation.
When the null hypothesis (H0) is true, p-values follow a uniform distribution, meaning all values between 0 and 1 are equally likely. However, if H0 is false and there is a genuine effect in the population, the distribution of p-values skews towards smaller values, indicating stronger evidence against H0. This skewness reflects the increased likelihood of observing extreme or significant results in the sample, leading to smaller p-values.
Use the R code from the lecture slides to investigate what happens to the distribution of p-values in different scenarios. Explain in every scenario why the distribution of the p-values did or did not change, and show how you’ve altered the code for each scenario.
p_values <- numeric() # Create empty vector to store p-values
N <- 5 # Sample of 5
for (i in 1:1e5) {
s1 <- rnorm(N, 7, 1) # Sample from normal distribution with mean 7 and SD 1
s2 <- rnorm(N, 7, 1) # Sample from normal distribution with mean 7 and SD 1
p_values[i] <- t.test(s1, s2)$p.value # Test for difference in mean and store p-values
}
h <- hist(p_values, br = 50) # Plot p-values in histogram
How does the distribution of p-values change if:
there is an effect of Ritalin on study success?
In this scenario a true effect exists, so the null hypothesis is false. Consequently, the distribution of p-values shifts: the simulation produces small p-values much more often. By changing the mean of s2 from 7 to 8, you can see in the histogram that the p-values are no longer uniform but skewed towards small values, reflecting the increased likelihood of observing significant results when the effect is real.
p_values <- numeric()
N <- 5
for (i in 1:1e5) {
s1 <- rnorm(N, 7, 1)
s2 <- rnorm(N, 8, 1) # Increase the mean of s2 to represent an effect
p_values[i] <- t.test(s1, s2)$p.value
}
h <- hist(p_values, br = 50)
if this effect is bigger than the effect in question 12a?
In this scenario, you increase the difference between the means of the two samples to represent a larger effect size. With a larger effect, the evidence against the null hypothesis is stronger, yielding an even more pronounced skew towards small p-values. By changing the mean of s2 from 8 to 9, the effect size increases further, which is visible in the histogram as a stronger concentration of low p-values.
p_values <- numeric()
N <- 5
for (i in 1:1e5) {
s1 <- rnorm(N, 7, 1)
s2 <- rnorm(N, 9, 1) # Increase the mean of s2 further to represent a larger effect
p_values[i] <- t.test(s1, s2)$p.value
}
h <- hist(p_values, br = 50)
if the power is bigger than the power in question 12a?
Increasing the sample size is one way to increase power. With a larger sample, chance plays a smaller role in the observed mean difference, so you see an even stronger skew towards small p-values in this scenario.
p_values <- numeric()
N <- 25 # Increase sample size
for (i in 1:1e5) {
s1 <- rnorm(N, 7, 1)
s2 <- rnorm(N, 8, 1)
p_values[i] <- t.test(s1, s2)$p.value
}
h <- hist(p_values, br = 50)
if there is an effect and you use a one-sided rather than a two-sided test?
When there is an effect and you test in the direction of that effect, a one-sided test gives smaller p-values than a two-sided test on the same data (roughly half), so the distribution skews even more strongly towards 0. Testing in the opposite direction does the reverse: the p-values pile up near 1. In the code below, s2 again has mean 8 (the same effect as in 12a); alternative = "less" tests in the direction of the true effect and alternative = "greater" tests against it.
# One-sided tests with smaller and greater alternatives
p_values_less <- numeric()
p_values_greater <- numeric()
N <- 5
for (i in 1:1e5) {
s1 <- rnorm(N, 7, 1)
s2 <- rnorm(N, 8, 1) # Higher mean for s2, so there is a true effect (as in 12a)
# One-sided test in the direction of the true effect (mean of s1 less than mean of s2)
p_values_less[i] <- t.test(s1, s2, alternative = "less")$p.value
# One-sided test in the opposite direction
p_values_greater[i] <- t.test(s1, s2, alternative = "greater")$p.value
}
h_less <- hist(p_values_less, br = 50) # skewed towards 0, even more strongly than the two-sided test
h_greater <- hist(p_values_greater, br = 50) # piled up near 1
if \(\alpha\) is .5 rather than .05?
Increase the significance level to \(\alpha = 0.5\). The p-values produced by the t-tests are unchanged: the conf.level argument only affects the width of the confidence interval reported by t.test(), not the test statistic or the p-value. The histograms therefore look the same. The practical consequence of \(\alpha = .5\) lies only in the decision rule: half of the p-values would now be called significant even though \(H_{0}\) is true.
p_values_conf_0.5 <- numeric()
p_values_default <- numeric()
N <- 5
for (i in 1:1e5) {
s1 <- rnorm(N, 7, 1)
s2 <- rnorm(N, 7, 1)
p_values_conf_0.5[i] <- t.test(s1, s2, conf.level = 0.5)$p.value
p_values_default[i] <- t.test(s1, s2)$p.value
}
h_conf_0.5 <- hist(p_values_conf_0.5, br = 50)
h_default <- hist(p_values_default, br = 50)
Does the confidence interval of a sample mean change if you choose a different \(H_{0}\)? Explain.
The confidence interval of a sample mean does not change based on the choice of a different null hypothesis (H0). The confidence interval is calculated solely based on the sample data and the chosen confidence level. It is not affected by the specific hypothesis being tested.
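A quick check with t.test(): the mu argument (the value under \(H_{0}\)) changes the test statistic and p-value, but the reported confidence interval stays the same. The sample below is arbitrary.
x <- rnorm(40, mean = 5, sd = 2) # arbitrary sample
t.test(x, mu = 0)$conf.int # H0: mu = 0
t.test(x, mu = 5)$conf.int # H0: mu = 5; identical interval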
Does the confidence interval of a sample mean change if you choose a different \(\alpha\)? Explain.
Choosing a different \(\alpha\) directly affects the confidence interval, because the confidence level is \(1 - \alpha\). A larger \(\alpha\) (e.g. .5) gives a lower confidence level and therefore a narrower interval, which excludes the true mean more often; a smaller \(\alpha\) (e.g. .01) gives a higher confidence level and therefore a wider interval, which excludes the true mean less often.
True or false? The p-value is the probability of the null hypothesis being true.
False. The p-value is not the probability of the null hypothesis being true. Instead, it represents the probability of observing the sample data, or more extreme data, given that the null hypothesis is true. In other words, it indicates the strength of evidence against the null hypothesis.
True or false? The p-value is the probability that the result occurred by chance.
False. "The result occurred by chance" is not what the p-value measures. The p-value is computed under the assumption that chance alone (the null hypothesis) produced the data: it is the probability of obtaining the observed data, or more extreme data, given that \(H_{0}\) is true. If the p-value is very small (typically less than a chosen significance level, such as 0.05), we reject the null hypothesis in favour of the alternative, but the p-value does not tell us the probability that the result is due to chance, nor the probability that \(H_{0}\) is true.
Imagine that 100 researchers study whether vegetarians have a higher IQ, when in reality there is no effect. How often do you expect a significant result nonetheless? Assume that the researchers perform proper studies with \(\alpha = .05\).
With 100 independent studies using \(\alpha = 0.05\), even if there is no actual difference in IQ between vegetarians and non-vegetarians, about 5 studies are expected to show a significant effect purely by chance: each study has a 5% probability of a Type I error, and random fluctuations in the data occasionally produce statistically significant results.
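A small sketch of the expected counts, based only on the binomial distribution with 100 studies and \(\alpha = 0.05\):
n_studies <- 100
alpha <- 0.05
n_studies * alpha # expected number of significant results: 5
1 - pbinom(0, n_studies, alpha) # probability of at least one false positive (about .99)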
What statistical error in the conclusion of the researchers is made when a significant result is found?
The statistical error made by the researchers when a significant result is found is a Type I error. This occurs when they incorrectly reject the null hypothesis (i.e., conclude that there is a difference in IQ between vegetarians and non-vegetarians) when it is actually true (i.e., there is no difference).
Assume journals only publish significant effects. Is the observed effect size in those journals different from the true effect size? Explain.
Yes, the observed effect size in journals that only publish significant effects would likely be different from the true effect size. This is because of publication bias, where only statistically significant results are published, leading to an overrepresentation of such results in the literature. Consequently, smaller or non-significant effects may be overlooked or underreported, skewing the perceived effect size in the published literature compared to the true effect size in the population.
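A small simulation sketch of this inflation (the true difference, sample size, and number of simulations below are arbitrary): with a modest true effect and small samples, the average observed difference among only the significant ("publishable") studies is larger than the true difference.
set.seed(42)
true_diff <- 0.3 # true difference between the group means
n <- 20 # small sample per group
sims <- replicate(5000, {
  control <- rnorm(n, 0, 1)
  treatment <- rnorm(n, true_diff, 1)
  c(diff = mean(treatment) - mean(control),
    sig = t.test(control, treatment)$p.value < 0.05)
})
mean(sims["diff", ]) # average observed difference over all studies: close to the true 0.3
mean(sims["diff", sims["sig", ] == 1]) # average among significant studies only: clearly inflated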
If you want to increase the probability of finding an effect when in reality there is no effect, should you use many or few participants? Or doesn’t it matter? Why?
It doesn't matter. When there is no true effect, the p-value is uniformly distributed regardless of the sample size, so the probability of a (false positive) significant result equals \(\alpha\) whether you test few or many participants. A larger sample does not protect against false positives; it only makes the estimate of the (zero) effect more precise. The simulation below illustrates this: the proportion of significant results stays close to 0.05 for every sample size.
# Set simulation parameters
num_simulations <- 1000 # Increase simulations for more robust results
alpha <- 0.05 # Significance level
mean <- 10 # Mean for the normal distributions
sd <- 5 # Standard deviation for the normal distributions
# Define function to perform t-test and find significant results
find_significant <- function(n) {
# Initialize counter for significant results
num_significant <- 0
# Perform t-test for each simulation
for (i in 1:num_simulations) {
# Generate data from normal distributions with same mean and SD
control <- rnorm(n, mean, sd)
treatment <- rnorm(n, mean, sd)
# Perform t-test
t_test_result <- t.test(control, treatment)
# Check if p-value is less than alpha
if (t_test_result$p.value < alpha) {
num_significant <- num_significant + 1
}
}
# Calculate proportion of simulations with significant results
proportion_significant <- num_significant / num_simulations
# Return data frame with counts and additional statistics
return(data.frame(
Sample_Size = n,
num_significant = num_significant,
proportion_significant = proportion_significant
))
}
# Run simulations for different sample sizes
sample_sizes <- c(10, 100, 1000, 10000)
results_list <- lapply(sample_sizes, find_significant)
# Combine results into a single dataframe
results_df <- do.call(rbind, results_list)
# Print the results dataframe
results_df
## Sample_Size num_significant proportion_significant
## 1 10 46 0.046
## 2 100 48 0.048
## 3 1000 50 0.050
## 4 10000 44 0.044
If you want to increase the probability of finding an effect when in reality there is also actually an effect, should you use many or few participants? Or doesn’t it matter? Why?
When a genuine effect exists, using a larger number of participants increases the likelihood of detecting it, because larger samples give higher statistical power: the study's ability to detect genuine effects. With more participants, even subtle effects can be identified, raising the probability of correctly rejecting the null hypothesis and affirming the presence of the effect. To illustrate, consider searching for a rare flower: in a small field, the chance of missing it by chance is considerable, whereas searching a larger field with more plants makes finding it far more likely. In research, a larger sample size works like the larger field, offering more opportunities to observe the genuine effect. This is particularly important for small effects, which are hard to detect in studies with few participants (see the sketch below).
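A quick illustration with base R's power.t.test(), assuming an arbitrary effect of 0.5 SD between the groups:
# Power of a two-sample t-test (difference of 0.5 SD) for increasing group sizes
sample_sizes <- c(10, 25, 50, 100)
sapply(sample_sizes, function(n)
  power.t.test(n = n, delta = 0.5, sd = 1, sig.level = 0.05)$power)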
Suppose you came across a study discussing the impact of Ritalin on study success. The study found no significant difference in the performance of students who take Ritalin compared to those who do not. You observe that the study was conducted with a small number of participants, and thus had low power. Based on their study, the researchers concluded that Ritalin does not have any effect on study success.
What do you think of this conclusion? Do you share the researchers' conclusion or would you conclude something else? If so, what would you conclude?
It’s prudent to approach the researchers’ conclusion with skepticism due to the small sample size and low statistical power, which may have hindered the study’s ability to detect potential effects of Ritalin on study success. Adhering to the null hypothesis based on this research is sensible, but it’s essential to acknowledge that this lack of significance doesn’t necessarily imply a lack of effect in the broader population. Replicating the study with larger sample sizes and higher statistical power is indeed crucial for obtaining more reliable and generalizable conclusions regarding the impact of Ritalin on study success.
Suppose you came across another study discussing the impact of Ritalin on study success. In this study, the researchers do find a significant difference between students that use Ritalin and students that don’t use Ritalin. You notice that the research was done with very few participants and thus had low power. The researchers conclude that Ritalin has an effect.
How could they have found an effect, even in such a small sample? (Assume it’s a robust experiment and we trust the researchers.)
In a study with a small sample, a significant effect can still be found if the effect size is large relative to the variability within the groups: what determines the t-statistic is the size of the group difference compared to the noise, not the number of participants alone. Low within-group variability and a large between-group difference make an effect detectable even with few observations; the choice of statistical test and \(\alpha\)-level also play a role. Keep in mind, however, that effects that reach significance in low-powered studies tend to overestimate the true effect size, so caution is needed in interpreting such results, and replication with larger samples is necessary for confirmation.
Suppose you carry out a between-participants experiment where you have a control and treatment group, say 20 participants in each of the groups. You carry out a two-sample t-test to test for a difference in means between the groups and find that the result is \(t(18) = 2.7\), \(p < .01\). Which of the following statements are true? Explain your answer for each of the statements.
You have absolutely disproved the null hypothesis.
False. While a significant p-value suggests that the null hypothesis (no difference between groups) is unlikely, it does not definitively disprove it. It simply indicates that there is evidence against the null hypothesis, but other factors such as effect size and study design should also be considered before drawing conclusions. Also, factors like chance should be considered, because it could have influenced the outcome.
The probability of the null hypothesis being true is 0.01.
False. The p-value (p < 0.01) indicates the probability of obtaining the observed result (or more extreme results) under the assumption that the null hypothesis is true. It does not directly provide the probability of the null hypothesis itself being true.
You have absolutely proved that there is a difference between the two means.
False. A significant result (p < 0.01) suggests that there is evidence of a difference between the means, but it does not prove it with absolute certainty. It indicates that the observed difference is unlikely to have occurred by random chance alone, but it does not confirm the presence of a true difference. Effect size and other factors should also be considered.
You have a reliable experimental finding in the sense that if you were to repeat the experiment 100 times, in 99% of the cases you would get a significant result.
False. A p-value does not tell you how often a replication would be significant: the probability of obtaining a significant result again depends on the power of the experiment (the true effect size, the variability, and the sample size), none of which can be read off from \(p < .01\). A single significant result therefore does not guarantee that 99% of replications would also be significant; evaluating the design, the effect size, and ideally actual replications is needed before calling the finding reliable.