Here are my answers to the Week 3 Practice Quiz Activity of the course Inferential Statistics with R, offered on Coursera and taught by Mine Çetinkaya-Rundel.

R Markdown

This is an R Markdown file. For better viewing, it can be forked and knitted to an HTML file in RStudio, or viewed directly as an RPubs publication.

Packages

We will use the devtools package to install the statsr package associated with this course, so devtools must be installed and loaded first.

install.packages("devtools")
library(devtools)

Now we can install the rest of the packages we will use during the course. Type the following commands in the Console as well:

install.packages("dplyr")
install.packages("ggplot2")
install.packages("shiny")
install_github("StatsWithR/statsr")
  1. Consider the width of two bootstrap confidence intervals constructed based on the same sample. One of the intervals is constructed at a 90% confidence level and the other is constructed at a 95% confidence level. Which of the following is true?
    1. The 95% interval is wider.
    2. The intervals are the same size.
    3. The 90% interval is wider.
    4. There is not enough information to determine which interval is wider.

Answer: The 95% interval is wider.

Construct bootstrap confidence intervals.

Recognize that when the bootstrap distribution is extremely skewed and sparse, the bootstrap confidence interval may not be appropriate.

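Since both intervals are built from the same bootstrap distribution, the 95% interval must cover the middle 95% of the bootstrap statistics, leaving only 5% in the tails, whereas the 90% interval covers the middle 90% and leaves 10% out. Covering more of the same distribution requires a wider interval, so the 95% interval is wider.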
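A minimal R sketch can make this concrete. It assumes a hypothetical sample of 30 simulated values (not data from the course) and uses the percentile method, taking the middle 90% and the middle 95% of a single bootstrap distribution of the sample mean. Because both intervals come from the same bootstrap distribution, the endpoints of the 95% interval sit further out in the tails.

```r
# Hypothetical sample: 30 simulated values (illustrative data, not from the course)
set.seed(42)
x <- rnorm(30, mean = 10, sd = 2)

# Bootstrap distribution of the sample mean: resample x with replacement many times
n_boot <- 10000
boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))

# Percentile method: an XX% interval is the middle XX% of the bootstrap distribution
ci90 <- quantile(boot_means, probs = c(0.05, 0.95))
ci95 <- quantile(boot_means, probs = c(0.025, 0.975))

width90 <- unname(diff(ci90))
width95 <- unname(diff(ci95))
print(round(c(width90 = width90, width95 = width95), 3))
```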

  2. Which of the following is not a situation where the paired test is preferred?
    1. Compare artery thicknesses at the beginning of a study and after 2 years of taking Vitamin E.
    2. Assess effectiveness of a diet regimen by comparing the before and after weights of subjects.
    3. Compare pre- (beginning of semester) and post-test (end of semester) scores of students.
    4. Assess gender-related salary gap by comparing salaries of randomly sampled men and women.

Answer: The paired test is not preferred for assessing the gender-related salary gap by comparing salaries of randomly sampled men and women.

Since the men and women were sampled independently and at random, no observation in the men’s group has a special correspondence with exactly one observation in the women’s group, so the data are not paired.
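For contrast, here is a sketch of what a paired analysis looks like in R, using made-up before/after weights for eight hypothetical subjects. A paired t-test is equivalent to a one-sample t-test on the within-subject differences, which only makes sense when every observation has exactly one partner. Two independent random samples of men and women have no such partners, so they would instead be compared with an independent two-sample test such as `t.test(men, women)`, where `men` and `women` are hypothetical salary vectors.

```r
# Hypothetical before/after weights (kg) for the same 8 subjects: illustrative only
before <- c(82.1, 90.4, 77.8, 95.2, 88.0, 79.5, 101.3, 85.6)
after  <- c(80.3, 88.9, 77.1, 92.8, 86.4, 79.9,  98.7, 84.0)

# Paired test: each 'after' value belongs to exactly one 'before' value
paired_test <- t.test(after, before, paired = TRUE)

# Same analysis written as a one-sample t-test on the within-subject differences
diff_test <- t.test(after - before, mu = 0)

# Ignoring the pairing (treating the groups as independent) discards that structure
indep_test <- t.test(after, before)

print(c(paired = paired_test$p.value,
        differences = diff_test$p.value,
        independent = indep_test$p.value))
```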

  3. You’ve just read a study that investigated the difference in brain sizes between EU and US citizens, based on data from random samples from both populations. At the 5% significance level, the study failed to reject the null hypothesis that EU and US citizens have (on average) brains of equal size. Which of the following is true regarding a 99% confidence interval for the difference in brain sizes?
    1. Since the data come from samples and not populations, no conclusions can be made.
    2. The interval does not contain 0.
    3. Without more information, it is impossible to know whether the interval contains 0.
    4. The interval contains 0.

Answer: The interval contains 0.

Because the study failed to reject the null hypothesis at the 5% significance level (with a two-sided alternative), 0 lies within the 95% confidence interval for the difference. A 99% interval built from the same data is centered on the same point estimate and is wider, so 0 must lie within the 99% interval too.
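The link between the test and the interval can be checked with simulated data. This sketch assumes two hypothetical samples of 40 brain volumes drawn from populations with the same mean (not the study's data) and uses R's `t.test()`, which reports a two-sided Welch test together with its confidence interval. The 99% interval shares its center with the 95% interval and is strictly wider, so whenever 0 is inside the 95% interval it is inside the 99% interval as well.

```r
# Hypothetical brain-volume samples (cm^3) from populations with equal means: illustrative only
set.seed(1)
eu <- rnorm(40, mean = 1200, sd = 100)
us <- rnorm(40, mean = 1200, sd = 100)

test95 <- t.test(eu, us, conf.level = 0.95)   # two-sided Welch test and 95% CI
ci95 <- test95$conf.int
ci99 <- t.test(eu, us, conf.level = 0.99)$conf.int

# Does a confidence interval contain zero?
contains_zero <- function(ci) ci[1] <= 0 && ci[2] >= 0

cat("p-value:", round(test95$p.value, 4), "\n")
cat("0 in 95% CI:", contains_zero(ci95), " 0 in 99% CI:", contains_zero(ci99), "\n")
```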

  4. The figure below shows three unimodal and symmetric curves. Which assignment is most plausible?
    1. Solid: normal. Dashed: \(t_{df=5}\). Dotted: \(t_{df=1}\).
    2. Solid: normal. Dashed: \(t_{df=1}\). Dotted: \(t_{df=5}\).
    3. Solid: \(t_{df=1}\). Dashed: \(t_{df=5}\). Dotted: normal.
    4. Solid: \(t_{df=5}\). Dashed: \(t_{df=1}\). Dotted: normal.

Answer: The solid curve is normal, while the dashed and dotted curves are t distributions with \(df = 5\) and \(df = 1\), respectively.

As the degrees of freedom increase, the t distribution approaches the normal distribution. A t distribution with fewer degrees of freedom has heavier tails than one with more degrees of freedom, so \(t_{df=1}\) has the heaviest tails of the three curves and the normal distribution has the lightest.
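The heavier tails can be quantified with R's distribution functions. This sketch compares the two-sided tail probability \(P(|X| > 2)\) and the density at 0 for the normal, \(t_{df=5}\), and \(t_{df=1}\) distributions, and shows the 97.5th-percentile critical value shrinking toward the normal value as the degrees of freedom grow. The cutoff of 2 and the chosen degrees of freedom are arbitrary illustrative choices.

```r
# Two-sided tail probability P(|X| > 2): heavier tails put more mass out here
tails <- c(
  normal = 2 * pnorm(-2),
  t_df5  = 2 * pt(-2, df = 5),
  t_df1  = 2 * pt(-2, df = 1)
)

# Density at the centre: heavier tails come with a lower peak
peaks <- c(normal = dnorm(0), t_df5 = dt(0, df = 5), t_df1 = dt(0, df = 1))

# 97.5th-percentile critical values approach qnorm(0.975) as df grows
dfs  <- c(1, 5, 30, 1000)
crit <- setNames(qt(0.975, df = dfs), paste0("df_", dfs))

print(round(tails, 4))
print(round(peaks, 4))
print(round(c(crit, normal = qnorm(0.975)), 3))
```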

  5. We are testing the following hypotheses: \(H_{0}: \mu = 3\), \(H_{A}: \mu > 3\). The sample size is 18 and the test statistic is calculated as \(T = 0.5\). What is the p-value?
    1. between 0.01 and 0.025
    2. less than 0.01
    3. greater than 0.1
    4. between 0.05 and 0.1
    5. less than 0.005

Answer: The p-value is greater than 0.1.

# One-sided p-value: upper-tail area beyond T = 0.5 with df = n - 1 = 17
pt(0.5, df = 18 - 1, lower.tail = FALSE)
## [1] 0.3117426
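The quiz expects a t-table lookup rather than an exact computation. A sketch of that reasoning, assuming a one-sided upper-tail test with \(df = 18 - 1 = 17\): find the critical value that leaves 10% in the upper tail and note that \(T = 0.5\) falls well below it, so the p-value must exceed 0.1.

```r
n      <- 18
t_stat <- 0.5
dof    <- n - 1

# Critical value leaving 10% in the upper tail (the t-table entry for 0.10, one tail)
crit_10 <- qt(0.10, df = dof, lower.tail = FALSE)

# Exact upper-tail p-value for comparison
p_value <- pt(t_stat, df = dof, lower.tail = FALSE)

cat("t* for upper-tail 0.10:", round(crit_10, 3), "\n")
cat("T below t*?", t_stat < crit_10, " p-value:", round(p_value, 4), "\n")
```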
  6. What does ANOVA mean?
    1. Assessment of null observed variability
    2. Assessment of orthogonal variation
    3. Analysis of variance
    4. Aardvarks not over vanilla ants

Answer: ANOVA means analysis of variance.

Analysis of variance (ANOVA) is a statistical inference method that considers many groups at once to determine whether the variability in the sample means is so large that it is unlikely to be due to chance alone.
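As an illustration, R's built-in `PlantGrowth` dataset (dried plant weights under a control and two treatment conditions, 10 plants each) can be analyzed with a one-way ANOVA via `aov()`. This is a sketch on an example dataset unrelated to the quiz; it simply shows the shape of the resulting ANOVA table.

```r
# One-way ANOVA on R's built-in PlantGrowth data: weight ~ treatment group (3 groups)
data(PlantGrowth)
fit <- aov(weight ~ group, data = PlantGrowth)

# The ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F)
aov_table <- summary(fit)[[1]]
print(aov_table)
```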

  7. Which of the following is not a condition required for comparing means across multiple groups using ANOVA?
    1. The means of each group should be roughly equal.
    2. The data within each group should be nearly normal.
    3. The observations should be independent within and across groups.
    4. The variability across the groups should be about equal.

Answer: The means of each group are not required to be roughly equal.

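This question refers to listing the conditions necessary for performing ANOVA: the observations should be independent within and across groups, the data within each group should be nearly normal, and the variability across the groups should be about equal.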

Whether the group means are equal is exactly what ANOVA tests, so equality of the means cannot be a required condition for ANOVA.
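Two of the ANOVA conditions, nearly normal data within each group and roughly equal variability across groups, can be checked informally before running the test. The sketch below does this for R's built-in `PlantGrowth` data using group standard deviations and a per-group Shapiro-Wilk test; the dataset and the choice of checks are illustrative assumptions. Independence cannot be verified from the data alone and has to be argued from how the data were collected.

```r
# Informal checks of two ANOVA conditions on R's built-in PlantGrowth data (illustrative)
data(PlantGrowth)

# Group sizes
group_n <- table(PlantGrowth$group)

# Roughly equal variability across groups? Compare the group standard deviations
group_sd <- tapply(PlantGrowth$weight, PlantGrowth$group, sd)

# Nearly normal within each group? Shapiro-Wilk p-value per group
normality_p <- tapply(PlantGrowth$weight, PlantGrowth$group,
                      function(w) shapiro.test(w)$p.value)

print(group_n)
print(round(group_sd, 3))
print(round(normality_p, 3))
```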

  8. Which of the following looks most like an F distribution?
    1. T distribution.
    2. F distribution
    3. Normal distribution.
    4. Another one.

Answer: The first one looks most like an F distribution.

Recognize that the test statistic for ANOVA, the F statistic, is calculated as the ratio of the mean square between groups (MSG, the variability between groups) to the mean square error (MSE, the variability within groups): \(F = MSG / MSE\).

Also recognize that the F statistic has a right-skewed distribution with two different measures of degrees of freedom: one for the numerator (\(df_G = k-1\), where \(k\) is the number of groups), and one for the denominator (\(df_E = n-k\), where \(n\) is the total sample size).
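These formulas can be verified by computing the F statistic by hand. This sketch uses R's built-in `PlantGrowth` data (\(k = 3\) groups, \(n = 30\) observations) as an illustrative example. It computes MSG and MSE from their sums of squares, takes the p-value from the upper tail of the F distribution with \(df_G = k-1\) and \(df_E = n-k\), and cross-checks the statistic against `aov()`. It also compares the mean and median of that F distribution; the mean exceeding the median is consistent with right skew.

```r
data(PlantGrowth)
y <- PlantGrowth$weight
g <- PlantGrowth$group

k <- nlevels(g)        # number of groups
n <- length(y)         # total sample size
grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)
group_n     <- tapply(y, g, length)

# Between-group and within-group sums of squares
ssg <- sum(group_n * (group_means - grand_mean)^2)
sse <- sum((y - ave(y, g))^2)   # ave() returns each observation's group mean

msg <- ssg / (k - 1)   # df_G = k - 1
mse <- sse / (n - k)   # df_E = n - k
f_stat  <- msg / mse
p_value <- pf(f_stat, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# Cross-check against aov()
f_aov <- summary(aov(weight ~ group, data = PlantGrowth))[[1]][["F value"]][1]

# Right skew: the mean of F(df_G, df_E) exceeds its median (mean exists for df_E > 2)
f_mean   <- (n - k) / (n - k - 2)
f_median <- qf(0.5, df1 = k - 1, df2 = n - k)

cat("F by hand:", round(f_stat, 3), " F from aov():", round(f_aov, 3),
    " p-value:", round(p_value, 4), "\n")
cat("F distribution mean:", round(f_mean, 3), " median:", round(f_median, 3), "\n")
```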