Week 3 Practice Quiz

Here are my answers to the Week 3 Practice Quiz activity of the course Inferential Statistics with R, presented by Coursera and conducted by Mine Çetinkaya-Rundel.

Packages

We will use the devtools package to install the statsr package associated with this course, so devtools must be installed and loaded first.

install.packages("devtools")
library(devtools)

Now we can install the rest of the packages we will use during the course. Type the following commands in the Console as well:

install.packages("dplyr")
install.packages("ggplot2")
install.packages("shiny")
install_github("StatsWithR/statsr")

Consider the width of two bootstrap confidence intervals constructed based on the same sample. One of the intervals is constructed at a 90% confidence level and the other is constructed at a 95% confidence level. Which of the following is true?
1. The 95% interval is wider.
2. The intervals are the same size.
3. The 90% interval is wider.
4. There is not enough information to determine which interval is wider.

Construct bootstrap confidence intervals using one of the following methods:

Percentile method: XX% confidence level is the middle XX% of the bootstrap distribution.
Standard error method: If the standard error of the bootstrap distribution is known, and the distribution is nearly normal, the bootstrap interval can also be calculated as \(\bar{x}_{boot} \pm z^{*} SE_{boot}\).

Recognize that when the bootstrap distribution is extremely skewed and sparse, the bootstrap confidence interval may not be appropriate.
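As a minimal sketch of the two methods, assuming a simulated sample (the data, seed, and variable names below are illustrative and do not come from the quiz), both intervals can be computed in base R:

```r
# Illustrative data only: 50 simulated observations (not from the course).
set.seed(42)
x <- rnorm(50, mean = 10, sd = 2)

# Bootstrap distribution of the sample mean: resample with replacement.
boot_means <- replicate(5000, mean(sample(x, replace = TRUE)))

# Percentile method: the middle 95% of the bootstrap distribution.
ci_percentile <- quantile(boot_means, probs = c(0.025, 0.975))

# Standard error method: bootstrap mean +/- z* times the bootstrap SE.
# This assumes the bootstrap distribution is nearly normal.
z_star <- qnorm(0.975)
ci_se <- mean(boot_means) + c(-1, 1) * z_star * sd(boot_means)

ci_percentile
ci_se
```

When the bootstrap distribution is roughly symmetric and bell-shaped, as it is for this simulated sample, the two intervals should be close to each other.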

Since both intervals are based on the same sample, and hence on the same bootstrap distribution, the 95% interval is wider: it spans the middle 95% of the bootstrap distribution and leaves out only 5% of the bootstrap statistics, whereas the 90% interval leaves out 10%.
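The width comparison can be checked numerically. The sketch below uses made-up data and the percentile method; the sample, seed, and object names are assumptions for illustration only:

```r
# Illustrative data only: a right-skewed simulated sample.
set.seed(1)
x <- rexp(40, rate = 1 / 5)

# One bootstrap distribution, shared by both intervals.
boot_means <- replicate(10000, mean(sample(x, replace = TRUE)))

# Percentile intervals at two confidence levels.
ci_90 <- quantile(boot_means, probs = c(0.05, 0.95))
ci_95 <- quantile(boot_means, probs = c(0.025, 0.975))

# Interval widths: the 95% interval reaches further into both tails.
width_90 <- unname(diff(ci_90))
width_95 <- unname(diff(ci_95))
c(width_90 = width_90, width_95 = width_95)
```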

Which of the following is not a situation where the paired test is preferred?
1. Compare artery thicknesses at the beginning of a study and after 2 years of taking Vitamin E.
2. Assess effectiveness of a diet regimen by comparing the before and after weights of subjects.
3. Compare pre- (beginning of semester) and post-test (end of semester) scores of students.
4. Assess gender-related salary gap by comparing salaries of randomly sampled men and women.

This question refers to the following:

Define observations as paired if each observation in one dataset has a special correspondence or connection with exactly one observation in the other data set.
Carry out inference for paired data by first subtracting the paired observations from each other, and then treating the set of differences as a new numerical variable on which to do inference (such as a confidence interval or hypothesis test for the average difference).

The salary comparison (option 4) is the one that is not paired: because the men and women were sampled randomly and independently, no observation in the men’s group has a special correspondence with exactly one observation in the women’s group.
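To illustrate how paired inference reduces to inference on the differences, here is a small sketch with simulated before/after measurements (the numbers and variable names are hypothetical). A paired t-test and a one-sample t-test on the differences should give the same result:

```r
# Illustrative data only: 25 subjects measured before and after.
set.seed(7)
before <- rnorm(25, mean = 80, sd = 10)
after <- before - rnorm(25, mean = 2, sd = 3)

# Paired t-test on the two measurements.
paired_test <- t.test(after, before, paired = TRUE)

# Equivalent one-sample t-test on the set of differences.
diff_test <- t.test(after - before, mu = 0)

# For contrast: treating the two columns as independent samples (Welch test)
# ignores the pairing and is not appropriate for this design.
indep_test <- t.test(after, before)

c(paired = paired_test$p.value,
  differences = diff_test$p.value,
  independent = indep_test$p.value)
```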

You’ve just read a study that investigated the difference in brain sizes between EU and US citizens, based on data from random samples from both populations. At the 5% significance level the study failed to reject the null hypothesis that EU and US citizens have (on average) brains of equal size. Which of the following is true regarding a 99% confidence interval for the difference in brain sizes?
1. Since the data come from samples and not populations, no conclusions can be made.
2. The interval does not contain 0.
3. Without more information, it is impossible to know whether the interval contains 0.
4. The interval contains 0.

This question refers to the following:

Recognize that a good interpretation of a confidence interval for the difference between two parameters includes a comparative statement (mentioning which group has the larger parameter).
Recognize that a confidence interval for the difference between two parameters that doesn’t include 0 is in agreement with a hypothesis test where the null hypothesis that sets the two parameters equal to each other is rejected.

Because the study failed to reject the null at the 5% significance level, 0 is in the 95% confidence interval for the difference. Consequently, since a 99% interval is even wider, 0 will certainly be in the 99% interval too.
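The agreement between a two-sided test and the matching confidence interval can be sketched with simulated data. The brain-size values below are invented for illustration, and a two-sided Welch t-test is assumed:

```r
# Illustrative data only: two simulated samples with the same true mean.
set.seed(123)
eu <- rnorm(60, mean = 1350, sd = 100)
us <- rnorm(60, mean = 1350, sd = 100)

# Same test, two confidence levels for the interval.
test_95 <- t.test(eu, us, conf.level = 0.95)
test_99 <- t.test(eu, us, conf.level = 0.99)
ci_95 <- test_95$conf.int
ci_99 <- test_99$conf.int

# Helper (hypothetical name): does an interval contain 0?
contains_zero <- function(ci) ci[1] < 0 && 0 < ci[2]

# Failing to reject at the 5% level matches 0 lying inside the 95% interval.
fail_to_reject_5pct <- test_95$p.value > 0.05
c(fail_to_reject = fail_to_reject_5pct,
  zero_in_95 = contains_zero(ci_95),
  zero_in_99 = contains_zero(ci_99))
```

Because the 99% interval always encloses the 95% interval from the same data, 0 cannot be in the narrower interval without also being in the wider one.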

The figure below shows three unimodal and symmetric curves. Which assignment is most plausible?
1. Solid: normal. Dashed: \(t_{df=5}\). Dotted: \(t_{df=1}\).
2. Solid: normal. Dashed: \(t_{df=1}\). Dotted: \(t_{df=5}\).
3. Solid: \(t_{df=1}\). Dashed: \(t_{df=5}\). Dotted: normal.
4. Solid: \(t_{df=5}\). Dashed: \(t_{df=1}\). Dotted: normal.

This question refers to the following:

Describe how the t-distribution is different from the normal distribution, and what "heavy tail" means in this context.
Note that the t-distribution has a single parameter, degrees of freedom, and as the degrees of freedom increase this distribution approaches the normal distribution.

As the degrees of freedom increase, the t-distribution approaches the normal distribution, and t-distributions with fewer degrees of freedom have heavier tails than those with more. The curve with the thickest tails is therefore most plausibly \(t_{df=1}\), the next \(t_{df=5}\), and the curve with the thinnest tails the normal.
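A quick numeric check of the heavy-tail claim, as a sketch: the cutoff of 3 and the extra df = 30 case are arbitrary choices made here for illustration.

```r
# Probability of falling more than 3 units below the center.
cutoff <- 3
tail_probs <- c(
  t_df1 = pt(-cutoff, df = 1),
  t_df5 = pt(-cutoff, df = 5),
  t_df30 = pt(-cutoff, df = 30),
  normal = pnorm(-cutoff)
)
tail_probs

# Height of each density at its center: heavier tails mean a lower peak.
peaks <- c(
  t_df1 = dt(0, df = 1),
  t_df5 = dt(0, df = 5),
  normal = dnorm(0)
)
peaks
```

The tail probabilities shrink and the peaks rise as the degrees of freedom grow, which is the pattern to look for when matching curves in a plot.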

We are testing the following hypotheses: \(H_{0}: \mu = 3\), \(H_{A}: \mu >3\). The sample size is 18 and the test statistic is calculated as T = 0.5. What is the p-value?
1. between 0.01 and 0.025
2. less than 0.01
3. greater than 0.1
4. between 0.05 and 0.1
5. less than 0.005

pt(0.5, df = 18 - 1, lower.tail = FALSE)

## [1] 0.3117426

The p-value is about 0.31, so it is greater than 0.1 (option 3).
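The p-value bracket can also be located the way a printed t-table is read, by comparing the test statistic with critical values for a few one-tail areas. This is only a sketch; the set of tail areas is an assumption chosen to mirror the answer options.

```r
t_stat <- 0.5
df <- 18 - 1

# Critical values t* for common one-tail areas (as in a t-table row for df = 17).
one_tail_areas <- c(0.100, 0.050, 0.025, 0.010, 0.005)
critical_values <- qt(one_tail_areas, df = df, lower.tail = FALSE)
names(critical_values) <- format(one_tail_areas, nsmall = 3)
critical_values

# T is smaller than even the cutoff for a tail area of 0.10,
# so the upper-tail p-value must exceed 0.1.
t_stat < critical_values[1]
```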

What does ANOVA mean?
1. Assessment of null observed variability
2. Assessment of orthogonal variation
3. Analysis of variance
4. Aardvarks not over vanilla ants

Analysis of variance (ANOVA) is a statistical inference method that considers many groups simultaneously to determine whether the variability in the sample means is so large that it is unlikely to be due to chance alone.

Which of the following is not a condition required for comparing means across multiple groups using ANOVA?
1. The means of each group should be roughly equal.
2. The data within each group should be nearly normal.
3. The observations should be independent within and across groups.
4. The variability across the groups should be about equal.

Listing the conditions necessary for performing ANOVA:

the observations should be independent within and across groups,
the data within each group are nearly normal,
the variability across the groups is about equal.

Whether the means of each group are equal or not is what the ANOVA tests for, so equality of the means is not a required condition for ANOVA.
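A rough sketch of how these conditions might be examined before fitting an ANOVA in R. The data frame, group labels, and parameter values are simulated assumptions, not course data:

```r
# Illustrative data only: three simulated groups of 30 observations.
set.seed(2024)
scores <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 30)),
  value = c(rnorm(30, 50, 8), rnorm(30, 53, 8), rnorm(30, 57, 8))
)

# Independence comes from the study design (e.g. random sampling);
# it cannot be verified from the numbers alone.

# Roughly equal variability: compare the group standard deviations.
group_sds <- tapply(scores$value, scores$group, sd)
group_sds

# Near normality within groups is usually judged graphically, for example:
# boxplot(value ~ group, data = scores)

# Note that the group means are allowed to differ; that is what is tested.
fit <- aov(value ~ group, data = scores)
summary(fit)
```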

Which of the following looks most like an F distribution?
1. T distribution.
2. F distribution.
3. Normal distribution.
4. Another one.

Recognize that the test statistic for ANOVA, the F statistic, is calculated as the ratio of the mean square between groups (MSG, variability between groups) to the mean square error (MSE, variability within groups): \(F = MSG / MSE\).

Also recognize that the F statistic has a right skewed distribution with two different measures of degrees of freedom: one for the numerator (\(df_G = k-1\), where \(k\) is the number of groups), and one for the denominator (\(df_E = n-k\), where \(n\) is the total sample size).
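The following sketch computes the F statistic by hand from MSG and MSE and compares it with R's own ANOVA table. The data are simulated and the object names are illustrative assumptions:

```r
# Illustrative data only: k = 3 simulated groups of 20 observations.
set.seed(99)
k <- 3
n_per_group <- 20
group <- factor(rep(c("g1", "g2", "g3"), each = n_per_group))
y <- c(rnorm(n_per_group, 10, 3),
       rnorm(n_per_group, 12, 3),
       rnorm(n_per_group, 13, 3))
n <- length(y)

# Between-group and within-group sums of squares.
group_means <- tapply(y, group, mean)
group_sizes <- tapply(y, group, length)
ssg <- sum(group_sizes * (group_means - mean(y))^2)
sse <- sum((y - ave(y, group))^2)  # ave() gives each observation its group mean

# Degrees of freedom, mean squares, and the F statistic.
df_g <- k - 1
df_e <- n - k
msg <- ssg / df_g
mse <- sse / df_e
f_stat <- msg / mse

# The F distribution is right skewed, so the p-value is the upper-tail area.
p_value <- pf(f_stat, df1 = df_g, df2 = df_e, lower.tail = FALSE)

# Compare with the table produced by R.
anova_table <- anova(lm(y ~ group))
c(by_hand = f_stat, from_anova = anova_table[["F value"]][1])
```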
