Purpose

In this project, students will demonstrate their understanding of the normal distribution, sampling distributions, confidence intervals, and hypothesis tests.


Question 1

Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities. Here, x denotes the IQ of the randomly selected person.

  1. P(x > 110) = 25.2%
# Calculate the probability of the upper tail with pnorm
pnorm(mean = 100, sd = 15, q = 110, lower.tail = FALSE)
## [1] 0.2524925
  2. P(x < 110) = 74.8%
# Calculate the probability of the lower tail with pnorm
pnorm(mean = 100, sd = 15, q = 110)
## [1] 0.7475075
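
As a cross-check, the same two probabilities can be recovered by standardizing 110 to a z-score and using the standard normal distribution. This is only a sketch of that equivalence; the variable names `z`, `upper`, and `lower` are introduced here for illustration and are not part of the assignment.

```r
# Standardize 110 against the stated IQ distribution (mean 100, sd 15)
z <- (110 - 100) / 15

# Upper- and lower-tail probabilities on the standard normal scale
upper <- pnorm(z, lower.tail = FALSE)
lower <- pnorm(z)

# The two tails are complements, so they should sum to 1
upper + lower
```

Because P(x > 110) and P(x < 110) are complements, either answer could also be obtained by subtracting the other from 1.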

Question 2

  1. What is the probability that a randomly selected student will have an IQ between 80 and 120? (We haven’t explicitly learned this. Think about it, though.)
# Subtract the probability of an IQ less than 80 from the probability of an IQ less than 120
pnorm(mean = 100, sd = 15, q = 120) - pnorm(mean = 100, sd = 15, q = 80)
## [1] 0.8175776

The probability of an IQ between 80 and 120 is 81.8%.

  2. Suppose that a random sample of 12 students is selected. What is the probability that their mean IQ is greater than 110?
# Calculate the probability that the mean IQ of twelve students is greater than 110
# For the probability of a sample mean, the standard deviation is divided by the square root of the sample size
pnorm(mean = 100, sd = 15/sqrt(12), q = 110, lower.tail = FALSE)
## [1] 0.01046067

The probability that 12 randomly selected students have a mean IQ greater than 110 is 1.05%.
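
The standard-error calculation can be sanity-checked by simulation. The sketch below draws many samples of 12 IQ scores and looks at the proportion of sample means above 110. The seed and the number of replications (`n_reps`) are arbitrary choices made for this illustration, not values from the assignment.

```r
# Simulation check of the sampling distribution of the mean (illustrative only)
set.seed(1)        # arbitrary seed, used only for reproducibility
n_reps <- 100000   # assumed number of simulated samples

# Each column is one simulated sample of 12 IQ scores
draws <- matrix(rnorm(12 * n_reps, mean = 100, sd = 15), nrow = 12)
sample_means <- colMeans(draws)

# Proportion of simulated sample means above 110; should be close to the pnorm result
prop_above_110 <- mean(sample_means > 110)
prop_above_110

# Empirical standard error compared with the theoretical 15 / sqrt(12)
c(simulated = sd(sample_means), theoretical = 15 / sqrt(12))
```

The simulated proportion will vary slightly with the seed, but it should stay near the 1.05% found analytically.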

Question 3

Load and store the sample NSCC Student Dataset using the read.csv() function. Find the mean, standard deviation, and sample size of the PulseRate variable in this dataset and answer the question that follows below.

# Store the NSCC student dataset in environment
nscc_students <- read.csv("nscc_student_data.csv")
# Find the mean pulse rate of this sample
mean(nscc_students$PulseRate, na.rm = TRUE)
## [1] 73.47368
# Find the std dev of pulse rates of this sample
sd(nscc_students$PulseRate, na.rm = TRUE)
## [1] 12.51105
# Find the sample size of pulse rates (hint: it's the number of non-NA values)
table(is.na(nscc_students$PulseRate))[["FALSE"]]
## [1] 38

Do you think it is likely or unlikely that the population mean pulse rate for all NSCC students is exactly equal to the sample mean found above? It is quite unlikely; however, the population mean is likely to be close to the sample mean.

Question 4

If we assume the population standard deviation of pulse rates for all NSCC students is \(\sigma = 12\), construct a 95% confidence interval for the mean pulse rate of all NSCC students and state your conclusion in a complete sentence below.
(Note: we can create a valid confidence interval here since n > 30)

# Store mean
student_pulse_mean <- mean(nscc_students$PulseRate, na.rm = TRUE)
# Store sample size
student_pulse_ss <- table(is.na(nscc_students$PulseRate))[["FALSE"]]

# Store confidence interval margin
student_pulse_conf95_margin <- qnorm(0.975)*12/sqrt(student_pulse_ss)

# Calculate lower bound of 95% CI
student_pulse_mean - student_pulse_conf95_margin
## [1] 69.65831
# Calculate upper bound of 95% CI
student_pulse_mean + student_pulse_conf95_margin
## [1] 77.28906

I am 95% confident that the population mean pulse rate of NSCC students is between 69.66 and 77.29 bpm.
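
The interval calculation can be wrapped in a small reusable function. This is a minimal sketch: `z_interval` is a hypothetical helper name, and the inputs are the rounded summary values printed in Question 3, so the bounds may differ from the output above in the last decimal place.

```r
# Hypothetical helper: z-based confidence interval for a mean when sigma is known
z_interval <- function(x_bar, sigma, n, conf_level = 0.95) {
  alpha <- 1 - conf_level
  # Critical value times the standard error of the mean
  margin <- qnorm(1 - alpha / 2) * sigma / sqrt(n)
  c(lower = x_bar - margin, upper = x_bar + margin)
}

# Rounded sample mean and sample size copied from the Question 3 output
ci95 <- z_interval(x_bar = 73.47368, sigma = 12, n = 38)
ci95
```

Passing a different `conf_level` to the same function gives an interval at another confidence level.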

Question 5

Construct a 99% confidence interval for the mean pulse rate of all NSCC students, again assuming \(\sigma = 12\), and state your conclusion in a complete sentence below the R chunk.

# Store confidence interval margin
student_pulse_conf99_margin <- qnorm(0.995)*12/sqrt(student_pulse_ss)
# Calculate lower bound of 99% CI
student_pulse_mean - student_pulse_conf99_margin
## [1] 68.45943
# Calculate upper bound of 99% CI
student_pulse_mean + student_pulse_conf99_margin
## [1] 78.48794

I am 99% confident that the population mean pulse rate of NSCC students is between 68.46 and 78.49 bpm.

Question 6

Describe and explain the difference you observe in your confidence intervals for questions 4 and 5. The 99% confidence interval is wider than the 95% interval. To be more confident that the interval captures the true population mean, the interval must cover a wider range of values. We are sacrificing precision for confidence.
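
The trade-off can be made concrete by computing the margin of error at several confidence levels. The sketch below reuses \(\sigma = 12\) and the sample size of 38 found earlier; the 90% level is added only for comparison and is not part of the assignment.

```r
# Confidence levels to compare (0.90 is included only for illustration)
conf_levels <- c(0.90, 0.95, 0.99)

# Margin of error = critical value * sigma / sqrt(n), with sigma = 12 and n = 38
margins <- qnorm(1 - (1 - conf_levels) / 2) * 12 / sqrt(38)

# Tabulate the margin and the full interval width for each level
data.frame(conf_level = conf_levels, margin = margins, width = 2 * margins)
```

The sample mean and standard error are the same in every row; only the critical value from `qnorm` changes, which is why the width grows with the confidence level.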

Question 7

In the Fall 2009 semester of the 2009-10 academic year, the average NSCC student took 12.1 credits with \(\sigma = 3.1\). I’m curious if that average differs among NSCC students last year (a sample of which is in the NSCC student dataset). Conduct a hypothesis test by a confidence interval to determine whether the average number of credits taken last year differs from the Fall 2009 average.

  1. Write the hypotheses, where \(\mu_{old} = 12.1\) is the Fall 2009 mean \[H_0: \mu = \mu_{old}\] \[H_A: \mu \neq \mu_{old}\]

  2. Create confidence interval

# Calculate mean of Credits variable
student_credits_mean <- mean(nscc_students$Credits, na.rm = TRUE)

# Calculate sample size of Credits variable
student_credits_ss <- table(is.na(nscc_students$Credits))[["FALSE"]]
# Calculate the margin of error for a 99% CI (qnorm(0.995) is the 99% critical value)
student_credits_conf99_margin <- qnorm(0.995)*3.1/sqrt(student_credits_ss)

# Lower bound of 99% CI
student_credits_mean - student_credits_conf99_margin
## [1] 10.51245
# Upper bound of 99% CI
student_credits_mean + student_credits_conf99_margin
## [1] 13.03755
  3. Make decision to reject H0 or fail to reject H0 based on confidence interval
    Because \(\mu_{old} = 12.1\) lies within our confidence interval, we fail to reject \(H_0\).

  4. Write a concluding statement
    There is not sufficient evidence that last year’s average credits taken differs from the Fall 2009 average of 12.1.
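
The decision rule for a test by confidence interval reduces to checking whether the hypothesized mean falls between the bounds. A minimal sketch, using the bounds printed above; `ci_bounds`, `mu_old`, and `inside` are names introduced here for illustration.

```r
# Interval bounds copied from the output above
ci_bounds <- c(lower = 10.51245, upper = 13.03755)

# Hypothesized population mean from Fall 2009
mu_old <- 12.1

# TRUE means the hypothesized mean is inside the interval, so we fail to reject H0
inside <- mu_old >= ci_bounds[["lower"]] && mu_old <= ci_bounds[["upper"]]
inside
```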

Question 8

NSCC is investigating whether NSCC students have a higher than average stress level which can be identified by a higher than average standing pulse rate. Conduct a hypothesis test by a p-value to determine if NSCC students have a higher pulse rate than the national average of 72 bpm for adults. Recall the assumption that \(\sigma = 12\) for NSCC student pulse rates.

  1. Write the hypotheses \[H_0: \mu = 72\] \[H_A: \mu > 72\]

  2. Calculate p-value of getting sample data by chance

# Calculate mean of PulseRate variable
student_pulse_mean <- mean(nscc_students$PulseRate, na.rm = TRUE)
# Calculate sample size of PulseRate variable
student_pulse_ss <- table(is.na(nscc_students$PulseRate))[["FALSE"]]
# Upper-tail probability of a sample mean at least this large if the population mean were 72 bpm
pnorm(mean = 72, sd = 12/sqrt(student_pulse_ss), q = student_pulse_mean, lower.tail = FALSE)
## [1] 0.224515
  3. Make decision to reject H0 or fail to reject H0 at a significance level of 0.05 based on p-value.
    We fail to reject \(H_0\) as the p-value of 0.2245 is greater than 0.05.

  4. Write a concluding statement
    There is not sufficient evidence that NSCC students have a higher mean pulse rate than the national average of 72 bpm.
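
Since the question asks whether the NSCC pulse rate is higher than 72 bpm, the test can also be written out through an explicit z statistic with an upper-tail p-value. The sketch below uses the rounded summary values printed in Question 3 rather than the dataset itself, so its results are approximate; the variable names are introduced here for illustration.

```r
# Rounded summary values copied from the Question 3 output
x_bar <- 73.47368   # sample mean pulse rate
n     <- 38         # number of non-missing pulse rates
sigma <- 12         # assumed population standard deviation
mu_0  <- 72         # national average under H0

# z statistic: distance of the sample mean from mu_0 in standard errors
z <- (x_bar - mu_0) / (sigma / sqrt(n))

# Upper-tail p-value, matching the "higher than" alternative
p_upper <- pnorm(z, lower.tail = FALSE)

c(z = z, p_value = p_upper)

# Decision at the 0.05 significance level (TRUE would mean reject H0)
p_upper < 0.05
```

Standardizing first makes it easy to see that the sample mean sits less than one standard error above 72, which is well short of the evidence needed at the 0.05 level.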