In this project, students will demonstrate their understanding of the normal distribution, sampling distributions, confidence intervals, and hypothesis tests.
Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities. Here, x denotes the IQ of the randomly selected person.
# Find the probability that a randomly selected person will have an IQ higher than 110.
pnorm(q = 110, mean = 100, sd = 15, lower.tail = FALSE)
## [1] 0.2524925
The probability that a randomly selected person will have an IQ greater than 110 is 0.2525.
# Find the probability that a randomly selected person will have an IQ less than 110.
pnorm(q = 110, mean = 100, sd = 15)
## [1] 0.7475075
The probability that a randomly selected person will have an IQ less than 110 is 0.7475.
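As a quick sanity check, the two tail probabilities at the same cutoff should be complements of each other. A minimal sketch follows; the variable names `p_above` and `p_below` are introduced here for illustration only.

```r
# Upper and lower tail probabilities at the same cutoff (IQ = 110)
p_above <- pnorm(q = 110, mean = 100, sd = 15, lower.tail = FALSE)
p_below <- pnorm(q = 110, mean = 100, sd = 15)

# For a continuous distribution the two tails add up to 1
p_above + p_below

# Equivalent check: the upper tail equals 1 minus the lower tail
all.equal(p_above, 1 - p_below)
```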
# Find the probability that a randomly selected person will have an IQ between 80 and 120. To find that probability, we first find the probability of an IQ less than 120, then the probability of an IQ less than 80; the last step is to take the difference between p(IQ<120) and p(IQ<80).
pnorm(120, 100, 15) - pnorm(80, 100, 15)
## [1] 0.8175776
The probability that a randomly selected person will have an IQ between 80 and 120 is 0.8176.
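The same probability can also be obtained by converting both bounds to z-scores and using the standard normal distribution. This is a sketch of that alternative route, not the method used above; `z_low`, `z_high`, and `p_between` are names chosen for this example.

```r
# Convert the IQ bounds to z-scores: z = (x - mean) / sd
z_low  <- (80 - 100) / 15
z_high <- (120 - 100) / 15

# With no mean/sd arguments, pnorm() uses the standard normal (mean 0, sd 1)
p_between <- pnorm(z_high) - pnorm(z_low)
p_between
```

Both approaches should agree, because standardizing only rescales the axis and does not change the area under the curve.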
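# Probability that the mean IQ of 12 randomly selected people is greater than 110.
# The sampling distribution of the mean has standard error 15/sqrt(12).
pnorm(q = 110, mean = 100, sd = 15/sqrt(12), lower.tail = FALSE)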
## [1] 0.01046067
The probability that 12 randomly selected people have a mean IQ greater than 110 is 0.0105.
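The step from a single person to a sample of 12 relies on the standard error of the mean, which is the population standard deviation divided by the square root of the sample size. The sketch below spells out that arithmetic; the names `se_mean`, `z_mean`, and `p_mean_above` are illustrative.

```r
# Standard error of the mean for samples of size 12
n_people <- 12
se_mean  <- 15 / sqrt(n_people)

# z-score of a sample mean of 110 under that sampling distribution
z_mean <- (110 - 100) / se_mean

# Upper-tail probability on the standard normal scale
p_mean_above <- pnorm(z_mean, lower.tail = FALSE)
p_mean_above
```

Because the standard error is smaller than 15, a sample mean above 110 is far less likely than a single IQ above 110.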
Load and store the sample NSCC Student Dataset using the read.csv() function. Find the mean, standard deviation, and sample size of the PulseRate variable in this dataset and answer the question that follows below.
# Store the NSCC student dataset in environment
nscc_students <- read.csv("nscc_student_data.csv")
# Find the mean pulse rate of this sample
mean(nscc_students$PulseRate, na.rm = TRUE)
## [1] 73.47368
# Find the std dev of pulse rates of this sample
sd(nscc_students$PulseRate, na.rm = TRUE)
## [1] 12.51105
# Find the sample size of pulse rates (hint: it's the number of non-NA values)
table(is.na(nscc_students$PulseRate))
##
## FALSE TRUE
## 38 2
The mean of the pulse rate variable is 73.47, the standard deviation of that variable is 12.51, and the sample size of the variable pulse rate is 38.
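An alternative way to count non-missing values is to sum a logical vector directly. The sketch below uses a small made-up vector, `pulse_example`, rather than the NSCC data, so that it runs without the CSV file.

```r
# Hypothetical pulse rates with two missing values (not the NSCC data)
pulse_example <- c(72, NA, 80, 65, NA, 90)

# !is.na() is TRUE for each observed value; sum() counts the TRUEs
n_observed <- sum(!is.na(pulse_example))
n_observed

# The same count by dropping the NAs first
length(na.omit(pulse_example))
```

Applied to the real data, the equivalent call would be `sum(!is.na(nscc_students$PulseRate))`.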
Do you think it is likely or unlikely that the population mean pulse rate for all NSCC students is exactly equal to that sample mean found?
It is highly unlikely that the sample mean is exactly the same as the population mean, but it is possible.
If we assume the population standard deviation of pulse rates for all NSCC students is \(\sigma = 12\), construct a 95% confidence interval for the mean pulse rate of all NSCC students and conclude in a complete sentence below.
(Note: we can create a valid confidence interval here since n > 30)
# Store mean
mean_pulse_rate <- mean(nscc_students$PulseRate, na.rm = TRUE)
# Calculate lower bound of 95% CI
mean_pulse_rate - 1.96*12/sqrt(38)
## [1] 69.65824
# Calculate upper bound of 95% CI
mean_pulse_rate + 1.96*12/sqrt(38)
## [1] 77.28913
We can be 95% confident that the population mean of the pulse rate variable is between 69.66 and 77.29.
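The interval above uses the rounded critical value 1.96. As a sketch, the same calculation can be wrapped in a small helper that looks up the critical value with `qnorm()`. The function name `z_interval` is hypothetical, and the sample mean is typed in from the printed output rather than read from the dataset.

```r
# Hypothetical helper: z-based confidence interval for a mean, sigma known
z_interval <- function(xbar, sigma, n, conf_level = 0.95) {
  alpha  <- 1 - conf_level
  z_star <- qnorm(1 - alpha / 2)       # two-sided critical value
  margin <- z_star * sigma / sqrt(n)   # margin of error
  c(lower = xbar - margin, upper = xbar + margin)
}

# Values taken from the output above: mean 73.47368, sigma 12, n 38
ci_95 <- z_interval(xbar = 73.47368, sigma = 12, n = 38)
round(ci_95, 2)
```

Rounded to two decimals, this should match the interval computed with 1.96, since `qnorm(0.975)` is approximately 1.96.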
Construct a 99% confidence interval for the mean pulse rate of all NSCC students and conclude your result in a complete sentence below the R chunk.
# Calculate lower bound of 99% CI
mean_pulse_rate - 2.58*12/sqrt(38)
## [1] 68.45131
# Calculate upper bound of 99% CI
mean_pulse_rate + 2.58*12/sqrt(38)
## [1] 78.49606
We can be 99% confident that the population mean of the pulse rate variable is between 68.45 and 78.50.
Describe and explain the difference you observe in your confidence intervals for questions 4 and 5.
The 99% confidence interval is wider than the 95% confidence interval. To be more confident that the interval captures the population mean, the interval must cover a wider range of values.
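The widening comes entirely from the critical value, since the standard error is the same for both intervals. A short sketch of that comparison, with illustrative variable names:

```r
# Standard error is shared by both intervals (sigma = 12, n = 38)
se_pulse <- 12 / sqrt(38)

# Two-sided critical values for 95% and 99% confidence
conf_levels <- c(0.95, 0.99)
z_star      <- qnorm(1 - (1 - conf_levels) / 2)

# Margin of error grows with the critical value
margins <- z_star * se_pulse
data.frame(conf_level = conf_levels,
           z_star     = round(z_star, 3),
           margin     = round(margins, 2))
```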
In the Fall 2009 semester of the 2009-10 academic year, the average NSCC student took 12.1 credits with \(\sigma = 3.1\). I’m curious whether that average was different for NSCC students last year (a sample of whom is in the NSCC student dataset). Conduct a hypothesis test by a confidence interval to determine whether the average number of credits last year differs from that of Fall 2009.
The null hypothesis (\(H_0\)) in this case is that the average NSCC student took 12.1 credits last year, the same as in Fall 2009. The alternative hypothesis (\(H_A\)) is that the average NSCC student took more or fewer than 12.1 credits last year.
\(H_0\): \(\mu_{LY}\) = 12.1
\(H_A\): \(\mu_{LY}\) \(\neq\) 12.1
# Calculate mean of Credits variable
mean(nscc_students$Credits)
## [1] 11.775
# Calculate sample size of Credits variable
table(is.na(nscc_students$Credits))
##
## FALSE
## 40
The mean of the variable Credits in the nscc_students dataset is 11.775 and the sample size is 40.
# Lower bound of 95% CI
11.775 - 1.96*3.1/sqrt(40)
## [1] 10.8143
# Upper bound of 95% CI
11.775 + 1.96*3.1/sqrt(40)
## [1] 12.7357
Based on the 95% confidence interval, we cannot reject the null hypothesis, because the hypothesized mean of 12.1 lies within the interval from 10.81 to 12.74.
Because 12.1 is a plausible value for the population mean, it is possible that the average number of credits NSCC students took last year equals the Fall 2009 average; the data do not provide sufficient evidence that the average has changed.
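A two-sided p-value gives the same decision as the confidence interval at the 5% level. The sketch below is a cross-check rather than the method requested in the question; the variable names are illustrative and the inputs are the values reported above.

```r
# Values from the output above
xbar_credits  <- 11.775
mu_0          <- 12.1
sigma_credits <- 3.1
n_credits     <- 40

# Test statistic for a z-test with known sigma
z_credits <- (xbar_credits - mu_0) / (sigma_credits / sqrt(n_credits))

# Two-sided p-value: probability of a result at least this extreme in either tail
p_two_sided <- 2 * pnorm(-abs(z_credits))
p_two_sided

# TRUE means we fail to reject H0 at the 0.05 level
p_two_sided > 0.05
```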
NSCC is investigating whether NSCC students have a higher than average stress level which can be identified by a higher than average standing pulse rate. Conduct a hypothesis test by a p-value to determine if NSCC students have a higher pulse rate than the national average of 72 bpm for adults. Recall the assumption that \(\sigma = 12\) for NSCC student pulse rates.
In this case, our null hypothesis (\(H_0\)) is that the average pulse rate of NSCC students is the same as the national average of 72 bpm for adults. The alternative hypothesis (\(H_A\)) is that the average pulse rate of NSCC students is higher than the national average of 72 bpm for adults.
\(H_0\): \(\mu_{students}\) = 72
\(H_A\): \(\mu_{students}\) > 72
# Calculate mean of PulseRate variable
mean(nscc_students$PulseRate, na.rm = TRUE)
## [1] 73.47368
# Probability of getting a sample mean at least this large by random chance if the true mean were 72 bpm
pnorm(mean_pulse_rate, 72, 12/sqrt(38), lower.tail = FALSE)
## [1] 0.224515
Since the p-value of 0.2245 is greater than the significance level of 0.05, we fail to reject \(H_0\).
There is not sufficient evidence to support the claim that NSCC students have a mean pulse rate higher than the national average of 72 bpm for adults.
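The same conclusion can be reached by comparing the test statistic with a critical value instead of computing a p-value. This is a sketch of that alternative, assuming the 0.05 significance level used above; the variable names are illustrative and the sample mean is typed in from the printed output.

```r
# Values from the output above
xbar_pulse  <- 73.47368
mu_national <- 72
sigma_pulse <- 12
n_pulse     <- 38

# z statistic for the one-sided test H_A: mu > 72
z_pulse <- (xbar_pulse - mu_national) / (sigma_pulse / sqrt(n_pulse))

# One-sided critical value at the 0.05 significance level
z_crit <- qnorm(0.95)

# FALSE means the statistic does not reach the rejection region
z_pulse > z_crit

# The matching upper-tail p-value
pnorm(z_pulse, lower.tail = FALSE)
```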