In this project, students will demonstrate their understanding of the normal distribution, sampling distributions, confidence intervals, and hypothesis tests.
Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities. Here, x denotes the IQ of the randomly selected person.
# Probability when x is greater than 110
pnorm(110, mean = 100, sd = 15, lower.tail = FALSE)
## [1] 0.2524925
The probability that a randomly selected person has an IQ greater than 110 is 25.25%.
# Probability when x is less than 110
pnorm(110, mean = 100, sd = 15)
## [1] 0.7475075
The probability that a randomly selected person has an IQ less than 110 is 74.75%.
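The two results above are complements of each other, which gives a quick sanity check. A minimal sketch, where the names `p_greater` and `p_less` are illustrative and not part of the original assignment:

```r
# Illustrative variable names; parameters come from the problem statement
p_greater <- pnorm(110, mean = 100, sd = 15, lower.tail = FALSE)  # P(X > 110)
p_less    <- pnorm(110, mean = 100, sd = 15)                      # P(X < 110)

# For a continuous distribution the two tail areas sum to 1
p_greater + p_less
```

If the sum is not 1, one of the two calls has the wrong tail or the wrong parameters.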
# Probability of a random student with an IQ less than 80
pnorm(80, mean = 100, sd = 15)
## [1] 0.09121122
# Probability of a random student with an IQ less than 120
pnorm(120, mean = 100, sd = 15)
## [1] 0.9087888
# To find the probability of an IQ between 80 and 120 we should find the difference between IQ < 120 and IQ < 80
0.9087888 - 0.09121122
## [1] 0.8175776
Since the probability of having an IQ less than 80 is 9.12% and the probability of having an IQ less than 120 is 90.88%, the probability that a randomly selected student has an IQ between 80 and 120 is 81.76%.
# Probability that the mean IQ of 12 random students is greater than 110
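The same interval probability can be computed in one expression, which avoids retyping rounded intermediate values. This is only a sketch; the names `mu`, `sigma`, and `p_between` are illustrative.

```r
# Illustrative names; the parameters come from the problem statement
mu <- 100
sigma <- 15

# P(80 < X < 120) = P(X < 120) - P(X < 80)
p_between <- pnorm(120, mean = mu, sd = sigma) - pnorm(80, mean = mu, sd = sigma)
p_between
```

Because 80 and 120 are equally far from the mean, the two tails are equal and the result also equals `1 - 2 * pnorm(80, mean = mu, sd = sigma)`.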
pnorm(110, mean = 100, sd = 15/sqrt(12), lower.tail = FALSE)
## [1] 0.01046067
The probability that the mean IQ of the 12 students is greater than 110 is 1.05%.
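The calculation above relies on the sampling distribution of the mean, whose standard deviation (the standard error) is the population standard deviation divided by the square root of the sample size. A sketch of the same probability via a z-score, with illustrative variable names:

```r
# Illustrative names; values come from the problem statement
n  <- 12
se <- 15 / sqrt(n)        # standard error of the mean for samples of 12

# z-score of a sample mean of 110 under a population mean of 100
z <- (110 - 100) / se

# Upper-tail probability on the standard normal scale
p_mean <- pnorm(z, lower.tail = FALSE)
p_mean
```

This matches the direct `pnorm()` call with `sd = 15/sqrt(12)`; averaging 12 scores shrinks the spread, which is why a mean above 110 is far less likely than a single IQ above 110.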
Load and store the sample NSCC Student Dataset using the read.csv() function. Find the mean, standard deviation, and sample size of the PulseRate variable in this dataset and answer the question that follows below.
# Store the NSCC student dataset in environment
nscc_students <- read.csv("~/Desktop/nscc_student_data.csv")
# Find the mean pulse rate of this sample
mean(nscc_students$PulseRate, na.rm = TRUE)
## [1] 73.47368
# Find the std dev of pulse rates of this sample
sd(nscc_students$PulseRate, na.rm = TRUE)
## [1] 12.51105
# Find the sample size of pulse rates (hint: it's the number of non-NA values)
table(is.na(nscc_students$PulseRate))
##
## FALSE TRUE
## 38 2
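The table above shows 38 non-missing values and 2 missing ones. An alternative that returns the count as a single number is `sum(!is.na(...))`. The sketch below uses a small hypothetical vector, not the NSCC data, so that it runs on its own:

```r
# Hypothetical pulse rates with two missing values (not the NSCC data)
pulse <- c(72, NA, 80, 65, NA, 90)

# !is.na() is TRUE for observed values; sum() counts the TRUEs
n_obs <- sum(!is.na(pulse))
n_obs
```

Storing the count in a variable lets later calculations reuse it instead of a hard-coded sample size.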
Do you think it is likely or unlikely that the population mean pulse rate for all NSCC students is exactly equal to that sample mean found?
It is unlikely that the population mean pulse rate for all NSCC students is exactly equal to the sample mean of 73.47 bpm, because sample means vary from one sample to the next.
If we assume the population standard deviation of pulse rates for all NSCC students is \(\sigma = 12\), construct a 95% confidence interval for the mean pulse rate of all NSCC students and conclude in a complete sentence below.
(Note: we can create a valid confidence interval here since n > 30)
# Store mean
mean_PulseRate <- mean(nscc_students$PulseRate, na.rm = TRUE)
# Calculate lower bound of 95% CI
mean_PulseRate - 1.96*(12/sqrt(38))
## [1] 69.65824
# Calculate upper bound of 95% CI
mean_PulseRate + 1.96*(12/sqrt(38))
## [1] 77.28913
I am 95% confident that the mean pulse rate for all NSCC students is between 69.66 and 77.29 bpm.
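The interval above can be wrapped in a small helper so the critical value comes from `qnorm()` rather than a table. This is a sketch under the same known-sigma assumption; the function `z_interval` and its argument names are hypothetical, not part of base R or the assignment.

```r
# Hypothetical helper: z-based confidence interval for a mean with known sigma
z_interval <- function(xbar, sigma, n, conf = 0.95) {
  z_star <- qnorm(1 - (1 - conf) / 2)   # two-sided critical value
  margin <- z_star * sigma / sqrt(n)    # margin of error
  c(lower = xbar - margin, upper = xbar + margin)
}

# Summary statistics reported above: mean 73.47368, sigma 12, n 38
ci_95 <- z_interval(73.47368, sigma = 12, n = 38)
ci_95
```

Because `qnorm(0.975)` is slightly less than the rounded 1.96, the bounds may differ from the hand calculation in the fourth decimal place.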
Construct a 99% confidence interval for the mean pulse rate of all NSCC students and conclude your result in a complete sentence below the R chunk.
# Calculate lower bound of 99% CI
mean_PulseRate - 2.58*(12/sqrt(38))
## [1] 68.45131
# Calculate upper bound of 99% CI
mean_PulseRate + 2.58*(12/sqrt(38))
## [1] 78.49606
I am 99% confident that the mean pulse rate for all NSCC students is between 68.45 and 78.50 bpm.
Describe and explain the difference you observe in your confidence intervals for questions 4 and 5.
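The 99% confidence interval (68.45 to 78.50) is wider than the 95% confidence interval (69.66 to 77.29). A higher confidence level uses a larger critical value (2.58 instead of 1.96), which increases the margin of error, so the interval must cover a wider range of values to be more certain of capturing the population mean.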
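To see how the width grows with the confidence level, the critical values and margins of error can be tabulated side by side. A minimal sketch using the same \(\sigma = 12\) and n = 38; the 90% level and the variable names are added here for illustration only.

```r
# Confidence levels to compare (the 90% level is added for illustration)
conf_levels <- c(0.90, 0.95, 0.99)

# Two-sided critical values from the standard normal distribution
z_star <- qnorm(1 - (1 - conf_levels) / 2)

# Margin of error for each level, with sigma = 12 and n = 38
margin <- z_star * 12 / sqrt(38)

data.frame(conf_level = conf_levels,
           z_star     = round(z_star, 3),
           margin     = round(margin, 2))
```

The margin of error rises with every step up in confidence level, which is the widening seen between the 95% and 99% intervals.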
In the Fall 2009 semester of the 2009-10 academic year, the average NSCC student took 12.1 credits with \(\sigma = 3.1\). I’m curious whether that average was different for NSCC students last year (a sample of which is in the NSCC student dataset). Conduct a hypothesis test by a confidence interval to determine if the average credits last year differed from Fall 2009.
\(H_0\): \(\mu\) = 12.1
\(H_A\): \(\mu\) \(\neq\) 12.1
# Calculate mean of Credits variable
mean(nscc_students$Credits)
## [1] 11.775
# Calculate sample size of Credits variable
table(is.na(nscc_students$Credits))
##
## FALSE
## 40
# Store mean
mean_credits <- mean(nscc_students$Credits)
# Lower bound of 95% CI
mean_credits - 1.96*(3.1/sqrt(40))
## [1] 10.8143
# Upper bound of 95% CI
mean_credits + 1.96*(3.1/sqrt(40))
## [1] 12.7357
I am 95% confident that the mean number of credits for all NSCC students is between 10.81 and 12.74.
Since the Fall 2009 average of 12.1 credits is within our confidence interval, we fail to reject \(H_0\).
There is not sufficient evidence to say that the average credits for NSCC students last year is different from the average credits in Fall 2009.
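As a cross-check on the confidence-interval decision, the same hypotheses can be tested with a two-sided p-value, which should lead to the same conclusion at the 5% level. This is a sketch using the summary statistics above; the variable names are illustrative.

```r
# Summary statistics from the analysis above
xbar  <- 11.775   # sample mean of Credits
mu_0  <- 12.1     # Fall 2009 average under H_0
sigma <- 3.1      # assumed population standard deviation
n     <- 40       # sample size

# Test statistic: how many standard errors the sample mean is from mu_0
z <- (xbar - mu_0) / (sigma / sqrt(n))

# Two-sided p-value: area in both tails beyond |z|
p_two_sided <- 2 * pnorm(-abs(z))
p_two_sided
```

A two-sided p-value above 0.05 agrees with 12.1 falling inside the 95% confidence interval.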
NSCC is investigating whether NSCC students have a higher than average stress level which can be identified by a higher than average standing pulse rate. Conduct a hypothesis test by a p-value to determine if NSCC students have a higher pulse rate than the national average of 72 bpm for adults. Recall the assumption that \(\sigma = 12\) for NSCC student pulse rates.
\(H_0\): \(\mu\) = 72
\(H_A\): \(\mu\) > 72
# Calculate mean of PulseRate variable
mean(nscc_students$PulseRate, na.rm = TRUE)
## [1] 73.47368
# P-value: probability of a sample mean at least this large (rounded to 73.47) if the population mean were 72 bpm
pnorm(73.47, mean = 72, sd = 12/sqrt(38), lower.tail = FALSE)
## [1] 0.2250823
The p-value is about 0.225.
Since the p-value is greater than 0.05, we fail to reject \(H_0\).
There is not sufficient evidence to support the claim that NSCC students have a higher pulse rate than the national average of 72 bpm for adults.
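The p-value above was computed from the sample mean rounded to 73.47. A sketch of the same one-sided test using the unrounded mean reported earlier (73.47368) and an explicit test statistic; the variable names and the 0.05 significance level stored in `alpha` are illustrative choices.

```r
# Summary statistics from the analysis above
xbar  <- 73.47368  # sample mean of PulseRate (unrounded)
mu_0  <- 72        # national average under H_0
sigma <- 12        # assumed population standard deviation
n     <- 38        # number of non-missing pulse rates
alpha <- 0.05      # significance level

# Test statistic and upper-tail p-value for H_A: mu > 72
z <- (xbar - mu_0) / (sigma / sqrt(n))
p_value <- pnorm(z, lower.tail = FALSE)
p_value

# TRUE would mean rejecting H_0 at the chosen significance level
p_value < alpha
```

The unrounded mean changes the p-value only in the third decimal place, so the conclusion is unchanged.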