In this project, students will demonstrate their understanding of the normal distribution, sampling distributions, confidence intervals, and hypothesis tests.
Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities. Here, x denotes the IQ of the randomly selected person.
a. P(x > 140)
# Use the pnorm function to find probabilities in a normal distribution.
pnorm(140, mean = 100, sd = 15, lower.tail = FALSE)
## [1] 0.003830381
The probability of a randomly selected person having an IQ above 140 is 0.0038.
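The same probability can be found by first standardizing 140 to a z-score, \(z = (x - \mu)/\sigma\), and then using the standard normal distribution. A minimal sketch (the variable names are illustrative, not from the original analysis):

```r
# Standardize x = 140 using mu = 100 and sigma = 15
z_140 <- (140 - 100) / 15
z_140  # about 2.67 standard deviations above the mean

# Upper-tail probability from the standard normal distribution (mean 0, sd 1)
p_z <- pnorm(z_140, lower.tail = FALSE)

# Same probability computed directly on the IQ scale
p_x <- pnorm(140, mean = 100, sd = 15, lower.tail = FALSE)

# Standardizing does not change the probability
isTRUE(all.equal(p_z, p_x))
```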
b. P(x < 110)
pnorm(110, mean = 100, sd = 15, lower.tail = TRUE)
## [1] 0.7475075
The probability of a randomly selected person having an IQ less than 110 is 0.7475.
c. P(80 < x < 120)
# To calculate this probability, I'll find the probability of an IQ below 120 and subtract the probability of an IQ below 80.
pnorm(120, mean = 100, sd = 15, lower.tail = TRUE) -
pnorm( 80, mean = 100, sd = 15, lower.tail = TRUE)
## [1] 0.8175776
The probability of a randomly selected person having an IQ between 80 and 120 is 0.8176. This seems reasonable: about 68% of IQs fall within one standard deviation of the mean (85 to 115) and about 95% fall within two standard deviations (70 to 130). Since 80 to 120 extends about 1.33 standard deviations on either side of 100, its probability should lie between those two values.
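The comparison with the empirical rule can be checked directly with `pnorm`. This sketch computes the exact normal probabilities for one and two standard deviations alongside the 80 to 120 range:

```r
mu <- 100
sigma <- 15

# Probability within one standard deviation (85 to 115)
within_1sd <- pnorm(mu + sigma, mu, sigma) - pnorm(mu - sigma, mu, sigma)
# Probability within two standard deviations (70 to 130)
within_2sd <- pnorm(mu + 2 * sigma, mu, sigma) - pnorm(mu - 2 * sigma, mu, sigma)
# 80 and 120 are each 20/15, about 1.33 standard deviations from the mean
p_80_120 <- pnorm(120, mu, sigma) - pnorm(80, mu, sigma)

# Approximately 0.6827, 0.8176 and 0.9545
round(c(within_1sd = within_1sd, p_80_120 = p_80_120, within_2sd = within_2sd), 4)
```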
Continue to assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities.
a. What is the probability that a randomly selected student will have an IQ greater than 110?
#This is the complement of question 1b, so we can subtract that answer from 1.
1 - pnorm(110, mean = 100, sd = 15, lower.tail = TRUE)
## [1] 0.2524925
The probability that a randomly selected student has an IQ greater than 110 is 0.2525.
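Subtracting from 1 and passing `lower.tail = FALSE` (as in part a of the first question) are two routes to the same upper-tail probability. A quick check:

```r
# Upper tail computed directly with lower.tail = FALSE
upper_direct <- pnorm(110, mean = 100, sd = 15, lower.tail = FALSE)

# Upper tail computed with the complement rule: P(x > 110) = 1 - P(x < 110)
upper_complement <- 1 - pnorm(110, mean = 100, sd = 15)

isTRUE(all.equal(upper_direct, upper_complement))  # TRUE
```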
b. \(P(\bar{x} > 110)\) for the mean IQ of a random sample of 12 students
To find this probability, use the pnorm function with the same mean, replacing the standard deviation with the standard error of the sample mean, which is the population standard deviation of 15 divided by the square root of the sample size of 12: \(SE = \sigma / \sqrt{n} = 15/\sqrt{12} \approx 4.33\).
pnorm(110, mean = 100, sd = (15/sqrt(12)), lower.tail = FALSE)
## [1] 0.01046067
I can check this by calculating the z-score of the sample mean and finding the probability from the standard normal distribution: \(z^* = \frac{\bar{x} - \mu}{SE}\)
#First, calculate z*
z_star <- (110 - 100)/(15/sqrt(12))
z_star
## [1] 2.309401
#Then, use pnorm to find the probability
pnorm(z_star, mean = 0, sd = 1, lower.tail = FALSE)
## [1] 0.01046067
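The probability that the mean IQ of 12 randomly selected students is greater than 110 is 0.0105, which is much smaller than the probability that a single randomly selected student's IQ is greater than 110 (0.2525). Sample means vary less than individual values: their standard error is \(15/\sqrt{12} \approx 4.33\) rather than 15, so a sample mean of 110 is 2.31 standard errors above 100, while an individual IQ of 110 is only 0.67 standard deviations above it.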
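A simulation can make this contrast concrete. The sketch below assumes IQs follow exactly a normal distribution with mean 100 and standard deviation 15, draws many samples of 12, and compares how often the sample mean and a single IQ exceed 110. The seed and the number of repetitions are arbitrary choices:

```r
set.seed(143)  # arbitrary seed so the simulation is reproducible

n_reps <- 100000    # number of simulated samples (illustrative choice)
sample_size <- 12   # students per sample

# Draw n_reps samples of 12 IQs each and record each sample's mean
sample_means <- replicate(n_reps, mean(rnorm(sample_size, mean = 100, sd = 15)))

# Draw n_reps single IQs for comparison
single_iqs <- rnorm(n_reps, mean = 100, sd = 15)

mean(sample_means > 110)  # should be close to 0.0105
mean(single_iqs > 110)    # should be close to 0.2525
sd(sample_means)          # should be close to 15 / sqrt(12), about 4.33
```

Because the results are random, the simulated proportions only approximate the exact `pnorm` answers; more repetitions bring them closer.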
Load and store the sample NSCC Student Dataset using the read.csv() function. Find the mean, standard deviation, and sample size of the PulseRate variable in this dataset. Do you think it is likely or unlikely that the population mean pulse rate for all NSCC students is exactly equal to that sample mean found?
#Load the data into the object called nscc_students
nscc_students <- read.csv("C:/Users/Henhoag/Desktop/Math143H/Projects/nscc_student_data.csv")
#First, let's get rid of the NAs. I'll store the subset that excludes observations with missing pulse rate data in the object called PulseRateStudents
PulseRateStudents <- subset(nscc_students, !is.na(PulseRate))
#Find the mean pulse rate.
mean(PulseRateStudents$PulseRate)
## [1] 73.47368
#Calculate the standard deviation of the pulse rates.
sd(PulseRateStudents$PulseRate)
## [1] 12.51105
#The sample size is just the number of rows in the PulseRateStudents object. I'll store it in the variable numberPulseRate.
numberPulseRate <- nrow(PulseRateStudents)
numberPulseRate
## [1] 38
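The three summaries above can be bundled into one small function that drops missing values first. This is a sketch: `summarize_variable` is a hypothetical helper name (it is not part of base R), and the example vector is made up for illustration rather than taken from the NSCC dataset.

```r
# Hypothetical helper: mean, sd, and sample size of a numeric vector, ignoring NAs
summarize_variable <- function(x) {
  x <- x[!is.na(x)]  # keep only the observed values
  c(mean = mean(x), sd = sd(x), n = length(x))
}

# Made-up pulse rates with one missing value, for illustration only
example_pulse <- c(68, 72, NA, 80, 74)
summarize_variable(example_pulse)  # mean 73.5, n = 4
```

Applied to `nscc_students$PulseRate`, it should give the same mean, standard deviation, and n = 38 as above, since it removes the same missing values.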
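The mean pulse rate of the 38 students is 73.47 bpm, with a standard deviation of 12.51 bpm. I think it is unlikely that the population mean pulse rate for all NSCC students is exactly equal to this sample mean, because the sample mean is only an estimate: a different random sample of 38 students would almost certainly produce a somewhat different mean.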
Construct a 95% confidence interval for the mean pulse rate of all NSCC students and conclude your result in a complete sentence. (Note: we can create a valid confidence interval here since n > 30)
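95% confidence intervals are calculated with the formula \(\bar{x}\pm 1.96\cdot s/\sqrt{n}\), where the sample standard deviation \(s\) is used in place of the unknown population standard deviation \(\sigma\).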
# Store mean and std dev
meanPulseRate <- mean(PulseRateStudents$PulseRate)
sdPulseRate <- sd(PulseRateStudents$PulseRate)
# Calculate lower bound of 95% CI
meanPulseRate - 1.96*(sdPulseRate/sqrt(numberPulseRate))
## [1] 69.49575
# Calculate upper bound of 95% CI
meanPulseRate + 1.96*(sdPulseRate/sqrt(numberPulseRate))
## [1] 77.45162
Based on these data, we are about 95% confident that the average pulse rate of NSCC students is larger than 69.50 bpm but less than 77.45 bpm.
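The multiplier 1.96 is the 97.5th percentile of the standard normal distribution, so it can be computed with `qnorm()` instead of being typed in. A minimal sketch of a large-sample z interval, where `z_interval` is a hypothetical helper name and the inputs are the summary statistics printed above:

```r
# Hypothetical helper: large-sample z confidence interval for a mean
z_interval <- function(xbar, s, n, level = 0.95) {
  z_crit <- qnorm(1 - (1 - level) / 2)  # 1.959964 for level = 0.95
  margin <- z_crit * s / sqrt(n)        # critical value times standard error
  c(lower = xbar - margin, upper = xbar + margin)
}

# PulseRate summaries from above: mean 73.47368, sd 12.51105, n = 38
z_interval(73.47368, 12.51105, 38, level = 0.95)  # about (69.50, 77.45)
```

With `level = 0.99` the function uses 2.5758 rather than the rounded 2.58, so its bounds can differ from a hand calculation in the second decimal place.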
Construct a 99% confidence interval for the mean pulse rate of all NSCC students and conclude your result in a complete sentence below the R chunk.
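99% confidence intervals are calculated with the formula \(\bar{x}\pm 2.58\cdot s/\sqrt{n}\), using the sample standard deviation \(s\) in place of the unknown population standard deviation \(\sigma\).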
# Calculate lower bound of 99% CI
meanPulseRate - 2.58*(sdPulseRate/sqrt(numberPulseRate))
## [1] 68.23742
# Calculate upper bound of 99% CI
meanPulseRate + 2.58*(sdPulseRate/sqrt(numberPulseRate))
## [1] 78.70995
Based on these data, we are about 99% confident that the average pulse rate of NSCC students is larger than 68.24 bpm but less than 78.71 bpm.
Describe and explain the difference you observe in your confidence intervals for questions 4 and 5.
The 95% confidence interval is (69.50, 77.45), a width of approximately 8.0 bpm. The 99% confidence interval is (68.24, 78.71), a width of approximately 10.5 bpm. Raising the confidence level from 95% to 99% increases the critical value from 1.96 to 2.58, so the margin of error grows and the interval widens. The wider interval gives us more confidence that it contains the true population mean pulse rate, at the cost of a less precise estimate.
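To see this pattern across more than two levels, the sketch below computes the full interval width for several confidence levels, using the printed PulseRate standard deviation and sample size. The 90% level is included only for comparison:

```r
s <- 12.51105  # sample standard deviation of PulseRate (bpm)
n <- 38        # sample size

conf_levels <- c(0.90, 0.95, 0.99)
z_crit <- qnorm(1 - (1 - conf_levels) / 2)  # about 1.645, 1.960, 2.576
widths <- 2 * z_crit * s / sqrt(n)          # full width of each interval, in bpm

data.frame(level = conf_levels, z_crit = round(z_crit, 3), width = round(widths, 2))
```

The width grows with the confidence level because only the critical value changes; the standard error \(s/\sqrt{n}\) stays fixed.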
In the Fall 2009 semester of the 2009-10 academic year, the average NSCC student took 12.1 credits. I’m curious whether that average differed among NSCC students last year (a sample of whom is in the NSCC student dataset). Conduct a hypothesis test using a confidence interval to determine whether the average number of credits last year differs from Fall 2009.
\(H_0\): \(\mu_{LastYear} = \mu_{Fall2009} =\) 12.1 credits
\(H_A\): \(\mu_{LastYear} \ne\) 12.1 credits
# Store mean of Credits variable
meanCredits <- mean(nscc_students$Credits)
# Store standard deviation of Credits variable
sdCredits <- sd(nscc_students$Credits)
# Store sample size of Credits variable
#Since I don't see any NA entries in the Credits column, the sample size is the same as the number of students surveyed (40).
numberCredits <- nrow(nscc_students)
# Lower bound of 95% CI
meanCredits - 1.96*(sdCredits/sqrt(numberCredits))
## [1] 10.73056
# Upper bound of 95% CI
meanCredits + 1.96*(sdCredits/sqrt(numberCredits))
## [1] 12.81944
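Since the Fall 2009 mean of 12.1 credits falls within the 95% confidence interval of (10.73, 12.82) credits, we fail to reject \(H_0\) at the 0.05 significance level. There is not sufficient evidence that the average number of credits taken by NSCC students last year differs from 12.1.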
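A 95% confidence interval corresponds to a two-sided test at the 0.05 significance level: reject \(H_0\) exactly when the hypothesized mean lies outside the interval. A minimal sketch of that rule, with `ci_decision` as a hypothetical helper name:

```r
# Hypothetical helper: two-sided test decision from a confidence interval
ci_decision <- function(lower, upper, mu0) {
  if (mu0 >= lower && mu0 <= upper) "fail to reject H0" else "reject H0"
}

# 95% interval for Credits from above, tested against mu0 = 12.1
ci_decision(10.73056, 12.81944, mu0 = 12.1)  # "fail to reject H0"
```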
NSCC is investigating whether NSCC students have a higher-than-average stress level, which can be indicated by a higher-than-average standing pulse rate. Conduct a hypothesis test using a p-value to determine whether NSCC students have a higher mean pulse rate than the national average of 72 bpm for adults.
\(H_0\): \(\mu_{NSCC}\) = \(\mu_{Adult}\)= 72 bpm
\(H_A\): \(\mu_{NSCC} \gt\) 72 bpm
# P-value: probability of a sample mean at least as large as ours if the true mean were 72 bpm
pnorm(meanPulseRate, mean = 72, sd = (sdPulseRate/sqrt(numberPulseRate)), lower.tail = FALSE)
## [1] 0.2338856
We calculated a sample mean of 73.47 bpm in Question 3. The p-value, the probability of getting a sample mean of 73.47 bpm or higher from a sample of 38 students if the true mean pulse rate were 72 bpm, is 0.2339.
Make a decision to reject \(H_0\) or fail to reject \(H_0\) at a significance level of 0.05 based on the p-value.
Since the p-value of 0.2339 is greater than the significance level of 0.05, we fail to reject \(H_0\).
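The p-value and decision can be cross-checked by computing the test statistic explicitly and comparing the p-value with \(\alpha\). This sketch plugs in the printed summary statistics rather than the raw data, so it assumes the same large-sample normal approximation used above:

```r
xbar <- 73.47368  # sample mean pulse rate (bpm)
s    <- 12.51105  # sample standard deviation (bpm)
n    <- 38        # sample size
mu0  <- 72        # hypothesized mean under H0 (bpm)

se <- s / sqrt(n)                        # standard error of the mean
z  <- (xbar - mu0) / se                  # test statistic, about 0.73
p_value <- pnorm(z, lower.tail = FALSE)  # right-tailed because HA is mu > 72
p_value                                  # about 0.2339

alpha <- 0.05
if (p_value < alpha) "reject H0" else "fail to reject H0"
```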
Write a concluding statement.
When standing pulse rates are used as a proxy for stress, there is not sufficient evidence to support the claim that NSCC students have higher than average stress levels.