In this project, students will demonstrate their understanding of the normal distribution, sampling distributions, confidence intervals, and hypothesis tests.
Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities. Here, x denotes the IQ of the randomly selected person.
a. P(x > 140)
# Use the pnorm function to find probabilities in a normal distribution.
pnorm(140, mean = 100, sd = 15, lower.tail = FALSE)
## [1] 0.003830381
b. P(x < 110)
pnorm(110, mean = 100, sd = 15, lower.tail = TRUE)
## [1] 0.7475075
c. P(80 < x < 120)
pnorm(120, mean = 100, sd = 15, lower.tail = TRUE) -
  pnorm(80, mean = 100, sd = 15, lower.tail = TRUE)
## [1] 0.8175776
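Each pnorm() call above can be cross-checked by standardizing first. The sketch below only re-derives the same three probabilities: it converts each cutoff to a z-score, z = (x - μ)/σ, and evaluates the standard normal distribution. The variable names are illustrative, not part of the assignment.

```r
# Population parameters for IQ scores
mu <- 100
sigma <- 15

# a. P(x > 140): z is about 2.67, take the upper tail
z_140 <- (140 - mu) / sigma
p_above_140 <- pnorm(z_140, lower.tail = FALSE)

# b. P(x < 110): z is about 0.67, take the lower tail
z_110 <- (110 - mu) / sigma
p_below_110 <- pnorm(z_110)

# c. P(80 < x < 120): difference of two lower-tail areas (z about -1.33 and 1.33)
z_80  <- (80 - mu) / sigma
z_120 <- (120 - mu) / sigma
p_80_to_120 <- pnorm(z_120) - pnorm(z_80)

round(c(p_above_140, p_below_110, p_80_to_120), 4)
# [1] 0.0038 0.7475 0.8176
```

Standardizing gives the same answers as passing mean and sd to pnorm() directly; it simply makes the z-scores visible.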
Continue to assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities.
a. What is the probability that a randomly selected student will have an IQ greater than 110?
1 - pnorm(110, mean = 100, sd = 15, lower.tail = TRUE)
## [1] 0.2524925
b. What is the probability that the mean IQ of a random sample of 12 students is greater than 110?
pnorm(110, mean = 100, sd = (15/sqrt(12)), lower.tail = FALSE)
## [1] 0.01046067
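The calculation with sd = 15/sqrt(12) uses the sampling distribution of the mean of 12 IQ scores, whose standard error is σ/√n. As a rough sanity check, the simulation below (an illustration added here, with an arbitrary seed and number of replications) draws many samples of size 12 and counts how often their mean exceeds 110; the result should land close to the exact value.

```r
set.seed(1)                  # arbitrary seed, for reproducibility only
n_sims <- 100000             # number of simulated samples (assumed, not from the assignment)
n <- 12                      # sample size implied by sd = 15/sqrt(12)

# Each row is one sample of 12 IQ scores; rowMeans() gives the sample means
samples <- matrix(rnorm(n_sims * n, mean = 100, sd = 15), nrow = n_sims)
sample_means <- rowMeans(samples)

# Proportion of simulated sample means above 110 vs. the exact normal probability
simulated_p <- mean(sample_means > 110)
exact_p <- pnorm(110, mean = 100, sd = 15 / sqrt(n), lower.tail = FALSE)

c(simulated = simulated_p, exact = exact_p)
```

The spread of the simulated means should also be close to 15/sqrt(12), about 4.33, which is why the single-person probability (about 0.25) is so much larger than the sample-mean probability (about 0.01).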
Load and store the sample NSCC Student Dataset using the read.csv() function. Find the mean, standard deviation, and sample size of the PulseRate variable in this dataset. Do you think it is likely or unlikely that the population mean pulse rate for all NSCC students is exactly equal to the sample mean?
nscc_students <- read.csv("/Users/Nlilly/downloads/nscc_student_data.csv")
# Keep only the rows with a recorded (non-missing) pulse rate.
PulseRateStudents <- subset(nscc_students, !is.na(PulseRate))
# Find the mean pulse rate.
mean(PulseRateStudents$PulseRate)
## [1] 73.47368
# Calculate the standard deviation of the pulse rates.
sd(PulseRateStudents$PulseRate)
## [1] 12.51105
# The sample size is the number of rows in the PulseRateStudents data frame; store it in numberPulseRate.
numberPulseRate <- nrow(PulseRateStudents)
numberPulseRate
## [1] 38
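When a column contains missing values, testing them with is.na() is more reliable than comparing against the string "NA". A minimal sketch on a small made-up data frame (the name toy and its values are hypothetical stand-ins for the real CSV) shows the filtering and the three summary statistics together.

```r
# Hypothetical toy data standing in for nscc_students; NA marks a missing pulse rate
toy <- data.frame(PulseRate = c(72, NA, 80, 65, NA, 90))

# Keep only rows with a recorded pulse rate
toy_pulse <- subset(toy, !is.na(PulseRate))

# Collect mean, standard deviation, and sample size in one named vector
pulse_summary <- c(
  mean = mean(toy_pulse$PulseRate),
  sd   = sd(toy_pulse$PulseRate),
  n    = nrow(toy_pulse)
)
pulse_summary
```

An alternative is mean(toy$PulseRate, na.rm = TRUE), but filtering once keeps the mean, standard deviation, and sample size based on exactly the same rows.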
Construct a 95% confidence interval for the mean pulse rate of all NSCC students and state your conclusion in a complete sentence. (Note: we can create a valid confidence interval here since n = 38 > 30.)
# Store mean and std dev
meanPulseRate <- mean(PulseRateStudents$PulseRate)
sdPulseRate <- sd(PulseRateStudents$PulseRate)
# Calculate lower bound of 95% CI
meanPulseRate - 1.96*(sdPulseRate/sqrt(numberPulseRate))
## [1] 69.49575
# Calculate upper bound of 95% CI
meanPulseRate + 1.96*(sdPulseRate/sqrt(numberPulseRate))
## [1] 77.45162
We are 95% confident that the mean pulse rate of all NSCC students is between 69.50 and 77.45 bpm.
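The two bound calculations above follow one formula, x̄ ± z*·s/√n, so they can be wrapped in a small helper. In this sketch, z_ci is a hypothetical name, and the critical value comes from qnorm() rather than the rounded 1.96, so the bounds may differ from the ones above in the fourth decimal place.

```r
# Hypothetical helper: large-sample (z) confidence interval for a mean.
# xbar = sample mean, s = sample standard deviation, n = sample size.
z_ci <- function(xbar, s, n, conf = 0.95) {
  z_star <- qnorm(1 - (1 - conf) / 2)   # about 1.96 for 95%, about 2.576 for 99%
  margin <- z_star * s / sqrt(n)        # margin of error = z* times standard error
  c(lower = xbar - margin, upper = xbar + margin)
}

# PulseRate summary statistics reported above
z_ci(73.47368, 12.51105, 38, conf = 0.95)
```

Changing the conf argument gives other confidence levels; for 99% the exact critical value 2.576 shifts each bound by less than 0.01 bpm relative to using 2.58.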
Construct a 99% confidence interval for the mean pulse rate of all NSCC students and state your conclusion in a complete sentence below the R chunk.
# Calculate lower bound of 99% CI
meanPulseRate - 2.58*(sdPulseRate/sqrt(numberPulseRate))
## [1] 68.23742
# Calculate upper bound of 99% CI
meanPulseRate + 2.58*(sdPulseRate/sqrt(numberPulseRate))
## [1] 78.70995
We are 99% confident that the mean pulse rate of all NSCC students is between 68.24 and 78.71 bpm.
Describe and explain the difference you observe in your confidence intervals for questions 4 and 5.
The 95% confidence interval is (69.50, 77.45), a width of about 8.0 bpm. The 99% confidence interval is (68.24, 78.71), a width of about 10.5 bpm. Raising the confidence level widens the interval: to be more confident that the interval captures the true population mean, we multiply the standard error by a larger critical value (2.58 instead of 1.96), so the margin of error grows.
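The widening can be seen directly by computing the full interval width, 2·z*·SE, for a few confidence levels. This sketch reuses the PulseRate summary statistics reported above and takes exact critical values from qnorm(); the 90% level is included only for comparison.

```r
# Standard error of the mean pulse rate, from the summary statistics above
se <- 12.51105 / sqrt(38)

conf_levels <- c(0.90, 0.95, 0.99)
z_star <- qnorm(1 - (1 - conf_levels) / 2)   # critical value grows with confidence level
widths <- 2 * z_star * se                     # full interval width, in bpm

data.frame(conf = conf_levels, z_star = round(z_star, 3), width = round(widths, 2))
```

Because the standard error is fixed by the data, the width is proportional to z*, so the 99% interval is about 2.58/1.96, roughly 1.3 times, as wide as the 95% interval.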
In the Fall 2009 semester of the 2009-10 academic year, the average NSCC student took 12.1 credits. I’m curious whether that average differed among NSCC students last year (a sample of whom is in the NSCC student dataset). Conduct a hypothesis test using a confidence interval to determine whether the average number of credits last year differs from that of Fall 2009.
Write the hypotheses (Try to emulate the “LaTeX” format I used in lecture notes. Otherwise, just give your best effort.)
$H_0: \mu_{\text{LastYear}} = \mu_{\text{Fall2009}} = 12.1$ credits
$H_A: \mu_{\text{LastYear}} \neq 12.1$ credits
Create confidence interval
# Store mean of Credits variable
meanCredits <- mean(nscc_students$Credits)
# Store standard deviation of Credits variable
sdCredits <- sd(nscc_students$Credits)
# Store sample size of Credits variable
numberCredits <- nrow(nscc_students)
# Lower bound of 95% CI
meanCredits - 1.96*(sdCredits/sqrt(numberCredits))
## [1] 10.73056
# Upper bound of 95% CI
meanCredits + 1.96*(sdCredits/sqrt(numberCredits))
## [1] 12.81944
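Because 12.1 lies inside the 95% confidence interval (10.73, 12.82), we fail to reject H0. There is not sufficient evidence that the average number of credits taken by NSCC students last year differs from the Fall 2009 average of 12.1.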
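The decision rule used here is that a two-sided test at significance level α = 1 - conf rejects H0: μ = μ0 exactly when μ0 falls outside the confidence interval. A minimal sketch of that rule, with ci_test as a hypothetical helper name and the Credits bounds reported above as input:

```r
# Hypothetical helper: two-sided hypothesis test via a confidence interval.
# Reject H0: mu = mu0 when mu0 lies outside [lower, upper].
ci_test <- function(lower, upper, mu0) {
  if (mu0 < lower || mu0 > upper) "reject H0" else "fail to reject H0"
}

# 95% interval for Credits and the Fall 2009 average
ci_test(lower = 10.73056, upper = 12.81944, mu0 = 12.1)
# [1] "fail to reject H0"
```

A 95% interval corresponds to α = 0.05; a hypothesized value such as 13 credits, which lies above the upper bound, would be rejected.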
NSCC is investigating whether NSCC students have a higher-than-average stress level, which can be identified by a higher-than-average standing pulse rate. Conduct a hypothesis test using a p-value to determine whether NSCC students have a higher pulse rate than the national average of 72 bpm for adults.
$H_0: \mu_{\text{NSCC}} = \mu_{\text{Adult}} = 72$ bpm
$H_A: \mu_{\text{NSCC}} > 72$ bpm
# p-value: probability of a sample mean at least this large if the true mean were 72 bpm
pnorm(meanPulseRate, mean = 72, sd =(sdPulseRate/sqrt(numberPulseRate)), lower.tail = FALSE)
## [1] 0.2338856
We fail to reject H0 because the p-value (about 0.234) is greater than any conventional significance level, such as α = 0.05.
There is not sufficient evidence to support the claim that North Shore students have an above-average stress level.
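The pnorm() call above is equivalent to computing the test statistic z = (x̄ - μ0)/(s/√n) and taking the upper-tail area, since HA is right-tailed. The sketch below makes that step explicit using the PulseRate summary statistics; the significance level α = 0.05 is an assumption, since the assignment does not state one.

```r
# Summary statistics for PulseRate reported above
xbar <- 73.47368
s <- 12.51105
n <- 38
mu0 <- 72                                 # national adult average under H0

se <- s / sqrt(n)                         # standard error of the sample mean
z <- (xbar - mu0) / se                    # standardized test statistic, about 0.73
p_value <- pnorm(z, lower.tail = FALSE)   # upper tail, because HA: mu > 72

alpha <- 0.05                             # assumed significance level
if (p_value < alpha) "reject H0" else "fail to reject H0"
```

A sample mean only about 0.73 standard errors above 72 bpm is quite plausible under H0, which is why the p-value is large.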