In this project, students will demonstrate their understanding of the normal distribution, sampling distributions, confidence intervals and hypothesis tests to determine if NSCC students differ from typical averages or if any differences are just due to random variation.
Tasks:
Load and store the sample NSCC Student Dataset using the read.csv() function.
# Store the NSCC student dataset in environment
NSCCstudent <- read.csv("nscc_student_data.csv")
Find the sample mean and sample size of the PulseRate variable in this dataset and answer the question that follows below.
# Mean pulse rate of this sample
mean(NSCCstudent$PulseRate, na.rm = TRUE)
## [1] 73.47368
# Find the sample size of pulse rates (hint: it's the number of non-NA values)
table(is.na(NSCCstudent$PulseRate))
##
## FALSE TRUE
## 38 2
The mean pulse rate of the sample is 73.47, and the sample size is 38.
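As a cross-check on the count above, the number of non-missing values can also be obtained directly with `sum(!is.na(x))`. The sketch below uses a small made-up vector, `pulse_example`, as a stand-in for the real column, so its numbers are illustrative only.

```r
# Hypothetical stand-in for NSCCstudent$PulseRate (not the real data)
pulse_example <- c(70, 82, NA, 65, 74, NA, 90)

# Count the non-missing observations: each TRUE counts as 1 when summed
n_example <- sum(!is.na(pulse_example))

# Mean of the non-missing observations
mean_example <- mean(pulse_example, na.rm = TRUE)

n_example     # 5
mean_example  # 76.2
```

Applied to the real data, the same call would be `sum(!is.na(NSCCstudent$PulseRate))`.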
Questions:
Do you expect the sample mean to equal the true population mean? Why? No. The sample mean is an estimate based on only 38 students, so it will very likely differ from the true population mean because of random sampling variation. However, it should be reasonably close to it.
If we took a different sample, would we get the same results? A different sample would very likely have a different mean, but once again would be in the same ballpark.
Task: Construct 90%, 95%, and 99% Confidence Intervals for the mean pulse rate of all NSCC students. Assume that σ = 14.
# Store mean
NSCCmean <- mean(NSCCstudent$PulseRate, na.rm = TRUE)
# Calculate lower bound of 90% CI
NSCCmean - 1.645*(14/sqrt(38))
## [1] 69.73772
# Calculate upper bound of 90% CI
NSCCmean + 1.645*(14/sqrt(38))
## [1] 77.20964
# Calculate lower bound of 95% CI
NSCCmean - 1.96*(14/sqrt(38))
## [1] 69.02233
# Calculate upper bound of 95% CI
NSCCmean + 1.96*(14/sqrt(38))
## [1] 77.92504
# Calculate lower bound of 99% CI
NSCCmean - 2.58*(14/sqrt(38))
## [1] 67.61425
# Calculate upper bound of 99% CI
NSCCmean + 2.58*(14/sqrt(38))
## [1] 79.33312
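The six calculations above all follow the same pattern, sample mean plus or minus a critical value times σ/√n, so they can be wrapped in one function. The following is a minimal sketch, not part of the original assignment: `z_interval` is a hypothetical helper name, and it takes the critical value from `qnorm()` instead of the rounded table values 1.645, 1.96, and 2.58, so its bounds may differ from those above in the second decimal place.

```r
# Hypothetical helper: z-based confidence interval for a mean with known sigma
z_interval <- function(xbar, sigma, n, conf_level) {
  # Critical value that leaves (1 - conf_level) / 2 in each tail
  z_star <- qnorm(1 - (1 - conf_level) / 2)
  margin <- z_star * sigma / sqrt(n)
  c(lower = xbar - margin, upper = xbar + margin)
}

# Values taken from the output above: mean 73.47368, sigma = 14, n = 38
xbar <- 73.47368
for (level in c(0.90, 0.95, 0.99)) {
  print(round(z_interval(xbar, sigma = 14, n = 38, conf_level = level), 2))
}
```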
Questions:
Interpret your 95% CI in plain language. I am 95% confident that the mean pulse rate of all NSCC students is between 69.0 and 77.9 bpm.
How does the interval change as confidence increases? The interval gets wider, because a wider range of values is needed to be more confident that it captures the true mean.
Which interval would you report and why? I would probably report the 95% interval, as it is a good balance between precision (a narrow interval) and confidence (a high chance of capturing the true mean). It is also the level most commonly used by statisticians.
Consider the national average pulse rate for US adults to be 72 bpm. Let’s test the claim that NSCC students differ from that national average.
\(H_0: \mu = 72\)
\(H_A: \mu \neq 72\)
Tasks:
Use your confidence interval – Does it contain 72? Yes, 72 falls within the 95% confidence interval (69.02 to 77.93).
Based on that – Do you reject or fail to reject the null hypothesis? I fail to reject the null hypothesis, as there is not enough evidence to say that the pulse rates of NSCC students differ from the national average.
Questions:
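What does your result suggest about NSCC students? The sample mean is slightly above the national average, but the difference is small enough to be explained by random variation. The true mean pulse rate of NSCC students could be somewhat higher than, equal to, or lower than 72 bpm.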
Does “fail to reject” mean NSCC students are the same as average? It does not mean that they are the same as average, just that with the evidence we have they COULD be the same and that possibility cannot be ruled out.
Task: Recall the sample data you got in question 1. For the hypotheses in question 3, compute the test statistic of that sample data and the p-value using pnorm().
# Upper-tail probability of a sample mean of 73.47 or higher for a sample of n = 38, assuming mu = 72
pnorm(q = NSCCmean, mean = 72, sd = 14/sqrt(38), lower.tail = FALSE)
## [1] 0.2582061
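The task also asks for the test statistic, and the alternative hypothesis \(H_A: \mu \neq 72\) is two-sided, so the p-value should count both tails. The following is a minimal sketch of both calculations, using the sample mean, σ, and n reported above; the variable names are chosen here for illustration.

```r
# Values from the analysis above
xbar  <- 73.47368   # sample mean
mu0   <- 72         # hypothesized mean under H0
sigma <- 14         # assumed population standard deviation
n     <- 38         # number of non-missing pulse rates

# Test statistic: how many standard errors the sample mean is from mu0
z <- (xbar - mu0) / (sigma / sqrt(n))

# Two-sided p-value: probability of a z at least this far from 0 in either direction
p_two_sided <- 2 * pnorm(abs(z), lower.tail = FALSE)

z            # about 0.65
p_two_sided  # about 0.52
```

The two-sided p-value is twice the upper-tail probability computed above, and it remains well above α = 0.05.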
Questions:
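What does your p-value represent in context? The value 0.258 is the probability, assuming the true mean pulse rate of NSCC students is 72 bpm, that a random sample of 38 students would have a mean pulse rate of 73.47 or higher. Because the alternative hypothesis is two-sided, the p-value doubles this tail probability to about 0.516: the probability of a sample mean at least as far from 72, in either direction, as the one observed.
Using an α = 0.05, is your conclusion the same as with the 95% confidence interval? If not, why might they differ? My conclusion with the p-value is the same as with the 95% confidence interval: the p-value is well above α = 0.05, so the null hypothesis cannot be rejected.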
If you repeated this study of collecting NSCC students’ pulse rates to determine if they differ from the national average:
Could your conclusions change? Why? The conclusion could change, as there is variability in selecting different samples.
What assumptions did you rely on in using these methods? I relied on an assumed value for the population standard deviation (σ = 14), on the sample being a random sample of NSCC students, and on the sample size of 38 being large enough for the sampling distribution of the mean to be approximately normal. I also relied on the relatively arbitrary cutoff points for the confidence levels and α.
Are there any limitations, flaws, questions, or concerns you have with the analysis done in this project? Not particularly, other than the assumed standard deviation mentioned above. It might also be good to conduct an analysis in which I collected the data myself, though that matters less for something like pulse rate.
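To illustrate how conclusions can change from one sample to the next, the following sketch simulates repeating the study many times. It rests on assumptions that are not part of the original analysis: a hypothetical population in which the null hypothesis is true (pulse rates normally distributed with mean 72 and standard deviation 14), an arbitrary seed, and 10,000 simulated studies of 38 students each.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

mu0       <- 72     # hypothetical true mean (H0 assumed true)
sigma     <- 14     # assumed population standard deviation
n         <- 38     # students per simulated study
n_studies <- 10000  # number of simulated repetitions

# For each simulated study, draw n pulse rates and compute a two-sided z-test p-value
p_values <- replicate(n_studies, {
  sample_mean <- mean(rnorm(n, mean = mu0, sd = sigma))
  z <- (sample_mean - mu0) / (sigma / sqrt(n))
  2 * pnorm(abs(z), lower.tail = FALSE)
})

# Share of studies that would reject H0 at alpha = 0.05 even though H0 is true
reject_rate <- mean(p_values < 0.05)
reject_rate
```

Under these assumptions, the rejection rate should come out close to 0.05, which shows that a repeated study can occasionally reach a different conclusion through sampling variation alone.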