Purpose – Are North Shore Students Different?

In this project, students will demonstrate their understanding of the normal distribution, sampling distributions, confidence intervals and hypothesis tests to determine if NSCC students differ from typical averages or if any differences are just due to random variation.


Question 1: Sample v. Population

Tasks:

Load and store the sample NSCC Student Dataset using the read.csv() function.

# Store the NSCC student dataset in environment
nscc_student_data_1_ <- read.csv("/Users/patrickmannion/Downloads/nscc_student_data (1).csv")

Find the sample mean and sample size of the PulseRate variable in this dataset and answer the question that follows below.

# Mean pulse rate of this sample (NA values excluded)
mean(nscc_student_data_1_$PulseRate, na.rm = TRUE)
## [1] 73.47368
# Sample size of pulse rates: the number of non-NA values
sum(!is.na(nscc_student_data_1_$PulseRate))
## [1] 38

The sample mean pulse rate is approximately 73.474 bpm. NA values are excluded from the calculation rather than counted as zeros, so the sample size of pulse rates is 38, the number of non-missing values.
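To show why excluding the missing values matters, here is a small self-contained sketch. It assumes the 40 PulseRate entries of the NSCC sample (two of them NA) are typed in by hand, in an arbitrary order, so it runs without the CSV file; order does not affect the mean. It compares the `na.rm = TRUE` mean with the value you would get if the NA entries were counted as zeros.

```r
# PulseRate values from the NSCC sample, entered by hand in an arbitrary order
# (two entries are missing, coded as NA)
pulse <- c(50, 66, 92, 74, 80, 96, 60, 66, 66, 64, NA, 66, 70, 72, 69, 80, 62, 60,
           65, 92, NA, 92, 71, 88, 65, 72, 87, 89, 98, 85, 75, 80, 60, 80, 64, 89,
           70, 61, 60, 56)

# Sample size: count only the non-missing values
n <- sum(!is.na(pulse))

# Sample mean over the non-missing values only
xbar <- mean(pulse, na.rm = TRUE)

# For comparison: treating each NA as 0 drags the mean down
zero_filled <- mean(ifelse(is.na(pulse), 0, pulse))

c(n = n, xbar = round(xbar, 3), zero_filled = zero_filled)
```

This gives n = 38 and a mean of about 73.474, whereas counting the two NAs as zeros would give 2792 / 40 = 69.8.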

Question 2: Confidence Intervals

Task: Construct 90%, 95%, and 99% Confidence Intervals for the mean pulse rate of all NSCC students. Assume that σ = 14.

# Store the sample mean (NA values excluded); named xbar to avoid masking mean()
xbar <- mean(nscc_student_data_1_$PulseRate, na.rm = TRUE)

# Standard error of the mean, using the given sigma = 14 and n = 38
se <- 14 / sqrt(38)

# 95% CI (z* = 1.96): lower and upper bounds
xbar - 1.96 * se
## [1] 69.02233
xbar + 1.96 * se
## [1] 77.92504
# 90% CI (z* = 1.645): lower and upper bounds
xbar - 1.645 * se
## [1] 69.73772
xbar + 1.645 * se
## [1] 77.20964
# 99% CI (z* = 2.58): lower and upper bounds
xbar - 2.58 * se
## [1] 67.61425
xbar + 2.58 * se
## [1] 79.33312

The 95%, 90%, and 99% confidence intervals are (69.022, 77.925), (69.738, 77.210), and (67.614, 79.333), respectively. Each was calculated as the sample mean ± z*·σ/√n, using the critical values z* = 1.96, 1.645, and 2.58 and the given standard deviation σ = 14.
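Rather than typing the critical values by hand, they can be derived with `qnorm()`. The sketch below wraps the interval formula x̄ ± z*·σ/√n in a small helper, `z_interval()` (a hypothetical name made up for this example), using the sample mean from Question 1, the given σ = 14, and n = 38. Because `qnorm(0.995)` is about 2.576 rather than 2.58, its 99% bounds differ slightly from the hand-computed ones.

```r
# Hypothetical helper: z confidence interval for a mean when sigma is known
z_interval <- function(xbar, sigma, n, level) {
  z_star <- qnorm(1 - (1 - level) / 2)   # two-sided critical value
  margin <- z_star * sigma / sqrt(n)     # margin of error
  c(lower = xbar - margin, upper = xbar + margin)
}

xbar <- 73.47368                  # sample mean pulse rate from Question 1
conf_levels <- c(0.90, 0.95, 0.99)

# One row per confidence level, with columns lower and upper
ci <- t(sapply(conf_levels, function(l) z_interval(xbar, sigma = 14, n = 38, level = l)))
rownames(ci) <- paste0(conf_levels * 100, "%")
round(ci, 3)
```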

# Width of the 99% CI (upper bound minus lower bound)
79.333-67.614
## [1] 11.719
# Width of the 95% CI
77.925-69.022 
## [1] 8.903
# Width of the 90% CI
77.210-69.738
## [1] 7.472

Subtracting the lower bound from the upper bound gives the width of each interval: 11.719 for the 99% interval, 8.903 for the 95% interval, and 7.472 for the 90% interval. As the confidence level increases, the interval gets wider, because a higher level requires a larger critical value z* and therefore a larger margin of error.

Which interval would you report and why? I would report the 95% confidence interval. It strikes a balance between confidence and precision: it is narrower than the 99% interval while offering more confidence than the 90% interval. It is also the common industry standard, and something worth becoming familiar with.
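The widening pattern follows directly from the formula: the width of a z-interval is 2·z*·σ/√n, so with σ = 14 and n = 38 fixed it changes only through z*. A short sketch, first with the rounded critical values 1.645, 1.96, and 2.58, then with exact `qnorm()` values over a wider range of levels (the extra levels 80% and 99.9% are chosen only for illustration):

```r
sigma <- 14
n <- 38
se <- sigma / sqrt(n)   # standard error of the mean, about 2.271

# Widths with the rounded critical values used in the hand calculation
z_rounded <- c("90%" = 1.645, "95%" = 1.96, "99%" = 2.58)
widths <- 2 * z_rounded * se
round(widths, 3)

# Same idea over a range of confidence levels, using exact critical values
conf <- c(0.80, 0.90, 0.95, 0.99, 0.999)
data.frame(level = conf, width = round(2 * qnorm(1 - (1 - conf) / 2) * se, 3))
```

The widths come out as about 7.472, 8.903, and 11.719, matching the subtraction above, and the data frame shows the width rising steadily as the confidence level approaches 100%.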

Question 3: Hypothesis Testing with a Confidence Interval

Consider the national average pulse rate for US adults to be 72 bpm. Let’s test the claim that NSCC students differ from that national average.

\(H_0: \mu = 72\)
\(H_A: \mu \neq 72\)

Tasks:
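A confidence interval can test these hypotheses directly: at significance level α = 0.05, reject H0 when the hypothesized mean 72 falls outside the 95% confidence interval. A minimal sketch, assuming the same σ = 14 and n = 38 as in Question 2:

```r
xbar <- 73.47368   # sample mean pulse rate from Question 1
mu0 <- 72          # hypothesized mean: national average, in bpm
se <- 14 / sqrt(38)

# 95% confidence interval for the mean
ci95 <- xbar + c(-1, 1) * qnorm(0.975) * se

# H0 is rejected at alpha = 0.05 only if mu0 lies outside the interval
inside <- mu0 >= ci95[1] && mu0 <= ci95[2]
if (inside) "Fail to reject H0: 72 is inside the 95% CI" else "Reject H0: 72 is outside the 95% CI"
```

Because 72 lies within roughly (69.022, 77.925), this sketch fails to reject H0: at the 5% level the sample gives no evidence that the mean pulse rate of NSCC students differs from 72 bpm.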

Question 4: Hypothesis Testing with a P-value

Task: Recall the sample data you got in question 1. For the hypotheses in question 3, compute the test statistic of that sample data and the p-value using pnorm().

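# Sample standard deviation of the pulse rates (NA values removed)
sd(nscc_student_data_1_$PulseRate, na.rm = TRUE)
## [1] 12.51105
# Test statistic: z = (sample mean - 72) / (sigma / sqrt(n)), with sigma = 14 as in Question 2
z <- (mean(nscc_student_data_1_$PulseRate, na.rm = TRUE) - 72) / (14 / sqrt(38))
z
# Two-sided p-value: probability of a z at least this far from 0 in either direction
2 * pnorm(-abs(z))

The test statistic is z = (x̄ − 72)/(σ/√n) = (73.474 − 72)/(14/√38) ≈ 0.649. Because the alternative hypothesis is two-sided, the p-value counts both tails: p = 2·P(Z ≤ −|z|) ≈ 0.516. This p-value is well above 0.05, so we fail to reject H0: the sample gives no evidence that the mean pulse rate of NSCC students differs from the national average of 72 bpm.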
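If σ were not assumed known, the sample standard deviation (about 12.511) would take its place, and the reference distribution would become a t distribution with n − 1 = 37 degrees of freedom. The sketch below compares the two versions using base R's `pnorm()` and `t.test()`; as an assumption, the pulse values are typed in by hand (in an arbitrary order) so the example runs without the CSV.

```r
# PulseRate values from the NSCC sample, entered by hand; two entries are NA
pulse <- c(50, 66, 92, 74, 80, 96, 60, 66, 66, 64, NA, 66, 70, 72, 69, 80, 62, 60,
           65, 92, NA, 92, 71, 88, 65, 72, 87, 89, 98, 85, 75, 80, 60, 80, 64, 89,
           70, 61, 60, 56)

x <- pulse[!is.na(pulse)]   # drop the two missing values
n <- length(x)

# z-test with the assumed population sigma = 14
z <- (mean(x) - 72) / (14 / sqrt(n))
p_z <- 2 * pnorm(-abs(z))

# t-test using the sample standard deviation instead of sigma
tt <- t.test(x, mu = 72)

round(c(z = z, p_z = p_z, t = unname(tt$statistic), p_t = tt$p.value), 3)
```

Both versions give p-values far above 0.05, so in this sample the conclusion does not hinge on whether σ is treated as known.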


Question 5: Reflection

If you repeated this study of collecting NSCC students’ pulse rates to determine if they differ from the national average:

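#- Could your conclusions change? Why?
Yes, the conclusions could change. This sample covers only a small portion of the school’s student population, so it may not reflect the entire student body, and a different random sample would produce a different sample mean and p-value.

#- What assumptions did you rely on in using these methods?
It was assumed that the data were sampled randomly and independently, that pulse rates are approximately normally distributed (or that n = 38 is large enough for the sample mean to be approximately normal), and that the population standard deviation is known to be σ = 14.

#- Are there any limitations, flaws, questions, or concerns you have with the analysis done in this project?
It may be more useful to take multiple pulse rate measurements per student at different activity levels, since a single reading depends on what the student was doing just before it was taken.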
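To make the first reflection point concrete, a small simulation can repeat the study many times. This sketch assumes a hypothetical population of pulse rates that is exactly normal with mean 72 and σ = 14 (so H0 is true), draws repeated samples of 38 students, and runs the same two-sided z-test on each. The sample means scatter around 72 with a spread close to 14/√38 ≈ 2.27, and roughly 5% of the repeated studies reject H0 purely by chance.

```r
set.seed(1)          # fixed seed so the simulation is reproducible
reps <- 10000        # number of simulated repeat studies
n <- 38
mu0 <- 72
sigma <- 14

# Each column of this matrix is one simulated sample of 38 pulse rates
samples <- matrix(rnorm(reps * n, mean = mu0, sd = sigma), nrow = n)
means <- colMeans(samples)

# Two-sided z-test p-value for every simulated study
z <- (means - mu0) / (sigma / sqrt(n))
p <- 2 * pnorm(-abs(z))

sd(means)            # spread of the sample means across studies
mean(p < 0.05)       # share of studies that (wrongly) reject H0
```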