Project #4 - Introduction to Statistical Inference

Purpose – Are North Shore Students Different?

In this project, students will demonstrate their understanding of the normal distribution, sampling distributions, confidence intervals and hypothesis tests to determine if NSCC students differ from typical averages or if any differences are just due to random variation.

Question 1: Sample v. Population

Tasks:

Load and store the sample NSCC Student Data set using the read.csv() function.

# Store the NSCC student data set in environment
 nscc_student_data <- read.csv("C:/Users/aless/Downloads/nscc_student_data.csv")

Find the sample mean and sample size of the Pulse Rate variable in this data set and answer the question that follows below.

# Mean pulse rate of this sample
mean(nscc_student_data$PulseRate, na.rm = TRUE)

## [1] 73.47368

# Find the sample size of pulse rates (hint: it's the number of non-NA values)
table(is.na(nscc_student_data$PulseRate))

## 
## FALSE  TRUE 
##    38     2

The mean pulse rate is approximately 73.47, and the sample size is 38.
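
As a cross-check, the number of non-missing values can also be counted directly with sum(!is.na(...)). The sketch below uses a small made-up vector (hypothetical values, not the NSCC data) so that it runs on its own.

```r
# Hypothetical pulse rates with two missing values (NOT the NSCC data)
pulse <- c(72, 80, NA, 65, 90, NA, 70)

# Count the non-missing observations: this is the sample size n
n <- sum(!is.na(pulse))
n

# Mean of the non-missing values
xbar <- mean(pulse, na.rm = TRUE)
xbar
```

Applied to the real data, the same idea would be sum(!is.na(nscc_student_data$PulseRate)), which should agree with the FALSE count of 38 in the table above.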

Questions:

Do you expect the sample mean to equal the true population mean? Why?

I do not expect this sample mean to equal the true population mean because of sampling variability: the sample includes only 38 students, far fewer than the whole population. However, if we were to take many samples and find the mean of their means, we’d get a lot closer to the true population mean.
If we took a different sample, would we get the same results?

It’s very likely that we would get different results than our initial mean value of 73.47, due to variability and randomness.
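
The two answers above can be illustrated with a small simulation. This is only a sketch: the population here is hypothetical (normal with mean 72 and standard deviation 14, chosen for illustration), not the real NSCC population.

```r
set.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population parameters (assumed, for illustration only)
pop_mean <- 72
pop_sd <- 14
n <- 38

# Draw 5000 samples of size n and record each sample mean
sample_means <- replicate(5000, mean(rnorm(n, pop_mean, pop_sd)))

# Individual sample means differ from sample to sample
range(sample_means)

# Their average sits very close to the population mean
mean(sample_means)

# Their spread is close to sigma / sqrt(n)
sd(sample_means)
pop_sd / sqrt(n)
```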

Question 2: Confidence Intervals

Task: Construct 90%, 95%, and 99% Confidence Intervals for the mean pulse rate of all NSCC students. Assume that σ = 14.

# Store mean
pulsemean <- mean(nscc_student_data$PulseRate, na.rm = TRUE)

# Calculate lower bound of 90% CI
pulsemean - 1.65*(14/sqrt(38))

## [1] 69.72637

# Calculate upper bound of 90% CI
pulsemean + 1.65*(14/sqrt(38))

## [1] 77.221

# Calculate lower bound of 95% CI
pulsemean - 1.96*(14/sqrt(38))

## [1] 69.02233

# Calculate upper bound of 95% CI
pulsemean + 1.96*(14/sqrt(38))

## [1] 77.92504

# Calculate lower bound of 99% CI
pulsemean - 2.58*(14/sqrt(38))

## [1] 67.61425

# Calculate upper bound of 99% CI
pulsemean + 2.58*(14/sqrt(38))

## [1] 79.33312

The 90% confidence interval is 69.7 to 77.2

The 95% confidence interval is 69.0 to 77.9

The 99% confidence interval is 67.6 to 79.3
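
The critical values 1.65, 1.96, and 2.58 are rounded. As a sketch, the same intervals can be built with qnorm() supplying the critical value. The helper name z_ci is hypothetical (it is not part of the original project), and the sample mean and size are typed in from the output above so the block runs without the data file.

```r
# Values taken from the output above
xbar <- 73.47368   # sample mean
n <- 38            # sample size
sigma <- 14        # assumed population standard deviation

# z_ci is a hypothetical helper name, not part of the original project
z_ci <- function(xbar, sigma, n, level) {
  z <- qnorm(1 - (1 - level) / 2)   # critical value for a two-sided interval
  margin <- z * sigma / sqrt(n)     # margin of error
  c(lower = xbar - margin, upper = xbar + margin)
}

ci90 <- z_ci(xbar, sigma, n, 0.90)
ci95 <- z_ci(xbar, sigma, n, 0.95)
ci99 <- z_ci(xbar, sigma, n, 0.99)

# One row per confidence level, rounded to one decimal place
round(rbind(ci90, ci95, ci99), 1)
```

Rounded to one decimal place, these bounds agree with the intervals reported above.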

Questions:

Interpret your 95% CI in plain language:

We are 95% confident that the true mean pulse rate of all NSCC students is somewhere between 69.0 bpm and 77.9 bpm.
How does the interval change as confidence increases?

As confidence increases, the interval widens. This expands the range and gives a better chance of the true mean value being within the upper and lower bounds.
Which interval would you report and why?

I would report the 95% confidence interval because it is more precise than the 99% confidence interval and has more leeway than the 90% confidence interval. Furthermore, it is the standard.
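
One way to see what "95% confident" means is to simulate many samples and count how often the interval captures the true mean. This is a sketch under an assumed, hypothetical population (normal, mean 72, standard deviation 14), not a statement about the real NSCC data.

```r
set.seed(2)  # fixed seed so the run is reproducible

# Hypothetical population (assumed for illustration)
true_mean <- 72
sigma <- 14
n <- 38
z <- qnorm(0.975)  # critical value for 95% confidence

# For each simulated sample, check whether its 95% CI contains the true mean
covers <- replicate(10000, {
  xbar <- mean(rnorm(n, true_mean, sigma))
  margin <- z * sigma / sqrt(n)
  (xbar - margin <= true_mean) && (true_mean <= xbar + margin)
})

# Proportion of intervals that captured the true mean
coverage <- mean(covers)
coverage
```

The proportion should land near 0.95, which is the sense in which the method is "95% confident".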

Question 3: Hypothesis Testing with a Confidence Interval

Consider the national average pulse rate for US adults to be 72 bpm. Let’s test the claim that NSCC students differ from that national average.

\(H_0: \mu = 72\)
\(H_A: \mu \neq 72\)

Tasks:

Use your confidence interval – Does it contain 72?

Yes, 72 falls within the 95% confidence interval of 69.0 to 77.9.
Based on that – Do you reject or fail to reject the null hypothesis?

I fail to reject the null hypothesis because 72 lies inside the 95% confidence interval, so there is not enough evidence to say that the mean pulse rate of NSCC students differs from the national average of 72 bpm. The true mean of NSCC pulse rates could be anywhere in the interval.
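
The same decision rule can be written as a short check. This sketch types in the 95% bounds from Question 2 so that it runs on its own; the variable names are illustrative.

```r
# 95% CI bounds from Question 2 and the hypothesized mean from H0
lower <- 69.02233
upper <- 77.92504
mu0 <- 72

# TRUE means 72 is a plausible value for the mean at the 95% level
contains_mu0 <- lower <= mu0 && mu0 <= upper

# If the interval contains mu0, we fail to reject H0 at alpha = 0.05
decision <- if (contains_mu0) "fail to reject H0" else "reject H0"
decision
```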

Questions:

What does your result suggest about NSCC students?

It is likely that the population of NSCC students has an average pulse rate between 69.0 bpm and 77.9 bpm. We do not know the true population mean of NSCC students’ pulse rates; we only have an estimate of where it lies.
Does “fail to reject” mean NSCC students are the same as average?

No, it just means that we don’t have enough evidence to conclude that NSCC students’ average pulse rate differs from the national average. It does not prove that the two are equal.

Question 4: Hypothesis Testing with a P-value

Task: Recall the sample data you got in question 1. For the hypotheses in question 3, compute the test statistic of that sample data and the p-value using pnorm().

# Compute the two-sided p-value for our sample mean
pnorm(73.47, 72, 14/sqrt(38), lower.tail = FALSE)*2

## [1] 0.5174614
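
The task also asks for the test statistic. As a sketch, the z statistic can be computed explicitly and the p-value recovered from it, using the same rounded sample mean (73.47) as the pnorm() call above.

```r
xbar <- 73.47   # rounded sample mean, as used in the pnorm() call above
mu0 <- 72       # hypothesized mean under H0
sigma <- 14     # assumed population standard deviation
n <- 38         # sample size

# z test statistic: how many standard errors the sample mean is from mu0
z <- (xbar - mu0) / (sigma / sqrt(n))
z

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z))
p_value
```

The p-value computed this way should match the 0.5174614 shown above, since both calls describe the same normal tail areas.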

Questions:

What does your p-value represent in context?

The p-value means that, if the true mean pulse rate of NSCC students really were 72 bpm, there would be about a 0.52 probability of getting a sample mean at least as far from 72 as our 73.47, in either direction.
Using an α = 0.05, is your conclusion the same as with the 95% confidence interval? If not, why might they differ?

The p-value of about 0.52 is much larger than α = 0.05, so there is not enough evidence that the average bpm of NSCC students differs from the national average, and I still fail to reject the null hypothesis. This matches the conclusion from the 95% confidence interval.
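
The p-value can also be approximated by simulation. This sketch assumes the null hypothesis is true (mean 72) and that pulse rates are normal with σ = 14, then counts how often a simulated sample mean falls at least as far from 72 as the observed 73.47.

```r
set.seed(3)  # fixed seed so the run is reproducible

mu0 <- 72
sigma <- 14
n <- 38
observed_gap <- abs(73.47 - mu0)  # distance of the observed mean from mu0

# Simulate sample means in a world where H0 is true
null_means <- replicate(20000, mean(rnorm(n, mu0, sigma)))

# Share of simulated means at least as extreme as the observed one
sim_p <- mean(abs(null_means - mu0) >= observed_gap)
sim_p

# Compare with alpha = 0.05: TRUE means we fail to reject H0
sim_p > 0.05
```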

Question 5: Reflection

If you repeated this study of collecting NSCC students’ pulse rates to determine if they differ from the national average:

Could your conclusions change? Why?

My conclusions could definitely change because I’d be collecting the pulse rates of different students than before. I might collect more or fewer pulse rates, for example if I couldn’t find enough people before a deadline. The methods of the study could also change: this study had people report their own data, which could be inaccurate. If I had a nurse take and record the students’ pulses, the data might be more accurate. These are all factors that could influence my conclusions.
What assumptions did you rely on in using these methods?

If I repeated this study using the same method, I’d be assuming that students can properly take their own pulse rates and are entering accurate data, which is often not the case. If I instead had someone like a nurse take and record the students’ pulses, I’d be assuming that the nurse is accurate. Again, this may not be the case and could make the results inaccurate. The calculations also assumed that σ = 14 and treated the 38 students as a random sample of NSCC students.
Are there any limitations, flaws, questions, or concerns you have with the analysis done in this project?

I don’t have any concerns, but I do find it interesting that most of the participants in this study have a pulse rate either in the 80s or above or in the 60s or below. Not many participants have pulse rates in the 70s. With the high and low values balancing out, the sample mean still lands close to 72, which is consistent with the large p-value.
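
The reflection above notes that a repeated study might collect a different number of pulse rates. As a rough sketch, the 95% margin of error for several sample sizes shows how much that would matter, still assuming σ = 14. Apart from 38, the sample sizes are hypothetical.

```r
sigma <- 14          # assumed population standard deviation
z <- qnorm(0.975)    # critical value for 95% confidence

# Sample sizes to compare (38 is the actual sample; the rest are hypothetical)
n_values <- c(20, 38, 100, 400)

# Margin of error shrinks in proportion to 1 / sqrt(n)
margin <- z * sigma / sqrt(n_values)
data.frame(n = n_values, margin_of_error = round(margin, 2))
```

A larger sample gives a narrower interval, so a difference from 72 bpm that is undetectable with 38 students could become detectable with more.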

MAT143H - Introduction to Statistics Honors

Alessandra Marenghi

Due: Thursday, April 9
