Project #4 - Introduction to Statistical Inference

Purpose – Are North Shore Students Different?

In this project, students will demonstrate their understanding of the normal distribution, sampling distributions, confidence intervals and hypothesis tests to determine if NSCC students differ from typical averages or if any differences are just due to random variation.

Question 1: Sample v. Population

Tasks:

Load and store the sample NSCC Student Dataset using the read.csv() function.

# Store the NSCC student dataset in environment
nscc <- read.csv("nscc_student_data.csv")

Find the sample mean and sample size of the PulseRate variable in this dataset and answer the question that follows below.

# Mean pulse rate of this sample
mean(nscc$PulseRate, na.rm = TRUE)

## [1] 73.47368

# Find the sample size of pulse rates (hint: it's how many non-NA values there are)
sum(!is.na(nscc$PulseRate))

## [1] 38

Questions:

Do you expect the sample mean to equal the true population mean? Why? No, I do not expect the sample mean to equal the true population mean exactly. Because of random sampling, the sample mean is only an estimate of the population mean, not its exact value.
If we took a different sample, would we get the same results? No, we would most likely not get the same results. A different sample would include different students, so it would produce a different sample mean and therefore different results.
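
A small simulation can make the two answers above concrete. The sketch below draws several samples of size 38 from a hypothetical normal population. The population mean of 72 and standard deviation of 14 are assumptions chosen for illustration, not known facts about NSCC students.

```r
# Hypothetical population: mean 72 bpm, sd 14 bpm (assumed for illustration only)
set.seed(143)

# Draw 5 independent samples of size 38 and record each sample mean
sample_means <- replicate(5, mean(rnorm(38, mean = 72, sd = 14)))

# The five means differ from one another and from the population mean of 72
round(sample_means, 2)
```

Each run of the study gives a somewhat different sample mean, even though the underlying population never changes.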

Question 2: Confidence Intervals

Task: Construct 90%, 95%, and 99% Confidence Intervals for the mean pulse rate of all NSCC students. Assume that σ = 14.

# Store mean
mn <- mean(nscc$PulseRate, na.rm = TRUE)
mn

## [1] 73.47368

# Store sample size
ss <- sum(!is.na(nscc$PulseRate))
ss

## [1] 38

# Calculate lower bound of 90% CI
mn - 1.645*(14/sqrt(ss))

## [1] 69.73772

# Calculate upper bound of 90% CI
mn + 1.645*(14/sqrt(ss))

## [1] 77.20964

# Calculate lower bound of 95% CI
mn - 1.96*(14/sqrt(ss))

## [1] 69.02233

# Calculate upper bound of 95% CI
mn + 1.96*(14/sqrt(ss))

## [1] 77.92504

# Calculate lower bound of 99% CI
mn - 2.58*(14/sqrt(ss))

## [1] 67.61425

# Calculate upper bound of 99% CI
mn + 2.58*(14/sqrt(ss))

## [1] 79.33312
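
The bounds above hard-code the critical values 1.645, 1.96, and 2.58. As an alternative sketch, the same formula \(\bar{x} \pm z^{*}\sigma/\sqrt{n}\) can be wrapped in a helper that looks up \(z^{*}\) with qnorm(). The name z_interval is hypothetical, and the sample mean and sample size are copied from the output above. Because qnorm() returns about 2.576 rather than the rounded 2.58, the 99% interval from this helper differs slightly from the one above.

```r
# Hypothetical helper: z-based confidence interval for a mean with known sigma
z_interval <- function(xbar, sigma, n, level = 0.95) {
  z_star <- qnorm(1 - (1 - level) / 2)   # two-sided critical value
  margin <- z_star * sigma / sqrt(n)     # margin of error
  c(lower = xbar - margin, upper = xbar + margin)
}

# Values reported above: sample mean 73.47368, n = 38, assumed sigma = 14
for (level in c(0.90, 0.95, 0.99)) {
  print(round(z_interval(73.47368, 14, 38, level), 2))
}
```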

Questions:

Interpret your 95% CI in plain language. I am 95% confident that the true mean pulse rate for all NSCC students is between 69.02 and 77.93 bpm.
How does the interval change as confidence increases? As the confidence level increases, the interval becomes wider, increasing the likelihood that the interval includes the true population mean.
Which interval would you report and why? I would report the 95% interval because it balances confidence (how likely the interval is to capture the true mean) against precision (how narrow the interval is).

Question 3: Hypothesis Testing with a Confidence Interval

Consider the national average pulse rate for US adults to be 72 bpm. Let’s test the claim that NSCC students differ from that national average.

\(H_0: \mu = 72\)
\(H_A: \mu \neq 72\)

Tasks:

Use your confidence interval – Does it contain 72?
Based on that – Do you reject or fail to reject the null hypothesis?
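
Both tasks can be checked directly in R. This is a minimal sketch that reuses the 95% bounds reported in Question 2. The variable names lower_95, upper_95, and mu_0 are illustrative and do not appear elsewhere in the project.

```r
# 95% CI bounds as reported in Question 2
lower_95 <- 69.02233
upper_95 <- 77.92504
mu_0 <- 72  # hypothesized national average (bpm)

# TRUE if the hypothesized mean lies inside the interval
contains_mu_0 <- lower_95 <= mu_0 && mu_0 <= upper_95
contains_mu_0

# Decision at alpha = 0.05: reject H0 only if 72 falls outside the 95% CI
decision <- if (contains_mu_0) "fail to reject H0" else "reject H0"
decision
```

Since 72 lies between the two bounds, the interval contains the hypothesized mean and the decision is to fail to reject the null hypothesis.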

Questions:

What does your result suggest about NSCC students? There isn’t sufficient evidence to say that the mean pulse rate of NSCC students differs from the national average of 72 bpm.
Does “fail to reject” mean NSCC students are the same as average? No. Failing to reject the null hypothesis only means this sample does not provide enough evidence of a difference. It does not prove that the mean pulse rate of NSCC students is exactly 72 bpm.

Question 4: Hypothesis Testing with a P-value

Task: Recall the sample data you got in question 1. For the hypotheses in question 3, compute the test statistic of that sample data and the p-value using pnorm().

# Calculate and store the z test statistic comparing the sample mean to 72 bpm
z <- (mn - 72) / (14 / sqrt(ss))
z

## [1] 0.6488857

# Calculate the p-value 
2*pnorm(abs(z), lower.tail = FALSE)

## [1] 0.5164123
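
Written out, the calculation above is as follows, using the sample mean and sample size from Question 1 and the assumed σ = 14:

\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{73.47368 - 72}{14/\sqrt{38}} \approx 0.649 \]

\[ p = 2\,P(Z \ge |z|) \approx 2(0.2582) \approx 0.516 \]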

Questions:

What does your p-value represent in context? The p-value of about 0.516 is the probability of getting a sample mean at least as far from 72 bpm (in either direction) as mine if the true population mean were actually 72 bpm.
Using an α = 0.05, is your conclusion the same as with the 95% confidence interval? If not, why might they differ? Yes, my conclusion is the same as with the 95% confidence interval. The p-value of about 0.516 is greater than α = 0.05, so I fail to reject the null hypothesis: there isn’t enough evidence to say that NSCC students differ from the national average.
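
The agreement between the two approaches is not a coincidence. For a two-sided z-test with known σ, rejecting when the p-value is below α is equivalent to the null value falling outside the (1 − α) confidence interval. The sketch below checks this on a few null values; the values in mu_0 other than 72 are hypothetical and chosen only for illustration.

```r
# Values from this report: sample mean, sample size, and assumed sigma
xbar <- 73.47368
n <- 38
sigma <- 14
alpha <- 0.05

se <- sigma / sqrt(n)
z_star <- qnorm(1 - alpha / 2)
ci <- c(xbar - z_star * se, xbar + z_star * se)

# Hypothetical null values to compare the two decision rules on
mu_0 <- c(65, 69, 72, 75, 78, 80)

# Rule 1: reject when the two-sided p-value is below alpha
p_values <- 2 * pnorm(abs((xbar - mu_0) / se), lower.tail = FALSE)
reject_by_p <- p_values < alpha

# Rule 2: reject when the null value falls outside the (1 - alpha) CI
reject_by_ci <- mu_0 < ci[1] | mu_0 > ci[2]

data.frame(mu_0, p_value = round(p_values, 4), reject_by_p, reject_by_ci)
```

The two logical columns match row by row, so either rule leads to the same decision.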

Question 5: Reflection

If you repeated this study of collecting NSCC students’ pulse rates to determine if they differ from the national average:

Could your conclusions change? Why? Yes, my conclusions could change because a different sample would produce a different sample mean, and therefore a different confidence interval and p-value.
What assumptions did you rely on in using these methods? I assumed that the population standard deviation is known (σ = 14) and that the sampling distribution of the sample mean is approximately normal.
Are there any limitations, flaws, questions, or concerns you have with the analysis done in this project? The possible limitations or concerns are that there is missing data in the data set and the sample may not truly represent all NSCC students.
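
To illustrate how conclusions could change across repeated studies, the sketch below simulates many studies of 38 students. It assumes, purely for illustration, that the true mean really is 72 bpm, that σ = 14, and that pulse rates are normally distributed. None of these is established by the data.

```r
# Hypothetical setup: the true mean really is 72 bpm with sigma = 14 (assumed)
set.seed(2024)
n <- 38
sigma <- 14
true_mean <- 72
margin <- qnorm(0.975) * sigma / sqrt(n)  # 95% margin of error

# Repeat the study 10,000 times and record whether each 95% CI misses 72
misses <- replicate(10000, {
  xbar <- mean(rnorm(n, mean = true_mean, sd = sigma))
  abs(xbar - true_mean) > margin
})

# Proportion of repeated studies that would wrongly reject H0
mean(misses)
```

Under these assumptions the proportion should land near 0.05, so roughly 1 study in 20 would reject the null hypothesis even though it is true.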

MAT143H - Introduction to Statistics Honors

Lilyanna Romero

Due: Tuesday, April 12
