Project #4 - Introduction to Statistical Inference

Purpose – Are North Shore Students Different?

In this project, students will demonstrate their understanding of the normal distribution, sampling distributions, confidence intervals and hypothesis tests to determine if NSCC students differ from typical averages or if any differences are just due to random variation.

Question 1: Sample v. Population

Tasks:

Load and store the sample NSCC Student Dataset using the read.csv() function.

# Store the NSCC student dataset in environment
getwd()

## [1] "/Users/jadaperez/Desktop/hon stats"

nscc_students <- read.csv("/Users/jadaperez/Desktop/hon stats/nscc_student_data.csv")

Find the sample mean and sample size of the PulseRate variable in this dataset and answer the question that follows below.

# Mean pulse rate of this sample
mean(nscc_students$PulseRate, na.rm = TRUE)

## [1] 73.47368

# Find the sample size of pulse rates (hint: it's the number of non-NA values)
table(is.na(nscc_students$PulseRate))

## 
## FALSE  TRUE 
##    38     2
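The sample size can also be counted directly instead of being read off the table. The sketch below uses a small made-up vector, pulse_example, in place of nscc_students$PulseRate so that it runs on its own; its values are illustrative and are not from the dataset.

```r
# Hypothetical pulse rates standing in for nscc_students$PulseRate
pulse_example <- c(72, NA, 65, 80, NA, 70)

# is.na() flags missing values; negating and summing counts the rest
n_example <- sum(!is.na(pulse_example))
n_example
```

Applied to the real column, the same expression should return 38, the FALSE count in the table above.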

Questions:

Do you expect the sample mean to equal the true population mean? Why? No. The sample mean is an estimate from only 38 students, so random sampling variation will almost always leave it somewhat above or below the true population mean, and any outliers in the sample can add to that gap.
If we took a different sample, would we get the same results? No, a different sample would contain different students and different values, so its mean would most likely differ from 73.47 bpm.
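Sampling variation can be illustrated with a short simulation. This is only a sketch: the population here is assumed to be normal with mean 72 and standard deviation 14, which is a convenient choice for the example and not something known about NSCC students.

```r
# Assumed population for illustration only: normal, mean 72, sd 14
set.seed(1)
sample_a <- rnorm(38, mean = 72, sd = 14)
sample_b <- rnorm(38, mean = 72, sd = 14)

# Two samples from the same population give two different means
mean_a <- mean(sample_a)
mean_b <- mean(sample_b)
c(mean_a, mean_b)
```

Neither simulated mean lands exactly on 72, and the two do not match each other, even though both samples come from the same population.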

Question 2: Confidence Intervals

Task: Construct 90%, 95%, and 99% Confidence Intervals for the mean pulse rate of all NSCC students. Assume that σ = 14.

# Store mean
mean_pulse <- mean(nscc_students$PulseRate, na.rm=TRUE)

# Calculate lower bound of 95% CI
mean_pulse - 1.96*(14/sqrt(38))

## [1] 69.02233

# Calculate upper bound of 95% CI
mean_pulse + 1.96*(14/sqrt(38))

## [1] 77.92504

# Calculate lower bound of 99% CI
mean_pulse - 2.58*(14/sqrt(38))

## [1] 67.61425

# Calculate upper bound of 99% CI
mean_pulse + 2.58*(14/sqrt(38))

## [1] 79.33312

# Calculate lower bound of 90% CI
mean_pulse - 1.645*(14/sqrt(38))

## [1] 69.73772

# Calculate upper bound of 90% CI
mean_pulse + 1.645*(14/sqrt(38))

## [1] 77.20964
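The three intervals follow the same formula, so they can also be produced by one small function. The sketch below is one possible way to do that, not a required method: z_interval is a hypothetical helper name, and it uses qnorm() for the critical values instead of the rounded 1.645, 1.96, and 2.58, so its bounds may differ from those above in the second or third decimal place.

```r
# Summary values taken from the output above
xbar  <- 73.47368  # sample mean pulse rate
sigma <- 14        # assumed population standard deviation
n     <- 38        # number of non-missing pulse rates

# Hypothetical helper: z-interval for a mean when sigma is known
z_interval <- function(xbar, sigma, n, conf) {
  z_star <- qnorm(1 - (1 - conf) / 2)  # two-sided critical value
  margin <- z_star * sigma / sqrt(n)   # margin of error
  c(lower = xbar - margin, upper = xbar + margin)
}

ci_90 <- z_interval(xbar, sigma, n, 0.90)
ci_95 <- z_interval(xbar, sigma, n, 0.95)
ci_99 <- z_interval(xbar, sigma, n, 0.99)
rbind(ci_90, ci_95, ci_99)
```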

Questions:

Interpret your 95% CI in plain language

I am 95% confident that the population mean pulse rate is between 69.02 and 77.93 bpm.

How does the interval change as confidence increases?

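The interval widens as confidence increases: the 90% interval is (69.74, 77.21), the 95% interval is (69.02, 77.93), and the 99% interval is (67.61, 79.33).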
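The widening can be checked numerically. The width of a z-interval is twice the margin of error, 2 × z* × σ/√n, so it grows with the critical value z*. A minimal sketch, assuming σ = 14 and n = 38 as in the task:

```r
sigma <- 14  # assumed population standard deviation
n     <- 38  # sample size

conf_levels <- c(0.90, 0.95, 0.99)
z_star <- qnorm(1 - (1 - conf_levels) / 2)  # critical value for each level

# Width of each interval: twice the margin of error
widths <- 2 * z_star * sigma / sqrt(n)
names(widths) <- c("90%", "95%", "99%")
widths
```

The widths increase from left to right, so the extra confidence is paid for with a less precise estimate.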

Which interval would you report and why?

I’d report the 95% interval because it’s the conventional choice. It is also a reasonable middle ground: it is narrower than the 99% interval and gives more confidence than the 90% interval.

Question 3: Hypothesis Testing with a Confidence Interval

Consider the national average pulse rate for US adults to be 72 bpm. Let’s test the claim that NSCC students differ from that national average.

\(H_0: \mu = 72\)
\(H_A: \mu \neq 72\)

Tasks:

Use your confidence interval – Does it contain 72?

Yes.

Based on that – Do you reject or fail to reject the null hypothesis?

I fail to reject the null hypothesis.
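The same decision can be written as a short check. This sketch copies the 95% bounds from the output in Question 2; the variable names are only illustrative.

```r
# 95% interval bounds as reported in Question 2
ci_lower <- 69.02233
ci_upper <- 77.92504
mu_0 <- 72  # value of the mean under the null hypothesis

# Reject H0 at the 5% level only if mu_0 falls outside the interval
contains_null <- mu_0 >= ci_lower && mu_0 <= ci_upper
decision <- if (contains_null) "fail to reject H0" else "reject H0"
decision
```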

Questions:

What does your result suggest about NSCC students?

There is not sufficient evidence to say that the mean pulse rate of NSCC students differs from the national average of 72 bpm for US adults.

Does “fail to reject” mean NSCC students are the same as average?

No, it means we don’t have enough evidence to say they are different, not that they’re the same.

Question 4: Hypothesis Testing with a P-value

Task: Recall the sample data you got in question 1. For the hypotheses in question 3, compute the test statistic of that sample data and the p-value using pnorm().

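# Upper-tail probability of a sample mean of 73.47 or more when mu = 72
pnorm(73.47, 72, 14/sqrt(38), lower.tail=FALSE)

## [1] 0.2587307

# Double the tail probability because the alternative is two-sided
2*0.2587307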

## [1] 0.5174614
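The task also asks for the test statistic, which the pnorm() call above handles implicitly. A sketch that computes it explicitly, assuming the same rounded mean of 73.47 and σ = 14:

```r
xbar  <- 73.47  # sample mean, rounded as in the pnorm() call above
mu_0  <- 72     # mean under the null hypothesis
sigma <- 14     # assumed population standard deviation
n     <- 38     # sample size

# z statistic: distance of the sample mean from mu_0 in standard errors
z_stat <- (xbar - mu_0) / (sigma / sqrt(n))

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z_stat))
c(z = z_stat, p = p_value)
```

The statistic is about 0.65 standard errors above 72, and the p-value agrees with the 0.517 obtained above.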

Questions:

What does your p-value represent in context?

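The p-value is the probability of getting a sample mean at least as far from 72 bpm as mine, in either direction, if the true population mean really is 72 bpm. In this case, there is about a 51.7% chance of seeing a sample mean that differs from 72 by 1.47 bpm or more when the null hypothesis is true, so my result is not unusual.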

Using an α = 0.05, is your conclusion the same as with the 95% confidence interval? If not, why might they differ?

Yes, it is the same as with the 95% confidence interval. Since my p-value (0.517) is larger than α = 0.05, I again fail to reject the null hypothesis.

Question 5: Reflection

If you repeated this study of collecting NSCC students’ pulse rates to determine if they differ from the national average:

Could your conclusions change? Why?

Yes, my conclusions could change. A repeated study would use a different random sample with different values, so the sample mean, confidence interval, and p-value would all be different and could lead to a different decision.
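How often a repeated study would reach a different decision can be explored by simulation. The sketch below rests on assumptions made only for illustration: the null hypothesis is true (μ = 72), pulse rates are normal with σ = 14, and each study samples 38 students.

```r
set.seed(42)
mu_0  <- 72    # assumed true mean (null hypothesis holds)
sigma <- 14    # assumed population standard deviation
n     <- 38    # students per simulated study
reps  <- 2000  # number of simulated studies

# Two-sided z-test p-value for each simulated study
p_values <- replicate(reps, {
  x <- rnorm(n, mean = mu_0, sd = sigma)
  z <- (mean(x) - mu_0) / (sigma / sqrt(n))
  2 * pnorm(-abs(z))
})

# Share of studies that would reject H0 at alpha = 0.05
reject_rate <- mean(p_values < 0.05)
reject_rate
```

Under these assumptions the rejection rate should land near 0.05, so roughly one repeated study in twenty would reject the null hypothesis purely by chance.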

What assumptions did you rely on in using these methods?

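I assumed that σ = 14 is the true population standard deviation, that the 38 students are a random sample with independent pulse rates, and that the sampling distribution of the sample mean is approximately normal. The last assumption is reasonable if pulse rates are approximately normal in the population, or because n = 38 is large enough for the Central Limit Theorem to apply.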

Are there any limitations, flaws, questions, or concerns you have with the analysis done in this project?

There could be many concerns. For example, if fully online NSCC students were not included in the data, the sample would not be representative of the entire student population.

MAT143H - Introduction to Statistics Honors

Jada Perez

Due: Tuesday, April 7
