Project #4 - Sampling Distributions

Purpose

In this project, students will demonstrate their understanding of the normal distribution, sampling distributions, and confidence intervals and hypothesis tests.

Instructions

Update the author line at the top to have your name in it.
You must knit this document to an html file and publish it to RPubs. Once you have published your project to the web, you must copy the url link into the appropriate Course Project assignment in MyOpenMath before 9:00am on the due date.
Answer all the following questions completely. Some may ask for written responses.
Use R chunks for code to be evaluated where needed and always comment all of your code so the reader can understand what your code aims to accomplish.
Proofread your knitted document before publishing it to ensure it looks the way you want it to. Use double spaces at the end of a line to create a line break, and make sure no text is formatted as a header when it isn’t supposed to be.
Delete these instructions from your published project.

Question 1

Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities. Here, x denotes the IQ of the randomly selected person.
a. P(x > 140)

pnorm(140, 100, 15, lower.tail = FALSE)

## [1] 0.003830381

0.003830381

b. P(x < 110)

pnorm(110, 100, 15)

## [1] 0.7475075

0.7475075

c. What is the probability that a randomly selected student will have an IQ between 80 and 120?

pnorm(120, 100, 15) - pnorm(80,100, 15)

## [1] 0.8175776

0.8175776
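
The three probabilities above can be cross-checked by standardizing each IQ value to a z-score and using the standard normal distribution. This is a minimal sketch; the helper name `z_score` is hypothetical and not part of the project template.

```r
# Hypothetical helper: convert an IQ value to a z-score
z_score <- function(x, mu = 100, sigma = 15) (x - mu) / sigma

# P(x > 140): upper tail of the standard normal
p_a <- pnorm(z_score(140), lower.tail = FALSE)

# P(x < 110): lower tail of the standard normal
p_b <- pnorm(z_score(110))

# P(80 < x < 120): difference of two lower tails
p_c <- pnorm(z_score(120)) - pnorm(z_score(80))

# Rounded for easier comparison with the chunk outputs above
round(c(a = p_a, b = p_b, c = p_c), 4)
```

Both routes should agree, since `pnorm(x, mean, sd)` standardizes internally.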

Question 2

Continue to assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities.
a. What is the probability that a randomly selected student will have an IQ greater than 110?

pnorm(110, 100, 15, lower.tail = FALSE)

## [1] 0.2524925

b. Suppose that a random sample of 12 students is selected. What is the probability that their mean IQ is greater than 110?

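# The mean of 12 IQ scores is normally distributed with mean 100
# and standard error 15/sqrt(12), so no simulated sample is needed
se_IQ <- 15/sqrt(12)
round(pnorm(110, 100, se_IQ, lower.tail = FALSE), 4)

## [1] 0.0105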
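
The sampling distribution of the mean for samples of 12 has mean 100 and standard error 15/sqrt(12), which is much narrower than the distribution of individual scores. A minimal simulation sketch, assuming an arbitrary seed and 20,000 replications (both are illustrative choices, not part of the assignment), compares the theoretical probability with an empirical estimate:

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Theoretical answer: the mean of n = 12 scores is normal with
# mean 100 and standard error 15 / sqrt(12)
n <- 12
p_theory <- pnorm(110, mean = 100, sd = 15 / sqrt(n), lower.tail = FALSE)

# Simulation check: draw many samples of 12 and record each sample mean
sample_means <- replicate(20000, mean(rnorm(n, mean = 100, sd = 15)))

# Proportion of simulated sample means that exceed 110
p_sim <- mean(sample_means > 110)

c(theory = p_theory, simulated = p_sim)
```

The two values should be close, and both are far smaller than the probability for a single student in part a.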

Question 3

Load and store the sample NSCC Student Dataset using the read.csv() function. Find the mean, standard deviation, and sample size of the PulseRate variable in this dataset. Do you think it is likely or unlikely that the population mean pulse rate for all NSCC students is exactly equal to that sample mean found?

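I don’t think it is likely that the population mean pulse rate for all NSCC students is exactly equal to the sample mean. Sample means vary from sample to sample, so the sample mean is only an estimate of the population mean.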

nsccdata <- read.csv("nscc_student_data.csv")
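
The mean, standard deviation, and sample size requested in the question could be computed along the following lines. This is a sketch on a made-up vector standing in for `nsccdata$PulseRate`; the values and the names `pulse`, `mean_pulse`, `sd_pulse`, and `n_pulse` are hypothetical. It counts only non-missing values for the sample size, which matters whenever the variable contains `NA`s.

```r
# Hypothetical stand-in for nsccdata$PulseRate (values are made up)
pulse <- c(72, 68, NA, 80, 75, 64, NA, 90)

mean_pulse <- mean(pulse, na.rm = TRUE)  # mean of the non-missing values
sd_pulse   <- sd(pulse, na.rm = TRUE)    # standard deviation of the non-missing values
n_pulse    <- sum(!is.na(pulse))         # sample size counts only non-missing values

c(mean = mean_pulse, sd = sd_pulse, n = n_pulse)
```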

Question 4

Construct a 95% confidence interval for the mean pulse rate of all NSCC students and conclude your result in a complete sentence. (Note: we can create a valid confidence interval here since n > 30)

# Store mean and std dev
mn <- mean(nsccdata$PulseRate, na.rm = TRUE)
std <- sd(nsccdata$PulseRate, na.rm = TRUE)


# Calculate lower bound of 95% CI
mn - 1.96*(std/sqrt(40))

## [1] 69.59647

# Calculate upper bound of 95% CI
mn + 1.96*(std/sqrt(40))

## [1] 77.3509

We are 95% confident that the mean pulse rate of all NSCC students is between 69.60 and 77.35 bpm.

Question 5

Construct a 99% confidence interval for the mean pulse rate of all NSCC students and conclude your result in a complete sentence below the R chunk.

# Calculate lower bound of 99% CI
mn - 2.58*(std/sqrt(40))

## [1] 68.37

# Calculate upper bound of 99% CI
mn + 2.58*(std/sqrt(40))

## [1] 78.57736

We are 99% confident that the mean pulse rate of all NSCC students is between 68.37 and 78.58 bpm.

Question 6

Describe and explain the difference you observe in your confidence intervals for questions 4 and 5.
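The 99% confidence interval (68.37 to 78.58) is wider than the 95% confidence interval (69.60 to 77.35) by about 1.2 bpm on each side. To be more confident that the interval captures the population mean, the critical value grows from 1.96 to 2.58, which widens the interval.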
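
The effect of the confidence level on interval width can be illustrated with a small helper that looks up the critical value with `qnorm()` instead of hard-coding it. This is a sketch under stated assumptions: the function name `mean_ci` is hypothetical, the data are simulated rather than taken from the NSCC dataset, and the sample size is counted from the non-missing values.

```r
# Hypothetical helper: z-based confidence interval for a mean
mean_ci <- function(x, level = 0.95) {
  x <- x[!is.na(x)]                 # drop missing values
  n <- length(x)                    # sample size after dropping NAs
  z <- qnorm(1 - (1 - level) / 2)   # critical value, about 1.96 for 95%
  se <- sd(x) / sqrt(n)             # standard error of the mean
  c(lower = mean(x) - z * se, upper = mean(x) + z * se)
}

# Made-up data purely for illustration
set.seed(42)
x <- rnorm(40, mean = 73, sd = 12)

ci95 <- mean_ci(x, 0.95)
ci99 <- mean_ci(x, 0.99)

# The 99% interval is wider because its critical value is larger
c(width95 = unname(diff(ci95)), width99 = unname(diff(ci99)))
```

For the same data, the ratio of the two widths equals the ratio of the critical values, roughly 2.58/1.96.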

Question 7

In the Fall 2009 semester of the 2009-10 academic year, the average NSCC student took 12.1 credits. I’m curious if that average differs among NSCC students last year (a sample of which is in the NSCC student dataset). Conduct a hypothesis test by a confidence interval to determine if the average credits differs last year from Fall 2009.

Write the hypotheses (Try to emulate the “LaTeX” format I used in lecture notes. Otherwise, just give your best effort.)
\(H_0\): \(\mu_{2017}\) = 12.1
\(H_A\): \(\mu_{2017}\) \(\neq\) 12.1
Create confidence interval

# Store mean of Credits variable

mncred <- mean(nsccdata$Credits)

# Store standard deviation of Credits variable

sdcred <- sd(nsccdata$Credits)

# Store sample size of Credits variable
sample_cred <- length(nsccdata$Credits)

# Lower bound of 95% CI
mncred - 1.96*(sdcred/sqrt(sample_cred))

## [1] 10.73056

# Upper bound of 95% CI
mncred + 1.96*(sdcred/sqrt(sample_cred))

## [1] 12.81944

Make decision to reject \(H_0\) or fail to reject \(H_0\) based on confidence interval

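Because the 95% confidence interval (10.73 to 12.82) contains the hypothesized value of 12.1, I fail to reject \(H_0\).

Write a concluding statement

Based on the evidence, the 95% confidence interval contains 12.1. Therefore, there is not sufficient evidence that the average number of credits taken by NSCC students last year differs from the Fall 2009 average of 12.1 credits.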
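
The decision rule for a test by confidence interval can be written as a simple check: reject \(H_0\) only when the hypothesized value falls outside the interval. A minimal sketch using the bounds printed in the chunk output above; the variable names are hypothetical.

```r
# Bounds copied from the 95% confidence interval output above
ci_lower <- 10.73056
ci_upper <- 12.81944

# Hypothesized mean from Fall 2009
mu_0 <- 12.1

# Reject H0 only when the hypothesized value lies outside the interval
reject_h0 <- mu_0 < ci_lower || mu_0 > ci_upper
reject_h0
# FALSE: 12.1 lies inside the interval
```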

Question 8

NSCC is investigating whether NSCC students have a higher than average stress level which can be identified by a higher than average standing pulse rate. Conduct a hypothesis test by a p-value to determine if NSCC students have a higher pulse rate than the national average of 72 bpm for adults.

Write the hypotheses (Try to emulate the “LaTeX” format I used in lecture notes. Otherwise, just give your best effort.)

\(H_0\): \(\mu_{2017}\) = 72
\(H_A\): \(\mu_{2017}\) \(>\) 72

Calculate p-value of getting sample statistics by chance

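# Probability of getting a sample mean at least this large by random chance if the population mean was indeed 72 bpm
mnpr <- mean(nsccdata$PulseRate, na.rm = TRUE)
sdpr <- sd(nsccdata$PulseRate, na.rm = TRUE)

# Use the sampling distribution of the mean: centered at 72 with
# standard error sdpr/sqrt(40), the same sample size used in Questions 4 and 5
round(pnorm(mnpr, 72, sdpr/sqrt(40), lower.tail = FALSE), 2)

## [1] 0.23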

Make decision to reject \(H_0\) or fail to reject \(H_0\) at a significance level of 0.05 based on p-value.

Since the p-value is greater than 0.05, we fail to reject \(H_0\).

Write a concluding statement

With the p-value being greater than 0.05, there is not sufficient evidence that North Shore students’ mean pulse rate is higher than the national average of 72 bpm for adults.
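
A p-value for a one-sided test about a mean is the probability of seeing a sample mean at least as large as the observed one when the null hypothesis is true, so it is based on the standard error rather than the raw standard deviation. A minimal sketch, assuming a z-test is appropriate; the function name `p_value_upper` and the summary statistics passed to it are hypothetical and not taken from the NSCC dataset.

```r
# Hypothetical helper: p-value for H_A: mu > mu_0 using a z-test
p_value_upper <- function(xbar, mu_0, s, n) {
  se <- s / sqrt(n)  # standard error of the sample mean
  # Upper-tail probability of the observed mean under H0
  pnorm(xbar, mean = mu_0, sd = se, lower.tail = FALSE)
}

# Made-up summary statistics purely for illustration
p_example <- p_value_upper(xbar = 75, mu_0 = 72, s = 12, n = 36)
p_example

# Compare with the significance level to make the decision
p_example < 0.05
```

With real data, `n` should be the number of non-missing observations of the variable being tested.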