Project #4

Purpose

In this project, students will demonstrate their understanding of the normal distribution, sampling distributions, confidence intervals, and hypothesis tests.

Question 1

Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities. Here, x denotes the IQ of the randomly selected person.

P(x > 110)

# Use the pnorm() function to find the upper-tail probability P(x > 110)
pnorm(110, mean = 100, sd = 15, lower.tail = FALSE)

## [1] 0.2524925

Here P(x > 110) = 0.2525, so the probability that x is greater than 110 is 25.25%.

P(x < 110)

# Use the pnorm() function to find the lower-tail probability P(x < 110)
pnorm(110, mean = 100, sd = 15)

## [1] 0.7475075

Here P(x < 110) = 0.7475, so the probability that x is less than 110 is 74.75%.
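As a quick check on the two answers above, the same probabilities can be obtained by standardizing 110 to a z-score and using the standard normal distribution. This is only a sketch; the variable names (`mu`, `sigma`, `z`, `p_upper`, `p_lower`) are chosen here for illustration.

```r
# Parameters given in the question
mu <- 100
sigma <- 15

# Standardize x = 110 to a z-score
z <- (110 - mu) / sigma

# Upper- and lower-tail probabilities on the standard normal scale
p_upper <- pnorm(z, lower.tail = FALSE)  # P(x > 110)
p_lower <- pnorm(z)                      # P(x < 110)

# The two tails are complements, so they should sum to 1
p_upper + p_lower
```

Because the normal distribution is continuous, P(x = 110) is zero, which is why the two tail probabilities add up to 1.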

Question 2

What is the probability that a randomly selected student will have an IQ between 80 and 120? (We haven’t explicitly learned this. Think about it though.)

# Use the pnorm() function to find the probability: P(x < 120) - P(x < 80)
pnorm(120,100,15, lower.tail = TRUE) - pnorm(80,100,15, lower.tail = TRUE)

## [1] 0.8175776

The probability that a randomly selected student has an IQ between 80 and 120 is 81.8%.
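The subtraction above works for any interval, so it can be wrapped in a small helper. The function name `prob_between` is hypothetical, introduced here only to illustrate the idea.

```r
# Hypothetical helper: P(a < x < b) for a normal distribution,
# computed as the difference of two lower-tail probabilities
prob_between <- function(a, b, mean, sd) {
  pnorm(b, mean = mean, sd = sd) - pnorm(a, mean = mean, sd = sd)
}

# IQ between 80 and 120
p_between <- prob_between(80, 120, mean = 100, sd = 15)
p_between

# Because 80 and 120 are equally far from the mean, symmetry gives
# the same answer: 1 minus the two equal tails
p_symmetric <- 1 - 2 * pnorm(80, mean = 100, sd = 15)
p_symmetric
```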

Suppose that a random sample of 12 students is selected. What is the probability that their mean IQ is greater than 110?

# Use the pnorm() function with the standard error, sigma/sqrt(n), as the standard deviation
pnorm(110, mean = 100, sd = 15/sqrt(12), lower.tail = FALSE)

## [1] 0.01046067

The probability that a random sample of twelve students has a mean IQ greater than 110 is 1.05%.
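The calculation above relies on the sample mean of n = 12 IQ scores being normally distributed with standard deviation 15/sqrt(12). A simulation can make that plausible. The sketch below is an illustration only: the seed, the number of simulated samples, and all variable names are arbitrary choices, not part of the assignment.

```r
set.seed(42)  # arbitrary seed so the simulation is reproducible

mu <- 100
sigma <- 15
n <- 12           # students per sample
n_sims <- 100000  # number of simulated samples (arbitrary)

# Each row is one simulated sample of 12 IQ scores
samples <- matrix(rnorm(n_sims * n, mean = mu, sd = sigma), nrow = n_sims)
sample_means <- rowMeans(samples)

# Proportion of simulated sample means above 110
simulated <- mean(sample_means > 110)

# Exact probability from the sampling distribution of the mean
exact <- pnorm(110, mean = mu, sd = sigma / sqrt(n), lower.tail = FALSE)

c(simulated = simulated, exact = exact)
```

The simulated proportion should land close to the exact value, and the standard deviation of `sample_means` should be close to 15/sqrt(12), roughly 4.33.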

Question 3

Load and store the sample NSCC Student Dataset using the read.csv() function. Find the mean, standard deviation, and sample size of the PulseRate variable in this dataset and answer the question that follows below.

# Store the NSCC student dataset in environment
nscc_students <- read.csv("nscc_student_data.csv")

# Find the mean pulse rate of this sample
mean(nscc_students$PulseRate, na.rm = TRUE)

## [1] 73.47368

# Find the std dev of pulse rates of this sample
sd(nscc_students$PulseRate,na.rm = TRUE)

## [1] 12.51105

# Find the sample size of pulse rates (hint: it's how many non-NA values there are)
table(is.na(nscc_students$PulseRate))

## 
## FALSE  TRUE 
##    38     2

The mean of the PulseRate variable is 73.5, the standard deviation is 12.5, and the sample size is 38 students (the 2 missing values are excluded).
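The count of non-missing values can also be returned as a single number, which is convenient to store and reuse. Since the dataset file is not reproduced here, the sketch below uses a small made-up vector (`pulse_example`) purely for illustration.

```r
# Made-up example vector with two missing values (not the NSCC data)
pulse_example <- c(72, NA, 80, 64, NA, 90)

# is.na() flags missing values; negating and summing counts the rest
n_example <- sum(!is.na(pulse_example))
n_example
```

Applied to the real data, `sum(!is.na(nscc_students$PulseRate))` would be expected to match the FALSE count in the table above.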

Do you think it is likely or unlikely that the population mean pulse rate for all NSCC students is exactly equal to the sample mean found above?

I think it is unlikely that the population mean pulse rate for all NSCC students is exactly equal to the sample mean. Sample means vary from sample to sample, so a single sample of 38 students will almost never match the population mean exactly.

Question 4

If we assume the population standard deviation of pulse rates for all NSCC students is \(\sigma = 12\), construct a 95% confidence interval for the mean pulse rate of all NSCC students and conclude in a complete sentence below.
(Note: we can create a valid confidence interval here since n > 30)

# Store mean
mean_PR <- mean(nscc_students$PulseRate, na.rm = TRUE)

# Calculate lower bound of 95% CI
mean_PR - 1.96*(12/sqrt(38))

## [1] 69.65824

# Calculate upper bound of 95% CI
mean_PR + 1.96*(12/sqrt(38))

## [1] 77.28913

I am 95% confident that the mean pulse rate for all NSCC students is between 69.66 and 77.29.
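The bounds above hard-code the critical value 1.96. A slightly more general sketch looks the critical value up with `qnorm()` instead. The function name `z_interval` is hypothetical, and the inputs are the sample mean, assumed \(\sigma\), and sample size reported earlier.

```r
# Hypothetical helper: z-based confidence interval for a mean
# when the population standard deviation sigma is assumed known
z_interval <- function(xbar, sigma, n, level = 0.95) {
  alpha <- 1 - level
  z_star <- qnorm(1 - alpha / 2)      # critical value, about 1.96 for 95%
  margin <- z_star * sigma / sqrt(n)  # margin of error
  c(lower = xbar - margin, upper = xbar + margin)
}

# Values taken from the output above: mean 73.47368, sigma 12, n 38
ci_95 <- z_interval(xbar = 73.47368, sigma = 12, n = 38, level = 0.95)
ci_95
```

The result agrees with the hand calculation to about four decimal places; the tiny difference comes from using the unrounded critical value rather than 1.96.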

Question 5

Construct a 99% confidence interval for the mean pulse rate of all NSCC students and conclude your result in a complete sentence below the R chunk.

# Calculate lower bound of 99% CI
mean_PR - 2.58*(12/sqrt(38))

## [1] 68.45131

# Calculate upper bound of 99% CI
mean_PR + 2.58*(12/sqrt(38))

## [1] 78.49606

I am 99% confident that the mean pulse rate for all NSCC students is between 68.45 and 78.50.

Question 6

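Describe and explain the difference you observe in your confidence intervals for questions 4 and 5.

The 99% confidence interval (68.45 to 78.50) is wider than the 95% confidence interval (69.66 to 77.29): its lower bound is lower and its upper bound is higher. To be more confident that the interval captures the population mean, we need a larger critical value (2.58 instead of 1.96), which increases the margin of error.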
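The two intervals can be compared directly by computing the critical value and margin of error for each confidence level. This is a sketch using the same \(\sigma = 12\) and n = 38 as above; the variable names are illustrative.

```r
sigma <- 12
n <- 38

# Confidence levels to compare
levels <- c(0.95, 0.99)

# Two-sided critical values from the standard normal distribution
z_star <- qnorm(1 - (1 - levels) / 2)

# Margin of error and total interval width at each level
margin <- z_star * sigma / sqrt(n)
comparison <- data.frame(level = levels, z_star = z_star,
                         margin = margin, width = 2 * margin)
comparison
```

The standard error, 12/sqrt(38), is identical in both rows, so the difference in width comes entirely from the critical value.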

Question 7

In the Fall 2009 semester of the 2009-10 academic year, the average NSCC student took 12.1 credits with \(\sigma = 3.1\). I’m curious if that average differs among NSCC students last year (a sample of which is in the NSCC student dataset). Conduct a hypothesis test using a confidence interval to determine whether last year's average credits differ from the Fall 2009 average.

Write the hypotheses (Try to emulate the “latex” format I used in lecture notes. Otherwise, just give your best effort.)

\(H_0:\mu = 12.1\) \(H_a:\mu\neq12.1\)

Create confidence interval

# Calculate mean of Credits variable
mean(nscc_students$Credits, na.rm = TRUE)

## [1] 11.775

# Calculate sample size of Credits variable
table(is.na(nscc_students$Credits))

## 
## FALSE 
##    40

# Store the mean of the Credits variable
mean_CR <- mean(nscc_students$Credits, na.rm = TRUE)

# Lower bound of 95% CI
mean_CR - 1.96*(3.1/sqrt(40))

## [1] 10.8143

# Upper bound of 95% CI
mean_CR + 1.96*(3.1/sqrt(40))

## [1] 12.7357

Make decision to reject H0 or fail to reject H0 based on confidence interval

Because the Fall 2009 mean of 12.1 credits falls inside the 95% confidence interval (10.81, 12.74), we fail to reject \(H_0\).

Write a concluding statement

At the 5% significance level, there is not sufficient evidence to conclude that the average number of credits taken by NSCC students last year differs from the Fall 2009 average of 12.1.
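As a cross-check on the confidence-interval decision, the same two-sided test can be sketched with a z statistic and p-value. The inputs are the sample mean (11.775) and sample size (40) from the output above and the stated \(\sigma = 3.1\); the variable names are illustrative.

```r
xbar <- 11.775  # sample mean of Credits, from the output above
mu_0 <- 12.1    # hypothesized mean under H0 (Fall 2009)
sigma <- 3.1    # population standard deviation given in the question
n <- 40         # sample size, from the output above

# Test statistic: how many standard errors the sample mean is from mu_0
z <- (xbar - mu_0) / (sigma / sqrt(n))

# Two-sided p-value: probability of a result at least this extreme under H0
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

c(z = z, p_value = p_value)

# Reject H0 at the 5% level only if the p-value is below 0.05
p_value < 0.05
```

A two-sided z test at the 5% level and a 95% z confidence interval always lead to the same decision, so this should agree with the conclusion drawn from the interval.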