Purpose

In this project, students will demonstrate their understanding of the normal distribution, sampling distributions, confidence intervals, and hypothesis tests.


Question 1

Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities. Here, x denotes the IQ of the randomly selected person.

  1. P(x > 110)
# Probability that x will be greater than 110 for a randomly selected person
1 - pnorm(110, mean = 100, sd = 15)
## [1] 0.2524925

The probability that x will be greater than 110 for a randomly selected person is 25.25%.

  2. P(x < 110)
# Probability that x will be less than 110 for a randomly selected person
pnorm(110, mean = 100, sd = 15)
## [1] 0.7475075

The probability that x will be less than 110 for a randomly selected person is 74.75%.
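
As a cross-check on the two answers above, the same probabilities can be obtained by standardizing the cutoff to a z-score first. This is a minimal sketch; the variable names (`mu`, `sigma`, `z`, `p_upper`, `p_lower`) are illustrative and not part of the original assignment.

```r
# Illustrative names; the values come from the problem statement
mu <- 100
sigma <- 15

# Standardize the cutoff: z = (x - mu) / sigma
z <- (110 - mu) / sigma

# Upper- and lower-tail probabilities on the standard normal scale
p_upper <- pnorm(z, lower.tail = FALSE)  # P(x > 110)
p_lower <- pnorm(z)                      # P(x < 110)

# The two tails are complements, so they should sum to 1
p_upper + p_lower
```

Because the normal distribution is continuous, P(x = 110) is 0, which is why the two answers add up to exactly 1.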

Question 2

  a. What is the probability that a randomly selected student will have an IQ between 80 and 120? (We haven’t explicitly learned this, but think about it.)

# Find the probability that a randomly selected student will have an IQ between 80 and 120
pnorm(120, mean = 100, sd = 15, lower.tail=TRUE) - pnorm(80, mean = 100, sd = 15, lower.tail=TRUE)
## [1] 0.8175776

The probability that a randomly selected student will have an IQ between 80 and 120 is 81.76%.

  b. Suppose that a random sample of 12 students is selected. What is the probability that their mean IQ is greater than 110?
# Probability of sample mean 110 or higher for a sample of n = 12
pnorm(110, 100, sd = 15/sqrt(12), lower.tail = FALSE)
## [1] 0.01046067

The probability that the mean IQ of 12 randomly selected students will be greater than 110 is 1.05%.
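
The result above relies on the sampling distribution of the mean having standard deviation 15/sqrt(12). A quick simulation can make that plausible. This is only a sketch: the seed, the number of repetitions, and the variable names are arbitrary choices, not part of the assignment.

```r
set.seed(42)    # arbitrary seed so the simulation is reproducible
n <- 12         # sample size from the question
reps <- 100000  # number of simulated samples (arbitrary choice)

# Each row is one simulated sample of 12 IQ scores
samples <- matrix(rnorm(n * reps, mean = 100, sd = 15), nrow = reps, ncol = n)
sample_means <- rowMeans(samples)

# Empirical standard error vs. the theoretical 15 / sqrt(12)
sd(sample_means)
15 / sqrt(n)

# Simulated vs. theoretical probability that the sample mean exceeds 110
sim_p <- mean(sample_means > 110)
theory_p <- pnorm(110, mean = 100, sd = 15 / sqrt(n), lower.tail = FALSE)
c(simulated = sim_p, theoretical = theory_p)
```

The simulated proportion should land close to the theoretical value, with small differences due to random variation.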

Question 3

Load and store the sample NSCC Student Dataset using the read.csv() function. Find the mean, standard deviation, and sample size of the PulseRate variable in this dataset and answer the question that follows below.

# Store the NSCC student dataset in environment
nscc_students <- read.csv("E:/stats/nscc_student_data.csv")
# Find the mean pulse rate of this sample
mean(nscc_students$PulseRate, na.rm = TRUE)
## [1] 73.47368
# Find the std dev of pulse rates of this sample
sd(nscc_students$PulseRate, na.rm = TRUE)
## [1] 12.51105
# Find the sample size of pulse rates (hint: it is the number of non-NA values)
table(is.na(nscc_students$PulseRate))
## 
## FALSE  TRUE 
##    38     2

The probability that the population mean pulse rate for all NSCC students is exactly equal to the sample mean of 73.47 is essentially 0.
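
The `table(is.na(...))` call above shows both the missing and non-missing counts. If only the sample size is needed, counting the non-missing values directly is a common alternative. The sketch below uses a made-up `pulse` vector for illustration, not the NSCC dataset.

```r
# Hypothetical values for illustration only (not the NSCC data)
pulse <- c(72, NA, 80, 65, NA, 90)

# Sample size: number of non-missing observations
n <- sum(!is.na(pulse))

# Mean and standard deviation, ignoring missing values
xbar <- mean(pulse, na.rm = TRUE)
s <- sd(pulse, na.rm = TRUE)

c(n = n, mean = xbar, sd = s)
```

Storing `n` this way lets later calculations use the variable instead of a hard-coded sample size.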

Question 4

If we assume the population standard deviation of pulse rates for all NSCC students is \(\sigma = 12\), construct a 95% confidence interval for the mean pulse rate of all NSCC students and conclude in a complete sentence below.
(Note: we can create a valid confidence interval here since n > 30)

# Store sample mean and sample standard deviation
mean_PulseRate <- mean(nscc_students$PulseRate, na.rm = TRUE)
sd_PulseRate <- sd(nscc_students$PulseRate, na.rm = TRUE)

# Calculate lower bound of 95% CI
mean_PulseRate - 1.96*(12/sqrt(38))
## [1] 69.65824
# Calculate upper bound of 95% CI
mean_PulseRate + 1.96*(12/sqrt(38))
## [1] 77.28913

I am 95% confident that the mean pulse rate for all NSCC students is between 69.66 and 77.29 bpm.
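
The interval above uses the rounded critical value 1.96. As a sketch of how the same calculation could be generalized, the helper below (a hypothetical function named `z_ci`, not part of the assignment) looks up the critical value with `qnorm()` instead. The sample mean and sample size are copied from the output above rather than read from the dataset.

```r
# Hypothetical helper: z-based confidence interval for a mean with known sigma
z_ci <- function(xbar, sigma, n, conf_level = 0.95) {
  alpha <- 1 - conf_level
  z_star <- qnorm(1 - alpha / 2)      # two-sided critical value
  margin <- z_star * sigma / sqrt(n)  # margin of error
  c(lower = xbar - margin, upper = xbar + margin)
}

# Values taken from the output above: mean 73.47368, sigma = 12, n = 38
ci_95 <- z_ci(xbar = 73.47368, sigma = 12, n = 38)
ci_95
```

The bounds differ from the hand calculation only in later decimal places, because `qnorm(0.975)` is slightly smaller than 1.96.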

Question 5

Construct a 99% confidence interval for the mean pulse rate of all NSCC students and conclude your result in a complete sentence below the R chunk.

# Calculate lower bound of 99% CI
mean_PulseRate - 2.58*(12/sqrt(38))
## [1] 68.45131
# Calculate upper bound of 99% CI
mean_PulseRate + 2.58*(12/sqrt(38))
## [1] 78.49606

I am 99% confident that the mean pulse rate for all NSCC students is between 68.45 and 78.50 bpm.

Question 6

Describe and explain the difference you observe in your confidence intervals for questions 4 and 5.

The 95% confidence interval for the mean pulse rate of all NSCC students (question 4) runs from 69.66 to 77.29, and the 99% confidence interval (question 5) runs from 68.45 to 78.50. The 99% interval is wider because a higher confidence level requires a larger critical value (2.58 instead of 1.96), which increases the margin of error from about 3.82 to about 5.02 and so includes more plausible values for the population mean.
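
The effect of the confidence level on the interval width can be seen by computing the critical value and margin of error side by side. This is a small sketch with illustrative variable names; it assumes the same \(\sigma = 12\) and n = 38 used above.

```r
# Confidence levels from questions 4 and 5
conf_levels <- c(0.95, 0.99)

# Two-sided critical values from the standard normal distribution
z_star <- qnorm(1 - (1 - conf_levels) / 2)

# Margin of error with sigma = 12 and n = 38
margin <- z_star * 12 / sqrt(38)

data.frame(conf_level = conf_levels,
           z_star = round(z_star, 3),
           margin = round(margin, 2),
           width = round(2 * margin, 2))
```

Only the critical value changes between the two rows; the standard error 12/sqrt(38) is the same, so the wider interval comes entirely from demanding more confidence.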

Question 7

In the Fall 2009 semester of the 2009-10 academic year, the average NSCC student took 12.1 credits with \(\sigma = 3.1\). I’m curious if that average differs among NSCC students last year (a sample of which is in the NSCC student dataset). Conduct a hypothesis test by a confidence interval to determine if the average credits differs last year from Fall 2009.

  1. Write the hypotheses

\(H_0\): \(\mu\) = 12.1

\(H_A\): \(\mu\) \(\neq\) 12.1

  2. Create confidence interval
# Calculate mean of Credits variable
mean(nscc_students$Credits, na.rm = TRUE)
## [1] 11.775
# Calculate sample size of Credits variable
table(is.na(nscc_students$Credits))
## 
## FALSE 
##    40
# Store mean and standard deviation
mean_Credits <- mean(nscc_students$Credits, na.rm = TRUE)
sd_Credits <- sd(nscc_students$Credits, na.rm = TRUE)

# Lower bound of 95% CI
mean_Credits - 1.96*(3.1/sqrt(40))
## [1] 10.8143
# Upper bound of 95% CI
mean_Credits + 1.96*(3.1/sqrt(40))
## [1] 12.7357
  3. Make decision to reject H0 or fail to reject H0 based on confidence interval

Since 12.1 falls within the 95% confidence interval of 10.81 to 12.74, we fail to reject \(H_0\).

  4. Write a concluding statement

There is not sufficient evidence to support the claim that the average number of credits taken last year differs from the Fall 2009 average of 12.1.
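
A two-sided z-test at the 0.05 level leads to the same decision as checking whether 12.1 lies inside the 95% confidence interval. The sketch below illustrates this with values copied from the output above; the variable names are illustrative and this p-value approach is not what the question asked for.

```r
# Values taken from the question and the output above
xbar <- 11.775  # sample mean of Credits
mu0 <- 12.1     # hypothesized mean under H0
sigma <- 3.1    # assumed population standard deviation
n <- 40         # sample size

# Test statistic: how many standard errors the sample mean is from mu0
z <- (xbar - mu0) / (sigma / sqrt(n))

# Two-sided p-value: probability of a result at least this extreme in either tail
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Decision at the 0.05 significance level
reject <- p_value < 0.05
c(z = z, p_value = p_value, reject = reject)
```

Here `reject` is `FALSE`, matching the confidence-interval conclusion that we fail to reject \(H_0\).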

Question 8

NSCC is investigating whether NSCC students have a higher than average stress level which can be identified by a higher than average standing pulse rate. Conduct a hypothesis test by a p-value to determine if NSCC students have a higher pulse rate than the national average of 72 bpm for adults. Recall the assumption that \(\sigma = 12\) for NSCC student pulse rates.

  1. Write the hypotheses

\(H_0\): \(\mu\) = 72

\(H_A\): \(\mu\) > 72

  2. Calculate the p-value: the probability of getting a sample mean at least this high by chance if \(H_0\) is true
# Calculate mean of PulseRate variable
mean(nscc_students$PulseRate, na.rm = TRUE)
## [1] 73.47368
# Probability of a sample mean at least this high by random chance if the true mean were 72 bpm
pnorm(mean_PulseRate, mean = 72, sd = (12/sqrt(38)), lower.tail = FALSE)
## [1] 0.224515
  3. Make decision to reject H0 or fail to reject H0 at a significance level of 0.05 based on p-value.

We fail to reject \(H_0\) since the p-value of 0.2245 is greater than 0.05.

  4. Write a concluding statement

There is not sufficient evidence to support the claim that NSCC students have a higher pulse rate than the national average of 72 bpm for adults.
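
The same p-value can be reached by first computing a z test statistic, which makes it easier to see how far the sample mean is from 72 in standard-error units. This is a sketch with illustrative variable names; the sample mean and sample size are copied from the output above.

```r
# Values taken from the question and the output above
xbar <- 73.47368  # sample mean pulse rate
mu0 <- 72         # hypothesized mean under H0
sigma <- 12       # assumed population standard deviation
n <- 38           # number of non-missing pulse rates

# Test statistic: distance from mu0 in standard errors
z <- (xbar - mu0) / (sigma / sqrt(n))

# One-sided (upper-tail) p-value, matching H_A: mu > 72
p_value <- pnorm(z, lower.tail = FALSE)
c(z = z, p_value = p_value)
```

A z value below 1 means the sample mean is less than one standard error above 72, which is why the p-value is far above 0.05.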