Project #4 - Sampling Distributions

Purpose

In this project, students will demonstrate their understanding of the normal distribution, sampling distributions, confidence intervals, and hypothesis tests.

Question 1

Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities. Here, x denotes the IQ of the randomly selected person.
a. P(x > 140)

#Storing the mean and standard deviation for IQ scores
mn_IQ <- 100
sd_IQ <- 15

#Using the pnorm function to find the probability of x being greater than 140
pnorm(140, mn_IQ, sd_IQ, lower.tail = FALSE)

## [1] 0.003830381

The probability of a randomly selected person having an IQ above 140 is .0038, or .38%.

b. P(x < 110)

#Using the pnorm function to find the probability of x being less than 110
pnorm(110, mn_IQ, sd_IQ)

## [1] 0.7475075

The probability of a randomly selected person having an IQ below 110 is .7475, or 74.75%.

c. What is the probability that a randomly selected student will have an IQ between 80 and 120?

#Finding the probability of a randomly selected student having an IQ below 120
pnorm(120, mn_IQ, sd_IQ)
## [1] 0.9087888

#Finding the probability of a randomly selected student having an IQ below 80
pnorm(80, mn_IQ, sd_IQ)
## [1] 0.09121122

#Subtracting P(x < 80) from P(x < 120) to find the probability of scoring between 80 and 120
0.9087888-0.09121122
## [1] 0.8175776

The probability that a randomly selected student will have an IQ between 80 and 120 is .8176, or 81.76%.
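
The same probability can be computed in one step, without retyping rounded intermediate results. The sketch below is a minimal example using the mean and standard deviation given in Question 1; the names `p_between`, `z_low`, `z_high`, and `p_between_z` are introduced here for illustration and are not part of the original solution. It also shows the equivalent calculation on the standard normal scale.

```r
# Mean and standard deviation of IQ scores, as given in Question 1
mn_IQ <- 100
sd_IQ <- 15

# P(80 < x < 120) = P(x < 120) - P(x < 80)
# (p_between is an illustrative name)
p_between <- pnorm(120, mn_IQ, sd_IQ) - pnorm(80, mn_IQ, sd_IQ)

# Equivalent calculation using z-scores and the standard normal distribution
z_low <- (80 - mn_IQ) / sd_IQ
z_high <- (120 - mn_IQ) / sd_IQ
p_between_z <- pnorm(z_high) - pnorm(z_low)

round(p_between, 4)
```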

Question 2

Continue to assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities.
a. What is the probability of a randomly selected student having an IQ greater than 110?

#Using the pnorm function to find the probability of a random student having an IQ greater than 110
pnorm(110, mn_IQ, sd_IQ, lower.tail=FALSE)

## [1] 0.2524925

The probability of a random student having an IQ score greater than 110 is .2525, or 25.25%.

b. Suppose that a random sample of 12 students is selected. What is the probability that their mean IQ is greater than 110?

#Using the pnorm function to find the probability of a random sample of 12 students having a mean IQ greater than 110
pnorm(110, mean = 100, sd = 15/sqrt(12), lower.tail = FALSE)

## [1] 0.01046067

The probability that the students’ mean IQ is greater than 110 is .0105, or 1.05%.
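
The drop from 25.25% for a single student to 1.05% for a sample mean comes from the smaller standard deviation of the sampling distribution. The sketch below spells out that step; `se_IQ`, `z_mean`, and `p_mean` are illustrative names not used in the original solution.

```r
# Population parameters and sample size from Question 2
mn_IQ <- 100
sd_IQ <- 15
n <- 12

# Standard error of the sample mean: sigma / sqrt(n)
se_IQ <- sd_IQ / sqrt(n)

# z-score of a sample mean of 110 under the sampling distribution
z_mean <- (110 - mn_IQ) / se_IQ

# Upper-tail probability on the standard normal scale
p_mean <- pnorm(z_mean, lower.tail = FALSE)
p_mean
```

Because the standard error (about 4.33) is much smaller than the population standard deviation of 15, a mean of 110 sits more than two standard errors above 100, which is why the tail probability is so small.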

Question 3

Load and store the sample NSCC Student Dataset using the read.csv() function. Find the mean, standard deviation, and sample size of the PulseRate variable in this dataset. Do you think it is likely or unlikely that the population mean pulse rate for all NSCC students is exactly equal to that sample mean found?

#Loading the NSCC Student Dataset and storing it as an object in the environment
nscc_student_data <- read.csv("nscc_student_data-2.csv")

#Finding the mean and standard deviation of the PulseRate variable of the NSCC Student Dataset
mean(nscc_student_data$PulseRate, na.rm=TRUE)
## [1] 73.47368
sd(nscc_student_data$PulseRate, na.rm=TRUE)
## [1] 12.51105

#Using the str function to find the length of the PulseRate variable (this count includes NA values)
str(nscc_student_data$PulseRate)
##  int [1:40] 64 75 74 65 NA 72 72 60 66 60 ...

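The PulseRate variable in the NSCC Student Dataset has a mean of 73.47, a standard deviation of 12.51, and a sample size of 40 students (this count includes at least one missing value, so the number of usable observations is slightly below 40). It is unlikely that the population mean pulse rate for all NSCC students is exactly equal to the sample mean found, because sample means vary from one sample to the next.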
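
Since `str()` reports the full length of a vector, missing values are included in its count. One way to count only the observations that enter `mean()` and `sd()` when `na.rm = TRUE` is sketched below. The vector `pulse` is a short illustrative vector modeled on the first six values printed by `str()`, not the full NSCC dataset, and `n_valid` is a name introduced for this example.

```r
# Illustrative vector with one missing entry (not the full dataset)
pulse <- c(64, 75, 74, 65, NA, 72)

# length() counts every element, including NA
length(pulse)

# sum(!is.na()) counts only the non-missing observations
n_valid <- sum(!is.na(pulse))
n_valid

# The mean with na.rm = TRUE is based on the n_valid non-missing values
mean(pulse, na.rm = TRUE)
```

Applied to the real data, `sum(!is.na(nscc_student_data$PulseRate))` would give the sample size that matches the reported mean and standard deviation.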

Question 4

Construct a 95% confidence interval for the mean pulse rate of all NSCC students and conclude your result in a complete sentence. (Note: we can create a valid confidence interval here since n > 30)

# Store mean and std dev
mn_PR <- 73.47
sd_PR <- 12.51


# Calculate lower bound of 95% CI
mn_PR-1.96*sd_PR/sqrt(40)
## [1] 69.59311

# Calculate upper bound of 95% CI
mn_PR+1.96*sd_PR/sqrt(40)
## [1] 77.34689

We are 95% confident that the mean pulse rate for all NSCC students is between 69.59 and 77.35 beats per minute.
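
The multiplier 1.96 is the rounded 97.5th percentile of the standard normal distribution, which `qnorm()` can supply directly. The sketch below assumes the rounded summary values and the sample size of 40 used above; `z_ci` is a hypothetical helper written for this example, not a built-in R function.

```r
# Hypothetical helper: normal-based confidence interval for a mean
z_ci <- function(mn, s, n, level = 0.95) {
  # Critical value that leaves (1 - level) / 2 in each tail
  z_star <- qnorm(1 - (1 - level) / 2)
  margin <- z_star * s / sqrt(n)
  c(lower = mn - margin, upper = mn + margin)
}

# Rounded sample mean and standard deviation of PulseRate, with n = 40 as in the text
ci_95 <- z_ci(73.47, 12.51, 40, level = 0.95)
round(ci_95, 2)
```

Passing `level = 0.99` to the same helper uses a multiplier of about 2.576 rather than the rounded 2.58, so the bounds may differ slightly in the later decimal places from a hand calculation.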

Question 5

Construct a 99% confidence interval for the mean pulse rate of all NSCC students and conclude your result in a complete sentence below the R chunk.

# Calculate lower bound of 99% CI
mn_PR-2.58*sd_PR/sqrt(40)
## [1] 68.36675

# Calculate upper bound of 99% CI
mn_PR+2.58*sd_PR/sqrt(40)
## [1] 78.57325

We are 99% confident that the mean pulse rate of all NSCC students is between 68.37 and 78.57 beats per minute.

Question 6

Describe and explain the difference you observe in your confidence intervals for questions 4 and 5.

In questions 4 and 5, the 95% confidence interval covers a narrower range of values (69.59 to 77.35) than the 99% confidence interval (68.37 to 78.57). The interval widened as the confidence level increased. A wider interval includes more candidate values for the population mean, so we can be more confident that it captures the true mean; the cost is a less precise estimate.
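
The widening can be traced to the critical value: a higher confidence level requires a larger multiplier and therefore a larger margin of error. The sketch below uses the rounded standard deviation and sample size from Question 4; the 90% level is added only for comparison, and all variable names are illustrative.

```r
# Rounded sample standard deviation and sample size used in Question 4
sd_PR <- 12.51
n <- 40

# Confidence levels to compare (90% is included only for illustration)
conf_levels <- c(0.90, 0.95, 0.99)

# Two-sided critical values from the standard normal distribution
z_star <- qnorm(1 - (1 - conf_levels) / 2)

# Margin of error for each level: critical value times the standard error
margin <- z_star * sd_PR / sqrt(n)

data.frame(level = conf_levels, z_star = round(z_star, 3), margin = round(margin, 2))
```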

Question 7

In the Fall 2009 semester of the 2009-10 academic year, the average NSCC student took 12.1 credits. I’m curious if that average differs among NSCC students last year (a sample of which is in the NSCC student dataset). Conduct a hypothesis test by a confidence interval to determine if the average credits differs last year from Fall 2009.

Write the hypotheses (Try to emulate the “LaTeX” format I used in lecture notes. Otherwise, just give your best effort.)

\(H_0: \mu = 12.1\)

\(H_A: \mu \neq 12.1\)

Create confidence interval

# Store mean of Credits variable
mn_Credits <- mean(nscc_student_data$Credits)

# Store standard deviation of Credits variable
sd_Credits <- sd(nscc_student_data$Credits)

#Finding the number of observations in the Credits variable
str(nscc_student_data$Credits)
##  int [1:40] 13 12 6 9 15 9 15 15 13 16 ...

# Store sample size of Credits variable
ss_Credits <- 40

# Lower bound of 95% CI
mn_Credits - 1.96*sd_Credits/sqrt(ss_Credits)
## [1] 10.73056

# Upper bound of 95% CI
mn_Credits + 1.96*sd_Credits/sqrt(ss_Credits)
## [1] 12.81944

We are 95% confident that the average number of credits for students last year (the year from which the NSCC student dataset is derived) falls between 10.73 and 12.82.

Make decision to reject \(H_0\) or fail to reject \(H_0\) based on confidence interval

In this instance, we fail to reject \(H_0\), as the average of 12.1 credits from the Fall 2009 semester falls within the confidence interval of 10.73 to 12.82 calculated from the NSCC student dataset.

Write a concluding statement

There is not sufficient evidence to conclude that the average number of credits taken by NSCC students last year differs from the Fall 2009 average of 12.1 credits.
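
The decision rule used in this question can be written as a simple check: reject the null hypothesis only when the hypothesized mean lies outside the confidence interval. The sketch below uses the bounds printed above; the variable names are illustrative.

```r
# Hypothesized mean from Fall 2009 and the 95% CI bounds printed above
mu_0 <- 12.1
ci_lower <- 10.73056
ci_upper <- 12.81944

# TRUE when the hypothesized mean falls inside the interval
inside <- mu_0 >= ci_lower && mu_0 <= ci_upper

# A two-sided test at the 0.05 level rejects H0 only if mu_0 is outside the 95% CI
decision <- if (inside) "fail to reject H0" else "reject H0"
decision
```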

Question 8

NSCC is investigating whether NSCC students have a higher than average stress level which can be identified by a higher than average standing pulse rate. Conduct a hypothesis test by a p-value to determine if NSCC students have a higher pulse rate than the national average of 72 bpm for adults.

Write the hypotheses (Try to emulate the “LaTeX” format I used in lecture notes. Otherwise, just give your best effort.)

\(H_0: \mu = 72\)

\(H_A: \mu > 72\)

Calculate p-value of getting sample statistics by chance

#Finding sample size of Pulse Rate Variable
str(nscc_student_data$PulseRate)
##  int [1:40] 64 75 74 65 NA 72 72 60 66 60 ...

# Probability of observing a sample mean at least this large if the population mean were 72 bpm
pnorm(mn_PR, mean = 72, sd = sd_PR/sqrt(40), lower.tail = FALSE)
## [1] 0.2286884

The p-value is 0.2287.

Make decision to reject \(H_0\) or fail to reject \(H_0\) at a significance level of 0.05 based on p-value.

Because the p-value is greater than 0.05, we fail to reject \(H_0\).

Write a concluding statement

There is not sufficient evidence to support the claim that students at NSCC have a mean pulse rate higher than the national adult average of 72 bpm.
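
The p-value for this test can also be reached by first standardizing the sample mean into a test statistic. The sketch below assumes the rounded summary values and the sample size of 40 used in the text; `z_stat`, `p_value`, `alpha`, and `reject` are names introduced for illustration.

```r
# Rounded sample statistics, sample size, and hypothesized mean from the text
mn_PR <- 73.47
sd_PR <- 12.51
n <- 40
mu_0 <- 72
alpha <- 0.05

# Test statistic: number of standard errors the sample mean lies above mu_0
z_stat <- (mn_PR - mu_0) / (sd_PR / sqrt(n))

# One-sided (upper-tail) p-value for H_A: mu > 72
p_value <- pnorm(z_stat, lower.tail = FALSE)

# Reject H0 only when the p-value is below the significance level
reject <- p_value < alpha
c(z = z_stat, p = p_value)
```

The sample mean sits less than one standard error above 72, which is why the p-value is well above 0.05.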