Project #4 - Sampling Distributions

Question 1

Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities. Here, x denotes the IQ of the randomly selected person.

P(x > 140)

pnorm(140, mean= 100, sd=15, lower.tail = FALSE)

## [1] 0.003830381

The probability that a randomly selected person has an IQ higher than 140 is about 0.38%.
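The same tail probability can also be found by converting 140 to a z-score, \(z = (140 - 100)/15 \approx 2.67\), and then using the standard normal curve. The R sketch below checks that both routes agree. The variable names are my own.

```r
# Standardize the cutoff: how many standard deviations above the mean is 140?
z <- (140 - 100) / 15

# Upper-tail area of the standard normal curve beyond z
p_standard <- pnorm(z, lower.tail = FALSE)

# Same probability computed directly on the IQ scale
p_direct <- pnorm(140, mean = 100, sd = 15, lower.tail = FALSE)

print(c(z = z, p_standard = p_standard, p_direct = p_direct))
```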

P(x < 110)

pnorm(110, mean=100, sd=15)

## [1] 0.7475075

The probability that a randomly selected person has an IQ lower than 110 is about 75%.

What is the probability that a randomly selected student will have an IQ between 80 and 120?

# P(x < 120)
pnorm(120, mean=100, sd=15)

## [1] 0.9087888

# P(x < 80)
pnorm(80, mean=100, sd=15)

## [1] 0.09121122

# P(80 < x < 120) = P(x < 120) - P(x < 80)
pnorm(120, mean=100, sd=15) - pnorm(80, mean=100, sd=15)

## [1] 0.8175776

The probability that a randomly selected student has an IQ between 80 and 120 is about 82%.
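As a sanity check, the R sketch below simulates many IQ scores and counts how often they fall between 80 and 120. The proportion should land close to the exact answer. The seed and the number of draws are arbitrary choices.

```r
# Monte Carlo check of P(80 < x < 120) for x ~ Normal(100, 15)
set.seed(42)                      # arbitrary seed for reproducibility
iq <- rnorm(100000, mean = 100, sd = 15)

# Proportion of simulated IQs strictly between 80 and 120
p_sim <- mean(iq > 80 & iq < 120)

# Exact value from the normal CDF, for comparison
p_exact <- pnorm(120, mean = 100, sd = 15) - pnorm(80, mean = 100, sd = 15)

print(c(simulated = p_sim, exact = p_exact))
```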

Question 2

Continue to assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities.

What is the probability that a randomly selected student will have an IQ greater than 110?

# P(x > 110)
pnorm(110, mean= 100, sd=15, lower.tail = FALSE)

## [1] 0.2524925

The probability that a randomly selected student has an IQ greater than 110 is about 25%.
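This answer is the complement of the \(P(x < 110)\) result from Question 1. The short R sketch below confirms that \(1 - P(x < 110)\) matches the `lower.tail = FALSE` call.

```r
# P(x > 110) via the complement rule: 1 - P(x < 110)
p_complement <- 1 - pnorm(110, mean = 100, sd = 15)

# Same probability taken directly from the upper tail
p_upper <- pnorm(110, mean = 100, sd = 15, lower.tail = FALSE)

print(c(complement = p_complement, upper_tail = p_upper))
```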

Suppose that a random sample of 12 students is selected. What is the probability that their mean IQ is greater than 110?

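The probability would be small. Because IQ scores are normally distributed, the mean IQ of a random sample of 12 students is also normally distributed. It has mean 100 and standard deviation (standard error) \(15/\sqrt{12} \approx 4.33\). A sample mean of 110 is therefore about \(10/4.33 \approx 2.31\) standard errors above 100. The probability that the sample mean exceeds 110 is only about 1%, much smaller than the 25% chance for a single student.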
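A minimal R sketch of that calculation follows. It uses the same `pnorm()` call as above, but with the standard error of the mean in place of the population standard deviation.

```r
mu    <- 100   # population mean IQ
sigma <- 15    # population standard deviation
n     <- 12    # sample size

se <- sigma / sqrt(n)          # standard error of the sample mean, about 4.33
z  <- (110 - mu) / se          # 110 is about 2.31 standard errors above mu

# P(sample mean > 110)
p_mean <- pnorm(110, mean = mu, sd = se, lower.tail = FALSE)

print(c(se = se, z = z, p_mean = p_mean))
```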

Question 3

Load and store the sample NSCC Student Dataset using the read.csv() function. Find the mean, standard deviation, and sample size of the PulseRate variable in this dataset. Do you think it is likely or unlikely that the population mean pulse rate for all NSCC students is exactly equal to this sample mean?

nscc_student_data <- read.csv("~/Desktop/honorsStats/nscc_student_data.csv")

mean(nscc_student_data$PulseRate, na.rm = TRUE)

## [1] 73.47368

sd(nscc_student_data$PulseRate, na.rm = TRUE)

## [1] 12.51105

The mean is 73.47 bpm, the standard deviation is 12.51 bpm, and the sample size is 40. (I found the sample size by simply looking at the data set.) It is unlikely that the population mean pulse rate of all NSCC students is exactly equal to this sample mean. Sample means vary from one sample to the next. These 40 students are also only a small part of the hundreds or thousands of students at NSCC. Another random sample would almost certainly produce a different mean.
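The chunks above use `na.rm = TRUE`, so any missing PulseRate values are silently dropped. The sample size that belongs in later formulas is therefore the number of non-missing values, which is not necessarily the number of rows. The sketch below shows the difference on a made-up vector. The values in `pulse_example` are hypothetical, not the NSCC data.

```r
# Hypothetical pulse rates with two missing entries
pulse_example <- c(72, 80, NA, 64, 90, NA, 76)

n_rows  <- length(pulse_example)        # counts every entry, including NAs
n_valid <- sum(!is.na(pulse_example))   # counts only observed values

print(c(n_rows = n_rows, n_valid = n_valid))
```

On the real data, the same idea would be `sum(!is.na(nscc_student_data$PulseRate))`.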

Question 4

Construct a 95% confidence interval for the mean pulse rate of all NSCC students and conclude your result in a complete sentence. (Note: we can create a valid confidence interval here since n > 30)

# Store mean and std dev

mn <- mean(nscc_student_data$PulseRate, na.rm = TRUE)
stdev <- sd(nscc_student_data$PulseRate, na.rm = TRUE)

# Calculate lower bound of 95% CI
mn - 1.96*(stdev/sqrt(40))

## [1] 69.59647

# Calculate upper bound of 95% CI
mn + 1.96*(stdev/sqrt(40))

## [1] 77.3509

We can be 95% confident that the mean pulse rate of all NSCC students is between 69.60 and 77.35 bpm.
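The hand-typed 1.96 (and 2.58 for 99%) are rounded critical values of the standard normal distribution. The sketch below wraps the same \(\bar{x} \pm z^* \cdot s/\sqrt{n}\) formula in a small helper that gets \(z^*\) from `qnorm()`. `z_interval` is a hypothetical name of my own, and the inputs are the summary statistics reported above, with n = 40 as in the chunk.

```r
# Hypothetical helper: large-sample z confidence interval for a mean
z_interval <- function(xbar, s, n, level = 0.95) {
  z_star <- qnorm(1 - (1 - level) / 2)   # about 1.96 for 95%, 2.58 for 99%
  margin <- z_star * s / sqrt(n)         # critical value times standard error
  c(lower = xbar - margin, upper = xbar + margin)
}

ci95 <- z_interval(73.47368, 12.51105, 40, level = 0.95)
ci99 <- z_interval(73.47368, 12.51105, 40, level = 0.99)
print(ci95)
print(ci99)
```

`qnorm()` returns 1.959964 and 2.575829 rather than the rounded 1.96 and 2.58. Because of that, the bounds differ slightly from the hand-typed ones in the later decimal places.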

Question 5

Construct a 99% confidence interval for the mean pulse rate of all NSCC students and conclude your result in a complete sentence below the R chunk.

# Calculate lower bound of 99% CI
mn - 2.58*(stdev/sqrt(40))

## [1] 68.37

# Calculate upper bound of 99% CI
mn + 2.58*(stdev/sqrt(40))

## [1] 78.57736

We can be 99% confident that the mean pulse rate of all NSCC students is between 68.37 and 78.58 bpm.

Question 6

Describe and explain the difference you observe in your confidence intervals for questions 4 and 5.

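The 99% interval in question 5 (68.37 to 78.58 bpm) is wider than the 95% interval in question 4 (69.60 to 77.35 bpm). Both intervals are centered at the same sample mean and use the same standard error. The only change is the critical value, which rises from 1.96 to 2.58. To be more confident that the interval captures the true mean, we have to allow a wider range of plausible values. The trade-off is precision: the 95% interval is narrower but we are less confident in it, while the 99% interval is more likely to contain the true mean but pins it down less tightly.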
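Because the standard error is shared, the ratio of the two widths is just the ratio of the critical values. The R sketch below shows this, using the same summary statistics as above.

```r
se <- 12.51105 / sqrt(40)          # same standard error for both intervals

width_95 <- 2 * qnorm(0.975) * se  # full width of the 95% interval
width_99 <- 2 * qnorm(0.995) * se  # full width of the 99% interval

print(c(width_95 = width_95, width_99 = width_99, ratio = width_99 / width_95))
```

The 99% interval comes out roughly 31% wider than the 95% interval.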

Question 7

In the Fall 2009 semester of the 2009-10 academic year, the average NSCC student took 12.1 credits. I’m curious if that average differs among NSCC students last year (a sample of which is in the NSCC student dataset). Conduct a hypothesis test by a confidence interval to determine if the average credits differs last year from Fall 2009.

Write the hypotheses (Try to emulate the “LaTeX” format I used in lecture notes. Otherwise, just give your best effort.)

\(H_0\): \(\mu = 12.1\) credits (last year's average NSCC student took the same number of credits as in Fall 2009)

\(H_A\): \(\mu \neq 12.1\) credits (last year's average differs from the Fall 2009 average)

Create confidence interval

# Store mean of Credits variable
cred_mean <- mean(nscc_student_data$Credits)

# Store standard deviation of Credits variable
cred_stdev <- sd(nscc_student_data$Credits)

# Store sample size of Credits variable
cred_ss <- 40

# Lower bound of 95% CI
cred_mean - 1.96 * (cred_stdev/sqrt(cred_ss))

## [1] 10.73056

# Upper bound of 95% CI
cred_mean + 1.96 * (cred_stdev/sqrt(cred_ss))

## [1] 12.81944

Make decision to reject \(H_0\) or fail to reject \(H_0\) based on confidence interval

The Fall 2009 average of 12.1 credits lies inside the 95% confidence interval (10.73, 12.82), so I fail to reject \(H_0\).
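The decision rule used here is simple: reject \(H_0\) only if the hypothesized mean falls outside the confidence interval. A small R sketch follows, using the interval bounds printed above. The variable names are my own.

```r
ci_credits <- c(lower = 10.73056, upper = 12.81944)  # 95% CI from the chunk above
mu_0 <- 12.1                                         # Fall 2009 average credits

# Reject H0 only if mu_0 is outside the interval
reject_h0 <- mu_0 < ci_credits[["lower"]] || mu_0 > ci_credits[["upper"]]

print(reject_h0)
```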

Write a concluding statement

There is not sufficient evidence to support the claim that the average number of credits taken by NSCC students last year differs from the Fall 2009 average of 12.1 credits.

Question 8

NSCC is investigating whether NSCC students have a higher than average stress level which can be identified by a higher than average standing pulse rate. Conduct a hypothesis test by a p-value to determine if NSCC students have a higher pulse rate than the national average of 72 bpm for adults.

Write the hypotheses (Try to emulate the “LaTeX” format I used in lecture notes. Otherwise, just give your best effort.)

\(H_0\): The mean standing pulse rate of NSCC students is equal to the national average of 72 bpm.

\(H_A\): The mean standing pulse rate of NSCC students is higher than 72 bpm.

This could also be written as…

\(H_0\): \(\mu = 72\) bpm

\(H_A\): \(\mu > 72\) bpm

Calculate p-value of getting sample statistics by chance

# Probability of getting a sample mean at least this large by random chance if the true mean were 72 bpm

# pulse rate mean
mean(nscc_student_data$PulseRate, na.rm = TRUE)

## [1] 73.47368

# pulse rate sd
sd(nscc_student_data$PulseRate, na.rm = TRUE)

## [1] 12.51105

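# pulse rate sample size (non-missing values) is 38

# One-sided p-value: chance of a sample mean of 73.47 or more if mu = 72,
# using the standard error sd/sqrt(n)
pnorm(73.47368, mean = 72, sd = 12.51105/sqrt(38), lower.tail = FALSE)

Make decision to reject \(H_0\) or fail to reject \(H_0\) at a significance level of 0.05 based on p-value.

The one-sided p-value is about 0.234 (23.4%), which is greater than 0.05, so I fail to reject \(H_0\).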
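The same decision can be reached by comparing a test statistic with a critical value instead of using the p-value. The sketch below assumes the one-sided alternative \(\mu > 72\). It uses the sample summary reported above (mean 73.47368, standard deviation 12.51105, n = 38 non-missing pulse rates).

```r
xbar <- 73.47368   # sample mean pulse rate
mu_0 <- 72         # national average under H0
s    <- 12.51105   # sample standard deviation
n    <- 38         # non-missing pulse rates

z      <- (xbar - mu_0) / (s / sqrt(n))  # test statistic, about 0.73
z_crit <- qnorm(0.95)                    # one-sided 5% cutoff, about 1.645
p_value <- pnorm(z, lower.tail = FALSE)  # one-sided p-value

print(c(z = z, z_crit = z_crit, p_value = p_value))
```

Since z is well below the cutoff, the result matches the p-value decision: fail to reject \(H_0\).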

Write a concluding statement

There is not sufficient evidence at the 0.05 significance level to conclude that the mean pulse rate of NSCC students is higher than the national average of 72 bpm.