STA 032 - R HW 3

Problem I

(a) Mean and Standard Deviation of Height

The mean is 68.1026486

The standard deviation is 4.1381504

(b) Proportion of Students within 2 Standard Deviations of the Mean

## [1] 0.9722222

This proportion is consistent with the heights being approximately normally distributed. By the empirical rule, about 95% of normally distributed data falls within two standard deviations of the mean, and the observed 97.2% is close to that value.
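The empirical rule percentages come from the normal distribution itself and can be checked with pnorm(). The following is a minimal sketch: the helper names normal.within and within.k.sd are hypothetical, and x is simulated data, not the student data set.

```r
# Theoretical proportion of a normal distribution within k SDs of its mean
normal.within = function(k) pnorm(k) - pnorm(-k)
normal.within(1)  # about 0.683
normal.within(2)  # about 0.954

# Hypothetical helper: observed proportion of x within k SDs of its mean
within.k.sd = function(x, k) {
  m = mean(x)
  s = sd(x)
  mean(x > m - k*s & x < m + k*s)
}

# Simulated (not real) data, used only to illustrate the comparison
set.seed(1)
x = rnorm(1000, mean = 68, sd = 4)
within.k.sd(x, 2)
```

Comparing within.k.sd(x, k) against normal.within(k) is the same check made in this part, just written as a reusable function.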

(c) QQ Plot of Height

Yes, this plot suggests a normal distribution because the points follow the reference line fairly closely, as they would in a normal probability plot of normally distributed data.

(d) 95% Confidence Interval for Average Student Height

The 95% confidence interval for the average height of the students is (67.896344, 68.3089531)
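The interval is taken from t.test() (see the appendix). As a sketch of what that call computes, the same interval can be built by hand as mean ± t* × s/sqrt(n). The vector x below is a simulated stand-in for a numeric column, and the names se, t.star, manual.ci and builtin.ci are illustrative only.

```r
# Simulated stand-in for a numeric column such as student$height
set.seed(32)
x = rnorm(200, mean = 68, sd = 4)

n = length(x)
se = sd(x) / sqrt(n)              # standard error of the mean
t.star = qt(0.975, df = n - 1)    # critical value for 95% confidence
manual.ci = c(mean(x) - t.star*se, mean(x) + t.star*se)

# t.test() should give the same interval
builtin.ci = t.test(x, conf.level = 0.95)$conf.int
all.equal(manual.ci, as.numeric(builtin.ci))
```

For a 99% or 90% interval, only the quantile changes: qt(0.995, df = n - 1) or qt(0.95, df = n - 1).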

Problem II

(a) Mean and Standard Deviation of Pulse

The mean is 74.2162791

The standard deviation is 18.2218379

(b) Proportion of Students within 1 Standard Deviation of the Mean

## [1] 0.8010336

This suggests that the students’ pulses are not normally distributed. By the empirical rule, approximately 68% of normally distributed data falls within one standard deviation of the mean, but the observed proportion is 80%, which is too far from 68% to be consistent with a normal distribution.

(c) QQ Plot of Pulses

This suggests that student pulses are not normally distributed because the points deviate too much from the reference line.

(d) 99% Confidence Interval for Average Student Pulse

The 99% confidence interval for the average pulse of the students is (73.0218512, 75.4107069)

Problem III

(a) Histogram of hsGPA

No, this data does not appear to be normally distributed because the data is concentrated toward the right side of the x-axis rather than spread symmetrically around the center. The histogram is very unbalanced, which is not the shape of a normal distribution.

(b) 90% Confidence Interval for Average Student GPA

The 90% confidence interval for the average GPA of the students is (3.6170576, 3.6450424)

(c)

We are 90% confident that the true average high school GPA of the students lies between about 3.62 and 3.65. The interval describes the mean, not individual students, so it does not say that 90% of the students' GPAs fall in this range. The interval is narrow relative to the range and spread of the data in the histogram above because its width depends on the standard error, s/sqrt(n), which shrinks as the sample size grows.
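What a confidence interval for the mean does and does not cover can be seen in a small simulation. This is only a sketch: x is simulated skewed data standing in for GPA values, not the real hsGPA column, and the names ci and inside are illustrative.

```r
set.seed(7)
# Simulated, skewed stand-in for GPA data (not the real hsGPA column)
x = 4 - rexp(1000, rate = 3)

# 90% confidence interval for the mean of x
ci = t.test(x, conf.level = 0.90)$conf.int
ci

# Share of individual observations that fall inside the interval for the mean
inside = mean(x > ci[1] & x < ci[2])
inside
```

In this setup only a small share of the individual values lands inside the interval, even though the interval is a 90% interval for the mean.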

(d)

Not strictly. The t interval requires that the sample mean be approximately normally distributed. That holds if the data itself is normally distributed, or, by the central limit theorem, if the sample is large enough, even when the data is skewed. With a small sample from a skewed distribution, the interval would not be reliable.
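Whether a t interval still behaves well on skewed data can be explored by simulation. The sketch below assumes an Exponential(rate = 1) population, a sample size of 500 and 2000 repetitions, none of which come from the homework data; it estimates how often a 90% interval contains the true mean.

```r
set.seed(2024)
true.mean = 1          # mean of an Exponential(rate = 1) population
n = 500                # hypothetical sample size
reps = 2000            # number of simulated samples

covered = replicate(reps, {
  x = rexp(n, rate = 1)                        # strongly right-skewed data
  ci = t.test(x, conf.level = 0.90)$conf.int
  ci[1] < true.mean & true.mean < ci[2]        # TRUE if the interval covers the mean
})

# Proportion of intervals that contain the true mean
mean(covered)
```

Rerunning with a much smaller n, such as 5, is a simple way to see how the coverage changes when the sample is small and the data is skewed.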

APPENDIX OF CODE

# Load the student data set
student = read.csv("~/Desktop/STA 032 - R HW 1_files/student.csv")

# Problem I: height
meanh = mean(student$height)
sdh = sd(student$height)
# Proportion of heights within 2 standard deviations of the mean
lower.bounds = meanh - 2*sdh
upper.bounds = meanh + 2*sdh
mean(student$height > lower.bounds & student$height < upper.bounds)
# Normal QQ plot with reference line
qqnorm(student$height, main = "Normal Probability Plot for Student Height")
qqline(student$height)
# 95% t interval for the mean; the interval is stored in stuff$conf.int
stuff = t.test(student$height, conf.level = 0.95)

# Problem II: pulse
meanp = mean(student$pulse)
sdp = sd(student$pulse)
# Proportion of pulses within 1 standard deviation of the mean
lower.boundsp = meanp - 1*sdp
upper.boundsp = meanp + 1*sdp
mean(student$pulse > lower.boundsp & student$pulse < upper.boundsp)
# Normal QQ plot with reference line
qqnorm(student$pulse, main = "Normal Probability Plot for Student Pulse")
qqline(student$pulse)
# 99% t interval for the mean; the interval is stored in stuffp$conf.int
stuffp = t.test(student$pulse, conf.level = 0.99)

# Problem III: high school GPA
hist(student$hsGPA, main = "High School GPA", xlab = "GPA")
# 90% t interval for the mean; the interval is stored in stuffg$conf.int
stuffg = t.test(student$hsGPA, conf.level = 0.90)