Question A

“5.2.9 The basal diameter of a sea anemone is an indicator of its age. The density curve shown here represents the distribution of diameters in a certain large population of anemones; the population mean diameter is 4.2 cm, and the standard deviation is 1.4 cm. Let Y̅ represent the mean diameter of 25 anemones randomly chosen from the population.

(a) Find the approximate value of Pr{4 ≤ Y̅ ≤ 5}.
(b) Why is your answer to part (a) approximately correct even though the population distribution of diameters is clearly not normal? Would the same approach be equally valid for a sample of size 2 rather than 25? Why or why not?”

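#(a) The approximate value of Pr{4 ≤ Y̅ ≤ 5}
#Y̅ is the mean of n = 25 anemones, so its standard deviation is the standard error 1.4 / sqrt(25) = 0.28, not the population SD of 1.4.
#Answer is approximately 0.76
se <- 1.4 / sqrt(25)      # standard error of the sample mean
a <- pnorm(4, 4.2, se)    # Pr{Y̅ ≤ 4}
b <- pnorm(5, 4.2, se)    # Pr{Y̅ ≤ 5}
answer <- b - a
round(answer, 2)

## [1] 0.76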

#(b) By the Central Limit Theorem, the sampling distribution of Y̅ is approximately normal when the sample size is reasonably large, even if the population distribution is skewed. With n = 25 the normal approximation is good enough to use pnorm(). The same approach would not be equally valid for a sample of size 2: the mean of only two observations keeps much of the skewness of the population, so its sampling distribution is not close to normal.
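
The comparison between samples of size 25 and size 2 can be explored with a small simulation. This is only a sketch under stated assumptions: the gamma population below is a hypothetical stand-in for the right-skewed density curve in the exercise, chosen so that its mean is 4.2 and its SD is 1.4, and `skewness()` and `sim_means()` are helper names introduced here for illustration.

```r
set.seed(550)  # arbitrary seed so the sketch is reproducible

# Hypothetical right-skewed population: gamma with mean 4.2 and SD 1.4
shape <- (4.2 / 1.4)^2   # 9
rate  <- 4.2 / 1.4^2     # mean = shape / rate, variance = shape / rate^2

# Sample skewness; close to 0 for a symmetric distribution
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Simulate many sample means for a given sample size n
sim_means <- function(n, reps = 20000) {
  replicate(reps, mean(rgamma(n, shape = shape, rate = rate)))
}

means_2  <- sim_means(2)
means_25 <- sim_means(25)

skew_2  <- skewness(means_2)
skew_25 <- skewness(means_25)
c(n2 = skew_2, n25 = skew_25)

# The spread of the simulated means for n = 25 should be near 1.4 / sqrt(25) = 0.28
sd(means_25)
```

If the simulated means for n = 2 are clearly more skewed than those for n = 25, that is consistent with the normal approximation being reasonable only for the larger sample.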

Question B

“6.2.7 For each of the following, decide whether the description fits the SD or the SE. (a) This quantity is a measure of the accuracy of the sample mean as an estimate of the population mean. (b) This quantity tends to stay the same as the sample size goes up. (c) This quantity tends to go down as the sample size goes up.”

## (a) is the SE. The standard error describes how far the sample mean is likely to be from the population mean, so it measures the accuracy of the sample mean as an estimate. A sample mean is still only an estimate; it is not the "true value."

## (b) is the SD. The sample SD estimates the population SD, which describes the spread of individual observations and does not depend on how many observations are collected. As the sample size grows, the sample SD settles near the population SD instead of shrinking.

## (c) is the SE. Because SE = SD / sqrt(n), it gets smaller as the sample size goes up, and the sampling distribution of the sample mean becomes more tightly concentrated around the population mean.
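
The different behavior of the SD and the SE can be illustrated numerically. The sketch below draws samples of increasing size from a hypothetical normal population (mean 4.2 and SD 1.4 are borrowed from Question A purely for illustration) and reports both quantities for each sample.

```r
set.seed(550)  # arbitrary seed so the sketch is reproducible

sample_sizes <- c(10, 100, 1000, 10000)

# Hypothetical normal population with mean 4.2 and SD 1.4
results <- t(sapply(sample_sizes, function(n) {
  y <- rnorm(n, mean = 4.2, sd = 1.4)
  c(n = n, SD = sd(y), SE = sd(y) / sqrt(n))
}))
results
```

The SD column should hover around 1.4 at every sample size, while the SE column should shrink roughly by a factor of sqrt(10) from one row to the next.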

Section 6.3 Confidence Interval for μ

Question C

“6.3.6 A zoologist measured tail length in 86 individuals, all in the one-year age group, of the deermouse Peromyscus. The mean length was 60.43 mm and the standard deviation was 3.06 mm. A 95% confidence interval for the mean is (59.77, 61.09). (a) True or false (and say why): We are 95% confident that the average tail length of the 86 individuals in the sample is between 59.77 mm and 61.09mm. (b) True or false (and say why): We are 95% confident that the average tail length of all the individuals in the population is between 59.77 mm and 61.09mm.”

#When the population SD (sigma) is unknown, we base the 95% CI on Student's t distribution, which uses "s", the SD of the sample, in place of sigma.
#In this case we know the sample mean and the sample SD, so the interval is the sample mean plus or minus a t critical value times the standard error, s / sqrt(n). For a 95% CI, alpha (the error rate) is 0.05 --> 1 - 0.05 = 0.95.
###For a 95% CI the "critical value" has a subscript of alpha/2 = 0.025: z with subscript alpha/2 when sigma is known, and t with subscript alpha/2 and n - 1 = 85 degrees of freedom here.
#The confidence level does not describe the probability that this particular interval contains the population mean; it describes how often intervals constructed this way would capture the population mean over repeated sampling.

#(a) False: The sample mean is known exactly (60.43 mm), so there is nothing to be 95% confident about. The CI is a statement about the population mean, not about the 86 mice in the sample. A larger sample would give a narrower CI, but it would still be an interval for the population mean.
#(b) True: The CI is an interval estimate of the mean tail length of all one-year-old deermice in the population, so we can be 95% confident that the population mean lies between 59.77 mm and 61.09 mm. With 86 observations the t distribution used for the interval is also very close to the normal distribution.
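
The interval quoted in the exercise can be reproduced from the summary statistics alone. This is a minimal sketch that assumes the textbook used a t-based interval with n - 1 degrees of freedom; the variable names are chosen here for illustration.

```r
# Summary statistics given in Exercise 6.3.6
n    <- 86
ybar <- 60.43
s    <- 3.06

se     <- s / sqrt(n)              # standard error of the mean
t_crit <- qt(0.975, df = n - 1)    # upper 2.5% point, since alpha / 2 = 0.025

ci <- ybar + c(-1, 1) * t_crit * se
round(ci, 2)
```

Rounded to two decimals, the result should match the interval (59.77, 61.09) stated in the exercise.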

Question D

“6.3.7 Refer to Exercise 6.3.6 (a) Without doing any computations, would an 80% confidence interval for the data in Exercise 6.3.6 be wider, narrower, or about the same? Explain. (b) Without doing any computations, if 500 mice were sampled rather than 86, would the 95% confidence interval listed in Exercise 6.3.6 be wider, narrower, or about the same? Explain.”

#(a) An 80% CI would be narrower than the 95% CI. Both intervals are centered on the sample mean and extend outward from it. Because the t distribution is symmetric, consider only the half below the mean: a 95% CI covers 47.5% of the distribution on that side, while an 80% CI covers only 40%, so it stops closer to the mean.
#(b) Narrower. The standard error is s / sqrt(n), so increasing the sample size from 86 to 500 shrinks the standard error and, with it, the width of the 95% CI.
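
Although the exercise asks for no computations, the two comparisons in Exercise 6.3.7 are easy to check afterwards. The sketch below uses a hypothetical helper, `ci_width()`, and assumes that a sample of 500 mice would have about the same SD (3.06 mm) as the original 86.

```r
# Width of a t-based CI computed from summary statistics
ci_width <- function(s, n, level) {
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)
  2 * t_crit * s / sqrt(n)
}

s <- 3.06  # sample SD from Exercise 6.3.6

width_95_n86  <- ci_width(s, n = 86,  level = 0.95)
width_80_n86  <- ci_width(s, n = 86,  level = 0.80)
# Assumption: the 500 mice would show about the same SD as the original 86
width_95_n500 <- ci_width(s, n = 500, level = 0.95)

c(width_95_n86, width_80_n86, width_95_n500)
```

Both the lower confidence level and the larger sample should produce a smaller width than the original 95% interval.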

Question E

“6.3.10 Human beta-endorphin (HBE) is a hormone secreted by the pituitary gland under conditions of stress. A researcher conducted a study to investigate whether a program of regular exercise might affect the resting (unstressed) concentration of HBE in the blood. He measured blood HBE levels, in January and again in May, from 10 participants in a physical fitness program. The results were as shown in the table.

(a) Construct a 95% confidence interval for the population mean difference on HBE levels between January and May. (Hint: You need to use only the values in the right-hand column.)
(b) Interpret the confidence interval from part (a). That is, explain what the interval tells you about HBE levels. (See Example 6.3.4 and 6.3.5)
(c) Using your interval to support your answer, is there evidence that HBE levels are lower in May than January? (Hint: Does your interval include the value zero?)”

#(a) Calculate the 95% CI.
HBE_difference_data <- c(20, 18, 28, 0, 7, 34, 16, -5, 8, 4)
t.test(HBE_difference_data)

## 
##  One Sample t-test
## 
## data:  HBE_difference_data
## t = 3.3151, df = 9, p-value = 0.00901
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   4.129062 21.870938
## sample estimates:
## mean of x 
##        13

#CI 95% = (4.129062, 21.870938)

#(b) We are 95% confident that the population mean difference in HBE levels between January and May is between 4.129062 and 21.870938. The sample is small (10 differences from 10 participants), so the interval is wide even at the 95% level.

#(c) Collecting more data would give a fuller picture of HBE levels at different times of the year. Even so, the 95% CI does not include zero: its lower limit, 4.129062, is above 0. Zero is therefore not a plausible value for the mean difference, which is evidence that mean HBE levels differ between January and May. Assuming the differences were computed as January minus May, the positive interval indicates that levels are lower in May.
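
The interval reported by t.test() can also be rebuilt step by step, which makes the role of the standard error and the t critical value explicit. This is a sketch with variable names chosen for illustration; the data vector is the one used in part (a).

```r
HBE_difference_data <- c(20, 18, 28, 0, 7, 34, 16, -5, 8, 4)

n      <- length(HBE_difference_data)
dbar   <- mean(HBE_difference_data)            # sample mean difference
se     <- sd(HBE_difference_data) / sqrt(n)    # standard error of the mean difference
t_crit <- qt(0.975, df = n - 1)                # 9 degrees of freedom

ci <- dbar + c(-1, 1) * t_crit * se
ci

# Does the interval include zero?
includes_zero <- ci[1] <= 0 && ci[2] >= 0
includes_zero
```

The limits should agree with the 95 percent confidence interval printed by t.test(), and `includes_zero` answers the hint in part (c) directly.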

HW 3 - HBH 550