Please put the answers for Part I next to the question number (2pts each):
7a. Describe the two distributions (2pts).
The distribution of A is unimodal and slightly skewed to the right by potential outliers. Provided the sample size is large enough, we can assume the distribution is nearly normal.
The distribution of B appears to be normal, with a much smaller range.
7b. Explain why the means of these two distributions are similar but the standard deviations are not (2 pts).
According to the Central Limit Theorem, for any population distribution with mean \(\mu\) and standard deviation \(\sigma\), the sampling distribution of the sample mean \(\bar{X}\) is approximately normal with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\), and the approximation improves as \(n\) increases. The means are therefore similar because both distributions are centered on \(\mu\), while the standard deviation of the sample means is smaller by a factor of \(\sqrt{n}\): here \(3.22/\sqrt{30} \approx 0.59\).
7c. What is the statistical principle that describes this phenomenon (2 pts)?
Central limit theorem.
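As a rough check of the \(\sigma/\sqrt{n}\) scaling quoted in 7b, the sketch below first reproduces the arithmetic for \(\sigma = 3.22\) and \(n = 30\), then simulates sample means from a right-skewed population. The population here is hypothetical (exponential draws with mean 3), not the data from the question, and the function names `standard_error` and `simulate_sample_means` are illustrative only.

```python
import math
import random
import statistics


def standard_error(sigma, n):
    """Standard deviation of the sample mean predicted by the CLT."""
    return sigma / math.sqrt(n)


def simulate_sample_means(population, n, draws, seed=0):
    """Draw `draws` samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)]


# Values taken from the answer to 7b: sigma = 3.22, n = 30.
print(round(standard_error(3.22, 30), 2))

# Assumption: a hypothetical right-skewed population, not the original data.
_rng = random.Random(1)
population = [_rng.expovariate(1 / 3.0) for _ in range(20_000)]
means = simulate_sample_means(population, n=30, draws=5_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# The two means should be close; the two spreads should differ by about sqrt(30).
print("population mean:", round(pop_mean, 2))
print("mean of sample means:", round(statistics.fmean(means), 2))
print("predicted SD of sample means:", round(standard_error(pop_sd, 30), 2))
print("observed SD of sample means:", round(statistics.stdev(means), 2))
```

With a fixed seed the run is repeatable. The point of the comparison is that the mean of the sample means tracks the population mean, while their spread tracks the population standard deviation divided by \(\sqrt{30}\).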
Consider the four datasets, each with two columns (x and y), provided below.
For each column, calculate (to two decimal places):
Mean (rows 1 to 4 are the four datasets):
## x y
## 1 9 7.5
## 2 9 7.5
## 3 9 7.5
## 4 9 7.5
OR
Median (rows 1 to 4 are the four datasets):
## x y
## 1 9 7.6
## 2 9 8.1
## 3 9 7.1
## 4 8 7.0
Standard deviation (rows 1 to 4 are the four datasets):
## x y
## 1 3.3 2
## 2 3.3 2
## 3 3.3 2
## 4 3.3 2
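Correlation between x and y (one matrix per dataset, in order):
Dataset 1:
## x y
## x 1.00 0.82
## y 0.82 1.00
Dataset 2:
## x y
## x 1.00 0.82
## y 0.82 1.00
Dataset 3:
## x y
## x 1.00 0.82
## y 0.82 1.00
Dataset 4:
## x y
## x 1.00 0.82
## y 0.82 1.00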
Model 1 meets the linearity and constant variability conditions; however, the normality condition is not met, so a linear regression model is not appropriate.
Model 2 doesn’t meet any of the conditions required for a linear regression.
Model 3 can be used tentatively, as it meets the necessary conditions but has outliers.
Model 4 doesn’t meet the linearity, constant variability, or normality conditions. It is not appropriate to use.
All of the datasets above have the same mean, standard deviation, and correlation, and roughly the same median. Plotting, however, reveals that not all of them satisfy the conditions of normality, linearity, and constant variability. Visualization is therefore essential for seeing patterns that are not visible in basic summary statistics alone.
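The summary statistics can be recomputed with a short sketch. It assumes the four datasets are Anscombe's quartet, whose standard published values are consistent with the means, medians, standard deviations, and correlations reported above. The data values and the helper names `pearson` and `summarize` are not part of the original answer.

```python
import math
import statistics

# Assumption: the four datasets are Anscombe's quartet (standard published values).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def summarize(x, y):
    """Mean, median, sample standard deviation, and correlation, to two decimals."""
    return {
        "mean": (round(statistics.fmean(x), 2), round(statistics.fmean(y), 2)),
        "median": (round(statistics.median(x), 2), round(statistics.median(y), 2)),
        "sd": (round(statistics.stdev(x), 2), round(statistics.stdev(y), 2)),
        "cor": round(pearson(x, y), 2),
    }


SUMMARIES = {k: summarize(x, y) for k, (x, y) in QUARTET.items()}
for k, s in SUMMARIES.items():
    print(k, s)
```

The four rows print nearly identical values, so the differences between the datasets only become visible once each one is plotted as a scatterplot with its fitted line and residuals.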