PSY 3801 - Homework #2

PART I

Exercise 5.2

Range: 93-55 = 38
Variance:

\[ \bar{X} = \frac{(66 + 75 + 72 + 71 + 55 + 56 + 72 + 93 + 73 + 72 + 72 + 73 + 91 + 66 + 71 + 56 + 59)}{17} = \frac{1193}{17} = 70.176\]

\[ \Sigma(X-\overline{X})^2 = 1800.471\]

\[N-1 = 17-1 = 16\]

\[s^2 = \frac{1800.471}{16} = 112.529\]

Standard Deviation:

\[s = \sqrt{112.529} = 10.608\]
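
The hand calculation above can be cross-checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the exercise.

```python
import statistics

# Scores as listed in the mean calculation for Exercise 5.2.
scores = [66, 75, 72, 71, 55, 56, 72, 93, 73, 72, 72, 73, 91, 66, 71, 56, 59]

data_range = max(scores) - min(scores)
mean = statistics.mean(scores)

# Sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in scores)

# statistics.variance and statistics.stdev use the sample formulas (divide by N - 1).
variance = statistics.variance(scores)
std_dev = statistics.stdev(scores)

print(data_range, round(mean, 3), round(sum_sq_dev, 3),
      round(variance, 3), round(std_dev, 3))
```

The printed values should agree with the range, sum of squared deviations, variance, and standard deviation worked out by hand.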

Exercise 5.6

Data sets

Original: {12, 15, 34, 28, 27, 9, 2}
Original + 3: {15, 18, 37, 31, 30, 12, 5}
Original - 3: {9, 12, 31, 25, 24, 6, -1}

Original data: mean = 18.143, sd = 11.682 \[\bar{X} = \frac{(12 + 15 + 34 + 28 + 27 + 9 + 2)}{7} = 18.143\] \[ \Sigma(X-\overline{X})^2 = 818.857\] \[N-1 = 7-1 = 6\] \[s^2 = \frac{818.857}{6} = 136.476\] \[s = \sqrt{136.476} = 11.682\]

Original + 3: mean = 21.143, sd = 11.682 \[\bar{X} = \frac{(15 + 18 + 37 + 31 + 30 + 12 + 5)}{7} = 21.143\] \[ \Sigma(X-\overline{X})^2 = 818.857\] \[N-1 = 7-1 = 6\] \[s^2 = \frac{818.857}{6} = 136.476\] \[s = \sqrt{136.476} = 11.682\]

Original - 3: mean = 15.143, sd = 11.682 \[\bar{X} = \frac{(9 + 12 + 31 + 25 + 24 + 6 - 1)}{7} = 15.143\] \[ \Sigma(X-\overline{X})^2 = 818.857\] \[N-1 = 7-1 = 6\] \[s^2 = \frac{818.857}{6} = 136.476\] \[s = \sqrt{136.476} = 11.682\]

As these transformations show, adding or subtracting a constant k shifts the mean by k: the mean becomes mean + k under addition and mean - k under subtraction. However, because every observation is shifted by the same k, the deviations from the mean do not change, so the variability (variance and standard deviation) of the transformed dataset is the same as that of the original dataset.
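
A quick numerical check of this property, sketched with Python's standard `statistics` module (the names `shifted_up` and `shifted_down` are illustrative):

```python
import statistics

original = [12, 15, 34, 28, 27, 9, 2]
k = 3

# Add and subtract the same constant from every observation.
shifted_up = [x + k for x in original]
shifted_down = [x - k for x in original]

for label, data in [("original", original),
                    ("original + 3", shifted_up),
                    ("original - 3", shifted_down)]:
    # The mean moves by k; the sample standard deviation does not move.
    print(label, round(statistics.mean(data), 3), round(statistics.stdev(data), 3))
```

Each deviation (X + k) - (mean + k) equals X - mean, which is why the standard deviation is identical across all three data sets.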

Exercise 5.7

Data sets

Original * 3: {36, 45, 102, 84, 81, 27, 6}
Original / 3: {4, 5, 11.333, 9.333, 9, 3, 0.667}

Original multiplied by 3: mean = 54.429, sd = 35.047 \[\bar{X} = \frac{(36 + 45 + 102 + 84 + 81 + 27 + 6)}{7} = 54.429\] \[ \Sigma(X-\overline{X})^2 = 7369.714\] \[N-1 = 7-1 = 6\] \[s^2 = \frac{7369.714}{6} = 1228.286\] \[s = \sqrt{1228.286} = 35.047\]

Original divided by 3: mean = 6.048, sd = 3.894 \[\bar{X} = \frac{(4 + 5 + 11.333 + 9.333 + 9 + 3 + 0.667)}{7} = 6.048\] \[ \Sigma(X-\overline{X})^2 = 90.984\] \[N-1 = 7-1 = 6\] \[s^2 = \frac{90.984}{6} = 15.164\] \[s = \sqrt{15.164} = 3.894\]

As these transformations show, multiplying or dividing every value in a dataset by a constant k multiplies or divides the standard deviation by k, respectively. \[ 11.682 \times 3 \approx 35.047\] \[ \frac{11.682}{3} = 3.894\] Similarly, multiplying or dividing the data by k multiplies or divides the mean by k, respectively. \[18.143 \times 3 = 54.429\] \[\frac{18.143}{3} = 6.048\]
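
The scaling property can be checked the same way. This is a minimal sketch using the standard `statistics` module; `tripled` and `thirds` are illustrative names.

```python
import statistics

original = [12, 15, 34, 28, 27, 9, 2]
k = 3

# Multiply and divide every observation by the same constant.
tripled = [x * k for x in original]
thirds = [x / k for x in original]

base_mean = statistics.mean(original)
base_sd = statistics.stdev(original)

for label, data in [("original * 3", tripled), ("original / 3", thirds)]:
    # Report the mean and sd, plus their ratios to the original values.
    m = statistics.mean(data)
    s = statistics.stdev(data)
    print(label, round(m, 3), round(s, 3), round(m / base_mean, 3), round(s / base_sd, 3))
```

Both ratios should come out as 3 for the multiplied data and about 0.333 for the divided data, since the mean and the standard deviation scale by the same factor.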

Exercise 5.8

Original data set: {5, 8, 3, 8, 6, 9, 9, 7}

\[\bar{X} = \frac{(5 + 8 + 3 + 8 + 6 + 9 + 9 + 7)}{8} = 6.875\] \[ \Sigma(X-\overline{X})^2 = 30.875\] \[N-1 = 8-1 = 7\] \[s^2 = \frac{30.875}{7}= 4.411\] \[s = \sqrt{4.411} = 2.1002\]

Transformed data set: \[ \frac{observation}{s} = (2.381, 3.809, 1.428, 3.809, 2.857, 4.285, 4.285, 3.333)\] \[ \Sigma(X-\overline{X})^2 = 7\] \[N-1 = 8-1 = 7\] \[s^2 = \frac{7}{7}= 1\] \[s = \sqrt{1} = 1\]
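
As a check on the transformed data set, the sketch below divides each observation by the sample standard deviation and recomputes the variability. It assumes the transformation is division by s (which matches the transformed values listed); the variable names are illustrative.

```python
import statistics

data = [5, 8, 3, 8, 6, 9, 9, 7]

# Sample standard deviation of the original data (divides by N - 1).
s = statistics.stdev(data)

# Divide every observation by s.
scaled = [x / s for x in data]

scaled_mean = statistics.mean(scaled)
scaled_ss = sum((x - scaled_mean) ** 2 for x in scaled)
scaled_sd = statistics.stdev(scaled)

print([round(x, 3) for x in scaled])
print(round(scaled_ss, 3), round(scaled_sd, 3))
```

Dividing by s divides the sum of squared deviations by s squared, so it becomes N - 1, and the standard deviation of the transformed data is 1.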

Exercise 9.22

“Reliable”, in this context, refers to how closely the correlation found in the sample reflects the true correlation between the two variables in the population.
With smaller samples, the potential for a large sampling error becomes greater, so the sample correlation can differ substantially from the population value. This could be compounded by an underlying variable not controlled for, perhaps some bit of demographic information (age group, location, etc.).

PART II

Question #1

Negatively skewed
Positively skewed
Negatively skewed
Not skewed

Question #2

\(\frac{99.4-98.6}{0.7} = 1.143\)
\(\frac{101.1-98.6}{0.7} = 3.571\)
\(\frac{98.5-98.6}{0.7} = -0.143\)
\(\frac{94.1-98.6}{0.7} = -6.429\)
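
The four z-scores follow the same formula, so they can be computed in one pass. This is a minimal sketch; the mean of 98.6 and standard deviation of 0.7 are taken from the calculations above, and the names `observations` and `z_score` are illustrative.

```python
MEAN = 98.6  # mean used in the calculations above
SD = 0.7     # standard deviation used in the calculations above

observations = [99.4, 101.1, 98.5, 94.1]

def z_score(x, mean=MEAN, sd=SD):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / sd

z_scores = [round(z_score(x), 3) for x in observations]
print(z_scores)
```

A positive z-score indicates an observation above the mean and a negative z-score one below it.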

Question #3

Variance is based on squared deviations and is less easy to interpret; taking the square root gives a measure of variability that is in the units of the data.
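We square the deviation scores so that negative deviations become positive; otherwise the positive and negative deviations would cancel, because deviations from the mean always sum to zero.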

Question #4

\(\frac{(6 + 3 + 5 + 0 + 6)}{5} = 4\)
Because the sum of the first five deviation scores is -1, and the sum of deviations in a complete data set will always equal zero, the missing deviation value must be equal to 1. Thus: (-3, 1, 1, -4, 4, 1) is the complete set of deviation scores.

\(\Sigma(X-\overline{X})^2 = 44\)
\(N-1 = 6-1 = 5\)
\(s^2 = \frac{44}{5}= 8.8\)
\(s = \sqrt{8.8} = 2.966\)
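
The reasoning for the missing deviation score and the resulting variance can be sketched as follows. The five known deviations are those given above; the variable names are illustrative.

```python
import math

# The five known deviation scores from the problem.
known_deviations = [-3, 1, 1, -4, 4]

# Deviations from the mean always sum to zero, so the missing one
# must cancel the sum of the known five.
missing = -sum(known_deviations)
deviations = known_deviations + [missing]

# Sample variance and standard deviation from the deviation scores.
sum_sq = sum(d ** 2 for d in deviations)
variance = sum_sq / (len(deviations) - 1)
sd = math.sqrt(variance)

print(missing, sum_sq, variance, round(sd, 3))
```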

IQR (ordered): {-3, 0, 2, 5, 6, 6}
Median values: 2, 5
Median = \(\frac{7}{2} = 3.5\)
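
With an even number of observations, the median is the average of the two middle values. A minimal sketch of that step, checked against `statistics.median` from the standard library (variable names are illustrative):

```python
import statistics

ordered = sorted([-3, 0, 2, 5, 6, 6])
n = len(ordered)

# For even n, take the two values on either side of the midpoint.
middle_pair = ordered[n // 2 - 1 : n // 2 + 1]
median = sum(middle_pair) / 2

print(middle_pair, median, statistics.median(ordered))
```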

Question #5

There appears to be a reasonably strong positive linear correlation between the number of hours spent studying and the score earned on the math test. Stated differently, students who spent more hours studying tended to earn higher math test scores.

\(N-1 = 10-1 = 9\)
\(Covariance = \frac{\Sigma{((Time - \overline{Time})(Score - \overline{Score}))}}{9}\)
\(r = \frac{Covariance(time,score)}{(s_{time})(s_{score})}\)
- \(cov = \frac{41.95}{9} = 4.661\)
- \(s_{time} = \sqrt{\frac{9.8}{9}} = \sqrt{1.089} = 1.044\)
- \(s_{score} = \sqrt{\frac{253.301}{9}} = \sqrt{28.145} = 5.305\)
- \(r = \frac{4.661}{(1.044)(5.305)} = 0.842\)

This correlation coefficient \(r = 0.842\) is considered to be a strong positive correlation, as previously estimated.
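
The calculation of r from the summary sums can be sketched as a small function. The function name `pearson_r_from_sums` and its parameter names are hypothetical; the inputs are the sum of cross-products and the two sums of squared deviations used above.

```python
import math

def pearson_r_from_sums(sum_cross_products, ss_x, ss_y, n):
    """Pearson r from the sum of cross-products and sums of squared deviations."""
    covariance = sum_cross_products / (n - 1)
    s_x = math.sqrt(ss_x / (n - 1))
    s_y = math.sqrt(ss_y / (n - 1))
    return covariance / (s_x * s_y)

# Summary sums for study time and test score, N = 10.
r_study = pearson_r_from_sums(41.95, 9.8, 253.301, 10)
print(round(r_study, 3))
```

Because the N - 1 terms cancel, r depends only on the three sums; carrying full precision through the intermediate steps should give the same value to three decimals as the hand calculation.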

Question #6

\(N-1 = 10-1 = 9\)
\(Covariance = \frac{\Sigma{((Mood - \overline{Mood})(Pos.Actions - \overline{Pos.Actions}))}}{9}\)
\(r = \frac{Covariance(mood,pos.actions)}{(s_{mood})(s_{pos.actions})}\)
- \(cov = \frac{-46.44}{9} = -5.16\)
- \(s_{mood} = \sqrt{\frac{31.329}{9}} = \sqrt{3.481} = 1.866\)
- \(s_{pos.actions} = \sqrt{\frac{86.4}{9}} = \sqrt{9.6} = 3.098\)
- \(r = \frac{-5.16}{(1.866)(3.098)} = -0.893\)

Mood Z-scores:
- \(s_{mood} = 1.866\)
- \(\overline{mood} = 5.99\)
- \(Z-transformation = \frac{mood score - \overline{mood}}{s_{mood}} = \frac{mood score - 5.99}{1.866} = (1.131, -0.799, 0.649, 0.059, -1.710, -1.281, 0.756, -0.316, 1.131, 0.381)\)
Positive action Z-scores:
- \(s_{pos.action} = 3.098\)
- \(\overline{pos.action} = 10.6\)
- \(Z-transformation = \frac{pos.action - \overline{pos.action}}{s_{pos.action}} = \frac{pos.action - 10.6}{3.098} = (-0.516, 1.097, -1.162, 0.129, 2.066, 0.452, -0.516, -0.194, -1.162, -0.194)\)

The correlation \(r = -0.893\) indicates a strong negative linear relationship: as mood score decreases, the number of positive actions tends to increase; conversely, as the number of positive actions decreases, the mood score tends to increase.
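
The z-scores also give a second route to the correlation: r equals the sum of the products of paired z-scores divided by N - 1. The sketch below applies that identity to the rounded z-scores listed above, so a small rounding difference from the covariance-based value is possible; the variable names are illustrative.

```python
# Rounded z-scores as listed above, paired by participant.
z_mood = [1.131, -0.799, 0.649, 0.059, -1.71, -1.281, 0.756, -0.316, 1.131, 0.381]
z_actions = [-0.516, 1.097, -1.162, 0.129, 2.066, 0.452, -0.516, -0.194, -1.162, -0.194]

n = len(z_mood)

# r is the sum of z-score cross-products divided by N - 1
# (sample standard deviations were used to form the z-scores).
r_from_z = sum(zm * za for zm, za in zip(z_mood, z_actions)) / (n - 1)

print(round(r_from_z, 3))
```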

Question #7

Charlie is correct, because the z-scores are normally distributed. Sasha’s graph instead shows a distribution that is negatively skewed.