Stat 3301 Homework 2

Part 1.1:

From the plot produced for NBA forwards in the 2013-2014 season, there is a positive relationship between height and weight for these players. There are a few outliers, so it can be classified as a weak positive relationship.

Part 1.2:

For this linear model, we will have the form

weight_i = β₀ + height_i*β₁ + e_i; e_i ~ N(0, σ²)

We assume that these e_i values are independent of one another and identically distributed; one random error has no bearing on the random component of another data point. Each e_i is assumed to follow a normal distribution with E(e_i) = 0 and Var(e_i) = σ².

For this regression model

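E(wt|ht) = β₀ + β₁*ht

V(wt|ht) = σ², estimated by: σ̂² = RSS/(n - 2), where RSS = Σ(weight_i - β̂₀ - β̂₁*height_i)² is the residual sum of squares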

Part 1.3:

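β̂₀ = ȳ - β̂₁*x̄

β̂₁ = SXY/SXX, where SXY = Σ(x_i - x̄)(y_i - ȳ) and SXX = Σ(x_i - x̄)², with x = height and y = weight

β̂₀ = -73.9018
β̂₁ = 3.7774
estimated σ² = 253.2537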

From this linear model fit, we obtain the model

weight_i = -73.9018 + height_i*3.7774 + e_i; e_i ~ N(0, 253.2537)

From this fitted model, we see that the intercept is -73.9018 and the slope is 3.7774. This means that if the height were zero inches, we would expect a weight of -73.9018 pounds. Clearly this estimate is unrealistic, because a height of zero is far outside the range of our data: we are working with heights primarily between 76 and 84 inches. As for the slope, we expect an average increase of 3.7774 pounds for each additional inch in height.
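The estimates in this part can be reproduced with a short calculation. The following is a minimal Python sketch of the least-squares formulas for simple linear regression. The function name `fit_simple_ols` and the five height/weight pairs are hypothetical stand-ins, not the NBA data used in this homework.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope, estimated error variance) for y ~ x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of squares and cross-products
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares, divided by n - 2 degrees of freedom
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)
    return intercept, slope, sigma2_hat


# Hypothetical heights (inches) and weights (pounds), NOT the homework data.
heights = [76, 78, 80, 82, 84]
weights = [210, 225, 228, 240, 250]

b0, b1, s2 = fit_simple_ols(heights, weights)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}, sigma^2 estimate = {s2:.4f}")
```

Applied to the real forwards data, the same three quantities would correspond to the intercept, slope, and estimated variance reported in this part.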

Part 1.4

To calculate the confidence interval at a 90% confidence level, we will use the equation…

Using this equation, we calculate a 90% confidence interval for the average weight of forwards who are 81.5" tall to be (231.7969, 236.1224). We are 90% confident that the mean weight of all forwards who are 81.5" tall lies within the interval (231.7969, 236.1224).
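A confidence interval for a mean response can be sketched as follows, assuming the standard form: fitted value plus or minus a t critical value times the standard error of the fitted value. The function name `mean_response_ci`, the data, and the critical value 2.353 (the 0.95 quantile of a t distribution with 3 degrees of freedom, taken from a table) are all assumptions for illustration and are not the values behind the interval reported above.

```python
import math


def mean_response_ci(x, y, x_star, t_crit):
    """Confidence interval for E(y | x = x_star) in simple linear regression.

    t_crit is the t quantile for the desired level with n - 2 degrees of
    freedom; it is supplied by the caller (for example from a t table).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(rss / (n - 2))
    fitted = intercept + slope * x_star
    # Standard error of the fitted mean at x_star
    se_fit = sigma_hat * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)
    return fitted - t_crit * se_fit, fitted + t_crit * se_fit


# Hypothetical data, NOT the homework data.
heights = [76, 78, 80, 82, 84]
weights = [210, 225, 228, 240, 250]
T_CRIT_90 = 2.353  # assumed: 0.95 quantile of t with n - 2 = 3 df

lo, hi = mean_response_ci(heights, weights, 81.5, T_CRIT_90)
print(f"90% CI for mean weight at 81.5 inches: ({lo:.4f}, {hi:.4f})")
```

The interval is narrowest at the mean height and widens as the target height moves away from it, because of the (x_star - x_bar)² term in the standard error.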

Part 1.5

Band Formula =

Part 1.6

Comparing the models for forwards and guards, we see a few differences in their fitted values. First, the estimated intercepts β̂₀ are quite different, but since neither value has a meaningful interpretation, this difference is negligible. Next, the estimated slopes β̂₁ are different. For guards, the average increase in weight is 4.338 pounds for each additional inch, while for forwards it is only 3.7774 pounds. This means that as guards become taller, we expect a larger increase in weight than we do for forwards. Finally, the estimated variance is larger for our forwards model than it is for the guards. This means we have higher variability in our forwards data points, and that we expect a larger random component in that linear model. This is reflected in the wider confidence bands that we produced in part 1.5.

Part 2.8.1

α belongs to the fixed component, the mean function, of our regression model. It is the part of this new regression model that sets the baseline for our fitted values. The terms other than α account for the deviation of the predictor from its sample mean and for the random component e_i.
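The role of α can be made concrete with a short derivation. This sketch assumes that the model in this problem is the centered form y_i = α + β₁(x_i - x̄) + e_i, which this write-up does not state explicitly; if the problem uses a different parameterization, the algebra below does not apply.

```latex
\begin{align*}
y_i &= \alpha + \beta_1 (x_i - \bar{x}) + e_i \\
    &= (\alpha - \beta_1 \bar{x}) + \beta_1 x_i + e_i
\end{align*}
% Matching terms with the usual form y_i = \beta_0 + \beta_1 x_i + e_i gives
\begin{align*}
\beta_0 = \alpha - \beta_1 \bar{x}
\quad\Longleftrightarrow\quad
\alpha = \beta_0 + \beta_1 \bar{x} = \mathrm{E}(y \mid x = \bar{x})
\end{align*}
```

Under that assumption, α is the mean response at the average value of the predictor, and the slope β₁ is unchanged by the centering.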

Part 2.8.2

Part 3 ~ 2.16.1

The simple linear regression model corresponding to the graph above is
log(Fertility Rate) = 2.6655 - 0.2071*log(GDP)

Part 3 ~ 2.16.2

From the plot above we see a negative relationship between log(GDP) and log(Fertility Rate). As log(GDP) increases by 1, the expected log(Fertility Rate) decreases by 0.2071. If log(GDP) is close to 0, we would expect log(Fertility Rate) to be close to 2.6655, the Y-intercept.
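Because both variables are on the log scale, the slope can also be read as a multiplicative effect on the original scale: multiplying GDP by a factor m multiplies the fitted fertility rate by m raised to the slope. The sketch below illustrates this with the slope reported above; the function name `fertility_ratio` is hypothetical, and the calculation assumes both logs use the same base.

```python
def fertility_ratio(gdp_multiplier, slope=-0.2071):
    """Factor by which the fitted fertility rate changes when GDP is
    multiplied by gdp_multiplier, under a log-log linear fit."""
    return gdp_multiplier ** slope


for m in (1.01, 2.0, 10.0):
    ratio = fertility_ratio(m)
    print(f"GDP x {m}: fitted fertility x {ratio:.4f}")
```

For example, a 1% increase in GDP corresponds to roughly a 0.2% decrease in the fitted fertility rate, which is the usual elasticity reading of a log-log slope.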

Part 3 ~ 2.16.3

For the UN data, we will test the hypothesis that the slope is 0 versus the alternative that it is negative using a one-sided test. The hypotheses are as follows…

H₀: β₁ = 0
H₁: β₁ < 0

Under the null hypothesis the test statistic follows a t distribution. The test statistic is t = -14.7851. The p-value corresponding to this test statistic is 4.5311784 × 10^{-34}. This is highly significant; in particular, it is significant at the 1% level. Since this p-value is very small, we reject H₀ in favor of H₁. This means that we have significant evidence that the slope β₁ is less than 0.
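A one-sided p-value of this kind is the area under the t density to the left of the test statistic. The sketch below approximates that area by numerical integration using only the Python standard library; it is not necessarily how the p-value above was obtained. The function names are hypothetical, and the degrees of freedom used in the example (197) are an assumption, since the sample size is not stated in this write-up.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def lower_tail_p(t_stat, df, steps=100_000):
    """Approximate P(T <= t_stat) for a negative t_stat (midpoint rule).

    The substitution x = t_stat / u maps the infinite range
    (-inf, t_stat] onto u in (0, 1], so the integral becomes finite.
    """
    if t_stat >= 0:
        raise ValueError("this sketch only handles negative test statistics")
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h          # midpoint of the k-th subinterval
        x = t_stat / u
        total += t_pdf(x, df) * (-t_stat) / (u * u)  # density times |dx/du|
    return total * h


# Assumed degrees of freedom (n - 2); not given in the write-up.
ASSUMED_DF = 197
p_value = lower_tail_p(-14.7851, ASSUMED_DF)
print(f"one-sided p-value: {p_value:.4e}")
```

In practice a statistics package would supply this tail probability directly; the integration is shown only to make the definition of the p-value explicit.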

Part 4 ~ 2.19.1

Part 4 ~ 2.19.2

For the Hamilton data, we will test the hypothesis that the value y is equal to 1 versus the alternative that it is not 1 using a two-sided test. The hypotheses are as follows…

H₀: y = 1
H₁: y ≠ 1

For this hypothesis test we get a t-statistic of 0.587 and a p-value of 0.56. We do not have significant evidence to reject H₀, so the data are consistent with y = 1, which supports the information in the problem that the exponent y is close to 1.

Part 4 ~ 2.19.3

In this graph, the response variable is Hamilton, the frequency with which Hamilton used a word, and the predictor variable is HamiltonRank, the rank of that word among the words that Hamilton used. Using this model we can clearly state the relationship between the log of the rank and the log of the count of the words. As the log of the rank increases by 1, the log of the count decreases by 1.122. In comparison to the first plot, we have much more data near the right tail of our regression, so this model will fit less well for the most common words.
B₀ =

B₁ =

B₀ = 4.7712
B₁ = -1.0076
log(Hamilton) = 4.7712 - 1.0076*log(HamiltonRank)

In this graph, the response variable is Hamilton, the frequency with which Hamilton used a word, and the predictor variable is HamiltonRank, the rank of that word among the words that Hamilton used. Using this model we can clearly state the relationship between the log of the rank and the log of the count of the words. As the log of the rank increases by 1, the log of the count decreases by 1.279. In comparison to the previous plots, we see a steeper negative slope, indicating that the right tail of the data heavily influences the regression model. A significant portion of the data falls below the fitted line in the right tail, showing it is not a good model for ranks past 50.
B₀ =

B₁ =

B₀ = 5.4557
B₁ = -1.2791
log(Hamilton) = 5.4557 - 1.2791*log(HamiltonRank)
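The effect of including more ranks in the fit can be explored with a small experiment. The sketch below fits log(count) on log(rank) for several rank cutoffs. The data are synthetic and purely hypothetical: counts follow an exact inverse-rank law up to rank 50 and fall off faster beyond it, which mimics the pattern described above but is not the Hamilton word data. The function name `loglog_fit` is also hypothetical.

```python
import math


def loglog_fit(ranks, counts):
    """Least-squares fit of log(count) on log(rank); returns (intercept, slope)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Synthetic counts (hypothetical): exponent 1 up to rank 50, steeper after.
ranks = list(range(1, 101))
counts = [1000 / r if r <= 50 else 1000 * 50 / r ** 2 for r in ranks]

slopes = {}
for cutoff in (50, 75, 100):
    _, slopes[cutoff] = loglog_fit(ranks[:cutoff], counts[:cutoff])
    print(f"ranks 1..{cutoff}: slope = {slopes[cutoff]:.4f}")
```

With these synthetic counts the slope equals -1 when only the first 50 ranks are used and becomes more negative once later ranks are included, the same qualitative behavior as the fitted slopes reported in this part.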