MATH/STAT H361: Homework 3

Name:

Problem 3

Dr. Suzanne Rohrback used a novel approach in a series of experiments to examine calcium-binding proteins. The data from one experiment are provided in the Fluorescence dataset in the Stat2Data package. The variable Calcium is the log of the free calcium concentration and ProteinProp is the proportion of protein bound to calcium.

(a) Find the regression line for predicting the proportion of protein bound to calcium from the transformed free calcium concentration.

library(Stat2Data)
data("Fluorescence")
head(Fluorescence)  # pass the data frame itself; head("Fluorescence") only echoes the string

fluorescence_model <- lm(ProteinProp ~ Calcium, data = Fluorescence)
summary(fluorescence_model)


Call:
lm(formula = ProteinProp ~ Calcium, data = Fluorescence)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.22712 -0.09454  0.00176  0.10410  0.21375 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.06586    0.08876   23.27   <2e-16 ***
Calcium      0.17514    0.01107   15.82   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1199 on 49 degrees of freedom
Multiple R-squared:  0.8363,    Adjusted R-squared:  0.8329 
F-statistic: 250.3 on 1 and 49 DF,  p-value: < 2.2e-16
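Reading the Estimate column of the coefficient table above, the fitted line asked for in part (a) can be written out explicitly. The coefficients are copied from the output shown, not recomputed:

```latex
\widehat{\text{ProteinProp}} = 2.06586 + 0.17514 \times \text{Calcium}
```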

(b) What is the regression standard error?

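The regression standard error is the residual standard error, 0.1199 on 49 degrees of freedom. The values 0.01107 and 0.08876 are the standard errors of the slope and intercept estimates, which are different quantities.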
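As a rough illustration of where the regression standard error comes from, the sketch below computes it by hand as sqrt(SSE / (n - 2)) and compares it with what R reports. The data are simulated stand-ins (the values and the names `x`, `y`, and `fit` are assumptions, not the Fluorescence data), so the block runs without Stat2Data.

```r
# Simulated stand-in data (hypothetical values, not the Fluorescence data)
set.seed(361)
x <- runif(51, -11, -5)
y <- 2 + 0.17 * x + rnorm(51, sd = 0.12)
fit <- lm(y ~ x)

# Regression standard error: sqrt(SSE / (n - 2)) for simple linear regression
sse <- sum(resid(fit)^2)
by_hand <- sqrt(sse / df.residual(fit))

# The same quantity as reported by R
from_summary <- summary(fit)$sigma
from_sigma <- sigma(fit)

c(by_hand = by_hand, from_summary = from_summary, from_sigma = from_sigma)

# Standard errors of the individual coefficients are a different quantity
coef(summary(fit))[, "Std. Error"]
```

With the notebook's own model, the corresponding call would be `sigma(fluorescence_model)`, which should match the "Residual standard error" line of the summary output.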

(c) Plot the regression line and all of the points on a scatterplot. Does the regression line appear to provide a good fit?

The regression line does not seem like a good fit; the data appear curved (possibly quadratic) rather than linear.

plot(ProteinProp ~ Calcium, data = Fluorescence)
abline(fluorescence_model)

(d) Analyze the residual plots. Are conditions for the regression model met?

I would say that the conditions for the regression model are not met. Based on the residuals vs. fitted plot, the variance of the residuals does not appear to be constant.

plot(fluorescence_model)


Problem 4

Researchers were interested in looking for an association between body size and the number of eggs produced by a moth. BodyMass and Eggs are both recorded for 39 moths in the dataset MothEggs in Stat2Data.

(a) Before looking at the data, would you expect the association between body mass and number of eggs to be positive or negative? Explain.

I would expect the association between body mass and the number of eggs to be positive, as a higher body mass might create more space in the mother moth for eggs.

(b) Fit a linear regression model for predicting Eggs from BodyMass. Is the association between the two variables statistically significant? Justify your answer.

Yes, the association is statistically significant at the 0.01 level: the t-test for the BodyMass slope gives t = 2.992 with a p-value of 0.00492, which is marked “**” in the output.

data("MothEggs")
mothmodel <- lm(Eggs ~ BodyMass, data = MothEggs)
summary(mothmodel)


Call:
lm(formula = Eggs ~ BodyMass, data = MothEggs)

Residuals:
     Min       1Q   Median       3Q      Max 
-157.586  -17.187    3.162   25.790   67.960 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)    24.38      45.38   0.537  0.59423   
BodyMass       79.86      26.69   2.992  0.00492 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 44.75 on 37 degrees of freedom
Multiple R-squared:  0.1948,    Adjusted R-squared:  0.173 
F-statistic:  8.95 on 1 and 37 DF,  p-value: 0.004916

(c) The conditions for inference are not met, primarily because there is one very unusual observation. Identify this observation and what makes it unusual.

Observation 39 is the unusual one: this moth laid 0 eggs, unlike every other moth in the dataset.

plot(mothmodel)


(d) Fit the model again after removing this unusual point. Compare the estimated slopes and comment on the difference between the two models.

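The estimated slope of the new model is 79.24, whereas in the old model it was 79.86, so the slope barely changes. The p-value for the slope drops from 0.00492 to 0.000911, so the association in the new model is significant at a stricter level (0.001) than in the old model (0.01). The residual standard error also falls from 44.75 to 36.75, and R-squared rises from 0.1948 to 0.2664.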

mothmodelno39 <- MothEggs[-39,]
head(mothmodelno39)


NewMothModel <- lm(Eggs ~ BodyMass, data = mothmodelno39)

summary(NewMothModel)


Call:
lm(formula = Eggs ~ BodyMass, data = mothmodelno39)

Residuals:
     Min       1Q   Median       3Q      Max 
-115.079  -20.785   -0.846   21.763   63.917 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    29.56      37.28   0.793 0.433043    
BodyMass       79.24      21.92   3.615 0.000911 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 36.75 on 36 degrees of freedom
Multiple R-squared:  0.2664,    Adjusted R-squared:  0.246 
F-statistic: 13.07 on 1 and 36 DF,  p-value: 0.0009108
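One possible way to see how a single unusual response can change the fit is to compare the two models side by side. The sketch below uses simulated stand-in data (all values and the names `sim`, `fit_all`, and `fit_drop` are hypothetical, not the MothEggs data) in which row 39 has a response of 0 and an x-value at the mean of x. In that constructed case the point has little leverage, so dropping it shifts the intercept and shrinks the residual standard error while leaving the slope essentially unchanged.

```r
# Simulated stand-in data (hypothetical values, not the MothEggs data)
x <- c(seq(1.2, 2.2, length.out = 38), 1.7)  # row 39 sits at the mean of x
y <- 25 + 80 * x + 30 * sin(seq_along(x))    # deterministic "noise"
y[39] <- 0                                   # unusual response in row 39
sim <- data.frame(x = x, y = y)

fit_all  <- lm(y ~ x, data = sim)
fit_drop <- lm(y ~ x, data = sim[-39, ])

# Side-by-side comparison of coefficients and residual standard errors
rbind(
  all_rows   = c(coef(fit_all),  sigma = sigma(fit_all)),
  without_39 = c(coef(fit_drop), sigma = sigma(fit_drop))
)
```

The same `rbind()` comparison could be applied to the two fitted models in this notebook. Whether low leverage explains the pattern in the real data is an assumption that would need to be checked, for example with `hatvalues()`.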

(e) Do you think we were justified in removing this unusual point from the model? Why or why not?

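Yes. The unusual point was an outlier, possibly a fluke in our data, with no eggs being laid. The 0-egg observation had a noticeable effect on the fit: removing it lowered the residual standard error from 44.75 to 36.75, even though the slope barely changed.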

Problem 5

(a) In R, sample 100 datapoints from a uniform distribution with min -1 and max 1.

mydata <- runif(100, -1, 1)

(b) Before generating a normal Q-Q plot, predict what you will see.

Hint: How might the tails of your uniform distribution differ from the tails of a normal distribution?

The tails of my uniform distribution will likely differ from those of a normal distribution. The sampling method draws values with equal probability anywhere between -1 and 1 and never beyond those limits, so the data have no tendency to cluster around a center the way normal data do. I expect the points to bend away from the Q-Q line at both ends.

(c) Generate the Q-Q plot for your uniformly sampled data. Comment on where and why it deviates from the Q-Q line. This is

The points on the Q-Q plot deviate from the Q-Q line at the extremes. This makes sense: the Q-Q line represents a normal distribution, whereas our data are spread evenly over the interval from -1 to 1 rather than concentrated near the mean, and they are cut off at -1 and 1, so the sample tails are shorter than normal tails.

qqnorm(mydata)
qqline(mydata)
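The tail behaviour discussed above can also be checked without sampling by comparing theoretical quantiles. The sketch below is a minimal illustration: it matches a normal distribution to the standard deviation of Uniform(-1, 1), which is (b - a) / sqrt(12), and tabulates a few quantiles of each. The names `a`, `b`, `sd_unif`, and `comparison` are introduced here for illustration only.

```r
# Standard deviation of Uniform(a, b) is (b - a) / sqrt(12)
a <- -1
b <- 1
sd_unif <- (b - a) / sqrt(12)

# Quantiles of the uniform and of a normal with the same mean and sd
p <- c(0.01, 0.05, 0.50, 0.95, 0.99)
comparison <- data.frame(
  p       = p,
  uniform = qunif(p, min = a, max = b),
  normal  = qnorm(p, mean = 0, sd = sd_unif)
)
comparison
```

The 1% and 99% quantiles of the matched normal lie beyond -1 and 1, while the uniform quantiles cannot. This is qualitatively why the sample points flatten out relative to the line at both ends. Note that `qqline()` draws its line through the sample quartiles, so the match to this table is approximate.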

