Homework 1

26 BE 7023 & 26 PH 7023: Advanced Biostatistics

Autumn, 2017

On transformations in linear regression + confidence and prediction bands

_______________________________________________________________________

Activate the package ‘MASS.’ Download the data ‘mammals.’

#install.packages("MASS")
#data(package="MASS")

Loading mammals dataset

data(mammals, package="MASS")

1. Describe the data. Include the size, top ten rows and summary statistics of the data.

dim(mammals)
## [1] 62  2

The dataset has 62 observations (rows) from 2 columns

head(mammals, 10)
##                    body brain
## Arctic fox        3.385  44.5
## Owl monkey        0.480  15.5
## Mountain beaver   1.350   8.1
## Cow             465.000 423.0
## Grey wolf        36.330 119.5
## Goat             27.660 115.0
## Roe deer         14.830  98.2
## Guinea pig        1.040   5.5
## Verbet            4.190  58.0
## Chinchilla        0.425   6.4

The top ten rows are shown above

2. Obtain a plot of the data with x = body weight and y = brain weight. Comment on the graph.

plot(mammals$body, mammals$brain, col="RED", pch=15, cex=1,
    xlab="Body Weight", ylab="Brain Weight", 
    main="Plot of Brain Wt. as a Function of Body Wt.")

##Brain weight is plotted on y-axis and Body weight on x-axis. There seems to be about 5 outlier values

3. Take natural logarithms of the weights and then plot the data. Make the scatter plot as descriptive as possible. Comment on the graph.

mammals1<-mammals[order(mammals$body),]
mammals1$logbody<-log(mammals1$body)
mammals1$logbrain<-log(mammals1$brain)
#head(mammals1)
#dim(mammals1)
#summary(mammals1)

###The scatterplot of Log(Brain Weight) vs. Log(Body Weight) shows more uniformly distributed and there seems to be a strong correlation between these two variables

4. Fit a simple linear regression model of ln(brain weight) on ln(body weight). Write the prediction equation.

US<-lm(logbrain~logbody, data=mammals1)
summary(US)
## 
## Call:
## lm(formula = logbrain ~ logbody, data = mammals1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.71550 -0.49228 -0.06162  0.43597  1.94829 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.13479    0.09604   22.23   <2e-16 ***
## logbody      0.75169    0.02846   26.41   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6943 on 60 degrees of freedom
## Multiple R-squared:  0.9208, Adjusted R-squared:  0.9195 
## F-statistic: 697.4 on 1 and 60 DF,  p-value: < 2.2e-16

The prediction equation is: Log(Brain Weight)=2.13479 + 0.75169*(Log(Body Weight))

Report R-square and comment on it.

The R-square value is 0.9208. The adjusted R-square is 0.9195. P-value <2.2e-16

Thus about 92% of the variability in Log(Brain Weight) can be explained by Log(Body Weight)

Estimate the population standard deviation.

Residual standard error=0.6943=standard deviation of the population/sq.rt.(sample size)

Therefore, population standard deviation=0.6943*sq.rt(62)=5.474798

0.6953*sqrt(62)
## [1] 5.474798

5. Transform the regression model back to the original variables. Comment on the resultant model.

log(BrainWeight)=2.13479 + 0.75169xLog(BodyWeight)

log(BrainWeight)=log(8.4553) + Log(BodyWeight)^0.75169

log(BrainWeight)=log(8.4553xBodyWeight^0.75169)

BrainWeight=8.4553xBodyWeight)^0.75160

BrainWeight/(BodyWeight)^0.75160=8.4553

Comment: The ratio of Brain Weight to (Body Weight)^0.75 is a constant value of 8.4553

6. Build a 95% confidence band as well as prediction band around the regression line after transformations.

Plotting Log(Brain Weight) vs. Log(Body Weight)

plot(mammals1$logbody, mammals1$logbrain, col="RED", pch=15, cex=1,
    xlab="Log(Body Weight)", ylab="Log(Brain Weight)", 
    main="Plot of Log.Brain Wt. as a Function of Log.Body Wt.")
abline(US, col="black", lwd=2)

c<-qt(0.975, 62, lower.tail = TRUE)
c
## [1] 1.998972
mean(log(mammals1$body))
## [1] 1.337539
sd(log(mammals1$body))
## [1] 3.123128
mean(log(mammals1$body))-c*sd(log(mammals1$body))*(1/sqrt(62))
## [1] 0.5446717
mean(log(mammals1$body))+c*sd(log(mammals1$body))*(1/sqrt(62))
## [1] 2.130406
mean(log(mammals1$body))-c*sd(log(mammals1$body))*(sqrt(1+(1/62)))
## [1] -4.95565
mean(log(mammals1$body))+c*sd(log(mammals1$body))*(sqrt(1+(1/62)))
## [1] 7.630728

Determine Confidence and Prediction values for the Log(Body Weight)

US1<-predict(US, newdata=mammals1, int="c")
head(US1)
##                                   fit        lwr        upr
## Lesser short-tailed shrew -1.84788197 -2.2648403 -1.4309237
## Little brown bat          -1.32685299 -1.7084166 -0.9452893
## Big brown bat             -0.70076691 -1.0409859 -0.3605480
## Mouse                     -0.70076691 -1.0409859 -0.3605480
## Musk shrew                -0.14774646 -0.4529196  0.1574267
## Star-nosed mole            0.01998741 -0.2749092  0.3148841
US2<-predict(US, newdata=mammals1, int="p")
head(US2)
##                                   fit       lwr        upr
## Lesser short-tailed shrew -1.84788197 -3.297920 -0.3978443
## Little brown bat          -1.32685299 -2.767112  0.1134060
## Big brown bat             -0.70076691 -2.130628  0.7290946
## Mouse                     -0.70076691 -2.130628  0.7290946
## Musk shrew                -0.14774646 -1.569677  1.2741838
## Star-nosed mole            0.01998741 -1.399773  1.4397476
US3<-data.frame(US1, US2)
head(US3)
##                                   fit        lwr        upr       fit.1
## Lesser short-tailed shrew -1.84788197 -2.2648403 -1.4309237 -1.84788197
## Little brown bat          -1.32685299 -1.7084166 -0.9452893 -1.32685299
## Big brown bat             -0.70076691 -1.0409859 -0.3605480 -0.70076691
## Mouse                     -0.70076691 -1.0409859 -0.3605480 -0.70076691
## Musk shrew                -0.14774646 -0.4529196  0.1574267 -0.14774646
## Star-nosed mole            0.01998741 -0.2749092  0.3148841  0.01998741
##                               lwr.1      upr.1
## Lesser short-tailed shrew -3.297920 -0.3978443
## Little brown bat          -2.767112  0.1134060
## Big brown bat             -2.130628  0.7290946
## Mouse                     -2.130628  0.7290946
## Musk shrew                -1.569677  1.2741838
## Star-nosed mole           -1.399773  1.4397476

Plot of Log(Brain Weight) vs. Log(Body Weight) with Confidence and Prediction Bands

plot(mammals1$logbody, mammals1$logbrain, col="RED", pch=15, cex=0.5,
     xlab="Log(Body Weight)", ylab="Log(Brain Weight)", 
     main="Plot of Log(Brain Wt.) as a Function of Log(Body Wt.)")
abline(US, col="blue", lwd=2)
matlines(mammals1$logbody, US3$upr, lty=c(1, 2, 2), col=c("black", "blue", "blue"))
matlines(mammals1$logbody, US3$lwr, lty=c(1, 2, 2), col=c("black", "blue", "blue"))
matlines(mammals1$logbody, US3$lwr.1, lty=c(1, 2, 2), col=c("black", "red", "red"))
matlines(mammals1$logbody, US3$upr.1, lty=c(1, 2, 2), col=c("black", "red", "red"))