library(MASS) #load the package
Problem 1 (chickwts data)
boxplot(weight ~ feed, chickwts) #see page 49-50 or 202 in Albert
Interpretation: Casein appears to give the greatest weight and horsebean the least. An analysis of variance would be needed to decide whether the differences are statistically significant.
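The analysis of variance mentioned above could be run along these lines. This is only a sketch: it fits a one-way model with aov() and does not check the usual assumptions (normal errors, equal variances). The name fit is illustrative and not part of the original solution.

```r
# one-way ANOVA of chick weight on feed type
fit <- aov(weight ~ feed, data = chickwts)
summary(fit)     # F test for equal mean weight across the feeds
TukeyHSD(fit)    # pairwise comparisons between feeds
```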
Note: Sunflower has some outliers (points more than 1.5 times the IQR below the first quartile or above the third quartile).
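To list the outlying sunflower weights rather than read them off the plot, something like the following should work. It is a sketch using boxplot.stats(), which applies the same 1.5 * IQR rule as boxplot(); the names w, bs, h, lower, and upper are illustrative.

```r
# weights of the chicks fed sunflower
w <- chickwts$weight[chickwts$feed == "sunflower"]

# boxplot.stats() flags points beyond 1.5 * IQR from the hinges
bs <- boxplot.stats(w)
bs$out

# the same rule by hand: stats[2] and stats[4] are the lower and upper hinges
h <- bs$stats[4] - bs$stats[2]
lower <- bs$stats[2] - 1.5 * h
upper <- bs$stats[4] + 1.5 * h
w[w < lower | w > upper]
```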
Problem 2 (iris data)
attach(iris)
names(iris)
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
## [5] "Species"
aggregate(iris[, 1:3], by = list(Species), mean) #had trouble using by function for iris[,1:3] as described on p 55
## Group.1 Sepal.Length Sepal.Width Petal.Length
## 1 setosa 5.006 3.428 1.462
## 2 versicolor 5.936 2.770 4.260
## 3 virginica 6.588 2.974 5.552
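For comparison, here is one way by() can produce the same group means. It is a sketch and not necessarily the approach intended on p 55: by() passes each species' rows to the function as a data frame, so colMeans is used in place of mean. The name means.by is illustrative.

```r
# by() hands FUN a data frame per species, so use colMeans rather than mean
means.by <- by(iris[, 1:3], iris$Species, colMeans)
means.by

# the formula interface of aggregate() gives the same means without attach()
aggregate(cbind(Sepal.Length, Sepal.Width, Petal.Length) ~ Species,
          data = iris, FUN = mean)
```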
Problem 3 (mtcars data)
# vs and am contain zeros, so log() gives -Inf for those columns
boxplot(log(mtcars), names = paste0("log(", names(mtcars), ")"))
pairs(mtcars[, c(1, 3, 4, 5, 6, 7)]) #we don't plot discrete quantitative variables
mpg is negatively correlated with disp, hp, and wt.
mpg is positively correlated with drat and weakly positively correlated with qsec.
The relationships among the other variables can be read off the plot in the same way.
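The directions read off the pairs plot can be checked numerically with a correlation matrix. This is a sketch that uses Pearson correlation, which only measures linear association; the names vars and cm are illustrative.

```r
# the same continuous variables that were used in pairs()
vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")
cm <- cor(mtcars[, vars])

# correlations of mpg with each of the other variables
round(cm["mpg", ], 2)
```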
Problem 4 (mammals data)
attach(mammals)
names(mammals)
## [1] "body" "brain"
mammals$r = ifelse(body > 0, round(brain/body, 2), NA) #NA, not the string "na", which would turn r into character
o = order(mammals$r)
sorted.mammals = mammals[o, ]
head(sorted.mammals)
## body brain r
## African elephant 6654.0 5712.0 0.86
## Cow 465.0 423.0 0.91
## Pig 192.0 180.0 0.94
## Brazilian tapir 160.0 169.0 1.06
## Water opossum 3.5 3.9 1.11
## Horse 521.0 655.0 1.26
tail(sorted.mammals)
## body brain r
## Galago 0.200 5.00 25.00
## Little brown bat 0.010 0.25 25.00
## Rhesus monkey 6.800 179.00 26.32
## Lesser short-tailed shrew 0.005 0.14 28.00
## Owl monkey 0.480 15.50 32.29
## Ground squirrel 0.101 4.00 39.60
The African elephant has the smallest ratio of brain to body mass and the ground squirrel has the largest.
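The two extremes can also be found without sorting the whole data frame. A sketch, working on a copy of the data so nothing depends on attach(); the names m and ratio are illustrative.

```r
# work on a copy of the MASS data set
m <- MASS::mammals
ratio <- m$brain / m$body

# species with the smallest and largest brain-to-body ratio
rownames(m)[which.min(ratio)]
rownames(m)[which.max(ratio)]
```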
Problem 5 (mammals data, cont)
log-log scatterplot of body mass versus the ratio r = (brain mass)/(body mass)
plot(log(body), log(mammals$r), ylab = "log(r)")
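An alternative way to draw the same log-log plot is to let plot() put both axes on a log scale, so the tick labels stay in the original units. This is a sketch on a copy of the data; the names m and r and the axis labels are illustrative (in MASS::mammals, body is in kg and brain is in g).

```r
m <- MASS::mammals
m$r <- m$brain / m$body

# log = "xy" uses log scales on both axes but labels them in original units
plot(m$body, m$r, log = "xy",
     xlab = "body mass (kg)", ylab = "brain/body ratio (g/kg)")
```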
Problem 6 (LakeHuron data)
First we plot the annual LakeHuron series (lake level in feet) with a lowess smooth.
plot(LakeHuron, ylab = "annual level (feet)")
abline(h = mean(LakeHuron), col = "red")
lines(lowess(LakeHuron), col = "green")
Now we try first differencing to eliminate the trend.
d = diff(LakeHuron)
plot(d, ylab = "first differences of annual level (feet)")
abline(h = 0, lty = 3)
lines(lowess(d))
First differencing does a good job of removing the trend: the lowess curve of the original series was almost linear, and differencing once removes a linear trend.
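One rough numerical check of this is to fit a straight line to the series before and after differencing and compare the slopes. This is only a sketch: a fitted slope is a crude summary of trend, and the names tt, lev, trend, td, and dd are illustrative.

```r
# slope of a straight-line trend fitted to the original series
tt <- as.numeric(time(LakeHuron))
lev <- as.numeric(LakeHuron)
trend <- lm(lev ~ tt)
coef(trend)["tt"]

# the same fit on the first differences, for comparison
d <- diff(LakeHuron)
td <- as.numeric(time(d))
dd <- as.numeric(d)
coef(lm(dd ~ td))["td"]

# average yearly change in the level
mean(d)
```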