Abalone are a very common type of shellfish. They are an excellent source of iron and pantothenic acid, is a nutritious food resource and farming in Australia, America and East Asia. 100 grams of abalone yields more than 20% recommended daily intake of these nutrients. The economic value of abalone is positively correlated with its age. Therefore, to detect the age of abalone accurately is important for both farmers and customers to determine its price. However, the current technology to decide the age is quite costly and inefficient. Farmers usually cut the shells and count the rings through microscopes to estimate the abalone’s age. This complex method increases the cost and limits its popularity.
The data set Abalone is obtained from UCI Machine Learning Repository (1995). The data set contains physical measurements of 4177 abalones recorded in December 1995 by Marine Research Laboratories Taroona, Department of Primary Industry and Fisheries, Tasmania, Australia. There are nine variables, namely, Sex, Length, Diameter, and Height, Whole weight, Shucked weight, Viscera weight, Shell weight and Rings.The variable Rings is linearly related to the age of an abalone, as age equals to number of rings plus 1.5.
library(Zelig)
library(ZeligChoice)
library(dplyr)
library(tidyr)
library(ggplot2)
# read the dataset into a data frame
abalone <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data", header=FALSE)
# add column names
names(abalone) <- c("sex", "length", "diameter", "height", "weight.whole",
"weight.shucked", "weight.viscera", "weight.shell", "rings")
# confirm that there are 4177 observations
nrow(abalone)
## [1] 4177
# confirm that there are no missing values
sum(is.na(abalone))
## [1] 0
My dependent variable in the analysis is going to be “Rings”. It is measured as the number of rings observed after cutting and examining an abalone. Although, rings does not denote the age of a given abalone directly, it determines it more-or-less perfectly; the age of an abalone equals Rings + 1.5.
There are seven continuous predictors included in the dataset. The first three constitute shell measurements:
The other four explanatory variables are related to the weight of abalone:
# looking at the dataset
head(abalone)
str(abalone)
## 'data.frame': 4177 obs. of 9 variables:
## $ sex : Factor w/ 3 levels "F","I","M": 3 3 1 3 2 2 1 1 3 1 ...
## $ length : num 0.455 0.35 0.53 0.44 0.33 0.425 0.53 0.545 0.475 0.55 ...
## $ diameter : num 0.365 0.265 0.42 0.365 0.255 0.3 0.415 0.425 0.37 0.44 ...
## $ height : num 0.095 0.09 0.135 0.125 0.08 0.095 0.15 0.125 0.125 0.15 ...
## $ weight.whole : num 0.514 0.226 0.677 0.516 0.205 ...
## $ weight.shucked: num 0.2245 0.0995 0.2565 0.2155 0.0895 ...
## $ weight.viscera: num 0.101 0.0485 0.1415 0.114 0.0395 ...
## $ weight.shell : num 0.15 0.07 0.21 0.155 0.055 0.12 0.33 0.26 0.165 0.32 ...
## $ rings : int 15 7 9 10 7 8 20 16 9 19 ...
Abalones can live up to 50 years, depending on a species. Using the “Count Poisson Model” because the main response variable is an integer. I will estimate the age of an abalone given its physical measurements.
A1<- zelig(rings ~ length+ diameter + height+ weight.viscera, model = "poisson", data = abalone, cite = F)
summary(A1)
## Model:
##
## Call:
## z5$zelig(formula = rings ~ length + diameter + height + weight.viscera,
## data = abalone)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -6.0273 -0.5395 -0.2162 0.3015 3.7872
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.35434 0.03834 35.326 < 2e-16
## length -0.64867 0.25329 -2.561 0.0104
## diameter 2.83027 0.30110 9.400 < 2e-16
## height 1.46653 0.13861 10.580 < 2e-16
## weight.viscera -0.54154 0.10474 -5.170 2.34e-07
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 4139.3 on 4176 degrees of freedom
## Residual deviance: 2527.3 on 4172 degrees of freedom
## AIC: 19672
##
## Number of Fisher Scoring iterations: 6
##
## Next step: Use 'setx' method
From the above model, a unit increase in diameter of 2.8, rings increase by 1.35. A unit increase in height will increase rings by 1.35.
A2<- zelig(rings ~ length+ diameter+ height*sex + weight.viscera, model = "poisson", data = abalone, cite = F)
summary(A2)
## Model:
##
## Call:
## z5$zelig(formula = rings ~ length + diameter + height * sex +
## weight.viscera, data = abalone)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.8059 -0.5135 -0.1631 0.2815 3.7913
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.89055 0.05631 33.575 < 2e-16
## length -0.89046 0.25664 -3.470 0.000521
## diameter 2.14766 0.30550 7.030 2.06e-12
## height 0.86099 0.18002 4.783 1.73e-06
## sexI -0.71028 0.05570 -12.751 < 2e-16
## sexM -0.23297 0.04915 -4.740 2.14e-06
## weight.viscera -0.35106 0.11252 -3.120 0.001809
## height:sexI 4.41398 0.40666 10.854 < 2e-16
## height:sexM 1.36387 0.30264 4.507 6.59e-06
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 4139.3 on 4176 degrees of freedom
## Residual deviance: 2325.5 on 4168 degrees of freedom
## AIC: 19478
##
## Number of Fisher Scoring iterations: 5
##
## Next step: Use 'setx' method
When interaction is introduced in this model between height and sex, a unit increase in height of 0.86 and diameter 2.14, rings increase by 1.89. An increase in the unit of height of an infant of 4.4, increases rings by 1.89, and for males a unit increase in height of 1.36 increase rings by 1.89.
x <- setx(A2, weight.viscera = mean(abalone$weight.viscera))
x1 <- setx(A2, weight.viscera = mean(abalone$weight.viscera) + sd(abalone$weight.viscera))
s <- sim(A2, x = x, x1 = x1)
summary(s)
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 10.11488 0.09049927 10.11368 9.937112 10.29371
## pv
## mean sd 50% 2.5% 97.5%
## [1,] 10.059 3.09889 10 4.975 16
##
## sim x1 :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 9.737407 0.1537298 9.732129 9.427616 10.04421
## pv
## mean sd 50% 2.5% 97.5%
## [1,] 9.76 3.035723 10 4 16
## fd
## mean sd 50% 2.5% 97.5%
## [1,] -0.3774751 0.1206811 -0.376636 -0.6070176 -0.1426891
An abalone with one ring has an average viscera weight of less than 0.39.
xl <- setx(A2, weight.viscera = quantile(abalone$weight.viscera, .25))
xl1 <- setx(A2, weight.viscera = quantile(abalone$weight.viscera, .25) + sd(abalone$weight.viscera))
sl <- sim(A2, x = xl, x1 = xl1)
fdl <- sl$get_qi(xvalue="x1", qi="fd")
xm <- setx(A2, weight.viscera = quantile(abalone$weight.viscera, .50))
xm1 <- setx(A2, weight.viscera = quantile(abalone$weight.viscera, .50) + sd(abalone$weight.viscera))
sm <- sim(A2, x = xm, x1 = xm1)
fdm <- sm$get_qi(xvalue="x1", qi="fd")
xh <- setx(A2, weight.viscera = quantile(abalone$weight.viscera, .75))
xh1 <- setx(A2, weight.viscera = quantile(abalone$weight.viscera, .75) + sd(abalone$weight.viscera))
sh <- sim(A2, x = xh, x1 = xh1)
fdh <- sh$get_qi(xvalue="x1", qi="fd")
d <- as.data.frame(cbind(fdl, fdm, fdh))
head(d)
The above table shows the average viscera weight per rings of an abalone in each quartile with. In the 25th quartile and 1 standard deviation plus, the average difference of the viscera weight of an abalone with one ring is less than 0.39, than the average mean weight in 50th quartile of less than 0.56. We can notice though, that as the rings increase, the average difference of viscera weight decreases.
tidd <- d %>%
gather(quantile, dif, 1:3)
head(tidd)
tidd %>%
group_by(quantile) %>%
summarise(mean = mean(dif), sd = sd(dif))
library(ggplot2)
ggplot(tidd, aes(dif)) + geom_histogram() + facet_grid(~ quantile)
On average, there is a higher difference in the viscera weight in the 75th quartile as compared to 25th and 50th quartiles. Likewise, we can see that the quartiles plus 1 standard deviation is more spread out in the 50th quartile.
From the above model and simulations, there is a correlations between the physical measurements and the rings of the abalone. As it is time consuming for farmers to look at the rings through a microscope and know the age, this model can be serve as a predictor evaluating mean viscera weight that corresponds to the number of rings. Further information, such as weather patterns and location (hence food availability) may be required to solve the problem.
Reference
Warwick J Nash, Tracy L Sellers, Simon R Talbot, Andrew J Cawthorn and Wes B Ford (1994) “The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait”, Sea Fisheries Division, Technical Report No. 48 (ISSN 1034-3288)