I did a brief exploration of the Abalone dataset as a way of learning about the ggplot2 package.
First, let's read the dataset into a data frame and add column names:
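# stringsAsFactors = TRUE keeps sex as a factor (the default became FALSE in R 4.0.0),
# so summary() below reports counts per sex.
abalone <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data",
                    header = FALSE, stringsAsFactors = TRUE)
names(abalone) <- c("sex", "length", "diameter", "height", "weight.whole", "weight.shucked",
                    "weight.viscera", "weight.shell", "rings")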
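As a side note, the column names can also be supplied in the same call that reads the file, via the col.names argument of read.csv. The sketch below is only an illustration: it parses a few inline rows written in the same comma-separated layout (treat the values as illustrative, not as a substitute for the real file), and the names raw, cols and sample_rows are made up for this example.

```r
# A few rows in the same layout as abalone.data (illustrative values only).
raw <- "M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7"

cols <- c("sex", "length", "diameter", "height", "weight.whole", "weight.shucked",
          "weight.viscera", "weight.shell", "rings")

# text = parses the string instead of a file; col.names sets the names while reading.
sample_rows <- read.csv(text = raw, header = FALSE, col.names = cols,
                        stringsAsFactors = TRUE)
str(sample_rows)
```

Swapping text = raw for the URL used above should give the same result as the two-step version, just in one call.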
Let's take a quick look at the data:
summary(abalone)
##  sex          length         diameter         height       weight.whole
##  F:1307   Min.   :0.075   Min.   :0.055   Min.   :0.000   Min.   :0.002
##  I:1342   1st Qu.:0.450   1st Qu.:0.350   1st Qu.:0.115   1st Qu.:0.442
##  M:1528   Median :0.545   Median :0.425   Median :0.140   Median :0.799
##           Mean   :0.524   Mean   :0.408   Mean   :0.140   Mean   :0.829
##           3rd Qu.:0.615   3rd Qu.:0.480   3rd Qu.:0.165   3rd Qu.:1.153
##           Max.   :0.815   Max.   :0.650   Max.   :1.130   Max.   :2.825
##  weight.shucked  weight.viscera    weight.shell         rings
##  Min.   :0.001   Min.   :0.0005   Min.   :0.0015   Min.   : 1.00
##  1st Qu.:0.186   1st Qu.:0.0935   1st Qu.:0.1300   1st Qu.: 8.00
##  Median :0.336   Median :0.1710   Median :0.2340   Median : 9.00
##  Mean   :0.359   Mean   :0.1806   Mean   :0.2388   Mean   : 9.93
##  3rd Qu.:0.502   3rd Qu.:0.2530   3rd Qu.:0.3290   3rd Qu.:11.00
##  Max.   :1.488   Max.   :0.7600   Max.   :1.0050   Max.   :29.00
Note that in the sex column, I stands for infant (F and M are female and male).
Let's briefly explore rings using a density plot, and see how sex affects rings:
library(ggplot2)
ggplot(abalone) + aes(rings, color = sex) + geom_density()
That's interesting: the density curves for females and males are almost identical. (And just by chance, red was mapped to Female, blue to Male, and green to Infant… how funny!)
Okay, let's try stacking those instead:
ggplot(abalone) + aes(rings, fill = sex) + geom_density(position = "stack")
I think the first plot was more informative.
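A third option that sits between the two plots above is to overlay filled, semi-transparent densities, so each curve keeps its own baseline while the overlap stays visible. This is only a sketch: the demo data frame is randomly generated stand-in data (not the Abalone measurements), and alpha = 0.3 is an arbitrary choice.

```r
library(ggplot2)

# Made-up stand-in data: three groups of ring counts (NOT the real Abalone data).
set.seed(42)
demo <- data.frame(
  sex   = rep(c("F", "I", "M"), each = 100),
  rings = c(rpois(100, 11), rpois(100, 8), rpois(100, 11))
)

# Map sex to both outline and fill; alpha makes the overlapping areas see-through.
p <- ggplot(demo) + aes(rings, color = sex, fill = sex) + geom_density(alpha = 0.3)
print(p)
```

With the real data, replacing demo with abalone should be all that is needed.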
Let's move on and look at the relationship between length and rings, and how it is affected by sex. This plot uses several ggplot2 options (custom axis and legend titles, a legend placed inside the plot area, and readable legend labels) in order to look more polished:
ggplot(abalone) + aes(length, rings, color = sex) + geom_point() +
  labs(x = "Shell Length", y = "Number of Rings", title = "Number of Rings vs Length",
       color = "Sex of Abalone") +
  theme(legend.position = c(0, 1), legend.justification = c(0, 1),
        legend.background = element_rect(fill = "white", color = "black")) +
  scale_color_hue(labels = c("Female", "Infant", "Male"))
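One caveat on the legend placement: as far as I know, ggplot2 3.5.0 and later deprecate passing a numeric vector to legend.position, in favour of legend.position = "inside" together with legend.position.inside. The sketch below picks whichever form matches the installed version. The demo data frame is made-up stand-in data, and legend_theme is just a name chosen for this example.

```r
library(ggplot2)

# Made-up stand-in data (NOT the real Abalone data).
set.seed(1)
demo <- data.frame(
  length = runif(60, 0.1, 0.8),
  sex    = rep(c("F", "I", "M"), each = 20)
)
demo$rings <- round(5 + 10 * demo$length + rnorm(60))

# Choose the legend syntax that fits the installed ggplot2 version.
legend_theme <- if (packageVersion("ggplot2") >= "3.5.0") {
  theme(legend.position = "inside", legend.position.inside = c(0, 1),
        legend.justification = c(0, 1))
} else {
  theme(legend.position = c(0, 1), legend.justification = c(0, 1))
}

p <- ggplot(demo) + aes(length, rings, color = sex) + geom_point() + legend_theme
print(p)
```

Either branch anchors the legend's top-left corner to the top-left corner of the plotting area, which is what c(0, 1) does for both settings.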
You could also accomplish something similar by faceting, here with a linear fit added to each panel:
ggplot(abalone) + aes(length, rings, color = sex) + geom_point() +
  labs(x = "Shell Length", y = "Number of Rings", title = "Number of Rings vs Length") +
  facet_grid(. ~ sex, labeller = label_both) +
  stat_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
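The straight lines that stat_smooth(method = "lm") draws in each panel are ordinary least-squares fits of rings on length, computed separately for each sex. If you want the numbers behind the lines, a rough base-R equivalent is to split the data by sex and call lm on each piece. The data below is made up and exactly linear, so the recovered slopes are easy to check; demo, slopes and fits are names invented for this sketch.

```r
# Made-up, exactly linear stand-in data (NOT the real Abalone data).
demo <- data.frame(
  sex    = rep(c("F", "I", "M"), each = 4),
  length = rep(c(0.2, 0.4, 0.6, 0.8), times = 3)
)
slopes <- c(F = 10, I = 15, M = 12)   # hypothetical slope for each group
demo$rings <- unname(3 + slopes[demo$sex] * demo$length)

# One least-squares fit per sex: rows are coefficients, columns are groups.
fits <- sapply(split(demo, demo$sex),
               function(d) coef(lm(rings ~ length, data = d)))
print(fits)
```

Running the same sapply call on abalone should give the intercept and slope of each panel's line.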