For some ideas, see appsilon.com
You can find a copy of this notebook online at RPubs.com
We have two separate .csv files; working with a single
data frame will be easier.
Explore the two spreadsheets and find out how to “join” them:
data.1975 <- read.csv(file.path('data', 'finch_beaks_1975.csv'))
data.2012 <- read.csv(file.path('data', 'finch_beaks_2012.csv'))
Look at the structure of the two data frames and identify the similarities:
str(data.1975)
## 'data.frame': 403 obs. of 4 variables:
## $ band : int 2 9 12 15 305 307 308 309 311 312 ...
## $ species : chr "fortis" "fortis" "fortis" "fortis" ...
## $ Beak.length..mm: num 9.4 9.2 9.5 9.5 11.5 11.1 9.9 11.5 10.8 11.3 ...
## $ Beak.depth..mm : num 8 8.3 7.5 8 9.9 8.6 8.4 9.8 9.2 9 ...
data.1975 |> head()
## band species Beak.length..mm Beak.depth..mm
## 1 2 fortis 9.4 8.0
## 2 9 fortis 9.2 8.3
## 3 12 fortis 9.5 7.5
## 4 15 fortis 9.5 8.0
## 5 305 fortis 11.5 9.9
## 6 307 fortis 11.1 8.6
str(data.2012)
## 'data.frame': 248 obs. of 4 variables:
## $ band : int 19022 19028 19032 19041 19044 19048 19072 19082 19104 19114 ...
## $ species: chr "fortis" "fortis" "fortis" "fortis" ...
## $ blength: num 10 12.5 9.3 10.3 11 10.1 9.6 10.9 10.3 9.8 ...
## $ bdepth : num 8.5 8.9 7.5 9.6 9.2 8.2 7.8 8.6 8.4 7.7 ...
data.2012 |> head()
## band species blength bdepth
## 1 19022 fortis 10.0 8.5
## 2 19028 fortis 12.5 8.9
## 3 19032 fortis 9.3 7.5
## 4 19041 fortis 10.3 9.6
## 5 19044 fortis 11.0 9.2
## 6 19048 fortis 10.1 8.2
The two data frames differ only in their column names; the variables themselves are the same, so we can append one data frame to the other. Remember to add a new column first, so that the year of each measurement is preserved.
names.ok <- c('band', 'species', 'Beak.length', 'Beak.depth')
names(data.1975) <- names.ok
names(data.2012) <- names.ok
data.1975$year <- 1975
data.2012$year <- 2012
data <- rbind(data.1975, data.2012)
rm(data.1975, data.2012)
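Renaming by position, as above, relies on both files listing their columns in the same order. A minimal sketch of a name-based alternative follows; the two toy data frames, the `rename.map` lookup and the `harmonise()` helper are hypothetical, built only to make the example runnable without the real files.

```r
# Toy stand-ins for the two files (hypothetical values, same column layout)
toy.1975 <- data.frame(band = c(2, 9), species = c("fortis", "scandens"),
                       Beak.length..mm = c(9.4, 13.9), Beak.depth..mm = c(8.0, 8.4))
toy.2012 <- data.frame(band = c(19022, 19028), species = c("fortis", "scandens"),
                       blength = c(10.0, 12.5), bdepth = c(8.5, 8.9))

# Hypothetical lookup: original column name -> harmonised name
rename.map <- c(Beak.length..mm = "Beak.length", Beak.depth..mm = "Beak.depth",
                blength = "Beak.length", bdepth = "Beak.depth")

# Rename only the columns listed in rename.map, then tag rows with the year
harmonise <- function(df, year) {
  hit <- names(df) %in% names(rename.map)
  names(df)[hit] <- rename.map[names(df)[hit]]
  df$year <- year
  df
}

toy <- rbind(harmonise(toy.1975, 1975), harmonise(toy.2012, 2012))
str(toy)
table(toy$species, toy$year)   # rows per species and year
```

The final `table()` call is a quick sanity check that no rows were lost and that every row carries a year.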
Identify a plausible probability distribution (are the data “normal”?) and check for outliers…
ggplot(data, aes(x=Beak.length, fill=species)) + geom_histogram() + facet_wrap(.~year)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(data, aes(x=Beak.length, fill=species)) + geom_density(alpha=0.5) + facet_wrap(.~year)
ggplot(data, aes(x=Beak.length, fill=as.factor(year))) + geom_density(alpha=0.5) + facet_wrap(~species, nrow=2)
The data look reasonably Gaussian… what about Beak.depth?
ggplot(data, aes(x=Beak.depth, fill=species)) + geom_histogram() + facet_wrap(.~year)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(data, aes(x=Beak.depth, fill=species)) + geom_density(alpha=0.5) + facet_wrap(.~year)
ggplot(data, aes(x=Beak.depth, fill=as.factor(year))) + geom_density(alpha=0.5) + facet_wrap(~species, nrow=2)
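The density plots give a visual impression only. As a rough numeric complement, one could look at a normal Q-Q plot, a Shapiro-Wilk test and the boxplot outlier rule. The sketch below uses a synthetic sample `x` (an assumption, not the finch data); with the real data it would be one variable for one species and one year.

```r
set.seed(1)  # reproducible synthetic sample
# Synthetic beak lengths (hypothetical values standing in for one group)
x <- rnorm(80, mean = 10.5, sd = 0.7)

# Normal Q-Q plot: points close to the line suggest approximate normality
qqnorm(x); qqline(x)

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(x)
sw$p.value

# Points lying more than 1.5 times the interquartile range beyond the box hinges
out <- boxplot.stats(x)$out
length(out)
```

Testing each species and year separately matters here: a mixture of two roughly normal groups can look clearly non-normal when pooled.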
How many species do we have?
We have 2 species.
Possibly the labels fortis and scandens
refer to Geospiza fortis and G. scandens.
[Images: Geospiza fortis and Geospiza scandens]
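The species count can be read directly from the `species` column. A minimal sketch, using a small hypothetical data frame `toy` in place of the combined data:

```r
# Toy stand-in for the combined data frame (hypothetical rows)
toy <- data.frame(species = c("fortis", "fortis", "scandens", "fortis", "scandens"),
                  year    = c(1975, 1975, 1975, 2012, 2012))

unique(toy$species)            # the distinct labels
length(unique(toy$species))    # how many there are
table(toy$species, toy$year)   # sample size per species and year
```

The cross-tabulation also shows whether the groups are balanced, which is worth knowing before comparing means.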
Are beak dimensions correlated?
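One way to approach this question is a Pearson correlation between length and depth, computed within each species so that the difference between species does not inflate the estimate. The sketch below runs on synthetic measurements (hypothetical values, with depth deliberately built to depend on length), not on the finch data.

```r
set.seed(42)
# Synthetic measurements (hypothetical): depth is constructed from length plus noise
n <- 60
toy <- data.frame(species = rep(c("fortis", "scandens"), each = n / 2))
toy$Beak.length <- rnorm(n, mean = ifelse(toy$species == "fortis", 10.5, 14), sd = 0.7)
toy$Beak.depth  <- 0.6 * toy$Beak.length + rnorm(n, sd = 0.4)

# Pearson correlation within each species, with its 95% confidence interval
by.species <- lapply(split(toy, toy$species), function(d) {
  ct <- cor.test(d$Beak.length, d$Beak.depth)
  c(r = unname(ct$estimate), lower = ct$conf.int[1], upper = ct$conf.int[2])
})
do.call(rbind, by.species)
```

With the real data, the same `split()` could be extended to species and year together.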
Is there variation in beak dimensions between years and species?
# geom_beeswarm() and geom_quasirandom() come from the ggbeeswarm package
# classic box-and-whisker
ggplot(data, aes(x=species, y=Beak.length, colour=species)) + geom_boxplot() + facet_wrap(~year, ncol=2, nrow=1)
# bee swarm plot
ggplot(data, aes(x=species, y=Beak.length, colour=species)) + geom_beeswarm() + facet_wrap(~year, ncol=2, nrow=1)
# quasirandom plot: points are offset so they do not overlap (method="tukey" gives Tukey texturing)
ggplot(data, aes(x=species, y=Beak.length, colour=species)) + geom_quasirandom() + facet_wrap(~year, ncol=2, nrow=1)
What about Beak.depth?
# classic box-and-whisker
ggplot(data, aes(x=species, y=Beak.depth, colour=species)) + geom_boxplot() + facet_wrap(~year, ncol=2, nrow=1)
# bee swarm plot
ggplot(data, aes(x=species, y=Beak.depth, colour=species)) + geom_beeswarm() + facet_wrap(~year, ncol=2, nrow=1)
# quasirandom plot: points are offset so they do not overlap (method="tukey" gives Tukey texturing)
ggplot(data, aes(x=species, y=Beak.depth, colour=species)) + geom_quasirandom() + facet_wrap(~year, ncol=2, nrow=1)
Some hints:
- means can be compared easily: we have two groups, either by year or by species.
- does it make sense to keep the species together, or is it better to analyze each species separately?
- lm and confint are useful commands…
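The last hint can be sketched as follows: a two-group comparison written as a linear model, with `confint` giving the interval for the difference in means. The data frame `toy` is synthetic (hypothetical depths for one species in two years), so the numbers say nothing about the real finches.

```r
set.seed(7)
# Synthetic beak depths for one species in two years (hypothetical values)
toy <- data.frame(year = factor(rep(c(1975, 2012), each = 40)))
toy$Beak.depth <- rnorm(80, mean = ifelse(toy$year == "2012", 9.2, 8.9), sd = 0.6)

# Two-group comparison as a linear model: the 'year2012' coefficient is
# the difference in mean depth between 2012 and 1975
fit <- lm(Beak.depth ~ year, data = toy)
coef(fit)
confint(fit)   # 95% confidence intervals; does the 'year2012' interval exclude 0?

# Welch t-test on the same two groups, for comparison
t.test(Beak.depth ~ year, data = toy)
```

Fitting one model per species keeps the interpretation simple; a single model such as `Beak.depth ~ year * species` is an alternative if the interaction itself is of interest.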