R Markdown!

For some ideas, see appsilon.com

You can find a copy of this notebook online at RPubs.com

Get data

We have two separate .csv files; a single data frame will be easier to work with.

Explore the two spreadsheets and find out how to “join” them.

data.1975 <- read.csv(file.path('data', 'finch_beaks_1975.csv'))
data.2012 <- read.csv(file.path('data', 'finch_beaks_2012.csv'))

Look at the structure of the two data frames and identify similarities:

str(data.1975)
## 'data.frame':    403 obs. of  4 variables:
##  $ band           : int  2 9 12 15 305 307 308 309 311 312 ...
##  $ species        : chr  "fortis" "fortis" "fortis" "fortis" ...
##  $ Beak.length..mm: num  9.4 9.2 9.5 9.5 11.5 11.1 9.9 11.5 10.8 11.3 ...
##  $ Beak.depth..mm : num  8 8.3 7.5 8 9.9 8.6 8.4 9.8 9.2 9 ...
data.1975 |> head()
##   band species Beak.length..mm Beak.depth..mm
## 1    2  fortis             9.4            8.0
## 2    9  fortis             9.2            8.3
## 3   12  fortis             9.5            7.5
## 4   15  fortis             9.5            8.0
## 5  305  fortis            11.5            9.9
## 6  307  fortis            11.1            8.6
str(data.2012)
## 'data.frame':    248 obs. of  4 variables:
##  $ band   : int  19022 19028 19032 19041 19044 19048 19072 19082 19104 19114 ...
##  $ species: chr  "fortis" "fortis" "fortis" "fortis" ...
##  $ blength: num  10 12.5 9.3 10.3 11 10.1 9.6 10.9 10.3 9.8 ...
##  $ bdepth : num  8.5 8.9 7.5 9.6 9.2 8.2 7.8 8.6 8.4 7.7 ...
data.2012 |> head()
##    band species blength bdepth
## 1 19022  fortis    10.0    8.5
## 2 19028  fortis    12.5    8.9
## 3 19032  fortis     9.3    7.5
## 4 19041  fortis    10.3    9.6
## 5 19044  fortis    11.0    9.2
## 6 19048  fortis    10.1    8.2

The two data frames hold the same variables under different column names, so once the names are harmonised we can append one to the other. Remember to add a new column first, so that the year of each measurement is preserved.

names.ok <- c('band', 'species', 'Beak.length', 'Beak.depth')
names(data.1975) <- names.ok
names(data.2012) <- names.ok
data.1975$year <- 1975
data.2012$year <- 2012
data <- rbind(data.1975, data.2012)
rm(data.1975, data.2012)
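As a quick sanity check on the append step, the sketch below repeats it on a few rows copied from the head() output above and verifies that no rows are lost and that both years are present. The small data frames `d75`, `d12` and `toy` are illustrative stand-ins, not the full files.

```r
# Toy stand-ins built from the first rows printed above (not the full data)
d75 <- data.frame(band = c(2, 9, 12),
                  species = "fortis",
                  Beak.length..mm = c(9.4, 9.2, 9.5),
                  Beak.depth..mm = c(8.0, 8.3, 7.5))
d12 <- data.frame(band = c(19022, 19028, 19032),
                  species = "fortis",
                  blength = c(10.0, 12.5, 9.3),
                  bdepth = c(8.5, 8.9, 7.5))

# Harmonise column names, tag each table with its year, then append
names.ok <- c('band', 'species', 'Beak.length', 'Beak.depth')
names(d75) <- names.ok
names(d12) <- names.ok
d75$year <- 1975
d12$year <- 2012
toy <- rbind(d75, d12)

# No rows lost, and both years present
stopifnot(nrow(toy) == nrow(d75) + nrow(d12))
table(toy$year)
```

The same two checks can be run on the full `data` object; rbind() matches columns by name, which is why the names are harmonised first.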

Condition data

Identify a plausible probability distribution (are the data “normal”?) and check for outliers…

library(ggplot2)
ggplot(data, aes(x=Beak.length, fill=species)) + geom_histogram() + facet_wrap(.~year)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(data, aes(x=Beak.length, fill=species)) + geom_density(alpha=0.5) + facet_wrap(.~year)

ggplot(data, aes(x=Beak.length, fill=as.factor(year))) + geom_density(alpha=0.5) + facet_wrap(~species, nrow=2)

The data look reasonably Gaussian… what about Beak.depth?

ggplot(data, aes(x=Beak.depth, fill=species)) + geom_histogram() + facet_wrap(.~year)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(data, aes(x=Beak.depth, fill=species)) + geom_density(alpha=0.5) + facet_wrap(.~year)

ggplot(data, aes(x=Beak.depth, fill=as.factor(year))) + geom_density(alpha=0.5) + facet_wrap(~species, nrow=2)
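The histograms and density plots are a visual check; a numerical check can complement them. The sketch below is one possible approach, not part of the original analysis: it runs a Shapiro-Wilk test and flags candidate outliers with the boxplot rule. The vector `x` is synthetic and only stands in for one species/year group.

```r
# Synthetic stand-in for one species/year group (NOT the finch data)
set.seed(1)
x <- rnorm(100, mean = 10.5, sd = 0.7)

# Shapiro-Wilk test: a small p-value suggests a departure from normality
sw <- shapiro.test(x)
sw$p.value

# Values beyond the boxplot whiskers (1.5 * IQR rule) are candidate outliers
out <- boxplot.stats(x)$out
out
```

With the combined data frame, something like `tapply(data$Beak.length, list(data$species, data$year), function(v) shapiro.test(v)$p.value)` would give one p-value per group, and `qqnorm()` followed by `qqline()` gives a graphical equivalent.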

Explore data

How many species do we have?

We have 2 species.
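One way to arrive at this count is sketched below. The `toy` data frame is a made-up stand-in with the same `species` and `year` columns as the combined data frame; the same calls apply to `data`.

```r
# Toy stand-in with the same columns as the combined data frame (values are made up)
toy <- data.frame(species = c("fortis", "fortis", "scandens", "fortis", "scandens"),
                  year = c(1975, 1975, 1975, 2012, 2012))

# Distinct species labels, and how many there are
unique(toy$species)
n.species <- length(unique(toy$species))
n.species

# Number of birds per species and year
counts <- table(toy$species, toy$year)
counts
```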

Possibly the labels fortis and scandens refer to Geospiza fortis and G. scandens.

Geospiza fortis

Geospiza scandens

Are beak dimensions correlated?
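A possible starting point for this question is the Pearson correlation between length and depth. The sketch below uses only the six 1975 rows printed by head() earlier, stored in an illustrative data frame `toy`, so the numbers it produces are far too few for any real conclusion.

```r
# First six 1975 rows printed earlier; too few for a real conclusion
toy <- data.frame(Beak.length = c(9.4, 9.2, 9.5, 9.5, 11.5, 11.1),
                  Beak.depth  = c(8.0, 8.3, 7.5, 8.0, 9.9, 8.6))

# Pearson correlation coefficient between the two dimensions
r <- cor(toy$Beak.length, toy$Beak.depth)
r

# Test of H0: correlation = 0, with a confidence interval for r
ct <- cor.test(toy$Beak.length, toy$Beak.depth)
ct$conf.int
```

On the full data it is probably worth computing the correlation within each species and year, since pooling groups with different means can inflate or hide a relationship.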

Is there variation in beak dimensions between years and species?

# classic box-and-whisker
ggplot(data, aes(x=species, y=Beak.length, colour=species)) + geom_boxplot() + facet_wrap(~year, ncol=2, nrow=1)

# bee swarm plot (geom_beeswarm and geom_quasirandom come from the ggbeeswarm package)
library(ggbeeswarm)
ggplot(data, aes(x=species, y=Beak.length, colour=species)) + geom_beeswarm() + facet_wrap(~year, ncol=2, nrow=1)

# quasirandom jitter, so that points do not overlap (pass method = "tukey" for Tukey texturing)
ggplot(data, aes(x=species, y=Beak.length, colour=species)) + geom_quasirandom() + facet_wrap(~year, ncol=2, nrow=1)

What about Beak.depth?

# classic box-and-whisker
ggplot(data, aes(x=species, y=Beak.depth, colour=species)) + geom_boxplot() + facet_wrap(~year, ncol=2, nrow=1)

# bee swarm plot
ggplot(data, aes(x=species, y=Beak.depth, colour=species)) + geom_beeswarm() + facet_wrap(~year, ncol=2, nrow=1)

# quasirandom jitter, so that points do not overlap (pass method = "tukey" for Tukey texturing)
ggplot(data, aes(x=species, y=Beak.depth, colour=species)) + geom_quasirandom() + facet_wrap(~year, ncol=2, nrow=1)

Define questions

Find answers

Some hints:
- means can be compared easily: we have two groups, either by year or by species.
- does it make sense to keep the species together, or is it better to analyse each species separately?
- lm and confint are useful commands…
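The hints above can be sketched as follows, under the assumption that a single species is compared between the two years. The data frame `toy` is synthetic (the means and standard deviation are invented for illustration), so the output says nothing about the real finches.

```r
# Synthetic stand-in (NOT the finch data): one species measured in two years
set.seed(42)
toy <- data.frame(year = factor(rep(c(1975, 2012), each = 50)),
                  Beak.depth = c(rnorm(50, mean = 9.2, sd = 0.7),
                                 rnorm(50, mean = 8.6, sd = 0.7)))

# Welch two-sample t-test comparing mean beak depth between years
tt <- t.test(Beak.depth ~ year, data = toy)
tt$conf.int   # confidence interval for mean(1975) - mean(2012)

# The same comparison as a linear model: the 'year2012' coefficient
# is the estimated change relative to 1975
fit <- lm(Beak.depth ~ year, data = toy)
coef(fit)
ci <- confint(fit)
ci
```

The two intervals will differ slightly, because t.test() defaults to the Welch correction for unequal variances while lm() assumes a common variance. With the real data, running this once per species (or adding a species term to the model) is one way to approach the second hint.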