Mixophyes is a genus of frogs. In this activity we will explore the
mixo-simplified.csv dataset, which is available from the
course website in Blackboard. Download the CSV file to the same directory
as this Rmd file. We then set the working directory to the source file
location with the setwd command. Alternatively, set it manually from the
menu: Session -> Set Working Directory -> To Source File Location.
We also clear the environment and set the default theme for ggplot2:
library('ggplot2')

# Clear all objects from the environment
rm(list = ls())

# Set the working directory to the folder of this file (requires RStudio)
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))

# Default theme and discrete colour palettes for all plots
theme_set(theme_bw())
scale_colour_brewer_stat6020 <- function(...) {
  scale_colour_brewer(palette = "Dark2", ...)
}
scale_fill_brewer_stat6020 <- function(...) {
  scale_fill_brewer(palette = "Dark2", ...)
}
options(
  ggplot2.discrete.colour = scale_colour_brewer_stat6020,
  ggplot2.discrete.fill = scale_fill_brewer_stat6020
)
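The setwd() call above relies on RStudio, so it fails when the script is run elsewhere or when the document has not been saved yet. A minimal sketch of a more defensive variant follows; the helper name set_wd_to_source is hypothetical and not part of any package, and it only uses rstudioapi::isAvailable() and rstudioapi::getActiveDocumentContext().

```r
# Hypothetical helper: set the working directory to the folder of the
# document open in RStudio, and do nothing when that is not possible.
set_wd_to_source <- function() {
  # rstudioapi may be missing, or R may be running outside RStudio
  if (!requireNamespace("rstudioapi", quietly = TRUE) ||
      !rstudioapi::isAvailable()) {
    return(invisible(FALSE))
  }
  path <- rstudioapi::getActiveDocumentContext()$path
  # An unsaved document has an empty path
  if (!nzchar(path)) {
    return(invisible(FALSE))
  }
  setwd(dirname(path))
  invisible(TRUE)
}

set_wd_to_source()
```

The function returns TRUE (invisibly) only when the working directory was actually changed, so the menu option remains the fallback in the other cases.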
1.1. Read the data in:
mixo <- read.csv('mixo-simplified.csv')
1.2. Show a summary:
summary(mixo)
##      Gender            Recap             Mass            SVL
## Length:1312       Length:1312       Min.   :  4.00  Min.   : 28.60
## Class :character  Class :character  1st Qu.: 28.00  1st Qu.: 63.60
## Mode  :character  Mode  :character  Median : 59.00  Median : 79.70
##                                     Mean   : 62.97  Mean   : 76.38
##                                     3rd Qu.: 71.00  3rd Qu.: 84.80
##                                     Max.   :220.00  Max.   :115.00
##                                     NA's   :206     NA's   :206
##  Righ.Tibia     Head.Width     Head.Length    Survey.no
## Min.   :17.90  Min.   :11.00  Min.   :10.50  Min.   : 1.0
## 1st Qu.:42.65  1st Qu.:25.75  1st Qu.:22.00  1st Qu.: 7.0
## Median :53.60  Median :32.90  Median :29.50  Median :22.0
## Mean   :51.16  Mean   :31.74  Mean   :28.13  Mean   :17.9
## 3rd Qu.:56.15  3rd Qu.:35.00  3rd Qu.:32.40  3rd Qu.:27.0
## Max.   :74.80  Max.   :50.60  Max.   :46.30  Max.   :32.0
## NA's   :205    NA's   :205    NA's   :691    NA's   :1
Notice that there are many observations (rows) for which some of the
variables (columns) are missing; these are indicated as NA
values in R. These values may produce warnings in some of the
subsequent items of this activity, because some functions in R
automatically ignore these values or simply discard the entire
corresponding rows when performing the analyses.
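As a small illustration of the NA behaviour described above, the sketch below uses a made-up data frame (toy, which is not part of the course data) to show how missing values can be counted per column and how different functions treat them.

```r
# Toy data frame with missing values (hypothetical, for illustration only)
toy <- data.frame(
  Mass = c(10, NA, 30, 40),
  SVL  = c(35, 50, NA, 70)
)

# Number of missing values in each column
colSums(is.na(toy))

# mean() returns NA unless missing values are explicitly removed
mean(toy$Mass)
mean(toy$Mass, na.rm = TRUE)

# Rows with no missing values at all; by default lm() similarly drops
# rows that have missing values in the model variables
complete_rows <- toy[complete.cases(toy), ]
nrow(complete_rows)
```

Applied to the activity data, colSums(is.na(mixo)) should agree with the NA's counts that the summary reports for the numeric columns.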
SVL (Snout-Vent Length) is a measure of the length of the frog.
1.3. Produce a scatterplot of Mass as predicted by SVL, and add a linear model to the plot (see the example with the “Strength” dataset):
ggplot(mixo, aes(x = SVL, y = Mass)) +
  geom_point(size = 1) +
  geom_smooth(method = 'lm') +
  ggtitle('Mass by SVL')
## `geom_smooth()` using formula 'y ~ x'
There is clearly a non-linear relationship between these variables, so the linear model does not fit the data properly. If we change the method from ‘lm’ to ‘loess’, we can see how non-linear the relationship is:
ggplot(mixo, aes(x = SVL, y = Mass)) +
  geom_point(size = 1) +
  geom_smooth(method = 'loess') +
  ggtitle('Mass by SVL')
## `geom_smooth()` using formula 'y ~ x'
Thinking about frogs, perhaps the mass of the frog would be related to its volume, which would be related to the cube of its length:
\[\text{Mass} = \beta_1 \cdot \text{SVL}^3\]
where \(\beta_1\) is some positive constant. Recalling that:
\(\log(a^b) = b \cdot \log(a)\), and
\(\log(a \cdot b) = \log(a) + \log(b)\),
if we apply the log on both sides we have:
\[\log(\text{Mass}) = \log(\beta_1) + 3 \cdot \log(\text{SVL})\]
Since \(\log(\beta_1)\) is just a constant, a plot with both Mass and SVL log-transformed should produce an approximately linear relationship, with a slope of about 3 if the cubic assumption holds.
1.4. Try it and see for yourself:
ggplot(mixo, aes(x = SVL, y = Mass)) +
  geom_point(size = 2) +
  geom_smooth(method = 'lm') +
  scale_x_log10() +
  scale_y_log10() +
  ggtitle('Mass by SVL - log scale')
## `geom_smooth()` using formula 'y ~ x'
Yes, there is a very strong linear relationship!
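The visual impression can also be checked numerically. The sketch below is an illustration on simulated data rather than the course dataset: it generates masses that follow a cubic relationship with length, up to multiplicative noise and with arbitrary made-up constants, and then fits a linear model on the log scale. Under the cubic assumption, the estimated slope should come out close to 3.

```r
set.seed(1)

# Simulated lengths and masses (made-up constants, not estimates from mixo)
n     <- 200
beta1 <- 1e-4
svl   <- runif(n, min = 30, max = 110)
mass  <- beta1 * svl^3 * exp(rnorm(n, sd = 0.1))  # multiplicative noise

sim <- data.frame(SVL = svl, Mass = mass)

# Linear model on the log-log scale:
# log10(Mass) = log10(beta1) + 3 * log10(SVL)
fit <- lm(log10(Mass) ~ log10(SVL), data = sim)

# The intercept estimates log10(beta1); the slope estimates the exponent
coef(fit)
```

The same lm() call with data = mixo would fit the model to the frog data; by default lm() drops the rows in which Mass or SVL is missing.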
1.5. Modify the plot in item 1.4 so that the data points are colour
coded by Gender, but the linear model is kept the same (global, for the
whole data set). You can achieve this by mapping the colour
aesthetic to Gender locally, in the geom_point() function
only (the code below also maps shape to Gender):
ggplot(mixo, aes(x = SVL, y = Mass)) +
  geom_point(size = 1, aes(color = Gender, shape = Gender)) +
  geom_smooth(method = 'lm') +
  scale_x_log10() +
  scale_y_log10() +
  ggtitle('Mass by SVL - log scale')
## `geom_smooth()` using formula 'y ~ x'
1.6. Now try to produce and plot separate models for each Gender by
mapping the colour aesthetic to Gender again, but now
globally, in the ggplot() function:
ggplot(mixo, aes(x = SVL, y = Mass, color = Gender)) +
  geom_point() +
  geom_smooth(method = 'lm') +
  scale_x_log10() +
  scale_y_log10() +
  ggtitle('Mass by SVL - log scale')
## `geom_smooth()` using formula 'y ~ x'
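geom_smooth() draws one line per group but does not report the fitted coefficients. As a sketch of how the separate fits could be inspected, the helper below (fit_by_group is a hypothetical name, not part of ggplot2) splits a data frame by Gender and fits the log-log model within each group. It is demonstrated on a small simulated data frame with made-up group labels and constants.

```r
# Fit log10(Mass) ~ log10(SVL) separately within each level of Gender.
# Returns a matrix with one column per group (rows: intercept and slope).
fit_by_group <- function(data) {
  groups <- split(data, data$Gender)
  sapply(groups, function(d) coef(lm(log10(Mass) ~ log10(SVL), data = d)))
}

set.seed(2)

# Simulated example: two groups sharing the exponent 3 but with
# different constants (values are made up for illustration)
sim <- data.frame(
  Gender = rep(c("A", "B"), each = 100),
  SVL    = runif(200, min = 30, max = 110)
)
beta1 <- ifelse(sim$Gender == "A", 1e-4, 2e-4)
sim$Mass <- beta1 * sim$SVL^3 * exp(rnorm(200, sd = 0.1))

fit_by_group(sim)
```

Calling fit_by_group(mixo) would apply the same fits to the frog data, assuming the Gender, Mass and SVL columns shown in the summary. Any group without complete Mass and SVL records would need to be filtered out first, since lm() fails when no usable rows remain.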
1.7. Produce a scatterplot of the length of the right tibia by SVL, grouped by Gender, and fit a separate linear model to each group:
ggplot(mixo, aes(x = SVL, y = Righ.Tibia, color = Gender)) +
  geom_point() +
  geom_smooth(method = 'lm') +
  ggtitle('Right Tibia by SVL')
## `geom_smooth()` using formula 'y ~ x'