Formative 1 - Does Sex Explain Wingspan in Mosquitoes?

Author

Curtis (N1125263) & Sophie (N0842818)

Importing the Data

We converted the data from a text file to an Excel sheet and imported it into Quarto using the readxl R package.

library(readxl)
mosquito <- read_excel("mosquito.xlsx")
View(mosquito)
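
The Excel conversion is not strictly required, since R can also read a delimited text file directly. The sketch below assumes the original file is tab-delimited with a header row, which is not stated above. It writes a small temporary stand-in file with placeholder values (not the real mosquito data) so that it runs on its own; `mosquito_txt` is a hypothetical name.

```r
## Sketch only: the real text file's name and delimiter are assumptions,
## so a small tab-delimited stand-in file is written to a temporary location
tmp <- tempfile(fileext = ".txt")
writeLines(c("sex\twing",
             "Female\t46.4",
             "Male\t52.0"), tmp)

## read.delim() from base R reads tab-delimited text with a header row
mosquito_txt <- read.delim(tmp, stringsAsFactors = FALSE)
str(mosquito_txt) ## checking the column names and types
```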

Packages needed

## loading the required packages (tidyverse already attaches ggplot2 and dplyr)
library(tidyverse)
library(ggplot2)
library(dplyr)
library(hrbrthemes)
library(RColorBrewer)

Looking at the data

mosquito %>% ## using mosquito data
  group_by(sex) %>% ## grouping by sex
  summarise(wingmean = mean(wing), ## calculating mean wingspan for males and females
          wingmed = median(wing), ## calculating median wingspan for males and females
          wingmin = min(wing), ## calculating minimum wingspan for males and females
          wingmax = max(wing)) %>% ## calculating maximum wingspan for males and females
  ungroup() ## final ungrouping of the data so that subsequent operations are not affected
# A tibble: 2 × 5
  sex    wingmean wingmed wingmin wingmax
  <chr>     <dbl>   <dbl>   <dbl>   <dbl>
1 Female     47.2    46.4    25.2    69.8
2 Male       50.4    52.0    27.4    66.1

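The summary values are similar between the sexes. The males have a slightly higher mean (50.4 vs 47.2) and median (52.0 vs 46.4), while the females cover a wider range (25.2 to 69.8 vs 27.4 to 66.1).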

Making a Box plot

We created a Box Plot that shows the wingspan of the male mosquitoes vs the female mosquitoes.

ggplot(mosquito, aes(x = sex,
                     y = wing,
                     fill = sex)) + ## making a plot using the mosquito data with sex on the x axis and wingspan on the y axis
  scale_fill_brewer(palette = "Accent") + ## customising the colours (the fill above is mapped to sex)
  geom_boxplot(alpha = 0.8) + ## making a box plot with the alpha set to 0.8
  labs(x = "Sex", y = "Wingspan in mm", title = "The Wingspan of Mosquitoes Between Sexes") + ## changing the axis labels and the title
  theme_minimal() +
  theme(legend.position = "none", plot.background = element_rect(color = "#6080aa", fill = NA, linewidth = 1)) ## changing the theme, removing the legend and adding a border (linewidth replaces the deprecated size argument in ggplot2 3.4.0 and later)

From the Box Plot we can see a slight difference between the male and female wingspans: the females have a lower minimum and a higher maximum (a wider range), while the males have a higher median. Although the difference does not look significant, a statistical test would be needed to determine this with confidence.
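
One test that could be used here is a Welch two-sample t-test, sketched below. This is not an analysis we ran. The data frame `mosquito_demo` is simulated stand-in data with assumed group sizes and spreads so the example runs on its own. With the real data, the imported `mosquito` data frame would take its place.

```r
## Sketch only: simulated stand-in data, not the real mosquito measurements
set.seed(1)
mosquito_demo <- data.frame(
  sex  = rep(c("Female", "Male"), each = 50),
  wing = c(rnorm(50, mean = 47, sd = 10), rnorm(50, mean = 50, sd = 10))
)

## Welch two-sample t-test: compares the mean wingspan of the two sexes
## without assuming equal variances (the default behaviour of t.test)
wing_test <- t.test(wing ~ sex, data = mosquito_demo)
wing_test$p.value  ## small values suggest a difference in the means
wing_test$conf.int ## 95% confidence interval for the difference in means
```

If the wingspans were clearly not normally distributed, a rank-based alternative such as `wilcox.test(wing ~ sex, data = mosquito_demo)` could be considered instead.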

Making a Violin Plot

We created a Violin Plot that shows the wingspan of the male mosquitoes vs the female mosquitoes. There are also Box Plots within the Violins, and the sample size for each sex is shown beneath its label on the x-axis.

sample_size <- mosquito %>% group_by(sex) %>% summarise(num = n()) ## counting the sample size for each sex

mosquito %>% ## using the mosquito data
  left_join(sample_size, by = "sex") %>% ## joining the sample size onto each row by sex
  mutate(myaxis = paste0(sex, "\n", "n=", num)) %>% ## creates a new column called myaxis that combines the sex value with a newline and the string "n=" followed by the corresponding value from the num column
  ggplot(aes(x = myaxis, y = wing, fill = sex)) + ## making a plot with x being the myaxis column that we created and y being wingspan
    geom_violin(width = 1, alpha = 0.8) + ## making the violin plot
    geom_boxplot(width = 0.1, color = "black", alpha = 0.2) + ## making the box plot inside each violin
    scale_fill_brewer(palette = "Accent") + ## making the colours match the previous box plot
    labs(x = "Sex", y = "Wingspan in mm", title = "The Wingspan of Mosquitoes Between Sexes") + ## labelling the plot
    theme_minimal() +
    theme(legend.position = "none", plot.background = element_rect(color = "#6080aa", fill = NA, linewidth = 1)) ## changing the theme, removing the legend and adding a border (linewidth replaces the deprecated size argument in ggplot2 3.4.0 and later)

The density was higher at shorter wingspans for females, whereas it was higher at longer wingspans for males. The males have one peak (unimodal), whereas you could argue that the females have two peaks (bimodal). Although this data is fictional, if it were real and the females did have two peaks, it could suggest that two different species were measured by accident.
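
The number of peaks can also be checked numerically rather than judged by eye. The sketch below counts the local maxima of a kernel density estimate. The vector `wing_demo` is simulated stand-in data, deliberately drawn from two groups, and is not the real female wingspans.

```r
## Sketch only: simulated stand-in values, not the real female wingspans
set.seed(2)
wing_demo <- c(rnorm(60, mean = 40, sd = 4), rnorm(60, mean = 60, sd = 4))

## Kernel density estimate, the same kind of curve that a violin plot mirrors
dens <- density(wing_demo)

## A local maximum is where the slope changes from positive to negative
slope_sign <- sign(diff(dens$y))
peaks <- which(diff(slope_sign) == -2) + 1
dens$x[peaks] ## wingspan values at which the density peaks
length(peaks) ## number of peaks at the default bandwidth
```

The count depends on the smoothing bandwidth, so a second peak that disappears with slightly more smoothing should be treated with caution.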

Does Sex Explain Wingspan in Mosquitoes?

Based on these graphs, the difference in wingspan between males and females appears negligible: although the densities peaked in different places, the interquartile ranges and medians were similar, as the boxes overlapped in the box plots.
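
The overlap of the interquartile ranges can be put into numbers by extending the earlier summary with quartiles. This is a minimal sketch using simulated stand-in data (`mosquito_demo`, with assumed group sizes and spreads), and the column names `q1`, `q3` and `wingiqr` are our own. With the real data, `mosquito` would be used instead.

```r
library(dplyr)

## Sketch only: simulated stand-in data, not the real mosquito measurements
set.seed(3)
mosquito_demo <- data.frame(
  sex  = rep(c("Female", "Male"), each = 50),
  wing = c(rnorm(50, mean = 47, sd = 10), rnorm(50, mean = 50, sd = 10))
)

spread_by_sex <- mosquito_demo %>%
  group_by(sex) %>% ## grouping by sex
  summarise(q1 = quantile(wing, 0.25), ## lower quartile (bottom of the box)
            wingmed = median(wing),    ## median (line inside the box)
            q3 = quantile(wing, 0.75), ## upper quartile (top of the box)
            wingiqr = IQR(wing)) %>%   ## interquartile range (height of the box)
  ungroup()
spread_by_sex
```

If one sex's `q1` to `q3` interval contains the other's median, the boxes overlap in the way described above.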