Formative 1

Author

Sumaya Nur


In our group we chose the mosquito data set. It contains one categorical variable, sex, and one numerical variable, wing length (the wing column). This suggests a natural question to ask of the data: does wing length differ between the sexes? We decided that a boxplot or a histogram would be the best way to show this, and I have chosen a boxplot.
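A histogram was the other option considered. The sketch below shows how the same comparison might look as one histogram per sex. It assumes a data frame with a numeric wing column and a categorical sex column. The helper name plot_wing_histogram and the default of 20 bins are illustrative choices and are not part of the original analysis.

```r
library(ggplot2)

# Hypothetical helper (not from the original analysis):
# draws one histogram panel of wing length per sex.
# Assumes `df` has a numeric `wing` column and a categorical `sex` column.
plot_wing_histogram <- function(df, bins = 20) {
  ggplot(df, aes(x = wing, fill = sex)) +
    geom_histogram(bins = bins, colour = "white") +
    # Stack the panels in one column so they share the wing-length axis
    facet_wrap(~ sex, ncol = 1) +
    labs(title = "Wing length distribution in mosquitos by sex",
         x = "Wing length", y = "Count")
}
```

Once the data has been read in, calling plot_wing_histogram(data) would draw the two distributions on a shared axis. This makes differences in shape easier to see than in a boxplot, although the boxplot summarises the median and spread more compactly.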

Reading in the data

Below is the code used to read in the data.

# Load the core tidyverse packages (including dplyr, readr and ggplot2)
library(tidyverse)
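# Read the whitespace-delimited file; the first row holds the column names
data <- read.table(file = "mosquitos.txt", header = TRUE)
# Preview the first six rows
head(data)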
  ID     wing sex
1  1 37.83925   f
2  2 50.63106   f
3  3 39.25539   f
4  4 38.05383   f
5  5 25.15835   f
6  6 57.95632   f
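head() only shows the first six rows, and all of them are female. Before plotting, it can therefore help to confirm how many mosquitoes of each sex there are and whether any wing values are missing. Below is a minimal sketch using dplyr. The helper name check_wing_data is hypothetical.

```r
library(dplyr)

# Hypothetical helper: number of observations and of missing wing values per sex.
# Assumes `df` has `wing` and `sex` columns, as in the data read in above.
check_wing_data <- function(df) {
  df %>%
    group_by(sex) %>%
    summarise(
      n         = n(),                 # rows for this sex
      n_missing = sum(is.na(wing)),    # missing wing measurements
      .groups   = "drop"
    )
}
```

Calling check_wing_data(data) would return one row per sex. An uneven split or missing values would be worth mentioning alongside the plot.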

Visualising the Data

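Below is a boxplot showing the distribution of wing length for each sex, made with ggplot2 (part of the tidyverse). Each box shows the median and interquartile range, and the white star-shaped point marks the mean wing length for that sex.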

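# ggplot2 is loaded as part of the tidyverse, so a separate library(ggplot2) call is not needed
library(tidyverse)

mosquitos <- data

# Boxplot of wing length for each sex, with the mean overlaid as a white star
ggplot(mosquitos, aes(x = sex, y = wing, fill = sex)) +
  geom_boxplot() +
  stat_summary(fun = "mean", geom = "point", shape = 8,
               size = 2, color = "white") +
  labs(title = "Wing length distribution in mosquitos by sex")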
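The boxplot can be paired with the numbers it draws. The sketch below computes the following for each sex: the first quartile, median and third quartile (the box) and the mean (the white star). The helper name summarise_wing is hypothetical. It assumes that quantile() with its default method matches how geom_boxplot() computes the box, which is true of ggplot2's stat_boxplot().

```r
library(dplyr)

# Hypothetical helper: the statistics shown by the boxplot, plus the mean, per sex.
# Assumes `df` has a numeric `wing` column and a categorical `sex` column.
summarise_wing <- function(df) {
  df %>%
    group_by(sex) %>%
    summarise(
      n       = n(),
      q1      = quantile(wing, 0.25, na.rm = TRUE),  # lower edge of the box
      median  = median(wing, na.rm = TRUE),          # line inside the box
      q3      = quantile(wing, 0.75, na.rm = TRUE),  # upper edge of the box
      mean    = mean(wing, na.rm = TRUE),            # the white star
      .groups = "drop"
    )
}
```

Calling summarise_wing(mosquitos) would give a small table that can be quoted alongside the plot when describing how wing length differs between females and males.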