Formative Assessment - Data Exploration

Author

Tamara Walker

What can wing length indicate about the sexual dimorphism of mosquitos?

Sexual dimorphism varies greatly among species; in insects, however, there are many instances where females are larger than males. In a study of mosquitos, this suggests the hypothesis that the average wing length of females is longer than that of males.

Preliminary steps

  1. Open and save a new Quarto workbook.

  2. Set the working directory to the same folder as the Quarto workbook.

  3. Create a code chunk and load the packages needed for the analysis.

library(tidyverse)
library(dplyr)
library(ggplot2)
library(ggthemes)
library(knitr)
library(rlang)
library(modeldata)
library(readr)
library(magrittr)

Preparing the data

  1. Download the mosquitos.txt file and save it to the same folder as the Quarto workbook.
mosquitos <- read_table("mosquitos.txt") # Imports the data from the mosquitos.txt file into the Quarto project.
  2. Produce a table to view the data.
data.frame(mosquitos) # Produces a basic table displaying all data within the imported file.
     ID     wing sex
1     1 37.83925   f
2     2 50.63106   f
3     3 39.25539   f
4     4 38.05383   f
5     5 25.15835   f
6     6 57.95632   f
7     7 46.58526   f
8     8 33.16252   f
9     9 55.30522   f
10   10 46.47553   f
11   11 36.62714   f
12   12 39.21257   f
13   13 53.72027   f
14   14 34.33493   f
15   15 45.71824   f
16   16 38.15299   f
17   17 45.41749   f
18   18 61.47701   f
19   19 37.85110   f
20   20 39.55613   f
21   21 38.85710   f
22   22 57.73284   f
23   23 40.85127   f
24   24 39.10695   f
25   25 31.49612   f
26   26 53.24897   f
27   27 46.28474   f
28   28 46.87615   f
29   29 35.30032   f
30   30 58.32131   f
31   31 68.51993   f
32   32 41.96362   f
33   33 48.09749   f
34   34 46.32078   f
35   35 59.65020   f
36   36 41.61149   f
37   37 43.71475   f
38   38 66.20846   f
39   39 58.56829   f
40   40 53.25179   f
41   41 58.73036   f
42   42 50.30130   f
43   43 44.00815   f
44   44 69.81825   f
45   45 40.37582   f
46   46 51.36792   f
47   47 50.03187   f
48   48 60.32544   f
49   49 47.29252   f
50   50 48.14393   f
51   51 52.88405   m
52   52 36.61598   m
53   53 52.97575   m
54   54 57.29169   m
55   55 43.53158   m
56   56 66.13355   m
57   57 54.02540   m
58   58 44.79380   m
59   59 53.13227   m
60   60 59.76525   m
61   61 54.26859   m
62   62 65.90059   m
63   63 47.65107   m
64   64 52.27643   m
65   65 55.55607   m
66   66 48.69044   m
67   67 47.87074   m
68   68 44.54322   m
69   69 42.89192   m
70   70 44.23855   m
71   71 40.36747   m
72   72 56.32602   m
73   73 27.42760   m
74   74 49.28878   m
75   75 56.21204   m
76   76 62.30341   m
77   77 57.09814   m
78   78 43.18794   m
79   79 52.29912   m
80   80 60.76773   m
81   81 40.01366   m
82   82 50.30523   m
83   83 62.00337   m
84   84 36.34514   m
85   85 64.56280   m
86   86 61.91520   m
87   87 47.68147   m
88   88 56.54568   m
89   89 54.80366   m
90   90 33.98422   m
91   91 58.64392   m
92   92 51.58475   m
93   93 29.50289   m
94   94 36.46946   m
95   95 50.31745   m
96   96 51.68598   m
97   97 58.89978   m
98   98 42.22055   m
99   99 47.22805   m
100 100 53.92501   m

Analysing the data

The table displays 100 rows of data. To begin analysis it will be helpful to summarise some of this information.

  1. To check that the sample is balanced, compare the number of females surveyed with the number of males.
mosquitos %>% # Calls on the data set mosquitos
  group_by(sex) %>% # Groups the information by sex
  summarize(n = n()) # Summarises the data by sex and presents the information in a table.
# A tibble: 2 × 2
  sex       n
  <chr> <int>
1 f        50
2 m        50

This summary shows that the sample contains equal numbers of males and females (50 each), so the two groups are balanced.

  2. Summarising wing length for each sex using several measures gives a first indication of any differences.
mosquitos %>% # Calls on the data set mosquitos.
  group_by(sex) %>% # Groups by sex.
  summarize(mean_wing = mean(wing, na.rm = TRUE)) # Calculates the mean wing length for each sex and displays in a table.
# A tibble: 2 × 2
  sex   mean_wing
  <chr>     <dbl>
1 f          47.2
2 m          50.4
mosquitos %>% # Calls on the data set mosquitos.
  group_by(sex) %>% # Groups by sex.
  summarize(median_wing = median(wing, na.rm = TRUE)) # Calculates the median wing length for each sex and displays in a table.
# A tibble: 2 × 2
  sex   median_wing
  <chr>       <dbl>
1 f            46.4
2 m            52.0
mosquitos %>% # Calls on the data set mosquitos.
  group_by(sex) %>% # Groups by sex.
  summarize(sd_wing = sd(wing, na.rm = TRUE)) # Calculates the standard deviation of wing length for each sex and displays in a table.
# A tibble: 2 × 2
  sex   sd_wing
  <chr>   <dbl>
1 f        9.99
2 m        9.19
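mosquitos %>% # Calls on the data set mosquitos.
  group_by(sex) %>% # Groups by sex.
  summarize(range_wing = range(wing, na.rm = TRUE)) # range() returns the minimum and maximum wing length, so the table has two rows per sex. In dplyr 1.1.0 and later, reframe() is the recommended verb for results with more than one row per group.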
# A tibble: 4 × 2
# Groups:   sex [2]
  sex   range_wing
  <chr>      <dbl>
1 f           25.2
2 f           69.8
3 m           27.4
4 m           66.1

The mean and median indicate that, on average, male wings are longer than female wings in this sample. However, the standard deviation shows that female wing lengths are more variable (9.99 compared with 9.19). The range is consistent with this: both the shortest (25.2) and the longest (69.8) wings were recorded in females.
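
The separate summaries above could also be collected in a single call. The following is a minimal sketch rather than the workflow used in this report: it runs on a six-row subset of the values shown in the table (the name `mosquitos_demo` is hypothetical), and reports the minimum and maximum as separate columns instead of calling range().

```r
library(dplyr)
library(tibble)

# Illustrative subset: first three females and first three males from the table.
# Swap in `mosquitos` to summarise the full data set.
mosquitos_demo <- tibble(
  wing = c(37.83925, 50.63106, 39.25539, 52.88405, 36.61598, 52.97575),
  sex  = c("f", "f", "f", "m", "m", "m")
)

wing_summary <- mosquitos_demo %>%
  group_by(sex) %>%                               # one row of output per sex
  summarise(
    n           = n(),                            # number of specimens
    mean_wing   = mean(wing, na.rm = TRUE),
    median_wing = median(wing, na.rm = TRUE),
    sd_wing     = sd(wing, na.rm = TRUE),
    min_wing    = min(wing, na.rm = TRUE),        # lower end of the range
    max_wing    = max(wing, na.rm = TRUE)         # upper end of the range
  )

wing_summary
```

Keeping every measure in one table makes the two sexes easier to compare side by side.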

It would be helpful to visualise the data in more detail. A boxplot would illustrate the varying measurements above and also show the dispersion of data.

Visualising the data

ggplot(data = mosquitos, aes(x = sex, y = wing, color = sex, fill = sex)) +
  geom_boxplot() + # Draws one box per sex showing the median, quartiles and outliers.
  geom_jitter() + # Overlays the individual observations, with random jitter so that points do not overlap.
  labs(x = "Sex", y = "Wing length (mm)", fill = "Sex", color = "Sex") # Labels the axes and legend.
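
One possible refinement of this plot, offered as a sketch and not as a correction: when the box fill and outline share a colour the median line can be hard to see, and outliers are drawn twice (once by the boxplot, once by the jitter layer). The version below drops the fill, hides the boxplot's own outlier points and jitters horizontally only. It uses a ten-row subset of the table (the name `mosquitos_demo` is hypothetical) so that it runs on its own.

```r
library(ggplot2)

# Illustrative subset: first five females and first five males from the table.
# Replace with `mosquitos` to plot the full data set.
mosquitos_demo <- data.frame(
  wing = c(37.83925, 50.63106, 39.25539, 38.05383, 25.15835,
           52.88405, 36.61598, 52.97575, 57.29169, 43.53158),
  sex  = rep(c("f", "m"), each = 5)
)

p <- ggplot(mosquitos_demo, aes(x = sex, y = wing, colour = sex)) +
  geom_boxplot(outlier.shape = NA) +        # hide boxplot outliers; the jittered points already show them
  geom_jitter(width = 0.15, height = 0) +   # spread points sideways only, leaving wing values unchanged
  labs(x = "Sex", y = "Wing length (mm)", colour = "Sex")

p
```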

The boxplot is consistent with the earlier observations; however, the two distributions overlap considerably and several points lie far from the medians. A statistical test is therefore needed to determine whether the difference between the sexes is significant.

A Welch two-sample t-test is suitable here because it compares the means of two independent groups without assuming that their variances are equal.
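
For reference, the statistic this test computes can be written as follows. The notation is introduced here for illustration only: $\bar{x}$, $s$ and $n$ are the sample mean, standard deviation and size for females ($f$) and males ($m$), and $\nu$ is the approximate degrees of freedom given by the Welch-Satterthwaite formula.

```latex
t = \frac{\bar{x}_f - \bar{x}_m}
         {\sqrt{\dfrac{s_f^2}{n_f} + \dfrac{s_m^2}{n_m}}},
\qquad
\nu \approx
\frac{\left(\dfrac{s_f^2}{n_f} + \dfrac{s_m^2}{n_m}\right)^{2}}
     {\dfrac{\left(s_f^2 / n_f\right)^{2}}{n_f - 1}
      + \dfrac{\left(s_m^2 / n_m\right)^{2}}{n_m - 1}}
```

Because each group contributes its own variance term, the degrees of freedom are generally not a whole number.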

t.test(wing ~ sex, data = mosquitos) # Produces a Welch Two Sample t-test.

    Welch Two Sample t-test

data:  wing by sex
t = -1.6686, df = 97.324, p-value = 0.09842
alternative hypothesis: true difference in means between group f and group m is not equal to 0
95 percent confidence interval:
 -7.0098735  0.6064862
sample estimates:
mean in group f mean in group m 
       47.17738        50.37907 
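
As a cross-check, the same quantities can be recomputed by hand from the summary statistics reported earlier. This is a sketch under the assumption that the rounded standard deviations (9.99 and 9.19) are accurate enough; the variable names are illustrative and small discrepancies from the t.test() output are expected because of that rounding.

```r
# Summary statistics reported earlier in this document (standard deviations are rounded).
mean_f <- 47.17738; mean_m <- 50.37907
sd_f   <- 9.99;     sd_m   <- 9.19
n_f    <- 50;       n_m    <- 50

# Variance of each group mean (squared standard error).
v_f <- sd_f^2 / n_f
v_m <- sd_m^2 / n_m

se     <- sqrt(v_f + v_m)                   # standard error of the difference in means
t_stat <- (mean_f - mean_m) / se            # Welch t statistic
df     <- (v_f + v_m)^2 /
  (v_f^2 / (n_f - 1) + v_m^2 / (n_m - 1))   # Welch-Satterthwaite degrees of freedom

p_value <- 2 * pt(-abs(t_stat), df)         # two-sided p-value
ci      <- (mean_f - mean_m) + c(-1, 1) * qt(0.975, df) * se  # 95% confidence interval

round(c(t = t_stat, df = df, p = p_value), 3)
round(ci, 2)
```

The results should agree closely with the t.test() output above, which helps confirm how the reported t value, degrees of freedom and confidence interval were obtained.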

Conclusion

Both the p-value (0.098, greater than 0.05) and the t statistic (|t| = 1.67, less than 2) indicate no significant difference in mean wing length between male and female mosquitos, and the 95% confidence interval for the difference (-7.01 to 0.61) includes zero. We therefore fail to reject the null hypothesis of equal means. The data do not support the initial expectation that females have longer wings; the sample means in fact point in the opposite direction. However, the wide range of values and the limited sample suggest that further investigation, taking into account additional factors such as the location or age of specimens, may be needed before drawing firmer conclusions.