Formative 1

Author

Group Project

Published

October 25, 2024

Formative assessment: mosquitoes

Overview

  • Working in groups, and using R, explore the dataset provided to determine the nature of the data and suggest what questions could be asked of it. The mosquito data provided contains just two variables of interest: sex and wingspan. Sex is a categorical, nominal variable and wingspan is a quantitative, continuous variable. Using several types of graph, together with a statistical test, we look for evidence of a relationship between a mosquito’s sex and its wingspan.

The questions

  1. Is there a significant difference in wing length between male and female mosquitoes?

  2. Is wingspan influenced by sex?

  3. Is there any difference in wingspan between females and males?

Data received

Variables: ID, Wing Size, Sex

The data is limited both in the number of observations and in the range of information recorded. However, it is still possible to use statistics to understand the variation in wing size across the whole population studied and by sex; the mean, median and standard deviation of this variable; and whether there is a statistically significant difference by sex. The data can also be displayed visually in histograms, box plots and density plots, both for the whole population and by sex.

Which tests can be used?

  1. T-test: a simple two-sample t-test can be used to compare the mean wing length of males and females and test for a statistically significant difference.

  2. Correlation: Pearson’s r or Spearman’s ρ (non-parametric) could be used to measure the strength and direction of the relationship between sex and wingspan. With one continuous and one binary variable, Pearson’s r is the point-biserial correlation.
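To illustrate how the two approaches relate, here is a minimal sketch on simulated data. The values are invented stand-ins, not the mosquito measurements, and the names `sim` and `sex01` are chosen only for this example. It codes sex as 0/1, computes the point-biserial correlation with `cor.test()`, and compares it with a pooled-variance t-test on the same data, which should give the same p-value.

```r
# Simulated stand-in data (NOT the real mosquito measurements)
set.seed(1)
sim <- data.frame(
  wing = c(rnorm(50, mean = 47, sd = 10), rnorm(50, mean = 50, sd = 10)),
  sex  = rep(c("f", "m"), each = 50)
)

# Code sex as 0 (female) / 1 (male) so it can enter a correlation
sim$sex01 <- as.integer(sim$sex == "m")

# Point-biserial correlation: Pearson's r with one binary variable
pb <- cor.test(sim$wing, sim$sex01, method = "pearson")

# Pooled-variance (Student) two-sample t-test on the same data
tt <- t.test(wing ~ sex, data = sim, var.equal = TRUE)

print(pb$estimate)
print(c(correlation_p = pb$p.value, t_test_p = tt$p.value))
```

Because the two tests are equivalent when equal variances are assumed, the correlation adds an effect-size style summary (r) rather than a separate line of evidence. Note that `t.test()` defaults to the Welch version, which does not assume equal variances and so can give a slightly different p-value.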

Graphs we can use?

  1. box-plot

  2. density-plot

  3. histogram

Now to the analysis and the graphs.

library(ggplot2)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
mosquito_data <- read.table("mosquitos.txt", sep = "\t", header = TRUE)

Mean, median and SD of wing size for the whole population

summary_stats <- mosquito_data %>%
  summarise(
    mean_wing = mean(wing, na.rm=TRUE), #Mean of wing sizes 
    median_wing = median(wing, na.rm = TRUE), #Median of wing sizes
    sd_wing = sd(wing,na.rm = TRUE) # Standard deviation of wing sizes
  )
print(summary_stats)
  mean_wing median_wing  sd_wing
1  48.77822    48.41719 9.680198
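The same statistics can also be broken down by sex. The sketch below shows one way to do this with `group_by()`. It runs on simulated stand-in data (the data frame `sim` is invented for this example and is not the mosquito dataset), so the numbers it prints are illustrative only.

```r
library(dplyr)

# Simulated stand-in data (NOT the real mosquito measurements)
set.seed(42)
sim <- data.frame(
  wing = c(rnorm(50, mean = 47, sd = 10), rnorm(50, mean = 50, sd = 10)),
  sex  = rep(c("f", "m"), each = 50)
)

# Same summary statistics as for the whole population, one row per sex
summary_by_sex <- sim %>%
  group_by(sex) %>%
  summarise(
    n           = n(),                        # Number of mosquitoes in the group
    mean_wing   = mean(wing, na.rm = TRUE),   # Mean wing size
    median_wing = median(wing, na.rm = TRUE), # Median wing size
    sd_wing     = sd(wing, na.rm = TRUE)      # Standard deviation of wing size
  )
print(summary_by_sex)
```

Replacing `sim` with `mosquito_data` should give the by-sex summary for the real data, assuming the columns are named `wing` and `sex` as in the file read above.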

Density Plot

# Inspect the data
str(mosquito_data)
'data.frame':   100 obs. of  3 variables:
 $ ID  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ wing: num  37.8 50.6 39.3 38.1 25.2 ...
 $ sex : chr  "f" "f" "f" "f" ...
head(mosquito_data)
  ID     wing sex
1  1 37.83925   f
2  2 50.63106   f
3  3 39.25539   f
4  4 38.05383   f
5  5 25.15835   f
6  6 57.95632   f
#Create a density plot
ggplot(mosquito_data, aes(x = wing , fill = sex)) +
  geom_density(alpha = 0.5) +
  labs(title = "Density Plot of Wing Lengths by Sex", x = "Wing Size", y = "Density") +
  theme_minimal()

Box Plot to see wing size distribution by sex

ggplot(mosquito_data,aes(x = sex, y = wing, fill = sex)) +
  geom_boxplot() +
  labs(title = "Boxplot of Mosquito Wing Sizes by Sex",
       x = "Sex",
       y = "Wing Size") +
  scale_fill_manual(values = c("lightblue", "lightcoral")) +
  theme_minimal() 

Histogram

ggplot(mosquito_data, aes(x = wing)) +
  geom_histogram(binwidth = 0.5, fill = "skyblue", color = "black") +
  labs(title = "Distribution of Mosquito Wing Sizes",
       x = "Wing Size",
       y = "Frequency") +
  theme_minimal()
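A histogram split by sex makes the two groups easier to compare. The sketch below uses `facet_wrap()` to draw one panel per sex on simulated stand-in data (the data frame `sim` is invented for this example). The wider `binwidth = 2` is an assumption made here: with about 100 observations, very narrow bins tend to leave most bars at a height of one or two.

```r
library(ggplot2)

# Simulated stand-in data (NOT the real mosquito measurements)
set.seed(42)
sim <- data.frame(
  wing = c(rnorm(50, mean = 47, sd = 10), rnorm(50, mean = 50, sd = 10)),
  sex  = rep(c("f", "m"), each = 50)
)

# One histogram panel per sex, stacked vertically so the x-axes line up
p <- ggplot(sim, aes(x = wing, fill = sex)) +
  geom_histogram(binwidth = 2, color = "black") +  # binwidth is an illustrative choice
  facet_wrap(~ sex, ncol = 1) +
  labs(title = "Distribution of Wing Sizes by Sex (simulated data)",
       x = "Wing Size",
       y = "Frequency") +
  theme_minimal()
print(p)
```

Stacking the panels in a single column keeps a shared x-axis, so a shift in the centre of one distribution relative to the other is easier to see.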

Using an independent two-sample t-test (Welch) to compare wing size by sex

t_test_result <- t.test(wing ~ sex, data = mosquito_data)
print(t_test_result)

    Welch Two Sample t-test

data:  wing by sex
t = -1.6686, df = 97.324, p-value = 0.09842
alternative hypothesis: true difference in means between group f and group m is not equal to 0
95 percent confidence interval:
 -7.0098735  0.6064862
sample estimates:
mean in group f mean in group m 
       47.17738        50.37907 
p_value <- t_test_result$p.value
print(paste("P-value:", p_value))
[1] "P-value: 0.0984171086062613"

The p-value is 0.098, which is greater than the conventional significance level of 0.05, so the result is not statistically significant: there is not enough evidence to conclude that mean wing size differs by sex. The sample means are 47.18 for females and 50.38 for males, and the 95% confidence interval for the difference (-7.01 to 0.61) includes zero. The analysis could be rerun on a much larger dataset to reassess.
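How much larger the dataset would need to be can be estimated with a power calculation. The sketch below uses `power.t.test()` from base R with the group means and standard deviation reported above. Two assumptions are made here and are not part of the original analysis: the whole-sample SD is used as a rough stand-in for the within-group SD, and 80% power is taken as the target. `power.t.test()` also assumes equal variances, unlike the Welch test used above, so the result is only a rough guide.

```r
# Values taken from the output reported above
mean_f <- 47.17738
mean_m <- 50.37907
sd_all <- 9.680198  # whole-sample SD, a rough stand-in for the within-group SD (assumption)

# Sample size per group needed to detect the observed difference
pw <- power.t.test(
  delta     = abs(mean_m - mean_f),  # observed difference in means
  sd        = sd_all,
  sig.level = 0.05,
  power     = 0.80,                  # conventional target (assumption)
  type      = "two.sample"
)
print(pw)

# Round up to a whole number of mosquitoes per group
n_per_group <- ceiling(pw$n)
print(n_per_group)
```

The observed difference may itself be over- or under-estimated in a sample of this size, so the figure is best read as an order of magnitude, not a precise target.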

Some further questions to be investigated…

  1. What specific differences in wingspan exist between male and female mosquitoes across various species?

  2. Does wing length affect the vulnerability of male and female mosquitoes to predators?

  3. How does wingspan in males compare to that of females in terms of variability and outliers?