Working in groups, and using R, explore the dataset provided to determine the nature of the data and suggest what questions could be asked of the data.
You will be expected to adapt R scripts given in class to carry out your proposed analysis. Please note you will not be asked or expected to conduct any analyses that you have not already covered in class.
Annotate the R script that you used to explore the data.
Submit the annotated R code (e.g. *.R). The R code should be fully reproducible: someone else with access to the raw data should be able to recreate the results by running the script without further modification. Ensure that the code includes:
Code to access the required packages.
Explanations of the tasks performed within the script. For example ‘ ##this step ……##’
The dataset used for analysis.
Questions
What are the mean, median, and standard deviation of wingspans for each sex? Is the distribution of wingspans normal or skewed?
What is the distribution of wingspan within each sex?
Is there a significant difference in mean wingspan among the groups?
How might climate change affect the distribution of mosquito wingspans and their potential impact on public health?
Is there a correlation between wingspan and other variables (e.g., body size, age, geographic location)?
library(tidyverse)
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(rlang)
Attaching package: 'rlang'
The following objects are masked from 'package:purrr':
%@%, flatten, flatten_chr, flatten_dbl, flatten_int, flatten_lgl,
flatten_raw, invoke, splice
mosquitos <- read.csv("mosquitos.txt", sep = "\t", header = TRUE) ## this step imports the dataset used for this project and stores it as `mosquitos`
mosquitos ## this step prints the dataset so it can be inspected
ID wing sex
1 1 37.83925 f
2 2 50.63106 f
3 3 39.25539 f
4 4 38.05383 f
5 5 25.15835 f
6 6 57.95632 f
7 7 46.58526 f
8 8 33.16252 f
9 9 55.30522 f
10 10 46.47553 f
11 11 36.62714 f
12 12 39.21257 f
13 13 53.72027 f
14 14 34.33493 f
15 15 45.71824 f
16 16 38.15299 f
17 17 45.41749 f
18 18 61.47701 f
19 19 37.85110 f
20 20 39.55613 f
21 21 38.85710 f
22 22 57.73284 f
23 23 40.85127 f
24 24 39.10695 f
25 25 31.49612 f
26 26 53.24897 f
27 27 46.28474 f
28 28 46.87615 f
29 29 35.30032 f
30 30 58.32131 f
31 31 68.51993 f
32 32 41.96362 f
33 33 48.09749 f
34 34 46.32078 f
35 35 59.65020 f
36 36 41.61149 f
37 37 43.71475 f
38 38 66.20846 f
39 39 58.56829 f
40 40 53.25179 f
41 41 58.73036 f
42 42 50.30130 f
43 43 44.00815 f
44 44 69.81825 f
45 45 40.37582 f
46 46 51.36792 f
47 47 50.03187 f
48 48 60.32544 f
49 49 47.29252 f
50 50 48.14393 f
51 51 52.88405 m
52 52 36.61598 m
53 53 52.97575 m
54 54 57.29169 m
55 55 43.53158 m
56 56 66.13355 m
57 57 54.02540 m
58 58 44.79380 m
59 59 53.13227 m
60 60 59.76525 m
61 61 54.26859 m
62 62 65.90059 m
63 63 47.65107 m
64 64 52.27643 m
65 65 55.55607 m
66 66 48.69044 m
67 67 47.87074 m
68 68 44.54322 m
69 69 42.89192 m
70 70 44.23855 m
71 71 40.36747 m
72 72 56.32602 m
73 73 27.42760 m
74 74 49.28878 m
75 75 56.21204 m
76 76 62.30341 m
77 77 57.09814 m
78 78 43.18794 m
79 79 52.29912 m
80 80 60.76773 m
81 81 40.01366 m
82 82 50.30523 m
83 83 62.00337 m
84 84 36.34514 m
85 85 64.56280 m
86 86 61.91520 m
87 87 47.68147 m
88 88 56.54568 m
89 89 54.80366 m
90 90 33.98422 m
91 91 58.64392 m
92 92 51.58475 m
93 93 29.50289 m
94 94 36.46946 m
95 95 50.31745 m
96 96 51.68598 m
97 97 58.89978 m
98 98 42.22055 m
99 99 47.22805 m
100 100 53.92501 m
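Before any analysis, it can help to confirm the structure of the imported data. The following is a minimal sketch only: `mosquitos_excerpt` is a hypothetical name for a six-row excerpt copied from the printout above, used so that the example runs on its own. In the real script the same calls would be applied to the full `mosquitos` data frame.

```r
# Excerpt of the data printed above (rows 1-3 and 51-53), so this sketch runs on its own.
# Assumption: in the real script, the full `mosquitos` data frame is used instead.
mosquitos_excerpt <- data.frame(
  ID   = c(1, 2, 3, 51, 52, 53),
  wing = c(37.83925, 50.63106, 39.25539, 52.88405, 36.61598, 52.97575),
  sex  = c("f", "f", "f", "m", "m", "m")
)

str(mosquitos_excerpt)                          # column names and types
sex_counts <- table(mosquitos_excerpt$sex)      # number of rows per sex
print(sex_counts)
n_missing <- colSums(is.na(mosquitos_excerpt))  # missing values per column
print(n_missing)
```

If the counts per sex are unequal or any column contains missing values, that is worth noting in the annotations before the summary statistics are calculated.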
Data analysis
The data comprise 100 mosquitoes, each identified by an ID, and list their wingspan (measured in mm) and sex (m/f); there are 50 females and 50 males. Because the sample is small, statistical tests are needed to determine whether an apparent difference between groups is significant. On that basis a hypothesis can be rejected or not rejected.
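One way to carry out the kind of significance check described above is sketched below. It is an illustration under assumptions, not necessarily the method covered in class: it uses a Shapiro-Wilk test to look at normality within each sex and a Welch two-sample t-test to compare mean wingspan. `compare_wing_by_sex` and `mosquitos_excerpt` are hypothetical names, and the excerpt is twelve rows copied from the printout so that the example runs on its own.

```r
# Hypothetical helper: runs a normality check per sex and a Welch t-test on wingspan.
compare_wing_by_sex <- function(df) {
  # Shapiro-Wilk test of normality, run separately for each sex (p-value only)
  normality_p <- tapply(df$wing, df$sex, function(x) shapiro.test(x)$p.value)
  # Welch two-sample t-test comparing mean wingspan between the two sexes
  welch <- t.test(wing ~ sex, data = df)
  list(normality_p = normality_p, t_test = welch)
}

# Excerpt of the printed data (rows 1-6 and 51-56).
# Assumption: in the real script, pass the full `mosquitos` data frame instead.
mosquitos_excerpt <- data.frame(
  wing = c(37.83925, 50.63106, 39.25539, 38.05383, 25.15835, 57.95632,
           52.88405, 36.61598, 52.97575, 57.29169, 43.53158, 66.13355),
  sex  = rep(c("f", "m"), each = 6)
)

result <- compare_wing_by_sex(mosquitos_excerpt)
print(result$normality_p)  # one p-value per sex
print(result$t_test)       # test statistic, degrees of freedom and p-value
```

A small p-value from the t-test would suggest that mean wingspan differs between the sexes. With only a handful of rows, as in this excerpt, neither test has much power, so conclusions should be drawn from the full dataset.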
#the range of the wingspan in the data
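The range mentioned above can be obtained with base R's `range()`. This is a minimal sketch: `wing_excerpt` is a hypothetical vector holding six wingspan values copied from the printout so that the example runs on its own. In the real script the call would be `range(mosquitos$wing, na.rm = TRUE)`.

```r
# Six wingspan values copied from the printed data (rows 1-3 and 51-53).
# Assumption: the real script uses mosquitos$wing instead of this excerpt.
wing_excerpt <- c(37.83925, 50.63106, 39.25539, 52.88405, 36.61598, 52.97575)

wing_range  <- range(wing_excerpt, na.rm = TRUE)  # smallest and largest value
wing_spread <- diff(wing_range)                   # largest minus smallest

print(wing_range)
print(wing_spread)
```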
```{r}
# The mean, median and standard deviation of wingspan, grouped by sex (m and f).
summary_stats_by_sex <- mosquitos %>% # Create a new data frame; the pipe operator passes the data to group_by() and summarise()
  group_by(sex) %>%
  summarise(
    mean_wing = mean(wing, na.rm = TRUE), # na.rm = TRUE means NA values are removed
    median_wing = median(wing, na.rm = TRUE),
    sd_wing = sd(wing, na.rm = TRUE)
  )
```
```{r}
# The mean, median and standard deviation of wingspan for the whole dataset.
summary_stats <- mosquitos %>% # Create a new data frame; the pipe operator passes the data to summarise()
  summarise(
    mean_wing = mean(wing, na.rm = TRUE),
    median_wing = median(wing, na.rm = TRUE),
    sd_wing = sd(wing, na.rm = TRUE)
  )
print(summary_stats) # Prints the mean, median and standard deviation to the console
```
Data presentation
A line graph to highlight the relationship between the sex of the mosquitoes and their wingspan.
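The graph itself is not shown here, so the following is only one possible reading of it: a density line for each sex, drawn with ggplot2 (part of the tidyverse). `mosquitos_excerpt` and `wing_plot` are hypothetical names, and the excerpt is twelve rows copied from the printout so that the example runs on its own. In the real script the full `mosquitos` data frame would be plotted.

```r
library(ggplot2)

# Excerpt of the printed data (rows 1-6 and 51-56).
# Assumption: in the real script, plot the full `mosquitos` data frame instead.
mosquitos_excerpt <- data.frame(
  wing = c(37.83925, 50.63106, 39.25539, 38.05383, 25.15835, 57.95632,
           52.88405, 36.61598, 52.97575, 57.29169, 43.53158, 66.13355),
  sex  = rep(c("f", "m"), each = 6)
)

# One density line per sex, so the two wingspan distributions can be compared
wing_plot <- ggplot(mosquitos_excerpt, aes(x = wing, colour = sex)) +
  geom_density() +
  labs(x = "Wingspan (mm)", y = "Density", colour = "Sex")

print(wing_plot)
```

Because sex is a categorical variable, a boxplot (`geom_boxplot()` with `x = sex` and `y = wing`) would be a reasonable alternative for comparing the two groups.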