This data set shows the results from mosquito trap sites in the District of Columbia from 2016 to 2018. During mosquito season, DC Health uses surveillance and mitigation methods to control the mosquito population and tests mosquitoes for West Nile virus. Data for October 2017 are not included because they would have skewed the results for several reasons, such as the number of traps that had failed by the end of the season. Some variables in this data set are trap type, collection time of day, females collected, males collected, and species.
Source: Published by City of Washington, DC. https://catalog.data.gov/dataset/mosquito-trap-sites
Loading libraries.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.1 ✔ stringr 1.5.2
✔ ggplot2 4.0.0 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 2023 Columns: 32
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (21): TRAPTYPE, ATTRACTANTSUSED, TRAPID, ADDRESS, TOWN, STATE, COUNTY, T...
dbl (8): X, Y, FEMALESCOLLECTED, MALESCOLLECTED, UNKNOWNCOLLECTED, LATITUDE...
lgl (3): SE_ANNO_CAD_DATA, CREATED, EDITED
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Cleaning.
names(mosquito_trap_sites) <- tolower(names(mosquito_trap_sites)) ## Making column names all lower case.
mts_s <- mosquito_trap_sites |>
  select(traptype, trapcollect, collecttimeofday, species, femalescollected, malescollected) ## Selecting columns that might be used.
mts_sm <- mts_s |>
  mutate(totalcollected = femalescollected + malescollected) ## Adding a column to see the total number of mosquitoes collected for each row.
head(mts_sm)
fit1 <- lm(femalescollected ~ malescollected, data = mts_smg)
summary(fit1)
Call:
lm(formula = femalescollected ~ malescollected, data = mts_smg)
Residuals:
Min 1Q Median 3Q Max
-47.715 -7.776 -5.776 -0.776 309.084
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.77643 0.44666 21.89 <2e-16 ***
malescollected 0.55289 0.02937 18.83 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 19.67 on 2021 degrees of freedom
Multiple R-squared: 0.1492, Adjusted R-squared: 0.1488
F-statistic: 354.5 on 1 and 2021 DF, p-value: < 2.2e-16
The equation for this model is femalescollected = 0.55(malescollected) + 9.78, meaning that each additional male collected is associated with a predicted increase of 0.55 females collected. The p-value for malescollected (< 2e-16) is marked with three asterisks, which indicates that the relationship is statistically significant. However, the adjusted R-squared value (0.15) shows that only about 15% of the variation in femalescollected is explained by this model, which means about 85% of the variation is not explained by it.
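To make the equation concrete, here is a small sketch that plugs a few values of malescollected into the fitted line. The coefficients are copied from the summary(fit1) output above; the helper name predict_females is made up for illustration and is not part of the original analysis.

```r
# Coefficients copied from the summary(fit1) output above.
intercept <- 9.77643
slope <- 0.55289

# Hypothetical helper: predicted females collected for a given number of males.
predict_females <- function(males) {
  intercept + slope * males
}

# Predictions for traps with 0, 10, and 20 males collected.
predictions <- predict_females(c(0, 10, 20))
predictions
```

With 0 males the prediction is just the intercept, and each step of 10 males adds about 5.5 females. Given the low R-squared, individual traps can be far from these predicted values.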
A Plot
p1 <- ggplot(mts_smg, aes(x = totalcollected, y = species, color = traptype)) +
  labs(
    title = "Trap Type VS Total Catch Of Species",
    caption = "City of Washington, DC",
    x = "Total Collected",
    y = "Species",
    color = "Trap Type"
  ) +
  theme_minimal(base_size = 12) +
  scale_color_brewer(palette = "Set2")
p1 + geom_point()
## Grouping by trap type and species and summarizing total number of mosquitoes collected.
mts_fg <- mts_sm |>
  group_by(traptype, species) |>
  summarise(total_collected = (femalescollected + malescollected))
Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
always returns an ungrouped data frame and adjust accordingly.
`summarise()` has grouped output by 'traptype', 'species'. You can override
using the `.groups` argument.
mts_fg
# A tibble: 2,023 × 3
# Groups: traptype, species [27]
traptype species total_collected
<chr> <chr> <dbl>
1 ABC Trap aegypti 9
2 ABC Trap aegypti 2
3 ABC Trap aegypti 9
4 ABC Trap aegypti 4
5 ABC Trap aegypti 2
6 ABC Trap aegypti 4
7 ABC Trap aegypti 10
8 ABC Trap aegypti 3
9 ABC Trap aegypti 4
10 ABC Trap aegypti 5
# ℹ 2,013 more rows
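The warning and the 2,023-row result above suggest that femalescollected + malescollected returned one value per row instead of one value per group. A possible fix is to wrap the expression in sum(). The sketch below uses a tiny invented table, not the DC records, and "Gravid Trap" is a made-up trap name used only for illustration.

```r
library(dplyr)

# Toy data invented for illustration; not the DC trap records.
toy <- tibble(
  traptype = c("ABC Trap", "ABC Trap", "Gravid Trap", "Gravid Trap"),
  species = c("aegypti", "aegypti", "pipiens", "pipiens"),
  femalescollected = c(9, 2, 30, 12),
  malescollected = c(0, 1, 5, 3)
)

# sum() collapses each traptype/species group to a single total.
# na.rm = TRUE skips rows where either count is missing.
toy_totals <- toy |>
  group_by(traptype, species) |>
  summarise(
    total_collected = sum(femalescollected + malescollected, na.rm = TRUE),
    .groups = "drop"
  )

toy_totals
```

With sum(), the result has one row per trap type and species combination, and setting .groups = "drop" returns an ungrouped table without the grouping message.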
Final Graph (Grade this one)
ggplot(mts_fg, aes(x = total_collected, y = species, fill = traptype)) +
  geom_col(position = position_dodge(width = 1)) +
  labs(
    title = "Total Mosquito by Species Collected by Trap Types",
    x = "Total Collected",
    y = "Species",
    fill = "Trap Type",
    caption = "Source: City of Washington, DC"
  ) +
  theme_minimal(base_size = 8) +
  scale_fill_brewer(palette = "Set2")
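One thing to check with this kind of chart: geom_col() draws one bar per row, so with dodged bars the data should probably have exactly one row per species and trap type. If several rows share the same combination, my understanding is that the bars are drawn on top of each other rather than added up. Here is a minimal sketch with invented, already-summed totals (the numbers and the "Gravid Trap" name are made up for illustration).

```r
library(ggplot2)

# Invented totals, one row per species/trap type; not the DC data.
toy_totals <- data.frame(
  traptype = c("ABC Trap", "Gravid Trap"),
  species = c("aegypti", "pipiens"),
  total_collected = c(12, 50)
)

# One bar is drawn for each row of toy_totals.
p <- ggplot(toy_totals, aes(x = total_collected, y = species, fill = traptype)) +
  geom_col(position = position_dodge(width = 1)) +
  labs(x = "Total Collected", y = "Species", fill = "Trap Type") +
  theme_minimal(base_size = 8)

# layer_data() returns the data ggplot2 computed for the bar layer.
bars <- layer_data(p, 1)
nrow(bars)
```

Checking that the number of bars matches the number of species and trap type combinations is a quick way to confirm that each bar represents a single total.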
A
First, I cleaned the data set: I converted the column names to lower case using tolower(names(name of data)), selected the columns that might be used with select, and added a column with the total number of mosquitoes collected for each row using mutate. Then I grouped by species, trap type, females collected, males collected, and total collected. Next, I fit a linear regression, wrote the equation for the model, and analyzed it. After that, I made a plot. Then I grouped by trap type and species and summarized the total number of mosquitoes collected. Lastly, I made the final graph.
B
The final bar graph visualizes the total number of mosquitoes of each species collected by the three different trap types. The x axis is the total collected, the y axis is the mosquito species, and the three colors represent the trap types. It was surprising to see how many pipiens were collected.
C
It would have been interesting to add more variables or test different ones. There were plenty of errors while working on this project, but looking at older assignments helped solve the issues.