Mosquito Trap Sites, DC Project 1

Author

yonas

This data set shows the results of mosquito trap sites in the District of Columbia from 2016 to 2018. During mosquito season, DC Health uses surveillance and mitigation methods to control the mosquito population and tests mosquitoes for West Nile virus. Data for October 2017 is not included because it would have skewed the results for several reasons, such as the number of traps that failed by the end of the season. Variables in this data set include trap type, collection time of day, females collected, males collected, and species.

Source: Published by City of Washington, DC. https://catalog.data.gov/dataset/mosquito-trap-sites

Loading libraries.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.1     ✔ stringr   1.5.2
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(RColorBrewer)

Importing data set.

mosquito_trap_sites <- read_csv("Mosquito_Trap_Sites.csv")
Rows: 2023 Columns: 32
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (21): TRAPTYPE, ATTRACTANTSUSED, TRAPID, ADDRESS, TOWN, STATE, COUNTY, T...
dbl  (8): X, Y, FEMALESCOLLECTED, MALESCOLLECTED, UNKNOWNCOLLECTED, LATITUDE...
lgl  (3): SE_ANNO_CAD_DATA, CREATED, EDITED

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Cleaning.

names(mosquito_trap_sites) <- tolower(names(mosquito_trap_sites))  ## Making column names all lower case.
mts_s <- mosquito_trap_sites |>
  select(traptype, trapcollect, collecttimeofday, species, femalescollected, malescollected)  ## Selecting columns that might be used.
mts_sm <- mts_s |>
  mutate(totalcollected = femalescollected + malescollected)  ## Adding a column with the total number of mosquitoes collected in each row.
head(mts_sm)
# A tibble: 6 × 7
  traptype  trapcollect collecttimeofday species femalescollected malescollected
  <chr>     <chr>       <chr>            <chr>              <dbl>          <dbl>
1 ABC Trap  2016/04/27… Morning          pipiens                4              3
2 Gravid T… 2016/06/15… Morning          pipiens               50              5
3 ABC Trap  2016/06/08… Morning          pipiens                2              0
4 Gravid T… 2016/05/11… Afternoon        pipiens                3              0
5 Gravid T… 2016/07/20… Morning          pipiens               23              1
6 Gravid T… 2016/10/19… Morning          pipiens               16             11
# ℹ 1 more variable: totalcollected <dbl>
## Grouping by species, traptype, femalescollected, malescollected, and totalcollected.
mts_smg <- mts_sm |>
  group_by(species, traptype, femalescollected, malescollected, totalcollected)

mts_smg
# A tibble: 2,023 × 7
# Groups:   species, traptype, femalescollected, malescollected, totalcollected
#   [778]
   traptype trapcollect collecttimeofday species femalescollected malescollected
   <chr>    <chr>       <chr>            <chr>              <dbl>          <dbl>
 1 ABC Trap 2016/04/27… Morning          pipiens                4              3
 2 Gravid … 2016/06/15… Morning          pipiens               50              5
 3 ABC Trap 2016/06/08… Morning          pipiens                2              0
 4 Gravid … 2016/05/11… Afternoon        pipiens                3              0
 5 Gravid … 2016/07/20… Morning          pipiens               23              1
 6 Gravid … 2016/10/19… Morning          pipiens               16             11
 7 Gravid … 2016/08/03… Morning          pipiens               14              0
 8 Gravid … 2016/09/28… Morning          pipiens               19             16
 9 Gravid … 2016/09/28… Morning          sp.                    7              0
10 Gravid … 2016/07/27… Morning          pipiens               12              0
# ℹ 2,013 more rows
# ℹ 1 more variable: totalcollected <dbl>

Linear Regression

p1 <- ggplot(mts_smg, aes(x = femalescollected, y = malescollected)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "Correlation of Females Collected vs. Males Collected",
       caption = "City of Washington, DC",
       x = "Females Collected",
       y = "Males Collected") +
  theme_minimal(base_size = 12)
p1 + geom_point(color = "purple", size = 0.1)

cor(mts_smg$femalescollected, mts_smg$malescollected)
[1] 0.3862949
fit1 <- lm(femalescollected ~ malescollected, data = mts_smg)
summary(fit1)

Call:
lm(formula = femalescollected ~ malescollected, data = mts_smg)

Residuals:
    Min      1Q  Median      3Q     Max 
-47.715  -7.776  -5.776  -0.776 309.084 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     9.77643    0.44666   21.89   <2e-16 ***
malescollected  0.55289    0.02937   18.83   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 19.67 on 2021 degrees of freedom
Multiple R-squared:  0.1492,    Adjusted R-squared:  0.1488 
F-statistic: 354.5 on 1 and 2021 DF,  p-value: < 2.2e-16

The equation for this model is femalescollected = 0.55(malescollected) + 9.8, meaning that each additional male collected is associated with a predicted increase of 0.55 females collected. The p-value for malescollected (< 2e-16) is marked with three asterisks, which indicates that the coefficient is statistically significant. However, the adjusted R-squared value (0.15) shows that only about 15% of the variation in femalescollected is explained by this model, which means about 85% of the variation is not explained by it. Note that this model predicts females from males, while the plot above fits the line in the opposite direction (males from females).
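
As a quick check on the interpretation, the sketch below plugs a value into the fitted equation and recomputes R-squared from the correlation. The coefficients and the correlation are copied from the output above. The value of 10 males and the names intercept, slope, males, predicted_females, r, and r_squared are assumptions chosen only for illustration.

```r
# Coefficients copied from the summary(fit1) output above
intercept <- 9.77643
slope <- 0.55289

# Hypothetical trap with 10 males collected (illustrative value, not from the data)
males <- 10
predicted_females <- intercept + slope * males
predicted_females

# For a model with a single predictor, R-squared equals the squared correlation
r <- 0.3862949  # from cor(mts_smg$femalescollected, mts_smg$malescollected)
r_squared <- r^2
round(r_squared, 4)
```

The squared correlation rounds to 0.1492, which matches the Multiple R-squared reported by summary(fit1).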

A Plot

p1 <- ggplot(mts_smg, aes(x = totalcollected, y = species, color = traptype )) +
  labs(title = "Trap Type VS Total Catch Of Species",
  caption = "City of Washington, DC",
  x = "Total Collected",
  y = "Species",
  color = "Trap Type") +
  theme_minimal(base_size = 12) +
  scale_color_brewer(palette = "Set2")
p1 + geom_point()

## Grouping by trap type and species and summarizing total number of mosquitoes collected.
mts_fg <- mts_sm |>
  group_by(traptype, species) |>
  summarise(total_collected = (femalescollected + malescollected))
Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
  always returns an ungrouped data frame and adjust accordingly.
`summarise()` has grouped output by 'traptype', 'species'. You can override
using the `.groups` argument.
mts_fg
# A tibble: 2,023 × 3
# Groups:   traptype, species [27]
   traptype species total_collected
   <chr>    <chr>             <dbl>
 1 ABC Trap aegypti               9
 2 ABC Trap aegypti               2
 3 ABC Trap aegypti               9
 4 ABC Trap aegypti               4
 5 ABC Trap aegypti               2
 6 ABC Trap aegypti               4
 7 ABC Trap aegypti              10
 8 ABC Trap aegypti               3
 9 ABC Trap aegypti               4
10 ABC Trap aegypti               5
# ℹ 2,013 more rows
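
The warning above appears because summarise() is given one value per row instead of one value per group, so mts_fg still has 2,023 rows. A minimal sketch of one possible fix, assuming the goal is a single total for each trap type and species, wraps the expression in sum(). The toy tibble below uses made-up values (not the DC data) with the same column names. The names toy and toy_totals are hypothetical.

```r
library(dplyr)

# Toy data with made-up values, NOT taken from the DC data set
toy <- tibble(
  traptype = c("ABC Trap", "ABC Trap", "Gravid Trap", "Gravid Trap"),
  species = c("pipiens", "pipiens", "pipiens", "sp."),
  femalescollected = c(4, 2, 50, 7),
  malescollected = c(3, 0, 5, 0)
)

# sum() collapses each group to one value, so summarise() returns one row per group
toy_totals <- toy |>
  group_by(traptype, species) |>
  summarise(
    total_collected = sum(femalescollected + malescollected, na.rm = TRUE),
    .groups = "drop"  # return an ungrouped tibble and silence the grouping message
  )
toy_totals
```

Applied to mts_sm, the same pattern would give one row for each trap type and species combination, so a geom_col() chart would draw a single bar per combination instead of overlapping per-row bars.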

Final Graph (Grade this one)

ggplot(mts_fg, aes(x= total_collected, y = species, fill=traptype))+
  geom_col(position = position_dodge(width = 1))+
  labs(title = "Total Mosquitoes Collected by Species and Trap Type",
       x = "Total Collected",
       y = "Species",
       fill = "Trap Type",
       caption = "Source: City of Washington, DC") +
  theme_minimal(base_size = 8) +
  scale_fill_brewer(palette = "Set2")

A

First I cleaned the data set: I converted the column names to lower case using tolower(names(name of data)), selected the columns that might be used with select, and added a column with the total number of mosquitoes collected in each row using mutate. Then I grouped by species, trap type, females collected, males collected, and total collected. Next I created a linear regression, wrote the equation for the model, and analyzed the model. After that I made a plot. Then I grouped by trap type and species and summarized the total number of mosquitoes collected. Lastly I made the final graph.

B

The final bar graph visualizes the total number of mosquitoes of each species collected by the three trap types. The x axis is the total collected, the y axis is the mosquito species, and the three colors represent the trap types. It was surprising to see the number of pipiens collected.

C

It would have been interesting to add more variables or test different ones. There were plenty of errors while working on this project, but looking at older assignments helped solve the issues.