Background


Hunting and Ecology

          The natural world is often seen as a place to visit, where humans observe beauty and gather resources, but do not belong. However, hunting and the infrastructure that supports it dispel the notion that humans are not part of our own ecosystems. In the absence of natural predators, hunters help control the populations of animals such as white-tailed deer (Riley et al., 2003). In addition, projects to restore and conserve the habitats of game animals often depend on support from hunters (Chapagain & Poudyal, 2020). The size and spending power of this constituency is considerable, as in 2024, New York hosted over 500,000 licensed hunters who contributed an estimated $1.6 billion to the state’s GDP (Scuderi et al., 2024). Engaging with this audience provides states with an opportunity to use charismatic game animals to gather data and public support for environmental, conservation, and healthcare objectives.

Ruffed Grouse

          The ruffed grouse (Bonasa umbellus) is a North American game bird second only to turkeys in popularity with New York hunters (Figure 1) (Ruffed Grouse Hunting - NYSDEC, n.d.). However, according to the New York Department of Environmental Conservation, this status is threatened by a marked population decline of over 80% in the past 80 years. Human activity is suspected to be a major cause, but not through hunting. Ruffed grouse require a disrupted habitat to thrive, preferring the early successional forests that arise from abandoned fields, timber farming, and wildfires (Porter & Jarzyna, 2013; Ruffed Grouse Hunting - NYSDEC, n.d.). As such, they are vulnerable both to habitat fragmentation as humans develop the lands around them, and forest restoration efforts that strive to create mature, established forests (Porter & Jarzyna, 2013). Sub-optimal habitats can lead to decreased protein intake and greater exposure to plant toxins, serving to decrease nutritional condition and suppress the immune system (Downs & Stewart, 2014; Servello & Kirkpatrick, 1987). If these conditions lead to chronic stress, increased corticosterone can further depresses the immune system (Owen et al., 2012; Roy et al., 2022).
          Immunocompetence is of great importance to ruffed grouse as they share a habitat with diverse communities of mosquitoes, many of which serve as vectors for arthropod-borne viruses (arboviruses) (Gorris et al., 2021). Birds and mosquitoes are part of the life cycle of many viruses, most of them harmless, but mosquito-host dynamics have severe consequences for human and animal health. For example, Culex mosquitoes are mostly ornithophilic, meaning that they are likely to spread viruses among bird species, but less likely to introduce a new one acquired from a mammalian host (Riccetti et al., 2022; Shahhosseini et al., 2021). However, other common mosquito genera such as Aedes and Anopheles are typically more generalist feeders, increasing the potential for interspecies exchange (Shahhosseini et al., 2021).
          While rare, international commerce has made it easier for viruses to spread between continents with potentially devastating effects on naïve species (Findlater & Bogoch, 2018). In parallel to modern technological advancements, climate change has affected the timing and routes of migratory birds, who may then spread viruses to new countries through contact with novel mosquito species (Heidecke et al., 2023). Although the exact method of transport is unknown, one such virus was introduced to New York in 1999, where it found an abundance of competent vectors and vulnerable hosts (Ronca et al., 2021).
\[\\[0.1in]\]
Figure 1. A female ruffed grouse flaring her neck feathers in a defensive display (Stedman, 2015).

Figure 1. A female ruffed grouse flaring her neck feathers in a defensive display (Stedman, 2015).

West Nile Virus

          West Nile Virus (WNV) is an orthoflavivirus from the Japanese encephalitis serogroup that exploits an enzootic cycle of avian and mosquito hosts (Bialosuknia et al., 2022). Although humans are an unintended dead-end host, spillover events have been known to cause human illness since 1937, when WNV was first isolated from a febrile patient in Uganda (Baum, 2008; Smithburn et al., 1940). Native to sub-Saharan Africa, WNV only gained mainstream attention in the 1990s when it was introduced to Europe and North America (May et al., 2011). Due to the primary vector being ornithophilic mosquitoes, it is estimated that only 2.2% of Americans are seropositive, although state by state estimates can be considerably higher (Cahill et al., 2017; Carson et al., 2012; Ronca et al., 2019). WNV infection is asymptomatic for most of the population; however, 20% of people infected will develop flu-like symptoms, and an unlucky <1% will develop West Nile Virus Neuroinvasive Disease (Santini et al., 2022). This condition causes swelling of the brain, meninges, or both, with damage to other organ systems commonly reported (CDC, 2025; Santini et al., 2022). These patients face long, expensive hospital stays, a considerable mortality rate (10%), and chronic neurological deficits (Santini et al., 2022). In 2024, the CDC received 1,063 reports of neuroinvasive WNV out of a total of 1,466 symptomatic cases (CDC, 2025).
          While the human cost of WNV is a serious concern, outbreaks are frequently preceded by deadly epizootics in bird populations (Eidson, 2001). While many bird species can host WNV, ruffed grouse appear to be especially vulnerable to illness (George et al., 2015; Komar et al., 2003; Nemeth et al., 2017). WNV affects nearly every part of the grouse’s body, causing inflammation of the heart, brain, liver, and kidneys, with necrotic lesions affecting the liver, pancreas, and digestive tract (Nemeth et al., 2017). In field surveys, the relatively low seroprevalence of WNV, despite an abundance of suitable mosquito vectors, suggests a high mortality rate (Nemeth et al., 2021). Due to the systemic nature of this illness, death can result from dehydration, malnutrition, or heart failure, while the damage to the heart and brain, particularly the cerebellum, may result in chronic illness (Nemeth et al., 2017). In ideal conditions, grouse populations can exhibit resilience to WNV, but the immunosuppressive consequences of habitat loss have made WNV a major driver of population loss (Roy et al., 2022; Stauffer et al., 2018).

Mosquito Vectors

          Given the dire risks presented by WNV, vector testing and surveillance have been implemented by all 50 US states and many constituent counties (CDC, 2025). Among these states, Connecticut’s arbovirus surveillance program is particularly comprehensive. From 2001 to 2018, data from the Connecticut program indicated that Culex pipiens is the likely main vector, with other ornithophilic Culex and Culiseta species serving as secondary vectors (McMillan et al., 2020; Shepard & Armstrong, 2023). However, many other species play a role as secondary or bridge vectors between birds and mammals. For example, generalist mosquitoes such as Aedes Japonicus or Ochlerotatus trivittatus may cause spillover events when present in an ecosystem, as they are also competent vectors (Tiawsirisup et al., 2004; Turell et al., 2005).
          While individual species are all part of the equation, the Connecticut program detected WNV in half of all tested species in the 2001-2018 time period (McMillan et al., 2020). Given the virus’ prevalence, overall mosquito numbers, community composition, and environmental variables also play a significant role in WNV risk. For example, high overall mosquito diversity is associated with high WNV seroprevalence in local birds, while areas with a greater percentage of mammalophilic mosquitoes tend to have a negative correlation (Martínez-de La Puente et al., 2018; Roche et al., 2013). This correlation may not be neatly linear, though, as one Chicago study found that there is a point at which increasing species diversity is correlated with reduced mosquito populations and WNV infection rates (Chaves et al., 2011). Beyond identifying potential vectors, making sense of WNV epizootics and spillover events requires knowledge of local ecosystems.

The Shared Ecosystem

          As with many ecological systems, the web of interactions among mosquitoes, ruffed grouse, humans, and WNV is complex. However, all parties are linked by the forest they interact with. Just as ruffed grouse require specific forest types for nesting and foraging, most species of mosquitoes are dependent on forest vegetation. Adult males and some adult females feed upon nectar, while larvae are dependent on algae and rotting plant matter. Some mosquito species are even more entwined with their habitats, using specific tree root configurations to shield their larvae from winter temperatures, or piercing their breathing tubes into certain aquatic plant stems to use as snorkels (Burkett-Cadena, 2013). The niches available to mosquito species are, in large part, controlled by the available plant life.
          Although many links have been made between mosquito species and WNV in different US states, there are few universals. One reason for this discrepancy is that previous studies of mosquito and arbovirus communities mostly focus on areas of permanent human habitation. While these environments are where WNV has its greatest impact on humans, spillovers are unusual events; the forest habitats sampled in this study are where the virus is routinely spread and maintained. Human relationships with these sites can range from destruction and fragmentation to conservation and restoration, but the reciprocal aspect is rarely considered. Migratory birds and mosquitoes such as Turdus migratorius (American robin) and Aedes vexans can introduce viruses hundreds of kilometers from their origin, while occasional visitors, such as hunters entering a trail or Culiseta melanura breeding on the edge of their forest, can produce spillover events (Briegel et al., 2001; Howard et al., 1989). Despite these risks, little is known about the impact of specific forest types and management practices on WNV.

Looking Ahead

          While the infrastructure for mosquito arbovirus testing in New York State is robust, it is not currently effective at preventing outbreaks of WNV in ruffed grouse, and may not be equipped to handle humans’ future needs. Current models of climate change predict that WNV transmission will increase and develop more complex patterns of seasonality, while the WNV circulating in New York today has changed since the 90s (Bialosuknia et al., 2022; Heidecke et al., 2023). A 2022 study of WNV genotypes in New York found that WNV is evolving to favor strains that are more infectious and easily transmitted (Bialosuknia et al., 2022). Despite this forecast, current methods of arbovirus control still rely on old methods: pesticide application timed and targeted for mosquitos of a specific species (Dusfour & Chaney, 2021). While this approach still has a place in modern mosquito control, environmental factors from hidden oviposition sites to pesticide resistance leave vulnerable communities unprotected (Dusfour & Chaney, 2021; Weng et al., 2024). Fundamentally, WNV does not rely on a single species, location, or time period; it uses a diverse stock of infected hosts and vectors that allow it to come back year after year.
          Amid mounting incentives to control WNV, innovative uses of extant monitoring infrastructure must be explored, broadening the focus from infected mosquitos to community-level trends. It is not possible to eradicate arboviruses from the environment, but small changes to mitigate their effects can have meaningful consequences. While most arboviral diseases are still obscure, the ruffed grouse can serve as an example of the value mosquito control efforts in isolated forests bring to populated areas. Ongoing conservation efforts are already working to change forest composition in New York State to benefit species such as ruffed grouse (NYSDEC, n.d.). Perhaps something as simple as planting a different type of tree in boundary areas or protecting native plant species in grouse habitats can help mitigate the spread of WNV. This research aims to identify mosquito community and environmental trends that can inspire future research and land management decisions to minimize arboviral disease risk in humans and endangered wildlife.

Significance


This study aims to examine the mosquito communities of New York State in a mostly exploratory fashion. Given the complexity of the dataset and the system it is attempting to represent, follow-up studies will be required to confirm any patterns found here.

However, this study will test the provisional hypotheses that:

  1. Alpha diversity will be highest in sites with low/medium WNV risk and in ecotones.
  2. If mosquito communities vary in composition, differences will be greatest among WNV risk categories and least among ecoregions.
  3. If differences in mosquito communities are found, no single genus or species will drive the separation.
  4. As mosquito prevalence increases, WNV risk level will also increase.

Recognizing state-wide trends in ecological relationships, instead of simply trying to eliminate Culex or Culiseta mosquitoes in populated areas, could allow the DEC to adopt a more holistic approach to mosquito control. Possible outcomes include the redistribution of WNV monitoring resources to high-risk forest environments and the implementation of specific and nuanced mosquito-reduction practices. While these practices may include the use of species-level techniques such as releasing sterile males, niche-level practices such as destroying oviposition areas near hiking trails may also provide protection (Weng et al., 2024).

\[\\[0.1in]\]

Objectives


  1. Organize data and generate visualizations to display trends.
  2. Identify variation in mosquito species richness and diversity among DEC ecoregions, WNV risk levels, and forest types.
  3. Identify variation in mosquito community composition among DEC ecoregions, WNV risk levels, and forest types.
  4. Determine which mosquito species best explain the observed variation among sites.
  5. Identify how population size of those key mosquito species differs among DEC ecoregions, WNV risk, and forest types.
  6. Determine how total mosquito counts vary among WNV risk levels and forest types.

\[\\[0.1in]\]

Data Collection


Site Selection

          The study area encompasses three to eight sites in each of the seven ecoregions of Upstate New York as defined by the New York State Department of Environmental Conservation (DEC) (Figure 2). Starting with DEC maps, areas of interest in each region were identified based on the presence of early successional forest, known stable or declining grouse population, and diversity of mosquito species. These areas included state forests, wildlife management areas, privately managed forests, and other similar types of land for which data concerning the selection criteria was readily available. GPS coordinates were then randomly selected in each of these areas and evaluated for logistical feasibility, allowing unsuitable sites to be discarded and replaced. These sites were not equally distributed among ecoregions as the primary purpose of this study was to survey the presence and health of grouse populations. Accordingly, ecoregions with higher past and present grouse populations received more trapping sites.
Figure 2. The 35 mosquito trapping sites in relation to the DEC ecoregions of New York State.

Figure 2. The 35 mosquito trapping sites in relation to the DEC ecoregions of New York State.

Mosquito Collection

          Mosquitoes were collected using two CDC standard light and gravid traps at each site. These traps were unmodified and deployed according to manufacturer instructions. Using CO2 from dry ice as a lure, light traps take advantage of the mosquitoes’ dorsal light response, disrupting their flight path so that they can be collected in the trap’s vacuum. However, since CO2 mimics the breath of a vertebrate host, the majority of mosquitoes caught using these traps are nulliparous (not fed). Gravid traps are designed to catch blood-fed adult females using an attractive oviposition site as a lure. The trap consists of a small tub of hay-infused water with a vacuum on top to collect any visitors.
          In the first four days of August 2023 and 2024, two of each trap type was set out at dusk in each location and collected the following dawn. The trapping sites consisted of three points arranged in an equilateral triangle with sides measuring 200m. No mosquito traps were placed at point C, only an Audiomoth automated recording unit to detect the drumming made by male ruffed grouse (Hill et al., 2019). Points A and B contained one of each trap type and an Audiomoth. This setup ensured that drumming could be detected coming from any direction and provided redundancy in case of equipment malfunction. Collected mosquitoes were visually identified to the species level through the use of the Darsie and Ward key (Darsie, 2005).

Forest Type

          Forest type was ascertained using a 10 BAF (basal area factor) angle gauge at the center of each sampling site. Each tree included in the sample was identified and the forest type was categorized as Coniferous, Northern hardwoods, mixed oak, or a combination of any two. Coniferous forests were dominated by pine, spruce, and/or fir. Mixed oak forests were dominated by multiple species of oaks. Northern hardwood forests were dominated by maple, birch, beech, and ash. Ecotones were labeled with both forest types.

Data Summary


Data Tables

All tables presented in this report are reactables. Column widths are resizable and data can be sorted alphabetically by column through clicking on the column header.

Table 1. Mosquito trapping data named Mosquito.csv.

Table 2. CDC West Nile Virus Reports by county named County.csv.

Sites and Forest Types

All 35 sites were sampled for both years but six changed forest type from 2023 to 2024. The reasons for these changes are unknown.

Adirondack Ecological Center went from Coniferous/Northern Hardwoods to Coniferous.
Cranberry Mountain went from Mixed Oak to Northern Hardwoods.
Manguap Valley went from Northern Hardwoods to Mixed Oak.
Sargent Pond and Titusville Mountain went from Northern Hardwoods to Coniferous/Northern Hardwoods.
Tug Hill went from Coniferous to Northern Hardwoods.
Wilcox Lake went from Coniferous/Northern Hardwoods to Northern Hardwoods.

Trapping

          The first year of trapping encountered some logistical difficulties that resulted in slight deviations. Weather events such as high winds occasionally rendered a trap non-functional. In addition, the trapping procedure was accomplished using multiple undergraduate assistants and cooperation between DEC and ESF personnel, occasionally leading to miscommunication. The time and location of these irregularities are detailed below.
          Only one light trap could be used on night two of trapping at the Adirondack Ecological Center in 2023, so three were set out the next night to compensate. A similar situation occurred at Boutwell Hill, 2023 on the third and fourth night and Fort Jackson, 2023 on the first and fourth night. At Connecticut Hill and Heiberg Forest in 2023, three light traps were set out on the third night so only one was set out on the fourth. At Fort Drum in 2023, only one gravid trap could be set out on the second night, so three were used the fourth night. The second year of trapping benefited from previous experience and only experienced one issue: due to equipment malfunction, only one of each trap type could be used in the Capital District in 2024 on the fourth night. The fourth night at Capital District 2024 was thus discarded for the analysis.

CDC Data

          The CDC collects reports of human cases of WNV through the ArboNET national surveillance system. Although participation varies state-to-state, New York provides county-level data and reports of non-human WNV infections. No further information is available about non-human activity, but New York State has a dead-bird reporting program, so it can be presumed that these cases include reports of WNV in susceptible bird species (Eidson, 2001). Human cases are divided into total (all symptomatic cases) and neuroinvasive. A third category, presumptive viremic blood donors (PVDs), represent blood donors who do not exhibit any symptoms, but still have high levels of WNV antibodies detected by routine blood product screening. WNV can infect humans through donated blood and organs, so these cases, while not reflected in the total, are nationally notifiable.
          The “County” dataset contains the names of each sampling site, latitude and longitude, the county, and the number of total human WNV cases, neuroinvasive cases, and PVDs from 1999-2023. Sampling site and latitude and longitude were added manually and counties that contained no sampling sites were manually removed.These modifications were done using Excel, as this dataset was made for an separate project.
          These data are almost guaranteed to under-report cases and over-represent the severity of WNV due to the system’s reliance on confirmed diagnoses from medical professionals. Due to the virus’ nonspecific symptoms such as headache, nausea, fatigue, and fever, patients may be reticent to seek expensive medical treatment, and doctors may not suspect arboviral infection (Santini et al., 2022). As a result, data from 1999-2023 was used in this study to reflect county-level trends more accurately than single-year reports, which would have reflected only 4 human cases and no PVDs from 2023-2024. The full dataset contains 94 human cases with 69% being neuroinvasive. An additional 16 PVDs were recorded, as well as non-human infection in every county. As seen in Figure 3, DEC ecoregion 7 had the most human cases at 41, while ecoregions 6 and 9 were tied for least at three each. Ecoregion 9 also had the fewest PVDs, with zero recorded, while ecoregion 4 had the most: five cases (Figure 3). Counties were designated by level of human WNV activity based on total cases and PVDs combined: None (0), Low (1-4), Medium (5-9), and High (10+). This designation is designed to create distinct and meaningful groups from these data rather than reflect a recognized standard. PVDs and human cases were combined to reflect overall detected seropositivity and partially mitigate the over-representation of severe cases.
Figure 3. Choropleth maps of presumed viremic donors (a) and reported symptomatic human cases (b) in New York State by DEC ecoregion. Data used encompass CDC reports from 1999-2023.

Figure 3. Choropleth maps of presumed viremic donors (a) and reported symptomatic human cases (b) in New York State by DEC ecoregion. Data used encompass CDC reports from 1999-2023.

Mosquito Species

          The “Mosquito” dataset encompasses 1,362,353 individual mosquitoes from eight genera and 25 species. Despite the species’ importance in the Connecticut survey, Culex pipiens was notably absent from this dataset. As shown in Table 1, Ochlerotatus triseriatus was the most common with 57020 individuals collected, while the closely related Ochlerotatus trivittaus was the least common at 47211. Every site, and therefore ecoregion, WNV risk category, and forest type, contained mosquito communities of equal richness, with the specnumber function reporting each of the 25 species at every site across both of the sampling years. The trapping effort was sufficient to detect all likely mosquito species, as neither the specpool nor estimateR functions predicted any undetected species at any site at each year. No notably dominant or rare species were found at any site and no site saw more than 100 individuals of a species trapped per night. Mosquito totals and species richness did not vary significantly between 2023 and 2024, but slightly more mosquitoes (+603) were collected the second year.
          The “Pretty Names” dataset includes the full scientific name, abbreviation, preferred host type, and generations per season for each mosquito species (Table 3). It is not strictly necessary, but is included for ease of labeling figures and to give the reader a better sense of the species in question.

\[\\[0.03in]\] Table 3. Names, count data, and bionomic information for the 25 mosquito species observed in this study.

Administrative

          Since the mosquito collection datasheet also served a logistical purpose, some columns are included that are not relevant to the current analysis. Tube number reflects the tube mosquitoes were put in for later identification and would have been used to track identification errors or lost samples if any had been experienced. Trap type (Light or Gravid), day, and designation (Points A or B) were included to track equipment during trapping sessions. These categories were ignored for mosquito community analysis.

Author’s Note

          The mosquito data provided for this project are simulated. I am not at liberty to disclose the exact circumstances, but I was provided with these data in hopes that I could start coding and substitute the real data at some point during the semester. I was also supposed to be able to produce WNV detection data (for the mosquitoes), which I have tried to simulate by adding CDC data. This plan fell through in late March. Amid the process of switching the focus of my master’s project, I have not had the time or resources to fully redo this project on such short notice.
          The data provided to me were randomly generated (to the best of my knowledge) and contained no significant trends. For the purpose of practicing the vegan package, six 0s have been added to the columns for Aedes vexans, Anopheles punctipennis, Culiseta melanura, and Ochlerotatus trivittaus in sites with a WNV risk of “None.” These 0s are evenly distributed by year, but not by designation(A/B) or trap type. These species were chosen to contain one likely main vector and one likely bridge vector, as well as adding some messiness with my favorite species, and my undergraduate assistant’s favorite species.
          While the collection methods and description of the study are accurate to the best of my knowledge, I have also had difficulty obtaining this information. Please forgive any resulting incongruities.

Analysis Methods


  1. Import data sets.
    1. Calculate WNV risk per site from CDC data and merge with count data.
  2. Create a data frame for site characteristics and counts per site per year using a custom function.
  3. Create explanatory figures and tables.
    1. Group New York counties by ecoregion.
    2. Plot the 35 sampling sites on the ecoregion map.
    3. Make an ecoregion-level choropleth for reported human WNV cases and PVDs.
    4. Create a reactable of WNV data, mosquito collection data, and mosquito names and characteristics.
  4. Determine species richness (specnumber) and alpha diversity (diversity) for each site. If differences are found, move to ecoregion, WNV risk level, and forest type.
  5. Predict species undetected during each sampling window using specpool and estimateR.
  6. Use adonis2 to measure mosquito community dissimilarity among sites. If differences are found, move to ecoregion, WNV risk level, and forest type.
  7. If differences are found, create new data frames for Principal Component Analysis (PCA) of site dissimilarity using custom function.
  8. Run PCA using rda and determine the amount of variation explained by each axis using eigenvalues (eigenvals).
  9. Plot sites and species in ordination space using the PCA data.
  10. Perform Non-metric Multidimensional Scaling (NMDS) modelling using metaMDS to clarify the role of significant species.
    1. Create stress plot to assess model fit.
    2. Place sites in ordination space and group them by WNV risk level.
    3. Use envfit to plot species with a significant effect (p<0.001) on community dissimilarity.
  1. Use statistical tests to determine the differences in populations of significant species by WNV risk level.
    1. Assess normality by Shapiro-Wilks test using RVAideMemoire.
      • If distributions are normal, run an anova and Tukey HSD.
      • If distributions are not normal, go to b.
    2. Run a Poisson regression and test for overdispersion using dispersiontest.
      • If distributions follow Poisson distribution, run Tukey HSD.
      • If distributions are overdispersed, go to c. 
    3. Run negative binomial regression using MASS and compare to other models using AIC and chi squared.
      • If model has better fit than Poisson and explains more than null model, run Tukey HSD.
      • If model does not, discontinue analysis.
  2. Use statistical tests to determine the differences in mosquito count by WNV risk level.
    1. Plot histograms to visualize distributions.
    2. Analyze with the same decision tree as 11.
  3. Use statistical tests to determine the differences in mosquito count by forest type.
    1. Plot histograms to visualize distributions
    2. Analyze with the same decision tree as 11.

\[\\[0.25in]\]

Discussion


Hypotheses

1. Alpha diversity will be highest in sites with low/medium WNV risk and in ecotones.

While alpha diversity (as measured by the Shannon Weiner index) varied significantly by WNV risk category (p < 0.01), no significant differences were found among forest types. The analysis indicates that low and medium WNV risk sites had similar Shannon Weiner indices that were significantly higher than no-risk sites (Figure 4). However, the high-risk sites were not significantly different from any other risk level. This finding may be due to the small sample size of high-risk sites (n=3), or the effect of increased mosquito diversity on WNV ecology. While the relationship between mosquito diversity and WNV risk is initially positive, there is a tipping point after which the correlation becomes negative (Chaves et al., 2011).

Figure 4. Sampling sites grouped by WNV risk and plotted by average Shannon Weiner index.

Figure 4. Sampling sites grouped by WNV risk and plotted by average Shannon Weiner index.

\[\\[0.01in]\]

2. If mosquito communities vary in composition, differences will be greatest among WNV risk categories and least among ecoregions.

As with the first hypothesis, the WNV risk categories showed significant differences in community composition (p = 0.001), but no significant differences were found among forest types or ecoregions. While environmental variables such as forest type and geographic location were expected to have a greater impact, the site selection process may have masked such differences. All sites were selected because they had similar characteristics: early successional forest with a known ruffed grouse population. Although more research would be required to verify this theory: the selection pressures of colonizing a recently disrupted area may drive the site’s vegetation to adopt strategies more similar to other early successional forests than to comparable established forests. If this pressure exists, it is also possible that vegetative features used by mosquitoes, such as stumps, flowers, or muddy tree roots, are less varied among the sampling sites than among comparable mature forests. Future directions for study should include sampling in mature forests, as well as a more detailed survey of plant life. Including species such as reeds, herbaceous flowering plants, and algae may also provide insight into mosquito communities, as these resources are used in different ways by different species.

3. If differences in mosquito communities are found, no single genus or species will drive the separation.

This hypothesis was proven correct, as four species were found to be significant drivers of variation among sites with differing WNV risk levels: Aedes vexans (Ae.vex), Anopheles punctipennis (An.pun), Culiseta melanura (Cs.mel), and Ochlerotatus trivittatus (Oc.triv) (Figures 5,6). These species belong to four different genera and display varied lifestyles, supporting the notion that no single species or “type” of mosquito is responsible for WNV risk. The speculative roles of each species are described further in the “Mosquitoes” tab.

Figure 5. Principal Component Analysis of sampling sites by WNV risk level. Points represent sites while arrows represent mosquito species. The length and direction of the arrows indicates the amount and direction of site variation each species is responsible for. The axes are labeled with the percentage of site variation they can account for.

Figure 5. Principal Component Analysis of sampling sites by WNV risk level. Points represent sites while arrows represent mosquito species. The length and direction of the arrows indicates the amount and direction of site variation each species is responsible for. The axes are labeled with the percentage of site variation they can account for.

Figure 6. Non-metric Multidimensional Scaling analysis of differences among WNV risk levels and the species that drive such variation. Risk levels are represented by ellipses, although not enough high-risk sites were sampled to generate an ellipsis. Only the most significant mosquito species (p=0.001), were included.

Figure 6. Non-metric Multidimensional Scaling analysis of differences among WNV risk levels and the species that drive such variation. Risk levels are represented by ellipses, although not enough high-risk sites were sampled to generate an ellipsis. Only the most significant mosquito species (p=0.001), were included.

\[\\[0.01in]\]

4. As mosquito prevalence increases, WNV risk level will also increase.

Sites with no WNV risk were found to have a lower total than low-risk sites, but no other significant differences were found (Figure 7). This result is difficult to interpret. The finding seems to indicate that there is some threshold of mosquito richness that must be met for WNV to present a danger, but medium/high risk sites did not differ from either low or no risk sites. Looking at the most significant mosquito species may offer some insight. The suspected bridge vectors also showed lower totals in no-risk sites than low-risk sites (Figure 8). This repeated pattern may indicate that a certain density of bridge vectors is required for spillover events to become possible, but complex interactions among hosts, mosquitoes, and the environment determine the likelihood of such events. Fully describing these interactions is outside the scope of this study.

Figure 7. Average total mosquito counts per site per day, grouped by WNV risk level.

Figure 7. Average total mosquito counts per site per day, grouped by WNV risk level.

Figure 8. Average total *Ochlerotatus trivittatus* counts (a.) and *Aedes vexans* (b.) per site per day, grouped by WNV risk level.

Figure 8. Average total Ochlerotatus trivittatus counts (a.) and Aedes vexans (b.) per site per day, grouped by WNV risk level.

\[\\[0.01in]\]

Major caveats:

Many of the methods used in this study displayed red flags of poor explanatory power. The NMDS and PCA used to visualize differences among mosquito communities were in accordance with each other, but fit the data worse than is typically accepted. The PCA created one axis that explained 22.81% of variation and another that only explained 9.3%, leaving the vast majority of variation unaccounted for. Similarly, the NMDS model had a stress level of 0.2444585, which is above the usually-accepted threshold of 0.2. While factors such as sample size and the complexity of WNV ecology could cause these signs to appear in representative models, more complex models that account for random effects, such as Generalized Linear Mixed Models (GLMMs), should be conducted as follow-up. A GLMM could also account for the totals of individual mosquito species. As the negative binomial model did not fit the species count data well, the associations between species populations and WNV risk levels should be double-checked. This study did not involve GLMMs as this work was mostly exploratory and the need for more sophisticated models was not anticipated.

In addition, the mosquito community data have been interpreted with the assumption that the four-day sampling windows in 2023 and 2024 were representative of mosquito diversity and richness for the entire time season However, the striking uniformity of mosquito species richness among sites, regions, and years was unexpected and may indicate that the sampling window was not representative. These mosquito species do not hatch and mature simultaneously; their varied reproductive and overwintering strategies lead to staggered population peaks from late spring to early fall. Regular sampling throughout spring-fall would likely reveal greater spatiotemporal differences.

Mosquitoes

The four mosquito species identified as significant drivers of WNV risk variation can be divided into three categories: bridge vectors, main vectors, and unknowns.

Bridge Vectors

          Bridge vectors are mosquitoes capable of bringing a virus out of it’s sylvatic cycle and exposing other species. The two possible bridge vectors identified in this study, Ochlerotatus trivittatus and Aedes vexans are both primarily mammalophillic, but display unusually aggressive and opportunistic feeding behaviors that could easily expose them to avian viruses (Briegel et al., 2001; Duryea, 1990). While unlikely to be infected, both species are competent vectors of WNV, although other aspects of their biology and behavior indicate very different ecological roles (Tiawsirisup et al., 2004, 2008). Oc. trivittatus is not known to travel far from its hatching location, but it’s willingness to search for bloodmeals at all hours, including in broad daylight, exposes the insect to different host species than crepuscular or night-feeding mosquitoes (Duryea, 1990; Wright & Knight, 1966). This behavior indicates a potential danger to humans visiting its habitat, such as hunters, hikers, and park employees. On the other hand, Ae. vexans is an exceptionally strong flier known to undertake regular host-seeking flights ranging up to 17km and migration flights of over 100km (Briegel et al., 2001). Close relatives of the species are also known to take advantage of weather events such as floods and human activities such as car travel to disperse further (Darsie, 2005; Eritja et al., 2017). An Ae vexans that begins life in an isolated forest could easily acquire a virus and spread it to seemingly unrelated habitats. The relatively low numbers of these species captured at no-risk sites vs. low-risk sites suggests that higher populations may be sufficient to initiate transmission of WNV to humans. However, the lack of significant difference in totals among any other risk level suggests that neither species can create a sustained pattern of WNV spillovers. This finding is not unexpected, as factors such as inter- and intraspecific competition, main vector activity, and host availability greatly influence both species’ chances of WNV exposure.

\[\\[0.01in]\]

Main Vectors

          Although WNV is most often noticed when it affects humans, the virus is poorly adapted for mammalian hosts (Baum, 2008). As a result, most WNV transmission occurs between ornithophillic mosquitoes and their avian hosts (Bialosuknia et al., 2022). This sylvatic cycle is the only way WNV can be maintained and spread, so the mosquito species most responsible for exposing a site’s birds to WNV is considered the site’s main vector.
         While Culex pipiens, a common main WNV vector, was unexpectedly absent from the mosquito collections, the only ornithophillic species driving differences among WNV risk levels can plausibly fill the same role. Culiseta melanura is a mosquito species that ranges across the Eastern third of North America, adapting to diverse climates from the Caribbean to Southern Canada (Darsie, 2005). Cs. melanura plays an almost undisputed role as the main vector of avian arboviruses such as Eastern Equine Encephalitis and Highlands J virus, and may be an important vector of WNV between birds (Armstrong & Andreadis, 2022; McMillan et al., 2020). Bridge vectoring is not suspected, however, as the species displays a consistent preference for avian hosts and isolated, densely forested environments (Armstrong & Andreadis, 2022). Instead, Cs. melanura’s main role in WNV ecology is to maintain the sylavtic cycle. One possible reason for Cs. melanura’s prominence as a driver of WNV risk in New York state is the insect’s exceptional cold tolerance throughout all life stages (Mahmood & Crans, 1998). This adaptation provides WNV with a method to persist throughout the state’s cold winters and spread early in the spring/summer, before less hardy mosquito species have emerged (Andreadis et al., 2012). This consistency provides the ideal conditions for vertebrate hosts to develop significant WNV levels in their blood before being fed upon by other mosquito species (Molaei et al., 2013). The lack of significant difference in total Cs. melanura captured among risk levels points to factors such as host availability and bridge vector activity playing a greater role in spillover events than the species itself.

\[\\[0.01in]\]

Unknowns

          Anopheles punctipennis is a species strongly associated with several arboviruses, but its role in WNV ecology is largely unknown (McMillan et al., 2020). Displaying a strong preference for mammalian bloodmeals, particularly from deer, An. punctipennis should be at low risk of exposure to WNV (Molaei et al., 2008, 2009). Existing data supports this assumption, as the Connecticut survey found no An. punctipennis infected with WNV (McMillan et al., 2020). As a result, no studies of vector competence have been performed with WNV, but the insect has repeatedly been proven incompetent to vector Eastern Equine Encephalitis, the predominantly avian arbovirus closely associated with Cs. melanura (Vaidyanathan et al., 1997).
          As no direct link between An. punctipennis and WNV has been established, the insect may influence risk level through competition or indicate some yet-undetermined environmental factor. Areas with a greater percentage of mammalophillic mosquitoes have been negatively correlated with WNV risk, but this study found that areas with no WNV risk had fewer An.punctipennis, not more (Martínez-de La Puente et al., 2018; Roche et al., 2013). One possible explanation involves the species’ reproductive strategy. The exact numbers vary by geographic location, but An. punctipennis is relatively long-lived, multivoltine, and has a short gonotrophic cycle (Jensen et al., 1998). This strategy serves to generate a large, persistent population which could alter vertebrate host behavior or crowd out other mosquito species that occupy similar niches (Andreadis et al., 2014; Jensen et al., 1998). Some sites may have commonalities involving hosts, oviposition areas, or other resources that foster low WNV exposure in mammal-feeding mosquitoes. An. punctipennis may serve as an indicator for different site characteristics that foster more WNV exposure. Direct competition may also be involved: increasing other mammalophillic species’ willingness to take avian bloodmeals. However, sheer numbers are likely not a major factor as no significant difference was found in total An. punctipennis captured at any risk level. Further research is needed to uncover this species’ true role in WNV ecology.

\[\\[0.01in]\]

Control

          One of this study’s aims was to suggest mosquito control strategies tailored to WNV mitigation in New York State. However, the species uncovered as significant do not provide any easy answers. While mosquitoes that lay eggs in containers of stagnant water or rotting stumps can be targeted by cleanup efforts, the four highlighted species use vernal pools or wetlands: environments with conservation value of their own. Of the four, only Ae. vexans and Oc. trivittatus are known to be major human pests, with control efforts usually following complaints from the public (Duryea, 1990; O’Malley, 1990). Ideally, the next step would involve finding an early warning system of population peaks for these two, which will require season-long sampling. Additional variables such as rainfall may also prove useful, as both species rely on ephemeral water sources for oviposition.

Hosts

          This study predominantly focused on mosquito communities, but they represent only half of the sylvatic cycle. No information can be concluded about the role of vertebrate hosts in WNV ecology from these data, but future study could incorporate data on host populations and WNV seropositivity rates. In addition, when recently-fed mosquitoes are captured, the vertebrate host species can be identified from the bloodmeal using either genetic or serological methods (Reeves et al., 2018). Incorporating data on the most common hosts for each mosquito species could help identify the vertebrates at each site most responsible for maintaining WNV. Populations of these vertebrates may serve as a proxy for WNV risk, allowing communities to assess the resources needed to protect residents from WNV without the up-front cost of establishing a mosquito testing program.
          Additional focus on hosts may also provide a source of funding and public engagement with the science. Knowledge of and attitudes about arbovirus and mosquito control vary widely among the general public, and many communities lack the resources to develop education and outreach programs (Moise et al., 2022). However, the people of New York State already have a cultural and financial bond with ruffed grouse (Ruffed Grouse Hunting - NYSDEC, n.d.). Including hunters as part of WNV ecology provides a novel way to engage with the public and foster collaboration. The overarching study these mosquitoes were collected for is currently using hunter-submitted carcasses and blood samples of ruffed grouse to look at WNV rates and nutritional condition across New York State.

WNV Risk

          While mosquito community composition, overall mosquito numbers, and totals of two mosquito species were found to differ significantly by WNV risk level, these conclusions are preliminary. One glaring issue with the current data is that only three high-risk sites were sampled. These sites did not provide enough data for the NMDS to calculate a centroid, precluding a meaningful comparison with sites at other risk levels. No significant differences were found between high risk sites and any other risk level sites, but it is possible that differences exist that were not captured by the extremely small sample.
          A more systemic issue is that CDC reports are incomplete as a measure of WNV risk. As mentioned in the Data Summary, reported symptomatic cases exclude those with milder symptoms, as such patients are less likely to seek medical attention or be tested for WNV (Santini et al., 2022). In addition, since most cases of WNV have mild symptoms, counties with more case reports may not truly have a higher seropositivity rate. Factors such as population age, average income, and number/location of healthcare facilities are likely also represented in the data for each county. A representative picture of human infection rates in New York State is more likely to be generated through serosurvey: finding WNV antibodies in resident’s blood regardless of symptoms. Directions for further study include continued sampling of mosquitoes, especially in high-risk areas, and serosurvey of human and animal residents of counties containing sampling sites.

Conclusions


          West Nile virus presents a complex challenge for conservation and public health. While no single study could capture the full web of ecological interactions among vectors, WNV, vertebrate hosts, and humans, this preliminary study identified several trends. The lack of significant differences in mosquito communities among ecoregions suggests that high-level characteristics such as latitude do not have as much explanatory power as local, site-level characteristics. However, the limited forest type data collected for this study was likely insufficient to capture environmental differences among sites, as it also did not have much explanatory power. The absence of Cx. pipiens and the relatively minor role of other Cx. species was unexpected and warrants further study. However, the species that drove community dissimilarity were of four different genera, supporting the theory that community dynamics have more of an effect than single species or genera. These four species offer hints to the unique main and bridge vector dynamics of New York State.
          As the differences in total mosquito counts and alpha diversity among WNV risk levels mirror the community dissimilarity driven by suspected bridge vectors, the role of these type of vectors requires more investigation. Future studies integrating mosquito community data with vertebrate and plant community data are more likely to uncover the driving environmental factors. The addition of serosurvey in vertebrate hosts and WNV screening in mosquitoes would provide a more accurate picture of WNV infection than the CDC reports used here. More mosquito sampling is also necessary to link mosquito community features to high WNV risk. While these studies would require significant investment, WNV is worth addressing as a serious threat to human health.
          Humans are often perceived as separate from nature, but no other entity in our ecosystem recognizes this distinction. The viruses that affect native wildlife also affect our health, culture, and economy. As climate change and other anthropogenic activities introduce new threats to our shared ecosystem, humans also have the ability to recognize and alter high-risk environments. This study does not provide easy solutions, but does bolster the idea of expanding WNV mitigation policy. Early indications are that mosquito control should account for bridge vectors and work to anticipate population peaks. While not directly tested by this study, ruffed grouse may serve to unify conservation, epidemiology, and public resources to tackle the complex and nuanced issue of WNV ecology.

References


R Packages

pkgs <- grateful::cite_packages(output = "table", out.dir = ".")
knitr::kable(pkgs)
Package Version Citation
AER 1.2.14 @AER
base 4.4.3 @base
emmeans 1.11.0 @emmeans
ggrepel 0.9.6 @ggrepel
ggsignif 0.6.4 @ggsignif
ggthemes 5.1.0 @ggthemes
knitr 1.50 @knitr2014; @knitr2015; @knitr2025
maps 3.4.2.1 @maps
MASS 7.3.64 @MASS
patchwork 1.3.0 @patchwork
reactable 0.4.4 @reactable
rmarkdown 2.29 @rmarkdown2018; @rmarkdown2020; @rmarkdown2024
rsconnect 1.3.4 @rsconnect
RVAideMemoire 0.9.83.7 @RVAideMemoire
tidyverse 2.0.0 @tidyverse
vegan 2.6.10 @vegan
viridis 0.6.5 @viridis

Citation table generated by grateful:

Rodriguez-Sanchez F, Jackson C (2024). grateful: Facilitate citation of R packages. https://pakillo.github.io/grateful/.

Online sources

Apart from help pages and vignettes for the packages used in this report, the following websites were consulted:

Bartlett, J. (2021, April 2). Deviance goodness of fit test for Poisson regression. The Stats Geek. https://thestatsgeek.com/2014/04/26/deviance-goodness-of-fit-test-for-poisson-regression/

Examples. (n.d.). https://glin.github.io/reactable/articles/examples.html#theming

How to calculate and visualize Z scores in R. (2023, May 15). https://www.codingthepast.com/2023/05/15/Z-score-in-R.html

Lam. (2020, November 24). Non-parametric: differences in multiple groups. https://lamurian.rbind.io/note/biostat-ukrida/09-nonpar-meandiff-2/

Negative Binomial Regression | R Data Analysis Examples. (n.d.). https://stats.oarc.ucla.edu/r/dae/negative-binomial-regression/

R: What is data-masking and why do I need {{? (n.d.). https://search.r-project.org/CRAN/refmans/rlang/html/topic-data-mask.html

RPubs - vegan cheatsheet. (n.d.). https://www.rpubs.com/an-bui/vegan-cheat-sheet

Stackoverflow and the subreddits r/rstats and r/statistics were also used.

No AI was used in the creation of this report.

Literature

Andreadis, T. G., Armstrong, P. M., Anderson, J. F., & Main, A. J. (2014). Spatial-Temporal Analysis of Cache Valley Virus (Bunyaviridae: Orthobunyavirus ) Infection in Anopheline and Culicine Mosquitoes (Diptera: Culicidae) in the Northeastern United States, 1997–2012. Vector-Borne and Zoonotic Diseases, 14(10), 763–773. https://doi.org/10.1089/vbz.2014.1669

Armstrong, P. M., & Andreadis, T. G. (2022). Ecology and Epidemiology of Eastern Equine Encephalitis Virus in the Northeastern United States: An Historical Perspective. Journal of Medical Entomology, 59(1), 1–13. https://doi.org/10.1093/jme/tjab077

Baum, S. G. (2008). Zoonoses-with friends like this, who needs enemies? Transactions of the American Clinical and Climatological Association, 119, 39–51; discussion 51-52.

Bialosuknia, S. M., Dupuis Ii, A. P., Zink, S. D., Koetzner, C. A., Maffei, J. G., Owen, J. C., Landwerlen, H., Kramer, L. D., & Ciota, A. T. (2022). Adaptive evolution of West Nile virus facilitated increased transmissibility and prevalence in New York State. Emerging Microbes & Infections, 11(1), 988–999. https://doi.org/10.1080/22221751.2022.2056521

Briegel, H., Waltert, A., & Kuhn, R. (2001). Reproductive Physiology of Aedes ( Aedimorphus ) vexans (Diptera: Culicidae) in Relation to Flight Potential. Journal of Medical Entomology, 38(4), 557–565. https://doi.org/10.1603/0022-2585-38.4.557

Burkett-Cadena, N. D. (2013). Mosquitoes of the southeastern United States. University of Alabama Press. Cahill, M. E., Yao, Y., Nock, D., Armstrong, P. M., Andreadis, T. G., Diuk-Wasser, M. A., & Montgomery, R. R. (2017). West Nile Virus Seroprevalence, Connecticut, USA, 2000–2014. Emerging Infectious Diseases, 23(4), 708–710. https://doi.org/10.3201/eid2304.161669

Carson, P. J., Borchardt, S. M., Custer, B., Prince, H. E., Dunn-Williams, J., Winkelman, V., Tobler, L., Biggerstaff, B. J., Lanciotti, R., Petersen, L. R., & Busch, M. P. (2012). Neuroinvasive Disease and West Nile Virus Infection, North Dakota, USA, 1999–2008. Emerging Infectious Diseases, 18(4), 684–686. https://doi.org/10.3201/eid1804.111313

CDC. (2025, January 14). Current Year Data (2024). West Nile Virus. https://www.cdc.gov/west-nile-virus/data-maps/current-year-data.html

Chapagain, B. P., & Poudyal, N. C. (2020). Economic benefit of wildlife reintroduction: A case of elk hunting in Tennessee, USA. Journal of Environmental Management, 269, 110808. https://doi.org/10.1016/j.jenvman.2020.110808

Chaves, L. F., Hamer, G. L., Walker, E. D., Brown, W. M., Ruiz, M. O., & Kitron, U. D. (2011). Climatic variability and landscape heterogeneity impact urban mosquito diversity and vector abundance and infection. Ecosphere, 2(6), art70. https://doi.org/10.1890/ES11-00088.1

Darsie, R. F., & Ward, R. A. (with Chang, C. C., & Litwak, T.). (2005). Identification and geographical distribution of the mosquitos of North America, north of Mexico. University Press of Florida.

Downs, C., & Stewart, K. (2014). A primer in ecoimmunology and immunology for wildlife research and management. California Fish and Game, 100(3), 371–395.

Duryea, R. D. (1990). Aedes trivittatus in New Jersey. Proceedings of the New Jersey Mosquito Control Association, 73–78. https://vectorbio.rutgers.edu/outreach/species/sp12.htm

Dusfour, I., & Chaney, S. C. (2021). Mosquito control. In M. Hall & D. Tamïr, Mosquitopia (1st ed., pp. 213–233). Routledge. https://doi.org/10.4324/9781003056034-19

Eidson, M. (2001). Dead Bird Surveillance as an Early Warning System for West Nile Virus. Emerging Infectious Diseases, 7(4), 631–635. https://doi.org/10.3201/eid0704.010405

Eritja, R., Palmer, J. R. B., Roiz, D., Sanpera-Calbet, I., & Bartumeus, F. (2017). Direct Evidence of Adult Aedes albopictus Dispersal by Car. Scientific Reports, 7(1), 14399. https://doi.org/10.1038/s41598-017-12652-5

Findlater, A., & Bogoch, I. I. (2018). Human Mobility and the Global Spread of Infectious Diseases: A Focus on Air Travel. Trends in Parasitology, 34(9), 772–783. https://doi.org/10.1016/j.pt.2018.07.004

George, T. L., Harrigan, R. J., LaManna, J. A., DeSante, D. F., Saracco, J. F., & Smith, T. B. (2015). Persistent impacts of West Nile virus on North American bird populations. Proceedings of the National Academy of Sciences, 112(46), 14290–14294. https://doi.org/10.1073/pnas.1507747112

Gorris, M. E., Bartlow, A. W., Temple, S. D., Romero-Alvarez, D., Shutt, D. P., Fair, J. M., Kaufeld, K. A., Del Valle, S. Y., & Manore, C. A. (2021). Updated distribution maps of predominant Culex mosquitoes across the Americas. Parasites & Vectors, 14(1), 547. https://doi.org/10.1186/s13071-021-05051-3

Heidecke, J., Lavarello Schettini, A., & Rocklöv, J. (2023). West Nile virus eco-epidemiology and climate change. PLOS Climate, 2(5), e0000129. https://doi.org/10.1371/journal.pclm.0000129

Hill, A. P., Prince, P., Snaddon, J. L., Doncaster, C. P., & Rogers, A. (2019). AudioMoth: A low-cost acoustic device for monitoring biodiversity and the environment. HardwareX, 6, e00073. https://doi.org/10.1016/j.ohx.2019.e00073

Howard, J. J., White, D. J., & Muller, S. L. (1989). Mark-Recapture Studies on the Culiseta (Diptera: Culicidae) Vectors of Eastern Equine Encephalitis Virus. Journal of Medical Entomology, 26(3), 190–199. https://doi.org/10.1093/jmedent/26.3.190

Jensen, T., Dritz, D. A., Fritz, G. N., Washino, R. K., & Reeves, W. C. (1998). Lake Vera revisited: Parity and survival rates of Anopheles punctipennis at the site of a malaria outbreak in the Sierra Nevada foothills of California. The American Journal of Tropical Medicine and Hygiene, 59(4), 591–594. https://doi.org/10.4269/ajtmh.1998.59.591

Komar, N., Langevin, S., Hinten, S., Nemeth, N., Edwards, E., Hettler, D., Davis, B., Bowen, R., & Bunning, M. (2003). Experimental infection of North American birds with the New York 1999 strain of West Nile virus. Emerging Infectious Diseases, 9(3), 311–322. https://doi.org/10.3201/eid0903.020628

Martínez-de La Puente, J., Ferraguti, M., Ruiz, S., Roiz, D., Llorente, F., Pérez-Ramírez, E., Jiménez-Clavero, M. Á., Soriguer, R., & Figuerola, J. (2018). Mosquito community influences West Nile virus seroprevalence in wild birds: Implications for the risk of spillover into human populations. Scientific Reports, 8(1), 2599. https://doi.org/10.1038/s41598-018-20825-z

May, F. J., Davis, C. T., Tesh, R. B., & Barrett, A. D. T. (2011). Phylogeography of West Nile Virus: From the Cradle of Evolution in Africa to Eurasia, Australia, and the Americas. Journal of Virology, 85(6), 2964–2974. https://doi.org/10.1128/JVI.01963-10

McMillan, J. R., Armstrong, P. M., & Andreadis, T. G. (2020). Patterns of mosquito and arbovirus community composition and ecological indexes of arboviral risk in the northeast United States. PLOS Neglected Tropical Diseases, 14(2), e0008066. https://doi.org/10.1371/journal.pntd.0008066

Molaei, G., Andreadis, T. G., Armstrong, P. M., & Diuk-Wasser, M. (2008). Host-Feeding Patterns of Potential Mosquito Vectors in Connecticut, USA: Molecular Analysis of Bloodmeals from 23 Species of Aedes, Anopheles, Culex, Coquillettidia, Psorophora, and Uranotaenia. Journal of Medical Entomology, 45(6), 1143–1151. https://doi.org/10.1093/jmedent/45.6.1143

Molaei, G., Andreadis, T. G., Armstrong, P. M., Thomas, M. C., Deschamps, T., Cuebas-Incle, E., Montgomery, W., Osborne, M., Smole, S., Matton, P., Andrews, W., Best, C., Cornine, F., Bidlack, E., & Texeira, T. (2013). Vector-Host Interactions and Epizootiology of Eastern Equine Encephalitis Virus in Massachusetts. Vector-Borne and Zoonotic Diseases, 13(5), 312–323. https://doi.org/10.1089/vbz.2012.1099

Molaei, G., Farajollahi, A., Scott, J. J., Gaugler, R., & Andreadis, T. G. (2009). Human Bloodfeeding by the Recently Introduced Mosquito, Aedes japonicus japonicus, and Public Health Implications. Journal of the American Mosquito Control Association, 25(2), 210–214. https://doi.org/10.2987/09-0012.1

Nemeth, N. M., Bosco-Lauth, A. M., Williams, L. M., Bowen, R. A., & Brown, J. D. (2017). West Nile Virus Infection in Ruffed Grouse ( Bonasa umbellus ): Experimental Infection and Protective Effects of Vaccination. Veterinary Pathology, 54(6), 901–911. https://doi.org/10.1177/0300985817717770

Nemeth, N. M., Williams, L. M., Bosco-Lauth, A. M., Oesterle, P. T., Helwig, M., Bowen, R. A., & Brown, J. D. (2021). WEST NILE VIRUS INFECTION IN RUFFED GROUSE (BONASA UMBELLUS) IN PENNSYLVANIA, USA: A MULTI-YEAR COMPARISON OF STATEWIDE SEROSURVEYS AND VECTOR INDICES. Journal of Wildlife Diseases, 57(1). https://doi.org/10.7589/JWD-D-19-00016

NYSDEC. (n.d.). Young Forest Initiative On Wildlife Management Areas. Retrieved April 6, 2025, from https://dec.ny.gov/places-to-go/wildlife-management-areas/young-forest-initiative

O’Hara, R. B., & Kotze, D. J. (2010). Do not log‐transform count data. Methods in Ecology and Evolution, 1(2), 118–122. https://doi.org/10.1111/j.2041-210X.2010.00021.x

O’Malley, C. (1990). Aedes vexans (Meigan): An Old Foe. Proceedings of the New Jersey Mosquito Control Association, 90–95. https://vectorbio.rutgers.edu/outreach/species/sp13.htm

Owen, J. C., Nakamura, A., Coon, C. A., & Martin, L. B. (2012). The effect of exogenous corticosterone on West Nile virus infection in Northern Cardinals (Cardinalis cardinalis). Veterinary Research, 43(1), 34. https://doi.org/10.1186/1297-9716-43-34

Porter, W. F., & Jarzyna, M. A. (2013). Effects of landscape‐scale forest change on the range contraction of ruffed grouse in New York State, USA. Wildlife Society Bulletin, 37(1), 198–208. https://doi.org/10.1002/wsb.225

Reeves, L. E., Gillett-Kaufman, J. L., Kawahara, A. Y., & Kaufman, P. E. (2018). Barcoding blood meals: New vertebrate-specific primer sets for assigning taxonomic identities to host DNA from mosquito blood meals. PLOS Neglected Tropical Diseases, 12(8), e0006767. https://doi.org/10.1371/journal.pntd.0006767

Riccetti, N., Fasano, A., Ferraccioli, F., Gomez-Ramirez, J., & Stilianakis, N. I. (2022). Host selection and forage ratio in West Nile virus–transmitting Culex mosquitoes: Challenges and knowledge gaps. PLOS Neglected Tropical Diseases, 16(10), e0010819. https://doi.org/10.1371/journal.pntd.0010819

Riley, S. J., Decker, D. J., Enck, J. W., Curtis, P. D., Lauber, T. B., & Brown, T. L. (2003). Deer populations up, hunter populations down: Implications of interdependence of deer and hunter population dynamics on management. Écoscience, 10(4), 455–461. https://doi.org/10.1080/11956860.2003.11682793

Roche, B., Rohani, P., Dobson, A. P., & Guégan, J.-F. (2013). The Impact of Community Organization on Vector-Borne Pathogens. The American Naturalist, 181(1), 1–11. https://doi.org/10.1086/668591

Ronca, S. E., Murray, K. O., & Nolan, M. S. (2019). Cumulative Incidence of West Nile Virus Infection, Continental United States, 1999-2016. Emerging Infectious Diseases, 25(2), 325–327. https://doi.org/10.3201/eid2502.180765

Ronca, S. E., Ruff, J. C., & Murray, K. O. (2021). A 20-year historical review of West Nile virus since its initial emergence in North America: Has West Nile virus become a neglected tropical disease? PLOS Neglected Tropical Diseases, 15(5), e0009190. https://doi.org/10.1371/journal.pntd.0009190

Roy, C. L., Carstensen, M., LaSharr, K., Humpal, C., Dick, T., Kunkel, M., & Nemeth, N. M. (2022). WEST NILE VIRUS EXPOSURE AND INFECTION AMONG HUNTER-HARVESTED RUFFED GROUSE (BONASA UMBELLUS) COHORTS IN A STABLE POPULATION. Journal of Wildlife Diseases, 58(1). https://doi.org/10.7589/JWD-D-21-00018

Ruffed Grouse Hunting—NYSDEC. (n.d.). Retrieved January 29, 2025, from https://dec.ny.gov/things-to-do/hunting/small-game/ruffed-grouse

Santini, M., Haberle, S., Židovec-Lepej, S., Savić, V., Kusulja, M., Papić, N., Višković, K., Župetić, I., Savini, G., Barbić, L., Tabain, I., Kutleša, M., Krajinović, V., Potočnik-Hunjadi, T., Dvorski, E., Butigan, T., Kolaric-Sviben, G., Stevanović, V., Gorenec, L., … Vilibić-Čavlek, T. (2022). Severe West Nile Virus Neuroinvasive Disease: Clinical Characteristics, Short- and Long-Term Outcomes. Pathogens, 11(1), 52. https://doi.org/10.3390/pathogens11010052

Scuderi, B., Olds, E., & Southwick, R. (2024). Hunting in America: An Economic Force for Conservation. Sportsmen’s Alliance Foundation.

Servello, F. A., & Kirkpatrick, R. L. (1987). Regional Variation in the Nutritional Ecology of Ruffed Grouse. The Journal of Wildlife Management, 51(4), 749. https://doi.org/10.2307/3801737

Shahhosseini, N., Frederick, C., Racine, T., Kobinger, G. P., & Wong, G. (2021). Modeling host-feeding preference and molecular systematics of mosquitoes in different ecological niches in Canada. Acta Tropica, 213, 105734. https://doi.org/10.1016/j.actatropica.2020.105734

Shepard, J. J., & Armstrong, P. M. (2023). Jamestown Canyon virus comes into view: Understanding the threat from an underrecognized arbovirus. Journal of Medical Entomology, 60(6), 1242–1251. https://doi.org/10.1093/jme/tjad069

Smithburn, K. C., Hughes, T. P., Burke, A. W., & Paul, J. H. (1940). A Neurotropic Virus Isolated from the Blood of a Native of Uganda. The American Journal of Tropical Medicine, s1-20(4), 471–492. https://doi.org/10.4269/ajtmh.1940.s1-20.471

Stauffer, G. E., Miller, D. A. W., Williams, L. M., & Brown, J. (2018). Ruffed grouse population declines after introduction of West Nile virus. The Journal of Wildlife Management, 82(1), 165–172. https://doi.org/10.1002/jwmg.21347

Stedman, L. (2015). Ruffed Grouse (https://www.flickr.com/photos/49208525@N08/18645551408) [Photography]. Flickr.

Tiawsirisup, S., Kinley, J. R., Tucker, B. J., Evans, R. B., Rowley, W. A., & Platt, K. B. (2008). Vector Competence of Aedes vexans (Diptera: Culicidae) for West Nile Virus and Potential as an Enzootic Vector. Journal of Medical Entomology, 45(3), 452–457. https://doi.org/10.1093/jmedent/45.3.452

Tiawsirisup, S., Platt, K. B., Evans, R. B., & Rowley, W. A. (2004). Susceptibility of Ochlerotatus trivittatus (Coq.), Aedes albopictus (Skuse), and Culex pipiens (L.) to West Nile virus infection. Vector Borne and Zoonotic Diseases (Larchmont, N.Y.), 4(3), 190–197. https://doi.org/10.1089/vbz.2004.4.190

Trani, M., Brooks, R., Schmidt, T., Rudis, V., & Gabbard, C. (2001). Patterns and trends of early successional forests in the Eastern United States. Wildlife Society Bulletin, 29. https://doi.org/10.2307/3784166

Turell, M. J., Dohm, D. J., Sardelis, M. R., O’guinn, M. L., Andreadis, T. G., & Blow, J. A. (2005). An Update on the Potential of North American Mosquitoes (Diptera: Culicidae) to Transmit West Nile Virus. Journal of Medical Entomology, 42(1), 57–62. https://doi.org/10.1093/jmedent/42.1.57

Vaidyanathan, R., Edman, J. D., Cooper, L. A., & Scott, T. W. (1997). Vector Competence of Mosquitoes (Diptera: Culicidae) from Massachusetts for a Sympatric Isolate of Eastern Equine Encephalomyelitis Virus. Journal of Medical Entomology, 34(3), 346–352. https://doi.org/10.1093/jmedent/34.3.346

Weng, S.-C., Masri, R. A., & Akbari, O. S. (2024). Advances and challenges in synthetic biology for mosquito control. Trends in Parasitology, 40(1), 75–88. https://doi.org/10.1016/j.pt.2023.11.001

Results


Expand Results

Setup

Datasets were imported.

moz_data <- read.csv("Mosquito.csv")

County<-read.csv("County.csv")

Pretty_Names<-read.csv("Pretty_Names.csv")

Libraries were loaded.

#Tidyverse
library(rlang)
library(dplyr)
library(knitr)
#Figure Design
library(ggplot2)
library(viridis)
library(maps)
library(ggthemes)
library(ggsignif)
library(ggrepel)
library(patchwork)
#Tables
library(reactable)
#Community Ecology
library(vegan)
#Statistics
library(MASS)
library(AER)
library(RVAideMemoire)
library(emmeans)
#Bibliography management
library(grateful)

A WNV risk category column was then calculated for the CDC data, all categorical columns were changed into factors, and the CDC data was merged with the Mosquito data.

County$WNV<-ifelse(County$Risk_Total < 1, "None", 
                   ifelse(County$Risk_Total <4, "Low", 
                          ifelse(County$Risk_Total <9, "Medium", "High")))

Row and column names were then created in the merged dataset for better organization.

The were not many NAs in the project, only two rows in one year and one site where the traps got knocked over. These NAs were removed.
moz_data$PerYear <- paste(moz_data$Site, moz_data$Year)
.rowNamesDF(moz_data, make.names=FALSE) <- moz_data$ID
moz_data[,1] <- NULL
moz_data = merge(moz_data, County[,c("Site","WNV")], by="Site")
moz_data$WNV <- factor(moz_data$WNV , levels=c("None", "Low", "Medium", "High"))
moz_data<- moz_data %>%
  relocate(WNV, .before = Ae.vex)
moz_data<-na.omit(moz_data)

Smush is a custom function for condensing count data into totals for a given category.

If I manually typed this out whenever I wanted to “smush” a dataset, dplyr would look at column names within the dataset, and treat the columns like a normal object. However, making a function in base R conflicts with dplyr’s style of data masking, confusing dplyr about where to find the column and how to use it. Making a Smush function requires {{}} to guide dplyr to the column names and tell it to hang onto them until receiving the full instructions. In this case, Smush was used to generate a table of species counts per site per year to fit the expected format of subsequent vegan functions.
Smush <- function(dataset, col_name){
dataset %>%
group_by({{col_name}}) %>%
summarise_if(is.integer, sum) %>%
distinct()
}

Creating data frames for diversity analysis

SpecMoz contains totals of each species per site per year without any site attributes, while site_data contains totals in the same format but with relevant site data still intact. This separation was necessary for several of the subsequent vegan funtions. I attempted for over a month to make a version of Smush that could handle multiple column names, but could not get it to work.
SpecMoz<-Smush(moz_data, moz_data$PerYear)
SpecMoz <- as.data.frame(SpecMoz, row.names = NULL, optional = FALSE)
.rowNamesDF(SpecMoz, make.names=FALSE) <- SpecMoz$`moz_data$PerYear`
SpecMoz[,1] <- NULL


site_data <-(
  moz_data %>%
  group_by(moz_data$PerYear,moz_data$Site,moz_data$Forest.Type, 
           moz_data$Year, moz_data$Region, moz_data$WNV) %>%
  summarise_if(is.integer, sum) %>%
  distinct())
site_data <- as.data.frame(site_data, row.names = NULL, optional = FALSE)
.rowNamesDF(site_data, make.names=FALSE) <- site_data$`moz_data$PerYear`
site_data[,1] <- NULL

This is what the mosquito count data (SpecMoz) looks like post-Smush. For the original table, see the Data Tables tab under Data Summary

reactable(SpecMoz, 
          bordered = TRUE,
          resizable = TRUE,
          wrap = FALSE
          )

Descriptive Figures and Tables

Figure 2 was created in two parts. The first step was to categorize counties by DEC ecoregions.

While there are probably more efficient ways to accomplish this task, I was having trouble understanding %in% and wanted to practice.
NYcounty<-subset(map_data("county"), region == "new york")
NYcounty$ecoregion = vector(mode='character', length=length(NYcounty$subregion))
NYcounty$ecoregion[NYcounty$subregion %in% c('nassau', 'suffolk')] <- '1'

NYcounty$ecoregion[NYcounty$subregion %in% c('bronx', 'kings', 'new york',
                                             'queens', 'richmond')] <- '2'

NYcounty$ecoregion[NYcounty$subregion %in% c('dutchess','orange','putnam','rockland',
                                             'sullivan','ulster','westchester')] <- '3'

NYcounty$ecoregion[NYcounty$subregion %in% c('albany','columbia','delaware','greene',
                                             'montgomery','otsego','rensselaer','schenectady',
                                             'schoharie')] <- '4'

NYcounty$ecoregion[NYcounty$subregion %in% c('clinton','essex','franklin','fulton',
                                             'hamilton','saratoga','warren',
                                             'washington')] <- '5'

NYcounty$ecoregion[NYcounty$subregion %in% c('herkimer','jefferson',
                                             'lewis','oneida','st lawrence')] <- '6'

NYcounty$ecoregion[NYcounty$subregion %in% c('broome','cayuga','chenango','cortland',
                                             'madison','onondaga','oswego','tioga',
                                             'tompkins')] <- '7'

NYcounty$ecoregion[NYcounty$subregion %in% c('chemung','genesee','livingston','monroe',
                                             'ontario','orleans','schuyler',
                                             'seneca','steuben','wayne','yates')] <- '8'

NYcounty$ecoregion[NYcounty$subregion %in% c('allegany','cattaraugus','chautauqua',
                                             'erie','niagara','wyoming')] <- '9'


NYmap<-ggplot(data = NYcounty, aes(x=long, y = lat, group = group)) +
coord_quickmap() +
theme_void()

The second step was to plot the 35 sampling sites on the previously created map of New York State with DEC ecoregions color coded.

NYcol<-c("#88CCEE","#CC6677","#DDCC77","#117733","#AA4499","#44AA99","#882255")

NYFIG<-NYmap + 
  theme_void() + 
  geom_polygon(data = NYcounty, aes(x=long, y = lat, fill = ecoregion), color = "white") +
scale_fill_manual(values =c("dimgray","lightgray",NYcol), name= "DEC Ecoregion", 
labels = c("1 - Long Island", "2 - New York City", "3 - Lower Hudson Valley", 
"4 - Capital Region/Northern Catskills", "5 - Eastern Adirondacks/Lake Champlain", 
"6 - Western Adirondacks/Eastern Lake Ontario","7 - Central New York", 
"8 - Western Finger Lakes", "9 - Western New York"))+
theme(legend.position = "right")+
 geom_polygon(color = "black", fill = NA)   + 
geom_point(data=County, aes(x=Longitude, y=Latitude), 
color="black", fill="white", pch=21, size=3, inherit.aes=FALSE)

Choropleth maps of WNV cases and PVDs in New York State were then created for Figure 3.

The CDC reports WNV data by county for NY state. Assigning WNV risk level to each site meant that, since some counties contained multiple sites, this project’s CDC dataset contained some duplicate values. Smush was used to remove these duplicates.
deduped.County <- distinct(County, county, .keep_all = TRUE)
deduped.County<-Smush(deduped.County, ecoregion)
deduped.County$ecoregion<-as.character(deduped.County$ecoregion)

Virus <- inner_join(NYcounty, deduped.County, by = "ecoregion")
VirusMap<-ggplot(data = Virus, aes(x=long, y = lat, group = group)) +
coord_quickmap() +
theme_void()

The maps were then created and put together using patchwork.

PVDFIG<-VirusMap + 
  theme_void() + 
  geom_polygon(data = Virus, aes(x=long, y = lat), color = "white") +
geom_polygon(data = Virus, aes(fill = PVD))+
theme(legend.position = "right")+
scale_fill_viridis(option="rocket", begin = 0.9, end = 0.1, name="Presumed Viremic Donors")+
 geom_polygon(color = "black", fill = NA)   + 
guides(colour = guide_colourbar(order = 1),alpha = guide_legend(order = 2),
       size = guide_legend(order = 3))+
  ggtitle("a.") +
  theme(plot.title.position = "plot", plot.title = element_text(size = 20, face = "bold"))

CASFIG<-VirusMap + 
  theme_void() + 
  geom_polygon(data = Virus, aes(x=long, y = lat), color = "white") +
geom_polygon(data = Virus, aes(fill = Total_Case))+
theme(legend.position = "right")+
scale_fill_viridis(option="mako", begin = 0.9, end = 0.1, name="Symptomatic Cases")+
 geom_polygon(color = "black", fill = NA)   + 
guides(colour = guide_colourbar(order = 1),alpha = guide_legend(order = 2),
       size = guide_legend(order = 3))+
    ggtitle("b.") +
  theme(plot.title.position = "plot", plot.title = element_text(size = 20, face = "bold"))

Table 3 did not require many changes from the Pretty Names dataset.

Totals and percents for each mosquito species were calculated and the names were cleaned up for display in a reactable.
Percent<- function(x){
  y<-((x/sum(x)*100))
  format(round(y, digits=2))
}


Pretty_Names$Host.Type<-as.factor(Pretty_Names$Host.Type)
Pretty_Names$Abbreviation<-as.factor(Pretty_Names$Abbreviation)
Pretty_Names$Generations.per.Season<-as.factor(Pretty_Names$Generations.per.Season)
Pretty_Names$Short<-as.factor(Pretty_Names$Short)

Pretty_Names$Totals<-colSums(moz_data[,10:34])
Pretty_Names$Percent<-Percent(Pretty_Names$Totals)
Pretty_Names$Totals<-as.numeric(Pretty_Names$Totals)
Pretty_Names$Percent<-as.numeric(Pretty_Names$Percent)

Pretty_Names <-(Pretty_Names %>%
arrange(-Pretty_Names$Percent))

Presentable_Names <-(Pretty_Names %>%
rename("Host Type" = `Host.Type`) %>%
rename("Generations per Season" = `Generations.per.Season`))

\[\\[0.03in]\]

Diversity Analysis

This section was mainly guided by the RPubs Vegan tutorial “Vegan cheat sheet” by An Bui.

Alpha diversity includes both the number of species and their relative abundance.

The number of species per site was assessed with specnumber and revealed a uniform count of 25 species. No further testing of ecoregion, forest type, or WNV risk category was necessary.
##                           Df    Sum Sq   Mean Sq F value Pr(>F)
## site_data$`moz_data$Site` 34 2.019e-26 5.938e-28       1  0.499
## Residuals                 35 2.078e-26 5.938e-28
Taking relative abundance into account using the Shannon Weiner index (diversity) revealed slight differences among sites and forest types but significant differences among WNV risk categories.
Site
##                           Df    Sum Sq   Mean Sq F value Pr(>F)  
## site_data$`moz_data$Site` 34 0.0008728 2.567e-05   1.559  0.098 .
## Residuals                 35 0.0005762 1.646e-05                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Forest Type
##                                  Df    Sum Sq   Mean Sq F value Pr(>F)  
## site_data$`moz_data$Forest.Type`  5 0.0002142 4.285e-05   2.221 0.0629 .
## Residuals                        64 0.0012348 1.929e-05                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
WNV Risk Category
##                          Df    Sum Sq   Mean Sq F value  Pr(>F)   
## site_data$`moz_data$WNV`  3 0.0003032 1.011e-04   5.821 0.00136 **
## Residuals                66 0.0011459 1.736e-05                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A Tukey HSD test revealed the specific significant differences among WNV risk categories and a boxplot was made of the Shannon Weiner indices.
TukeyHSD(aov(diversity(SpecMoz) ~ site_data$`moz_data$WNV`, data = site_data))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = diversity(SpecMoz) ~ site_data$`moz_data$WNV`, data = site_data)
## 
## $`site_data$`moz_data$WNV``
##                      diff           lwr         upr     p adj
## Low-None     0.0041574833  0.0010747993 0.007240167 0.0038465
## Medium-None  0.0045103806  0.0007591592 0.008261602 0.0121216
## High-None    0.0009257985 -0.0039857033 0.005837300 0.9595171
## Medium-Low   0.0003528973 -0.0035883783 0.004294173 0.9953284
## High-Low    -0.0032316848 -0.0082898308 0.001826461 0.3402383
## High-Medium -0.0035845821 -0.0090758081 0.001906644 0.3213817
ggplot(SpecMoz, aes(x =site_data$`moz_data$WNV`, y = diversity(SpecMoz), fill = site_data$`moz_data$WNV`)) + 
  geom_boxplot() +
  theme(legend.position = "none", plot.title = element_text(hjust = 0.5), axis.title=element_text(size=14,face="bold"), axis.text=element_text(size=12)) + 
  labs(x = "Risk Level", y = "Shannon-Weiner Index", title = NULL) +
  scale_fill_viridis(option = "turbo",begin = 0.1, end = 0.9, discrete=TRUE) +
  Transparent+
  geom_signif(annotations ="**", y_position = c(3.214), xmin = c("None"), xmax =c("Low"))+
  geom_signif(annotations ="*", y_position = c(3.216), xmin = c("None"), xmax =c("Medium"))

The estimated number of unsampled species was calculated for the overall dataset using specpool and for each site per year using estimateR.

Both indicated that the 25 species sampled at each site each year represented every species at the site, with none going undetected.

Next, adonis2 was used to assess community dissimilarity.

adonis2 is essentially a perMANOVA test that places each site’s mosquito community in ordination space based on synthetic axes that explain the maximum amount of difference among them. These mosquito communities are then grouped by a supplied variable, then these groups are compared. If the center of the groups in ordination space are close together, the communities are not significantly different from one another based on the supplied variable.
As the sites have appeared strikingly uniform in the past tests, the first adonis2 groups communities by Site to determine if any differences exist between communities at all.
## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = SpecMoz ~ site_data$`moz_data$Site`, data = site_data)
##          Df SumOfSqs      R2      F Pr(>F)  
## Model    34  0.15524 0.53566 1.1875  0.015 *
## Residual 35  0.13457 0.46434                
## Total    69  0.28982 1.00000                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Given that significant differences were found, ecoregion, forest type, and WNV risk category were also tested, with only WNV risk indicating significant differences.
## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = SpecMoz ~ site_data$`moz_data$WNV`, data = site_data)
##          Df SumOfSqs      R2      F Pr(>F)    
## Model     3 0.024186 0.08345 2.0031  0.001 ***
## Residual 66 0.265632 0.91655                  
## Total    69 0.289818 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Seeing the details of how these WNV risk categories differ requires Principal Component Analysis (PCA).

The first step was to use Smush to create a pared-down version of site_data with no mosquito counts and a new version of SpecMoz that contains mosquito data per site, not per site and year.

The artificial axes that make up the ordination space were then constructed and WNV risk categories were added to allow sites to be grouped in later analysis.

## Call: rda(X = SiteDF)
## 
## -- Model Summary --
## 
##               Inertia Rank
## Total          855416     
## Unconstrained  855416   25
## 
## Inertia is variance
## 
## -- Eigenvalues --
## 
## Eigenvalues for unconstrained axes:
##    PC1    PC2    PC3    PC4    PC5    PC6    PC7    PC8 
## 195100  79602  75146  63308  59437  51839  50604  44527 
## (Showing 8 of 25 unconstrained eigenvalues)

The eigenvalues produced by the PCA reveal what proportion of the variation among sites can be explained by each axis.

summary(eigenvals(PCA))
## Importance of components:
##                             PC1       PC2       PC3       PC4       PC5
## Eigenvalue            1.951e+05 7.960e+04 7.515e+04 6.331e+04 5.944e+04
## Proportion Explained  2.281e-01 9.306e-02 8.785e-02 7.401e-02 6.948e-02
## Cumulative Proportion 2.281e-01 3.211e-01 4.090e-01 4.830e-01 5.525e-01
##                             PC6       PC7       PC8       PC9      PC10
## Eigenvalue            5.184e+04 5.060e+04 4.453e+04 3.945e+04 3.253e+04
## Proportion Explained  6.060e-02 5.916e-02 5.205e-02 4.611e-02 3.803e-02
## Cumulative Proportion 6.131e-01 6.722e-01 7.243e-01 7.704e-01 8.084e-01
##                            PC11      PC12      PC13      PC14      PC15
## Eigenvalue            3.066e+04 2.304e+04 1.910e+04 1.863e+04 1.565e+04
## Proportion Explained  3.585e-02 2.694e-02 2.233e-02 2.177e-02 1.829e-02
## Cumulative Proportion 8.443e-01 8.712e-01 8.935e-01 9.153e-01 9.336e-01
##                            PC16      PC17      PC18      PC19      PC20
## Eigenvalue            1.370e+04 1.064e+04 8.172e+03 7.403e+03 5.563e+03
## Proportion Explained  1.601e-02 1.244e-02 9.554e-03 8.655e-03 6.503e-03
## Cumulative Proportion 9.496e-01 9.621e-01 9.716e-01 9.803e-01 9.868e-01
##                            PC21      PC22      PC23      PC24      PC25
## Eigenvalue            3.959e+03 3.151e+03 2.531e+03 1.167e+03 5.145e+02
## Proportion Explained  4.628e-03 3.684e-03 2.959e-03 1.364e-03 6.015e-04
## Cumulative Proportion 9.914e-01 9.951e-01 9.980e-01 9.994e-01 1.000e+00
Moving the decimal point two places to the left gives that proportion as a percent. In this case, the X axis (PC1) explains 22.81% of the variation while the Y axis (PC2) explains 9.3%.

Species can then be plotted as arrows on the same ordination space to determine their effect on community dissimilarity.

PCAvect <- scores(PCA, display = "species") %>% 
  as.data.frame()
plot_PCA <- ggplot() +
  geom_point(data = PCAscores, size=3, aes(x = PC1, y = PC2, color=WNV, shape=WNV)) +
  scale_color_viridis(option = "turbo",begin = 0.1, end = 0.9, discrete=TRUE) +
  geom_vline(xintercept = c(0), color = "grey70", linetype = 1) +
  geom_hline(yintercept = c(0), color = "grey70", linetype = 1) +
  geom_segment(data = PCAvect, aes(x = 0, y = 0, xend = PC1, yend = PC2), 
               arrow = arrow(length = unit(0.2, "cm"), type = "closed")) +
 geom_text_repel(data = PCAvect, max.overlaps = 20, 
                 aes(x = PC1, y = PC2, label = rownames(PCAvect))) +
Transparent +
labs(x = "PC1 (22.81%)",
     y = "PC2 (9.3%)",
 title = "Principal Components Analysis by WNV Risk") +
  guides(color = guide_legend(title = "Risk Level"), 
         shape= guide_legend(title = "Risk Level"))
plot_PCA

This plot reveals that most of the mosquito species do not have much of an effect on community dissimilarity, but four stand out: Aedes vexans, Anopheles punctipennis, Culiseta melanura, and Ochlerotatus trivittatus.

The direction of the arrows in relation to the sites hints that they may be less present in sites with no WNV risk. The length of the arrows indicates that they have more of an effect on community dissimilarity than the species with shorter arrows.

While the PCA plot is intriguing, it is, admittedly, unpleasant to look at.

Non-metric Multidimensional Scaling (NMDS) provides a way to reduce the number of axes down to two, simplifying the ordination into something easier to interpret. This model works by first plotting each site randomly on an arbitrary set of two axes. It then starts rearranging the points in an iterative fashion with the goal of representing “real” differences among sites in ordination space. Similar mosquito communities should be close and dissimilar communities should be far apart.
Moz_NMDS <- metaMDS(SiteDF)

Before viewing the NMDS model, checking the stress provides an estimate of how well the model represents the data.

The line on the plot represents the model and the circles represent the communities. The distance between the circles and the line shows how far away each point is from being represented accurately in the ordination. This discordance is called stress. A stress value below 0.2 is ideal, but not always achievable for large datasets. Increasing complexity means more tradeoffs, so points are less likely to be represented perfectly. A stress of 0.3 or above indicates that the model is highly suspect.
stressplot(Moz_NMDS, p.col = "#8C9DAD", l.col = "#1b140e")

Moz_NMDS
## 
## Call:
## metaMDS(comm = SiteDF) 
## 
## global Multidimensional Scaling using monoMDS
## 
## Data:     wisconsin(sqrt(SiteDF)) 
## Distance: bray 
## 
## Dimensions: 2 
## Stress:     0.2444585 
## Stress type 1, weak ties
## Best solution was repeated 1 time in 20 tries
## The best solution was from try 12 (random start)
## Scaling: centring, PC rotation, halfchange scaling 
## Species: expanded scores based on 'wisconsin(sqrt(SiteDF))'
The stress is above 0.2, but below 0.3. The results were plotted, but interpreted with appropriate caution.
Moz_Stress <- scores(Moz_NMDS, display = "sites") %>% 
  as.data.frame() %>% 
  tibble::rownames_to_column("Site") %>% 
  full_join(SiteforPCA, by = "Site")

plot_nmds <- ggplot(Moz_Stress, aes(x = NMDS1, y = NMDS2, color = WNV, shape = WNV)) +
  geom_point(size = 3) +
  stat_ellipse(linetype = 2, linewidth = 1) +
  scale_color_viridis(option = "turbo",begin = 0.1, end = 0.9, discrete=TRUE) +
  Transparent +
  labs(x = "NMDS X", y = "NMDS Y", title = NULL) +
  guides(color = guide_legend(title = "Risk Level"), shape= guide_legend(title = "Risk Level"))

There are not enough high WNV risk sites to calculate an ellipse.

Adding significant species to the ordination space may still be useful for visualization. The envfit function allows the species count data to be added to the ordination space using the same process as the first NMDS. In this case, the model had 999 tries to find the best placement for the species.
fit <- envfit(Moz_NMDS, SiteDF, perm = 999) 

The species were then filtered to only include those that have a significant impact on community dissimilarity (p value = 0.001) and represented as arrows similar to those in the PCA plot.

Note: the plot below was slightly altered from the original output to ensure that all parts were easily visible. The NMDS1 and NMDS2 axis values used by the envfit function were reduced by a constant factor (0.65, 0.2), which did not alter the direction, but did bring them closer to the site ellipses. The code below will produce the original figure.

The effect of those four species is agreed upon by the PCA and NMDS models.

However, it is hard to see from the plot what differences among WNV risk levels they are responsible for. Further statistical tests of these species and their relationship with WNV risk level may shed light on their role.

\[\\[0.03in]\]

Effect of Mosquito Species

The first species in question is Ochlerotatus trivittatus as it was the least plentiful species overall

Checking for normality is the first step towards satisfying the assumptions of an ANOVA. The Interpret function is a simple custom function for printing mean and standard deviation in a way that is easy (for me) to interpret. While eyeballing the means and standard deviations can reveal some issues, I also used byf.shapiro from RVAideMemoire to be sure.
Interpret<- function(x) {
sprintf("Mean (Std. Dev) = %1.2f (%1.2f)", mean(x), sd(x))
}

with(moz_data, tapply(moz_data$Oc.triv,moz_data$WNV, Interpret))
##                              None                               Low 
## "Mean (Std. Dev) = 37.33 (32.45)" "Mean (Std. Dev) = 47.11 (31.09)" 
##                            Medium                              High 
## "Mean (Std. Dev) = 44.15 (31.39)" "Mean (Std. Dev) = 45.00 (32.30)"
byf.shapiro(moz_data$Oc.triv~moz_data$WNV,data=moz_data)
## 
##  Shapiro-Wilk normality tests
## 
## data:  moz_data$Oc.triv by moz_data$WNV 
## 
##             W   p-value    
## None   0.8978 < 2.2e-16 ***
## Low    0.9405 1.112e-10 ***
## Medium 0.9362 2.017e-07 ***
## High   0.9158 1.236e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The counts of Oc.trivittatus per site do not appear to be normally distributed. While transformation is usually the next step, as O’Hara and Kotze said:

“DO NOT LOG TRANSFORM COUNT DATA!”

A poisson regression was next on the list as it is often used for count data.
TrivFish<-glm(moz_data$Oc.triv ~ moz_data$WNV, family = poisson)
summary(TrivFish)
## 
## Call:
## glm(formula = moz_data$Oc.triv ~ moz_data$WNV, family = poisson)
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        3.619775   0.007471  484.54   <2e-16 ***
## moz_data$WNVLow    0.232788   0.010775   21.60   <2e-16 ***
## moz_data$WNVMedium 0.167878   0.013229   12.69   <2e-16 ***
## moz_data$WNVHigh   0.186888   0.016950   11.03   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 33421  on 1117  degrees of freedom
## Residual deviance: 32912  on 1114  degrees of freedom
## AIC: 38229
## 
## Number of Fisher Scoring iterations: 5

The results are interesting, but does the model fit the data? A dispersiontest from AER is next.

dispersiontest(TrivFish,trafo=1)
## 
##  Overdispersion test
## 
## data:  TrivFish
## z = 36.714, p-value < 2.2e-16
## alternative hypothesis: true alpha is greater than 0
## sample estimates:
##    alpha 
## 23.27223

The answer is a resounding “No.”

The poisson model assumes an equal mean and variance, but the model has strong evidence of overdispersion, with a dispersion constant over 20 and a p-value well below 0.001. A negative binomial regression accounts for overdispersion, so it was the next port of call. This model was created using MASS.
TrivBin<-glm.nb(moz_data$Oc.triv ~ moz_data$WNV)
summary(TrivBin)
## 
## Call:
## glm.nb(formula = moz_data$Oc.triv ~ moz_data$WNV, init.theta = 0.7888318081, 
##     link = log)
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         3.61977    0.05193  69.703  < 2e-16 ***
## moz_data$WNVLow     0.23279    0.07974   2.919  0.00351 ** 
## moz_data$WNVMedium  0.16788    0.09741   1.723  0.08480 .  
## moz_data$WNVHigh    0.18689    0.12702   1.471  0.14120    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.7888) family taken to be 1)
## 
##     Null deviance: 1353.5  on 1117  degrees of freedom
## Residual deviance: 1344.1  on 1114  degrees of freedom
## AIC: 10597
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.7888 
##           Std. Err.:  0.0334 
## 
##  2 x log-likelihood:  -10587.1310

The Akaike Information Criterion (AIC) for this model was quite a bit lower than the poisson regression, which is a sign that the model is a better fit for the data.

However, determining whether the model explains the effect of WNV risk level on Oc. trivittatus counts requires comparison with a null model.
TrivNull<-glm.nb(moz_data$Oc.triv ~ 1)
summary(TrivNull)
## 
## Call:
## glm.nb(formula = moz_data$Oc.triv ~ 1, init.theta = 0.7823762308, 
##     link = log)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.74309    0.03412   109.7   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.7824) family taken to be 1)
## 
##     Null deviance: 1344.1  on 1117  degrees of freedom
## Residual deviance: 1344.1  on 1117  degrees of freedom
## AIC: 10601
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.7824 
##           Std. Err.:  0.0330 
## 
##  2 x log-likelihood:  -10596.5100
anova(TrivBin,TrivNull)
## Likelihood ratio tests of Negative Binomial Models
## 
## Response: moz_data$Oc.triv
##          Model     theta Resid. df    2 x log-lik.   Test    df LR stat.
## 1            1 0.7823762      1117       -10596.51                      
## 2 moz_data$WNV 0.7888318      1114       -10587.13 1 vs 2     3 9.379692
##      Pr(Chi)
## 1           
## 2 0.02464629

The WNV risk model modestly outperformed the null model in terms of AIC but slightly under-performed in log-likelihood.

An anova comparing the two models revealed a significant difference (<0.05), so WNV risk does have some explanatory power, but the deviance goodness-of fit can also be tested directly for each model.
pchisq(deviance(TrivBin),df.residual(TrivBin),lower=FALSE)
## [1] 2.234498e-06
pchisq(deviance(TrivNull),df.residual(TrivNull),lower=FALSE)
## [1] 2.997222e-06

The null hypothesis is that the model represents the data well.

The low p values for both models suggest that Oc. trivittus count does not follow a negative binomial distribution. A zero-inflated model does not seem warranted as there are few zeroes in the entire dataset. Does adding forest type help the model represent the data?
TrivFor<-glm.nb(moz_data$Oc.triv ~ moz_data$WNV + moz_data$Forest.Type)
summary(TrivFor)
## 
## Call:
## glm.nb(formula = moz_data$Oc.triv ~ moz_data$WNV + moz_data$Forest.Type, 
##     init.theta = 0.7916109915, link = log)
## 
## Coefficients:
##                                                   Estimate Std. Error z value
## (Intercept)                                        3.43638    0.16789  20.468
## moz_data$WNVLow                                    0.28517    0.08984   3.174
## moz_data$WNVMedium                                 0.10348    0.10588   0.977
## moz_data$WNVHigh                                   0.20117    0.12883   1.561
## moz_data$Forest.TypeConiferous_Mixed Oak           0.42156    0.27284   1.545
## moz_data$Forest.TypeConiferous_Northern Hardwoods  0.20955    0.17511   1.197
## moz_data$Forest.TypeMixed Oak                      0.06108    0.26194   0.233
## moz_data$Forest.TypeMixed Oak_Northern Hardwoods   0.31067    0.20287   1.531
## moz_data$Forest.TypeNorthern Hardwoods             0.15061    0.15707   0.959
##                                                   Pr(>|z|)    
## (Intercept)                                         <2e-16 ***
## moz_data$WNVLow                                     0.0015 ** 
## moz_data$WNVMedium                                  0.3284    
## moz_data$WNVHigh                                    0.1184    
## moz_data$Forest.TypeConiferous_Mixed Oak            0.1223    
## moz_data$Forest.TypeConiferous_Northern Hardwoods   0.2314    
## moz_data$Forest.TypeMixed Oak                       0.8156    
## moz_data$Forest.TypeMixed Oak_Northern Hardwoods    0.1257    
## moz_data$Forest.TypeNorthern Hardwoods              0.3376    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.7916) family taken to be 1)
## 
##     Null deviance: 1357.6  on 1117  degrees of freedom
## Residual deviance: 1344.1  on 1109  degrees of freedom
## AIC: 10603
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.7916 
##           Std. Err.:  0.0335 
## 
##  2 x log-likelihood:  -10583.1280
pchisq(deviance(TrivFor),df.residual(TrivFor),lower=FALSE)
## [1] 1.358016e-06

The model that includes forest type has an even higher AIC than the null model.

A different model such as a Generalized Linear Mixed Model may be needed, but since the ordination itself was not an ideal fit, the negative binomial model will be accepted as preliminary but suggestive A significant difference among groups was detected, so a Tukey HSD test to reveal significant comparisons between groups was warranted.
TrivBin$WNV<-TrivBin$moz_data$WNV


pairwise1 = emmeans(TrivBin, ~ WNV)

pairs(pairwise1, adjust="tukey")
##  contrast      estimate     SE  df z.ratio p.value
##  None - Low     -0.2328 0.0797 Inf  -2.919  0.0184
##  None - Medium  -0.1679 0.0974 Inf  -1.723  0.3114
##  None - High    -0.1869 0.1270 Inf  -1.471  0.4549
##  Low - Medium    0.0649 0.1020 Inf   0.635  0.9208
##  Low - High      0.0459 0.1310 Inf   0.351  0.9852
##  Medium - High  -0.0190 0.1420 Inf  -0.134  0.9991
## 
## Results are given on the log (not the response) scale. 
## P value adjustment: tukey method for comparing a family of 4 estimates

The difference in Oc. trivittatus count among WNV risk levels is not as dramatic as the NMDS suggested, but is in alignment with the small but noticable differences suggested by the PCA.

ggplot(moz_data, aes(x = moz_data$WNV, y = moz_data$Oc.triv, fill = moz_data$WNV)) + 
  geom_boxplot() +
  theme(legend.position = "none", plot.title = element_text(hjust = 0.5), axis.title=element_text(size=14,face="bold"), axis.text=element_text(size=12)) + 
  labs(x = "Risk Level", y = "Oc. trivittatus Counts", title = NULL) +
  scale_fill_viridis(option = "turbo",begin = 0.1, end = 0.9, discrete=TRUE) +
  Transparent +
  geom_signif(annotations ="*", y_position = c(110), xmin = c("None"), xmax =c("Low"))

The next three species followed similar patterns in terms of data structure

None of them were normally distributed and all exhibited overdispersion. Both a poisson and negaive binomial regression were conducted and the negative binomial regression always has a much lower AIC. Chi-square tests always revealed a small but significant amount of explanatory power over the null hypothesis, but a poor fit nonetheless.

As seen below, no significant differences were observed for Anopheles punctipennis.

pairs(pairwise1, adjust="tukey")
##  contrast      estimate     SE  df z.ratio p.value
##  None - Low     -0.1974 0.0789 Inf  -2.502  0.0596
##  None - Medium  -0.2308 0.0963 Inf  -2.396  0.0778
##  None - High    -0.1820 0.1260 Inf  -1.449  0.4687
##  Low - Medium   -0.0334 0.1010 Inf  -0.330  0.9876
##  Low - High      0.0153 0.1290 Inf   0.119  0.9994
##  Medium - High   0.0487 0.1410 Inf   0.346  0.9857
## 
## Results are given on the log (not the response) scale. 
## P value adjustment: tukey method for comparing a family of 4 estimates

Aedes vexans exhibited a similar pattern to Oc. trivittatus.

pairs(pairwise1, adjust="tukey")
##  contrast      estimate     SE  df z.ratio p.value
##  None - Low    -0.21142 0.0783 Inf  -2.702  0.0348
##  None - Medium -0.21250 0.0956 Inf  -2.224  0.1167
##  None - High   -0.16789 0.1250 Inf  -1.347  0.5329
##  Low - Medium  -0.00108 0.1000 Inf  -0.011  1.0000
##  Low - High     0.04353 0.1280 Inf   0.339  0.9866
##  Medium - High  0.04461 0.1400 Inf   0.320  0.9887
## 
## Results are given on the log (not the response) scale. 
## P value adjustment: tukey method for comparing a family of 4 estimates
ggplot(moz_data, aes(x = moz_data$WNV, y = moz_data$Ae.vex, fill = moz_data$WNV)) + 
  geom_boxplot() +
  theme(legend.position = "none", plot.title = element_text(hjust = 0.5), axis.title=element_text(size=14,face="bold"), axis.text=element_text(size=12)) + 
  labs(x = "Risk Level", y = "Ae. vexans Counts", title = NULL) +
  scale_fill_viridis(option = "turbo",begin = 0.1, end = 0.9, discrete=TRUE) +
  Transparent +
  geom_signif(annotations ="*", y_position = c(110), xmin = c("None"), xmax =c("Low"))

Culiseta melanura did not have any significant differences among WNV risk levels

pairs(pairwise1, adjust="tukey")
##  contrast      estimate     SE  df z.ratio p.value
##  None - Low     -0.1793 0.0779 Inf  -2.301  0.0977
##  None - Medium  -0.2205 0.0951 Inf  -2.318  0.0939
##  None - High     0.0391 0.1240 Inf   0.315  0.9892
##  Low - Medium   -0.0412 0.0998 Inf  -0.413  0.9763
##  Low - High      0.2184 0.1280 Inf   1.707  0.3200
##  Medium - High   0.2596 0.1390 Inf   1.866  0.2427
## 
## Results are given on the log (not the response) scale. 
## P value adjustment: tukey method for comparing a family of 4 estimates

Individual species do not seem to have much of an effect on WNV risk level. What about total mosquito counts?

Effect of Mosquito Totals

First, totals for each trapping were calculated.

moz_data$Totals<-rowSums(moz_data[,10:34])

ggplot(moz_data, aes(moz_data$Totals, fill = moz_data$WNV)) +
    facet_grid(moz_data$WNV~., scales = "free_y") +
  scale_fill_viridis(option = "turbo",begin = 0.1, end = 0.9, discrete=TRUE) +
  geom_histogram(binwidth = 10) +
Transparent +
theme(strip.background =element_rect(fill="transparent"))+
theme(panel.background = element_rect(fill = "transparent", color="black"))+
guides(fill = "none") +
labs(x = "Total Mosquitoes Collected",
     y = "Count")

Discerning whether the distributions are normal from the histogram alone was challenging

with(moz_data, tapply(moz_data$Totals,moz_data$WNV, Interpret))
##                                 None                                  Low 
## "Mean (Std. Dev) = 1204.17 (154.72)" "Mean (Std. Dev) = 1232.99 (152.48)" 
##                               Medium                                 High 
## "Mean (Std. Dev) = 1237.51 (145.35)" "Mean (Std. Dev) = 1200.10 (140.51)"
byf.shapiro(moz_data$Totals~moz_data$WNV,data=moz_data)
## 
##  Shapiro-Wilk normality tests
## 
## data:  moz_data$Totals by moz_data$WNV 
## 
##             W p-value
## None   0.9975  0.6804
## Low    0.9946  0.2573
## Medium 0.9905  0.2399
## High   0.9861  0.4076

The distributions were normal, so an anova and Tukey HSD were used to compare them.

WesDiff<-aov(moz_data$Totals~moz_data$WNV,data=moz_data)

summary(WesDiff)
##                Df   Sum Sq Mean Sq F value  Pr(>F)   
## moz_data$WNV    3   273655   91218   3.986 0.00774 **
## Residuals    1114 25495454   22886                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(WesDiff)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = moz_data$Totals ~ moz_data$WNV, data = moz_data)
## 
## $`moz_data$WNV`
##                   diff          lwr      upr     p adj
## Low-None     28.823485   1.50959041 56.13738 0.0339592
## Medium-None  33.339693  -0.02244668 66.70183 0.0502291
## High-None    -4.066667 -47.58466592 39.45133 0.9951116
## Medium-Low    4.516208 -30.52389897 39.55632 0.9874217
## High-Low    -32.890152 -77.70748077 11.92718 0.2336590
## High-Medium -37.406360 -86.14624614 11.33353 0.1982090

The relationship between total mosquito count and WNV risk level mirrored the Oc. trivittatus / WNV risk and Ae. vexans / WNV risk relationships.

ggplot(moz_data, aes(x = moz_data$WNV, y = moz_data$Totals, fill = moz_data$WNV)) + 
  geom_boxplot() +
  theme(legend.position = "none", plot.title = element_text(hjust = 0.5), axis.title=element_text(size=14,face="bold"), axis.text=element_text(size=12)) + 
  labs(x = "Risk Level", y = "Mosquito Totals", title = NULL) +
  scale_fill_viridis(option = "turbo",begin = 0.1, end = 0.9, discrete=TRUE) +
  Transparent +
  geom_signif(annotations ="*", y_position = c(1800), xmin = c("None"), xmax =c("Low"))

What about total mosquito count and forest type?

ggplot(moz_data, aes(moz_data$Totals, fill = moz_data$Forest.Type)) +
    facet_grid(moz_data$Forest.Type~., scales = "free_y") +
  scale_fill_viridis(begin = 0.2, end = 0.9, discrete=TRUE) +
  theme(legend.title = element_blank(), strip.background = element_blank(),strip.text = element_blank())+
  geom_histogram(binwidth = 10)+
  Transparent+
  theme(panel.background = element_rect(fill = "transparent", color="black"))+
  labs(x = "Total Mosquitoes Collected",
     y = "Count")

with(moz_data, tapply(moz_data$Totals,moz_data$Forest.Type, Interpret))
##                           Coniferous                 Coniferous_Mixed Oak 
## "Mean (Std. Dev) = 1230.58 (156.99)" "Mean (Std. Dev) = 1271.12 (168.87)" 
##        Coniferous_Northern Hardwoods                            Mixed Oak 
## "Mean (Std. Dev) = 1200.44 (154.51)" "Mean (Std. Dev) = 1183.09 (151.80)" 
##         Mixed Oak_Northern Hardwoods                   Northern Hardwoods 
## "Mean (Std. Dev) = 1226.39 (136.94)" "Mean (Std. Dev) = 1222.53 (150.95)"
byf.shapiro(moz_data$Totals~moz_data$Forest.Type,data=moz_data)
## 
##  Shapiro-Wilk normality tests
## 
## data:  moz_data$Totals by moz_data$Forest.Type 
## 
##                                    W p-value
## Coniferous                    0.9861  0.6882
## Coniferous_Mixed Oak          0.9830  0.8816
## Coniferous_Northern Hardwoods 0.9977  0.9781
## Mixed Oak                     0.9729  0.5818
## Mixed Oak_Northern Hardwoods  0.9935  0.9288
## Northern Hardwoods            0.9976  0.4864

Despite looking a little strange when plotted, the distributions are normal.

Treediff<-aov(moz_data$Totals~moz_data$Forest.Type,data=moz_data)
summary(Treediff)
##                        Df   Sum Sq Mean Sq F value Pr(>F)  
## moz_data$Forest.Type    5   237242   47448   2.067 0.0672 .
## Residuals            1112 25531867   22960                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There are no significant differences among forest types in regards to mosquito totals.

\[\\[0.03in]\]