Researchers: Kate Nazario & Sarah Hurley

Culex larvae: source(http://whyfiles.org/2014/mosquitoes/)

Overview and Motivation

Before graduate school, I (Kate) worked at the Connecticut Agricultural Experiment Station as a Seasonal Research Assistant, doing mosquito identification and colony maintenance as part of the Mosquito Research Surveillance Program in the division of Vector Biology and Zoonotic Diseases. That experience, together with the articles I had read on mosquitoes, was my primary motivation for choosing mosquitoes for this project.

At the CT Agricultural Experiment Station, mosquito trapping and virus testing are conducted from June through October at 91 locations across Connecticut. Every year, data on the relative abundance, distribution, and infection rates of mosquito species are collected. On a 10-day rotation, two types of trap are set: a CO2-baited CDC Light Trap and a Gravid Mosquito Trap. While the Light Trap is designed to trap all species, the Gravid Trap targets previously blood-fed adult females and primarily catches species in the genus Culex. Mosquitoes are processed in three steps: collection, species identification, and virus isolation/identification. The whole process takes about 7-10 days.

While the Agricultural Experiment Station monitors viruses in all species, it is especially on the lookout for Culex species, particularly Cx. pipiens. Andreadis et al. (2001) and (2004), which were our primary references when organizing this project, illustrate the complexity of the genus. Both papers look at many mosquito species in Connecticut, but I was interested in the Culex species they presented. Three species of Culex are found in relatively high abundance in Connecticut: Cx. pipiens, Cx. restuans, and Cx. salinarius. Although they are found in cities and even at the same collection sites, Cx. pipiens is the stronger vector for West Nile Virus.

I was interested in looking at the phylogenetic relationships of Culex species. For my Master’s thesis, collaborators will be constructing phylogenies for me, so I thought this project would be a great chance to build some myself. What is very interesting about this group is that Cx. pipiens and Cx. restuans are quite similar morphologically; one of the most noticeable differences between the two species is that Cx. restuans has two spots on the thorax. Does this morphological similarity mean they are more closely related to each other than to Cx. salinarius? This question motivated me to explore the group further.

I (Sarah) think that maps are a really interesting and informative way to look at data and to interpret the spatial scale and distribution of organisms. I also wanted to explore map making in R further. I have done a bit of it in the past through class assignments, but I wanted to become more comfortable using data and shapefiles to make meaningful maps.

Initial Questions

  1. Are the patterns seen in Andreadis et al. (2001) and (2004) still seen today?
  2. Is Cx. pipiens collected in Connecticut still a stronger vector for West Nile Virus?
  3. Are Cx. restuans and Cx. pipiens more closely related to each other than either is to Cx. salinarius?
  4. Is human population density, temperature, or precipitation a factor?
  5. How do changes in these factors affect virus transmission in Culex?

Data

Molecular Sequences

We searched BOLD and GenBank for genomes of each of the three species but did not find genomes for Cx. restuans or Cx. salinarius. Instead, we looked for a sequence that was commonly available in these databases for all three species. Cytochrome c oxidase I (COI) sequences were highly abundant and, luckily, we were able to find sequences of the same length. We downloaded 658 base pair sequences from five haplotypes of each species from GenBank. Another mosquito species from a different genus, Anopheles punctipennis, was selected as the outgroup; it is also found in Connecticut, and its GenBank record also had a 658 base pair COI sequence that we could use. Raw reads were not found in the SRA repository, so quality control could not be performed. Only aligned sequences were provided, which were easily downloaded as one FASTA file. The GenBank accession numbers for the sequences we chose were:

KU877005.1 (pipiens)
KU877019.1 (pipiens)
KU877003.1 (pipiens)
KU876994.1 (pipiens)
KU876959.1 (pipiens)
JX259915.1 (restuans)
JX259914.1 (restuans)
JX259913.1 (restuans)
JX259912.1 (restuans)
JX259911.1 (restuans)
JX260666.1 (salinarius)
JX260665.1 (salinarius)
JX260667.1 (salinarius)
JX260664.1 (salinarius)
JX260663.1 (salinarius)
KT111921.1 (outgroup)
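
As a quick sanity check on the downloaded file, a sketch like the following could confirm that every record really is 658 bp long. This is only an illustration, not something we ran for the project: the function name fasta_lengths is hypothetical, and it assumes a plain FASTA file (called culex.fasta here, as in the PAUP commands further down) whose sequences may be wrapped across several lines.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original workflow):
# print "<id><TAB><length>" for every FASTA record read from the given
# files or from standard input, joining sequences wrapped over several lines.
fasta_lengths() {
  awk '
    /^>/ {
      if (id != "") print id "\t" len   # finish the previous record
      id = substr($1, 2)                # first word of the header, without ">"
      len = 0
      next
    }
    { len += length($0) }               # add up the sequence lines
    END { if (id != "") print id "\t" len }
  ' "$@"
}

# Small demonstration on an inline two-record FASTA.
printf '>a\nACGT\nAC\n>b\nACG\n' | fasta_lengths

# Assumed usage on the real file: list any record that is not 658 bp.
# fasta_lengths culex.fasta | awk -F'\t' '$2 != 658'
```

If the final filter prints nothing, all records share the expected length.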

Before doing any analyses or visualizations, we simplified the FASTA file that we downloaded. Each header was shortened to the GenBank accession number, and a P, R, S, or A was appended after the version number to identify the species (pipiens, restuans, salinarius, or Anopheles punctipennis). Below is a trimmed example of the FASTA file before and after it was modified.

Before:
>KU877003.1 Culex pipiens voucher APHA-5-2015B07 cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial
AACATTATATTTTATTTTTGGGGCTTGAGCTGGAATAGTTGGAACTTCTTTAAGTTTACTAATTCGAGCAGAATTAAGTCAACCAGGTGTATTTATTGGAAATGATCAAATT

After:
>KU877003.1P 
AACATTATATTTTATTTTTGGGGCTTGAGCTGGAATAGTTGGAACTTCTTTAAGTTTACTAATTCGAGCAGAATTAAGTCAACCAGGTGTATTTATTGGAAATGATCAAATT
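
The renaming shown above could also be scripted rather than done by hand. The sketch below is one possible way, under some assumptions: the function name rename_headers is hypothetical, and it assumes each original header starts with the accession number and contains the full species name, as in the "Before" example.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original workflow):
# rewrite each FASTA header to "<accession><letter>", where the letter
# follows the P/R/S/A scheme described above. Headers with no matching
# species name get "?" so they are easy to spot.
rename_headers() {
  awk '
    /^>/ {
      acc = substr($1, 2)                     # e.g. KU877003.1
      tag = "?"
      if ($0 ~ /Culex pipiens/)               tag = "P"
      else if ($0 ~ /Culex restuans/)         tag = "R"
      else if ($0 ~ /Culex salinarius/)       tag = "S"
      else if ($0 ~ /Anopheles punctipennis/) tag = "A"
      print ">" acc tag
      next
    }
    { print }                                 # sequence lines pass through unchanged
  ' "$@"
}

# Demonstration on a shortened record in the original GenBank style.
printf '>KU877003.1 Culex pipiens voucher APHA-5-2015B07\nAACATTATATTTTATT\n' | rename_headers
```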

The final FASTA file we used contained the following sequences:

>JX260666.1S 
TACTTTATACTTCATTTTTGGTGCTTGAGCAGGAATAGTGGGAACTTCCCTAAGTTTACTTATTCGTGCTGAATTAAGCCAACCTGGTGTATTTATTGGAAATGATCAAATTTATAATGTAATCGTTACAGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTATAATTGGAGGATTTGGAAATTGATTAGTTCCTTTAATATTAGGGGCTCCTGATATAGCTTTTCCTCGAATAAATAATATAAGATTTTGAATACTTCCTCCTTCATTAACTTTACTACTGTCTAGTAGTATAGTAGAAAATGGAGCTGGGACTGGATGAACAGTTTATCCCCCTCTTTCTTCTGGAACTGCCCATGCTGGAGCTTCTGTTGATTTAGCTATTTTTTCTCTTCATTTAGCTGGAGTATCTTCAATTTTAGGAGCAGTTAATTTTATTACAACTGTTATTAATATACGATCTTCAGGAATTACTCTTGATCGAATACCTTTATTTGTTTGATCTGTTGTAATTACAGCTGTTCTTTTATTATTATCTTTACCTGTATTAGCCGGAGCAATTACTATATTATTAACTGACCGAAATCTTAATACATCATTCTTTGATCCTATTGGAGGAGGAGATCCTATTTTATATCAACATTTATTC
>JX260665.1S 
AACTTTATATTTTATTTTTGGAGCTTGAGCTGGAATAGTTGGAACATCTCTTAGAATTTTAATTCGAGCAGAATTAAGACAACCGGGGATATTCATTGGAAATGATCAAATTTATAATGTAATTGTTACTGCCCATGCTTTTATTATAATTTTTTTTATAGTAATACCTATTATAATTGGGGGATTTGGAAATTGATTAGTTCCTTTAATATTAGGAGCCCCTGACATAGCTTTCCCCCGAATAAATAATATAAGATTTTGAATACTTCCTCCGTCATTGACTCTTCTTCTTTCTAGAAGTATAGTAGAAAATGGATCCGGGACTGGTTGAACTGTTTACCCCCCTCTTTCTTCAGGAACAGCTCATGCTGGGGCTTCTGTTGATTTAACTATTTTTTCTCTCCATTTAGCGGGAGTTTCATCTATTTTAGGAGCAGTTAATTTTATTACAACTGTTATTAACATACGATCATCCGGAATTACTTTAGATCGAATGCCTTTATTTGTCTGATCTGTTGTAATTACAGCAGTTCTTCTTCTTCTTTCTTTACCTGTTTTAGCCGGAGCTATTACTATACTATTAACTGATCGAAACTTAAATACTTCTTTCTTTGACCCTATTGGAGGAGGAGACCCTATTCTATATCAACATTTATTT
>JX260667.1S 
AACTTTATATTTTATTTTTGGAGCTTGAGCTGGAATAGTTGGAACATCTCTTAGAATYTTAATTCGAGCAGAATTAAGACAACCAGGGATATTCATTGGAAATGATCAAATTTATAATGTAATTGTTACTGCCCATGCTTTTATTATAATTTTTTTTATAGTAATACCTATTATAATTGGGGGATTTGGAAATTGATTAGTTCCTTTAATATTAGGAGCCCCTGACATAGCTTTCCCCCGAATAAATAATATAAGATTTTGAATACTTCCTCCGTCATTGACTCTTCTTCTTTCTAGAAGTATAGTAGAAAATGGATCCGGGACTGGTTGAACTGTTTACCCCCCTCTTTCTTCAGGGACAGCTCATGCTGGAGCTTCTGTTGATTTAACTATTTTTTCTCTCCATTTAGCAGGAGTTTCATCTATTTTAGGAGCAGTTAATTTTATTACAACTGTTATTAATATACGATCATCCGGAATTACTTTAGATCGAATGCCTTTATTTGTCTGATCTGTTGTAATTACAGCAGTTCTTCTTCTTCTTTCTTTACCTGTTTTAGCTGGAGCTATTACTATACTATTAACTGATCGAAACTTAAATACTTCTTTCTTTGACCCTATTGGAGGAGGAGACCCTATTCTATATCAACATTTATTT
>JX260664.1S 
AACTTTATATTTTATTTTTGGAGCTTGAGCTGGAATAGTTGGAACATCTCTTAGAATTTTAATTCGAGCAGAATTAAGACAACCTGGGATATTCATTGGAAATGATCAAATTTATAATGTAATTGTTACTGCTCATGCTTTTATTATATTTTTTTTTATAGTAATGCCTATTATAATTGGAGGGTTTGGAAATTGATTAGTTCCTTTAATATTAGGAGCCCCTGATATAGCTTTCCCCCGAATAAATAATATAAGATTTTGAATACTTCCCCCATCATTGACTCTTCTTCTTTCTAGAAGTATAGTAGAAAATGGATCCGGGACTGGTTGAACTGTTTACCCCCCTCTTTCTTCTGGAACAGCCCATGCTGGAGCTTCTGTTGATTTAACTATTTTTTCTCTCCATTTAGCGGGAGTTTCATCTATTTTAGGGGCAGTTAATTTTATTACAACTGTTATTAACATACGATCATCCGGAATTACTTTAGATCGAATACCTTTATTTGTTTGATCTGTTGTAATTACAGCAGTTCTTCTTCTTCTTTCTCTACCTGTTTTAGCTGGAGCTATTACTATATTATTAACTGATCGAAACTTAAATACTTCTTTCTTCGACCCTATTGGAGGAGGAGACCCTATTCTATATCAACATTTATTT
>JX260663.1S 
AACTTTATATTTTATTTTTGGAGCTTGAGCTGGAATAGTTGGAACATCTCTTAGAATTTTAATTCGAACAGAATTAAGACAACCTGGGATATTCATTGGAAATGATCAAATTTATAATGTAATTGTTACTGCTCATGCTTTTATTATAATTTTTTTTATAGTAATGCCTATTATAATTGGAGGGTTTGGAAATTGATTAGTTCCTTTAATATTAGGAGCCCCTGATATAGCTTTCCCCCGAATAAATAATATAAGATTTTGAATACTTCCCCCATCATTAACTCTTCTTCTTTCTAGAAGTATAGTAGAAAATGGATCTGGGACTGGTTGAACTGTTTACCCCCCTCTTTCTTCAGGAACAGCCCATGCTGGAGCTTCTGTTGATTTAACTATTTTTTCTCTCCATTTAGCAGGAGTTTCATCTATTTTAGGGGCAGTTAATTTTATTACAACTGTTATCAACATACGATCATCCGGAATTACTCTAGATCGAATACCTTTATTTGTTTGATCTGTTGTAATTACAGCAGTTCTTCTTCTTCTTTCTTTACCTGTTTTAGCTGGAGCTATTACTATATTATTAACTGATCGAAACTTAAATACTTCTTTCTTCGATCCTATTGGAGGAGGAGATCCTATTCTATACCAACACTTATTT
>KU877005.1P 
AACATTATATTTTATTTTTGGGGCTTGAGCTGGAATAGTTGGAACTTCTTTAAGTTTACTAATTCGAGCAGAATTAAGTCAACCAGGTGTATTTATTGGAAATGATCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTAATACCAATCATAATTGGAGGATTTGGAAATTGATTAGTTCCTTTAATGTTAGGAGCTCCAGATATGGCCTTTCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCTTCATTGACACTACTACTTTCAAGTAGTTTAGTAGAAAATGGAGCTGGGACTGGATGAACAGTGTATCCCCCTCTTTCATCTGGAACAGCTCATGCTGGAGCTTCAGTAGACTTAGCTATTTTTTCTTTACATTTAGCAGGAATTTCATCAATTTTAGGTGCAGTAAATTTTATTACAACAGTAATTAATATACGATCTTCAGGAATTACTCTTGATCGAATACCTTTATTTGTTTGATCAGTAGTAATTACTGCAGTTTTATTACTTCTTTCTTTACCTGTTTTAGCTGGTGCTATTACTATGTTATTAACAGATCGAAATTTAAATACTTCATTCTTTGATCCAATTGGAGGAGGAGATCCAATTTTATATCAACATTTATTT
>KU877019.1P
AACATTATATTTTATTTTTGGGGCTTGAGCTGGAATAGTTGGAACTTCTTTAAGTTTACTAATTCGAGCAGAATTAAGTCAACCAGGTGTATTTATTGGAAATGATCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTAATACCAATCATAATTGGAGGATTTGGAAATTGATTAGTTCCTTTAATGTTAGGAGCTCCAGATATGGCCTTTCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCTTCATTGACACTACTACTTTCAAGTAGTTTAGTAGAAAATGGAGCTGGGACTGGATGAACAGTGTATCCCCCTCTTTCATCTGGAACAGCTCATGCTGGAGCTTCAGTAGACTTAGCTATTTTTTCTTTACATTTAGCAGGAATTTCATCAATTTTAGGTGCAGTAAATTTTATTACAACAGTAATTAATATACGATCTTCAGGAATTACTCTTGATCGAATACCTTTATTTGTTTGATCAGTAGTAATTACTGCAGTTTTATTACTTCTTTCTTTACCTGTTTTAGCTGGTGCTATTACTATGTTATTAACAGATCGAAATTTAAATACTTCATTCTTTGATCCAATTGGAGGAGGAGATCCAATTTTATATCAACATTTATTT
>KU877003.1P 
AACATTATATTTTATTTTTGGGGCTTGAGCTGGAATAGTTGGAACTTCTTTAAGTTTACTAATTCGAGCAGAATTAAGTCAACCAGGTGTATTTATTGGAAATGATCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTAATACCAATCATAATTGGAGGATTTGGAAATTGATTAGTTCCTTTAATGTTAGGAGCTCCAGATATGGCCTTTCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCTTCATTGACACTACTACTTTCAAGTAGTTTAGTAGAAAATGGAGCTGGGACTGGATGAACAGTGTATCCCCCTCTTTCATCTGGAACAGCTCATGCTGGAGCTTCAGTAGACTTAGCTATTTTTTCTTTACATTTAGCAGGAATTTCATCAATTTTAGGTGCAGTAAATTTTATTACAACAGTAATTAATATACGATCTTCAGGAATTACTCTTGATCGAATACCTTTATTTGTTTGATCAGTAGTAATTACTGCAGTTTTATTACTTCTTTCTTTACCTGTTTTAGCTGGTGCTATTACTATGTTATTAACAGATCGAAATTTAAATACTTCATTCTTTGATCCAATTGGAGGAGGAGATCCAATTTTATATCAACATTTATTT
>KU876994.1P 
AACATTATATTTTATTTTTGGGGCTTGAGCTGGAATAGTTGGAACTTCTTTAAGTTTACTAATTCGAGCAGAATTAAGTCAACCAGGTGTATTTATTGGAAATGATCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTAATACCAATCATAATTGGAGGATTTGGAAATTGATTAGTTCCTTTAATGTTAGGAGCTCCAGATATGGCCTTTCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCTTCATTGACACTACTACTTTCAAGTAGTTTAGTAGAAAATGGAGCTGGGACTGGATGAACAGTGTATCCCCCTCTTTCATCTGGAACAGCTCATGCTGGAGCTTCAGTAGACTTAGCTATTTTTTCTTTACATTTAGCAGGAATTTCATCAATTTTAGGTGCAGTAAATTTTATTACAACAGTAATTAATATACGATCTTCAGGAATTACTCTTGATCGAATACCTTTATTTGTTTGATCAGTAGTAATTACTGCAGTTTTATTACTTCTTTCTTTACCTGTTTTAGCTGGTGCTATTACTATGTTATTAACAGATCGAAATTTAAATACTTCATTCTTTGATCCAATTGGAGGAGGAGATCCAATTTTATATCAACATTTATTT
>KU876959.1P
AACATTATATTTTATTTTTGGGGCTTGAGCTGGAATAGTTGGAACTTCTTTAAGTTTACTAATTCGAGCAGAATTAAGTCAACCAGGTGTATTTATTGGAAATGATCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTAATACCAATCATAATTGGAGGATTTGGAAATTGATTAGTTCCTTTAATGTTAGGAGCTCCAGATATGGCCTTTCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCTTCATTGACACTACTACTTTCAAGTAGTTTAGTAGAAAATGGAGCTGGGACTGGATGAACAGTGTATCCCCCTCTTTCATCTGGAACAGCTCATGCTGGAGCTTCAGTAGACTTAGCTATTTTTTCTTTACATTTAGCAGGAATTTCATCAATTTTAGGTGCAGTAAATTTTATTACAACAGTAATTAATATACGATCTTCAGGAATTACTCTTGATCGAATACCTTTATTTGTTTGATCAGTAGTAATTACTGCAGTTTTATTACTTCTTTCTTTACCTGTTTTAGCTGGTGCTATTACTATGTTATTAACAGATCGAAATTTAAATACTTCATTCTTTGATCCAATTGGAGGAGGAGATCCAATTTTATATCAACATTTATTT
>JX259915.1R
AACATTATACTTTATTTTCGGAGCTTGAGCTGGAATAATTGGTACTTCATTAAGTATTCTTATTCGAGCAGAATTAAGTCAACCTGGAGTATTTATTGGAAATGATCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTATAATTGGAGGATTTGGTAATTGATTAGTTCCTCTAATATTAGGAGCTCCTGATATAGCTTTTCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCTTCATTAACATTACTACTTTCAAGTAGTATAGTAGAAAATGGAGCTGGGACTGGATGAACAGTTTACCCCCCTCTTTCATCTGGTACAGCCCATGCTGGAGCTTCAGTAGATTTAGCTATTTTTTCATTACATTTAGCTGGAATTTCATCAATTTTAGGAGCAGTAAATTTTATTACTACTGTAATTAATATACGATCTTCAGGTATTACACTTGATCGAATACCATTATTTGTTTGATCAGTAGTAATTACTGCTGTTCTTTTACTTCTTTCTTTACCTGTATTAGCTGGTGCTATTACTATACTATTAACTGATCGAAATCTAAATACTTCATTTTTTGATCCTATTGGAGGAGGAGATCCAATTTTATATCAACATTTATTT
>JX259914.1R
AACATTATACTTTATTTTCGGAGCTTGAGCTGGAATAATTGGTACTTCATTAAGTATTCTTATTCGAGCAGAATTAAGTCAACCTGGAGTATTTATTGGAAATGATCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTATAATTGGAGGATTTGGTAATTGATTAGTTCCTCTAATATTAGGAGCTCCTGATATAGCTTTTCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCTTCATTAACATTACTACTTTCAAGTAGTATAGTAGAAAATGGAGCTGGGACTGGATGAACAGTTTACCCCCCTCTTTCATCTGGTACAGCCCATGCTGGAGCTTCAGTAGATTTAGCTATTTTTTCATTACATTTAGCTGGAATTTCATCAATTTTAGGAGCAGTAAATTTTATTACTACTGTAATTAATATACGATCTTCAGGTATTACACTTGATCGAATACCATTATTTGTTTGATCAGTAGTAATTACTGCTGTTCTTTTACTTCTTTCTTTACCTGTATTAGCTGGTGCTATTACTATACTATTAACTGATCGAAATCTAAATACTTCGTTCTTTGATCCTATTGGAGGAGGAGATCCAATTTTATATCAACATTTATTT
>JX259913.1R 
AACATTATACTTTATTTTCGGAGCTTGAGCTGGAATAATTGGTACTTCATTAAGTATTCTTATTCGAGCAGAATTAAGTCAACCTGGAGTATTTATTGGAAATGATCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTATAATTGGAGGATTTGGTAATTGATTAGTTCCTCTAATATTAGGAGCTCCTGATATAGCTTTTCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCTTCATTAACATTACTACTTTCAAGTAGTATAGTAGAAAATGGAGCTGGGACTGGATGAACAGTTTACCCCCCTCTTTCATCTGGTACAGCCCATGCTGGAGCTTCAGTAGATTTAGCTATTTTTTCATTACATTTAGCTGGAATTTCATCAATTTTAGGAGCAGTAAATTTTATTACTACTGTAATTAATATACGATCTTCAGGTATTACACTTGATCGAATACCATTATTTGTTTGATCAGTAGTAATTACTGCTGTTCTTTTACTTCTTTCTTTACCTGTATTAGCTGGTGCTATTACTATACTATTAACTGATCGAAATCTAAATACTTCGTTCTTTGATCCTATTGGAGGAGGAGATCCAATTTTATATCAACATTTATTT
>JX259912.1R
AACATTATACTTTATTTTCGGAGCTTGAGCTGGAATAATTGGTACTTCATTAAGTATTCTTATTCGAGCAGAATTAAGTCAACCTGGAGTATTTATTGGAAATGATCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTATAATTGGAGGATTTGGTAATTGATTAGTTCCTCTAATATTAGGAGCTCCTGATATAGCTTTTCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCTTCATTAACATTACTACTTTCAAGTAGTATAGTAGAAAATGGAGCTGGGACTGGATGAACAGTTTACCCCCCTCTTTCATCTGGTACAGCCCATGCTGGAGCTTCAGTAGATTTAGCTATTTTTTCATTACATTTAGCTGGAATTTCATCAATTTTAGGAGCAGTAAATTTTATTACTACTGTAATTAATATGCGATCTTCAGGTATTACACTTGATCGAATACCATTATTTGTTTGATCAGTAGTAATTACTGCTGTTCTTTTACTTCTTTCTTTACCTGTATTAGCTGGTGCTATTACTATACTATTAACTGATCGAAATCTAAATACTTCGTTCTTTGATCCTATTGGAGGAGGAGATCCAATTTTATATCAACATTTATTT
>JX259911.1R
AACATTATACTTTATTTTCGGAGCTTGAGCTGGAATAATTGGTACTTCATTAAGTATTCTTATTCGAGCAGAATTAAGTCAACCTGGAGTATTTATTGGAAATGATCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTATAATTGGAGGATTTGGTAATTGATTAGTTCCTCTAATATTAGGAGCTCCTGATATAGCTTTTCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCTTCATTAACATTACTACTTTCAAGTAGTATAGTAGAAAATGGAGCTGGGACTGGATGAACAGTTTACCCCCCTCTTTCATCTGGTACAGCCCATGCTGGAGCTTCAGTAGATTTAGCTATTTTTTCATTACATTTAGCTGGAATTTCATCAATTTTAGGAGCAGTAAATTTTATTACTACTGTAATTAATATGCGATCTTCAGGTATTACACTTGATCGAATACCATTATTTGTTTGATCAGTAGTAATTACTGCTGTTCTTTTACTTCTTTCTTTACCTGTATTAGCTGGTGCTATTACTATACTATTAACTGATCGAAATCTAAATACTTCATTCTTTGATCCTATTGGAGGAGGAGATCCAATTTTATATCAACATTTATTT
>KT111921.1A
AACATTATATTTTATTTTTGGAGCTTGAGCAGGAATAGTAGGGACTTCTTTAAGTATTCTAATTCGTGCTGAATTAGGACACCCTGGAGCCTTTATTGGAGACGATCAAATTTATAATGTTATTGTAACAGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTATAATTGGGGGATTTGGAAACTGATTAGTTCCTTTAATATTGGGAGCACCAGATATGGCTTTCCCTCGAATAAATAATATAAGATTTTGAATATTACCTCCTTCTTTGACTCTTTTAATTTCTAGTAGTATAGTAGAAAATGGAGCCGGGACAGGTTGAACTGTTTACCCTCCTCTATCTTCTGGAATTGCTCATGCCGGAGCTTCAGTAGATTTAGCTATTTTTTCATTACATTTAGCAGGAATTTCTTCAATTTTAGGGGCAGTAAATTTTATTACAACTGTAATTAATATACGGTCTCCCGGAATTACACTTGATCGAATACCTTTATTTGTTTGATCAGTTGTGATTACAGCAGTATTATTATTATTATCTCTTCCTGTATTAGCTGGAGCTATTACAATATTATTAACAGATCGAAATTTAAATACATCATTTTTCGACCCTGCTGGAGGAGGAGACCCAATTTTATATCAACACTTATTT

BLAST

We ran BLAST (version 2.3.0) to check that the sequences map back to COI and that they at least come from mosquitoes. All sequences were similar to COI sequences from other mosquito species of the same genus or from other mosquito genera. None of the results were out of the ordinary. On the UConn cluster, we ran the following script:

#!/bin/bash
  #Specifies that the bash shell is used to interpret the script
#$ -N blast 
  #Name of job
#$ -M katherine.nazario@uconn.edu 
  #Email for status of job
#$ -m bea 
  #Notify user when job begins, ends, or aborts
#$ -S /bin/bash 
  #Shell used to run the job
#$ -cwd 
  #Execute job from current working directory
#$ -pe smp 4  
  #4 cores requested 
#$ -o blast_$JOB_ID.out 
  #output log file
#$ -e blast_$JOB_ID.err 
  #error log file 

module load blast/2.3.0
  #Load the BLAST module
blastn -query culex_combined.fasta -out culex.blast.txt -evalue 1e-10 -outfmt '10 qseqid sscinames sstrand' -db /common/blast/data/nr
  #Blast parameters

For each species, the BLAST results showed the sequences were all COI. Most results showed similarity to sequences from species of the same genus or its sister genera. From here, we were ready to start phylogenetic analyses. Below are the first three results for each of the sequences we used for the analyses:

JX260666.1,0,Culex salinarius,cytochrome oxidase subunit 1, partial (mitochondrion) [Culex salinarius],N/A
JX260666.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A
JX260666.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A

JX260665.1,0,Culex salinarius, cytochrome oxidase subunit 1, partial (mitochondrion) [Culiseta salinarius],N/A
JX260665.1,0,Culex salinarius,cytochrome oxidase subunit 1, partial (mitochondrion) [Culex salinarius],N/A
JX260665.1,0,Culiseta melanura,cytochrome oxidase subunit 1, partial (mitochondrion) [Culiseta melanura],N/A

JX260667.1,0,Culex salinarius,cytochrome oxidase subunit 1, partial (mitochondrion) [Culiseta salinarius],N/A
JX260667.1,0,Culex salinarius,cytochrome oxidase subunit 1, partial (mitochondrion) [Culex salinarius],N/A
JX260667.1,0,Culiseta melanura,cytochrome oxidase subunit 1, partial (mitochondrion) [Culiseta melanura],N/A

JX260664.1,0,Culex salinarius,cytochrome oxidase subunit 1, partial (mitochondrion) [Culiseta salinarius],N/A
JX260664.1,0,Culiseta melanura,cytochrome oxidase subunit 1, partial (mitochondrion) [Culiseta melanura],N/A
JX260664.1,0,Culex salinarius,cytochrome oxidase subunit 1, partial (mitochondrion) [Culex salinarius],N/A

JX260663.1,0,Culex salinarius,cytochrome oxidase subunit 1, partial (mitochondrion) [Culex salinarius],N/A
JX260663.1,0,Culiseta inornata,cytochrome oxidase subunit 1, partial (mitochondrion) [Culiseta inornata],N/A
JX260663.1,0,Culiseta melanura,cytochrome oxidase subunit 1, partial (mitochondrion) [Culiseta melanura],N/A

KU877005.1,0,Culex pilosus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex pilosus],N/A
KU877005.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A
KU877005.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A

KU877019.1,0,Culex pilosus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex pilosus],N/A
KU877019.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A
KU877019.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A

KU877003.1,0,Culex pilosus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex pilosus],N/A
KU877003.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A
KU877003.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A

KU876994.1,0,Culex pilosus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex pilosus],N/A
KU876994.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A
KU876994.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A

KU876959.1,0,Culex pilosus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex pilosus],N/A
KU876959.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A
KU876959.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A

JX259915.1,0,Culex univittatus,cytochrome c oxidase subunit 1, partial (mitochondrion) [Culex univittatus],N/A
JX259915.1,0,Culex tarsalis, cytochrome oxidase subunit 1, partial (mitochondrion) [Culex tarsalis],N/A
JX259915.1,0,Culex tarsalis, cytochrome oxidase subunit 1, partial (mitochondrion) [Culex tarsalis],N/A

JX259914.1,0,Culex univittatus,cytochrome c oxidase subunit 1, partial (mitochondrion) [Culex univittatus],N/A
JX259914.1,0,Culex tarsalis,cytochrome oxidase subunit 1, partial (mitochondrion) [Culex tarsalis],N/A
JX259914.1,0,Culex univittatus,cytochrome c oxidase subunit 1, partial (mitochondrion) [Culex univittatus],N/A

JX259913.1,0,Culex univittatus,cytochrome c oxidase subunit 1, partial (mitochondrion) [Culex univittatus],N/A
JX259913.1,0,Culex univittatus,cytochrome c oxidase subunit 1, partial (mitochondrion) [Culex univittatus],N/A
JX259913.1,0,Culex univittatus,cytochrome c oxidase subunit 1, partial (mitochondrion) [Culex univittatus],N/A

JX259912.1,0,Culex tarsalis,cytochrome oxidase subunit 1, partial (mitochondrion) [Culex tarsalis],N/A
JX259912.1,0,Culex univittatus,cytochrome c oxidase subunit 1, partial (mitochondrion) [Culex univittatus],N/A
JX259912.1,0,Culex homunculus,cytochrome oxidase subunit I, partial (mitochondrion) [Culex homunculus],N/A

JX259911.1,0,Culex tarsalis,cytochrome oxidase subunit 1, partial (mitochondrion) [Culex tarsalis],N/A
JX259911.1,0,Culex univittatus,cytochrome c oxidase subunit 1, partial (mitochondrion) [Culex univittatus],N/A
JX259911.1,0,Culex homunculus,cytochrome oxidase subunit I, partial (mitochondrion) [Culex homunculus],N/A

KT111921.1,0,Anopheles albitarsis,cytochrome oxidase subunit1, partial (mitochondrion) [Anopheles albitarsis],N/A
KT111921.1,0,Anopheles albitarsis,cytochrome oxidase subunit1, partial (mitochondrion) [Anopheles albitarsis],N/A
KT111921.1,0,Anopheles albitarsis,cytochrome oxidase subunit1, partial (mitochondrion) [Anopheles albitarsis],N/A
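
A listing like the one above, with the first three hits per query, can be pulled out of a comma-separated BLAST report with a short filter. The sketch below is an assumption about how this could be done, not the exact step we used; the function name top_hits is hypothetical, and it relies on the query ID being the first column and on BLAST listing each query's hits from best to worst.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original workflow):
# keep only the first N lines for each query ID (column 1) of a
# comma-separated BLAST report read from standard input.
# Usage: top_hits N < report.csv
top_hits() {
  awk -F, -v n="$1" 'seen[$1]++ < n'
}

# Demonstration: keep two hits per query from a tiny inline report.
printf 'q1,hitA\nq1,hitB\nq1,hitC\nq2,hitX\n' | top_hits 2

# Assumed usage on the report written by the script above:
# top_hits 3 < culex.blast.txt
```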

Metadata

Information about viral isolates, localities, and dates was readily available at the CT Agricultural Experiment Station’s website (http://www.ct.gov/caes). Although the data were readily available, they were not easy to download in a form we could easily use in R. We decided to analyze data from 2012-2015. Weather plays a role in the abundance of mosquitoes during collecting. For instance, a rainy day may lead to fewer mosquitoes being collected. To help minimize that bias, we chose to look at several different years.

To map some of the factors we wanted to investigate further, we gathered shapefiles that contained information on state/town borders and bedrock. The CT shapefiles and bedrock files came from the CT GIS data at MAGIC (http://magic.lib.uconn.edu/connecticut_data.html). We gathered information about human population density in Connecticut from Connecticut’s Department of Economic and Community Development website (http://www.ct.gov/ecd/). For data on precipitation, we looked at WorldClim (http://www.worldclim.org).

Exploratory Analysis

To visualize our data, we used R. Using packages such as ape, ggtree, ggmap, and ggplot2, we were able to customize phylogenies. We also used a combination of graphs and tables while looking for patterns. R packages such as ggmap, rgdal, dismo, and ggplot2 were used for creating maps.

Phylogenies: First Steps

To create the phylogenies, we used SplitsTree4 (version 4.14.4), PAUP* (version 4.0b10), FastTree (version 2.1.3), and RAxML (version 8.3.17). We changed the parameters for several of the applications we used. Many of the settings gave very odd results, such as trees with several polytomies. Here we present and delve into detail on three of the trees that appear to be most probable.

SplitsTree
Before running any deeper phylogenetic analyses, we used SplitsTree4 to create a phylogenetic network and visualize the data quickly. The program was installed locally on a Mac running macOS Sierra (version 10.12.4). A NeighborNet network using the uncorrected character transformation was created from all 16 sequences. From this, we were able to see some interesting information. One of the Cx. salinarius haplotypes (JX260666.1S) stood out right away. Our small sample size or gene choice may have led to this, but we decided to explore a little more.

PAUP
Next, we used PAUP (version 4.0b10) to create neighbor-joining trees and to run heuristic tree searches in an attempt to find more patterns. Although species grouped together into distinct clades, branching patterns within clades varied, and JX260666.1S continued to stand out. To determine the best model of evolution, we ran an Akaike information criterion (AIC) test and built a phylogeny based on the results. The commands we used to build a neighbor-joining tree and to perform a heuristic search, along with the results of the AIC test, are shown below. These results led us to look at the sequences further.

paup> log file=culex_AIC_out.txt  start replace  
  #Creates a log of the current session
paup> tonexus from=culex.fasta to=culex.nex dataType=nucleotide format=fasta    
  #Converts FASTA to Nexus
paup> exe culex.nex     
  #Executes nexus file
paup> outgroup KT111921.1A      
  #Sets KT111921.1A as the outgroup
paup> set criterion=likelihood     
  #Sets tree searching criterion
paup> nj
  #Creates a neighbor-joining tree
paup> savetree brlens=yes file=culex_nj.tre    
  #Save tree  
paup> hsearch   
  #Does a heuristic search
paup> savetree brlens=yes file=culex_hsearch.tre    
  #Save tree
paup> automodel   
  #Performs AIC and BIC test
paup> lset nst=6 rclass=(abcdef) rmatrix=(0.12358346 123.41206 126.79993 0.25074559 198.30319) basefreq=(0.27876607 0.15353608 0.16001515) pinv=0.6325611   
  #Parameters from best model from AIC
paup> bootstrap nreps=100     
  #bootstrap analysis
paup> savetree brlens=yes file=culex_AIC.tre    
  #Save to file
paup> log stop     
  #Stop logging session
Neighbor-joining tree
Heuristic search tree
List of the first few results from the AIC test and the command recommended by AIC for the best model

Final Analysis

Phylogenies: Further analyses and visualization

We also used RAxML and FastTree to investigate further the differences we saw. Commands and results for these two programs are included in the visualization sections below. To make the tree easy to follow, we highlighted and labeled the species groups using R. Loading the tree into ggtree so that each clade could be highlighted was a bit difficult; the method we found best was to use Newick format. The Newick output from our analyses originally did not have Anopheles as the outgroup, so we used FigTree to reroot the tree and export it in Newick format. To label and highlight a clade, you need to know the number of the node that forms it. We used the package ape to get node values.

Libraries needed to build the tree

library("ape")
library("ggplot2")
library("ggtree")

Visualizing the tree using AIC’s best model

#Read Tree in Newick format (copied and pasted from output file):
AIC_NWK <- read.tree(text="(((JX260666.1S:0.070371,(((JX260665.1S:0.004864,JX260667.1S:0.004263):0.022939,JX260664.1S:0.004431):0.008176,JX260663.1S:0.007523):0.088486):0.026309,((KU877005.1P:0.0,KU877019.1P:0.0,KU877003.1P:0.0,KU876994.1P:0.0,KU876959.1P:0.0):0.048345,(JX259915.1R:0.001499,(JX259914.1R:0.0,JX259913.1R:0.0,(JX259912.1R:0.0,JX259911.1R:0.001499):0.001499):0.001499):0.031989):0.016595):0.130208,KT111921.1A:0.0);")

AIC <-ggtree(AIC_NWK) #Name tree 

AIC_nodes <- AIC + geom_text2(aes(subset=!isTip, label=node), hjust=-.3) #Get node values to figure out clades for later 
AIC_nodes #View tree with nodes numbered

#Customize tree to final output
AIC_tree <- ggtree(AIC_NWK,size=.5) + #this modifies the thickness of the branches
  geom_treescale(0,16) +  #Adds scale
  geom_tiplab() + #Adds tip labels
  geom_cladelabel(node=19, label="Cx.salinarius", align=TRUE, offset=.09,color="blue") + #Adds blue clade label
  geom_cladelabel(node=24, label="Cx.pipiens", align=TRUE, offset=.09,color="red") + #Adds red clade label
  geom_cladelabel(node=25, label="Cx.restuans", align=TRUE, offset=.09,color="dark green") + #Adds green clade label
  geom_hilight(node=19, fill="#0101DF", alpha=.5, extend=.08) + #Highlights salinarius clade blue
  geom_hilight(node=24, fill="red", alpha=.4, extend=.08) + #Highlights pipiens clade red
  geom_hilight(node=25, fill="darkgreen", alpha=.5, extend=.08) + #Highlights restuans clade green
  xlim(0,.5) #sets x limits so tree fits in window
AIC_tree #View tree

The tree built with AIC’s best model agrees with the relationships we expected based on morphology. We wanted to explore this a bit further to see if other methods give the same relationships!

Visualizing the tree from FastTree

We wanted to test different tree-building methods to see if we got the same results. One of the programs we used was FastTree (version 2.1.3), which infers maximum likelihood phylogenies. With FastTree, you can import alignments and get a tree quickly without using a lot of memory. We followed a similar protocol when visualizing the tree. Some of the branching patterns were different, so we had to adjust some parameters accordingly. We ran FastTree on UConn’s cluster (see the BLAST script for explanations of the job parameters) using the following:

#!/bin/bash
#$ -N blast
#$ -M Katherine.nazario@uconn.edu
#$ -m bea
#$ -S /bin/bash
#$ -cwd
#$ -pe smp 4
#$ -o blast_$JOB_ID.out
#$ -e blast_$JOB_ID.err

FastTree culex.fasta > culex_FastTree.tre

#Read Tree in Newick format (copied and pasted from output file):
fasttree_NWK <- read.tree(text="(((JX260666.1S:0.05943,((JX260665.1S:0.00549,JX260667.1S:0.00391)[&label=0.963]:0.01412,(JX260664.1S:0.00568,JX260663.1S:0.01439)[&label=0.317]:0.00881)[&label=1.0]:0.05985)[&label=0.88]:0.02424,((KU877005.1P:0.0,KU877019.1P:0.0,KU877003.1P:0.0,KU876994.1P:0.0,KU876959.1P:0.0):0.03847,(JX259915.1R:0.00152,((JX259914.1R:0.0,JX259913.1R:0.0):5.4E-4,(JX259912.1R:5.3E-4,JX259911.1R:0.0013)[&label=0.103]:0.00152)[&label=0.927]:0.00131)[&label=0.999]:0.03344)[&label=0.766]:0.01643):0.048275,KT111921.1A:0.048275);")

fasttree <-ggtree(fasttree_NWK) #Name tree 

fasttree_nodes <- fasttree + geom_text2(aes(subset=!isTip, label=node), hjust=-.3) #Get node values to figure out clades for later 
fasttree_nodes #View tree with nodes numbered

#Customize tree to final output
fasttree_tree <- ggtree(fasttree_NWK,size=.5) + #this modifies the thickness of the branches
  geom_treescale(0,16) +  #Adds scale to top left corner
  geom_tiplab() + #Adds tip labels
  geom_cladelabel(node=19, label="Cx.salinarius", align=TRUE, offset=.09,color="blue") + #Adds blue clade label
  geom_cladelabel(node=24, label="Cx.pipiens", align=TRUE, offset=.09,color="red") + #Adds red clade label
  geom_cladelabel(node=25, label="Cx.restuans", align=TRUE, offset=.09,color="dark green") + #Adds green clade label
  geom_hilight(node=19, fill="#0101DF", alpha=.5, extend=.08) + #Highlights salinarius clade blue
  geom_hilight(node=24, fill="red", alpha=.4, extend=.08) + #Highlights pipiens clade red
  geom_hilight(node=25, fill="darkgreen", alpha=.5, extend=.08) + #Highlights restuans clade green
  xlim(0,.3) #sets x limits so tree fits in window
fasttree_tree #View tree

This tree gives similar results to our AIC best-model tree. There are some differences within species, but Cx. pipiens and Cx. restuans are again more closely related to each other than either is to Cx. salinarius.
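
One way to make a statement like "similar results" concrete is to compute a topological distance between two trees. The sketch below uses `dist.topo` from the ape package (the same package that provides `read.tree`) on two simplified, hypothetical topologies with one tip per species. The Newick strings and tip names are illustrative stand-ins, not the trees estimated in this project.

```r
# Sketch: compare two tree topologies with the Robinson-Foulds (PH85) distance.
# The Newick strings are simplified, hypothetical stand-ins (one tip per species),
# not the trees estimated in this project.
library(ape)

tree_a <- read.tree(text = "((pipiens,restuans),salinarius,outgroup);")
tree_b <- read.tree(text = "((restuans,salinarius),pipiens,outgroup);")

# 0 means the same branching pattern; larger values mean more conflicting splits
rf_same <- dist.topo(tree_a, tree_a)
rf_diff <- dist.topo(tree_a, tree_b)

print(c(same = as.numeric(rf_same), different = as.numeric(rf_diff)))
```

With real output, the full Newick strings could be passed to `read.tree` instead, as long as both trees carry the same tip labels.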

Visualizing the tree from RAxML

The last tree-building software we tested was RAxML (version 8.3.17). Many of the trees shown in grad seminars are built with RAxML, so we decided to give it a shot. However, RAxML needs its input file in PHYLIP format, not Nexus. We formatted this manually: starting from the Nexus file we had produced via PAUP, we deleted everything except the matrix and added a first line giving the number of taxa and the number of characters (16 658). The job script and RAxML parameters we used when analyzing the tree on the UConn cluster were:

#!/bin/bash
#$ -N blast
#$ -M Katherine.nazario@uconn.edu
#$ -m bea
#$ -S /bin/bash
#$ -cwd
#$ -pe smp 4
#$ -o blast_$JOB_ID.out
#$ -e blast_$JOB_ID.err

raxml -m GTRGAMMA -s culex.phylip -n culex_RAxML -o KT111921.1A -N 100 -p 1616 -e 0.00001 -f a -x 1616
    -m = model of evolution
    -s = input PHYLIP file
    -n = name appended to the output files
    -x = bootstrap random number seed
    -e = precision with which model parameters will be estimated
    -o = outgroup
    -N = number of runs (here, bootstrap replicates)
    -p = pseudorandom number seed
    -f a = rapid bootstrap analysis and search for the best-scoring ML tree in one run
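
The manual Nexus-to-PHYLIP step described above could also be scripted. The following is a minimal sketch in base R, assuming the aligned sequences are already held in a named character vector. The function name `write_phylip` and the short sequences are hypothetical; only the tip names come from this project.

```r
# Sketch: write aligned sequences to a relaxed PHYLIP file.
# `aln` is a hypothetical named character vector of equal-length aligned sequences;
# the tip names are taken from this project, the bases are made up.
write_phylip <- function(aln, path) {
  stopifnot(length(unique(nchar(aln))) == 1)     # all rows must have the same length
  header <- paste(length(aln), nchar(aln[[1]]))  # "<number of taxa> <number of characters>"
  rows <- paste(names(aln), aln)                 # name, one space, then the sequence
  writeLines(c(header, rows), path)
  invisible(path)
}

aln <- c(KT111921.1A = "ATGCTTAC",
         KU876959.1P = "ATGCTTAT",
         JX259911.1R = "ATGATTAT")
out <- write_phylip(aln, tempfile(fileext = ".phylip"))
readLines(out)
```

For the alignment used here, the header line would read `16 658`, matching the line we added by hand.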
#Read Tree in Newick format (copied and pasted from output file):
RAxML_NWK <- read.tree(text="(((((JX260663.1S:0.02397468441109106008,(JX260664.1S:0.01404653975119892796,(JX260667.1S:0.01892817657512024798,JX260665.1S:0.01168702685188493899):0.07573599826928545387):0.02919144044608793995):0.29155137604914332927,JX260666.1S:0.21313723410477683484):0.12881041226579492687,(((JX259913.1R:0.00000041761126378665,JX259914.1R:0.00000041761126378665):0.00000041761126378665,(JX259912.1R:0.00000041761126378665,JX259911.1R:0.00504292530803613845):0.00504288320049779172):0.00496452002556844671,JX259915.1R:0.00517979166678775218):0.10160254100523570531):0.05364671827476297228,(((KU876959.1P:0.00000041761126378665,KU877003.1P:0.00000041761126378665):0.00000041761126378665,(KU877005.1P:0.00000041761126378665,KU877019.1P:0.00000041761126378665):0.00000041761126378665):0.00000041761126378665,KU876994.1P:0.00000041761126378665):0.11877850631174037555):0.23320577545323131763,KT111921.1A:0.23320577545323131763);")

RAxMLtree <-ggtree(RAxML_NWK) #Name tree 

RAxMLtree_nodes <- RAxMLtree + geom_text2(aes(subset=!isTip, label=node), hjust=-.3) #Get node values to figure out clades for later 
RAxMLtree_nodes #View tree with nodes numbered

#Customize tree to final output
RAxMLtree_tree <- ggtree(RAxML_NWK,size=.5) + #this modifies the thickness of the branches
  geom_treescale(0,16) +  #Adds scale
  geom_tiplab() + #Adds tip labels
  geom_cladelabel(node=20, label="Cx.salinarius", align=TRUE, offset=.3,color="blue") + #Adds blue clade label
  geom_cladelabel(node=28, label="Cx.pipiens", align=TRUE, offset=.3,color="red") + #Adds red clade label
  geom_cladelabel(node=24, label="Cx.restuans", align=TRUE, offset=.3,color="dark green") + #Adds green clade label
  geom_hilight(node=20, fill="#0101DF", alpha=.5, extend=.25) + #Highlights salinarius clade blue
  geom_hilight(node=28, fill="red", alpha=.4, extend=.25) + #Highlights pipiens clade red
  geom_hilight(node=24, fill="darkgreen", alpha=.5, extend=.25) + #Highlights restuans clade green
  xlim(0,1.5) #sets x limits so tree fits in window
RAxMLtree_tree #View tree

When we ran RAxML, we received a warning that some of our sequences might be identical. We did not find this too surprising, since some of the individuals were from the same locality and we are only using COI, which might not contain enough information to tell these individuals apart. More surprisingly, RAxML grouped Cx. restuans with Cx. salinarius. This is consistent with the idea that the species that are poorer vectors for West Nile virus are more closely related to each other. Which tree is the “true” tree? It might be that we need to consider other factors. For example, some mosquitoes feed primarily on mammals while others feed primarily on birds. That could be a factor to investigate in the future, but we could not find adequate data to look at this ourselves.
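
The identical-sequence warning can be checked directly before building a tree. Below is a small base-R sketch that groups tip names by sequence and reports any group with more than one member. The vector `seqs` is hypothetical: the names are real tip labels from our data, but the bases are made up for illustration.

```r
# Sketch: list groups of identical sequences in an alignment.
# `seqs` is a hypothetical named character vector; names are real tip labels, bases are made up.
seqs <- c(KU876959.1P = "ATGCTTAT",
          KU877003.1P = "ATGCTTAT",
          KU877005.1P = "ATGCTTAT",
          JX259911.1R = "ATGATTAT",
          JX259912.1R = "ATGATTAC")

groups <- split(names(seqs), seqs)  # group tip names by their sequence string
identical_groups <- Filter(function(g) length(g) > 1, groups)
unname(identical_groups)
```

Any group returned this way carries no information for resolving relationships among its members, which is one reason within-species branching can differ between programs.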

Graphs and Tables

To visualize the data we collected from the Connecticut Agricultural Experiment Station, we decided to use graphs and tables. The graphs were created using the ggplot2 package. First, we wanted to see whether the patterns illustrated in Andreadis et al. (2001) and (2004), such as a higher abundance of West Nile virus in Cx. pipiens, still hold. The tables on the Station’s website were not downloadable, and they also contained information about every virus tested, so we gathered the data we needed into an Excel file. To test the visualization guidelines of Kelleher & Wagener (2011), we created graphs with and without the background gridlines. Graphs without the background are easier to read, so after the first graph all others have the background removed.

Table: Percent of WNV isolates in CT hosted by Culex species

percent <- matrix(c(82,50,76,69,84,6,37,15,10,7,3,4,3,15,0), ncol=3) #Add values 
colnames(percent) <- c('%pipiens', "%restuans","%salinarius") #Add column names
rownames(percent) <-c(2012,2013,2014,2015,2016) #Add row names
percents.table <- as.table(percent) #Convert to table
percents.table
##      %pipiens %restuans %salinarius
## 2012       82         6           3
## 2013       50        37           4
## 2014       76        15           3
## 2015       69        10          15
## 2016       84         7           0

In this table, we can see that Cx. pipiens accounts for the largest share of West Nile virus isolates in every year of our study. The share from Cx. salinarius is lower than that from Cx. pipiens in every year, and lower than that from Cx. restuans in every year except 2015.

Graph: Number of West Nile isolates in Connecticut from 2012-2016 with background

WNV_isolates <- read.csv("Mosquito_WNVisolates.csv") #Read table
WNV_isolates #View table
##    Year Number.of.West.Nile.isolates Culex.species
## 1  2012                          192       pipiens
## 2  2013                           45       pipiens
## 3  2014                           52       pipiens
## 4  2015                          109       pipiens
## 5  2016                          103       pipiens
## 6  2012                           14      restuans
## 7  2013                           33      restuans
## 8  2014                           10      restuans
## 9  2015                           15      restuans
## 10 2016                            8      restuans
## 11 2012                            6   salinarius 
## 12 2013                            4   salinarius 
## 13 2014                            2   salinarius 
## 14 2015                           23   salinarius 
## 15 2016                            0   salinarius
WNV_isolates_graph <-ggplot(data=WNV_isolates,aes(x=Year,y=Number.of.West.Nile.isolates,colour=Culex.species,pch=Culex.species))+ #Define variables
  geom_point() + #Plot points
  geom_line() + #Plot lines
  labs(title="Number of West Nile Isolates in Connecticut from 2012-2016") + #Add title
  xlab("Year") + #change x axis label
  ylab("Number of WNV isolates") #change y axis label

WNV_isolates_graph #View graph

This graph tells the same story as our table: in every year, far more West Nile isolates came from Cx. pipiens than from the other two species.
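
To put numbers on this, the counts can be converted into each species' yearly share of the isolates from these three Culex species. This is a sketch in base R with the counts printed above typed in directly so that it is self-contained. Because the denominator here is only the three Culex species, the shares will not match the percent table earlier, whose rows do not sum to 100.

```r
# Sketch: yearly share of WNV isolates among the three Culex species,
# using the counts printed above (typed in directly so the example is self-contained).
counts <- data.frame(
  Year     = rep(2012:2016, times = 3),
  isolates = c(192, 45, 52, 109, 103,  # pipiens
               14, 33, 10, 15, 8,      # restuans
               6, 4, 2, 23, 0),        # salinarius
  species  = rep(c("pipiens", "restuans", "salinarius"), each = 5)
)

by_year <- xtabs(isolates ~ Year + species, data = counts)  # Year x species table of counts
share <- prop.table(by_year, margin = 1)                    # each row sums to 1
round(100 * share, 1)
```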

Graph: Number of West Nile isolates in Connecticut from 2012-2016 without background

WNV_isolates <- read.csv("Mosquito_WNVisolates.csv") #Read table
WNV_isolates #View table
##    Year Number.of.West.Nile.isolates Culex.species
## 1  2012                          192       pipiens
## 2  2013                           45       pipiens
## 3  2014                           52       pipiens
## 4  2015                          109       pipiens
## 5  2016                          103       pipiens
## 6  2012                           14      restuans
## 7  2013                           33      restuans
## 8  2014                           10      restuans
## 9  2015                           15      restuans
## 10 2016                            8      restuans
## 11 2012                            6   salinarius 
## 12 2013                            4   salinarius 
## 13 2014                            2   salinarius 
## 14 2015                           23   salinarius 
## 15 2016                            0   salinarius
WNV_isolates_graph <-ggplot(data=WNV_isolates, aes(x=Year,y=Number.of.West.Nile.isolates,colour=Culex.species))+ #Define variables
  geom_point() + #Plot points
  geom_line() + #Plot lines
  labs(title="Number of West Nile Isolates in Connecticut from 2012-2016") + #Add title
  xlab("Year") + #change x axis label
  ylab("Number of WNV isolates") + #change y axis label
  theme(axis.line = element_line(colour = "black")) + #Add line on X and Y axis
  theme(panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.border =element_blank(),panel.background = element_blank())  #Remove gridlines

WNV_isolates_graph #View graph

Graph: Percent of West Nile isolates in Connecticut from 2012-2016

WNV_percents <- read.csv("Mosquito_WNVPercents.csv") #Read table
WNV_percents #View table
##    Year Culex.species percent  X
## 1  2012       pipiens    0.82 NA
## 2  2013       pipiens    0.50 NA
## 3  2014       pipiens    0.76 NA
## 4  2015       pipiens    0.69 NA
## 5  2016       pipiens    0.84 NA
## 6  2012      restuans    0.06 NA
## 7  2013      restuans    0.37 NA
## 8  2014      restuans    0.15 NA
## 9  2015      restuans    0.10 NA
## 10 2016      restuans    0.07 NA
## 11 2012    salinarius    0.03 NA
## 12 2013    salinarius    0.04 NA
## 13 2014    salinarius    0.03 NA
## 14 2015    salinarius    0.15 NA
## 15 2016    salinarius    0.00 NA
WNV_percents_graph <-ggplot(data=WNV_percents, aes(x=Year,y=percent,colour=Culex.species))+ #Define variables
  geom_point() + #Plot points
  geom_line() + #Plot lines
  scale_y_continuous(labels = scales::percent) + #Show the 0-1 proportions in the file as percentages
  labs(title="Percent of West Nile Isolates in Connecticut from 2012-2016") + #Add title
  xlab("Year") + #change x axis label
  ylab("Percent of WNV isolates")+ #change y axis label
  theme(axis.line = element_line(colour = "black")) + #Add line on X and Y axis
  theme(panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.border =element_blank(),panel.background = element_blank())  #Remove gridlines

WNV_percents_graph #View graph

These are the same data as in our table, shown as a graph. Cx. pipiens stands out even more clearly when the data are plotted.

Graph: Connecticut Human Population Density vs WNV isolates in Culex pipiens with statistics

Density <- read.csv("Mosquito_WNVPercents_pipiens.csv")

Density_graph <-ggplot(data=Density, aes(x=Density,y=percent)) + #Define variables
  geom_bar(stat="identity")+ #Plots bars
  labs(title="Human Population Density and West Nile isolates in Culex pipiens") +#Add Title 
  xlab("CT human population density (in meters squared)") + #change x axis label
  ylab("Percent of WNV isolates for Culex pipiens") + #change y axis label
  theme(axis.line = element_line(colour = "black")) + #Add line on X and Y axis
  theme(panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.border =element_blank(),panel.background = element_blank())   #Remove gridlines
  
Density_graph #View graph

cor.test(Density$Density,Density$percent)
## 
##  Pearson's product-moment correlation
## 
## data:  Density$Density and Density$percent
## t = -1.3023, df = 3, p-value = 0.2838
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.9692982  0.5987856
## sample estimates:
##        cor 
## -0.6009622

In this graph, we looked at Cx. pipiens to see whether the percent of West Nile isolates is correlated with human population density in Connecticut. The p-value from the correlation test was greater than 0.05, so the result is not significant and we can’t reject the null hypothesis of no correlation. Note that the test is based on only five yearly observations (df = 3), so it has little power to detect a relationship. In the graph, the percent of West Nile virus isolates found in Cx. pipiens varies quite a bit even when human population density does not change much. Even before we did the correlation test, we didn’t expect a correlation from looking at the graph.
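
For readers who want to see where the test statistic comes from, the t value and p-value reported by `cor.test` can be recovered from the sample correlation and the number of observations using t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below plugs in the values from the output above; the variable names are ours.

```r
# Sketch: recover the t statistic and two-sided p-value from the reported correlation.
r <- -0.6009622  # sample correlation reported by cor.test above
n <- 5           # five yearly observations (df = n - 2 = 3)

t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

round(c(t = t_stat, p = p_value), 4)
```

This should agree with the t = -1.3023 and p-value = 0.2838 shown in the output above.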

Graph: Weekly WNV isolates in Culex for 2016

Weekly <- read.csv("Mosquito_Localities.csv")
Weekly
##    Week cases    species
## 1     1     0    pipiens
## 2     2     0    pipiens
## 3     3     0    pipiens
## 4     4     0    pipiens
## 5     5     1    pipiens
## 6     6     0    pipiens
## 7     7     0    pipiens
## 8     8    12    pipiens
## 9     9     3    pipiens
## 10   10    21    pipiens
## 11   11    12    pipiens
## 12   12    12    pipiens
## 13   13    11    pipiens
## 14   14     3    pipiens
## 15   15    18    pipiens
## 16   16    10    pipiens
## 17   17     4    pipiens
## 18   18     1    pipiens
## 19   19     0    pipiens
## 20   20     0    pipiens
## 21    1     0   restuans
## 22    2     0   restuans
## 23    3     0   restuans
## 24    4     0   restuans
## 25    5     0   restuans
## 26    6     0   restuans
## 27    7     0   restuans
## 28    8     1   restuans
## 29    9     0   restuans
## 30   10     3   restuans
## 31   11     0   restuans
## 32   12     2   restuans
## 33   13     1   restuans
## 34   14     0   restuans
## 35   15     0   restuans
## 36   16     0   restuans
## 37   17     0   restuans
## 38   18     0   restuans
## 39   19     0   restuans
## 40   20     0   restuans
## 41    1     0 salinarius
## 42    2     0 salinarius
## 43    3     0 salinarius
## 44    4     0 salinarius
## 45    5     0 salinarius
## 46    6     0 salinarius
## 47    7     0 salinarius
## 48    8     0 salinarius
## 49    9     0 salinarius
## 50   10     0 salinarius
## 51   11     1 salinarius
## 52   12     0 salinarius
## 53   13     0 salinarius
## 54   14     0 salinarius
## 55   15     0 salinarius
## 56   16     0 salinarius
## 57   17     0 salinarius
## 58   18     0 salinarius
## 59   19     0 salinarius
## 60   20     0 salinarius
weekly_graph <-ggplot(data=Weekly, aes(x=Week,y=cases,group=species, colour=species)) + #Define variables (bare column names; Weekly$ is not needed inside aes)
  geom_line() + #Plot line
  geom_point() + #Plot points
  labs(title="Weekly WNV Results 2016") + #Add title
  xlab("Collection Week") + #Change x label
  ylab("Number of WNV isolates") + #Change y label
  theme(axis.line = element_line(colour = "black")) + #Add line on X and Y axis
  theme(panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.border =element_blank(),panel.background = element_blank())  #Remove gridlines
  
weekly_graph #View graph

We were only able to get weekly data for 2016 from the CT Ag. Station’s website. The seasonal pattern is very similar to what Andreadis et al. found over a decade earlier!
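
The seasonal pattern can also be summarized numerically. The sketch below, in base R with the 2016 weekly counts above typed in directly, gives the season total and the peak collection week for each species. The object names are ours.

```r
# Sketch: season totals and peak collection week per species, from the 2016 weekly counts above.
weekly <- data.frame(
  Week = rep(1:20, times = 3),
  cases = c(0, 0, 0, 0, 1, 0, 0, 12, 3, 21, 12, 12, 11, 3, 18, 10, 4, 1, 0, 0,  # pipiens
            0, 0, 0, 0, 0, 0, 0, 1, 0, 3, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0,         # restuans
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),        # salinarius
  species = rep(c("pipiens", "restuans", "salinarius"), each = 20)
)

totals <- tapply(weekly$cases, weekly$species, sum)        # isolates per species over the season
peak_week <- sapply(split(weekly, weekly$species),
                    function(d) d$Week[which.max(d$cases)]) # first week with the maximum count
totals
peak_week
```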

Maps

This aspect of the project was conducted on a PC operating on Windows 10.

Libraries needed to build the maps

library(ggmap)
library(rgdal)  
library(dismo)
library(ggplot2)

Results:

From this study, we were able to determine that 15 years after Andreadis et al. (2001), Culex pipiens continues to be the main vector for West Nile virus in Connecticut. Several patterns, such as seasonal abundance, follow the same trends today. Of the 3 Culex species, Culex pipiens is the one most often found carrying WNV.

combined_tree <- multiplot(AIC_tree, fasttree_tree, RAxMLtree_tree, ncol=1, labels=c("1","2","3")) #Stack the three trees in one column and label them 1-3
#Tree 1 = AIC
#Tree 2 = FastTree
#Tree 3 = RAxML
While analyzing the phylogeny, we noticed differences among the trees from the different tree-building programs we used. While the major clades (Cx. pipiens, Cx. restuans, and Cx. salinarius) were the same, the branching patterns within and between them varied. The RAxML tree showed Cx. restuans and Cx. salinarius as sister to each other, while the AIC and FastTree trees grouped Cx. restuans and Cx. pipiens together. Based on morphology, Cx. pipiens and Cx. restuans look more like each other than either looks like Cx. salinarius. When working at the Agricultural Experiment Station, the primary character I (Kate) used to tell them apart was the presence or absence of 2 spots on the thorax. Miller et al. (1996) used ribosomal DNA to create a phylogeny of 14 Culex mosquito species. In their phylogeny, Cx. restuans and Cx. salinarius were more closely related to each other than to Cx. pipiens, as we saw in the RAxML tree. We found this very surprising, given what we know about the morphology.
Miller et al. (1996) Phylogeny


They had a much larger variety of samples than we did, and they used a different data type. It may be that COI does not have enough informative sites to resolve the tree for this group. Since Cx. restuans and Cx. salinarius may be more closely related and neither is a good vector of the virus, there might be a genetic component to vector ability. If we had genomes for each of the species, this would have been a great topic to explore further. We also wanted to try out JModelTest, a statistical tool that determines best-fit models of nucleotide substitution. However, we could not get the program to work, either on the UConn cluster or locally on our computers.

As we looked through a lot of data and shapefiles, we realized that we couldn’t answer some of our questions with the data that are publicly available. First, environmental factors such as precipitation and humidity didn’t vary much across the state. Connecticut is probably just too small an area to show much of a gradient, so we weren’t able to use those parameters to create maps. We were able to use bedrock data for CT, but we did not see a relationship. We had a difficult time adding human population density to the CT shapefiles that we had available, so we weren’t able to answer that question with a map, but we were able to create plots of these data. After running a correlation test, it looks as though human population density and West Nile virus isolates are not correlated. This may be due to a bias in the data: mosquitoes are only collected in localities where there are people. Also, the summer is full of travel, so a yearly population census might not be appropriate for this analysis.

Summary of Results:

  1. Are the patterns seen in Andreadis et al. (2001) and (2004) still seen today? Yes
  2. Is Cx. pipiens collected in Connecticut still a stronger vector for West Nile Virus? Yes
  3. Is this correlated to human population density, temperature, precipitation, or another factor?
    • human population density: No
    • temperature: No
    • precipitation: No
    • bedrock: No
  4. Are Cx. restuans and Cx. pipiens more closely related to each other than to Cx. salinarius? Maybe

    With the conflicting results we got when building the phylogenies, we cannot say for certain which species are more closely related. Previous researchers report that Cx. restuans and Cx. salinarius are more closely related. If multiple genes are used or genomes are assembled, this result can be improved. These relationships are highly important to resolve. Knowing them could help improve mosquito repellents that target specific species. It could also be useful when determining where West Nile virus can spread.

Future Work:

In the future, it would be good to have genomes for each of the species. An alternative, if the data had been available, would have been to look at sequences from each species from different localities. If we collected mosquito data from localities other than the ones the Ag. Station looks at, would we see the same results? It would also be very interesting to look at which organisms each of the species feeds on.

References

  1. Anderson, J., Andreadis, T., Main, A., & Kline, D. (2004). Prevalence of West Nile virus in tree canopy-inhabiting Culex pipiens and associated mosquitoes. The American Journal of Tropical Medicine and Hygiene, 71(1), 112-9.

  2. Andreadis, T., Anderson, J., & Vossbrinck, C. (2001). Mosquito surveillance for West Nile virus in Connecticut, 2000: Isolation from Culex pipiens, Cx. restuans, Cx. salinarius, and Culiseta melanura. Emerging Infectious Diseases, 7(4), 670-674.

  3. Andreadis, T.G., Anderson, J.F., Vossbrinck, C.R., & Main, A.J. (2004). Epidemiology of West Nile virus in Connecticut: A five-year analysis of mosquito data 1999-2003. Vector Borne and Zoonotic Diseases, 4(4), 360-378.

  4. Diuk-Wasser, M. A., Brown, H. D., Fish, D. G., & Andreadis, T. (2006). Modeling the spatial distribution of mosquito vectors for West Nile virus in Connecticut, USA. Vector-Borne and Zoonotic Diseases, 6(3), 283-295.

  5. Miller, B. R., Crabtree, M. B., & Savage, H. M. (1996). Phylogeny of fourteen Culex mosquito species, including the Culex pipiens complex, inferred from the internal transcribed spacers of ribosomal DNA. Insect Molecular Biology, 5, 93-107.

  6. Molaei, G. G., Andreadis, T. M., Armstrong, P., & Diuk-Wasser, M. (2008). Host-feeding patterns of potential mosquito vectors in Connecticut, USA: Molecular analysis of bloodmeals from 23 species of Aedes, Anopheles, Culex, Coquillettidia, Psorophora, and Uranotaenia. Journal of Medical Entomology, 45(6), 1143-1151.