Culex larvae: source(http://whyfiles.org/2014/mosquitoes/)
Before graduate school, I (Kate) worked at the Connecticut Agricultural Experiment Station as a Seasonal Research Assistant doing mosquito identification and colony maintenance as part of the Mosquito Research Surveillance Program in the division of Vector Biology and Zoonotic Diseases. My experience there and the articles I had read on mosquitoes were my primary motivation for working on mosquitoes for the project.
At the CT Agricultural Experiment Station, mosquito trapping and virus testing are conducted from June through October at 91 locations across Connecticut. Every year, data on the relative abundance, distribution, and infection rates of mosquito species are collected. On a 10-day rotation, two types of mosquito trap are set: a CO2-baited CDC Light Trap and a Gravid Mosquito Trap. While the Light Trap is designed to trap all species, the Gravid Trap targets previously blood-fed adult females and primarily catches species in the genus Culex. Mosquitoes are processed in three steps: collection, species identification, and virus isolation/identification. The whole process takes about 7-10 days.
While the Agricultural Experiment Station is interested in monitoring viruses in all species, they are especially on the lookout for Culex species, particularly Cx. pipiens. Papers such as Andreadis et al. (2001) and (2004) illustrate the complexity of the genus, and we used them as our primary references when organizing this project. These papers look at many different species of mosquitoes in Connecticut, but I was interested in the Culex species they presented. Three species of Culex are found in relatively high abundance in Connecticut: Cx. pipiens, Cx. restuans, and Cx. salinarius. Although they are found in cities and even at the same collection sites, Cx. pipiens is a stronger vector for West Nile Virus.
I was interested in looking at the phylogenetic relationships of Culex species. For my Master’s thesis, collaborators will be constructing phylogenies for me, but I wanted to create some phylogenies myself, so I thought this project would be a great chance to do it. What is very interesting about this group is that Cx. pipiens and Cx. restuans are quite similar morphologically. One of the most noticeable differences between the two species is that Cx. restuans has two spots on the thorax. However, does this similarity mean they are more closely related? This motivated me to explore this group a bit more.
I (Sarah) think that maps are a really interesting and informative way to look at data and interpret the spatial scale and distribution of organisms. I also really wanted to explore map making more in R. I have done it a bit in the past through class assignments, but I wanted to become more comfortable using data and shapefiles to make meaningful maps.
We used BOLD and GenBank to search for genomes for each of the three species. However, we did not find genomes for Cx. restuans or Cx. salinarius. Instead, we looked for sequences that were commonly available in these databases for all three species. Cytochrome c oxidase I (COI) sequences were highly abundant, and luckily we were able to find sequences of the same length. We downloaded 658 base pair sequences from five haplotypes of each species from GenBank. A mosquito species from a different genus, Anopheles punctipennis, was selected as the outgroup. Anopheles punctipennis is also found in Connecticut, and GenBank also had a 658 base pair COI sequence for it that we could use. Raw reads were not available in the SRA repository, so quality control could not be performed. Only aligned sequences were provided, which were easily downloaded as one FASTA file. The GenBank accession numbers for the sequences we chose were:
KU877005.1 (pipiens)
KU877019.1 (pipiens)
KU877003.1 (pipiens)
KU876994.1 (pipiens)
KU876959.1 (pipiens)
JX259915.1 (restuans)
JX259914.1 (restuans)
JX259913.1 (restuans)
JX259912.1 (restuans)
JX259911.1 (restuans)
JX260666.1 (salinarius)
JX260665.1 (salinarius)
JX260667.1 (salinarius)
JX260664.1 (salinarius)
JX260663.1 (salinarius)
KT111921.1 (outgroup)
Before doing any analyses or visualizations, we simplified the FASTA file that we downloaded. Each header was shortened to the GenBank accession number. After the version suffix of the accession number, a P, R, S, or A was added to identify the species (pipiens, restuans, salinarius, or Anopheles punctipennis). Below is a trimmed example of the FASTA file before and after it was modified.
Before:
>KU877003.1 Culex pipiens voucher APHA-5-2015B07 cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial
AACATTATATTTTATTTTTGGGGCTTGAGCTGGAATAGTTGGAACTTCTTTAAGTTTACTAATTCGAGCAGAATTAAGTCAACCAGGTGTATTTATTGGAAATGATCAAATT
After:
>KU877003.1P
AACATTATATTTTATTTTTGGGGCTTGAGCTGGAATAGTTGGAACTTCTTTAAGTTTACTAATTCGAGCAGAATTAAGTCAACCAGGTGTATTTATTGGAAATGATCAAATT
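The header change shown above could also be scripted rather than done by hand. The following is a minimal Python sketch, not the method we used. It assumes every GenBank header starts with the accession followed by the full species name, as in the example. The names SPECIES_CODES, relabel_header, and relabel_fasta are hypothetical.

```python
# Hypothetical mapping from species name (as it appears in the GenBank
# header) to the one-letter code appended to the accession.
SPECIES_CODES = {
    "Culex pipiens": "P",
    "Culex restuans": "R",
    "Culex salinarius": "S",
    "Anopheles punctipennis": "A",
}


def relabel_header(header):
    """Turn '>ACCESSION.VERSION Genus species ...' into '>ACCESSION.VERSION<code>'."""
    accession, _, description = header.lstrip(">").partition(" ")
    for species, code in SPECIES_CODES.items():
        if description.startswith(species):
            return ">" + accession + code
    raise ValueError("unrecognized species in header: " + header)


def relabel_fasta(lines):
    """Relabel header lines and leave sequence lines untouched."""
    return [relabel_header(l) if l.startswith(">") else l for l in lines]


example = [
    ">KU877003.1 Culex pipiens voucher APHA-5-2015B07 cytochrome oxidase "
    "subunit 1 (COI) gene, partial cds; mitochondrial",
    "AACATTATATTTTATTTTTGGGGCTTGAGCTGGAATAGTTGGAACTTC",
]
print(relabel_fasta(example)[0])  # >KU877003.1P
```

Raising an error on an unrecognized species is a deliberate choice here, so that a header with an unexpected format is noticed instead of being silently left unlabeled.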
The final FASTA file I used contained the following sequences:
>JX260666.1S
TACTTTATACTTCATTTTTGGTGCTTGAGCAGGAATAGTGGGAACTTCCCTAAGTTTACTTATTCGTGCTGAATTAAGCCAACCTGGTGTATTTATTGGAAATGATCAAATTTATAATGTAATCGTTACAGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTATAATTGGAGGATTTGGAAATTGATTAGTTCCTTTAATATTAGGGGCTCCTGATATAGCTTTTCCTCGAATAAATAATATAAGATTTTGAATACTTCCTCCTTCATTAACTTTACTACTGTCTAGTAGTATAGTAGAAAATGGAGCTGGGACTGGATGAACAGTTTATCCCCCTCTTTCTTCTGGAACTGCCCATGCTGGAGCTTCTGTTGATTTAGCTATTTTTTCTCTTCATTTAGCTGGAGTATCTTCAATTTTAGGAGCAGTTAATTTTATTACAACTGTTATTAATATACGATCTTCAGGAATTACTCTTGATCGAATACCTTTATTTGTTTGATCTGTTGTAATTACAGCTGTTCTTTTATTATTATCTTTACCTGTATTAGCCGGAGCAATTACTATATTATTAACTGACCGAAATCTTAATACATCATTCTTTGATCCTATTGGAGGAGGAGATCCTATTTTATATCAACATTTATTC
>JX260665.1S
AACTTTATATTTTATTTTTGGAGCTTGAGCTGGAATAGTTGGAACATCTCTTAGAATTTTAATTCGAGCAGAATTAAGACAACCGGGGATATTCATTGGAAATGATCAAATTTATAATGTAATTGTTACTGCCCATGCTTTTATTATAATTTTTTTTATAGTAATACCTATTATAATTGGGGGATTTGGAAATTGATTAGTTCCTTTAATATTAGGAGCCCCTGACATAGCTTTCCCCCGAATAAATAATATAAGATTTTGAATACTTCCTCCGTCATTGACTCTTCTTCTTTCTAGAAGTATAGTAGAAAATGGATCCGGGACTGGTTGAACTGTTTACCCCCCTCTTTCTTCAGGAACAGCTCATGCTGGGGCTTCTGTTGATTTAACTATTTTTTCTCTCCATTTAGCGGGAGTTTCATCTATTTTAGGAGCAGTTAATTTTATTACAACTGTTATTAACATACGATCATCCGGAATTACTTTAGATCGAATGCCTTTATTTGTCTGATCTGTTGTAATTACAGCAGTTCTTCTTCTTCTTTCTTTACCTGTTTTAGCCGGAGCTATTACTATACTATTAACTGATCGAAACTTAAATACTTCTTTCTTTGACCCTATTGGAGGAGGAGACCCTATTCTATATCAACATTTATTT
>JX260667.1S
AACTTTATATTTTATTTTTGGAGCTTGAGCTGGAATAGTTGGAACATCTCTTAGAATYTTAATTCGAGCAGAATTAAGACAACCAGGGATATTCATTGGAAATGATCAAATTTATAATGTAATTGTTACTGCCCATGCTTTTATTATAATTTTTTTTATAGTAATACCTATTATAATTGGGGGATTTGGAAATTGATTAGTTCCTTTAATATTAGGAGCCCCTGACATAGCTTTCCCCCGAATAAATAATATAAGATTTTGAATACTTCCTCCGTCATTGACTCTTCTTCTTTCTAGAAGTATAGTAGAAAATGGATCCGGGACTGGTTGAACTGTTTACCCCCCTCTTTCTTCAGGGACAGCTCATGCTGGAGCTTCTGTTGATTTAACTATTTTTTCTCTCCATTTAGCAGGAGTTTCATCTATTTTAGGAGCAGTTAATTTTATTACAACTGTTATTAATATACGATCATCCGGAATTACTTTAGATCGAATGCCTTTATTTGTCTGATCTGTTGTAATTACAGCAGTTCTTCTTCTTCTTTCTTTACCTGTTTTAGCTGGAGCTATTACTATACTATTAACTGATCGAAACTTAAATACTTCTTTCTTTGACCCTATTGGAGGAGGAGACCCTATTCTATATCAACATTTATTT
>JX260664.1S
AACTTTATATTTTATTTTTGGAGCTTGAGCTGGAATAGTTGGAACATCTCTTAGAATTTTAATTCGAGCAGAATTAAGACAACCTGGGATATTCATTGGAAATGATCAAATTTATAATGTAATTGTTACTGCTCATGCTTTTATTATATTTTTTTTTATAGTAATGCCTATTATAATTGGAGGGTTTGGAAATTGATTAGTTCCTTTAATATTAGGAGCCCCTGATATAGCTTTCCCCCGAATAAATAATATAAGATTTTGAATACTTCCCCCATCATTGACTCTTCTTCTTTCTAGAAGTATAGTAGAAAATGGATCCGGGACTGGTTGAACTGTTTACCCCCCTCTTTCTTCTGGAACAGCCCATGCTGGAGCTTCTGTTGATTTAACTATTTTTTCTCTCCATTTAGCGGGAGTTTCATCTATTTTAGGGGCAGTTAATTTTATTACAACTGTTATTAACATACGATCATCCGGAATTACTTTAGATCGAATACCTTTATTTGTTTGATCTGTTGTAATTACAGCAGTTCTTCTTCTTCTTTCTCTACCTGTTTTAGCTGGAGCTATTACTATATTATTAACTGATCGAAACTTAAATACTTCTTTCTTCGACCCTATTGGAGGAGGAGACCCTATTCTATATCAACATTTATTT
>JX260663.1S
AACTTTATATTTTATTTTTGGAGCTTGAGCTGGAATAGTTGGAACATCTCTTAGAATTTTAATTCGAACAGAATTAAGACAACCTGGGATATTCATTGGAAATGATCAAATTTATAATGTAATTGTTACTGCTCATGCTTTTATTATAATTTTTTTTATAGTAATGCCTATTATAATTGGAGGGTTTGGAAATTGATTAGTTCCTTTAATATTAGGAGCCCCTGATATAGCTTTCCCCCGAATAAATAATATAAGATTTTGAATACTTCCCCCATCATTAACTCTTCTTCTTTCTAGAAGTATAGTAGAAAATGGATCTGGGACTGGTTGAACTGTTTACCCCCCTCTTTCTTCAGGAACAGCCCATGCTGGAGCTTCTGTTGATTTAACTATTTTTTCTCTCCATTTAGCAGGAGTTTCATCTATTTTAGGGGCAGTTAATTTTATTACAACTGTTATCAACATACGATCATCCGGAATTACTCTAGATCGAATACCTTTATTTGTTTGATCTGTTGTAATTACAGCAGTTCTTCTTCTTCTTTCTTTACCTGTTTTAGCTGGAGCTATTACTATATTATTAACTGATCGAAACTTAAATACTTCTTTCTTCGATCCTATTGGAGGAGGAGATCCTATTCTATACCAACACTTATTT
>KU877005.1P
AACATTATATTTTATTTTTGGGGCTTGAGCTGGAATAGTTGGAACTTCTTTAAGTTTACTAATTCGAGCAGAATTAAGTCAACCAGGTGTATTTATTGGAAATGATCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTAATACCAATCATAATTGGAGGATTTGGAAATTGATTAGTTCCTTTAATGTTAGGAGCTCCAGATATGGCCTTTCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCTTCATTGACACTACTACTTTCAAGTAGTTTAGTAGAAAATGGAGCTGGGACTGGATGAACAGTGTATCCCCCTCTTTCATCTGGAACAGCTCATGCTGGAGCTTCAGTAGACTTAGCTATTTTTTCTTTACATTTAGCAGGAATTTCATCAATTTTAGGTGCAGTAAATTTTATTACAACAGTAATTAATATACGATCTTCAGGAATTACTCTTGATCGAATACCTTTATTTGTTTGATCAGTAGTAATTACTGCAGTTTTATTACTTCTTTCTTTACCTGTTTTAGCTGGTGCTATTACTATGTTATTAACAGATCGAAATTTAAATACTTCATTCTTTGATCCAATTGGAGGAGGAGATCCAATTTTATATCAACATTTATTT
>KU877019.1P
AACATTATATTTTATTTTTGGGGCTTGAGCTGGAATAGTTGGAACTTCTTTAAGTTTACTAATTCGAGCAGAATTAAGTCAACCAGGTGTATTTATTGGAAATGATCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTAATACCAATCATAATTGGAGGATTTGGAAATTGATTAGTTCCTTTAATGTTAGGAGCTCCAGATATGGCCTTTCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCTTCATTGACACTACTACTTTCAAGTAGTTTAGTAGAAAATGGAGCTGGGACTGGATGAACAGTGTATCCCCCTCTTTCATCTGGAACAGCTCATGCTGGAGCTTCAGTAGACTTAGCTATTTTTTCTTTACATTTAGCAGGAATTTCATCAATTTTAGGTGCAGTAAATTTTATTACAACAGTAATTAATATACGATCTTCAGGAATTACTCTTGATCGAATACCTTTATTTGTTTGATCAGTAGTAATTACTGCAGTTTTATTACTTCTTTCTTTACCTGTTTTAGCTGGTGCTATTACTATGTTATTAACAGATCGAAATTTAAATACTTCATTCTTTGATCCAATTGGAGGAGGAGATCCAATTTTATATCAACATTTATTT
>KU877003.1P
AACATTATATTTTATTTTTGGGGCTTGAGCTGGAATAGTTGGAACTTCTTTAAGTTTACTAATTCGAGCAGAATTAAGTCAACCAGGTGTATTTATTGGAAATGATCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTAATACCAATCATAATTGGAGGATTTGGAAATTGATTAGTTCCTTTAATGTTAGGAGCTCCAGATATGGCCTTTCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCTTCATTGACACTACTACTTTCAAGTAGTTTAGTAGAAAATGGAGCTGGGACTGGATGAACAGTGTATCCCCCTCTTTCATCTGGAACAGCTCATGCTGGAGCTTCAGTAGACTTAGCTATTTTTTCTTTACATTTAGCAGGAATTTCATCAATTTTAGGTGCAGTAAATTTTATTACAACAGTAATTAATATACGATCTTCAGGAATTACTCTTGATCGAATACCTTTATTTGTTTGATCAGTAGTAATTACTGCAGTTTTATTACTTCTTTCTTTACCTGTTTTAGCTGGTGCTATTACTATGTTATTAACAGATCGAAATTTAAATACTTCATTCTTTGATCCAATTGGAGGAGGAGATCCAATTTTATATCAACATTTATTT
>KU876994.1P
AACATTATATTTTATTTTTGGGGCTTGAGCTGGAATAGTTGGAACTTCTTTAAGTTTACTAATTCGAGCAGAATTAAGTCAACCAGGTGTATTTATTGGAAATGATCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTAATACCAATCATAATTGGAGGATTTGGAAATTGATTAGTTCCTTTAATGTTAGGAGCTCCAGATATGGCCTTTCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCTTCATTGACACTACTACTTTCAAGTAGTTTAGTAGAAAATGGAGCTGGGACTGGATGAACAGTGTATCCCCCTCTTTCATCTGGAACAGCTCATGCTGGAGCTTCAGTAGACTTAGCTATTTTTTCTTTACATTTAGCAGGAATTTCATCAATTTTAGGTGCAGTAAATTTTATTACAACAGTAATTAATATACGATCTTCAGGAATTACTCTTGATCGAATACCTTTATTTGTTTGATCAGTAGTAATTACTGCAGTTTTATTACTTCTTTCTTTACCTGTTTTAGCTGGTGCTATTACTATGTTATTAACAGATCGAAATTTAAATACTTCATTCTTTGATCCAATTGGAGGAGGAGATCCAATTTTATATCAACATTTATTT
>KU876959.1P
AACATTATATTTTATTTTTGGGGCTTGAGCTGGAATAGTTGGAACTTCTTTAAGTTTACTAATTCGAGCAGAATTAAGTCAACCAGGTGTATTTATTGGAAATGATCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTAATACCAATCATAATTGGAGGATTTGGAAATTGATTAGTTCCTTTAATGTTAGGAGCTCCAGATATGGCCTTTCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCTTCATTGACACTACTACTTTCAAGTAGTTTAGTAGAAAATGGAGCTGGGACTGGATGAACAGTGTATCCCCCTCTTTCATCTGGAACAGCTCATGCTGGAGCTTCAGTAGACTTAGCTATTTTTTCTTTACATTTAGCAGGAATTTCATCAATTTTAGGTGCAGTAAATTTTATTACAACAGTAATTAATATACGATCTTCAGGAATTACTCTTGATCGAATACCTTTATTTGTTTGATCAGTAGTAATTACTGCAGTTTTATTACTTCTTTCTTTACCTGTTTTAGCTGGTGCTATTACTATGTTATTAACAGATCGAAATTTAAATACTTCATTCTTTGATCCAATTGGAGGAGGAGATCCAATTTTATATCAACATTTATTT
>JX259915.1R
AACATTATACTTTATTTTCGGAGCTTGAGCTGGAATAATTGGTACTTCATTAAGTATTCTTATTCGAGCAGAATTAAGTCAACCTGGAGTATTTATTGGAAATGATCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTATAATTGGAGGATTTGGTAATTGATTAGTTCCTCTAATATTAGGAGCTCCTGATATAGCTTTTCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCTTCATTAACATTACTACTTTCAAGTAGTATAGTAGAAAATGGAGCTGGGACTGGATGAACAGTTTACCCCCCTCTTTCATCTGGTACAGCCCATGCTGGAGCTTCAGTAGATTTAGCTATTTTTTCATTACATTTAGCTGGAATTTCATCAATTTTAGGAGCAGTAAATTTTATTACTACTGTAATTAATATACGATCTTCAGGTATTACACTTGATCGAATACCATTATTTGTTTGATCAGTAGTAATTACTGCTGTTCTTTTACTTCTTTCTTTACCTGTATTAGCTGGTGCTATTACTATACTATTAACTGATCGAAATCTAAATACTTCATTTTTTGATCCTATTGGAGGAGGAGATCCAATTTTATATCAACATTTATTT
>JX259914.1R
AACATTATACTTTATTTTCGGAGCTTGAGCTGGAATAATTGGTACTTCATTAAGTATTCTTATTCGAGCAGAATTAAGTCAACCTGGAGTATTTATTGGAAATGATCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTATAATTGGAGGATTTGGTAATTGATTAGTTCCTCTAATATTAGGAGCTCCTGATATAGCTTTTCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCTTCATTAACATTACTACTTTCAAGTAGTATAGTAGAAAATGGAGCTGGGACTGGATGAACAGTTTACCCCCCTCTTTCATCTGGTACAGCCCATGCTGGAGCTTCAGTAGATTTAGCTATTTTTTCATTACATTTAGCTGGAATTTCATCAATTTTAGGAGCAGTAAATTTTATTACTACTGTAATTAATATACGATCTTCAGGTATTACACTTGATCGAATACCATTATTTGTTTGATCAGTAGTAATTACTGCTGTTCTTTTACTTCTTTCTTTACCTGTATTAGCTGGTGCTATTACTATACTATTAACTGATCGAAATCTAAATACTTCGTTCTTTGATCCTATTGGAGGAGGAGATCCAATTTTATATCAACATTTATTT
>JX259913.1R
AACATTATACTTTATTTTCGGAGCTTGAGCTGGAATAATTGGTACTTCATTAAGTATTCTTATTCGAGCAGAATTAAGTCAACCTGGAGTATTTATTGGAAATGATCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTATAATTGGAGGATTTGGTAATTGATTAGTTCCTCTAATATTAGGAGCTCCTGATATAGCTTTTCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCTTCATTAACATTACTACTTTCAAGTAGTATAGTAGAAAATGGAGCTGGGACTGGATGAACAGTTTACCCCCCTCTTTCATCTGGTACAGCCCATGCTGGAGCTTCAGTAGATTTAGCTATTTTTTCATTACATTTAGCTGGAATTTCATCAATTTTAGGAGCAGTAAATTTTATTACTACTGTAATTAATATACGATCTTCAGGTATTACACTTGATCGAATACCATTATTTGTTTGATCAGTAGTAATTACTGCTGTTCTTTTACTTCTTTCTTTACCTGTATTAGCTGGTGCTATTACTATACTATTAACTGATCGAAATCTAAATACTTCGTTCTTTGATCCTATTGGAGGAGGAGATCCAATTTTATATCAACATTTATTT
>JX259912.1R
AACATTATACTTTATTTTCGGAGCTTGAGCTGGAATAATTGGTACTTCATTAAGTATTCTTATTCGAGCAGAATTAAGTCAACCTGGAGTATTTATTGGAAATGATCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTATAATTGGAGGATTTGGTAATTGATTAGTTCCTCTAATATTAGGAGCTCCTGATATAGCTTTTCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCTTCATTAACATTACTACTTTCAAGTAGTATAGTAGAAAATGGAGCTGGGACTGGATGAACAGTTTACCCCCCTCTTTCATCTGGTACAGCCCATGCTGGAGCTTCAGTAGATTTAGCTATTTTTTCATTACATTTAGCTGGAATTTCATCAATTTTAGGAGCAGTAAATTTTATTACTACTGTAATTAATATGCGATCTTCAGGTATTACACTTGATCGAATACCATTATTTGTTTGATCAGTAGTAATTACTGCTGTTCTTTTACTTCTTTCTTTACCTGTATTAGCTGGTGCTATTACTATACTATTAACTGATCGAAATCTAAATACTTCGTTCTTTGATCCTATTGGAGGAGGAGATCCAATTTTATATCAACATTTATTT
>JX259911.1R
AACATTATACTTTATTTTCGGAGCTTGAGCTGGAATAATTGGTACTTCATTAAGTATTCTTATTCGAGCAGAATTAAGTCAACCTGGAGTATTTATTGGAAATGATCAAATTTATAATGTTATTGTAACTGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTATAATTGGAGGATTTGGTAATTGATTAGTTCCTCTAATATTAGGAGCTCCTGATATAGCTTTTCCTCGAATAAATAATATAAGTTTTTGAATACTACCTCCTTCATTAACATTACTACTTTCAAGTAGTATAGTAGAAAATGGAGCTGGGACTGGATGAACAGTTTACCCCCCTCTTTCATCTGGTACAGCCCATGCTGGAGCTTCAGTAGATTTAGCTATTTTTTCATTACATTTAGCTGGAATTTCATCAATTTTAGGAGCAGTAAATTTTATTACTACTGTAATTAATATGCGATCTTCAGGTATTACACTTGATCGAATACCATTATTTGTTTGATCAGTAGTAATTACTGCTGTTCTTTTACTTCTTTCTTTACCTGTATTAGCTGGTGCTATTACTATACTATTAACTGATCGAAATCTAAATACTTCATTCTTTGATCCTATTGGAGGAGGAGATCCAATTTTATATCAACATTTATTT
>KT111921.1A
AACATTATATTTTATTTTTGGAGCTTGAGCAGGAATAGTAGGGACTTCTTTAAGTATTCTAATTCGTGCTGAATTAGGACACCCTGGAGCCTTTATTGGAGACGATCAAATTTATAATGTTATTGTAACAGCTCATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTATAATTGGGGGATTTGGAAACTGATTAGTTCCTTTAATATTGGGAGCACCAGATATGGCTTTCCCTCGAATAAATAATATAAGATTTTGAATATTACCTCCTTCTTTGACTCTTTTAATTTCTAGTAGTATAGTAGAAAATGGAGCCGGGACAGGTTGAACTGTTTACCCTCCTCTATCTTCTGGAATTGCTCATGCCGGAGCTTCAGTAGATTTAGCTATTTTTTCATTACATTTAGCAGGAATTTCTTCAATTTTAGGGGCAGTAAATTTTATTACAACTGTAATTAATATACGGTCTCCCGGAATTACACTTGATCGAATACCTTTATTTGTTTGATCAGTTGTGATTACAGCAGTATTATTATTATTATCTCTTCCTGTATTAGCTGGAGCTATTACAATATTATTAACAGATCGAAATTTAAATACATCATTTTTCGACCCTGCTGGAGGAGGAGACCCAATTTTATATCAACACTTATTT
We ran BLAST (version 2.3.0) to check that the sequences match COI and that they come from mosquitoes. All sequences were similar to COI sequences of other mosquito species in the same genus or in other mosquito genera. None of the results were out of the ordinary. On the UConn cluster, we ran the following script:
#!/bin/bash
#Specifies that the bash shell must be used to interpret the script
#$ -N blast
#Name of job
#$ -M katherine.nazario@uconn.edu
#Email for status of job
#$ -m bea
#Notify user when job begins, ends, or aborts
#$ -S /bin/bash
#Source for bash profile
#$ -cwd
#Execute job from current working directory
#$ -pe smp 4
#4 cores requested
#$ -o blast_$JOB_ID.out
#output log file
#$ -e blast_$JOB_ID.err
#error log file
module load blast/2.3.0
#Load the BLAST module
blastn -query culex_combined.fasta -out culex.blast.txt -evalue 1e-10 -outfmt '10 qseqid sscinames sstrand' -db /common/blast/data/nr
#Blast parameters
For each species, the BLAST results showed the sequences were all COI. Most results showed similarity to sequences from species of the same genus or its sister genera. From here, we were ready to start phylogenetic analyses. Below are the first three results for each of the sequences we used for the analyses:
JX260666.1,0,Culex salinarius,cytochrome oxidase subunit 1, partial (mitochondrion) [Culex salinarius],N/A
JX260666.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A
JX260666.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A
JX260665.1,0,Culex salinarius, cytochrome oxidase subunit 1, partial (mitochondrion) [Culiseta salinarius],N/A
JX260665.1,0,Culex salinarius,cytochrome oxidase subunit 1, partial (mitochondrion) [Culex salinarius],N/A
JX260665.1,0,Culiseta melanura,cytochrome oxidase subunit 1, partial (mitochondrion) [Culiseta melanura],N/A
JX260667.1,0,Culex salinarius,cytochrome oxidase subunit 1, partial (mitochondrion) [Culiseta salinarius],N/A
JX260667.1,0,Culex salinarius,cytochrome oxidase subunit 1, partial (mitochondrion) [Culex salinarius],N/A
JX260667.1,0,Culiseta melanura,cytochrome oxidase subunit 1, partial (mitochondrion) [Culiseta melanura],N/A
JX260664.1,0,Culex salinarius,cytochrome oxidase subunit 1, partial (mitochondrion) [Culiseta salinarius],N/A
JX260664.1,0,Culiseta melanura,cytochrome oxidase subunit 1, partial (mitochondrion) [Culiseta melanura],N/A
JX260664.1,0,Culex salinarius,cytochrome oxidase subunit 1, partial (mitochondrion) [Culex salinarius],N/A
JX260663.1,0,Culex salinarius,cytochrome oxidase subunit 1, partial (mitochondrion) [Culex salinarius],N/A
JX260663.1,0,Culiseta inornata,cytochrome oxidase subunit 1, partial (mitochondrion) [Culiseta inornata],N/A
JX260663.1,0,Culiseta melanura,cytochrome oxidase subunit 1, partial (mitochondrion) [Culiseta melanura],N/A
KU877005.1,0,Culex pilosus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex pilosus],N/A
KU877005.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A
KU877005.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A
KU877019.1,0,Culex pilosus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex pilosus],N/A
KU877019.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A
KU877019.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A
KU877003.1,0,Culex pilosus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex pilosus],N/A
KU877003.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A
KU877003.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A
KU876994.1,0,Culex pilosus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex pilosus],N/A
KU876994.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A
KU876994.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A
KU876959.1,0,Culex pilosus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex pilosus],N/A
KU876959.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A
KU876959.1,0,Culex modestus,cytochrome c oxidase subunit I, partial (mitochondrion) [Culex modestus],N/A
JX259915.1,0,Culex univittatus,cytochrome c oxidase subunit 1, partial (mitochondrion) [Culex univittatus],N/A
JX259915.1,0,Culex tarsalis, cytochrome oxidase subunit 1, partial (mitochondrion) [Culex tarsalis],N/A
JX259915.1,0,Culex tarsalis, cytochrome oxidase subunit 1, partial (mitochondrion) [Culex tarsalis],N/A
JX259914.1,0,Culex univittatus,cytochrome c oxidase subunit 1, partial (mitochondrion) [Culex univittatus],N/A
JX259914.1,0,Culex tarsalis,cytochrome oxidase subunit 1, partial (mitochondrion) [Culex tarsalis],N/A
JX259914.1,0,Culex univittatus,cytochrome c oxidase subunit 1, partial (mitochondrion) [Culex univittatus],N/A
JX259913.1,0,Culex univittatus,cytochrome c oxidase subunit 1, partial (mitochondrion) [Culex univittatus],N/A
JX259913.1,0,Culex univittatus,cytochrome c oxidase subunit 1, partial (mitochondrion) [Culex univittatus],N/A
JX259913.1,0,Culex univittatus,cytochrome c oxidase subunit 1, partial (mitochondrion) [Culex univittatus],N/A
JX259912.1,0,Culex tarsalis,cytochrome oxidase subunit 1, partial (mitochondrion) [Culex tarsalis],N/A
JX259912.1,0,Culex univittatus,cytochrome c oxidase subunit 1, partial (mitochondrion) [Culex univittatus],N/A
JX259912.1,0,Culex homunculus,cytochrome oxidase subunit I, partial (mitochondrion) [Culex homunculus],N/A
JX259911.1,0,Culex tarsalis,cytochrome oxidase subunit 1, partial (mitochondrion) [Culex tarsalis],N/A
JX259911.1,0,Culex univittatus,cytochrome c oxidase subunit 1, partial (mitochondrion) [Culex univittatus],N/A
JX259911.1,0,Culex homunculus,cytochrome oxidase subunit I, partial (mitochondrion) [Culex homunculus],N/A
KT111921.1,0,Anopheles albitarsis,cytochrome oxidase subunit1, partial (mitochondrion) [Anopheles albitarsis],N/A
KT111921.1,0,Anopheles albitarsis,cytochrome oxidase subunit1, partial (mitochondrion) [Anopheles albitarsis],N/A
KT111921.1,0,Anopheles albitarsis,cytochrome oxidase subunit1, partial (mitochondrion) [Anopheles albitarsis],N/A
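A listing like the one above can be summarized by counting which genera each query sequence hit. The following is a rough Python sketch under stated assumptions: each line has the layout shown above (query accession, a second column, the scientific name, then a title that itself contains commas), and the function name genera_per_query is hypothetical. The sample lines are copied from the results above.

```python
from collections import Counter, defaultdict


def genera_per_query(lines):
    """Count the genus of each hit, grouped by query accession."""
    tally = defaultdict(Counter)
    for line in lines:
        # Hit titles contain commas, so split off only the first three fields.
        query, _second_column, sciname, _rest = line.split(",", 3)
        genus = sciname.split()[0]
        tally[query][genus] += 1
    return tally


hits = [
    "JX260663.1,0,Culex salinarius,cytochrome oxidase subunit 1, partial (mitochondrion) [Culex salinarius],N/A",
    "JX260663.1,0,Culiseta inornata,cytochrome oxidase subunit 1, partial (mitochondrion) [Culiseta inornata],N/A",
    "JX260663.1,0,Culiseta melanura,cytochrome oxidase subunit 1, partial (mitochondrion) [Culiseta melanura],N/A",
    "KT111921.1,0,Anopheles albitarsis,cytochrome oxidase subunit1, partial (mitochondrion) [Anopheles albitarsis],N/A",
]

for query, genera in genera_per_query(hits).items():
    print(query, dict(genera))
```

A tally like this makes it quick to spot a query whose hits fall outside the expected mosquito genera.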
Information about viral isolates, localities, and dates was readily available on the CT Agricultural Experiment Station’s website (http://www.ct.gov/caes). Although the data were easy to find, they were not easy to download in a form we could readily use in R. We decided to analyze data from 2012 through 2015. Weather plays a role in the abundance of mosquitoes during collecting. For instance, a rainy day may lead to fewer mosquitoes being collected. To help minimize that bias, we chose to look at several different years.
To map some of the factors we wanted to investigate further, we gathered shapefiles that contained information on state/town borders and bedrock. The CT shapefiles and bedrock files came from the CT GIS data at MAGIC (http://magic.lib.uconn.edu/connecticut_data.html). We gathered information about human population density in Connecticut from the website of Connecticut’s Department of Economic and Community Development (http://www.ct.gov/ecd/). For data on precipitation, we looked at WorldClim (http://www.worldclim.org).
To visualize our data, we used R. Using packages such as ape, ggtree, ggmap, and ggplot2, we were able to customize phylogenies. We also used a combination of graphs and tables while looking for patterns. R packages such as ggmap, rgdal, dismo, and ggplot2 were used for creating maps.
To create the phylogenies, we used SplitsTree4 (version 4.14.4), PAUP* (version 4.0b10), FastTree (version 2.1.3), and RAxML (version 8.3.17). We changed the parameters for several of the applications we used. Many of them gave very odd results, such as several polytomies. Here we present and delve into detail on three of the trees that appear to be most probable.
SplitsTree
PAUP
We decided to explore PAUP (version 4.0b10) next in order to create neighbor-joining trees, and we used heuristic tree searching in an attempt to find more patterns. Although species grouped together into distinct clades, branching patterns within clades varied. However, JX260666.1S continued to stand out. In order to determine the best model of evolution, we ran an Akaike information criterion (AIC) test and built a phylogeny based on the results. The parameters we used to build a neighbor-joining tree and to perform a heuristic search, along with the results of the AIC test, are shown below. These results led us to look at the sequences further.
paup> log file=culex_AIC_out.txt start replace
#Creates a log of the current session
paup> tonexus from=culex.fasta to=culex.nex dataType=nucleotide format=fasta
#Converts FASTA to Nexus
paup> exe culex.nex
#Executes nexus file
paup> outgroup KT111921.1A
#Sets KT111921.1A as the outgroup
paup> set criterion=likelihood
#Sets tree searching criterion
paup> nj
#Creates a neighbor-joining tree
paup> savetree brlens=yes file=culex_nj.tre
#Save tree
paup> hsearch
#Does a heuristic search
paup> savetree brlens=yes file=culex_hsearch.tre
#Save tree
paup> automodel
#Performs AIC and BIC test
paup> lset nst=6 rclass=(abcdef) rmatrix=(0.12358346 123.41206 126.79993 0.25074559 198.30319) basefreq=(0.27876607 0.15353608 0.16001515) pinv=0.6325611
#Parameters from best model from AIC
paup> bootstrap nreps=100
#bootstrap analysis
paup> savetree brlens=yes file=culex_AIC.tre
#Save to file
paup> log stop
#Stop logging session
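For readers unfamiliar with the criterion, AIC scores a model as 2k - 2 ln L, where k is the number of free parameters and ln L is the log-likelihood. The model with the lowest score is preferred. The Python sketch below only illustrates that arithmetic. The log-likelihoods are made-up numbers, not values from our PAUP run, and the parameter counts cover substitution-model parameters only, on the assumption that all models are compared on the same tree.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical candidates: name -> (ln L, number of free model parameters).
# The ln L values are invented for illustration only.
candidates = {
    "JC": (-2100.0, 0),     # equal rates, equal base frequencies
    "HKY+I": (-2000.0, 5),  # 1 rate ratio + 3 base frequencies + pinv
    "GTR+I": (-1990.0, 9),  # 5 rates + 3 base frequencies + pinv
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(name, score)
print("preferred model:", best)
```

In the lset command above, nst=6 with an estimated rmatrix, estimated base frequencies, and pinv corresponds to a six-rate model with a proportion of invariable sites, which is the GTR+I shape in this toy comparison.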
Neighbor-joining tree
We also used RAxML and FastTree to investigate further the differences we saw. Commands and results for these two programs are included in the visualization section below.
To visualize the trees in an easy-to-follow way, we highlighted and labeled the species groups using R. Loading the tree into ggtree was a bit difficult, and the method we found best was to use Newick format. The Newick output from our analyses originally did not have Anopheles as the outgroup, so we used FigTree to reroot the tree and export it in Newick format. To label or highlight a clade, you need to know the number of the node that forms that clade. We used the package ape to get node values.
Libraries needed to build the tree
library("ape")
library("ggplot2")
library("ggtree")
Visualizing the tree using AIC’s best model
#Read Tree in Newick format (copied and pasted from output file):
AIC_NWK <- read.tree(text="(((JX260666.1S:0.070371,(((JX260665.1S:0.004864,JX260667.1S:0.004263):0.022939,JX260664.1S:0.004431):0.008176,JX260663.1S:0.007523):0.088486):0.026309,((KU877005.1P:0.0,KU877019.1P:0.0,KU877003.1P:0.0,KU876994.1P:0.0,KU876959.1P:0.0):0.048345,(JX259915.1R:0.001499,(JX259914.1R:0.0,JX259913.1R:0.0,(JX259912.1R:0.0,JX259911.1R:0.001499):0.001499):0.001499):0.031989):0.016595):0.130208,KT111921.1A:0.0);")
AIC <-ggtree(AIC_NWK) #Name tree
AIC_nodes <- AIC + geom_text2(aes(subset=!isTip, label=node), hjust=-.3) #Get node values to figure out clades for later
AIC_nodes #View tree with nodes numbered
#Customize tree to final output
AIC_tree <- ggtree(AIC_NWK,size=.5) + #this modifies the thickness of the branches
geom_treescale(0,16) + #Adds scale
geom_tiplab() + #Adds tip labels
geom_cladelabel(node=19, label="Cx.salinarius", align=TRUE, offset=.09,color="blue") + #Adds blue clade label
geom_cladelabel(node=24, label="Cx.pipiens", align=TRUE, offset=.09,color="red") + #Adds red clade label
geom_cladelabel(node=25, label="Cx.restuans", align=TRUE, offset=.09,color="dark green") + #Adds green clade label
geom_hilight(node=19, fill="#0101DF", alpha=.5, extend=.08) + #Highlights salinarius clade blue
geom_hilight(node=24, fill="red", alpha=.4, extend=.08) + #Highlights pipiens clade red
geom_hilight(node=25, fill="darkgreen", alpha=.5, extend=.08) + #Highlights restuans clade green
xlim(0,.5) #sets x limits so tree fits in window
AIC_tree #View tree
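The node numbers passed to geom_cladelabel and geom_hilight (19, 24, and 25) can be cross-checked outside R. The Python sketch below assumes the numbering convention that ape appears to follow when reading Newick: tips are numbered first, in order of appearance, and internal nodes are then numbered from the root onward in the order their opening parentheses appear. The function names clade_members and clade_node are hypothetical, and the tree is the AIC topology with branch lengths dropped for brevity. The numbered plot in R remains the authoritative check.

```python
import re

PUNCT = set("():,;")


def clade_members(newick):
    """Map each internal node number to the tip labels it contains."""
    text = re.sub(r"\[[^\]]*\]", "", newick)  # drop bracketed annotations
    tokens = re.findall(r"[():,;]|[^():,;\s]+", text)
    # A label is a tip only if it directly follows '(' or ','.
    n_tips = sum(
        1 for i, t in enumerate(tokens)
        if t not in PUNCT and tokens[i - 1] in ("(", ",")
    )
    next_node = n_tips  # internal nodes start at n_tips + 1 (the root)
    stack, members, prev = [], {}, None
    for tok in tokens:
        if tok == "(":
            next_node += 1
            members[next_node] = []
            stack.append(next_node)
        elif tok == ")":
            stack.pop()
        elif tok not in PUNCT and prev in ("(", ","):
            for node in stack:  # the tip belongs to every open clade
                members[node].append(tok)
        prev = tok
    return members


def clade_node(members, suffix):
    """Smallest clade holding every tip whose label ends with suffix."""
    all_tips = members[min(members)]  # the root contains all tips
    wanted = {tip for tip in all_tips if tip.endswith(suffix)}
    holders = [n for n, tips in members.items() if wanted <= set(tips)]
    return min(holders, key=lambda n: len(members[n]))


aic_topology = (
    "(((JX260666.1S,(((JX260665.1S,JX260667.1S),JX260664.1S),JX260663.1S)),"
    "((KU877005.1P,KU877019.1P,KU877003.1P,KU876994.1P,KU876959.1P),"
    "(JX259915.1R,(JX259914.1R,JX259913.1R,(JX259912.1R,JX259911.1R))))),"
    "KT111921.1A);"
)
members = clade_members(aic_topology)
print(clade_node(members, "S"), clade_node(members, "P"), clade_node(members, "R"))
# 19 24 25
```

Because the species letter is the last character of every tip label, the suffix alone is enough to pick out each species group.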
The tree built with the best model chosen by AIC agrees with the relationships we would expect based on morphological differences. We wanted to explore this a bit further to see if we get the same relationships!
Visualizing the tree from FastTree
We wanted to test different types of tree building to see if we got the same results. One of the programs we used was FastTree (version 2.1.3). With FastTree, you can import alignments and get a tree quickly without using a lot of memory. It infers approximately maximum-likelihood phylogenies. We followed similar protocols when visualizing the tree. Some of the branching patterns were different, so we had to adjust some parameters accordingly. We used FastTree installed on UConn’s cluster (see the BLAST script for parameter explanations) with the following script:
#!/bin/bash
#$ -N blast
#$ -M Katherine.nazario@uconn.edu
#$ -m bea
#$ -S /bin/bash
#$ -cwd
#$ -pe smp 4
#$ -o blast_$JOB_ID.out
#$ -e blast_$JOB_ID.err
FastTree culex.fasta > culex_FastTree.tre
#Read Tree in Newick format (copied and pasted from output file):
fasttree_NWK <- read.tree(text="(((JX260666.1S:0.05943,((JX260665.1S:0.00549,JX260667.1S:0.00391)[&label=0.963]:0.01412,(JX260664.1S:0.00568,JX260663.1S:0.01439)[&label=0.317]:0.00881)[&label=1.0]:0.05985)[&label=0.88]:0.02424,((KU877005.1P:0.0,KU877019.1P:0.0,KU877003.1P:0.0,KU876994.1P:0.0,KU876959.1P:0.0):0.03847,(JX259915.1R:0.00152,((JX259914.1R:0.0,JX259913.1R:0.0):5.4E-4,(JX259912.1R:5.3E-4,JX259911.1R:0.0013)[&label=0.103]:0.00152)[&label=0.927]:0.00131)[&label=0.999]:0.03344)[&label=0.766]:0.01643):0.048275,KT111921.1A:0.048275);")
fasttree <-ggtree(fasttree_NWK) #Name tree
fasttree_nodes <- fasttree + geom_text2(aes(subset=!isTip, label=node), hjust=-.3) #Get node values to figure out clades for later
fasttree_nodes #View tree with nodes numbered
#Customize tree to final output
fasttree_tree <- ggtree(fasttree_NWK,size=.5) + #this modifies the thickness of the branches
geom_treescale(0,16) + #Adds scale to top left corner
geom_tiplab() + #Adds tip labels
geom_cladelabel(node=19, label="Cx.salinarius", align=TRUE, offset=.09,color="blue") + #Adds blue clade label
geom_cladelabel(node=24, label="Cx.pipiens", align=TRUE, offset=.09,color="red") + #Adds red clade label
geom_cladelabel(node=25, label="Cx.restuans", align=TRUE, offset=.09,color="dark green") + #Adds green clade label
geom_hilight(node=19, fill="#0101DF", alpha=.5, extend=.08) + #Highlights salinarius clade blue
geom_hilight(node=24, fill="red", alpha=.4, extend=.08) + #Highlights pipiens clade red
geom_hilight(node=25, fill="darkgreen", alpha=.5, extend=.08) + #Highlights restuans clade green
xlim(0,.3) #sets x limits so tree fits in window
fasttree_tree #View tree
This tree illustrates similar results to our AIC best model tree. There are some differences within species, but Cx. pipiens and Cx. restuans are still more closely related to each other than either is to Cx. salinarius.
Visualizing the tree from RAxML
The last tree-building software we tested was RAxML (version 8.3.17). In many grad seminars, trees are built using RAxML, so we decided to give it a shot. However, with RAxML, the input file needs to be in PHYLIP format, not Nexus. We formatted this manually. Starting from the Nexus file we had produced via PAUP, we deleted everything except the matrix. As the first line, we added the number of taxa and the number of characters (16 658). The first few lines of the PHYLIP file and the RAxML parameters we used when analyzing the tree on the UConn cluster were:
#!/bin/bash
#$ -N blast
#$ -M Katherine.nazario@uconn.edu
#$ -m bea
#$ -S /bin/bash
#$ -cwd
#$ -pe smp 4
#$ -o blast_$JOB_ID.out
#$ -e blast_$JOB_ID.err
raxml -m GTRGAMMA -s culex.phylip -n culex_RAxML -o KT111921.1A -N 100 -p 1616 -e 0.00001 -f a -x 1616
-m = model of evolution
-s = input phylip file
-n = output names
-x = Bootstrap Random Number Seed
-e = precision with which model parameters will be estimated
-o = outgroup
-N = number of runs
-p = pseudorandom number seed
-f a = sets up combination of bootstrapping and ML searching
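The manual Nexus-to-PHYLIP step described above could also be scripted directly from the FASTA file. Here is a minimal Python sketch, not what we actually ran. It writes a relaxed sequential PHYLIP layout (a header line with the number of taxa and characters, then one "name sequence" line per taxon), on the assumption that RAxML accepts that layout. The function name fasta_to_phylip is hypothetical and the sequences are shortened toy data.

```python
def fasta_to_phylip(fasta_lines):
    """Convert aligned FASTA lines to relaxed sequential PHYLIP lines."""
    records = []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif line and records:
            records[-1][1] += line  # sequences may span several lines
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        # PHYLIP needs an alignment: every sequence must be the same length.
        raise ValueError("sequences differ in length: %s" % sorted(lengths))
    out = ["%d %d" % (len(records), lengths.pop())]
    out += ["%s %s" % (name, seq) for name, seq in records]
    return out


# Toy input using two of our tip labels with truncated sequences.
toy = [">KT111921.1A", "AACATT", ">KU877003.1P", "AACA", "TT"]
for row in fasta_to_phylip(toy):
    print(row)
```

For the full data set the header line would read "16 658", matching the line we added by hand. Our tip labels are 11 characters long, which is one reason the sketch uses the relaxed layout instead of the strict 10-character name field.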
#Read Tree in Newick format (copied and pasted from output file):
RAxML_NWK <- read.tree(text="(((((JX260663.1S:0.02397468441109106008,(JX260664.1S:0.01404653975119892796,(JX260667.1S:0.01892817657512024798,JX260665.1S:0.01168702685188493899):0.07573599826928545387):0.02919144044608793995):0.29155137604914332927,JX260666.1S:0.21313723410477683484):0.12881041226579492687,(((JX259913.1R:0.00000041761126378665,JX259914.1R:0.00000041761126378665):0.00000041761126378665,(JX259912.1R:0.00000041761126378665,JX259911.1R:0.00504292530803613845):0.00504288320049779172):0.00496452002556844671,JX259915.1R:0.00517979166678775218):0.10160254100523570531):0.05364671827476297228,(((KU876959.1P:0.00000041761126378665,KU877003.1P:0.00000041761126378665):0.00000041761126378665,(KU877005.1P:0.00000041761126378665,KU877019.1P:0.00000041761126378665):0.00000041761126378665):0.00000041761126378665,KU876994.1P:0.00000041761126378665):0.11877850631174037555):0.23320577545323131763,KT111921.1A:0.23320577545323131763);")
RAxMLtree <-ggtree(RAxML_NWK) #Name tree
RAxMLtree_nodes <- RAxMLtree + geom_text2(aes(subset=!isTip, label=node), hjust=-.3) #Get node values to figure out clades for later
RAxMLtree_nodes #View tree with nodes numbered
#Customize tree to final output
RAxMLtree_tree <- ggtree(RAxML_NWK,size=.5) + #this modifies the thickness of the branches
geom_treescale(0,16) + #Adds scale
geom_tiplab() + #Adds tip labels
geom_cladelabel(node=20, label="Cx.salinarius", align=TRUE, offset=.3,color="blue") + #Adds blue clade label
geom_cladelabel(node=28, label="Cx.pipiens", align=TRUE, offset=.3,color="red") + #Adds red clade label
geom_cladelabel(node=24, label="Cx.restuans", align=TRUE, offset=.3,color="dark green") + #Adds green clade label
geom_hilight(node=20, fill="#0101DF", alpha=.5, extend=.25) + #Highlights salinarius clade blue
geom_hilight(node=28, fill="red", alpha=.4, extend=.25) + #Highlights pipiens clade red
geom_hilight(node=24, fill="darkgreen", alpha=.5, extend=.25) + #Highlights restuans clade green
xlim(0,1.5) #sets x limits so tree fits in window
RAxMLtree_tree #View tree
When we ran RAxML, we received a warning that some of our sequences might be identical. We did not find this too surprising, since some of the individuals were from the same locality. Since we are only using COI, there might not be enough information in this gene to tell these individuals apart. Surprisingly, RAxML grouped Cx. restuans with Cx. salinarius. This is consistent with the hypothesis that the species that are poor vectors for West Nile are more closely related to each other. Which tree is the “true” tree? It might be that we need to consider other factors. Some mosquitoes feed primarily on mammals while others feed primarily on birds. That could be a factor to investigate in the future; we could not find adequate data to look at this ourselves.
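A warning about identical sequences can be checked before running the tree search. Below is a minimal base-R sketch of one way to do that, assuming the alignment is held as a named character vector; the names and sequences here are made-up placeholders, not our accessions.

```r
# Made-up placeholder sequences; the names are hypothetical, not our accessions
seqs <- c(ind1 = "ATGCATGC", ind2 = "ATGCATGC", ind3 = "ATGCATGA", ind4 = "ATGCATGC")

# Group individual names by identical sequence string
groups <- split(names(seqs), seqs)

# Keep only the groups that contain more than one individual
identical_groups <- Filter(function(g) length(g) > 1, groups)
print(identical_groups)
```

Each element of `identical_groups` lists individuals that share exactly the same sequence, which is the situation the tree software cannot resolve into separate branches.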
To visualize the data we collected from the Connecticut Agricultural Experiment Station, we decided to use graphs and tables. The graphs were created using the ggplot2 package. First, we wanted to see whether the patterns illustrated in Andreadis et al. (2001) and (2004), such as a higher abundance of West Nile Virus in Cx. pipiens, still hold. The tables on the Station’s website were not downloadable, and they also contained information about all viruses tested, so we gathered the data we needed into an Excel file. To test the visual appeal, we created graphs with and without the gridlines, following the visualization guidelines of Kelleher & Wagener (2011). We found graphs without the background easier to read. Therefore, after the first graph, all others have the background removed.
Table: Percent of WNV isolates in CT hosted by Culex species
percent <- matrix(c(82,50,76,69,84,6,37,15,10,7,3,4,3,15,0), ncol=3) #Add values
colnames(percent) <- c('%pipiens', "%restuans","%salinarius") #Add column names
rownames(percent) <-c(2012,2013,2014,2015,2016) #Add row names
percents.table <- as.table(percent) #Convert to table
percents.table
## %pipiens %restuans %salinarius
## 2012 82 6 3
## 2013 50 37 4
## 2014 76 15 3
## 2015 69 10 15
## 2016 84 7 0
In this table, we can see that Cx. pipiens accounts for the largest share of West Nile virus isolates in every year of our study. Cx. salinarius generally accounts for the smallest share; the exception is 2015, when it reached 15% compared with 10% for Cx. restuans.
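The reading of the table can be checked directly in R. This is a small sketch using the same values as the table; the object names `top_species` and `salinarius_over_restuans` are introduced here only for illustration.

```r
# Same values as the table above (percent of WNV isolates per species and year)
percent <- matrix(c(82,50,76,69,84, 6,37,15,10,7, 3,4,3,15,0), ncol = 3,
                  dimnames = list(as.character(2012:2016),
                                  c("%pipiens", "%restuans", "%salinarius")))

# Species with the largest share in each year
top_species <- colnames(percent)[apply(percent, 1, which.max)]
names(top_species) <- rownames(percent)
print(top_species)

# Years in which Cx. salinarius had a larger share than Cx. restuans
salinarius_over_restuans <- rownames(percent)[percent[, "%salinarius"] > percent[, "%restuans"]]
print(salinarius_over_restuans)
```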
Graph: Number of West Nile isolates in Connecticut from 2012-2016 with background
WNV_isolates <- read.csv("Mosquito_WNVisolates.csv") #Read table
WNV_isolates #View table
## Year Number.of.West.Nile.isolates Culex.species
## 1 2012 192 pipiens
## 2 2013 45 pipiens
## 3 2014 52 pipiens
## 4 2015 109 pipiens
## 5 2016 103 pipiens
## 6 2012 14 restuans
## 7 2013 33 restuans
## 8 2014 10 restuans
## 9 2015 15 restuans
## 10 2016 8 restuans
## 11 2012 6 salinarius
## 12 2013 4 salinarius
## 13 2014 2 salinarius
## 14 2015 23 salinarius
## 15 2016 0 salinarius
WNV_isolates_graph <-ggplot(data=WNV_isolates,aes(x=Year,y=Number.of.West.Nile.isolates,colour=Culex.species,pch=Culex.species))+ #Define variables
geom_point() + #Plot points
geom_line() + #Plot lines
labs(title="Number of West Nile Isolates in Connecticut from 2012-2016") + #Add title
xlab("Year") + #change x axis label
ylab("Number of WNV isolates") #change y axis label
WNV_isolates_graph #View graph
This graph shows results similar to our table. Cx. pipiens yielded more West Nile isolates than the other two species in every year!
Graph: Number of West Nile isolates in Connecticut from 2012-2016 without background
WNV_isolates <- read.csv("Mosquito_WNVisolates.csv") #Read table
WNV_isolates #View table
## Year Number.of.West.Nile.isolates Culex.species
## 1 2012 192 pipiens
## 2 2013 45 pipiens
## 3 2014 52 pipiens
## 4 2015 109 pipiens
## 5 2016 103 pipiens
## 6 2012 14 restuans
## 7 2013 33 restuans
## 8 2014 10 restuans
## 9 2015 15 restuans
## 10 2016 8 restuans
## 11 2012 6 salinarius
## 12 2013 4 salinarius
## 13 2014 2 salinarius
## 14 2015 23 salinarius
## 15 2016 0 salinarius
WNV_isolates_graph <-ggplot(data=WNV_isolates, aes(x=Year,y=Number.of.West.Nile.isolates,colour=Culex.species))+ #Define variables
geom_point() + #Plot points
geom_line() + #Plot lines
labs(title="Number of West Nile Isolates in Connecticut from 2012-2016") + #Add title
xlab("Year") + #change x axis label
ylab("Number of WNV isolates") + #change y axis label
theme(axis.line = element_line(colour = "black")) + #Add line on X and Y axis
theme(panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.border =element_blank(),panel.background = element_blank()) #Remove gridlines
WNV_isolates_graph #View graph
Graph: Percent of West Nile isolates in Connecticut from 2012-2016
WNV_percents <- read.csv("Mosquito_WNVPercents.csv") #Read table
WNV_percents #View table
## Year Culex.species percent X
## 1 2012 pipiens 0.82 NA
## 2 2013 pipiens 0.50 NA
## 3 2014 pipiens 0.76 NA
## 4 2015 pipiens 0.69 NA
## 5 2016 pipiens 0.84 NA
## 6 2012 restuans 0.06 NA
## 7 2013 restuans 0.37 NA
## 8 2014 restuans 0.15 NA
## 9 2015 restuans 0.10 NA
## 10 2016 restuans 0.07 NA
## 11 2012 salinarius 0.03 NA
## 12 2013 salinarius 0.04 NA
## 13 2014 salinarius 0.03 NA
## 14 2015 salinarius 0.15 NA
## 15 2016 salinarius 0.00 NA
WNV_isolates_graph <-ggplot(data=WNV_percents, aes(x=Year,y=percent,colour=Culex.species))+ #Define variables
geom_point() + #Plot points
geom_line() + #Plot lines
labs(title="Percent of West Nile Isolates in Connecticut from 2012-2016") + #Add title
xlab("Year") + #change x axis label
ylab("Percent of WNV isolates")+ #change y axis label
theme(axis.line = element_line(colour = "black")) + #Add line on X and Y axis
theme(panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.border =element_blank(),panel.background = element_blank()) #Remove gridlines
WNV_isolates_graph #View graph
This is the same data as our table, but in a graph. Cx. pipiens stands out when the data are graphed.
Graph: Connecticut Human Population Density vs WNV isolates in Culex pipiens with statistics
Density <- read.csv("Mosquito_WNVPercents_pipiens.csv")
Density_graph <-ggplot(data=Density, aes(x=Density,y=percent)) + #Define variables
geom_bar(stat="identity")+ #Plots bars
labs(title="Human Population Density and West Nile isolates in Culex pipiens") +#Add Title
xlab("CT human population density (in meters squared)") + #change x axis label
ylab("Percent of WNV isolates for Culex pipiens") + #change y axis label
theme(axis.line = element_line(colour = "black")) + #Add line on X and Y axis
theme(panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.border =element_blank(),panel.background = element_blank()) #Remove gridlines
Density_graph #View graph
cor.test(Density$Density,Density$percent)
##
## Pearson's product-moment correlation
##
## data: Density$Density and Density$percent
## t = -1.3023, df = 3, p-value = 0.2838
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.9692982 0.5987856
## sample estimates:
## cor
## -0.6009622
In this graph, we looked at whether the percent of West Nile virus isolates found in Cx. pipiens is correlated with human population density in Connecticut. The p-value from the correlation test (0.2838) was greater than 0.05, so the result is not significant and we can’t reject the null hypothesis of no correlation. With only five data points (df = 3), the test also has little power, which shows in the very wide confidence interval. In the graph, the percent of West Nile virus isolates found in Cx. pipiens varies quite a bit even though human population density does not change much. Even before running the correlation test, we did not expect a correlation from looking at the graph.
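To make the link between the correlation and the p-value explicit, the test statistic can be recomputed by hand from the values reported in the output above. This is a sketch of the standard t-test for a Pearson correlation; the variable names are ours.

```r
# Values reported by cor.test() above
r  <- -0.6009622   # sample correlation
df <- 3            # degrees of freedom (n - 2, so n = 5 data points)

# t statistic for the null hypothesis that the true correlation is 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with df degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df)

print(round(t_stat, 4))
print(round(p_value, 4))
```

These should agree, up to rounding, with the t and p-value in the cor.test() output. Because the statistic scales with the square root of the degrees of freedom, even a moderately strong correlation cannot reach significance with so few data points.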
Graph: Weekly WNV isolates in Culex for 2016
Weekly <- read.csv("Mosquito_Localities.csv")
Weekly
## Week cases species
## 1 1 0 pipiens
## 2 2 0 pipiens
## 3 3 0 pipiens
## 4 4 0 pipiens
## 5 5 1 pipiens
## 6 6 0 pipiens
## 7 7 0 pipiens
## 8 8 12 pipiens
## 9 9 3 pipiens
## 10 10 21 pipiens
## 11 11 12 pipiens
## 12 12 12 pipiens
## 13 13 11 pipiens
## 14 14 3 pipiens
## 15 15 18 pipiens
## 16 16 10 pipiens
## 17 17 4 pipiens
## 18 18 1 pipiens
## 19 19 0 pipiens
## 20 20 0 pipiens
## 21 1 0 restuans
## 22 2 0 restuans
## 23 3 0 restuans
## 24 4 0 restuans
## 25 5 0 restuans
## 26 6 0 restuans
## 27 7 0 restuans
## 28 8 1 restuans
## 29 9 0 restuans
## 30 10 3 restuans
## 31 11 0 restuans
## 32 12 2 restuans
## 33 13 1 restuans
## 34 14 0 restuans
## 35 15 0 restuans
## 36 16 0 restuans
## 37 17 0 restuans
## 38 18 0 restuans
## 39 19 0 restuans
## 40 20 0 restuans
## 41 1 0 salinarius
## 42 2 0 salinarius
## 43 3 0 salinarius
## 44 4 0 salinarius
## 45 5 0 salinarius
## 46 6 0 salinarius
## 47 7 0 salinarius
## 48 8 0 salinarius
## 49 9 0 salinarius
## 50 10 0 salinarius
## 51 11 1 salinarius
## 52 12 0 salinarius
## 53 13 0 salinarius
## 54 14 0 salinarius
## 55 15 0 salinarius
## 56 16 0 salinarius
## 57 17 0 salinarius
## 58 18 0 salinarius
## 59 19 0 salinarius
## 60 20 0 salinarius
weekly_graph <-ggplot(data=Weekly, aes(x=Week,y=cases,group=species, colour=species)) + #Define variables
geom_line() + #Plot line
geom_point() + #Plot points
labs(title="Weekly WNV Results 2016") + #Add title
xlab("Collection Week") + #Change x label
ylab("Number of WNV isolates") + #Change y label
theme(axis.line = element_line(colour = "black")) + #Add line on X and Y axis
theme(panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.border =element_blank(),panel.background = element_blank()) #Remove gridlines
weekly_graph #View graph
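We were only able to get weekly data for 2016 from the CT Ag. Station’s website. Nearly all isolates fall between collection weeks 8 and 17, and almost all of them come from Cx. pipiens. This is very similar to the findings of Andreadis et al. over a decade before!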
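The seasonal pattern can be summarized with a few lines of base R. The sketch below uses the weekly Cx. pipiens counts copied from the table above; the summary variable names are ours.

```r
# Weekly Cx. pipiens isolate counts for 2016, copied from the table above
pipiens_cases <- c(0,0,0,0,1,0,0,12,3,21,12,12,11,3,18,10,4,1,0,0)
weeks <- seq_along(pipiens_cases)

peak_week <- weeks[which.max(pipiens_cases)]     # week with the most isolates
active_weeks <- range(weeks[pipiens_cases > 0])  # first and last week with any isolate
share_mid_season <- sum(pipiens_cases[8:17]) / sum(pipiens_cases)  # share in weeks 8 to 17

print(peak_week)
print(active_weeks)
print(round(share_mid_season, 2))
```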
This aspect of the project was conducted on a PC operating on Windows 10.
Libraries needed to build the maps
library(ggmap)
library(rgdal)
library(dismo)
library(ggplot2)
From this study, we were able to determine that 15 years after Andreadis et al. (2001), Culex pipiens continues to be the main vector for West Nile Virus in Connecticut. Several patterns, such as seasonal abundance, follow the same trends today. Of the three Culex species, Culex pipiens carries WNV most often.
combined_tree<-multiplot(AIC_tree,fasttree_tree,RAxMLtree_tree,ncol=1,labels=c("1","2","3")) #Stack the three trees in one column with numbered labels
#Tree 1 = AIC
#Tree 2 = FastTree
#Tree 3 = RAxML
Miller et al. (1996) Phylogeny
They had a much larger variety of samples than we did. They also used a different data type (the internal transcribed spacers of ribosomal DNA rather than COI). It may be that, for this group, COI does not have enough informative sites to resolve the tree. If Cx. restuans and Cx. salinarius really are more closely related, and given that neither is a good vector of the virus, there might be a genetic component to vector competence. If we had genomes for each of the species, this would have been a great topic to explore more. We also wanted to try out jModelTest, a statistical tool that determines best-fit models of nucleotide substitution. However, we could not get the program to work, either on the UConn cluster or locally on our computers.
As we looked through a lot of data and shapefiles, we realized that we weren’t able to answer some of our questions with the data that is publicly available. First, environmental factors such as precipitation and humidity didn’t have much of a gradient over the state. It is probably just too small an area to show much of a difference across it, so we weren’t able to use those parameters to create maps. Although we were able to use bedrock data over CT, we did not see a relationship. We had a difficult time trying to add human population density to the CT shapefiles that we had available, so we weren’t able to answer that question with a map, but we were able to create plots that represented this data. After doing a correlation test, it looks as though human population density and West Nile Virus isolates are not correlated. This may be due to a bias in the data: mosquitoes are only collected in localities where there are people. Also, the summer is full of travel, so a yearly population census might not be appropriate for analyzing this.
Are Cx. restuans and Cx. pipiens more closely related to each other than to Cx. salinarius? Maybe
With the conflicting results we got when building the phylogenies, we cannot say for certain which species are most closely related. Previous researchers claim Cx. restuans and Cx. salinarius are more closely related. If multiple genes are used or genomes are assembled, this data can be improved. These relationships are highly important to resolve. Knowing them could help improve mosquito repellents that target specific species. It could also be useful when determining where West Nile virus can spread.
In the future, it would be good to have genomes for each of the species. An alternative, had the data been available, would have been to look at sequences from each of the species from different localities. If we had collected mosquito data from localities other than the ones the Ag. Station looks at, would we have seen the same results? It would also have been very interesting to look at which organisms each of the species feeds on.
Anderson, J., Andreadis, T., Main, A., & Kline, D. (2004). Prevalence of West Nile virus in tree canopy-inhabiting Culex pipiens and associated mosquitoes. The American Journal of Tropical Medicine and Hygiene, 71(1), 112-9.
Andreadis, T., Anderson, J., & Vossbrinck, C. (2001). Mosquito surveillance for West Nile virus in Connecticut, 2000: Isolation from Culex pipiens, Cx. restuans, Cx. salinarius, and Culiseta melanura. Emerging Infectious Diseases, 7(4), 670-674.
Andreadis, T.G., Anderson, J.F., Vossbrinck, C.R., & Main, A.J. (2004). Epidemiology of West Nile virus in Connecticut: A five-year analysis of mosquito data 1999-2003. Vector Borne and Zoonotic Diseases, 4(4), 360-378.
Diuk-Wasser, M. A., Brown, H. D., Fish, D. G., & Andreadis, T. (2006). Modeling the spatial distribution of mosquito vectors for West Nile virus in Connecticut, USA. Vector-Borne and Zoonotic Diseases, 6(3), 283-295.
Miller, B. R., Crabtree, M. B., & Savage, H. M. (1996). Phylogeny of fourteen Culex mosquito species, including the Culex pipiens complex, inferred from the internal transcribed spacers of ribosomal DNA. Insect Molecular Biology, 5, 93-107.
Molaei, G. G., Andreadis, T. M., Armstrong, P., & Diuk-Wasser, M. (2008). Host-feeding patterns of potential mosquito vectors in Connecticut, USA: Molecular analysis of bloodmeals from 23 species of Aedes, Anopheles, Culex, Coquillettidia, Psorophora, and Uranotaenia. Journal of Medical Entomology, 45(6), 1143-1151.