Project 2 - West Nile Virus
https://www.cdc.gov/westnile/statsmaps/cumMapsData.html
For this analysis I used Victoria McEleney’s suggested data sets on West Nile virus by state [1].
There are two tables here with the cases each year in a separate column:
West Nile virus disease cases reported to CDC by state of residence, 1999-2019 [2]
West Nile virus neuroinvasive disease cases reported to CDC by state of residence, 1999-2019 [3]
The totals per state & per year are already calculated, but the means are yet to be calculated.
The year columns could be pivoted longer and the 2 tables could be combined. Percent of positive cases that developed into neuroinvasive disease could be calculated (per year / per state).
Setup
Load Libraries
library(stringr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
library(readr)
Load Data
westNileCasesUrl <- 'https://raw.githubusercontent.com/nolivercuny/data607/master/project2/datasets/westnile/West-Nile-virus-disease-cases-by-state_1999-2019-P.csv'
westNileCasesRaw <- read.csv(westNileCasesUrl, sep = " ", skip = 1)
westNileNeuroUrl <- 'https://raw.githubusercontent.com/nolivercuny/data607/master/project2/datasets/westnile/West-Nile-virus-neuroinvasive-disease-cases-by-state_1999-2019-P.csv'
westNileNeuroRaw <- read.csv(westNileNeuroUrl, sep = " ", skip = 1)
Tidy
westNileCasesDf <- westNileCasesRaw %>%
  as.data.frame()
westNileNeuroDf <- westNileNeuroRaw %>%
  as.data.frame()
Fix the District of Columbia rows first, because it is the only name made of three words, so the space-delimited read spreads it across three columns.
fix_dc <- function(df){
  # District of Columbia is at rows 9 and 78
  dc <- df[9,] %>% unite(State, 1:3, sep = " ")
  df[9,]$State <- dc[1,1]
  df[78,]$State <- dc[1,1]
  # Copy the first two values of the following row into columns 2 and 3
  df[9,2:3] <- df[10,1:2]
  df[78,2:3] <- df[79,1:2]
  df
}
westNileCasesDf <- fix_dc(westNileCasesDf)
westNileNeuroDf <- fix_dc(westNileNeuroDf)
fix_states <- function(df){
  # Hard-coded rows that hold two-word state names;
  # there is probably a clever way to find these
  statesWithSpaces <- c(31,33,35,37,39,41,47,49,51,53,61,100,102,104,106,108,110,116,118,120,122,130)
  for (index in statesWithSpaces) {
    # Rebuild the name from the first two columns
    dc <- df[index,] %>% unite(State, 1:2, sep = " ")
    df[index,]$State <- dc[1,1]
    # Copy the first value of the following row into column 2
    df[index,2] <- df[index+1,1]
  }
  df
}
westNileCasesDf <- fix_states(westNileCasesDf)
westNileNeuroDf <- fix_states(westNileNeuroDf)
Filter out the rows with bad or meaningless data, that is, rows whose State value is really a number. I had to use gsub because in some rows the number contained a comma, which meant it was parsed as NA and the row didn’t get filtered. Whitespace is then trimmed from every character column with across(where(is.character), str_trim).
casesDf <- westNileCasesDf %>%
  filter(is.na(as.numeric(gsub(",", "", State)))) %>%
  mutate(across(where(is.character), str_trim))
## Warning in mask$eval_all_filter(dots, env_filter): NAs introduced by coercion
nueroDf <- westNileNeuroDf %>%
  filter(is.na(as.numeric(gsub(",", "", State)))) %>%
  mutate(across(where(is.character), str_trim))
## Warning in mask$eval_all_filter(dots, env_filter): NAs introduced by coercion
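The row numbers in fix_dc and fix_states are hard-coded, so they only work if both files break their rows in exactly the same places. In the results table further down, most of the NA neuroinvasive means belong to states with multi-word names or their alphabetical neighbors, which may be a sign that the hard-coded rows do not line up for that file. As a possible alternative, here is a minimal sketch that separates the state name from the counts with a regular expression. It assumes the raw file is read as whole lines of text (for example with readLines) and that each data line is a name followed by numeric fields; the sample lines and the names raw_lines, parts, state, and counts are made up for illustration and do not come from the CDC files.

```r
library(stringr)

# Hypothetical lines shaped like the space-delimited text; not real CDC data
raw_lines <- c(
  "California 2 1,234 56",
  "New York 0 12 7",
  "District of Columbia 1 0 3"
)

# Group 1: the shortest leading text that is followed only by numeric fields
# Group 2: the numeric fields (digits and thousands separators)
parts <- str_match(raw_lines, "^(.+?)\\s+((?:[0-9,]+\\s*)+)$")
state <- parts[, 2]

# Split the numeric part on whitespace and strip commas before converting
counts <- lapply(
  str_split(str_trim(parts[, 3]), "\\s+"),
  function(x) as.numeric(gsub(",", "", x))
)
names(counts) <- state
```

With this approach a name of any length ends up in one string, so no per-row fix is needed for one-, two-, or three-word names.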
Each data frame still holds two blocks of the table stacked on top of each other. I need to split the data frame and recombine the two blocks side by side, by state.
recombine <- function(df) {
  df_one <- df[1:52,]
  df_two <- df[57:109,]
  # The first row of the second block holds its column names
  names(df_two) <- df_two[1,]
  df_two <- df_two[-c(1),]
  bind_cols(df_one, df_two) %>%
    subset(select = -c(`State...13`))
}
finalCasesDf <- recombine(casesDf) %>%
  pivot_longer(-State...1) %>%
  mutate(value = as.numeric(gsub(",", "", value)))
## New names:
## * State -> State...1
## * State -> State...13
finalNeuroDf <- recombine(nueroDf) %>%
  pivot_longer(-State...1) %>%
  mutate(value = as.numeric(gsub(",", "", value)))
## New names:
## * State -> State...1
## * State -> State...13
## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
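pivot_longer(-State...1) keeps every column except the state name, so if the recombined table still carries the CDC total column, that total is averaged together with the yearly counts in the Analysis section. Every mean in the results table is a multiple of 1/11 rather than 1/21, which is consistent with 22 values per state (21 years plus a total) being averaged, although this has not been checked against the raw columns. Below is a minimal sketch, on made-up data, of dropping such a column before pivoting and skipping NA values in the mean. The column name Total, the sample numbers, and the names wide, long, and yearly_mean are assumptions.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table; "Total" is an assumed column name, values are invented
wide <- tibble(
  State  = c("A", "B"),
  `2018` = c(10, 4),
  `2019` = c(20, NA),
  Total  = c(30, 4)
)

# Drop the total, then pivot the year columns into (year, cases) pairs
long <- wide %>%
  select(-Total) %>%
  pivot_longer(-State, names_to = "year", values_to = "cases")

# Mean yearly cases per state, ignoring missing years
yearly_mean <- long %>%
  group_by(State) %>%
  summarize(mean_cases = mean(cases, na.rm = TRUE))
```

Whether na.rm = TRUE is appropriate depends on why a value is missing; if the NA comes from a parsing problem rather than a truly missing count, fixing the parse is the better remedy.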
Analysis
sumCases <- finalCasesDf %>%
  group_by(State...1) %>%
  summarize(m = mean(value)) %>%
  arrange(desc(m))
sumNeuro <- finalNeuroDf %>%
  group_by(State...1) %>%
  summarize(m = mean(value)) %>%
  arrange(desc(m))
final <- inner_join(sumCases, sumNeuro, by = "State...1")
kable(final, caption = "Mean West Nile By State", format = "html") %>% kable_styling("striped") %>% scroll_box(width = "100%")

| State...1 | m.x (mean cases) | m.y (mean neuroinvasive cases) |
|---|---|---|
| California | 638.7272727 | 372.0000000 |
| Colorado | 513.4545455 | 125.6363636 |
| Texas | 508.1818182 | 308.1818182 |
| Nebraska | 363.6363636 | 72.6363636 |
| Illinois | 242.0000000 | 154.6363636 |
| South Dakota | 237.5454545 | NA |
| Arizona | 175.2727273 | 110.7272727 |
| North Dakota | 174.2727273 | NA |
| Louisiana | 167.3636364 | 101.2727273 |
| Mississippi | 131.0000000 | 71.7272727 |
| Idaho | 125.2727273 | 21.0909091 |
| Michigan | 119.8181818 | 101.1818182 |
| Ohio | 95.0909091 | 67.3636364 |
| New York | 88.4545455 | NA |
| Minnesota | 75.0000000 | 30.8181818 |
| Oklahoma | 74.4545455 | 45.3636364 |
| Wyoming | 69.1818182 | 16.7272727 |
| Indiana | 65.8181818 | 38.7272727 |
| Pennsylvania | 63.2727273 | NA |
| Kansas | 61.5454545 | 31.5454545 |
| New Mexico | 59.2727273 | NA |
| Montana | 56.4545455 | 17.1818182 |
| Iowa | 56.2727273 | 28.3636364 |
| Missouri | 56.2727273 | 42.4545455 |
| Georgia | 44.2727273 | 26.9090909 |
| Utah | 41.2727273 | 18.1818182 |
| Nevada | 38.8181818 | NA |
| Florida | 36.7272727 | 27.9090909 |
| Maryland | 35.6363636 | 24.0909091 |
| Alabama | 33.2727273 | 22.0909091 |
| Arkansas | 31.9090909 | 23.9090909 |
| Wisconsin | 30.7272727 | 20.5454545 |
| Tennessee | 30.0000000 | 22.0909091 |
| New Jersey | 29.1818182 | NA |
| Virginia | 20.9090909 | 14.9090909 |
| Massachusetts | 18.7272727 | 14.3636364 |
| Kentucky | 17.6363636 | 12.0909091 |
| Oregon | 17.1818182 | 4.1818182 |
| Connecticut | 14.3636364 | 9.7272727 |
| Washington | 10.6363636 | NA |
| South Carolina | 9.7272727 | NA |
| North Carolina | 7.4545455 | NA |
| Delaware | 4.7272727 | NA |
| West Virginia | 2.1818182 | NA |
| Rhode Island | 2.0000000 | NA |
| Vermont | 1.4545455 | 0.7272727 |
| New Hampshire | 0.6363636 | NA |
| Maine | 0.3636364 | 0.2727273 |
| Alaska | 0.1818182 | 0.0909091 |
| Hawaii | 0.0909091 | 0.0000000 |
| Puerto Rico | 0.0909091 | NA |
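
The percent of cases that developed into neuroinvasive disease, mentioned in the introduction, can be derived from the two mean columns, since the ratio of the means equals the ratio of the totals when both are averaged over the same number of values. Here is a minimal sketch using two rows copied from the table above; the column name State (in place of State...1) and the names final_sample, pct, and pct_neuro are illustrative.

```r
library(dplyr)

# Two rows copied from the results table:
# m.x = mean cases, m.y = mean neuroinvasive cases
final_sample <- tibble(
  State = c("California", "Colorado"),
  m.x   = c(638.7272727, 513.4545455),
  m.y   = c(372.0000000, 125.6363636)
)

# Share of reported cases that were neuroinvasive, as a percentage
pct <- final_sample %>%
  mutate(pct_neuro = 100 * m.y / m.x) %>%
  arrange(desc(pct_neuro))
```

States whose neuroinvasive mean is NA would get an NA percentage as well, so those rows need to be resolved before the full table can be ranked this way.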
Conclusions
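Based on my analysis, California and Texas, the two most populous states, rank first and third for mean West Nile cases and have the two highest mean neuroinvasive counts among the states with a value. Population does not explain everything, though: Colorado, Nebraska, and South Dakota also rank in the top six for mean cases, and no population data was included in this analysis.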
References
[1] https://bbhosted.cuny.edu/webapps/discussionboard/do/message?action=list_messages&course_id=_2010109_1&nav=discussion_board_entry&conf_id=_2342994_1&forum_id=_2992508_1&message_id=_54088437_1
[2] West Nile virus disease cases reported to CDC by State of … (n.d.). Retrieved October 4, 2021, from https://www.cdc.gov/westnile/resources/pdfs/data/West-Nile-virus-disease-cases-by-state_1999-2019-P.pdf
[3] West Nile virus neuroinvasive disease cases reported to … (n.d.). Retrieved October 4, 2021, from https://www.cdc.gov/westnile/resources/pdfs/data/West-Nile-virus-neuroinvasive-disease-cases-by-state_1999-2019-P.pdf