Project 2 - West Nile Virus

https://www.cdc.gov/westnile/statsmaps/cumMapsData.html

For this analysis I used Victoria McEleney’s suggested data sets on the West Nile Virus by state¹.

There are two tables here with the cases each year in a separate column:

West Nile virus disease cases reported to CDC by state of residence, 1999-2019²

West Nile virus neuroinvasive disease cases reported to CDC by state of residence, 1999-2019³

The totals per state & per year are already calculated, but the means are yet to be calculated.

The year columns could be pivoted longer and the 2 tables could be combined. Percent of positive cases that developed into neuroinvasive disease could be calculated (per year / per state).
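
The reshaping described above can be sketched on a toy example. The two small data frames and their numbers are made up for illustration (they are not CDC figures), and the names `cases_wide`, `neuro_wide`, and `pct_neuro` are assumptions rather than names used later in this analysis.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide tables: one row per state, one column per year (made-up numbers)
cases_wide <- data.frame(State = c("A", "B"), `2018` = c(10, 4), `2019` = c(20, 0),
                         check.names = FALSE)
neuro_wide <- data.frame(State = c("A", "B"), `2018` = c(5, 1), `2019` = c(10, 0),
                         check.names = FALSE)

# Pivot the year columns longer so each row is one state-year observation
cases_long <- pivot_longer(cases_wide, -State, names_to = "year", values_to = "cases")
neuro_long <- pivot_longer(neuro_wide, -State, names_to = "year", values_to = "neuro")

# Combine the two tables and compute the percent of cases that were neuroinvasive;
# state-years with zero cases get NA instead of a division by zero
combined <- inner_join(cases_long, neuro_long, by = c("State", "year")) %>%
  mutate(pct_neuro = if_else(cases > 0, 100 * neuro / cases, NA_real_))
```

Grouping `combined` by `State` or by `year` before summarizing would give the per-state or per-year percentages.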

Setup

Load Libraries

library(stringr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)
library(kableExtra)

## 
## Attaching package: 'kableExtra'

## The following object is masked from 'package:dplyr':
## 
##     group_rows

library(readr)

Load Data

westNileCasesUrl <- 'https://raw.githubusercontent.com/nolivercuny/data607/master/project2/datasets/westnile/West-Nile-virus-disease-cases-by-state_1999-2019-P.csv'
westNileCasesRaw <- read.csv(westNileCasesUrl, sep = " ", skip = 1)

westNileNeuroUrl <- 'https://raw.githubusercontent.com/nolivercuny/data607/master/project2/datasets/westnile/West-Nile-virus-neuroinvasive-disease-cases-by-state_1999-2019-P.csv'
westNileNeuroRaw <- read.csv(westNileNeuroUrl, sep = " ", skip = 1)

Tidy

westNileCasesDf <- westNileCasesRaw %>% 
  as.data.frame()
westNileNeuroDf <- westNileNeuroRaw %>%
  as.data.frame()

Fix DC because it’s the only one that has three words in its name

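fix_dc <- function(df){
  # "District of Columbia" sits in rows 9 and 78, split across three columns;
  # merge the pieces back into a single State value
  dc <- df[9,] %>% unite(State, 1:3, sep = " ")
  df[9,]$State <- dc[1,1]
  df[78,]$State <- dc[1,1]

  # Copy the first two values of the following rows (10 and 79) into columns 2-3
  df[9,2:3] <- df[10,1:2]
  df[78,2:3] <- df[79,1:2]
  df
}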
westNileCasesDf<-fix_dc(westNileCasesDf)
westNileNeuroDf <-fix_dc(westNileNeuroDf)
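
To see what `unite()` does inside `fix_dc`, here is a minimal sketch on a single made-up row. The column names `X1999` and `X2000` are assumptions about how the split name might land in the first columns; the real data frames may be laid out differently.

```r
library(tidyr)

# Hypothetical row where the three words of the name ended up in three columns
row <- data.frame(State = "District", X1999 = "of", X2000 = "Columbia",
                  stringsAsFactors = FALSE)

# Paste columns 1 to 3 together with a space and store the result as State
fixed <- unite(row, State, 1:3, sep = " ")
```

Because `unite()` drops its input columns by default, `fix_dc` only copies the merged name back into the original row and then fills the numeric columns separately.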

fix_states <- function(df){
  # Hard-coded row indices of the states with two-word names
  # (there is probably a cleverer way to find these)
  statesWithSpaces <- c(31,33,35,37,39,41,47,49,51,53,61,100,102,104,106,108,110,116,118,120,122,130)
  for (index in statesWithSpaces) {
    # Merge the two name pieces, then pull the next row's first value into column 2
    merged <- df[index,] %>% unite(State, 1:2, sep = " ")
    df[index,]$State <- merged[1,1]
    df[index,2] <- df[index+1,1]
  }
  df
}
westNileCasesDf <- fix_states(westNileCasesDf)
westNileNeuroDf <- fix_states(westNileNeuroDf)
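
The comment in `fix_states` wonders whether there is a better way to find the multi-word names. One possible alternative, sketched here on hypothetical raw text lines rather than on the data frames above, is to split each line at its first digit with a regular expression, so the name stays intact however many words it has. The helper name `split_state_line` and the sample lines are assumptions for illustration.

```r
library(stringr)

# Hypothetical helper: split one raw line into a state name and its numeric values
split_state_line <- function(line) {
  # Group 1: leading non-digit text (the name); group 2: everything from the first digit on
  parts <- str_match(line, "^(\\D+?)\\s+(\\d.*)$")
  values <- str_split(parts[1, 3], "\\s+")[[1]]
  list(state = parts[1, 2],
       values = as.numeric(gsub(",", "", values)))  # drop thousands separators
}

# Made-up lines shaped like a space-separated export
dc <- split_state_line("District of Columbia 0 34 3")
ny <- split_state_line("New York 14 15 1,082")
```

This approach would need the file read as text lines (for example with `readLines()`) instead of `read.csv()`, so it is a different design rather than a drop-in replacement.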

Filter out all the rows with bad/meaningless data, keeping only rows whose State value is not a number. I had to use gsub because in some rows the number contained a comma, which meant it was parsed as NA and the row didn’t get filtered out. Afterwards, trim whitespace from every character column.

casesDf <- westNileCasesDf %>% 
  filter(is.na(as.numeric(gsub(",", "",State)))) %>% 
  mutate(across(where(is.character), str_trim))

## Warning in mask$eval_all_filter(dots, env_filter): NAs introduced by coercion

nueroDf <- westNileNeuroDf %>% 
  filter(is.na(as.numeric(gsub(",", "",State)))) %>% 
  mutate(across(where(is.character), str_trim))

## Warning in mask$eval_all_filter(dots, env_filter): NAs introduced by coercion
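
The effect of the comma can be seen on a few made-up values of the kind that might appear in the State column. This is only a sketch of the parsing behavior, not the actual data.

```r
# Made-up values: one real name, one number with a comma, one plain number
x <- c("California", "1,082", "37")

# Without removing the comma, "1,082" fails to parse and looks like a name
naive <- suppressWarnings(as.numeric(x))

# Stripping commas first lets both numeric strings parse
cleaned <- suppressWarnings(as.numeric(gsub(",", "", x)))

# Keep only the entries that are not numbers, mirroring the filter() condition
kept <- x[is.na(cleaned)]
```

Wrapping the conversion in `suppressWarnings()` also avoids the "NAs introduced by coercion" warning, which is expected here because the names are meant to fail the conversion.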

Each data frame now contains two tables stacked into one. I need to split the data frame and recombine the two parts by state.

recombine <- function(df) {
  df_one <- df[1:52,]
  df_two <- df[57:109,]
  names(df_two) <- df_two[1,]
  df_two <- df_two[-c(1),]
  bind_cols(df_one,df_two) %>% 
    subset(select = -c(`State...13`))
}

finalCasesDf <- recombine(casesDf) %>%
  pivot_longer(-State...1) %>%
  mutate(value = as.numeric(gsub(",","",value)))

## New names:
## * State -> State...1
## * State -> State...13

finalNeuroDf <- recombine(nueroDf) %>% 
  pivot_longer(-State...1) %>%
  mutate(value = as.numeric(gsub(",","",value)))

## New names:
## * State -> State...1
## * State -> State...13

## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
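
`recombine()` relies on `bind_cols()`, which matches rows purely by position, so both halves must list the states in the same order. A possible alternative, sketched here on made-up data, joins on the state name instead. The data frames `first_half` and `second_half` are hypothetical stand-ins for the two parts.

```r
library(dplyr)

# Hypothetical halves of one table: the same states, different year columns,
# deliberately listed in a different order (made-up numbers)
first_half  <- data.frame(State = c("A", "B"), `1999` = c("1", "2"),
                          check.names = FALSE)
second_half <- data.frame(State = c("B", "A"), `2010` = c("4", "1,003"),
                          check.names = FALSE)

# Joining on State aligns rows by name rather than by position
rejoined <- inner_join(first_half, second_half, by = "State")
```

A join also leaves a single `State` column, so there is no `State...13` duplicate to drop afterwards.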

Analysis

sumCases <- finalCasesDf %>%
  group_by(State...1) %>% 
  summarize(m = mean(value)) %>%
  arrange(desc(m))

sumNeuro <- finalNeuroDf %>%
  group_by(State...1) %>% 
  summarize(m = mean(value))%>%
  arrange(desc(m))

final <- inner_join(sumCases, sumNeuro, by = "State...1")

kable(final, caption = "Mean West Nile By State", format = "html") %>%
  kable_styling("striped") %>%
  scroll_box(width = "100%")

Mean West Nile By State
State...1	m.x	m.y
California	638.7272727	372.0000000
Colorado	513.4545455	125.6363636
Texas	508.1818182	308.1818182
Nebraska	363.6363636	72.6363636
Illinois	242.0000000	154.6363636
South Dakota	237.5454545	NA
Arizona	175.2727273	110.7272727
North Dakota	174.2727273	NA
Louisiana	167.3636364	101.2727273
Mississippi	131.0000000	71.7272727
Idaho	125.2727273	21.0909091
Michigan	119.8181818	101.1818182
Ohio	95.0909091	67.3636364
New York	88.4545455	NA
Minnesota	75.0000000	30.8181818
Oklahoma	74.4545455	45.3636364
Wyoming	69.1818182	16.7272727
Indiana	65.8181818	38.7272727
Pennsylvania	63.2727273	NA
Kansas	61.5454545	31.5454545
New Mexico	59.2727273	NA
Montana	56.4545455	17.1818182
Iowa	56.2727273	28.3636364
Missouri	56.2727273	42.4545455
Georgia	44.2727273	26.9090909
Utah	41.2727273	18.1818182
Nevada	38.8181818	NA
Florida	36.7272727	27.9090909
Maryland	35.6363636	24.0909091
Alabama	33.2727273	22.0909091
Arkansas	31.9090909	23.9090909
Wisconsin	30.7272727	20.5454545
Tennessee	30.0000000	22.0909091
New Jersey	29.1818182	NA
Virginia	20.9090909	14.9090909
Massachusetts	18.7272727	14.3636364
Kentucky	17.6363636	12.0909091
Oregon	17.1818182	4.1818182
Connecticut	14.3636364	9.7272727
Washington	10.6363636	NA
South Carolina	9.7272727	NA
North Carolina	7.4545455	NA
Delaware	4.7272727	NA
West Virginia	2.1818182	NA
Rhode Island	2.0000000	NA
Vermont	1.4545455	0.7272727
New Hampshire	0.6363636	NA
Maine	0.3636364	0.2727273
Alaska	0.1818182	0.0909091
Hawaii	0.0909091	0.0000000
Puerto Rico	0.0909091	NA
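
The NA entries in the m.y column are consistent with `mean()` returning NA whenever a group contains a missing value, which the coercion warning on the neuroinvasive data suggests is happening here. A minimal sketch of one way to handle this, on made-up data, passes `na.rm = TRUE` and uses the `suffix` argument of `inner_join()` so the two mean columns get descriptive names. Whether dropping missing values is appropriate for the real data is an assumption that would need checking, and the helper `mean_by_state` is hypothetical.

```r
library(dplyr)

# Made-up long-format data; state B has one missing neuroinvasive value
cases <- data.frame(State = c("A", "A", "B", "B"), value = c(10, 20, 4, 6))
neuro <- data.frame(State = c("A", "A", "B", "B"), value = c(5, 7, NA, 2))

# Hypothetical helper: per-state mean that ignores NAs instead of returning NA
mean_by_state <- function(df) {
  df %>%
    group_by(State) %>%
    summarize(avg = mean(value, na.rm = TRUE))
}

# suffix replaces the default .x/.y endings with more descriptive ones
means <- inner_join(mean_by_state(cases), mean_by_state(neuro),
                    by = "State", suffix = c("_cases", "_neuro"))
```

With this naming, the table columns would read `avg_cases` and `avg_neuro` instead of `m.x` and `m.y`.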

Conclusions

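Based on my analysis, the most populous states, California and Texas, had the highest mean counts of both West Nile virus disease and neuroinvasive West Nile virus disease. The pattern is not uniform, though: less populous states such as Colorado, Nebraska, and South Dakota also rank near the top for mean total cases.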

References

1. https://bbhosted.cuny.edu/webapps/discussionboard/do/message?action=list_messages&course_id=_2010109_1&nav=discussion_board_entry&conf_id=_2342994_1&forum_id=_2992508_1&message_id=_54088437_1
2. West Nile virus disease cases reported to CDC by State of … (n.d.). Retrieved October 4, 2021, from https://www.cdc.gov/westnile/resources/pdfs/data/West-Nile-virus-disease-cases-by-state_1999-2019-P.pdf.
3. West Nile virus neuroinvasive disease cases reported to … (n.d.). Retrieved October 4, 2021, from https://www.cdc.gov/westnile/resources/pdfs/data/West-Nile-virus-neuroinvasive-disease-cases-by-state_1999-2019-P.pdf.

Nick Oliver