Project 2 West Nile Virus

Nick Oliver

https://www.cdc.gov/westnile/statsmaps/cumMapsData.html

For this analysis I used Victoria McEleney’s suggested data sets on West Nile virus cases by state [1].

There are two tables here with the cases each year in a separate column:

West Nile virus disease cases reported to CDC by state of residence, 1999-2019 [2]

West Nile virus neuroinvasive disease cases reported to CDC by state of residence, 1999-2019 [3]

The totals per state and per year are already calculated, but the means are not.

The year columns could be pivoted longer and the two tables combined. The percentage of reported cases that developed into neuroinvasive disease could then be calculated per year and per state.
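As a rough illustration of that idea, the sketch below pivots two small tables longer, joins them on state and year, and computes the neuroinvasive percentage. The numbers are made-up toy values, not CDC figures, and the names `to_long`, `Cases`, `Neuro`, and `PctNeuro` are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy wide tables (one column per year); values are illustrative only
cases <- data.frame(State = c("Ohio", "Utah"),
                    `2018` = c(40, 20), `2019` = c(10, 8),
                    check.names = FALSE, stringsAsFactors = FALSE)
neuro <- data.frame(State = c("Ohio", "Utah"),
                    `2018` = c(10, 5), `2019` = c(5, 2),
                    check.names = FALSE, stringsAsFactors = FALSE)

# Hypothetical helper: one row per state and year
to_long <- function(df, value_name) {
  pivot_longer(df, -State, names_to = "Year", values_to = value_name)
}

# Join the two long tables and compute the percentage per state and year
combined <- inner_join(to_long(cases, "Cases"), to_long(neuro, "Neuro"),
                       by = c("State", "Year")) %>%
  mutate(PctNeuro = 100 * Neuro / Cases)
```

A state-year with zero reported cases would yield NaN here, so such rows may need separate handling.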

Setup

Load Libraries

library(stringr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
library(readr)

Load Data

westNileCasesUrl <- 'https://raw.githubusercontent.com/nolivercuny/data607/master/project2/datasets/westnile/West-Nile-virus-disease-cases-by-state_1999-2019-P.csv'
westNileCasesRaw <- read.csv(westNileCasesUrl, sep = " ", skip = 1)

westNileNeuroUrl <- 'https://raw.githubusercontent.com/nolivercuny/data607/master/project2/datasets/westnile/West-Nile-virus-neuroinvasive-disease-cases-by-state_1999-2019-P.csv'
westNileNeuroRaw <- read.csv(westNileNeuroUrl, sep = " ", skip = 1)

Tidy

westNileCasesDf <- westNileCasesRaw %>% 
  as.data.frame()
westNileNeuroDf <- westNileNeuroRaw %>%
  as.data.frame()

Fix the District of Columbia rows. It is the only name made of three words, so the space-delimited read splits it across three columns.

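fix_dc <- function(df){
  # "District of Columbia" is split across the first three columns in rows 9 and 78
  dc <- df[9,] %>% unite(State, 1:3, sep = " ")
  df[9,]$State <- dc[1,1]
  df[78,]$State <- dc[1,1]

  # Copy the first two values of the following row into columns 2 and 3
  df[9,2:3] <- df[10,1:2]
  df[78,2:3] <- df[79,1:2]
  df
}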
westNileCasesDf<-fix_dc(westNileCasesDf)
westNileNeuroDf <-fix_dc(westNileNeuroDf)
fix_states <- function(df){
  # Hard-coded row indices of two-word state names;
  # there is probably a clever way to find these
  statesWithSpaces <- c(31,33,35,37,39,41,47,49,51,53,61,100,102,104,106,108,110,116,118,120,122,130)
  for (index in statesWithSpaces) {
    # Join the two name parts, then pull the first value of the next row into column 2
    merged <- df[index,] %>% unite(State, 1:2, sep = " ")
    df[index,]$State <- merged[1,1]
    df[index,2] <- df[index+1,1]
  }
  df
}
westNileCasesDf <- fix_states(westNileCasesDf)
westNileNeuroDf <- fix_states(westNileNeuroDf)
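The hard-coded index vector in fix_states could, as its comment suggests, be derived from the data instead. Here is a minimal sketch, assuming the second word of a split name always lands in the second column (as the unite(State, 1:2) call implies). The toy data frame and the column name `X1999` are illustrative stand-ins, not the real CDC rows.

```r
# Toy stand-in for the raw space-delimited read; values are illustrative only
toy <- data.frame(
  State = c("Alabama", "New", "Ohio", "South"),
  X1999 = c("2", "York", "441", "Dakota"),
  stringsAsFactors = FALSE
)

# A row holds a split name when its second column contains letters
# instead of a count
statesWithSpaces <- which(grepl("[A-Za-z]", toy$X1999))
```

For the toy data, statesWithSpaces is c(2, 4). On the real data, the result should be compared against the hand-picked indices before relying on it.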

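Filter out the rows with bad or meaningless data, that is, rows whose State column holds a number instead of a name. The gsub call is needed because some numbers contain a thousands comma, which as.numeric parses as NA, so those rows would otherwise not be filtered out. The remaining character columns are then trimmed with across(where(is.character), str_trim).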

casesDf <- westNileCasesDf %>% 
  filter(is.na(as.numeric(gsub(",", "",State)))) %>% 
  mutate(across(where(is.character), str_trim))
## Warning in mask$eval_all_filter(dots, env_filter): NAs introduced by coercion
nueroDf <- westNileNeuroDf %>% 
  filter(is.na(as.numeric(gsub(",", "",State)))) %>% 
  mutate(across(where(is.character), str_trim))
## Warning in mask$eval_all_filter(dots, env_filter): NAs introduced by coercion
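The effect of the comma can be seen on a three-element vector. This is a small base R sketch with made-up values; the variable names `naive` and `cleaned` are illustrative.

```r
# One state name, one number with a thousands comma, one plain number
x <- c("California", "1,234", "56")

# Without gsub, "1,234" fails to parse, so it looks like a name and is kept
naive <- is.na(suppressWarnings(as.numeric(x)))

# With the comma stripped, only the real name is flagged as non-numeric
cleaned <- is.na(suppressWarnings(as.numeric(gsub(",", "", x))))
```

Here naive is TRUE, TRUE, FALSE and cleaned is TRUE, FALSE, FALSE. Wrapping the conversion in suppressWarnings also silences the "NAs introduced by coercion" warning, which is expected in this filter.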

Each data frame now holds two tables stacked on top of each other. I need to split the data frame and recombine the two halves side by side, by state.

recombine <- function(df) {
  df_one <- df[1:52,]
  df_two <- df[57:109,]
  names(df_two) <- df_two[1,]
  df_two <- df_two[-c(1),]
  bind_cols(df_one,df_two) %>% 
    subset(select = -c(`State...13`))
}
finalCasesDf <- recombine(casesDf) %>%
  pivot_longer(-State...1) %>%
  mutate(value = as.numeric(gsub(",","",value)))
## New names:
## * State -> State...1
## * State -> State...13
finalNeuroDf <- recombine(nueroDf) %>% 
  pivot_longer(-State...1) %>%
  mutate(value = as.numeric(gsub(",","",value)))
## New names:
## * State -> State...1
## * State -> State...13
## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
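bind_cols pairs rows purely by position, so the two halves must list the states in the same order. One possible alternative, shown here as a sketch on toy data and not the method used above, is to join the halves on State. This matches rows by name and avoids the State...1 / State...13 renaming. The names `first_half`, `second_half`, `Year`, and `Cases` are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy halves with the states deliberately in a different order
first_half <- data.frame(State = c("Alabama", "Alaska"),
                         `1999` = c("2", "0"), `2000` = c("1,200", "0"),
                         check.names = FALSE, stringsAsFactors = FALSE)
second_half <- data.frame(State = c("Alaska", "Alabama"),
                          `2001` = c("1", "5"),
                          check.names = FALSE, stringsAsFactors = FALSE)

# Match rows by state name, reshape to one row per state and year,
# then strip thousands commas before converting to numbers
combined <- inner_join(first_half, second_half, by = "State") %>%
  pivot_longer(-State, names_to = "Year", values_to = "Cases") %>%
  mutate(Cases = as.numeric(gsub(",", "", Cases)))
```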

Analysis

sumCases <- finalCasesDf %>%
  group_by(State...1) %>% 
  summarize(m = mean(value)) %>%
  arrange(desc(m))

sumNeuro <- finalNeuroDf %>%
  group_by(State...1) %>% 
  summarize(m = mean(value))%>%
  arrange(desc(m))

final <- inner_join(sumCases, sumNeuro, by = "State...1")
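Because both summaries name their column m, the join yields the default names m.x and m.y, which do not say which table each mean came from. A small sketch of one way to label them uses the suffix argument of inner_join and a rename. The toy values and the names `sumCasesToy`, `sumNeuroToy`, and `finalToy` are illustrative.

```r
library(dplyr)

# Toy summaries shaped like sumCases and sumNeuro; values are illustrative only
sumCasesToy <- data.frame(State...1 = c("Ohio", "Utah"), m = c(95, 41),
                          stringsAsFactors = FALSE)
sumNeuroToy <- data.frame(State...1 = c("Ohio", "Utah"), m = c(67, 18),
                          stringsAsFactors = FALSE)

# suffix replaces the default ".x"/".y"; rename tidies the state column
finalToy <- inner_join(sumCasesToy, sumNeuroToy,
                       by = "State...1", suffix = c("_cases", "_neuro")) %>%
  rename(State = State...1)
```

The resulting columns are State, m_cases, and m_neuro.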

kable(final,caption = "Mean West Nile By State", format = "html") %>% kable_styling("striped") %>% scroll_box(width = "100%")
Mean West Nile By State
State...1 m.x m.y
California 638.7272727 372.0000000
Colorado 513.4545455 125.6363636
Texas 508.1818182 308.1818182
Nebraska 363.6363636 72.6363636
Illinois 242.0000000 154.6363636
South Dakota 237.5454545 NA
Arizona 175.2727273 110.7272727
North Dakota 174.2727273 NA
Louisiana 167.3636364 101.2727273
Mississippi 131.0000000 71.7272727
Idaho 125.2727273 21.0909091
Michigan 119.8181818 101.1818182
Ohio 95.0909091 67.3636364
New York 88.4545455 NA
Minnesota 75.0000000 30.8181818
Oklahoma 74.4545455 45.3636364
Wyoming 69.1818182 16.7272727
Indiana 65.8181818 38.7272727
Pennsylvania 63.2727273 NA
Kansas 61.5454545 31.5454545
New Mexico 59.2727273 NA
Montana 56.4545455 17.1818182
Iowa 56.2727273 28.3636364
Missouri 56.2727273 42.4545455
Georgia 44.2727273 26.9090909
Utah 41.2727273 18.1818182
Nevada 38.8181818 NA
Florida 36.7272727 27.9090909
Maryland 35.6363636 24.0909091
Alabama 33.2727273 22.0909091
Arkansas 31.9090909 23.9090909
Wisconsin 30.7272727 20.5454545
Tennessee 30.0000000 22.0909091
New Jersey 29.1818182 NA
Virginia 20.9090909 14.9090909
Massachusetts 18.7272727 14.3636364
Kentucky 17.6363636 12.0909091
Oregon 17.1818182 4.1818182
Connecticut 14.3636364 9.7272727
Washington 10.6363636 NA
South Carolina 9.7272727 NA
North Carolina 7.4545455 NA
Delaware 4.7272727 NA
West Virginia 2.1818182 NA
Rhode Island 2.0000000 NA
Vermont 1.4545455 0.7272727
New Hampshire 0.6363636 NA
Maine 0.3636364 0.2727273
Alaska 0.1818182 0.0909091
Hawaii 0.0909091 0.0000000
Puerto Rico 0.0909091 NA
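Several states show NA for m.y in the table above. mean returns NA whenever any input is NA, and the neuroinvasive values produced a coercion warning during tidying, so a single unparsed cell is enough to blank a state's mean. Separately, every mean in the table is a multiple of 1/11, which would be consistent with averaging 21 yearly values together with a total column. That reading is an assumption, since the raw column names are not shown here. The sketch below, on toy data with a hypothetical `Total` label, shows how both effects could be handled.

```r
library(dplyr)

# Toy long-format data: two years plus a total per state, one missing value
toy <- data.frame(
  State = c("Maine", "Maine", "Maine", "Ohio", "Ohio", "Ohio"),
  name  = c("1999", "2000", "Total", "1999", "2000", "Total"),
  value = c(0, 2, 2, 4, NA, 4),
  stringsAsFactors = FALSE
)

means <- toy %>%
  filter(name != "Total") %>%                  # keep yearly values only
  group_by(State) %>%
  summarize(m = mean(value, na.rm = TRUE))     # ignore missing years
```

Using na.rm = TRUE averages over the years that parsed successfully. It is still worth checking why a value failed to parse before dropping it.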

Conclusions

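Based on this analysis, the states with the highest mean counts of both West Nile virus disease and neuroinvasive West Nile disease tend to be highly populated ones, such as California, Texas, and Illinois. The pattern is not uniform: Colorado and Nebraska also rank in the top four for total cases. Population figures were not part of the data, so this is an observation and not a tested result.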

References