Questions: The questions for this data set was provided by Victoria McEleney
West Nile virus disease cases reported to CDC by state of residence, 1999-2019
West Nile virus neuroinvasive disease cases reported to CDC by state of residence, 1999-2019
The totals per state & per year are already calculated, but the means are yet to be calculated.
The year columns could be pivoted longer and the 2 tables could be combined. Percent of positive cases that developed into neuroinvasive disease could be calculated (per year / per state).
Load the libraries
library(tidyverse)
Load the data
url_westnile_disease <- "https://raw.githubusercontent.com/chinedu2301/DATA607-Data-Acquisition-and-Management/main/West-Nile-virus-disease-cases-by-state_1999-2019-P.csv"
url_westnile_neuro <- "https://raw.githubusercontent.com/chinedu2301/DATA607-Data-Acquisition-and-Management/main/West-Nile-virus-neuroinvasive-disease-cases-by-state_1999-2019-P.csv"
westnile_disease <- read_csv(url_westnile_disease)
westnile_neuro <- read_csv(url_westnile_neuro, skip = 1)
Check the each of the datasets
#View the west nile disease dataset
westnile_disease
#View the west nile disease cases that turned neuro invasive
westnile_neuro
These two data sets are wide and contain a lot of unnecessary rows. Hence, we subset the data sets to contain only the rows for each state and then transform the data sets from wide to long.
Subset the data sets
#slice the west nile disease dataset
westnile_disease <- westnile_disease %>% slice(1:52)
westnile_disease
#slice the west nile neuro invasive dataset
westnile_neuro <- westnile_neuro %>% slice(1:52)
westnile_neuro
Transform the tables from wide to long
westnile_d <- westnile_disease %>% select(-Total) %>%
gather(key = "Year", value = "cases", -State) %>% arrange(State)
westnile_n <- westnile_neuro %>% select(-Total) %>%
gather(key = "Year", value = "neuroinvasive_cases", -State) %>% arrange(State)
View the new long data sets
West nile disease cases
westnile_d
West nile neuro invasive cases
westnile_n
Combine the two data sets
westnile <- cbind(westnile_d, neuroinvasive_cases = westnile_n$neuroinvasive_cases)
westnile
Find the average (mean) cases for each state: Group by State
westnile_mean <- westnile %>% group_by(State) %>%
summarise(Avg_cases = round(mean(cases),0), Avg_neuroinvasive_cases = round(mean(neuroinvasive_cases),0))
westnile_mean
Find percentage of cases that turn neuroinvasive
westnile_percent_neuro <- westnile_mean %>%
mutate(percent_neuroinvasive = round((Avg_neuroinvasive_cases/Avg_cases)*100, 2))
westnile_percent_neuro
Replace the NaN values with NA
westnile_percent_neuro$percent_neuroinvasive[is.nan(westnile_percent_neuro$percent_neuroinvasive)] <- NA
westnile_percent_neuro
Construct a bar graph
westnile_percent_neuro %>% top_n(30) %>% ggplot(aes(reorder(State, percent_neuroinvasive), percent_neuroinvasive)) +
geom_col(fill = "brown") + coord_flip() + xlab("") + labs(title = "Percentage of West Nile virus cases that turned neuroinvasive")
## Selecting by percent_neuroinvasive
Check summary for the percent_neuroinvasive cases by state
summary(westnile_percent_neuro$percent_neuroinvasive)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 50.00 60.87 57.13 71.66 100.00 5
Find the average (mean) cases for each state: Grouping by Year
westnile_month <- westnile %>% group_by(Year) %>%
summarise(Avg_cases = round(mean(cases),0), Avg_neuroinvasive_cases = round(mean(neuroinvasive_cases),0))
westnile_percent_month <- westnile_month %>%
mutate(percent_neuroinvasive = round((Avg_neuroinvasive_cases/Avg_cases)*100, 2))
#Replace NaN values with NA
westnile_percent_month$percent_neuroinvasive[is.nan(westnile_percent_month$percent_neuroinvasive)] <- NA
westnile_percent_month
Plot a graph of percent neuroinvasive by year
ggplot(westnile_percent_month, aes(Year, percent_neuroinvasive)) + geom_col(fill = "brown") +
theme_bw() + labs(title = "Percentage of Neuroinvasive westnile virus cases by year")
## Warning: Removed 1 rows containing missing values (position_stack).
Summary statistics for the percentage of west nile virus cases that turned neuroinvasive per year
summary(westnile_percent_month$percent_neuroinvasive)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 28.95 48.73 60.49 58.33 64.89 100.00 1
Conclusion: We can see that using the average westnile virus cases from 1999 to 2019 for each state, the median percentage of the westnile virus cases that turn neuroinvasive is about 60% for each state. Also, for each year since 1990, about 60% of westnile virus cases turned neuroinvasive.