Github Link
Web Link
Deployment app for Question1
Data about mortality from all 50 states and the District of Columbia. Please access it at https://github.com/charleyferrari/CUNY_DATA608/tree/master/module3/data.
You are invited to gather more data from our provider, the CDC WONDER system, at https://wonder.cdc.gov
This assignment must be done in R, using the ‘shiny’ package. An R package that supports interactive graphing, such as plotly or vegalite, is recommended but not required. Your apps must be deployed; I won’t be accepting raw files. Luckily, you can deploy apps fairly easily with a free account at shinyapps.io.
The dataset, recorded by the Centers for Disease Control and Prevention (CDC), covers mortality in U.S. states from 1999 to 2010. It was provided by the instructor, Charley Ferrari, as a CSV file, and we use R to read it directly from the GitHub repository.
mortality_df <- read.csv("https://raw.githubusercontent.com/charleyferrari/CUNY_DATA_608/master/module3/data/cleaned-cdc-mortality-1999-2010-2.csv", header = TRUE, stringsAsFactors=FALSE)
#write.csv(mortality_df,"~/R/Data608_Module3\\mortality.csv", row.names = FALSE)
head(mortality_df)
## ICD.Chapter State Year Deaths Population
## 1 Certain infectious and parasitic diseases AL 1999 1092 4430141
## 2 Certain infectious and parasitic diseases AL 2000 1188 4447100
## 3 Certain infectious and parasitic diseases AL 2001 1211 4467634
## 4 Certain infectious and parasitic diseases AL 2002 1215 4480089
## 5 Certain infectious and parasitic diseases AL 2003 1350 4503491
## 6 Certain infectious and parasitic diseases AL 2004 1251 4530729
## Crude.Rate
## 1 24.6
## 2 26.7
## 3 27.1
## 4 27.1
## 5 30.0
## 6 27.6
The dataset includes 9,961 observations and 6 variables. “ICD.Chapter” and “State” are character variables, “Year”, “Deaths”, and “Population” are integers, and “Crude.Rate” is numeric. Luckily, there is no missing data, so no imputation or row removal is needed.
str(mortality_df)
## 'data.frame': 9961 obs. of 6 variables:
## $ ICD.Chapter: chr "Certain infectious and parasitic diseases" "Certain infectious and parasitic diseases" "Certain infectious and parasitic diseases" "Certain infectious and parasitic diseases" ...
## $ State : chr "AL" "AL" "AL" "AL" ...
## $ Year : int 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 ...
## $ Deaths : int 1092 1188 1211 1215 1350 1251 1303 1312 1241 1385 ...
## $ Population : int 4430141 4447100 4467634 4480089 4503491 4530729 4569805 4628981 4672840 4718206 ...
## $ Crude.Rate : num 24.6 26.7 27.1 27.1 30 27.6 28.5 28.3 26.6 29.4 ...
#view(mortality_df)
sum(is.na(mortality_df))
## [1] 0
summary(mortality_df)
## ICD.Chapter State Year Deaths
## Length:9961 Length:9961 Min. :1999 Min. : 10
## Class :character Class :character 1st Qu.:2002 1st Qu.: 177
## Mode :character Mode :character Median :2005 Median : 667
## Mean :2005 Mean : 2929
## 3rd Qu.:2008 3rd Qu.: 2474
## Max. :2010 Max. :96511
## Population Crude.Rate
## Min. : 491780 Min. : 0.00
## 1st Qu.: 1728292 1st Qu.: 4.60
## Median : 4219239 Median : 24.00
## Mean : 5937896 Mean : 52.15
## 3rd Qu.: 6562231 3rd Qu.: 50.50
## Max. :37253956 Max. :478.40
# Load the packages used by the plots and Shiny apps below
library(rsconnect)
library(ggplot2)
library(dplyr)
library(tidyr)
library(shiny)
library(plotly)
library(RCurl)
##
## Attaching package: 'shiny'
## The following object is masked from 'package:rsconnect':
##
## serverInfo
## Warning: package 'plotly' was built under R version 4.0.5
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
## Warning: package 'RCurl' was built under R version 4.0.5
##
## Attaching package: 'RCurl'
## The following object is masked from 'package:tidyr':
##
## complete
Let’s explore the CDC data for the state of Oregon from 1999 to 2010. The data could also be converted to a time series for forecasting, or narrowed to other variables with filter() or select().
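The time-series idea could be sketched with base R's ts(). This is a minimal sketch and not part of the original analysis: it uses the six California Neoplasms death counts (1999-2004) that appear in the head(df2s) output of this report, and the names `ca_neoplasms`, `deaths_ts`, and `yearly_change` are made up for illustration.

```r
# Illustrative only: annual Neoplasms deaths for California, 1999-2004,
# copied from the head(df2s) output in this report.
ca_neoplasms <- data.frame(
  Year   = 1999:2004,
  Deaths = c(54197, 54338, 55095, 55400, 55607, 54911)
)

# One observation per year, so frequency = 1 and the series starts in 1999.
deaths_ts <- ts(ca_neoplasms$Deaths,
                start = min(ca_neoplasms$Year),
                frequency = 1)
print(deaths_ts)

# Year-over-year change in deaths, a common first look before forecasting.
yearly_change <- diff(deaths_ts)
print(yearly_change)
```

With only twelve annual points per state and cause in the full dataset, any forecast built on such a series would be rough at best.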
#mortality_df %>%
#group_by(State) %>%
#mutate() %>%
#arrange(desc()) %>%
#top_n(15)%>%
#filter() %>%# adjusting the legend
#autoplot(Deaths) + labs(title= "GDP per capital", y = "Currency in US Dollars")
df1 <- mortality_df %>%
  filter(State == "OR")
# One line per cause of death (ICD chapter) in Oregon
ggplot(df1, aes(x = Year, y = Deaths,
                group = interaction(State, ICD.Chapter),
                colour = ICD.Chapter)) +
  geom_line() +
  labs(title = "Centers for Disease Control and Prevention (CDC) Report on Diseases in Oregon 1999-2010", y = "Number of Deaths")
df2s <- mortality_df %>%
filter(State == "OR" | State == "WA" | State == "CA") %>%
filter(ICD.Chapter == "Neoplasms")
head(df2s)
## ICD.Chapter State Year Deaths Population Crude.Rate
## 1 Neoplasms CA 1999 54197 33499204 161.8
## 2 Neoplasms CA 2000 54338 33871648 160.4
## 3 Neoplasms CA 2001 55095 34479458 159.8
## 4 Neoplasms CA 2002 55400 34871843 158.9
## 5 Neoplasms CA 2003 55607 35253159 157.7
## 6 Neoplasms CA 2004 54911 35574576 154.4
ggplot(df2s, aes(x = Year, y = Deaths,
                 group = interaction(State, ICD.Chapter),
                 colour = State)) +
  geom_line() +
  labs(title = "Comparing Centers for Disease Control and Prevention (CDC) Report on Neoplasms in California, Oregon and Washington 1999-2010", y = "Number of Deaths")
# df1as <- mortality_df %>%
# dplyr::select(Year, Deaths, Population, Crude.Rate) %>% ## can remove some variables
# gather(key = "variable", value = "value", -Year)
# ggplot(df1as, aes(x = Year, y = value)) +
# geom_line(aes(color = variable, linetype = variable)) +
# scale_color_manual(values = c("darkred", "steelblue"))
#my1 <- ts (name of the data frame, [,2], start = year,
# month, date, frequency = in my case it was 31)
# Neoplasms crude rate by state in 2010, highest first
df3 <- mortality_df %>%
  filter(ICD.Chapter == "Neoplasms" & Year == 2010) %>%
  arrange(desc(Crude.Rate))
plot1 <- ggplot(df3, aes(x = reorder(State, -Crude.Rate), y = Crude.Rate)) +
  geom_col(fill = rainbow(nrow(df3))) +  # one colour per state row
  coord_flip() +
  geom_text(aes(label = Crude.Rate), size = 4, hjust = 0, color = 'blue') +
  labs(x = "State", y = "Crude Rate", title = "Centers for Disease Control and Prevention (CDC) Report on Neoplasms in U.S. States, 2010") +
  theme(axis.text.x = element_text(angle = 0, vjust = 0.2))
# Example of UI with fluidPage
ui <- fluidPage(
  # Application title
  titlePanel("Centers for Disease Control and Prevention (CDC) Report: Crude Mortality Rate by U.S. State, 2010"),
  # Sidebar with a drop-down for the cause of death
  sidebarLayout(
    sidebarPanel(
      # add the select input
      selectInput('Infections', 'Cause of Death', unique(mortality_df$ICD.Chapter))),
    mainPanel(
      htmlOutput(outputId = 'Select'),
      # plot to be displayed
      plotOutput('trend')
    )
  )
)
#server logic
server <- shinyServer(function(input, output, session){
  # Rows for the selected cause of death in 2010, highest crude rate first
  df <- reactive({
    mortality_df %>%
      filter(ICD.Chapter == input$Infections & Year == 2010) %>%
      arrange(desc(Crude.Rate))
  })
  # Show the selected cause instead of a hard-coded "Neoplasms"
  output$Select <- renderText({
    paste("Deaths caused by:", input$Infections)
  })
  output$trend <- renderPlot({
    ggplot(df(), aes(x = reorder(State, -Crude.Rate), y = Crude.Rate)) +
      geom_col(fill = "#FF8000FF") +
      coord_flip() +
      geom_text(aes(label = Crude.Rate), size = 4, hjust = 0) +
      labs(x = "State", y = "Crude Rate",
           title = paste("Centers for Disease Control and Prevention (CDC) Report on", input$Infections, "in U.S. States, 2010")) +
      theme(axis.text.x = element_text(angle = 0, vjust = 0.2))
  })
})
#shinyApp(ui = ui, server = server, options = list(height = 500, width = 960))
#runApp()
#deployApp()
Often you are asked whether particular States are improving their mortality rates (per cause) faster than, or slower than, the national average. Create a visualization that lets your clients see this for themselves for one cause of death at a time. Keep in mind that the national average should be weighted by the national population.
We observed that Crude.Rate = (Deaths / Population) * 100000. This is a rate per state, per year, and per cause of death. To find the national average, we group by cause of death and year, divide the sum of all deaths by the sum of the population, multiply by 100000, and store the result in a new variable. The national value is the same for every state within a given year and cause, so it is repeated on each state's rows; this repetition is harmless because the app selects one state at a time.
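The difference between a population-weighted national rate and a plain average of state rates can be seen in a small base R sketch. This is an illustration under stated assumptions, not the original code: the AL row is taken from the head() output in this report, while the "XX" row and the names `toy`, `national_rate`, and `simple_mean` are hypothetical.

```r
# Toy data: the AL row comes from this report's head() output;
# the "XX" row is hypothetical and exists only to show the weighting.
toy <- data.frame(
  State      = c("AL", "XX"),
  Year       = c(1999, 1999),
  Deaths     = c(1092, 100),
  Population = c(4430141, 1000000)
)

# State-level crude rate per 100,000 people.
toy$Crude.Rate <- toy$Deaths / toy$Population * 100000

# Population-weighted national rate: total deaths over total population.
national_rate <- sum(toy$Deaths) / sum(toy$Population) * 100000

# Unweighted mean of the state rates, shown only for contrast.
simple_mean <- mean(toy$Crude.Rate)

print(round(toy$Crude.Rate, 1))
print(round(national_rate, 2))
print(round(simple_mean, 2))
```

The weighted rate is the same as weighted.mean(toy$Crude.Rate, toy$Population). One caveat worth checking: if a state has no row for a given cause and year, a grouped sum of Population leaves that state out of the denominator; whether that applies here depends on how the CDC file was cleaned.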
#view(mortality_df)
# Keep the state-level rate under an explicit name
mortality_df0 <- mortality_df %>%
  mutate(Crude.Rate.State = Crude.Rate)
# Add the population-weighted national rate per year and cause,
# then reshape to long format so both rates can share one plot
mortality_df1 <- mortality_df0 %>%
  group_by(Year, ICD.Chapter) %>%
  mutate(Crude.Rate.USA = round((sum(Deaths) * 100000) / sum(Population), 1)) %>%
  dplyr::select(ICD.Chapter, State, Year, Crude.Rate.State, Crude.Rate.USA) %>%
  gather(key = "variable", value = "value", -ICD.Chapter, -State, -Year)
#view(mortality_df1)
df4 <- mortality_df1 %>%
filter(ICD.Chapter == "Neoplasms" & State == "OR" )#%>%
#arrange(desc(Crude.Rate))
# ggplot(df4, aes(x=Year)) +
# geom_line(aes(y = Crude.Rate, color = "Crude.Rate")) + #, color = "EEA236"
# geom_line(aes(y = Crude.RateUSA, color = "Crude.RateUSA")) + #, color ="darkred"
# #scale_color_manual(values = c("darkred", "steelblue"))+
# labs(title= "Comparing Centers for Disease Control and Prevention (CDC) Report \n on Neoplasms Disease in Oregon against Nationwide 1990-2010", y = "Crude Rate")+
# theme(legend.position = "right", legend.text = element_text(size = 8), legend.title = element_text(face = "bold"))
#view(df4)
# df4a <- df4 %>%
# dplyr::select(Year,Crude.Rate,Crude.RateUSA) %>%
# gather(key = "variable", value = "value", -Year)
#
# Visualization
ggplot(df4, aes(x = Year, y = value)) +
  geom_line(aes(color = variable, linetype = variable)) +
  scale_color_manual(values = c("darkred", "steelblue")) +
  labs(title = "Comparing Centers for Disease Control and Prevention (CDC) Report \n on Neoplasms in Oregon against Nationwide 1999-2010", y = "Crude Rate") +
  theme(legend.position = "right", legend.text = element_text(size = 8), legend.title = element_text(face = "bold"))
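One way to put a number on "improving faster or slower than the nation" is to compare the percentage change in each rate between the first and last year. The sketch below is an assumption about how that could be done and is not part of the original app; the rates are hypothetical and the helper `pct_change` is made up for illustration.

```r
# Hypothetical crude rates per 100,000 for one cause of death.
rates <- data.frame(
  Year             = c(1999, 2010),
  Crude.Rate.State = c(200, 180),
  Crude.Rate.USA   = c(200, 190)
)

# Percentage change from the first to the last value of a vector.
pct_change <- function(x) {
  (x[length(x)] - x[1]) / x[1] * 100
}

# Make sure the rows are in year order before comparing endpoints.
rates <- rates[order(rates$Year), ]
state_change <- pct_change(rates$Crude.Rate.State)
usa_change   <- pct_change(rates$Crude.Rate.USA)

# A more negative change means the rate fell faster.
improving_faster <- state_change < usa_change

print(c(state = state_change, usa = usa_change))
print(improving_faster)
```

Comparing only the endpoints ignores what happens in between, so the line plot remains the better tool for judging the overall trend.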
# Example of UI with fluidPage
ui <- fluidPage(
  # Application title
  titlePanel("Centers for Disease Control and Prevention (CDC) Report on Causes of Death in Each State against Nationwide, 1999-2010"),
  # Sidebar with drop-downs for the cause of death and the state
  sidebarLayout(
    sidebarPanel(
      # add the select inputs
      selectInput('Infections', 'Cause of Death', unique(mortality_df1$ICD.Chapter)),
      selectInput('States', 'State', unique(mortality_df1$State))),
    mainPanel(
      htmlOutput(outputId = 'Selects'),
      # plot to be displayed
      plotOutput('trends')
    )
  )
)
#server logic
server <- shinyServer(function(input, output, session){
  # Long-format state and national rates for the selected cause and state
  df5 <- reactive({
    mortality_df1 %>%
      filter(ICD.Chapter == input$Infections & State == input$States)
  })
  # The plot title already carries the description, so this output stays blank
  output$Selects <- renderText({
    " "
  })
  output$trends <- renderPlot({
    a1 <- paste("Comparing Death Rate caused by", input$Infections)
    a2 <- paste("in the state of", input$States)
    a3 <- "against Nationwide Rate 1999-2010"
    ggplot(df5(), aes(x = Year, y = value)) +
      geom_line(aes(color = variable, linetype = variable), size = 1) +
      geom_point(aes(color = variable)) +
      ggtitle(paste0(a1, " ", a2, "\n", a3)) +
      scale_color_manual(values = c("darkred", "darkblue")) +
      labs(y = "Crude Rate") +
      theme(legend.position = "right", legend.text = element_text(size = 8), legend.title = element_text(face = "bold"))
  })
})
#shinyApp(ui = ui, server = server, options = list(height = 500, width = 960))
#runApp()
#deployApp()