Instruction
I have provided you with data about mortality from all 50 states and the District of Columbia. Please access it at https://github.com/charleyferrari/CUNY_DATA608/tree/master/lecture3/data
Question 1: As a researcher, you frequently compare mortality rates from particular causes across different States. You need a visualization that will let you see (for 2010 only) the crude mortality rate, across all States, from one cause (for example, Neoplasms, which are effectively cancers). Create a visualization that allows you to rank States by crude mortality for each cause of death.
Load data from Git
df <- read.csv("https://raw.githubusercontent.com/charleyferrari/CUNY_DATA_608/master/module3/data/cleaned-cdc-mortality-1999-2010-2.csv", header = TRUE)
head(df)## ICD.Chapter State Year Deaths Population
## 1 Certain infectious and parasitic diseases AL 1999 1092 4430141
## 2 Certain infectious and parasitic diseases AL 2000 1188 4447100
## 3 Certain infectious and parasitic diseases AL 2001 1211 4467634
## 4 Certain infectious and parasitic diseases AL 2002 1215 4480089
## 5 Certain infectious and parasitic diseases AL 2003 1350 4503491
## 6 Certain infectious and parasitic diseases AL 2004 1251 4530729
## Crude.Rate
## 1 24.6
## 2 26.7
## 3 27.1
## 4 27.1
## 5 30.0
## 6 27.6
The below plot shows state by crude rate filtered with year = 2010, and cause for death = Neoplasms.
df2 <- df %>% filter(Year == '2010' & ICD.Chapter == 'Neoplasms') %>% select(State, Crude.Rate)
df2 %>% group_by(State) %>% summarise(Crude.Rate = sum(Crude.Rate)) %>%
ggplot() + aes(x = reorder(State, Crude.Rate), y = Crude.Rate) +
ggtitle('States wise Mortality Rate') +
xlab('State') +
geom_bar(fill="#276678", stat = "identity") +
coord_flip() + geom_text(aes(label = Crude.Rate), size = 5, hjust=-0.20) +
theme(panel.background = element_rect(fill = "white", color = NA),
plot.title = element_text(hjust = 0.5, size = 25, colour = "#03506f"),
axis.title.y = element_text(size = 20, colour = "#03506f"),
axis.title.x=element_blank(),
axis.text.x = element_blank(),
axis.ticks.x=element_blank()
)Question 2: Often you are asked whether particular States are improving their mortality rates (per cause) faster than, or slower than, the national average. Create a visualization that lets your clients see this for themselves for one cause of death at the time. Keep in mind that the national average should be weighted by the national population.
df3 <- df %>% filter(ICD.Chapter == 'External causes of morbidity and mortality') %>% group_by(Year) %>%
mutate(National.Crude.Rate = sum(Deaths) / sum(Population) * 100000)
df4 <- df %>% filter(ICD.Chapter == 'External causes of morbidity and mortality') %>% group_by(Year)
df5 <- cbind(df4, df3$National.Crude.Rate)
df5 <- df5 %>% filter(State == 'AK')
colnames(df5)[7] <- "National.Crude.Rate"
df5$National.Crude.Rate = round(df5$National.Crude.Rate,1)
DT::datatable(df5)df5 <- df5 %>% select(Year, Crude.Rate, National.Crude.Rate) %>% gather("Crude", "Value", -Year)
head(df5)## # A tibble: 6 x 3
## # Groups: Year [6]
## Year Crude Value
## <int> <chr> <dbl>
## 1 1999 Crude.Rate 75.9
## 2 2000 Crude.Rate 86.1
## 3 2001 Crude.Rate 78.9
## 4 2002 Crude.Rate 85
## 5 2003 Crude.Rate 78.8
## 6 2004 Crude.Rate 82.5
The below plot shows Yearly Crude Rate vs National Crude Rate filtered with State = AK, and cause for death = Certain infectious and parasitic diseases.
df5 %>%
ggplot() + aes(x = Year, y = Value, fill = Crude) +
geom_col(position = "dodge") +
ggtitle('Crude Rate vs National Crude Rate') +
xlab('Year') +
ylab('Crude Rate') +
scale_fill_manual(values = c("#276678", "#999999")) +
geom_text(aes(label = Value), size = 4, position = position_dodge(width = 1), vjust = -.50) +
theme(panel.background = element_rect(fill = "white", color = NA),
plot.title = element_text(hjust = 0.5, size = 15, colour = "#03506f"),
axis.title.y = element_text(size = 10, colour = "#03506f")
)