Cancer Death Rates Among Different Ages and Races in 2013

For our assignment, we will explore 2013 cancer data that originates from the Centers for Disease Control and Prevention (CDC) and the Centers for Medicare & Medicaid Services (CMS). This data set provides information about death rates due to different types of cancer at the U.S. county and state level. We are mainly interested in total cancer deaths among young and old people and among different races. For our purposes we define \(YOUNG\) as ages 18 to 44 and \(OLD\) as ages 45 to 64.

Import our \(CSV\) data and look at our imported data

library(readr)
cancer_df<- read_csv("C:/Users/Afzal Hossain/Documents/A.Spring 2019/SOC 712/HW/hw2/R12021551_SL040.csv",col_names = TRUE)
head(cancer_df)

Let’s keep only the variables that we are interested in.

library(dplyr)
Cancer_18_to_64_Death<-select(cancer_df,"Name of Area" ,"Total Cancer Deaths (18 to 44 Years, Rate per 100,000 Population, 2007-2013)","Total Cancer Deaths (45 to 64 Years, Rate per 100,000 Population, 2007-2013)","Total Cancer Deaths (18 to 44 Years, 2007-2013)" : "Total Cancer Deaths (18 to 44 Years American Indian or Alaska Native, 2007-2013)","Total Cancer Deaths (45 to 64 Years, 2007-2013)":"Total Cancer Deaths (45 to 64 Years American Indian or Alaska Native, 2007-2013)")
names(Cancer_18_to_64_Death)
 [1] "Name of Area"                                                                          
 [2] "Total Cancer Deaths (18 to 44 Years, Rate per 100,000 Population, 2007-2013)"          
 [3] "Total Cancer Deaths (45 to 64 Years, Rate per 100,000 Population, 2007-2013)"          
 [4] "Total Cancer Deaths (18 to 44 Years, 2007-2013)"                                       
 [5] "Total Cancer Deaths (18 to 44 Years White, 2007-2013)"                                 
 [6] "Total Cancer Deaths (18 to 44 Years White Non-Hispanic, 2007-2013)"                    
 [7] "Total Cancer Deaths (18 to 44 Years Black or African American, 2007-2013)"             
 [8] "Total Cancer Deaths (18 to 44 Years Black or African American Non-Hispanic, 2007-2013)"
 [9] "Total Cancer Deaths (18 to 44 Years Asian or Pacific Islander, 2007-2013)"             
[10] "Total Cancer Deaths (18 to 44 Years American Indian or Alaska Native, 2007-2013)"      
[11] "Total Cancer Deaths (45 to 64 Years, 2007-2013)"                                       
[12] "Total Cancer Deaths (45 to 64 Years White, 2007-2013)"                                 
[13] "Total Cancer Deaths (45 to 64 Years White Non-Hispanic, 2007-2013)"                    
[14] "Total Cancer Deaths (45 to 64 Years Black or African American, 2007-2013)"             
[15] "Total Cancer Deaths (45 to 64 Years Black or African American Non-Hispanic, 2007-2013)"
[16] "Total Cancer Deaths (45 to 64 Years Asian or Pacific Islander, 2007-2013)"             
[17] "Total Cancer Deaths (45 to 64 Years American Indian or Alaska Native, 2007-2013)"      
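
The `:` inside `select()` picks a whole run of neighbouring columns, which is how we got columns 4 to 10 and 11 to 17 above. Here is a small sketch on a made-up table (the column names and values are hypothetical, not from the CDC file) to show how the range works:

```r
library(dplyr)

# Hypothetical toy table; names and numbers are invented for illustration.
toy <- data.frame(
  area      = c("A", "B"),
  rate      = c(1.5, 2.0),
  total     = c(10, 20),
  white     = c(6, 12),
  black     = c(3, 5),
  other     = c(1, 3),
  unrelated = c(0, 0)
)

# `total:other` selects every column from `total` through `other`,
# in the order they appear in the table.
picked <- select(toy, area, rate, total:other)
names(picked)
```

The range depends on column order in the file, so if the columns were arranged differently the same call would pick a different set.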

Let’s Plot Now Using \(ggplot2\)

We plot the cancer death rate per one hundred thousand people among young people for each state, where the size of each dot represents the total number of young people who died of cancer.

library(ggplot2)
ggplot(data = Cancer_18_to_64_Death, aes(`Name of Area`, `Total Cancer Deaths (18 to 44 Years, Rate per 100,000 Population, 2007-2013)`)) +
  # dot size encodes the total number of deaths among 18 to 44 year olds
  geom_point(aes(size = `Total Cancer Deaths (18 to 44 Years, 2007-2013)`)) +
  # note: no fill aesthetic is mapped, so this fill scale has no visible effect
  scale_fill_gradient2(low = "red", mid = "white", high = "blue", midpoint = 0, space = "Lab", na.value = "grey50", guide = "colourbar") +
  coord_flip() +
  ggtitle("Total Young People Death and Rate per Hundred Thousand People") +
  xlab("State") + ylab("Cancer Death Rate per Hundred Thousand People") +
  theme_minimal()

Plotting More Variables Side by Side

We will plot the \(2013\) total deaths of young and old people side by side for each state.

library(gridExtra)
#plot the 18 to 44 year
x<-ggplot(data=Cancer_18_to_64_Death, aes(x=`Name of Area`, y=`Total Cancer Deaths (18 to 44 Years, 2007-2013)`)) +
  geom_bar(stat="identity",fill="deeppink1")+ coord_flip()+
  geom_text(aes(label=`Total Cancer Deaths (18 to 44 Years, 2007-2013)`), vjust=.4, color="Black", size=2.6)+
  ggtitle("Aged Between 18 to 44 Years") +
  xlab("State") + ylab("Total Death") +theme_minimal()
# plot 45 to 64 year
y<-ggplot(data=Cancer_18_to_64_Death, aes(x=`Name of Area`, y=`Total Cancer Deaths (45 to 64 Years, 2007-2013)`)) +
  geom_bar(stat="identity",fill="darkorange1")+ coord_flip()+
  geom_text(aes(label=`Total Cancer Deaths (45 to 64 Years, 2007-2013)`), vjust=.5, color="Black", size=3.0)+
  ggtitle("Aged Between 45 to 64 Years") +
  xlab("State") + ylab("Total Death") +theme_minimal()
grid.arrange(x, y, ncol = 2)

From this plot we can see that California, New York, Florida and Illinois have the highest total deaths. This is largely because those states have larger populations, so a high total does not by itself mean a high mortality rate.
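
To see why totals and rates can tell different stories, here is a minimal sketch with invented numbers (the two states and their figures are hypothetical). It only illustrates the usual deaths-per-100,000 arithmetic and is not a reconstruction of how the CDC computed the rates in our file:

```r
# Hypothetical figures, invented for illustration only.
deaths     <- c(big_state = 5000, small_state = 600)
population <- c(big_state = 10000000, small_state = 1000000)

# Deaths per 100,000 residents: the count is scaled by population size.
rate_per_100k <- deaths / population * 100000
rate_per_100k
```

In this made-up case the bigger state has far more deaths in total, but the smaller state has the higher rate.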

Using magrittr and Group By and Summarize

In order to compare total cancer deaths of young and old people per state, we have to use group by.

Cancer_young_Death_Avg<- Cancer_18_to_64_Death  %>% 
         filter(!is.na(`Total Cancer Deaths (18 to 44 Years, 2007-2013)`),!is.na(`Total Cancer Deaths (45 to 64 Years, 2007-2013)`))%>%
                  group_by(`Name of Area`) %>% 
                      summarize(Total_young_Death = sum(`Total Cancer Deaths (18 to 44 Years, 2007-2013)`),
                                Total_Old_Death =sum(`Total Cancer Deaths (45 to 64 Years, 2007-2013)`))
Cancer_young_Death_Avg
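
The same filter, group by and summarize pattern can be tried on a tiny made-up table (all names and values below are hypothetical) to see what each step does:

```r
library(dplyr)

# Hypothetical toy data: area "A" appears twice, area "B" has a missing value.
toy <- data.frame(
  area  = c("A", "A", "B"),
  young = c(10, 5, 7),
  old   = c(100, 50, NA)
)

totals <- toy %>%
  filter(!is.na(young), !is.na(old)) %>%  # drops the row for "B"
  group_by(area) %>%                      # one group per area
  summarize(total_young = sum(young),     # adds up the rows within each group
            total_old   = sum(old))
totals
```

Note that the filter removes a whole row when either value is missing, so an area can disappear from the summary entirely. When every area appears only once, as may be the case in a state-level file, `sum()` simply returns the original value.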

Create Variable using \(mutate\)

Now let’s create a new variable, Young Minority Deaths, where we sum all the deaths of the young minority groups.

# Using mutate
# Note: the "Black or African American Non-Hispanic" column appears to be a subset of
# "Black or African American", so adding both likely counts those deaths twice.
Cancer_young_Death_minority <- Cancer_18_to_64_Death %>% 
   mutate("Young Minority Deaths" = `Total Cancer Deaths (18 to 44 Years Black or African American, 2007-2013)` +
                `Total Cancer Deaths (18 to 44 Years Black or African American Non-Hispanic, 2007-2013)` +
                `Total Cancer Deaths (18 to 44 Years Asian or Pacific Islander, 2007-2013)` +
                `Total Cancer Deaths (18 to 44 Years American Indian or Alaska Native, 2007-2013)`)
names(Cancer_young_Death_minority)
 [1] "Name of Area"                                                                          
 [2] "Total Cancer Deaths (18 to 44 Years, Rate per 100,000 Population, 2007-2013)"          
 [3] "Total Cancer Deaths (45 to 64 Years, Rate per 100,000 Population, 2007-2013)"          
 [4] "Total Cancer Deaths (18 to 44 Years, 2007-2013)"                                       
 [5] "Total Cancer Deaths (18 to 44 Years White, 2007-2013)"                                 
 [6] "Total Cancer Deaths (18 to 44 Years White Non-Hispanic, 2007-2013)"                    
 [7] "Total Cancer Deaths (18 to 44 Years Black or African American, 2007-2013)"             
 [8] "Total Cancer Deaths (18 to 44 Years Black or African American Non-Hispanic, 2007-2013)"
 [9] "Total Cancer Deaths (18 to 44 Years Asian or Pacific Islander, 2007-2013)"             
[10] "Total Cancer Deaths (18 to 44 Years American Indian or Alaska Native, 2007-2013)"      
[11] "Total Cancer Deaths (45 to 64 Years, 2007-2013)"                                       
[12] "Total Cancer Deaths (45 to 64 Years White, 2007-2013)"                                 
[13] "Total Cancer Deaths (45 to 64 Years White Non-Hispanic, 2007-2013)"                    
[14] "Total Cancer Deaths (45 to 64 Years Black or African American, 2007-2013)"             
[15] "Total Cancer Deaths (45 to 64 Years Black or African American Non-Hispanic, 2007-2013)"
[16] "Total Cancer Deaths (45 to 64 Years Asian or Pacific Islander, 2007-2013)"             
[17] "Total Cancer Deaths (45 to 64 Years American Indian or Alaska Native, 2007-2013)"      
[18] "Young Minority Deaths"                                                                 

We can see our newly created variable at the bottom of the list.
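
One thing to keep in mind is that adding columns with `+` gives `NA` whenever any one of them is missing. Below is a small sketch on made-up data (the column names are hypothetical) comparing `+` with `rowSums(..., na.rm = TRUE)`. Treating a missing count as zero is an assumption and may not be appropriate if counts are missing because they were suppressed:

```r
library(dplyr)

# Hypothetical toy data with one missing count for area "B".
toy <- data.frame(
  area   = c("A", "B"),
  black  = c(3, 4),
  asian  = c(2, NA),
  native = c(1, 1)
)

toy <- toy %>%
  mutate(
    # plain addition: any NA in the row makes the whole sum NA
    minority_plus   = black + asian + native,
    # rowSums with na.rm = TRUE skips missing values (assumes missing means zero)
    minority_rowsum = rowSums(cbind(black, asian, native), na.rm = TRUE)
  )
toy
```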

Compare Our New Variable with Other Variables Using Group By and Summarize

Cancer_young_minority_White<- Cancer_young_Death_minority  %>% 
  filter(!is.na(`Total Cancer Deaths (18 to 44 Years White, 2007-2013)`),!is.na(`Young Minority Deaths`))%>%
         group_by(`Name of Area`) %>% 
             summarize(under_44_Minority = sum(`Young Minority Deaths`),Under_44_White =sum(`Total Cancer Deaths (18 to 44 Years White, 2007-2013)`))
head(Cancer_young_minority_White)

Using \(melt\)

We plot the summarized variables by reshaping the data into long format, so we can compare them side by side. This time we will compare minority and white cancer deaths.

library(reshape2)  # melt() comes from reshape2, not tidyr
Cancer_young_minority_White.long<-melt(Cancer_young_minority_White)
Using Name of Area as id variables
ggplot(Cancer_young_minority_White.long,aes(`Name of Area`,value,fill=variable))+
  geom_bar(stat="identity",position="dodge")+coord_flip()+
       ggtitle(" Young Minority VS Young White People Death By Cancer ") +
             xlab("State") + ylab("Total Death") +theme_minimal()
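
The long format is what lets `fill=variable` draw one bar per group. As a sketch of the same reshaping with `tidyr`, `pivot_longer()` can produce a similar long table. The data below are made up, and the `variable` and `value` names are chosen here only to mirror the column names used in the plot above:

```r
library(tidyr)

# Hypothetical wide table: one row per area, one column per group.
wide <- data.frame(
  area     = c("A", "B"),
  minority = c(6, 5),
  white    = c(20, 30)
)

# Every column except `area` is stacked into name/value pairs.
long <- pivot_longer(wide, cols = -area,
                     names_to = "variable", values_to = "value")
long
```

Each area now has one row per group, which is the shape a dodged bar chart needs.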

