For this homework assignment I built a heatmap that shows the value of each variable for each state in the “USArrests” data set. The “USArrests” data set is built into R. It contains 1973 arrest rates per 100,000 residents for each of the 50 US states for three crimes (Murder, Assault and Rape), along with the percentage of each state's population living in urban areas (UrbanPop).
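You can check the structure described above by inspecting the data frame directly. This is a small sketch using only base R. The comments describe what each call reports.

```r
# Inspect the built-in USArrests data frame (part of R's datasets package)
data("USArrests")

dim(USArrests)                # 50 rows (states) by 4 columns (variables)
names(USArrests)              # "Murder" "Assault" "UrbanPop" "Rape"
head(rownames(USArrests), 3)  # state names are stored as row names
str(USArrests)                # column types: Murder and Rape numeric, Assault and UrbanPop integer
```

Because the state names are row names rather than a column, every column is numeric. This is what makes the later conversion to a matrix straightforward.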
To create the heatmap, I first converted the data frame into a numeric matrix with data.matrix(), because heatmap.2() expects a numeric matrix rather than a data frame. The matrix keeps one row per state (the state names become row names) and one column per variable, which maps directly onto the grid of cells in the heatmap. I used colorRampPalette() to build a red, white and green color palette.
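The conversion and palette steps can be checked on their own. This sketch uses base R only, and the checks are illustrative rather than part of the original analysis. It shows that data.matrix() returns a numeric matrix that keeps the state names, and that colorRampPalette() returns a function that produces n colors when called.

```r
data("USArrests")

# Convert the data frame to a numeric matrix (integer columns become doubles)
USArrests2 <- data.matrix(USArrests)

class(USArrests)            # "data.frame"
class(USArrests2)           # "matrix" "array" (R >= 4.0)
is.numeric(USArrests2)      # TRUE
rownames(USArrests2)[1:3]   # state names survive as row names and label the heatmap rows

# colorRampPalette() returns a *function*; calling it with n gives n colors
colfunc <- colorRampPalette(c("red", "white", "green"))
colfunc(3)  # "#FF0000" "#FFFFFF" "#00FF00"
colfunc(5)  # the three anchor colors plus two interpolated shades between them
```

heatmap.2() accepts either a vector of colors or a palette function like colfunc for its col argument.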
From this visualization we can see that assault is by far the most common of the three offenses recorded in the data set. UrbanPop is not a crime rate. It is the percentage of each state's population that lives in urban areas, and most states have an urban population greater than 50%. Murder and rape are the least common arrests. Rape is the more common of the two, averaging around 21 arrests per 100,000 residents, while murder averages closer to 8 arrests per 100,000. Mid-Western states seem to have the lowest crime rates out of all the regions. Urban population and crime rates don't seem to be strongly correlated.
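The averages and the share of mostly urban states can be computed directly rather than read off the colors. This is a base-R sketch. The Pearson correlation at the end is one simple way to check the last claim. It is not something the heatmap itself computes.

```r
data("USArrests")

# Mean arrests per 100,000 residents (and mean % urban) across the 50 states
round(colMeans(USArrests), 1)
# Murder 7.8, Assault 170.8, UrbanPop 65.5, Rape 21.2

# Number of states whose population is more than 50% urban
sum(USArrests$UrbanPop > 50)

# Pearson correlation of UrbanPop with each arrest rate (a 1 x 3 matrix)
round(cor(USArrests$UrbanPop, USArrests[, c("Murder", "Assault", "Rape")]), 2)
```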
data("USArrests")
# heatmap.2() needs a numeric matrix, so convert the data frame
USArrests2 <- data.matrix(USArrests)
# Visualize relationships with heatmap
library(gplots)
# Three-color palette: low values red, middle values white, high values green
colfunc <- colorRampPalette(c("red", "white", "green"))
heatmap.2(USArrests2,
          cellnote = USArrests2,   # print each cell's value as its label
          main = "USArrests by state",
          notecol = "black",       # make cell labels black
          trace = "none",          # turn off trace lines inside the heat map
          margins = c(10, 6),      # set margins around plot
          col = colfunc,           # use the color palette defined above
          dendrogram = "none",     # no dendrogram in this plot
          Colv = FALSE)            # turn off column clustering/reordering
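heatmap.2() colors the raw values by default (its scale argument defaults to "none"). Assault values run into the hundreds, while Murder stays below about 20, so Assault takes up most of the color range and the other columns sit near its low end. One possible variation, which is not what was done above, is to standardize each column first so the colors compare states within each variable. heatmap.2(..., scale = "column") does this internally. The sketch below does it explicitly with scale() so the result can be checked. It assumes the gplots package is installed and skips the plot otherwise.

```r
data("USArrests")
USArrests2 <- data.matrix(USArrests)

# Standardize each column to mean 0, sd 1 (z-scores) so every variable
# spans a comparable color range instead of Assault dominating the scale
USArrests_z <- scale(USArrests2)

# Each column now has (numerically) zero mean and unit standard deviation
round(colMeans(USArrests_z), 10)
round(apply(USArrests_z, 2, sd), 10)

colfunc <- colorRampPalette(c("red", "white", "green"))

# Draw only if gplots is available; cell labels still show the raw values
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(USArrests_z,
                    cellnote = USArrests2,   # label cells with the original values
                    notecol = "black",
                    main = "USArrests (column z-scores)",
                    trace = "none",
                    margins = c(10, 6),
                    col = colfunc,
                    dendrogram = "none",
                    Colv = FALSE)
}
```

With z-scores, white marks a state near the national average for that variable. Red and green mark states below and above it, whatever the variable's units.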