When I am working on data with a binary response, I like to use the GGally package, mainly for its ggpairs function. It lets you look at many variables and their pairwise relationships at the same time, but the setup can be a little daunting. My hope is that this example, which uses homework data we all recognize, will make you more comfortable with what’s going on.
Below, I demonstrate its use on a data set that everyone in class is familiar with, HW3’s crime-training-data.csv:
library(tidyverse)
library(GGally)

# read in data, add a tag column for coloring, and relabel chas
crimes <- read.csv("crime-training-data.csv", header = TRUE) %>%
  mutate(tag = ifelse(target == 1, 'a', 'b'),    # tag for colors
         chas = ifelse(chas == 1, 'bo', 'no')) %>%
  select(-black)

# crimes, without tag; the chas label tested here must match the one set above
cr_exp <- crimes %>%
  select(-tag) %>%
  mutate(chas = ifelse(chas == 'bo', 1, 0))

# summary statistics for the numeric version of the data
stat_info <- psych::describe(cr_exp) %>%
  as.data.frame()

pm <- ggpairs(crimes,
              columns = c('rm', 'medv', 'lstat', 'dis',
                          'age', 'nox', 'indus', 'tax',
                          'ptratio', 'rad', 'zn', 'chas'),
              mapping = ggplot2::aes(color = tag),    # color every panel by the binary response
              lower = list(continuous = wrap('points', size = 1, alpha = .4),
                           combo = 'facetdensity'),
              upper = list(continuous = wrap('cor', size = 3, alpha = 1),
                           combo = 'box_no_facet'),
              diag = list(continuous = wrap('densityDiag', alpha = .6))) +
  theme(panel.background = element_rect(fill = 'grey92', color = NA),
        panel.spacing = unit(1, "pt"),
        panel.grid = element_line(color = 'white'),
        strip.background = element_rect(fill = 'grey85', colour = NA),
        plot.margin = margin(.1, .1, .1, .1, "cm"))
pm
My main gripe: it can take a while to run. The above took almost a minute to process, which is a long time when you’re making tweaks and want to review the changes quickly.
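One thing that may help while you are still tweaking, offered as a rough sketch rather than a guaranteed fix: draft the plot on a random subset of rows and only a few columns, then switch back to the full data for the final version. The number of panels grows with the square of the number of columns (12 columns means 144 panels), so trimming columns should matter most. The helper `sample_rows` below is a hypothetical name of my own, not part of GGally, and the default of 150 rows and the seed are arbitrary choices.

```r
# Hypothetical helper (not part of GGally): return a random subset of rows
# so the pairs plot has less to draw while you iterate on styling.
sample_rows <- function(df, n = 150, seed = 621) {
  set.seed(seed)            # same draft sample every run (this resets the global RNG seed)
  n <- min(n, nrow(df))     # never ask for more rows than the data has
  df[sample(nrow(df), n), , drop = FALSE]
}

# Usage, assuming `crimes` was built as above and GGally is loaded:
# draft <- sample_rows(crimes, n = 150)
# system.time(print(
#   ggpairs(draft,
#           columns = c('rm', 'medv', 'lstat', 'chas'),   # a few columns only
#           mapping = ggplot2::aes(color = tag))
# ))
```

Wrapping the call in print() inside system.time() is deliberate: most of the work happens when the plot is drawn, not when the ggpairs object is created, so timing the assignment alone would understate the wait.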