Visualization is one of the most important parts of any data analysis. Without a suitable plot, our data might fail to tell the entire story. And if you are new to data science or analytics, you will almost certainly wonder, sooner or later, which plot to use for which kind of variable. This post walks through the most common cases.

In most data analyses we deal with a mix of categorical and numeric variables from the source data. dplyr and ggplot2 are the R packages we will use for data manipulation and graphics respectively, and we will use the dataset “mpg”, which comes with the ggplot2 package.

Read in the data and load the required packages.

“mpg” contains fuel economy data from 1999 and 2008 for 38 popular models of cars. First we will load the required packages and the dataset in our R workspace (RStudio).

library(ggplot2)
library(dplyr)
data(mpg)

To get a quick overview of our (or for that matter any) dataset, use str() or glimpse():

str(mpg)
## Classes 'tbl_df', 'tbl' and 'data.frame':    234 obs. of  11 variables:
##  $ manufacturer: chr  "audi" "audi" "audi" "audi" ...
##  $ model       : chr  "a4" "a4" "a4" "a4" ...
##  $ displ       : num  1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
##  $ year        : int  1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
##  $ cyl         : int  4 4 4 4 6 6 6 4 4 4 ...
##  $ trans       : chr  "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
##  $ drv         : chr  "f" "f" "f" "f" ...
##  $ cty         : int  18 21 20 21 16 18 18 18 16 20 ...
##  $ hwy         : int  29 29 31 30 26 26 27 26 25 28 ...
##  $ fl          : chr  "p" "p" "p" "p" ...
##  $ class       : chr  "compact" "compact" "compact" "compact" ...
glimpse(mpg)
## Observations: 234
## Variables: 11
## $ manufacturer (chr) "audi", "audi", "audi", "audi", "audi", "audi", "...
## $ model        (chr) "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 qua...
## $ displ        (dbl) 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0,...
## $ year         (int) 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1...
## $ cyl          (int) 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6...
## $ trans        (chr) "auto(l5)", "manual(m5)", "manual(m6)", "auto(av)...
## $ drv          (chr) "f", "f", "f", "f", "f", "f", "f", "4", "4", "4",...
## $ cty          (int) 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 1...
## $ hwy          (int) 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 2...
## $ fl           (chr) "p", "p", "p", "p", "p", "p", "p", "p", "p", "p",...
## $ class        (chr) "compact", "compact", "compact", "compact", "comp...
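
Since the choice of plot depends on whether a variable is numeric or categorical, it can help to list the columns by type first. The following is a minimal sketch; the names `col_types`, `numeric_vars` and `categorical_vars` are only illustrative and are not part of either package.

```r
library(ggplot2)  # provides the mpg dataset

# Class of every column in mpg
col_types <- sapply(mpg, class)

# Integer and double columns are the numeric candidates,
# character columns are the categorical candidates
numeric_vars     <- names(col_types)[col_types %in% c("integer", "numeric")]
categorical_vars <- names(col_types)[col_types == "character"]

numeric_vars
categorical_vars
```

Note that year and cyl are stored as integers but take only a few distinct values, so depending on the question they can also be treated as categorical.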

Remember that the following examples are intended to explain the general principles. You might need to adapt them to fit any additional requirements.

To visualize one numeric variable:

Use dotplot or histogram.

##Dot plot
ggplot(mpg, aes(cty)) +
geom_dotplot()

OR

##Histogram
ggplot(mpg, aes(cty)) +
geom_histogram(binwidth = 2)

To visualize two numeric variables:

Use scatterplot.

ggplot(mpg, aes(cty,hwy)) +
geom_point()
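
If the relationship between the two variables looks roughly linear, a fitted line can make it easier to read. The sketch below is one possible extension, not the only way to do it: it adds a straight-line fit with geom_smooth() and computes the correlation as a numeric check. The name `p_trend` is just an illustrative variable.

```r
library(ggplot2)

# Scatterplot with a linear trend line on top
p_trend <- ggplot(mpg, aes(cty, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)  # straight-line fit, no confidence band

p_trend

# Pearson correlation between city and highway mileage
cty_hwy_cor <- cor(mpg$cty, mpg$hwy)
cty_hwy_cor
```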

To visualize one categorical variable:

Use bar graph.

ggplot(mpg, aes(drv)) +
geom_bar()
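
geom_bar() counts the rows in each category for us. If you also want the numbers behind the bars, the same counts can be computed with dplyr. This is a small sketch; `drv_counts` is only an illustrative name.

```r
library(ggplot2)
library(dplyr)

# One row per drive type, with the number of cars in column n
drv_counts <- mpg %>% count(drv)

drv_counts
```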

To visualize one categorical and one numeric variable:

Summarise the numeric variable within each category (here, the mean) and plot the summary as a bar graph across categories.

mpg %>% group_by(manufacturer) %>% summarise(avg_cty_mileage = mean(cty)) %>%
  ggplot(aes(x = manufacturer, y = avg_cty_mileage)) +
  geom_bar(stat = "identity")

## To flip the coordinates (horizontal bars)
mpg %>% group_by(manufacturer) %>% summarise(avg_cty_mileage = mean(cty)) %>%
  ggplot(aes(x = manufacturer, y = avg_cty_mileage)) +
  geom_bar(stat = "identity") +
  coord_flip()
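
A bar of averages hides how much the values vary within each category. As an alternative (a sketch, not a replacement for the bar graph above), a box plot shows the spread of the numeric variable per category without any prior summarising. `p_box` is only an illustrative name.

```r
library(ggplot2)

# Distribution of city mileage for every manufacturer
p_box <- ggplot(mpg, aes(x = manufacturer, y = cty)) +
  geom_boxplot() +
  coord_flip()  # horizontal boxes keep the manufacturer names readable

p_box
```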

To visualize two categorical variables:

Use bar graph.

ggplot(mpg, aes(class, fill = drv)) +
geom_bar()

OR

ggplot(mpg, aes(class, fill = drv)) +
geom_bar(position = "stack")

## To normalize the height
ggplot(mpg, aes(class, fill = drv)) +
geom_bar(position = "fill")

## Side by side
ggplot(mpg, aes(class, fill = drv)) +
geom_bar(position = "dodge")
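
The normalized ("fill") version plots the share of each drive type within a class. To see the numbers behind it, the same shares can be computed with dplyr. This is a minimal sketch; the names `class_drv` and `prop` are only illustrative.

```r
library(ggplot2)
library(dplyr)

class_drv <- mpg %>%
  count(class, drv) %>%          # one row per class/drv combination
  group_by(class) %>%
  mutate(prop = n / sum(n)) %>%  # share of each drive type within its class
  ungroup()

class_drv
```

Within every class the prop values add up to 1, which is why all bars have the same height when position = "fill" is used.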

To visualize three categorical variables:

Use faceting with bar graphs.

ggplot(mpg, aes(drv, fill = class)) + geom_bar() +
facet_grid(~fl , labeller = label_parsed)

OR

ggplot(mpg, aes(drv, fill = class)) + geom_bar() +
facet_wrap(~fl , ncol = 2)

To visualize three numeric variables:

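Use faceting with a scatterplot. Two variables go on the axes and the third (here year, which takes only two values) splits the plot into panels.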

ggplot(mpg, aes(cty, hwy)) + geom_point() +
facet_grid(year~.)
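
Faceting works well when the third variable has only a few distinct values. When it is truly continuous, one option (a sketch of an alternative, not the only approach) is to map it to colour instead. Here engine displacement, displ, is used as the third variable, and `p_colour` is only an illustrative name.

```r
library(ggplot2)

# City vs highway mileage, with engine displacement shown as point colour
p_colour <- ggplot(mpg, aes(cty, hwy, colour = displ)) +
  geom_point()

p_colour
```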

Bonus points:

To add or change the title and the X and Y axis labels of your plot:

ggplot(mpg, aes(cty,hwy)) +
geom_point() +
ggtitle("Mileage comparison") +
xlab("city") +
ylab("highway")
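
The title and both axis labels can also be set in a single call with labs(). The sketch below assumes that cty and hwy hold city and highway mileage in miles per gallon; the label texts and the name `p_labs` are only illustrative.

```r
library(ggplot2)

p_labs <- ggplot(mpg, aes(cty, hwy)) +
  geom_point() +
  labs(
    title = "Mileage comparison",
    x = "City mileage (mpg)",
    y = "Highway mileage (mpg)"
  )

p_labs
```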

To visualize points which overlap each other in a scatterplot:

Use alpha (transparency) and jitter.

## see through points
ggplot(mpg, aes(cty,hwy)) +
geom_point(alpha = .2)

## jittering.
ggplot(mpg, aes(cty,hwy)) +
geom_point(alpha = .4, position = position_jitter(width= .1,height = .1))
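
To check how much overplotting there really is, we can count how many cars share each cty/hwy combination. The sketch below also uses geom_jitter(), which is a shorthand for geom_point() with position_jitter(). The names `overlap`, `n_cars` and `p_jitter` are only illustrative.

```r
library(ggplot2)
library(dplyr)

# Number of cars for every distinct (cty, hwy) pair
overlap <- mpg %>% count(cty, hwy, name = "n_cars")

nrow(overlap)  # distinct positions on the plot
nrow(mpg)      # total number of cars

# Shorthand for geom_point(position = position_jitter(...))
p_jitter <- ggplot(mpg, aes(cty, hwy)) +
  geom_jitter(width = 0.1, height = 0.1, alpha = 0.4)

p_jitter
```

If the first number is much smaller than the second, many points sit exactly on top of each other and transparency or jittering is worth using.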

To change the background of the plot:

Use a theme.

## removes the grey background.
ggplot(mpg, aes(cty,hwy)) +
geom_point(alpha = .2) +
theme_bw()

## use a minimal theme.
ggplot(mpg, aes(cty,hwy)) +
geom_point(alpha = .2) +
theme_minimal()
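
A complete theme can be adjusted further with theme(). The following sketch starts from theme_bw() and changes two individual elements; which elements to change is purely a matter of taste, and `p_theme` is only an illustrative name.

```r
library(ggplot2)

p_theme <- ggplot(mpg, aes(cty, hwy)) +
  geom_point(alpha = .2) +
  ggtitle("Mileage comparison") +
  theme_bw() +
  theme(
    panel.grid.minor = element_blank(),        # drop the minor grid lines
    plot.title = element_text(face = "bold")   # bold plot title
  )

p_theme
```

The call to theme() has to come after theme_bw(), because adding a complete theme later would overwrite the individual settings.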