Introduction to Data Visualization with ggplot2

Ming

2020-03-31

Introduction

Data visualization is an essential skill for data scientists because it helps stakeholders understand ideas and concepts easily. ggplot2 is a powerful yet easy-to-use data visualization library, and it is part of the tidyverse collection of packages.

This blog post gives a quick introduction to ggplot2 for anyone who is new to R or ggplot2.

Installation

Before you start using ggplot2, the tidyverse package needs to be installed. In the RStudio console, enter the following command to install tidyverse.

install.packages("tidyverse")

Import Library

After installation, the tidyverse library can be loaded with the following command:

library(tidyverse)

Basic structure of ggplot2

Before you start using ggplot2, it is important to know its sentence structure, or grammar, which consists of the following four key elements:

Data

Data means the dataset that is going to be visualized in the graph.

Graph Type

Graph type means the kind of graph that is going to be used. ggplot2 supports many different kinds of graphs, including common types such as scatterplots, bar charts, line charts, histograms and boxplots.

Mapping

Mapping states the relationship between the dataset and the graph: which variables in the dataset are going to be used and how they are going to be visualized.

Aesthetics

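Aesthetics here means the decorative elements of the graph, such as the title, axis labels, font size, font colour, line colour, background colour and annotations. (In ggplot2 code, the aes function is what specifies the mapping described above, not these decorative elements.)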

The tutorial below goes through these four key elements step by step.

Learn ggplot2 step by step

Declare Data

To plot a graph, we have to declare which dataset we are going to use. In this example, we are going to use diamonds, a sample dataset that is available once tidyverse is loaded. Declaring the data means passing the dataset to the ggplot function, with no graph type or mapping yet.

Running this shows only a grey box, because we have declared nothing but the dataset. Details such as the graph type and the mapping are missing, so R does not know which kind of graph to draw or which variables to display.
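A minimal sketch of this first step is shown below. It loads ggplot2 directly, which library(tidyverse) also attaches, and the variable name p is only an illustrative choice.

```r
library(ggplot2)  # also attached by library(tidyverse)

# Declare only the dataset: no graph type and no mapping yet
p <- ggplot(data = diamonds)

# Display the plot (an empty grey panel at this stage)
p
```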

Selecting a graph type and adding a mapping

Now we are going to add the graph type and the mapping. A scatterplot will be used in this tutorial. To draw a scatterplot, use geom_point and supply the mapping. We are going to put carat on the x-axis and the price of the diamonds on the y-axis.

This chart shows the relationship between the carat and the price of the diamonds: the larger the carat, the higher the price tends to be.
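One way to write this step is sketched below, assuming the mapping is given inside geom_point through aes. The variable name p is again just illustrative.

```r
library(ggplot2)  # also attached by library(tidyverse)

# Scatterplot: carat on the x-axis, price on the y-axis
p <- ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price))

p
```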

Adding more information to the mapping

We would also like to indicate the color grade of each diamond in the chart. To do this, color can be added to the mapping.

After adding the color mapping, the color grade of every diamond is shown in the chart. It is easy to see that a diamond with a smaller carat usually has a better color grade.
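As a sketch of this step, the color column of diamonds can be passed to the color argument of aes, in addition to x and y. ggplot2 then picks the point colors and draws a legend on its own.

```r
library(ggplot2)  # also attached by library(tidyverse)

# Same scatterplot, with each point colored by the diamond's color grade
p <- ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price, color = color))

p
```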

Enhancing the layout of the graph

After showing the important information in the graph, it is time to do some decorative work on it, which helps to present the information more clearly. In ggplot2, many layout modifications can be made, from changing the chart title and background color to adding annotations. The following few steps introduce some commonly used functions.

Changing the axis labels

To change the labels of the x-axis and y-axis, add xlab and ylab to the ggplot2 sentence. xlab updates the label of the x-axis, while ylab changes the label of the y-axis.
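A short sketch of adding both labels to the scatterplot follows. The label texts "Carat" and "Price (US dollars)" are illustrative choices, not taken from the original post.

```r
library(ggplot2)  # also attached by library(tidyverse)

p <- ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price, color = color)) +
  xlab("Carat") +               # label for the x-axis (illustrative text)
  ylab("Price (US dollars)")    # label for the y-axis (illustrative text)

p
```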

Changing the range of an axis

Sometimes, when plotting a graph with ggplot2, the y-axis does not show the scale clearly. In the scatterplot so far, it is hard to tell the price range of the diamonds, because the largest labelled value on the y-axis is 15000 while some prices are larger than 15000. It is better to change the range of the y-axis by using ylim.

Now the y-axis runs from 0 to 20000.
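A sketch of this step, using ylim with a lower limit of 0 and an upper limit of 20000:

```r
library(ggplot2)  # also attached by library(tidyverse)

p <- ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price, color = color)) +
  ylim(0, 20000)  # y-axis now runs from 0 to 20000

p
```

Note that ylim drops any points that fall outside the given range. Every price in diamonds is below 20000, so no points are lost here.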

Giving the graph a title

Finally, it is important to give the chart a title, which gives the reader a rough idea of what information the chart is illustrating. Adding ggtitle gives the graph a title.

This is the end of the introduction. I hope that you now have a basic idea of how to visualize data in R after following the tutorial above. Thank you for reading.
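Putting the pieces together, a sketch of the finished chart might look like the following. The title and label texts are illustrative and not from the original post.

```r
library(ggplot2)  # also attached by library(tidyverse)

p <- ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price, color = color)) +
  xlab("Carat") +                                  # illustrative label
  ylab("Price (US dollars)") +                     # illustrative label
  ylim(0, 20000) +
  ggtitle("Diamond price by carat and color")      # illustrative title

p
```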