Data visualization is an essential skill for data scientists because it helps stakeholders understand ideas and concepts easily. ggplot2 is a powerful but easy-to-use data visualization package, and it is part of the tidyverse collection of packages.
This blog gives a quick introduction to ggplot2 for anyone who is new to R or ggplot2.
Before you start using ggplot2, the tidyverse package needs to be installed. In the RStudio console, enter the following command to install tidyverse.
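The command itself does not appear in the text. The standard way to install a package from CRAN is install.packages, so a minimal sketch would look like the one below. The requireNamespace check is an addition of mine (an assumption, not part of the original post) that simply skips the download when tidyverse is already installed.

```r
# Install the tidyverse collection from CRAN.
# The requireNamespace() guard is an assumed convenience: it skips the
# download if the package is already present on this machine.
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}
```

Installation only has to be done once per machine, while loading the package has to be done in every new R session.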
After installation, tidyverse can be loaded with the following command:
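The command is missing from the text here as well. Packages are normally attached with library, so it was presumably along these lines:

```r
# Attach the tidyverse packages (including ggplot2) for the current session
library(tidyverse)
```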
Before you start using ggplot2, it is important to know its sentence structure, or grammar, which consists of the following four key elements:
Data means the dataset that is going to be visualized in the graph.
Graph Type means the kind of graph that is going to be used. ggplot2 supports many different kinds of graph, including common types such as scatterplot, bar chart, line chart, histogram and boxplot.
Mapping states the relationship between the dataset and the graph, that is, which variables in the dataset are going to be used and how they are going to be visualized.
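Aesthetics means the decorative elements of the graph, like the title, axis labels, font size, font colour, line colour, background colour and annotations. (In ggplot2's own vocabulary, "aesthetics" are the mappings declared with aes(); in this post the word is used in its everyday sense.)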
The tutorial below goes through these four key elements step by step.
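As a rough preview of how the four elements sit together in a single ggplot2 sentence, here is a small sketch. It deliberately uses a histogram of price instead of the scatterplot built in the tutorial, to show that the graph type is just one interchangeable part. The choice of geom_histogram, the bin count of 30 and the axis label are my own illustrative assumptions.

```r
library(tidyverse)

# The four elements from the list above, marked in one call.
# geom_histogram, bins = 30 and the label text are illustrative choices.
p <- ggplot(diamonds) +                  # 1. Data
  geom_histogram(                        # 2. Graph type
    mapping = aes(x = price),            # 3. Mapping
    bins = 30
  ) +
  xlab("Price (USD)")                    # 4. Aesthetics (decoration)

print(p)
```

Swapping geom_histogram for another geom_ function changes the graph type while the data declaration stays the same.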
To plot a graph, we have to declare which dataset we are going to use. In this example, we use the diamonds dataset, which ships with ggplot2 and is available once tidyverse is loaded. Try running the following code.
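The code for this step is not shown in the text. Declaring only the dataset is done by passing it to ggplot, so a minimal sketch would be the following (storing the plot in a variable p is my own choice, made so the object can be inspected afterwards):

```r
library(tidyverse)

# Declare the dataset only: no graph type and no mapping yet
p <- ggplot(diamonds)

# Printing the object draws it; at this stage there are no layers to show
print(p)
```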
A blank grey panel is shown, because we have only declared which dataset to use; details such as the graph type and the mapping are still missing. R does not yet know which graph type to draw or which variables to display.
Now we add the graph type and the mapping information. A scatterplot is used in this tutorial. To draw a scatterplot, add geom_point and pass in the mapping. We put carat on the x-axis and the price of the diamonds on the y-axis.
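The code for this step is missing from the text. Based on the description and on the later samples in this post, it would look roughly like this (the variable p is my own addition):

```r
library(tidyverse)

# Scatterplot layer: carat on the x-axis, price on the y-axis
p <- ggplot(diamonds) +
  geom_point(mapping = aes(x = carat, y = price))

print(p)
```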
This chart shows the relationship between the carat and the price of the diamonds: as carat increases, price tends to be higher.
We would also like to show the color grade of each diamond in the chart. To do that, add color to the mapping.
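The code is again not shown at this point. Judging from the samples later in the post, the color variable of diamonds is mapped inside aes, roughly as follows (the variable p is my own addition):

```r
library(tidyverse)

# Same scatterplot, with each point colored by the diamond's color grade
p <- ggplot(diamonds) +
  geom_point(mapping = aes(x = carat, y = price, color = color))

print(p)
```

Because color sits inside aes, ggplot2 picks a different color for each grade and adds a legend automatically.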
After adding the color mapping, the color grade of every diamond is shown in the chart. It is easy to see that diamonds with a smaller carat usually have a better color grade.
After showing the important information in the graph, it is time to do some decorative work on it, which helps to present the information more clearly. In ggplot2, many layout modifications can be made, from changing the chart title and background color to adding annotations. The following few steps introduce some commonly used functions.
To change the labels of the x-axis and y-axis, add xlab and ylab to the ggplot2 sentence, as in the sample below. xlab sets the label of the x-axis, while ylab sets the label of the y-axis.
ggplot(diamonds) +
  geom_point(mapping = aes(x = carat, y = price, color = color)) +
  xlab("Carat") +
  ylab("Price(USD)")
Sometimes, when plotting a graph with ggplot2, the y-axis does not show the scale clearly. In the chart above, it is hard to read the price range of the diamonds, because the largest labelled value on the y-axis is 15000 while some prices are higher than 15000. It is better to set the range of the y-axis with ylim.
ggplot(diamonds) +
  geom_point(mapping = aes(x = carat, y = price, color = color)) +
  xlab("Carat") +
  ylab("Price(USD)") +
  ylim(0, 20000)
Now the y-axis runs from 0 to 20000.
Finally, it is important to give the chart a title, which gives the reader a rough idea of what information the chart illustrates. Adding ggtitle gives the graph a title.
ggplot(diamonds) +
  geom_point(mapping = aes(x = carat, y = price, color = color)) +
  xlab("Carat") +
  ylab("Price(USD)") +
  ylim(0, 20000) +
  ggtitle("Relationship between Carat and Price of Diamonds")
This introduction ends here. I hope you now have a basic idea of how to visualize data in R after following the tutorial above. Thank you for reading the whole introduction.