Introduction to ggplot2


“ggplot2” is a plotting package that creates complex plots from data in a data frame. The “gg” stands for grammar of graphics. According to ggplot2, any graph can be divided into 3 parts:

1. Data, which is the data frame.

2. Aesthetics, which refer to the x and y variables, colors, point sizes, shapes, etc.

3. Geometry, which is the type of graph, such as a bar graph or a line graph.

Learn More: “ggplot2” was conceived and implemented by Hadley Wickham, building on Leland Wilkinson's Grammar of Graphics. Its principles are described in his book ggplot2: Elegant Graphics for Data Analysis.


Installation and Loading of ggplot2
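
A minimal sketch of the usual setup, assuming the package is installed from CRAN. The install line is commented out because it only needs to run once per machine and requires an internet connection.

```r
# Install the package once (uncomment on first use)
# install.packages("ggplot2")

# Load the package in every new R session before plotting
library(ggplot2)
```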



How does ggplot2 work?


First, we need to tell ggplot what dataset to use. This is done with a call to ggplot(df), where df is a data frame that contains all the variables needed to make the plot.

Next, we add whatever aesthetics we want to apply to the plot using the aes() function, which maps variables to the x and y axes, color, size, shape, etc.

Then, we can add ‘geoms’, which are graphical representations of the data in the plot (points, lines, bars). ggplot2 offers many different geoms, such as:

1. geom_point() for scatter plots, dot plots, etc.

2. geom_boxplot() for, well, boxplots

3. geom_line() for trend lines, time series, etc.
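
The three steps above can be sketched as follows. The data frame df and its columns height and weight are hypothetical, made up only for this illustration.

```r
library(ggplot2)

# Hypothetical toy data frame, invented for this illustration only
df <- data.frame(
  height = c(150, 160, 170, 180, 190),
  weight = c(52, 58, 69, 77, 88)
)

# Step 1: ggplot(df) sets the data
# Step 2: aes() maps columns to the x and y axes
# Step 3: geom_point() draws one point per row
p <- ggplot(df, aes(x = height, y = weight)) +
  geom_point()

print(p)  # draws the scatter plot
```

Swapping geom_point() for geom_line() would draw the same data as a line, without changing the data or the aesthetics.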


Example


As an example we will use the diamonds dataset which is found in ggplot2.
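
Before plotting, it can help to look at the data. A short sketch, using only base R inspection functions:

```r
library(ggplot2)

# diamonds ships with ggplot2, so loading the package makes it available
dim(diamonds)    # number of rows and columns
head(diamonds)   # first six rows
str(diamonds)    # column names and types (carat, cut, color, clarity, price, ...)
```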


How to make a simple Histogram


A histogram is mainly used when we want to look at just one dimension of our data and observe its distribution.

To plot a histogram with ggplot2, we replace geom_point() with geom_histogram(). A histogram needs only an x aesthetic, since the counts on the y axis are computed for us.
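
A minimal sketch of such a histogram. The choice of price as the variable is an assumption made for this illustration, and the name p_hist is arbitrary.

```r
library(ggplot2)

# Assumption: price is the variable whose distribution we want to see
p_hist <- ggplot(diamonds, aes(x = price)) +
  geom_histogram()  # no binwidth given, so ggplot2 picks a default number of bins

print(p_hist)
```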

Notice that the binwidth argument sets the width of each bin, in the units of the x variable: a larger value gives fewer, wider bars, and a smaller value gives more, narrower ones. For example, here we make the bins wider by setting binwidth to 4000.
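
A sketch with the bin width set explicitly, again assuming price is the plotted variable (a width of 4000 is then measured in the same units as price):

```r
library(ggplot2)

# Each bar now covers a price range of 4000
p_wide <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 4000)

print(p_wide)
```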

As in scatter plots, we can add aesthetics to histograms. For instance, suppose we want to make a stacked histogram based on the cut. To accomplish this, we map the fill aesthetic to cut inside aes().

Note that mapping fill to cut draws each cut in a different color, so we can compare how the cuts are distributed.
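
A sketch of the stacked histogram, assuming price on the x axis and a bin width of 4000 as an illustrative value:

```r
library(ggplot2)

# fill = cut colors the bars by cut; geom_histogram() stacks the groups by default
p_stack <- ggplot(diamonds, aes(x = price, fill = cut)) +
  geom_histogram(binwidth = 4000)

print(p_stack)
```

ggplot2 adds a legend for cut automatically, because fill is mapped inside aes().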


Discussion


As we have seen, ggplot2 makes it straightforward to plot different kinds of graphs. In addition to the functions covered above, other geoms can be used to plot other kinds of graphs. For instance, there is geom_violin() for violin plots, geom_dotplot() for dot plots, geom_jitter() for strip charts, geom_line() for line plots, and geom_bar() for bar plots.
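
As a brief illustration of two of these geoms on the diamonds data (the variable choices are assumptions made for this sketch):

```r
library(ggplot2)

# Bar plot: number of diamonds in each cut category
p_bar <- ggplot(diamonds, aes(x = cut)) +
  geom_bar()

# Violin plot: distribution of price within each cut
p_violin <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_violin()

print(p_bar)
print(p_violin)
```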


Other Resources


For further resources and information about ggplot2, see the following:

The R Graph Gallery, ggplot2

Quick R by DataCamp, Graphics with ggplot2

STHDA

ggplot2-Part of the Tidyverse


References


Tutorial Gateway, retrieved from https://www.tutorialgateway.org/r-ggplot2-scatter-plot/ on June 24, 2020.

R-statistics.co, retrieved from http://r-statistics.co/ggplot2-Tutorial-With-R.html on June 24, 2020.