Introduction to ggplot2

“ggplot2” is an R package used to visualize data. It can significantly improve the quality and aesthetics of your graphics. It also lets you build a wide range of chart types, since it breaks plots into components that help you create informative and neat graphs with rather simple and readable code. The “gg” in ggplot2 stands for “grammar of graphics”, and the package was created by Hadley Wickham.


ggplot2 Plot Basics

There are a lot of functions and arguments that can be used in ggplot2, so the ggplot2 cheat sheet is a very useful reference. You can view it here


Installing and Loading

We will start by installing the required package with install.packages() and loading it with library(). library() is the command used to load a package; its name refers to the library, the place where installed packages are stored.

Note: “ggplot2” is a core tidyverse package, so you can also get it by installing and loading the “tidyverse” package.
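
A minimal sketch of the install-and-load step, assuming a CRAN mirror is reachable. The requireNamespace() guard is an addition here, so the package is only downloaded when it is missing:

```r
# Install ggplot2 only if it is not already available (needs internet access).
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Attach the package so its functions can be called without the ggplot2:: prefix.
library(ggplot2)

# Alternative: the tidyverse meta-package attaches ggplot2 with its other core packages.
# install.packages("tidyverse")
# library(tidyverse)
```

install.packages() is needed once per machine, while library() has to be called in every new R session.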


Practice dataset for ggplot2

We will start by understanding more about our dataset. The diamonds dataset contains the prices and other attributes of almost 54,000 diamonds (53,940 rows).


Load the dataset with the data() function
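
The loading step could look roughly like this. The dim() and str() calls are extra checks added for illustration, not part of the original walkthrough:

```r
library(ggplot2)

# diamonds ships with ggplot2; data() copies it into the workspace.
data("diamonds", package = "ggplot2")

# Quick look at the number of rows and columns, and at the column types.
dim(diamonds)
str(diamonds)
```

Running ?diamonds opens the help page that documents each variable.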

View Variable Names
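
One way to list the variable names, which should print the output that follows:

```r
library(ggplot2)

# Column names of the diamonds data frame.
variable_names <- names(diamonds)
variable_names
```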

##  [1] "carat"   "cut"     "color"   "clarity" "depth"   "table"   "price"  
##  [8] "x"       "y"       "z"


Basic Plotting Functions

Essential foundations and layers of a plot

  1. ggplot()
  • All ggplot2 plots begin with a call to ggplot(), which tells R that we want to create a ggplot object.
  2. aes()
  • Aesthetic mappings specify which variables go on the X and Y axes, as well as any other data we want represented in our plot (also called the aesthetics layer).
  • aes() can map variables from our data to aesthetics such as:
    • x=
    • y=
    • color=
    • fill=
  3. geom_()
  • A geom_ layer (geometric object) tells the plot how you want to display your data. A geom is just the type of plot, that is, the geometric object that represents the data.

Note: Use the addition operator (+) to connect ggplot2 functions.
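
As a rough sketch of how the three pieces fit together (the object names base and p are made up for this illustration), a plot can be built up one step at a time:

```r
library(ggplot2)

# 1. ggplot() creates the plot object; 2. aes() maps columns to the axes.
base <- ggplot(data = diamonds, mapping = aes(x = carat, y = price))

# 3. A geom_ layer, connected with +, says how to draw the data.
p <- base + geom_point()

length(base$layers)  # no layers yet, so printing base shows only empty axes
length(p$layers)     # one layer after adding geom_point()
```

Keeping the base object separate makes it easy to try different geoms on the same mapping.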


Diamonds dataset Plot

Basic Scatter Plot

We will begin with basic plotting to understand the relationship between carat and price of diamonds.

The geom_point() function creates a scatterplot that is useful for displaying the relationship between carat and price.
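
The original code chunk is not shown on this page; a minimal sketch of such a scatter plot, with scatter_plot as an illustrative object name, might be:

```r
library(ggplot2)

# Map carat to the X-axis and price to the Y-axis, then draw one point per diamond.
scatter_plot <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point()

scatter_plot  # printing the object draws the plot
```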

As displayed in the graph, the higher the carat weight, the higher the price tends to be.


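Amendments to the geom_ Function

Let’s make our scatter plot look better!

  1. We added the alpha= argument to the geom_point() function to control the transparency of the points. Alpha ranges from 0 (fully transparent) to 1 (fully opaque), so the lower the alpha, the more transparent the points are.

  2. We also added the color= argument to the geom_point() function to customize the color of the points.

Note: We typically understand aesthetics as how something looks: color, size, etc. But in ggplot2, an aesthetic is a visual property mapped to a variable inside aes(). A fixed look set directly in the geom, such as alpha= or color= here, is just an attribute.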
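
A sketch of these two amendments follows. The specific values (alpha = 0.3 and "blue") are assumptions chosen for illustration, since the original values are not shown:

```r
library(ggplot2)

# alpha and color are set outside aes(), so they apply to every point equally.
styled_scatter <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3, color = "blue")

styled_scatter
```

With tens of thousands of overlapping points, a low alpha makes the dense regions stand out as darker areas.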

Bar Plot

Using the geom_bar() function, we have changed the display of our graph. We specified a categorical variable, cut, on the X-axis, and geom_bar() automatically calculated the Y-axis as the count of diamonds in each cut.
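
A minimal sketch of this bar plot (bar_plot is an illustrative name):

```r
library(ggplot2)

# Only x is mapped; geom_bar() counts the rows in each cut category for the Y-axis.
bar_plot <- ggplot(diamonds, aes(x = cut)) +
  geom_bar()

bar_plot
```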

Improved Bar Plot

As we said above, aes() adds any other data we want represented in our plot. Here we added a fill= aesthetic mapped to the clarity variable, so each bar is divided into colored segments by clarity.
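
The improved bar plot could be sketched like this (filled_bar_plot is an illustrative name):

```r
library(ggplot2)

# fill is mapped inside aes(), so the bar segments are colored by clarity
# and a legend is added automatically.
filled_bar_plot <- ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar()

filled_bar_plot
```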

Adding Layers

Adding the theme_() Function

Let’s go back to our original plot and try to add layers to it. Themes control all non-data display. A complete theme function such as theme_bw() or theme_minimal() restyles the whole plot at once; use theme() if you just need to tweak the display of an existing theme.

Get to know more themes here

Note: We have added the color= argument in our aes() function. This way we can explore how another variable (in this case, cut) affects the carat/price relationship.

Example
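
The theme used in the original plot is not shown, so this sketch assumes theme_bw() purely for illustration:

```r
library(ggplot2)

# color = cut is inside aes(), so each cut gets its own color and a legend entry.
themed_plot <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.3) +
  theme_bw()  # complete theme: white background with a thin border

themed_plot
```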

Okay, let’s try another theme.
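
Again as an assumption, this sketch swaps in theme_minimal() and then uses theme() to tweak a single element of it:

```r
library(ggplot2)

minimal_plot <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.3) +
  theme_minimal() +                    # complete theme with no background box
  theme(legend.position = "bottom")    # tweak one element of that theme

minimal_plot
```

The theme() call comes after the complete theme because a complete theme replaces any earlier tweaks.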


Adding the labs() Function

Important aspects of the ‘labs()’ function

  • The labs() function modifies axis, legend, and plot labels.
  • It also lets the axis and legend labels display the full variable name.
  • It’s important to use the labs() function to create good labels because they are critical for making your plots accessible to a wider audience.

For this plot, we:

  1. Used the plot title and subtitle to explain the main findings.
  2. Created new X and Y labels.
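
A sketch of these two steps follows. The title, subtitle, and axis texts are illustrative wording, not the original labels:

```r
library(ggplot2)

labelled_plot <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.3) +
  labs(
    title    = "Diamond price rises with carat weight",
    subtitle = "Each point is one diamond, colored by cut",
    x        = "Weight (carat)",
    y        = "Price (US dollars)",
    color    = "Cut quality"  # legend title for the color aesthetic
  )

labelled_plot
```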


Adding the caption Argument

It’s common to use the caption argument of labs() to provide information about the data source. The text for the caption will be displayed in the bottom-right of the plot by default.
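
For example, a caption naming the data source could be added like this (the caption wording is illustrative):

```r
library(ggplot2)

captioned_plot <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3) +
  labs(caption = "Source: diamonds dataset, ggplot2 package")

captioned_plot
```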


Adding the scale_() Function

  1. Important aspects of the scale_() functions
  • Scales control the details of how data values are translated to visual properties.

  • They are used to tweak details like the axis labels or legend keys, or to use a completely different translation from data to aesthetic.

  • They take your data and turn it into something that you can see, like size, colour, position or shape.

  • Scales can be roughly divided into 4 families: position scales, colour scales, manual scales and identity scales.

  • Scales have a big effect on the visual appearance of the plot, which is why they are important.

  2. The 2 autocomplete options in a scale_() function name
  • The first comes right after scale_, where you choose the aesthetic: x or y for the axes, colour, fill, alpha or size. (We will focus on the axes.)

  • Let’s say we chose scale_x_ or scale_y_. This means we are picking the axis whose scale we want to change.

  • The second comes after the axis, where we can choose “continuous”, “discrete”, or many others.

    • Discrete: for a categorical variable, use scale_x_discrete()

    • Continuous: for a continuous variable, use scale_x_continuous()

  3. Important arguments

    After choosing our function, we need to pick arguments to make amendments. Below are 2 of the common ones.

  • breaks= sets where the breaks (tick marks) fall on each axis.

    • Custom breaks can be combined into a vector with c()

    • For our graph below, we assigned breaks = pretty

    • pretty uses the default R break algorithm, which allows easy, incremental break formatting

  • labels= is used to format the labels shown at the breaks.

    • For instance, if we use labels = dollar (from the scales package), this would include a $ sign next to our numbers on the axis.
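
The breaks= and labels= arguments could be combined as in the sketch below. It assumes the scales package is installed (it normally comes with ggplot2), and the custom X-axis breaks and the axis name "Cut quality" are illustrative choices:

```r
library(ggplot2)

scaled_plot <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3) +
  # Custom breaks combined into a vector with c().
  scale_x_continuous(breaks = c(0, 1, 2, 3, 4, 5)) +
  # base R's pretty() picks round break values; scales::dollar adds the $ sign.
  scale_y_continuous(breaks = pretty, labels = scales::dollar)

scaled_plot

# A categorical axis uses the discrete variant instead.
discrete_plot <- ggplot(diamonds, aes(x = cut)) +
  geom_bar() +
  scale_x_discrete(name = "Cut quality")

discrete_plot
```

Both breaks= and labels= accept either a fixed vector or a function, which is why the bare function names pretty and scales::dollar work here.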


Interactive Plot

Let’s make our graph interactive!

After creating our graph with ggplot(), we used ggplotly() from the plotly package to make it as cool as the one below!

  • Create an object and store your plot in it
  • Pass the object to plotly::ggplotly()
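
The two steps above can be sketched as follows, assuming the plotly package is installed. Drawing a random sample of 2,000 rows is an addition made here to keep the widget light; the original plot may have used the full dataset:

```r
library(ggplot2)

# Work on a random subset so the interactive widget stays responsive (an assumption).
set.seed(1)
small_diamonds <- diamonds[sample(nrow(diamonds), 2000), ]

# Step 1: create an object and store the plot in it.
p <- ggplot(small_diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.5)

# Step 2: pass the object to plotly::ggplotly(), if plotly is available.
if (requireNamespace("plotly", quietly = TRUE)) {
  interactive_plot <- plotly::ggplotly(p)
  interactive_plot
}
```

Hovering over a point in the resulting widget shows its carat, price, and cut values.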

Read more about plotly here

Conclusion

With the diamonds dataset, we were able to explore the prices and other attributes of almost 54,000 diamonds. I hope you enjoyed reading this and had the chance to understand the ggplot2 package better. The package is useful for displaying data because it builds plots in layers, which helps you tell a complete story. What is cool about it is that it’s quick, easy, and creates amazing graphics!


References & Further Resources

Elegant graphics for data analysis

Check this book by Hadley Wickham to understand the story behind the grammar of graphics here

tidyverse.org

github/ggplot_intro