“ggplot2” is an R package used to visualize data. It can significantly improve the quality and aesthetics of your graphics. It also lets you build almost any type of chart, because it breaks plots into components that combine into informative, neat graphs with fairly simple and readable code. The package was created by Hadley Wickham, and the “gg” in ggplot2 stands for grammar of graphics.
There are a lot of functions and arguments available in ggplot2, so the ggplot2 cheat sheet is very useful. You can view it here
We will start by installing the required package and loading it with library(). library() is the command used to load an installed package from the library, the place where packages are stored, into the current R session.
Note: “ggplot2” is a core Tidyverse package, so you can also use it by installing and loading the “Tidyverse” package.
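The installation and loading step is not shown in the text, so here is a minimal sketch of what it could look like. The install line is commented out on the assumption that the package only needs installing once per machine.

```r
# Install once per machine (uncomment if ggplot2 is not installed yet).
# install.packages("ggplot2")

# Load the package into the current R session.
library(ggplot2)

# Alternative: loading the tidyverse attaches ggplot2 together with
# the other core tidyverse packages.
# library(tidyverse)
```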
Next, we will get to know our dataset. It contains the prices and other attributes of almost 53,000 diamonds.
Load the dataset with the data() function
View the variable names
## [1] "carat" "cut" "color" "clarity" "depth" "table" "price"
## [8] "x" "y" "z"
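The output above can be reproduced with a short sketch like the one below. The exact calls are an assumption, since the original code chunk is not shown; names() is one common way to list the variables of a data frame.

```r
library(ggplot2)   # the diamonds dataset ships with ggplot2

data(diamonds)     # load the dataset into the current session
names(diamonds)    # list the variable (column) names
dim(diamonds)      # number of rows and columns
```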
Essential foundations and layers of a plot
ggplot(): tells R that we want to create a ggplot object.
aes(): maps variables from our data to visual properties of the plot, such as x=, y=, color= and fill=.
geom_ (geometric object): tells the plot how you want to display your data. A geom is simply the type of plot, that is, the geometric object that represents the data.
Note: Use the addition operator (+) to connect ggplot2 functions.
We will begin with basic plotting to understand the relationship between carat and price of diamonds.
The geom_point() function creates a scatterplot that is useful for displaying the relationship between carat and price.
As the graph shows, the higher the carat weight, the higher the price tends to be.
ggplot(data = diamonds) +        # create the plot and attach the data frame
  aes(x = carat, y = price) +    # map variables from the data to the axes
  geom_point()                   # display the data as points (a scatterplot)
geom_ function
Let’s make our scatter plot look better!
We added the alpha= attribute to the geom_point() function to control the transparency of the points. The lower the alpha, the more transparent the points are (0 is fully transparent, 1 is fully opaque).
We also added the color= attribute to the geom_point() function to customize the color of the points.
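The code for this step is not shown, so the following is a minimal sketch of how alpha= and color= can be set in geom_point(). The values 0.3 and "steelblue" are illustrative assumptions, not the settings used for the original figure.

```r
library(ggplot2)

p_alpha <- ggplot(data = diamonds) +
  aes(x = carat, y = price) +
  # alpha and color are fixed attributes here (set outside aes()),
  # so they apply to every point; both values are assumed for illustration
  geom_point(alpha = 0.3, color = "steelblue")

print(p_alpha)
```

Because so many points overlap, a low alpha makes dense regions of the plot appear darker than sparse ones.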
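Note: We typically think of aesthetics as how something looks (color, size, etc.). In ggplot2, however, an aesthetic is a mapping from a variable to a visual property inside aes(). A fixed look set outside aes(), such as alpha= or color= in geom_point(), is just an attribute.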
Using the geom_bar() function, we changed how our graph is displayed. We specified a categorical variable, cut, on the x-axis, and geom_bar() automatically calculated the y-axis as the count of diamonds for each cut.
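A minimal sketch of such a bar chart, assuming the same layered style as the scatterplot earlier (the original chunk is not shown):

```r
library(ggplot2)

p_bar <- ggplot(data = diamonds) +
  aes(x = cut) +   # only x is mapped; the counts are computed by geom_bar()
  geom_bar()

print(p_bar)
```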
As we said above, aes() adds any other data we want represented in our plot. Here we added an aesthetic that maps the fill of the bars to the clarity variable.
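As a sketch of that mapping (again assuming the layered style used elsewhere in this post), fill= goes inside aes() because it depends on a variable:

```r
library(ggplot2)

p_fill <- ggplot(data = diamonds) +
  aes(x = cut, fill = clarity) +   # each bar is split and colored by clarity
  geom_bar()                       # bars are stacked by default

print(p_fill)
```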
Theme_() Function
Let’s go back to our original plot and add more layers to it. The complete themes, such as theme_minimal() and theme_dark(), control all non-data display. Use theme() if you just need to tweak the display of an existing theme.
Get to know more themes here
Note: We added the color= argument inside our aes() function. This way we can explore how a third variable (in this case cut) affects the carat/price relationship.
ggplot(data = diamonds) +
aes(x=carat,y=price,color = cut)+ # Argument "color =" shows variable "cut"
geom_point()+
theme_minimal()
Okay, let’s try another theme.
ggplot(data = diamonds) +
aes(x=carat,y=price,color = cut)+
geom_point()+
theme_dark() #Creates a dark background
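Individual elements of a complete theme can also be adjusted with theme(). The sketch below moves the legend to the bottom; this particular tweak is only an illustrative assumption and is not part of the original figures.

```r
library(ggplot2)

p_theme <- ggplot(data = diamonds) +
  aes(x = carat, y = price, color = cut) +
  geom_point() +
  theme_minimal() +                    # complete theme: sets all non-data display
  theme(legend.position = "bottom")    # tweak one element of that theme

print(p_theme)
```

The order matters: theme() should come after the complete theme, otherwise the complete theme overwrites the tweak.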
Labs() Function
Important aspects of the ‘labs()’ function
Use the labs() function to create good labels, because they are critical for making your plots accessible to a wider audience. In the code below: 1. We used the plot title and subtitle to explain the main findings. 2. We created new X and Y labels.
ggplot(data = diamonds) +
aes(x=carat,y=price,color = cut)+
geom_point(alpha=0.30)+
theme_minimal() +
labs(title="Diamond Cut by Price & Weight", #"New plot title"
subtitle="53,000 observations of diamonds cuts ", # New Plot subtitle
x="Carat(Weight)", #new X label
y="Price in US dollars") #new Y label
It’s common to use the caption argument to provide information about the data source. The text for the caption will be displayed in the bottom-right of the plot by default.
ggplot(data = diamonds) +
aes(x=carat,y=price,color = cut)+
geom_point(alpha=0.30)+
theme_minimal() +
labs(title="Diamond Cut by Price & Weight",
subtitle="53,000 observations of diamonds cuts ",
x="Carat(Weight) ",
y="Price in US dollars",
caption ="Source: Diamonds data in ggplot2")
Scale_() Function
Scales control the details of how data values are translated to visual properties.
They are used to tweak details like the axis labels or legend keys, or to use a completely different translation from data to aesthetic.
They take your data and turn it into something that you can see, like size, colour, position or shape.
Scales can be divided into 4 families: position scales, color scales, manual scales and identity scales.
Scales have a big effect on the visual appearance of the plot, which is why they are important.
scale_() function
A scale function name has two parts. The first part comes right after scale_ and names the aesthetic to adjust: x or y for the axes, or colour, fill, alpha or size. (We will focus on the axes.)
Choosing the prefix scale_x_ or scale_y_ means we are picking the axis whose scale we want to change.
The second part comes after the aesthetic and names the type of scale, such as “continuous”, “discrete”, or several others.
Discrete: for a categorical variable, e.g. scale_x_discrete()
Continuous: for a continuous variable, e.g. scale_x_continuous()
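To make the naming pattern concrete, here is a minimal sketch that uses one discrete and one continuous position scale in the same plot. The boxplot and the axis names passed to name= are assumptions chosen for illustration; they do not appear in the original figures.

```r
library(ggplot2)

p_scales <- ggplot(data = diamonds) +
  aes(x = cut, y = price) +
  geom_boxplot() +
  scale_x_discrete(name = "Cut") +        # cut is categorical: discrete x scale
  scale_y_continuous(name = "Price")      # price is numeric: continuous y scale

print(p_scales)
```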
Important Arguments
After specifying our function, we need to pick arguments to adjust it. Below are two of the most common ones.
breaks= controls where the breaks (tick marks) fall on each axis.
Custom breaks can be combined into a vector with c(), e.g. breaks = c(1, 2, 3).
In the graph below we set breaks = pretty.
pretty uses base R’s default break algorithm, pretty(), which gives evenly spaced, rounded breaks.
labels= controls how the break labels are formatted.
labels = dollar adds a $ sign to the numbers on the axis. dollar() comes from the scales package, so that package has to be loaded first.
library(scales) # provides dollar() and comma()
ggplot(data = diamonds) +
aes(x=carat,y=price,color = cut)+
geom_point(alpha=0.80)+
theme_minimal() +
labs(title="Diamond Cut by Price & Weight",
subtitle="53,000 observations of diamonds cuts ",
x="Carat(Weight)",
y="Price",
caption ="Source: Diamonds data in ggplot2",
color= "Cut")+
scale_x_continuous(breaks = pretty) + # pretty() produces evenly spaced, rounded breaks
scale_y_continuous(labels = dollar) # dollar sign shows that the prices are in US dollars
Let’s make our graph interactive!
After creating our graph with ggplot(), we used ggplotly() from the plotly package to make it as cool as the one below!
plotly::ggplotly()
Read more about plotly here
p<- ggplot(data = diamonds) + # Created an object p
aes(x=carat,y=price,color = cut)+
geom_point(alpha=0.80)+
theme_minimal() +
labs(title="Diamond Cut by Price & Weight",
subtitle="53,000 observations of diamonds cuts ",
x="Carat(Weight)",
y="Price in US dollars",
caption ="Source: Diamonds data in ggplot2",
color= "Cut")+
(scale_x_continuous(labels= comma, breaks = pretty)) +
(scale_y_continuous(labels = dollar))
plotly::ggplotly(p) # Passed p to ggplotly() to make the graph interactive
In the diamonds dataset above, we were able to view the prices and other attributes of almost 53,000 diamonds. I hope you enjoyed reading this and now understand the ggplot2 package a little better. The package is useful for displaying data because it builds plots in layers, which helps tell a complete story. What is cool about it is that it’s quick, easy, and creates great-looking graphics!
Elegant Graphics for Data Analysis
Check out this book by Hadley Wickham to understand the story behind the grammar of graphics here