“ggplot2” is a plotting package that creates complex plots from data in a data frame. The “gg” stands for “grammar of graphics”. According to ggplot2, any graph can be divided into three parts:
1. Data, which is the data frame.
2. Aesthetics, which refer to the x and y variables, colors, point sizes, shapes, etc.
3. Geometry, which is the type of graph, such as a bar graph or a line graph.
Learn More: The principles of “ggplot2” were initially conceived and implemented by Hadley Wickham and are described in his book ggplot2: Elegant Graphics for Data Analysis.
First, we need to tell ggplot what dataset to use. This is done with the ggplot(df) function, where df is a data frame that contains all the features needed to make the plot.
Next, we add whatever aesthetics we want to apply to our plot. For this we use the aes() function, which lets us specify the x and y variables, color, size, shape, etc.
Then, we add ‘geoms’, which are the graphical representations of the data in the plot (points, lines, bars). ggplot2 offers many different geoms, such as:
1. geom_point() for scatter plots, dot plots, etc.
2. geom_boxplot() for, well, boxplots.
3. geom_line() for trend lines, time series, etc.
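The three steps above can be put together in a few lines. This is only a minimal sketch: it assumes the diamonds dataset that ships with ggplot2 (the same one used in the examples further down), and the choice of carat and price as the x and y variables is purely for illustration.

```r
library(ggplot2)  # provides ggplot(), aes(), the geoms and the diamonds dataset

# Step 1: tell ggplot which data frame to use
# Step 2: map columns of that data frame to aesthetics with aes()
p <- ggplot(diamonds, aes(x = carat, y = price))

# Step 3: add a geom that draws the data, here as points
p <- p + geom_point()

# Printing the object draws the plot
print(p)
```

Because each step is added with +, the same base object can be reused with a different geom without repeating the data and aesthetics.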
A boxplot usually takes two attributes: x, which is usually a classification into categories, and y, which is the variable we are comparing.
For example, if we want to compare the prices of different diamond cuts, we can do the below:
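A minimal sketch of that comparison, assuming the diamonds dataset bundled with ggplot2 and the same cut and price columns used in the later examples, could look like this:

```r
library(ggplot2)

# cut supplies the categories on the x-axis,
# price is the variable being compared on the y-axis
p_box <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot()

print(p_box)
```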
To make the boxplot look nicer, we can add a few more elements, such as a title and axis labels. For example, we will add the arguments below:
# Add ggtitle() to title our boxplot, and xlab() and ylab() to name the x-axis and the y-axis
ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot() +
  ggtitle("Diamond Price according to the Cut Type") +
  xlab("Type of Cut") +
  ylab("Diamond Price")

# We can also fill the boxes with color using the fill aesthetic
# We can set our y-axis limits using coord_cartesian(ylim = c())
ggplot(diamonds, aes(x = cut, y = price, fill = cut)) +
  geom_boxplot() +
  ggtitle("Diamond Price according to the Cut Type") +
  xlab("Type of Cut") +
  ylab("Diamond Price") +
  coord_cartesian(ylim = c(0, 7500))

A scatterplot is used to examine the relationship between different attributes using points. To achieve this, geom_point() is used.
# Here is the default way to draw a scatterplot using geom_point()
ggplot() +
  geom_point(data = diamonds, aes(x = carat, y = price))

# To change the colors of the scatterplot, map color to a variable as below
ggplot() +
  geom_point(data = diamonds, aes(x = carat, y = price, color = cut))

# To change the axes of the scatterplot, specify the limits of the x-axis and y-axis
# through scale_x_continuous() and scale_y_continuous().
# Here we specify the limits of the y-axis as below.
ggplot() +
  geom_point(data = diamonds, aes(x = carat, y = price, color = cut)) +
  scale_y_continuous(limits = c(0, 20000))

# We can add some labels to the plot using geom_text()
# (this labels every row, so it is slow and crowded on a dataset as large as diamonds)
ggplot() +
  geom_point(data = diamonds, aes(x = carat, y = price, color = cut)) +
  geom_text(data = diamonds,
            mapping = aes(x = carat, y = price),
            label = rownames(diamonds))

# We can add a smoothed trend line using the geom_smooth() function
# (use geom_smooth(method = "lm") for a straight regression line)
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(color = "midnightblue") +
  geom_smooth()

A histogram is mainly used when we want to look at just one dimension of our data and observe its distribution.
To plot a histogram with ggplot2, we replace geom_point() with the geom_histogram() function. For example, we do the following:
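A minimal sketch of such a histogram is shown here. The variable and bin width are assumptions: price from the diamonds dataset is used for continuity with the earlier examples, and binwidth = 500 is an arbitrary illustrative value.

```r
library(ggplot2)

# A histogram needs only one variable, mapped to x;
# the counts on the y-axis are computed by geom_histogram()
p_hist <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 500)  # 500 is an illustrative choice

print(p_hist)
```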
Notice that the binwidth argument lets us control the width of each bin, making the bars wider or narrower. For example, here we draw a histogram with wider bins by setting binwidth to 4000.
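A sketch of the wider-binned version, again assuming price from the diamonds dataset as the plotted variable:

```r
library(ggplot2)

# Each bar now covers a price range of 4000,
# so the histogram has fewer, wider bars
p_wide <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 4000)

print(p_wide)
```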
As with scatterplots, we can add aesthetics to histograms. For instance, suppose we want to make a stacked histogram based on the cut. To accomplish this, we add the fill aesthetic as follows:
Note that adding the fill aesthetic lets us see the different cuts, each represented in a different color, which allows for better analysis.
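A sketch of the stacked histogram, assuming price as the plotted variable and keeping the bin width of 4000 from the previous step:

```r
library(ggplot2)

# Mapping fill to cut colors each bar segment by cut type;
# geom_histogram() stacks the segments by default
p_stack <- ggplot(diamonds, aes(x = price, fill = cut)) +
  geom_histogram(binwidth = 4000)

print(p_stack)
```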
So we can see how ggplot2 helps in plotting different graphs. In addition to what has preceded, the following functions can be used to plot other kinds of graphs: geom_violin() for violin plots, geom_dotplot() for dot plots, geom_jitter() for strip charts, geom_line() for line plots and geom_bar() for bar plots.
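As a brief illustration of two of these, the sketch below swaps in geom_violin() and geom_bar(). The diamonds dataset and the cut and price columns are assumed here only for continuity with the earlier examples.

```r
library(ggplot2)

# Violin plot: same aesthetics as the boxplot, different geom
p_violin <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_violin()

# Bar plot: only x is needed, since geom_bar() counts the rows per category
p_bar <- ggplot(diamonds, aes(x = cut)) +
  geom_bar()

print(p_violin)
print(p_bar)
```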
For further resources and information about ggplot2, you can follow the links below:
Quick R by DataCamp, Graphics with ggplot2
Tutorial Gateway, retrieved from: https://www.tutorialgateway.org/r-ggplot2-scatter-plot/ on June 24, 2020.
R-statistics.co, retrieved from: http://r-statistics.co/ggplot2-Tutorial-With-R.html on June 24, 2020.