Plotting step by step

1. Data

The first layer of ggplot() is the data.

We use the mtcars data set because it contains a good mix of numeric and categorical variables, which makes it convenient for demonstrating a variety of plots.

First we call mtcars:

data(mtcars)
mtcars

To draw a plot with the ggplot2 package, we first install the package (once) and then load it with the library() function.

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.2

Then we can use ggplot() to build the first layer of the plot from our data set. As you can see below, running the code shows a blank panel. This means the plot is ready to be drawn on with our data set, mtcars.

p=ggplot(mtcars)
p

2. Aesthetics

The aes() function specifies which variable is placed on the x axis and which on the y axis. We can also map other variables, typically factors, to the color, group, shape, size, fill, or label of the geometric layer that we will learn about in this lesson.

Now we want to plot wt against qsec, mapping am to shape, hp to size, and cyl to color.

p=ggplot(mtcars,aes(x=qsec,y=wt,shape=factor(am),size=hp,color=factor(cyl)))
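
A quick way to check what aes() recorded is to look at the mapping stored in the plot object. The sketch below assumes only ggplot2 and the built-in mtcars data. Note that ggplot2 stores the spelling color under the name colour.

```r
library(ggplot2)

# Same mapping as above: am and cyl are wrapped in factor() because
# shape needs a discrete variable and we want a discrete color legend.
p <- ggplot(mtcars, aes(x = qsec, y = wt,
                        shape = factor(am), size = hp, color = factor(cyl)))

# Names of the aesthetics recorded in the plot object
mapped <- names(p$mapping)
print(mapped)
```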

3. Geometric

The geom layer specifies the type of plot. There are many different and useful types of plots to choose from, depending on your purpose. All geoms are listed at the link below:

Geoms

geom_point()

I choose geom_point() to plot the five variables mapped above. What does the plot say about each point? Each point carries the information of five variables. For example, look at the red point on the right side of the plot. It shows that wt is about 3 and qsec is about 23, with am equal to 0, hp near 100, and cyl equal to 4.

(p=p+geom_point())

We can draw the same plot by using aes() inside geom_point() directly. See the code below:

ggplot(mtcars,aes(x=qsec,y=wt))+
geom_point(aes(shape=factor(am),size=hp,color=factor(cyl)))
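
The difference between mapping an aesthetic inside aes() and setting it to a constant outside aes() is easy to miss. The following is a small sketch of both cases, assuming ggplot2 and mtcars; the color "steelblue" and the object names are only illustrative.

```r
library(ggplot2)

# Mapped: color depends on the data, so a legend is drawn
mapped <- ggplot(mtcars, aes(x = qsec, y = wt)) +
  geom_point(aes(color = factor(cyl)))

# Set: every point gets the same fixed color and size, no legend
fixed <- ggplot(mtcars, aes(x = qsec, y = wt)) +
  geom_point(color = "steelblue", size = 3)

# The layer-level mapping holds only what was given inside aes()
layer_aes <- names(mapped$layers[[1]]$mapping)
print(layer_aes)
```

Aesthetics given in ggplot() are inherited by every layer, while aesthetics given inside a geom apply to that layer only.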

4. Facet

Faceting, in general, means drawing multiple plots, one for each subset of the data. It is a powerful tool for exploratory data analysis because it lets you compare patterns in different parts of the data quickly.

There are three types of faceting:

  • facet_null(): single plot, the default.
  • facet_grid(): fundamentally 2d, with two independent components.
  • facet_wrap(): 1d, but wrapped into 2d to save space.

facet_grid()

facet_grid() shows the plots in a 2d grid, as defined by a formula:

  • .~a : Spreads the values of a across the columns.
  • b~. : Spreads the values of b down the rows.
  • b~a : Spreads a across columns and b down rows.

ggplot(mtcars,aes(x=qsec,y=wt))+
  geom_point(aes(color=factor(gear)))+
  facet_grid(.~factor(gear))
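
The formula of facet_grid() reads rows ~ columns. The sketch below, which assumes only ggplot2 and mtcars, uses two variables at once and then inspects the panel layout that ggplot2 computes when the plot is built.

```r
library(ggplot2)

# Rows come from the left of ~ (am), columns from the right (gear)
p <- ggplot(mtcars, aes(x = qsec, y = wt)) +
  geom_point() +
  facet_grid(am ~ gear)
p

# Inspect the panel layout computed when the plot is built
panel_layout <- ggplot_build(p)$layout$layout
n_rows <- max(panel_layout$ROW)
n_cols <- max(panel_layout$COL)
```

Here am has two values and gear has three, so the grid has two rows and three columns.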

facet_wrap()

The layout of facet_wrap() is controlled by the following arguments:

  • ncol, nrow : Control how many columns and rows you want. You only need to set one.
  • as.table : Controls whether the facets are laid out like a table (TRUE) or a plot (FALSE).
  • dir : Controls the direction of the wrap: horizontal or vertical.

ggplot(mtcars,aes(x=qsec,y=wt))+
  geom_point(aes(color=factor(gear)))+
  facet_wrap(~factor(gear),nrow=2)
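
To see the effect of dir, the following sketch wraps the same three panels in both directions. It assumes only ggplot2 and mtcars; the object names are illustrative.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = qsec, y = wt)) +
  geom_point(aes(color = factor(gear)))

# Three panels wrapped into two rows, filled row by row (the default)
by_row <- base + facet_wrap(~ gear, nrow = 2)

# Same panels, but filled column by column
by_col <- base + facet_wrap(~ gear, nrow = 2, dir = "v")

# Panel positions computed when each plot is built
layout_h <- ggplot_build(by_row)$layout$layout
layout_v <- ggplot_build(by_col)$layout$layout
```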

5. Statistics

A statistical transformation, specified with a stat_ function, transforms the data by summarizing it in some way. For example, a useful stat is the smoother (stat_smooth()), which calculates the smoothed mean of y, conditional on x. Many of ggplot2’s stats are explained at the link below:

Stats

ggplot(mtcars,aes(x=qsec,y=wt))+
  geom_point(aes(color=factor(gear)))+
  facet_wrap(.~factor(gear),nrow=1)+
  stat_smooth(method="lm",se=FALSE,colour="black",size=0.5)
## `geom_smooth()` using formula 'y ~ x'
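
With method="lm" and one facet per gear, the smoother fits a separate straight line in each panel. The same fits can be reproduced in base R, as in this sketch (the names fits and coefs are illustrative):

```r
# Fit wt ~ qsec separately within each gear group, as the faceted smoother does
fits <- lapply(split(mtcars, mtcars$gear), function(d) lm(wt ~ qsec, data = d))

# 2 x 3 matrix: intercept and slope for gear = 3, 4, 5
coefs <- sapply(fits, coef)
print(round(coefs, 3))
```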

6. Scale

Scales control the details of how data values are translated to visual properties. You can adjust the default scales, for example the axis labels or legend keys, or use a completely different translation from data to aesthetic. All of the scale functions are listed at the link below:

Scales

ggplot(mtcars,aes(x=qsec,y=wt))+
  geom_point(aes(color=factor(gear)))+
  facet_grid(.~factor(gear))+
  stat_smooth(method="lm",se=FALSE,col="black",size=0.5)+
  scale_x_continuous(breaks = c(15,17,19,21,23))
## `geom_smooth()` using formula 'y ~ x'
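
Scales can also rename axes and legends and change the color palette. The following sketch assumes ggplot2 with its brewer palettes; the axis title and the "Dark2" palette are illustrative choices, not part of the original example.

```r
library(ggplot2)

x_breaks <- seq(15, 23, by = 2)   # same values as c(15, 17, 19, 21, 23)

p <- ggplot(mtcars, aes(x = qsec, y = wt)) +
  geom_point(aes(color = factor(gear))) +
  scale_x_continuous(name = "Quarter-mile time (s)", breaks = x_breaks) +
  scale_color_brewer(name = "Gears", palette = "Dark2")
p
```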

7. Theme

The theme system of ggplot2 gives you fine control over the non-data elements of your plot. It does not affect how the data is rendered by the geom layer or how it is transformed by scales. See the link below to get familiar with the different theme elements.

Themes

ggplot(mtcars,aes(x=qsec,y=wt))+
  geom_point(aes(shape=factor(am),color=factor(gear)))+
  stat_smooth(method="lm",se=FALSE,col="red")+
  scale_x_continuous(breaks = c(15,17,19,21,23))+
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'
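
A complete theme such as theme_bw() can be combined with individual theme() settings. The sketch below assumes only ggplot2; the particular settings are illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = qsec, y = wt)) +
  geom_point(aes(color = factor(gear))) +
  theme_bw() +                                  # complete theme first
  theme(legend.position = "bottom",             # then individual tweaks
        panel.grid.minor = element_blank(),
        axis.title = element_text(face = "bold"))

legend_pos <- p$theme$legend.position
p
```

The order matters: a complete theme added after theme() would overwrite the individual settings.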

As you have seen, three layers (data, aes, and geom) must be specified to draw a plot. The other layers are optional; add or remove them as your purpose requires.

Now let’s see some more complicated examples of drawing a plot with ggplot2:

Example 1

In this example we draw a bar plot showing the DALY (disability-adjusted life years) attributable to three risk factors, smoking, high fasting plasma glucose, and high body-mass index, in 22 locations (the global total plus 21 regions of the world). The data are extracted from the GBD website. We use several layers (geoms, themes, scales, and so on) to build an attractive plot. We describe them in order, starting with the first: A is the name of our data and is the first argument of the ggplot() function.

  • aes(Regions, percent, fill = sex) puts Regions on the x axis and percent on the y axis, and fills the bars according to the variable sex.

  • In geom_bar(), stat = "identity" tells R to use the y values we provide as the bar heights instead of counting rows. position = "dodge" places the bars for each sex side by side (position_dodge() gives finer control over the spacing), and color="black" draws black bar borders.

  • coord_flip() swaps the axes, so the x axis is drawn vertically and the y axis horizontally.

  • facet_wrap( ~ Facet ,nrow=1) draws one panel for each value of the Facet variable, side by side in a single row.

  • geom_text() prints the percentage next to each bar, using label = paste(percent,"%") inside aes(), with size=2 and horizontal adjustment hjust=-0.5.

  • theme_classic() gives the plot a simple white background.

  • geom_errorbar() draws an error bar for each bar. Inside aes() we supply ymin and ymax, and width=.2 sets the width of the whisker caps.

  • In the second theme() call we adjust the legend and the axis titles. legend.position and legend.justification place the legend at the top left of the plot. axis.title.x.bottom = element_text(margin = margin(10, 0, 0, 0)) puts a margin of 10 above the x-axis title and 0 on the other sides. axis.title.y.left = element_text(margin = margin(0, 5, 0, 0)) works the same way, with a margin of 5 to the right of the y-axis title. theme() has many elements to choose from (see this link: theme elements).

  • ylim() sets the limits of the y axis, here from 0 to 50.

  • scale_x_discrete() orders the categories on the x axis according to the locations vector defined at the start of the code. The vector is reversed so that, after coord_flip(), the first location appears at the top.

  • scale_fill_brewer() picks the palette used to fill the bars. (See the palette names here: palette names)

  • Finally, labs() sets the axis titles.

library(readxl)
## Warning: package 'readxl' was built under R version 4.0.2
A <- read_excel("/Users/zahrazamani/Documents/Works/Dr. Musavi/Figure 1/Section A & B, Final.xlsx", sheet = "A")
locations <- c("Global","High-income Asia Pacific", "High-income North America",
              "Western Europe","Australasia","Andean Latin America","Tropical Latin America",
              "Central Latin America","Southern Latin America","Caribbean","Central Europe",
              "Eastern Europe","Central Asia","North Africa and Middle East",
              "South Asia","Southeast Asia","East Asia","Oceania","Western Sub-Saharan Africa",
              "Eastern Sub-Saharan Africa","Central Sub-Saharan Africa","Southern Sub-Saharan Africa")
Regions=A$location
A=data.frame(Regions,A)
head(A,10)
p=ggplot(A, aes(Regions, percent, fill = sex)) +
  geom_bar(stat = "identity", position = "dodge",color="black") +
  coord_flip() +
  facet_wrap( ~ Facet ,nrow=1)+
  geom_text(aes(label = paste(percent,"%")), size=2,
            position = position_dodge(width = 0.9), hjust=-0.5)+
  theme_classic()+
  geom_errorbar(aes(ymin=percent-SE, ymax=percent+SE),
                position = position_dodge(width = 0.9), width=.2) +
  theme(legend.position="top",
        legend.justification="left",
        axis.title.x.bottom  = element_text(margin = margin(10, 0, 0, 0)),
        axis.title.y.left  = element_text(margin = margin(0, 5, 0, 0)))+
  ylim(0,50)+
  scale_x_discrete(limits = rev(locations))+
  scale_fill_brewer(palette = "Pastel2", limits = c("Male", "Female"))+
  labs(x = "Regions", y="DALY attributable to risk factors (%)")
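
Because the Excel file used above is not available to the reader, here is a runnable sketch of the same kind of plot built from mtcars instead. It is not the author's figure: the summary (mean mpg with its standard error for each combination of cyl and am) and all object names are stand-ins chosen for illustration.

```r
library(ggplot2)

# Mean and standard error of mpg for each cyl x am combination
agg <- aggregate(mpg ~ cyl + am, data = mtcars,
                 FUN = function(v) c(mean = mean(v), se = sd(v) / sqrt(length(v))))
bars <- data.frame(cyl  = factor(agg$cyl),
                   am   = factor(agg$am, labels = c("Automatic", "Manual")),
                   mean = agg$mpg[, "mean"],
                   se   = agg$mpg[, "se"])

# One shared dodge so bars, error bars and labels line up
dodge <- position_dodge(width = 0.9)

p <- ggplot(bars, aes(cyl, mean, fill = am)) +
  geom_bar(stat = "identity", position = dodge, color = "black") +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                position = dodge, width = 0.2) +
  geom_text(aes(y = mean + se, label = round(mean, 1)),
            position = dodge, hjust = -0.3, size = 3) +
  coord_flip() +
  scale_fill_brewer(palette = "Pastel2") +
  labs(x = "Cylinders", y = "Mean mpg", fill = "Transmission") +
  theme_classic() +
  theme(legend.position = "top", legend.justification = "left")
p
```

Passing the same position_dodge() object to every layer keeps the error bars and labels centered on their own bars rather than on the middle of each group.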

Example 2

In this example we draw a scatter plot with some special characteristics. We assign one color to each subset of regions defined at the top of the code and give each region within a subset its own shape. For example, Southeast Asia, East Asia, and Oceania are all drawn in purple, each with a different shape. The data are extracted from the GBD website for 22 locations (the global total plus 21 regions of the world). We first group the locations into subsets, from a=c("Global") to g=c("Western Sub-Saharan Africa","Eastern Sub-Saharan Africa","Central Sub-Saharan Africa","Southern Sub-Saharan Africa"), and set the levels of the location variable (daly$location) to follow the order of these subsets. Then we supply the colors and shapes manually, in the same order as the locations. We describe the layers in order, starting with the first: daly is the name of our data and is the first argument of the ggplot() function.

  • aes() maps SDI and val to the x and y axes respectively, and the location variable to group, color, and shape.

  • geom_point(size=3) draws the scatter plot with points of size 3.

  • geom_line() draws the smooth curve in black. Its y values are the fitted values of the loess model stored in the fit object, which is computed before the ggplot() call.

  • scale_shape_manual() and scale_color_manual() assign the manually supplied shapes and colors to the locations, in order. The remaining layers work as described in Example 1.

library(ggplot2)
library(readxl)
daly <- read_excel("/Users/zahrazamani/Documents/Works/Dr. Musavi/Figure 2/final-All Data combined, Regions, Edited.xlsx")

a=c("Global")
b=c("High-income Asia Pacific", "High-income North America",
    "Western Europe","Australasia")
c=c("Andean Latin America","Tropical Latin America",
    "Central Latin America","Southern Latin America","Caribbean")
d=c("Central Europe",
    "Eastern Europe","Central Asia")
e=c("North Africa and Middle East",
    "South Asia")
f=c("Southeast Asia","East Asia","Oceania")
g=c("Western Sub-Saharan Africa",
    "Eastern Sub-Saharan Africa","Central Sub-Saharan Africa",
    "Southern Sub-Saharan Africa")
daly$location <- factor(daly$location, 
levels = c("Global","High-income Asia Pacific", "High-income North America","Western Europe","Australasia","Andean Latin America","Tropical Latin America","Central Latin America","Southern Latin America","Caribbean","Central Europe","Eastern Europe","Central Asia","North Africa and Middle East","South Asia","Southeast Asia","East Asia","Oceania","Western Sub-Saharan Africa", "Eastern Sub-Saharan Africa","Central Sub-Saharan Africa","Southern Sub-Saharan Africa"))

options(stringsAsFactors = FALSE)

color=c("#000000","#d11141","#d11141","#d11141","#d11141","#f37735",
  "#f37735","#f37735","#f37735","#f37735",
  "#00b159","#00b159","#00b159","#00aedb",
  "#00aedb","#c77cff","#c77cff","#c77cff",
  "#FF61C3","#FF61C3","#FF61C3","#FF61C3")
#shape=c(16,15,1,17,3,68,0,1,2,76,72,89,90,7,12,8,15,17,0,1,2,3)
shape=c(19,2,0,3,1,17,15,8,19,76,2,0,3,89,90,17,15,8,2,0,3,1)
fit <- loess(val ~ SDI, degree=1,span = 0.2, data=daly)
smooth = fit$fitted
p=ggplot(daly, aes(SDI, val, group=location,color=location,shape=location))+
geom_point(size=3) +
  geom_line(aes(SDI, smooth), color="black")+
  scale_shape_manual("location",values = shape)+
  scale_color_manual("location",values = color)+
  labs(x = "SDI", y="Age-standardised DALY rate (per 100 000 person-years) ")+
  scale_x_continuous(breaks = c(0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9))+
  theme_classic()+
  theme(legend.position = c(0.2, 0.8))

ggplot2 Extensions

ggstatsplot Package

ggstatsplot is an extension of ggplot2 that creates plots annotated with the details of statistical tests and provides an easy way to extract the results of the statistical analysis. For continuous data we can draw violin plots, scatter plots, histograms, dot plots, and dot-and-whisker plots; for categorical data we can use pie and bar charts. In addition, it supports the most common types of statistical approaches and tests: parametric, non-parametric, robust, and Bayesian versions of t-test/ANOVA, correlation analyses, contingency table analysis, meta-analysis, and regression analyses.

ggbetweenstats

This function creates a violin plot, a box plot, or a mix of the two for between-group comparisons, with the results of the statistical tests in the subtitle. The simplest function call looks like this:

ggstatsplot::ggbetweenstats(
  data = iris, 
  x = Species, 
  y = Sepal.Length,
  type="p",
  messages = FALSE
) 
## Warning: Ignoring unknown parameters: segment.linetype

The type (of test) argument also accepts the following abbreviations: "p" (for parametric) or "np" (for nonparametric) or "r" (for robust) or "bf" (for Bayes Factor). Additionally, the type of plot to be displayed can also be modified ("box", "violin", or "boxviolin").
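
The tests reported in the subtitle can also be run directly in base R. The sketch below assumes that, for three groups, the parametric choice corresponds to Welch's one-way ANOVA and the nonparametric choice to the Kruskal-Wallis test; this is an assumption about the package defaults, so check the ggstatsplot documentation for your version.

```r
# Parametric choice without assuming equal variances: Welch's one-way ANOVA
welch <- oneway.test(Sepal.Length ~ Species, data = iris, var.equal = FALSE)

# Nonparametric choice: Kruskal-Wallis rank sum test
kw <- kruskal.test(Sepal.Length ~ Species, data = iris)

print(welch)
print(kw)
```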

ggcorrmat

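ggcorrmat draws a matrix of correlation coefficients for the selected variables. The coefficients are Pearson correlations by default; the example below requests a robust correlation through corr.method instead.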

ggstatsplot::ggcorrmat(data=mtcars,
                       corr.method = "robust",
                       sig.level = 0.001,
                       p.adjust.method = "holm",
                       cor.vars = c(mpg,disp,hp,drat,wt),
                       matrix.type = "upper",
                       title = "Correlation matrix")

We can save the correlation matrix as a data frame with confidence intervals and p-values:

ggstatsplot::ggcorrmat(data=mtcars,
                       corr.method = "robust",
                       sig.level = 0.001,
                       p.adjust.method = "holm",
                       cor.vars = c(mpg,disp,hp,drat,wt),
                       matrix.type = "upper",
                       output="dataframe",
                       title = "Correlation matrix")
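
For comparison, the plain Pearson correlation matrix of the same five variables and Holm-adjusted p-values can be computed in base R. This sketch uses Pearson rather than the robust correlation of the calls above, and the object names are illustrative.

```r
vars <- c("mpg", "disp", "hp", "drat", "wt")

# Pearson correlation matrix of the selected variables
pearson <- cor(mtcars[, vars], method = "pearson")
print(round(pearson, 2))

# Raw p-values for all 10 pairs, then the Holm adjustment
pairs <- combn(vars, 2)
raw_p <- apply(pairs, 2, function(v) {
  cor.test(mtcars[[v[1]]], mtcars[[v[2]]])$p.value
})
holm_p <- p.adjust(raw_p, method = "holm")
names(holm_p) <- apply(pairs, 2, paste, collapse = "-")
print(signif(holm_p, 3))
```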

ggcoefstats

ggcoefstats plots the point estimates of the regression coefficients as dots with their confidence intervals. It supports most regression model classes:

insight::supported_models()
##   [1] "aareg"             "afex_aov"          "AKP"              
##   [4] "Anova.mlm"         "aov"               "aovlist"          
##   [7] "Arima"             "averaging"         "bamlss"           
##  [10] "bamlss.frame"      "bayesQR"           "bayesx"           
##  [13] "BBmm"              "BBreg"             "bcplm"            
##  [16] "betamfx"           "betaor"            "betareg"          
##  [19] "BFBayesFactor"     "BGGM"              "bife"             
##  [22] "bifeAPEs"          "bigglm"            "biglm"            
##  [25] "blavaan"           "blrm"              "bracl"            
##  [28] "brglm"             "brmsfit"           "brmultinom"       
##  [31] "btergm"            "censReg"           "cgam"             
##  [34] "cgamm"             "cglm"              "clm"              
##  [37] "clm2"              "clmm"              "clmm2"            
##  [40] "clogit"            "coeftest"          "complmrob"        
##  [43] "confusionMatrix"   "coxme"             "coxph"            
##  [46] "coxph.penal"       "coxr"              "cpglm"            
##  [49] "cpglmm"            "crch"              "crq"              
##  [52] "crqs"              "crr"               "dep.effect"       
##  [55] "DirichletRegModel" "drc"               "eglm"             
##  [58] "elm"               "epi.2by2"          "ergm"             
##  [61] "feglm"             "feis"              "felm"             
##  [64] "fitdistr"          "fixest"            "flexsurvreg"      
##  [67] "gam"               "Gam"               "gamlss"           
##  [70] "gamm"              "gamm4"             "garch"            
##  [73] "gbm"               "gee"               "geeglm"           
##  [76] "glht"              "glimML"            "glm"              
##  [79] "Glm"               "glmm"              "glmmadmb"         
##  [82] "glmmPQL"           "glmmTMB"           "glmrob"           
##  [85] "glmRob"            "glmx"              "gls"              
##  [88] "gmnl"              "HLfit"             "htest"            
##  [91] "hurdle"            "iv_robust"         "ivFixed"          
##  [94] "ivprobit"          "ivreg"             "lavaan"           
##  [97] "lm"                "lm_robust"         "lme"              
## [100] "lmerMod"           "lmerModLmerTest"   "lmodel2"          
## [103] "lmrob"             "lmRob"             "logistf"          
## [106] "logitmfx"          "logitor"           "LORgee"           
## [109] "lqm"               "lqmm"              "lrm"              
## [112] "manova"            "MANOVA"            "margins"          
## [115] "maxLik"            "mclogit"           "mcmc"             
## [118] "mcmc.list"         "MCMCglmm"          "mcp1"             
## [121] "mcp12"             "mcp2"              "med1way"          
## [124] "mediate"           "merMod"            "merModList"       
## [127] "meta_bma"          "meta_fixed"        "meta_random"      
## [130] "metaplus"          "mhurdle"           "mipo"             
## [133] "mira"              "mixed"             "MixMod"           
## [136] "mixor"             "mjoint"            "mle"              
## [139] "mle2"              "mlm"               "mlogit"           
## [142] "mmlogit"           "model_fit"         "multinom"         
## [145] "mvord"             "negbinirr"         "negbinmfx"        
## [148] "ols"               "onesampb"          "orm"              
## [151] "pgmm"              "plm"               "PMCMR"            
## [154] "poissonirr"        "poissonmfx"        "polr"             
## [157] "probitmfx"         "psm"               "Rchoice"          
## [160] "ridgelm"           "riskRegression"    "rjags"            
## [163] "rlm"               "rlmerMod"          "RM"               
## [166] "rma"               "rma.uni"           "robmixglm"        
## [169] "robtab"            "rq"                "rqs"              
## [172] "rqss"              "Sarlm"             "scam"             
## [175] "selection"         "sem"               "SemiParBIV"       
## [178] "semLm"             "semLme"            "slm"              
## [181] "speedglm"          "speedlm"           "stanfit"          
## [184] "stanmvreg"         "stanreg"           "summary.lm"       
## [187] "survfit"           "survreg"           "svy_vglm"         
## [190] "svyglm"            "svyolr"            "t1way"            
## [193] "tobit"             "trimcibt"          "truncreg"         
## [196] "vgam"              "vglm"              "wbgee"            
## [199] "wblm"              "wbm"               "wmcpAKP"          
## [202] "yuen"              "yuend"             "zcpglm"           
## [205] "zeroinfl"          "zerotrunc"

Example of linear regression model:

ggstatsplot::ggcoefstats(x = stats::lm(formula = mpg ~ am * cyl,data = mtcars)) 

Example of Coxph model:

library(survival)
m=coxph(Surv(time,status)~age+sex+ph.ecog,data=lung)
ggstatsplot::ggcoefstats(x = m) 
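
The numbers behind such a coefficient plot are the point estimates and their confidence intervals. For the linear model above they can be obtained in base R, as in this sketch (the names m_lm and est are illustrative):

```r
m_lm <- lm(mpg ~ am * cyl, data = mtcars)

# Point estimates with 95% confidence intervals, one row per term
est <- cbind(estimate = coef(m_lm), confint(m_lm, level = 0.95))
print(round(est, 3))
```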

For more information and examples of ggstatsplot, see the ggstatsplot documentation.

ggpmisc Package

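The main feature of the ggpmisc package is that estimates from model fit objects can be displayed in ggplots as text, tables, or equations. To use this package you also need to load the broom package. For more information, see the ggpmisc documentation.

In this example we fit a quadratic model of hp on mpg to the mtcars data and show the table of regression coefficients in the plot. Note that the code specifies the model as y ~ poly(x, 2), which gives the same fitted curve as hp~mpg+I(mpg^2) but reports coefficients for orthogonal polynomial terms, and that the black line is the straight-line fit hp ~ mpg computed before the ggplot() call.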

library(ggpmisc)
## Warning: package 'ggpmisc' was built under R version 4.0.2
## Loading required package: ggpp
## Warning: package 'ggpp' was built under R version 4.0.2
## 
## Attaching package: 'ggpp'
## The following object is masked from 'package:ggplot2':
## 
##     annotate
library(broom)
## Warning: package 'broom' was built under R version 4.0.2
fit <- lm(hp ~ mpg,data=mtcars)
f = fit$fitted
p=ggplot(mtcars, aes(mpg, hp,col=factor(am))) +
  geom_point(size=6) +
  theme_classic()+
  geom_line(aes(mpg, f), color="black")+
  stat_fit_tb(method.args = list(formula = y ~ poly(x, 2)),
    tb.params = c("intercept" = 1, "mpg" = 2, "mpg^2" = 3),
    tb.vars = c("Term" = 1, "Estimate" = 2, "S.E." = 3,
    "italic(F)-value" = 4, "italic(P)-value" = 5),parse = T,
              table.theme = ttheme_gtlight,
              position = "identity" )
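
The formula y ~ poly(x, 2) used in stat_fit_tb() and the form hp~mpg+I(mpg^2) describe the same quadratic curve but parameterize it differently. A short base R check (the object names are illustrative):

```r
raw  <- lm(hp ~ mpg + I(mpg^2), data = mtcars)   # raw polynomial terms
orth <- lm(hp ~ poly(mpg, 2), data = mtcars)     # orthogonal polynomial terms

# The fitted values agree, the coefficients do not
same_fit <- isTRUE(all.equal(unname(fitted(raw)), unname(fitted(orth))))
print(same_fit)
print(coef(raw))
print(coef(orth))
```

If the table should show the coefficients of mpg and mpg^2 themselves, poly(x, 2, raw = TRUE) gives the raw parameterization.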

ggpubr Package

ggpubr is an easy-to-use and attractive extension of ggplot2. I show just one example here; for more examples, please see the ggpubr documentation.

library(ggpubr)
ggplot(mtcars,aes(x=factor(cyl),y=mpg,fill=factor(cyl)))+
  geom_boxplot()+
  stat_compare_means(comparisons=list(c("4","6"),c("6","8"),c("4","8")))+
  stat_compare_means(label.y = 50)+
  theme_classic()
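
The p-values that stat_compare_means() prints can be cross-checked in base R. The sketch below assumes the function's default methods, a Wilcoxon test for each pair and a Kruskal-Wallis test for the global comparison; the object names are illustrative, and exact = FALSE is set only to avoid warnings about ties.

```r
cyl_f <- factor(mtcars$cyl)

# Global comparison across the three cylinder groups
global <- kruskal.test(mtcars$mpg ~ cyl_f)

# Pairwise Wilcoxon tests for the same comparisons as in the plot
pairs <- list(c("4", "6"), c("6", "8"), c("4", "8"))
pair_p <- sapply(pairs, function(g) {
  keep <- cyl_f %in% g
  wilcox.test(mtcars$mpg[keep] ~ droplevels(cyl_f[keep]), exact = FALSE)$p.value
})
names(pair_p) <- sapply(pairs, paste, collapse = " vs ")

print(global)
print(signif(pair_p, 3))
```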

ggrepel Package

ggrepel helps you label the points of a scatter plot with text that does not overlap. Text labels repel away from each other, away from data points, and away from the edges of the plotting area.

See the example below:

library(ggplot2)
library(ggrepel)
## Warning: package 'ggrepel' was built under R version 4.0.2
name=rownames(mtcars)
length(name)
## [1] 32
fit <- lm(hp~mpg, data=mtcars)
f = fit$fitted
ggplot(mtcars,aes(x=mpg,y=hp))+ 
  geom_point(color=1:32)+
  geom_text_repel(color=1:32,label=name,direction="y",force=0.1,size = 2,point.padding=0.3,box.padding=0.2)+
  geom_line(aes(mpg, f), color="black")+
  theme_classic()+
  scale_x_continuous(breaks = c(10,15,20,25,30,35))+
  xlab("mpg")+
  ylab("hp")
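
When a plot has many points, it is often enough to label only a few of them. The following sketch passes a subset of the data to geom_text_repel(); the threshold hp > 200 and the object names are illustrative, and the label layer is added only if ggrepel is installed.

```r
library(ggplot2)

cars <- data.frame(mtcars, name = rownames(mtcars))
strong <- subset(cars, hp > 200)   # only these points get a label

p <- ggplot(cars, aes(x = mpg, y = hp)) +
  geom_point(color = "grey40") +
  theme_classic()

# Add repelling labels for the subset when ggrepel is available
if (requireNamespace("ggrepel", quietly = TRUE)) {
  p <- p + ggrepel::geom_text_repel(data = strong, aes(label = name), size = 3)
}
p
```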