This document explains how to generate line graphs of the adiposity indices for a group of male participants in a clinical trial. The plotting is done here with the base plotting system in R, but other plotting systems, such as ggplot2 and lattice, may also be used.
We start by loading the package ‘readxl’. If it cannot be found in your environment, install it first: a package can only be loaded once it has been installed. Make sure the version of R on your machine is up to date, as this is a relatively new package.
The ‘readxl’ package contains the function ‘read_excel’, which imports Excel files (.xlsx) into R and can read a chosen sheet from a file that contains multiple sheets, which is useful here.
library(readxl)
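If you are unsure whether ‘readxl’ is installed, a guarded load along the following lines may help. This is only a sketch; the variable name ‘has_readxl’ is an illustrative choice and not part of the original code.

```r
# Check for the package without raising an error if it is missing
has_readxl <- requireNamespace("readxl", quietly = TRUE)

if (has_readxl) {
  library(readxl)
} else {
  # Installation needs internet access, so it is left as a manual step here
  message("readxl is not installed; run install.packages('readxl') first")
}
```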
Using the ‘matrix’ function, we configure a graph layout of 4 rows and 2 columns and call the matrix ‘m’. The argument ‘byrow=TRUE’ fills the matrix row by row, so successive graphs are placed from left to right across a row before moving down to the next row. The numbers 1 through 6 reserve space for the 6 plots. The two 7’s turn the last row into a single region for a common legend (for the three age groups). With the ‘layout’ function, we set the height of the first 3 rows of matrix ‘m’, where the graphs will be, to 0.4, and the height of the last row to 0.2, since the legend takes up less space than the graphs. We call the result ‘lay’. ‘par(oma=c(0,0.5,3,1))’ specifies the outer margin for the bottom, left, top, and right sides of the plot, respectively.
‘par(mar=c(1,4,1,1)+0.1)’ specifies the inner margin for the same respective sides.
m <- matrix(c(1,2,3,4,5,6,7,7),nrow=4,ncol=2,byrow = TRUE)
lay <- layout(mat=m,heights = c(0.4,0.4,0.4,0.2))
par(oma=c(0,0.5,3,1)) # outer margin specified as 3 for top, leaving space for main title
par(mar=c(1,4,1,1)+0.1) # inner margin
layout.show(lay) # layout shown
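To see how ‘byrow=TRUE’ arranges the plot numbers, it can help to print the matrix itself. The sketch below also converts the relative heights into fractions of the total device height; the name ‘height_share’ is introduced here for illustration only.

```r
# Same layout matrix as above: 4 rows, 2 columns, filled row by row
m <- matrix(c(1, 2, 3, 4, 5, 6, 7, 7), nrow = 4, ncol = 2, byrow = TRUE)
print(m)
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4
# [3,]    5    6
# [4,]    7    7

# Heights passed to layout() are relative, so each row's share is height / sum
heights <- c(0.4, 0.4, 0.4, 0.2)
height_share <- round(heights / sum(heights), 3)
print(height_share)
# [1] 0.286 0.286 0.286 0.143
```

Because region 7 occupies both cells of the last row, the legend spans the full width of the figure.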
We will now import the .xlsx file (in folder ‘R code’ > ‘FIG6’, titled ‘MFBaseFig3.xlsx’). This .xlsx file contains six sheets in total, each displaying the measurements of a specific adiposity index for the sample male population.
Note: the data values are purely made up, and therefore must not be used as a reference for biomedical research purposes.
In each sheet, the three BMI categories are labeled in the first column, and the adiposity index values for the three age groups are found in the next three columns. The three columns after that hold the two-sided 95% confidence intervals for the corresponding age groups, and the last column contains the three p-values. Note that the working directory must be set to the folder containing the .xlsx file; otherwise the entire file path has to be given in the argument to the ‘read_excel’ function. The working directory can be set manually using the ‘setwd’ function, or through the upper toolbar in R.
The first to be plotted is the VSAT data for the sample male population, found in the sheet titled ‘MaleVSAT’. In the ‘read_excel’ function, we specify this with the argument ‘sheet='MaleVSAT'’.
fileVSAT<-read_excel('MFBaseFig3.xlsx',sheet='MaleVSAT')
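Since the workbook holds six sheets, one possible way to import them all in a single step is sketched below. It assumes the file sits in the working directory; the names ‘path’, ‘sheet_names’ and ‘all_sheets’ are illustrative, and the code falls back to a message if the file or the package is not available.

```r
path <- "MFBaseFig3.xlsx"  # assumed to be in the current working directory
all_sheets <- NULL

if (requireNamespace("readxl", quietly = TRUE) && file.exists(path)) {
  # excel_sheets() returns the sheet names in workbook order
  sheet_names <- readxl::excel_sheets(path)
  # Read every sheet into a named list of data frames
  all_sheets <- lapply(sheet_names, function(s) readxl::read_excel(path, sheet = s))
  names(all_sheets) <- sheet_names
  str(all_sheets, max.level = 1)
} else {
  message("File or package not found; check getwd() and the file name")
}
```

With such a list, a single sheet could then be accessed by name, for example ‘all_sheets[['MaleVSAT']]’.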
The ‘BMI’ column in the table imported must be converted from class ‘character’ to class ‘factor’, for ‘BMI’ to be recognized as a categorical variable and for it to be plotted properly on the x-axis. The ‘class’ function in the code below checks that BMI is now of class ‘factor’.
fileVSAT$BMI <- as.factor(fileVSAT$BMI)
class(fileVSAT$BMI)
## [1] "factor"
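One point worth checking is the order of the factor levels, because ‘as.factor’ sorts them alphabetically and ‘as.numeric’ then returns positions in that order. The sketch below uses hypothetical category labels (the actual labels in the sheet are not shown in this document) to illustrate how an explicit ‘levels’ argument fixes the order on the x-axis.

```r
# Hypothetical BMI labels, for illustration only
bmi <- c("Normal", "Overweight", "Obese")

# Default: levels sorted alphabetically (Normal, Obese, Overweight)
bmi_default <- as.factor(bmi)
as.numeric(bmi_default)
# [1] 1 3 2

# Explicit levels keep the intended order
bmi_ordered <- factor(bmi, levels = c("Normal", "Overweight", "Obese"))
as.numeric(bmi_ordered)
# [1] 1 2 3
```

If the labels in the imported sheet already sort in the intended order, ‘as.factor’ alone is sufficient.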
Now we begin plotting the line graph for the first age group: age 30-49. Note that the 3 points plotted correspond to the column ‘VSAT_age1’ in the imported data table, which has been named ‘fileVSAT’. Since the error interval (95% confidence interval) for each point must not overlap with those of the other points in the same BMI category (but a different age group), we stagger the first three points by shifting them 0.15 unit to the left.
The line will be dark orange, as specified in the ‘col’ argument. The ‘xaxt’ argument is ‘n’, which means the x-axis will not be drawn (for now). ‘xlim’ and ‘ylim’ specify the value limits of the x-axis and y-axis, respectively. ‘xlab’ and ‘ylab’ give the respective labels for those axes. The arguments ‘type’ and ‘pch’ specify how the points are drawn: ‘type='o'’ overplots points and connecting lines, and ‘pch=16’ selects a filled circle. ‘cex.axis’ sets the size of the axis tick labels, and ‘mgp’ sets the distances (in margin lines) of the axis title, tick labels, and axis line from the plot edge.
plot.default(as.numeric(fileVSAT$BMI)-0.15,fileVSAT$VSAT_age1,
type='o',pch=16,
col='darkorange',xaxt='n',xlim=c(0.9,3.2),xlab='',
ylim=c(70,230),ylab='adiposity index(unit)',
cex.axis=0.8,mgp=c(1.9,0.5,0))
Now, we plot the error interval for the 3 points plotted. Each error bar spans the 95% confidence interval of its measurement, extending the value in column ‘se_age1’ of the imported table above and below the point. ‘se_age1’ is the value obtained by multiplying 1.96 by the standard error of the corresponding measurement.
arrows(as.numeric(fileVSAT$BMI)-0.15 ,fileVSAT$VSAT_age1-fileVSAT$se_age1,
as.numeric(fileVSAT$BMI)-0.15 , fileVSAT$VSAT_age1+ fileVSAT$se_age1,
length=0.05,angle=90,code=3)
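The relationship between the standard error and the plotted bar can be worked through with a small sketch. The means and standard errors below are hypothetical values chosen for illustration; 1.96 is the usual rounding of the normal quantile ‘qnorm(0.975)’.

```r
# Hypothetical means and standard errors for three BMI categories
means   <- c(100, 140, 180)
std_err <- c(5, 6, 8)

z <- qnorm(0.975)             # approximately 1.96
half_width <- 1.96 * std_err  # what a column such as 'se_age1' is described as holding

lower <- means - half_width   # bottom end of each error bar
upper <- means + half_width   # top end of each error bar
print(data.frame(means, half_width, lower, upper))
```

The ‘arrows’ call draws from ‘lower’ to ‘upper’ at each x position, and ‘angle=90’ with ‘code=3’ gives flat caps at both ends.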
Recall that the x-axis labels have not been placed yet. We do so next. BMI is plotted as a categorical variable, so the tick positions are its numeric factor codes, while the tick labels are supplied as class ‘character’. We only need to label the x-axis once per graph.
axis(1,labels=as.character(fileVSAT$BMI),
at=as.numeric(fileVSAT$BMI),
cex.axis=0.9,mgp=c(1.9,0.5,0))
Following similar principles, we plot the line graph for the age group 50-65, then for the group 65+.
The plotting order is as follows: the line and its three points, then the error bars. The error intervals for the age groups 50-65 and 65+ can be found in the columns ‘se_age2’ and ‘se_age3’ of the imported ‘fileVSAT’, respectively.
The graph for age group 50-65 is plotted as a dark green line, along with the error bars.
lines.default(as.numeric(fileVSAT$BMI),fileVSAT$VSAT_age2,
type='o',col='darkgreen',
pch=16)
arrows(as.numeric(fileVSAT$BMI) ,fileVSAT$VSAT_age2-fileVSAT$se_age2,
as.numeric(fileVSAT$BMI), fileVSAT$VSAT_age2+ fileVSAT$se_age2,
length=0.05,angle=90,code=3)
The graph for age group 65+ is plotted as a blue line. Points are staggered 0.15 unit to the right, so that error bars do not overlap with those of the points on the green line.
lines.default(as.numeric(fileVSAT$BMI)+0.15,fileVSAT$VSAT_age3,type='o',col='blue',
pch=16)
arrows(as.numeric(fileVSAT$BMI)+0.15 ,fileVSAT$VSAT_age3-fileVSAT$se_age3,
as.numeric(fileVSAT$BMI)+0.15 , fileVSAT$VSAT_age3 + fileVSAT$se_age3,
length=0.05,angle=90,code=3)
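Because the same line-plus-error-bar pattern is used three times per graph, it could be wrapped in a small helper. The following is a sketch under stated assumptions, not the code used for the figure: the function ‘draw_series’ and the data frame ‘dat’ are hypothetical, and the values in ‘dat’ are made up to stand in for one imported sheet. ‘pdf(NULL)’ is used so the example runs without opening a plot window.

```r
# Hypothetical stand-in for one imported sheet (values are made up)
dat <- data.frame(
  BMI       = factor(c("A", "B", "C")),
  VSAT_age1 = c(100, 140, 180), se_age1 = c(10, 12, 15),
  VSAT_age2 = c(110, 150, 190), se_age2 = c(9, 11, 14),
  VSAT_age3 = c(120, 160, 200), se_age3 = c(8, 10, 13)
)

# Draw one age group: points and line, then error bars, shifted by 'offset'
draw_series <- function(x, y, half_width, offset, col, first = FALSE, ...) {
  xpos <- as.numeric(x) + offset
  if (first) {
    # The first series creates the plot; the x-axis is added separately
    plot.default(xpos, y, type = "o", pch = 16, col = col,
                 xaxt = "n", xlab = "", ...)
  } else {
    lines.default(xpos, y, type = "o", pch = 16, col = col)
  }
  arrows(xpos, y - half_width, xpos, y + half_width,
         length = 0.05, angle = 90, code = 3)
  invisible(xpos)
}

pdf(NULL)  # null device: nothing is written to disk
draw_series(dat$BMI, dat$VSAT_age1, dat$se_age1, -0.15, "darkorange",
            first = TRUE, xlim = c(0.9, 3.2), ylim = c(70, 230),
            ylab = "adiposity index(unit)")
draw_series(dat$BMI, dat$VSAT_age2, dat$se_age2, 0, "darkgreen")
draw_series(dat$BMI, dat$VSAT_age3, dat$se_age3, 0.15, "blue")
axis(1, at = as.numeric(dat$BMI), labels = as.character(dat$BMI))
invisible(dev.off())
```

Keeping the offset as an argument makes the staggering explicit: -0.15, 0 and 0.15 for the three age groups.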
We repeat this code for the five other graphs, making the corresponding changes to the sheet imported and the axis labels. The plot will look similar to the one below.
After all six graphs have been generated, we plot the common legend. Recall that the initialized layout consists of 4 rows. The six graphs have taken up three rows, so the legend goes into the fourth row.
The ‘plot’ function initializes a blank plot, so that the legend can be manually added in the next lines of code.
‘plot_colors’ holds the colors for the legend, matching the color of each of the three line graphs (i.e. age groups). The ‘legend’ function plots the legend. The argument ‘x’ specifies the location of the legend. ‘inset’ adjusts the legend position; no adjustment is needed here. ‘legend’ specifies the age group labels. ‘bty’ specifies whether a box outline is drawn around the legend (‘n’ means none). ‘cex’ sets the text size. ‘col’ specifies the colors corresponding to the entries in the ‘legend’ argument. ‘lwd’ specifies the width of each color line, and ‘seg.len’ specifies its length. ‘horiz’ specifies whether the legend is laid out horizontally (TRUE means horizontal). ‘y.intersp’ sets the vertical spacing between lines of the legend, and ‘x.intersp’ sets the horizontal spacing between each color line and its label.
plot(1,type='n',axes=FALSE, xlab='',ylab = '') # blank plot generated on 4th row
plot_colors<- c('darkorange','darkgreen','blue') # specify colors for each age group line graph
legend(x='top',inset=0, legend=c('Age 30-49','Age 50-65','Age >65'),
bty='n', cex=1.2, col=plot_colors,lwd=3,horiz=TRUE,y.intersp = 0.2,
x.intersp = 0.4,seg.len=0.7) # plot legend
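The shared-legend technique can be tried in isolation with a reduced layout. The sketch below is an illustrative assumption rather than part of the original figure: it uses two placeholder panels with made-up values and generic series names, and reserves a third, shorter region for the legend.

```r
pdf(NULL)  # null device so the example runs without a plot window

# Two plot regions on top, one legend region spanning the bottom row
m <- matrix(c(1, 2, 3, 3), nrow = 2, ncol = 2, byrow = TRUE)
layout(mat = m, heights = c(0.8, 0.2))
par(mar = c(1, 4, 1, 1) + 0.1)  # small margins so the short legend row fits

# Placeholder panels (values are made up)
plot(1:3, c(2, 4, 3), type = "o", pch = 16, col = "darkorange",
     xlab = "", ylab = "panel 1")
plot(1:3, c(3, 1, 2), type = "o", pch = 16, col = "blue",
     xlab = "", ylab = "panel 2")

# Empty plot claims region 3; the legend is then drawn inside it
plot(1, type = "n", axes = FALSE, xlab = "", ylab = "")
legend(x = "top", legend = c("Series A", "Series B"),
       col = c("darkorange", "blue"), lwd = 3, horiz = TRUE, bty = "n")

invisible(dev.off())
```

The empty ‘plot’ call matters because ‘legend’ can only draw into an existing plot region.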
Lastly, we place the main title. Recall that in the first lines of code we left ample space at the top of the outer margin, where the main title will go.
The ‘mtext’ function writes text in the margin. The argument ‘side’ specifies on which side of the plot the title will be placed (3 indicates top). ‘line’ specifies the margin line where the title will be located. ‘outer’ specifies whether the title will be in the outer margin (TRUE in this case). ‘cex’ specifies the text size. ‘las’ specifies the orientation of the text (0 = parallel to the axis).
The code is shown below.
mtext(expression(bold('Adiposity indices by age and BMI in the sample population'))
,side = 3, line = 1,outer=TRUE,cex=0.8,las=0)