02 mai, 2023

Summary

  • Export/Import one sheet in a Workbook

  • Export/Import multiple sheets in a Workbook

  • Create workbook with multiple type data and figures

  • Create workbook with multiple formatting styles

  • Add comments adaptively

Introduction to the openxlsx package

  • Other packages : XLConnect, readxl, xlsx, etc.

  • It’s not the fastest in reading Excel Data, but has high number of features.

  • 82 R functions read, write, style and edit Excel spreadsheets.

  • Does not depend on Java.

Introduction to the openxlsx package

Read/Write Speed benchmarking

#-- Create 1e6 x 10 table. 
M <- data.frame(matrix(rnorm(1e7), ncol = 10))

#-- Write speed toward Excel 
a <- Sys.time() ; openxlsx::write.xlsx(M, 'M.xlsx') ; b <- Sys.time()
b-a  # Time difference of 1.0 mins


a<- Sys.time() ; writexl::write_xlsx(M , "M_writexl.xlsx") ; b<- Sys.time()
b-a # Time difference of 30.6 secs


#-- Reading speed from Excel 
a <- Sys.time() ; openxlsx::read.xlsx('M.xlsx') ; b <- Sys.time()
b-a # Time difference of 48.2 secs

a <- Sys.time()  ; readxl::read_xlsx('M.xlsx') ; b <- Sys.time()
b-a # Time difference of 20.3 secs

Set the basics for this workshop

 #-- Fix your own working directory
 # Example :
 setwd("C:\\Users\\oallatif\\CIRI\\workshop_R_Excel\\")


 #-- Attach needed packages
 library(openxlsx)  # R <=> Excel connection. 
 library(tidyverse) # Data processing tasks.
 library(biomaRt)   # Interface to BioMart databases.




If you want, you can download the complete R code used in this Workshop as a .R file.

Download R <=> Excel Workshop R code

R <=> Excel









Straightforward use

Export/Import one Excel Worksheet

2 main functions : write.xlsx & read.xlsx

Write to Excel from R

write.xlsx(x = trees, file = "arbres.xlsx")

Read in R from Excel

arbre <- read.xlsx(xlsxFile = "arbres.xlsx") 
arbre %>% head()
##   Girth Height Volume
## 1   8.3     70   10.3
## 2   8.6     65   10.3
## 3   8.8     63   10.2
## 4  10.5     72   16.4
## 5  10.7     81   18.8
## 6  10.8     83   19.7

Export multiple R objects









to Excel

Helpful knowledge for R <=> Excel connection

The R   list   type

The list is a flexible and versatile structure of the R language. It is a special object whose elements can be of any kind : strings, numbers, vectors … even other lists.

Example :

#-- list contents
x <- c("January", "February", "March", "April") ; y <- c(8,4,2,7,9,6,1,10,5,3) ; z <- mpg

ma.list <- list(x,y,z)
R structuresR structuresR structures

R structures

Export multiple sheets to a Workbook

Write to Excel from R

#-- List of 3 objects - Same type.
species <- iris  %>%  split(iris$Species) 
species %>%  write.xlsx("Species.xlsx" , overwrite = TRUE)
#-- List of 6 objects - Same type.
wool.tension <- split(warpbreaks, list(warpbreaks$wool , warpbreaks$tension))
write.xlsx(wool.tension , "wool.tension.xlsx" , overwrite = TRUE)
#-- List of 4 objects - different types.
myObjs <- list(billboard  = billboard                       , 
               rand.norm  = rnorm(500)                      , 
               rand.pois  = rpois(50,2)                     , 
               rand.binom = rbinom(50,10,0.3)               , 
               people     = c("Pierre" , "Marie" , "Kacou") , 
               emptySheet = " "
               )

write.xlsx(myObjs , "myObjs.xlsx" ,  overwrite = T) 
#The argument "colWidths = 'auto'" Works on Windows!!

Import multiple Excel









Worksheets into R

Import multiple sheets from a Workbook

Read in R from Excel

2 main functions : loadWorkbook & readWorkbook

#-- Import the whole workbook into R.
MyWB <- loadWorkbook("myObjs.xlsx")

#-- Get sheet names from the imported workbook.
sheets(MyWB)
## [1] "billboard"  "rand.norm"  "rand.pois"  "rand.binom" "people"    
## [6] "emptySheet"
#-- Get the contents of a specific worksheet into an R data.frame.
norm.distr <- readWorkbook(MyWB , sheet = "rand.norm", colNames = F)
head(norm.distr)
##           X1
## 1 -0.7677441
## 2 -0.8682696
## 3  1.0245288
## 4  2.7748300
## 5  1.0699806
## 6  0.2597911

Import multiple sheets from a Workbook

Handle imported data as R objects

#-- Create, modify and delete data.
mutate(norm.distr , Random = X1 , X1 = NULL , Density = dnorm(Random,0,1)) -> norm.prob  
 
ggplot(norm.prob , aes(Random, Density))+ #Probability density.
  geom_point(alpha = 0.06, size = 4 , col = "magenta" , shape = 19)

Import multiple sheets from a Workbook

To import multiple sheets at once, we need a for loop. Also We will use the assign function.

Here is an Example of a for loop :

#-- Put all sheet names in a vector.
sheetNamesInMyWB <- sheets(MyWB)

#-- 'for' loop that displays all sheet names.
for(s in sheetNamesInMyWB){print(s)}
## [1] "billboard"
## [1] "rand.norm"
## [1] "rand.pois"
## [1] "rand.binom"
## [1] "people"
## [1] "emptySheet"

Import multiple sheets from a Workbook

How does the ‘assign’ function work ? Here are 2 examples :

#-- assign to the variable named 'lili' the value 25.
assign("lili" , 25) 
lili
## [1] 25
#-- Assign to the variable named "salut" the character string "Bonjour Martin !"
(assign("salut" , "Bonjour Martin !"))
## [1] "Bonjour Martin !"

And now, we are ready !

#-- Loop over the Workbook sheets to assign to each sheet name its contents.
for(s in sheetNamesInMyWB){assign(s , readWorkbook(MyWB , sheet = s))}

Delete & Rename Excel sheets from within R

#-- Remove a sheet from a Workbook.
#Example : remove, in 2 steps, sheets whose name contains the character string 'rand' :

#-- First, put the sheets whose name contains the character string 'rand' in a vector.
toDelete <- sheets(MyWB)[grep("rand", sheets(MyWB))]

#-- Second, use a 'for' loop to remove the sheets in the 'toDelete' vector. 
for(r in toDelete){removeWorksheet(MyWB , r)}

#-- Check the names of the remaining sheets
sheets(MyWB)
## [1] "billboard"  "people"     "emptySheet"
#-- Renames Workbook sheets.
names(MyWB) <- c("table" , "family" , "noThing")

#-- Send 'MyWB' now to Excel as a physical '.xlsx' file.
saveWorkbook(MyWB , "myObjs_NoRand.xlsx" , overwrite = T)

#-- Open an Excel Workbook from within R. 
#openXL("myObjs_NoRand.xlsx")

Creating custom Excel









Workbooks from within R

Different object types in a sheet

4 main functions : createWorkbook, addWorksheet, writeData & saveWorkbook


Multiple objects

Multiple objects

Different object types in a sheet

#-- Generate data : table, model & graph.
Data   <- iris[, c("Sepal.Length" , "Species")]

result <- summary(lm(Sepal.Length ~ Species , data = Data))$coeff

gg <- ggplot(Data , aes(y = Sepal.Length, x = Species))+
        geom_boxplot(aes(fill = Species), show.legend = FALSE)+
        geom_jitter(width = 0.1, shape = 20 , size = 6 , col = "blue", alpha = 0.5 )
print(gg)

Different object types in a sheet

#-- Create a blank workbook, named wb3.
wb3 <- createWorkbook()

#-- Insert in wb3 a blank sheet, named "Sepal.Length".
addWorksheet(wb3, "Sepal.Length")

#-- Start filling the sheet "Sepal.Length" with the desired R objects to export to Excel.
writeData   (wb3, "Sepal.Length" , "Data"         , startRow = 1 , startCol = 1)
writeData   (wb3, "Sepal.Length" , Data           , startRow = 2 , startCol = 1)
writeData   (wb3, "Sepal.Length" , "Model Coeff." , startRow = 1 , startCol = 5)
writeData   (wb3, "Sepal.Length" , result         , startRow = 2 , startCol = 5)

print(gg)
insertPlot  (wb3, "Sepal.Length"                  , startRow = 10, startCol = 5)

#-- Stop filling, send the Workbook 'wb3' to Excel now.
saveWorkbook(wb3, "wb3.xlsx", overwrite = TRUE)

#openXL("wb3.xlsx")

Different object types in a sheet

Iterate over multiple sheets

wb4 <- createWorkbook()
for(i in 1:4){
  Data   <- iris[, c(i , 5)]
  quanti <- names(Data)[1] ; quali <- names(Data)[2]
  result <- summary(lm(Data[,quanti] ~ Data[,quali]))$coeff
  g1 <- ggplot(Data , aes(x = get(quali) , y = get(quanti) ))+
    geom_boxplot(aes(fill = get(quali)), show.legend = FALSE)+
    geom_jitter(width = 0.1, shape = 20 , size = 6 , col = "blue", alpha = 0.5)+
    labs(x = quali, y = quanti)
  #ggsave(paste0("lm.",quanti,".png"))
  
  addWorksheet(wb4, quanti                                                              )
  writeData   (wb4, quanti , "Data"         , startRow = 1 , startCol = 1)
  writeData   (wb4, quanti , Data           , startRow = 2 , startCol = 1)
  writeData   (wb4, quanti , "Model Coeff." , startRow = 1 , startCol = 5)
  writeData   (wb4, quanti , result         , startRow = 2 , startCol = 5)
  print(g1)
  insertPlot  (wb4, quanti                  , startRow = 10, startCol = 5)
  insertImage (wb4, quanti , "iris2.png"    , startRow = 30, startCol = 5)
}
#link to iris flower : https://tinyurl.com/fafxecrz

saveWorkbook(wb4, "wb4.xlsx", overwrite = TRUE)
##openXL("wb4.xlsx")

Styling Excel Workbooks









from within R

Sheet formatting style

Create style and assign it to specific sheet range

2 main functions : createStyle & addStyle

Add style to sheet

Add style to sheet

Sheet formatting style

Create style and assign it to specific sheet range

#-- Import the Workbook created in the previous step.
wb5 <- loadWorkbook("wb4.xlsx")

#-- Create a style named 'CoeffStyle'.
CoeffStyle <- createStyle(fgFill = "green" , fontColour = "red" , borderColour = "blue",
                          border = c("top", "bottom", "left", "right"))

#-- Apply the created style 'CoeffStyle' to specific worksheet range.
addStyle(wb5, "Sepal.Width" , style = CoeffStyle , 
         rows = 2:5 , cols = 5:8 , gridExpand = TRUE)

saveWorkbook(wb5, "wb5.xlsx", overwrite = TRUE)
#openXL("wb5.xlsx")

Sheet formatting style

Create styles and use them adaptively

Create styles

Create styles

Sheet formatting style

Create styles and use them adaptively

#-- Import an RNA-Seq differential expression results table (Copy & Paste 
# the code below in your R script).
# rna.seq <- read.table("https://t.ly/V5Lb" , header = T , sep = '\t' , dec = ',')
# write.xlsx(rna.seq , "rna.seq.xlsx" , colWidths = "auto"  ,  overwrite = TRUE)

rna.seq <- read.xlsx("rna.seq.xlsx")

#-- Create 2 styles
logfc.pos <- createStyle(fgFill = "#FFD715")
logfc.neg <- createStyle(fgFill = "#88CEEB")

wb6 <- createWorkbook()
addWorksheet(wb6, "rna-seq")
writeData(wb6 , "rna-seq" , rna.seq)

#-- Apply the style according to the LogFC value.
for (i in 1:nrow(rna.seq)) {
 if (rna.seq[i, "logFC"] >= 0) {
  addStyle(wb6 , "rna-seq" , style = logfc.pos , rows = i + 1 , cols = 1:6)
 } else{
  addStyle(wb6 , "rna-seq" , style = logfc.neg , rows = i + 1 , cols = 1:6)
 }
}

saveWorkbook(wb6 , "wb6.xlsx" , overwrite = T)

#openXL("wb6.xlsx")

Sheet formatting style

Create styles and use them adaptively

Add more constraints

rna.seq <- read.xlsx("rna.seq.xlsx")
#-- 2 styles
logfc.pos <- createStyle(fgFill = "#FFD715")
logfc.neg <- createStyle(fgFill = "#88CEEB")

wb7 <- createWorkbook()
addWorksheet(wb7, "rna-seq")
writeData(wb7 , "rna-seq" , rna.seq)

for (i in 1:nrow(rna.seq)) {
 if (grepl("interferon", rna.seq[i, "description"])) {
  if (rna.seq[i, "logFC"] >= 0) {
   addStyle( wb7 , "rna-seq" , style = logfc.pos , rows = i + 1 , cols = 1:6)
  } else{
   addStyle(wb7 , "rna-seq", style = logfc.neg, rows = i + 1 , cols = 1:6)
  }
 }
}

saveWorkbook(wb7 , "wb7.xlsx" , overwrite = T)
#openXL("wb7.xlsx")

Sheet formatting style

Create styles on the fly in a ‘for’ loop

wb8 <- createWorkbook()
addWorksheet(wb8, "rna-seq")
writeData(wb8 , "rna-seq" , rna.seq)

#-- 2 styles
for(i in 1:nrow(rna.seq)) {
 if (rna.seq[i, "logFC"] > 0) {
  logfc.pos <-
   createStyle(fgFill = "#FFD715" ,
               fontSize = 3 * pmax(1, abs(rna.seq[i, "logFC"])))
  
  addStyle(wb8 , "rna-seq" , style = logfc.pos , rows = i + 1 , cols = 1:6)
 } else{
  logfc.neg <- createStyle(fgFill = "#88CEEB",
               fontSize = 3 * pmax(1, abs(rna.seq[i, "logFC"])))
  
  addStyle(wb8 , "rna-seq", style = logfc.neg, rows = i + 1 ,   cols = 1:6)
 }
}

saveWorkbook(wb8 , "wb8.xlsx" , overwrite = T)
#openXL("wb8.xlsx")

Writting comments in Excel









Workbooks from within R

Write comments

2 main functions createComment & writeComment

Create comments

Create comments

Simple comment

#-- Straightforward use
wb9 <- createWorkbook()
addWorksheet(wb9 , "rna-seq"          )
writeData   (wb9 , "rna-seq" , rna.seq)

#-- Create your comment. 
myFirstComment <- createComment(comment = "Comment to be inserted in the RNA-Seq file.",
 visible = FALSE
)

#-- Insert your comment in the desired Excel cell.
writeComment(wb9 , "rna-seq" , row = 1 , col = 1 , comment = myFirstComment)

saveWorkbook(wb9 , "wb9.xlsx"  , overwrite = T)
#openXL("wb9.xlsx")

Stylized comment

#-- Create styles.
Sty1 <- createStyle(fontSize = 12, fontColour = "red", textDecoration = c("BOLD"))
Sty2 <- createStyle(fontSize = 9, fontColour  = "blue")

wb10 <- createWorkbook()
addWorksheet(wb10 , "rna-seq"          )
writeData   (wb10 , "rna-seq" , rna.seq)

#-- Create comment and apply styles on it.
coloredComment <- 
  createComment(
    comment = c(paste("Cette table contient", nrow(rna.seq), "genes significatifs dont" , 
                      sum(rna.seq$logFC>0),"Up regulated et " ),
                paste(sum(rna.seq$logFC<0) , "Down Regulated")
                ),
    style  = c(Sty1 , Sty2) , 
    author = "BIBS Workshop", visible = FALSE , width = 4 , height = 3
  )

writeComment(wb10 , "rna-seq" , row = 1 , col = 1 , comment = coloredComment)

saveWorkbook(wb10 , "wb10.xlsx"  , overwrite = T)
#openXL("wb10.xlsx")

Dynamic comments

Example with a biomaRt query

#-- Connect to a BioMart database and dataset. Default host = "https://www.ensembl.org"
mart <- useMart(biomart = "ENSEMBL_MART_ENSEMBL" , dataset = "hsapiens_gene_ensembl")
listMarts() # display available databases in a host. 

db <- useMart("ENSEMBL_MART_ENSEMBL")

listDatasets(db) # display available datasets in a given database.

mart <- useMart("ENSEMBL_MART_ENSEMBL" , dataset = "hsapiens_gene_ensembl")

listAttributes(mart) # display available attributes for a given dataset.

#-- Extract specified attributes from a database according to a filter features.
getBM(mart       = mart             , 
      attributes = c("ensembl_gene_id", 
                     "protein_id" , 
                     "entrezgene_id" , 
                     "chromosome_name", 
                     "strand"),
      
      filters    = "ensembl_gene_id", 
      values     = c( "ENSG00000088827","ENSG00000184347","ENSG00000128335")
)

Dynamic comments

Write as a comment the proteins Id coded by specific genes

wb11 <- createWorkbook()
addWorksheet(wb11 , "rna-seq"          )
writeData   (wb11 , "rna-seq" , rna.seq)

for(g in 1:nrow(rna.seq)){
  if(grepl("binding protein" , rna.seq$description[g])){
    
    results <- getBM(attributes = "protein_id"                ,
                     filters    = "ensembl_gene_id"           , 
                     values     =  rna.seq$ensembl_gene_id[g] , 
                     mart       =  mart
                     )
    
    myDynamicComment <- createComment(
     comment = paste(results$protein_id , collapse = " " , sep = "\n") , visible = F
    )
    
    writeComment(wb11 , "rna-seq" , row = g+1 , col = 1 , comment = myDynamicComment)
  }
}
saveWorkbook(wb11 , "wb11.xlsx" , overwrite = T)
#openXL("wb11.xlsx")

*** The end ***