Skeleton tutorial

Raffaele A Calogero

Dissecting the skeleton.R

The skeleton function allows to control a bash script, skeleton.sh, located in docker.io/repbioinfo/ubuntu image in /bin.

The skeleton function has three parameters:

skeleton(group="docker", scratch.folder, data.folder)

The first step in the skeleton function is storing the working folder and grabbing the process time for subsequent performance evaluation.

  #storing the position of the home folder  
  home <- getwd()
  #running time 1
  ptm <- proc.time()

Then, it is tested if docker demon is running,

  #testing if docker is running
  test <- dockerTest()
  if(!test){
    cat("\nERROR: Docker seems not to be installed in your system\n")
    return()
  }

checking if data folder exists and setting it as working folder,

  #setting the data.folder as working folder
  if (!file.exists(data.folder)){
    cat(paste("\nIt seems that the ",data.folder, " folder does not exist\n"))
    return(2)
  }
  setwd(data.folder)

checking if scratch folder exists and creating a temporary folder.

  #check  if scratch folder exist
  if (!file.exists(scratch.folder)){
    cat(paste("\nIt seems that the ",scratch.folder, " folder does not exist\n"))
    return(3)
  }
  tmp.folder <- gsub(":","-",gsub(" ","-",date()))
  scrat_tmp.folder=file.path(scratch.folder, tmp.folder)
  writeLines(scrat_tmp.folder,paste(data.folder,"/tempFolderID", sep=""))
  cat("\ncreating a folder in scratch folder\n")
  dir.create(file.path(scrat_tmp.folder))

Executing the docker command:

  #executing the docker job
  if(group=="sudo"){
    params <- paste("--cidfile ",data.folder,"/dockerID -v ",scrat_tmp.folder,":/scratch -v ", data.folder, ":/data -d docker.io/repbioinfo/ubuntu sh /bin/skeleton.sh", sep="")
    resultRun <- runDocker(group="sudo",container="docker.io/repbioinfo/ubuntu", params=params)
  }else{
    params <- paste("--cidfile ",data.folder,"/dockerID -v ",scrat_tmp.folder,":/scratch -v ", data.folder, ":/data -d docker.io/repbioinfo/ubuntu sh /bin/skeleton.sh", sep="")
    resultRun <- runDocker(group="docker",container="docker.io/repbioinfo/ubuntu", params=params)
  }

The skeleton.sh scripts in docker.io/repbioinfo/ubuntu is the following:

#!/bin/bash
echo "skeleton 0.0.1"
#setting the scratch folder as workinng directory
SCRATCH_FOLDER=/scratch
DATA_FOLDER=/data
#moving to scratch folder
cd $SCRATCH_FOLDER
#adding information to run.info file or creating a run.info file
file="run.info"
if [ -f "$file" ]
then
        echo "skeleton 0.0.1" >> $SCRATCH_FOLDER/run.info
else
        echo "skeleton 0.0.1" > $SCRATCH_FOLDER/run.info
fi
#writing the result file helloworld in data scratch
echo "hello world" > $SCRATCH_FOLDER/helloworld.txt
# creating the out.info file indicating that run is finished 
echo "analysis is finished" > $SCRATCH_FOLDER/out.info
#changing the properties of files and folders in /data/scratch 
chmod 777 -R $SCRATCH_FOLDER/*

It writes hello world in the helloworld.txt and moves helloworld.txt to the scratch folder together with the run.info file, used to store information about the run, and the out.info, used to tell to the R script when the doker job is finished. The skeleton.sh scripts is a prototype for the handling of docker application(s).

Lets go back to the skeleton.R dissection:

The resultRun is used to check when the docker job is finished. The results are copied from the scratch to the data folder

 #when container ends copy results from scratch in data.folder
 if(resultRun=="false"){
    system(paste("cp ", scrat_tmp.folder, "/* ", data.folder, sep=""))
  }

Then, the computing time is estimated and saved in the run.info file

  #running time 2
  ptm <- proc.time() - ptm
  dir <- dir(data.folder)
  dir <- dir[grep("run.info",dir)]
  if(length(dir)>0){
    con <- file("run.info", "r")
    tmp.run <- readLines(con)
    close(con)
    tmp.run[length(tmp.run)+1] <- paste("user run time mins ",ptm[1]/60, sep="")
    tmp.run[length(tmp.run)+1] <- paste("system run time mins ",ptm[2]/60, sep="")
    tmp.run[length(tmp.run)+1] <- paste("elapsed run time mins ",ptm[3]/60, sep="")
    writeLines(tmp.run,"run.info")
  }else{
    tmp.run <- NULL
    tmp.run[1] <- paste("run time mins ",ptm[1]/60, sep="")
    tmp.run[length(tmp.run)+1] <- paste("system run time mins ",ptm[2]/60, sep="")
    tmp.run[length(tmp.run)+1] <- paste("elapsed run time mins ",ptm[3]/60, sep="")
    writeLines(tmp.run,"run.info")
  }

The log of the docker job (describing the events observed during the execution of the container) is saved with a name made of the first 12 letters of the docker job ID. After saving the docker instance log, the scratch folder and the docker container are deleted. The dockerID and tempFolderID files are also deleted. The list of the dockers installed on the machine is saved in the data.folder.

   #saving log and removing docker container
  container.id <- readLines(paste(data.folder,"/dockerID", sep=""), warn = FALSE)
  system(paste("docker logs ", substr(container.id,1,12), " &> ",data.folder,"/", substr(container.id,1,12),".log", sep=""))
  system(paste("docker rm ", container.id, sep=""))
  #removing temporary folder
  cat("\n\nRemoving the temporary file ....\n")
  system(paste("rm -R ",scrat_tmp.folder))
  system("rm -fR out.info")
  system("rm -fR dockerID")
  system("rm  -fR tempFolderID")
  system(paste("cp ",paste(path.package(package="docker4seq"),"containers/containers.txt",sep="/")," ",data.folder, sep=""))
  setwd(home)