Usually in bioinformatics we run programs from the command line. This is not only cool, it’s also practical: it takes much less time and effort to run repetitive jobs from the command line and to keep track of what you have done. Command line arguments are options that we pass to a program so that it does exactly what we want. For example, consider the well-known command ls, which lists the contents of a directory. If I want a detailed (long) listing of the directory, I run it as ls -l. If I also need the size of each file in human-readable format, I use ls -h -l or, equivalently, ls -hl.
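Python’s argparse follows the same convention for short flags: several single-letter boolean options can be combined after one dash. The sketch below uses a hypothetical toy program called myls with two ls-style flags. It passes explicit argument lists to parse_args instead of reading the real command line, and it sets add_help=False only because argparse otherwise reserves -h for the help message.

```python
import argparse

# Hypothetical toy parser imitating two ls-style boolean flags.
# add_help=False frees -h, which argparse normally reserves for --help.
parser = argparse.ArgumentParser(prog="myls", add_help=False)
parser.add_argument("-l", action="store_true", help="long listing format")
parser.add_argument("-h", action="store_true", help="human-readable sizes")

# Parse simulated command lines instead of the real sys.argv.
separate = parser.parse_args(["-h", "-l"])  # like: myls -h -l
combined = parser.parse_args(["-hl"])       # like: myls -hl

print(separate.h, separate.l)  # True True
print(separate == combined)    # True
```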
Previously, I wrote a small script to parse GEO files from the Gene Expression Omnibus (check the tutorial here). The script needs two inputs:
## the input file
inputFile = "GDS5430.soft"
## I want to split (and color) the dataset based on this factor
myType = 'agent'
Thus, if somebody wants to run everything for another file and/or another factor, they have to open the source code and change the values of these variables. This is not advisable, however.
Therefore, a better option is to let the user pass arguments from the command line, which then set the values of some program variables. Let’s see how this works in Python and R.
In Python, we will import the argparse module from the standard library.
#!/usr/bin/env python3
## The shebang line above lets the script run as ./geo.py (after chmod +x geo.py)
import argparse
## Construct a command line parser object. This initializes the parser; it does not parse anything yet.
parser = argparse.ArgumentParser(description='Parse GEO Files')
## Add the first option we are interested in: the input file
parser.add_argument('-i', '--input', help='input file', required=True)
## The factor
parser.add_argument('-f', '--factor', help='a factor that splits the data (individuals)', required=True)
## Note that I provide both a short flag and a long flag. This is not necessary, but I think it's nicer.
## Parse the command line arguments (read from sys.argv)
args = parser.parse_args()
## I can extract the arguments from 'args' (the result of parsing) using the long name of each flag
myType = args.factor
inputFile = args.input
For example, assume that the program is called geo.py. If I just run it as:
./geo.py
I will get an error message:
usage: geo.py [-h] -i INPUT -f FACTOR
geo.py: error: the following arguments are required: -i/--input, -f/--factor
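Besides printing this message to standard error, argparse ends the program with exit status 2, which a shell script or pipeline can check (for example via $?). The sketch below rebuilds the same parser and, purely for demonstration, calls parse_args with an explicit empty list (simulating ./geo.py with no arguments) and catches the resulting SystemExit to inspect its code.

```python
import argparse

# Same options as geo.py; prog is set explicitly so the usage line says "geo.py".
parser = argparse.ArgumentParser(prog="geo.py", description="Parse GEO Files")
parser.add_argument("-i", "--input", help="input file", required=True)
parser.add_argument("-f", "--factor", help="a factor that splits the data (individuals)", required=True)

exit_code = None
try:
    # Simulate running ./geo.py without any arguments.
    parser.parse_args([])
except SystemExit as err:
    # argparse prints the usage and error to stderr, then raises SystemExit(2).
    exit_code = err.code

print("exit status:", exit_code)  # exit status: 2
```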
Now, I can run the code as:
./geo.py -i GDS5430.soft -f agent
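A possible variation, offered as a sketch rather than the original script: wrapping the parser in a function that accepts an optional argument list makes the command line handling easy to test without a shell. The name parse_arguments is hypothetical. When argv is None, parse_args falls back to sys.argv[1:], so the real script would behave as before. The example also shows that the long flags, the --input=value form and a different order of the options all give the same result.

```python
import argparse


def parse_arguments(argv=None):
    """Build the GEO parser and parse argv (sys.argv[1:] is used when argv is None)."""
    parser = argparse.ArgumentParser(description="Parse GEO Files")
    parser.add_argument("-i", "--input", help="input file", required=True)
    parser.add_argument("-f", "--factor", help="a factor that splits the data (individuals)", required=True)
    return parser.parse_args(argv)


# Simulated command lines instead of the real sys.argv.
short_form = parse_arguments(["-i", "GDS5430.soft", "-f", "agent"])
long_form = parse_arguments(["--factor", "agent", "--input=GDS5430.soft"])

print(short_form.input, short_form.factor)  # GDS5430.soft agent
print(short_form == long_form)              # True
```

In geo.py itself you would call parse_arguments() with no argument, typically under an if __name__ == "__main__": guard.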
In R, things are even simpler. You can use the command:
args <- commandArgs(trailingOnly = TRUE)
The variable args is a character vector. Thus, args[1] is the first argument, args[2] the second, and so on.
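For comparison, a rough Python counterpart of this positional approach is to slice sys.argv, dropping the script name in position 0. The sketch uses a hypothetical helper, trailing_args, and a simulated argument list instead of the real sys.argv. Python counts from 0, so R’s args[1] corresponds to args[0] here. Unlike argparse, this approach depends on the order of the arguments and provides no automatic help or error messages, which is why the sketch adds a minimal length check.

```python
import sys


def trailing_args(argv):
    """Rough analogue of R's commandArgs(trailingOnly = TRUE):
    drop the script name (argv[0]) and keep the user-supplied arguments."""
    return argv[1:]


# Simulated command line: ./geo.py GDS5430.soft agent
args = trailing_args(["./geo.py", "GDS5430.soft", "agent"])
if len(args) < 2:
    # Positional arguments get no automatic validation, so check by hand.
    sys.exit("usage: geo.py INPUT_FILE FACTOR")

input_file, my_type = args[0], args[1]  # R's args[1] and args[2]
print(input_file, my_type)  # GDS5430.soft agent
```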