Passing arguments from the command line in R and Python

What are command line arguments

Usually in Bioinformatics we run programs from the command line. This is not only cool, it’s also practical. It takes much less time and effort to run repetitive jobs from the command line and keep track of what you have done. Command line arguments are options that we pass to a program so that it does exactly what we want. For example, consider the well-known command ls, which lists the contents of a directory. If I want a detailed listing of the directory, I will run it as ls -l. If I also need the size of each file in human-readable format, I’ll use ls -h -l or ls -hl, etc.

An example: parsing a GEO file

Previously, I wrote a small script to parse files from GEO, the Gene Expression Omnibus (check the tutorial here). The script needs two inputs:

An input file (here it’s the GDS5430.soft) and
A factor to separate the data (here it’s ‘agent’).

## the input file
inputFile="GDS5430.soft"

## I want to split (and color) the dataset based on this factor
myType = 'agent'

Thus, anybody who wants to run everything for another file and/or another factor has to open the source code and change the values of these variables. This is not advisable, however, because:

It takes time
You may, by mistake, modify the code and introduce bugs.

Therefore, the best option is to let the user pass arguments on the command line and, in this way, set the values of these program variables. Let’s see how this works in Python and R.

Python

We will use the argparse module, which is part of the Python standard library.

import argparse
## construct a command line parser object. This initiates the command line parser. It does nothing yet...
parser = argparse.ArgumentParser(description='Parse GEO Files')
## Add the first option that we are interested in: the input file
parser.add_argument('-i', '--input', help='input file', required=True)
## The factor
parser.add_argument('-f', '--factor', help='a factor that splits the data (individuals)', required=True)

## Note that I provide both a short flag and a long flag. This is not necessary, but it's nicer I think.

## Parse the command line arguments
args = parser.parse_args()

## I can extract the arguments by using the 'args' (result of parsing) and the long name of the flag
myType=args.factor
inputFile=args.input

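As an example, assume that the program is called geo.py (running it as ./geo.py assumes that the file is executable and starts with a Python shebang line; otherwise run it through the interpreter, e.g. python3 geo.py). If I just run it as: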

./geo.py

I will get an error message:

usage: geo.py [-h] -i INPUT -f FACTOR
geo.py: error: the following arguments are required: -i/--input, -f/--factor
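The usage line and the error come from argparse itself. Below is a minimal sketch, assuming the same two options as in the script, that reproduces both without touching the real command line. The helper name build_parser is hypothetical, and prog='geo.py' is set explicitly only so that the usage text does not depend on how the snippet is run.

```python
import argparse


def build_parser():
    """Build a parser with the two required options (helper name is hypothetical)."""
    parser = argparse.ArgumentParser(prog='geo.py', description='Parse GEO Files')
    parser.add_argument('-i', '--input', help='input file', required=True)
    parser.add_argument('-f', '--factor',
                        help='a factor that splits the data (individuals)',
                        required=True)
    return parser


parser = build_parser()

# The usage line that argparse shows when arguments are missing or -h is given
usage = parser.format_usage()
print(usage)

# Parsing an empty list mimics running the script with no arguments:
# argparse writes the error to stderr and exits with status 2
try:
    parser.parse_args([])
    exit_code = 0
except SystemExit as exc:
    exit_code = exc.code
print('exit code:', exit_code)
```

Note that the -h option in the usage line is added by argparse automatically; it prints the description and the help text of each option.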

Now, I can run the code as:

./geo.py -i GDS5430.soft -f agent
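To check what the parser returns for this command without opening a shell, parse_args also accepts an explicit list of strings. Here is a small sketch under the same assumptions as the script above (same flags, same variable names):

```python
import argparse

parser = argparse.ArgumentParser(description='Parse GEO Files')
parser.add_argument('-i', '--input', help='input file', required=True)
parser.add_argument('-f', '--factor',
                    help='a factor that splits the data (individuals)',
                    required=True)

# Equivalent to running: ./geo.py -i GDS5430.soft -f agent
args = parser.parse_args(['-i', 'GDS5430.soft', '-f', 'agent'])
inputFile = args.input
myType = args.factor
print(inputFile, myType)

# The long flags are interchangeable with the short ones
args_long = parser.parse_args(['--input', 'GDS5430.soft', '--factor', 'agent'])
print(vars(args_long) == vars(args))
```

Passing a list like this is also a convenient way to test the argument handling of a script.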

R

In R things are even simpler. You can use the command:

args <- commandArgs(trailingOnly = TRUE)

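The variable args is a character vector; with trailingOnly = TRUE it holds only the arguments supplied by the user. Thus, args[1] will be the first argument, args[2] the second, etc. For example, if a hypothetical script geo.R is run as Rscript geo.R GDS5430.soft agent, then args[1] is "GDS5430.soft" and args[2] is "agent". Because all values arrive as strings, numeric arguments must be converted, e.g. with as.numeric().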
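For comparison with the Python section: the closest Python counterpart of this positional access is sys.argv, where sys.argv[0] is the script name and the user's arguments start at index 1 (Python indexes from 0, R from 1). This is a minimal sketch; the helper name positional_args and the example argument list are illustrative only.

```python
import sys


def positional_args(argv):
    """Return the arguments after the script name,
    roughly what commandArgs(trailingOnly = TRUE) gives in R."""
    return argv[1:]


# In a real script the input would be sys.argv
real_args = positional_args(sys.argv)

# Illustrative argument list, as if run as: geo.py GDS5430.soft agent
example = positional_args(['geo.py', 'GDS5430.soft', 'agent'])
input_file = example[0]   # corresponds to args[1] in R
my_type = example[1]      # corresponds to args[2] in R
print(input_file, my_type)
```

Unlike the argparse version, positional access gives no flags, no help message and no check for missing arguments, so the order of the arguments matters.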

Pavlos Pavlidis

March 16, 2019
