Here is the Gram-Schmidt stuff. It has two code chunks

The chunk gs.c defines Gram-Schmidt in C, in such a way that it can be called from R. It does not do pivoting and it assumes the input matrix is of full column rank.

The chunk gs.R defines the R functions gs() and gsrc(). The first one does Gram-Schmidt in R, the second calls the Gram-Schmidt in C. First the code in C.

#include <math.h>

void
gsc(double *x, double *q, double *r, int *n, int *m)
{
    int             i, j, l, jn, ln, jm, imax = *n, jmax = *m;
    double          s = 0.0;
    for (i = 0; i < imax; i++)
        s += *(x + i) * *(x + i);
    s = sqrt(s);
    *r = s;
    for (i = 0; i < imax; i++)
        *(q + i) = *(x + i) / s;
    for (j = 1; j < jmax; j++) {
        jn = j * imax;
        jm = j * jmax;
        for (l = 0; l < j; l++) {
            ln = l * imax;
            s = 0.0;
            for (i = 0; i < imax; i++)
                s += *(q + ln + i) * *(x + jn + i);
            *(r + jm + l) = s;
            for (i = 0; i < imax; i++)
                *(q + jn + i) += s * *(q + ln + i);
        }
        for (i = 0; i < imax; i++)
            *(q + jn + i) = *(x + jn + i) - *(q + jn + i);
        s = 0.0;
        for (i = 0; i < imax; i++)
            s += *(q + jn + i) * *(q + jn + i);
        s = sqrt(s);
        *(r + jm + j) = s;
        for (i = 0; i < imax; i++)
            *(q + jn + i) /= s;
    }
}

And now the code in R.

gs <- function(x) {
  n <- dim(x)[1]
  m <- dim(x)[2]
  q <- matrix(0,n,m)
  r <- matrix(0,m,m)
  qi <- x[,1]
  si <- sqrt(sum(qi ^ 2))
  q[,1] <- qi / si
  r[1,1] <- si
  for (i in 2:m) {
    xi <- x[,i]
    qj <- q[,1:(i - 1)]
    rj <- t(qj) %*% xi
    qi <- xi - qj %*% rj
    r[1:(i - 1),i] <- rj
    si <- sqrt(sum(qi ^ 2))
    q[,i] <- qi / si
    r[i,i] <- si
  }
  return(list(q = q,r = r))
  }

gsrc <- function(x) {
  n <- dim(x)[1]
  m <- dim(x)[2]
  q <- matrix(0,n,m)
  r <- matrix(0,m,m)
  qr <-
  .C(
  "gsc",as.double(x),as.double(q),as.double(r),as.integer(dim(x)[1]),
  as.integer(dim(x)[2])
  )
  return(list(q = matrix(qr[[2]],n,m),r = matrix(qr[[3]],m,m)))
  }

How does one use this ? In RStudio compilation and building of the shared library is taken care of automatically. Otherwise, we first compile gs.c to a shared library using

R CMD SHLIB gs.c

in the shell. Then in R we say

dyn.load("gs.so")
source("gs.R")

and we run a small example

set.seed (12345)
x<-matrix (rnorm (1000000L), 10000L, 100L)
library (microbenchmark)
mb<-microbenchmark(R = gs(x), C = gsrc(x), Q = qr(x), times = 100L)
mb
## Unit: milliseconds
##  expr       min         lq       mean     median         uq       max
##     R 987.62120 1040.60052 1104.02168 1098.28220 1134.02603 1405.8157
##     C  83.81926   94.27173  110.80002  100.78089  133.08216  167.5365
##     Q  53.12226   57.91658   65.94093   60.34848   64.20902  107.1689
##  neval cld
##    100   c
##    100  b 
##    100 a

Thus for this example the C code is about 8-10 times as fast as the R code. The QR decomposition that comes with R, based on Householder transformations, is again twice as fast.

Note: In a personal communication Bill Venables pointed out (01/18/16) that the above timing comparisons are somewhat unfavorable to our routines, because the standard qr routines in R still have to dig \(Q\) and \(R\) out of the qr structure. So an alternative, and perhaps more suitable comparison, is

mb<-microbenchmark(R = gs(x), C = gsrc(x), Q = {qrx <- qr(x); list(q = qr.Q(qrx), r = qr.R(qrx))}, times = 100L)
mb
## Unit: milliseconds
##  expr       min        lq      mean     median        uq       max neval
##     R 961.36460 1062.7542 1124.3137 1098.15344 1171.5015 1458.0584   100
##     C  87.04156   93.5807  106.1242   98.99528  109.9939  162.4064   100
##     Q 136.50614  153.7398  173.2713  163.18265  192.7677  246.1762   100
##  cld
##    c
##  a  
##   b

Now gsrc is faster than qr, which now includes the cost of the copies and assignments. So a completely fair comparison will be somewhere in between the two benchmark results.