Following along with the paper where \(b = (X^TX)^{-1}X^TY\) (beta values) and \(H = X(X^TX)^{-1}X^T\) (projection matrix), so \(\hat{y} = Hy\)

Lets take a small dataset where it’s easy to see what could happen. In this scenario we will look at how “hp” and “disp” are related to “mpg”. Think of “hp” and “disp” as something like movement and spiking where we want to remove those artifacts from our data. These variables are correlated as I’d imagine many artifacts are that you are trying to remove.

data("mtcars")
cor(mtcars$hp,mtcars$disp)
[1] 0.7909486
y<-mtcars$mpg

let’s get rid of the effect of “hp” on “mpg” first

x1<-cbind(rep(1,nrow(mtcars)),mtcars$hp)
b<-solve(t(x1)%*%x1)%*%t(x1)%*%y
h<-x1%*%solve(t(x1)%*%x1)%*%t(x1)
y.hat<-h%*%y
resids<-mtcars$mpg - y.hat

The residuals (in the context of fmri, our signal of interest with motion removed) should be uncorrelated with our nuisance variable “hp” and that is indeed the case.

round(cor(resids, mtcars$hp))
     [,1]
[1,]    0

OK, let’s move on to the next step of preprocessing. Let’s take our signal we care about (resids) and remove the effect of “disp”, this could be something like spiking.

x2<-cbind(rep(1,nrow(mtcars)),mtcars$disp)
b.2<-solve(t(x2)%*%x2)%*%t(x2)%*%resids
h.2<-x2%*%solve(t(x2)%*%x2)%*%t(x2)
y.hat.2<-h.2%*%resids
resids.2<-resids - y.hat.2

In theory, “resids.2” would be our data with the effects of motion (hp) and spiking (disp) removed, so let’s check.

cor(resids.2,mtcars$disp)
             [,1]
[1,] 3.300913e-16

Disp and resids.2 are uncorrelated which is good. What about resids.2 and hp?. Nope, because hp and disp are correlated (and as a result their projection matrices) we have actually reintroduced some of the covariance by projecting this data back into a space that shares some overlap with the hp (i.e, the motion regressor)

cor(resids.2,mtcars$hp)
          [,1]
[1,] 0.3155504

However, if you get rid of modularization and do everything in one step, you’re good to go.

x3<-cbind(rep(1,nrow(mtcars)),mtcars$hp,mtcars$disp)
b.3<-solve(t(x3)%*%x3)%*%t(x3)%*%y
h.3<-x3%*%solve(t(x3)%*%x3)%*%t(x3)
y.hat.3<-h.3%*%y
resids.3<-y - y.hat.3

Correlation between resids and nuisance regressors is 0.

round(cor(data.frame(resids.3,mtcars$hp,mtcars$disp)),2)
            resids.3 mtcars.hp mtcars.disp
resids.3           1      0.00        0.00
mtcars.hp          0      1.00        0.79
mtcars.disp        0      0.79        1.00
LS0tDQp0aXRsZTogIk1vZHVsYXJpdHkiDQpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sNCi0tLQ0KPHN0eWxlIHR5cGU9InRleHQvY3NzIj4NCg0KaDEudGl0bGUgew0KICBmb250LXNpemU6IDM4cHg7DQogIGNvbG9yOiBCbGFjazsNCiAgZm9udC1mYWNlOiBCb2xkOw0KfQ0KDQpib2R5LCB0ZCB7DQogICBmb250LXNpemU6IDE4cHg7DQp9DQpjb2RlLnJ7DQogIGZvbnQtc2l6ZTogMTZweDsNCn0NCnByZSB7DQogIGZvbnQtc2l6ZTogMTRweA0KfQ0KPC9zdHlsZT4NCjxicj4NCg0KRm9sbG93aW5nIGFsb25nIHdpdGggdGhlIHBhcGVyIHdoZXJlICRiID0gKFheVFgpXnstMX1YXlRZJCAoYmV0YSB2YWx1ZXMpIGFuZCAkSCA9IFgoWF5UWCleey0xfVheVCQgKHByb2plY3Rpb24gbWF0cml4KSwgc28gJFxoYXR7eX0gPSBIeSQNCg0KTGV0cyB0YWtlIGEgc21hbGwgZGF0YXNldCB3aGVyZSBpdCdzIGVhc3kgdG8gc2VlIHdoYXQgY291bGQgaGFwcGVuLiAgSW4gdGhpcyBzY2VuYXJpbyB3ZSB3aWxsIGxvb2sgYXQgaG93ICJocCIgYW5kICJkaXNwIiBhcmUgcmVsYXRlZCB0byAibXBnIi4gIFRoaW5rIG9mICJocCIgYW5kICJkaXNwIiBhcyBzb21ldGhpbmcgbGlrZSBtb3ZlbWVudCBhbmQgc3Bpa2luZyB3aGVyZSB3ZSB3YW50IHRvIHJlbW92ZSB0aG9zZSBhcnRpZmFjdHMgZnJvbSBvdXIgZGF0YS4gIFRoZXNlIHZhcmlhYmxlcyBhcmUgY29ycmVsYXRlZCBhcyBJJ2QgaW1hZ2luZSBtYW55IGFydGlmYWN0cyBhcmUgdGhhdCB5b3UgYXJlIHRyeWluZyB0byByZW1vdmUuDQoNCmBgYHtyfQ0KZGF0YSgibXRjYXJzIikNCg0KY29yKG10Y2FycyRocCxtdGNhcnMkZGlzcCkNCg0KeTwtbXRjYXJzJG1wZw0KYGBgDQoNCmxldCdzIGdldCByaWQgb2YgdGhlIGVmZmVjdCBvZiAiaHAiIG9uICJtcGciIGZpcnN0DQoNCmBgYHtyfQ0KDQp4MTwtY2JpbmQocmVwKDEsbnJvdyhtdGNhcnMpKSxtdGNhcnMkaHApDQoNCmI8LXNvbHZlKHQoeDEpJSoleDEpJSoldCh4MSklKiV5DQoNCmg8LXgxJSolc29sdmUodCh4MSklKiV4MSklKiV0KHgxKQ0KDQp5LmhhdDwtaCUqJXkNCg0KcmVzaWRzPC1tdGNhcnMkbXBnIC0geS5oYXQNCg0KYGBgDQoNClRoZSByZXNpZHVhbHMgKGluIHRoZSBjb250ZXh0IG9mIGZtcmksIG91ciBzaWduYWwgb2YgaW50ZXJlc3Qgd2l0aCBtb3Rpb24gcmVtb3ZlZCkgc2hvdWxkIGJlIHVuY29ycmVsYXRlZCB3aXRoIG91ciBudWlzYW5jZSB2YXJpYWJsZSAiaHAiIGFuZCB0aGF0IGlzIGluZGVlZCB0aGUgY2FzZS4NCg0KYGBge3J9DQpyb3VuZChjb3IocmVzaWRzLCBtdGNhcnMkaHApKQ0KDQoNCmBgYA0KDQpPSywgbGV0J3MgbW92ZSBvbiB0byB0aGUgbmV4dCBzdGVwIG9mIHByZXByb2Nlc3NpbmcuICBMZXQncyB0YWtlIG91ciBzaWduYWwgd2UgY2FyZSBhYm91dCAocmVzaWRzKSBhbmQgcmVtb3ZlIHRoZSBlZmZlY3Qgb2YgImRpc3AiLCB0aGlzIGNvdWxkIGJlIHNvbWV0aGluZyBsaWtlIHNwaWtpbmcuDQoNCmBgYHtyfQ0KDQp4MjwtY2JpbmQocmVwKDEsbnJvdyhtdGNhcnMpKSxtdGNhcnMkZGlzcCkNCg0KYi4yPC1zb2x2ZSh0KHgyKSUqJXgyKSUqJXQoeDIpJSolcmVzaWRzDQoNCmguMjwteDIlKiVzb2x2ZSh0KHgyKSUqJXgyKSUqJXQoeDIpDQoNCnkuaGF0LjI8LWguMiUqJXJlc2lkcw0KDQpyZXNpZHMuMjwtcmVzaWRzIC0geS5oYXQuMg0KDQpgYGANCg0KDQpJbiB0aGVvcnksICJyZXNpZHMuMiIgd291bGQgYmUgb3VyIGRhdGEgd2l0aCB0aGUgZWZmZWN0cyBvZiBtb3Rpb24gKGhwKSBhbmQgc3Bpa2luZyAoZGlzcCkgcmVtb3ZlZCwgc28gbGV0J3MgY2hlY2suIA0KDQpgYGB7cn0NCg0KY29yKHJlc2lkcy4yLG10Y2FycyRkaXNwKQ0KDQpgYGANCg0KRGlzcCBhbmQgcmVzaWRzLjIgYXJlIHVuY29ycmVsYXRlZCB3aGljaCBpcyBnb29kLiBXaGF0IGFib3V0IHJlc2lkcy4yIGFuZCBocD8uICBOb3BlLCBiZWNhdXNlIGhwIGFuZCBkaXNwIGFyZSBjb3JyZWxhdGVkIChhbmQgYXMgYSByZXN1bHQgdGhlaXIgcHJvamVjdGlvbiBtYXRyaWNlcykgd2UgaGF2ZSBhY3R1YWxseSByZWludHJvZHVjZWQgc29tZSBvZiB0aGUgY292YXJpYW5jZSBieSBwcm9qZWN0aW5nIHRoaXMgZGF0YSBiYWNrIGludG8gYSBzcGFjZSB0aGF0IHNoYXJlcyBzb21lIG92ZXJsYXAgd2l0aCB0aGUgaHAgKGkuZSwgdGhlIG1vdGlvbiByZWdyZXNzb3IpDQoNCmBgYHtyfQ0KDQpjb3IocmVzaWRzLjIsbXRjYXJzJGhwKQ0KDQoNCmBgYA0KDQpIb3dldmVyLCBpZiB5b3UgZ2V0IHJpZCBvZiBtb2R1bGFyaXphdGlvbiBhbmQgZG8gZXZlcnl0aGluZyBpbiBvbmUgc3RlcCwgeW91J3JlIGdvb2QgdG8gZ28uDQoNCmBgYHtyfQ0KDQp4MzwtY2JpbmQocmVwKDEsbnJvdyhtdGNhcnMpKSxtdGNhcnMkaHAsbXRjYXJzJGRpc3ApDQoNCmIuMzwtc29sdmUodCh4MyklKiV4MyklKiV0KHgzKSUqJXkNCg0KaC4zPC14MyUqJXNvbHZlKHQoeDMpJSoleDMpJSoldCh4MykNCg0KeS5oYXQuMzwtaC4zJSoleQ0KDQpyZXNpZHMuMzwteSAtIHkuaGF0LjMNCg0KDQpgYGANCg0KQ29ycmVsYXRpb24gYmV0d2VlbiByZXNpZHMgYW5kIG51aXNhbmNlIHJlZ3Jlc3NvcnMgaXMgMC4NCg0KYGBge3J9DQpyb3VuZChjb3IoZGF0YS5mcmFtZShyZXNpZHMuMyxtdGNhcnMkaHAsbXRjYXJzJGRpc3ApKSwyKQ0KYGBgDQoNCg==