In this project, we’re interested in determining whether specific serotonergic regulator genes interact with early life stress to predict hoarding symptoms (difficulty discarding, excessive acquiring, and clutter).

Summary of genotyping (rows: rs25531 genotype; columns: 5-HTTLPR genotype)

##     
##       SS  SL  LL
##   GG   0   0   1
##   AG   1  24  30
##   AA  67 147  86

Reading the table row by row as combined genotypes: 0 people are SgSg, 0 are SgLg, and 1 is LgLg; 1 person is SaSg, 24 are SaLg, and 30 are LaLg; 67 people are SaSa, 147 are SaLa, and 86 are LaLa.

Create effect-coded variables for 3 genotype groups with combined 5-HTTLPR and rs25531: S/S vs. La/La vs. all other genotypes
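The coding step itself is not shown, so the following is only a minimal sketch of how two effect-coded variables for three groups could be built. The column names `httlpr` and `rs25531` and the toy data are hypothetical. It also assumes that S/S means any 5-HTTLPR S/S carrier, that La/La means L/L with A/A at rs25531, and that "all other genotypes" is the reference group coded -1 on both variables.

```r
# Toy data; the column names `httlpr` and `rs25531` are hypothetical.
toy <- data.frame(
  httlpr  = c("SS", "SL", "LL", "LL", "SS", "LL"),
  rs25531 = c("AA", "AA", "AA", "AG", "AG", "GG"),
  stringsAsFactors = FALSE
)

# Three groups (assumed definitions): S/S, La/La (LL with AA), everything else.
toy$group <- ifelse(toy$httlpr == "SS", "SS",
             ifelse(toy$httlpr == "LL" & toy$rs25531 == "AA", "LaLa", "other"))

# Effect codes: the reference group ("other") is -1 on both variables,
# the target group is 1, and the remaining group is 0.
toy$SS   <- ifelse(toy$group == "SS",   1, ifelse(toy$group == "other", -1, 0))
toy$LaLa <- ifelse(toy$group == "LaLa", 1, ifelse(toy$group == "other", -1, 0))

toy[, c("httlpr", "rs25531", "group", "SS", "LaLa")]
```

With this coding, each genotype coefficient in a regression is that group's deviation from the unweighted mean of the three groups rather than a contrast against a single baseline group.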

Check for normality of distribution of predictor variables:

hist(genes$BDI_tot)

hist(genes$NLEQ_tot)

hist(genes$SC_tot)

skew(genes$BDI_tot)
##  skew (g1)         se          z          p 
##  2.1999871  0.1298227 16.9460900  0.0000000
skew(genes$NLEQ_tot)
## skew (g1)        se         z         p 
## 1.1583839 0.1298227 8.9228149 0.0000000
skew(genes$SC_tot)
##   skew (g1)          se           z           p 
## -0.02452497  0.21997067 -0.11149199  0.91122621
kurtosis(genes$BDI_tot)
## Excess Kur (g2)              se               z               p 
##       7.3880089       0.2596454      28.4542267       0.0000000
kurtosis(genes$NLEQ_tot)
## Excess Kur (g2)              se               z               p 
##    1.435857e+00    2.596454e-01    5.530070e+00    3.201037e-08
kurtosis(genes$SC_tot)
## Excess Kur (g2)              se               z               p 
##      -0.2600210       0.4399413      -0.5910356       0.5544965
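As a reading aid for the output above, the reported z and p columns appear to follow the usual pattern of estimate divided by standard error, with a two-sided normal p-value. The sketch below recomputes them from the printed values only; it is an assumption about how the columns relate, not a description of the `skew()` function's internals.

```r
# Values copied from the printed skew() output.
bdi_skew <- 2.1999871;   bdi_se <- 0.1298227
sc_skew  <- -0.02452497; sc_se  <- 0.21997067

# z = estimate / standard error
bdi_z <- bdi_skew / bdi_se
sc_z  <- sc_skew / sc_se

# Two-sided p-value from the standard normal distribution
bdi_p <- 2 * pnorm(-abs(bdi_z))
sc_p  <- 2 * pnorm(-abs(sc_z))

round(c(bdi_z = bdi_z, sc_z = sc_z, sc_p = sc_p), 4)
```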

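BDI and NLEQ are both positively skewed (skew z = 16.9 and 8.9, respectively; both p < .001), whereas SC shows no significant skew or excess kurtosis. A square root transformation is used to reduce the skew in BDI and NLEQ.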
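The transformation code is not shown. The centering step further down refers to `bdi_sq` and `nleq_sq`, so those variables were presumably created along the lines of the sketch below. The toy scores and the helper `g1()` are illustrative assumptions, not study data or the skewness function used above.

```r
# Toy right-skewed scores standing in for BDI_tot (not study data).
toy <- data.frame(BDI_tot = c(0, 1, 1, 2, 3, 4, 9, 16, 25, NA))

# Square root is defined at 0, so non-negative totals need no offset.
# NLEQ_tot would be handled the same way (e.g. nleq_sq <- sqrt(NLEQ_tot)).
toy$bdi_sq <- sqrt(toy$BDI_tot)

# Hypothetical helper: moment-based sample skewness (g1), NAs dropped.
g1 <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Skewness before and after the transformation
c(raw = g1(toy$BDI_tot), sqrt = g1(toy$bdi_sq))
```

The transformed variable should be re-checked with the same histogram and skewness tests, since a square root reduces positive skew but does not guarantee normality.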

Center predictor variables

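Mean-centering reduces the collinearity between each predictor and the interaction (product) terms it enters, and it makes each lower-order coefficient interpretable at the mean of the centered predictor it interacts with.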

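# Mean-center each predictor; missing values are ignored when computing the mean
genes$bdi_c <- genes$BDI_tot - mean(genes$BDI_tot, na.rm = TRUE)
genes$nleq_c <- genes$NLEQ_tot - mean(genes$NLEQ_tot, na.rm = TRUE)
# Centered versions of the square-root-transformed scores
genes$nleq_sq_c <- genes$nleq_sq - mean(genes$nleq_sq, na.rm = TRUE)
genes$bdi_sq_c <- genes$bdi_sq - mean(genes$bdi_sq, na.rm = TRUE)
genes$sc_tot_c <- genes$SC_tot - mean(genes$SC_tot, na.rm = TRUE)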
hist(genes$sc_tot_c)
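To illustrate why centering matters for interaction terms, here is a small simulation with made-up data (none of it comes from the study). It compares how strongly a group code correlates with its product term before and after the continuous predictor is centered.

```r
set.seed(1)

# Simulated stand-ins: a continuous predictor with a nonzero mean
# and a three-level effect code.
x <- rnorm(200, mean = 10, sd = 2)
g <- sample(c(-1, 0, 1), 200, replace = TRUE)

# Center the continuous predictor.
x_c <- x - mean(x)

# Correlation between the code and its product term, raw vs. centered.
r_raw      <- cor(g, g * x)
r_centered <- cor(g, g * x_c)

round(c(raw = r_raw, centered = r_centered), 3)
```

When the predictor has a mean far from zero, the product term is nearly a rescaled copy of the group code, which inflates standard errors for the main effect. Centering removes that overlap without changing the interaction coefficient itself.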

GxE interaction regression models

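Model 1: Gene by NLEQ interaction effect on SIR total score and on each SIR subscale (discarding, clutter, acquisition), controlling for SC and BDI: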

hoard <- lm(SIR_tot ~ SS*nleq_c + LaLa*nleq_c + sc_tot_c + bdi_c, data=genes)
summary(hoard)
## 
## Call:
## lm(formula = SIR_tot ~ SS * nleq_c + LaLa * nleq_c + sc_tot_c + 
##     bdi_c, data = genes)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -20.039  -4.965  -0.559   4.849  52.196 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 19.23832    1.00432  19.156  < 2e-16 ***
## SS           0.16084    1.58230   0.102  0.91921    
## nleq_c       0.09327    0.05053   1.846  0.06752 .  
## LaLa         0.23121    1.42248   0.163  0.87117    
## sc_tot_c    -0.44037    0.10153  -4.337 3.13e-05 ***
## bdi_c        0.47292    0.17740   2.666  0.00879 ** 
## SS:nleq_c    0.04899    0.07356   0.666  0.50676    
## nleq_c:LaLa -0.11682    0.06846  -1.706  0.09066 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.304 on 114 degrees of freedom
##   (237 observations deleted due to missingness)
## Multiple R-squared:  0.4439, Adjusted R-squared:  0.4097 
## F-statistic:    13 on 7 and 114 DF,  p-value: 3.299e-12
discard <- lm(SIR_discarding ~ SS*nleq_c + LaLa*nleq_c + sc_tot_c + bdi_c, data=genes)
summary(discard)
## 
## Call:
## lm(formula = SIR_discarding ~ SS * nleq_c + LaLa * nleq_c + sc_tot_c + 
##     bdi_c, data = genes)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.3578 -2.3676 -0.1154  1.9947 16.9382 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.462325   0.446540  14.472  < 2e-16 ***
## SS          -0.506879   0.703523  -0.720  0.47270    
## nleq_c       0.005195   0.022468   0.231  0.81757    
## LaLa         0.774295   0.632464   1.224  0.22338    
## sc_tot_c    -0.154704   0.045144  -3.427  0.00085 ***
## bdi_c        0.160986   0.078874   2.041  0.04356 *  
## SS:nleq_c    0.018683   0.032707   0.571  0.56897    
## nleq_c:LaLa -0.065380   0.030439  -2.148  0.03384 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.137 on 114 degrees of freedom
##   (237 observations deleted due to missingness)
## Multiple R-squared:  0.3183, Adjusted R-squared:  0.2765 
## F-statistic: 7.605 on 7 and 114 DF,  p-value: 1.63e-07
clutter <- lm(SIR_clutter ~ SS*nleq_c + LaLa*nleq_c + sc_tot_c + bdi_c, data=genes)
summary(clutter)
## 
## Call:
## lm(formula = SIR_clutter ~ SS * nleq_c + LaLa * nleq_c + sc_tot_c + 
##     bdi_c, data = genes)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.1518 -2.5395 -0.3637  1.3725 14.8876 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.746531   0.408578  11.617   <2e-16 ***
## SS           0.571863   0.643715   0.888   0.3762    
## nleq_c       0.040916   0.020558   1.990   0.0490 *  
## LaLa        -0.215545   0.578696  -0.372   0.7102    
## sc_tot_c    -0.120361   0.041306  -2.914   0.0043 ** 
## bdi_c        0.110481   0.072169   1.531   0.1286    
## SS:nleq_c   -0.004991   0.029926  -0.167   0.8678    
## nleq_c:LaLa -0.012877   0.027851  -0.462   0.6447    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.785 on 114 degrees of freedom
##   (237 observations deleted due to missingness)
## Multiple R-squared:  0.2948, Adjusted R-squared:  0.2515 
## F-statistic: 6.809 on 7 and 114 DF,  p-value: 9.387e-07
acquire <- lm(SIR_acquisitioning ~ SS*nleq_c + LaLa*nleq_c + sc_tot_c + bdi_c, data=genes)
summary(acquire)
## 
## Call:
## lm(formula = SIR_acquisitioning ~ SS * nleq_c + LaLa * nleq_c + 
##     sc_tot_c + bdi_c, data = genes)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.1145 -1.9898 -0.0336  1.6548 18.9346 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.56362    0.38340  19.728  < 2e-16 ***
## SS           0.16251    0.60405   0.269 0.788395    
## nleq_c       0.04044    0.01929   2.096 0.038258 *  
## LaLa        -0.34887    0.54304  -0.642 0.521881    
## sc_tot_c    -0.14609    0.03876  -3.769 0.000261 ***
## bdi_c        0.18889    0.06772   2.789 0.006194 ** 
## SS:nleq_c    0.04086    0.02808   1.455 0.148373    
## nleq_c:LaLa -0.04584    0.02614  -1.754 0.082135 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.552 on 114 degrees of freedom
##   (237 observations deleted due to missingness)
## Multiple R-squared:  0.3978, Adjusted R-squared:  0.3608 
## F-statistic: 10.76 on 7 and 114 DF,  p-value: 2.369e-10

Plot Results of Gene*Environment Interactions from Regression Models

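There appears to be an interaction between genotype and early life stress (NLEQ) in predicting difficulty discarding (nleq_c:LaLa, b = -0.065, p = .034): the association between stress and discarding symptoms is weaker in the non-risk group (LaLa), suggesting that this group is less susceptible to the effects of stress on later symptom development. The same interaction term is negative but does not reach significance at the .05 level for SIR total (p = .091) or acquisition (p = .082), and it is not significant for clutter (p = .645).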
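The plotting code is not shown, so the following is a sketch of one way to draw the discarding interaction directly from the reported coefficients. It assumes the effect coding uses "all other genotypes" as the reference group (-1 on both SS and LaLa) and evaluates predictions at the mean of the centered covariates. The plotted NLEQ range is illustrative, not taken from the data.

```r
# Coefficients copied from the SIR_discarding model output.
b <- c(int = 6.462325, SS = -0.506879, nleq = 0.005195, LaLa = 0.774295,
       SS_nleq = 0.018683, LaLa_nleq = -0.065380)

# Effect codes per group (assumed: "other" is the -1/-1 reference group).
codes <- rbind(SS    = c(SS =  1, LaLa =  0),
               LaLa  = c(SS =  0, LaLa =  1),
               other = c(SS = -1, LaLa = -1))

# Group-specific intercepts and NLEQ slopes (covariates held at their means).
intercepts <- b[["int"]] + codes[, "SS"] * b[["SS"]] +
  codes[, "LaLa"] * b[["LaLa"]]
slopes <- b[["nleq"]] + codes[, "SS"] * b[["SS_nleq"]] +
  codes[, "LaLa"] * b[["LaLa_nleq"]]

# Illustrative range of centered NLEQ (an assumption, not the observed range).
nleq_range <- c(-20, 40)

plot(NA, xlim = nleq_range, ylim = c(0, 12),
     xlab = "NLEQ (centered)", ylab = "Predicted SIR discarding")
for (i in seq_along(slopes)) {
  abline(a = intercepts[i], b = slopes[i], lty = i)
}
legend("topleft", legend = names(slopes), lty = seq_along(slopes))

round(slopes, 3)
```

Under these assumptions the simple slopes are about 0.024 for S/S, -0.060 for La/La, and 0.052 for the remaining genotypes, which matches the flatter (slightly negative) line for the La/La group. These are point estimates only; testing whether each simple slope differs from zero would require the coefficient covariance matrix from the fitted model.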