Segmentation Analysis

I used segmentation analysis to try and quantify the changes in slope at different points. If there is a particular point in time where the trajectory of the network measurement changes, and then a quantifiable slope change, then we have a quantitative description of the plots that Cathy already has.

I followed much of the same data wrangling process as before. Except I kept everyone who had at least one CDR diagnosis and one scan (rather than only keeping individuals who had two). I wound up with 419 individuals with scans. Of these 419, 70 of them converted from cdr = 0 to cdr > 0 at some point.

For the converters, I calculated the number of years before or after their first cdr > 0 diagnosis.

Then I followed Cathy’s process of assigning an abritrary clinical assessment date for the non converters.

I dropped the first visit date for each individual, then randomly selected one of their subsequent visits.

library(plyr)
nonconverters.clin.select<-nonconverters.clin[duplicated(nonconverters.clin$ID),] #dropping first visit date for each individual
nonconverters.clin.select<-ddply(nonconverters.clin.select,.(ID),function(x) x[with(x, sample(seq_along(date1),1)),])

Then I had 419 visits with associated cdr scores. 70 of these people converted at some point in their relationship with the ADRC, 349 did not. For each of these individuals, I found the nearest scan date to the cdr-diagnosis date that I retained previously.

 library(data.table)
 setDT(nonconverters.clin.select)
 setDT(df.sub)
 
 setkey(df.sub, ID, date2)[, date1 :=date2]
 nonconverters.result<-df.sub[nonconverters.clin.select, roll = 'nearest']

 df.datecalc.non<-nonconverters.result
 df.datecalc.non$DaysBetween<-difftime(df.datecalc.non$date2, df.datecalc.non$date1, units = "weeks")
 df.datecalc.non$DaysBetween<-df.datecalc.non$DaysBetween/52
 df.datecalc.non$WillConvert<-0   
 
 df.plots<-rbind(df.datecalc, df.datecalc.non)
 df.plots$WillConvert<-as.factor(df.plots$WillConvert) #1 means yes, 0 means no

df.plots<-df.plots[,-(2)] 
df.plots<-df.plots[,-(93:94)]

I generated a pdf of plots a lot like what Cathy provided, except I included the points on my graphs, and I used linear regression rather than nonlinear. When I look at the plots, I still see some differences, but including the points for visualization makes me think the trends are not nearly as strong as they appeared in the dot-less plots.

 plot_data_column <- function (DaysBetween, Network, WillConverts){
   ggplot(df.plots, aes_string(x = DaysBetween, y = Network, shape = WillConverts, color = WillConverts)) + geom_point()+
   geom_smooth(method=lm, aes(fill = WillConvert)) + labs(x = "EYO") + theme_classic()+ geom_vline(xintercept=0)+theme(legend.position = "none")}

 plist<-list()
for(i in 1:(length(df.plots)-2)){
  plist[[i]]<-plot_data_column(df.plots$DaysBetween, names(df.plots)[i], df.plots$WillConvert)
}
#write all of my scatter plots for each region to a pdf
#commenting out for RMarkdown
#pdf("C:/Users/wischj/Documents/PracticeData/Cathy/Images.pdf")
#invisible(lapply(plist, print))
#dev.off()

Then I thought I would compare my plots to Cathy’s to make sure they at least looked similar….a lot of them look totally different. Granted, we did our random selection independently….but…if that’s all it took to make the plots turn out totally different, there may be nothing well defined in the data?

plot_data_column_nonlinear <- function (DaysBetween, Network, WillConverts){
  ggplot(df.plots, aes_string(x = DaysBetween, y = Network, shape = WillConverts, color = WillConverts)) + geom_point()+
    stat_smooth(method=lm, formula = y ~ x + I(x^2), aes(fill = WillConvert)) + labs(x = "EYO") + theme_classic()+ geom_vline(xintercept=0)+theme(legend.position = "none")}

plist<-list()
for(i in 1:(length(df.plots)-2)){
  plist[[i]]<-plot_data_column_nonlinear(df.plots$DaysBetween, names(df.plots)[i], df.plots$WillConvert)
}

#pdf("C:/Users/wischj/Documents/PracticeData/Cathy/Images_nonlinear.pdf")
#invisible(lapply(plist, print))
#dev.off()

In order to look at the data, particularly to consider if there is a point in time where the trajectory of functional connectivity changes, I used segmented regression (Muggeo, 2008). After experimenting some, I decided to just use a single breakpoint, since that was kind of the idea presented with the original pictures. I used the Davies test (Davies, 1987) to determine if there was a significant turning point.

6 regions showed significant breakpoints at p < 0.01. 1 was for the nonconverters (VIS x VIS), and the other 5 were for the converters (SMlat x SMlat, CO x CO, CO x SubCort, CO x DAN, DAN x CEREB). I plotted each of these regions, both comparing the nonconverter and converter linear regressions, and then plotting the results of the segmentation analysis. The breakpoints are all way after EYO, so they’re not actually indicative of anything interesting happening.

## 
##  ***Regression Model with Segmented Relationship(s)***
## 
## Call: 
## segmented.glm(obj = fit.glm, seg.Z = ~DaysBetween, psi = 0)
## 
## Estimated Break-Point(s):
##    Est. St.Err 
##  6.575  0.131 
## 
## Meaningful coefficients of the linear terms:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)    -0.039447   0.053267  -0.741    0.459
## DaysBetween    -0.003468   0.022570  -0.154    0.878
## U1.DaysBetween -8.859186   9.028030  -0.981       NA
## (Dispersion parameter for gaussian family taken to be 0.9304132)
## 
## Null     deviance: 323.62  on 348  degrees of freedom
## Residual deviance: 320.99  on 345  degrees of freedom
## AIC: 971.22
## 
## Convergence attained in 2 iterations with relative change 0

## 
##  ***Regression Model with Segmented Relationship(s)***
## 
## Call: 
## segmented.glm(obj = fit.glm, seg.Z = ~DaysBetween, psi = 0)
## 
## Estimated Break-Point(s):
##    Est. St.Err 
##  4.598  5.193 
## 
## Meaningful coefficients of the linear terms:
##                Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    -0.43799    0.14106  -3.105  0.00280 **
## DaysBetween    -0.14033    0.04899  -2.865  0.00559 **
## U1.DaysBetween  0.26311    0.39611   0.664       NA   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Dispersion parameter for gaussian family taken to be 1.040555)
## 
## Null     deviance: 77.931  on 69  degrees of freedom
## Residual deviance: 68.677  on 66  degrees of freedom
## AIC: 207.32
## 
## Convergence attained in 3 iterations with relative change 0

## 
##  ***Regression Model with Segmented Relationship(s)***
## 
## Call: 
## segmented.glm(obj = fit.glm, seg.Z = ~DaysBetween, psi = 0)
## 
## Estimated Break-Point(s):
##    Est. St.Err 
## -2.629  4.330 
## 
## Meaningful coefficients of the linear terms:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)     0.20674    0.72151   0.287    0.775
## DaysBetween     0.05751    0.15361   0.374    0.709
## U1.DaysBetween -0.10302    0.16623  -0.620       NA
## (Dispersion parameter for gaussian family taken to be 1.262888)
## 
## Null     deviance: 84.190  on 69  degrees of freedom
## Residual deviance: 83.351  on 66  degrees of freedom
## AIC: 220.87
## 
## Convergence attained in 3 iterations with relative change 0

## 
##  ***Regression Model with Segmented Relationship(s)***
## 
## Call: 
## segmented.glm(obj = fit.glm, seg.Z = ~DaysBetween, psi = 0)
## 
## Estimated Break-Point(s):
##    Est. St.Err 
##  7.839  1.415 
## 
## Meaningful coefficients of the linear terms:
##                Estimate Std. Error t value Pr(>|t|)  
## (Intercept)    -0.30112    0.12080  -2.493   0.0152 *
## DaysBetween    -0.05907    0.04109  -1.438   0.1553  
## U1.DaysBetween  0.73127    0.92794   0.788       NA  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Dispersion parameter for gaussian family taken to be 0.8140086)
## 
## Null     deviance: 55.835  on 69  degrees of freedom
## Residual deviance: 53.725  on 66  degrees of freedom
## AIC: 190.13
## 
## Convergence attained in 2 iterations with relative change 1.322542e-16

## 
##  ***Regression Model with Segmented Relationship(s)***
## 
## Call: 
## segmented.glm(obj = fit.glm, seg.Z = ~DaysBetween, psi = 0)
## 
## Estimated Break-Point(s):
##    Est. St.Err 
## -4.045  3.211 
## 
## Meaningful coefficients of the linear terms:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)     -1.2314     2.2655  -0.544    0.589
## DaysBetween     -0.1877     0.3625  -0.518    0.606
## U1.DaysBetween   0.2702     0.3653   0.740       NA
## (Dispersion parameter for gaussian family taken to be 0.8705906)
## 
## Null     deviance: 60.607  on 69  degrees of freedom
## Residual deviance: 57.459  on 66  degrees of freedom
## AIC: 194.83
## 
## Convergence attained in 2 iterations with relative change 0

## 
##  ***Regression Model with Segmented Relationship(s)***
## 
## Call: 
## segmented.glm(obj = fit.glm, seg.Z = ~DaysBetween, psi = 0)
## 
## Estimated Break-Point(s):
##    Est. St.Err 
## -0.014  5.252 
## 
## Meaningful coefficients of the linear terms:
##                Estimate Std. Error t value Pr(>|t|)  
## (Intercept)    -0.46620    0.24819  -1.878   0.0647 .
## DaysBetween    -0.03213    0.07585  -0.424   0.6732  
## U1.DaysBetween -0.06789    0.11035  -0.615       NA  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Dispersion parameter for gaussian family taken to be 0.927121)
## 
## Null     deviance: 64.22  on 69  degrees of freedom
## Residual deviance: 61.19  on 66  degrees of freedom
## AIC: 199.24
## 
## Convergence attained in 2 iterations with relative change 0

Based on all of these results, although it is way less exciting, it seems like standard linear regression might be the most appropriate form of analysis.

COxCO is kind of interesting - the breakpoint is at -1.673 years.

Segmentation Analysis

Julie Wisch

September 12, 2018