Introduction

This is my first attempt to put everything together and get some desired results. Significant effort has been devoted to salvaging trials with missing data.

The plan:

  1. Find each trial based on the movement of the target. We will look at 200 samples before the target moves and 600 samples after. Each sample is approximately 3 ms.

  2. For each trial, measure the eye and head movements. Identify gaze shifts and associated head movements.

  3. Summarize the data into plots for statistics

Analysis

library(dplyr)
library(ggplot2)
library(plotly)
library(knitr)
source('knighthelperfunctions.R')
samplerate<- 304.7508/1000
path<- "F:/dropbox/kdata/"
# path<-"C:/Users/setup/Desktop/NRTP Vergence/kdata/"
h<- loadnewheadfree(NULL,path)

# h<- read.csv('HEADFREE.txt',sep="\t",header=FALSE)
h<- select(h,-V9)
names(h)<- c('G','GV','H','HV','E','EV','time','Targ','block','subject','blocknum')

Once loaded, the raw data we will work with contains the horizontal and vertical gaze (G and GV), head and eye positions, the time in seconds, the horizontal target position (Targ), the blockname, which consists of subject-blocknum.

For the purposes of our initial analysis, we will just look at the horizontal positions of the head and gaze.

head(h)
##          G        GV       H       HV      E      EV     time Targ block
## 1: -38.388  -3.65370 -42.268   0.9960 3.8805 -4.6497 0.000010    0  A-01
## 2: -38.401  -2.59900 -42.262   2.8295 3.8608 -5.4285 0.003285    0  A-01
## 3: -38.426  -5.65400 -42.253   2.4827 3.8272 -8.1367 0.006566    0  A-01
## 4: -38.405  -0.40436 -42.245   2.8302 3.8396 -3.2345 0.009848    0  A-01
## 5: -37.361 647.11000 -41.175 649.2000 3.8135 -2.0913 0.013129    0  A-01
## 6: -35.256 641.43000 -39.045 649.2000 3.7887 -7.7772 0.016410    0  A-01
##    subject blocknum
## 1:       A        1
## 2:       A        1
## 3:       A        1
## 4:       A        1
## 5:       A        1
## 6:       A        1
h<- select(h,G,H,Targ,block)

head(h)
##          G       H Targ block
## 1: -38.388 -42.268    0  A-01
## 2: -38.401 -42.262    0  A-01
## 3: -38.426 -42.253    0  A-01
## 4: -38.405 -42.245    0  A-01
## 5: -37.361 -41.175    0  A-01
## 6: -35.256 -39.045    0  A-01

Next we will identify and mark the trials based on the target movement. We will give each trial a number (trialnum), and add a variable called counter that marks time since the trial began.

raw.h<- h
h%>%
  group_by(block) %>%
  mutate(time=row_number(),
         Graw=G,
         G=replace(smooth(G,"3R"),G==0,NA), #mark missing data as NA rather than 0
         Gnospline=G,
         G= applyspline(G,6),
         target.velocity=parabolicdiff(Targ,7)*samplerate,
         Gv=parabolicdiff(G,7)*samplerate, #calculate velocity
         Hv=parabolicdiff(H,7)*samplerate,
         gazeshifts=markMovementsDouble(Gv,threshold1=100,threshold2=10),
         headmovement=markMovementsDouble(Hv,threshold1=10,threshold2=4)) %>%
  do(markTagetMovements(t=.,buffer=200,threshold=200,trial.length=500)) ->
  h

head(h)
## # A tibble: 6 x 13
## # Groups:   block [1]
##       G     H  Targ block  time  Graw Gnospline    Gv    Hv gazeshifts
##   <dbl> <dbl> <dbl> <chr> <int> <dbl>     <dbl> <dbl> <dbl>      <int>
## 1 -38.4 -42.3     0 A-01      1 -38.4     -38.4   468   471          1
## 2 -38.4 -42.3     0 A-01      2 -38.4     -38.4   468   471          1
## 3 -38.4 -42.3     0 A-01      3 -38.4     -38.4   468   471          1
## 4 -38.4 -42.2     0 A-01      4 -38.4     -38.4   468   471          1
## 5 -37.4 -41.2     0 A-01      5 -37.4     -37.4   468   471          1
## 6 -35.3 -39.0     0 A-01      6 -35.3     -35.3   468   471          1
## # ... with 3 more variables: headmovement <int>, trialnum <dbl>,
## #   counter <int>

Notice that time now is in samples rather than seconds and the trialnum and counter are NA. This is because there is no trial at the beginning of the data set. Let’s get rid of all of those periods:

h<- filter(h,!is.na(trialnum))

head(h)
## # A tibble: 6 x 13
## # Groups:   block [1]
##       G     H  Targ block  time  Graw Gnospline     Gv      Hv gazeshifts
##   <dbl> <dbl> <dbl> <chr> <int> <dbl>     <dbl>  <dbl>   <dbl>      <int>
## 1 0.505 -1.71     0 A-01  13096 0.505     0.505 0.0120 -0.0256         NA
## 2 0.505 -1.71     0 A-01  13097 0.502     0.505 0.0440 -0.112          NA
## 3 0.508 -1.71     0 A-01  13098 0.509     0.508 0.0745 -0.198          NA
## 4 0.508 -1.71     0 A-01  13099 0.508     0.508 0.122  -0.261          NA
## 5 0.503 -1.71     0 A-01  13100 0.498     0.503 0.144  -0.309          NA
## 6 0.503 -1.71     0 A-01  13101 0.503     0.503 0.112  -0.336          NA
## # ... with 3 more variables: headmovement <int>, trialnum <dbl>,
## #   counter <int>

Now we can see that trial 1 begins at time 13096. Let’s look at the raw data for this trial:

h %>%
  filter(block=='A-01',trialnum==1) %>%
  ggplot()+
  geom_point(aes(counter,Graw),color='darkgreen')+
  geom_line(aes(counter,Targ),color='gray50',size=2)+
  geom_line(aes(counter,H),color='darkblue',size=2)

Clearly there is missing data here, which is recorded in the data file as 0. We will discuss what to do about missing data in a later section.

Measuring the trial

We will calculate:

Saccades with peak velocities below 200 deg/s are considered corrective gaze shifts and are not included in the analysis.

Saccades with intersaccadic intervals > 150 ms are not included

If there are two head movements in response to a single target movement, we ignore the second – FOR NOW.

Head movement is only considered part of the gaze shift if it starts within 100 ms of the initial gaze shift

Head movement that starts after the end of the primary gaze shift is ignored

Trials with non-zero gaze velocity within 100 ms of target movement are rejected

Trials where the primary gaze shift is “BAD” are rejected

As an example, let’s see how closely the primary gaze shift amplitude matches the target amplitude.

ggplotly(
  hm %>%
  ggplot()+
  geom_point(aes(target.amp,gaze.amp))+
  facet_wrap(~block)+
  xlab('Target Amplitude')+
  ylab('Primary Gaze Shift Amplitude')+
  geom_abline()+
    theme_minimal()
)

It seems to match very well for most trials, but there are clearly some outliers. I suspect that these are tiny gaze shifts that are not made in response to the target. Let’s look at the latency to see if this can be used to reject the outliers.

ggplotly(
  hm %>%
    ggplot()+
    geom_histogram(aes((gaze.onset-200)/samplerate))+
  facet_wrap(~block)+
    theme_minimal()+
    xlab('Gaze Shift Latency (ms)')

)

It seems that we have outliers that are both made too soon and too late. Let’s just plot trials where the gaze shift starts at least 150 ms after the target moves but not more than 500 ms after.

ggplotly(
  hm %>%
    mutate(gs.latency=(gaze.onset-200)/samplerate) %>%
    filter(gs.latency>150,gs.latency<500) %>%
  ggplot()+
  geom_point(aes(target.amp,gaze.amp))+
  facet_wrap(~block)+
  xlab('Target Amplitude')+
  ylab('Primary Gaze Shift Amplitude')+
  geom_abline()+
    theme_minimal()
)

Since we are only looking at the primary gaze shift, it’s likely that some of these outlier trials are actually multi-step. Let’s instead plot the total gaze amplitude.

ggplotly(
  hm %>%
    mutate(gs.latency=(gaze.onset-200)/samplerate) %>%
    filter(gs.latency>150) %>%
  ggplot()+
  geom_point(aes(target.amp,total.gaze.amp))+
  facet_wrap(~block)+
  xlab('Target Amplitude')+
  ylab('Total Gaze Shift Amplitude')+
  geom_abline()+
    theme_minimal()
)

We can go through other measurements like this and see if there are other refinements that we can make, or if there are some things being measured incorrectly. Missing data is still an issue though.

Missing data

Our choice:

  1. Drop all trials with missing data
  2. Ignore the missing data
  3. Interpolate the missing data

Let’s assess the severity of the missing data problem. I will define three levels of severity:

  1. Any data are missing for the duration of the trial
  2. Data are missing during the first 200 samples after the target has moved
  3. More than 10 data points are missing
hm%>%
  group_by(block) %>%
  summarize(n=n(),
            any.missing=sum(total.missing>0),
            missing.critical=sum(missing.critical>0), #within 100 samples of target move
            missing.gs=sum(missing.gs>0))->
  hss


head(hss)
## # A tibble: 2 x 5
##   block     n any.missing missing.critical missing.gs
##   <chr> <int>       <int>            <int>      <int>
## 1 A-01    118          70               19         18
## 2 A-02    125          93               22         42

From this, we can see that most of the trials are missing a few data points. More importantly though, less than 20% of the trials are missing data within 100 samples of the target’s movement. Nevertheless, we might be missing important data at other parts of the trial. In particular, this might be an issue during large gaze shifts.

hm %>%
  ggplot()+
  geom_point(aes(target.amp,total.missing))+
  facet_wrap(~block)

It seems that the most missing data comes from leftward trials from the second block. Let’s see if it’s related to head movement.

ggplotly(hm %>%
  ggplot()+
  geom_point(aes(head.amp,total.missing))+
  facet_wrap(~block) )

It appears that it is related to head movement. In order to avoid throwing out these trials, I implemented an interpolation. I will demonstrate how this works.

Interpolating Missing data

First lets look again at the first trial from the first block. The missing data points are represented by zeros in the data file. But there are also points surrounding the missing data that seem to be artifacts. I have drawn a pink box around one such point in the plot below.

h %>%
  filter(block=='A-01',trialnum==1) %>%
  ggplot()+
  geom_point(aes(counter,Graw),color='darkgreen')+
  geom_line(aes(counter,Targ),color='gray50',size=2)+
  geom_line(aes(counter,H),color='darkblue',size=2)+
  annotate('rect',xmin = 220,xmax=260,ymin=36,ymax=41,color='hotpink',alpha=0,size=2)

I want to interpolate the missing points, but these non-missing but wrong points will interfer with the interpolation. To address this, I replace all the zeros with NAs, and also remove a number of points on each side of the missing data. This is a balancing act because we don’t want to remove good data, but we want to remove enough of the bad data I used six points to make the plot below:

ggplotly(
  h %>%
  filter(block=='A-01',trialnum==1) %>%
  ggplot()+
  geom_point(aes(counter,Gnospline),color='darkgreen')+
  geom_line(aes(counter,Targ),color='gray50',size=2)+
  geom_line(aes(counter,H),color='darkblue',size=2)
)

now to interpolate, I use a cubic spline using the splinefun function in R. I just tried different methods until one seemed to work pretty well. The goal is to try to salvage some of the trials with missing data in the middle of a gaze shift. The example is one of the worst trials, so if it works on this it will be very useful.

ggplotly(
  h %>%
  filter(block=='A-01',trialnum==1) %>%
  ggplot()+
    geom_point(aes(counter,G),color='orange')+
    geom_point(aes(counter,Gnospline),color='darkgreen')+
    geom_line(aes(counter,Targ),color='gray50',size=2)+
    geom_line(aes(counter,H),color='darkblue',size=2)+
    ggtitle('Missing data interpolated')
)

It’s not perfect, but it does work very well considering how many data points are missing. Below, I will plot some of the trials with the most missind data points during the gaze shifts:

missingtable<- hm %>% select(block,trialnum,missing.gs) %>% arrange(desc(missing.gs))
kable(head(missingtable,20))
block trialnum missing.gs
A-02 100 185
A-02 71 146
A-02 24 116
A-02 19 101
A-02 74 98
A-02 78 93
A-02 59 72
A-01 21 71
A-02 85 62
A-02 101 62
A-02 26 48
A-02 20 46
A-02 121 41
A-01 99 39
A-02 123 37
A-02 108 33
A-02 125 32
A-01 3 31
A-02 10 30
A-02 45 30

Let’s look at some trials where there are around 30 missing data points:

That doesn’t seem to be a reliable threshold, so let’s see how many trials we will reject if we go with a lower threshold:

missingtable %>%
  filter(missing.gs>0) %>%
  ggplot()+
  geom_histogram(aes(missing.gs))+
  xlab('number of missing points during the gaze shift\n
       Trials with no missing data are omitted')

missingtable %>%
  group_by(block) %>%
  summarize(rejected30=sum(missing.gs>30),
            rejected20=sum(missing.gs>20)) %>%
  head() %>%
  kable()
block rejected30 rejected20
A-01 3 8
A-02 15 24
missingtable %>%
  filter(missing.gs<30) %>%
  head(10)%>%
  kable()
block trialnum missing.gs
A-01 88 29
A-01 62 28
A-02 18 28
A-02 119 26
A-02 89 25
A-02 92 25
A-01 38 23
A-02 64 23
A-01 89 22
A-01 49 21

Here are some trials that would be included using the <30 missing points threshold:

ggplotly(
  h %>%
  filter(block=='A-02',trialnum==18) %>%
  ggplot()+
    geom_point(aes(counter,G),color='orange')+
    geom_point(aes(counter,Gnospline),color='darkgreen')+
    geom_line(aes(counter,Targ),color='gray50',size=2)+
    geom_line(aes(counter,H),color='darkblue',size=2)+
    ggtitle('Missing data interpolated')
)
ggplotly(
  h %>%
  filter(block=='A-02',trialnum==119) %>%
  ggplot()+
    geom_point(aes(counter,G),color='orange')+
    geom_point(aes(counter,Gnospline),color='darkgreen')+
    geom_line(aes(counter,Targ),color='gray50',size=2)+
    geom_line(aes(counter,H),color='darkblue',size=2)+
    ggtitle('Missing data interpolated')
)
ggplotly(
  h %>%
  filter(block=='A-02',trialnum==89) %>%
  ggplot()+
    geom_point(aes(counter,G),color='orange')+
    geom_point(aes(counter,Gnospline),color='darkgreen')+
    geom_line(aes(counter,Targ),color='gray50',size=2)+
    geom_line(aes(counter,H),color='darkblue',size=2)+
    ggtitle('Missing data interpolated')
)

That’s still not consistently good, so let’s look at a threshold of 20:

missingtable %>%
  filter(missing.gs<20) %>%
  head(10)%>%
  kable()
block trialnum missing.gs
A-01 1 19
A-01 112 19
A-01 81 15
A-02 110 14
A-01 7 11
A-01 8 11
A-02 13 11
A-01 102 10
A-02 39 9
A-02 47 9
ggplotly(
  h %>%
  filter(block=='A-01',trialnum==1) %>%
  ggplot()+
    geom_point(aes(counter,G),color='orange')+
    geom_point(aes(counter,Gnospline),color='darkgreen')+
    geom_line(aes(counter,Targ),color='gray50',size=2)+
    geom_line(aes(counter,H),color='darkblue',size=2)+
    ggtitle('Missing data interpolated')
)
ggplotly(
  h %>%
  filter(block=='A-01',trialnum==112) %>%
  ggplot()+
    geom_point(aes(counter,G),color='orange')+
    geom_point(aes(counter,Gnospline),color='darkgreen')+
    geom_line(aes(counter,Targ),color='gray50',size=2)+
    geom_line(aes(counter,H),color='darkblue',size=2)+
    ggtitle('Missing data interpolated')
)
ggplotly(
  h %>%
  filter(block=='A-02',trialnum==110) %>%
  ggplot()+
    geom_point(aes(counter,G),color='orange')+
    geom_point(aes(counter,Gnospline),color='darkgreen')+
    geom_line(aes(counter,Targ),color='gray50',size=2)+
    geom_line(aes(counter,H),color='darkblue',size=2)+
    ggtitle('Missing data interpolated')
)

A threshold of 20 missing points seems to be more successful. This results in rejecting 8 trials from block 1 and 24 trials from block 2. If the number of trials is small enough, perhaps we could simply flag the trials with interpolated points and have a human make the decision about whether to reject them.