This script analyzes data from the sol-gating experiment. The goal of the analysis is to find the precise gate at which participants can reliably identify each sign used in the SOL stimulus set.

Load libraries.

Read in data.

df <- read.csv("sol_gating_processed-df.csv")

Descriptives

df %>%
    group_by(id) %>%
    mutate(num_trials = max(trial)) %>%
    select(id, gender, age, asl_fluency, age_learned_asl, num_trials) %>%
    distinct()
## Source: local data frame [9 x 6]
## Groups: id
## 
##   id gender age asl_fluency age_learned_asl num_trials
## 1 20 Female  47           2              13        228
## 2 21 Female  37           2       18 months        228
## 3 22 Female  20           2              18        228
## 4 40 Female  25           3           birth        228
## 5 41 Female  37           2              19        228
## 6 46 Female  27           3               5        228
## 7 49   Male  50           2              13        228
## 8 12   Male  29           2               8        228
## 9 NA     NA  NA          NA              NA         NA

Histogram of the main outcome variable: correct on the 2-AFC measure

qplot(x=correct, data=df)

Histogram of RT, just to make sure nothing weird is going on

qplot(x=rt, data=df)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

Flag and filter bad RTs

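df <- df %>%
    # drop rows with a missing participant id
    filter(!is.na(id)) %>% 
    # flag trials whose log RT is more than 2 SD above or below the mean log RT
    mutate(include_good_rt = ifelse(log(rt) > mean(log(rt)) + 2 * sd(log(rt)) |
                                        log(rt) < mean(log(rt)) - 2 * sd(log(rt)),
                                    "exclude", "include"))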

df %>% group_by(include_good_rt) %>% summarise(n())
## Source: local data frame [2 x 2]
## 
##   include_good_rt  n()
## 1         exclude   44
## 2         include 1780
df <- filter(df, include_good_rt == "include")
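The exclusion rule above can be illustrated on a small vector of RTs. This is only a sketch: the values below are invented for illustration and are not from the experiment.

```r
# Toy RTs in ms (hypothetical values, not from the experiment)
rt <- c(650, 700, 720, 750, 780, 800, 820, 850, 900, 12000)

log_rt <- log(rt)
lower  <- mean(log_rt) - 2 * sd(log_rt)
upper  <- mean(log_rt) + 2 * sd(log_rt)

# Same rule as above: exclude trials more than 2 SD from the mean log RT
include_good_rt <- ifelse(log_rt > upper | log_rt < lower, "exclude", "include")

table(include_good_rt)
```

Working on the log scale makes the skewed RT distribution more symmetric, so the 2 SD cutoff is less dominated by a few very slow trials.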

Main analysis

Plot accuracy for each gate within each sign.

ms <- df %>%
    na.omit() %>% 
    group_by(gate_name, gate_num, gate_signer, gate) %>%
    summarise(mean_correct = mean(correct),
              ci_h = ci.high(correct),
              ci_l = ci.low(correct))
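The helpers ci.high() and ci.low() are not defined in this script. The sketch below shows one way such helpers might work, assuming they return the distance from the sample mean to a bootstrap percentile bound (which would be consistent with the plot code using mean_correct - ci_l and mean_correct + ci_h). The names boot_means, ci_low_sketch and ci_high_sketch are hypothetical, and this is not necessarily what the original helpers do.

```r
# Hypothetical stand-ins for ci.low()/ci.high(); the real helpers are not shown here.
# Assumption: each returns a distance from the mean to a bootstrap percentile bound.
boot_means <- function(x, n_boot = 1000) {
    # resample x with replacement and take the mean of each resample
    replicate(n_boot, mean(sample(x, replace = TRUE)))
}

ci_low_sketch <- function(x, n_boot = 1000) {
    unname(mean(x) - quantile(boot_means(x, n_boot), 0.025))
}

ci_high_sketch <- function(x, n_boot = 1000) {
    unname(quantile(boot_means(x, n_boot), 0.975) - mean(x))
}

set.seed(1)
correct <- c(1, 1, 0, 1, 1, 0, 1, 1, 1, 0)  # made-up 2-AFC responses

c(mean  = mean(correct),
  lower = mean(correct) - ci_low_sketch(correct),
  upper = mean(correct) + ci_high_sketch(correct))
```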

Now plot

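qplot(x=gate_num, y=mean_correct, color = gate_signer, data=ms) +
    facet_wrap( ~ gate_name,ncol=10) +
    geom_line() +
    # error bars span mean_correct - ci_l to mean_correct + ci_h
    # (geom_pointrange() has no width parameter, so it is not set here)
    geom_pointrange(aes(ymin=mean_correct - ci_l, 
                        ymax=mean_correct + ci_h), 
                    size=0.6) +
    theme_bw()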

Compute empirical F0

Decisions about each sign's gate were made by VM/KM. The decision criterion was the earliest gate that achieved accuracy above chance.

Note: some signs did not have a clear gate based on the data. In these cases we stick with the experimenter-defined F0.
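The gate decisions in this analysis were made by hand. Purely as an illustration, the criterion could be operationalised as in the sketch below, assuming "above chance" means that the lower confidence bound exceeds 0.5 (chance on a 2-AFC task). The function name earliest_gate_above_chance and the toy values are hypothetical.

```r
# Hypothetical summary for one sign; values are invented for illustration
toy <- data.frame(
    gate_num     = 1:6,
    mean_correct = c(0.45, 0.55, 0.60, 0.90, 0.95, 1.00),
    ci_l         = c(0.20, 0.20, 0.15, 0.10, 0.05, 0.00)  # distance from mean to lower bound
)

# Assumed operationalisation: lower CI bound above 0.5 (chance on a 2-AFC task)
earliest_gate_above_chance <- function(d, chance = 0.5) {
    above <- d$gate_num[(d$mean_correct - d$ci_l) > chance]
    # NA when no gate clears chance, i.e. the sign has no clear gate
    if (length(above) == 0) NA_integer_ else min(above)
}

earliest_gate_above_chance(toy)
```

A sign for which no gate clears the threshold returns NA, which corresponds to the cases where the experimenter-defined F0 is kept.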

#grab gate names
df_empricial_f0 <- ms %>% 
    ungroup() %>% 
    select(gate_name) %>% 
    unique()

# create vector with gate decisions 
gate_decision <- c(4, 5, 3, 4, 1, 4, 5, 5, 4, 1, 4, 5, 2, 6, 6, 
                    4, 4, 5, 3, 6, 5, 2, 3, 6, 5, 3, 5, 4, 6, 6, 6, 
                    5, 4, 3, 3, 3, 6, 4)
# check length: should be 38
length(gate_decision)
## [1] 38
# bind to gate names 
df_empricial_f0 %<>% cbind(gate_decision)

Take the gate decisions and add this information to the larger summarized data frame: ms. Then only keep the gates where there is a match between gate_decision and gate_num variables.

ms_gate_decisions <- ms %>% 
    left_join(y = df_empricial_f0, by = "gate_name") %>%
    ungroup() %>% 
    select(gate_name, gate_num, gate, gate_decision) %>% 
    filter(gate_decision == gate_num)
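The join-then-filter step can be seen on a toy example. This sketch uses base R (merge() and logical indexing) rather than the pipeline above, and all sign names and gate labels are made up.

```r
# Toy summary table: two hypothetical signs with three gates each
ms_toy <- data.frame(
    gate_name = rep(c("signA", "signB"), each = 3),
    gate_num  = rep(1:3, times = 2),
    gate      = c("a1", "a2", "a3", "b1", "b2", "b3")
)

# One gate decision per sign (hypothetical)
decisions_toy <- data.frame(
    gate_name     = c("signA", "signB"),
    gate_decision = c(2, 3)
)

# Left join on gate_name, then keep only the row for the chosen gate
joined <- merge(ms_toy, decisions_toy, by = "gate_name", all.x = TRUE)
kept   <- joined[joined$gate_decision == joined$gate_num, ]
kept
```

After the filter, each sign contributes exactly one row: the gate that matches its decision.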

Extract the frame information from the gate variable and convert it to ms.

regexp <- "[[:digit:]]"

str_locate(ms_gate_decisions$gate, regexp)[1]
## [1] 13
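ms_gate_decisions %<>% 
    select(gate, gate_decision) %>% 
    mutate(
        # first digit found in the gate string, used as whole seconds
        f0_sec = as.numeric(str_extract(gate, regexp)),
        # two characters near the end of the gate string, used as the frame count
        f0_frame = as.numeric(str_sub(gate, 
                           start = str_length(gate) - 3, 
                           end = str_length(gate) - 2)),
        # total frames, at 30 frames per second
        f0_tot_frames = (f0_sec * 30) + f0_frame,
        # computation 1: every frame counted as 33 ms
        f0_ms_1 = f0_tot_frames * 33,
        # computation 2: whole seconds as 1000 ms, remaining frames as 33 ms each
        f0_ms_2 = (f0_sec * 1000) + (f0_frame * 33)
    )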
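A worked example shows how the two millisecond computations diverge. The timestamp below (1 second, 12 frames) is made up for illustration; the 30 frames per second and 33 ms per frame are the values used in the code above.

```r
# Made-up timestamp: 1 second and 12 frames, at 30 frames per second
f0_sec   <- 1
f0_frame <- 12

f0_tot_frames <- (f0_sec * 30) + f0_frame      # 42 frames
f0_ms_1 <- f0_tot_frames * 33                  # 42 * 33 = 1386 ms
f0_ms_2 <- (f0_sec * 1000) + (f0_frame * 33)   # 1000 + 396 = 1396 ms

f0_ms_2 - f0_ms_1                              # 10 ms
```

The two results differ by 10 ms for every whole second, because 30 frames at 33 ms each come to 990 ms rather than 1000 ms (a frame at 30 frames per second lasts 1000 / 30, roughly 33.33 ms).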

Save the data frame so we can add the experimenter-chosen F0.

# write.csv(x = ms_gate_decisions, "sol-empirical-gate-decisions.csv", row.names = F)

Read in data frame with experimenter F0 added and compute the difference for each sign and the average difference overall.

df_final <- read.csv("sol-empirical-gate-decisions.csv")

Summarise how far off we were. First, we compare the empirical and experimenter-defined F0 when F0 is computed as follows:

\((seconds \times 30 + frames) \times 33\)

| min_diff_ms | max_diff_ms | min_diff_frames | max_diff_frames | avg_diff_ms | avg_diff_frames |
|---|---|---|---|---|---|
| 7 | 294 | 0 | 9 | 127.29 | 3.89 |

Now we compare the difference if we use the following computation to get F0:

\((seconds \times 1000) + (frames \times 33)\)

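| min_diff_ms | max_diff_ms | min_diff_frames | max_diff_frames | avg_diff_ms | avg_diff_frames |
|---|---|---|---|---|---|
| 0 | 274 | 0 | 8 | 107.18 | 3.18 |

Histogram of the difference in frames under the second computation

ggplot2::qplot(f0_diff_empirical_experiment_2_frames, data = df_final,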
               binwidth = 0.5) + 
    ylim(0, 8) +
    scale_x_continuous(limits = c(0, 8), breaks=0:8) +
    theme_bw()