Psychometric function (PF)

The psychometric function (PF) is about relation between human performance (e.g., classification) on a psychophysical task and sensory inputs (e.g., stimulus intensity). Recall that a general psychometric function:

\[\Psi(x,\alpha, \beta, \gamma, \lambda) = \gamma + (1-\gamma -\lambda) F(x, \alpha, \beta)\] where \(\alpha\) is the threshold, \(\beta\) sensitivity, \(\gamma\) chance level, and \(\lambda\) lapse rate. Several candidate functions can be used for psychometric functions \(F(x,\alpha, \beta)\):

The Logit function is relative simple:

\[ \log \frac{P}{1-P} = \beta(x-\alpha), \] where \(P\) is the prop. of the positive responses, and \(1-P\) is the prop. of the negative response. \(\log \frac{P}{1-P}\) is log-likelihood of two alternative responses (sometimes it is called decision variable), which is assumed to be a linear function of the stimulus intensity \(X\). The above equation can be transformed to

\[ P = \frac{1}{1+ e^{-\beta(x-\alpha)}}\]

From the equation, we can easily obtained the point of subjective equality (PSE, \(\alpha\)) and just noticeable difference (JND). Notice, the JND is defined by the difference between thresholds \(P=0.5\) and \(P=0.75\), so

\[JND = \log3/\beta\] That is, when the slope \(\beta\) becomes steeper, the JND becomes smaller.

Case Study: Influence of music tempo on duration judgments.

In the study, same classic music pieces were manipulated and played in three different tempo: slow, medium, and faster. The length of the music piece was randomly select from 2 to 8 seconds. Participants had to judge if the music piece was a ‘short’ or a ‘long’ one. Participants had to estimate the ‘short’ or ‘long’ based on the past trials they received. The research question is if the music tempo alters the time judgment.

A sample experimental data that contains three participants’ responses can be found here.

Please download the data file first to your local folder. Here I put in a subfolder called ‘data’ in my local computer.

1. Import data and tidy the data

Here we do the ‘tidyverse’ way. First, we need to load necessary packages, such as tidyverse, broom, and we also need quickpsy for a fast estimation of psychometric function.

Note, the raw csv file uses ; as a separation, so we need to explicitly specify the separation when we use read.csv() function. The responses for two alternative choices were coded as ‘1’ (short) and ‘2’ (long). You can count how many ‘2’ among one condition to calculate the proportion. A better faster way is to recode the responses to 0 and 1, and use sum() and mean() functions to count the number of ‘long’ responses and the mean. In addition, for better understanding, we also need to recode / factorize the factor Tempo.

# 1. Import necessary libraries 
library(tidyverse)
library(broom)
# If you haven't install quickpsy, uncomment the following line
#install.packages(quickpsy)
library(quickpsy)
# 2. import raw data, and make it readible. 
# read raw data, this csv file uses ';' as separation
# --- please change the location of the data file
raw_bisection <- read.csv('data/music_bisection.csv',sep=';')
# change response from 1,2 to 0,1 (short vs. long)
raw_bisection$resp <- raw_bisection$Decision -1
# convert numbers (tempo) to factor (slow, medium, fast)
raw_bisection$Tempo <- factor(raw_bisection$Tempo,labels = c('Slow','Medium','Fast'))
head(raw_bisection)
  Participant Duration Tempo Stimuli_Nu Decision Music_Duration resp
1      Sub101        2  Slow         10        1       2.001604    0
2      Sub101        6  Fast         10        2       6.000169    1
3      Sub101        5  Slow          4        2       5.000097    1
4      Sub101        7  Slow          8        2       7.000105    1
5      Sub101        2  Fast          1        1       2.000080    0
6      Sub101        7  Slow          6        2       7.000084    1

2. Average data and visualize average responses.

After loading the raw dta, we now to calculate the average responses for each condition, individual participants. We also count the number of ‘long’ and ‘short’ responses, those are needed for later GLM logistic regression.

# average resp of 'long'
msub <- raw_bisection %>% 
  group_by(Participant, Tempo, Duration) %>%  
  summarise(prop = mean(resp), long = sum(resp), 
            n = n(), short = n-long) 
head(msub)

We now visualize the empirical psychometric curves. Recall we can add facet_wrap() in ggplot to plot separate participants, and use different color for different Tempos.

msub %>% ggplot(aes(x = Duration, 
                    y = prop, 
                    color = Tempo)) + 
  geom_point() + geom_line() +
  facet_wrap(~Participant) + 
  ylab("Prop. of 'long' responses")

By visual inspection, we can roughly know that the fast tempo (blue lines) shifted the psychometric function leftwards, and the slow tempo shifted the curve rightward. In the fast tempo condition, the bisection point (the middle point between 2 and 8 seconds) shifted lower. That is, the fast tempo music piece, compared to the slow tempo, requires less amount of duration to reach the subjective middle point. In other words, the fast tempo may dilate the subjective time.

3. Psychometric function estimated from GLM approach

To formally estimate the psychometric function and its parameters, here we use the general linear model (GLM) approach. Recall the formula format in R, and also check the help information of the function glm(). The logistic regression requires you to specify the family parameters with binomial(logit) in the glm() function.

# glm for logistic regression
glm_results = msub %>% 
    group_by(Participant, Tempo) %>%
    do(tidy (glm(cbind(long,short) ~ Duration, 
                 family = binomial(logit), data=.)))
head(glm_results)

In the above tidy-type analysis, we use do(tidy()) function, which extracts the key parameters from the GLM model, and concatenate them into a table. Note that there are two parameters Intercept and Duration in the table. They are two parameters \(a\) and \(b\) from the logistic function

\[\log\frac{P}{1-P}= a + b \cdot Duration\]

With the above format, the PSE is \(-a/b\), and the b remains the same \(\log(3)/b\).

Now we calculate the PSEs and JNDs

psy_results = glm_results %>%
    select(., one_of(c('Participant','Tempo', 'term','estimate'))) %>%
    spread(.,term, estimate) %>% rename(., b=Duration, a = `(Intercept)`) %>%
    mutate(., pse = -a/b, jnd = log(3)/b)
head(psy_results)

Visualize the estimated psychometric functions together with empirical data. Note, ggplot also provides the direct estimation and draw the logistic curves using geom_smooth(). In this layer, you need to specify the method with glm, and the arguments for the method with methods.args = list(family = binomial(logit)).

# Psychometric curves. 
msub %>%ggplot( aes(x=Duration, y=prop, color=Tempo)) +  
  geom_point(aes(shape = Tempo), size=3) + 
  geom_smooth(method=glm, method.args= list(family = binomial(logit)), 
              se = FALSE, aes(linetype=Tempo)) +
  xlab('Durations (sec)') + ylab('Prop. of Long Response') +
  facet_wrap(~Participant)

In the end, we visualize the mean PSEs and JNDs.

psy_results %>% group_by(Tempo) %>%
  summarise(mPSE = mean(pse)) %>% 
  ggplot(aes(Tempo, mPSE)) + geom_bar(stat = 'identity', width = 0.5) +
  theme_classic()

NA

Your task: please add errorbars and also plot the figure for mean JNDs.

One more Note: the GLM approach does not assume any lapse rate, and guess rate here. This is because logistic regression assume the range from 0 to 1.

4. Quickpsy approach

Quickpsy is an R package developed by Daniel Linares and Joan Loópez-Moliner to quickly fit and plot psychometric fucntions for multiple conditions. You can find the introduction and quick tutorial from their official website.

First, please open quickpsy help file and check it’s grammar. It has several important parameters: x for stimulus intensity, k for response variable. The response variable could be the number of trials in which a yes-type response was given or a vector of 0s and 1s indicating the response on each trial. In addition, the group variable grouping, where you can specify multiple group variables with the operator .().

In addition, you can estimate the lapse rate, by state lapses = TRUE.

Let’s first estimate the psychometric function using the raw data.

# Estimate psychometric function and PSEs
quickpsy(raw_bisection, x = Duration, k = resp, 
         grouping = .(Tempo,Participant), lapses = TRUE, 
         bootstrap = 'none') -> qp
# show the thresholds
head(qp$thresholds)

There is one special function called plotcurves() for visualized the estimated psychometric functions.

plotcurves(qp)

Please compare the estimations from GLM and quickpsy, and discuss with your classmates the potential impacts of adding the lapse rate in the estimation.

Your task: please extract the thresholds (PSEs), JNDs, and lapse rates from the quickpsy estimations, and visualise them.

---
title: 'Course C Tutorial 3: Psychometric function'
author: "Zhuanghua Shi (http://msense.de)"
output:
  html_notebook: default
  html_document:
    df_print: paged
---

# Psychometric function (PF)

The psychometric function (PF) is about relation between human performance (e.g., classification) on a psychophysical task and sensory inputs (e.g., stimulus intensity). Recall that a general psychometric function:

$$\Psi(x,\alpha, \beta, \gamma, \lambda) = \gamma + (1-\gamma -\lambda) F(x, \alpha, \beta)$$
where $\alpha$ is the threshold, $\beta$ sensitivity, $\gamma$ chance level, and $\lambda$ lapse rate. 
Several candidate functions can be used for psychometric functions $F(x,\alpha, \beta)$:

* logit function
* cumulative Gaussian function
* Weibull

The Logit function is relative simple:

$$ \log \frac{P}{1-P} = \beta(x-\alpha), $$
where $P$ is the prop. of the positive responses, and $1-P$ is the prop. of the negative response. $\log \frac{P}{1-P}$ is log-likelihood of two alternative responses (sometimes it is called decision variable), which is assumed to be a linear function of the stimulus intensity $X$. The above equation can be transformed to 

$$ P = \frac{1}{1+ e^{-\beta(x-\alpha)}}$$

From the equation, we can easily obtained the point of subjective equality (PSE, $\alpha$) and just noticeable difference (JND). Notice, the JND is defined by the difference between thresholds $P=0.5$ and $P=0.75$, so 

$$JND = \log3/\beta$$
That is, when the slope $\beta$ becomes steeper, the JND becomes smaller. 

# Case Study: Influence of music tempo on duration judgments. 

In the study, same classic music pieces were manipulated and played in three different tempo: slow, medium, and faster. The length of the music piece was randomly select from 2 to 8 seconds. Participants had to judge if the music piece was a 'short' or a 'long' one. Participants had to estimate the 'short' or 'long' based on the past trials they received. The research question is if the music tempo alters the time judgment.  

A sample experimental data that contains three participants' responses can be found [here](https://www.dropbox.com/s/7regjuv7l7y4l5d/music_bisection.csv?dl=0).

Please download the data file first to your local folder. Here I put in a subfolder called 'data' in my local computer. 

## 1. Import data and tidy the data

Here we do the 'tidyverse' way. First, we need to load necessary packages, such as `tidyverse`, `broom`, and we also need `quickpsy` for a fast estimation of psychometric function. 

Note, the raw csv file uses `;` as a separation, so we need to explicitly specify the separation when we use `read.csv()` function. The responses for two alternative choices were coded as '1' (short) and '2' (long). You can count how many '2' among one condition to calculate the proportion. A better faster way is to recode the responses to 0 and 1, and use `sum()` and `mean()` functions to count the number of 'long' responses and the mean. In addition, for better understanding, we also need to recode / factorize the factor `Tempo`. 

```{r init}
# 1. Import necessary libraries 
library(tidyverse)
library(broom)
# If you haven't install quickpsy, uncomment the following line
#install.packages(quickpsy)
library(quickpsy)

# 2. import raw data, and make it readible. 
# read raw data, this csv file uses ';' as separation
# --- please change the location of the data file
raw_bisection <- read.csv('data/music_bisection.csv',sep=';')

# change response from 1,2 to 0,1 (short vs. long)
raw_bisection$resp <- raw_bisection$Decision -1
# convert numbers (tempo) to factor (slow, medium, fast)
raw_bisection$Tempo <- factor(raw_bisection$Tempo,labels = c('Slow','Medium','Fast'))

head(raw_bisection)

```


## 2. Average data and visualize average responses. 

After loading the raw dta, we now to calculate the average responses for each condition, individual participants. We also count the number of 'long' and 'short' responses, those are needed for later GLM logistic regression. 

```{r}
# average resp of 'long'
msub <- raw_bisection %>% 
  group_by(Participant, Tempo, Duration) %>%  
  summarise(prop = mean(resp), long = sum(resp), 
            n = n(), short = n-long) 
head(msub)
```

We now visualize the empirical psychometric curves. Recall we can add `facet_wrap()` in ggplot to plot separate participants, and use different color for different Tempos. 

```{r}
msub %>% ggplot(aes(x = Duration, 
                    y = prop, 
                    color = Tempo)) + 
  geom_point() + geom_line() +
  facet_wrap(~Participant) + 
  ylab("Prop. of 'long' responses")

```

By visual inspection, we can roughly know that the fast tempo (blue lines) shifted the psychometric function leftwards, and the slow tempo shifted the curve rightward. In the fast tempo condition, the bisection point (the middle point between 2 and 8 seconds) shifted lower. That is, the fast tempo music piece, compared to the slow tempo, requires less amount of duration to reach the subjective middle point. In other words, the fast tempo may _dilate_ the subjective time. 

## 3. Psychometric function estimated from GLM approach

To formally estimate the psychometric function and its parameters, here we use the general linear model (GLM) approach. Recall the formula format in R, and also check the help information of the function `glm()`. The logistic regression requires you to specify the `family` parameters with `binomial(logit)` in the `glm()` function. 

```{r}
# glm for logistic regression
glm_results = msub %>% 
    group_by(Participant, Tempo) %>%
    do(tidy (glm(cbind(long,short) ~ Duration, 
                 family = binomial(logit), data=.)))

head(glm_results)
```

In the above `tidy`-type analysis, we use `do(tidy())` function, which extracts the key parameters from the GLM model, and concatenate them into a table. Note that there are two parameters `Intercept` and `Duration` in the table. They are two parameters $a$ and $b$ from the logistic function 

$$\log\frac{P}{1-P}= a + b \cdot Duration$$

With the above format, the PSE is $-a/b$, and the b remains the same $\log(3)/b$. 

Now we calculate the PSEs and JNDs

```{r}
psy_results = glm_results %>%
    select(., one_of(c('Participant','Tempo', 'term','estimate'))) %>%
    spread(.,term, estimate) %>% rename(., b=Duration, a = `(Intercept)`) %>%
    mutate(., pse = -a/b, jnd = log(3)/b)

head(psy_results)
```

Visualize the estimated psychometric functions together with empirical data. Note, `ggplot` also provides the direct estimation and draw the logistic curves using `geom_smooth()`. In this layer, you need to specify the method with `glm`, and the arguments for the method with `methods.args = list(family = binomial(logit))`. 

```{r}
# Psychometric curves. 
msub %>%ggplot( aes(x=Duration, y=prop, color=Tempo)) +  
  geom_point(aes(shape = Tempo), size=3) + 
  geom_smooth(method=glm, method.args= list(family = binomial(logit)), 
              se = FALSE, aes(linetype=Tempo)) +
  xlab('Durations (sec)') + ylab('Prop. of Long Response') +
  facet_wrap(~Participant)
```

In the end, we visualize the mean PSEs and JNDs. 
```{r}
psy_results %>% group_by(Tempo) %>%
  summarise(mPSE = mean(pse)) %>% 
  ggplot(aes(Tempo, mPSE)) + geom_bar(stat = 'identity', width = 0.5) +
  theme_classic()
  
```

Your task: please add errorbars and also plot the figure for mean JNDs. 

One more Note: the GLM approach does not assume any lapse rate, and guess rate here. This is because logistic regression assume the range from 0 to 1.

## 4. `Quickpsy` approach

Quickpsy is an R package developed by Daniel Linares and Joan Loópez-Moliner to quickly fit and plot psychometric fucntions for multiple conditions. You can find the introduction and quick tutorial from [their official website](http://dlinares.org/quickpsy.html). 

First, please open `quickpsy` help file and check it's grammar. It has several important parameters: `x` for stimulus intensity, `k` for response variable. The response variable could be the number of trials in which a yes-type response was given or a vector of 0s and 1s indicating the response on each trial. In addition, the group variable `grouping`, where you can specify multiple group variables with the operator `.()`. 

In addition, you can estimate the lapse rate, by state `lapses = TRUE`. 

Let's first estimate the psychometric function using the raw data.

```{r}
# Estimate psychometric function and PSEs
quickpsy(raw_bisection, x = Duration, k = resp, 
         grouping = .(Tempo,Participant), lapses = TRUE, 
         bootstrap = 'none') -> qp
# show the thresholds
head(qp$thresholds)
```

There is one special function called `plotcurves()` for visualized the estimated psychometric functions. 

```{r}
plotcurves(qp)
```

Please compare the estimations from GLM and quickpsy, and discuss with your classmates the potential impacts of adding the lapse rate in the estimation. 

Your task: please extract the thresholds (PSEs), JNDs, and lapse rates from the quickpsy estimations, and visualise them. 
