1. Basics: How long did subjects spend on each condition? Was this impacted by other attributes of the experiment?


TLDR:

  • Subjects spent the most time on condition 3. They spent similar amounts of time on conditions 1 and 2. The convexity or concavity of the model did not impact the amount of time spent with the model.
  • Individual subjects varied in how much time they spent on each condition. This may be partially explained by researcher behavior (see researcher difference in QAQC doc).


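The subjects spent varying amounts of time with the model in each condition of the experiment. The summaries below are per condition; the values are presumably in minutes, matching the `condition.time_min` variable used in the models in section 2: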

## $`1`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.841   6.251   8.157   7.810   9.135  12.686 
## 
## $`2`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   4.204   6.941   8.898   9.193  10.829  19.347 
## 
## $`3`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   11.15   15.41   18.80   18.45   21.96   25.75
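Summaries in this shape can be produced by splitting the condition times by condition and summarizing each group. The following is a minimal sketch, assuming a data frame with hypothetical columns `condition` and `time_min`; the values are made up for illustration and are not the study data.

```r
# Hypothetical example data: NOT the study data.
times <- data.frame(
  condition = factor(rep(1:3, each = 4)),
  time_min  = c(4, 6, 8, 10, 5, 7, 9, 11, 12, 16, 20, 24)
)

# One six-number summary (min, quartiles, mean, max) per condition.
by_condition <- tapply(times$time_min, times$condition, summary)
print(by_condition)

# Group means only, as a numeric vector indexed by condition.
condition_means <- tapply(times$time_min, times$condition, mean)
print(condition_means)
```

Printing `by_condition` gives one block per condition, headed by the condition label, in the same layout as the output above.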


The amount of time spent within each condition did not seem to depend on whether the model was convex or concave.


Different subjects spent different amounts of time with the model for each condition.


2. Does time spent for each condition relate to words spoken or amount of touch?


TLDR:

  • Conditions that took more time included more talking by the subject and by the researcher. The correlations between condition time and subject word count, subject talking time, researcher word count, and researcher talking time are each (individually) very significant (p < 0.001) with positive slopes.
  • Conditions that took more time did not include touching more area of the model, though condition time is correlated with spending more time touching the model (p = 0.023). Spending more time with the model is not correlated with the percent of time spent touching the model.
  • It could be that subjects who are taking more time with the model are re-touching areas they’ve already touched or touching one spot for more time (e.g. leaving their finger on the model as they talk). This might indicate a weakness in the touch tracking method (that we could better quantify with this data?).


Condition time and speaking

Word counts and time speaking for the researcher by condition time:

## `geom_smooth()` using formula = 'y ~ x'

## 
## Call:
## lm(formula = words_I ~ condition.time_min, data = timesResearcher)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -513.80  -93.50    2.87  101.41  539.82 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          133.25      59.27   2.248   0.0302 *  
## condition.time_min    38.15       4.47   8.535 1.51e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 174.3 on 40 degrees of freedom
## Multiple R-squared:  0.6455, Adjusted R-squared:  0.6367 
## F-statistic: 72.84 on 1 and 40 DF,  p-value: 1.512e-10
## `geom_smooth()` using formula = 'y ~ x'

## 
## Call:
## lm(formula = speechtime_I ~ condition.time_min, data = timesResearcher)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -268.13  -85.87  -41.46   56.67  328.16 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          125.53      44.56   2.817 0.007488 ** 
## condition.time_min    12.31       3.36   3.664 0.000721 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 131 on 40 degrees of freedom
## Multiple R-squared:  0.2513, Adjusted R-squared:  0.2326 
## F-statistic: 13.42 on 1 and 40 DF,  p-value: 0.0007206


Word counts and time speaking for the subject:

## `geom_smooth()` using formula = 'y ~ x'

## 
## Call:
## lm(formula = words_P ~ condition.time_min, data = timesResearcher)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -514.64 -115.38   -8.53  111.52  564.54 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         156.122     70.472   2.215   0.0325 *  
## condition.time_min   54.548      5.315  10.264 9.07e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 207.2 on 40 degrees of freedom
## Multiple R-squared:  0.7248, Adjusted R-squared:  0.7179 
## F-statistic: 105.3 on 1 and 40 DF,  p-value: 9.072e-13
## `geom_smooth()` using formula = 'y ~ x'

## 
## Call:
## lm(formula = speechtime_P ~ condition.time_min, data = timesResearcher)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -334.08  -47.34   -6.52   27.43  268.02 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          80.846     38.731   2.087   0.0433 *  
## condition.time_min   25.820      2.921   8.840 5.97e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 113.9 on 40 degrees of freedom
## Multiple R-squared:  0.6614, Adjusted R-squared:  0.653 
## F-statistic: 78.14 on 1 and 40 DF,  p-value: 5.971e-11

All four of these linear models have positive slopes that are (very) significant (p < 0.001). More time spent with the model corresponded to more time talking and more words spoken for both the subject and the researcher.
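For readers who want to reproduce this kind of check, here is a minimal sketch of fitting one such model and pulling out the slope and its p-value. The data are simulated stand-ins (not the study data); only the variable names `words_I` and `condition.time_min` are taken from the model calls above, and the generating values are arbitrary assumptions.

```r
set.seed(1)

# Simulated stand-in data: NOT the study data.
n <- 42
sim <- data.frame(condition.time_min = runif(n, min = 4, max = 26))
sim$words_I <- 130 + 38 * sim$condition.time_min + rnorm(n, sd = 170)

# Simple linear regression of word count on condition time.
fit <- lm(words_I ~ condition.time_min, data = sim)

# Extract the slope estimate and its p-value from the coefficient table.
coefs   <- summary(fit)$coefficients
slope   <- coefs["condition.time_min", "Estimate"]
p_value <- coefs["condition.time_min", "Pr(>|t|)"]

cat("slope:", round(slope, 2), " p-value:", signif(p_value, 3), "\n")
```

With 42 rows and two estimated coefficients, the model has 40 residual degrees of freedom, matching the summaries shown above.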


Condition time and touching:

One might guess that spending more time with the model would lead to more area touched by the subject or more time touching by the subject.

Condition time and area of the model touched:

## `geom_smooth()` using formula = 'y ~ x'

## 
## Call:
## lm(formula = touchratio ~ condition.time_min, data = timesResearcher)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.14409 -0.09358 -0.00642  0.05781  0.34305 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)   
## (Intercept)        0.128083   0.037185   3.444  0.00136 **
## condition.time_min 0.001169   0.002804   0.417  0.67896   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1093 on 40 degrees of freedom
## Multiple R-squared:  0.004327,   Adjusted R-squared:  -0.02057 
## F-statistic: 0.1738 on 1 and 40 DF,  p-value: 0.679


Condition time and time touching by the subject:

## `geom_smooth()` using formula = 'y ~ x'

## 
## Call:
## lm(formula = touch.time_mS ~ condition.time_min, data = timesResearcher)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -251260 -129507  -70135   52950  743942 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)  
## (Intercept)           68386      68338   1.001   0.3230  
## condition.time_min    12200       5154   2.367   0.0229 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 200900 on 40 degrees of freedom
## Multiple R-squared:  0.1229, Adjusted R-squared:  0.1009 
## F-statistic: 5.603 on 1 and 40 DF,  p-value: 0.02285
## `geom_smooth()` using formula = 'y ~ x'

## 
## Call:
## lm(formula = touch.time.ratio_per ~ condition.time_min, data = timesResearcher)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -32.15 -18.20  -4.92  12.57  53.95 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         33.7635     8.4379   4.001 0.000265 ***
## condition.time_min  -0.4213     0.6364  -0.662 0.511779    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24.81 on 40 degrees of freedom
## Multiple R-squared:  0.01084,    Adjusted R-squared:  -0.01389 
## F-statistic: 0.4382 on 1 and 40 DF,  p-value: 0.5118

The area touched slope is nearly flat and not significant (p = 0.68). The time touching slope is positive and significant (p = 0.023), but much less so than the speaking-related slopes. That relationship and its significance go away when the percentage of time touching is considered instead (p = 0.51).
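The units in these models are easy to mix up (touch time in milliseconds, condition time in minutes), so a small worked conversion may help. This sketch assumes the percent variable is touch time divided by condition time; that formula and the helper name `touch_percent` are assumptions, not taken from the analysis code.

```r
ms_per_min <- 60 * 1000

# Slope from the touch.time_mS model above: ms of touching per extra minute.
slope_ms_per_min <- 12200
slope_fraction   <- slope_ms_per_min / ms_per_min  # share of each extra minute

# Hypothetical helper: percent of condition time spent touching.
touch_percent <- function(touch_ms, condition_min) {
  100 * touch_ms / (condition_min * ms_per_min)
}

# Made-up example: 180000 ms of touching during a 10 minute condition.
example_pct <- touch_percent(180000, 10)

cat("extra touching per extra minute:", round(100 * slope_fraction, 1), "%\n")
cat("example touch percent:", example_pct, "\n")
```

Under this reading, touch time that grows roughly in proportion to condition time would give a positive slope for touch time and a flat slope for touch percent, which matches the pattern in the two models.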


3. Overall, how do people spend the experiment time? What are they doing?


TLDR:

  • Three behaviors are more common:
    • Researcher talking (S no touch + S no talk + R talk)
    • Subject talking (S no touch + S talk + R no talk)
    • Subject talking and touching (S touch + S talk + R no talk)
  • Three behaviors are less common:
    • Nothing (S no touch + S no talk + R no talk)
    • Subject touching (S touch + S no talk + R no talk)
    • Researcher talking and subject touching (S touch + S no talk + R talk)
  • There are considerable differences in touching and talking behaviors between subjects (i.e. for each behavior there is a large range in percentage of time spent).
  • The condition (task) does not seem to impact how subjects spend their time overall.
  • Condition 2 might be slightly different from 1 and 3 in the subject touching behavior (S touch + S no talk + R no talk), though this may be just a few outlying points.


There are six different states participants can be in, based on three binaries:

  • Subject Touch: Y / N
  • Subject Talk: Y / N
  • Researcher Talk: Y / N

There are only 6 states (not 8) because subjects and researchers cannot be talking at the same time. That rules out one combination among the touching states and one among the non-touching states.
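The count of six can be checked by enumerating all combinations of the three binaries and dropping those where both people talk. This is a small illustrative sketch; the column names are hypothetical.

```r
# All 2 x 2 x 2 = 8 combinations of the three binaries.
states <- expand.grid(
  subject_touch   = c(TRUE, FALSE),
  subject_talk    = c(TRUE, FALSE),
  researcher_talk = c(TRUE, FALSE)
)

# Remove the combinations where subject and researcher talk at the same time.
valid_states <- states[!(states$subject_talk & states$researcher_talk), ]

print(valid_states)
cat("all combinations:", nrow(states), " valid states:", nrow(valid_states), "\n")
```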

How is this time distributed across the subjects?


How much time do subjects spend on different behaviors by condition?


4. Are there patterns in how individuals behaved? Are there ‘clusters’ of individuals who touched and talked similarly?


TLDR:

  • Subjects vary considerably on how they behave (there also could be researcher effects here, TBD)
  • Some subjects are similar in their behaviors from one condition to the next (e.g. S01, S02, S09)
  • Some subjects have some variation in their behaviors from one condition to the next (e.g. S03, S04, S12)
  • A few subjects have very different behaviors from one condition to the next (e.g. S07, S10, S15)
  • More work is needed to identify possible “clusters” of subjects who behave similarly and to better characterize similarities or differences within a subject across conditions


There are six different states participants can be in, based on three binaries:

  • Subject Touch: Y / N
  • Subject Talk: Y / N
  • Researcher Talk: Y / N

Subjects are rows and conditions are columns. The 321 condition (right-hand column) is the overall behavior for that participant across conditions.

## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.

Also consider a limited version of the dataset that focuses on “action” by the subject. Percentages here are computed within that subset of the data; that is, only the time in which the subject is talking and/or touching is counted. Time when nothing is happening, and time when R is talking while no touching is happening, is excluded.
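The renormalization to the “action” subset can be sketched as follows. The six percentages and the state names are made up for illustration; only the choice of which states are dropped follows the description above.

```r
# Hypothetical percent of time in each of the six states for one
# participant and condition (made-up values that sum to 100).
pct_all <- c(
  nothing                       = 10,
  researcher_talk               = 30,
  subject_talk                  = 25,
  subject_talk_touch            = 20,
  subject_touch                 = 5,
  researcher_talk_subject_touch = 10
)

# Keep only states where the subject is talking and/or touching.
action_states <- c("subject_talk", "subject_talk_touch",
                   "subject_touch", "researcher_talk_subject_touch")

# Re-express those states as percents of the action-only time.
pct_action <- 100 * pct_all[action_states] / sum(pct_all[action_states])
print(round(pct_action, 1))
```

After this step the percentages sum to 100 again, but over a smaller base, so they are not directly comparable to the full six-state percentages.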

## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.

5. Revisiting more formally: Are there ‘clusters’ of individuals who touched and/or talked similarly? Do the conditions ‘cluster’ into similar touching and/or talking behaviors?


TLDR:

  • Biggest picture: from the 3D scatter plots alone (subject touch and talking behaviors only) there does not seem to be clear evidence of clustering behavior. Also, these points actually sit in a 2D plane since they have to add to 100. This makes the following points re: cluster analysis not very compelling.
  • Using cluster analysis, for most versions of the datasets, there does not appear to be clear clustering behavior; that is, the optimal-number-of-clusters analyses are not conclusive and/or suggest a number of clusters that is quite high relative to the number of cases (e.g. 10 clusters for 14 cases).
  • In some cases, there is evidence of possible clustering behavior. This seems to occur more often with the touch/talking datasets (video, transcripts) than the touch ratios datasets (images), though there are also more “versions” of the touch/talking datasets to try.
  • The most compelling clustering outcome when the conditions are held together (n = 14) is the touching and talking dataset, by percent of time in all the touching and talking behaviors, combined across conditions (i.e. a weighted sum of the percents across conditions for each participant).
  • It is hard to say what the most compelling clustering is when the conditions are not held together by participant (n = 14 * 3). They’re all kind of a maybe.
  • It’s not immediately obvious if there are patterns in the clustering in the n = 14 * 3 case; for example, it doesn’t immediately seem like all the condition 1’s cluster together, or anything similar.
  • More work is needed to dig into other commonalities that might connect the clusters (e.g. concave or convex, researcher, quality of responses).

Cluster analysis process based on: https://www.statology.org/k-means-clustering-in-r/. Should dig into the statistical details at some point to confirm that the approach is appropriate for our case.

3D plots of subject behavior

Start by making simple plots of the subjects’ talking and touching behavior data. Note that even though these are 3D, actually all the data sits in a plane since the three values must add to 100, as they are percents of a whole.
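The claim that three percentages summing to 100 lie in a plane can be checked numerically. This is a minimal sketch on simulated data (not the study data); the three column names are hypothetical labels for the subject behaviors.

```r
set.seed(2)

# Simulated stand-in: 14 participants, three behaviors, as percents of a whole.
raw <- matrix(runif(14 * 3), ncol = 3)
pct <- 100 * raw / rowSums(raw)
colnames(pct) <- c("touch_only", "talk_only", "touch_and_talk")

# Center the columns and count the non-negligible singular values.
centered <- scale(pct, center = TRUE, scale = FALSE)
sv       <- svd(centered)$d
rank_est <- sum(sv > 1e-8 * max(sv))

cat("singular values:", signif(sv, 3), "\n")
cat("effective dimension:", rank_est, "\n")
```

The third singular value is zero up to rounding error, so the points only vary in two directions. Any clustering on the three percents is therefore effectively clustering in 2D.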

First, plot the cumulative touch/talking percentage for each participant, combined across all the conditions.

Second, make a 3D plot of the touch/talking percentage for each participant by condition.


Cluster analysis

A number of different datasets were considered in this analysis, including:

  1. the touch ratios data: how much different areas of the model were/weren’t touched as measured from the images
    (1a) touch ratios just including the overall touching of the model and the country specific touching of the model
    (1b) touch ratios including the two values in 1a and also the touch ratios for particular ring regions in each circle
  2. the talking and touching time data: how much time participants and researchers spent touching and talking and various permutations of these actions, as measured from the transcripts and video data
    (2a) the talking and touching time data in amounts (seconds)
    (2b) the talking and touching time data as ratios relative to the overall time for that participant and condition

These were also considered both by holding all conditions together for one participant (i.e. looking for similarities between participants, n = 14) and by treating the conditions as independent of participant (i.e. looking for similarities between conditions or participants, n = 14 * 3).

All the previously described touch and touching-talking datasets were screened for possible clustering behaviors. Many of them did not settle on a “correct” number of clusters; that is, there is no “elbow” in the within-cluster sum of squares graphs and/or no maximum (at a reasonable number of clusters) in the gap statistic graph. In some cases, there may be signs of an optimal number of clusters, but that value is quite high relative to the number of data points (e.g. suggesting 10 clusters for 14 data points). The datasets that did or did not move towards a possibly reasonable and definitive number of clusters are described in each section.
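The two screening diagnostics mentioned here can be sketched in a few lines. This is an illustration on simulated data, not the analysis code used for this report: it uses base `kmeans()` for the within-cluster sum of squares and, if the `cluster` package is installed, `clusGap()` and `maxSE()` for the gap statistic. The data, `k_max`, and the number of bootstrap samples are arbitrary assumptions.

```r
set.seed(3)

# Simulated stand-in: 14 cases, 3 features, two loose groups. NOT the study data.
x <- rbind(
  matrix(rnorm(7 * 3, mean = 0), ncol = 3),
  matrix(rnorm(7 * 3, mean = 4), ncol = 3)
)
x <- scale(x)

# Elbow diagnostic: total within-cluster sum of squares for k = 1..k_max.
k_max <- 8
wss <- sapply(1:k_max, function(k) {
  kmeans(x, centers = k, nstart = 25)$tot.withinss
})
print(round(wss, 2))

# Gap statistic, only if the 'cluster' package is available.
best_k <- NA
if (requireNamespace("cluster", quietly = TRUE)) {
  gap <- cluster::clusGap(x, FUNcluster = kmeans, K.max = k_max,
                          B = 50, nstart = 25)
  best_k <- cluster::maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"],
                           method = "firstSEmax")
}
cat("suggested number of clusters (gap statistic):", best_k, "\n")
```

With only 14 cases, the within-cluster sum of squares keeps shrinking as k approaches n, so a suggested k near 10 is more likely a sign of overfitting than of real structure.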

First: Keep all the data for one participant together.

Keep all the data for one participant together (i.e. n = 14). This shows us if there are participants that have similar behaviors to each other, perhaps suggesting that there are different “kinds” of people touch-habits-wise.

Much of the data does not seem to converge to an optimal number of clusters. Specifically:

  1. the touch ratios data: how much different areas of the model were/weren’t touched as measured from the images
    (1a) touch ratios just including the overall touching of the model and the country specific touching of the model–NO OPTIMAL NUMBER OF CLUSTERS
    (1b) touch ratios including the two values in 1a and also the touch ratios for particular ring regions in each circle–NO OPTIMAL NUMBER OF CLUSTERS

  2. the talking and touching time data: how much time participants and researchers spent touching and talking and various permutations of these actions, as measured from the transcripts and video data
    (2a) the talking and touching time data in amounts (seconds) combined across conditions (i.e. the final overall square graph in each row in the section 4 figure)–9 CLUSTERS
    (2b) the talking and touching time data in amounts (seconds) for each condition–NO OPTIMAL NUMBER OF CLUSTERS
    (2c) the talking and touching time data as ratios relative to the overall time combined across conditions (i.e. the final overall square graph in each row in the section 4 figure)–6 CLUSTERS
    (2d) the talking and touching time data as ratios relative to the overall time for that participant and condition–4 CLUSTERS

MOST COMPELLING: Consider the touching and talking dataset, including the percent of time in all the touching and talking behaviors described in the previous square multiples graphs. This data is combined across conditions (i.e. a weighted sum of the percents across conditions for each participant).

## S02 S10 S06 S01 S09 S13 S14 S04 S05 S08 S12 S03 S07 S15 
##   1   2   3   4   4   4   4   5   5   5   5   6   6   6 
## [1] "90.2%"
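The percentage printed after each cluster listing is not labeled in the output. One plausible reading, and it is only an assumption here, is that it is the between-cluster sum of squares as a share of the total sum of squares, which `kmeans()` reports. A minimal sketch on simulated data (not the study data, with hypothetical subject labels):

```r
set.seed(4)

# Simulated stand-in: 14 participants x 6 behavior percents. NOT the study data.
raw <- matrix(runif(14 * 6), ncol = 6,
              dimnames = list(sprintf("S%02d", 1:14), NULL))
pct <- 100 * raw / rowSums(raw)

# k-means with a fixed number of clusters and several random starts.
km <- kmeans(pct, centers = 6, nstart = 25)

# Cluster membership, sorted by cluster number.
print(sort(km$cluster))

# Assumed definition of the printed percentage: between_SS / total_SS.
explained <- paste0(round(100 * km$betweenss / km$totss, 1), "%")
print(explained)
```

If this reading is right, the percentage rises automatically as the number of clusters grows, so values such as 90% with 6 or 9 clusters on 14 cases should not be taken as strong evidence of structure by themselves.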

(NEW) ALSO COMPELLING: Consider the touching and talking dataset, including the percent of time in all the touching and talking behaviors described in the previous square multiples graphs. This data is combined across conditions (i.e. a weighted sum of the percents across conditions for each participant, the 321 graphs). It also considers only the time in which the subject is doing something (touching or talking) and does not consider the researcher’s behavior.

## S03.321 S07.321 S15.321 S01.321 S04.321 S06.321 S08.321 S09.321 S13.321 S14.321 
##       1       1       1       2       2       2       2       2       3       3 
## S02.321 S05.321 S12.321 S10.321 
##       4       4       4       5 
## [1] "95.4%"
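The "weighted sum of the percents across conditions" used for the combined (321) data can be illustrated with a small worked example. The weights are not stated in this document; the sketch assumes each condition is weighted by its share of the participant's total time, and all numbers are made up.

```r
# Hypothetical values for one participant (minutes), conditions 1, 2, 3.
condition_min <- c(8, 9, 18)
touch_min     <- c(2, 4.5, 3.6)

# Percent of time touching within each condition.
pct_by_condition <- 100 * touch_min / condition_min

# Weighted sum of the percents, weights = share of total time (assumption).
weights      <- condition_min / sum(condition_min)
pct_weighted <- sum(weights * pct_by_condition)

# Pooled percent computed directly from the summed times.
pct_pooled <- 100 * sum(touch_min) / sum(condition_min)

cat("per condition:", pct_by_condition, "\n")
cat("weighted:", round(pct_weighted, 2), " pooled:", round(pct_pooled, 2),
    " unweighted mean:", round(mean(pct_by_condition), 2), "\n")
```

With time-share weights, the weighted sum equals the pooled percent, whereas a plain average of the three percents over-represents the shorter conditions.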

Consider the touching and talking dataset by percent and by condition. That is, each participant has three sets of percentages of time in the touching and talking behaviors described previously, one set for each condition, arranged as one row with many columns.

## S02 S03 S04 S05 S08 S12 S07 S15 S01 S06 S09 S13 S14 S10 
##   1   1   1   1   1   1   2   2   3   3   3   3   3   4 
## [1] "66.4%"

Consider the touching and talking dataset, including the amount of time in all the touching and talking behaviors described in the previous square multiples graphs. The total time per condition is also included here. This data is combined across conditions (i.e. touch time for S05 = touch time S05 condition 1 + touch time S05 condition 2 + touch time S05 condition 3).

## S08 S12 S03 S01 S04 S09 S13 S10 S06 S07 S14 S02 S05 S15 
##   1   1   2   3   4   4   4   5   6   7   7   8   9   9 
## [1] "93.2%"

Second: Treat the conditions as independent entities.

Treat the conditions as independent entities. One might expect that either (1) the clusters would form by participant (e.g. all of S05’s three conditions would group together), showing that individuals are particular in their touching/talking behaviors, or that (2) the clusters would form by condition (e.g. all condition 3’s would group together), showing that the questions asked affect people’s touching/talking behaviors.

Much of the data does not seem to converge to an optimal number of clusters. Specifically:

  1. the touch ratios data: how much different areas of the model were/weren’t touched as measured from the images
    (1a) touch ratios just including the overall touching of the model and the country specific touching of the model–NO OPTIMAL NUMBER OF CLUSTERS
    (1b) touch ratios including the two values in 1a and also the touch ratios for particular ring regions in each circle–8 CLUSTERS

  2. the talking and touching time data: how much time participants and researchers spent touching and talking and various permutations of these actions, as measured from the transcripts and video data
    (2b) the talking and touching time data in amounts (seconds) for each condition–7 CLUSTERS
    (2d) the talking and touching time data as ratios relative to the overall time for that participant and condition–6 CLUSTERS

(NEW) Consider the touching and talking dataset, including the percent of time in all the touching and talking behaviors described in the previous square multiples graphs. This version of the data only considers when the subject is doing something, in other words, the second blue-only set of square multiples graphs above.

## S02.1 S04.1 S08.1 S12.1 S02.2 S03.2 S02.3 S12.3 S03.1 S07.1 S15.1 S03.3 S05.1 
##     1     1     1     1     1     1     1     1     2     2     2     2     3 
## S09.1 S10.1 S13.1 S01.2 S08.2 S12.2 S01.3 S05.3 S06.3 S08.3 S09.3 S01.1 S06.1 
##     3     3     3     3     3     3     3     3     3     3     3     4     4 
## S14.1 S06.2 S09.2 S13.2 S14.2 S04.3 S07.3 S13.3 S14.3 S15.3 S04.2 S05.2 S07.2 
##     4     4     4     4     4     4     4     4     4     4     5     5     5 
## S10.2 S15.2 S10.3 
##     5     5     5 
## [1] "89.7%"
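One way to check the two expectations stated at the start of this section (clusters forming by participant or by condition) is to cross-tabulate the assignments. The sketch below copies the cluster assignments from the output directly above; the parsing of labels into subject and condition is an assumption about the `subject.condition` naming.

```r
# Cluster assignments copied from the output above (label = subject.condition).
labels <- c(
  "S02.1", "S04.1", "S08.1", "S12.1", "S02.2", "S03.2", "S02.3", "S12.3",
  "S03.1", "S07.1", "S15.1", "S03.3",
  "S05.1", "S09.1", "S10.1", "S13.1", "S01.2", "S08.2", "S12.2", "S01.3",
  "S05.3", "S06.3", "S08.3", "S09.3",
  "S01.1", "S06.1", "S14.1", "S06.2", "S09.2", "S13.2", "S14.2", "S04.3",
  "S07.3", "S13.3", "S14.3", "S15.3",
  "S04.2", "S05.2", "S07.2", "S10.2", "S15.2", "S10.3"
)
cluster <- rep(1:5, times = c(8, 4, 12, 12, 6))

# Split each label into its subject and condition parts.
subject   <- sub("\\..*$", "", labels)
condition <- sub("^.*\\.", "", labels)

# Do clusters line up with conditions?
by_condition <- table(cluster, condition)
print(by_condition)

# Which subjects have all three conditions in the same cluster?
n_clusters_per_subject <- tapply(cluster, subject,
                                 function(x) length(unique(x)))
same_cluster_subjects <-
  names(n_clusters_per_subject)[n_clusters_per_subject == 1]
print(same_cluster_subjects)
```

For these assignments, only S02 and S14 keep all three conditions in one cluster, and no cluster consists of a single condition, which is in line with the TLDR note that no obvious pattern stands out.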

Consider the touch ratios dataset, including the following touch ratios: overall, country, outer, middle, inner, center.

##  S2.3  S5.1  S7.1  S7.3  S8.1 S12.1  S4.2  S4.3  S9.2 S12.2 S15.3  S6.1  S8.2 
##     1     1     1     1     1     1     2     2     2     2     2     3     3 
## S13.1 S13.2 S13.3 S14.1 S14.2 S14.3  S1.3  S2.2  S3.1  S3.2  S3.3  S7.2 S15.1 
##     3     3     3     3     3     3     4     4     4     4     4     4     4 
##  S9.3 S10.2 S10.3  S2.1 S10.1  S1.1  S4.1  S6.2  S8.3 S12.3 S15.2  S1.2  S5.2 
##     5     5     5     6     6     7     7     7     7     7     7     8     8 
##  S5.3  S6.3  S9.1 
##     8     8     8 
## [1] "92.8%"

Consider the touching and talking dataset, including the amount of time in all the touching and talking behaviors described in the previous square multiples graphs. The total time per condition is also included here.

## S01.1 S02.1 S02.2 S04.2 S06.2 S08.2 S13.2 S15.2 S05.3 S07.3 S13.3 S14.3 S15.3 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
## S06.3 S10.3 S04.1 S05.1 S06.1 S07.1 S08.1 S09.1 S10.1 S13.1 S14.1 S15.1 S07.2 
##     2     2     3     3     3     3     3     3     3     3     3     3     3 
## S09.2 S14.2 S10.2 S02.3 S03.3 S01.2 S01.3 S04.3 S08.3 S09.3 S03.1 S12.1 S03.2 
##     3     3     4     5     5     6     6     6     6     6     7     7     7 
## S05.2 S12.2 S12.3 
##     7     7     7 
## [1] "83.1%"

Consider the touching and talking dataset by percent. The same touching and talking behaviors are included here but as a percent of the time in that condition.

## S07.2 S10.2 S15.2 S01.1 S05.1 S06.1 S13.1 S14.1 S01.2 S09.2 S13.2 S14.2 S01.3 
##     1     1     1     2     2     2     2     2     2     2     2     2     2 
## S04.3 S05.3 S07.3 S08.3 S09.3 S13.3 S14.3 S15.3 S07.1 S15.1 S03.3 S02.1 S09.1 
##     2     2     2     2     2     2     2     2     3     3     3     4     4 
## S10.1 S02.2 S04.2 S02.3 S12.3 S06.2 S06.3 S10.3 S03.1 S04.1 S08.1 S12.1 S03.2 
##     4     4     4     4     4     5     5     5     6     6     6     6     6 
## S05.2 S08.2 S12.2 
##     6     6     6 
## [1] "78.7%"