The subjects spent varying amounts of time (in minutes) with the model in each condition of the experiment:
## $`1`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.841 6.251 8.157 7.810 9.135 12.686
##
## $`2`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.204 6.941 8.898 9.193 10.829 19.347
##
## $`3`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11.15 15.41 18.80 18.45 21.96 25.75
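A per-condition summary like the one above can be produced with `tapply()`. This is a minimal sketch on made-up numbers: the column names `condition` and `condition.time_min` follow the names seen elsewhere in this report, but the data frame `times` and its values are assumptions for illustration only.

```r
# Hypothetical data: three conditions, four made-up condition times (minutes) each
times <- data.frame(
  condition = rep(1:3, each = 4),
  condition.time_min = c(4, 6, 8, 10,  5, 7, 9, 11,  12, 16, 20, 24)
)

# One six-number summary per condition, returned as a named list
by_condition <- tapply(times$condition.time_min, times$condition, summary)
print(by_condition)
```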
The amount of time spent within each condition did not seem to depend on whether the model was convex or concave.
Different subjects spent different amounts of time with the model for each condition.
Word counts and time speaking for the researcher by condition time:
##
## Call:
## lm(formula = words_I ~ condition.time_min, data = timesResearcher)
##
## Residuals:
## Min 1Q Median 3Q Max
## -513.80 -93.50 2.87 101.41 539.82
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 133.25 59.27 2.248 0.0302 *
## condition.time_min 38.15 4.47 8.535 1.51e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 174.3 on 40 degrees of freedom
## Multiple R-squared: 0.6455, Adjusted R-squared: 0.6367
## F-statistic: 72.84 on 1 and 40 DF, p-value: 1.512e-10
##
## Call:
## lm(formula = speechtime_I ~ condition.time_min, data = timesResearcher)
##
## Residuals:
## Min 1Q Median 3Q Max
## -268.13 -85.87 -41.46 56.67 328.16
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 125.53 44.56 2.817 0.007488 **
## condition.time_min 12.31 3.36 3.664 0.000721 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 131 on 40 degrees of freedom
## Multiple R-squared: 0.2513, Adjusted R-squared: 0.2326
## F-statistic: 13.42 on 1 and 40 DF, p-value: 0.0007206
Word counts and time speaking for the subject:
##
## Call:
## lm(formula = words_P ~ condition.time_min, data = timesResearcher)
##
## Residuals:
## Min 1Q Median 3Q Max
## -514.64 -115.38 -8.53 111.52 564.54
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 156.122 70.472 2.215 0.0325 *
## condition.time_min 54.548 5.315 10.264 9.07e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 207.2 on 40 degrees of freedom
## Multiple R-squared: 0.7248, Adjusted R-squared: 0.7179
## F-statistic: 105.3 on 1 and 40 DF, p-value: 9.072e-13
##
## Call:
## lm(formula = speechtime_P ~ condition.time_min, data = timesResearcher)
##
## Residuals:
## Min 1Q Median 3Q Max
## -334.08 -47.34 -6.52 27.43 268.02
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 80.846 38.731 2.087 0.0433 *
## condition.time_min 25.820 2.921 8.840 5.97e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 113.9 on 40 degrees of freedom
## Multiple R-squared: 0.6614, Adjusted R-squared: 0.653
## F-statistic: 78.14 on 1 and 40 DF, p-value: 5.971e-11
All four of these linear models have positive slopes that are highly significant (p < 0.001). More time spent with the model corresponded to more time talking and more words spoken for both the subject and the researcher.
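For reference, the slope and its p-value can be pulled out of a fitted model directly instead of being read off the printed summary. The sketch below uses simulated data as a stand-in for `timesResearcher` (42 rows, matching the 40 residual degrees of freedom reported above); the simulated values and the object names `sim`, `fit`, `slope` and `p_value` are assumptions, not the study data.

```r
set.seed(1)
# Simulated stand-in for timesResearcher: 42 rows (14 subjects x 3 conditions)
sim <- data.frame(condition.time_min = runif(42, min = 4, max = 26))
sim$words_P <- 150 + 55 * sim$condition.time_min + rnorm(42, sd = 200)

fit <- lm(words_P ~ condition.time_min, data = sim)

# Coefficient table: rows are terms, columns are estimate, SE, t value, p-value
coefs   <- summary(fit)$coefficients
slope   <- coefs["condition.time_min", "Estimate"]
p_value <- coefs["condition.time_min", "Pr(>|t|)"]
print(c(slope = slope, p_value = p_value))
```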
One might guess that spending more time with the model would lead to more area touched by the subject or more time touching by the subject.
Condition time and area of the model touched:
##
## Call:
## lm(formula = touchratio ~ condition.time_min, data = timesResearcher)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.14409 -0.09358 -0.00642 0.05781 0.34305
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.128083 0.037185 3.444 0.00136 **
## condition.time_min 0.001169 0.002804 0.417 0.67896
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1093 on 40 degrees of freedom
## Multiple R-squared: 0.004327, Adjusted R-squared: -0.02057
## F-statistic: 0.1738 on 1 and 40 DF, p-value: 0.679
Condition time and time touching by the subject:
##
## Call:
## lm(formula = touch.time_mS ~ condition.time_min, data = timesResearcher)
##
## Residuals:
## Min 1Q Median 3Q Max
## -251260 -129507 -70135 52950 743942
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 68386 68338 1.001 0.3230
## condition.time_min 12200 5154 2.367 0.0229 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 200900 on 40 degrees of freedom
## Multiple R-squared: 0.1229, Adjusted R-squared: 0.1009
## F-statistic: 5.603 on 1 and 40 DF, p-value: 0.02285
##
## Call:
## lm(formula = touch.time.ratio_per ~ condition.time_min, data = timesResearcher)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.15 -18.20 -4.92 12.57 53.95
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.7635 8.4379 4.001 0.000265 ***
## condition.time_min -0.4213 0.6364 -0.662 0.511779
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24.81 on 40 degrees of freedom
## Multiple R-squared: 0.01084, Adjusted R-squared: -0.01389
## F-statistic: 0.4382 on 1 and 40 DF, p-value: 0.5118
The slopes of these models are fairly flat. The area-touched slope is not significant (p = 0.68). The time-touching slope is significant (p = 0.023), but much less so than the speaking-related slopes. That relationship and its significance go away when the percentage of time touching is considered instead (p = 0.51).
There are six different states participants can be in based on three
binaries:
Subject Touch : Y / N
Subject Talk: Y / N
Researcher Talk: Y / N
There are only six states (not eight) because the subject and the researcher cannot be talking at the same time, which rules out two combinations: one with the subject touching and one with the subject not touching.
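The count of six can be checked by enumerating the combinations. This is a small sketch; the column names `subject_touch`, `subject_talk` and `researcher_talk` are illustrative labels for the three binaries, not names taken from the dataset.

```r
# All 2 x 2 x 2 = 8 combinations of the three binaries
states <- expand.grid(
  subject_touch   = c(TRUE, FALSE),
  subject_talk    = c(TRUE, FALSE),
  researcher_talk = c(TRUE, FALSE)
)

# Drop the combinations where subject and researcher talk at the same time
allowed <- states[!(states$subject_talk & states$researcher_talk), ]

print(nrow(states))
print(allowed)
```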
How is this time distributed across the subjects?
How much time do subjects spend on different behaviors by condition?
Subjects are rows and conditions are columns. The 321 condition (right-hand column) is the overall behavior for that participant across all three conditions.
Also consider a limited version of the dataset that focuses on “action” from the subject. Percentages here are computed within that subset of the data. That is, it only accounts for the time in which the subject is talking and/or touching. The time when nothing is happening, and the time when the researcher is talking but no touching is happening, are not accounted for here.
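As a rough sketch of the “action only” percentages: drop the two states in which the subject is neither touching nor talking, then renormalize what is left. The state names and the seconds below are made up, and folding the researcher's talk into “touch only” is an assumption about how the subset was built.

```r
# Hypothetical seconds per state for one subject in one condition
secs <- c(
  touch_talkS    = 60,   # subject touching and talking
  touch_talkR    = 40,   # subject touching, researcher talking
  touch_silent   = 20,   # subject touching, nobody talking
  notouch_talkS  = 80,   # subject talking, not touching
  notouch_talkR  = 150,  # excluded: researcher talking, no touching
  notouch_silent = 50    # excluded: nothing happening
)

# Collapse to the subject's own behavior (assumed grouping)
action <- c(
  touch_and_talk = secs[["touch_talkS"]],
  touch_only     = secs[["touch_talkR"]] + secs[["touch_silent"]],
  talk_only      = secs[["notouch_talkS"]]
)

# Percent of "action" time; the three values sum to 100 by construction
action_pct <- 100 * action / sum(action)
print(action_pct)
```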
Cluster analysis process based on: https://www.statology.org/k-means-clustering-in-r/. Should dig into the statistical details at some point to confirm that the approach is appropriate for our case.
Start by making simple plots of the subjects’ talking and touching behavior data. Note that although these plots are 3D, all the data lie in a plane, since the three values are percentages of a whole and must sum to 100.
First, plot the cumulative touch/talking percentage for each participant, combined across all the conditions.
Second, make a 3D plot of the touch/talking percentage for each participant by condition.
A number of different datasets were considered in this analysis, including:
These were also considered both by holding all conditions together for one participant (i.e. looking for similarities between participants, n = 14) and by treating the conditions as independent of participant (i.e. looking for similarities between conditions or participants, n = 14 * 3).
All the previously described touch and touching-talking datasets were screened for possible clustering behaviors. Many of them did not settle on a “correct” number of clusters. That is, there is no “elbow” in the within sum of squares graphs and/or no maximum (at a reasonable number of clusters) in the gap statistic graph. In some cases there may be signs of an optimal number of clusters, but that value is quite high relative to the number of data points (e.g. suggesting 10 clusters for 14 data points). The datasets that did or did not move towards a reasonable and definitive number of clusters are described in each section.
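The “elbow” screening can be sketched with base R alone by computing the total within-cluster sum of squares for a range of k. The data here are simulated (14 made-up “participants” in three loose groups), so the clear elbow at k = 3 is a property of the simulation, not of the study data; a gap statistic would need an extra package and is left out.

```r
set.seed(42)
# Simulated data: 14 rows, 3 behavior percentages, three well-separated groups
centers <- rbind(c(20, 30, 50), c(60, 20, 20), c(30, 60, 10))
x <- centers[rep(1:3, times = c(5, 5, 4)), ] +
  matrix(rnorm(14 * 3, sd = 2), ncol = 3)
x <- scale(x)  # standardize columns before k-means

# Total within-cluster sum of squares for k = 1..8
ks  <- 1:8
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)
print(round(wss, 2))

# An elbow plot would be: plot(ks, wss, type = "b")
```

If the curve keeps dropping smoothly with no visible bend, that is the “no optimal number of clusters” case described above.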
Keep all the data for one participant together (i.e. n = 14). This shows whether some participants behave similarly to each other, perhaps suggesting that there are different “kinds” of people in terms of touch habits.
Much of the data does not seem to converge to an optimal number of clusters. Specifically:
(1) The touch ratios data: how much different areas of the model were or were not touched, as measured from the images.
(1a) Touch ratios including just the overall touching of the model and the country-specific touching of the model: NO OPTIMAL NUMBER OF CLUSTERS
(1b) Touch ratios including the two values in 1a and also the touch ratios for particular ring regions in each circle: NO OPTIMAL NUMBER OF CLUSTERS
(2) The talking and touching time data: how much time participants and researchers spent touching and talking and various permutations of these actions, as measured from the transcripts and video data.
(2a) The talking and touching time data in amounts (seconds) combined across conditions (i.e. the final overall square graph in each row in the section 4 figure): 9 CLUSTERS
(2b) The talking and touching time data in amounts (seconds) for each condition: NO OPTIMAL NUMBER OF CLUSTERS
(2c) The talking and touching time data as ratios relative to the overall time combined across conditions (i.e. the final overall square graph in each row in the section 4 figure): 6 CLUSTERS
(2d) The talking and touching time data as ratios relative to the overall time for that participant and condition: 4 CLUSTERS
MOST COMPELLING: Consider the touching and talking dataset, including the percent of time in all the touching and talking behaviors described in the previous square multiples graphs. This data is combined across conditions (i.e. a weighted sum of the percents across conditions for each participant).
## S02 S10 S06 S01 S09 S13 S14 S04 S05 S08 S12 S03 S07 S15
## 1 2 3 4 4 4 4 5 5 5 5 6 6 6
## [1] "90.2%"
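For orientation, output of this shape (sorted cluster labels followed by a percentage) could come from something like the sketch below. Whether the reported percentage is in fact the between-cluster share of the total sum of squares is an assumption; the participants `P01` to `P06` and their values are made up.

```r
set.seed(7)
# Hypothetical behavior percentages for six made-up participants
x <- rbind(
  P01 = c(10, 70, 20), P02 = c(12, 68, 20), P03 = c(55, 25, 20),
  P04 = c(58, 22, 20), P05 = c(30, 30, 40), P06 = c(28, 33, 39)
)

km <- kmeans(scale(x), centers = 3, nstart = 25)

# Participants ordered by cluster label
print(sort(km$cluster))

# Share of total variation captured by the clustering (assumed meaning of the %)
ratio <- km$betweenss / km$totss
pct_between <- paste0(round(100 * ratio, 1), "%")
print(pct_between)
```

This ratio always rises as k grows, so a high value with many clusters relative to the number of rows says little on its own.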
(NEW) ALSO COMPELLING: Consider the touching and talking dataset, including the percent of time in all the touching and talking behaviors described in the previous square multiples graphs. This data is combined across conditions (i.e. a weighted sum of the percents across conditions for each participant, the 321 graphs). It also considers only the time in which the subject is doing something (touching or talking) and does not consider the researcher’s behavior.
## S03.321 S07.321 S15.321 S01.321 S04.321 S06.321 S08.321 S09.321 S13.321 S14.321
## 1 1 1 2 2 2 2 2 3 3
## S02.321 S05.321 S12.321 S10.321
## 4 4 4 5
## [1] "95.4%"
Consider the touching and talking dataset by percent and by condition. That is, each participant has three sets of the percent of time in all the touching and talking behaviors described previously, one set for each condition, laid out as a single row with many columns.
## S02 S03 S04 S05 S08 S12 S07 S15 S01 S06 S09 S13 S14 S10
## 1 1 1 1 1 1 2 2 3 3 3 3 3 4
## [1] "66.4%"
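The “one row, many columns” layout can be built from long-format data with base R's `reshape()`. A minimal sketch, assuming a long table with one row per subject and condition; the names `long`, `touch_pct` and `talk_pct` and all values are hypothetical.

```r
# Hypothetical long-format data: one row per subject x condition
long <- data.frame(
  subject   = rep(c("P01", "P02"), each = 3),
  condition = rep(1:3, times = 2),
  touch_pct = c(20, 35, 10, 50, 45, 60),
  talk_pct  = c(80, 65, 90, 50, 55, 40)
)

# One row per subject; each behavior gets one column per condition
wide <- reshape(long, idvar = "subject", timevar = "condition",
                direction = "wide")
rownames(wide) <- wide$subject

# Numeric feature matrix, ready for scale() and kmeans()
features <- as.matrix(wide[, -1])
print(features)
```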
Consider the touching and talking dataset, including the amount of time in all the touching and talking behaviors described in the previous square multiples graphs. The total time per condition is also included here. This data is combined across conditions (i.e. touch time for S05 = touch time S05 condition 1 + touch time S05 condition 2 + touch time S05 condition 3).
## S08 S12 S03 S01 S04 S09 S13 S10 S06 S07 S14 S02 S05 S15
## 1 1 2 3 4 4 4 5 6 7 7 8 9 9
## [1] "93.2%"
Treat the conditions as independent entities. One might expect that either (1) the clusters would form by participant (e.g. all three of S05’s conditions would group together), showing that individuals are particular in their touching/talking behaviors, or (2) the clusters would form by condition (e.g. all the condition 3’s would group together), showing that the questions asked impact people’s touching/talking behaviors.
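One simple way to check either expectation is to cross-tabulate the cluster labels against participant and against condition. The sketch below assumes labels named `<subject>.<condition>`, as in the outputs in this section; the label vector `cl` itself is made up.

```r
# Hypothetical cluster labels named "<subject>.<condition>"
cl <- c(P01.1 = 1, P01.2 = 1, P01.3 = 1,
        P02.1 = 2, P02.2 = 2, P02.3 = 2,
        P03.1 = 1, P03.2 = 2, P03.3 = 3)

# Split each name into its subject and condition parts
subject   <- sub("\\..*$", "", names(cl))
condition <- sub("^.*\\.", "", names(cl))

# Rows concentrated in one column suggest grouping by that factor
by_subject   <- table(subject, cluster = cl)
by_condition <- table(condition, cluster = cl)
print(by_subject)
print(by_condition)
```

In this toy example P01 and P02 each sit entirely in one cluster (grouping by participant), while no condition does.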
Much of the data does not seem to converge to an optimal number of clusters. Specifically:
(1) The touch ratios data: how much different areas of the model were or were not touched, as measured from the images.
(1a) Touch ratios including just the overall touching of the model and the country-specific touching of the model: NO OPTIMAL NUMBER OF CLUSTERS
(1b) Touch ratios including the two values in 1a and also the touch ratios for particular ring regions in each circle: 8 CLUSTERS
(2) The talking and touching time data: how much time participants and researchers spent touching and talking and various permutations of these actions, as measured from the transcripts and video data.
(2b) The talking and touching time data in amounts (seconds) for each condition: 7 CLUSTERS
(2d) The talking and touching time data as ratios relative to the overall time for that participant and condition: 6 CLUSTERS
(NEW) Consider the touching and talking dataset, including the percent of time in all the touching and talking behaviors described in the previous square multiples graphs. This version of the data only considers when the subject is doing something, in other words, the second blue-only set of square multiples graphs above.
## S02.1 S04.1 S08.1 S12.1 S02.2 S03.2 S02.3 S12.3 S03.1 S07.1 S15.1 S03.3 S05.1
## 1 1 1 1 1 1 1 1 2 2 2 2 3
## S09.1 S10.1 S13.1 S01.2 S08.2 S12.2 S01.3 S05.3 S06.3 S08.3 S09.3 S01.1 S06.1
## 3 3 3 3 3 3 3 3 3 3 3 4 4
## S14.1 S06.2 S09.2 S13.2 S14.2 S04.3 S07.3 S13.3 S14.3 S15.3 S04.2 S05.2 S07.2
## 4 4 4 4 4 4 4 4 4 4 5 5 5
## S10.2 S15.2 S10.3
## 5 5 5
## [1] "89.7%"
Consider the touch ratios dataset, including the following touch ratios: overall, country, outer, middle, inner, center.
## S2.3 S5.1 S7.1 S7.3 S8.1 S12.1 S4.2 S4.3 S9.2 S12.2 S15.3 S6.1 S8.2
## 1 1 1 1 1 1 2 2 2 2 2 3 3
## S13.1 S13.2 S13.3 S14.1 S14.2 S14.3 S1.3 S2.2 S3.1 S3.2 S3.3 S7.2 S15.1
## 3 3 3 3 3 3 4 4 4 4 4 4 4
## S9.3 S10.2 S10.3 S2.1 S10.1 S1.1 S4.1 S6.2 S8.3 S12.3 S15.2 S1.2 S5.2
## 5 5 5 6 6 7 7 7 7 7 7 8 8
## S5.3 S6.3 S9.1
## 8 8 8
## [1] "92.8%"
Consider the touching and talking dataset, including the amount of time in all the touching and talking behaviors described in the previous square multiples graphs. The total time per condition is also included here.
## S01.1 S02.1 S02.2 S04.2 S06.2 S08.2 S13.2 S15.2 S05.3 S07.3 S13.3 S14.3 S15.3
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## S06.3 S10.3 S04.1 S05.1 S06.1 S07.1 S08.1 S09.1 S10.1 S13.1 S14.1 S15.1 S07.2
## 2 2 3 3 3 3 3 3 3 3 3 3 3
## S09.2 S14.2 S10.2 S02.3 S03.3 S01.2 S01.3 S04.3 S08.3 S09.3 S03.1 S12.1 S03.2
## 3 3 4 5 5 6 6 6 6 6 7 7 7
## S05.2 S12.2 S12.3
## 7 7 7
## [1] "83.1%"
Consider the touching and talking dataset by percent. The same touching and talking behaviors are included here but as a percent of the time in that condition.
## S07.2 S10.2 S15.2 S01.1 S05.1 S06.1 S13.1 S14.1 S01.2 S09.2 S13.2 S14.2 S01.3
## 1 1 1 2 2 2 2 2 2 2 2 2 2
## S04.3 S05.3 S07.3 S08.3 S09.3 S13.3 S14.3 S15.3 S07.1 S15.1 S03.3 S02.1 S09.1
## 2 2 2 2 2 2 2 2 3 3 3 4 4
## S10.1 S02.2 S04.2 S02.3 S12.3 S06.2 S06.3 S10.3 S03.1 S04.1 S08.1 S12.1 S03.2
## 4 4 4 4 4 5 5 5 6 6 6 6 6
## S05.2 S08.2 S12.2
## 6 6 6
## [1] "78.7%"