Replication of Canonical Visual Size for Real-World Objects by Talia Konkle and Aude Oliva (2011, Journal of Experimental Psychology: Human Perception and Performance)

Introduction

For this project, I will be replicating Experiment 1 of Canonical Visual Size for Real-World Objects by Talia Konkle and Aude Oliva (2011, Journal of Experimental Psychology: Human Perception and Performance). This study explores how internal object representations may reflect real-world object size and finds that real-world objects have consistent visual sizes, which participants reproduce when drawing objects from memory. Comparing data from a computer-based size categorization task with data from an in-person object drawing task, the authors further find a relationship between the drawn visual size of an object and its assumed real-world size. Since my research interests are in perception and the effects of multidimensional object properties on perception, this examination of the internal representation of object size interests me greatly.

There are two main parts to this experiment. The first part is the computer-based size categorization task. In this task, participants see a set of 100 images of real-world objects selected from a commercial database (Hemera Photo-Objects, Vol. I and II) and are asked to sort the objects into two groups by their real-world size, with large objects on one side of the screen and small objects on the other. Participants then split each of those two groups again into large and small groups, and the process repeats until there are 8 groups in total, with all objects ranked by size. This provides a measure of assumed size for each object. For my replication, I will not be running this task and will instead use the original authors’ size ranks so that I can better replicate the main effect found in the object drawing task.
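To make the repeated splitting concrete, here is a minimal sketch of how three rounds of large/small sorting map onto 8 size ranks. The sort decisions below are hypothetical illustrations, not data from the original study, and the column names are my own.

```r
# Hypothetical sort decisions for three objects:
# 0 = placed on the "small" side, 1 = placed on the "large" side
sorts <- data.frame(
  object = c("paperclip", "chair", "light house"),
  round1 = c(0, 1, 1),  # first split of all objects
  round2 = c(0, 0, 1),  # split within the round-1 group
  round3 = c(0, 0, 1)   # split within the round-2 group
)

# Three binary decisions give 2^3 = 8 groups; read them as a binary number
sorts$size_rank <- 1 + 4 * sorts$round1 + 2 * sorts$round2 + sorts$round3
sorts
```

Each additional round doubles the number of groups (2, then 4, then 8), which is why three rounds are enough to produce the 8 size ranks.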

The second part of the experiment is an in-person drawing task that measures size properties of existing object representations. In this task, participants are given 18 sheets of paper and a list of objects to draw; the objects are a subset of those whose assumed sizes were measured in the first task. Participants are told to draw one object per page and that their artistic talent is not being measured. They have a minute to draw each object and are unaware that object size is the real measure. The sizes of the drawn objects are then compared with each other to get a ranked order of the sizes at which the objects are internally represented. In the original paper, the authors used paper size as an additional variable to see whether the amount of available space changes the size at which participants draw an object, but doing so required three times as many participants so that they could have 22 per paper size group. Since time is a constraint, I plan to omit the paper size question and test participants at only one paper size.

The main challenge of collecting data for this replication will be recruiting and collecting data from psych 1 students for the in-person object drawing portion of the study. This requires me to learn how to use the psych 1 participant pool and recruit in-person participants. This is also very achievable since there are plenty of resources to help me do this. * Update: Since the psych 1 participant pool is no longer an option, I will be collecting data by recruiting any Stanford students to complete the study. I will likely go to a crowded area on campus and entice people using snacks (like the original study did). If this fails, I will need to recruit friends and friends-of-friends from a variety of backgrounds (not just in the psych department) who have no knowledge about what the study may be.

Methods

Power Analysis

22 participants will be recruited to match the original sample size and power.

Planned Sample

22 in-person participants: age range 18-35, no exclusion criteria mentioned in original study

Materials

Consent form
Object drawing task:
- 7.3 x 11 inch sheets of paper (18 per participant)
- printed list of objects (1 per participant); link to Python code that generates the lists

Procedure

Object drawing task:

Study procedure: “Participants sat at a table and were given 18 sheets of paper (all of the same size) and a list of items to draw. They were instructed to draw one object per page and were explicitly told that we were not interested in artistic skills. We told participants to draw each object relatively quickly (within 1 min). When delivering the instructions, the word “size” was never used. The list of items contained 16 different objects that spanned the range of real-world sizes, with two objects at each size rank. The objects were: paperclip, key, pet goldfish, apple, hairdryer, running shoe, backpack, computer monitor, German shepherd, chair, floor lamp, soda machine, car, dump truck, 1-story house, light house. The order of objects was randomized for each observer. After all 16 objects had been drawn, observers next drew two scenes, a beach and a park, in random order.”

-no planned changes
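The randomization in the quoted procedure can be sketched in R as follows. This is only an illustration, not the linked Python script: the helper name `make_drawing_list` and the seed are my own, and it assumes the 16 objects are shuffled independently for each participant, followed by the two scenes in random order.

```r
# Objects and scenes as listed in the original procedure
objects <- c("paperclip", "key", "pet goldfish", "apple", "hairdryer",
             "running shoe", "backpack", "computer monitor",
             "German shepherd", "chair", "floor lamp", "soda machine",
             "car", "dump truck", "1-story house", "light house")
scenes <- c("beach", "park")

# Hypothetical helper: one randomized 18-item list for one participant
make_drawing_list <- function(objects, scenes) {
  # sample() on a vector without a size argument returns a random permutation
  c(sample(objects), sample(scenes))
}

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
lists <- lapply(1:22, function(i) make_drawing_list(objects, scenes))
lists[[1]]   # the list for the first participant
```

Each element of `lists` could then be printed on its own sheet, one per participant.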

Object drawing measuring procedure: “To measure the drawn size of the objects, all drawings were scanned at a fixed resolution (150 dots per inch). Custom software was written in MATLAB to automatically find the bounding box around the object in the image, and these dimensions were converted from pixels into centimeters using the known resolution. Drawn size was calculated as the length of the diagonal of the bounding box around the object. Using the diagonal, rather than as the height or width alone, better takes into account variation in aspect ratio and has been shown to account for more explained variance in relative size measures than height, width, principle axis, and area (Kosslyn, 1978). The software proceeded one drawing at a time, and each object’s identity and the corresponding bounding box was verified by eye.”

-planned changes: because I will be running far fewer participants, I plan to draw and measure bounding boxes by hand without any custom software
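For reference, the drawn-size measure itself is simple arithmetic. The sketch below shows the pixel-to-centimeter conversion implied by the 150 dpi scans and the equivalent calculation for hand measurements. The function names are hypothetical and this is not the original MATLAB software.

```r
# Diagonal of a bounding box in cm, from its width and height in pixels
bbox_diagonal_cm <- function(width_px, height_px, dpi = 150) {
  diagonal_px <- sqrt(width_px^2 + height_px^2)  # Pythagorean theorem
  diagonal_px / dpi * 2.54                       # pixels -> inches -> cm
}

# Same measure when width and height were measured by hand in cm
bbox_diagonal_from_cm <- function(width_cm, height_cm) {
  sqrt(width_cm^2 + height_cm^2)
}

bbox_diagonal_cm(450, 600)   # a 3 x 4 inch box at 150 dpi: 5 in = 12.7 cm
bbox_diagonal_from_cm(6, 8)  # a 6 x 8 cm box: 10 cm
```

Measuring the diagonal directly with a ruler, as planned here, gives the same quantity without the intermediate width and height.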

Analysis Plan

Object drawing task:
- drawing filtering for accurate size measurements: “The first author and one additional observer used a strict criterion to filter any drawings with extraneous objects (e.g. trash bins behind the dump truck, a worm sticking out of the apple, cords connecting the floor lamps, headlight beams on cars, air coming out of the hairdryer)”
- planned analysis: “A two-way ANOVA was conducted on drawn size with paper size as a between-subject factor and object size rank as a within-subject factor.”
- planned changes: only one paper size will be tested, so paper size will be left out of the analysis and only a one-way ANOVA will be run, testing the effect of real-world size rank on drawn size.
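One way to set up a one-way ANOVA with size rank as a within-subject factor in base R is sketched below. The data are simulated toy values (not the study's data), the object names `toy`, `cell_means`, and `rank_f` are my own, and the sketch assumes that the two objects at each rank are averaged per participant before the test.

```r
# Simulated toy data only: 4 subjects x 8 size ranks x 2 objects per rank
set.seed(42)
toy <- expand.grid(subject = factor(1:4), size_rank = 1:8, item = 1:2)
toy$size <- 6 + 1 * toy$size_rank + rnorm(nrow(toy), sd = 2)

# Average the two objects at each rank so there is one value per subject and rank
cell_means <- aggregate(size ~ subject + size_rank, data = toy, FUN = mean)

# Treat rank as an 8-level factor so the test has 7 numerator df
cell_means$rank_f <- factor(cell_means$size_rank)

# One-way repeated-measures ANOVA: rank varies within subject
rm_anova <- aov(size ~ rank_f + Error(subject / rank_f), data = cell_means)
summary(rm_anova)
```

Keeping `size_rank` numeric instead of converting it to a factor would test a single linear trend (1 numerator df) rather than differences among the 8 ranks.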

Differences from Original Study

Object drawing task:
- only one paper size (the medium size from the original study) will be tested, with 22 participants to match the original medium paper size group
- paper size effect will be left out of the analysis

Methods Addendum

Actual Sample

I did not go through with my plan to recruit completely random Stanford students for the study after discovering how long people found the study to be. Instead, I recruited 22 participants who were friends and friends-of-friends. None of the participants had any knowledge about what the study was about before participating. Participants were compensated with a Krispy Kreme Donut.

Differences from pre-data collection methods plan

Instead of drawing bounding boxes around each drawing for measuring, I used two L-shaped cardboard pieces to create a bounding box and then measured the created diagonal. I did this in order to not alter the original drawings that my participants had made.

Pilot A

pilot data link

For Pilot A, I had the pilot participant draw the 16 objects described in the original paper and compared the drawn sizes to the already known object size ranks. Drawn object size was determined by drawing a box around the extent of the drawing and measuring the diagonal of that box in centimeters. I then ran a one-way ANOVA relating size rank to the measured drawn object size.

####Load Relevant Libraries and Functions
library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

####Import data
pilot_data <- read_csv("./pilot_data.csv")

## Rows: 16 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): object
## dbl (2): drawn_size, size_rank
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

view(pilot_data)

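###run anova
#note: the formula has size_rank as the outcome; with a single predictor, the F test is the same with the two variables swapped
pilot_anova <- aov(size_rank ~ drawn_size, data = pilot_data)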
summary(pilot_anova)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## drawn_size   1  69.68   69.68   68.11 9.51e-07 ***
## Residuals   14  14.32    1.02                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

###make a plot
#ggplot(pilot_data, aes(size_rank, drawn_size))

ggplot(data = pilot_data, aes(x = size_rank, y = drawn_size)) +
  geom_point() + ylim(0, 25)

Data Analysis

Data

drawings by subject link

drawings by object link

size data link

Data preparation

####Import data
tidy_data <- read_csv("./raw_data_tidy.csv") #data is already in tidy format

## Rows: 352 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): subject, object
## dbl (2): size_rank, size
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

view(tidy_data)

Results

#plot with each subject
tidy_data %>% group_by(subject) %>% 
  ggplot(aes(x = size_rank, y = size, color = subject)) +
  labs(title = "Replication of Canonical Visual Size for Real-World Objects", x = "size rank", y = "size (cm)") +
  geom_point() +
  geom_smooth(method='lm', formula= y~x, se = FALSE)

#checking out the mean size for each object out of curiosity
object_mean_df <- tidy_data %>% group_by(object) %>% summarise(mean(size))
object_mean_df

## # A tibble: 16 × 2
##    object           `mean(size)`
##    <chr>                   <dbl>
##  1 1-story_house           13.6 
##  2 apple                    9.72
##  3 backpack                 9.49
##  4 car                     13.2 
##  5 chair                   10.1 
##  6 computer_monitor        11.9 
##  7 dump_truck              13.4 
##  8 floor_lamp              12.7 
##  9 German_shepherd         13.7 
## 10 hairdryer               10.3 
## 11 key                      8.27
## 12 lighthouse              12.3 
## 13 paperclip                6.34
## 14 pet_goldfish             5.15
## 15 running_shoe             9.64
## 16 soda_machine            13.5

#plot each subject but with one linear model line
final_results_plot <- tidy_data %>% group_by(object) %>% 
  ggplot(aes(x = size_rank, y = size)) +
  labs(title = "Replication of Canonical Visual Size for Real-World Objects", x = "size rank", y = "size (cm)") +
  geom_point() +
  geom_smooth(method='lm', formula= y~x, se = FALSE)

final_results_plot

Confirmatory analysis

I tried multiple ways of fitting the same model to see if different R packages gave me different results.

#Running the statistics!

#single linear regression
library(emmeans)
fit <- lm(size ~ size_rank, data=tidy_data)
fit %>% summary()

## 
## Call:
## lm(formula = size ~ size_rank, data = tidy_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.5430  -3.8361  -0.5303   3.2075  18.9804 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   6.5856     0.6127  10.749  < 2e-16 ***
## size_rank     0.9447     0.1213   7.786 7.89e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.216 on 350 degrees of freedom
## Multiple R-squared:  0.1476, Adjusted R-squared:  0.1452 
## F-statistic: 60.62 on 1 and 350 DF,  p-value: 7.893e-14

#anova on the same model (size_rank is numeric here, so this is a 1-df linear trend across the 8 ranks rather than a 1 by 8 within-subjects anova)
fit %>% aov()

## Call:
##    aov(formula = .)
## 
## Terms:
##                 size_rank Residuals
## Sum of Squares   1649.158  9521.020
## Deg. of Freedom         1       350
## 
## Residual standard error: 5.215641
## Estimated effects may be unbalanced

#joint tests is a function that does the same as the anova
fit %>% joint_tests()

##  model term df1 df2 F.ratio p.value
##  size_rank    1 350  60.624  <.0001

library(car)

fit %>% Anova()

## Anova Table (Type II tests)
## 
## Response: size
##           Sum Sq  Df F value    Pr(>F)    
## size_rank 1649.2   1  60.624 7.893e-14 ***
## Residuals 9521.0 350                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
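The agreement among these outputs can be checked by hand from the sums of squares that aov() printed. The sketch below only redoes that arithmetic with the values copied from the output above; the variable names are my own.

```r
# Values copied from the aov() output above
ss_rank  <- 1649.158  # sum of squares for size_rank
ss_resid <- 9521.020  # residual sum of squares
df_rank  <- 1
df_resid <- 350

# F is the ratio of the two mean squares
f_value <- (ss_rank / df_rank) / (ss_resid / df_resid)

# R-squared is the share of the total sum of squares explained by size_rank
r_squared <- ss_rank / (ss_rank + ss_resid)

round(f_value, 2)    # 60.62
round(r_squared, 4)  # 0.1476
```

Both values match the lm() summary, which is expected because every call in this section tests the same single-predictor model.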

Exploratory analyses

Comments are notes for myself.

#linear mixed effects model, the same thing but accounts for more possible variance
library(lmerTest)

fit2 <- lmer(formula = size ~ size_rank + (1 + size_rank | subject), data = tidy_data) #maximal linear mixed effect model, random slopes of size_rank for each level of subject as well as random intercepts; accounts for within-subject differences in mean and slope

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00378475 (tol = 0.002, component 1)

#exploring: accounting for random slopes and intercepts of size_rank for each participant; we didn't expect this to change much because the effect in the first anova was already so strong; R squared goes to 65%, which is better

#^this one failed to converge so we did this one instead with only random effect of subject:

#linear mixed effects model but only accounting for random intercepts, not slopes
fit2 <- lmer(formula = size ~ size_rank + (1 | subject), data = tidy_data) 

fit2 %>% summary()

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: size ~ size_rank + (1 | subject)
##    Data: tidy_data
## 
## REML criterion at convergence: 1967.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0986 -0.6407 -0.0338  0.5958  4.2344 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept) 14.76    3.841   
##  Residual             13.04    3.611   
## Number of obs: 352, groups:  subject, 22
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)   6.58563    0.92228  30.25540   7.141 5.78e-08 ***
## size_rank     0.94467    0.08399 329.00000  11.247  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr)
## size_rank -0.410

fit2 %>% joint_tests() #run the anova with the linear mixed effects model

##  model term df1 df2 F.ratio p.value
##  size_rank    1 329 126.489  <.0001

#find R-squared that takes into account within-sub variance
library(performance) #comparing R squares between models
fit %>% r2 #normal linear model (anova)

## # R2 for Linear Regression
##        R2: 0.148
##   adj. R2: 0.145

fit2 %>% r2() #corrected one

## # R2 for Mixed Models
## 
##   Conditional R2: 0.599
##      Marginal R2: 0.145

#R squared: explained variance
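To see where the two mixed-model R-squared values come from, the variance components printed by summary(fit2) can be combined by hand. This is a rough check under stated assumptions, not the internals of the performance package: it assumes size_rank is coded 1 to 8 with 44 drawings at each rank (22 subjects x 2 objects) and uses the usual variance-partition definitions. The variable names are my own.

```r
# Values copied from the summary(fit2) output above
var_subject  <- 14.76    # random-intercept (between-subject) variance
var_residual <- 13.04    # residual variance
slope        <- 0.94467  # fixed effect of size_rank

# Assumption: size_rank is coded 1..8 with 44 drawings at each rank
size_rank <- rep(1:8, each = 44)
var_fixed <- slope^2 * var(size_rank)  # variance of the fixed-effect predictions

total_var <- var_fixed + var_subject + var_residual

# Share of the non-fixed variance that is due to stable subject differences
icc <- var_subject / (var_subject + var_residual)

# Marginal R2: fixed effects only; conditional R2: fixed plus random effects
marginal_r2    <- var_fixed / total_var
conditional_r2 <- (var_fixed + var_subject) / total_var

round(c(icc = icc, marginal = marginal_r2, conditional = conditional_r2), 3)
```

Under these assumptions the marginal and conditional values come out at about 0.145 and 0.599, in line with the r2() output, and the intraclass correlation is about 0.53. In other words, roughly half of the variance left after accounting for size rank reflects participants drawing consistently larger or smaller overall.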

Discussion

Summary of Replication Attempt

Original Results: Original plot from Canonical Visual Size for Real-World Objects, Experiment 1.

My Results:

#plotting this again to be below the original plot
final_results_plot

Despite a lot of variation in individual participant drawing strategies and use of exemplars, the study replicated! I found a significant positive linear relationship between drawn object size and the size ranking of the objects: F(1, 350) = 60.62, p < .0001, R² = 0.148.

Commentary

Challenges and observations from data collection: While running the study, I observed immense variation in the strategies participants used. Some participants were very concerned about the time they had to draw each item. This led them to use a Pictionary-like strategy in which they drew whatever would convey the item as fast as possible. For example, instead of drawing a running shoe, two participants drew a shoe that was running in a race. Other participants were most concerned with how creative they could be with their drawings and focused on thinking outside the box. For example, for the object “pet goldfish”, one participant drew a goldfish plus a scene of King Midas touching the fish and turning it into gold. Another major issue in the object drawing task was variation in the exemplars participants chose to draw. For the “soda machine”, around half of the participants drew a vending machine while the other half drew a tabletop soda dispenser (such as the ones you would find next to the condiments counter at fast food restaurants). This caused distinct differences in the drawn size of the soda machine depending on which exemplar a participant chose. Another example was “dump truck”. In recent times, the term dump truck has taken on a different meaning in popular culture, which caused participants’ drawings of dump truck to vary.

Insight from exploratory models: When running the experiment, I noticed that participants seemed to use their first drawing as a reference for how large to draw the other objects. I was curious whether these differences in initial drawn size impacted the results. For an exploratory analysis, I ran linear mixed-effects models to see if my results would differ when accounting for these differences between participants. I first tried random slopes and random intercepts for each subject. This model failed to converge, so I next tried random intercepts only. This model converged and gave the same slope estimate as the initial linear model (about 0.94 cm per size rank), and the effect of size rank remained significant, F(1, 329) = 126.49, p < .0001.

Assessment of the meaning of the replication: Since the experiment replicated, I believe that it may be true that people draw objects in accurate sizes relative to other drawn objects. After observing the immense variation in how subjects completed the study and still replicating the result, I am unsure if this was a good method to measure internal representation of size. I would be curious to see if the other experiments from the study also replicate. This would provide insight into whether or not the theory as a whole is accurate.