How many ducks were in FCUL’s 2023 duck farm?

Tiago A. Marques

Introduction

As is usually the case, in the 5th lecture of Ecologia Numérica at FCUL (9th October 2023) I conducted an experiment that aims to illustrate a number of relevant sampling concepts in a relaxed environment.

This document describes a challenge set to the students in the “Ecologia Numérica” class on 9th October 2023. As with any good survey, there were pilot surveys in the two previous years, on the 4th October 2022 and the 27th September 2021. Those had different settings and different animals.

The experiment

  • Some key terms that might be important for students in Ecologia Numérica are highlighted in bold.

  • In fact, the challenge was announced on Twitter 90 minutes before class, but the students had not been explicitly told about it. The original post is here.

  • The challenge was for the students to guess how many ducks were in a duck farm. Well, kind of: the ducks were small rubber and plastic ducks in a transparent glass jar. The image in the next slide shows the jar from different perspectives.

The duck farm

The details

The students were told that they would have to guess the number of ducks in the jar, with a twist that depended on their student number:

  • students with an even number would get a cue
  • students with an odd number would close their eyes and get no cue (and hence have no clue!)

The cue was this: I showed the students with even numbers (and only them) 5 small ducks and 5 large ducks, and then added them to the jar.

The added information

  • The duck height level barely changed, if at all.

  • I hope that in the subconscious of these students an idea arose: “There are many, many, many more than 10 animals there, so I need to estimate a large number.” In a way, these students were allowed to calibrate their estimates.

  • The other students had no way to calibrate themselves. So I hope that, in the end, the even-numbered students will show less bias than the odd-numbered students. To recap, there will be several estimators, each based on a sample (the students) from an infinite population of potential students, say.

The estimators and the truth

A key aspect of the exercise is that there were in reality 346 animals in the jar after the 10 from the cue were put back in. It is this true number we want to estimate.

We will consider (at least) 3 estimators:

  • the average of all students;
  • the average of the odd-numbered students (this estimator was uncalibrated);
  • the average of the even-numbered students (this estimator was somewhat calibrated).
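
In symbols, and only as a sketch: the notation below ($g_i$ for the guess of respondent $i$, $S$ for a subset of respondents, $N$ for the true count) is introduced here for illustration and is not taken from the slides. Each estimator is the mean guess over a subset of respondents, and its bias is the gap between its expectation and the truth.

```latex
\hat{N}_S = \frac{1}{|S|} \sum_{i \in S} g_i,
\qquad
\operatorname{bias}(\hat{N}_S) = \mathbb{E}\left[\hat{N}_S\right] - N,
\qquad
N = 346
```

Taking $S$ to be all respondents, the odd-numbered students, or the even-numbered students gives the three estimators listed above.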

The survey

Then the students in class guessed the number of animals, and the results were reported on a Google Form here.

The data

This form created a Google spreadsheet (as Google Forms do!) that was downloaded by TAM and is further processed here. The file was downloaded as a .csv file (“Patos na Quinta da FCUL.csv”).

Read the data in

# A tibble: 6 × 5
  Timestamp How many ducks are l…¹ If you are a student…² Would you mind shari…³
  <chr>                      <dbl>                  <dbl> <chr>                 
1 2023/10/…                    250                     NA Female                
2 2023/10/…                    156                  52553 Male                  
3 2023/10/…                    123                     NA Male                  
4 2023/10/…                    200                     NA Female                
5 2023/10/…                     63                  40169 Male                  
6 2023/10/…                    137                     NA Female                
# ℹ abbreviated names:
#   ¹​`How many ducks are living in the farm (do not write anything but a number in this field - comments can be added below)`,
#   ²​`If you are a student at FCUL, what is your student number (if not a student, please ignore this question)?`,
#   ³​`Would you mind sharing your gender?`
# ℹ 1 more variable:
#   `Any other comment you might have about the  FCUL duck farm and this exercise?` <chr>
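
The code that produced this preview is not shown, so here is a minimal sketch of how such a file might be read. It is an assumption-laden stand-in: it uses base R `read.csv` (the printed tibble suggests the original used a tidyverse reader), it writes only the six previewed rows to a temporary file so that it runs on its own, and it leaves out the timestamp and comment columns because the preview truncates them. The names `preview`, `csv_path` and `raw` are hypothetical.

```r
# Six previewed rows, with the original (long) question text as column names
preview <- data.frame(
  c(250, 156, 123, 200, 63, 137),
  c(NA, 52553, NA, NA, 40169, NA),
  c("Female", "Male", "Male", "Female", "Male", "Female")
)
names(preview) <- c(
  "How many ducks are living in the farm (do not write anything but a number in this field - comments can be added below)",
  "If you are a student at FCUL, what is your student number (if not a student, please ignore this question)?",
  "Would you mind sharing your gender?"
)

# Stand-in for the real file "Patos na Quinta da FCUL.csv"
csv_path <- tempfile(fileext = ".csv")
write.csv(preview, csv_path, row.names = FALSE)

# check.names = FALSE keeps the question text intact as column names
raw <- read.csv(csv_path, check.names = FALSE)
str(raw)
```

With the real file, only the path passed to `read.csv` would change.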

Data summary

Let’s take a look at a summary of the data:

  Timestamp        
 Length:76         
 Class :character  
 Mode  :character  
                   
                   
                   
                   
 How many ducks are living in the farm (do not write anything but a number in this field - comments can be added below)
 Min.   :  54.00                                                                                                       
 1st Qu.:  99.25                                                                                                       
 Median : 150.00                                                                                                       
 Mean   : 201.62                                                                                                       
 3rd Qu.: 220.50                                                                                                       
 Max.   :1230.00                                                                                                       
                                                                                                                       
 If you are a student at FCUL, what is your student number (if not a student, please ignore this question)?
 Min.   :40169                                                                                             
 1st Qu.:56611                                                                                             
 Median :57912                                                                                             
 Mean   :56803                                                                                             
 3rd Qu.:57987                                                                                             
 Max.   :60439                                                                                             
 NA's   :11                                                                                                
 Would you mind sharing your gender?
 Length:76                          
 Class :character                   
 Mode  :character                   
                                    
                                    
                                    
                                    
 Any other comment you might have about the  FCUL duck farm and this exercise?
 Length:76                                                                    
 Class :character                                                             
 Mode  :character                                                             
                                                                              
                                                                              
                                                                              
                                                                              

Some comments

The timestamp is not really relevant information, since all the students did the experiment in class. The comments are not that relevant either, but I print them here just for fun.

 [1] "Sorry for the mean answer 😉"                                                                       
 [2] NA                                                                                                   
 [3] "In practice there are no ducklings in the farm, since the jar is not a farm."                       
 [4] "Queremos patinhos"                                                                                  
 [5] "Penso que uma escala em relação à altura da jarra ajudava ;)"                                       
 [6] "Mm a bacanz"                                                                                        
 [7] "Fila 4"                                                                                             
 [8] "Very cute ducks!"                                                                                   
 [9] "Interessante hahaha"                                                                                
[10] "they're so cute :')"                                                                                
[11] "Nice"                                                                                               
[12] "Funny"                                                                                              
[13] "Can't really see the jar but let's go randomly"                                                     
[14] "Mm à bacanz"                                                                                        
[15] "🤙"                                                                                                 
[16] "It's very cute"                                                                                     
[17] "É uma quinta muito diversa! Acho que o exercício é excelente para captar a atenção dos alunos"      
[18] "No"                                                                                                 
[19] "Que eu esteja na cauda da distribuição gaussiana, para conseguir acertar contra as probabilidades !"
[20] "O meu background a trabalhar no McDonald's,  ajuda a fazer estimativas para inventario"             
[21] "It would be great if a question asking in what row were we seated should be asked."                 
[22] "Quá quá"                                                                                            
[23] "Next, how about a cetacean pond?"                                                                   

Data wrangling

We recode the gender and the number of the student as factors, and create a new data.frame, for easier use,

and now we take a look at the data again:

   estimativa        studentid                 gender  
 Min.   :  54.00   Min.   :40169   Female         :50  
 1st Qu.:  99.25   1st Qu.:56611   Male           :24  
 Median : 150.00   Median :57912   Rather not tell: 1  
 Mean   : 201.62   Mean   :56803   NA's           : 1  
 3rd Qu.: 220.50   3rd Qu.:57987                       
 Max.   :1230.00   Max.   :60439                       
                   NA's   :11                          

and we create an indicator of whether student number is odd or even
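
The wrangling code itself is not shown, so the following is a sketch of what these steps might look like, using only the six rows visible in the earlier preview. The column names `estimativa`, `studentid` and `gender` come from the summary above; the data frame name `patos` and the column name `parity` are hypothetical.

```r
# Six previewed responses, with short column names for easier use
patos <- data.frame(
  estimativa = c(250, 156, 123, 200, 63, 137),
  studentid  = c(NA, 52553, NA, NA, 40169, NA),
  gender     = c("Female", "Male", "Male", "Female", "Male", "Female")
)

# Gender as a factor, so that summary() reports counts per level
patos$gender <- factor(patos$gender)

# Odd/even indicator from the remainder of the student number divided by 2;
# respondents without a student number stay NA
patos$parity <- factor(
  ifelse(patos$studentid %% 2 == 0, "even", "odd"),
  levels = c("even", "odd")
)

summary(patos)
```

Keeping `NA` for respondents without a student number makes it easy to treat non-students as a separate group later.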

The estimates

We can plot the estimates:

The estimates and the truth I

Let’s add a few things to the plot:

  • the true number of ducks, as a green vertical line,
  • the mean of the guesses, as a red dashed vertical line,
  • a naive 95% confidence interval, as blue dashed vertical lines.
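
The plotting code is not shown, so this is one way such a plot could be drawn with base R graphics. Two assumptions are worth flagging: the sketch uses only the six guesses visible in the data preview, so the picture will not match the slide, and it takes "naive" to mean a normal-approximation interval (mean ± 1.96 standard errors), which may not be what was used in class.

```r
estimativas <- c(250, 156, 123, 200, 63, 137)  # six previewed guesses only
truth <- 346                                   # true number of ducks

m  <- mean(estimativas)
se <- sd(estimativas) / sqrt(length(estimativas))
ci <- m + c(-1, 1) * qnorm(0.975) * se  # assumed normal-approximation interval

# Widen the x axis so that the truth is visible even when all guesses are low
hist(estimativas,
     xlim = range(c(estimativas, truth, ci)),
     main = "Guesses", xlab = "Estimated number of ducks")
abline(v = truth, col = "green", lwd = 2)  # truth
abline(v = m, col = "red", lty = 2)        # mean of the guesses
abline(v = ci, col = "blue", lty = 2)      # naive 95% confidence limits
```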

The estimates and the truth II

Conclusion

Clearly (and even though we have not covered the t-test in class yet, you have seen it before) one would reject the null hypothesis that the mean is equal to 346. The estimates are too low compared to the truth.


    One Sample t-test

data:  estimativas
t = -6.5073, df = 75, p-value = 7.667e-09
alternative hypothesis: true mean is not equal to 346
95 percent confidence interval:
 157.4187 245.8181
sample estimates:
mean of x 
 201.6184 
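
The output has the form produced by a call such as `t.test(estimativas, mu = 346)`, though the actual call is not shown. As a sanity check that needs no raw data, the sketch below recovers the test statistic from the reported numbers alone: the half-width of the confidence interval gives the standard error, and the distance between the sample mean and 346 is then expressed in standard errors. Variable names are illustrative.

```r
n     <- 76                      # df = 75 in the output, so n = 76
xbar  <- 201.6184                # reported sample mean
ci    <- c(157.4187, 245.8181)   # reported 95% confidence interval
truth <- 346                     # value under the null hypothesis

half_width <- diff(ci) / 2
se     <- half_width / qt(0.975, df = n - 1)  # standard error implied by the CI
t_stat <- (xbar - truth) / se                 # distance to 346 in standard errors
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)    # two-sided p-value

c(se = se, t = t_stat, p = p_val)
```

The recomputed statistic agrees with the reported t = -6.5073 up to the rounding of the inputs.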

Does gender matter?

Let’s separate the estimates by gender.

There is not really much of a difference between males and females, and the sample size is too small to make inferences about those who answered “Rather not tell”.

Does a cue give you a clue I

What about those that got the cue versus those that got no cue? We can check the mean value obtained by the different students.

Does a cue give you a clue II

As expected, the mean guess of the even-numbered students (253) was closer to the truth than that of the odd-numbered students (153).

What about non-EN students?

Interestingly, the few respondents who were not EN students, and hence got no cue either, were less biased (mean 235) than the students who got no cue in class. But there were only 11 of those, so that might just be a fluke and should not be interpreted as yet another proof that EN students are not very good at this game ;)
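
To put the three group means side by side, here is a small sketch that computes the bias of each relative to the true count of 346. It uses only the rounded means quoted in the text; the group labels are made up for illustration.

```r
truth <- 346  # true number of ducks in the jar

# Group means as reported in the text (rounded)
group_means <- c(even_cue = 253, odd_no_cue = 153, non_students = 235)

bias     <- group_means - truth     # negative values mean underestimation
rel_bias <- round(bias / truth, 2)  # bias as a fraction of the truth

data.frame(mean = group_means, bias = bias, rel_bias = rel_bias)
```

All three groups underestimate: the cued students by about 27%, the uncued students by about 56%, and the non-students by about 32%.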

Take home messages

  • Estimators can be biased (as human guesses often are)

  • Humans are not in general good at counting ducks in jars (and other things too!)

  • Information provided as cues (information about estimator bias) can help in the estimation process

  • The larger the sample size, the more accurate an (unbiased) estimator becomes (not illustrated here, only shown in class)
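
The last point can be illustrated with a small simulation. This is a sketch under loud assumptions, not a description of what was shown in class: guesses are simulated as unbiased and normally distributed around the truth, and the spread `guess_sd` is a hypothetical value. The real guesses were clearly biased, so this only illustrates the behaviour of an unbiased estimator.

```r
set.seed(1)
truth    <- 346  # true number of ducks
guess_sd <- 190  # hypothetical spread of individual guesses

sample_sizes <- c(10, 100, 1000)

# For each sample size, simulate 2000 classes and record how much
# the class mean varies from one class to the next
sim_se <- sapply(sample_sizes, function(n) {
  sd(replicate(2000, mean(rnorm(n, mean = truth, sd = guess_sd))))
})
names(sim_se) <- paste0("n=", sample_sizes)

round(sim_se, 1)
```

The spread of the class mean shrinks roughly in proportion to one over the square root of the sample size, so a hundred times more students gives an estimate about ten times more precise.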