Load libraries

library(mosaic)
library(dplyr)
library(googlesheets4)

Read in data

darwin<-read_sheet("https://docs.google.com/spreadsheets/d/1NCtjcO_w0SH_DQuIQ54cGc3h4VDU1DEc1vE0MOzhXtU/edit#gid=435288844" )

Rename some of the variables to make them easier to work with

names(darwin)[2]<-"mean.char"  
names(darwin)[3]<-"Sampling.method"  
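
Renaming by position silently mislabels columns if the column order in the sheet ever changes. Below is a minimal base R sketch of renaming by matching the existing names instead. The toy data frame and its original column names are hypothetical, because the sheet's real headers are not shown here.

```r
# Hypothetical stand-in for the Google Sheet; the real headers are unknown
toy <- data.frame(
  Timestamp = c("t1", "t2"),
  `What was your mean?` = c(5.2, 6.1),
  `How did you choose?` = c("random", "first 10"),
  check.names = FALSE
)

# Map old names to new names, then rename only the columns that match
new_names <- c("What was your mean?" = "mean.char",
               "How did you choose?" = "Sampling.method")
idx <- match(names(new_names), names(toy))
stopifnot(!anyNA(idx))  # fail loudly if an expected column is missing
names(toy)[idx] <- unname(new_names)
names(toy)
```

The `stopifnot()` call turns a changed or missing header into an immediate error rather than a mislabeled column.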

Read in the passage from Darwin so we can calculate the true mean number of characters per word

oas<-read.csv("data/Darwin.csv")
(mean.nchar<-mean(~nchar, data=oas))
## [1] 4.936652
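
The code above assumes that data/Darwin.csv holds one row per word, with the word's length in a column named nchar. How that file was built is not shown. The sketch below shows one way such a column could be derived from raw text in base R, using a ten-word phrase from the passage (quoted in one of the student responses) as a toy input. The names passage, words, and toy_oas are illustrative only.

```r
# Toy input: a ten-word phrase from the passage
passage <- "that this great variability is due to our domestic productions"

# Split on anything that is not a letter or apostrophe, then drop empty strings
words <- strsplit(passage, "[^A-Za-z']+")[[1]]
words <- words[nzchar(words)]

# One row per word, with its length in characters
toy_oas <- data.frame(word = words, nchar = nchar(words))

# Mean word length for this toy "population"
mean(toy_oas$nchar)
```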

Distribution of sample means from students’ samples

We see that, on average, the mean number of characters per word in the samples taken by students is higher than the population mean (about 4.94 characters).

gf_histogram(~mean.char, data=darwin, xlab="Mean Number of Characters per Word") %>% gf_vline(xintercept=~4.94)

mean(~mean.char, data=darwin)
## [1] 5.922778
# Methods used by students
darwin %>% select(Sampling.method) %>% as.data.frame() 
##                                                                                                                                                                                                                                                                                                                                                                                          Sampling.method
## 1                                                                                                                                                                                                                                                                                                                                                                                                 random
## 2                                                                                                                                                          I chose words that were kind of in the middle letter wise. There are many big words in the passage but also many little words. I figured that they could kind of cancel each other out and at the end there will be more average sized words.
## 3                                                                                                                                                                                                                                                                                              I chose words that relate the diversity of wildlife, and words that talked about that within the passage.
## 4  I noticed that there lots of longer words in this text, so I chose words such as "animals," "different," and "organic" because when you take into account all the small words such as "a," "and," and "the," it seemed to average out quite nicely.  The words I chose were "animals," "different," "organic," "variation," "species," "variety," "organism," "exposed," "individuals," and "nature."
## 5                                                                                                                                                         I chose my 10 words by selecting a sentence near the middle of the passage from Darwin's "Origin of Species." I chose to select them this way, because I thought the 10 words would be good representatives of any sentence found in the text.
## 6                                                                                                                                        I highlighted 10 words in the text randomly. In order to avoid bias I tried to choose words without reading them. I highlighted words with a highlighter, and counted the letter in the 10 words.  After that, I calculated the mean/average number of letters.
## 7                                                                                                                                                                    I highlighted 10 word randomly and counted letter in each word. In order to avoid a bias I did not read the chosen words. I tried to choose these words randomly as I could. After that I calculated the average number of letters.
## 8                                                                                                                                                                                                                                                                                   I looked away from the screen and every time I looked back at the passage I picked the first word my eyes landed on.
## 9                                                                                                                                                                                                                                                                 I chose the words depending on what the main idea of the excerpt was and which words really stood out to me when reading this passage.
## 10                                                                                                                                                                                                                                                                                                                                             I chose words that were about the subject of the passage.
## 11                                                                                                                                                                                                                                                                                                                                           randomly chose a start point and did the following 10 words
## 12                                                                                                                                                                                                                                                                                                                                           The last word on each line except the last (first 10 lines)
## 13                                                                                                                                                                                                                                                                                                                                                                                 I chose them randomly
## 14                                                                                                                                                                                                                                                                                                                                                       Chose the first word of the middle lines (2-11)
## 15                                                                                                                                                                                                                                                                                                                                 The first word I saw in each line I recorded to calculate my average.
## 16                                                                                                                                                                                                                                                                                                                                                                                  I chose the first 10
## 17                                                                                                                                                                                                                                                                                                                                                                     I picked words of varying lengths
## 18                                                                                                                                                                                                         I chose ten words by looking at words that he uses often (such as "and" due to the lengthy sentences) and then picking a few longer words, as he uses many of those throughout the paragraph.
## 19                                                                                                                                                                                                                                                                         I tried to take a sample of words that appeared often or a diverse sample of some longer words and mid-length or short words.
## 20                                                                                                                                                                                                                                                                                                                                                             randomly chose words of different lengths
## 21                                                                                                                                                                                                                                                                                                                                                              I took the first 10 words of the passage
## 22                                                                                                                                                                                                                                                           I randomly move my eyes around the page and then stop in a general area. Then, I select a word that has a lot of letters, but not too many.
## 23                                                                                                                                                                                                                                                                                              I picked the first 10 words of the passage because it seemed to have a good mix of short and long words.
## 24                                                                                                                                                                                                                                                                                             Closed my eyes and moved the mouse around and stopped randomly to randomly choose a word from the passage
## 25                                                                                                                                                                                                                                                                                                                   I looked for ten consecutive words that had a mix of small, medium and large words.
## 26                                                                                                                                                                                                                                                                                                                                               I chose 3 short words, 4 medium words, and 3 long words
## 27                                                                                                                                                                                                                                                                                                                     I randomly chose words that I thought were taken from everywhere on the paragraph
## 28                                                                                                                                         Referring to my experience with writing papers in the past, i have noted that maybe 75% of typing is filled with small, short filler words that lead the reader to some longer word. With this knowledge, i chose mostly shorter words with some bigger ones.
## 29                                                                                                                                                                                                                                                                                             I chose my words by skimming and choosing words that seemed related, such as population, generation, etc.
## 30                                                                                                                                                                                                                                                                                     Randomly picked 10 numbers between 1 and 100 and chose the words corresponding to those numbers in the paragraph.
## 31                                                                                                                                                        I scanned the paragraph and got an idea of the variation in word size, then I randomly selected words that appeared to be within the range of shortest word to longest word in order to include the whole spectrum of word sizes in my sample.
## 32                                                                                                                                                                                                            I choose my 10 words based on how they related to the passage and the diversity of life or species. I then included some of the specifics at the end that the passage brought up directly.
## 33                                                                                                                           I chose them by skimming through and trying to average out how many long and short words there were. I guessed since there is more short words that are used to connect words I used more 2-4 letter word in my average and include an outlier of a 12 letter word as well.
## 34                                                                                                                                                                                                                     I chose words that showed up multiple times in the passage, and I chose ones representative of the range of length found in the passage.  Those two groups overlapped, of course.
## 35                                                                                                                                                                                                                                                                      I looked away from the screen and then looked back quickly. I wrote down the first word I saw. I repeated this process 10 times.
## 36                                                                                                                                                                                                                                I closed my eyes and moved my cursor at random for three seconds and then stopped. Whatever word was under my cursor was the word I sampled. I repeated this 10 times.
## 37                                                              Utilizing my knowledge of Darwin's theory of evolutoon and it's relationship to its research, I chose words that would be used a multitude of times over the the course of the passage. i.e. Diversity, Organism, Variety. This is most likely a case of many of the same words are used that they are indicative of the average length.
## 38                                                                                                                                                                                                                                                                                                                                              There is also some probability in the view propounded by
## 39                                                                                                                                                                                Used a random number generator to select which column of the text I would start in and then used a random number generator again to choose which word to start counting on. Counted 10 words consecutively from there.
## 40                                                                                                                                                                                                                               I chose the longest word and one of the shortest words in each sentence except the shortest sentence in the paragraph, so two words per sentence out of five sentences.
## 41                                                                                                                                                                                                                                                                                                                 I selected 10 words that appeared one after the other in the middle of the paragraph.
## 42                                                                                                                                                                                                                                                                                        I picked 10 words in a sentence at random.\n\n"that this great variability is due to our domestic productions"
## 43                                                                                                                                                                                                                                                                                                                             I grabbed a phrase that seemed to sum up the main subject of the passage.
## 44                                                                                                                                                                                                                                                                                                                            Haphazardly (very non-randomly). Letting my eye travel and land on things.
## 45                                                                                                                                                                                                                                                                             I chose 10 words I thought would be used fairly regularly, but don't seem too long compared to a scan of the other words.
## 46                                                                                                                                                                                                                                                                                                                                                       221 words in the text, I chose every 22nd word.
## 47                                                                                                                                                                                                             1. Assigned an individual ID number to each word.\n2. Used excel to select 10 random values between 1 and the number of words.\n3. Calculated the average of the randomly selected words.
## 48                                                                                 I chose one word from each line (excluding the first and last line). I divided the lines into 10 sections containing, 2 words on average, from left to write and picked the left or right word, alternating as a went down the lines beginning with the second line the left side of the first section: result "the".
## 49                                                                                                                                                                      I broke the section up into 50 word stretches, then randomly generated which section to use. Then I used a no replacing random number generator from 1-50 to select my ten words which I then counted and divided the sum by 10.
## 50                                                                                                                                                                                                                                                                                                          I read the section and estimated based off of word size and regularity of those size classes
## 51                                                                                                                                                                                                                                                                                                                      I chose the first word that I saw each time I looked away and back at the screen
## 52                                                                                                                                                                                                                                                                                                                                  I used command F to find 10 words that I thought showed up the most.
## 53                                                                                                                                                                                                                        I did a word count to figure that there were 221 words in the writing, then randomly generated ten numbers out of 221 and selected the corresponding words within the writing.
## 54                                                                                                                                                                                                                                                                                                                     I selected a sentence in the middle of the selection and used the first 10 words.
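
One possible explanation for the upward bias, offered here as an assumption rather than something the data establish, is that longer words are more likely to catch the eye. The sketch below simulates that idea in base R: words are drawn with probability proportional to their length from a toy population (the lengths of the ten words in "that this great variability is due to our domestic productions"). The sample size of 3 and the proportional-to-length weighting are illustrative choices.

```r
set.seed(1)

# Toy population of word lengths (ten words from the passage)
lens <- c(4, 4, 5, 11, 2, 3, 2, 3, 8, 11)
pop_mean <- mean(lens)

# Assumed selection mechanism: chance of picking a word is proportional to its length
biased_means <- replicate(
  10000,
  mean(sample(lens, 3, replace = TRUE, prob = lens))
)

# Under this mechanism the expected value of each draw is sum(x^2) / sum(x)
expected_biased <- sum(lens^2) / sum(lens)

c(population = pop_mean,
  simulated  = mean(biased_means),
  expected   = expected_biased)
```

Under this assumed mechanism the average of the sample means sits well above the population mean, which is the same direction as the bias seen in the student samples.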

Random sampling

With random sampling, however, we should get estimates that are, on average, equal to the population mean. Let's explore this by taking 10,000 random samples of 10 words each and computing the mean number of characters per word in each sample!

randomsamps<-do(10000)*{
  samp.char<-sample(oas, 10)
  mean(~nchar, data=samp.char)
  # Alternative way to accomplish the same thing in 1 line of code
  #mean(~nchar, data=sample(oas, 10))
}    
gf_dhistogram(~result, data=randomsamps, xlab="Mean Number of Characters per Word",
          title="Random Sampling") %>% gf_vline(xintercept=~4.94)

mean(~result, data=randomsamps)
## [1] 4.94444
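
The simulation gives a result close to the population mean, but only approximately. For a small population, the same property can be checked exactly by enumerating every possible simple random sample. The base R sketch below does this for a toy population (the lengths of the ten words in "that this great variability is due to our domestic productions") with samples of size 5. Both the toy population and the sample size are illustrative assumptions.

```r
# Toy population of word lengths (ten words from the passage)
lens <- c(4, 4, 5, 11, 2, 3, 2, 3, 8, 11)

# Mean of every possible sample of 5 words drawn without replacement
all_sample_means <- combn(lens, 5, FUN = mean)

length(all_sample_means)   # number of distinct samples: choose(10, 5)
mean(all_sample_means)     # average of the sample means
mean(lens)                 # population mean

# The two means agree up to floating-point error
all.equal(mean(all_sample_means), mean(lens))
```

Because each word appears in the same number of samples, the sample means average out to the population mean exactly, which is what the 10,000-sample simulation approximates.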

Write out the data for a future lab

write.csv(darwin, file="data/Dguesses.csv", row.names = FALSE)