This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:

pirates <- read.table("http://nathanieldphillips.com/wp-content/uploads/2015/05/pirate_survey_noerrors.txt",
                      sep = "\t", header = TRUE, stringsAsFactors = FALSE)

Question 1: Create the following histograms of the number of tattoos pirates have separately for each favorite pirate. Add appropriate labels for each plot. Hint: Use unique(pirates$favorite.pirate) as your index values. Additionally, before creating the loop, set up a 2 x 3 plotting region using par(mfrow = c(2, 3))

# Set up a 2 x 3 plotting region
par(mfrow = c(2, 3))
# One histogram of tattoos per favorite pirate
for (favorite.pirate.i in unique(pirates$favorite.pirate)) {
  data.temp <- subset(pirates, favorite.pirate == favorite.pirate.i)
  hist(data.temp$tattoos,
       main = favorite.pirate.i,
       xlab = "tattoos")
}
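
The same loop can be tried without downloading the survey file. The sketch below is only an illustration: `pirates.demo`, its three pirate names, and the Poisson-distributed tattoo counts are made-up stand-ins, not the real data. It also saves the graphics settings returned by `par()` so the plotting region can be restored after the loop.

```r
# Hypothetical stand-in for the pirates data (names and values are made up).
set.seed(1)
pirates.demo <- data.frame(
  favorite.pirate = sample(c("Pirate A", "Pirate B", "Pirate C"),
                           size = 60, replace = TRUE),
  tattoos = rpois(60, lambda = 9),
  stringsAsFactors = FALSE
)

# par() returns the previous settings, so they can be restored afterwards.
old.par <- par(mfrow = c(1, 3))

# One histogram per favorite pirate, as in the loop above.
for (pirate.i in unique(pirates.demo$favorite.pirate)) {
  data.temp <- subset(pirates.demo, favorite.pirate == pirate.i)
  hist(data.temp$tattoos,
       main = pirate.i,
       xlab = "Number of tattoos")
}

# Restore the original plotting region.
par(old.par)
```

Restoring the settings keeps the 1 x 3 layout from carrying over into later plots in the same document.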

Question 2: The law of large numbers says that the larger your sample size, the closer your sample statistic will be to the true population value. Let’s test this by conducting a simulation. For sample sizes of 1 to 100, calculate the average difference between the sample mean and the population mean from a Normal distribution with mean 100 and standard deviation 10.

Step 1: Create the design matrix

Step 2: Set up the loop over the rows of the design matrix

Step 3: For each row of the design matrix, extract the sample size (N).

Step 4: Draw N samples from a Normal distribution with mean 100 and standard deviation 10

Step 5: Calculate the absolute difference between the sample mean and the population mean.

design.matrix <- expand.grid("sample.size" = 1:100,
                             "simulation" = 1:100,
                             "result" = NA)
head(design.matrix)
##   sample.size simulation result
## 1           1          1     NA
## 2           2          1     NA
## 3           3          1     NA
## 4           4          1     NA
## 5           5          1     NA
## 6           6          1     NA
for (row.i in 1:nrow(design.matrix)) {
  # Sample size for this row of the design matrix
  sample.size.i <- design.matrix$sample.size[row.i]
  # Draw the sample and compute its mean
  data <- rnorm(n = sample.size.i, mean = 100, sd = 10)
  sample.mean <- mean(data)
  # Absolute difference between the sample mean and the population mean (Step 5)
  diff <- abs(sample.mean - 100)
  design.matrix$result[row.i] <- diff
}

head(design.matrix)
##   sample.size simulation    result
## 1           1          1 0.4431543
## 2           2          1 6.7239913
## 3           3          1 6.4480984
## 4           4          1 3.6598429
## 5           5          1 1.0350483
## 6           6          1 2.9107852

Question 3: Plot your aggregate results from Question 2.

plot(x = design.matrix$sample.size,
     y = design.matrix$result,
     main = "Difference between the sample mean and the population mean",
     xlab = "Sample Size",
     ylab = "Difference",
     xlim = c(0, 100), ylim = c(0, 30), cex = 1)
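
Since the question asks for aggregate results, one possible variant averages the absolute differences within each sample size before plotting. The sketch below is self-contained and reruns the simulation. The names `sim` and `sim.agg` and the seed are assumptions made for illustration, not part of the original answer. The dashed curve is the theoretical expectation for a Normal sample, E|mean - 100| = 10 * sqrt(2 / (pi * N)).

```r
set.seed(100)  # arbitrary seed, only so the sketch is reproducible

# Design matrix: 100 simulations for each sample size from 1 to 100.
sim <- expand.grid(sample.size = 1:100,
                   simulation = 1:100,
                   result = NA)

for (row.i in seq_len(nrow(sim))) {
  n.i <- sim$sample.size[row.i]
  # Absolute difference between the sample mean and the population mean.
  sim$result[row.i] <- abs(mean(rnorm(n = n.i, mean = 100, sd = 10)) - 100)
}

# Average absolute difference for each sample size.
sim.agg <- aggregate(result ~ sample.size, data = sim, FUN = mean)

plot(x = sim.agg$sample.size,
     y = sim.agg$result,
     type = "b",
     main = "Average absolute difference by sample size",
     xlab = "Sample Size",
     ylab = "Mean absolute difference")

# Theoretical expectation for a Normal sample with sd = 10.
curve(10 * sqrt(2 / (pi * x)), from = 1, to = 100, add = TRUE, lty = 2)
```

The averaged points should fall close to the dashed curve and shrink toward zero as the sample size grows, which is the pattern the law of large numbers predicts.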

Question 4: How many people do you need in a room for the probability to be greater than 0.50 that at least two people in the room have the same birthday? Answer this question using a simulation. For example, if there are 2 people in the room, what is the probability that they have the same birthday? Now what about 3, 4, … 365 people?

Step 1: Create the design matrix

Step 2: Set up the loop over the rows of the design matrix

Step 3: For each row of the design matrix, extract the number of people in the room (N).

Step 4: Simulate those N people in a room and figure out if at least two have the same birthday. Here’s a Hint:

Step 5: Save the result (TRUE or FALSE) in the design matrix

design.matrix <- expand.grid(
  "people.in.room" = 1:365,
  "simulation" = 1:100,
  "result" = NA)

for (row.i in 1:nrow(design.matrix)) {
  # Number of people in the room for this row
  people.i <- design.matrix$people.in.room[row.i]

  # Draw a random birthday (day 1 to 365) for each person
  bdays <- sample(x = 1:365, size = people.i, replace = TRUE)

  # TRUE if at least two people share a birthday
  result <- length(bdays) != length(unique(bdays))
  design.matrix$result[row.i] <- result
}


aggregate(result~people.in.room, data = design.matrix, FUN = mean)
##     people.in.room result
## 1                1   0.00
## 2                2   0.00
## 3                3   0.00
## 4                4   0.01
## 5                5   0.01
## 6                6   0.03
## 7                7   0.06
## 8                8   0.06
## 9                9   0.07
## 10              10   0.08
## 11              11   0.15
## 12              12   0.18
## 13              13   0.20
## 14              14   0.15
## 15              15   0.25
## 16              16   0.38
## 17              17   0.24
## 18              18   0.45
## 19              19   0.32
## 20              20   0.47
## 21              21   0.45
## 22              22   0.59
## 23              23   0.51
## 24              24   0.58
## 25              25   0.57
## 26              26   0.57
## 27              27   0.65
## 28              28   0.65
## 29              29   0.66
## 30              30   0.73
## 31              31   0.75
## 32              32   0.79
## 33              33   0.79
## 34              34   0.82
## 35              35   0.78
## 36              36   0.78
## 37              37   0.84
## 38              38   0.84
## 39              39   0.89
## 40              40   0.92
## 41              41   0.87
## 42              42   0.87
## 43              43   0.91
## 44              44   0.95
## 45              45   0.95
## 46              46   0.94
## 47              47   0.92
## 48              48   0.97
## 49              49   0.97
## 50              50   0.99
## 51              51   1.00
## 52              52   0.97
## 53              53   0.99
## 54              54   0.99
## 55              55   1.00
## 56              56   0.99
## 57              57   1.00
## 58              58   0.99
## 59              59   0.99
## 60              60   1.00
## 61              61   1.00
## 62              62   0.98
## 63              63   1.00
## 64              64   0.99
## 65              65   1.00
## 66              66   1.00
## 67              67   1.00
## 68              68   1.00
## 69              69   1.00
## 70              70   1.00
## 71              71   1.00
## 72              72   1.00
## 73              73   1.00
## 74              74   1.00
## 75              75   1.00
## 76              76   1.00
## 77              77   1.00
## 78              78   1.00
## 79              79   1.00
## 80              80   1.00
## 81              81   1.00
## 82              82   1.00
## 83              83   1.00
## 84              84   1.00
## 85              85   1.00
## 86              86   1.00
## 87              87   1.00
## 88              88   1.00
## 89              89   1.00
## 90              90   1.00
## 91              91   1.00
## 92              92   1.00
## 93              93   1.00
## 94              94   1.00
## 95              95   1.00
## 96              96   1.00
## 97              97   1.00
## 98              98   1.00
## 99              99   1.00
## 100            100   1.00
## 101            101   1.00
## 102            102   1.00
## 103            103   1.00
## 104            104   1.00
## 105            105   1.00
## 106            106   1.00
## 107            107   1.00
## 108            108   1.00
## 109            109   1.00
## 110            110   1.00
## 111            111   1.00
## 112            112   1.00
## 113            113   1.00
## 114            114   1.00
## 115            115   1.00
## 116            116   1.00
## 117            117   1.00
## 118            118   1.00
## 119            119   1.00
## 120            120   1.00
## 121            121   1.00
## 122            122   1.00
## 123            123   1.00
## 124            124   1.00
## 125            125   1.00
## 126            126   1.00
## 127            127   1.00
## 128            128   1.00
## 129            129   1.00
## 130            130   1.00
## 131            131   1.00
## 132            132   1.00
## 133            133   1.00
## 134            134   1.00
## 135            135   1.00
## 136            136   1.00
## 137            137   1.00
## 138            138   1.00
## 139            139   1.00
## 140            140   1.00
## 141            141   1.00
## 142            142   1.00
## 143            143   1.00
## 144            144   1.00
## 145            145   1.00
## 146            146   1.00
## 147            147   1.00
## 148            148   1.00
## 149            149   1.00
## 150            150   1.00
## 151            151   1.00
## 152            152   1.00
## 153            153   1.00
## 154            154   1.00
## 155            155   1.00
## 156            156   1.00
## 157            157   1.00
## 158            158   1.00
## 159            159   1.00
## 160            160   1.00
## 161            161   1.00
## 162            162   1.00
## 163            163   1.00
## 164            164   1.00
## 165            165   1.00
## 166            166   1.00
## 167            167   1.00
## 168            168   1.00
## 169            169   1.00
## 170            170   1.00
## 171            171   1.00
## 172            172   1.00
## 173            173   1.00
## 174            174   1.00
## 175            175   1.00
## 176            176   1.00
## 177            177   1.00
## 178            178   1.00
## 179            179   1.00
## 180            180   1.00
## 181            181   1.00
## 182            182   1.00
## 183            183   1.00
## 184            184   1.00
## 185            185   1.00
## 186            186   1.00
## 187            187   1.00
## 188            188   1.00
## 189            189   1.00
## 190            190   1.00
## 191            191   1.00
## 192            192   1.00
## 193            193   1.00
## 194            194   1.00
## 195            195   1.00
## 196            196   1.00
## 197            197   1.00
## 198            198   1.00
## 199            199   1.00
## 200            200   1.00
## 201            201   1.00
## 202            202   1.00
## 203            203   1.00
## 204            204   1.00
## 205            205   1.00
## 206            206   1.00
## 207            207   1.00
## 208            208   1.00
## 209            209   1.00
## 210            210   1.00
## 211            211   1.00
## 212            212   1.00
## 213            213   1.00
## 214            214   1.00
## 215            215   1.00
## 216            216   1.00
## 217            217   1.00
## 218            218   1.00
## 219            219   1.00
## 220            220   1.00
## 221            221   1.00
## 222            222   1.00
## 223            223   1.00
## 224            224   1.00
## 225            225   1.00
## 226            226   1.00
## 227            227   1.00
## 228            228   1.00
## 229            229   1.00
## 230            230   1.00
## 231            231   1.00
## 232            232   1.00
## 233            233   1.00
## 234            234   1.00
## 235            235   1.00
## 236            236   1.00
## 237            237   1.00
## 238            238   1.00
## 239            239   1.00
## 240            240   1.00
## 241            241   1.00
## 242            242   1.00
## 243            243   1.00
## 244            244   1.00
## 245            245   1.00
## 246            246   1.00
## 247            247   1.00
## 248            248   1.00
## 249            249   1.00
## 250            250   1.00
## 251            251   1.00
## 252            252   1.00
## 253            253   1.00
## 254            254   1.00
## 255            255   1.00
## 256            256   1.00
## 257            257   1.00
## 258            258   1.00
## 259            259   1.00
## 260            260   1.00
## 261            261   1.00
## 262            262   1.00
## 263            263   1.00
## 264            264   1.00
## 265            265   1.00
## 266            266   1.00
## 267            267   1.00
## 268            268   1.00
## 269            269   1.00
## 270            270   1.00
## 271            271   1.00
## 272            272   1.00
## 273            273   1.00
## 274            274   1.00
## 275            275   1.00
## 276            276   1.00
## 277            277   1.00
## 278            278   1.00
## 279            279   1.00
## 280            280   1.00
## 281            281   1.00
## 282            282   1.00
## 283            283   1.00
## 284            284   1.00
## 285            285   1.00
## 286            286   1.00
## 287            287   1.00
## 288            288   1.00
## 289            289   1.00
## 290            290   1.00
## 291            291   1.00
## 292            292   1.00
## 293            293   1.00
## 294            294   1.00
## 295            295   1.00
## 296            296   1.00
## 297            297   1.00
## 298            298   1.00
## 299            299   1.00
## 300            300   1.00
## 301            301   1.00
## 302            302   1.00
## 303            303   1.00
## 304            304   1.00
## 305            305   1.00
## 306            306   1.00
## 307            307   1.00
## 308            308   1.00
## 309            309   1.00
## 310            310   1.00
## 311            311   1.00
## 312            312   1.00
## 313            313   1.00
## 314            314   1.00
## 315            315   1.00
## 316            316   1.00
## 317            317   1.00
## 318            318   1.00
## 319            319   1.00
## 320            320   1.00
## 321            321   1.00
## 322            322   1.00
## 323            323   1.00
## 324            324   1.00
## 325            325   1.00
## 326            326   1.00
## 327            327   1.00
## 328            328   1.00
## 329            329   1.00
## 330            330   1.00
## 331            331   1.00
## 332            332   1.00
## 333            333   1.00
## 334            334   1.00
## 335            335   1.00
## 336            336   1.00
## 337            337   1.00
## 338            338   1.00
## 339            339   1.00
## 340            340   1.00
## 341            341   1.00
## 342            342   1.00
## 343            343   1.00
## 344            344   1.00
## 345            345   1.00
## 346            346   1.00
## 347            347   1.00
## 348            348   1.00
## 349            349   1.00
## 350            350   1.00
## 351            351   1.00
## 352            352   1.00
## 353            353   1.00
## 354            354   1.00
## 355            355   1.00
## 356            356   1.00
## 357            357   1.00
## 358            358   1.00
## 359            359   1.00
## 360            360   1.00
## 361            361   1.00
## 362            362   1.00
## 363            363   1.00
## 364            364   1.00
## 365            365   1.00
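# With 23 or more people in the room, the probability that at least two of them share a birthday is greater than 0.50.
# The estimates above are noisy because each room size was simulated only 100 times, which is why the table already exceeds 0.50 at 22 people.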
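
The simulated answer can be compared with an exact calculation. The sketch below assumes 365 equally likely birthdays (no leap years), the same assumption the simulation makes. The names `people`, `p.no.match`, and `p.match` are introduced here for illustration only. The last line cross-checks the result with `qbirthday()` from the stats package.

```r
# Room sizes to evaluate.
people <- 1:365

# P(no shared birthday among n people)
#   = (365/365) * (364/365) * ... * ((365 - n + 1)/365)
p.no.match <- cumprod((365 - (people - 1)) / 365)

# P(at least two people share a birthday).
p.match <- 1 - p.no.match

# Smallest room size with a probability above 0.50.
min(people[p.match > 0.50])

# Exact probabilities for 22 and 23 people.
round(p.match[c(22, 23)], 3)

# Cross-check with the birthday helper in the stats package.
qbirthday(prob = 0.5, classes = 365, coincident = 2)
```

Because the exact probability for 22 people is just under 0.50 and for 23 people just over it, a simulation with few repetitions can easily land on either side of the threshold. Increasing the number of simulations per room size should bring the estimates closer to these exact values.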