This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown, see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the written content and the output of any embedded R code chunks. The first chunk loads the pirate survey data:
# Read the tab-separated pirate survey data (the first row holds the column names)
pirates <- read.table("http://nathanieldphillips.com/wp-content/uploads/2015/05/pirate_survey_noerrors.txt",
                      sep = "\t", header = TRUE, stringsAsFactors = FALSE)
Question 1: Create the following histograms of the number of tattoos pirates have, separately for each favorite pirate. Add appropriate labels to each plot. Hint: use unique(pirates$favorite.pirate) as your index values. Additionally, before creating the loop, set up a 2 x 3 plotting region using par(mfrow = c(2, 3)).
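# 2 x 3 grid: one panel per favorite pirate
par(mfrow = c(2, 3))
for (favorite.pirate.i in unique(pirates$favorite.pirate)) {
  # Keep only the pirates whose favorite is the current one
  data.temp <- subset(pirates, favorite.pirate == favorite.pirate.i)
  hist(data.temp$tattoos,
       main = favorite.pirate.i,
       xlab = "Number of tattoos",
       ylab = "Frequency")
}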
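The same loop pattern can be tried without downloading the survey. The sketch below is only an illustration: it swaps in R's built-in `chickwts` data set (chick weights in grams for six feed types) as a stand-in for the pirate data, because six groups fill a 2 x 3 grid the same way. It also saves the previous `par()` settings and restores them afterwards, so that later plots in an interactive session are not drawn into the 2 x 3 grid.

```r
# Stand-in data: chickwts (datasets package) has six feed groups,
# which fills a 2 x 3 grid just like the six favorite pirates
feeds <- levels(chickwts$feed)

# par() returns the previous settings, so they can be restored afterwards
old.par <- par(mfrow = c(2, 3))

for (feed.i in feeds) {
  # Keep only the weights for the current group
  weights.i <- chickwts$weight[chickwts$feed == feed.i]
  hist(weights.i,
       main = feed.i,
       xlab = "Weight (g)",
       ylab = "Frequency")
}

# Restore the original plotting layout
par(old.par)
```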
Question 2: The law of large numbers says that the larger your sample size, the closer your sample statistic will tend to be to the true population value. Let’s test this by conducting a simulation. For sample sizes of 1 to 100, calculate the average absolute difference between the sample mean and the population mean of a Normal distribution with mean 100 and standard deviation 10.
Step 1: Create the design matrix.
Step 2: Set up the loop over the rows of the design matrix.
Step 3: For each row of the design matrix, extract the sample size (N).
Step 4: Draw N samples from a Normal distribution with mean 100 and standard deviation 10.
Step 5: Calculate the absolute difference between the sample mean and the population mean.
# Every combination of sample size (1-100) and simulation number (1-100)
design.matrix <- expand.grid("sample.size" = 1:100,
                             "simulation" = 1:100,
                             "result" = NA)
head(design.matrix)
##   sample.size simulation result
## 1           1          1     NA
## 2           2          1     NA
## 3           3          1     NA
## 4           4          1     NA
## 5           5          1     NA
## 6           6          1     NA
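for (row.i in seq_len(nrow(design.matrix))) {
  # Step 3: extract the sample size (N) for this row
  sample.size.i <- design.matrix$sample.size[row.i]
  # Step 4: draw N samples from a Normal(mean = 100, sd = 10) distribution
  samples <- rnorm(n = sample.size.i, mean = 100, sd = 10)
  # Step 5: absolute difference between the sample mean and the population mean
  abs.diff <- abs(mean(samples) - 100)
  design.matrix$result[row.i] <- abs.diff
}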
head(design.matrix)
##   sample.size simulation     result
## 1           1          1  0.4431543
## 2           2          1  6.7239913
## 3           3          1  6.4480984
## 4           4          1  3.6598429
## 5           5          1 -1.0350483
## 6           6          1  2.9107852
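For comparison, the expected absolute difference can be worked out exactly. The sample mean of N independent draws from a Normal(μ, σ) population is itself Normal with mean μ and standard deviation σ/√N, and a standard Normal variable Z has E|Z| = √(2/π), so:

```latex
\bar{X}_N - \mu \sim \mathcal{N}\!\left(0, \frac{\sigma^2}{N}\right)
\quad\Longrightarrow\quad
\mathbb{E}\,\lvert \bar{X}_N - \mu \rvert = \frac{\sigma}{\sqrt{N}}\sqrt{\frac{2}{\pi}}
```

With σ = 10 this is about 7.98 for N = 1 and about 0.80 for N = 100, so the average absolute difference should shrink roughly in proportion to 1/√N.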
Question 3: Plot your aggregate results from Question 2.
plot(x = design.matrix$sample.size,
     y = design.matrix$result,
     main = "Difference between the sample mean and the population mean",
     xlab = "Sample Size",
     ylab = "Difference",
     xlim = c(0, 100), ylim = c(0, 30), cex = 1)
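Question 2 asks for the average difference at each sample size, while the plot above shows every individual simulation. A minimal sketch of the aggregation step follows. To stay self-contained it reruns a vectorized version of the simulation (the `set.seed()` value and the object names `design` and `agg` are arbitrary choices, not part of the assignment), averages the 100 absolute differences per sample size with `aggregate()`, and overlays the theoretical expectation 10 * sqrt(2 / (pi * N)) as a dashed line.

```r
set.seed(100)  # arbitrary seed so the sketch is reproducible

# Same design as Question 2: 100 sample sizes x 100 simulations
design <- expand.grid(sample.size = 1:100, simulation = 1:100)

# One absolute difference |sample mean - 100| per row
design$result <- sapply(design$sample.size, function(n) {
  abs(mean(rnorm(n, mean = 100, sd = 10)) - 100)
})

# Average the 100 simulations for each sample size
agg <- aggregate(result ~ sample.size, data = design, FUN = mean)

plot(x = agg$sample.size, y = agg$result,
     main = "Average absolute difference by sample size",
     xlab = "Sample Size", ylab = "Mean absolute difference")

# Expected value for a Normal population with sd 10: 10 * sqrt(2 / (pi * N))
curve(10 * sqrt(2 / (pi * x)), from = 1, to = 100, add = TRUE, lty = 2)
```

The points should scatter around the dashed curve, falling quickly for small samples and flattening out as N grows.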
Question 4: How many people do you need in a room for the probability that at least two of them share a birthday to be greater than 0.50? Answer this question using a simulation. For example, if there are 2 people in the room, what is the probability that they have the same birthday? What about 3, 4, …, 365 people?
Step 1: Create the design matrix
Step 2: Set up the loop over the rows of the design matrix
Step 3: For each row of the design matrix, extract the number of people in the room (N).
Step 4: Simulate those N people in a room and figure out whether at least two have the same birthday. Here’s a hint:
Step 5: Save the result (TRUE or FALSE) in the design matrix
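One way to carry out Step 4 for a single room (a sketch, not necessarily the approach the hint had in mind) is to draw N birthdays with replacement and ask whether any value repeats. Base R's `anyDuplicated()` returns the position of the first repeated value, or 0 if there is none; the room size of 23 here is just an example.

```r
# One simulated room: N people, birthdays drawn uniformly from days 1-365
N <- 23
bdays <- sample(x = 1:365, size = N, replace = TRUE)

# anyDuplicated() returns the position of the first repeated value, or 0 if none
shared <- anyDuplicated(bdays) > 0
shared

# Same answer as comparing the number of birthdays with the number of distinct birthdays
shared == (length(bdays) != length(unique(bdays)))
```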
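# Every combination of room size (1-365) and simulation number (1-100)
design.matrix <- expand.grid(
  "people.in.room" = 1:365,
  "simulation" = 1:100,
  "result" = NA)

for (row.i in seq_len(nrow(design.matrix))) {
  # Number of people in the room for this row
  people.i <- design.matrix$people.in.room[row.i]
  # Draw one birthday (day 1-365) per person
  bdays <- sample(x = 1:365, size = people.i, replace = TRUE)
  # TRUE if at least two people share a birthday
  result <- length(bdays) != length(unique(bdays))
  design.matrix$result[row.i] <- result
}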
aggregate(result~people.in.room, data = design.matrix, FUN = mean)
## people.in.room result
## 1 1 0.00
## 2 2 0.00
## 3 3 0.00
## 4 4 0.01
## 5 5 0.01
## 6 6 0.03
## 7 7 0.06
## 8 8 0.06
## 9 9 0.07
## 10 10 0.08
## 11 11 0.15
## 12 12 0.18
## 13 13 0.20
## 14 14 0.15
## 15 15 0.25
## 16 16 0.38
## 17 17 0.24
## 18 18 0.45
## 19 19 0.32
## 20 20 0.47
## 21 21 0.45
## 22 22 0.59
## 23 23 0.51
## 24 24 0.58
## 25 25 0.57
## 26 26 0.57
## 27 27 0.65
## 28 28 0.65
## 29 29 0.66
## 30 30 0.73
## 31 31 0.75
## 32 32 0.79
## 33 33 0.79
## 34 34 0.82
## 35 35 0.78
## 36 36 0.78
## 37 37 0.84
## 38 38 0.84
## 39 39 0.89
## 40 40 0.92
## 41 41 0.87
## 42 42 0.87
## 43 43 0.91
## 44 44 0.95
## 45 45 0.95
## 46 46 0.94
## 47 47 0.92
## 48 48 0.97
## 49 49 0.97
## 50 50 0.99
## 51 51 1.00
## 52 52 0.97
## 53 53 0.99
## 54 54 0.99
## 55 55 1.00
## 56 56 0.99
## 57 57 1.00
## 58 58 0.99
## 59 59 0.99
## 60 60 1.00
## 61 61 1.00
## 62 62 0.98
## 63 63 1.00
## 64 64 0.99
## 65 65 1.00
## 66 66 1.00
## 67 67 1.00
## 68 68 1.00
## 69 69 1.00
## 70 70 1.00
## 71 71 1.00
## 72 72 1.00
## 73 73 1.00
## 74 74 1.00
## 75 75 1.00
## 76 76 1.00
## 77 77 1.00
## 78 78 1.00
## 79 79 1.00
## 80 80 1.00
## 81 81 1.00
## 82 82 1.00
## 83 83 1.00
## 84 84 1.00
## 85 85 1.00
## 86 86 1.00
## 87 87 1.00
## 88 88 1.00
## 89 89 1.00
## 90 90 1.00
## 91 91 1.00
## 92 92 1.00
## 93 93 1.00
## 94 94 1.00
## 95 95 1.00
## 96 96 1.00
## 97 97 1.00
## 98 98 1.00
## 99 99 1.00
## 100 100 1.00
## 101 101 1.00
## 102 102 1.00
## 103 103 1.00
## 104 104 1.00
## 105 105 1.00
## 106 106 1.00
## 107 107 1.00
## 108 108 1.00
## 109 109 1.00
## 110 110 1.00
## 111 111 1.00
## 112 112 1.00
## 113 113 1.00
## 114 114 1.00
## 115 115 1.00
## 116 116 1.00
## 117 117 1.00
## 118 118 1.00
## 119 119 1.00
## 120 120 1.00
## 121 121 1.00
## 122 122 1.00
## 123 123 1.00
## 124 124 1.00
## 125 125 1.00
## 126 126 1.00
## 127 127 1.00
## 128 128 1.00
## 129 129 1.00
## 130 130 1.00
## 131 131 1.00
## 132 132 1.00
## 133 133 1.00
## 134 134 1.00
## 135 135 1.00
## 136 136 1.00
## 137 137 1.00
## 138 138 1.00
## 139 139 1.00
## 140 140 1.00
## 141 141 1.00
## 142 142 1.00
## 143 143 1.00
## 144 144 1.00
## 145 145 1.00
## 146 146 1.00
## 147 147 1.00
## 148 148 1.00
## 149 149 1.00
## 150 150 1.00
## 151 151 1.00
## 152 152 1.00
## 153 153 1.00
## 154 154 1.00
## 155 155 1.00
## 156 156 1.00
## 157 157 1.00
## 158 158 1.00
## 159 159 1.00
## 160 160 1.00
## 161 161 1.00
## 162 162 1.00
## 163 163 1.00
## 164 164 1.00
## 165 165 1.00
## 166 166 1.00
## 167 167 1.00
## 168 168 1.00
## 169 169 1.00
## 170 170 1.00
## 171 171 1.00
## 172 172 1.00
## 173 173 1.00
## 174 174 1.00
## 175 175 1.00
## 176 176 1.00
## 177 177 1.00
## 178 178 1.00
## 179 179 1.00
## 180 180 1.00
## 181 181 1.00
## 182 182 1.00
## 183 183 1.00
## 184 184 1.00
## 185 185 1.00
## 186 186 1.00
## 187 187 1.00
## 188 188 1.00
## 189 189 1.00
## 190 190 1.00
## 191 191 1.00
## 192 192 1.00
## 193 193 1.00
## 194 194 1.00
## 195 195 1.00
## 196 196 1.00
## 197 197 1.00
## 198 198 1.00
## 199 199 1.00
## 200 200 1.00
## 201 201 1.00
## 202 202 1.00
## 203 203 1.00
## 204 204 1.00
## 205 205 1.00
## 206 206 1.00
## 207 207 1.00
## 208 208 1.00
## 209 209 1.00
## 210 210 1.00
## 211 211 1.00
## 212 212 1.00
## 213 213 1.00
## 214 214 1.00
## 215 215 1.00
## 216 216 1.00
## 217 217 1.00
## 218 218 1.00
## 219 219 1.00
## 220 220 1.00
## 221 221 1.00
## 222 222 1.00
## 223 223 1.00
## 224 224 1.00
## 225 225 1.00
## 226 226 1.00
## 227 227 1.00
## 228 228 1.00
## 229 229 1.00
## 230 230 1.00
## 231 231 1.00
## 232 232 1.00
## 233 233 1.00
## 234 234 1.00
## 235 235 1.00
## 236 236 1.00
## 237 237 1.00
## 238 238 1.00
## 239 239 1.00
## 240 240 1.00
## 241 241 1.00
## 242 242 1.00
## 243 243 1.00
## 244 244 1.00
## 245 245 1.00
## 246 246 1.00
## 247 247 1.00
## 248 248 1.00
## 249 249 1.00
## 250 250 1.00
## 251 251 1.00
## 252 252 1.00
## 253 253 1.00
## 254 254 1.00
## 255 255 1.00
## 256 256 1.00
## 257 257 1.00
## 258 258 1.00
## 259 259 1.00
## 260 260 1.00
## 261 261 1.00
## 262 262 1.00
## 263 263 1.00
## 264 264 1.00
## 265 265 1.00
## 266 266 1.00
## 267 267 1.00
## 268 268 1.00
## 269 269 1.00
## 270 270 1.00
## 271 271 1.00
## 272 272 1.00
## 273 273 1.00
## 274 274 1.00
## 275 275 1.00
## 276 276 1.00
## 277 277 1.00
## 278 278 1.00
## 279 279 1.00
## 280 280 1.00
## 281 281 1.00
## 282 282 1.00
## 283 283 1.00
## 284 284 1.00
## 285 285 1.00
## 286 286 1.00
## 287 287 1.00
## 288 288 1.00
## 289 289 1.00
## 290 290 1.00
## 291 291 1.00
## 292 292 1.00
## 293 293 1.00
## 294 294 1.00
## 295 295 1.00
## 296 296 1.00
## 297 297 1.00
## 298 298 1.00
## 299 299 1.00
## 300 300 1.00
## 301 301 1.00
## 302 302 1.00
## 303 303 1.00
## 304 304 1.00
## 305 305 1.00
## 306 306 1.00
## 307 307 1.00
## 308 308 1.00
## 309 309 1.00
## 310 310 1.00
## 311 311 1.00
## 312 312 1.00
## 313 313 1.00
## 314 314 1.00
## 315 315 1.00
## 316 316 1.00
## 317 317 1.00
## 318 318 1.00
## 319 319 1.00
## 320 320 1.00
## 321 321 1.00
## 322 322 1.00
## 323 323 1.00
## 324 324 1.00
## 325 325 1.00
## 326 326 1.00
## 327 327 1.00
## 328 328 1.00
## 329 329 1.00
## 330 330 1.00
## 331 331 1.00
## 332 332 1.00
## 333 333 1.00
## 334 334 1.00
## 335 335 1.00
## 336 336 1.00
## 337 337 1.00
## 338 338 1.00
## 339 339 1.00
## 340 340 1.00
## 341 341 1.00
## 342 342 1.00
## 343 343 1.00
## 344 344 1.00
## 345 345 1.00
## 346 346 1.00
## 347 347 1.00
## 348 348 1.00
## 349 349 1.00
## 350 350 1.00
## 351 351 1.00
## 352 352 1.00
## 353 353 1.00
## 354 354 1.00
## 355 355 1.00
## 356 356 1.00
## 357 357 1.00
## 358 358 1.00
## 359 359 1.00
## 360 360 1.00
## 361 361 1.00
## 362 362 1.00
## 363 363 1.00
## 364 364 1.00
## 365 365 1.00
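# With 23 or more people in the room, the probability that at least two share a birthday is greater than 0.50.
# With only 100 simulations per room size the estimates are noisy (here 22 people gave 0.59 and 23 gave 0.51), but the exact probability first passes 0.50 at 23 people (about 0.507).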
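The exact probability gives a check on the noisy simulated proportions. Assuming 365 equally likely birthdays and ignoring leap days, the chance that n people all have different birthdays is the product of (365 - k)/365 for k = 0, 1, …, n - 1, and the probability of at least one shared birthday is one minus that product. A minimal sketch (the function name `p.shared` is illustrative):

```r
# Exact probability that at least two of n people share a birthday,
# assuming 365 equally likely birthdays (leap days ignored)
p.shared <- function(n, days = 365) {
  1 - prod((days - 0:(n - 1)) / days)
}

people <- 1:365
p.exact <- sapply(people, p.shared)

# Smallest room size whose probability exceeds 0.50
min.people <- people[which(p.exact > 0.5)[1]]
min.people                 # 23
round(p.exact[22:23], 3)   # 0.476 0.507
```

Base R's stats package also provides `pbirthday()` and `qbirthday()` for this calculation; `qbirthday(prob = 0.5)` should likewise return 23.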