R markdown is a plain-text file format for integrating text and R code, and creating transparent, reproducible and interactive reports. An R markdown file (.Rmd) contains metadata, markdown and R code “chunks,” and can be “knit” into numerous output types. Answer the test questions by adding R code to the fenced code areas below each item. Some questions also require a written answer. Enter your comments in the space provided as shown below:
Answer: (Enter your answer here.)
Once completed, you will “knit” and submit the resulting .html document and the .Rmd file. The .html will present the output of your R code and your written answers, but your R code will not appear. Your R code will appear in the .Rmd file. The resulting .html document will be graded. Points assigned to each item appear in this template.
Before proceeding, look to the top of the .Rmd for the (YAML) metadata block, where the title, author and output are given. Please change author to include your name, with the format ‘lastName, firstName.’
If you encounter issues with knitting the .html, please send an email via Canvas to your TA.
Each code chunk is delineated by six (6) backticks; three (3) at the start and three (3) at the end. After the opening ticks, arguments are passed to the code chunk in curly brackets. Please do not add or remove backticks, or modify the arguments or values inside the curly brackets. An example code chunk is included here:
# Comments are included in each code chunk, simply as prompts
#...R code placed here
#...R code placed here
R code only needs to be added inside the code chunks for each assignment item. However, there are questions that follow many assignment items. Enter your answers in the space provided. An example showing how to use the template and respond to a question follows.
Example Problem with Solution:
Use rbinom() to generate two random samples of size 10,000 from the binomial distribution. For the first sample, use p = 0.45 and n = 10. For the second sample, use p = 0.55 and n = 10. Convert the sample frequencies to sample proportions and compute the mean number of successes for each sample. Present these statistics.
set.seed(123)
sample.one <- table(rbinom(10000, 10, 0.45)) / 10000
sample.two <- table(rbinom(10000, 10, 0.55)) / 10000
successes <- seq(0, 10)
round(sum(sample.one*successes), digits = 1) # [1] 4.5
## [1] 4.5
round(sum(sample.two*successes), digits = 1) # [1] 5.5
## [1] 5.5
Question: How do the simulated expectations compare to calculated binomial expectations?
Answer: The calculated binomial expectations are 10(0.45) = 4.5 and 10(0.55) = 5.5. After rounding the simulated results, the same values are obtained.
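The calculated expectations in this answer can also be checked directly from the binomial probability mass function. The sketch below is one way to do so; it uses dbinom(), which the example above does not use, and computes the sum of k times P(X = k) for comparison with n times p.

```r
# Exact expectation of a binomial(n = 10, p) variable, computed from its
# probability mass function rather than from a simulated sample.
successes <- 0:10
exact.one <- sum(successes * dbinom(successes, size = 10, prob = 0.45))
exact.two <- sum(successes * dbinom(successes, size = 10, prob = 0.55))

# Both agree with n * p up to floating-point error.
all.equal(exact.one, 10 * 0.45)
all.equal(exact.two, 10 * 0.55)
```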
Submit both the .Rmd and .html files for grading. You may remove the instructions and example problem above, but do not remove the YAML metadata block or the first, “setup” code chunk. Address the steps that appear below and answer all the questions. Be sure to address each question with code and comments as needed. You may use either base R functions or ggplot2 for the visualizations.
The following code chunk will:
Do not include package installation code in this document. Packages should be installed via the Console or ‘Packages’ tab. You will also need to download the abalones.csv from the course site to a known location on your machine. Unless a file.path() is specified, R will look in the directory where this .Rmd is stored when knitting.
## 'data.frame': 1036 obs. of 8 variables:
## $ SEX : Factor w/ 3 levels "F","I","M": 2 2 2 2 2 2 2 2 2 2 ...
## $ LENGTH: num 5.57 3.67 10.08 4.09 6.93 ...
## $ DIAM : num 4.09 2.62 7.35 3.15 4.83 ...
## $ HEIGHT: num 1.26 0.84 2.205 0.945 1.785 ...
## $ WHOLE : num 11.5 3.5 79.38 4.69 21.19 ...
## $ SHUCK : num 4.31 1.19 44 2.25 9.88 ...
## $ RINGS : int 6 4 6 3 6 6 5 6 5 6 ...
## $ CLASS : Factor w/ 5 levels "A1","A2","A3",..: 1 1 1 1 1 1 1 1 1 1 ...
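The summary in (1)(a) lists two columns, VOLUME and RATIO, that do not appear in the str() output above. The values printed later in this document are consistent with VOLUME being the product LENGTH * DIAM * HEIGHT and RATIO being SHUCK / VOLUME. The sketch below checks that reading on two rows copied from the sample shown in (1)(c). The data frame name `demo` is hypothetical, and how the real file is loaded (for example with read.csv()) is an assumption.

```r
# Two rows copied from the sample printed in (1)(c); `demo` is a
# hypothetical stand-in for mydata, which would normally be read from
# abalones.csv (for example with read.csv()).
demo <- data.frame(
  LENGTH = c(11.025, 8.190),
  DIAM   = c(9.030, 6.300),
  HEIGHT = c(2.835, 2.100),
  SHUCK  = c(54.603125, 13.812500)
)

# Derived variables, assuming VOLUME is the product of the three
# dimensions and RATIO is shuck weight per unit volume.
demo$VOLUME <- demo$LENGTH * demo$DIAM * demo$HEIGHT
demo$RATIO  <- demo$SHUCK / demo$VOLUME
demo
```

The computed values can be compared with the VOLUME and RATIO columns printed for rows 415 and 179 of the sample.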
(1)(a) (1 point) Use summary() to obtain and present descriptive statistics from mydata. Use table() to present a frequency table using CLASS and RINGS. There should be 115 cells in the table you present.
## SEX LENGTH DIAM HEIGHT WHOLE
## F:326 Min. : 2.73 Min. : 1.995 Min. :0.525 Min. : 1.625
## I:329 1st Qu.: 9.45 1st Qu.: 7.350 1st Qu.:2.415 1st Qu.: 56.484
## M:381 Median :11.45 Median : 8.925 Median :2.940 Median :101.344
## Mean :11.08 Mean : 8.622 Mean :2.947 Mean :105.832
## 3rd Qu.:13.02 3rd Qu.:10.185 3rd Qu.:3.570 3rd Qu.:150.319
## Max. :16.80 Max. :13.230 Max. :4.935 Max. :315.750
## SHUCK RINGS CLASS VOLUME
## Min. : 0.5625 Min. : 3.000 A1:108 Min. : 3.612
## 1st Qu.: 23.3006 1st Qu.: 8.000 A2:236 1st Qu.:163.545
## Median : 42.5700 Median : 9.000 A3:329 Median :307.363
## Mean : 45.4396 Mean : 9.993 A4:188 Mean :326.804
## 3rd Qu.: 64.2897 3rd Qu.:11.000 A5:175 3rd Qu.:463.264
## Max. :157.0800 Max. :25.000 Max. :995.673
## RATIO
## Min. :0.06734
## 1st Qu.:0.12241
## Median :0.13914
## Mean :0.14205
## 3rd Qu.:0.15911
## Max. :0.31176
##
## 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## A1 9 8 24 67 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## A2 0 0 0 0 91 145 0 0 0 0 0 0 0 0 0 0 0 0
## A3 0 0 0 0 0 0 182 147 0 0 0 0 0 0 0 0 0 0
## A4 0 0 0 0 0 0 0 0 125 63 0 0 0 0 0 0 0 0
## A5 0 0 0 0 0 0 0 0 0 0 48 35 27 15 13 8 8 6
##
## 21 22 23 24 25
## A1 0 0 0 0 0
## A2 0 0 0 0 0
## A3 0 0 0 0 0
## A4 0 0 0 0 0
## A5 4 1 7 2 1
Question (1 point): Briefly discuss the variable types and distributional implications such as potential skewness and outliers.
Answer: The data contain numeric, integer and factor variables. The summary gives a first idea of how each variable is distributed. At first glance, WHOLE, SHUCK, RINGS and VOLUME have maximum values substantially above their means and medians. These variables may contain outliers that contribute to a right skew, which is worth investigating.
(1)(b) (1 point) Generate a table of counts using SEX and CLASS. Add margins to this table (Hint: There should be 15 cells in this table plus the marginal totals. Apply table() first, then pass the table object to addmargins() (Kabacoff Section 7.2 pages 144-147)). Lastly, present a barplot of these data, ignoring the marginal totals.
##
## A1 A2 A3 A4 A5
## F 5 41 121 82 77
## I 91 133 65 21 19
## M 12 62 143 85 79
##
## A1 A2 A3 A4 A5 Sum
## F 5 41 121 82 77 326
## I 91 133 65 21 19 329
## M 12 62 143 85 79 381
## Sum 108 236 329 188 175 1036
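The two tables above can be reproduced from the printed counts alone. The following is a minimal sketch, assuming the counts are typed in by hand rather than computed from mydata; with the real data the equivalent call would be addmargins(table(mydata$SEX, mydata$CLASS)). The object names are hypothetical.

```r
# Counts copied from the SEX by CLASS table printed above.
counts <- matrix(
  c( 5,  41, 121, 82, 77,
    91, 133,  65, 21, 19,
    12,  62, 143, 85, 79),
  nrow = 3, byrow = TRUE,
  dimnames = list(SEX = c("F", "I", "M"),
                  CLASS = c("A1", "A2", "A3", "A4", "A5"))
)
tab <- as.table(counts)

# addmargins() appends row and column totals labeled "Sum".
with.margins <- addmargins(tab)
with.margins

# Grouped bars: one cluster per CLASS, one bar per SEX. The table
# without margins is passed so the totals are not drawn.
barplot(tab, beside = TRUE, legend.text = rownames(tab),
        xlab = "CLASS", ylab = "Frequency",
        main = "Abalone counts by SEX and CLASS")
```

Passing the table without margins to barplot() is what keeps the "Sum" row and column out of the chart.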
Essay Question (2 points): Discuss the sex distribution of abalones. What stands out about the distribution of abalones by CLASS?
Answer: The graph above shows the distribution of abalones by sex and by class. CLASS groups the abalones by number of rings, with A1 being the youngest. As expected, infants are concentrated in A1 and A2, which can be seen in the graph. Most females and males fall in the middle of the distribution, and both sexes have roughly the same shape, with males slightly outnumbering females. One notable feature is that a number of abalones labeled as infants fall into the older classes (A4 and A5) based on their number of rings.
(1)(c) (1 point) Select a simple random sample of 200 observations from “mydata” and identify this sample as “work.” Use set.seed(123) prior to drawing this sample. Do not change the number 123. Note that sample() “takes a sample of the specified size from the elements of x.” We cannot sample directly from “mydata.” Instead, we need to sample from the integers, 1 to 1036, representing the rows of “mydata.” Then, select those rows from the data frame (Kabacoff Section 4.10.5 page 87).
Using “work”, construct a scatterplot matrix of variables 2-6 with plot(work[, 2:6]) (these are the continuous variables excluding VOLUME and RATIO). The sample “work” will not be used in the remainder of the assignment.
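The row-index sampling described in (1)(c) can be sketched as follows. The data frame `mydata.demo` is a hypothetical stand-in with the same number of rows as mydata; only the row count matters for illustrating the indexing, and the columns are made up.

```r
# Hypothetical stand-in with the same number of rows as mydata.
mydata.demo <- data.frame(id = 1:1036, x = seq(0, 1, length.out = 1036))

set.seed(123)
# Draw 200 distinct row numbers; sample() is without replacement by default.
rows <- sample(seq_len(nrow(mydata.demo)), size = 200)

# Select those rows from the data frame.
work.demo <- mydata.demo[rows, ]
dim(work.demo)
```

With the real data, the same two steps applied to mydata give “work,” and plot(work[, 2:6]) then draws the scatterplot matrix.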
## SEX LENGTH DIAM HEIGHT WHOLE SHUCK RINGS CLASS VOLUME
## 415 F 11.0250 9.03000 2.83500 105.437500 54.603125 9 A3 282.240551
## 463 F 11.7600 9.24000 2.83500 100.312500 44.187500 9 A3 308.057904
## 179 I 8.1900 6.30000 2.10000 33.312500 13.812500 7 A2 108.353700
## 526 F 13.5450 10.71000 4.20000 199.856250 79.953750 12 A4 609.281190
## 195 I 9.9750 7.56000 2.62500 61.312500 25.625000 8 A2 197.953875
## 938 M 13.3350 10.50000 3.46500 162.307500 80.053750 12 A4 485.160638
## 665 M 5.6700 4.09500 1.68000 12.500000 4.812500 6 A1 39.007332
## 602 F 12.3900 9.34500 2.73000 141.562500 48.768750 13 A5 316.091822
## 709 M 10.9200 8.71500 3.67500 94.125000 31.927500 8 A2 349.741665
## 1011 M 11.1300 8.71500 2.73000 105.312500 34.375000 20 A5 264.804403
## 953 M 13.1250 10.29000 3.46500 142.353750 59.963750 11 A4 467.969906
## 348 F 11.5500 9.03000 3.15000 105.000000 49.375000 8 A2 328.533975
## 1017 M 12.9150 10.08000 3.99000 170.000000 66.312500 18 A5 519.430968
## 649 F 13.0200 10.92000 4.72500 147.937500 48.195000 23 A5 671.792940
## 989 M 13.4400 11.02500 3.88500 213.375000 94.421250 13 A5 575.663760
## 355 F 12.9150 10.08000 3.36000 156.562500 73.125000 8 A2 437.415552
## 840 M 12.9150 9.97500 3.57000 141.125000 59.338125 10 A3 459.912836
## 26 I 3.4650 2.52000 0.63000 2.687500 0.875000 3 A1 5.501034
## 519 F 13.7550 10.60500 4.09500 183.663750 88.580000 11 A4 597.344919
## 426 F 13.1250 10.08000 3.57000 169.062500 78.716875 10 A3 472.311000
## 1023 M 12.1800 9.66000 3.46500 153.437500 59.125000 16 A5 407.687742
## 766 M 12.3900 9.97500 3.46500 134.625000 56.244375 9 A3 428.240216
## 211 I 9.5550 7.03500 2.20500 50.687500 21.875000 8 A2 148.218832
## 932 M 13.1250 10.39500 3.88500 176.396250 87.036250 11 A4 530.047547
## 590 F 12.1800 9.55500 3.57000 113.437500 47.685000 13 A5 415.476243
## 593 F 13.0200 10.71000 4.30500 168.437500 60.881250 14 A5 600.307281
## 555 F 11.3400 8.82000 2.94000 102.637500 47.508750 11 A4 294.055272
## 871 M 9.6600 7.35000 2.52000 64.375000 27.720000 10 A3 178.922520
## 373 F 12.1800 9.13500 3.15000 104.250000 53.500000 8 A2 350.482545
## 844 M 12.6000 9.66000 3.15000 155.875000 66.020625 9 A3 383.405400
## 143 I 11.1300 8.82000 2.52000 74.562500 31.937500 7 A2 247.379832
## 544 F 9.2400 7.45500 2.41500 52.912500 20.406875 11 A4 166.355343
## 490 F 14.4900 12.18000 4.09500 207.250000 89.385000 10 A3 722.719179
## 621 F 13.4400 11.02500 4.51500 222.375000 57.821250 22 A5 669.014640
## 775 M 9.6600 7.66500 2.62500 58.375000 23.450625 10 A3 194.365238
## 905 M 14.2800 11.34000 3.99000 206.932500 87.771250 12 A4 646.121448
## 937 M 14.2800 11.44500 3.88500 213.180000 86.668750 11 A4 634.943421
## 842 M 12.3900 10.18500 3.25500 134.812500 56.120625 9 A3 410.755448
## 23 I 7.1400 5.35500 1.57500 22.500000 9.312500 6 A1 60.219653
## 923 M 12.8100 10.18500 3.67500 158.673750 66.640000 12 A4 479.476699
## 956 M 11.7600 9.66000 4.93500 107.036250 40.731250 12 A4 560.623896
## 309 I 11.8650 9.03000 2.83500 106.812500 37.737563 11 A4 303.744593
## 135 I 7.9800 5.98500 1.99500 30.375000 11.187500 7 A2 95.281799
## 821 M 12.2850 10.08000 3.88500 130.000000 53.707500 10 A3 481.090428
## 997 M 12.2850 8.50500 3.15000 157.062500 54.375000 15 A5 329.124364
## 224 I 10.7100 8.29500 2.20500 69.062500 29.250000 8 A2 195.890987
## 166 I 8.9250 6.51000 1.99500 43.812500 20.562500 8 A2 115.912991
## 217 I 9.5550 9.13500 2.31000 53.312500 24.375000 8 A2 201.628177
## 290 I 10.9200 8.61000 3.04500 80.750000 36.321250 9 A3 286.294554
## 581 F 13.2300 9.97500 3.67500 177.875000 52.976250 14 A5 484.986994
## 72 I 3.3600 2.31000 0.52500 2.250000 0.812500 3 A1 4.074840
## 588 F 15.4350 11.86500 4.72500 254.625000 110.925000 13 A5 865.318899
## 575 F 14.0700 11.55000 3.99000 177.288750 69.846875 12 A4 648.408915
## 141 I 5.5650 4.09500 1.15500 10.500000 4.562500 7 A2 26.320920
## 722 M 11.4450 9.45000 3.15000 97.562500 46.963125 8 A2 340.689037
## 865 M 11.1300 8.71500 2.52000 88.250000 41.518125 9 A3 244.434834
## 859 M 12.6000 10.08000 3.46500 114.562500 51.170625 9 A3 440.082720
## 153 I 10.1850 7.87500 2.94000 65.125000 25.000000 8 A2 235.808213
## 294 I 8.7150 6.82500 2.41500 41.062500 16.517531 12 A4 143.643898
## 277 I 11.7600 8.92500 2.83500 102.562500 45.508750 9 A3 297.555930
## 1035 M 12.3900 10.50000 4.20000 148.375000 51.500000 16 A5 546.399000
## 41 I 6.3000 4.62000 1.36500 15.437500 7.375000 5 A1 39.729690
## 431 F 13.6500 10.29000 3.25500 140.250000 68.806250 9 A3 457.192417
## 90 I 4.5150 3.57000 1.15500 7.562500 2.562500 6 A1 18.616925
## 316 I 11.0292 8.48400 3.07545 82.500000 31.072125 13 A5 287.775186
## 223 I 7.8750 5.98500 1.89000 32.125000 13.062500 7 A2 89.079244
## 528 F 12.6000 9.76500 3.36000 144.457500 59.997500 11 A4 413.411040
## 116 I 10.9200 8.61000 2.52000 74.375000 29.812500 8 A2 236.933424
## 606 F 11.3400 9.13500 3.67500 111.500000 41.055000 13 A5 380.696557
## 774 M 7.2450 5.67000 1.99500 24.625000 8.229375 9 A3 81.952904
## 747 M 13.5450 10.71000 4.09500 153.250000 72.826875 10 A3 594.049160
## 456 F 12.0750 9.97500 3.36000 111.875000 45.513125 9 A3 404.705700
## 598 F 9.7650 7.98000 2.83500 72.375000 26.520000 14 A5 220.916525
## 854 M 12.8100 9.87000 3.46500 131.500000 61.627500 9 A3 438.096235
## 39 I 5.7750 4.62000 1.68000 17.062500 7.062500 6 A1 44.823240
## 159 I 11.0250 8.40000 2.73000 80.687500 40.625000 8 A2 252.825300
## 752 M 11.7600 9.76500 3.04500 110.937500 41.394375 9 A3 349.676838
## 209 I 8.7150 6.82500 2.10000 41.687500 18.062500 7 A2 124.907738
## 374 F 10.6050 8.19000 2.41500 82.500000 38.062500 8 A2 209.754704
## 818 M 13.6500 10.92000 3.25500 171.000000 76.539375 9 A3 485.183790
## 34 I 7.6650 5.67000 1.78500 24.625000 10.187500 6 A1 77.577082
## 516 F 14.2800 10.81500 3.67500 206.358750 65.984375 12 A4 567.560385
## 13 I 4.5150 3.25500 1.26000 6.759375 2.625000 5 A1 18.517369
## 69 I 6.3000 4.83000 1.57500 15.875000 6.500000 6 A1 47.925675
## 895 M 10.1850 8.08500 2.62500 60.881250 24.500000 12 A4 216.157528
## 755 M 7.8750 5.88000 1.99500 27.812500 10.828125 10 A3 92.378475
## 409 F 10.7100 8.40000 2.52000 87.562500 43.808750 10 A3 226.709280
## 308 I 11.8650 9.24000 3.67500 109.187500 48.670875 11 A4 402.899805
## 278 I 10.5000 7.98000 2.83500 74.250000 36.076250 9 A3 237.544650
## 89 I 3.3600 2.31000 0.52500 2.437500 0.937500 4 A1 4.074840
## 928 M 15.5400 12.49500 3.99000 296.246250 140.813750 11 A4 774.747477
## 537 F 13.5450 10.60500 3.46500 168.045000 70.812500 11 A4 497.728972
## 291 I 11.5500 9.34500 3.04500 97.875000 35.797781 11 A4 328.661314
## 424 F 12.4950 9.76500 3.15000 134.562500 61.988750 9 A3 384.343076
## 880 M 13.0200 9.76500 3.99000 171.041250 69.886250 11 A4 507.289797
## 286 I 12.3900 9.34500 2.83500 96.437500 40.180000 9 A3 328.249199
## 908 M 13.4400 10.60500 3.78000 165.367500 72.275000 11 A4 538.767936
## 671 M 8.8200 7.14000 2.41500 52.687500 21.656250 8 A2 152.084142
## 121 I 8.1900 5.88000 1.89000 26.875000 10.562500 8 A2 91.017108
## 110 I 10.7100 8.08500 3.04500 95.812500 49.812500 8 A2 263.667616
## 158 I 7.2450 5.67000 2.31000 26.687500 10.250000 7 A2 94.892837
## 64 I 8.8200 6.61500 2.10000 42.937500 19.625000 6 A1 122.523030
## 483 F 13.0200 9.97500 3.36000 165.562500 86.670625 9 A3 436.378320
## 910 M 15.7500 11.55000 3.78000 241.357500 115.395000 11 A4 687.629250
## 477 F 13.6500 10.50000 3.99000 183.000000 80.989375 9 A3 571.866750
## 480 F 12.7050 9.55500 3.04500 122.187500 59.085000 9 A3 369.651657
## 711 M 8.8200 7.03500 2.41500 46.125000 21.161250 8 A2 149.847611
## 67 I 5.0400 3.67500 0.94500 9.656250 3.937500 5 A1 17.503290
## 663 M 10.5000 8.19000 2.83500 82.437500 39.312500 6 A1 243.795825
## 890 M 11.8650 9.24000 2.41500 117.108750 49.490000 11 A4 264.762729
## 847 M 11.4450 8.61000 2.94000 92.125000 43.188750 9 A3 289.711863
## 85 I 6.1950 4.83000 1.68000 20.312500 8.125000 5 A1 50.268708
## 165 I 10.5000 8.08500 2.52000 70.000000 35.437500 8 A2 213.929100
## 648 F 13.0200 9.87000 4.72500 139.375000 48.195000 15 A5 607.197465
## 51 I 7.6650 5.67000 1.78500 23.437500 10.125000 6 A1 77.577082
## 74 I 7.9800 5.88000 1.78500 34.187500 14.375000 6 A1 83.756484
## 178 I 10.1850 8.08500 2.73000 74.996875 31.312500 7 A2 224.803829
## 362 F 12.8100 9.87000 3.36000 134.312500 61.562500 8 A2 424.820592
## 236 I 11.0250 8.40000 3.04500 76.187500 30.380000 9 A3 281.997450
## 610 F 9.1350 7.35000 2.31000 48.000000 18.232500 13 A5 155.098597
## 330 F 9.7650 7.35000 2.62500 60.250000 28.750000 6 A1 188.403469
## 726 M 8.8200 7.24500 2.20500 53.750000 21.656250 7 A2 140.901485
## 127 I 9.0300 6.61500 2.41500 48.000000 23.562500 8 A2 144.256282
## 212 I 7.4550 5.46000 1.89000 24.812500 8.937500 7 A2 76.931127
## 686 M 12.0750 9.45000 3.46500 120.687500 61.627500 8 A2 395.386819
## 785 M 11.9700 9.45000 3.25500 149.375000 69.609375 10 A3 368.194208
## 814 M 13.1250 10.39500 3.25500 128.125000 56.925000 9 A3 444.093891
## 310 I 13.0200 9.87000 3.25500 120.750000 52.550438 11 A4 418.291587
## 744 M 11.7600 9.03000 3.04500 112.437500 57.420000 9 A3 323.357076
## 878 M 15.1200 11.76000 3.78000 202.278750 84.647500 11 A4 672.126336
## 243 I 8.5050 6.61500 2.20500 43.375000 19.661250 9 A3 124.054568
## 862 M 8.9250 6.82500 2.52000 46.937500 17.572500 9 A3 153.501075
## 926 M 9.0300 7.24500 2.41500 38.823750 11.331250 11 A4 157.994975
## 792 M 10.0800 7.87500 3.04500 97.125000 26.730000 9 A3 241.712100
## 113 I 12.0750 9.03000 2.73000 92.812500 36.187500 8 A2 297.671692
## 619 F 13.9650 11.23500 3.99000 187.000000 73.631250 17 A5 626.018132
## 1013 M 11.4450 8.61000 3.04500 109.125000 37.937500 18 A5 300.058715
## 151 I 9.1350 6.61500 2.31000 46.062500 20.187500 7 A2 139.588738
## 666 M 9.4500 7.03500 2.62500 58.259375 27.062500 6 A1 174.511969
## 614 F 14.1750 11.65500 4.30500 240.625000 90.907500 13 A5 711.227436
## 767 M 12.2850 9.97500 3.15000 133.125000 65.773125 10 A3 386.010056
## 160 I 8.9250 6.61500 1.99500 45.937500 23.312500 7 A2 117.782556
## 391 F 11.1300 8.61000 3.04500 106.572812 47.343750 9 A3 291.800219
## 155 I 11.3400 8.19000 2.62500 78.187500 31.562500 8 A2 243.795825
## 1024 M 14.1750 11.65500 4.20000 179.812500 68.125000 21 A5 693.880425
## 5 I 6.9300 4.83000 1.78500 21.187500 9.875000 6 A1 59.747341
## 326 I 11.0292 8.59005 2.96940 72.187500 23.765000 15 A5 281.325052
## 784 M 13.1250 9.55500 3.57000 135.250000 61.318125 9 A3 447.711469
## 280 I 12.8100 9.76500 3.15000 120.062500 55.063750 9 A3 394.032398
## 800 M 9.8700 7.56000 2.83500 62.625000 20.604375 10 A3 211.539762
## 789 M 13.5450 10.50000 4.09500 175.125000 76.291875 10 A3 582.401138
## 567 F 13.3350 10.18500 3.46500 161.861250 72.550625 11 A4 470.605818
## 843 M 7.9800 6.09000 2.52000 35.375000 14.540625 9 A3 122.467464
## 238 I 10.8150 8.29500 2.62500 72.562500 28.971250 9 A3 235.489866
## 764 M 12.0750 9.34500 3.36000 104.875000 49.561875 9 A3 379.145340
## 339 F 13.3350 10.50000 3.99000 161.250000 74.125000 8 A2 558.669825
## 962 M 15.7500 11.65500 4.51500 275.125000 131.360625 13 A5 828.801619
## 822 M 12.4950 9.87000 3.25500 150.187500 60.885000 10 A3 401.424991
## 137 I 8.2950 6.51000 1.78500 39.625000 19.125000 7 A2 96.390803
## 455 F 12.7050 9.55500 3.04500 107.750000 42.167500 9 A3 369.651657
## 738 M 11.0250 8.50500 2.83500 94.687500 40.899375 10 A3 265.831217
## 560 F 14.8050 11.76000 3.57000 185.831250 78.151250 11 A4 621.561276
## 589 F 10.7100 8.19000 2.20500 76.500000 23.842500 13 A5 193.411355
## 83 I 9.8700 7.35000 2.62500 53.937500 23.750000 6 A1 190.429312
## 696 M 9.5550 7.35000 2.31000 57.250000 24.750000 8 A2 162.229568
## 942 M 13.7550 10.92000 3.78000 190.230000 88.016250 11 A4 567.773388
## 196 I 9.4500 7.35000 2.73000 68.375000 30.625000 8 A2 189.618975
## 769 M 7.1400 5.56500 1.78500 24.205000 9.528750 10 A3 70.925368
## 680 M 11.2350 8.50500 2.94000 91.437500 41.580000 7 A2 280.927804
## 941 M 13.7550 10.71000 4.51500 227.396250 108.841250 11 A4 665.131966
## 968 M 13.0200 10.18500 3.25500 128.687500 52.593750 13 A5 431.641318
## 500 F 13.8600 10.50000 3.46500 151.788750 59.031875 12 A4 504.261450
## 889 M 13.6500 11.02500 3.99000 174.483750 73.193750 11 A4 600.460088
## 344 F 11.5500 9.24000 2.83500 105.437500 54.250000 8 A2 302.556870
## 909 M 8.0850 6.51000 2.10000 36.273750 13.046250 11 A4 110.530035
## 459 F 10.2900 7.66500 3.04500 79.312500 25.186875 10 A3 240.167828
## 20 I 8.4000 6.09000 2.10000 33.437500 15.062500 5 A1 107.427600
## 1032 M 12.7050 9.87000 2.41500 139.250000 49.062500 15 A5 302.837015
## 164 I 9.5550 7.35000 2.83500 67.062500 35.687500 7 A2 199.099924
## 52 I 9.8700 7.45500 2.52000 46.062500 15.750000 6 A1 185.423742
## 534 F 15.2250 11.97000 4.30500 206.486250 95.790000 11 A4 784.557191
## 177 I 10.5000 8.40000 2.52000 77.000000 32.625000 8 A2 222.264000
## 554 F 15.1200 12.07500 4.51500 267.750000 110.274375 12 A4 824.321610
## 827 M 12.4950 9.97500 2.94000 128.812500 60.946875 10 A3 366.434617
## 84 I 9.8700 7.98000 2.62500 60.562500 26.375000 6 A1 206.751825
## 523 F 14.1750 11.34000 4.41000 203.107500 88.322500 11 A4 708.883245
## 633 F 13.8600 10.92000 4.20000 209.500000 85.807500 17 A5 635.675040
## 392 F 13.2300 10.60500 4.09500 163.250000 65.145000 9 A3 574.545494
## 302 I 12.0750 9.45000 2.83500 103.062500 39.677344 11 A4 323.498306
## 597 F 14.8050 11.44500 3.78000 192.437500 77.456250 13 A5 640.495390
## 706 M 9.2400 6.82500 1.68000 51.625000 17.820000 8 A2 105.945840
## 901 M 13.9650 11.02500 3.78000 182.197500 82.258750 12 A4 581.984392
## 874 M 9.8700 7.87500 2.52000 70.953750 27.685000 12 A4 195.870150
## 430 F 12.0750 10.08000 3.46500 134.750000 64.513750 9 A3 421.745940
## 710 M 8.7150 6.61500 2.52000 50.187500 24.626250 8 A2 145.277307
## 761 M 14.4900 11.13000 3.99000 199.437500 83.902500 10 A3 643.482063
## 712 M 7.3500 5.56500 1.89000 28.187500 12.313125 7 A2 77.306197
## 428 F 12.7050 10.29000 3.15000 141.812500 66.470625 9 A3 411.813517
## 672 M 6.5100 4.72500 1.68000 16.812500 6.682500 7 A2 51.676380
## 250 I 13.2300 9.97500 3.04500 132.562500 63.271250 10 A3 401.846366
## RATIO
## 415 0.19346308
## 463 0.14343894
## 179 0.12747603
## 526 0.13122636
## 195 0.12944935
## 938 0.16500463
## 665 0.12337424
## 602 0.15428666
## 709 0.09128881
## 1011 0.12981280
## 953 0.12813591
## 348 0.15028887
## 1017 0.12766374
## 649 0.07174086
## 989 0.16402153
## 355 0.16717513
## 840 0.12902037
## 26 0.15906101
## 519 0.14828953
## 426 0.16666323
## 1023 0.14502521
## 766 0.13133838
## 211 0.14758583
## 932 0.16420461
## 590 0.11477191
## 593 0.10141681
## 555 0.16156401
## 871 0.15492740
## 373 0.15264669
## 844 0.17219534
## 143 0.12910309
## 544 0.12267039
## 490 0.12367874
## 621 0.08642748
## 775 0.12065236
## 905 0.13584327
## 937 0.13649838
## 842 0.13662783
## 23 0.15464221
## 923 0.13898486
## 956 0.07265343
## 309 0.12424110
## 135 0.11741487
## 821 0.11163702
## 997 0.16521111
## 224 0.14931774
## 166 0.17739599
## 217 0.12089084
## 290 0.12686672
## 581 0.10923231
## 72 0.19939433
## 588 0.12818973
## 575 0.10772041
## 141 0.17334121
## 722 0.13784748
## 865 0.16985355
## 859 0.11627502
## 153 0.10601836
## 294 0.11498944
## 277 0.15294184
## 1035 0.09425347
## 41 0.18562944
## 431 0.15049736
## 90 0.13764357
## 316 0.10797361
## 223 0.14663910
## 528 0.14512796
## 116 0.12582649
## 606 0.10784179
## 774 0.10041590
## 747 0.12259402
## 456 0.11245981
## 598 0.12004534
## 854 0.14067115
## 39 0.15756335
## 159 0.16068408
## 752 0.11837894
## 209 0.14460673
## 374 0.18146196
## 818 0.15775336
## 34 0.13132100
## 516 0.11625966
## 13 0.14175880
## 69 0.13562668
## 895 0.11334327
## 755 0.11721481
## 409 0.19323757
## 308 0.12080144
## 278 0.15187145
## 89 0.23007038
## 928 0.18175438
## 537 0.14227120
## 291 0.10891997
## 424 0.16128494
## 880 0.13776396
## 286 0.12240700
## 908 0.13414867
## 671 0.14239650
## 121 0.11604961
## 110 0.18892157
## 158 0.10801658
## 64 0.16017397
## 483 0.19861350
## 910 0.16781572
## 477 0.14162281
## 480 0.15983967
## 711 0.14121847
## 67 0.22495771
## 663 0.16125174
## 890 0.18692208
## 847 0.14907484
## 85 0.16163137
## 165 0.16565068
## 648 0.07937286
## 51 0.13051535
## 74 0.17162850
## 178 0.13928811
## 362 0.14491411
## 236 0.10773147
## 610 0.11755425
## 330 0.15259804
## 726 0.15369781
## 127 0.16333777
## 212 0.11617534
## 686 0.15586635
## 785 0.18905614
## 814 0.12818235
## 310 0.12563111
## 744 0.17757459
## 878 0.12593986
## 243 0.15848872
## 862 0.11447803
## 926 0.07171905
## 792 0.11058611
## 113 0.12156850
## 619 0.11761840
## 1013 0.12643359
## 151 0.14462127
## 666 0.15507532
## 614 0.12781776
## 767 0.17039226
## 160 0.19792829
## 391 0.16224714
## 155 0.12946284
## 1024 0.09817974
## 5 0.16527932
## 326 0.08447524
## 784 0.13695902
## 280 0.13974422
## 800 0.09740190
## 789 0.13099541
## 567 0.15416432
## 843 0.11873051
## 238 0.12302546
## 764 0.13071999
## 339 0.13268123
## 962 0.15849465
## 822 0.15167217
## 137 0.19841104
## 455 0.11407361
## 738 0.15385467
## 560 0.12573378
## 589 0.12327353
## 83 0.12471819
## 696 0.15256159
## 942 0.15502003
## 196 0.16150810
## 769 0.13434897
## 680 0.14800956
## 941 0.16363858
## 968 0.12184596
## 500 0.11706601
## 889 0.12189611
## 344 0.17930513
## 909 0.11803353
## 459 0.10487198
## 20 0.14021071
## 1032 0.16200959
## 164 0.17924417
## 52 0.08494058
## 534 0.12209435
## 177 0.14678490
## 554 0.13377591
## 827 0.16632401
## 84 0.12756840
## 523 0.12459386
## 633 0.13498642
## 392 0.11338528
## 302 0.12265085
## 597 0.12093178
## 706 0.16819915
## 901 0.14134185
## 874 0.14134364
## 430 0.15296828
## 710 0.16951202
## 761 0.13038825
## 712 0.15927733
## 428 0.16140953
## 672 0.12931440
## 250 0.15745134
(2)(a) (1 point) Use “mydata” to plot WHOLE versus VOLUME. Color code data points by CLASS.
(2)(b) (2 points) Use “mydata” to plot SHUCK versus WHOLE with WHOLE on the horizontal axis. Color code data points by CLASS. As an aid to interpretation, determine the maximum value of the ratio of SHUCK to WHOLE. Add to the chart a straight line with zero intercept using this maximum value as the slope of the line. If you are using the ‘base R’ plot() function, you may use abline() to add this line to the plot. Use help(abline) in R to determine the coding for the slope and intercept arguments in the functions. If you are using ggplot2 for visualizations, geom_abline() should be used.
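One possible base R approach to (2)(b) is sketched below. It uses five (WHOLE, SHUCK, CLASS) triples copied from the sample printed in (1)(c) instead of the full mydata, and the object names and colors are hypothetical choices.

```r
# A few rows copied from the sample printed in (1)(c); `demo` is a
# hypothetical stand-in for mydata.
demo <- data.frame(
  WHOLE = c(105.4375, 33.3125, 199.85625, 12.5, 141.5625),
  SHUCK = c(54.603125, 13.8125, 79.95375, 4.8125, 48.76875),
  CLASS = factor(c("A3", "A2", "A4", "A1", "A5"),
                 levels = c("A1", "A2", "A3", "A4", "A5"))
)

# Largest observed SHUCK / WHOLE ratio; used as the slope of the line.
max.slope <- max(demo$SHUCK / demo$WHOLE)

# Integer codes of the factor index into a vector of colors.
palette.demo <- c("red", "orange", "darkgreen", "blue", "purple")
plot(demo$WHOLE, demo$SHUCK, col = palette.demo[as.integer(demo$CLASS)],
     pch = 16, xlab = "WHOLE", ylab = "SHUCK")
abline(a = 0, b = max.slope, lty = 2)   # a = intercept, b = slope
legend("topleft", legend = levels(demo$CLASS), col = palette.demo, pch = 16)
```

Because the slope is the maximum ratio, every point lies on or below the line by construction; how far below a point sits shows how much less shuck it yields than the best case for its whole weight.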
Essay Question (2 points): How does the variability in this plot differ from the plot in (a)? Compare the two displays. Keep in mind that SHUCK is a part of WHOLE. Consider the location of the different age classes.
Answer: There is more variability in plot (a) than in plot (b). Plot (a) shows two extreme outliers associated with A1/infants that may be skewing the results. Plot (b) shows the maximum ratio of SHUCK to WHOLE as a straight line; most of the abalones fall under this line, which could indicate that waiting for older abalones (A3 to A5) may not yield more shuck weight.
(3)(a) (2 points) Use “mydata” to create a multi-figured plot with histograms, boxplots and Q-Q plots of RATIO differentiated by sex. This can be done using par(mfrow = c(3,3)) and base R or grid.arrange() and ggplot2. The first row would show the histograms, the second row the boxplots and the third row the Q-Q plots. Be sure these displays are legible.
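The 3 x 3 base R layout described in (3)(a) might look like the sketch below. The RATIO values here are simulated draws used only to make the layout runnable; they are not the abalone measurements, and `demo` is a hypothetical stand-in for mydata.

```r
# Simulated stand-in for mydata: RATIO values are random draws, NOT the
# abalone measurements.
set.seed(1)
demo <- data.frame(
  SEX   = factor(rep(c("F", "I", "M"), each = 50)),
  RATIO = rlnorm(150, meanlog = log(0.14), sdlog = 0.2)
)

# mfrow fills the grid row by row.
old.par <- par(mfrow = c(3, 3))
# Row 1: histograms.
for (s in levels(demo$SEX)) {
  hist(demo$RATIO[demo$SEX == s], main = paste("Histogram:", s), xlab = "RATIO")
}
# Row 2: boxplots.
for (s in levels(demo$SEX)) {
  boxplot(demo$RATIO[demo$SEX == s], main = paste("Boxplot:", s), ylab = "RATIO")
}
# Row 3: normal Q-Q plots with a reference line.
for (s in levels(demo$SEX)) {
  qqnorm(demo$RATIO[demo$SEX == s], main = paste("Q-Q plot:", s))
  qqline(demo$RATIO[demo$SEX == s])
}
par(old.par)
```

Looping over the factor levels keeps the three columns in the same sex order in every row, which makes the panels easier to compare.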
Essay Question (2 points): Compare the displays. How do the distributions compare to normality? Take into account the criteria discussed in the sync sessions to evaluate non-normality.
Answer: Compared against the theoretical quantiles, the Q-Q plots correspond reasonably well to a normal distribution. However, for all three sex categories, the values deviate from normality in the upper right-hand side of the plot. The same pattern appears in the histograms, each of which exhibits a right skew driven by outliers. The boxplots confirm the presence of outliers. Taken together, the boxplots and Q-Q plots suggest a larger number of outliers in the infant group.
(3)(b) (2 points) Use the boxplots to identify RATIO outliers (mild and extreme both) for each sex. Present the abalones with these outlying RATIO values along with their associated variables in “mydata” (Hint: display the observations by passing a data frame to the kable() function).
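A minimal sketch of separating mild from extreme outliers with boxplot.stats() follows. It assumes the usual convention that values beyond 1.5 times the hinge spread are outliers and values beyond 3 times the hinge spread are extreme. The vector `ratio.demo` is hypothetical and chosen so that one of each kind is present.

```r
# Hypothetical RATIO values chosen so that one mild and one extreme
# outlier are present; these are not the abalone measurements.
ratio.demo <- c(0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.25, 0.40)

# coef = 1.5 flags every outlier (mild and extreme); coef = 3 flags
# only the extreme ones.
all.out     <- boxplot.stats(ratio.demo, coef = 1.5)$out
extreme.out <- boxplot.stats(ratio.demo, coef = 3.0)$out
mild.out    <- setdiff(all.out, extreme.out)

mild.out      # 0.25
extreme.out   # 0.40
```

With the real data, the same calls would be applied to RATIO within each sex, and the matching rows of mydata could then be passed to kable() for display.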
Essay Question (2 points): What are your observations regarding the results in (3)(b)?
Answer: There are mild outliers in all sex categories. However, extreme outliers occur only in the female and infant categories. The presence of outliers was initially suspected from the plots above. It is useful that extreme outliers were identified in some of the categories.
(4)(a) (3 points) With “mydata,” display side-by-side boxplots for VOLUME and WHOLE, each differentiated by CLASS. There should be five boxes for VOLUME and five for WHOLE. Also, display side-by-side scatterplots: VOLUME and WHOLE versus RINGS. Present these four figures in one graphic: the boxplots in one row and the scatterplots in a second row. Base R or ggplot2 may be used.
Essay Question (5 points) How well do you think these variables would perform as predictors of age? Explain.
Answer: VOLUME, WHOLE and RINGS are good predictors of age. Since an abalone's age increases with its number of rings, these variables show an upward trend: both VOLUME and WHOLE increase as the abalones have more rings. However, the explanatory power of CLASS is likely smaller than that of the others, since we noticed earlier that a number of infants were classified as A4 and A5.
(5)(a) (2 points) Use aggregate() with “mydata” to compute the mean values of VOLUME, SHUCK and RATIO for each combination of SEX and CLASS. Then, using matrix(), create matrices of the mean values. Using the “dimnames” argument within matrix() or the rownames() and colnames() functions on the matrices, label the rows by SEX and columns by CLASS. Present the three matrices (Kabacoff Section 5.6.2, p. 110-111). The kable() function is useful for this purpose. You do not need to be concerned with the number of digits presented.
## A1 A2 A3 A4 A5
## F 0.1546644 0.1569554 0.1512698 0.1554605 0.1475600
## I 0.1564017 0.1450304 0.1372256 0.1462123 0.1379609
## M 0.1244413 0.1364881 0.1233605 0.1167649 0.1262089
## A1 A2 A3 A4 A5
## F 255.2994 66.51618 103.7232 276.8573 160.3200
## I 245.3857 412.60794 270.7406 358.1181 498.0489
## M 316.4129 442.61552 486.1525 318.6930 440.2074
## A1 A2 A3 A4 A5
## F 38.90000 10.11332 16.39583 42.50305 23.41024
## I 38.33855 59.69121 37.17969 52.96933 69.05161
## M 39.85369 61.42726 59.17076 36.47047 55.02762
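The aggregate() then matrix() workflow from (5)(a) can be sketched as below. The data frame `demo` and its VOLUME values are made up for illustration, with every SEX by CLASS combination present; the point is the fill order, which determines whether the labels line up with the right means.

```r
# Hypothetical stand-in for mydata with every SEX x CLASS combination
# present twice; VOLUME values are made up for illustration.
demo <- expand.grid(SEX = c("F", "I", "M"),
                    CLASS = c("A1", "A2", "A3", "A4", "A5"),
                    copy = 1:2)
demo$VOLUME <- 100 * as.integer(demo$CLASS) +
  10 * as.integer(demo$SEX) + demo$copy

# aggregate() returns one row per combination, with SEX (the first
# grouping variable) varying fastest.
agg <- aggregate(VOLUME ~ SEX + CLASS, data = demo, FUN = mean)

# Column-major fill (the matrix() default) therefore puts SEX in rows
# and CLASS in columns.
vol.mat <- matrix(agg$VOLUME, nrow = 3, byrow = FALSE,
                  dimnames = list(levels(demo$SEX), levels(demo$CLASS)))
vol.mat
```

A mismatch between the fill order and the labels scrambles the table without raising an error, so it is worth checking one or two cells against a directly computed mean, for example with tapply().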
(5)(b) (3 points) Present three graphs. Each graph should include three lines, one for each sex. The first should show mean RATIO versus CLASS; the second, mean VOLUME versus CLASS; the third, mean SHUCK versus CLASS. This may be done with the ‘base R’ interaction.plot() function or with ggplot2 using grid.arrange().
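A sketch of the base R route for one of the three graphs is given below, using interaction.plot() on a made-up data frame. The RATIO values are hypothetical and only make the call runnable; the same pattern would be repeated for VOLUME and SHUCK.

```r
# Hypothetical stand-in for mydata; RATIO values are made up.
demo <- expand.grid(SEX = c("F", "I", "M"),
                    CLASS = c("A1", "A2", "A3", "A4", "A5"),
                    copy = 1:2)
demo$RATIO <- 0.16 - 0.005 * as.integer(demo$CLASS) -
  0.004 * as.integer(demo$SEX) + 0.001 * demo$copy

# x.factor goes on the horizontal axis, trace.factor defines the lines,
# and fun is applied to the response within each cell.
interaction.plot(x.factor = demo$CLASS, trace.factor = demo$SEX,
                 response = demo$RATIO, fun = mean,
                 type = "b", col = c("red", "darkgreen", "blue"),
                 xlab = "CLASS", ylab = "Mean RATIO", trace.label = "SEX")
```

Because interaction.plot() computes the cell means itself, it can be given the raw columns of the data frame rather than the matrices from (5)(a).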
Essay Question (2 points): What questions do these plots raise? Consider aging and sex differences.
Answer: (Enter your answer here.) It seems that as the abalones age, the proportion of meat as given by the mean ratio, drops considerably. This means that there is less meat relative to the size of the abalone. We can also appreciate that the volume of females is larger than males as the abalones age. It also seems that both females and males peak in size/volume in class 4 as opposed to infants, which increase slightly as they age to class 5. We also see that the mean shuck for all sex classes peakes at class 4. This raises the question as to whether it is better to eat the abalones in class 4 as the shuck does not grow anymore past this point. Furthermore, by looking at the mean ratio, it is clear that the proportion of meat is largerst when all abalones are at Class 2. Would it be more profitable if producers/growers sell abalones for consumption once they reach this stage as they could start growing new ones and the proportion of meat to size does not really grow from here?
(5)(c) (3 points) Present four boxplots using par(mfrow = c(2, 2)) or grid.arrange(). The first line should show VOLUME by RINGS for the infants and, separately, for the adults (factor levels “M” and “F” combined). The second line should show WHOLE by RINGS for the infants and, separately, for the adults. Since the data are sparse beyond 15 rings, limit the displays to less than 16 rings. One way to accomplish this is to generate a new data set using subset() to select RINGS < 16. Use ylim = c(0, 1100) for VOLUME and ylim = c(0, 400) for WHOLE. If you wish to reorder the displays for presentation purposes or use ggplot2, go ahead.
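A minimal sketch of the par(mfrow = c(2, 2)) approach follows, assuming columns SEX, RINGS, VOLUME and WHOLE. The data are synthetic stand-ins generated for illustration, not the abalone measurements, and the object names trimmed, infants and adults are hypothetical.

```r
# Hypothetical stand-in for "mydata": synthetic values, NOT the abalone data.
set.seed(123)
n <- 300
mydata <- data.frame(
  SEX   = factor(sample(c("F", "I", "M"), n, replace = TRUE)),
  RINGS = sample(3:20, n, replace = TRUE)
)
mydata$VOLUME <- runif(n, 0, 60) * mydata$RINGS
mydata$WHOLE  <- runif(n, 0, 25) * mydata$RINGS

# Keep only observations with fewer than 16 rings, then split by group.
trimmed <- subset(mydata, RINGS < 16)
infants <- subset(trimmed, SEX == "I")
adults  <- subset(trimmed, SEX %in% c("M", "F"))

# 2 x 2 layout: VOLUME on the first line, WHOLE on the second.
op <- par(mfrow = c(2, 2))
boxplot(VOLUME ~ RINGS, data = infants, ylim = c(0, 1100),
        main = "Infant VOLUME by RINGS", xlab = "RINGS", ylab = "VOLUME")
boxplot(VOLUME ~ RINGS, data = adults, ylim = c(0, 1100),
        main = "Adult VOLUME by RINGS", xlab = "RINGS", ylab = "VOLUME")
boxplot(WHOLE ~ RINGS, data = infants, ylim = c(0, 400),
        main = "Infant WHOLE by RINGS", xlab = "RINGS", ylab = "WHOLE")
boxplot(WHOLE ~ RINGS, data = adults, ylim = c(0, 400),
        main = "Adult WHOLE by RINGS", xlab = "RINGS", ylab = "WHOLE")
par(op)
```

Using the same ylim within each line keeps the infant and adult panels directly comparable.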
Essay Question (2 points): What do these displays suggest about abalone growth? Also, compare the infant and adult displays. What differences stand out?
Answer: (Enter your answer here.) These displays suggest that the growth of adult abalones varies more than that of infant abalones: the adult boxplots have longer whiskers than the infant boxplots. Adult VOLUME and WHOLE also tend to become more dispersed as the number of rings increases. Both VOLUME and WHOLE tend to increase with the number of rings.
Conclusions
Essay Question 1) (5 points) Based solely on these data, what are plausible statistical reasons that explain the failure of the original study? Consider to what extent physical measurements may be used for age prediction.
Answer: (Enter your answer here.) Outliers may be affecting the data and contributing to the right skew observed. This should be addressed before relying on methods that assume approximately normal data. Another concern is the Infant category and how it is assigned. Sexing abalones is known to be difficult, so this category could be prone to measurement error. We saw that infants can have upwards of 10 rings, which raises the question of how infants are classified, since the number of rings is an indicator of age. A further point is correlation among the variables, as some of them are functions of others, so a model for age prediction might run into multicollinearity. Physical measurements may be used for age prediction to a certain extent. Weight, sex and volume are good measures. However, abalone growth also depends on its environment, such as pollution and food availability. To determine which physical variables are good predictors of age, measures of the environment should be added to control for these effects.
Essay Question 2) (3 points) Do not refer to the abalone data or study. If you were presented with an overall histogram and summary statistics from a sample of some population or phenomenon and no other information, what questions might you ask before accepting them as representative of the sampled population or phenomenon?
Answer: (Enter your answer here.) Questions I would ask include: How long has the study been going on? How was the sample obtained? Where was the sample obtained? Is the sample size large enough to be representative of the population? Are outliers present? How many data points are in the sample? What are the mean and standard deviation of the sample? Are the data symmetric about the mean? I would also examine the measures of central tendency.
Essay Question 3) (3 points) Do not refer to the abalone data or study. What do you see as difficulties analyzing data derived from observational studies? Can causality be determined? What might be learned from such studies?
Answer: (Enter your answer here.) Data from observational studies are subject to human and measurement error, since the measured values have to be recorded manually. Observational studies are also prone to bias. A correlation may be detected and interpreted as causation without investigating the underlying factors that affect the relationship. A purely observational study may suggest causality in some cases, but such conclusions are vulnerable to bias and spurious causation. From observational studies we can learn about relationships between the independent and dependent variables. From there, we can gain insight into underlying relationships or other variables that might be affecting some of our variables. We might also discover other variables that have a strong correlation and possibly a causal effect.