R Markdown is a plain-text file format for integrating text and R code and for creating transparent, reproducible and interactive reports. An R Markdown file (.Rmd) contains metadata, markdown and R code “chunks,” and can be “knit” into numerous output types. Answer the test questions by adding R code to the fenced code areas below each item. Some questions also require a written answer. Enter your comments in the space provided as shown below:

Answer: (Enter your answer here.)

Once completed, you will “knit” and submit the resulting .html document and the .Rmd file. The .html will present the output of your R code and your written answers, but your R code will not appear. Your R code will appear in the .Rmd file. The resulting .html document will be graded. Points assigned to each item appear in this template.

Before proceeding, look to the top of the .Rmd for the (YAML) metadata block, where the title, author and output are given. Please change author to include your name, with the format ‘lastName, firstName.’

If you encounter issues with knitting the .html, please send an email via Canvas to your TA.

Each code chunk is delineated by six (6) backticks: three (3) at the start and three (3) at the end. After the opening backticks, arguments are passed to the code chunk inside curly brackets. Please do not add or remove backticks, or modify the arguments or values inside the curly brackets. An example code chunk is included here:

# Comments are included in each code chunk, simply as prompts

#...R code placed here

#...R code placed here

R code only needs to be added inside the code chunks for each assignment item. However, there are questions that follow many assignment items. Enter your answers in the space provided. An example showing how to use the template and respond to a question follows.


Example Problem with Solution:

Use rbinom() to generate two random samples of size 10,000 from the binomial distribution. For the first sample, use p = 0.45 and n = 10. For the second sample, use p = 0.55 and n = 10. Convert the sample frequencies to sample proportions and compute the mean number of successes for each sample. Present these statistics.

set.seed(123)
sample.one <- table(rbinom(10000, 10, 0.45)) / 10000
sample.two <- table(rbinom(10000, 10, 0.55)) / 10000

successes <- seq(0, 10)

round(sum(sample.one*successes), digits = 1) # [1] 4.5
## [1] 4.5
round(sum(sample.two*successes), digits = 1) # [1] 5.5
## [1] 5.5

Question: How do the simulated expectations compare to calculated binomial expectations?

Answer: The calculated binomial expectations are 10(0.45) = 4.5 and 10(0.55) = 5.5. After rounding the simulated results, the same values are obtained.


Submit both the .Rmd and .html files for grading. You may remove the instructions and example problem above, but do not remove the YAML metadata block or the first, “setup” code chunk. Address the steps that appear below and answer all the questions. Be sure to address each question with code and comments as needed. You may use either base R functions or ggplot2 for the visualizations.


The following code chunk will:

  1. load the “ggplot2”, “gridExtra” and “knitr” packages, assuming each has been installed on your machine,
  2. read-in the abalones dataset, defining a new data frame, “mydata,”
  3. return the structure of that data frame, and
  4. calculate new variables, VOLUME and RATIO.

Do not include package installation code in this document. Packages should be installed via the Console or ‘Packages’ tab. You will also need to download abalones.csv from the course site to a known location on your machine. Unless a file.path() is specified, R will look in the directory where this .Rmd is stored when knitting.
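The formulas for VOLUME and RATIO are not stated in this template. The values printed later in this document are consistent with VOLUME = LENGTH * DIAM * HEIGHT and RATIO = SHUCK / VOLUME, so the following is a minimal sketch under that assumption. The data frame `toy` is a stand-in built from three rows copied from the sample output in (1)(c); it is not the full abalones dataset.

```r
# Stand-in for mydata: three rows copied from the printed sample in (1)(c).
toy <- data.frame(
  LENGTH = c(11.025, 11.760, 8.190),
  DIAM   = c(9.030, 9.240, 6.300),
  HEIGHT = c(2.835, 2.835, 2.100),
  SHUCK  = c(54.603125, 44.187500, 13.812500)
)

# Assumed definitions, inferred from the printed VOLUME and RATIO columns.
toy$VOLUME <- toy$LENGTH * toy$DIAM * toy$HEIGHT
toy$RATIO  <- toy$SHUCK / toy$VOLUME

str(toy)
```

With the real data, the same two assignments would be applied to “mydata” after reading abalones.csv.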

## 'data.frame':    1036 obs. of  8 variables:
##  $ SEX   : Factor w/ 3 levels "F","I","M": 2 2 2 2 2 2 2 2 2 2 ...
##  $ LENGTH: num  5.57 3.67 10.08 4.09 6.93 ...
##  $ DIAM  : num  4.09 2.62 7.35 3.15 4.83 ...
##  $ HEIGHT: num  1.26 0.84 2.205 0.945 1.785 ...
##  $ WHOLE : num  11.5 3.5 79.38 4.69 21.19 ...
##  $ SHUCK : num  4.31 1.19 44 2.25 9.88 ...
##  $ RINGS : int  6 4 6 3 6 6 5 6 5 6 ...
##  $ CLASS : Factor w/ 5 levels "A1","A2","A3",..: 1 1 1 1 1 1 1 1 1 1 ...

Test items start here. There are 6 sections.

Section 1: (6 points) Summarizing the data.

(1)(a) (1 point) Use summary() to obtain and present descriptive statistics from mydata. Use table() to present a frequency table using CLASS and RINGS. There should be 115 cells in the table you present.

##  SEX         LENGTH           DIAM            HEIGHT          WHOLE        
##  F:326   Min.   : 2.73   Min.   : 1.995   Min.   :0.525   Min.   :  1.625  
##  I:329   1st Qu.: 9.45   1st Qu.: 7.350   1st Qu.:2.415   1st Qu.: 56.484  
##  M:381   Median :11.45   Median : 8.925   Median :2.940   Median :101.344  
##          Mean   :11.08   Mean   : 8.622   Mean   :2.947   Mean   :105.832  
##          3rd Qu.:13.02   3rd Qu.:10.185   3rd Qu.:3.570   3rd Qu.:150.319  
##          Max.   :16.80   Max.   :13.230   Max.   :4.935   Max.   :315.750  
##      SHUCK              RINGS        CLASS        VOLUME       
##  Min.   :  0.5625   Min.   : 3.000   A1:108   Min.   :  3.612  
##  1st Qu.: 23.3006   1st Qu.: 8.000   A2:236   1st Qu.:163.545  
##  Median : 42.5700   Median : 9.000   A3:329   Median :307.363  
##  Mean   : 45.4396   Mean   : 9.993   A4:188   Mean   :326.804  
##  3rd Qu.: 64.2897   3rd Qu.:11.000   A5:175   3rd Qu.:463.264  
##  Max.   :157.0800   Max.   :25.000            Max.   :995.673  
##      RATIO        
##  Min.   :0.06734  
##  1st Qu.:0.12241  
##  Median :0.13914  
##  Mean   :0.14205  
##  3rd Qu.:0.15911  
##  Max.   :0.31176
##     
##        3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
##   A1   9   8  24  67   0   0   0   0   0   0   0   0   0   0   0   0   0   0
##   A2   0   0   0   0  91 145   0   0   0   0   0   0   0   0   0   0   0   0
##   A3   0   0   0   0   0   0 182 147   0   0   0   0   0   0   0   0   0   0
##   A4   0   0   0   0   0   0   0   0 125  63   0   0   0   0   0   0   0   0
##   A5   0   0   0   0   0   0   0   0   0   0  48  35  27  15  13   8   8   6
##     
##       21  22  23  24  25
##   A1   0   0   0   0   0
##   A2   0   0   0   0   0
##   A3   0   0   0   0   0
##   A4   0   0   0   0   0
##   A5   4   1   7   2   1

Question (1 point): Briefly discuss the variable types and distributional implications such as potential skewness and outliers.

Answer: (Enter your answer here.) We have numeric, integer and factor variables. By looking at the summary of our data, we can get an idea of its distribution. At first glance, the WHOLE, SHUCK, RINGS and VOLUME variables have maximum values substantially above the mean and median. This is worth looking into, as these variables might have outliers that contribute to a right skew.

(1)(b) (1 point) Generate a table of counts using SEX and CLASS. Add margins to this table (Hint: There should be 15 cells in this table plus the marginal totals. Apply table() first, then pass the table object to addmargins() (Kabacoff Section 7.2 pages 144-147)). Lastly, present a barplot of these data, ignoring the marginal totals.

##    
##      A1  A2  A3  A4  A5
##   F   5  41 121  82  77
##   I  91 133  65  21  19
##   M  12  62 143  85  79
##      
##         A1   A2   A3   A4   A5  Sum
##   F      5   41  121   82   77  326
##   I     91  133   65   21   19  329
##   M     12   62  143   85   79  381
##   Sum  108  236  329  188  175 1036
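The two tables above can be reproduced in miniature as follows. This is a sketch only: the counts are typed in from the printed output rather than computed from abalones.csv, and the object names (`counts`, `sex_class`, `with_margins`) are illustrative. With the real data, `sex_class` would presumably come from table() applied to the SEX and CLASS columns of “mydata”.

```r
# Counts copied from the printed SEX-by-CLASS table (rows F, I, M).
counts <- matrix(
  c( 5,  41, 121, 82, 77,
    91, 133,  65, 21, 19,
    12,  62, 143, 85, 79),
  nrow = 3, byrow = TRUE,
  dimnames = list(SEX = c("F", "I", "M"),
                  CLASS = c("A1", "A2", "A3", "A4", "A5"))
)
sex_class <- as.table(counts)

# Row and column totals are appended as a "Sum" row and column.
with_margins <- addmargins(sex_class)
print(with_margins)

# Plot the table without margins: one group of bars per CLASS, one bar per SEX.
barplot(sex_class, beside = TRUE, legend.text = TRUE,
        xlab = "CLASS", ylab = "Frequency",
        main = "Abalone counts by SEX and CLASS")
```

Passing `sex_class` rather than `with_margins` to barplot() is what keeps the marginal totals out of the chart.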

Essay Question (2 points): Discuss the sex distribution of abalones. What stands out about the distribution of abalones by CLASS?

Answer: (Enter your answer here.) The graph above shows the distribution of the abalones by sex and by class. CLASS groups the abalones by number of rings, with A1 being the youngest. Younger abalones belong to A1 and A2, which is visible in the graph. Most of the females and males are in the middle of the distribution, and both have roughly the same shape, although males slightly outnumber females. One interesting observation is that a number of infants fall in the older ring classes (A4 and A5).

(1)(c) (1 point) Select a simple random sample of 200 observations from “mydata” and identify this sample as “work.” Use set.seed(123) prior to drawing this sample. Do not change the number 123. Note that sample() “takes a sample of the specified size from the elements of x.” We cannot sample directly from “mydata.” Instead, we need to sample from the integers, 1 to 1036, representing the rows of “mydata.” Then, select those rows from the data frame (Kabacoff Section 4.10.5 page 87).

Using “work”, construct a scatterplot matrix of variables 2-6 with plot(work[, 2:6]) (these are the continuous variables excluding VOLUME and RATIO). The sample “work” will not be used in the remainder of the assignment.
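The row-index approach described in (1)(c) can be sketched as below. The data frame `stand_in` is a hypothetical placeholder with 1036 rows and made-up columns, used only so the example runs without abalones.csv; the indices drawn for a given seed can also depend on the R version.

```r
# Hypothetical placeholder with the same number of rows as mydata.
stand_in <- data.frame(id = seq_len(1036), x = seq_len(1036) / 10)

set.seed(123)

# Sample row numbers (without replacement by default), then subset by row.
rows <- sample(seq_len(nrow(stand_in)), size = 200)
work <- stand_in[rows, ]

head(work)
```

Sampling the integers 1 to nrow() and indexing with them keeps whole rows together, which is why the row names in the printed sample are the original row numbers.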

##      SEX  LENGTH     DIAM  HEIGHT      WHOLE      SHUCK RINGS CLASS     VOLUME
## 415    F 11.0250  9.03000 2.83500 105.437500  54.603125     9    A3 282.240551
## 463    F 11.7600  9.24000 2.83500 100.312500  44.187500     9    A3 308.057904
## 179    I  8.1900  6.30000 2.10000  33.312500  13.812500     7    A2 108.353700
## 526    F 13.5450 10.71000 4.20000 199.856250  79.953750    12    A4 609.281190
## 195    I  9.9750  7.56000 2.62500  61.312500  25.625000     8    A2 197.953875
## 938    M 13.3350 10.50000 3.46500 162.307500  80.053750    12    A4 485.160638
## 665    M  5.6700  4.09500 1.68000  12.500000   4.812500     6    A1  39.007332
## 602    F 12.3900  9.34500 2.73000 141.562500  48.768750    13    A5 316.091822
## 709    M 10.9200  8.71500 3.67500  94.125000  31.927500     8    A2 349.741665
## 1011   M 11.1300  8.71500 2.73000 105.312500  34.375000    20    A5 264.804403
## 953    M 13.1250 10.29000 3.46500 142.353750  59.963750    11    A4 467.969906
## 348    F 11.5500  9.03000 3.15000 105.000000  49.375000     8    A2 328.533975
## 1017   M 12.9150 10.08000 3.99000 170.000000  66.312500    18    A5 519.430968
## 649    F 13.0200 10.92000 4.72500 147.937500  48.195000    23    A5 671.792940
## 989    M 13.4400 11.02500 3.88500 213.375000  94.421250    13    A5 575.663760
## 355    F 12.9150 10.08000 3.36000 156.562500  73.125000     8    A2 437.415552
## 840    M 12.9150  9.97500 3.57000 141.125000  59.338125    10    A3 459.912836
## 26     I  3.4650  2.52000 0.63000   2.687500   0.875000     3    A1   5.501034
## 519    F 13.7550 10.60500 4.09500 183.663750  88.580000    11    A4 597.344919
## 426    F 13.1250 10.08000 3.57000 169.062500  78.716875    10    A3 472.311000
## 1023   M 12.1800  9.66000 3.46500 153.437500  59.125000    16    A5 407.687742
## 766    M 12.3900  9.97500 3.46500 134.625000  56.244375     9    A3 428.240216
## 211    I  9.5550  7.03500 2.20500  50.687500  21.875000     8    A2 148.218832
## 932    M 13.1250 10.39500 3.88500 176.396250  87.036250    11    A4 530.047547
## 590    F 12.1800  9.55500 3.57000 113.437500  47.685000    13    A5 415.476243
## 593    F 13.0200 10.71000 4.30500 168.437500  60.881250    14    A5 600.307281
## 555    F 11.3400  8.82000 2.94000 102.637500  47.508750    11    A4 294.055272
## 871    M  9.6600  7.35000 2.52000  64.375000  27.720000    10    A3 178.922520
## 373    F 12.1800  9.13500 3.15000 104.250000  53.500000     8    A2 350.482545
## 844    M 12.6000  9.66000 3.15000 155.875000  66.020625     9    A3 383.405400
## 143    I 11.1300  8.82000 2.52000  74.562500  31.937500     7    A2 247.379832
## 544    F  9.2400  7.45500 2.41500  52.912500  20.406875    11    A4 166.355343
## 490    F 14.4900 12.18000 4.09500 207.250000  89.385000    10    A3 722.719179
## 621    F 13.4400 11.02500 4.51500 222.375000  57.821250    22    A5 669.014640
## 775    M  9.6600  7.66500 2.62500  58.375000  23.450625    10    A3 194.365238
## 905    M 14.2800 11.34000 3.99000 206.932500  87.771250    12    A4 646.121448
## 937    M 14.2800 11.44500 3.88500 213.180000  86.668750    11    A4 634.943421
## 842    M 12.3900 10.18500 3.25500 134.812500  56.120625     9    A3 410.755448
## 23     I  7.1400  5.35500 1.57500  22.500000   9.312500     6    A1  60.219653
## 923    M 12.8100 10.18500 3.67500 158.673750  66.640000    12    A4 479.476699
## 956    M 11.7600  9.66000 4.93500 107.036250  40.731250    12    A4 560.623896
## 309    I 11.8650  9.03000 2.83500 106.812500  37.737563    11    A4 303.744593
## 135    I  7.9800  5.98500 1.99500  30.375000  11.187500     7    A2  95.281799
## 821    M 12.2850 10.08000 3.88500 130.000000  53.707500    10    A3 481.090428
## 997    M 12.2850  8.50500 3.15000 157.062500  54.375000    15    A5 329.124364
## 224    I 10.7100  8.29500 2.20500  69.062500  29.250000     8    A2 195.890987
## 166    I  8.9250  6.51000 1.99500  43.812500  20.562500     8    A2 115.912991
## 217    I  9.5550  9.13500 2.31000  53.312500  24.375000     8    A2 201.628177
## 290    I 10.9200  8.61000 3.04500  80.750000  36.321250     9    A3 286.294554
## 581    F 13.2300  9.97500 3.67500 177.875000  52.976250    14    A5 484.986994
## 72     I  3.3600  2.31000 0.52500   2.250000   0.812500     3    A1   4.074840
## 588    F 15.4350 11.86500 4.72500 254.625000 110.925000    13    A5 865.318899
## 575    F 14.0700 11.55000 3.99000 177.288750  69.846875    12    A4 648.408915
## 141    I  5.5650  4.09500 1.15500  10.500000   4.562500     7    A2  26.320920
## 722    M 11.4450  9.45000 3.15000  97.562500  46.963125     8    A2 340.689037
## 865    M 11.1300  8.71500 2.52000  88.250000  41.518125     9    A3 244.434834
## 859    M 12.6000 10.08000 3.46500 114.562500  51.170625     9    A3 440.082720
## 153    I 10.1850  7.87500 2.94000  65.125000  25.000000     8    A2 235.808213
## 294    I  8.7150  6.82500 2.41500  41.062500  16.517531    12    A4 143.643898
## 277    I 11.7600  8.92500 2.83500 102.562500  45.508750     9    A3 297.555930
## 1035   M 12.3900 10.50000 4.20000 148.375000  51.500000    16    A5 546.399000
## 41     I  6.3000  4.62000 1.36500  15.437500   7.375000     5    A1  39.729690
## 431    F 13.6500 10.29000 3.25500 140.250000  68.806250     9    A3 457.192417
## 90     I  4.5150  3.57000 1.15500   7.562500   2.562500     6    A1  18.616925
## 316    I 11.0292  8.48400 3.07545  82.500000  31.072125    13    A5 287.775186
## 223    I  7.8750  5.98500 1.89000  32.125000  13.062500     7    A2  89.079244
## 528    F 12.6000  9.76500 3.36000 144.457500  59.997500    11    A4 413.411040
## 116    I 10.9200  8.61000 2.52000  74.375000  29.812500     8    A2 236.933424
## 606    F 11.3400  9.13500 3.67500 111.500000  41.055000    13    A5 380.696557
## 774    M  7.2450  5.67000 1.99500  24.625000   8.229375     9    A3  81.952904
## 747    M 13.5450 10.71000 4.09500 153.250000  72.826875    10    A3 594.049160
## 456    F 12.0750  9.97500 3.36000 111.875000  45.513125     9    A3 404.705700
## 598    F  9.7650  7.98000 2.83500  72.375000  26.520000    14    A5 220.916525
## 854    M 12.8100  9.87000 3.46500 131.500000  61.627500     9    A3 438.096235
## 39     I  5.7750  4.62000 1.68000  17.062500   7.062500     6    A1  44.823240
## 159    I 11.0250  8.40000 2.73000  80.687500  40.625000     8    A2 252.825300
## 752    M 11.7600  9.76500 3.04500 110.937500  41.394375     9    A3 349.676838
## 209    I  8.7150  6.82500 2.10000  41.687500  18.062500     7    A2 124.907738
## 374    F 10.6050  8.19000 2.41500  82.500000  38.062500     8    A2 209.754704
## 818    M 13.6500 10.92000 3.25500 171.000000  76.539375     9    A3 485.183790
## 34     I  7.6650  5.67000 1.78500  24.625000  10.187500     6    A1  77.577082
## 516    F 14.2800 10.81500 3.67500 206.358750  65.984375    12    A4 567.560385
## 13     I  4.5150  3.25500 1.26000   6.759375   2.625000     5    A1  18.517369
## 69     I  6.3000  4.83000 1.57500  15.875000   6.500000     6    A1  47.925675
## 895    M 10.1850  8.08500 2.62500  60.881250  24.500000    12    A4 216.157528
## 755    M  7.8750  5.88000 1.99500  27.812500  10.828125    10    A3  92.378475
## 409    F 10.7100  8.40000 2.52000  87.562500  43.808750    10    A3 226.709280
## 308    I 11.8650  9.24000 3.67500 109.187500  48.670875    11    A4 402.899805
## 278    I 10.5000  7.98000 2.83500  74.250000  36.076250     9    A3 237.544650
## 89     I  3.3600  2.31000 0.52500   2.437500   0.937500     4    A1   4.074840
## 928    M 15.5400 12.49500 3.99000 296.246250 140.813750    11    A4 774.747477
## 537    F 13.5450 10.60500 3.46500 168.045000  70.812500    11    A4 497.728972
## 291    I 11.5500  9.34500 3.04500  97.875000  35.797781    11    A4 328.661314
## 424    F 12.4950  9.76500 3.15000 134.562500  61.988750     9    A3 384.343076
## 880    M 13.0200  9.76500 3.99000 171.041250  69.886250    11    A4 507.289797
## 286    I 12.3900  9.34500 2.83500  96.437500  40.180000     9    A3 328.249199
## 908    M 13.4400 10.60500 3.78000 165.367500  72.275000    11    A4 538.767936
## 671    M  8.8200  7.14000 2.41500  52.687500  21.656250     8    A2 152.084142
## 121    I  8.1900  5.88000 1.89000  26.875000  10.562500     8    A2  91.017108
## 110    I 10.7100  8.08500 3.04500  95.812500  49.812500     8    A2 263.667616
## 158    I  7.2450  5.67000 2.31000  26.687500  10.250000     7    A2  94.892837
## 64     I  8.8200  6.61500 2.10000  42.937500  19.625000     6    A1 122.523030
## 483    F 13.0200  9.97500 3.36000 165.562500  86.670625     9    A3 436.378320
## 910    M 15.7500 11.55000 3.78000 241.357500 115.395000    11    A4 687.629250
## 477    F 13.6500 10.50000 3.99000 183.000000  80.989375     9    A3 571.866750
## 480    F 12.7050  9.55500 3.04500 122.187500  59.085000     9    A3 369.651657
## 711    M  8.8200  7.03500 2.41500  46.125000  21.161250     8    A2 149.847611
## 67     I  5.0400  3.67500 0.94500   9.656250   3.937500     5    A1  17.503290
## 663    M 10.5000  8.19000 2.83500  82.437500  39.312500     6    A1 243.795825
## 890    M 11.8650  9.24000 2.41500 117.108750  49.490000    11    A4 264.762729
## 847    M 11.4450  8.61000 2.94000  92.125000  43.188750     9    A3 289.711863
## 85     I  6.1950  4.83000 1.68000  20.312500   8.125000     5    A1  50.268708
## 165    I 10.5000  8.08500 2.52000  70.000000  35.437500     8    A2 213.929100
## 648    F 13.0200  9.87000 4.72500 139.375000  48.195000    15    A5 607.197465
## 51     I  7.6650  5.67000 1.78500  23.437500  10.125000     6    A1  77.577082
## 74     I  7.9800  5.88000 1.78500  34.187500  14.375000     6    A1  83.756484
## 178    I 10.1850  8.08500 2.73000  74.996875  31.312500     7    A2 224.803829
## 362    F 12.8100  9.87000 3.36000 134.312500  61.562500     8    A2 424.820592
## 236    I 11.0250  8.40000 3.04500  76.187500  30.380000     9    A3 281.997450
## 610    F  9.1350  7.35000 2.31000  48.000000  18.232500    13    A5 155.098597
## 330    F  9.7650  7.35000 2.62500  60.250000  28.750000     6    A1 188.403469
## 726    M  8.8200  7.24500 2.20500  53.750000  21.656250     7    A2 140.901485
## 127    I  9.0300  6.61500 2.41500  48.000000  23.562500     8    A2 144.256282
## 212    I  7.4550  5.46000 1.89000  24.812500   8.937500     7    A2  76.931127
## 686    M 12.0750  9.45000 3.46500 120.687500  61.627500     8    A2 395.386819
## 785    M 11.9700  9.45000 3.25500 149.375000  69.609375    10    A3 368.194208
## 814    M 13.1250 10.39500 3.25500 128.125000  56.925000     9    A3 444.093891
## 310    I 13.0200  9.87000 3.25500 120.750000  52.550438    11    A4 418.291587
## 744    M 11.7600  9.03000 3.04500 112.437500  57.420000     9    A3 323.357076
## 878    M 15.1200 11.76000 3.78000 202.278750  84.647500    11    A4 672.126336
## 243    I  8.5050  6.61500 2.20500  43.375000  19.661250     9    A3 124.054568
## 862    M  8.9250  6.82500 2.52000  46.937500  17.572500     9    A3 153.501075
## 926    M  9.0300  7.24500 2.41500  38.823750  11.331250    11    A4 157.994975
## 792    M 10.0800  7.87500 3.04500  97.125000  26.730000     9    A3 241.712100
## 113    I 12.0750  9.03000 2.73000  92.812500  36.187500     8    A2 297.671692
## 619    F 13.9650 11.23500 3.99000 187.000000  73.631250    17    A5 626.018132
## 1013   M 11.4450  8.61000 3.04500 109.125000  37.937500    18    A5 300.058715
## 151    I  9.1350  6.61500 2.31000  46.062500  20.187500     7    A2 139.588738
## 666    M  9.4500  7.03500 2.62500  58.259375  27.062500     6    A1 174.511969
## 614    F 14.1750 11.65500 4.30500 240.625000  90.907500    13    A5 711.227436
## 767    M 12.2850  9.97500 3.15000 133.125000  65.773125    10    A3 386.010056
## 160    I  8.9250  6.61500 1.99500  45.937500  23.312500     7    A2 117.782556
## 391    F 11.1300  8.61000 3.04500 106.572812  47.343750     9    A3 291.800219
## 155    I 11.3400  8.19000 2.62500  78.187500  31.562500     8    A2 243.795825
## 1024   M 14.1750 11.65500 4.20000 179.812500  68.125000    21    A5 693.880425
## 5      I  6.9300  4.83000 1.78500  21.187500   9.875000     6    A1  59.747341
## 326    I 11.0292  8.59005 2.96940  72.187500  23.765000    15    A5 281.325052
## 784    M 13.1250  9.55500 3.57000 135.250000  61.318125     9    A3 447.711469
## 280    I 12.8100  9.76500 3.15000 120.062500  55.063750     9    A3 394.032398
## 800    M  9.8700  7.56000 2.83500  62.625000  20.604375    10    A3 211.539762
## 789    M 13.5450 10.50000 4.09500 175.125000  76.291875    10    A3 582.401138
## 567    F 13.3350 10.18500 3.46500 161.861250  72.550625    11    A4 470.605818
## 843    M  7.9800  6.09000 2.52000  35.375000  14.540625     9    A3 122.467464
## 238    I 10.8150  8.29500 2.62500  72.562500  28.971250     9    A3 235.489866
## 764    M 12.0750  9.34500 3.36000 104.875000  49.561875     9    A3 379.145340
## 339    F 13.3350 10.50000 3.99000 161.250000  74.125000     8    A2 558.669825
## 962    M 15.7500 11.65500 4.51500 275.125000 131.360625    13    A5 828.801619
## 822    M 12.4950  9.87000 3.25500 150.187500  60.885000    10    A3 401.424991
## 137    I  8.2950  6.51000 1.78500  39.625000  19.125000     7    A2  96.390803
## 455    F 12.7050  9.55500 3.04500 107.750000  42.167500     9    A3 369.651657
## 738    M 11.0250  8.50500 2.83500  94.687500  40.899375    10    A3 265.831217
## 560    F 14.8050 11.76000 3.57000 185.831250  78.151250    11    A4 621.561276
## 589    F 10.7100  8.19000 2.20500  76.500000  23.842500    13    A5 193.411355
## 83     I  9.8700  7.35000 2.62500  53.937500  23.750000     6    A1 190.429312
## 696    M  9.5550  7.35000 2.31000  57.250000  24.750000     8    A2 162.229568
## 942    M 13.7550 10.92000 3.78000 190.230000  88.016250    11    A4 567.773388
## 196    I  9.4500  7.35000 2.73000  68.375000  30.625000     8    A2 189.618975
## 769    M  7.1400  5.56500 1.78500  24.205000   9.528750    10    A3  70.925368
## 680    M 11.2350  8.50500 2.94000  91.437500  41.580000     7    A2 280.927804
## 941    M 13.7550 10.71000 4.51500 227.396250 108.841250    11    A4 665.131966
## 968    M 13.0200 10.18500 3.25500 128.687500  52.593750    13    A5 431.641318
## 500    F 13.8600 10.50000 3.46500 151.788750  59.031875    12    A4 504.261450
## 889    M 13.6500 11.02500 3.99000 174.483750  73.193750    11    A4 600.460088
## 344    F 11.5500  9.24000 2.83500 105.437500  54.250000     8    A2 302.556870
## 909    M  8.0850  6.51000 2.10000  36.273750  13.046250    11    A4 110.530035
## 459    F 10.2900  7.66500 3.04500  79.312500  25.186875    10    A3 240.167828
## 20     I  8.4000  6.09000 2.10000  33.437500  15.062500     5    A1 107.427600
## 1032   M 12.7050  9.87000 2.41500 139.250000  49.062500    15    A5 302.837015
## 164    I  9.5550  7.35000 2.83500  67.062500  35.687500     7    A2 199.099924
## 52     I  9.8700  7.45500 2.52000  46.062500  15.750000     6    A1 185.423742
## 534    F 15.2250 11.97000 4.30500 206.486250  95.790000    11    A4 784.557191
## 177    I 10.5000  8.40000 2.52000  77.000000  32.625000     8    A2 222.264000
## 554    F 15.1200 12.07500 4.51500 267.750000 110.274375    12    A4 824.321610
## 827    M 12.4950  9.97500 2.94000 128.812500  60.946875    10    A3 366.434617
## 84     I  9.8700  7.98000 2.62500  60.562500  26.375000     6    A1 206.751825
## 523    F 14.1750 11.34000 4.41000 203.107500  88.322500    11    A4 708.883245
## 633    F 13.8600 10.92000 4.20000 209.500000  85.807500    17    A5 635.675040
## 392    F 13.2300 10.60500 4.09500 163.250000  65.145000     9    A3 574.545494
## 302    I 12.0750  9.45000 2.83500 103.062500  39.677344    11    A4 323.498306
## 597    F 14.8050 11.44500 3.78000 192.437500  77.456250    13    A5 640.495390
## 706    M  9.2400  6.82500 1.68000  51.625000  17.820000     8    A2 105.945840
## 901    M 13.9650 11.02500 3.78000 182.197500  82.258750    12    A4 581.984392
## 874    M  9.8700  7.87500 2.52000  70.953750  27.685000    12    A4 195.870150
## 430    F 12.0750 10.08000 3.46500 134.750000  64.513750     9    A3 421.745940
## 710    M  8.7150  6.61500 2.52000  50.187500  24.626250     8    A2 145.277307
## 761    M 14.4900 11.13000 3.99000 199.437500  83.902500    10    A3 643.482063
## 712    M  7.3500  5.56500 1.89000  28.187500  12.313125     7    A2  77.306197
## 428    F 12.7050 10.29000 3.15000 141.812500  66.470625     9    A3 411.813517
## 672    M  6.5100  4.72500 1.68000  16.812500   6.682500     7    A2  51.676380
## 250    I 13.2300  9.97500 3.04500 132.562500  63.271250    10    A3 401.846366
##           RATIO
## 415  0.19346308
## 463  0.14343894
## 179  0.12747603
## 526  0.13122636
## 195  0.12944935
## 938  0.16500463
## 665  0.12337424
## 602  0.15428666
## 709  0.09128881
## 1011 0.12981280
## 953  0.12813591
## 348  0.15028887
## 1017 0.12766374
## 649  0.07174086
## 989  0.16402153
## 355  0.16717513
## 840  0.12902037
## 26   0.15906101
## 519  0.14828953
## 426  0.16666323
## 1023 0.14502521
## 766  0.13133838
## 211  0.14758583
## 932  0.16420461
## 590  0.11477191
## 593  0.10141681
## 555  0.16156401
## 871  0.15492740
## 373  0.15264669
## 844  0.17219534
## 143  0.12910309
## 544  0.12267039
## 490  0.12367874
## 621  0.08642748
## 775  0.12065236
## 905  0.13584327
## 937  0.13649838
## 842  0.13662783
## 23   0.15464221
## 923  0.13898486
## 956  0.07265343
## 309  0.12424110
## 135  0.11741487
## 821  0.11163702
## 997  0.16521111
## 224  0.14931774
## 166  0.17739599
## 217  0.12089084
## 290  0.12686672
## 581  0.10923231
## 72   0.19939433
## 588  0.12818973
## 575  0.10772041
## 141  0.17334121
## 722  0.13784748
## 865  0.16985355
## 859  0.11627502
## 153  0.10601836
## 294  0.11498944
## 277  0.15294184
## 1035 0.09425347
## 41   0.18562944
## 431  0.15049736
## 90   0.13764357
## 316  0.10797361
## 223  0.14663910
## 528  0.14512796
## 116  0.12582649
## 606  0.10784179
## 774  0.10041590
## 747  0.12259402
## 456  0.11245981
## 598  0.12004534
## 854  0.14067115
## 39   0.15756335
## 159  0.16068408
## 752  0.11837894
## 209  0.14460673
## 374  0.18146196
## 818  0.15775336
## 34   0.13132100
## 516  0.11625966
## 13   0.14175880
## 69   0.13562668
## 895  0.11334327
## 755  0.11721481
## 409  0.19323757
## 308  0.12080144
## 278  0.15187145
## 89   0.23007038
## 928  0.18175438
## 537  0.14227120
## 291  0.10891997
## 424  0.16128494
## 880  0.13776396
## 286  0.12240700
## 908  0.13414867
## 671  0.14239650
## 121  0.11604961
## 110  0.18892157
## 158  0.10801658
## 64   0.16017397
## 483  0.19861350
## 910  0.16781572
## 477  0.14162281
## 480  0.15983967
## 711  0.14121847
## 67   0.22495771
## 663  0.16125174
## 890  0.18692208
## 847  0.14907484
## 85   0.16163137
## 165  0.16565068
## 648  0.07937286
## 51   0.13051535
## 74   0.17162850
## 178  0.13928811
## 362  0.14491411
## 236  0.10773147
## 610  0.11755425
## 330  0.15259804
## 726  0.15369781
## 127  0.16333777
## 212  0.11617534
## 686  0.15586635
## 785  0.18905614
## 814  0.12818235
## 310  0.12563111
## 744  0.17757459
## 878  0.12593986
## 243  0.15848872
## 862  0.11447803
## 926  0.07171905
## 792  0.11058611
## 113  0.12156850
## 619  0.11761840
## 1013 0.12643359
## 151  0.14462127
## 666  0.15507532
## 614  0.12781776
## 767  0.17039226
## 160  0.19792829
## 391  0.16224714
## 155  0.12946284
## 1024 0.09817974
## 5    0.16527932
## 326  0.08447524
## 784  0.13695902
## 280  0.13974422
## 800  0.09740190
## 789  0.13099541
## 567  0.15416432
## 843  0.11873051
## 238  0.12302546
## 764  0.13071999
## 339  0.13268123
## 962  0.15849465
## 822  0.15167217
## 137  0.19841104
## 455  0.11407361
## 738  0.15385467
## 560  0.12573378
## 589  0.12327353
## 83   0.12471819
## 696  0.15256159
## 942  0.15502003
## 196  0.16150810
## 769  0.13434897
## 680  0.14800956
## 941  0.16363858
## 968  0.12184596
## 500  0.11706601
## 889  0.12189611
## 344  0.17930513
## 909  0.11803353
## 459  0.10487198
## 20   0.14021071
## 1032 0.16200959
## 164  0.17924417
## 52   0.08494058
## 534  0.12209435
## 177  0.14678490
## 554  0.13377591
## 827  0.16632401
## 84   0.12756840
## 523  0.12459386
## 633  0.13498642
## 392  0.11338528
## 302  0.12265085
## 597  0.12093178
## 706  0.16819915
## 901  0.14134185
## 874  0.14134364
## 430  0.15296828
## 710  0.16951202
## 761  0.13038825
## 712  0.15927733
## 428  0.16140953
## 672  0.12931440
## 250  0.15745134


Section 2: (5 points) Summarizing the data using graphics.

(2)(a) (1 point) Use “mydata” to plot WHOLE versus VOLUME. Color code data points by CLASS.

(2)(b) (2 points) Use “mydata” to plot SHUCK versus WHOLE with WHOLE on the horizontal axis. Color code data points by CLASS. As an aid to interpretation, determine the maximum value of the ratio of SHUCK to WHOLE. Add to the chart a straight line with zero intercept using this maximum value as the slope of the line. If you are using the ‘base R’ plot() function, you may use abline() to add this line to the plot. Use help(abline) in R to determine the coding for the slope and intercept arguments in the functions. If you are using ggplot2 for visualizations, geom_abline() should be used.
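A minimal base R sketch of the line described in (2)(b) follows. The five rows in `toy` are copied from the sample printed in (1)(c), so the slope shown here is the maximum for those rows only, not for the full dataset; the colors and legend placement are arbitrary choices.

```r
# Five rows copied from the printed sample in (1)(c).
toy <- data.frame(
  WHOLE = c(105.4375, 100.3125, 33.3125, 199.85625, 12.5),
  SHUCK = c(54.603125, 44.1875, 13.8125, 79.95375, 4.8125),
  CLASS = factor(c("A3", "A3", "A2", "A4", "A1"))
)

# Slope of the reference line: the largest SHUCK-to-WHOLE ratio.
max_slope <- max(toy$SHUCK / toy$WHOLE)

# Scatterplot colored by CLASS, with WHOLE on the horizontal axis.
plot(toy$WHOLE, toy$SHUCK, col = as.integer(toy$CLASS), pch = 16,
     xlab = "WHOLE", ylab = "SHUCK", main = "SHUCK versus WHOLE")

# Straight line through the origin: a is the intercept, b is the slope.
abline(a = 0, b = max_slope, lty = 2)
legend("topleft", legend = levels(toy$CLASS),
       col = seq_along(levels(toy$CLASS)), pch = 16)
```

Because the slope is the maximum ratio, every point lies on or below the line, and exactly the observation(s) attaining the maximum touch it.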

Essay Question (2 points): How does the variability in this plot differ from the plot in (a)? Compare the two displays. Keep in mind that SHUCK is a part of WHOLE. Consider the location of the different age classes.

Answer: (Enter your answer here.) There is more variability in plot (a) than in plot (b). Plot (a) shows two maximum outliers associated with A1/Infants that may be skewing the results. Plot (b) shows the maximum ratio of SHUCK to WHOLE as a straight line. Most of the abalones fall under this line, which could indicate that waiting for older abalones (A3 - A5) may not yield more shuck weight.

Section 3: (8 points) Getting insights about the data using graphs.

(3)(a) (2 points) Use “mydata” to create a multi-figured plot with histograms, boxplots and Q-Q plots of RATIO differentiated by sex. This can be done using par(mfrow = c(3,3)) and base R or grid.arrange() and ggplot2. The first row would show the histograms, the second row the boxplots and the third row the Q-Q plots. Be sure these displays are legible.

Essay Question (2 points): Compare the displays. How do the distributions compare to normality? Take into account the criteria discussed in the sync sessions to evaluate non-normality.

Answer: (Enter your answer here.) Given the theoretical quantiles, the graphs appear to correspond well to a standard normal distribution. However, for all three sexes, the values deviate from normality in the upper right-hand side of the Q-Q plots. This can also be seen in the histograms, which exhibit a right skew driven by outliers for all three sexes. The boxplots confirm the presence of outliers. Taken together, the boxplots and Q-Q plots suggest a larger presence of outliers in the Infant group.

(3)(b) (2 points) Use the boxplots to identify RATIO outliers (mild and extreme both) for each sex. Present the abalones with these outlying RATIO values along with their associated variables in “mydata” (Hint: display the observations by passing a data frame to the kable() function).
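The sketch below shows one way to flag boxplot outliers per sex with `boxplot.stats()`. It assumes the common convention that mild outliers lie beyond 1.5 times the IQR from the hinges and extreme outliers beyond 3 times the IQR; the course may define these differently. The data frame `toy` and the `outlier` column are hypothetical.

```r
# Hypothetical stand-in for "mydata"; RATIO values are simulated.
set.seed(3)
toy <- data.frame(
  SEX   = factor(rep(c("F", "I", "M"), each = 60)),
  RATIO = rlnorm(180, meanlog = log(0.14), sdlog = 0.25)
)

# Label each row within its own sex group.
# Assumed convention: coef = 1.5 marks mild outliers, coef = 3.0 marks extreme ones.
toy$outlier <- "none"
for (s in levels(toy$SEX)) {
  idx <- which(toy$SEX == s)
  x <- toy$RATIO[idx]
  toy$outlier[idx[x %in% boxplot.stats(x, coef = 1.5)$out]] <- "mild"
  toy$outlier[idx[x %in% boxplot.stats(x, coef = 3.0)$out]] <- "extreme"
}

# Keep only the flagged observations, with all of their variables.
outliers <- toy[toy$outlier != "none", ]
print(outliers)
```

In a knitted report the final data frame could be passed to `kable()` instead of `print()`, as the hint above suggests.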

Essay Question (2 points): What are your observations regarding the results in (3)(b)?

Answer: (Enter your answer here.) There are mild outliers in all sex classes. However, extreme outliers occur only in the female and infant categories. The presence of outliers was initially suspected from the examination of the plots above. It is helpful that extreme outliers were identified in some of the categories.

Section 4: (8 points) Getting insights about possible predictors.

(4)(a) (3 points) With “mydata,” display side-by-side boxplots for VOLUME and WHOLE, each differentiated by CLASS. There should be five boxes for VOLUME and five for WHOLE. Also, display side-by-side scatterplots: VOLUME and WHOLE versus RINGS. Present these four figures in one graphic: the boxplots in one row and the scatterplots in a second row. Base R or ggplot2 may be used.
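A possible base R layout for the four figures, sketched on simulated data. The data frame `toy` stands in for "mydata", and the way CLASS is binned from RINGS here is a hypothetical choice made only so the example has five classes; it is not the assignment's definition.

```r
# Hypothetical stand-in for "mydata"; all values are simulated.
set.seed(4)
n <- 200
toy <- data.frame(RINGS = sample(3:20, n, replace = TRUE))
# Hypothetical binning of RINGS into five classes (not the assignment's rule).
toy$CLASS  <- cut(toy$RINGS, breaks = c(-Inf, 7, 9, 11, 13, Inf),
                  labels = paste0("A", 1:5))
toy$VOLUME <- toy$RINGS * runif(n, 20, 60)
toy$WHOLE  <- toy$VOLUME * runif(n, 0.30, 0.40)

op <- par(mfrow = c(2, 2))   # row 1: boxplots, row 2: scatterplots
boxplot(VOLUME ~ CLASS, data = toy, main = "VOLUME by CLASS")
boxplot(WHOLE ~ CLASS, data = toy, main = "WHOLE by CLASS")
plot(toy$RINGS, toy$VOLUME, xlab = "RINGS", ylab = "VOLUME",
     main = "VOLUME versus RINGS")
plot(toy$RINGS, toy$WHOLE, xlab = "RINGS", ylab = "WHOLE",
     main = "WHOLE versus RINGS")
par(op)   # restore the previous plotting layout
```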

Essay Question (5 points) How well do you think these variables would perform as predictors of age? Explain.

Answer: (Enter your answer here.) The variables Volume, Whole and Rings are good predictors of age. Since the more rings an abalone has, the greater its age, we can see that these variables have an upward trend: both Volume and Whole increase as the abalones have more rings. However, the explanatory power of the Class variable is likely smaller than that of the others, as we noticed earlier that a number of infants were classified as class 4 and 5.

Section 5: (12 points) Getting insights regarding different groups in the data.

(5)(a) (2 points) Use aggregate() with “mydata” to compute the mean values of VOLUME, SHUCK and RATIO for each combination of SEX and CLASS. Then, using matrix(), create matrices of the mean values. Using the “dimnames” argument within matrix() or the rownames() and colnames() functions on the matrices, label the rows by SEX and columns by CLASS. Present the three matrices (Kabacoff Section 5.6.2, p. 110-111). The kable() function is useful for this purpose. You do not need to be concerned with the number of digits presented.

##          A1        A2        A3        A4        A5
## F 0.1546644 0.1569554 0.1512698 0.1554605 0.1475600
## I 0.1564017 0.1450304 0.1372256 0.1462123 0.1379609
## M 0.1244413 0.1364881 0.1233605 0.1167649 0.1262089
##         A1        A2       A3       A4       A5
## F 255.2994  66.51618 103.7232 276.8573 160.3200
## I 245.3857 412.60794 270.7406 358.1181 498.0489
## M 316.4129 442.61552 486.1525 318.6930 440.2074
##         A1       A2       A3       A4       A5
## F 38.90000 10.11332 16.39583 42.50305 23.41024
## I 38.33855 59.69121 37.17969 52.96933 69.05161
## M 39.85369 61.42726 59.17076 36.47047 55.02762
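The following sketch shows how `aggregate()` output can be reshaped with `matrix()` so that rows are SEX and columns are CLASS. It relies on `aggregate()` returning rows with the first grouping variable (SEX) varying fastest, so a column-wise fill with three rows places each mean in the matching cell. The data frame `toy` is simulated, and `tapply()` is used only as an independent cross-check.

```r
# Hypothetical stand-in for "mydata"; VOLUME values are simulated.
set.seed(5)
toy <- expand.grid(SEX = c("F", "I", "M"), CLASS = paste0("A", 1:5), rep = 1:4)
toy$VOLUME <- runif(nrow(toy), 50, 600)

# Mean VOLUME for each SEX x CLASS combination.
# Rows come back ordered with SEX varying fastest within each CLASS.
agg <- aggregate(VOLUME ~ SEX + CLASS, data = toy, FUN = mean)

# Fill column by column (byrow = FALSE): each column is one CLASS, each row one SEX.
vol_mat <- matrix(agg$VOLUME, nrow = 3, byrow = FALSE,
                  dimnames = list(levels(agg$SEX), levels(agg$CLASS)))

# Independent cross-check of the cell placement.
check <- tapply(toy$VOLUME, list(toy$SEX, toy$CLASS), mean)
print(vol_mat)
```

If the values were filled with `byrow = TRUE`, or if the grouping variables were given in the other order, the labels would no longer line up with the means, so a cross-check of this kind is worth running before presenting the matrices.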

(5)(b) (3 points) Present three graphs. Each graph should include three lines, one for each sex. The first should show mean RATIO versus CLASS; the second, mean VOLUME versus CLASS; the third, mean SHUCK versus CLASS. This may be done with the ‘base R’ interaction.plot() function or with ggplot2 using grid.arrange().
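As a rough illustration of the first of the three graphs, `interaction.plot()` can draw one line per sex across the classes. The data frame `toy` is a simulated stand-in for "mydata"; the same call with VOLUME or SHUCK as the response would give the other two graphs.

```r
# Hypothetical stand-in for "mydata"; RATIO values are simulated.
set.seed(6)
toy <- expand.grid(SEX = c("F", "I", "M"), CLASS = paste0("A", 1:5), rep = 1:4)
toy$RATIO <- runif(nrow(toy), 0.10, 0.18)

# The cell means that the plot draws: one row per sex, one column per class.
means <- tapply(toy$RATIO, list(toy$SEX, toy$CLASS), mean)

# One line per SEX, CLASS on the horizontal axis, mean RATIO on the vertical axis.
interaction.plot(x.factor = toy$CLASS, trace.factor = toy$SEX,
                 response = toy$RATIO, fun = mean, type = "b",
                 xlab = "CLASS", ylab = "Mean RATIO", trace.label = "SEX")
```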

Essay Question (2 points): What questions do these plots raise? Consider aging and sex differences.

Answer: (Enter your answer here.) It seems that as the abalones age, the proportion of meat, as given by the mean ratio, drops considerably. This means that there is less meat relative to the size of the abalone. We can also see that the volume of females is larger than that of males as the abalones age. Both females and males appear to peak in size/volume in class 4, whereas infants increase slightly as they age to class 5. The mean shuck for all sex classes also peaks at class 4. This raises the question of whether it is better to eat the abalones in class 4, as the shuck does not grow past this point. Furthermore, the mean ratio shows that the proportion of meat is largest when the abalones are in class 2. Would it be more profitable for producers/growers to sell abalones for consumption once they reach this stage, since they could start growing new ones and the proportion of meat to size does not really grow from here?

(5)(c) (3 points) Present four boxplots using par(mfrow = c(2, 2)) or grid.arrange(). The first line should show VOLUME by RINGS for the infants and, separately, for the adults (factor levels “M” and “F” combined). The second line should show WHOLE by RINGS for the infants and, separately, for the adults. Since the data are sparse beyond 15 rings, limit the displays to less than 16 rings. One way to accomplish this is to generate a new data set using subset() to select RINGS < 16. Use ylim = c(0, 1100) for VOLUME and ylim = c(0, 400) for WHOLE. If you wish to reorder the displays for presentation purposes or use ggplot2, go ahead.
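A minimal sketch of the subsetting and 2 x 2 boxplot layout described above, using a simulated data frame `toy` in place of "mydata". The names `young`, `infants` and `adults` are hypothetical.

```r
# Hypothetical stand-in for "mydata"; all values are simulated.
set.seed(7)
n <- 300
toy <- data.frame(
  SEX   = factor(sample(c("F", "I", "M"), n, replace = TRUE)),
  RINGS = sample(3:22, n, replace = TRUE)
)
toy$VOLUME <- pmin(toy$RINGS * runif(n, 20, 50), 1100)
toy$WHOLE  <- toy$VOLUME * runif(n, 0.30, 0.36)

# Restrict to fewer than 16 rings, then split infants from adults (M and F combined).
young   <- subset(toy, RINGS < 16)
infants <- subset(young, SEX == "I")
adults  <- subset(young, SEX %in% c("M", "F"))

op <- par(mfrow = c(2, 2))   # row 1: VOLUME, row 2: WHOLE
boxplot(VOLUME ~ RINGS, data = infants, ylim = c(0, 1100),
        main = "Infant VOLUME by RINGS")
boxplot(VOLUME ~ RINGS, data = adults, ylim = c(0, 1100),
        main = "Adult VOLUME by RINGS")
boxplot(WHOLE ~ RINGS, data = infants, ylim = c(0, 400),
        main = "Infant WHOLE by RINGS")
boxplot(WHOLE ~ RINGS, data = adults, ylim = c(0, 400),
        main = "Adult WHOLE by RINGS")
par(op)   # restore the previous plotting layout
```

Using the same `ylim` for the infant and adult panels in each row keeps the two groups directly comparable.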

Essay Question (2 points): What do these displays suggest about abalone growth? Also, compare the infant and adult displays. What differences stand out?

Answer: (Enter your answer here.) These graphs suggest that the growth of adult abalones varies more than that of infant abalones. We can see this in the longer whiskers of the adult boxplots compared with the shorter whiskers of the infant boxplots. This suggests that both adult volume and adult whole weight tend to be more dispersed as the number of rings grows. The graphs also suggest that Volume and Whole tend to increase with the number of rings.

Section 6: (11 points) Conclusions from the Exploratory Data Analysis (EDA).

Conclusions

Essay Question 1) (5 points) Based solely on these data, what are plausible statistical reasons that explain the failure of the original study? Consider to what extent physical measurements may be used for age prediction.

Answer: (Enter your answer here.) The presence of outliers may be affecting the data and contributing to the right skew observed. This should be addressed in order to bring the data closer to a normal distribution. Another issue is the Infant category and how it is determined. It is known that sexing abalones is difficult, so this category could be prone to measurement error. We saw that infants can still have upwards of 10 rings, which raises the question of how infants are classified, since the number of rings is a sign of age. A further point to consider is the correlation between the variables, as some of them are a function of others, so a model for age prediction might run into multicollinearity. Physical measurements may be used for age prediction to a certain extent. Weight, Sex and Volume are good measures. However, we would need to add measures that control for the environmental effect on growth. Abalone growth is a function of environmental factors such as pollution and food availability. Therefore, to determine which physical variables are good predictors of age, measures of the environment should be added.

Essay Question 2) (3 points) Do not refer to the abalone data or study. If you were presented with an overall histogram and summary statistics from a sample of some population or phenomenon and no other information, what questions might you ask before accepting them as representative of the sampled population or phenomenon?

Answer: (Enter your answer here.) Questions I would ask include: How long has the study been going on? How was the sample obtained? Where was the sample obtained? Is the sample size big enough to be representative of the population? Are outliers present? How many data points are in the sample? What is the mean of the sample distribution? What is the standard deviation of the sample distribution? Is the data symmetrical about its mean? I would also examine the measures of central tendency.

Essay Question 3) (3 points) Do not refer to the abalone data or study. What do you see as difficulties analyzing data derived from observational studies? Can causality be determined? What might be learned from such studies?

Answer: (Enter your answer here.) Data from observational studies have the disadvantage of being subject to human/measurement error, as the measured values have to be recorded manually. Observational studies may also lead to biases. Correlation in the data may be detected and interpreted as causality without investigating the underlying factors affecting such relationships. A purely observational study may suggest causality in some cases, but it is prone to biases and erroneous causal conclusions. From observational studies we can learn about the relationship between the independent and the dependent variables. From there, we can gather insights into underlying relationships or other variables that might be affecting some of our variables. We might discover other variables that have a strong correlation and possibly a causal effect.