Test items start here. There are 6 sections.

## 'data.frame':    1036 obs. of  8 variables:
##  $ SEX   : Factor w/ 3 levels "F","I","M": 2 2 2 2 2 2 2 2 2 2 ...
##  $ LENGTH: num  5.57 3.67 10.08 4.09 6.93 ...
##  $ DIAM  : num  4.09 2.62 7.35 3.15 4.83 ...
##  $ HEIGHT: num  1.26 0.84 2.205 0.945 1.785 ...
##  $ WHOLE : num  11.5 3.5 79.38 4.69 21.19 ...
##  $ SHUCK : num  4.31 1.19 44 2.25 9.88 ...
##  $ RINGS : int  6 4 6 3 6 6 5 6 5 6 ...
##  $ CLASS : Factor w/ 5 levels "A1","A2","A3",..: 1 1 1 1 1 1 1 1 1 1 ...
Section 1: (6 points) Summarizing the data.

(1)(a) (1 point) Use summary() to obtain and present descriptive statistics from mydata. Use table() to present a frequency table using CLASS and RINGS. There should be 115 cells in the table you present.

## 'data.frame':    1036 obs. of  10 variables:
##  $ SEX   : Factor w/ 3 levels "F","I","M": 2 2 2 2 2 2 2 2 2 2 ...
##  $ LENGTH: num  5.57 3.67 10.08 4.09 6.93 ...
##  $ DIAM  : num  4.09 2.62 7.35 3.15 4.83 ...
##  $ HEIGHT: num  1.26 0.84 2.205 0.945 1.785 ...
##  $ WHOLE : num  11.5 3.5 79.38 4.69 21.19 ...
##  $ SHUCK : num  4.31 1.19 44 2.25 9.88 ...
##  $ RINGS : int  6 4 6 3 6 6 5 6 5 6 ...
##  $ CLASS : Factor w/ 5 levels "A1","A2","A3",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ VOLUME: num  28.7 8.1 163.4 12.2 59.7 ...
##  $ RATIO : num  0.15 0.147 0.269 0.185 0.165 ...
##  SEX         LENGTH           DIAM            HEIGHT          WHOLE        
##  F:326   Min.   : 2.73   Min.   : 1.995   Min.   :0.525   Min.   :  1.625  
##  I:329   1st Qu.: 9.45   1st Qu.: 7.350   1st Qu.:2.415   1st Qu.: 56.484  
##  M:381   Median :11.45   Median : 8.925   Median :2.940   Median :101.344  
##          Mean   :11.08   Mean   : 8.622   Mean   :2.947   Mean   :105.832  
##          3rd Qu.:13.02   3rd Qu.:10.185   3rd Qu.:3.570   3rd Qu.:150.319  
##          Max.   :16.80   Max.   :13.230   Max.   :4.935   Max.   :315.750  
##      SHUCK              RINGS        CLASS        VOLUME       
##  Min.   :  0.5625   Min.   : 3.000   A1:108   Min.   :  3.612  
##  1st Qu.: 23.3006   1st Qu.: 8.000   A2:236   1st Qu.:163.545  
##  Median : 42.5700   Median : 9.000   A3:329   Median :307.363  
##  Mean   : 45.4396   Mean   : 9.993   A4:188   Mean   :326.804  
##  3rd Qu.: 64.2897   3rd Qu.:11.000   A5:175   3rd Qu.:463.264  
##  Max.   :157.0800   Max.   :25.000            Max.   :995.673  
##      RATIO        
##  Min.   :0.06734  
##  1st Qu.:0.12241  
##  Median :0.13914  
##  Mean   :0.14205  
##  3rd Qu.:0.15911  
##  Max.   :0.31176
##     
##        3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
##   A1   9   8  24  67   0   0   0   0   0   0   0   0   0   0   0   0   0   0
##   A2   0   0   0   0  91 145   0   0   0   0   0   0   0   0   0   0   0   0
##   A3   0   0   0   0   0   0 182 147   0   0   0   0   0   0   0   0   0   0
##   A4   0   0   0   0   0   0   0   0 125  63   0   0   0   0   0   0   0   0
##   A5   0   0   0   0   0   0   0   0   0   0  48  35  27  15  13   8   8   6
##     
##       21  22  23  24  25
##   A1   0   0   0   0   0
##   A2   0   0   0   0   0
##   A3   0   0   0   0   0
##   A4   0   0   0   0   0
##   A5   4   1   7   2   1
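
The 115 cells follow from 5 classes times 23 distinct ring counts (3 through 25). As a rough check, the sketch below rebuilds RINGS from the column totals printed above and derives CLASS with cut(). The break points (6, 8, 10, 12) are an assumption read off the table, not taken from the original data preparation code, and the names ring_class and class_by_rings are illustrative.

```r
# Ring counts 3..25 and their frequencies, copied from the table above
ring_values <- 3:25
ring_counts <- c(9, 8, 24, 67, 91, 145, 182, 147, 125, 63,
                 48, 35, 27, 15, 13, 8, 8, 6, 4, 1, 7, 2, 1)
rings <- rep(ring_values, times = ring_counts)

# Assumed class boundaries, inferred from where each table row is non-zero
ring_class <- cut(rings,
                  breaks = c(-Inf, 6, 8, 10, 12, Inf),
                  labels = c("A1", "A2", "A3", "A4", "A5"))

class_by_rings <- table(CLASS = ring_class, RINGS = rings)
print(dim(class_by_rings))     # rows (classes) by columns (ring counts)
print(length(class_by_rings))  # total number of cells
print(table(ring_class))       # class totals, to compare with summary()
```

With the real data frame, the equivalent call would be table(mydata$CLASS, mydata$RINGS).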

Question (1 point): Briefly discuss the variable types and distributional implications such as potential skewness and outliers.

Answer: There are 10 variables.

Sex (“SEX”) is a categorical variable that records the sex of each abalone. It has 3 levels: M for male, F for female, and I for infant. From the table above, females (326) and infants (329) are nearly equal in number, while males (381) are about 16 to 17% more numerous than either group, so the three levels are roughly, but not exactly, balanced.

Length (“LENGTH”) is a numerical variable that records the longest shell measurement, in centimeters (cm). From the table above, length ranges from 2.73 to 16.80. The mean (11.08) is below the median (11.45), and the minimum lies much farther from the median than the maximum does, so the data are concentrated at the larger values with a tail toward the smaller ones (left skew).

Diameter (“DIAM”) is a numerical variable that records the measurement perpendicular to the length, in centimeters (cm). From the table above, diameter ranges from 1.995 to 13.230, with the middle half of the data between about 7.4 and 10.2 cm.

Height (“HEIGHT”) is a numerical variable that records the measurement perpendicular to both length and diameter, in centimeters (cm). From the table above, height ranges from 0.525 to 4.935, with the middle half of the data between about 2.4 and 3.6. The maximum is not far above the third quartile (3.570), but the minimum falls below the lower 1.5 × IQR fence (about 0.68), so there might be outliers, most likely at the low end.

Whole weight (“WHOLE”) is a numerical variable that records the whole weight of the abalone, in grams (g). From the table above, whole weight ranges from 1.625 to 315.750, with the middle half of the data between about 56 and 150. The maximum is above the upper 1.5 × IQR fence (about 291), so I highly suspect there is at least one outlier.

Shuck weight (“SHUCK”) is a numerical variable that records the shucked weight of the meat, in grams (g). From the table above, shuck weight ranges from 0.5625 to 157.0800. The mean (45.44) is above the median (42.57) and the maximum is far above the third quartile (64.29), so most values are small and the tail extends toward the larger values (right skew).

Rings (“RINGS”) is a numerical (integer) variable that records the number of growth rings on the abalone’s shell. To determine the age of the abalone, add 1.5 to the number of rings. From the table above, ring counts range from 3 to 25, with the middle half of the data between 8 and 11. The mean (9.993) is above the median (9) and the upper tail reaches 25, which again suggests right skew.

Class (“CLASS”) is a categorical variable that records the age classification of the abalone and is determined by the number of growth rings. It has 5 levels: A1 is the youngest and A5 is the oldest. From the table above, A2 (236) and A3 (329) are the two most common classes, with A3 having about three times as many abalones as A1 (108) and almost twice as many as A5 (175).

Volume (“VOLUME”) is a numerical variable calculated by multiplying length, diameter, and height, in cubic centimeters (cm^3). From the table above, volume ranges from 3.612 to 995.673. The maximum is above the upper 1.5 × IQR fence (about 913), so I highly suspect there is at least one outlier.

Ratio (“RATIO”) is a numerical variable calculated by dividing shuck weight by volume, in grams per cubic centimeter (g/cm^3). From the table above, ratio ranges from 0.06734 to 0.31176, with most of the data gathered around 0.12 to 0.16 g/cm^3.
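
One way to make the outlier suspicion concrete is the 1.5 × IQR rule used by boxplots. The sketch below applies it to quartiles copied from the summary() output above. It is only an approximation: summary() quartiles can differ slightly from the hinges that boxplot() uses, and the check only tells us whether the minimum or maximum falls outside a fence, not how many outliers exist. The data frame name stats is illustrative.

```r
# Quartiles and extremes copied from the summary() output above
stats <- data.frame(
  variable = c("LENGTH", "HEIGHT", "WHOLE", "VOLUME"),
  min = c(2.73, 0.525, 1.625, 3.612),
  q1  = c(9.45, 2.415, 56.484, 163.545),
  q3  = c(13.02, 3.570, 150.319, 463.264),
  max = c(16.80, 4.935, 315.750, 995.673)
)

# Fences at 1.5 interquartile ranges beyond the quartiles
iqr <- stats$q3 - stats$q1
stats$lower_fence <- stats$q1 - 1.5 * iqr
stats$upper_fence <- stats$q3 + 1.5 * iqr

# TRUE when the extreme value lies outside the corresponding fence
stats$low_outlier  <- stats$min < stats$lower_fence
stats$high_outlier <- stats$max > stats$upper_fence
print(stats)
```

Under these assumptions, LENGTH and HEIGHT are flagged at the low end, and WHOLE and VOLUME at the high end.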

(1)(b) (1 point) Generate a table of counts using SEX and CLASS. Add margins to this table (Hint: There should be 15 cells in this table plus the marginal totals. Apply table() first, then pass the table object to addmargins() (Kabacoff Section 7.2 pages 144-147)). Lastly, present a barplot of these data, ignoring the marginal totals.

##      
##         A1   A2   A3   A4   A5  Sum
##   F      5   41  121   82   77  326
##   I     91  133   65   21   19  329
##   M     12   62  143   85   79  381
##   Sum  108  236  329  188  175 1036
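
For reference, the table above can be rebuilt from its printed counts, which also shows how addmargins() and barplot() fit together. This is a minimal sketch: the counts are typed in from the output rather than computed from mydata, and the object names (counts, sex_class, with_margins) are illustrative.

```r
# Counts copied from the SEX by CLASS table above (margins excluded)
counts <- matrix(
  c( 5,  41, 121, 82, 77,
    91, 133,  65, 21, 19,
    12,  62, 143, 85, 79),
  nrow = 3, byrow = TRUE,
  dimnames = list(SEX = c("F", "I", "M"),
                  CLASS = c("A1", "A2", "A3", "A4", "A5"))
)
sex_class <- as.table(counts)

# Add row and column totals for presentation only
with_margins <- addmargins(sex_class)
print(with_margins)

# Plot the table WITHOUT margins so the totals do not appear as bars
barplot(sex_class, beside = TRUE, legend.text = TRUE,
        xlab = "CLASS", ylab = "Count",
        main = "Abalone counts by SEX and CLASS")
```

With the real data, sex_class would come from table(mydata$SEX, mydata$CLASS).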

Essay Question (2 points): Discuss the sex distribution of abalones. What stands out about the distribution of abalones by CLASS?

Answer: In the graph above, I can see that most infant abalones fall into classes A1, A2, and A3. This makes sense: since they are young, they should be low in age classification. I can also see that the largest share of males (143) and of females (121) falls into class A3. This also makes sense: since they are adults, they should not be low in age classification. The smaller numbers of males and females in classes A4 and A5 might be an indication of survival after reproductive organs develop. These trends are also evident within the classes. Class A1 is mostly infants. Classes A4 and A5 are mostly males and females. However, some things stand out to me. First, from class A2 to A3, there is a sharp decrease in infants (133 to 65) and a sharp increase in males (62 to 143) and females (41 to 121). I wonder if this indicates the age at which abalones generally mature and develop their reproductive organs. Second, there are still infants in classes A4 (21) and A5 (19). Age classification is determined by the number of growth rings, so it seems to me that there are either some ‘old’ infants that are about to develop into males or females, or a group of abalones that do not develop reproductive organs.

(1)(c) (1 point) Select a simple random sample of 200 observations from “mydata” and identify this sample as “work.” Use set.seed(123) prior to drawing this sample. Do not change the number 123. Note that sample() “takes a sample of the specified size from the elements of x.” We cannot sample directly from “mydata.” Instead, we need to sample from the integers, 1 to 1036, representing the rows of “mydata.” Then, select those rows from the data frame (Kabacoff Section 4.10.5 page 87).

Using “work”, construct a scatterplot matrix of variables 2-6 with plot(work[, 2:6]) (these are the continuous variables excluding VOLUME and RATIO). The sample “work” will not be used in the remainder of the assignment.
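
The sampling step can be sketched as follows. Because the original data file is not reproduced here, the sketch uses a hypothetical stand-in data frame (mydata_demo) with placeholder values in the same column positions, so the listing below this block, which comes from the real data, will not match its output. The row indices drawn also depend on the R version's sampling algorithm.

```r
# Hypothetical stand-in for mydata: placeholder values, NOT the abalone data
n <- 1036
mydata_demo <- data.frame(
  SEX    = factor(rep(c("F", "I", "M"), length.out = n)),
  LENGTH = seq(2.7, 16.8, length.out = n),
  DIAM   = seq(2.0, 13.2, length.out = n),
  HEIGHT = seq(0.5, 4.9, length.out = n),
  WHOLE  = seq(1.6, 315.8, length.out = n),
  SHUCK  = seq(0.6, 157.1, length.out = n)
)

# Fix the seed, then sample ROW NUMBERS (without replacement by default)
set.seed(123)
idx  <- sample(seq_len(nrow(mydata_demo)), size = 200)
work <- mydata_demo[idx, ]

# Scatterplot matrix of columns 2 through 6
plot(work[, 2:6])
```

Sampling the integers 1 to nrow() and then indexing keeps each selected row intact, which is why sample() is not applied to the data frame directly.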

##      SEX  LENGTH     DIAM  HEIGHT      WHOLE      SHUCK RINGS CLASS     VOLUME
## 415    F 11.0250  9.03000 2.83500 105.437500  54.603125     9    A3 282.240551
## 463    F 11.7600  9.24000 2.83500 100.312500  44.187500     9    A3 308.057904
## 179    I  8.1900  6.30000 2.10000  33.312500  13.812500     7    A2 108.353700
## 526    F 13.5450 10.71000 4.20000 199.856250  79.953750    12    A4 609.281190
## 195    I  9.9750  7.56000 2.62500  61.312500  25.625000     8    A2 197.953875
## 938    M 13.3350 10.50000 3.46500 162.307500  80.053750    12    A4 485.160638
## 665    M  5.6700  4.09500 1.68000  12.500000   4.812500     6    A1  39.007332
## 602    F 12.3900  9.34500 2.73000 141.562500  48.768750    13    A5 316.091822
## 709    M 10.9200  8.71500 3.67500  94.125000  31.927500     8    A2 349.741665
## 1011   M 11.1300  8.71500 2.73000 105.312500  34.375000    20    A5 264.804403
## 953    M 13.1250 10.29000 3.46500 142.353750  59.963750    11    A4 467.969906
## 348    F 11.5500  9.03000 3.15000 105.000000  49.375000     8    A2 328.533975
## 1017   M 12.9150 10.08000 3.99000 170.000000  66.312500    18    A5 519.430968
## 649    F 13.0200 10.92000 4.72500 147.937500  48.195000    23    A5 671.792940
## 989    M 13.4400 11.02500 3.88500 213.375000  94.421250    13    A5 575.663760
## 355    F 12.9150 10.08000 3.36000 156.562500  73.125000     8    A2 437.415552
## 840    M 12.9150  9.97500 3.57000 141.125000  59.338125    10    A3 459.912836
## 26     I  3.4650  2.52000 0.63000   2.687500   0.875000     3    A1   5.501034
## 519    F 13.7550 10.60500 4.09500 183.663750  88.580000    11    A4 597.344919
## 426    F 13.1250 10.08000 3.57000 169.062500  78.716875    10    A3 472.311000
## 1023   M 12.1800  9.66000 3.46500 153.437500  59.125000    16    A5 407.687742
## 766    M 12.3900  9.97500 3.46500 134.625000  56.244375     9    A3 428.240216
## 211    I  9.5550  7.03500 2.20500  50.687500  21.875000     8    A2 148.218832
## 932    M 13.1250 10.39500 3.88500 176.396250  87.036250    11    A4 530.047547
## 590    F 12.1800  9.55500 3.57000 113.437500  47.685000    13    A5 415.476243
## 593    F 13.0200 10.71000 4.30500 168.437500  60.881250    14    A5 600.307281
## 555    F 11.3400  8.82000 2.94000 102.637500  47.508750    11    A4 294.055272
## 871    M  9.6600  7.35000 2.52000  64.375000  27.720000    10    A3 178.922520
## 373    F 12.1800  9.13500 3.15000 104.250000  53.500000     8    A2 350.482545
## 844    M 12.6000  9.66000 3.15000 155.875000  66.020625     9    A3 383.405400
## 143    I 11.1300  8.82000 2.52000  74.562500  31.937500     7    A2 247.379832
## 544    F  9.2400  7.45500 2.41500  52.912500  20.406875    11    A4 166.355343
## 490    F 14.4900 12.18000 4.09500 207.250000  89.385000    10    A3 722.719179
## 621    F 13.4400 11.02500 4.51500 222.375000  57.821250    22    A5 669.014640
## 775    M  9.6600  7.66500 2.62500  58.375000  23.450625    10    A3 194.365238
## 905    M 14.2800 11.34000 3.99000 206.932500  87.771250    12    A4 646.121448
## 937    M 14.2800 11.44500 3.88500 213.180000  86.668750    11    A4 634.943421
## 842    M 12.3900 10.18500 3.25500 134.812500  56.120625     9    A3 410.755448
## 23     I  7.1400  5.35500 1.57500  22.500000   9.312500     6    A1  60.219653
## 923    M 12.8100 10.18500 3.67500 158.673750  66.640000    12    A4 479.476699
## 956    M 11.7600  9.66000 4.93500 107.036250  40.731250    12    A4 560.623896
## 309    I 11.8650  9.03000 2.83500 106.812500  37.737563    11    A4 303.744593
## 135    I  7.9800  5.98500 1.99500  30.375000  11.187500     7    A2  95.281799
## 821    M 12.2850 10.08000 3.88500 130.000000  53.707500    10    A3 481.090428
## 997    M 12.2850  8.50500 3.15000 157.062500  54.375000    15    A5 329.124364
## 224    I 10.7100  8.29500 2.20500  69.062500  29.250000     8    A2 195.890987
## 166    I  8.9250  6.51000 1.99500  43.812500  20.562500     8    A2 115.912991
## 217    I  9.5550  9.13500 2.31000  53.312500  24.375000     8    A2 201.628177
## 290    I 10.9200  8.61000 3.04500  80.750000  36.321250     9    A3 286.294554
## 581    F 13.2300  9.97500 3.67500 177.875000  52.976250    14    A5 484.986994
## 72     I  3.3600  2.31000 0.52500   2.250000   0.812500     3    A1   4.074840
## 588    F 15.4350 11.86500 4.72500 254.625000 110.925000    13    A5 865.318899
## 575    F 14.0700 11.55000 3.99000 177.288750  69.846875    12    A4 648.408915
## 141    I  5.5650  4.09500 1.15500  10.500000   4.562500     7    A2  26.320920
## 722    M 11.4450  9.45000 3.15000  97.562500  46.963125     8    A2 340.689037
## 865    M 11.1300  8.71500 2.52000  88.250000  41.518125     9    A3 244.434834
## 859    M 12.6000 10.08000 3.46500 114.562500  51.170625     9    A3 440.082720
## 153    I 10.1850  7.87500 2.94000  65.125000  25.000000     8    A2 235.808213
## 294    I  8.7150  6.82500 2.41500  41.062500  16.517531    12    A4 143.643898
## 277    I 11.7600  8.92500 2.83500 102.562500  45.508750     9    A3 297.555930
## 1035   M 12.3900 10.50000 4.20000 148.375000  51.500000    16    A5 546.399000
## 41     I  6.3000  4.62000 1.36500  15.437500   7.375000     5    A1  39.729690
## 431    F 13.6500 10.29000 3.25500 140.250000  68.806250     9    A3 457.192417
## 90     I  4.5150  3.57000 1.15500   7.562500   2.562500     6    A1  18.616925
## 316    I 11.0292  8.48400 3.07545  82.500000  31.072125    13    A5 287.775186
## 223    I  7.8750  5.98500 1.89000  32.125000  13.062500     7    A2  89.079244
## 528    F 12.6000  9.76500 3.36000 144.457500  59.997500    11    A4 413.411040
## 116    I 10.9200  8.61000 2.52000  74.375000  29.812500     8    A2 236.933424
## 606    F 11.3400  9.13500 3.67500 111.500000  41.055000    13    A5 380.696557
## 774    M  7.2450  5.67000 1.99500  24.625000   8.229375     9    A3  81.952904
## 747    M 13.5450 10.71000 4.09500 153.250000  72.826875    10    A3 594.049160
## 456    F 12.0750  9.97500 3.36000 111.875000  45.513125     9    A3 404.705700
## 598    F  9.7650  7.98000 2.83500  72.375000  26.520000    14    A5 220.916525
## 854    M 12.8100  9.87000 3.46500 131.500000  61.627500     9    A3 438.096235
## 39     I  5.7750  4.62000 1.68000  17.062500   7.062500     6    A1  44.823240
## 159    I 11.0250  8.40000 2.73000  80.687500  40.625000     8    A2 252.825300
## 752    M 11.7600  9.76500 3.04500 110.937500  41.394375     9    A3 349.676838
## 209    I  8.7150  6.82500 2.10000  41.687500  18.062500     7    A2 124.907738
## 374    F 10.6050  8.19000 2.41500  82.500000  38.062500     8    A2 209.754704
## 818    M 13.6500 10.92000 3.25500 171.000000  76.539375     9    A3 485.183790
## 34     I  7.6650  5.67000 1.78500  24.625000  10.187500     6    A1  77.577082
## 516    F 14.2800 10.81500 3.67500 206.358750  65.984375    12    A4 567.560385
## 13     I  4.5150  3.25500 1.26000   6.759375   2.625000     5    A1  18.517369
## 69     I  6.3000  4.83000 1.57500  15.875000   6.500000     6    A1  47.925675
## 895    M 10.1850  8.08500 2.62500  60.881250  24.500000    12    A4 216.157528
## 755    M  7.8750  5.88000 1.99500  27.812500  10.828125    10    A3  92.378475
## 409    F 10.7100  8.40000 2.52000  87.562500  43.808750    10    A3 226.709280
## 308    I 11.8650  9.24000 3.67500 109.187500  48.670875    11    A4 402.899805
## 278    I 10.5000  7.98000 2.83500  74.250000  36.076250     9    A3 237.544650
## 89     I  3.3600  2.31000 0.52500   2.437500   0.937500     4    A1   4.074840
## 928    M 15.5400 12.49500 3.99000 296.246250 140.813750    11    A4 774.747477
## 537    F 13.5450 10.60500 3.46500 168.045000  70.812500    11    A4 497.728972
## 291    I 11.5500  9.34500 3.04500  97.875000  35.797781    11    A4 328.661314
## 424    F 12.4950  9.76500 3.15000 134.562500  61.988750     9    A3 384.343076
## 880    M 13.0200  9.76500 3.99000 171.041250  69.886250    11    A4 507.289797
## 286    I 12.3900  9.34500 2.83500  96.437500  40.180000     9    A3 328.249199
## 908    M 13.4400 10.60500 3.78000 165.367500  72.275000    11    A4 538.767936
## 671    M  8.8200  7.14000 2.41500  52.687500  21.656250     8    A2 152.084142
## 121    I  8.1900  5.88000 1.89000  26.875000  10.562500     8    A2  91.017108
## 110    I 10.7100  8.08500 3.04500  95.812500  49.812500     8    A2 263.667616
## 158    I  7.2450  5.67000 2.31000  26.687500  10.250000     7    A2  94.892837
## 64     I  8.8200  6.61500 2.10000  42.937500  19.625000     6    A1 122.523030
## 483    F 13.0200  9.97500 3.36000 165.562500  86.670625     9    A3 436.378320
## 910    M 15.7500 11.55000 3.78000 241.357500 115.395000    11    A4 687.629250
## 477    F 13.6500 10.50000 3.99000 183.000000  80.989375     9    A3 571.866750
## 480    F 12.7050  9.55500 3.04500 122.187500  59.085000     9    A3 369.651657
## 711    M  8.8200  7.03500 2.41500  46.125000  21.161250     8    A2 149.847611
## 67     I  5.0400  3.67500 0.94500   9.656250   3.937500     5    A1  17.503290
## 663    M 10.5000  8.19000 2.83500  82.437500  39.312500     6    A1 243.795825
## 890    M 11.8650  9.24000 2.41500 117.108750  49.490000    11    A4 264.762729
## 847    M 11.4450  8.61000 2.94000  92.125000  43.188750     9    A3 289.711863
## 85     I  6.1950  4.83000 1.68000  20.312500   8.125000     5    A1  50.268708
## 165    I 10.5000  8.08500 2.52000  70.000000  35.437500     8    A2 213.929100
## 648    F 13.0200  9.87000 4.72500 139.375000  48.195000    15    A5 607.197465
## 51     I  7.6650  5.67000 1.78500  23.437500  10.125000     6    A1  77.577082
## 74     I  7.9800  5.88000 1.78500  34.187500  14.375000     6    A1  83.756484
## 178    I 10.1850  8.08500 2.73000  74.996875  31.312500     7    A2 224.803829
## 362    F 12.8100  9.87000 3.36000 134.312500  61.562500     8    A2 424.820592
## 236    I 11.0250  8.40000 3.04500  76.187500  30.380000     9    A3 281.997450
## 610    F  9.1350  7.35000 2.31000  48.000000  18.232500    13    A5 155.098597
## 330    F  9.7650  7.35000 2.62500  60.250000  28.750000     6    A1 188.403469
## 726    M  8.8200  7.24500 2.20500  53.750000  21.656250     7    A2 140.901485
## 127    I  9.0300  6.61500 2.41500  48.000000  23.562500     8    A2 144.256282
## 212    I  7.4550  5.46000 1.89000  24.812500   8.937500     7    A2  76.931127
## 686    M 12.0750  9.45000 3.46500 120.687500  61.627500     8    A2 395.386819
## 785    M 11.9700  9.45000 3.25500 149.375000  69.609375    10    A3 368.194208
## 814    M 13.1250 10.39500 3.25500 128.125000  56.925000     9    A3 444.093891
## 310    I 13.0200  9.87000 3.25500 120.750000  52.550438    11    A4 418.291587
## 744    M 11.7600  9.03000 3.04500 112.437500  57.420000     9    A3 323.357076
## 878    M 15.1200 11.76000 3.78000 202.278750  84.647500    11    A4 672.126336
## 243    I  8.5050  6.61500 2.20500  43.375000  19.661250     9    A3 124.054568
## 862    M  8.9250  6.82500 2.52000  46.937500  17.572500     9    A3 153.501075
## 926    M  9.0300  7.24500 2.41500  38.823750  11.331250    11    A4 157.994975
## 792    M 10.0800  7.87500 3.04500  97.125000  26.730000     9    A3 241.712100
## 113    I 12.0750  9.03000 2.73000  92.812500  36.187500     8    A2 297.671692
## 619    F 13.9650 11.23500 3.99000 187.000000  73.631250    17    A5 626.018132
## 1013   M 11.4450  8.61000 3.04500 109.125000  37.937500    18    A5 300.058715
## 151    I  9.1350  6.61500 2.31000  46.062500  20.187500     7    A2 139.588738
## 666    M  9.4500  7.03500 2.62500  58.259375  27.062500     6    A1 174.511969
## 614    F 14.1750 11.65500 4.30500 240.625000  90.907500    13    A5 711.227436
## 767    M 12.2850  9.97500 3.15000 133.125000  65.773125    10    A3 386.010056
## 160    I  8.9250  6.61500 1.99500  45.937500  23.312500     7    A2 117.782556
## 391    F 11.1300  8.61000 3.04500 106.572812  47.343750     9    A3 291.800219
## 155    I 11.3400  8.19000 2.62500  78.187500  31.562500     8    A2 243.795825
## 1024   M 14.1750 11.65500 4.20000 179.812500  68.125000    21    A5 693.880425
## 5      I  6.9300  4.83000 1.78500  21.187500   9.875000     6    A1  59.747341
## 326    I 11.0292  8.59005 2.96940  72.187500  23.765000    15    A5 281.325052
## 784    M 13.1250  9.55500 3.57000 135.250000  61.318125     9    A3 447.711469
## 280    I 12.8100  9.76500 3.15000 120.062500  55.063750     9    A3 394.032398
## 800    M  9.8700  7.56000 2.83500  62.625000  20.604375    10    A3 211.539762
## 789    M 13.5450 10.50000 4.09500 175.125000  76.291875    10    A3 582.401138
## 567    F 13.3350 10.18500 3.46500 161.861250  72.550625    11    A4 470.605818
## 843    M  7.9800  6.09000 2.52000  35.375000  14.540625     9    A3 122.467464
## 238    I 10.8150  8.29500 2.62500  72.562500  28.971250     9    A3 235.489866
## 764    M 12.0750  9.34500 3.36000 104.875000  49.561875     9    A3 379.145340
## 339    F 13.3350 10.50000 3.99000 161.250000  74.125000     8    A2 558.669825
## 962    M 15.7500 11.65500 4.51500 275.125000 131.360625    13    A5 828.801619
## 822    M 12.4950  9.87000 3.25500 150.187500  60.885000    10    A3 401.424991
## 137    I  8.2950  6.51000 1.78500  39.625000  19.125000     7    A2  96.390803
## 455    F 12.7050  9.55500 3.04500 107.750000  42.167500     9    A3 369.651657
## 738    M 11.0250  8.50500 2.83500  94.687500  40.899375    10    A3 265.831217
## 560    F 14.8050 11.76000 3.57000 185.831250  78.151250    11    A4 621.561276
## 589    F 10.7100  8.19000 2.20500  76.500000  23.842500    13    A5 193.411355
## 83     I  9.8700  7.35000 2.62500  53.937500  23.750000     6    A1 190.429312
## 696    M  9.5550  7.35000 2.31000  57.250000  24.750000     8    A2 162.229568
## 942    M 13.7550 10.92000 3.78000 190.230000  88.016250    11    A4 567.773388
## 196    I  9.4500  7.35000 2.73000  68.375000  30.625000     8    A2 189.618975
## 769    M  7.1400  5.56500 1.78500  24.205000   9.528750    10    A3  70.925368
## 680    M 11.2350  8.50500 2.94000  91.437500  41.580000     7    A2 280.927804
## 941    M 13.7550 10.71000 4.51500 227.396250 108.841250    11    A4 665.131966
## 968    M 13.0200 10.18500 3.25500 128.687500  52.593750    13    A5 431.641318
## 500    F 13.8600 10.50000 3.46500 151.788750  59.031875    12    A4 504.261450
## 889    M 13.6500 11.02500 3.99000 174.483750  73.193750    11    A4 600.460088
## 344    F 11.5500  9.24000 2.83500 105.437500  54.250000     8    A2 302.556870
## 909    M  8.0850  6.51000 2.10000  36.273750  13.046250    11    A4 110.530035
## 459    F 10.2900  7.66500 3.04500  79.312500  25.186875    10    A3 240.167828
## 20     I  8.4000  6.09000 2.10000  33.437500  15.062500     5    A1 107.427600
## 1032   M 12.7050  9.87000 2.41500 139.250000  49.062500    15    A5 302.837015
## 164    I  9.5550  7.35000 2.83500  67.062500  35.687500     7    A2 199.099924
## 52     I  9.8700  7.45500 2.52000  46.062500  15.750000     6    A1 185.423742
## 534    F 15.2250 11.97000 4.30500 206.486250  95.790000    11    A4 784.557191
## 177    I 10.5000  8.40000 2.52000  77.000000  32.625000     8    A2 222.264000
## 554    F 15.1200 12.07500 4.51500 267.750000 110.274375    12    A4 824.321610
## 827    M 12.4950  9.97500 2.94000 128.812500  60.946875    10    A3 366.434617
## 84     I  9.8700  7.98000 2.62500  60.562500  26.375000     6    A1 206.751825
## 523    F 14.1750 11.34000 4.41000 203.107500  88.322500    11    A4 708.883245
## 633    F 13.8600 10.92000 4.20000 209.500000  85.807500    17    A5 635.675040
## 392    F 13.2300 10.60500 4.09500 163.250000  65.145000     9    A3 574.545494
## 302    I 12.0750  9.45000 2.83500 103.062500  39.677344    11    A4 323.498306
## 597    F 14.8050 11.44500 3.78000 192.437500  77.456250    13    A5 640.495390
## 706    M  9.2400  6.82500 1.68000  51.625000  17.820000     8    A2 105.945840
## 901    M 13.9650 11.02500 3.78000 182.197500  82.258750    12    A4 581.984392
## 874    M  9.8700  7.87500 2.52000  70.953750  27.685000    12    A4 195.870150
## 430    F 12.0750 10.08000 3.46500 134.750000  64.513750     9    A3 421.745940
## 710    M  8.7150  6.61500 2.52000  50.187500  24.626250     8    A2 145.277307
## 761    M 14.4900 11.13000 3.99000 199.437500  83.902500    10    A3 643.482063
## 712    M  7.3500  5.56500 1.89000  28.187500  12.313125     7    A2  77.306197
## 428    F 12.7050 10.29000 3.15000 141.812500  66.470625     9    A3 411.813517
## 672    M  6.5100  4.72500 1.68000  16.812500   6.682500     7    A2  51.676380
## 250    I 13.2300  9.97500 3.04500 132.562500  63.271250    10    A3 401.846366
##           RATIO
## 415  0.19346308
## 463  0.14343894
## 179  0.12747603
## 526  0.13122636
## 195  0.12944935
## 938  0.16500463
## 665  0.12337424
## 602  0.15428666
## 709  0.09128881
## 1011 0.12981280
## 953  0.12813591
## 348  0.15028887
## 1017 0.12766374
## 649  0.07174086
## 989  0.16402153
## 355  0.16717513
## 840  0.12902037
## 26   0.15906101
## 519  0.14828953
## 426  0.16666323
## 1023 0.14502521
## 766  0.13133838
## 211  0.14758583
## 932  0.16420461
## 590  0.11477191
## 593  0.10141681
## 555  0.16156401
## 871  0.15492740
## 373  0.15264669
## 844  0.17219534
## 143  0.12910309
## 544  0.12267039
## 490  0.12367874
## 621  0.08642748
## 775  0.12065236
## 905  0.13584327
## 937  0.13649838
## 842  0.13662783
## 23   0.15464221
## 923  0.13898486
## 956  0.07265343
## 309  0.12424110
## 135  0.11741487
## 821  0.11163702
## 997  0.16521111
## 224  0.14931774
## 166  0.17739599
## 217  0.12089084
## 290  0.12686672
## 581  0.10923231
## 72   0.19939433
## 588  0.12818973
## 575  0.10772041
## 141  0.17334121
## 722  0.13784748
## 865  0.16985355
## 859  0.11627502
## 153  0.10601836
## 294  0.11498944
## 277  0.15294184
## 1035 0.09425347
## 41   0.18562944
## 431  0.15049736
## 90   0.13764357
## 316  0.10797361
## 223  0.14663910
## 528  0.14512796
## 116  0.12582649
## 606  0.10784179
## 774  0.10041590
## 747  0.12259402
## 456  0.11245981
## 598  0.12004534
## 854  0.14067115
## 39   0.15756335
## 159  0.16068408
## 752  0.11837894
## 209  0.14460673
## 374  0.18146196
## 818  0.15775336
## 34   0.13132100
## 516  0.11625966
## 13   0.14175880
## 69   0.13562668
## 895  0.11334327
## 755  0.11721481
## 409  0.19323757
## 308  0.12080144
## 278  0.15187145
## 89   0.23007038
## 928  0.18175438
## 537  0.14227120
## 291  0.10891997
## 424  0.16128494
## 880  0.13776396
## 286  0.12240700
## 908  0.13414867
## 671  0.14239650
## 121  0.11604961
## 110  0.18892157
## 158  0.10801658
## 64   0.16017397
## 483  0.19861350
## 910  0.16781572
## 477  0.14162281
## 480  0.15983967
## 711  0.14121847
## 67   0.22495771
## 663  0.16125174
## 890  0.18692208
## 847  0.14907484
## 85   0.16163137
## 165  0.16565068
## 648  0.07937286
## 51   0.13051535
## 74   0.17162850
## 178  0.13928811
## 362  0.14491411
## 236  0.10773147
## 610  0.11755425
## 330  0.15259804
## 726  0.15369781
## 127  0.16333777
## 212  0.11617534
## 686  0.15586635
## 785  0.18905614
## 814  0.12818235
## 310  0.12563111
## 744  0.17757459
## 878  0.12593986
## 243  0.15848872
## 862  0.11447803
## 926  0.07171905
## 792  0.11058611
## 113  0.12156850
## 619  0.11761840
## 1013 0.12643359
## 151  0.14462127
## 666  0.15507532
## 614  0.12781776
## 767  0.17039226
## 160  0.19792829
## 391  0.16224714
## 155  0.12946284
## 1024 0.09817974
## 5    0.16527932
## 326  0.08447524
## 784  0.13695902
## 280  0.13974422
## 800  0.09740190
## 789  0.13099541
## 567  0.15416432
## 843  0.11873051
## 238  0.12302546
## 764  0.13071999
## 339  0.13268123
## 962  0.15849465
## 822  0.15167217
## 137  0.19841104
## 455  0.11407361
## 738  0.15385467
## 560  0.12573378
## 589  0.12327353
## 83   0.12471819
## 696  0.15256159
## 942  0.15502003
## 196  0.16150810
## 769  0.13434897
## 680  0.14800956
## 941  0.16363858
## 968  0.12184596
## 500  0.11706601
## 889  0.12189611
## 344  0.17930513
## 909  0.11803353
## 459  0.10487198
## 20   0.14021071
## 1032 0.16200959
## 164  0.17924417
## 52   0.08494058
## 534  0.12209435
## 177  0.14678490
## 554  0.13377591
## 827  0.16632401
## 84   0.12756840
## 523  0.12459386
## 633  0.13498642
## 392  0.11338528
## 302  0.12265085
## 597  0.12093178
## 706  0.16819915
## 901  0.14134185
## 874  0.14134364
## 430  0.15296828
## 710  0.16951202
## 761  0.13038825
## 712  0.15927733
## 428  0.16140953
## 672  0.12931440
## 250  0.15745134
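
The listing above can be used to check how the two derived columns are defined (VOLUME as length times diameter times height, and RATIO as shuck weight divided by volume, as stated in the answer to (1)(a)). The sketch below recomputes both for the first three sampled rows, with values copied from the listing; the names rows, volume_check, and ratio_check are illustrative.

```r
# First three rows of the sample, values copied from the listing above
rows <- data.frame(
  row    = c(415, 463, 179),
  LENGTH = c(11.025, 11.760, 8.190),
  DIAM   = c(9.03, 9.24, 6.30),
  HEIGHT = c(2.835, 2.835, 2.100),
  SHUCK  = c(54.603125, 44.187500, 13.812500),
  VOLUME = c(282.240551, 308.057904, 108.353700),
  RATIO  = c(0.19346308, 0.14343894, 0.12747603)
)

# Recompute the derived columns from their stated definitions
volume_check <- rows$LENGTH * rows$DIAM * rows$HEIGHT
ratio_check  <- rows$SHUCK / volume_check

# Side-by-side comparison of printed and recomputed values
print(data.frame(row = rows$row,
                 VOLUME = rows$VOLUME, volume_check,
                 RATIO = rows$RATIO, ratio_check))
```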


Section 2: (5 points) Summarizing the data using graphics.

(2)(a) (1 point) Use “mydata” to plot WHOLE versus VOLUME. Color code data points by CLASS.

(2)(b) (2 points) Use “mydata” to plot SHUCK versus WHOLE with WHOLE on the horizontal axis. Color code data points by CLASS. As an aid to interpretation, determine the maximum value of the ratio of SHUCK to WHOLE. Add to the chart a straight line with zero intercept using this maximum value as the slope of the line. If you are using the ‘base R’ plot() function, you may use abline() to add this line to the plot. Use help(abline) in R to determine the coding for the slope and intercept arguments in the functions. If you are using ggplot2 for visualizations, geom_abline() should be used.
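
A minimal base R sketch of the reference line described in (2)(b) follows. To stay self-contained it uses only six rows copied from the sample listing in (1)(c), so the slope it computes is the maximum over that excerpt, not over all of mydata. The object names (excerpt, max_ratio) and the default palette colors are illustrative choices.

```r
# Six rows (415, 463, 179, 526, 195, 938) copied from the (1)(c) listing
excerpt <- data.frame(
  WHOLE = c(105.4375, 100.3125, 33.3125, 199.85625, 61.3125, 162.3075),
  SHUCK = c(54.603125, 44.1875, 13.8125, 79.95375, 25.625, 80.05375),
  CLASS = factor(c("A3", "A3", "A2", "A4", "A2", "A4"),
                 levels = c("A1", "A2", "A3", "A4", "A5"))
)

# Largest SHUCK-to-WHOLE ratio becomes the slope of the reference line
max_ratio <- max(excerpt$SHUCK / excerpt$WHOLE)

# Color by CLASS using the integer codes of the factor
plot(excerpt$WHOLE, excerpt$SHUCK,
     col = as.integer(excerpt$CLASS), pch = 16,
     xlab = "WHOLE (g)", ylab = "SHUCK (g)",
     main = "SHUCK versus WHOLE by CLASS")

# a = intercept (zero), b = slope (maximum ratio)
abline(a = 0, b = max_ratio, lty = 2)
legend("topleft", legend = levels(excerpt$CLASS),
       col = seq_along(levels(excerpt$CLASS)), pch = 16, title = "CLASS")
```

Because the slope is the maximum ratio, every point lies on or below the line, which makes the line an upper bound on meat yield per gram of whole weight.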

Essay Question (2 points): How does the variability in this plot differ from the plot in (a)? Compare the two displays. Keep in mind that SHUCK is a part of WHOLE. Consider the location of the different age classes.

Answer: When comparing the two displays, the data points show an almost identical trend: as the x-axis variable increases, the y-axis variable increases. However, each graph compares two different variables. In the first graph, volume is compared to whole weight (cm^3 vs. g). The older classes, A4 and A5, identified in darker and bolder colors, show volumes that start at about 200 cm^3. Most of the younger classes, with smaller whole weights, show smaller volumes. This demonstrates that the younger abalones are smaller in volume and, therefore, smaller in whole weight. The older abalones are larger in volume and, therefore, larger in whole weight. In the second graph, shuck weight is compared to whole weight (g vs. g). The older classes, A4 and A5, identified in darker and bolder colors, show larger whole weights but smaller shuck weights relative to those whole weights. Most of the younger classes, with smaller whole weights, show proportionally larger shuck weights. This demonstrates that, although the younger abalones generally weigh less than the older ones, you get proportionally more meat (shucked weight) out of them than out of the older abalones.


Section 3: (8 points) Getting insights about the data using graphs.

(3)(a) (2 points) Use “mydata” to create a multi-figured plot with histograms, boxplots and Q-Q plots of RATIO differentiated by sex. This can be done using par(mfrow = c(3,3)) and base R or grid.arrange() and ggplot2. The first row would show the histograms, the second row the boxplots and the third row the Q-Q plots. Be sure these displays are legible.
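
The 3 by 3 layout can be sketched in base R as below. The data here are simulated placeholders (demo), not the abalone ratios, so only the layout logic carries over; the helper name plot_ratio_by_sex and the margin settings are illustrative assumptions.

```r
# Hypothetical stand-in data: simulated ratios, NOT the abalone measurements
set.seed(1)
demo <- data.frame(
  SEX   = factor(rep(c("F", "I", "M"), each = 100)),
  RATIO = rnorm(300, mean = 0.14, sd = 0.03)
)

plot_ratio_by_sex <- function(df) {
  # mfrow fills the grid row by row; restore the old settings on exit
  old_par <- par(mfrow = c(3, 3), mar = c(4, 4, 2, 1))
  on.exit(par(old_par))
  groups <- split(df$RATIO, df$SEX)
  panels <- 0
  for (s in names(groups)) {   # row 1: histograms
    hist(groups[[s]], main = paste("Histogram:", s), xlab = "RATIO")
    panels <- panels + 1
  }
  for (s in names(groups)) {   # row 2: boxplots
    boxplot(groups[[s]], main = paste("Boxplot:", s), ylab = "RATIO")
    panels <- panels + 1
  }
  for (s in names(groups)) {   # row 3: normal Q-Q plots with reference line
    qqnorm(groups[[s]], main = paste("Q-Q plot:", s))
    qqline(groups[[s]])
    panels <- panels + 1
  }
  invisible(panels)            # number of panels drawn
}

n_panels <- plot_ratio_by_sex(demo)
```

With the real data, the call would be plot_ratio_by_sex(mydata), since the function only needs SEX and RATIO columns.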

Essay Question (2 points): Compare the displays. How do the distributions compare to normality? Take into account the criteria discussed in the sync sessions to evaluate non-normality.

Answer: When looking at the histograms, both female and infant seem to have a relatively normal distribution. Male, however, seems to have a slightly skewed distribution with a spike in the data around 0.135 g/cm^3. When looking at the boxplots, it’s evident that there are some outliers. Both male and infant have outliers among the upper ratio values, while female has outliers at both the upper and lower ends of the data. This is confirmed in the Q-Q plots. In each Q-Q plot, there are several points at the far right that do not fall on the reference line (qqline). The same is true at the far left of the female Q-Q plot. The data in the Q-Q plots for both male and infant also deviate from the reference line toward the far right, indicating that there might be some skewness.

(3)(b) (2 points) Use the boxplots to identify RATIO outliers (mild and extreme both) for each sex. Present the abalones with these outlying RATIO values along with their associated variables in “mydata” (Hint: display the observations by passing a data frame to the kable() function).

ROW  SEX LENGTH   DIAM HEIGHT     WHOLE     SHUCK RINGS CLASS     VOLUME     RATIO
350  F    7.980  6.720  2.415  80.93750  40.37500     7 A2    129.505824 0.3117620
379  F   15.330 11.970  3.465 252.06250 134.89812    10 A3    635.827846 0.2121614
420  F   11.550  7.980  3.465 150.62500  68.55375    10 A3    319.365585 0.2146560
421  F   13.125 10.290  2.310 142.00000  66.47062     9 A3    311.979938 0.2130606
458  F   11.445  8.085  3.150 139.81250  68.49062     9 A3    291.478399 0.2349767
586  F   12.180  9.450  4.935 133.87500  38.25000    14 A5    568.023435 0.0673388
3    I   10.080  7.350  2.205  79.37500  44.00000     6 A1    163.364040 0.2693371
37   I    4.305  3.255  0.945   6.18750   2.93750     3 A1     13.242072 0.2218308
42   I    2.835  2.730  0.840   3.62500   1.56250     4 A1      6.501222 0.2403394
58   I    6.720  4.305  1.680  22.62500  11.00000     5 A1     48.601728 0.2263294
67   I    5.040  3.675  0.945   9.65625   3.93750     5 A1     17.503290 0.2249577
89   I    3.360  2.310  0.525   2.43750   0.93750     4 A1      4.074840 0.2300704
105  I    6.930  4.725  1.575  23.37500  11.81250     7 A2     51.572194 0.2290478
200  I    9.135  6.300  2.520  74.56250  32.37500     8 A2    145.027260 0.2232339
746  M   13.440 10.815  1.680 130.25000  63.73125    10 A3    244.194048 0.2609861
754  M   10.500  7.770  3.150 132.68750  61.13250     9 A3    256.992750 0.2378764
803  M   10.710  8.610  3.255 160.31250  70.41375     9 A3    300.153640 0.2345924
810  M   12.285  9.870  3.465 176.12500  99.00000    10 A3    420.141472 0.2356349
852  M   11.550  8.820  3.360 167.56250  78.27187    10 A3    342.286560 0.2286735
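One way the outlying rows could be pulled out is with boxplot.stats(), which applies the same whisker rule as boxplot(). This is a sketch on simulated data: `toy`, the planted outlier values, and the helper `flag_outliers()` are all hypothetical, and I am assuming the usual convention that coef = 1.5 marks mild-plus-extreme outliers and coef = 3.0 marks extreme ones only.

```r
# Stand-in data (assumption): simulated RATIO values with three planted outliers.
set.seed(2)
toy <- data.frame(
  SEX   = factor(rep(c("F", "I", "M"), each = 100)),
  RATIO = rnorm(300, mean = 0.14, sd = 0.02)
)
toy$RATIO[c(1, 101, 201)] <- c(0.31, 0.27, 0.26)  # planted for illustration

# Hypothetical helper: for each sex, keep rows whose RATIO lies beyond the
# boxplot whiskers computed within that sex.
flag_outliers <- function(df, coef = 1.5) {
  keep <- unlist(lapply(split(seq_len(nrow(df)), df$SEX), function(idx) {
    out_vals <- boxplot.stats(df$RATIO[idx], coef = coef)$out
    idx[df$RATIO[idx] %in% out_vals]
  }))
  df[sort(keep), ]
}

mild_and_extreme <- flag_outliers(toy, coef = 1.5)  # beyond 1.5 x IQR
extreme_only     <- flag_outliers(toy, coef = 3.0)  # beyond 3.0 x IQR
print(mild_and_extreme)
```

The resulting data frame keeps its original row names, so it could be passed to kable() to produce a table like the one above.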

Essay Question (2 points): What are your observations regarding the results in (3)(b)?

Answer: From looking at the table above, the numerical variables like length, diameter, height, whole weight, shuck weight, rings, and volume all seem to be spread across the range of each variable. However, there were two trends that I noticed. First, infants account for the largest share of the outliers (8 of 19). Second, most outliers were from the younger age classifications (18 of 19 are in A1 through A3). This leads me to believe that, since these outliers are based on ratio, which was calculated as shuck weight divided by volume, younger abalones and infant abalones sometimes provide too much or too little meat when shucked, relative to their volume. This could be attributed to their maturity.


Section 4: (8 points) Getting insights about possible predictors.

(4)(a) (3 points) With “mydata,” display side-by-side boxplots for VOLUME and WHOLE, each differentiated by CLASS. There should be five boxes for VOLUME and five for WHOLE. Also, display side-by-side scatterplots: VOLUME and WHOLE versus RINGS. Present these four figures in one graphic: the boxplots in one row and the scatterplots in a second row. Base R or ggplot2 may be used.

Essay Question (5 points) How well do you think these variables would perform as predictors of age? Explain.

Answer: Overall, these variables would not perform well as predictors of age. When looking at the box plots, all of the actual boxes are small, indicating that most of the volume and whole weight data for each age classification is around a certain value. However, classes A3, A4, and A5 all have their boxes around the same volume (around 350-550 cm^3) and around the same weight (around 100-150 g). Each of those classes also has long ‘whiskers’ that extend, sometimes, throughout the entire range of values. This indicates that there is also a fair amount of data far from the middle of each class. As a result, if you found an abalone that was 450 cm^3 in volume and 125 g in whole weight, these box plots would tell you only that the abalone could belong to A3, A4, or A5. This is also demonstrated in the scatterplots, where the right side of each graph shows the data widely dispersed. However, classes A1 and, to a lesser extent, A2 show better prospects. In the box plots, they not only each have a small box, where data is mostly gathered around a single value, but their ‘whiskers’ also don’t reach as far as those of the other three classes. This shows that the data is more condensed around a single value, which means that each class generally has a common whole weight and volume. This is also demonstrated in the scatterplots, where the left side of each graph comes almost to a point. This point, if extrapolated, would form a line, indicating a trend. So overall, these variables would not perform well as predictors of age. But if the abalone is small in volume and small in whole weight, these variables would perform well.
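The four-figure layout from (4)(a), plus a numeric check of how much the class boxes overlap, could be sketched as follows. The data frame `toy` is simulated, and both the growth curve and the RINGS cut points used to build CLASS are hypothetical assumptions made only so the example runs.

```r
# Stand-in data (assumption): simulated sizes that level off at higher ring counts.
set.seed(3)
n     <- 400
rings <- sample(3:20, n, replace = TRUE)
size  <- pmin(rings, 11)  # toy growth model: growth flattens after 11 rings
toy <- data.frame(
  RINGS  = rings,
  # Hypothetical cut points for the five age classes
  CLASS  = cut(rings, breaks = c(-Inf, 6, 8, 10, 12, Inf),
               labels = c("A1", "A2", "A3", "A4", "A5")),
  VOLUME = pmax(5, 45 * size + rnorm(n, sd = 100)),
  WHOLE  = pmax(2, 13 * size + rnorm(n, sd = 35))
)

op <- par(mfrow = c(2, 2))  # boxplots in row 1, scatterplots in row 2
boxplot(VOLUME ~ CLASS, data = toy, main = "VOLUME by CLASS", ylab = "VOLUME")
boxplot(WHOLE ~ CLASS, data = toy, main = "WHOLE by CLASS", ylab = "WHOLE")
plot(toy$RINGS, toy$VOLUME, main = "VOLUME vs RINGS", xlab = "RINGS", ylab = "VOLUME")
plot(toy$RINGS, toy$WHOLE, main = "WHOLE vs RINGS", xlab = "RINGS", ylab = "WHOLE")
par(op)

# Quartiles of VOLUME within each class: overlapping 25%-75% ranges mean the
# variable cannot separate those classes well.
vol_q <- sapply(split(toy$VOLUME, toy$CLASS), quantile, probs = c(0.25, 0.5, 0.75))
print(round(vol_q))
```

Reading the quartile table alongside the boxplots gives a number to attach to the visual impression that the older classes sit on top of each other.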


Section 5: (12 points) Getting insights regarding different groups in the data.

(5)(a) (2 points) Use aggregate() with “mydata” to compute the mean values of VOLUME, SHUCK and RATIO for each combination of SEX and CLASS. Then, using matrix(), create matrices of the mean values. Using the “dimnames” argument within matrix() or the rownames() and colnames() functions on the matrices, label the rows by SEX and columns by CLASS. Present the three matrices (Kabacoff Section 5.6.2, p. 110-111). The kable() function is useful for this purpose. You do not need to be concerned with the number of digits presented.

## [1] "Volume"
##         Class A1 Class A2 Class A3 Class A4 Class A5
## Female 255.29938 276.8573 412.6079 498.0489 486.1525
## Infant  66.51618 160.3200 270.7406 316.4129 318.6930
## Male   103.72320 245.3857 358.1181 442.6155 440.2074
## [1] "Shuck Weight"
##        Class A1 Class A2 Class A3 Class A4 Class A5
## Female 38.90000 42.50305 59.69121 69.05161 59.17076
## Infant 10.11332 23.41024 37.17969 39.85369 36.47047
## Male   16.39583 38.33855 52.96933 61.42726 55.02762
## [1] "Ratio"
##         Class A1  Class A2  Class A3  Class A4  Class A5
## Female 0.1546644 0.1554605 0.1450304 0.1379609 0.1233605
## Infant 0.1569554 0.1475600 0.1372256 0.1244413 0.1167649
## Male   0.1512698 0.1564017 0.1462123 0.1364881 0.1262089
Volume
Sex     Class A1     Class A2     Class A3     Class A4     Class A5
Female  255.2993762  276.8573127  412.6079448  498.0488860  486.1525267
Infant   66.5161784  160.3199911  270.7406333  316.4129246  318.6929873
Male    103.7232000  245.3857109  358.1181100  442.6155218  440.2073625

Shuck Weight
Sex     Class A1     Class A2     Class A3     Class A4     Class A5
Female   38.9000000   42.5030488   59.6912087   69.0516082   59.1707630
Infant   10.1133242   23.4102444   37.1796923   39.8536875   36.4704743
Male     16.3958333   38.3385484   52.9693269   61.4272647   55.0276187

Ratio
Sex     Class A1     Class A2     Class A3     Class A4     Class A5
Female    0.1546644    0.1554605    0.1450304    0.1379609    0.1233605
Infant    0.1569554    0.1475600    0.1372256    0.1244413    0.1167649
Male      0.1512698    0.1564017    0.1462123    0.1364881    0.1262089
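A sketch of the aggregate() and matrix() steps behind matrices like these, run on simulated data. `toy` and the helper `to_matrix()` are hypothetical; the explicit sort is there so the column-wise fill of matrix() lines up with SEX as rows and CLASS as columns.

```r
# Stand-in data (assumption): 10 simulated animals per SEX x CLASS cell.
set.seed(4)
toy <- expand.grid(SEX   = c("F", "I", "M"),
                   CLASS = c("A1", "A2", "A3", "A4", "A5"),
                   rep   = 1:10)
toy$VOLUME <- runif(nrow(toy), 50, 500)
toy$SHUCK  <- toy$VOLUME * runif(nrow(toy), 0.10, 0.18)
toy$RATIO  <- toy$SHUCK / toy$VOLUME

# Hypothetical helper: mean of one variable for each SEX x CLASS combination,
# reshaped into a labeled 3 x 5 matrix.
to_matrix <- function(var) {
  agg <- aggregate(toy[[var]], by = list(SEX = toy$SEX, CLASS = toy$CLASS), FUN = mean)
  agg <- agg[order(agg$CLASS, agg$SEX), ]  # SEX varies fastest within CLASS
  matrix(agg$x, nrow = 3, byrow = FALSE,
         dimnames = list(c("Female", "Infant", "Male"),
                         paste("Class", levels(toy$CLASS))))
}

vol   <- to_matrix("VOLUME")
shuck <- to_matrix("SHUCK")
ratio <- to_matrix("RATIO")
print(vol)
```

matrix() fills column by column by default, so the sort order matters: if the aggregated rows were ordered with CLASS varying fastest, the labels would be attached to the wrong means.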

(5)(b) (3 points) Present three graphs. Each graph should include three lines, one for each sex. The first should show mean RATIO versus CLASS; the second, mean VOLUME versus CLASS; the third, mean SHUCK versus CLASS. This may be done with the ‘base R’ interaction.plot() function or with ggplot2 using grid.arrange().

Essay Question (2 points): What questions do these plots raise? Consider aging and sex differences.

Answer: The first graph demonstrates that as abalones mature, the average ratio of shuck weight to volume decreases. This raises the question of why shuck weight/volume decreases as the abalone ages. It also shows infant abalones having the lowest ratio for almost all classes. This could be an indication of maturity in the animal. It would raise the questions of when the abalone matures, whether it is worth harvesting older abalones if they yield small amounts of meat, and what happens to the infants that still have not matured by A5. The second graph demonstrates that as abalones mature, they grow in average volume. This holds for all sexes through class A4. However, it looks like females generally start young with higher volumes than the rest and infants start young with lower volumes than the rest. This would probably indicate some sort of sexual dimorphism in the species. Females in some species are generally larger in size because of the burden of offspring-bearing. Infants being smaller in volume when they are young makes sense. Babies in most species are small. But it raises the question of why they stop growing. Certain species of goldfish can grow to fit their environment. But with this second graph, perhaps abalones stop growing around class A4. The third graph demonstrates that as abalones mature, they grow in average shuck weight. This is understandable to me, as other livestock animals exhibit the same as they grow (pigs, chickens, and cows, for example). However, it also shows that the oldest abalones (A5) have less shuck weight on average than those in A4. This raises the question of what happens to the abalones in their lifecycle around class A4 and A5. Overall, these three graphs show some interesting trends and raise some interesting questions.
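The three line graphs from (5)(b) could be drawn with interaction.plot() roughly as follows. This is a sketch on simulated data; `toy` is a hypothetical stand-in for mydata, so the lines it produces will not show the trends discussed above.

```r
# Stand-in data (assumption): 20 simulated animals per SEX x CLASS cell.
set.seed(5)
toy <- expand.grid(SEX   = c("F", "I", "M"),
                   CLASS = c("A1", "A2", "A3", "A4", "A5"),
                   rep   = 1:20)
toy$VOLUME <- runif(nrow(toy), 50, 500)
toy$SHUCK  <- toy$VOLUME * runif(nrow(toy), 0.10, 0.18)
toy$RATIO  <- toy$SHUCK / toy$VOLUME

op <- par(mfrow = c(1, 3))  # three graphs side by side
for (v in c("RATIO", "VOLUME", "SHUCK")) {
  # One line per sex; each point is the mean of v within a SEX x CLASS cell
  interaction.plot(x.factor = toy$CLASS, trace.factor = toy$SEX,
                   response = toy[[v]], fun = mean, type = "b",
                   xlab = "CLASS", ylab = paste("Mean", v),
                   trace.label = "SEX", col = 1:3, pch = 1:3)
}
par(op)
```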

(5)(c) (3 points) Present four boxplots using par(mfrow = c(2, 2)) or grid.arrange(). The first line should show VOLUME by RINGS for the infants and, separately, for the adults (factor levels “M” and “F” combined). The second line should show WHOLE by RINGS for the infants and, separately, for the adults. Since the data are sparse beyond 15 rings, limit the displays to less than 16 rings. One way to accomplish this is to generate a new data set using subset() to select RINGS < 16. Use ylim = c(0, 1100) for VOLUME and ylim = c(0, 400) for WHOLE. If you wish to reorder the displays for presentation purposes or use ggplot2 go ahead.

Essay Question (2 points): What do these displays suggest about abalone growth? Also, compare the infant and adult displays. What differences stand out?

Answer: These displays suggest that as an abalone adds growth rings, it also grows in whole weight (g) and in volume (cm^3). However, when comparing infant to adult, infants have much less variety in their whole weights and volumes. Infants stay below 200 g of whole weight their whole lives. Adults can have over 200 g. Infants stay below 600 cm^3 in volume their whole lives. Adults can have over 600 cm^3. Those differences stand out to me the most, but I am curious about the drop-off in each graph around ring 15. Abalones with 15 rings tend to have the same whole weight and volume as abalones with 8 or 9 rings. I wonder if that is an indication of abalone ‘retirement’.
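A minimal sketch of the subsetting and the 2 x 2 boxplot layout described in (5)(c), using the ylim values from the prompt. The data frame `toy` and its growth model are simulated assumptions, not the abalone data.

```r
# Stand-in data (assumption): simulated sizes by ring count.
set.seed(6)
n <- 500
toy <- data.frame(
  SEX   = factor(sample(c("F", "I", "M"), n, replace = TRUE)),
  RINGS = sample(3:25, n, replace = TRUE)
)
toy$VOLUME <- pmax(5, 40 * pmin(toy$RINGS, 11) + rnorm(n, sd = 90))
toy$WHOLE  <- pmax(2, 12 * pmin(toy$RINGS, 11) + rnorm(n, sd = 30))

under16 <- subset(toy, RINGS < 16)               # drop the sparse high-ring tail
infants <- subset(under16, SEX == "I")
adults  <- subset(under16, SEX %in% c("F", "M")) # M and F combined

op <- par(mfrow = c(2, 2))
boxplot(VOLUME ~ RINGS, data = infants, ylim = c(0, 1100),
        main = "Infant VOLUME by RINGS", xlab = "RINGS", ylab = "VOLUME")
boxplot(VOLUME ~ RINGS, data = adults, ylim = c(0, 1100),
        main = "Adult VOLUME by RINGS", xlab = "RINGS", ylab = "VOLUME")
boxplot(WHOLE ~ RINGS, data = infants, ylim = c(0, 400),
        main = "Infant WHOLE by RINGS", xlab = "RINGS", ylab = "WHOLE")
boxplot(WHOLE ~ RINGS, data = adults, ylim = c(0, 400),
        main = "Adult WHOLE by RINGS", xlab = "RINGS", ylab = "WHOLE")
par(op)
```

Using the same ylim for the infant and adult panels in each row is what makes the difference in spread visible at a glance.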


Section 6: (11 points) Conclusions from the Exploratory Data Analysis (EDA).

Conclusions

Essay Question 1) (5 points) Based solely on these data, what are plausible statistical reasons that explain the failure of the original study? Consider to what extent physical measurements may be used for age prediction.

Answer: The original study wanted to predict the age of abalone from physical measurements to avoid the necessity of counting growth rings for aging. However, the study was not successful. The study organizers stated that more information and more variables of abalone data are needed in order to draw a more solid conclusion. I agree with the study organizers that this was the main problem of the study. Rings were the original indicator of age. Yet, there were outliers, and there were confusing indications that both infants and adults can have a wide range of ring counts. I think that more information is needed to clarify the present variables. Statistically, with the outliers and sometimes skewed data, it’s easy to understand why solid conclusions couldn’t be made. Also, with how observational studies are, there really is no tinkering with the environment or testing. All data is from observation, which also means that observational studies rely heavily on that information. I think that the variables chosen were great, but they could have added more. The study organizers mentioned looking into the food that is readily available to the abalones and tracking weather patterns and locations. Apart from those, I think that characterizing the water near abalones would also help indicate if the water is oxygenated enough. Perhaps some tend to place themselves near undersea vents of some kind. Tracking any sort of natural predators would be interesting as well. I also think looking into the possibility of abalones having any sort of symbiotic relationship with neighboring species, or even abalones being invasive, would prove interesting. Overall, I can see where this study went wrong. But I have hope for future abalone studies!

Essay Question 2) (3 points) Do not refer to the abalone data or study. If you were presented with an overall histogram and summary statistics from a sample of some population or phenomenon and no other information, what questions might you ask before accepting them as representative of the sampled population or phenomenon?

Answer: Before diving into the histogram and the summary statistics, I would want to know how the data was gathered and what methods were used to pick the sample. I would want to be confident in the origin of the data and the accuracy of the sample selection. Then I would take a look at the histogram. A histogram may be a good visualization to check the shape of the data for any skewness. I would take it into account but would want to see a scatterplot. As great as a histogram is, it does not show all trends in the data. A scatterplot would show me more. Then I would move on to the summary statistics. I would look carefully at the summary statistics to confirm any trends I saw in the histogram and scatterplot. Finally, as great as summary statistics are, they only provide a bird’s eye view of a dataset in numerical form. If I were presented with just summary statistics, I would also want to see a box plot to see a visual representation of the summary statistics. Numbers may indicate some things. But the visualization might illuminate something more. I would want to see the data both summarized, like the statistics, and detailed, like the scatterplot, to make sure that any trends can be found.

Essay Question 3) (3 points) Do not refer to the abalone data or study. What do you see as difficulties analyzing data derived from observational studies? Can causality be determined? What might be learned from such studies?

Answer: The main difficulty with observational studies is that the observer or researcher necessarily gives up control. They are still a needed method in many settings, such as observing animals, because ethical concerns or logistical constraints can rule out an experiment. There is no action being caused to study a reaction. The action and reaction are already happening, possibly continuously, consecutively, or sequentially. The observer must make sense of the environment without intrusion. Nothing can be tested or adjusted. Because of this, it’s really hard for the observer to truly be sure that one thing is causing the other. As a result, most observational study theories can only be tested and proven through conducting more observational studies. I think that from this, we can learn that data must have a clear indication of causality through trials and testing to generate substantial results.