3.1.

The UC Irvine Machine Learning Repository6 contains a data set related to glass identification. The data consist of 214 glass samples labeled as one of seven class categories. There are nine predictors, including the refractive index and percentages of eight elements: Na, Mg, Al, Si, K, Ca, Ba, and Fe.

The data can be accessed via:

data(Glass)
str(Glass)
## 'data.frame':    214 obs. of  10 variables:
##  $ RI  : num  1.52 1.52 1.52 1.52 1.52 ...
##  $ Na  : num  13.6 13.9 13.5 13.2 13.3 ...
##  $ Mg  : num  4.49 3.6 3.55 3.69 3.62 3.61 3.6 3.61 3.58 3.6 ...
##  $ Al  : num  1.1 1.36 1.54 1.29 1.24 1.62 1.14 1.05 1.37 1.36 ...
##  $ Si  : num  71.8 72.7 73 72.6 73.1 ...
##  $ K   : num  0.06 0.48 0.39 0.57 0.55 0.64 0.58 0.57 0.56 0.57 ...
##  $ Ca  : num  8.75 7.83 7.78 8.22 8.07 8.07 8.17 8.24 8.3 8.4 ...
##  $ Ba  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Fe  : num  0 0 0 0 0 0.26 0 0 0 0.11 ...
##  $ Type: Factor w/ 6 levels "1","2","3","5",..: 1 1 1 1 1 1 1 1 1 1 ...
head(Glass)
##        RI    Na   Mg   Al    Si    K   Ca Ba   Fe Type
## 1 1.52101 13.64 4.49 1.10 71.78 0.06 8.75  0 0.00    1
## 2 1.51761 13.89 3.60 1.36 72.73 0.48 7.83  0 0.00    1
## 3 1.51618 13.53 3.55 1.54 72.99 0.39 7.78  0 0.00    1
## 4 1.51766 13.21 3.69 1.29 72.61 0.57 8.22  0 0.00    1
## 5 1.51742 13.27 3.62 1.24 73.08 0.55 8.07  0 0.00    1
## 6 1.51596 12.79 3.61 1.62 72.97 0.64 8.07  0 0.26    1

(a)

Using visualizations, explore the predictor variables to understand their distributions as well as the relationships between predictors.

Glass %>%
  keep(is.numeric) %>%
  gather() %>%
  ggplot(aes(value)) + 
  geom_histogram(bins = 15, fill = "skyblue") + 
  facet_wrap(~key, scales = 'free', ncol = 5)

Glass %>%
  keep(is.numeric) %>%
  gather() %>%
  ggplot(aes(value)) + 
  geom_boxplot(fill = "orange") + 
  facet_wrap(~key, scales = 'free', ncol = 5) 

ggplot(Glass, aes(x = Type, fill = Type)) + 
  geom_bar() +
  scale_fill_viridis(discrete = TRUE)  

cor(Glass[,-10])
##               RI          Na           Mg          Al          Si            K
## RI  1.0000000000 -0.19188538 -0.122274039 -0.40732603 -0.54205220 -0.289832711
## Na -0.1918853790  1.00000000 -0.273731961  0.15679367 -0.06980881 -0.266086504
## Mg -0.1222740393 -0.27373196  1.000000000 -0.48179851 -0.16592672  0.005395667
## Al -0.4073260341  0.15679367 -0.481798509  1.00000000 -0.00552372  0.325958446
## Si -0.5420521997 -0.06980881 -0.165926723 -0.00552372  1.00000000 -0.193330854
## K  -0.2898327111 -0.26608650  0.005395667  0.32595845 -0.19333085  1.000000000
## Ca  0.8104026963 -0.27544249 -0.443750026 -0.25959201 -0.20873215 -0.317836155
## Ba -0.0003860189  0.32660288 -0.492262118  0.47940390 -0.10215131 -0.042618059
## Fe  0.1430096093 -0.24134641  0.083059529 -0.07440215 -0.09420073 -0.007719049
##            Ca            Ba           Fe
## RI  0.8104027 -0.0003860189  0.143009609
## Na -0.2754425  0.3266028795 -0.241346411
## Mg -0.4437500 -0.4922621178  0.083059529
## Al -0.2595920  0.4794039017 -0.074402151
## Si -0.2087322 -0.1021513105 -0.094200731
## K  -0.3178362 -0.0426180594 -0.007719049
## Ca  1.0000000 -0.1128409671  0.124968219
## Ba -0.1128410  1.0000000000 -0.058691755
## Fe  0.1249682 -0.0586917554  1.000000000
point <- as.integer(Glass$Type)
pairs(Glass[,-10], pch=point, col=Glass$Type) 

cor_matrix <- cor(select_if(Glass, is.numeric))
cor_matrix
##               RI          Na           Mg          Al          Si            K
## RI  1.0000000000 -0.19188538 -0.122274039 -0.40732603 -0.54205220 -0.289832711
## Na -0.1918853790  1.00000000 -0.273731961  0.15679367 -0.06980881 -0.266086504
## Mg -0.1222740393 -0.27373196  1.000000000 -0.48179851 -0.16592672  0.005395667
## Al -0.4073260341  0.15679367 -0.481798509  1.00000000 -0.00552372  0.325958446
## Si -0.5420521997 -0.06980881 -0.165926723 -0.00552372  1.00000000 -0.193330854
## K  -0.2898327111 -0.26608650  0.005395667  0.32595845 -0.19333085  1.000000000
## Ca  0.8104026963 -0.27544249 -0.443750026 -0.25959201 -0.20873215 -0.317836155
## Ba -0.0003860189  0.32660288 -0.492262118  0.47940390 -0.10215131 -0.042618059
## Fe  0.1430096093 -0.24134641  0.083059529 -0.07440215 -0.09420073 -0.007719049
##            Ca            Ba           Fe
## RI  0.8104027 -0.0003860189  0.143009609
## Na -0.2754425  0.3266028795 -0.241346411
## Mg -0.4437500 -0.4922621178  0.083059529
## Al -0.2595920  0.4794039017 -0.074402151
## Si -0.2087322 -0.1021513105 -0.094200731
## K  -0.3178362 -0.0426180594 -0.007719049
## Ca  1.0000000 -0.1128409671  0.124968219
## Ba -0.1128410  1.0000000000 -0.058691755
## Fe  0.1249682 -0.0586917554  1.000000000
corrplot(cor_matrix, type = "upper", diag = FALSE)

Summarizing the notable correlations, the correlation between RI and Ca is high (0.8104027), indicating a strong positive relationship. The correlation between Mg and Ba is negative (-0.4922621), indicating a strong negative relationship. The correlation between Al and Mg is negative (-0.4817985), indicating a moderate negative relationship. Finally, the correlation between Al and Ba is positive (0.4794039), indicating a moderate positive relationship.

(b)

Do there appear to be any outliers in the data? Are any predictors skewed?

Outliers are present in all elements except for magnesium (Mg). Barium (Ba), iron (Fe), and potassium (K) exhibit the most severe outliers. Additionally, Ba, Fe, and K demonstrate right-skewed distributions, while Mg displays a left-skewed distribution.

outlier_values_Ba <- boxplot.stats(Glass$Ba)$out
outlier_values_fe <- boxplot.stats(Glass$Fe)$out
outlier_values_k <- boxplot.stats(Glass$K)$out
outlier_values_Mg <- boxplot.stats(Glass$Mg)$out

#par(mfrow=c(2,2)) 

boxplot(Glass$Ba, main = "Ba", boxwex = 0.5)
mtext(paste("Outliers (Ba): ", paste(outlier_values_Ba, collapse = ", ")), cex = 0.6)

boxplot(Glass$K, main = "K", boxwex = 0.5)
mtext(paste("Outliers (k): ", paste(outlier_values_k, collapse = ", ")), cex = 0.6)

boxplot(Glass$Mg, main = "Mg", boxwex = 0.5)
mtext(paste("Outliers (Mg): ", paste(outlier_values_Mg, collapse = ", ")), cex = 0.6)

boxplot(Glass$Fe, main = "Fe", boxwex = 0.5)
mtext(paste("Outliers (Fe): ", paste(outlier_values_fe, collapse = ", ")), cex = 0.6)

#par(mfrow=c(2,2))

(c)

Are there any relevant transformations of one or more predictors that might improve the classification model?

Transformation using the log() function resulted in a distribution that, while not perfectly normal, displayed a substantial improvement in its graphical representation.

For example, before the log transformation, the skewness value for Fe was 1.729811, indicating a right-skewed distribution. After the log transformation, if the skewness value becomes -1.21, it indicates a left-skewed distribution. The skewness value closer to 0 indicates a distribution closer to normal. Thus, if the skewness value is closer to -1.21 after the log transformation, it suggests a distribution with a longer tail to the left, closer to a normal distribution.

skewness_k <- skewness(Glass$K)
skewness_k
## [1] 6.460089
non_zero_positive_values <- Glass$K[Glass$K > 0]
transformed_data <- log(non_zero_positive_values)

skewness_k <- skewness(transformed_data)
print(skewness_k)
## [1] -0.9404054
plot(density(transformed_data))

describe(transformed_data)
##    vars   n  mean   sd median trimmed  mad   min  max range  skew kurtosis   se
## X1    1 184 -0.87 0.86  -0.56   -0.77 0.17 -3.91 1.83  5.74 -0.94     1.91 0.06
skewness_Fe <- skewness(Glass$Fe)
skewness_Fe
## [1] 1.729811
non_zero_positive_values <- Glass$Fe[Glass$Fe > 0]
transformed_data <- log(non_zero_positive_values)

transformed_data_Fe <- log(Glass$Fe)

skewness_Fe <- skewness(transformed_data_Fe)

plot(density(transformed_data_Fe))

describe(transformed_data)
##    vars  n  mean   sd median trimmed mad   min   max range  skew kurtosis   se
## X1    1 70 -1.91 0.63   -1.8   -1.86 0.6 -4.61 -0.67  3.93 -1.21     3.34 0.08
skewness_Ba <- skewness(Glass$Ba)
skewness_Ba
## [1] 3.36868
non_zero_positive_values <- Glass$Ba[Glass$Ba > 0]
transformed_data <- log(non_zero_positive_values)

transformed_data_Ba <- log(Glass$Ba)

skewness_Ba <- skewness(transformed_data_Ba)

plot(density(transformed_data_Ba))

describe(transformed_data)
##    vars  n  mean   sd median trimmed  mad   min  max range  skew kurtosis   se
## X1    1 38 -0.44 1.07  -0.39   -0.37 1.25 -2.81 1.15  3.96 -0.66    -0.68 0.17

3.2.

The soybean data can also be found at the UC Irvine Machine Learning Repository. Data were collected to predict disease in 683 soybeans. The 35 predictors are mostly categorical and include information on the environmental conditions (e.g., temperature, precipitation) and plant conditions (e.g., left spots, mold growth). The outcome labels consist of 19 distinct classes.

The data can be loaded via:

library(mlbench)
data(Soybean)
head(Soybean)
##                   Class date plant.stand precip temp hail crop.hist area.dam
## 1 diaporthe-stem-canker    6           0      2    1    0         1        1
## 2 diaporthe-stem-canker    4           0      2    1    0         2        0
## 3 diaporthe-stem-canker    3           0      2    1    0         1        0
## 4 diaporthe-stem-canker    3           0      2    1    0         1        0
## 5 diaporthe-stem-canker    6           0      2    1    0         2        0
## 6 diaporthe-stem-canker    5           0      2    1    0         3        0
##   sever seed.tmt germ plant.growth leaves leaf.halo leaf.marg leaf.size
## 1     1        0    0            1      1         0         2         2
## 2     2        1    1            1      1         0         2         2
## 3     2        1    2            1      1         0         2         2
## 4     2        0    1            1      1         0         2         2
## 5     1        0    2            1      1         0         2         2
## 6     1        0    1            1      1         0         2         2
##   leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion
## 1           0         0         0    1       1            3             1
## 2           0         0         0    1       0            3             1
## 3           0         0         0    1       0            3             0
## 4           0         0         0    1       0            3             0
## 5           0         0         0    1       0            3             1
## 6           0         0         0    1       0            3             0
##   fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods
## 1               1         1        0            0         0          0
## 2               1         1        0            0         0          0
## 3               1         1        0            0         0          0
## 4               1         1        0            0         0          0
## 5               1         1        0            0         0          0
## 6               1         1        0            0         0          0
##   fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 1           4    0           0             0         0          0     0
## 2           4    0           0             0         0          0     0
## 3           4    0           0             0         0          0     0
## 4           4    0           0             0         0          0     0
## 5           4    0           0             0         0          0     0
## 6           4    0           0             0         0          0     0
summary(Soybean)
##                  Class          date     plant.stand  precip      temp    
##  brown-spot         : 92   5      :149   0   :354    0   : 74   0   : 80  
##  alternarialeaf-spot: 91   4      :131   1   :293    1   :112   1   :374  
##  frog-eye-leaf-spot : 91   3      :118   NA's: 36    2   :459   2   :199  
##  phytophthora-rot   : 88   2      : 93               NA's: 38   NA's: 30  
##  anthracnose        : 44   6      : 90                                    
##  brown-stem-rot     : 44   (Other):101                                    
##  (Other)            :233   NA's   :  1                                    
##    hail     crop.hist  area.dam    sever     seed.tmt     germ     plant.growth
##  0   :435   0   : 65   0   :123   0   :195   0   :305   0   :165   0   :441    
##  1   :127   1   :165   1   :227   1   :322   1   :222   1   :213   1   :226    
##  NA's:121   2   :219   2   :145   2   : 45   2   : 35   2   :193   NA's: 16    
##             3   :218   3   :187   NA's:121   NA's:121   NA's:112               
##             NA's: 16   NA's:  1                                                
##                                                                                
##                                                                                
##  leaves  leaf.halo  leaf.marg  leaf.size  leaf.shread leaf.malf  leaf.mild 
##  0: 77   0   :221   0   :357   0   : 51   0   :487    0   :554   0   :535  
##  1:606   1   : 36   1   : 21   1   :327   1   : 96    1   : 45   1   : 20  
##          2   :342   2   :221   2   :221   NA's:100    NA's: 84   2   : 20  
##          NA's: 84   NA's: 84   NA's: 84                          NA's:108  
##                                                                            
##                                                                            
##                                                                            
##    stem     lodging    stem.cankers canker.lesion fruiting.bodies ext.decay 
##  0   :296   0   :520   0   :379     0   :320      0   :473        0   :497  
##  1   :371   1   : 42   1   : 39     1   : 83      1   :104        1   :135  
##  NA's: 16   NA's:121   2   : 36     2   :177      NA's:106        2   : 13  
##                        3   :191     3   : 65                      NA's: 38  
##                        NA's: 38     NA's: 38                                
##                                                                             
##                                                                             
##  mycelium   int.discolor sclerotia  fruit.pods fruit.spots   seed    
##  0   :639   0   :581     0   :625   0   :407   0   :345    0   :476  
##  1   :  6   1   : 44     1   : 20   1   :130   1   : 75    1   :115  
##  NA's: 38   2   : 20     NA's: 38   2   : 14   2   : 57    NA's: 92  
##             NA's: 38                3   : 48   4   :100              
##                                     NA's: 84   NA's:106              
##                                                                      
##                                                                      
##  mold.growth seed.discolor seed.size  shriveling  roots    
##  0   :524    0   :513      0   :532   0   :539   0   :551  
##  1   : 67    1   : 64      1   : 59   1   : 38   1   : 86  
##  NA's: 92    NA's:106      NA's: 92   NA's:106   2   : 15  
##                                                  NA's: 31  
##                                                            
##                                                            
## 
## See ?Soybean for details

(a)

Investigate the frequency distributions for the categorical predictors. Are any of the distributions degenerate in the ways discussed earlier in this chapter?

columns <- colnames(Soybean)
columns
##  [1] "Class"           "date"            "plant.stand"     "precip"         
##  [5] "temp"            "hail"            "crop.hist"       "area.dam"       
##  [9] "sever"           "seed.tmt"        "germ"            "plant.growth"   
## [13] "leaves"          "leaf.halo"       "leaf.marg"       "leaf.size"      
## [17] "leaf.shread"     "leaf.malf"       "leaf.mild"       "stem"           
## [21] "lodging"         "stem.cankers"    "canker.lesion"   "fruiting.bodies"
## [25] "ext.decay"       "mycelium"        "int.discolor"    "sclerotia"      
## [29] "fruit.pods"      "fruit.spots"     "seed"            "mold.growth"    
## [33] "seed.discolor"   "seed.size"       "shriveling"      "roots"
for (col in columns) {
  print(col)
  print(table(Soybean[[col]]))
}
## [1] "Class"
## 
##                2-4-d-injury         alternarialeaf-spot 
##                          16                          91 
##                 anthracnose            bacterial-blight 
##                          44                          20 
##           bacterial-pustule                  brown-spot 
##                          20                          92 
##              brown-stem-rot                charcoal-rot 
##                          44                          20 
##               cyst-nematode diaporthe-pod-&-stem-blight 
##                          14                          15 
##       diaporthe-stem-canker                downy-mildew 
##                          20                          20 
##          frog-eye-leaf-spot            herbicide-injury 
##                          91                           8 
##      phyllosticta-leaf-spot            phytophthora-rot 
##                          20                          88 
##              powdery-mildew           purple-seed-stain 
##                          20                          20 
##        rhizoctonia-root-rot 
##                          20 
## [1] "date"
## 
##   0   1   2   3   4   5   6 
##  26  75  93 118 131 149  90 
## [1] "plant.stand"
## 
##   0   1 
## 354 293 
## [1] "precip"
## 
##   0   1   2 
##  74 112 459 
## [1] "temp"
## 
##   0   1   2 
##  80 374 199 
## [1] "hail"
## 
##   0   1 
## 435 127 
## [1] "crop.hist"
## 
##   0   1   2   3 
##  65 165 219 218 
## [1] "area.dam"
## 
##   0   1   2   3 
## 123 227 145 187 
## [1] "sever"
## 
##   0   1   2 
## 195 322  45 
## [1] "seed.tmt"
## 
##   0   1   2 
## 305 222  35 
## [1] "germ"
## 
##   0   1   2 
## 165 213 193 
## [1] "plant.growth"
## 
##   0   1 
## 441 226 
## [1] "leaves"
## 
##   0   1 
##  77 606 
## [1] "leaf.halo"
## 
##   0   1   2 
## 221  36 342 
## [1] "leaf.marg"
## 
##   0   1   2 
## 357  21 221 
## [1] "leaf.size"
## 
##   0   1   2 
##  51 327 221 
## [1] "leaf.shread"
## 
##   0   1 
## 487  96 
## [1] "leaf.malf"
## 
##   0   1 
## 554  45 
## [1] "leaf.mild"
## 
##   0   1   2 
## 535  20  20 
## [1] "stem"
## 
##   0   1 
## 296 371 
## [1] "lodging"
## 
##   0   1 
## 520  42 
## [1] "stem.cankers"
## 
##   0   1   2   3 
## 379  39  36 191 
## [1] "canker.lesion"
## 
##   0   1   2   3 
## 320  83 177  65 
## [1] "fruiting.bodies"
## 
##   0   1 
## 473 104 
## [1] "ext.decay"
## 
##   0   1   2 
## 497 135  13 
## [1] "mycelium"
## 
##   0   1 
## 639   6 
## [1] "int.discolor"
## 
##   0   1   2 
## 581  44  20 
## [1] "sclerotia"
## 
##   0   1 
## 625  20 
## [1] "fruit.pods"
## 
##   0   1   2   3 
## 407 130  14  48 
## [1] "fruit.spots"
## 
##   0   1   2   4 
## 345  75  57 100 
## [1] "seed"
## 
##   0   1 
## 476 115 
## [1] "mold.growth"
## 
##   0   1 
## 524  67 
## [1] "seed.discolor"
## 
##   0   1 
## 513  64 
## [1] "seed.size"
## 
##   0   1 
## 532  59 
## [1] "shriveling"
## 
##   0   1 
## 539  38 
## [1] "roots"
## 
##   0   1   2 
## 551  86  15
par(mfrow = c(3,3))
for(i in 2:ncol(Soybean)) {
  plot(Soybean[i], main = colnames(Soybean[i]), col = "lightblue")
}

(b)

Roughly 18 % of the data are missing. Are there particular predictors that are more likely to be missing? Is the pattern of missing data related to the classes?

Overall, NA values appear to be most prevalent in data belonging to the ‘phytophthora-rot’ Class. By variable, hail (17.7%), sever (17.7%), seed.tmt (17.7%), lodging (17.7%) appear to be variables with many NA values. This information can serve as a reference for handling NA values when processing or analyzing data.

sum(!complete.cases(Soybean))
## [1] 121
naniar::miss_var_summary(Soybean)
## # A tibble: 36 × 3
##    variable        n_miss pct_miss
##    <chr>            <int>    <dbl>
##  1 hail               121     17.7
##  2 sever              121     17.7
##  3 seed.tmt           121     17.7
##  4 lodging            121     17.7
##  5 germ               112     16.4
##  6 leaf.mild          108     15.8
##  7 fruiting.bodies    106     15.5
##  8 fruit.spots        106     15.5
##  9 seed.discolor      106     15.5
## 10 shriveling         106     15.5
## # ℹ 26 more rows
naniar::vis_miss(Soybean)

naniar::gg_miss_var(Soybean)

naniar::gg_miss_upset(Soybean)

VIM::aggr(Soybean)

na_counts <- sapply(unique(Soybean$Class), function(class_val) {
  sum(is.na(Soybean[Soybean$Class == class_val, ]))
})

result <- data.frame(Class = unique(Soybean$Class), NA_Counts = na_counts)
result <- result[result$NA_Counts != 0, ]
result <- result[order(-result$NA_Counts), ]
result
##                          Class NA_Counts
## 4             phytophthora-rot      1214
## 18                2-4-d-injury       450
## 17               cyst-nematode       336
## 16 diaporthe-pod-&-stem-blight       177
## 19            herbicide-injury       160

(c)

Develop a strategy for handling missing data, either by eliminating predictors or imputation.

Handling missing values can vary depending on the dataset. Options include removing all missing values, replacing them with specific values, or deleting variables with excessive missing values. In this assignment, the focus will be on processing the ‘hail’ variable, which has the largest number of missing values in the Soybean dataset.

For analysis, it is often desirable to work with data that has no missing values.In such cases, the na.omit() function can be useful. It removes any rows with missing values and returns the cleaned dataset.

#Remove rows containing NA
df = na.omit(Soybean)
#Remove rows containing NA for only one specific variable
df1 = Soybean[complete.cases(Soybean$hail),]

The method for substituting missing values allows for various approaches, such as replacing NA with the average, median, minimum, maximum, or the average of neighboring values. The function can identify missing values using the is.na() function and then insert the replacement values accordingly.

Soybean$hail <- ifelse(is.na(Soybean$hail), median(as.numeric(Soybean$hail), na.rm = TRUE), Soybean$hail)
Soybean$hail
##   [1] 1 1 1 1 1 1 1 2 1 1 1 2 1 2 1 2 2 1 1 2 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 1 1
##  [38] 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [75] 1 1 2 2 1 1 2 1 1 1 1 1 1 1 1 1 2 2 1 1 1 2 1 1 1 1 2 2 2 1 2 1 1 1 1 1 2
## [112] 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2 1 1 1 2 1 1 1 1 2 1 1 1 1 1
## [149] 1 1 2 2 1 1 1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 2 2 1 1 2 2 1 1 1 2 1 2 1
## [186] 2 1 1 1 1 2 1 1 1 1 2 1 2 1 1 1 2 1 2 1 2 2 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1
## [223] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [260] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [297] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1
## [334] 1 2 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [371] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [408] 1 1 2 2 2 2 2 2 1 1 1 1 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1
## [445] 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [482] 2 1 1 1 2 1 2 2 1 2 2 1 2 1 2 2 2 1 2 1 2 2 2 2 1 1 1 1 1 1 1 1 1 1 2 1 1
## [519] 1 1 1 1 2 2 2 2 1 1 1 1 1 1 1 1 1 2 2 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 2 2
## [556] 2 2 1 2 2 2 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [593] 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [630] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [667] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1