The UC Irvine Machine Learning Repository6 contains a data set related to glass identification. The data consist of 214 glass samples labeled as one of seven class categories. There are nine predictors, including the refractive index and percentages of eight elements: Na, Mg, Al, Si, K, Ca, Ba, and Fe.
The data can be accessed via:
data(Glass)
str(Glass)
## 'data.frame': 214 obs. of 10 variables:
## $ RI : num 1.52 1.52 1.52 1.52 1.52 ...
## $ Na : num 13.6 13.9 13.5 13.2 13.3 ...
## $ Mg : num 4.49 3.6 3.55 3.69 3.62 3.61 3.6 3.61 3.58 3.6 ...
## $ Al : num 1.1 1.36 1.54 1.29 1.24 1.62 1.14 1.05 1.37 1.36 ...
## $ Si : num 71.8 72.7 73 72.6 73.1 ...
## $ K : num 0.06 0.48 0.39 0.57 0.55 0.64 0.58 0.57 0.56 0.57 ...
## $ Ca : num 8.75 7.83 7.78 8.22 8.07 8.07 8.17 8.24 8.3 8.4 ...
## $ Ba : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Fe : num 0 0 0 0 0 0.26 0 0 0 0.11 ...
## $ Type: Factor w/ 6 levels "1","2","3","5",..: 1 1 1 1 1 1 1 1 1 1 ...
head(Glass)
## RI Na Mg Al Si K Ca Ba Fe Type
## 1 1.52101 13.64 4.49 1.10 71.78 0.06 8.75 0 0.00 1
## 2 1.51761 13.89 3.60 1.36 72.73 0.48 7.83 0 0.00 1
## 3 1.51618 13.53 3.55 1.54 72.99 0.39 7.78 0 0.00 1
## 4 1.51766 13.21 3.69 1.29 72.61 0.57 8.22 0 0.00 1
## 5 1.51742 13.27 3.62 1.24 73.08 0.55 8.07 0 0.00 1
## 6 1.51596 12.79 3.61 1.62 72.97 0.64 8.07 0 0.26 1
Using visualizations, explore the predictor variables to understand their distributions as well as the relationships between predictors.
Glass %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
geom_histogram(bins = 15, fill = "skyblue") +
facet_wrap(~key, scales = 'free', ncol = 5)
Glass %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
geom_boxplot(fill = "orange") +
facet_wrap(~key, scales = 'free', ncol = 5)
ggplot(Glass, aes(x = Type, fill = Type)) +
geom_bar() +
scale_fill_viridis(discrete = TRUE)
cor(Glass[,-10])
## RI Na Mg Al Si K
## RI 1.0000000000 -0.19188538 -0.122274039 -0.40732603 -0.54205220 -0.289832711
## Na -0.1918853790 1.00000000 -0.273731961 0.15679367 -0.06980881 -0.266086504
## Mg -0.1222740393 -0.27373196 1.000000000 -0.48179851 -0.16592672 0.005395667
## Al -0.4073260341 0.15679367 -0.481798509 1.00000000 -0.00552372 0.325958446
## Si -0.5420521997 -0.06980881 -0.165926723 -0.00552372 1.00000000 -0.193330854
## K -0.2898327111 -0.26608650 0.005395667 0.32595845 -0.19333085 1.000000000
## Ca 0.8104026963 -0.27544249 -0.443750026 -0.25959201 -0.20873215 -0.317836155
## Ba -0.0003860189 0.32660288 -0.492262118 0.47940390 -0.10215131 -0.042618059
## Fe 0.1430096093 -0.24134641 0.083059529 -0.07440215 -0.09420073 -0.007719049
## Ca Ba Fe
## RI 0.8104027 -0.0003860189 0.143009609
## Na -0.2754425 0.3266028795 -0.241346411
## Mg -0.4437500 -0.4922621178 0.083059529
## Al -0.2595920 0.4794039017 -0.074402151
## Si -0.2087322 -0.1021513105 -0.094200731
## K -0.3178362 -0.0426180594 -0.007719049
## Ca 1.0000000 -0.1128409671 0.124968219
## Ba -0.1128410 1.0000000000 -0.058691755
## Fe 0.1249682 -0.0586917554 1.000000000
point <- as.integer(Glass$Type)
pairs(Glass[,-10], pch=point, col=Glass$Type)
cor_matrix <- cor(select_if(Glass, is.numeric))
cor_matrix
## RI Na Mg Al Si K
## RI 1.0000000000 -0.19188538 -0.122274039 -0.40732603 -0.54205220 -0.289832711
## Na -0.1918853790 1.00000000 -0.273731961 0.15679367 -0.06980881 -0.266086504
## Mg -0.1222740393 -0.27373196 1.000000000 -0.48179851 -0.16592672 0.005395667
## Al -0.4073260341 0.15679367 -0.481798509 1.00000000 -0.00552372 0.325958446
## Si -0.5420521997 -0.06980881 -0.165926723 -0.00552372 1.00000000 -0.193330854
## K -0.2898327111 -0.26608650 0.005395667 0.32595845 -0.19333085 1.000000000
## Ca 0.8104026963 -0.27544249 -0.443750026 -0.25959201 -0.20873215 -0.317836155
## Ba -0.0003860189 0.32660288 -0.492262118 0.47940390 -0.10215131 -0.042618059
## Fe 0.1430096093 -0.24134641 0.083059529 -0.07440215 -0.09420073 -0.007719049
## Ca Ba Fe
## RI 0.8104027 -0.0003860189 0.143009609
## Na -0.2754425 0.3266028795 -0.241346411
## Mg -0.4437500 -0.4922621178 0.083059529
## Al -0.2595920 0.4794039017 -0.074402151
## Si -0.2087322 -0.1021513105 -0.094200731
## K -0.3178362 -0.0426180594 -0.007719049
## Ca 1.0000000 -0.1128409671 0.124968219
## Ba -0.1128410 1.0000000000 -0.058691755
## Fe 0.1249682 -0.0586917554 1.000000000
corrplot(cor_matrix, type = "upper", diag = FALSE)
Summarizing the notable correlations, the correlation between RI
and Ca is high (0.8104027), indicating a strong positive relationship.
The correlation between Mg and Ba is negative (-0.4922621), indicating a
strong negative relationship. The correlation between Al and Mg is
negative (-0.4817985), indicating a moderate negative relationship.
Finally, the correlation between Al and Ba is positive (0.4794039),
indicating a moderate positive relationship.
Do there appear to be any outliers in the data? Are any predictors skewed?
Outliers are present in all elements except for magnesium (Mg). Barium (Ba), iron (Fe), and potassium (K) exhibit the most severe outliers. Additionally, Ba, Fe, and K demonstrate right-skewed distributions, while Mg displays a left-skewed distribution.
outlier_values_Ba <- boxplot.stats(Glass$Ba)$out
outlier_values_fe <- boxplot.stats(Glass$Fe)$out
outlier_values_k <- boxplot.stats(Glass$K)$out
outlier_values_Mg <- boxplot.stats(Glass$Mg)$out
#par(mfrow=c(2,2))
boxplot(Glass$Ba, main = "Ba", boxwex = 0.5)
mtext(paste("Outliers (Ba): ", paste(outlier_values_Ba, collapse = ", ")), cex = 0.6)
boxplot(Glass$K, main = "K", boxwex = 0.5)
mtext(paste("Outliers (k): ", paste(outlier_values_k, collapse = ", ")), cex = 0.6)
boxplot(Glass$Mg, main = "Mg", boxwex = 0.5)
mtext(paste("Outliers (Mg): ", paste(outlier_values_Mg, collapse = ", ")), cex = 0.6)
boxplot(Glass$Fe, main = "Fe", boxwex = 0.5)
mtext(paste("Outliers (Fe): ", paste(outlier_values_fe, collapse = ", ")), cex = 0.6)
#par(mfrow=c(2,2))
Are there any relevant transformations of one or more predictors that might improve the classification model?
Transformation using the log() function resulted in a distribution that, while not perfectly normal, displayed a substantial improvement in its graphical representation.
For example, before the log transformation, the skewness value for Fe was 1.729811, indicating a right-skewed distribution. After the log transformation, if the skewness value becomes -1.21, it indicates a left-skewed distribution. The skewness value closer to 0 indicates a distribution closer to normal. Thus, if the skewness value is closer to -1.21 after the log transformation, it suggests a distribution with a longer tail to the left, closer to a normal distribution.
skewness_k <- skewness(Glass$K)
skewness_k
## [1] 6.460089
non_zero_positive_values <- Glass$K[Glass$K > 0]
transformed_data <- log(non_zero_positive_values)
skewness_k <- skewness(transformed_data)
print(skewness_k)
## [1] -0.9404054
plot(density(transformed_data))
describe(transformed_data)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 184 -0.87 0.86 -0.56 -0.77 0.17 -3.91 1.83 5.74 -0.94 1.91 0.06
skewness_Fe <- skewness(Glass$Fe)
skewness_Fe
## [1] 1.729811
non_zero_positive_values <- Glass$Fe[Glass$Fe > 0]
transformed_data <- log(non_zero_positive_values)
transformed_data_Fe <- log(Glass$Fe)
skewness_Fe <- skewness(transformed_data_Fe)
plot(density(transformed_data_Fe))
describe(transformed_data)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 70 -1.91 0.63 -1.8 -1.86 0.6 -4.61 -0.67 3.93 -1.21 3.34 0.08
skewness_Ba <- skewness(Glass$Ba)
skewness_Ba
## [1] 3.36868
non_zero_positive_values <- Glass$Ba[Glass$Ba > 0]
transformed_data <- log(non_zero_positive_values)
transformed_data_Ba <- log(Glass$Ba)
skewness_Ba <- skewness(transformed_data_Ba)
plot(density(transformed_data_Ba))
describe(transformed_data)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 38 -0.44 1.07 -0.39 -0.37 1.25 -2.81 1.15 3.96 -0.66 -0.68 0.17
The soybean data can also be found at the UC Irvine Machine Learning Repository. Data were collected to predict disease in 683 soybeans. The 35 predictors are mostly categorical and include information on the environmental conditions (e.g., temperature, precipitation) and plant conditions (e.g., left spots, mold growth). The outcome labels consist of 19 distinct classes.
The data can be loaded via:
library(mlbench)
data(Soybean)
head(Soybean)
## Class date plant.stand precip temp hail crop.hist area.dam
## 1 diaporthe-stem-canker 6 0 2 1 0 1 1
## 2 diaporthe-stem-canker 4 0 2 1 0 2 0
## 3 diaporthe-stem-canker 3 0 2 1 0 1 0
## 4 diaporthe-stem-canker 3 0 2 1 0 1 0
## 5 diaporthe-stem-canker 6 0 2 1 0 2 0
## 6 diaporthe-stem-canker 5 0 2 1 0 3 0
## sever seed.tmt germ plant.growth leaves leaf.halo leaf.marg leaf.size
## 1 1 0 0 1 1 0 2 2
## 2 2 1 1 1 1 0 2 2
## 3 2 1 2 1 1 0 2 2
## 4 2 0 1 1 1 0 2 2
## 5 1 0 2 1 1 0 2 2
## 6 1 0 1 1 1 0 2 2
## leaf.shread leaf.malf leaf.mild stem lodging stem.cankers canker.lesion
## 1 0 0 0 1 1 3 1
## 2 0 0 0 1 0 3 1
## 3 0 0 0 1 0 3 0
## 4 0 0 0 1 0 3 0
## 5 0 0 0 1 0 3 1
## 6 0 0 0 1 0 3 0
## fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods
## 1 1 1 0 0 0 0
## 2 1 1 0 0 0 0
## 3 1 1 0 0 0 0
## 4 1 1 0 0 0 0
## 5 1 1 0 0 0 0
## 6 1 1 0 0 0 0
## fruit.spots seed mold.growth seed.discolor seed.size shriveling roots
## 1 4 0 0 0 0 0 0
## 2 4 0 0 0 0 0 0
## 3 4 0 0 0 0 0 0
## 4 4 0 0 0 0 0 0
## 5 4 0 0 0 0 0 0
## 6 4 0 0 0 0 0 0
summary(Soybean)
## Class date plant.stand precip temp
## brown-spot : 92 5 :149 0 :354 0 : 74 0 : 80
## alternarialeaf-spot: 91 4 :131 1 :293 1 :112 1 :374
## frog-eye-leaf-spot : 91 3 :118 NA's: 36 2 :459 2 :199
## phytophthora-rot : 88 2 : 93 NA's: 38 NA's: 30
## anthracnose : 44 6 : 90
## brown-stem-rot : 44 (Other):101
## (Other) :233 NA's : 1
## hail crop.hist area.dam sever seed.tmt germ plant.growth
## 0 :435 0 : 65 0 :123 0 :195 0 :305 0 :165 0 :441
## 1 :127 1 :165 1 :227 1 :322 1 :222 1 :213 1 :226
## NA's:121 2 :219 2 :145 2 : 45 2 : 35 2 :193 NA's: 16
## 3 :218 3 :187 NA's:121 NA's:121 NA's:112
## NA's: 16 NA's: 1
##
##
## leaves leaf.halo leaf.marg leaf.size leaf.shread leaf.malf leaf.mild
## 0: 77 0 :221 0 :357 0 : 51 0 :487 0 :554 0 :535
## 1:606 1 : 36 1 : 21 1 :327 1 : 96 1 : 45 1 : 20
## 2 :342 2 :221 2 :221 NA's:100 NA's: 84 2 : 20
## NA's: 84 NA's: 84 NA's: 84 NA's:108
##
##
##
## stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay
## 0 :296 0 :520 0 :379 0 :320 0 :473 0 :497
## 1 :371 1 : 42 1 : 39 1 : 83 1 :104 1 :135
## NA's: 16 NA's:121 2 : 36 2 :177 NA's:106 2 : 13
## 3 :191 3 : 65 NA's: 38
## NA's: 38 NA's: 38
##
##
## mycelium int.discolor sclerotia fruit.pods fruit.spots seed
## 0 :639 0 :581 0 :625 0 :407 0 :345 0 :476
## 1 : 6 1 : 44 1 : 20 1 :130 1 : 75 1 :115
## NA's: 38 2 : 20 NA's: 38 2 : 14 2 : 57 NA's: 92
## NA's: 38 3 : 48 4 :100
## NA's: 84 NA's:106
##
##
## mold.growth seed.discolor seed.size shriveling roots
## 0 :524 0 :513 0 :532 0 :539 0 :551
## 1 : 67 1 : 64 1 : 59 1 : 38 1 : 86
## NA's: 92 NA's:106 NA's: 92 NA's:106 2 : 15
## NA's: 31
##
##
##
## See ?Soybean for details
Investigate the frequency distributions for the categorical predictors. Are any of the distributions degenerate in the ways discussed earlier in this chapter?
columns <- colnames(Soybean)
columns
## [1] "Class" "date" "plant.stand" "precip"
## [5] "temp" "hail" "crop.hist" "area.dam"
## [9] "sever" "seed.tmt" "germ" "plant.growth"
## [13] "leaves" "leaf.halo" "leaf.marg" "leaf.size"
## [17] "leaf.shread" "leaf.malf" "leaf.mild" "stem"
## [21] "lodging" "stem.cankers" "canker.lesion" "fruiting.bodies"
## [25] "ext.decay" "mycelium" "int.discolor" "sclerotia"
## [29] "fruit.pods" "fruit.spots" "seed" "mold.growth"
## [33] "seed.discolor" "seed.size" "shriveling" "roots"
for (col in columns) {
print(col)
print(table(Soybean[[col]]))
}
## [1] "Class"
##
## 2-4-d-injury alternarialeaf-spot
## 16 91
## anthracnose bacterial-blight
## 44 20
## bacterial-pustule brown-spot
## 20 92
## brown-stem-rot charcoal-rot
## 44 20
## cyst-nematode diaporthe-pod-&-stem-blight
## 14 15
## diaporthe-stem-canker downy-mildew
## 20 20
## frog-eye-leaf-spot herbicide-injury
## 91 8
## phyllosticta-leaf-spot phytophthora-rot
## 20 88
## powdery-mildew purple-seed-stain
## 20 20
## rhizoctonia-root-rot
## 20
## [1] "date"
##
## 0 1 2 3 4 5 6
## 26 75 93 118 131 149 90
## [1] "plant.stand"
##
## 0 1
## 354 293
## [1] "precip"
##
## 0 1 2
## 74 112 459
## [1] "temp"
##
## 0 1 2
## 80 374 199
## [1] "hail"
##
## 0 1
## 435 127
## [1] "crop.hist"
##
## 0 1 2 3
## 65 165 219 218
## [1] "area.dam"
##
## 0 1 2 3
## 123 227 145 187
## [1] "sever"
##
## 0 1 2
## 195 322 45
## [1] "seed.tmt"
##
## 0 1 2
## 305 222 35
## [1] "germ"
##
## 0 1 2
## 165 213 193
## [1] "plant.growth"
##
## 0 1
## 441 226
## [1] "leaves"
##
## 0 1
## 77 606
## [1] "leaf.halo"
##
## 0 1 2
## 221 36 342
## [1] "leaf.marg"
##
## 0 1 2
## 357 21 221
## [1] "leaf.size"
##
## 0 1 2
## 51 327 221
## [1] "leaf.shread"
##
## 0 1
## 487 96
## [1] "leaf.malf"
##
## 0 1
## 554 45
## [1] "leaf.mild"
##
## 0 1 2
## 535 20 20
## [1] "stem"
##
## 0 1
## 296 371
## [1] "lodging"
##
## 0 1
## 520 42
## [1] "stem.cankers"
##
## 0 1 2 3
## 379 39 36 191
## [1] "canker.lesion"
##
## 0 1 2 3
## 320 83 177 65
## [1] "fruiting.bodies"
##
## 0 1
## 473 104
## [1] "ext.decay"
##
## 0 1 2
## 497 135 13
## [1] "mycelium"
##
## 0 1
## 639 6
## [1] "int.discolor"
##
## 0 1 2
## 581 44 20
## [1] "sclerotia"
##
## 0 1
## 625 20
## [1] "fruit.pods"
##
## 0 1 2 3
## 407 130 14 48
## [1] "fruit.spots"
##
## 0 1 2 4
## 345 75 57 100
## [1] "seed"
##
## 0 1
## 476 115
## [1] "mold.growth"
##
## 0 1
## 524 67
## [1] "seed.discolor"
##
## 0 1
## 513 64
## [1] "seed.size"
##
## 0 1
## 532 59
## [1] "shriveling"
##
## 0 1
## 539 38
## [1] "roots"
##
## 0 1 2
## 551 86 15
par(mfrow = c(3,3))
for(i in 2:ncol(Soybean)) {
plot(Soybean[i], main = colnames(Soybean[i]), col = "lightblue")
}
Roughly 18 % of the data are missing. Are there particular predictors that are more likely to be missing? Is the pattern of missing data related to the classes?
Overall, NA values appear to be most prevalent in data belonging to the ‘phytophthora-rot’ Class. By variable, hail (17.7%), sever (17.7%), seed.tmt (17.7%), lodging (17.7%) appear to be variables with many NA values. This information can serve as a reference for handling NA values when processing or analyzing data.
sum(!complete.cases(Soybean))
## [1] 121
naniar::miss_var_summary(Soybean)
## # A tibble: 36 × 3
## variable n_miss pct_miss
## <chr> <int> <dbl>
## 1 hail 121 17.7
## 2 sever 121 17.7
## 3 seed.tmt 121 17.7
## 4 lodging 121 17.7
## 5 germ 112 16.4
## 6 leaf.mild 108 15.8
## 7 fruiting.bodies 106 15.5
## 8 fruit.spots 106 15.5
## 9 seed.discolor 106 15.5
## 10 shriveling 106 15.5
## # ℹ 26 more rows
naniar::vis_miss(Soybean)
naniar::gg_miss_var(Soybean)
naniar::gg_miss_upset(Soybean)
VIM::aggr(Soybean)
na_counts <- sapply(unique(Soybean$Class), function(class_val) {
sum(is.na(Soybean[Soybean$Class == class_val, ]))
})
result <- data.frame(Class = unique(Soybean$Class), NA_Counts = na_counts)
result <- result[result$NA_Counts != 0, ]
result <- result[order(-result$NA_Counts), ]
result
## Class NA_Counts
## 4 phytophthora-rot 1214
## 18 2-4-d-injury 450
## 17 cyst-nematode 336
## 16 diaporthe-pod-&-stem-blight 177
## 19 herbicide-injury 160
Develop a strategy for handling missing data, either by eliminating predictors or imputation.
Handling missing values can vary depending on the dataset. Options include removing all missing values, replacing them with specific values, or deleting variables with excessive missing values. In this assignment, the focus will be on processing the ‘hail’ variable, which has the largest number of missing values in the Soybean dataset.
For analysis, it is often desirable to work with data that has no missing values.In such cases, the na.omit() function can be useful. It removes any rows with missing values and returns the cleaned dataset.
#Remove rows containing NA
df = na.omit(Soybean)
#Remove rows containing NA for only one specific variable
df1 = Soybean[complete.cases(Soybean$hail),]
The method for substituting missing values allows for various approaches, such as replacing NA with the average, median, minimum, maximum, or the average of neighboring values. The function can identify missing values using the is.na() function and then insert the replacement values accordingly.
Soybean$hail <- ifelse(is.na(Soybean$hail), median(as.numeric(Soybean$hail), na.rm = TRUE), Soybean$hail)
Soybean$hail
## [1] 1 1 1 1 1 1 1 2 1 1 1 2 1 2 1 2 2 1 1 2 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 1 1
## [38] 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [75] 1 1 2 2 1 1 2 1 1 1 1 1 1 1 1 1 2 2 1 1 1 2 1 1 1 1 2 2 2 1 2 1 1 1 1 1 2
## [112] 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2 1 1 1 2 1 1 1 1 2 1 1 1 1 1
## [149] 1 1 2 2 1 1 1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 2 2 1 1 2 2 1 1 1 2 1 2 1
## [186] 2 1 1 1 1 2 1 1 1 1 2 1 2 1 1 1 2 1 2 1 2 2 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1
## [223] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [260] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [297] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1
## [334] 1 2 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [371] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [408] 1 1 2 2 2 2 2 2 1 1 1 1 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1
## [445] 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [482] 2 1 1 1 2 1 2 2 1 2 2 1 2 1 2 2 2 1 2 1 2 2 2 2 1 1 1 1 1 1 1 1 1 1 2 1 1
## [519] 1 1 1 1 2 2 2 2 1 1 1 1 1 1 1 1 1 2 2 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 2 2
## [556] 2 2 1 2 2 2 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [593] 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [630] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [667] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1