0. INTRODUCTION
Text classification is a supervised machine learning approach to text
analysis. We used text classification to assess how learners
demonstrated data literacy in an open online learning network. The
dataset consists of 1,431 student comments on six activities from the NY
Times Learning Network: Climate Threats, Air Pollution, U.S. Wellbeing,
Covid Herd Immunity, Vaccination Roadblocks, and First Vaccinated.
Data source. To create the dataset, we first coded students’ comments
along seven dimensions: match, tension, more information, connection,
challenge, affection, and science. These dimensions represent critical
data literacy skills. For instance, “challenge” captures the capacity to
challenge data collection methods, ways of representing data,
interpretations of data, and so on. Each comment received a binary score
(0 or 1) on each dimension. Taking “match” as an example, if learners
explained that their personal experiences matched data trends (e.g., “My
community is one of the areas under high water stress. This is true
because it gets really hot in the summer and we need water.”), the
comment scored 1 point; otherwise it scored 0. Reliability was assessed
with Cohen’s kappa on 10% of the comments from each activity. Kappa was
above 0.8 for every dimension, indicating good inter-rater reliability.
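The sketch below shows how the reliability statistic works for two coders who scored the same comments 0 or 1. Kappa compares the observed agreement p_o with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The function name cohens_kappa() and both rating vectors are hypothetical illustrations, not the study's coding data.

```r
# Cohen's kappa for two raters who coded the same items 0/1.
# Hypothetical helper and made-up codes, for illustration only.
cohens_kappa <- function(rater1, rater2) {
  # Use the same levels for both raters so the table is square
  levels_all <- sort(unique(c(rater1, rater2)))
  tab <- table(factor(rater1, levels = levels_all),
               factor(rater2, levels = levels_all))
  n <- sum(tab)
  p_observed <- sum(diag(tab)) / n                      # observed agreement
  p_expected <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
  (p_observed - p_expected) / (1 - p_expected)
}

rater_a <- c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0)
rater_b <- c(1, 0, 0, 1, 0, 0, 0, 0, 1, 0)
kappa_ab <- cohens_kappa(rater_a, rater_b)
kappa_ab  # about 0.78
```

With these made-up codes, the coders agree on 9 of 10 comments. Kappa is still only about 0.78, because much of that agreement could arise by chance. This is why kappa is reported rather than raw percent agreement.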
In this walkthrough, we use the “more information” dimension as an
example to address the following research question: How can machine
learning be used to assess learners’ critical data literacy in open
online learning environments? The “more information” label captures the
extent to which a student shows curiosity about data that the featured
visualization did not show (for example, a future trend) or seeks
additional context to interpret the data trends.
Text classification workflow

This workflow shows the main stages of building a text classification
model. Text must be turned into numeric features, and the model is
developed on training data before it is evaluated on testing data.
Specifically, we’ll first take a quick look at the data and the context
in which it was collected. Then we split the data into a training set
and a testing set, using only the training set for model development,
and turn the text column into numeric features for modeling.
1. PREPARE
First, let’s load the packages we’ll need for this walkthrough:
library(tidyverse)
library(tidymodels)
library(textrecipes)
library(discrim) # registers naive Bayes engines; the "naivebayes" engine also needs the naivebayes package
2. WRANGLE
To get started, we need to import, or “read”, our data into R. The
function used to import your data will depend on the file format of the
data you are trying to import. First, however, you’ll need to do the
following:
- Download the comments.csv file we’ll be using.
- Create a folder in the directory on your computer where you stored
your R Project and name it “data”.
- Add the file to your data folder.
- Check your Files tab in RStudio to verify that your file is indeed
in your data folder.
Now let’s read our data into our Environment using the
read_csv() function and assign it to a variable name so we
can work with it like any other object in R. Then let’s take a quick
glimpse() at the data to see what we have to work with.
comments <- read_csv("data/comments.csv")
## Rows: 1431 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): user_display_name, user_location, comment
## dbl (1): activity
## lgl (1): more_info
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(comments)
## Rows: 1,431
## Columns: 5
## $ activity <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ user_display_name <chr> "Angie", "welington nunez", "valeria", "caleb", "Ang…
## $ user_location <chr> "bronk", "bronx", "chicago , il", "us", "Massachuset…
## $ comment <chr> "this graph shows me when there is a extreme rainfal…
## $ more_info <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRU…
We can see that the dataset includes five columns: activity (the
activity the comment came from), user_display_name (the user’s display
name), user_location (the user’s location), comment (the user’s
comment), and more_info (whether the comment demonstrates data literacy
in the dimension of seeking more information).
Let’s look at the data! Here are the first six comments:
head(comments$comment)
## [1] "this graph shows me when there is a extreme rainfall and if it close to where you live"
## [2] "the graph shows me that there will be an extreme rainfall"
## [3] "this map showed me that the damage has already been done and that it isn??t even one specific place(state) but all of the map. our planet has already been hit with the effects and we can??t really do much but just watch and wait to see what happens eventually in the future."
## [4] "In all places there going to be affected no matter what.In the state where i live we??ve experimented hurricanes more. So somethings might change for example the level of water or wind speed."
## [5] "I noticed that my county was a high risk for hurricanes, I wonder why that is when I have never experienced a hurricane in the 13 years I've lived here."
## [6] "I notice that at the moment no matter where you live in the united states there is some sort of extreme unusual weather that could possibly affect you and your state."
3. MODEL
Now, let’s build a classification model with two columns in the
dataset: comment and more_info. First we split the data into training
and testing sets with the initial_split() function from the rsample
package (loaded with tidymodels). The strata argument ensures that the
distribution of more_info is similar in the training and testing sets.
Since the split uses random sampling, we set a seed so we can reproduce
our results. We also convert more_info from a logical to a factor,
because tidymodels classification models expect a factor outcome.
set.seed(123)
comments <- comments %>%
  mutate(more_info = as.factor(more_info))
comments_split <- initial_split(comments, strata = more_info)
comments_train <- training(comments_split)
comments_test <- testing(comments_split)
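The strata argument can be illustrated without tidymodels. The sketch below uses a hypothetical helper stratified_split() and made-up labels with a 20/80 class balance. It samples three quarters of the rows separately within each class, so both sets keep the same class shares. It conveys the idea only. rsample handles details this sketch ignores, such as binning numeric strata variables.

```r
# Hypothetical labels: 20% TRUE, 80% FALSE (not the real class balance)
set.seed(1)
labels <- factor(rep(c(TRUE, FALSE), times = c(40, 160)))

# Sample `prop` of the row indices separately within each class
stratified_split <- function(y, prop = 3/4) {
  train_idx <- unlist(lapply(split(seq_along(y), y), function(idx) {
    # Index explicitly; sample() on a length-1 vector would draw from 1:idx
    idx[sample.int(length(idx), size = floor(length(idx) * prop))]
  }))
  sort(train_idx)
}

train_idx <- stratified_split(labels)
prop.table(table(labels[train_idx]))   # class shares in the training rows
prop.table(table(labels[-train_idx]))  # class shares in the held-out rows
```

Both tables show 80% FALSE and 20% TRUE. Without stratification, a rare class could by chance end up under-represented in the smaller testing set.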
We can check the dimensions of the two splits with dim(). By default,
initial_split() puts about three quarters of the rows in the training
set:
dim(comments_train)
## [1] 1073 5
dim(comments_test)
## [1] 358 5
Next we need to preprocess the comment text for modeling. Machine
learning algorithms need numeric features, so we have to build them
from the text. We first initialize our set of preprocessing
transformations with the recipe() function. It takes a formula that
specifies the outcome (more_info) and the predictor (comment), along
with the data set (comments_train). The comments_test set is reserved
for the final testing step and should not be used for building models.
comments_rec <-
  recipe(more_info ~ comment, data = comments_train)
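Now we add steps to process the comment variable. First we tokenize the
text into words with step_tokenize(). Before computing tf-idf (term
frequency-inverse document frequency), we use step_tokenfilter() to keep
only the 1,000 most frequent tokens, which avoids creating too many
variables in our model. Finally, step_tfidf() turns the remaining tokens
into tf-idf features. These weight each word by how often it appears in
a comment, discounted by how many comments contain it.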
comments_rec <- comments_rec %>%
  step_tokenize(comment) %>%
  step_tokenfilter(comment, max_tokens = 1e3) %>%
  step_tfidf(comment)
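The base-R sketch below shows what the three recipe steps compute, using a tiny made-up corpus. It uses the textbook tf-idf definition: term count divided by document length, times log(N / df), where N is the number of documents and df is the number of documents containing the term. step_tfidf() has its own smoothing and normalization options, so its numbers will not match exactly. The simple regex tokenizer here also stands in for the tokenizers package that step_tokenize() uses by default.

```r
# Tiny hypothetical corpus, for illustration only
docs <- c("the graph shows extreme rainfall",
          "i wonder why the graph shows rainfall",
          "i wonder what the future trend will be")

# 1. Tokenize: lowercase and split on anything that is not a letter or apostrophe
tokens <- lapply(tolower(docs), function(d) {
  words <- unlist(strsplit(d, "[^a-z']+"))
  words[words != ""]
})

# 2. Keep only the max_tokens most frequent tokens across the corpus
max_tokens <- 5
freq <- sort(table(unlist(tokens)), decreasing = TRUE)
vocab <- names(freq)[seq_len(min(max_tokens, length(freq)))]

# 3. Term frequency (count / document length), one row per document
tf <- t(sapply(tokens, function(tk) {
  counts <- table(factor(tk, levels = vocab))
  as.numeric(counts) / length(tk)
}))
colnames(tf) <- vocab

# Inverse document frequency, then tf-idf
doc_freq <- colSums(tf > 0)            # documents containing each token
idf <- log(length(docs) / doc_freq)
tfidf <- sweep(tf, 2, idf, `*`)
round(tfidf, 3)
```

“the” appears in all three documents, so its idf is log(3/3) = 0 and its tf-idf is zero everywhere. Words that occur in every comment carry no information for telling comments apart.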
Let’s create 10-fold cross-validation sets and use these resamples for
performance estimates. Each resample uses about 90% of the training data
for fitting (the analysis set) and holds out the other 10% for
evaluation (the assessment set). Every training row is held out exactly
once.
set.seed(234)
comments_folds <- vfold_cv(comments_train)
comments_folds
## # 10-fold cross-validation
## # A tibble: 10 × 2
## splits id
## <list> <chr>
## 1 <split [965/108]> Fold01
## 2 <split [965/108]> Fold02
## 3 <split [965/108]> Fold03
## 4 <split [966/107]> Fold04
## 5 <split [966/107]> Fold05
## 6 <split [966/107]> Fold06
## 7 <split [966/107]> Fold07
## 8 <split [966/107]> Fold08
## 9 <split [966/107]> Fold09
## 10 <split [966/107]> Fold10
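The fold structure printed above can be reproduced in spirit with a few lines of base R. The hypothetical helper make_folds() assigns each of the 1,073 training rows to one of ten folds at random. Each fold then serves once as the assessment set, while the remaining nine folds form the analysis set. This sketch will not reproduce vfold_cv()’s exact row assignments.

```r
# Assign n rows to k folds at random (hypothetical helper):
# repeat fold ids 1..k to length n, then shuffle them
make_folds <- function(n, k = 10) {
  sample(rep_len(seq_len(k), n))
}

set.seed(234)
n_train <- 1073                        # rows in comments_train
fold_id <- make_folds(n_train, k = 10)

fold_sizes <- table(fold_id)
fold_sizes                             # assessment-set size per fold
n_train - fold_sizes                   # analysis-set size per fold
```

1,073 = 10 × 107 + 3, so three folds hold 108 rows and seven hold 107. That matches the 965/108 and 966/107 splits shown above.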
Now let’s set up a naive Bayes model.
nb_spec <- naive_Bayes() %>%
  set_mode("classification") %>%
  set_engine("naivebayes")
nb_spec
## Naive Bayes Model Specification (classification)
##
## Computational engine: naivebayes
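Naive Bayes scores each class by its prior probability times the likelihood of the features, assuming the features are independent given the class. It then predicts the class with the highest posterior. The sketch below implements a Gaussian version for numeric features such as tf-idf columns. fit_gnb(), predict_gnb(), gaussian_loglik(), and the toy data are all hypothetical. The naivebayes engine may model numeric predictors differently, for example with kernel density estimates.

```r
# Hypothetical Gaussian naive Bayes sketch for a numeric feature matrix x
fit_gnb <- function(x, y, eps = 1e-6) {
  y <- factor(y)
  classes <- levels(y)
  # Per-class feature means and standard deviations (one row per class)
  means <- do.call(rbind, lapply(classes, function(cl)
    colMeans(x[y == cl, , drop = FALSE])))
  sds <- do.call(rbind, lapply(classes, function(cl)
    apply(x[y == cl, , drop = FALSE], 2, sd) + eps))  # eps guards against zero sd
  list(classes = classes,
       prior = as.numeric(table(y)) / length(y),
       means = means,
       sds = sds)
}

# Sum of Gaussian log densities across features, one value per row of x
gaussian_loglik <- function(x, mu, s) {
  z <- sweep(sweep(x, 2, mu, "-"), 2, s, "/")
  rowSums(-0.5 * z^2 - rep(log(s), each = nrow(x))) - 0.5 * ncol(x) * log(2 * pi)
}

predict_gnb <- function(model, x) {
  # Unnormalized log posterior: log prior + log likelihood, one column per class
  scores <- matrix(unlist(lapply(seq_along(model$classes), function(k)
    log(model$prior[k]) + gaussian_loglik(x, model$means[k, ], model$sds[k, ]))),
    nrow = nrow(x))
  # Convert to probabilities, subtracting each row's max for numerical stability
  probs <- exp(scores - apply(scores, 1, max))
  probs <- probs / rowSums(probs)
  colnames(probs) <- model$classes
  list(class = factor(model$classes[max.col(probs, ties.method = "first")],
                      levels = model$classes),
       prob = probs)
}

# Toy data: two numeric features, three comments per class
x_train <- rbind(c(0.9, 0.1), c(0.8, 0.0), c(0.7, 0.2),
                 c(0.1, 0.6), c(0.0, 0.8), c(0.2, 0.7))
y_train <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

nb_toy <- fit_gnb(x_train, y_train)
pred <- predict_gnb(nb_toy, rbind(c(0.85, 0.05), c(0.05, 0.75)))
pred$class
```

The independence assumption is what makes the model “naive”. It lets the log-likelihood be a simple sum over features, which keeps training fast even with 1,000 tf-idf columns.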
Now that we have specified both the preprocessing recipe and the model,
we can bundle them together in a tidymodels workflow().
nb_wf <- workflow() %>%
  add_recipe(comments_rec) %>%
  add_model(nb_spec)
nb_wf
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: naive_Bayes()
##
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 3 Recipe Steps
##
## • step_tokenize()
## • step_tokenfilter()
## • step_tfidf()
##
## ── Model ───────────────────────────────────────────────────────────────────────
## Naive Bayes Model Specification (classification)
##
## Computational engine: naivebayes
We could fit this workflow once to the training data as a whole. To
estimate how well the model performs, however, let’s fit it many times,
once to each resampled fold, and evaluate it on the held-out part of
each fold.
nb_rs <- fit_resamples(
  nb_wf,
  comments_folds,
  control = control_resamples(save_pred = TRUE)
)
We can use collect_metrics() to get evaluation
information.
nb_rs_metrics <- collect_metrics(nb_rs)
nb_rs_metrics
## # A tibble: 2 × 6
## .metric .estimator mean n std_err .config
## <chr> <chr> <dbl> <int> <dbl> <chr>
## 1 accuracy binary 0.616 10 0.0270 Preprocessor1_Model1
## 2 roc_auc binary 0.726 10 0.0237 Preprocessor1_Model1
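Both metrics can be computed by hand from held-out predictions. In the sketch below, with made-up values, accuracy is the share of predicted classes (probability threshold 0.5) that match the truth. ROC AUC is computed in its Mann-Whitney form: the probability that a randomly chosen TRUE comment gets a higher predicted probability than a randomly chosen FALSE one. The helper roc_auc_manual() is hypothetical, not a yardstick function.

```r
# Made-up held-out results for eight comments
truth <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
prob  <- c(0.9, 0.4, 0.35, 0.2, 0.7, 0.6, 0.1, 0.8)  # predicted P(more_info = TRUE)
pred  <- prob >= 0.5                                 # class at a 0.5 threshold

accuracy_manual <- mean(pred == truth)

# ROC AUC as the share of (positive, negative) pairs ranked correctly;
# ties count as one half
roc_auc_manual <- function(truth, prob) {
  pos <- prob[truth]
  neg <- prob[!truth]
  mean(outer(pos, neg, function(p, n) (p > n) + 0.5 * (p == n)))
}

auc_manual <- roc_auc_manual(truth, prob)
c(accuracy = accuracy_manual, roc_auc = auc_manual)
```

Here 6 of 8 predictions are correct (accuracy 0.75), and 15 of the 16 positive-negative pairs are ranked correctly (AUC 0.9375). Read the same way, the naive Bayes ROC AUC of 0.726 above means the model ranks a random TRUE comment above a random FALSE one about 73% of the time.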
Another way to evaluate our model is with a confusion matrix. It
cross-tabulates observed and predicted classes, showing correct
predictions alongside false positives and false negatives. The function
conf_mat() computes this table, and autoplot() draws it as a heatmap.
This helps us see how well the model performs, identify problems, and
think about ways to address them.
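# Predictions saved for the first resample (Fold01); to pool all ten
# folds instead, pass collect_predictions(nb_rs)
cm <- conf_mat(nb_rs$.predictions[[1]], truth = more_info,
               estimate = .pred_class)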
autoplot(cm, type = "heatmap")
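A confusion matrix is just a cross-tabulation, so base R’s table() can build one. The sketch below uses made-up observed and predicted classes, arranged like conf_mat() output with predictions in rows and truth in columns. It derives sensitivity and specificity by treating TRUE as the positive class. Note that yardstick metrics treat the first factor level as the event by default, and for more_info that level is FALSE.

```r
# Made-up observed and predicted classes for eight held-out comments
truth <- factor(c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
                levels = c("FALSE", "TRUE"))
estimate <- factor(c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE),
                   levels = c("FALSE", "TRUE"))

cm_manual <- table(Prediction = estimate, Truth = truth)
cm_manual

# Treat TRUE ("seeks more information") as the positive class
tp <- cm_manual["TRUE", "TRUE"]
fn <- cm_manual["FALSE", "TRUE"]
fp <- cm_manual["TRUE", "FALSE"]
tn <- cm_manual["FALSE", "FALSE"]

sensitivity_manual <- tp / (tp + fn)  # share of TRUE comments the model catches
specificity_manual <- tn / (tn + fp)  # share of FALSE comments correctly rejected
accuracy_manual <- sum(diag(cm_manual)) / sum(cm_manual)
```

If the off-diagonal cell for truth TRUE and prediction FALSE is large in the heatmap, the model is missing many comments that seek more information. This pattern points to low sensitivity even when overall accuracy looks acceptable.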

✅ Comprehension Check
What do you notice in the confusion matrix?
Replace the naive Bayes model with a support vector machine (SVM) model
and compare the results from these two models:
- Does the SVM model work better than the naive Bayes model?
- Why does the SVM model work better (or worse)?
svm_poly(degree = 1) %>%
  set_mode("classification") %>%
  set_engine("kernlab", scaled = FALSE)
svm_spec <- svm_poly(degree = 1) %>%   # degree-1 polynomial kernel = linear SVM
  set_mode("classification") %>%
  set_engine("kernlab", scaled = FALSE)
svm_wf <- workflow() %>%               # same recipe, new model
  add_recipe(comments_rec) %>%
  add_model(svm_spec)
# The kernlab engine needs the kernlab package; install it only if missing
if (!requireNamespace("kernlab", quietly = TRUE)) install.packages("kernlab")
library(kernlab)
##
## Attaching package: 'kernlab'
## The following object is masked from 'package:scales':
##
## alpha
## The following object is masked from 'package:purrr':
##
## cross
## The following object is masked from 'package:ggplot2':
##
## alpha
svm_rs <- fit_resamples(
  svm_wf,
  comments_folds,
  control = control_resamples(save_pred = TRUE)
)
svm_rs_metrics <- collect_metrics(svm_rs)
svm_rs_metrics                         # compare with nb_rs_metrics
# Confusion matrix for the first resample (Fold01), as for naive Bayes
svm_cm <- conf_mat(svm_rs$.predictions[[1]], truth = more_info,
                   estimate = .pred_class)
autoplot(svm_cm, type = "heatmap")
