In this evaluation, I take 100 utterances from the original data and call them the baseline. Each baseline utterance is repeated, and for each repeat I write a different rephrasing. I then classify both the original utterance and its rephrasing, and compare the two classifications.
For example:

| utterance_baseline | utterance_new |
|---|---|
| how can i add something to my cart? | how do i put something to my cart? |
| how can i add something to my cart? | how can i change the items in my cart? |
| how can i add something to my cart? | how can i add items to my cart? |

This is a work in progress. One open question is whether the rephrasing could be automated with GPT-3.
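```r
library(dplyr)

# Join the three tables on their shared columns, then flag rows where the
# rephrasing was classified into the same node as the baseline utterance.
baseline %>%
  inner_join(mapping) %>%
  inner_join(evaluation) %>%
  mutate(match = node_baseline == node_new) %>%
  select(utterance_baseline, utterance_new, label_baseline, label_new, match, node_baseline, node_new)
```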
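
To make the shape of this comparison concrete, here is a minimal sketch on toy data. Everything in it is an assumption for illustration: the contents of `baseline`, `mapping` and `evaluation`, the join keys, the node ids (`n1`, `n2`) and the labels are made up, and only the three rephrasings are taken from the example table. It also adds one step that is not in the pipeline above, a hypothetical `match_rate` giving the share of rephrasings that land on the same node as their baseline.

```r
library(dplyr)

# Toy stand-ins for the three tables. All values and join keys are
# illustrative assumptions, not the real data.
baseline <- tibble(
  utterance_baseline = "how can i add something to my cart?",
  node_baseline = "n1",
  label_baseline = "add to cart"
)

# One row per rephrasing; the single baseline utterance is recycled.
mapping <- tibble(
  utterance_baseline = "how can i add something to my cart?",
  utterance_new = c(
    "how do i put something to my cart?",
    "how can i change the items in my cart?",
    "how can i add items to my cart?"
  )
)

# Made-up classification results for the rephrasings.
evaluation <- tibble(
  utterance_new = mapping$utterance_new,
  node_new = c("n1", "n2", "n1"),
  label_new = c("add to cart", "edit cart", "add to cart")
)

# Same joins as above, with the (assumed) keys spelled out explicitly.
comparison <- baseline %>%
  inner_join(mapping, by = "utterance_baseline") %>%
  inner_join(evaluation, by = "utterance_new") %>%
  mutate(match = node_baseline == node_new)

# Share of rephrasings classified into the same node as their baseline.
match_rate <- mean(comparison$match)
```

Naming the keys in `by` avoids relying on whichever columns the tables happen to share, which is worth doing once the real column names are settled.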