Pre-chunking

TODO: in future runs, remove double spaces from the spellchecked column before running the chunk script
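One way to handle this is to collapse runs of spaces before the chunk script sees the text. A minimal sketch, assuming the spellchecked column has already been read into a list of strings with `None` standing in for NA. The name `collapse_spaces` and the sample values are made up for illustration.

```python
import re

def collapse_spaces(text):
    """Collapse runs of two or more spaces into one; leave None (NA) untouched."""
    if text is None:
        return None
    return re.sub(r" {2,}", " ", text)

# Hypothetical spellchecked column values.
spellchecked = ["the  cat sat", "no change here", None, "a   b    c"]
cleaned = [collapse_spaces(t) for t in spellchecked]
print(cleaned)
```

Only literal spaces are touched here; tabs and newlines are left alone, which may or may not be what the chunk script needs.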

Run chunk script

Post chunk stuff

## Test passed 🎊

check things that weren’t substrings

should get resolved into a copy of the pre_abstract file

verify only substrings

find what wasn’t used
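A rough sketch of finding what wasn't used, assuming "used" means "covered by at least one chunk that occurs verbatim in the source text". The function name `unused_segments` and the sample strings are hypothetical, and marking every occurrence of a chunk (not just the first) is an assumption.

```python
def unused_segments(source, chunks):
    """Return the stretches of `source` not covered by any chunk occurrence."""
    covered = [False] * len(source)
    for chunk in chunks:
        if not chunk:
            continue  # an empty chunk covers nothing
        start = source.find(chunk)
        while start != -1:
            for i in range(start, start + len(chunk)):
                covered[i] = True
            start = source.find(chunk, start + 1)

    # Gather consecutive uncovered characters into segments.
    segments, current = [], []
    for ch, used in zip(source, covered):
        if used:
            if current:
                segments.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        segments.append("".join(current))

    # Drop leftovers that are only whitespace between chunks.
    return [s.strip() for s in segments if s.strip()]

source = "alpha beta gamma delta"
chunks = ["alpha beta", "delta"]
print(unused_segments(source, chunks))
```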

prep for abstract

TODO: figure out how to do the substring verification so that it shows which lines fail
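One possible shape for this check, as a sketch only: walk the chunks in order and collect the line number of each one that is not a verbatim substring of the source. It assumes one chunk per line and a single source string; `find_non_substrings` and the sample data are hypothetical.

```python
def find_non_substrings(chunks, source):
    """Return (line_number, chunk) pairs for chunks that do not occur verbatim in source."""
    failures = []
    for line_number, chunk in enumerate(chunks, start=1):
        if chunk not in source:
            failures.append((line_number, chunk))
    return failures

source = "the quick brown fox jumps over the lazy dog"
chunks = ["the quick", "brown fx", "jumps over", "lazy  dog"]

for line_number, chunk in find_non_substrings(chunks, source):
    # repr() makes stray double spaces visible in the report.
    print(f"line {line_number}: not a substring: {chunk!r}")
```

Returning the failures instead of raising on the first one means a single run lists every bad line.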

TODO did we lose the NAs somewhere??
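A quick way to narrow this down is to count NAs at each stage and compare. A minimal sketch, assuming NAs show up as `None`, an empty string, or the literal string "NA". That representation, the name `count_missing`, and the sample data are all assumptions.

```python
def count_missing(values, na_markers=("", "NA")):
    """Count entries that are None or match one of the NA marker strings."""
    return sum(1 for v in values if v is None or str(v).strip() in na_markers)

# Hypothetical column contents before and after a pipeline step.
before = ["some text", "NA", None, "more text", ""]
after = ["some text", "more text", "NA"]

lost = count_missing(before) - count_missing(after)
print(f"NAs before: {count_missing(before)}, after: {count_missing(after)}, lost: {lost}")
```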

pre tag

go run tagging here

or actually maybe the gpt tagging is pretty iffy and we should just do it closed class?

post tag

what do we do about the tagging sometimes being bad?

pre-sbert

post-sbert

Old stuff

Chunking agreement TODO

https://github.com/mjpost/sacrebleu

run sacrebleu pre_charf_correct.txt -i pre_charf_model.txt -m chrf --chrf-word-order 2 --chrf-whitespace

not sure what settings actually make sense