A 従来のNLI分類をLLMsにより再実施 :
You are a forensic linguistics expert that reads English texts written by non-native authors in order to classify the native language of the author as one of:
“ARA”: Arabic
“CHI”: Chinese
“FRE”: French
“GER”: German
“HIN”: Hindi
“ITA”: Italian
“JPN”: Japanese
“KOR”: Korean
“SPA”: Spanish
“TEL”: Telugu
“TUR”: Turkish
Use clues such as spelling errors, word choice, syntactic patterns, and grammatical errors to decide.
DO NOT USE ANY OTHER CLASS.
IMPORTANT: Do not classify any input as “ENG” (English). English is an invalid choice.
Valid output formats:
Class: “ARA”
Class: “CHI”
Class: “FRE”
Class: “GER”
<TOEFL11 ESSAY TEXT>
Classify the text as one of ARA, CHI, FRE, GER, HIN, ITA, JPN, KOR, SPA, TEL, or TUR. Do not output any other class - do NOT choose “ENG” (English). What is the closest native language of the author of this English text from the given list?
A 従来のNLI分類をLLMsにより再実施 :
| Model | TOEFL11 Test Set |
|---|---|
| Random Guess Baseline | 9.1% |
| SVM + Meta-Classifier [@malmasi-dras-2018-native] | 86.8% |
| BERT + Meta-Classifier [@steinbakken-gamback-2020-native] | 85.3% |
| GPT-2 [@lotfi-etal-2020-deep] | 89.0% |
| Ours - GPT-3.5 (Zero-shot) | 74.0% |
| Ours - GPT-4 (Zero-shot) | 91.7% |
| Ours - GPT-3.5 (Open-set, Zero-shot) | 73.4% |
| Ours - GPT-4 (Open-set, Zero-shot) | 86.7% |
正解率/Accuracy :
A 従来のNLI分類をLLMsにより再実施 :
GPT-3.5とGPT-4の比較:
GPT-3.5は今回のデータセットにおいて、12%の文書は最初に英語だと予測され、再分類するといつもフランス語を答える
| Model | TOEFL11 Test Set |
|---|---|
| Ours - GPT-3.5 (Zero-shot) | 74.0% |
| Ours - GPT-4 (Zero-shot) | 91.7% |
| Ours - GPT-3.5 (Open-set, Zero-shot) | 73.4% |
| Ours - GPT-4 (Open-set, Zero-shot) | 86.7% |
B 文長による影響を調査 :
C Open-Set分類タスクにおける表現 :
You are a forensic linguistics expert that reads texts written by non-native authors in order to identify their native language.
Analyze each text and identify the native language of the author.
Use clues such as spelling errors, word choice, syntactic patterns, and grammatical errors to decide.
<TOEFL11 ESSAY TEXT>
C Open-Set分類タスクにおける表現 :
Open-SetとClosed-Setの比較:
GPT-3.5とGPT-4どっちも精度が落ちているので指定外のクラス予測を詳しく見る必要がある
| Model | TOEFL11 Test Set |
|---|---|
| Ours - GPT-3.5 (Zero-shot) | 74.0% |
| Ours - GPT-4 (Zero-shot) | 91.7% |
| Ours - GPT-3.5 (Open-set, Zero-shot) | 73.4% |
| Ours - GPT-4 (Open-set, Zero-shot) | 86.7% |
C Open-Set分類タスクにおける表現 :
GPT-3.5とGPT-4の比較:
GPTモデルは実際の母語識別において応用できるようになっている
LLMsは母語識別の予測結果について解釈できるかどうか :
You must provide a guess. Output two named sections: (1) “Native Language” with the name of the language, and (2) “Reasoning” with a detailed explanation of your judgement with examples from the text.
The text is quite well-written, with only minor indicators pointing towards the author’s native language being German. The major clue is the author’s use of reflexive verbs in instances where they are not commonly used by native English speakers. For example, “the consumer’s responsibility to train itself” - a native English speaker would typically phrase this as “it’s the consumer’s responsibility to train themselves”.
This is a reflection of the regular use of reflexive verbs in German. In addition, the phrase, “they are fake, so why do we believe them anyway?” is syntax reminiscent of German structure. The use of colons before explanations, though not incorrect in English, also suggest German as it is more prevalent in German writing, such as in “realized: the advertisement”
LLMsは母語識別の予測結果について解釈できるかどうか :