Clue and Reasoning Prompting (CARP) – A breakthrough approach enhancing the performance of Large Language Models in text classification tasks
- CARP, a novel methodology for text classification, prompts LLMs to first identify superficial clues and then induce a diagnostic reasoning process for the final decision, delivering strong results across standard benchmarks.
- kNN demonstration search during in-context learning, powered by a model fine-tuned on the supervised dataset, lets CARP work around the token limit of LLMs while still drawing task-specific evidence from the full labeled dataset.
- Beyond setting new state-of-the-art results on 4 of 5 widely used text classification benchmarks, CARP also shows strong performance in low-resource and domain-adaptation setups, even with just 16 examples per class.
Despite the ongoing advancements in Large Language Models (LLMs) such as GPT-3, their performance on text classification still lags behind fine-tuned models. They struggle to handle complex linguistic phenomena such as intensification, contrast, and irony, and they are further constrained by the limited number of tokens allowed in in-context learning.
To tackle these issues, researchers have introduced a new approach named Clue and Reasoning Prompting (CARP). This method uses a progressive reasoning strategy to improve LLM performance on text classification. First, CARP prompts LLMs to identify superficial clues such as keywords, tones, semantic relations, and references. Based on these clues, a diagnostic reasoning process is induced to reach the final decision.
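To make the idea concrete, the sketch below shows what a clue-then-reasoning prompt in the spirit of CARP could look like for sentiment classification. The wording and the `build_carp_prompt` helper are illustrative assumptions, not the paper's exact template.

```python
# Minimal sketch of a CARP-style two-step prompt (illustrative wording,
# not the paper's exact template). The LLM is asked for surface clues
# first, then for a reasoning step that leads to the final label.

def build_carp_prompt(text: str, demonstrations: str = "") -> str:
    """Assemble a clue-then-reasoning classification prompt for an LLM."""
    return (
        f"{demonstrations}"
        f"Text: {text}\n"
        "Step 1 - Clues: list the keywords, tone, semantic relations, and "
        "references in the text that hint at its sentiment.\n"
        "Step 2 - Reasoning: based on the clues, reason step by step about "
        "the overall sentiment.\n"
        "Step 3 - Decision: answer with a single label, positive or negative.\n"
    )

print(build_carp_prompt("The plot is thin, yet the performances are electric."))
```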
Furthermore, to work within the token limit of LLMs, CARP uses a model fine-tuned on the supervised dataset to perform k-nearest-neighbors (kNN) demonstration search during in-context learning. This lets the approach combine the generalization ability of LLMs with task-specific evidence drawn from the complete labeled dataset.
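As a rough illustration of the demonstration search, the following sketch retrieves the k labeled examples closest to a query text in embedding space. The `all-MiniLM-L6-v2` encoder and the scikit-learn index are stand-ins for the paper's fine-tuned supervised model, chosen here only to keep the example self-contained.

```python
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import NearestNeighbors

labeled_texts = [
    "a gorgeous, witty, and thoroughly entertaining film",
    "dull, overlong, and painfully predictable",
    "the soundtrack alone is worth the ticket price",
    "a complete waste of two hours",
]
labels = ["positive", "negative", "positive", "negative"]

# Stand-in encoder; the paper instead uses representations from a model
# fine-tuned on the supervised classification set.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
train_vecs = encoder.encode(labeled_texts)

index = NearestNeighbors(metric="cosine").fit(train_vecs)

def retrieve_demonstrations(query: str, k: int = 2):
    """Return the k labeled training examples closest to the query."""
    _, idx = index.kneighbors(encoder.encode([query]), n_neighbors=k)
    return [(labeled_texts[i], labels[i]) for i in idx[0]]

# The retrieved pairs are then formatted as in-context demonstrations.
for text, label in retrieve_demonstrations("the acting is electric but the plot drags"):
    print(f"{label}: {text}")
```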
CARP achieves new state-of-the-art (SOTA) results on 4 of 5 widely used text-classification benchmarks: 97.39 on SST-2, 96.40 on AGNews, 98.78 on R8, and 96.95 on R52, along with performance comparable to SOTA on MR.
In addition to these impressive results, CARP has demonstrated exceptional capabilities in low-resource and domain-adaptation setups. Remarkably, even with just 16 examples per class, CARP can achieve performance levels comparable to supervised models trained with 1,024 examples per class.
The researchers also propose a new evaluation protocol for CARP in which LLMs generate rationale explanations without human editing. The quality of the generated reasoning is assessed on three criteria: reliability, fluency, and logical faithfulness.
To measure reliability, zero-shot GPT-3 is used as a self-critique model to judge whether the generated reasoning process supports the decision for the input text. Fluency is evaluated using perplexity, a reference-free metric for generated text. Logical faithfulness, in turn, is gauged with 16-shot ICL on GPT-3, which checks whether the generated rationale explanations can be inferred from the input text.
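The fluency criterion is the most mechanical of the three, so here is a minimal sketch of a perplexity-based fluency score. Using GPT-2 from the Hugging Face `transformers` library as the scoring model is an assumption made for illustration, not the paper's exact setup.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Stand-in scoring model for the fluency check (an assumption for this sketch).
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Lower perplexity suggests more fluent text under the scoring model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing the ids as labels makes the model return the mean
        # token-level cross-entropy; its exponential is the perplexity.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

rationale = ("The review praises the acting and the score, "
             "so the clues point to a positive sentiment.")
print(f"fluency (perplexity): {perplexity(rationale):.1f}")
```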
With CARP yielding new SOTA performance on numerous text-classification benchmarks and demonstrating strong potential in low-resource and domain-adaptation setups, it marks a significant breakthrough in text classification. Future research will explore its application across more natural language understanding tasks, aiming to push the boundaries of LLM performance further.