New study illuminates the dual mechanisms of in-context learning, distinguishing between task recognition and task learning capabilities in large language models.
- The mechanisms of in-context learning (ICL) in large language models (LLMs) can be broken down into two key components: task recognition (TR) and task learning (TL).
- TR allows the model to recognize a task and apply pre-trained priors, while TL allows the model to learn new input-label mappings unseen during pre-training.
- The study found that while even small models can perform TR, only larger models show real proficiency in TL, with performance that improves as more demonstrations are provided.
A recent study investigating the intricacies of in-context learning (ICL) in large language models (LLMs) has highlighted two key mechanisms: task recognition (TR) and task learning (TL). By conducting a series of controlled experiments across several classification datasets and three families of LLMs – GPT-3, LLaMA, and OPT – the researchers were able to distinguish the roles of TR and TL in the ICL process.
TR enables LLMs to identify a task from the demonstrations and apply their pre-existing priors, even in the absence of ground-truth labels. TL, by contrast, is the ability to learn new input-label mappings that were not seen during pre-training. The study showed that non-trivial performance can be achieved through TR alone, but that neither larger models nor additional demonstrations improve this ability.
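To make the distinction concrete, the sketch below shows how demonstration prompts can be constructed under three label settings: gold labels, the RANDOM setting (labels drawn at random, which isolates TR), and the ABSTRACT setting (labels consistently remapped to arbitrary symbols, which isolates TL). The dataset, prompt template, and function names here are illustrative assumptions, not the paper's exact setup.

```python
import random

# Hypothetical sentiment demonstrations; the paper's datasets and
# prompt templates may differ.
demos = [
    ("the film is a delight from start to finish", "positive"),
    ("a tedious, joyless two hours", "negative"),
    ("an instant classic", "positive"),
    ("flat characters and a predictable plot", "negative"),
]
labels = ["positive", "negative"]     # natural-language label space
abstract_labels = ["A", "B"]          # semantically empty replacements

def build_prompt(demos, query, setting="gold", seed=0):
    """Format demonstrations plus a query under one of three label settings:
      gold     - true input-label pairs (TR and TL both available)
      random   - labels drawn uniformly at random (only TR can help)
      abstract - gold labels consistently remapped to arbitrary symbols,
                 so the mapping must be learned in context (isolates TL)
    """
    rng = random.Random(seed)
    lines = []
    for text, gold in demos:
        if setting == "gold":
            label = gold
        elif setting == "random":
            label = rng.choice(labels)
        elif setting == "abstract":
            label = abstract_labels[labels.index(gold)]
        else:
            raise ValueError(f"unknown setting: {setting}")
        lines.append(f"Input: {text}\nLabel: {label}")
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

print(build_prompt(demos, "a moving, beautifully acted drama", setting="abstract"))
```

Under this setup, above-chance accuracy with random labels is evidence of TR, while accuracy gains from consistently remapped abstract labels are evidence of TL.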
In contrast, TL emerges as the model scales. Smaller models were found incapable of performing TL even with additional demonstrations, whereas larger models could use more demonstrations to consistently improve their TL performance.
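This scaling trend could be probed with a sketch like the following, which builds on `build_prompt` above. Here `classify()` is a hypothetical helper that sends the prompt to a model and returns its predicted label string; it stands in for whatever inference API is actually used.

```python
def accuracy_vs_demos(model, demos, eval_set, setting, ks=(2, 4, 8, 16)):
    """Accuracy as the number of in-context demonstrations k grows.

    Per the study's findings, the ABSTRACT curve should rise with k only
    for sufficiently large models (TL emerging with scale), while the
    RANDOM curve stays roughly flat for models of every size (TR improving
    with neither model size nor demonstration count).
    """
    results = {}
    for k in ks:
        correct = 0
        for query, gold in eval_set:
            prompt = build_prompt(demos[:k], query, setting=setting)
            prediction = classify(model, prompt)  # hypothetical model call
            # In the ABSTRACT setting the target is the remapped symbol.
            if setting == "abstract":
                gold = abstract_labels[labels.index(gold)]
            correct += prediction == gold
        results[k] = correct / len(eval_set)
    return results
```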
While previous studies have often treated ICL as a single, undifferentiated ability, this work makes a case for distinguishing between TR and TL. The study contends that even small models can perform TR, but that this capability improves with neither model size nor demonstration count. Conversely, TL emerges only in large models, which can exploit additional demonstrations to enhance it.
The study has its limitations, chiefly its focus on classification tasks, which suit the researchers’ RANDOM and ABSTRACT label settings. More complex NLP tasks, as well as a mechanistic understanding of how models “learn” in context, are left for future work. Even so, the study provides a valuable framework for future ICL research, emphasizing the importance of separating TR from TL and of controlling the conditions under which such experiments are run.