Description

This research introduces a novel synthetic data framework, In-Context Learning with Hypothesis-Class Guidance (ICL-HCG), which integrates an explicit task description, or instruction, in the form of a hypothesis-class prefix to better simulate real-world ICL scenarios. The authors conduct extensive empirical evaluations covering generalization capabilities, model architectures (Transformer and Mamba), and the effect of instructions on performance. Results show that including the hypothesis-class prefix significantly boosts ICL accuracy compared to instruction-free training, highlighting the importance of task descriptions in guiding the model. Both the Transformer and Mamba successfully learn ICL-HCG and generalize to new tasks, although Mamba proves more sample-efficient and superior on out-of-distribution (OOD) hypothesis generalization. Crucially, the study finds that greater pretraining hypothesis diversity substantially improves ICL accuracy when instructions are present.
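
To make the idea of a hypothesis-class prefix concrete, the following is a minimal sketch of how such a synthetic training sequence might be assembled, assuming a toy binary-classification setup over a finite input space. The token format, function names, and class construction here are illustrative assumptions, not the authors' actual data pipeline.

```python
# Minimal sketch (illustrative assumptions, not the paper's code): build one
# ICL-HCG-style sequence = hypothesis-class prefix + in-context labeled examples.
import random


def make_hypothesis_class(num_inputs=4, num_hypotheses=3, seed=0):
    """Sample a toy hypothesis class: each hypothesis maps every input to a binary label."""
    rng = random.Random(seed)
    return [tuple(rng.randint(0, 1) for _ in range(num_inputs))
            for _ in range(num_hypotheses)]


def build_sequence(hypothesis_class, num_examples=5, seed=1):
    """Serialize the class as an instruction prefix, then append examples from one hypothesis."""
    rng = random.Random(seed)
    num_inputs = len(hypothesis_class[0])

    # Instruction prefix: enumerate every hypothesis in the class as "x:y" pairs.
    prefix = []
    for h_idx, labels in enumerate(hypothesis_class):
        prefix.append(f"<h{h_idx}>")
        prefix.extend(f"x{x}:{y}" for x, y in enumerate(labels))
    prefix.append("<sep>")

    # In-context examples drawn from one target hypothesis the model must infer.
    target = rng.choice(hypothesis_class)
    examples = []
    for _ in range(num_examples):
        x = rng.randrange(num_inputs)
        examples.append(f"x{x}={target[x]}")

    return prefix + examples


if __name__ == "__main__":
    hc = make_hypothesis_class()
    print(" ".join(build_sequence(hc)))
```

In this sketch, the prefix plays the role of the explicit instruction: it tells the model which hypotheses are possible before any labeled examples appear, so the model only needs to identify the consistent hypothesis rather than infer the task from scratch. An instruction-free baseline would simply drop the prefix and keep the example portion of the sequence.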