Description

At inference time, large language models can perform new tasks without weight updates through in-context learning, conditioning on zero, one, or a few examples supplied in the prompt. They can also be grounded with Retrieval-Augmented Generation (RAG): documents are embedded into a vector database, and at query time the most relevant passages, typically ranked by cosine similarity, are retrieved and inserted into the prompt for real-time factual lookup (a retrieval sketch follows below). LLM agents go further, autonomously planning, acting, and calling external tools inside orchestrated loops with persistent memory. Recent benchmarks such as GPQA (graduate-level STEM reasoning), SWE-bench (agentic coding), and MMMU (multimodal college-level tasks) measure this progress, while high-impact prompt engineering techniques include chain-of-thought reasoning, structured few-shot prompts, positive instruction framing, and iterative self-correction (see the prompt-building sketch further below).
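
Below is a minimal sketch of the retrieval step just described, assuming NumPy is available. The embed() function is a hypothetical stand-in for a real embedding model (its pseudo-random vectors carry no semantic meaning), and the in-memory list of vectors stands in for a vector database.

```python
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    # Hypothetical placeholder: a pseudo-random vector seeded by the
    # text's hash. A real RAG system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(dim)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: dot product divided by the product of norms.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

documents = [
    "GPQA targets graduate-level STEM reasoning questions.",
    "SWE-bench evaluates agentic coding on real software issues.",
    "MMMU covers multimodal, college-level tasks.",
]
doc_vectors = [embed(d) for d in documents]  # stands in for a vector DB

query = "Which benchmark tests coding agents?"
query_vector = embed(query)

# Rank documents by similarity to the query, keep the top k, and
# splice them into the prompt to ground the model's answer.
scores = [cosine_similarity(query_vector, v) for v in doc_vectors]
top_k = sorted(zip(scores, documents), reverse=True)[:2]
context = "\n".join(doc for _, doc in top_k)
prompt = f"Answer using only this context:\n{context}\n\nQ: {query}\nA:"
print(prompt)
```

With the placeholder embed(), the retrieved passages are arbitrary; the point is the shape of the pipeline (embed, rank by cosine similarity, splice retrieved text into the prompt), not the ranking itself.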

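And a minimal sketch of a structured few-shot prompt with chain-of-thought, under the same caveat: the exemplars and wording are illustrative, not a prescribed template. Each exemplar pairs a question with worked reasoning before its answer, and the instruction is framed positively ("show your reasoning") rather than as a list of prohibitions.

```python
FEW_SHOT_EXAMPLES = [
    {
        "question": "A train travels 120 km in 2 hours. What is its speed?",
        "reasoning": "Speed = distance / time = 120 km / 2 h = 60 km/h.",
        "answer": "60 km/h",
    },
    {
        "question": "If 3 pens cost $6, how much do 5 pens cost?",
        "reasoning": "One pen costs $6 / 3 = $2, so 5 pens cost 5 * $2 = $10.",
        "answer": "$10",
    },
]

def build_prompt(task: str) -> str:
    # Positive instruction framing, then worked exemplars (few-shot +
    # chain-of-thought), then the new task ending at "Reasoning:" so
    # the model continues with its own step-by-step work.
    parts = [
        "Solve the problem. Show your reasoning step by step, "
        "then state the final answer."
    ]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(
            f"Q: {ex['question']}\nReasoning: {ex['reasoning']}\nA: {ex['answer']}"
        )
    parts.append(f"Q: {task}\nReasoning:")
    return "\n\n".join(parts)

print(build_prompt("A car uses 8 liters of fuel per 100 km. How many liters for 250 km?"))
```
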
Links

In-Context Learning (ICL)

Retrieval Augmented Generation (RAG) and Grounding

LLM Agents

Multimodal Large Language Models (MLLMs)

Advanced LLM Architectures and Training Directions

Evaluation Benchmarks (as of 2025)

Prompt Engineering: High-Impact Techniques

Trends and Research Outlook