(ICLR26) Training Memory-Augmented LLM Agent via Online Self-Distillation
📄 Paper · 💻 Code · 📝 OpenReview

Zeyuan Liu¹*, Jeonghye Kim¹˒²*, Xufang Luo¹†, Dongsheng Li¹, Yuqing Yang¹

Microsoft Research¹ · KAIST² · ICLR 2026

*Equal contribution; work done during an internship at Microsoft Research | †Corresponding author

Existing LLM-based agents rely heavily on prior knowledge and thus fail to learn effectively in environments that require discovering and exploring novel states. To address this limitation, we propose a reinforcement learning framework that promotes exploration through memory and combines on- and off-policy optimization to improve generalization without relying on memory at inference time. ...