<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Agent on Agent Lightning ⚡ Blog</title><link>https://agent-lightning.github.io/tags/agent/</link><description>Recent content in Agent on Agent Lightning ⚡ Blog</description><generator>Hugo -- 0.152.2</generator><language>en-us</language><lastBuildDate>Fri, 13 Feb 2026 00:00:00 +0800</lastBuildDate><atom:link href="https://agent-lightning.github.io/tags/agent/index.xml" rel="self" type="application/rss+xml"/><item><title>(ICLR26) Training Memory-Augmented LLM Agent via Online Self-Distillation</title><link>https://agent-lightning.github.io/posts/empo2/</link><pubDate>Fri, 13 Feb 2026 00:00:00 +0800</pubDate><guid>https://agent-lightning.github.io/posts/empo2/</guid><description>&lt;h1 id="empo-exploratory-memory-augmented-llm-agent-via-hybrid-on--and-off-policy-optimization"&gt;EMPO²: Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization&lt;/h1&gt;
&lt;p&gt;&lt;em&gt;Zeyuan Liu¹*, Jeonghye Kim¹˒²*, Xufang Luo¹†, Dongsheng Li¹, Yuqing Yang¹&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;small style="color:#64748b;"&gt;Microsoft Research¹ · KAIST² · ICLR 2026&lt;/small&gt;&lt;br&gt;
&lt;small style="color:#94a3b8; font-size:0.8em;"&gt;* Equal contribution; work done during an internship at Microsoft Research | † Corresponding author&lt;/small&gt;&lt;/p&gt;
&lt;p&gt;📄 &lt;a href="https://openreview.net/pdf/c3f914c63072858c90376dcdf90ee00023322f05.pdf" target="_blank"&gt;Paper&lt;/a&gt; · 💻 &lt;a href="https://github.com/microsoft/agent-lightning/tree/main/contrib/recipes/envs" target="_blank"&gt;Code&lt;/a&gt; · 📝 &lt;a href="https://openreview.net/forum?id=UOzxviKVFO" target="_blank"&gt;OpenReview&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img loading="lazy" src="https://agent-lightning.github.io/posts/empo2/images/empo2_gif.gif"&gt;&lt;/p&gt;
&lt;p&gt;Existing LLM-based agents rely heavily on prior knowledge and therefore struggle to learn in environments that require discovering and exploring novel states. To address this limitation, we propose EMPO², a reinforcement learning framework that promotes exploration through memory and combines on- and off-policy optimization, improving generalization without relying on memory at inference time.&lt;/p&gt;</description></item></channel></rss>