Adopting the Trajectory Level Aggregation for Faster Training

Agent Lightning (AGL) Team · Dec. 2025. In the context of multi-turn agent reinforcement learning (RL), data collection relies on rollouts in which an agent interacts with an environment over multiple sequential turns. The strategy used to process these rollouts into training samples is a critical architectural decision that fundamentally impacts both training efficiency and model performance. Agent Lightning currently supports two primary strategies for aggregating these interaction traces: Transition Aggregation and Trajectory Aggregation. ...
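As a rough, hypothetical Python sketch of the distinction the excerpt draws, the snippet below processes the same toy three-turn rollout under both strategies. The data layout, field names, and masking logic are illustrative assumptions, not Agent Lightning's actual API or the post's exact definition.

```python
# Illustrative sketch only: field names and structures are hypothetical.
rollout = [
    # One record per LLM call: (prompt_token_ids, response_token_ids, reward).
    ([1, 2, 3],                   [4, 5], 0.0),
    ([1, 2, 3, 4, 5, 6],          [7, 8], 0.0),
    ([1, 2, 3, 4, 5, 6, 7, 8, 9], [10],   1.0),
]

# Transition aggregation: every LLM call becomes its own training sample.
transition_samples = [
    {"prompt": p, "response": r, "reward": rew} for p, r, rew in rollout
]

# Trajectory aggregation: the whole rollout is merged into one sample, with a
# loss mask selecting only model-generated tokens. This toy example assumes
# each turn's prompt simply extends the previous context.
full_ids, loss_mask = [], []
for prompt, response, _ in rollout:
    new_prompt = prompt[len(full_ids):]  # tokens not yet in the running context
    full_ids += new_prompt + response
    loss_mask += [0] * len(new_prompt) + [1] * len(response)
trajectory_sample = {
    "input_ids": full_ids,
    "loss_mask": loss_mask,
    "reward": rollout[-1][2],
}
```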

December 17, 2025 · 10 min

Tinker X Agent Lightning

Tuning ANY AI Agent with Tinker X Agent Lightning. Yuge Zhang · Nov. 2025. Tinker is the first product built by Thinking Machines Lab, an all-star company whose team members come from leading organizations such as OpenAI. Notable members include former OpenAI CTO Mira Murati; John Schulman, the first author of PPO; Barret Zoph, a leading scientist in AutoML (the area I previously worked in); and well-known Asian researchers such as Danqi Chen and Lilian Weng. ...

November 19, 2025 · 32 min

No More Retokenization Drift

No More Retokenization Drift: Returning Token IDs via OpenAI-Compatible APIs Matters in Agent RL. Agent Lightning (AGL) Team · Oct. 2025. TL;DR: Agents often call LLMs via OpenAI-compatible endpoints, which previously returned only string-based inputs and outputs. In agent RL, this can lead to inconsistencies between training and inference due to a phenomenon we call Retokenization Drift: tokens are detokenized during inference and then retokenized during training, and the two sets of tokens may differ even though their corresponding strings are identical. Now you can ask vLLM's OpenAI-compatible endpoints to return the exact token IDs for both prompts and generated responses. Pass "return_token_ids": true to /v1/chat/completions or /v1/completions and you will receive prompt_token_ids and token_ids alongside the regular text output, so no drift can occur. This pairs perfectly with Agent Lightning, where each model call is treated as a separate update sample without stitching; just log the token IDs returned when return_token_ids is enabled. ...
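As a minimal sketch (not taken from the post), here is how a client might request token IDs from a vLLM OpenAI-compatible server over plain HTTP. The server URL and model name are placeholders, and the exact placement of the returned fields in the JSON is an assumption; the post only states that the flag yields prompt_token_ids and token_ids alongside the text.

```python
# Minimal sketch: ask a vLLM OpenAI-compatible server for exact token IDs.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed local vLLM server
    json={
        "model": "my-model",  # placeholder model name
        "messages": [{"role": "user", "content": "Hello!"}],
        "return_token_ids": True,  # the flag described in the post
    },
).json()

choice = resp["choices"][0]
print(choice["message"]["content"])  # regular text output

# Token IDs returned alongside the text; log these for training instead of
# retokenizing the strings, so training-time tokens match inference exactly.
# Field placement may be on the response or on the choice, hence the fallbacks.
print(resp.get("prompt_token_ids") or choice.get("prompt_token_ids"))
print(choice.get("token_ids") or resp.get("token_ids"))
```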

November 18, 2025 · 8 min