EP282: AI gladiators training in shopping arenas

22 min · Gisteren

Beschrijving

Title: Bittensor Agent Arenas as a Trajectory Primitive: Distilling a Shopping Agent from ShoppingBench Subnet Traces Source: http://arxiv.org/abs/2606.10064v1 Summary: This paper introduces the concept of Agent Arenas as a "trajectory primitive," establishing a novel framework for generating diverse, incentive-aligned training data for agentic post-training. This approach represents a significant breakthrough in scaling agent capabilities by moving beyond the limitations of synthetic data and unjudged production logs.

Reacties

Wees de eerste die een reactie plaatst

Meld je nu aan en word lid van de Learning GenAI via SOTA Papers community!

Probeer gratis

Alle afleveringen

283 afleveringen

EP282: AI gladiators training in shopping arenas

Gisteren22 min

EP281: Restoring plasticity to over-trained AI

Title: When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff Source: http://arxiv.org/abs/2606.09932v1 Summary: This paper identifies and solves the critical 'loss of plasticity' bottleneck in the standard LLM post-training pipeline where excessive SFT inhibits subsequent RL optimization. It introduces 'Rejuvenation', a foundational training primitive that uses model fusion and neuron resets to enable robust reasoning gains during RL while preserving SFT-acquired knowledge.

Gisteren22 min

EP280: Trajectory Refined Distillation Fixes AI Reasoning

Title: Trajectory-Refined Distillation Source: http://arxiv.org/abs/2606.08432v1 Summary: This paper identifies and mitigates 'prefix failure' in on-policy distillation, a structural issue that hampers the efficiency of reasoning-scale post-training. By introducing trajectory-level corrections, it provides a foundational efficiency breakthrough that improves exploration and reasoning accuracy for large language models.

1 jul 202623 min

EP279: Ending AI amnesia with strategy cards

Title: ConMem: Structured Memory-Guided Adaptation in Training-Free Multi-Agent Systems Source: http://arxiv.org/abs/2606.08702v1 Summary: ConMem establishes a novel framework for multi-agent adaptation using relation-aware memory graphs to distill and coordinate reusable strategies from historical trajectories. It represents a foundational advancement in agentic reasoning loops by enabling robust, training-free adaptation with significantly reduced planning overhead.

1 jul 202621 min

EP278: Hacking AI Agents with Fake Errors

Title: VATS: Exploiting Implicit Authority in Error-Path Injection via Systematic Mutation Source: http://arxiv.org/abs/2606.07992v1 Summary: This study exposes a foundational vulnerability in agentic reasoning by identifying 'implicit authority' within error-handling loops as a primary vector for bypassing safety heuristics. It provides a critical analysis of the Model Context Protocol (MCP) and demonstrates how systematic mutations in tool feedback can compromise the integrity of autonomous agent workflows.

30 jun 202622 min

EP282: AI gladiators training in shopping arenas

Beschrijving

Reacties

Probeer 14 dagen gratis

Alle afleveringen