Cognition, Contracts, and Compression

Description

Generated with Google NotebookLM.

Episode Description: In this episode, we explore 10 new papers advancing our understanding of how LLMs think, how agents can be trusted, and how systems can scale more efficiently:

* What LLMs really "know" – UCCT proposes a formal theory of cognition in LLMs, arguing that intelligence is emergent and context-triggered, not intrinsic.
* Rethinking RAG – CoCoA and CoCoA-zero show how multi-agent collaboration improves synergy between internal model memory and retrieved context.
* Efficiency, by design – Efficient Agents sheds light on cost/performance trade-offs in agent systems, while Blueprint First separates logic from generation to enable deterministic workflows.
* Contrastive learning, upgraded – Context-Adaptive Multi-Prompt Embedding improves vision-language alignment with adaptive token prompts and diversity constraints.
* Inference-time teaming – CTTS scales up LLM performance via collective test-time scaling, using reward model ensembles and agent collaboration.
* At the edge – A new adaptive agent placement and migration framework uses LLMs and ant colony optimization to meet real-time edge constraints.
* Smarter chains of thought – A step entropy metric allows LLMs to prune redundant reasoning during inference, improving cost-efficiency without sacrificing accuracy.
* Quantization, vision-style – VLMQ brings post-training quantization to Vision-Language Models, optimizing for both modality balance and efficiency.
* Reliable by contract – A Design-by-Contract–inspired layer enables neurosymbolic agents to enforce input-output constraints, offering a formal basis for agent safety.

From the nature of LLM cognition to practical methods for verifiable, scalable deployment, this episode highlights where theory meets engineering and where structure enhances trust.
Sources:

* The Unified Cognitive Consciousness Theory for Language Models (UCCT) [https://arxiv.org/pdf/2506.02139] | HTML [https://arxiv.org/html/2506.02139v4]
* CoCoA: Collaborative Chain-of-Agents for Parametric-Retrieved Knowledge Synergy [https://arxiv.org/pdf/2508.01696] | HTML [https://arxiv.org/html/2508.01696v2]
* Efficient Agents: Building Effective Agents While Reducing Cost [https://arxiv.org/pdf/2508.02694] | HTML [https://arxiv.org/html/2508.02694v1]
* Blueprint First, Model Second: A Framework for Deterministic LLM Workflow [https://arxiv.org/pdf/2508.02721] | HTML [https://arxiv.org/html/2508.02721v1]
* Context-Adaptive Multi-Prompt LLM Embedding for Vision-Language Alignment [https://arxiv.org/pdf/2508.02762] | HTML [https://arxiv.org/html/2508.02762v1]
* CTTS: Collective Test-Time Scaling [https://arxiv.org/pdf/2508.03333] | HTML [https://arxiv.org/html/2508.03333v1]
* Adaptive AI Agent Placement and Migration in Edge Intelligence Systems [https://arxiv.org/pdf/2508.03345] | HTML [https://arxiv.org/html/2508.03345v1]
* Compressing Chain-of-Thought in LLMs via Step Entropy [https://arxiv.org/pdf/2508.03346] | HTML [https://arxiv.org/html/2508.03346v1]
* VLMQ: Efficient Post-Training Quantization for Vision-Language Models [https://arxiv.org/pdf/2508.03351] | HTML [https://arxiv.org/html/2508.03351v1]
* A DbC Inspired Neurosymbolic Layer for Trustworthy Agent Design [https://arxiv.org/pdf/2508.03665] | HTML [https://arxiv.org/html/2508.03665v1]

Planning Agents, Emotional Bias, and Trustworthy Responses

Generated with Google NotebookLM.

This episode dives into 16 cutting-edge papers that reimagine how LLMs plan, adapt, reason, and stay safe doing it:

* Planning meets population play – STRATEGIST lets LLMs refine high-level strategies via text and execute them with Monte Carlo precision, rivaling humans in multi-turn games.
* Does tone steer truth? – A systematic study finds that GPT-4 resists negative prompt bias (until it doesn’t), revealing tone-induced semantic drift and suppressed emotional alignment.
* Geometric insight – Curved Inference tracks how prompts bend the LLM’s residual stream, exposing layers of latent concern and meaning through salience and curvature.
* Smarter retrieval, lighter load – SemRAG blends semantic chunking with knowledge graphs to turbocharge domain-specific RAG without the finetuning tax.
* Visual agents that learn – VizGenie evolves itself through LLM-generated code and VQA, slashing overhead in scientific visualization tasks.
* Tech mapping on autopilot – RATE uses LLMs to extract and validate key tech terms from papers, building networks that outperform BERT-based extractors by 70% F1.
* Trust in high-stakes moments – Some models play it safe; others don’t. Sycophancy, clarifying questions, and activation vectors reveal how cautious AI can be shaped.
* Guardrails, reimagined – OneShield provides a plug-and-play compliance layer to tailor LLM behavior across privacy, ethics, and safety.
* Built-in sabotage defense – SDD defangs malicious fine-tuning by teaching models to answer harmful prompts with elegant irrelevance.
* Wireless compositionality – ContextLoRA and ContextGear let one LLM handle multiple multimodal mobile tasks efficiently, backed by task graphs and fine-tuned adaptation.
* Measuring uncertainty, properly – A Shapley-based metric replaces naive entropy to better predict when LLMs are bluffing.
* Structure for thinking agents – Graph-Augmented LLM Agents use graphs for better planning, tool use, memory, and MAS coordination.
* Due diligence done right – A rigorous RAG evaluation protocol blends human and LLM judgment for statistical reliability, perfect for finance and healthcare use cases.
* RL, no humans required – RLSF lets models learn from their own confidence levels, improving calibration and reasoning without labels or gold data.
* LLMs that plan on phones – MapAgent builds page memory from task traces to navigate mobile UIs with fine-grained, trajectory-aware precision.

These papers showcase a new class of agents: introspective, modular, cautious, and capable of evolving workflows across scientific, mobile, and safety-critical contexts.

Sources:

* https://doi.org/10.48550/arXiv.2408.10635
* https://doi.org/10.48550/arXiv.2507.21083
* https://doi.org/10.48550/arXiv.2507.21107
* https://doi.org/10.48550/arXiv.2507.21110
* https://doi.org/10.48550/arXiv.2507.21124
* https://doi.org/10.48550/arXiv.2507.21125
* https://doi.org/10.48550/arXiv.2507.21132
* https://doi.org/10.48550/arXiv.2507.21170
* https://doi.org/10.48550/arXiv.2507.21182
* https://doi.org/10.48550/arXiv.2507.21199
* https://doi.org/10.48550/arXiv.2507.21406
* https://doi.org/10.48550/arXiv.2507.21407
* https://doi.org/10.48550/arXiv.2507.21753
* https://doi.org/10.48550/arXiv.2507.21931
* https://doi.org/10.48550/arXiv.2507.21953

Jul 30, 2025 | 1 h 16 min
