MEMO: Memory as a Model

17 min · May 24, 2026

Description

MEMO (Memory as a Model) is a modular framework designed to integrate new, domain-specific knowledge into Large Language Models (LLMs) without expensive retraining. It encodes the new information into a dedicated, smaller MEMORY model while keeping the primary EXECUTIVE model frozen, so the system avoids catastrophic forgetting and remains compatible with proprietary, closed-source models. A five-step data synthesis pipeline converts raw documents into a structured question-answer dataset of "reflections" that capture complex, cross-document relationships. At inference, the EXECUTIVE model retrieves information from the MEMORY model through a structured multi-turn protocol, decomposing difficult queries into targeted sub-questions. Empirical results across multiple benchmarks indicate that MEMO is more robust to retrieval noise than standard methods and achieves superior performance by leveraging internalized parametric knowledge. The framework also supports continual knowledge integration through model merging, so new data can be added efficiently while retrieval cost stays independent of the overall corpus size.
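
The multi-turn retrieval protocol described above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration, not the framework's actual interface: `answer_with_memory`, the three callables (`next_question`, `recall`, `compose`), the `max_turns` cap, and the toy facts are all hypothetical. The sketch only shows the general shape of the idea: a frozen EXECUTIVE asks the MEMORY model one targeted sub-question per turn, and each new sub-question can depend on earlier answers.

```python
from typing import Callable, List, Optional, Tuple

Notes = List[Tuple[str, str]]  # (sub-question, MEMORY answer) pairs collected so far


def answer_with_memory(
    query: str,
    next_question: Callable[[str, Notes], Optional[str]],  # hypothetical EXECUTIVE step
    recall: Callable[[str], str],                           # hypothetical MEMORY lookup
    compose: Callable[[str, Notes], str],                   # hypothetical EXECUTIVE final step
    max_turns: int = 4,                                     # assumed cap on the number of turns
) -> str:
    """Ask MEMORY one sub-question per turn, then compose a final answer."""
    notes: Notes = []
    for _ in range(max_turns):
        sub_question = next_question(query, notes)
        if sub_question is None:  # EXECUTIVE decides it has enough information
            break
        notes.append((sub_question, recall(sub_question)))
    return compose(query, notes)


# Toy stand-ins so the sketch runs without any model.
FACTS = {"Who wrote report A?": "Ada", "Where does Ada work?": "Lab X"}


def toy_next_question(query: str, notes: Notes) -> Optional[str]:
    if not notes:
        return "Who wrote report A?"
    if len(notes) == 1:
        # The second sub-question is built from the first answer.
        return f"Where does {notes[0][1]} work?"
    return None


def toy_recall(sub_question: str) -> str:
    return FACTS.get(sub_question, "unknown")


def toy_compose(query: str, notes: Notes) -> str:
    # Use the last MEMORY answer as the final answer.
    return notes[-1][1] if notes else "unknown"


result = answer_with_memory(
    "Where does the author of report A work?",
    toy_next_question,
    toy_recall,
    toy_compose,
)
print(result)
```

In this sketch the EXECUTIVE only ever sees short question-answer pairs, never the underlying documents, which is one way to read the claim that retrieval cost does not grow with corpus size.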

All episodes

754 episodes

Meta-Harness: End-to-End Optimization of Model Harnesses

This paper introduces Meta-Harness, an innovative system designed to automate harness engineering for large language models. Unlike traditional methods that rely on manual coding or compressed feedback, this system uses an agentic proposer to search through and optimize the code that governs how models store, retrieve, and process information. By utilizing a filesystem to access full execution traces and prior performance logs, the proposer can perform targeted edits and sophisticated program rewrites. Experimental results demonstrate that Meta-Harness outperforms human-engineered baselines and existing text optimizers across diverse tasks, including text classification, mathematical reasoning, and agentic coding. Ultimately, the research shows that providing automated agents with unfiltered access to historical experience enables the discovery of highly efficient, high-performance system architectures.

Yesterday · 17 min

Self-Improving Language Models with Bidirectional Evolutionary Search

Researchers have developed Bidirectional Evolutionary Search (BES) to overcome the limitations of standard language model sampling, which often struggles with sparse feedback and predictable outputs. While traditional methods like tree search are confined to a narrow "entropy shell" of high-probability responses, BES escapes this range by using evolutionary operators such as crossover and translocation to recombine successful segments from different trajectories. Simultaneously, a backward search process decomposes complex goals into manageable sub-goals, providing the dense feedback necessary to guide the forward search. Theoretical analysis demonstrates that this dual approach can exponentially reduce the number of samples required to solve difficult reasoning problems. Experimental results confirm that BES significantly improves performance in both model training and real-time inference across logical, mathematical, and agentic tasks. By integrating genetic algorithms with goal decomposition, the framework enables models to discover novel, high-quality solutions that standard autoregressive generation would likely miss.

June 1, 2026 · 20 min

Generative Modeling via Drifting

This paper discusses Drifting Models, a novel generative modeling paradigm that enables high-quality, one-step image generation without the iterative inference required by diffusion or flow-matching models. Instead of decomposing transformations at the sampling stage, this method evolves a pushforward distribution during the training process by utilizing a neural network optimizer. The core mechanism is a drifting field governed by an anti-symmetric property, which uses positive data samples for attraction and generated negative samples for repulsion to achieve a state of equilibrium. This approach minimizes a training-time loss based on the movement of samples, effectively shifting the iterative complexity from the user's inference phase to the model's optimization phase. To handle high-dimensional data like images, the researchers implement the drifting loss within a multi-scale feature space using self-supervised encoders such as latent-MAE. Their results demonstrate state-of-the-art performance on ImageNet 256×256, achieving superior FID scores in both latent and pixel spaces. Furthermore, the model's versatility is highlighted by its success in robotic control tasks, where it matches or exceeds the performance of traditional multi-step diffusion policies.

May 31, 2026 · 21 min

Instance-Optimal Estimation with Multiple LLM Judges on a Budget

This paper addresses the cost-efficient evaluation of large language models (LLMs) by utilizing multiple AI "judges" with different price points and reliability levels. The researchers formalize this challenge as budgeted heteroskedastic multi-judge estimation, seeking an optimal way to distribute a limited budget across various judges and tasks to achieve the most accurate quality scores. They introduce EST-IVWE, an adaptive algorithm that learns the unknown variances of different judges and assigns resources to those providing the best cost-to-variance trade-off. Through rigorous proofs, the authors demonstrate that their approach is instance-optimal, meaning it achieves the best possible accuracy for any specific set of judges and prompts. Furthermore, the paper provides a theoretical breakthrough by showing that specialized mathematical arguments are required to capture the true geometric structure of this allocation problem. Numerical experiments on synthetic and real-world datasets confirm that this adaptive strategy significantly outperforms simple uniform budgeting.

May 31, 2026 · 21 min

Robust AI Personalization Will Require a Human Context Protocol

This paper proposes the Human Context Protocol (HCP), a technical framework designed to give individuals direct control over how their personal preferences shape AI interactions. Currently, AI personalization relies on fragmented data silos and behavioral inferences that often fail to reflect a user’s true intent or values. By establishing a user-owned preference layer, the protocol allows people to securely store and share specific subsets of their data across different AI services using natural language. This architecture aims to reduce provider lock-in and ensure that artificial intelligence remains aligned with diverse human perspectives. Ultimately, the authors argue that such a system is a legal and ethical necessity for fostering a competitive, transparent, and truly personalized digital ecosystem.

May 29, 2026 · 22 min
