Instance-Optimal Estimation with Multiple LLM Judges on a Budget

21 min · Yesterday

Description

This paper addresses the cost-efficient evaluation of large language models (LLMs) using multiple AI "judges" that differ in price and reliability. The researchers formalize the problem as budgeted heteroskedastic multi-judge estimation: how to distribute a limited budget across judges and tasks so that the resulting quality scores are as accurate as possible. They introduce EST-IVWE, an adaptive algorithm that learns the unknown variances of the judges and assigns resources to those offering the best cost-to-variance trade-off. The authors prove that the approach is instance-optimal, meaning it achieves the best possible accuracy for any specific set of judges and prompts. On the theory side, the paper shows that specialized mathematical arguments are required to capture the geometric structure of this allocation problem. Numerical experiments on synthetic and real-world datasets confirm that the adaptive strategy significantly outperforms uniform budgeting.
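The cost-to-variance trade-off described above can be illustrated with a minimal sketch of plain inverse-variance weighting. This is not the paper's EST-IVWE algorithm: it assumes the judge variances are already known, that judges are unbiased with independent noise, and that a single prompt is being scored. The function names, judge names, and numbers are hypothetical.

```python
from statistics import fmean


def inverse_variance_estimate(scores_by_judge, variances):
    """Combine judge scores for one prompt, weighting each judge by n_k / sigma_k^2.

    Assumption: every judge is unbiased and its noise variance is known.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for judge, scores in scores_by_judge.items():
        if not scores:
            continue  # a judge that was never queried contributes nothing
        weight = len(scores) / variances[judge]
        weighted_sum += weight * fmean(scores)
        total_weight += weight
    return weighted_sum / total_weight


def best_judge_per_unit_cost(costs, variances):
    """Return the judge with the smallest cost * variance product.

    With known variances and one prompt, each unit of budget spent on judge k
    adds 1 / (cost_k * variance_k) to the estimator's precision, so the
    smallest product gives the most precision per unit of budget.
    """
    return min(costs, key=lambda judge: costs[judge] * variances[judge])


# Hypothetical judges: a cheap noisy one and an expensive precise one.
costs = {"cheap": 1.0, "strong": 4.0}
variances = {"cheap": 2.0, "strong": 0.25}
scores = {"cheap": [0.6, 0.8], "strong": [0.7]}

estimate = inverse_variance_estimate(scores, variances)
preferred = best_judge_per_unit_cost(costs, variances)
```

In this toy setting the expensive judge is still preferred, because its cost-variance product (1.0) is lower than the cheap judge's (2.0). The adaptive setting in the paper is harder because the variances are unknown and must be learned while the budget is being spent.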

All episodes

752 episodes

Generative Modeling via Drifting

This paper discusses Drifting Models, a novel generative modeling paradigm that enables high-quality, one-step image generation without the iterative inference required by diffusion or flow-matching models. Instead of decomposing transformations at the sampling stage, this method evolves a pushforward distribution during the training process by utilizing a neural network optimizer. The core mechanism is a drifting field governed by an anti-symmetric property, which uses positive data samples for attraction and generated negative samples for repulsion to achieve a state of equilibrium. This approach minimizes a training-time loss based on the movement of samples, effectively shifting the iterative complexity from the user's inference phase to the model's optimization phase. To handle high-dimensional data like images, the researchers implement the drifting loss within a multi-scale feature space using self-supervised encoders such as latent-MAE. Their results demonstrate state-of-the-art performance on ImageNet 256×256, achieving superior FID scores in both latent and pixel spaces. Furthermore, the model's versatility is highlighted by its success in robotic control tasks, where it matches or exceeds the performance of traditional multi-step diffusion policies.

Yesterday · 21 min

Instance-Optimal Estimation with Multiple LLM Judges on a Budget

Yesterday · 21 min

Robust AI Personalization Will Require a Human Context Protocol

This paper proposes the Human Context Protocol (HCP), a technical framework designed to give individuals direct control over how their personal preferences shape AI interactions. Currently, AI personalization relies on fragmented data silos and behavioral inferences that often fail to reflect a user’s true intent or values. By establishing a user-owned preference layer, the protocol allows people to securely store and share specific subsets of their data across different AI services using natural language. This architecture aims to reduce provider lock-in and ensure that artificial intelligence remains aligned with diverse human perspectives. Ultimately, the authors argue that such a system is a legal and ethical necessity for fostering a competitive, transparent, and truly personalized digital ecosystem.

May 29, 2026 · 22 min

Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning

This paper introduces Equilibrium Reasoners (EqR), a novel framework that conceptualizes iterative AI reasoning as a dynamical system converging toward stable latent attractors. By treating the reasoning process as a series of repeated updates to an internal state, the researchers demonstrate that models can scale performance at test-time by simply increasing the number of iterations (depth) or using multiple random starts (breadth). This approach allows a model trained on only 16 iterations to generalize to over 1,000 steps during inference, effectively unrolling the equivalent of 40,000 neural layers. This "attractor perspective" ensures that as the system reaches a mathematical equilibrium, it simultaneously settles on a correct task solution, resulting in near-perfect accuracy on complex benchmarks like Sudoku-Extreme and Maze-Unique. Ultimately, the research proves that aligning a model's internal landscape with task-specific goals enables adaptive computation, where harder problems receive more processing power to reach a valid conclusion.

May 27, 2026 · 17 min

Position: The Pre/Post-Training Boundary Should Govern IP in Industry–Academia ML Collaborations

This paper proposes a new contractual framework called PBOS to resolve persistent intellectual property conflicts in industry-academia machine learning collaborations. By involving scientists in legal negotiations, the authors suggest a clear division based on the pre/post-training boundary of a model. Under this model, pre-training artifacts such as code and architectures are treated as open science, while post-training weights derived from proprietary data remain protected corporate assets. This approach ensures researchers can fulfill academic publication requirements without compromising a company's competitive advantage. Ultimately, the framework aims to reduce the high transaction costs and legal delays that currently prevent many valuable large-scale research partnerships.

May 25, 2026 · 12 min
