Auditing Multimodal LLM Raters: Central Tendency Bias in Clinical Ordinal Scoring

1 h 0 min · May 19, 2026

Description

## Episode Summary

In this episode, we cover:

- **Auditing Multimodal LLM Raters: Central Tendency Bias in Clinical Ordinal Scoring** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.16386)
- **Evaluating Cognitive Age Alignment in Interactive AI Agents** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.17894)
- **DexHoldem: Playing Texas Hold'em with Dexterous Embodied System** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.18727)
- **SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.18630)
- **AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.15565)

---

*Sponsored by LimitLess AI*

All episodes

82 episodes

PRISM: A Multi-Dimensional Benchmark for Evaluating LLM Peer Reviewers

## Episode Summary

In this episode, we cover:

- **PRISM: A Multi-Dimensional Benchmark for Evaluating LLM Peer Reviewers** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.26730)
- **DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.30350)
- **CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.24786)
- **Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents** (arXiv) - [Read more](http://arxiv.org/abs/2605.30335v1)
- **Reflective Prompt Tuning through Language Model Function-Calling** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.21781)

---

*Sponsored by LimitLess AI*

Yesterday · 1 h 0 min

PANDO: Efficient Multimodal AI Agents via Online Skill Distillation

## Episode Summary

In this episode, we cover:

- **PANDO: Efficient Multimodal AI Agents via Online Skill Distillation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.24785)
- **YoCausal: How Far is Video Generation from World Model? A Causality Perspective** (arXiv) - [Read more](http://arxiv.org/abs/2605.30346v1)
- **CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.29271)
- **Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.30344)
- **Benchmarking Single-Factor Physical Video-to-Audio Generation** (arXiv) - [Read more](http://arxiv.org/abs/2605.30339v1)

---

*Sponsored by LimitLess AI*

May 30, 2026 · 1 h 0 min

Forecasting Downstream Performance of LLMs With Proxy Metrics

## Episode Summary

In this episode, we cover:

- **Forecasting Downstream Performance of LLMs With Proxy Metrics** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.18607)
- **DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback** (arXiv) - [Read more](http://arxiv.org/abs/2605.22781v1)
- **Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.20244)
- **AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.17602)
- **Forecasting Scientific Progress with Artificial Intelligence** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.22681)

---

*Sponsored by LimitLess AI*

May 24, 2026 · 1 h 0 min

Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators

## Episode Summary

In this episode, we cover:

- **Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.22717)
- **DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback** (arXiv) - [Read more](http://arxiv.org/abs/2605.22781v1)
- **AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.17602)
- **"I didn't Make the Micro Decisions": Measuring, Inducing, and Exposing Goal-Level AI Contributions in Collaboration** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.21363)
- **Forecasting Downstream Performance of LLMs With Proxy Metrics** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.18607)

---

*Sponsored by LimitLess AI*

May 23, 2026 · 1 h 0 min

Efficient Agentic Reasoning Through Self-Regulated Simulative Planning

## Episode Summary

In this episode, we cover:

- **Efficient Agentic Reasoning Through Self-Regulated Simulative Planning** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.22138)
- **AwareVLN: Reasoning with Self-awareness for Vision-Language Navigation** (arXiv) - [Read more](http://arxiv.org/abs/2605.22816v1)
- **Rule2DRC: Benchmarking LLM Agents for DRC Script Synthesis with Execution-Guided Test Generation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.15669)
- **Cambrian-P: Pose-Grounded Video Understanding** (arXiv) - [Read more](http://arxiv.org/abs/2605.22819v1)
- **SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.22668)

---

*Sponsored by LimitLess AI*

May 22, 2026 · 1 h 0 min