EACL 2026: LLMs Can Hear… But Can They Reason? A New Benchmark for Audio Intelligence

Description

What does it actually mean for a model to understand audio?

Paper: https://arxiv.org/abs/2601.19673

In this episode, I talk with Iwona Christop, a PhD student at Adam Mickiewicz University, about her recent EACL paper introducing ART (Audio Reasoning Tasks), a new benchmark designed to evaluate whether multimodal LLMs can truly reason over audio, not just transcribe or classify it.

Most existing benchmarks test audio skills in isolation (like ASR or classification). But real-world intelligence requires something deeper: combining signals, comparing sounds, tracking context, and making decisions.

This work takes a different approach:
* No text-only shortcuts: tasks can't be solved via transcription alone
* Reasoning-first design: models must combine multiple audio cues
* No expert knowledge required: anyone can verify correctness

We also dive into the diverse task design, including:
* Audio arithmetic (counting and comparing sounds)
* Cross-recording speaker & language identification
* Sound-based reasoning (e.g., inferring properties from audio)
* Speech feature comparison (accents, variations)
* Multimodal reasoning across text and sound

The dataset includes 9 tasks, 9,000 samples, and 30+ hours of audio, all generated in a scalable way using templates and TTS.

👉 If you care about multimodal reasoning, evaluation, or the limits of current LLM capabilities, this conversation is for you.

Iwona Christop: https://www.linkedin.com/in/iwona-christop/

👍 Like & subscribe for more deep dives into cutting-edge AI research
🔔 New episodes from EACL 2026 coming soon

#WiAIR #EACL2026

Can Language Alone Create Intelligence? Insights from Neuroscience and AI, with Dr. Anna Ivanova

Do large language models truly understand language, or are they sophisticated pattern matchers? In this conversation, Dr. Anna Ivanova (Assistant Professor at Georgia Tech) explores one of the most important questions in AI: the relationship between language, thought, and intelligence.

Drawing from neuroscience, cognitive science, and AI research, Anna explains why language understanding is harder to define than most people realize, why reasoning and language are not the same thing, and what today's LLMs can and cannot tell us about human cognition.

Key Topics:
* Do LLMs understand language or merely generate convincing text?
* The difference between formal and functional linguistic competence
* What LLMs can learn from language alone, and what they cannot
* Why human cognition and AI cognition may be fundamentally different
* Theory of mind, reasoning, and common misconceptions about AI capabilities
* How cognitive scientists evaluate the "thinking" abilities of LLMs
* What neuroscience can teach AI researchers about interpretability
* Why understanding AI requires studying both behavior and internal representations
* The future of multimodal models and AI cognition

Resources & Links:
* What does it mean to understand language? [https://arxiv.org/abs/2511.19757]
* Dissociating language and thought in large language models [https://static1.squarespace.com/static/64c800a2f333f04f50bf2020/t/66f46f57900cdf4b7060e3ec/1727295320856/Mahowald_Ivanova_et_al_2024_TiCS.pdf]
* How to evaluate the cognitive abilities of LLMs [https://drive.google.com/file/d/1OPCTpkluqs8xY3Bs6_--18eXU9u06oEN/view]
* How Do LLMs Use Their Depth? [https://arxiv.org/abs/2510.18871]
* Tuned Lens [https://github.com/AlignmentResearch/tuned-lens]

Connect with Dr. Anna Ivanova:
* Bluesky [https://bsky.app/profile/neuranna.bsky.social]
* X (Twitter) [https://x.com/neuranna]

🎧 Subscribe to stay updated on new episodes spotlighting brilliant women shaping the future of AI.

Follow WiAIR at:
* LinkedIn [https://www.linkedin.com/company/women-in-ai-research/]
* Bluesky [https://bsky.app/profile/wiair.bsky.social]
* X (Twitter) [https://x.com/WiAIR_podcast]
* WiAIR website [https://women-in-ai-research.github.io]

Jun 17, 2026 · 1 h 11 min

