Women in AI Research (WiAIR)
What does it actually mean for a model to understand audio?

Paper: https://arxiv.org/abs/2601.19673

In this episode, I talk with Iwona Christop, a PhD student at Adam Mickiewicz University, about her recent EACL paper introducing ART (Audio Reasoning Tasks), a new benchmark designed to evaluate whether multimodal LLMs can truly reason over audio, not just transcribe or classify it.

Most existing benchmarks test audio skills in isolation (like ASR or classification). But real-world intelligence requires something deeper: combining signals, comparing sounds, tracking context, and making decisions.

This work takes a different approach:
* No text-only shortcuts: tasks can't be solved via transcription alone
* Reasoning-first design: models must combine multiple audio cues
* No expert knowledge required: anyone can verify correctness

We also dive into the diverse task design, including:
* Audio arithmetic (counting and comparing sounds)
* Cross-recording speaker & language identification
* Sound-based reasoning (e.g., inferring properties from audio)
* Speech feature comparison (accents, variations)
* Multimodal reasoning across text and sound

The dataset includes 9 tasks, 9,000 samples, and 30+ hours of audio, all generated in a scalable way using templates and TTS.

👉 If you care about multimodal reasoning, evaluation, or the limits of current LLM capabilities, this conversation is for you.

Iwona Christop: https://www.linkedin.com/in/iwona-christop/

👍 Like & subscribe for more deep dives into cutting-edge AI research
🔔 New episodes from EACL 2026 coming soon

#WiAIR #EACL2026
31 episodes