Hugging Face Trending Papers

Podcast by Code Coin Cognition LLC

English

Science & Technology


More Hugging Face Trending Papers

Stay ahead in AI with Hugging Face Trending Papers, your daily digest of trending AI research. Hosts break down the most talked-about papers in machine learning, LLMs, generative AI, and robotics in just a few minutes. Clear, conversational insights on problems, methods, benchmarks, and real-world impact, with no jargon overload. Perfect for researchers, engineers, students, and AI enthusiasts.

All episodes

15 episodes

Episode 15: Real-Time AI: Video, Proactive LLMs & Text Structure

This episode explores groundbreaking AI research, featuring Helios, a real-time long video generation model; Proact-VL, a proactive VideoLLM for real-time AI companions; and T2S-Bench & Structure-of-Thought, a new benchmark and prompting technique for text-to-structure reasoning.

### Featured Papers

* **Helios: Real Real-Time Long Video Generation Model**
  * **Key Insight:** Helios is the first 14B video generation model capable of real-time (19.5 FPS) minute-scale video generation on a single H100 GPU, achieving high quality by addressing long-video drifting and optimizing for efficiency.
  * **Paper Link:** [https://arxiv.org/pdf/2603.04379.pdf](https://arxiv.org/pdf/2603.04379.pdf)
* **Proact-VL: A Proactive VideoLLM for Real-Time AI Companions**
  * **Key Insight:** Proact-VL introduces a framework for creating proactive, real-time interactive AI companions, particularly for gaming scenarios like commentators and guides, by enabling low-latency inference and autonomous decision-making.
  * **Paper Link:** [https://arxiv.org/pdf/2603.03447.pdf](https://arxiv.org/pdf/2603.03447.pdf)
* **T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning**
  * **Key Insight:** This work introduces Structure-of-Thought, a prompting technique that guides models to construct intermediate text structures, and T2S-Bench, the first benchmark designed to evaluate and improve models' text-to-structure reasoning capabilities.
  * **Paper Link:** [https://arxiv.org/pdf/2603.03790.pdf](https://arxiv.org/pdf/2603.03790.pdf)

March 5, 2026 - 10 min

Episode 14: Revolutionizing Deep Learning: The Rise of CUDA Agent and Agentic RL

# Hugging Face Trending Papers Episode Summary

In this episode, we discuss two trending papers, "Large-Scale Agentic RL for High-Performance CUDA Kernel Generation" and "Language-Agnostic SWE Task Collection at Scale". The first paper presents CUDA Agent, a large-scale reinforcement learning system that optimizes GPU kernels for deep learning, and the second introduces SWE-rebench V2, a language-agnostic, automated pipeline for collecting real-world software engineering tasks for training software engineering agents.

## Papers Discussed

- "Large-Scale Agentic RL for High-Performance CUDA Kernel Generation" introduces CUDA Agent, a system that fundamentally improves GPU optimization ability for deep learning using scalable data synthesis, skill-augmented CUDA development, and reinforcement learning techniques. The system achieves state-of-the-art results on KernelBench. [Read the paper](https://arxiv.org/pdf/2602.24286)
- "Language-Agnostic SWE Task Collection at Scale" presents SWE-rebench V2, an automated pipeline for collecting real-world software engineering tasks and constructing reinforcement learning training environments at scale. The pipeline has constructed a dataset of 32,000+ tasks spanning 20 languages and 3,600+ repositories. [Read the paper](https://arxiv.org/pdf/2602.23866)

## Additional Links

- Project page for CUDA Agent: [https://cuda-agent.github.io/](https://cuda-agent.github.io/)

Remember to follow or subscribe for the latest in AI research, and stay curious!

March 5, 2026 - 3 min

Episode 13: Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

**Source:** huggingface_daily

**URL:** https://huggingface.co/papers/2511.14993

**Key Points:**

- Problem: The research addresses the challenges in high-resolution image and video generation, particularly the scalability and computational complexity associa...
- Method: The authors introduce Kandinsky 5.0, a family of foundation models comprising three core variants: Kandinsky 5.0 Image Lite, Kandinsky 5.0 Video Lite,...
- Results: Kandinsky 5.0 achieves state-of-the-art performance in high-resolution image and 10-second video synthesis, demonstrating superior generation quality ...
- Implications: Kandinsky 5.0 has significant implications for the research community by providing an open-source framework that advances the accessibility and develo...

Nov. 21, 2025 - 2 min

Episode 12: Exploring Next-Gen AI: Interactive Scaling & Video-Based Reasoning

# Episode Summary

In this episode of Hugging Face Trending Papers, we delve into the latest AI research with three top trending papers from arXiv. We explore MiroThinker's interaction scaling for open-source research agents, the new paradigm of "Thinking with Video" for multimodal reasoning, and Lumine's approach to building generalist AI agents for 3D open-world environments.

# Mentioned Papers

1. ["MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling"](https://arxiv.org/pdf/2511.11793) - This paper presents MiroThinker, an open-source research agent that improves tool-augmented reasoning and information-seeking capabilities by focusing on efficient interaction scaling.
2. ["Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm"](https://arxiv.org/pdf/2511.04570) - The authors propose "Thinking with Video," a new paradigm that uses video generation models to bridge visual and textual reasoning, overcoming limitations of current "Thinking with Text" and "Thinking with Images" paradigms.
3. ["Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds"](https://arxiv.org/pdf/2511.08892) - Lumine introduces a recipe for developing AI agents capable of completing complex missions in 3D open-world environments, demonstrating strong zero-shot cross-game generalization.

Nov. 19, 2025 - 3 min

Episode 11: Unlocking AI Reasoning: Breakthroughs in Looped Language Models

Papers discussed:

1. [Scaling Latent Reasoning via Looped Language Models](https://arxiv.org/pdf/2510.25741): This paper introduces Ouro, a new family of pre-trained looped language models that improve reasoning capabilities by integrating reasoning into the pre-training phase. The models have demonstrated superior performance due to enhanced knowledge manipulation capabilities.
2. [Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations](https://arxiv.org/pdf/2510.23607): The Concerto model combines 2D and 3D learning for improved spatial cognition in AI. This integration, involving 3D intra-modal self-distillation with 2D-3D cross-modal joint embedding, has yielded promising results in 3D scene perception and set new benchmarks in scene understanding.
3. [RECODE: Unify Plan and Action for Universal Granularity Control](https://arxiv.org/pdf/2510.23564): RECODE is a new paradigm that unifies planning and action within a single code representation, facilitating dynamic control of decision granularity. This approach has proven effective in enhancing inference performance and training data efficiency.

Nov. 2, 2025 - 5 min
