Hugging Face Trending Papers

Podcast by Code Coin Cognition LLC

English

Technology & Science


Stay ahead in AI with Hugging Face Trending Papers, your daily digest of trending AI research. Hosts break down the most talked-about papers in machine learning, LLMs, generative AI, and robotics in just a few minutes. Expect clear, conversational insights into problems, methods, benchmarks, and real-world impact, without jargon overload. Perfect for researchers, engineers, students, and AI enthusiasts.

All episodes

15 episodes

Episode 15: Real-Time AI: Video, Proactive LLMs & Text Structure

This episode explores groundbreaking AI research, featuring Helios, a real-time long video generation model; Proact-VL, a proactive VideoLLM for real-time AI companions; and T2S-Bench & Structure-of-Thought, a new benchmark and prompting technique for text-to-structure reasoning.

### Featured Papers

* **Helios: Real Real-Time Long Video Generation Model**
  * **Key Insight:** Helios is the first 14B video generation model capable of real-time (19.5 FPS) minute-scale video generation on a single H100 GPU, achieving high quality by addressing long-video drifting and optimizing for efficiency.
  * **Paper Link:** [https://arxiv.org/pdf/2603.04379.pdf](https://arxiv.org/pdf/2603.04379.pdf)
* **Proact-VL: A Proactive VideoLLM for Real-Time AI Companions**
  * **Key Insight:** Proact-VL introduces a framework for creating proactive, real-time interactive AI companions, particularly for gaming scenarios such as commentators and guides, by enabling low-latency inference and autonomous decision-making.
  * **Paper Link:** [https://arxiv.org/pdf/2603.03447.pdf](https://arxiv.org/pdf/2603.03447.pdf)
* **T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning**
  * **Key Insight:** This work introduces Structure-of-Thought, a prompting technique that guides models to construct intermediate text structures, and T2S-Bench, the first benchmark designed to evaluate and improve models' text-to-structure reasoning capabilities.
  * **Paper Link:** [https://arxiv.org/pdf/2603.03790.pdf](https://arxiv.org/pdf/2603.03790.pdf)

5 Mar 2026 - 10 min

Episode 14: Revolutionizing Deep Learning: The Rise of CUDA Agent and Agentic RL

# Hugging Face Trending Papers Episode Summary

In this episode, we discuss two trending papers, "Large-Scale Agentic RL for High-Performance CUDA Kernel Generation" and "Language-Agnostic SWE Task Collection at Scale". The first presents CUDA Agent, a large-scale reinforcement learning system that generates high-performance CUDA kernels for deep learning workloads; the second introduces SWE-rebench V2, a language-agnostic, automated pipeline for collecting real-world software engineering tasks to train software engineering agents.

## Papers Discussed

- "Large-Scale Agentic RL for High-Performance CUDA Kernel Generation" introduces CUDA Agent, a system that substantially improves GPU kernel optimization for deep learning by combining scalable data synthesis, skill-augmented CUDA development, and reinforcement learning. The system achieves state-of-the-art results on KernelBench. [Read the paper](https://arxiv.org/pdf/2602.24286)
- "Language-Agnostic SWE Task Collection at Scale" presents SWE-rebench V2, an automated pipeline for collecting real-world software engineering tasks and constructing reinforcement learning training environments at scale. The pipeline has produced a dataset of more than 32,000 tasks spanning 20 programming languages and more than 3,600 repositories. [Read the paper](https://arxiv.org/pdf/2602.23866)

## Additional Links

- Project page for CUDA Agent: [https://cuda-agent.github.io/](https://cuda-agent.github.io/)

Remember to follow or subscribe for the latest in AI research, and stay curious!

5 Mar 2026 - 3 min

Episode 13: Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

**Source:** huggingface_daily

**URL:** https://huggingface.co/papers/2511.14993

**Key Points:**
- Problem: The research addresses the challenges in high-resolution image and video generation, particularly the scalability and computational complexity associa...
- Method: The authors introduce Kandinsky 5.0, a family of foundation models comprising three core variants: Kandinsky 5.0 Image Lite, Kandinsky 5.0 Video Lite,...
- Results: Kandinsky 5.0 achieves state-of-the-art performance in high-resolution image and 10-second video synthesis, demonstrating superior generation quality ...
- Implications: Kandinsky 5.0 has significant implications for the research community by providing an open-source framework that advances the accessibility and develo...

21 Nov 2025 - 2 min

Episode 12: Exploring Next-Gen AI: Interactive Scaling & Video-Based Reasoning

# Episode Summary

In this episode of Hugging Face Trending Papers, we delve into the latest AI research with three top trending papers from arXiv. We explore MiroThinker's interaction scaling for open-source research agents, the new paradigm of "Thinking with Video" for multimodal reasoning, and Lumine's approach to building generalist AI agents for 3D open-world environments.

# Mentioned Papers

1. ["MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling"](https://arxiv.org/pdf/2511.11793) - This paper presents MiroThinker, an open-source research agent that improves tool-augmented reasoning and information-seeking capabilities by focusing on efficient interaction scaling.
2. ["Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm"](https://arxiv.org/pdf/2511.04570) - The authors propose "Thinking with Video," a new paradigm that uses video generation models to bridge visual and textual reasoning, overcoming limitations of the current "Thinking with Text" and "Thinking with Images" paradigms.
3. ["Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds"](https://arxiv.org/pdf/2511.08892) - Lumine introduces a recipe for developing AI agents capable of completing complex missions in 3D open-world environments, demonstrating strong zero-shot cross-game generalization.

19 Nov 2025 - 3 min

Episode 11: Unlocking AI Reasoning: Breakthroughs in Looped Language Models

Papers discussed:

1. [Scaling Latent Reasoning via Looped Language Models](https://arxiv.org/pdf/2510.25741): This paper introduces Ouro, a family of pre-trained looped language models that improve reasoning by building it into the pre-training phase. The authors attribute the models' stronger performance to enhanced knowledge-manipulation capabilities.
2. [Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations](https://arxiv.org/pdf/2510.23607): Concerto combines 2D and 3D learning to improve spatial cognition in AI. By pairing 3D intra-modal self-distillation with 2D-3D cross-modal joint embedding, it shows promising results in 3D scene perception and sets new state-of-the-art results on scene-understanding benchmarks.
3. [RECODE: Unify Plan and Action for Universal Granularity Control](https://arxiv.org/pdf/2510.23564): RECODE is a new paradigm that unifies planning and action within a single code representation, allowing the granularity of decisions to be controlled dynamically. The authors report gains in both inference performance and training-data efficiency.

2 Nov 2025 - 5 min
