COEY Cast

Open Source Vibe Check with VibeVoice and MOSS Audio

1 h 0 min · 1 May 2026

Description

Microsoft's open-source VibeVoice puts real pressure on audio workflows with multilingual transcription, speaker tracking, timestamps, and long-context processing that can turn recordings into searchable assets. MOSS Audio adds a broader layer of audio understanding, including emotion cues, music recognition, sound-event detection, and time-aware analysis, which could help media teams mine podcasts, calls, ads, and live recordings for actual insight. Then Eva Brain raises a bigger question for marketers: which parts of campaign management can agents really handle, and where do humans still need to lead? The takeaway is simple: the model matters, but the workflow matters more when teams want automation that is useful, reliable, and still grounded in human judgment.

All episodes

180 episodes

Open Source Vibe Check with VibeVoice and MOSS Audio

1 May 2026 · 1 h 0 min

OpenAI GPT 5.5 Ships Quietly, Workflows Loudly

OpenAI dropped GPT 5.5 into the API with a huge context window, stronger reasoning, and deeper tool use, and the bigger story is how fast teams can put it to work. The episode covers why quiet launches matter more than flashy keynotes when marketers, creators, and operators need real workflow gains. It also digs into where automation actually helps first, from research briefs and call-note synthesis to support flows with clean guardrails. ElevenLabs adds voice-agent templates that make testing easier, while MiniMax Music 2.6 lowers the cost of experimenting with AI audio. The throughline is simple: AI is getting less performative and more operational, and the winners will be teams that ship practical systems with humans still making the calls.

28 April 2026 · 1 h 0 min

Audio Flamingo Next and the Rise of Specialist AI

AI is getting less monolithic and more specialized, and that shift matters for anyone building real workflows. OpenAI’s GPT Rosalind signals that domain-specific models are becoming a serious enterprise play. Higgsfield’s sci-fi pilot shows AI video pushing past flashy clips into longer-form storytelling and faster pre-production. NVIDIA’s open Audio Flamingo Next points to practical wins in podcast mining, searchable archives, call review, and media repurposing. The throughline is simple: general models still help orchestrate the stack, but specialist systems are where trust, depth, and format-specific performance start to matter most. The real advantage comes from designing around recurring jobs, not chasing every shiny model release.

19 April 2026 · 1 h 0 min

Microsoft Foundry Gets Voice, Images, and Transcripts

Microsoft just bundled MAI Transcribe 1, MAI Voice 1, and MAI Image 2 into Foundry, giving teams one place to handle transcription, synthetic voice, and image generation inside enterprise workflows. That sounds convenient, and it is, but it also raises the classic question of speed versus lock-in. The conversation also digs into Audio Omni and why unified audio models could become real creative partners for editing, localization, sound design, and campaign iteration. Then it shifts to the less flashy but more important layer of AI adoption: rights, provenance, royalties, and governance. The real advantage is not stacking more models; it is building workflows that stay modular, accountable, and useful when real teams have to ship.

17 April 2026 · 1 h 0 min