Shared Everything

How TACC Pushed Supercomputing Toward an IO-First Architecture

37 min · Dec 4, 2025

Description

On today's episode of the Shared Everything podcast, Nicole is live at SC25 with Dan Stanzione, Executive Director of the Texas Advanced Computing Center (TACC), to discuss why Horizon required a fundamental architectural reset. Stanzione explains how rising GPU power densities, liquid-cooled 20 megawatt racks, and an increasingly irregular IO profile forced TACC to abandon long-held assumptions about parallel filesystems. After years of watching billions of tiny files, unpredictable 4k and 64k reads, and metadata stalls slow entire machines, TACC moved to an all-solid-state tier and a VAST global namespace built for resilience, consistency, and shared access at scale. He describes how this model simplifies AI and hybrid scientific workflows, why the file system has always been the real point of failure, and how Horizon’s architecture reflects a world where IO, not FLOPS, determines what large-scale science can do next.

