AI Post Transformers
This episode explores Google’s Snap system, which moves major host-networking functions out of the kernel and into isolated userspace services while trying to keep the performance benefits usually associated with kernel bypass. It examines why that shift mattered operationally at fleet scale: kernel networking changes could take one to two months to deploy, while Snap enabled roughly weekly releases and had already been adopted across more than half of Google’s machines. The discussion breaks down Snap’s architecture, including centralized host services, microkernel-style isolation, lock-free engine communication, the MicroQuanta scheduler design, latency-sensitive congestion control, and Pony Express as a flagship transport for reliable, asynchronous messaging. The episode frames host networking as a platform-design problem, not just a packet-speed problem, and argues that upgradeability, policy control, and performance can be engineered together rather than traded off.

Sources:
1. Snap's Microkernel Approach to Host Networking. https://storage.googleapis.com/gweb-research2023-media/pubtools/5281.pdf
2. L4 Microkernels: The Lessons from 20 Years of Research and Deployment. Gernot Heiser, Kevin Elphinstone, 2016. https://trustworthy.systems/publications/nicta_full_text/8988.pdf
3. Arrakis: The Operating System is the Control Plane. Simon Peter, Jialin Li, Irene Zhang, Dan R. K. Ports, Doug Woos, Arvind Krishnamurthy, Thomas Anderson, Timothy Roscoe, 2014. https://www.usenix.org/conference/osdi14/technical-sessions/presentation/peter
4. Snap: a Microkernel Approach to Host Networking. Michael Marty, Marc de Kruijf, Jacob Adriaens, Nandita Dukkipati, Amin Vahdat, et al., 2019. https://research.google/pubs/snap-a-microkernel-approach-to-host-networking/
5. netmap: A Novel Framework for Fast Packet I/O. Luigi Rizzo, 2012. https://www.usenix.org/conference/atc12/technical-sessions/presentation/rizzo
6. mTCP: A Highly Scalable User-level TCP Stack for Multicore Systems. EunYoung Jeong, Shinae Woo, Muhammad Jamshed, Haewon Jeong, Sunghwan Ihm, Dongsu Han, KyoungSoo Park, 2014. https://www.usenix.org/system/files/conference/nsdi14/nsdi14-paper-jeong.pdf
7. IX: A Protected Dataplane Operating System for High Throughput and Low Latency. Adam Belay, George Prekas, Ana Klimovic, Samuel Grossman, Christos Kozyrakis, Edouard Bugnion, 2014. https://csl.stanford.edu/~christos/publications/2014.ix.osdi.pdf
8. VL2: A Scalable and Flexible Data Center Network. Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, Dave Maltz, Parveen Patel, Sudipta Sengupta, 2009. https://www.microsoft.com/en-us/research/publication/vl2-a-scalable-and-flexible-data-center-network/
9. Andromeda: Performance, Isolation, and Velocity at Scale in Cloud Network Virtualization. Michael Dalton, David Schultz, Jacob Adriaens, Ahsan Arefin, Anshuman Gupta, Amin Vahdat, et al., 2018. https://www.usenix.org/conference/nsdi18/presentation/dalton
10. Carousel: Scalable Traffic Shaping at End-Hosts. Ahmed Saeed, Nandita Dukkipati, Valas Valancius, Terry Lam, Carlo Contavalli, Amin Vahdat, 2017. https://research.google/pubs/carousel-scalable-traffic-shaping-at-end-hosts/
11. FaRM: Fast Remote Memory. Aleksandar Dragojevic, Dushyanth Narayanan, Orion Hodson, Miguel Castro, 2014. https://www.usenix.org/conference/nsdi14/technical-sessions/dragojevi%C4%87
12. Using RDMA Efficiently for Key-Value Services. Anuj Kalia, Michael Kaminsky, David G. Andersen, 2014. https://www.pdl.cmu.edu/PDL-FTP/Storage/herd-sigcomm2014.pdf
13. Datacenter RPCs can be General and Fast. Anuj Kalia, Michael Kaminsky, David Andersen, 2019. https://www.usenix.org/conference/nsdi19/presentation/kalia
14. Shenango: Achieving High CPU Efficiency for Latency-sensitive Datacenter Workloads. Amy Ousterhout, Joshua Fried, Jonathan Behrens, Adam Belay, Hari Balakrishnan, 2019. https://www.usenix.org/conference/nsdi19/presentation/ousterhout
15. Caladan: Mitigating Interference at Microsecond Timescales. Joshua Fried, Zhenyuan Ruan, Amy Ousterhout, Adam Belay, 2020. https://www.usenix.org/conference/osdi20/presentation/fried
16. TAS: TCP Acceleration as an OS Service. Antoine Kaufmann, Tim Stamler, Simon Peter, Naveen Kr. Sharma, Arvind Krishnamurthy, Thomas Anderson, 2019. https://scholar.google.com/scholar?q=TAS:+TCP+Acceleration+as+an+OS+Service
17. FaSST: Fast, Scalable and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs. Anuj Kalia, Michael Kaminsky, David G. Andersen, 2016. https://scholar.google.com/scholar?q=FaSST:+Fast,+Scalable+and+Simple+Distributed+Transactions+with+Two-Sided+(RDMA)+Datagram+RPCs
18. Implementing Network Protocols at User Level. C. A. Thekkath, T. D. Nguyen, E. Moy, E. D. Lazowska, 1993. https://scholar.google.com/scholar?q=Implementing+Network+Protocols+at+User+Level
19. NetEdit: An Orchestration Platform for eBPF Network Functions at Scale. Theophilus A. Benson et al., 2024. https://doi.org/10.1145/3651890.3672227
20. Demystifying Performance of eBPF Network Applications. Farbod Shahinfar, Sebastiano Miano, Aurojit Panda, Gianni Antichi, 2025. https://cs.nyu.edu/~apanda/assets/papers/conext25.pdf
21. Unleashing Unprivileged eBPF Potential with Dynamic Sandboxing. Soo Yee Lim, Xueyuan Han, Thomas Pasquier, 2023. https://arxiv.org/abs/2308.01983
22. Efficient Scheduler Live Update for Linux Kernel with Modularization. Teng Ma et al., 2023. https://doi.org/10.1145/3582016.3582054
23. Communication Offloading on SmartNIC DPUs: A Quantitative Approach. Jacob Wahlgren et al., 2026. https://arxiv.org/abs/2605.04842