Datacast
SHOW NOTES
* (01:58) Vinoth shared his college experience studying IT at the Madras Institute of Technology [https://en.wikipedia.org/wiki/Madras_Institute_of_Technology] in Chennai, India.
* (07:09) Vinoth reflected on his time at UT Austin, getting a Master's degree in Computer Science [https://www.cs.utexas.edu/], where he did research on high-bandwidth content distribution [https://dl.acm.org/doi/10.1145/1921168.1921199] and large-scale parallel processing with shell pipes [http://dl.acm.org/citation.cfm?id=1645175].
* (11:20) Vinoth recalled his two years as a software engineer at Oracle [https://www.oracle.com/], working on their database replication engine, HPC, and stream processing.
* (15:30) Vinoth walked through his transition to LinkedIn [https://www.linkedin.com/] as a senior software engineer, working primarily on Voldemort [https://github.com/voldemort/voldemort], a key-value store that handles a big chunk of traffic on LinkedIn and serves thousands of requests per second over terabytes of data.
* (24:41) Vinoth talked about his career transition to Uber [https://eng.uber.com/] in late 2014 as a founding engineer on Uber's data team and architect of Uber's data architecture.
* (28:39) Vinoth reflected on the state of Uber's data infrastructure when he joined.
* (34:31) Vinoth elaborated on Uber's case for incremental processing on Hadoop [https://www.oreilly.com/content/ubers-case-for-incremental-processing-on-hadoop/].
* (38:53) Vinoth reviewed the initial design and implementation of Hudi [https://www.datacouncil.ai/talks/hoodie-an-open-source-incremental-processing-framework-from-uber] across the Hadoop ecosystem at Uber in 2016.
* (41:33) Vinoth shared the evolution of Hudi [https://docs.google.com/presentation/d/1FHhsvh70ZP6xXlHdVsAI0g__B_6Mpto5KQFlZ0b8-mM/edit] after it was initially open-sourced by Uber in 2017 and eventually incubated into the Apache Software Foundation [https://www.apache.org/] in 2019.
* (46:49) Vinoth explained how to keep the development of Apache Hudi vendor-neutral.
* (49:36) Vinoth provided lessons learned about establishing standards for open-source data projects [https://twitter.com/byte_array/status/1505967430769659907].
* (53:45) Vinoth went over the valuable leadership lessons [https://twitter.com/byte_array/status/1293588493621321729] that he absorbed throughout his 4.5 years at Uber.
* (57:17) Vinoth reflected on his 1.5 years as a principal engineer at Confluent [https://www.confluent.io/] working on ksqlDB [http://ksqldb.io/], which makes it easy to create event streaming applications.
* (01:02:16) Vinoth articulated the vision for Apache Hudi as a Streaming Data Lake platform [https://hudi.apache.org/blog/2021/07/21/streaming-data-lake-platform].
* (01:08:00) Vinoth highlighted the challenges with databases around indexing and concurrency control.
* (01:11:37) Vinoth shared the unique challenges around prioritizing the Hudi roadmap [https://hudi.apache.org/roadmap] and engaging an open-source community [https://twitter.com/byte_array/status/1448881916610969605].
* (01:16:32) Vinoth shared the founding story of Onehouse [https://www.onehouse.ai/], a cloud-native, fully-managed lakehouse service built on Apache Hudi.
* (01:22:02) Vinoth emphasized Onehouse's commitment towards openness [https://www.onehouse.ai/blog/onehouse-commitment-to-openness].
* (01:24:36) Vinoth shared valuable hiring lessons to attract the right people who are excited about Onehouse's mission.
* (01:26:40) Vinoth shared fundraising advice to founders who are seeking the right investors for their startups.
* (01:28:24) Closing segment.
VINOTH'S CONTACT INFO
* LinkedIn [https://www.linkedin.com/in/vinothchandar/]
* Twitter [https://twitter.com/byte_array]

ONEHOUSE'S RESOURCES
* Website [https://www.onehouse.ai/] | Twitter [https://twitter.com/Onehousehq] | LinkedIn [https://www.linkedin.com/company/onehousehq]
* About [https://www.onehouse.ai/about-us] | Product [https://www.onehouse.ai/product] | Blog [https://www.onehouse.ai/blog] | Careers [https://jobs.lever.co/Onehouse]

APACHE HUDI'S RESOURCES
* User Docs [https://hudi.apache.org] | Technical Wiki [https://cwiki.apache.org/confluence/display/HUDI] | Roadmap [https://hudi.apache.org/roadmap/]
* GitHub [http://github.com/apache/incubator-hudi/] | Twitter [https://twitter.com/apachehudi] | Slack [https://join.slack.com/t/apache-hudi/shared_invite/zt-1d5zjsfl3-d_TefVaGyvEe16EANrxz6Q]

MENTIONED CONTENT

ARTICLES AND PRESENTATIONS
* Voldemort: Prototype to Production [https://www.slideshare.net/vinothchandar/voldemort-prototype-to-production-nectar-edits] (May 2014)
* Uber's Case for Incremental Processing on Hadoop [https://www.oreilly.com/content/ubers-case-for-incremental-processing-on-hadoop/] (Aug 2016)
* Hoodie: An Open Source Incremental Processing Framework From Uber [https://www.datacouncil.ai/talks/hoodie-an-open-source-incremental-processing-framework-from-uber] (2017)
* The Past, Present, and Future of Efficient Data Lake Architectures [https://docs.google.com/presentation/d/1FHhsvh70ZP6xXlHdVsAI0g__B_6Mpto5KQFlZ0b8-mM/edit#slide=id.p] (2021)
* Highly Available, Fault-Tolerant Pull Queries in ksqlDB [https://www.confluent.io/blog/ksqldb-pull-queries-high-availability/] (May 2020)
* Apache Hudi - The Data Lake Platform [https://hudi.apache.org/blog/2021/07/21/streaming-data-lake-platform/] (July 2021)
* Introducing Onehouse [https://www.onehouse.ai/blog/introducing-onehouse] (Feb 2022)
* Automagic Data Lake Infrastructure [https://www.onehouse.ai/blog/automagic-data-lake-infrastructure] (Feb 2022)
* Onehouse Commitment to Openness [https://www.onehouse.ai/blog/onehouse-commitment-to-openness] (Feb 2022)

PEOPLE
* Leslie Lamport [https://en.wikipedia.org/wiki/Leslie_Lamport]
* Jeff Dean [https://en.wikipedia.org/wiki/Jeff_Dean]
* Michael Stonebraker [https://en.wikipedia.org/wiki/Michael_Stonebraker]

BOOK
* Zero To One [https://www.amazon.com/Zero-One-Notes-Startups-Future/dp/0804139296] (by Peter Thiel)

NOTES
My conversation with Vinoth was recorded back in August 2022. The Onehouse team has had some announcements in 2023 that I recommend looking at:
* The Launch Announcement of Onetable [https://www.onehouse.ai/blog/onetable-hudi-delta-iceberg]
* The $25M Series A Funding Announcement [https://www.onehouse.ai/blog/announcing-our-series-a]
* Onehouse Availability in AWS Marketplace [https://www.onehouse.ai/blog/onehouse-now-available-in-aws-marketplace]
* Onehouse Product Demo on building a data lake for GitHub analytics at scale [https://www.onehouse.ai/blog/onehouse-product-demo-building-data-lake-for-github-analytics-at-scale]
* Walmart's recent study on different open-source data lakehouse formats [https://medium.com/walmartglobaltech/lakehouse-at-fortune-1-scale-480bcb10391b]
* This discussion around the Hudi 1.x vision [https://github.com/apache/hudi/pull/8679]

ABOUT THE SHOW
Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths, from scientists and analysts to founders and investors, to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.

Datacast is produced and edited by James Le.
For inquiries about sponsoring the podcast, email khanhle.1013@gmail.com.

Subscribe by searching for Datacast wherever you get podcasts, or click one of the links below:
* Listen on Spotify [https://open.spotify.com/show/5MlCijZoapALDy0LLxuSY8]
* Listen on Apple Podcasts [https://podcasts.apple.com/us/podcast/datacast/id1481793207]
* Listen on Google Podcasts [https://www.google.com/podcasts?feed=aHR0cHM6Ly9mZWVkcy5zaW1wbGVjYXN0LmNvbS9GN09PYzFmWQ%3D%3D]

If you’re new, see the podcast homepage [https://datacast.simplecast.com/] for the most recent episodes to listen to, or browse the full guest list [https://jameskle.com/podcast-guests].