Datacast

Episode 116: Distributed Databases, Open-Source Standards, and Streaming Data Lakehouse with Vinoth Chandar

12 May 2023 · 1 h 32 min


Description

SHOW NOTES
* (01:58) Vinoth shared his college experience studying IT at the Madras Institute of Technology [https://en.wikipedia.org/wiki/Madras_Institute_of_Technology] in Chennai, India.
* (07:09) Vinoth reflected on his time at UT Austin earning a Master's degree in Computer Science [https://www.cs.utexas.edu/], where he did research on high-bandwidth content distribution [https://dl.acm.org/doi/10.1145/1921168.1921199] and large-scale parallel processing with shell pipes [http://dl.acm.org/citation.cfm?id=1645175].
* (11:20) Vinoth recalled his two years as a software engineer at Oracle [https://www.oracle.com/], working on its database replication engine, HPC, and stream processing.
* (15:30) Vinoth walked through his transition to LinkedIn [https://www.linkedin.com/] as a senior software engineer, working primarily on Voldemort [https://github.com/voldemort/voldemort], a key-value store that handles a large share of LinkedIn's traffic and serves thousands of requests per second over terabytes of data.
* (24:41) Vinoth talked about his move to Uber [https://eng.uber.com/] in late 2014 as a founding engineer on Uber's data team and the architect of its data architecture.
* (28:39) Vinoth reflected on the state of Uber's data infrastructure when he joined.
* (34:31) Vinoth elaborated on Uber's case for incremental processing on Hadoop [https://www.oreilly.com/content/ubers-case-for-incremental-processing-on-hadoop/].
* (38:53) Vinoth reviewed the initial design and implementation of Hudi [https://www.datacouncil.ai/talks/hoodie-an-open-source-incremental-processing-framework-from-uber] across the Hadoop ecosystem at Uber in 2016.
* (41:33) Vinoth shared the evolution of Hudi [https://docs.google.com/presentation/d/1FHhsvh70ZP6xXlHdVsAI0g__B_6Mpto5KQFlZ0b8-mM/edit] after Uber open-sourced it in 2017 and it entered incubation at the Apache Software Foundation [https://www.apache.org/] in 2019.
* (46:49) Vinoth explained how to keep the development of Apache Hudi vendor-neutral.
* (49:36) Vinoth provided lessons learned about establishing standards for open-source data projects [https://twitter.com/byte_array/status/1505967430769659907].
* (53:45) Vinoth went over the valuable leadership lessons [https://twitter.com/byte_array/status/1293588493621321729] that he absorbed throughout his 4.5 years at Uber.
* (57:17) Vinoth reflected on his 1.5 years as a principal engineer at Confluent [https://www.confluent.io/] working on ksqlDB [http://ksqldb.io/], which makes it easy to create event streaming applications.
* (01:02:16) Vinoth articulated the vision for Apache Hudi as a Streaming Data Lake platform [https://hudi.apache.org/blog/2021/07/21/streaming-data-lake-platform].
* (01:08:00) Vinoth highlighted the challenges databases face around indexing and concurrency control.
* (01:11:37) Vinoth shared the unique challenges of prioritizing the Hudi roadmap [https://hudi.apache.org/roadmap] and engaging an open-source community [https://twitter.com/byte_array/status/1448881916610969605].
* (01:16:32) Vinoth shared the founding story of Onehouse [https://www.onehouse.ai/], a cloud-native, fully managed lakehouse service built on Apache Hudi.
* (01:22:02) Vinoth emphasized Onehouse's commitment towards openness [https://www.onehouse.ai/blog/onehouse-commitment-to-openness].
* (01:24:36) Vinoth shared valuable hiring lessons for attracting the right people who are excited about Onehouse's mission.
* (01:26:40) Vinoth shared fundraising advice for founders who are seeking the right investors for their startups.
* (01:28:24) Closing segment.

VINOTH'S CONTACT INFO
* LinkedIn [https://www.linkedin.com/in/vinothchandar/]
* Twitter [https://twitter.com/byte_array]

ONEHOUSE'S RESOURCES
* Website [https://www.onehouse.ai/] | Twitter [https://twitter.com/Onehousehq] | LinkedIn [https://www.linkedin.com/company/onehousehq]
* About [https://www.onehouse.ai/about-us] | Product [https://www.onehouse.ai/product] | Blog [https://www.onehouse.ai/blog] | Careers [https://jobs.lever.co/Onehouse]

APACHE HUDI'S RESOURCES
* User Docs [https://hudi.apache.org] | Technical Wiki [https://cwiki.apache.org/confluence/display/HUDI] | Roadmap [https://hudi.apache.org/roadmap/]
* GitHub [http://github.com/apache/incubator-hudi/] | Twitter [https://twitter.com/apachehudi] | Slack [https://join.slack.com/t/apache-hudi/shared_invite/zt-1d5zjsfl3-d_TefVaGyvEe16EANrxz6Q]

MENTIONED CONTENT

ARTICLES AND PRESENTATIONS
* Voldemort: Prototype to Production [https://www.slideshare.net/vinothchandar/voldemort-prototype-to-production-nectar-edits] (May 2014)
* Uber's Case for Incremental Processing on Hadoop [https://www.oreilly.com/content/ubers-case-for-incremental-processing-on-hadoop/] (Aug 2016)
* Hoodie: An Open Source Incremental Processing Framework From Uber [https://www.datacouncil.ai/talks/hoodie-an-open-source-incremental-processing-framework-from-uber] (2017)
* The Past, Present, and Future of Efficient Data Lake Architectures [https://docs.google.com/presentation/d/1FHhsvh70ZP6xXlHdVsAI0g__B_6Mpto5KQFlZ0b8-mM/edit#slide=id.p] (2021)
* Highly Available, Fault-Tolerant Pull Queries in ksqlDB [https://www.confluent.io/blog/ksqldb-pull-queries-high-availability/] (May 2020)
* Apache Hudi - The Data Lake Platform [https://hudi.apache.org/blog/2021/07/21/streaming-data-lake-platform/] (July 2021)
* Introducing Onehouse [https://www.onehouse.ai/blog/introducing-onehouse] (Feb 2022)
* Automagic Data Lake Infrastructure [https://www.onehouse.ai/blog/automagic-data-lake-infrastructure] (Feb 2022)
* Onehouse Commitment to Openness [https://www.onehouse.ai/blog/onehouse-commitment-to-openness] (Feb 2022)

PEOPLE
* Leslie Lamport [https://en.wikipedia.org/wiki/Leslie_Lamport]
* Jeff Dean [https://en.wikipedia.org/wiki/Jeff_Dean]
* Michael Stonebraker [https://en.wikipedia.org/wiki/Michael_Stonebraker]

BOOK
* Zero To One [https://www.amazon.com/Zero-One-Notes-Startups-Future/dp/0804139296] (by Peter Thiel)

NOTES
My conversation with Vinoth was recorded back in August 2022. The Onehouse team has made some announcements in 2023 that I recommend looking at:
* The Launch Announcement of Onetable [https://www.onehouse.ai/blog/onetable-hudi-delta-iceberg]
* The $25M Series A Funding Announcement [https://www.onehouse.ai/blog/announcing-our-series-a]
* Onehouse Availability in AWS Marketplace [https://www.onehouse.ai/blog/onehouse-now-available-in-aws-marketplace]
* Onehouse Product Demo on building a data lake for GitHub analytics at scale [https://www.onehouse.ai/blog/onehouse-product-demo-building-data-lake-for-github-analytics-at-scale]
* Walmart's recent study on different open-source data lakehouse formats [https://medium.com/walmartglobaltech/lakehouse-at-fortune-1-scale-480bcb10391b]
* This discussion around the Hudi 1.x vision [https://github.com/apache/hudi/pull/8679]

ABOUT THE SHOW
Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths, from scientists and analysts to founders and investors, to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.

Datacast is produced and edited by James Le. For inquiries about sponsoring the podcast, email khanhle.1013@gmail.com [khanhle.1013@gmail.com].

Subscribe by searching for Datacast wherever you get podcasts, or click one of the links below:
* Listen on Spotify [https://open.spotify.com/show/5MlCijZoapALDy0LLxuSY8]
* Listen on Apple Podcasts [https://podcasts.apple.com/us/podcast/datacast/id1481793207]
* Listen on Google Podcasts [https://www.google.com/podcasts?feed=aHR0cHM6Ly9mZWVkcy5zaW1wbGVjYXN0LmNvbS9GN09PYzFmWQ%3D%3D]

If you’re new, see the podcast homepage [https://datacast.simplecast.com/] for the most recent episodes to listen to, or browse the full guest list [https://jameskle.com/podcast-guests].


All episodes

116 episodes


Episode 115: Product-Led Sales, Community-Led Category Creation, and Unlocking Revenue Data with Alexa Grabell

SHOW NOTES
* (01:55) Alexa shared formative experiences of her upbringing in Philadelphia.
* (03:47) Alexa reflected on her undergraduate experience at Vanderbilt [http://vanderbilt.edu/] studying Engineering Science [https://engineering.vanderbilt.edu/].
* (05:49) Alexa recalled her first job out of college in management consulting at KPMG [http://www.kpmg.com/US].
* (08:20) Alexa walked through her transition from consulting to technology when she joined the Sales Operations team at Dataminr [http://www.dataminr.com/].
* (12:35) Alexa talked about her proudest accomplishments at Dataminr: seeding the initial idea for Pocus and building a community for women in the workplace.
* (20:23) Alexa reflected on her MBA experience at the Stanford Graduate School of Business [https://www.gsb.stanford.edu/].
* (24:05) Alexa elaborated on the difference in mindset between investing and operating.
* (25:27) Alexa briefly touched on her internship at Monte Carlo [http://montecarlodata.com/].
* (27:58) Alexa shared the founding story of Pocus [https://www.pocus.com/].
* (32:27) Alexa unpacked the concept of Product-Led Sales [https://www.pocus.com/blog/what-is-product-led-sales] as a GTM approach.
* (35:40) Alexa provided two example use cases of Pocus.
* (39:35) Alexa explained the concepts of Product-Qualified Leads [https://www.pocus.com/blog/the-definitive-pql-guide-part-1] and Sales-Assist [https://www.pocus.com/blog/what-is-the-sales-assist-role].
* (42:20) Alexa discussed the long-term vision for Pocus' product roadmap.
* (45:33) Alexa shared valuable hiring lessons for attracting the right people who are aligned with Pocus' values.
* (51:15) Alexa went over the journey of building the Product-Led Sales community [https://www.pocus.com/community].
* (54:54) Alexa shared the unique opportunities of evolving a category, a community, and a product all at once [https://www.linkedin.com/posts/alexagrabell_ive-been-getting-a-lot-of-questions-about-activity-6920036474860048384-a58k].
* (57:56) Alexa shared fundraising advice for founders who are seeking the right investors for their startups.
* (01:01:15) Alexa provided advice to a smart, driven female operator who wants to take the leap of founding her own company.
* (01:03:09) Closing segment.

ALEXA'S CONTACT INFO
* LinkedIn [https://www.linkedin.com/in/alexagrabell/]
* Twitter [https://twitter.com/alexa_grabell]

POCUS' RESOURCES
* Website [https://www.pocus.com/] | Twitter [https://twitter.com/getpocus] | LinkedIn [https://linkedin.com/company/pocus] | YouTube [https://www.youtube.com/channel/UCspjcJqPR6fZ1B3AaG45_YQ/featured]
* About [https://www.pocus.com/about-us] | Product [https://www.pocus.com/product] | Blog [https://www.pocus.com/blog] | Careers [https://www.pocus.com/careers]
* Community [https://www.pocus.com/community] | Newsletter [https://newsletter.pocus.com/]

MENTIONED CONTENT

BLOG POSTS
* What is Product-Led Sales? [https://www.pocus.com/blog/introducing-product-led-sales] (July 2022)
* The Myth of "No Sales" at PLG Companies [https://www.pocus.com/blog/the-myth-of-no-sales-at-product-led-growth-companies] (July 2021)
* When To Add A Sales Team to Your PLG Company [https://www.pocus.com/blog/when-to-add-a-sales-team-to-your-plg-company] (Sep 2021)
* The Definitive PQL Guide: Part 1 [https://www.pocus.com/blog/the-definitive-pql-guide-part-1], Part 2 [https://www.pocus.com/blog/pql-guide-part-2-how-to-develop-your-product-qualified-lead-engine], Part 3 [https://www.pocus.com/blog/pql-guide-part-3-advanced-product-qualified-lead-scoring-concepts] (Nov 2021)
* What Is The Sales-Assist Role? [https://www.pocus.com/blog/what-is-the-sales-assist-role] (Nov 2021)
* Introducing Pocus' PLS Platform [https://www.pocus.com/blog/what-is-the-pocus-product-led-sales-platform] (Nov 2021)
* Product-Led Sales Community Wisdom Highlights 2021 [https://www.pocus.com/blog/product-led-sales-community-wisdom-highlights-2021] (Dec 2021)
* Notes on Community-Led Category Creation with Pocus' Co-Founder, Alexa Grabell [https://gorelay.co/t/notes-on-community-led-category-creation-with-pocus-co-founder-alexa-grabell/692] (Feb 2022)
* Sneak Peek at Pocus' PLS Platform [https://www.pocus.com/blog/sneak-peek-at-pocus-product-led-sales-platform] (March 2022)
* Announcing $23M to Transform How GTM Teams Use Data to Drive Revenue [https://www.pocus.com/blog/pocus-raises-funding-from-coatue-for-product-led-sales-platform] (June 2022)
* Year One: The Product-Led Sales Platform is Here to Stay [https://www.pocus.com/blog/year-one-the-product-led-sales-platform-is-here-to-stay] (July 2022)

PEOPLE
* Kyle Poyar [https://openviewpartners.com/people/kyle-poyar/] (OpenView Ventures)
* Melissa Ross [https://www.linkedin.com/in/melissaclaireross] (Clockwise)
* Aaron Geller [https://www.linkedin.com/in/gelleraaron] (QuickNode)

NOTES
My conversation with Alexa was recorded back in July 2022. The Pocus team has made some announcements in 2023 that I recommend looking at:
1. The launch announcement of Pocus' Revenue Data Platform [https://www.pocus.com/blog/introducing-the-revenue-data-platform]
2. The Product-Led Sales Playbook Volume 2 [https://www.pocus.com/product-led-sales-playbook-vol-2]
3. The Unlocking Revenue podcast [https://www.pocus.com/podcast]
4. The Playbook Library for product-led go-to-market [https://www.pocus.com/playbook-library]

2 May 2023 · 1 h 8 min

Episode 114: Building Data Products and Unlocking Data Insights with Carlos Aguilar

SHOW NOTES
* (02:06) Carlos shared formative experiences of his upbringing tinkering with robots and websites.
* (04:03) Carlos reflected on his education, studying Mechanical and Aerospace Engineering at Cornell University [https://www.cornell.edu/].
* (05:34) Carlos discussed the technical details of his research on machine learning applications in robotics and art.
* (10:11) Carlos explained his work as a robotic system analyst at Kiva Systems [https://en.wikipedia.org/wiki/Amazon_Robotics].
* (15:41) Carlos discussed building his first data product at Kiva [https://trucklos.medium.com/my-hack-for-getting-started-with-data-as-a-product-12f2f19cb62b].
* (20:24) Carlos recalled his stint working on distributed warehouse-automation robots at Amazon Robotics (after the Kiva acquisition [https://techcrunch.com/2012/03/19/amazon-acquires-online-fulfillment-company-kiva-systems-for-775-million-in-cash/]).
* (24:31) Carlos revealed his decision in 2013 to join an early-stage healthcare startup called Flatiron Health [http://www.flatiron.com/] as its first data hire.
* (28:43) Carlos shared his experience building Flatiron's Data Insights team from scratch [https://trucklos.medium.com/how-the-data-insights-team-helps-flatiron-build-useful-data-products-41612cc9df09].
* (31:51) Carlos reviewed different data products built and deployed at Flatiron Health.
* (38:41) Carlos shared the key learnings from hiring for his data team at Flatiron.
* (44:08) Carlos shared the founding story of Glean [https://glean.io/], which is building a new way to make data exploration and visualization accessible to everyone.
* (50:52) Carlos explained the pain points in data visualization/exploration [https://glean.io/blog-posts/introducing-glean] and the product features of Glean [https://glean.io/#Product] that address them.
* (55:03) Carlos dissected Glean DataOps [https://glean.io/data-ops], which brings modern developer workflows to the business intelligence layer and prevents broken dashboards [https://glean.io/blog-posts/your-dashboard-is-probably-broken].
* (59:28) Carlos outlined the long-term product vision for Glean.
* (01:03:11) Carlos shared valuable hiring lessons for attracting the right people who are excited about Glean's mission.
* (01:07:15) Carlos discussed his team's challenges in finding early design partners.
* (01:10:13) Carlos shared fundraising advice for founders who are seeking the right investors for their startups.
* (01:11:57) Closing segment.

CARLOS' CONTACT INFO
* Twitter [https://twitter.com/trucklos]
* LinkedIn [https://www.linkedin.com/in/carlos-aguilar-79448b24/]
* GitHub [https://github.com/trucklos]
* Website [http://carlos.ag]
* Medium [https://trucklos.medium.com]

GLEAN'S RESOURCES
* Website [https://glean.io] | Twitter [https://twitter.com/gleanhq/] | LinkedIn [https://www.linkedin.com/company/gleanhq]
* About [https://glean.io/about-us] | Docs [https://docs.glean.io] | Blog [https://glean.io/blog]
* Interactive Public Demo [https://demo.glean.io/app/] | DataOps [https://glean.io/data-ops]

MENTIONED CONTENT

BLOG POSTS
* How the Data Insights team helps Flatiron build useful data products [https://trucklos.medium.com/how-the-data-insights-team-helps-flatiron-build-useful-data-products-41612cc9df09] (May 2018)
* The biggest mistake making your first data hire: not interviewing for product [https://trucklos.medium.com/the-biggest-mistake-making-your-first-data-hire-not-interviewing-for-product-951ab7374a8a] (July 2020)
* How to interview your first data hire [https://trucklos.medium.com/how-to-interview-your-first-data-hire-fe7c1b5ad37d] (Aug 2020)
* My hack for getting started with data as a product [https://trucklos.medium.com/my-hack-for-getting-started-with-data-as-a-product-12f2f19cb62b] (May 2021)
* Introducing Glean [https://glean.io/blog-posts/introducing-glean] (March 2022)
* Your dashboard is probably broken [https://glean.io/blog-posts/your-dashboard-is-probably-broken] (April 2022)

PEOPLE
1. Vicki Boykis [https://twitter.com/vboykis?lang=en]
2. Anthony Goldbloom [https://twitter.com/antgoldbloom]
3. Wes McKinney [https://wesmckinney.com/]

BOOK
* The Toyota Way: 14 Management Principles from the World's Greatest Manufacturer [https://www.amazon.com/Toyota-Way-Management-Principles-Manufacturer/dp/0071392319] (by Jeffrey Liker)

NOTES
My conversation with Carlos was recorded back in June 2022. The Glean team has made some announcements in 2023 that I recommend looking at:
1. The recently launched, interactive public demo site [https://demo.glean.io/app/]
2. This recent integration with DuckDB [https://glean.io/blog-posts/using-duckdb-for-not-so-big-data-in-glean]
3. This post about Version Control for BI [https://glean.io/blog-posts/how-to-do-version-control-for-business-intelligence]
4. Their Public Roadmap [https://docs.glean.io/product-roadmap/]

19 Apr 2023 · 1 h 18 min

Episode 113: Data Applications, Real-Time Analytics, and Cloud Product Management with Shruti Bhat

SHOW NOTES
* (01:49) Shruti shared her upbringing in India, where she studied Engineering and Computer Science in the early 2000s.
* (03:11) Shruti reflected on her early career as a software engineer at Hewlett-Packard [http://hpe.com/] and IBM [http://www.ibm.com/].
* (07:29) Shruti recalled the early days of cloud computing.
* (09:01) Shruti reflected on her time pursuing an MBA at the UCLA Anderson School of Management [http://anderson.ucla.edu/].
* (11:55) Shruti explained her shift from software engineering to product management.
* (14:19) Shruti revisited her years at VMware [https://www.vmware.com/] as a product line manager for cloud infrastructure, owning all aspects of go-to-market strategy and execution for VMware's entire software-defined storage portfolio.
* (18:30) Shruti talked about her time as the VP of Marketing at Ravello Systems [https://www.crunchbase.com/organization/ravello-systems], growing the business from zero customers to a successful multi-million-dollar acquisition by Oracle.
* (23:07) Shruti went over her time as a senior director of product management for Oracle's [https://www.oracle.com/index.html] cloud portfolio.
* (27:20) Shruti recalled the founding story of Rockset [https://rockset.com/], where she is a co-founder and Chief Product Officer.
* (30:40) Shruti explained the concepts of real-time analytics [https://rockset.com/real-time-analytics-explained/] and data applications [https://rockset.com/what-is-a-data-application/] for the uninitiated.
* (37:31) Shruti unpacked the high-level design of Rockset's architecture [https://rockset.com/whitepapers/rockset-concepts-designs-and-architecture/], which brings together cloud-native architecture, schemaless ingestion, converged indexing, and full-featured SQL.
* (40:23) Shruti elaborated on the concept of converged indexing.
* (42:43) Shruti dissected the technology requirements and the key layers of "the modern real-time data stack [https://thenewstack.io/streaming-data-and-the-modern-real-time-data-stack/]."
* (46:17) Shruti talked about the role of partnerships in Rockset's product strategy.
* (51:29) Shruti highlighted some of Rockset's customer use cases [https://rockset.com/customers].
* (56:06) Shruti shared valuable hiring lessons for attracting high-integrity and diverse people to Rockset.
* (58:51) Shruti shared her take on interviewing for strengths over weaknesses.
* (01:01:32) Shruti shared the strategy Rockset used to find design partners in the early days.
* (01:05:12) Shruti shared tactics for combining product-led adoption with sales-driven growth to rapidly scale Rockset's business.
* (01:08:45) Shruti shared fundraising advice for founders who are seeking the right investors for their startups.
* (01:10:27) Shruti described the evolution of enterprise marketing and GTM strategy over the past decade.
* (01:12:59) Closing segment.

SHRUTI'S CONTACT INFO
* LinkedIn [https://www.linkedin.com/in/shrutibhat]
* Twitter [https://twitter.com/shrutibhat?lang=en]
* Forbes [https://profiles.forbes.com/members/tech/profile/Shruti-Bhat-Chief-Product-Officer-SVP-Marketing-Rockset/52b9af88-d663-47f0-8c27-d13cedfd04c1]

ROCKSET'S RESOURCES
* Website [https://rockset.com/] | Twitter [http://www.twitter.com/RocksetCloud] | LinkedIn [http://www.linkedin.com/companies/RocksetCloud] | Facebook [https://www.facebook.com/RocksetCloud]
* Docs [https://rockset.com/docs/] | Blog [https://rockset.com/blog/] | Community [https://community.rockset.com/]
* Product [https://rockset.com/product/] | Architecture [https://rockset.com/whitepapers/rockset-concepts-designs-and-architecture/] | Customers [https://rockset.com/customers/]
* Real-Time Analytics Explained [https://rockset.com/real-time-analytics-explained/]
* What Is A Data Application? [https://rockset.com/what-is-a-data-application/]

MENTIONED CONTENT

ARTICLES
* "Building Data Applications Powered by Real-Time Analytics [https://rockset.com/blog/building-data-applications-powered-by-real-time-analytics/]" (May 2021)
* "How startups can create a culture where women can win [https://www.fastcompany.com/90635543/how-startups-can-create-a-culture-where-women-can-win]" (May 2021)
* "Streaming Data and the Modern Real-Time Data Stack [https://thenewstack.io/streaming-data-and-the-modern-real-time-data-stack/]" (Nov 2021)

PEOPLE
* Barr Moses [https://www.linkedin.com/in/barrmoses] (Monte Carlo Data)
* Jay Kreps [https://www.linkedin.com/in/jaykreps] (Confluent)
* Alex DeBrie [https://alexdebrie.com/] (DynamoDB Expert [https://www.dynamodbbook.com/])

BOOK
* Competing Against Luck [https://www.amazon.com/Competing-Against-Luck-Innovation-Customer/dp/0062435612] (by Clayton Christensen)

NOTES
My conversation with Shruti was recorded back in June 2022. Since then, a lot has happened. I recommend looking at the resources below:
* The launch of compute-compute separation for real-time analytics [https://rockset.com/blog/introducing-compute-compute-separation/] (March 2023)
* This benchmark on top real-time analytics databases in 2023 [https://rockset.com/blog/comparing-rockset-apache-druid-clickhouse-real-time-analytics/] (Feb 2023)
* This talk on emerging architectures for real-time CDC [https://www.youtube.com/watch?v=kPSXJSLqJPQ] (Dec 2022)

Apr 12, 2023 · 1 h 17 min

Episode 112: Distributed Systems Research, The Philosophy of Computational Complexity, and Modern Streaming Database with Arjun Narayan

SHOW NOTES * (01:18) Arjun shared formative experiences of his upbringing - growing up in Bangalore, India; going to UWC Mahindra College [http://www.uwcmahindracollege.org/] for high school; and pursuing a liberal arts education in the US. * (04:45) Arjun described his overall academic experience at Williams College [https://www.williams.edu] - where he studied Computer Science and Economics and did a one-year stint at the Computer Lab at the University of Cambridge [https://www.cam.ac.uk/]. * (11:19) Arjun talked about his specialization within academic computer science: distributed systems. * (14:17) Arjun unpacked the arc of his Ph.D. experience at the University of Pennsylvania [http://www.upenn.edu/], advised by Professor Andreas Haeberlen [http://www.cis.upenn.edu/~ahae]. * (19:25) Arjun dissected the technical challenges and novelty of his Ph.D. dissertation [http://arjunnarayan.com/publications/dissertation.pdf] on distributed systems that perform differentially private computations. * (23:20) Arjun shared his love for teaching, which benefits his industry career. * (25:55) Arjun walked through his decision to join Cockroach Labs [http://www.cockroachlabs.com/] as a software engineer. * (32:25) Arjun unpacked the CockroachDB Performance Guide [https://www.cockroachlabs.com/guides/cockroachdb-performance] and a RocksDB deep-dive [https://www.cockroachlabs.com/blog/cockroachdb-on-rocksd/] on the Cockroach Labs blog. * (37:24) Arjun shared valuable lessons learned from his scaling journey with Cockroach. * (41:36) Arjun mentioned how his writing practice benefited his day-to-day work designing database systems in a production setting (check out his posts on database transaction isolation semantics [https://ristret.com/s/f643zk/history_transaction_histories] and the history of log-structured merge trees [https://ristret.com/s/gnd4yr/brief_history_log_structured_merge_trees]). 
* (45:46) Arjun unpacked his 2019 blog post titled "The Philosophy of Computational Complexity [https://ristret.com/s/qk8wpt/philosophy_computational_complexity]." * (52:52) Arjun emphasized the importance of writing evergreen and authoritative long-form content, even when it attracts a small audience. * (55:54) Arjun shared the story behind the founding of Materialize [https://materialize.com], which builds a SQL streaming database on top of Timely Dataflow [https://github.com/TimelyDataflow/timely-dataflow] and Differential Dataflow [https://github.com/TimelyDataflow/differential-dataflow], two research projects created by his co-founder Frank McSherry [https://github.com/frankmcsherry/blog]. * (01:00:04) Arjun unpacked the architecture design [https://materialize.com/blog-architecture/] of Materialize at a high level. * (01:04:36) Arjun explained a core capability of Materialize called Streaming SQL [https://materialize.com/streaming-sql-intro/]. * (01:07:37) Arjun discussed successful tactics to increase adoption of and contributions to Materialize's open-source project [https://github.com/MaterializeInc/materialize]. * (01:11:23) Arjun walked through the major enterprise-grade features [https://materialize.com/materialize-cloud-open-beta/] baked into Materialize Cloud [https://materialize.com/docs/cloud/get-started-with-cloud/]. * (01:15:54) Arjun dissected a blog post about Materialize’s unbundled cloud architecture [https://materialize.com/materialize-unbundled/] detailing the shift from the single-binary Materialize to Materialize Cloud. * (01:21:13) Arjun envisioned how Materialize fits into the quickly evolving modern data stack. * (01:25:07) Arjun shared valuable hiring lessons to attract the right people who are excited about Materialize's mission. * (01:27:59) Arjun shared his brief take on building a high-performance company culture. * (01:29:19) Arjun discussed the challenges his team faced in finding early design partners. 
* (01:31:17) Arjun walked through notable use cases of Materialize. * (01:34:24) Arjun shared fundraising advice with founders who are seeking the right investors for their startups. * (01:41:21) Arjun highlighted the similarities and differences between being a researcher and a founder. * (01:42:46) Closing segment. ARJUN'S CONTACT INFO * LinkedIn [https://www.linkedin.com/in/arjunravinarayan/] * Twitter [https://twitter.com/narayanarjun?lang=en] * GitHub [https://github.com/rjnn] * Google Scholar [https://scholar.google.com/citations?user=jJepOtUAAAAJ&hl=en] MATERIALIZE'S RESOURCES * Website [https://materialize.com/] | Twitter [https://twitter.com/materializeinc] | LinkedIn [https://www.linkedin.com/company/materializeinc/about/] | Slack [https://materialize.com/s/chat] * Docs [https://materialize.com/docs/] | GitHub [https://github.com/MaterializeInc/materialize] * Blog [https://materialize.com/blog/] | Events [https://materialize.com/events/] | Guides [https://materialize.com/guides/] * Careers [https://materialize.com/careers/] MENTIONED CONTENT RESEARCH + ARTICLES * Distributed Differential Privacy and Applications [http://arjunnarayan.com/publications/dissertation.pdf] (2015) * Performance Report: Benchmarking CockroachDB's TPC-C Performance [https://www.cockroachlabs.com/guides/performance-report-benchmarking-cockroachdb-2-0/] * Why We Built CockroachDB on top of RocksDB [https://www.cockroachlabs.com/blog/cockroachdb-on-rocksd/] (2019) * A History of Transaction Histories [https://ristret.com/s/f643zk/history_transaction_histories] (2018) * A Brief History of Log Structured Merge Trees [https://ristret.com/s/gnd4yr/brief_history_log_structured_merge_trees] (2018) PEOPLE * Kyle Kingsbury [https://aphyr.com/about] * Bob Muglia [https://en.wikipedia.org/wiki/Bob_Muglia] * Frank McSherry [https://en.wikipedia.org/wiki/Frank_McSherry] BOOK * Zero To One [https://www.amazon.com/Zero-One-Notes-Startups-Future/dp/0804139296/] (by Peter Thiel) NOTES My 
conversation with Arjun was recorded back in May 2022. Since then, a lot has happened. I recommend looking at the resources below: * About Materialize webpage [https://materialize.com/about/] (which shows the team building Materialize as well as their pedigree) * Guide: What is a Streaming Database [https://materialize.com/guides/streaming-database/] (which walks through why Materialize is important and different from a normal database) * Case Study: Real-time Delivery Tracking UI in a Single Sprint at Onward [https://materialize.com/customer-stories/onward/] * Tech Demo: CI/CD Workflows for dbt+Materialize [https://materialize.com/events/dbt-cicd/] (March 2023) * Announcing The Next Generation of Materialize [https://materialize.com/blog/next-generation/] (Oct 2022)

Apr 7, 2023 · 1 h 49 min