The Data Life Podcast

27: Building Open Source Data Startup with Airbyte CEO, Michel Tricot

44 min · Oct 11, 2021

Description

We talk with Michel Tricot, Founder and CEO of Airbyte, an open source data integration startup from Y Combinator. It has raised over $30M in capital and has been growing quickly. It was a great conversation and I think you will enjoy it too. 🎉

We cover a lot in the podcast, including:

1. Technical aspects of what Airbyte does, how it fits into the ETL/ELT landscape, and how it differs from other tools such as Fivetran and Stitch.
2. Data warehouses as the canonical source of data, and how Airbyte helps bring data into the warehouse.
3. How Airbyte works as an open source data tool.
4. Life aspects of running a fast-growing startup, including raising capital and hiring.

Links to the tools and services mentioned:

1. Airbyte: airbyte.io
2. Airbyte Slack, where you can talk with the team: slack.airbyte.io
3. dbt for transformation in ELT: getdbt.com
4. Airflow, a data orchestration tool: https://airflow.apache.org/
5. Astronomer, which can host Airflow: https://astronomer.io/

Pay-as-you-use data warehouses:

6. Snowflake Data Warehouse: https://www.snowflake.com/
7. BigQuery Data Warehouse: https://cloud.google.com/bigquery

Set up your own infrastructure:

8. Redshift Data Warehouse: https://aws.amazon.com/redshift/


All episodes

27 episodes


27: Building Open Source Data Startup with Airbyte CEO, Michel Tricot


Oct 11, 2021 · 44 min

26: Building Data Engineering Pipelines at Scale (with Data Warehouse, Spark and Airflow)

Imagine you are at a beach, hanging out, watching the waves come and go and noticing all the shells on the sand. Then you get an idea: how about collecting these shells and making necklaces to sell? How would you go about it? Maybe you'd collect a few shells, make a small necklace, and show it to a friend. This is where we begin our journey of learning about data engineering pipelines.

Using the example of running a shell necklace business, we learn about the following data engineering concepts:

1. ETL (Extract, Transform, Load) vs. ELT (Extract, Load, Transform), and why data warehouses are great for analytics.
2. Spark for large data processing, and hosting/running it.
3. Data orchestration using Airflow.

My blog post on Towards Data Science about moving from Pandas to Spark: https://towardsdatascience.com/moving-from-pandas-to-spark-7b0b7d956adb

A great book for learning Spark: https://www.amazon.com/dp/1492050040/?tag=omnilence-20

Tools covered in the episode:

  • dbt: https://www.getdbt.com/
  • Databricks: https://databricks.com/
  • EMR: https://aws.amazon.com/emr/
  • AWS Redshift: https://aws.amazon.com/redshift/
  • Snowflake: https://www.snowflake.com/
  • Delta Lake: https://databricks.com/product/delta-lake-on-databricks

Aug 18, 2021 · 39 min

24: Promoting Women in Tech - With Rupal Gupta

In this episode, we talk about women in tech with Rupal Gupta. Rupal, a recent graduate of the Online MS in CS program at Georgia Tech, is a data engineer in industry and is passionate about promoting women in tech. She also has some great tips and resources for anyone trying to break into data science and tech!

We talk about things that can help promote women in tech, women-in-tech conferences such as Grace Hopper, looking for jobs, resources for interview preparation, and more.

If you want to reach out to Rupal for help or to collaborate on her project womenmentors.co, here is her LinkedIn: https://www.linkedin.com/in/rupalgupta15/

FREE Women in Tech Conference by Manning Publications on Oct 13th at 12pm ET on Twitch: https://freecontent.manning.com/livemanning-conferences-women-in-tech/ 🎉 There will be women-in-tech speakers from Dropbox, Microsoft, Warby Parker and more.

🌟 Programs and conferences covered in the episode:

  • OMSCS program at Georgia Tech: https://omscs.gatech.edu/
  • Grace Hopper conference: https://ghc.anitab.org/
  • Anita Borg Institute: https://anitab.org/

🌟 Interviewing resources:

1. Pramp: https://www.pramp.com/#/
2. Interviewing.io: https://interviewing.io/
3. Educative "Grokking the System Design Interview": https://www.educative.io/courses/grokking-the-system-design-interview
4. AWS Certifications: https://aws.amazon.com/certification/

Disclaimer: All opinions on this podcast are our own and not the views of our employers or organizations.

~Thanks for listening~

Oct 8, 2020 · 15 min

23: Let’s Talk AWS SageMaker for ML Model Deployment

In this episode, we talk about Amazon SageMaker and how it can help with ML model development, including model building, training and deployment. We cover 3 advantages in each of these 3 areas.

We cover points such as:

1. Hosting ML endpoints to deploy models to thousands or millions of users.
2. Saving costs on model training with SageMaker.
3. Using CloudWatch logs with SageMaker endpoints to debug ML models.
4. Using preconfigured environments or models provided by AWS.
5. Automatically saving model artifacts to AWS S3 as you train in SageMaker.
6. Using version control for SageMaker notebooks with GitHub.

and more…

Please rate, subscribe and share this episode with anyone who might find SageMaker useful in their work. I feel that SageMaker is a great tool and want to share it with data scientists.

For comments, feedback or questions, or if you think I have missed something in the episode, please reach out to me on LinkedIn: https://www.linkedin.com/in/sanketgupta107/

Jun 17, 2020 · 19 min