27: Building Open Source Data Startup with Airbyte CEO, Michel Tricot

44 min · Oct 11, 2021

Description

We talk with Michel Tricot, founder and CEO of Airbyte, an open source data integration startup from Y Combinator. Airbyte has raised over $30M in capital and has been growing quickly. It was a great conversation, and I think you will enjoy it too. 🎉

We cover lots of things in the podcast, including:
1. Technical aspects of what Airbyte does, where it sits in the ETL/ELT landscape, and how it differs from other tools such as Fivetran, Stitch, etc.
2. Data warehouses as the canonical source of data, and how Airbyte helps bring data into the warehouse.
3. How Airbyte works as an open source data tool.
4. What it is like to run a fast-growing startup, including raising capital, hiring, etc.

Links to the tools/services mentioned:
1. Airbyte: airbyte.io
2. Airbyte Slack, where you can talk with the team: slack.airbyte.io
3. dbt, for transformation in ELT: getdbt.com
4. Airflow, a data orchestration tool: https://airflow.apache.org/
5. Astronomer, which can host Airflow: https://astronomer.io/

Pay-as-you-use data warehouses:
6. Snowflake Data Warehouse: https://www.snowflake.com/
7. BigQuery Data Warehouse: https://cloud.google.com/bigquery

Set up your own infrastructure:
8. Redshift Data Warehouse: https://aws.amazon.com/redshift/


All Episodes

27 episodes

27: Building Open Source Data Startup with Airbyte CEO, Michel Tricot

Oct 11, 2021 · 44 min

26: Building Data Engineering Pipelines at Scale (with Data Warehouse, Spark and Airflow)

Imagine you are hanging out at a beach, watching the waves come and go and looking at all the shells on the sand. Then you get an idea: what if you collected these shells and made necklaces to sell? How would you go about it? Maybe you would collect a few shells, make a small necklace, and show it to a friend. This is where we begin our journey into data engineering pipelines.

Using the example of a necklace business built from shells, we learn about the following data engineering concepts:
1. ETL (Extract, Transform, Load) vs. ELT (Extract, Load, Transform), and why data warehouses are great for analytics.
2. Spark for large-scale data processing, and options for hosting and running it.
3. Data orchestration using Airflow.

My blog post on Towards Data Science about moving from Pandas to Spark: https://towardsdatascience.com/moving-from-pandas-to-spark-7b0b7d956adb
A great book for learning Spark: https://www.amazon.com/dp/1492050040/?tag=omnilence-20

Tools covered in the episode:
dbt: https://www.getdbt.com/
Databricks: https://databricks.com/
EMR: https://aws.amazon.com/emr/
AWS Redshift: https://aws.amazon.com/redshift/
Snowflake: https://www.snowflake.com/
Delta Lake: https://databricks.com/product/delta-lake-on-databricks

Aug 18, 2021 · 39 min

25: Talking Data Privacy with Jeff Bermant

In this episode, I'm excited to be talking with Jeff Bermant, founder and CEO of the Cocoon Mydata Rewards browser, a Chrome-based browser that pays people to use it! ✨ We talk about data ethics and privacy, and why Jeff believes users should be paid for their data. We also discuss GDPR and similar laws in the US, the future of data privacy, and more!

Go to https://getcocoon.com [https://getcocoon.com/mdr_mobile_launch] to download and use the Cocoon Rewards Browser.

~Thanks for listening~

Aug 4, 2021 · 28 min

24: Promoting Women in Tech - With Rupal Gupta

In this episode, we talk about women in tech with Rupal Gupta. Rupal, a recent graduate of the Online MS in CS program at Georgia Tech, is a data engineer in industry and is passionate about promoting women in tech. She also has some great tips and resources for anyone trying to break into data science and tech! We talk about what can help promote women in tech, women-in-tech conferences such as Grace Hopper, job hunting, resources for interview preparation, and more.

If you want to reach out to Rupal for help or to collaborate on her project womenmentors.co, here is her LinkedIn: https://www.linkedin.com/in/rupalgupta15/

FREE Women in Tech Conference by Manning Publications on Oct 13th at 12pm ET on Twitch: https://freecontent.manning.com/livemanning-conferences-women-in-tech/ 🎉 There will be women-in-tech speakers from Dropbox, Microsoft, Warby Parker, and more.

🌟 Programs and conferences covered in the episode:
OMSCS program at Georgia Tech: https://omscs.gatech.edu/
Grace Hopper conference: https://ghc.anitab.org/
Anita Borg Institute: https://anitab.org/

🌟 Interviewing resources:
1. Pramp: https://www.pramp.com/#/
2. Interviewing.io: https://interviewing.io/
3. Educative "Grokking the System Design Interview": https://www.educative.io/courses/grokking-the-system-design-interview
4. AWS Certifications: https://aws.amazon.com/certification/

Disclaimer: All opinions on this podcast are our own and not the views of our employers or organizations.

~Thanks for listening~

Oct 8, 2020 · 15 min

23: Let’s Talk AWS SageMaker for ML Model Deployment

In this episode, we talk about Amazon SageMaker and how it can help across ML model development: building, training, and deployment. We cover three advantages in each of these three areas, including:
1. Hosting ML endpoints to deploy models to thousands or millions of users.
2. Saving costs on model training with SageMaker.
3. Using CloudWatch logs with SageMaker endpoints to debug ML models.
4. Using preconfigured environments or models provided by AWS.
5. Automatically saving model artifacts to AWS S3 as you train in SageMaker.
6. Using GitHub for version control of SageMaker notebooks.
…and more.

Please rate, subscribe, and share this episode with anyone who might find SageMaker useful in their work. I think SageMaker is a great tool and want to share it with data scientists. For comments, feedback, or questions, or if you think I missed something in the episode, please reach out to me on LinkedIn: https://www.linkedin.com/in/sanketgupta107/

Jun 17, 2020 · 19 min
