The Data Life Podcast

Podcast by Sanket Gupta

This is a podcast where we talk all about real-life experiences of dealing with data and machine learning tools, techniques and personalities. We cover not just the technical aspects but also the "life" aspects of working in the field. Note: Opinions expressed are my own and do not express the views or opinions of my employer.

All episodes

27 episodes
27: Building Open Source Data Startup with Airbyte CEO, Michel Tricot

We talk with Michel Tricot, the Founder and CEO of Airbyte, an open source data integration startup backed by Y Combinator. It has raised over $30M in capital and has been growing quite fast. It was a great conversation and I think you will also enjoy it. 🎉

We cover lots of things in the podcast, including:
1. Technical aspects of what Airbyte does, how it sits in the ETL/ELT landscape, and how it differs from other tools such as Fivetran, Stitch etc.
2. Data warehouses being a canonical source of data and how Airbyte helps with bringing the data into the warehouse.
3. How Airbyte works as an open source data tool.
4. Life aspects of running a fast-growing startup, including raising capital, hiring etc.

Links to the tools/services mentioned:
1. Airbyte: airbyte.io
2. Airbyte Slack where you can talk with the team: slack.airbyte.io
3. Dbt for transformation in ELT: getdbt.com
4. Airflow, a data orchestration tool: https://airflow.apache.org/
5. Astronomer, which can host Airflow: https://astronomer.io/

Pay-as-you-use data warehouses:
6. Snowflake Data Warehouse: https://www.snowflake.com/
7. BigQuery Data Warehouse: https://cloud.google.com/bigquery

Set up your own infrastructure:
8. Redshift Data Warehouse: https://aws.amazon.com/redshift/

Oct 11, 2021 - 44 min
26: Building Data Engineering Pipelines at Scale (with Data Warehouse, Spark and Airflow)

Imagine you are hanging out at a beach, watching the waves come and go and looking at all the shells on the sand. And you get an idea: how about collecting these shells and making necklaces to sell? How would you go about doing this? Maybe you'd collect a few shells, make a small necklace and show it to a friend. This is where we begin our journey of learning about data engineering pipelines.

Using the example of running a necklace business from shells, we learn about the following data engineering concepts:
1. ETL (Extract, Transform, Load) vs ELT (Extract, Load, Transform), and why data warehouses are great for analytics.
2. Spark for large-scale data processing, and options for hosting and running it.
3. Data orchestration using Airflow.

My blog on Towards Data Science about moving from Pandas to Spark: https://towardsdatascience.com/moving-from-pandas-to-spark-7b0b7d956adb
Great book to learn about Spark: https://www.amazon.com/dp/1492050040/?tag=omnilence-20

Tools covered in the episode:
dbt: https://www.getdbt.com/
Databricks: https://databricks.com/
EMR: https://aws.amazon.com/emr/
AWS Redshift: https://aws.amazon.com/redshift/
Snowflake: https://www.snowflake.com/
Delta Lake: https://databricks.com/product/delta-lake-on-databricks
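To make the ETL vs ELT distinction from point 1 concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite database standing in for the data warehouse; the shell records, table names and function names are made up for illustration and are not taken from the episode.

```python
import sqlite3

# Hypothetical raw records: (kind, size in cm, condition) for each shell collected.
RAW_SHELLS = [
    ("conch", 4.0, "intact"),
    ("scallop", 2.5, "chipped"),
    ("cowrie", 1.5, "intact"),
    ("scallop", 3.0, "intact"),
]


def etl(conn):
    """ETL: transform (filter) in Python first, then load only the cleaned rows."""
    usable = [(kind, size) for kind, size, state in RAW_SHELLS if state == "intact"]
    conn.execute("CREATE TABLE usable_shells_etl (kind TEXT, size_cm REAL)")
    conn.executemany("INSERT INTO usable_shells_etl VALUES (?, ?)", usable)
    return conn.execute("SELECT COUNT(*) FROM usable_shells_etl").fetchone()[0]


def elt(conn):
    """ELT: load every raw row as-is, then transform with SQL inside the warehouse."""
    conn.execute("CREATE TABLE raw_shells (kind TEXT, size_cm REAL, state TEXT)")
    conn.executemany("INSERT INTO raw_shells VALUES (?, ?, ?)", RAW_SHELLS)
    conn.execute(
        "CREATE TABLE usable_shells_elt AS "
        "SELECT kind, size_cm FROM raw_shells WHERE state = 'intact'"
    )
    return conn.execute("SELECT COUNT(*) FROM usable_shells_elt").fetchone()[0]


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
etl_count = etl(conn)
elt_count = elt(conn)
raw_count = conn.execute("SELECT COUNT(*) FROM raw_shells").fetchone()[0]
print(etl_count, elt_count, raw_count)
```

Both paths end up with the same usable shells. The difference is that the ELT path also keeps the raw table in the warehouse, so a different transformation can be run later without collecting the data again.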

Aug 18, 2021 - 39 min
25: Talking Data Privacy with Jeff Bermant

In this episode, I'm excited to be talking with Jeff Bermant, who is the founder and CEO of Cocoon Mydata Rewards browser. It is a browser based on Chrome and it pays people to use it! ✨

We talk about data ethics and privacy, and how Jeff believes that users should be paid for their data. We talk about GDPR and similar laws in the US, the future of data privacy and more!

Go to https://getcocoon.com/mdr_mobile_launch to download and use Cocoon Rewards Browser.

~Thanks for listening~

Aug 4, 2021 - 28 min
24: Promoting Women in Tech - With Rupal Gupta

In this episode, we are talking about women in tech with Rupal Gupta. Rupal, a recent graduate of the Online MS in CS program at Georgia Tech, is a data engineer in the industry and is passionate about promoting women in tech. She also has some great tips and resources for anyone trying to break into data science and tech!

We talk about things that can help promote women in tech, women in tech conferences such as Grace Hopper, looking for jobs, resources to prepare for interviews etc.

If you want to reach out to Rupal for any help or to collaborate on her project womenmentors.co, here is her LinkedIn: https://www.linkedin.com/in/rupalgupta15/

FREE Women in Tech Conference by Manning Publications on Oct 13th at 12pm ET on Twitch: https://freecontent.manning.com/livemanning-conferences-women-in-tech/ 🎉 There will be women in tech speakers from Dropbox, Microsoft, Warby Parker and more.

🌟 Programs and conferences covered in the episode:
OMSCS program at Georgia Tech: https://omscs.gatech.edu/
Grace Hopper conference: https://ghc.anitab.org/
Anita Borg Institute: https://anitab.org/

🌟 Interviewing resources:
1. Pramp: https://www.pramp.com/#/
2. Interviewing.io: https://interviewing.io/
3. Educative "Grokking the System Design Interview": https://www.educative.io/courses/grokking-the-system-design-interview
4. AWS Certifications: https://aws.amazon.com/certification/

Disclaimer: All opinions on this podcast are our own and not the views of our employers or organizations.

~Thanks for listening~

Oct 8, 2020 - 15 min
23: Let’s Talk AWS SageMaker for ML Model Deployment

In this episode, we talk about Amazon SageMaker and how it can help with ML model development, including model building, training and deployment. We cover 3 advantages in each of these 3 areas.

We cover points such as:
1. Hosting ML endpoints for deploying models to thousands or millions of users.
2. Saving costs for model training using SageMaker.
3. Using CloudWatch logs with SageMaker endpoints to debug ML models.
4. Using preconfigured environments or models provided by AWS.
5. Automatically saving model artifacts in AWS S3 as you train in SageMaker.
6. Using version control for SageMaker notebooks with GitHub.
and more…

Please rate, subscribe and share this episode with anyone who might find SageMaker useful in their work. I feel that SageMaker is a great tool and want to share it with data scientists.

For comments/feedback/questions, or if you think I have missed something in the episode, please reach out to me on LinkedIn: https://www.linkedin.com/in/sanketgupta107/

Jun 17, 2020 - 19 min