Building Things with Machine Learning

Podcast by Yaoshiang Ho

English

Business

In each episode, we interview people who are building really interesting products using machine learning. Our focus is on applications such as medical diagnostics; autonomous vehicles and advanced driver assistance systems (ADAS); geospatial analytics; content analysis; manufacturing; logistics, where you find a lot of robotics; and AEC (architecture, engineering, and construction). If you are interested in applying machine learning to real-world problems, this is the podcast for you.

All episodes

7 episodes

Ep 6: Extracting Data from Old Documents with Rosa Lin, Founder, Tolstoy

Rosa Lin is the founder of Tolstoy (https://www.tolstoy.ai/), which specializes in extracting data from documents. As I learned, this is a much tougher problem than traditional OCR! It requires a combination of deep learning and classic CV methods. Rosa also talks about her fascinating background as a journalist and her experience going through Y Combinator. For more about this podcast, visit https://www.yaoshiang.com/podcast.html. For the video version, including visual examples of Tolstoy's work, visit https://www.youtube.com/watch?v=QtHEXvcGGRs&t=9s.

0:26: The problems Tolstoy solves: extracting data from documents like emails, news articles, forms, and handwritten notes, and then running NLP algorithms to classify and summarize.
02:54: Typical customers: tech startups, news organizations, utilities, energy companies, legal firms, and educational institutions.
05:05: First walk-through of a use case: digitizing articles for The Wall Street Journal (with images showing why off-the-shelf OCR failed).
07:19: Specifics of why OCR fails: multiple articles in a single page, columns, images, headings, and handwriting.
09:18: Training a custom model to deal with columns, with visuals showing how Tolstoy works better than Google Cloud Vision.
11:30: A classic computer vision algorithm for identifying paragraphs.
12:30: Transfer learning with modern Convolutional Neural Networks to identify images vs. text.
13:38: Second walk-through of a use case: a classification task for a utility company to help find lead pipes.
15:20: Can you spot the handwritten word “lead”?
17:50: Tips for building products around inevitably imprecise ML models.
19:37: Rosa’s personal journey from biology and journalism to entrepreneurship and ML.
22:49: Seeing the promise of AI in 2015 while at the World Bank and starting an AI hobbyist club.
26:25: How training in journalism translated to the skills required for journalism.
28:40: Rosa’s experience with Y Combinator (YC W17).

Oct 24, 2023 - 35 min

Ep 5: Discovering Pharmaceuticals with Machine Learning, with Ryan Emerson of A-Alpha Bio

A true “aha” conversation! Learn how deep learning techniques from natural language processing (NLP) are applied to drug discovery, specifically protein-to-protein interactions. Includes a quick and dirty primer on just enough biology to understand the training data A-Alpha Bio uses for their ML models. For more episodes, visit https://yaoshiang.com/podcast.html.

Show Notes:
0:37 - The basics of synthetic biology for machine learning practitioners.
0:50 - What are proteins and why do they matter?
1:50 - A protein is a string built from 20 amino acids… which means it starts looking like a Natural Language Processing problem.
2:35 - DeepMind’s AlphaFold and Meta FAIR’s ESMFold: taking as input a string of amino acids, and then predicting the 3D structure of proteins.
6:23 - Where AlphaFold got their training data: the Protein Data Bank.
8:07 - A-Alpha Bio’s product: AlphaSeq.
10:45 - The source of the name “A-Alpha Bio”: yeast genders.
11:36 - Applications of synthetic biology: pharmaceuticals, agriculture.
15:00 - Applying ML to predict protein-to-protein interactions.
20:30 - !!! The actual ML techniques applied: treating proteins as strings and applying NLP architectures: RNNs, LSTMs, Attention, and Transformers.
22:50 - Discrete optimization problem to then generate proteins.
28:30 - The insights behind why applying ML would work.
31:20 - The rise of deep learning in the field of computational biology.
32:50 - Ryan’s journey into machine learning and data science.
35:20 - Advice for deep learning people interested in applying ML to biology.

Additional papers covering the topic of ML in biology:
https://www.nature.com/articles/s41586-021-03819-2 - The AlphaFold paper.
https://pubmed.ncbi.nlm.nih.gov/35830864/ - A broad overview of deep learning in biology.
https://pubmed.ncbi.nlm.nih.gov/35862514/ - A paper out of the Baker lab in which the authors use deep learning to design proteins from scratch.
https://pubmed.ncbi.nlm.nih.gov/35099535/ - From Charlotte Deane’s lab with collaborators from Roche, this paper presents a deep learning approach to rapidly and accurately model the structure of antibody CDR3 loops. One of the papers mentioned in the review above.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9129155/ - This is recent work from A-Alpha; this paper doesn’t include any ML but does include some great examples of AlphaSeq data and how it can be applied.

The YouTube version of this podcast is available at https://www.youtube.com/watch?v=k2OzeRQIXMs.

Feb 7, 2023 - 40 min

Ep 4: ROI from ML at "Reasonable Scale" E-Commerce Companies with Ciro Greco

Ciro Greco has built ML systems used at many name-brand retailers. In this episode, he gives us tips on getting value out of ML at “reasonable scale” companies with NLP and information retrieval. The concept of “reasonable scale” was one he returned to, and he clearly has a very nuanced understanding of that segment and how it differs from the hyper scale tech giants. He also applies advanced ideas from NLP, like embeddings, to e-commerce personalization. For more episodes, visit https://yaoshiang.com/podcast.html.

Show Notes:
1:36: Key differences in applying ML at “reasonable scale” companies like major retailers, where you can’t just “big-data” your way out of problems, compared to the hyper scale tech giants.
3:22: The basics of personalization: suggestions, search, recommendations, and categories.
4:38: A non-obvious challenge: how to personalize for non-logged-in users without a profile who visit infrequently.
9:00: Different incentives for reasonable scale vs. hyper scale companies.
9:44: Getting your data right: data ingestion, data practices, organizing teams around data, transforming data, and infrastructure for flexible data access, so that you can make developers productive when you have finite resources.
11:23: Learning from experience that data (replayability and replicability) is more important than modeling.
12:58: Lessons from presenting at top-tier conferences: so many published papers come from the hyper scale companies.
14:19: Taking session data and catalog data to create a “product to vector” embedding to personalize an experience.
19:20: Requirements on how to sell: the salespeople must communicate to the “people who write the check” that data integration is a first-class citizen, not a downstream task, to achieve ROI.
21:09: Dynamics of regulatory and privacy issues, and how to tackle them as an organization.
24:10: Ciro talks about his personal journey into ML, starting with a PhD in neuroscience and linguistics.
25:46: Early challenges in applying deep learning to NLP.
26:22: The “aha” moment that led to Ciro’s first startup delivering search products.
27:55: Changes in the role of a data scientist over the past decade: from PhDs who had to tackle problems with very little tooling to today, when so many tools are available, along with a shift towards understanding products and customers.

For the video version of this podcast, visit https://www.youtube.com/watch?v=F3e0UPqenwo.

Oct 28, 2022 - 30 min

Ep 3: Applying ML to Cybersecurity, with Yihua Liao

Yihua Liao is Head of Data Science at Netskope, a next-generation cybersecurity firm. Yihua talks about using both CV and NLP to create novel cybersecurity features. Yihua Liao’s PhD research was on security and machine learning, and he previously worked at Microsoft, Facebook, Uber, and his own startup. For more information about this podcast, visit https://yaoshiang.com/podcast.html.

Show Notes:
00:24 - How Netskope addresses cybersecurity.
00:57 - Netskope’s unique approach to cybersecurity through network traffic routing.
02:51 - The prior approach to cybersecurity: a focus on the physical perimeter and firewalls.
03:44 - A unique application of image classification in cybersecurity: identifying sensitive documents like driver’s licenses so CISOs (chief information security officers) can set security rules for them.
07:45 - Challenges of building image classifiers #1: High-quality data.
08:45 - Challenges of building image classifiers #2: Managing false positives and false negatives (precision and recall).
09:15 - Challenges of building image classifiers #3: Managing latency (15 ms) for a real-time use case.
10:38 - An application of NLP (natural language processing) in cybersecurity: classifying phishing websites.
13:46 - Optimizing LLMs (Large Language Models) through quantization and distillation.
14:45 - How Yihua got into ML.
16:10 - How ML has evolved over the past 15 years.

Notes:
https://www.netskope.com/
https://www.netskope.com/blog/enhancing-security-with-ai-ml

A video version of this episode is available at https://www.youtube.com/watch?v=F3e0UPqenwo.

Oct 6, 2022 - 20 min

Ep 2: Ted Mann @ CollX

Ted tells us about applying machine learning to the field of baseball cards! 33% of Americans have trading cards, making this a very large addressable market. Learn some tips on scrappy ways to launch an app, and how similarity search powers one of the killer features of the CollX app.

Key Moments:
Building an application that works around the potential errors of an ML model (15:10).
The data and ML behind his trading card valuation model, especially when recent transactions don’t exist (18:30).
Dealing with the latency inherent in ML and networking through the concept of “building lists” (18:25).
Early work on product search (24:00).
Working with bad training data and adding a “wizard behind the curtains” to deliver value while labeling data (26:18).
More UX techniques to reduce perceived latency (28:00).
Helping users understand that ML models are not 100% accurate (30:45).
Advice for entrepreneurs trying to launch an app (35:20).

A video version of this episode with visuals is available at https://www.youtube.com/watch?v=RX9xIYnn2v4. To learn more about this podcast, visit https://yaoshiang.com/podcast.html

Sep 14, 2022 - 39 min
