Pioneer Park Podcast

Pioneer Park is a podcast that delves into the minds of the most innovative and thought-provoking individuals in the tech hub of Silicon Valley and Cerebral Valley. Hosting in-depth conversations and interviews with some of the brightest creatives and technologists, Pioneer Park provides an insightful platform for exploring the latest technological advancements, the creative processes behind them, and the impact they are having on society. Listeners can expect to hear from a diverse range of experts and thought leaders in the tech industry, as well as emerging voices that are shaping the future. Pioneer Park offers a unique perspective on the intersection of technology, art, and culture and is a must-listen for anyone interested in the future of technology and its role in shaping our world. pioneerpark.substack.com

Unstructured play and personal tutors with Cinjon Resnick

John and Bryan interview Cinjon Resnick, an AI researcher at NYU. Cinjon is interested in developing unstructured, discovery-oriented play games and experiences.

Topics

* Ender’s Game and its inspiring concept of the Mind Game as an education system that deeply understands the state of the student.
* The present and future of AI-driven tutors (see Cinjon’s longer post here [https://cinjon.com/building-primer]).
* The virtues of personal tutors, an idea that goes back to Socrates and Rousseau; AI may be able to make this broadly accessible.
* Currently LingoStar [https://www.lingostar.ai/] and Speak [https://www.speak.com/] offer digital tutors for language learning.
* These solutions still lack the empathy of a real human tutor, but that should be possible.
* This is a potential solution to Bloom’s 2 Sigma Problem [https://en.wikipedia.org/wiki/Bloom%27s_2_sigma_problem].
* Cinjon’s experience with circus, and his interactions with instructors including Victor Fomine [http://www.circopedia.org/Victor_Fomine].
* The inspiration for Cinjon’s hackathon project, Animate.
* Cinjon’s love of social deception games, and the challenges of developing algorithms that can effectively play those sorts of games with humans.
* Impact of generative AI on gaming
* Motion generation models like PhysDiff [https://nvlabs.github.io/PhysDiff/] or the Motion Diffusion Model [https://arxiv.org/abs/2209.14916]
* AIs that are able to imitate human players
* Advances in chess AIs
* Natural language APIs
* Meeting the challenges of hallucination
* The limits and potential of AI-driven storytelling
* Why voice processing is critical to building machines that have human-like empathy
* The importance of real-time processing, for example simultaneous translation
* The future of multi-agent RL models
* Causality research as an underrated field
* Cinjon’s recommendations
  * Diamond Age by Neal Stephenson [goodreads [https://www.goodreads.com/en/book/show/827]], Ender’s Game by Orson Scott Card [goodreads [https://www.goodreads.com/book/show/375802.Ender_s_Game?from_search=true&from_srp=true&qid=WmfgBLOOcQ&rank=1]]
  * Rousseau on education: Emile [goodreads [https://www.goodreads.com/book/show/326679.Emile_or_On_Education?ref=nav_sb_ss_4_5]]
  * Martin Arjovsky’s thesis [arxiv [https://arxiv.org/abs/2103.02667]]
* What’s next for Cinjon? He is starting to think deeply about childhood companions.

Transcript

[00:00:00] Hi, I'm Bryan and I'm John. And we are hosting the Pioneer Park Podcast where we bring you in-depth conversations with some of the most innovative and forward-thinking creators, technologists, and intellectuals. We're here to share our passion for exploring the cutting edge of creativity and technology. And we're excited to bring you along on the journey. Tune in for thought-provoking conversations with some of the brightest minds of Silicon Valley and beyond.

Bryan Davis: Hey there. Welcome to the Pioneer Park Podcast. Today we are having an interview with Dr. Cinjon Resnick, who likes to be at the forefront of tech and research. He has worked at startups, spent time as a fellow at Google Brain, and recently wrapped up a PhD in machine learning from NYU. He's an alum of South Park Commons and is currently working on ideas related to AI-powered experiences, games and companionship. Cinjon, welcome.

Cinjon Resnick: Thank you. Appreciate it, guys.

Bryan Davis: So I wanna dive right in. Tell me about Ender's Game, your relationship with the book Ender's Game and the cool game that was featured there. I [00:01:00] guess that's called the Mind Game in the book. What is your memory of reading that and what inspired you about that narrative?

Cinjon Resnick: Ender's Game is wonderful.
If you haven't read it, I highly recommend it. It's a story about a whole family. And this family, they're special, but in particular, this one boy ends up taking on the mantle of savior of humanity through a simulated adventure experience. I'll leave it at that and let the audience go and read it. But in particular, inside of Ender's Game, there is a story. And the story is guided by this thing called the Mind Game. In this story, Ender goes on an adventure. It's not meant to be beaten. It's meant to test different aspects of the character. There's some fun things that happen, but this was a companion, a friend. It was an interactive companion. It was similar to The Diamond Age, where you have this concept of the Primer. The Primer is an interactive, like, Socrates for the child.

Bryan Davis: So we're dealing now with a set of [00:02:00] technologies that are potentially opening up the doors towards unstructured, discovery-oriented play, games, experiences. And it sounds like that's a lot of the areas that you're interested in developing your work. What do you think are the experiences that are opened up by the current generative AI technologies?

Cinjon Resnick: Yeah, it's a good question, 'cause if you think about what life was like for a 1700s, 1800s child, either you had nothing or you were very wealthy and then you had teachers, and those teachers were one-on-one personal tutors. So Rousseau talked about this in Émile, this idea of a personal tutor that you would have for you. And Socrates was famously this for different wealthy patrons' children. What would be really cool is to provide this for every kid. Are we there yet? Probably not exactly. This document that I wrote sort of goes through where exactly we need to fix things and what the remaining problems are. But we're pretty close, and we're getting closer each day.
And so the opportunity now [00:03:00] is to gear an AI, if you will, towards a child's ability, towards a child, so that it is a personalized experience for them and what they want to learn and where they wanna go, what stories they wanna play out for themselves, and really have that AI be something that is not just a great experience for the child, but something that parents want. We're close. We're close. And so I think that today is the time to really start thinking about that and maybe even building towards it, finding some initial wedges.

John McDonnell: Are there some kind of maybe early applications or milestones on that journey that you're the most excited about that we could potentially be close to delivering?

Cinjon Resnick: Yes. So I think that one very clear one is actually everything that's happening in language learning. We see companies like LingoStar, and I think Speak is gonna go this way as well, where they're developing applications that you could imagine just plugging a child into, or yourself as an adult language learner, and figuring out, okay, this is how you understand French in the context of actually talking about it with. [00:04:00] And from there, the interactive experience actually looks pretty similar to the interactive experience when you're language learning with a human. The gap that's missing there, there's two things actually that are pretty clearly missing. One is the empathy to know where the student is on their journey, and also, day to day, how they're feeling, how they're doing, do I up the capabilities of this AI or decrease them, et cetera. And the second one is just having a curriculum for the students, because it's not just this interactive experience. Especially with adult language learners, there's also this idea that at some point you are teaching them, you have some objective in mind for where they're trying to go. That's missing.
But I think that we're on a path to being able to get there, and Speak, they already have tons of users in, I believe, South Korea. LingoStar, they're trying the same idea. They're demonstrating the capabilities of this.

Bryan Davis: Have you heard of Bloom's Two Sigma?

Cinjon Resnick: Yeah. Yeah.

Bryan Davis: Which, for anyone listening, that's the idea that language learning, or [00:05:00] any kind of learning, that is taking place with a single tutor... So the impact of having a personal tutor on learning is two sigma greater than the sort of base case. Basically making the case for individualized tutors and individualized education. And Bloom's Two Sigma is a pretty prominent result in educational theory, and I think it speaks to the fact that these personalized Aristotles or these personalized coaches have a big impact on education. And so I suppose what you're proposing is that we might be at a point where we can make these personalized tutors and be achieving much more significant educational results for a much larger proportion of the population.

Cinjon Resnick: That's right. That's right. So I have some background in some of this stuff. I've not been a teacher per se, but I do regular personal tutoring in circus. So I have a teacher in the circus apparatus that I train, and it's just night and day, whether I'm working with one of my coaches or not. Similarly, when we've tried to do things [00:06:00] around... I ran a nonprofit called Depth First Learning for a while, where the whole goal was to try and figure out a different way to learn from a structured base. I'm not gonna go into the exact details of how; it's depthfirstlearning.com if you wanna check it out. But what was interesting there was when you put people in a group, rather than having 'em just be alone, it works out much better. Why is that? I think a big reason is because you get to learn from what other people are doing, or where the people are going and where their knowledge is coming from.
And so if you have a particular entity who's geared up to understand the topic better than you, and to be able to go on this journey with you, but is also tuned to knowing where you're at, it's very powerful. And this is a lot of the things that you're pointing at, let's say with Bloom, et cetera. So one thing that's really cool about doing this with language learning is that it's largely just about talking. And so the subject matter is really easy. It's hard to get it wrong. A lot of these machines will hallucinate answers today, and they can get it wrong, say, in history or in finance, and that's a problem. [00:07:00] Another area which I think is very primed for this concept is early childhood interactive experiences, because getting it wrong just doesn't matter to a kid that's three to five to seven years old. Getting something wrong in the story of their day, or talking about a big dog that is drooling in the park, it doesn't matter if it's slightly wrong. But the experience of making a companion that can actually have empathy with this child, those are the things you start to be able to build there. And that's actually a direction I've mostly been looking into.

Bryan Davis: I'm curious if you've had any mentors or teachers that you feel have been really effective in your life in cultivating that experience for you, that's made you so interested in this.

Cinjon Resnick: What exactly are you asking? If a teacher of mine has actually just effectively been a one-on-one tutor?

Bryan Davis: Exactly. Whether or not you've had a really sort of significant relationship with somebody who is a mentor or a tutor that you really felt opened up a new field. It sounds like to some degree circus has been that for you.

Cinjon Resnick: Yeah. Yes. Yeah, I can definitely answer that way.
So [00:08:00] in athletic movement I have two coaches in Montreal that I train with. One is this guy Victor Fomine, who's a world-famous coach, and I'm really lucky to be able to work with him. The other is a guy named Sergey, and I go to them for different things. And Victor also doesn't speak English, but the opportunity to work with them is just fantastic because they understand so much of how the body should move. And so the whole experience with that is getting cues, figuring out, okay, this is what we should be doing then. And so just the cue to tap, look at the ground, look at your feet. Just hearing that over and over again at the right time is fantastic. It's just so useful. But then sometimes I can go there and I'll see him teaching people who are much less skilled or even much more skilled, and he changes his course for those people, right? There's an empathy for understanding where they're at. But then still this drive to... I think one of the best parts of working with a tutor who's able to adapt to you is if you give more, then they give more. [00:09:00] And if it's a day where you just can't give that much, they recognize that. Being able to build that into this next generation of machines is gonna be so important for getting this tutoring experience.

Bryan Davis: One last aside on that, what is your circus skill?

Cinjon Resnick: I do straps. It's like artistic rings.

John McDonnell: What was the inspiration for your recent hackathon project that you did at South Park Commons?

Cinjon Resnick: Animate. You're talking about Animate. And the idea here, just to sum it up for the audience, is... I wanted to understand what was the state of the art in a wide variety of systems. There's a wide variety of APIs that we could use to have an interactive experience with AI. And additionally, I also wanted to test two things.
One, is it fun to be read a story to? And two, is it interesting for a language learner to be read a story in a different language, in the language that they're [00:10:00] thinking about? At that point, I hadn't yet come across any app that could do the second thing; I have in the time since. But Animate then was, we took a chapter of Alice in Wonderland, we turned it into a visual story so you could see a scene with it, and there's a narrator, there's two characters. And then we wanted to have each of those characters talking out their role. So in other words, we turned it into a play. Yeah. And the whole experience of taking the story, turning it into a play, and then animating the play so that you have the characters with their mouths moving. Then, oh yeah, then there's the language switching and the interactive experience. So the main goal was really to test, is it interesting to be read a story to, what would make it interesting, and is there something around language learning there that we can tap into? So we built all that out. It was actually rather quick to build it all out, considering. The technology today is very good. It's just progressed to the point where [00:11:00] you can do all these things. And the goal at the end of this was then to put it before some children and see what they liked. And when I did that, there was just one thing that stood out over and over again, which was the ability to change the scene. It's just fun. It's a fun experience when you edit the scene and you go from something which, you can see, it's plain, and it's a canvas. It's a creatively constrained canvas because it's characters sitting in front of a fireplace with a chessboard in between them. But then you say, oh, I wanna put a monkey on the chessboard, or I want to change this chair to be a giraffe. And what you get back is just fun. It's surprising, it's creative. It's interesting. It's hilarious. And it was engaging.
Watching the kids actually play with this.

Bryan Davis: I'm curious, you are very interested in games and play. Do you play any games?

Cinjon Resnick: I do. I really like social deception games. So famously, like, Secret Hitler and Coup, those kinds of games. But I also like Braid, and I used to play a lot of Diablo II when I was a kid, and [00:12:00] Warcraft III and those kinda things.

Bryan Davis: Do you spend any of your time now deeply invested in any kind of computer worlds, or are most of them sort of social deception?

Cinjon Resnick: Ooh. I don't really play any computer games these days. I am gonna play Diablo IV when it comes out. I have a childhood nostalgia around it, but I have not invested in any of the ones that I've noticed my friends playing. I never got into Factorio, for example.

Bryan Davis: Do you feel like the next generation of games... Factorio is a great example of a game that is algorithmically generated, and has a lot of randomness that's embedded in the way the game is played. It's very famous for being replayable time and again; every experience will be different. Do you think that generative AI... and I guess, how do you think generative AI is poised to impact gaming?

Cinjon Resnick: That's a good question. So I think there's some obvious answers here. Things you can point to with AI Dungeon, you could talk about storytelling, whatnot. I think there's two things that maybe are less obvious. One direction is around motion generation. So we're starting [00:13:00] to see this past year, really actually this past year, motion generation start to work. What I mean by this is, examples are PhysDiff or the Motion Diffusion Model. These are pointing to a place where you can just say, I want this character to move like Beyonce in a rainstorm with jazz music in the background.
And then it does some interesting thing because it has some concept of Beyonce, rainstorms and jazz music associated with the movement of the human body. We're not at a point where we can do this with shapes that aren't really the human body unless you can slap a faux human body on it. But what that means is that if I was to just draw something that had some resemblance to the human pose, you could imagine turning that thing into its own shape. And so this could have a huge effect on UGC content. Suddenly UGC content can come alive. So I've seen a couple of startups working along those directions. Not the direction of taking the [00:14:00] motion generation and putting it in yet, but being ready for when that's possible. So that's one area. Another area which I think is really promising, and I know is actually being worked on in places like EA, is defining difficulty differently. A lot of times in game difficulty, what you see is a computer get better or stronger just based off of, they'll give it a bonus as to how much gold it collects when it accumulates something. It's just hacks. But if instead what you can do is define it in terms of how capable it is as an agent, and that capability adapts to what your strengths are, so if you're doing really well, they'll just keep upping the difficulty until you're in that sweet spot of just a little bit past what you can do, but if you strive hard, you'll get there. I think that's gonna come, that's gonna come pretty soon.

John McDonnell: Yeah. So, like, I play online chess and [00:15:00] I think the thing that is very cool about it is that you have your Elo rating and you're always paired with people who are a good match. And it was always disappointing. Like, I remember as a kid when I realized that in Civilization, deity mode or whatever, all that was happening was the bots just could build every building in three turns or something. I was really disappointed.
Like, I thought, oh, the bot's gonna be super intelligent. It's gonna outsmart me. And that's not actually how it works. And you just kinda have to figure out what hacks you can do to exploit its dumb behavior. Yeah, it's a completely different idea to actually make it be really smarter.

Cinjon Resnick: Yeah. Yeah. I think places you'll see this first are things like FPSs. In chess you could do this right now. You could train an AlphaGo to have any Elo rating you want. Yeah. And then just park it on the server and have it be available to play. I dunno if anyone's actually done that though.

John McDonnell: I actually think so. So I for sure heard about people building chess bots that are intended to have a certain Elo rating. And then I believe actually that on chess.com, some of those bots are actually specifically trained to behave like a human would behave if the human had that Elo rating.

Cinjon Resnick: Yeah.

Bryan Davis: I think some Go servers also have [00:16:00] similar bots that are out there at different levels with different sort of training and background. So very interesting to think about. I recently read the paper about Diplomacy, the Cicero paper from Facebook, which was talking about the integration of large language models into this sort of social strategic game. And that was one of the most fascinating examples that I've run into in recent history of an integration of a very complex social game with a strategic engine. I'm very fascinated to think about what is the sort of next version of this. Are there environments where we could let these things loose to learn from it proactively? A lot of these things, especially the strategic engines, AlphaGo and these other sort of game engines that are winning these strategy games, rely on self-play. Hundreds and hundreds of games to be iterated upon in the background, playing each other.
And I'm curious, do you think that in a social game, or a game that is almost dependent on relationships with humans, do we run into an issue where self-play becomes ineffective [00:17:00] because we can't actually mimic human behavior, we can't mimic human adaptability?

Cinjon Resnick: What is the world where, what is the game you're thinking of where you need to do that? And I guess I'll point out my example, being AlphaGo: once it passed human capability, it kept getting better because it was now competing against its own population.

Bryan Davis: So the example of Diplomacy, I think, is somewhat interesting because it is reliant on human communication. It's reliant on interpretation and alliances being effectively formed. And so perhaps there's a category of games that do have this sort of unbounded social nature. When they used self-play in the context of Diplomacy, they found that... a large part of Diplomacy, first of all, takes place over messaging, basically convincing people to ally with you or to invade another country on your behalf. And so that requires that you're able to be persuasive. And when they instituted self-play in this system, they found that there was a tremendous amount of semantic drift, where system one and system two were [00:18:00] communicating with each other and they were beginning to use nonsensical language. And so that seems to be a limiting factor on how well a computer can do in a sort of social setting or a setting where a computer needs to be persuasive. It seems like there needs to be some sort of anchor to the real world.

Cinjon Resnick: Yeah. Yeah. So it's been a while since I've been involved in this research direction. I would say the thing that comes to mind is called other-play. If you haven't seen that, I would look into that. So other-play is work out of, I think it was also actually originally at FAIR, but I associate it more with Jakob Foerster and his lab.
The idea behind other-play is that you want to train agents that can work not just with themselves, but with other agents. And so the goal the whole time was to be able to train agents that play Hanabi with humans at a very high level. And so the algorithms that they come up with around this, even though they're playing self-play, need to [00:19:00] be able to work with humans too. And they actually do a pretty darn good job. So I think that there's a lot of room for algorithm improvement when you go in those directions. The challenging part is always going to be to keep the human connection available there. I think, though, there's another question that's built into what you're saying, which is, can you make an algorithm that doesn't actually work with humans, that is, agents getting better and better, but still human-interpretable as to what they're talking about, right? So it doesn't necessarily need to be playing with humans, but needs to be talking in a way that humans can understand, or that's what we would want. And one question here is if it's even possible, because maybe what they're saying looks interpretable to humans, but actually has codes underneath all of it. And so that's, I think, an open question. I don't actually know of research that has addressed any of that, but I would expect that to actually happen, that once it surpasses the human, trying to understand the strategy [00:20:00] involved is too difficult. And at some point it's going to be so difficult that we're just going to let it happen anyways. We're gonna let it happen because the results are so good. And you can take that as foreboding, even if I'm not.

Bryan Davis: One of my favorite conjectures that John has about the future is this world in which there are just natural language APIs to the universe. So basically, every sort of site or service has a natural language API where you state your intent and it is able to perform the actions.
And you can have, obviously, these APIs that are beginning to interact with each other, just like a large API server, but they're interacting with natural language. But what I think is interesting in that context is what happens when natural language ceases to mean what it means to us, when these bots start to communicate in their own version of our language, one that means very different things.

Cinjon Resnick: That's right. Yeah. I love that direction, that emergent communication, and one of the reasons I wanted to do a PhD was to study that area. And I think that there's [00:21:00] a lot of fanciful things that we can come up with in that domain. It's just hard to then ground it in a real, actually useful thing to do. And we've seen now the rise of agents that we can talk with. Three years ago, we used to call these chatbots, and now we don't. We've forgotten the word chatbot. Instead we just go ChatGPT or GPT-3 or the coming one from Google, whatever. But they work now, and they work in a way that eschewed all of the purpose that was going on with emergent communication, but maybe it's time to bring it back in. And I would love that. I think that'd be amazing. I also wanna bring up something else I think is interesting in this direction. And that's its sort of connection with hallucination, hallucination in terms of these big language models. So my friend Colin has this interesting take. He says, bzip, it's pretty hard to imagine these neural nets being more compressive than bzip. And bzip is roughly N over four in terms of using float32. So if you have the size of your language, just [00:22:00] 25% of it, that's roughly bzip. Let's just say that's a floor. Okay? So then, we're gonna move on from what Colin's point is there, but that's our floor.
Now, if you imagine all of the internet that's generated, it's much larger than the size of these models. So if you imagine stuffing all of that into these models, it's not gonna be able to, in the same way that you and I, when we go around the world, we can't stuff everything into our head. We have to compress a lot. We have to figure out how to make it compositional, but it's not gonna get below what bzip is doing because it's not even caring about making...

John McDonnell: You can do it if you're lossy, right?

Cinjon Resnick: Yes, that's exactly it. You can do it if you're lossy, and that's where hallucination comes in, because we don't give it any faculty for knowing that it doesn't know something, and we require that it generates something. The only answer is that it's a lossy hallucination. It has to be. And if you were to instead figure out some faculty for having a reliable communication channel that [00:23:00] lets it say, I don't know what this is... sorry, if you don't do that, you're not gonna fix this problem.

Bryan Davis: So what do you see as solutions to hallucination in the short and medium term?

Cinjon Resnick: I think the first question is to ask yourself if you need a solution. 'Cause a lot of times maybe you don't need a solution. Yes, you're gonna need one if you're trying to do something that's a legal obligation. If you require that this thing is airtight in that domain, then you need a solution. But in many places you don't. So ask yourself, really, if you do. The second answer is, at some point we need to teach it, or it needs to emerge, because that seems to be the flavor of the day, to emerge a property of understanding what it doesn't know. And there are places where people are working on that, but even the directions, say, involving knowledge bases inside these things, you know, people have been doing this for a while.
It's not like the last year and a half was the first time that we started to understand that this stuff can hallucinate. People have been working on summarization for decades, extractive or abstractive summarization. This is not a [00:24:00] new topic. We don't have an answer even if you include a knowledge base, because the network, sorry, the model may have a concept that this knowledge base exists without being able to actually point to the fact that caused it to understand something. In other words, I don't think there is a solution right now. And I think you have to just deal with how much you want of it and then otherwise form the right gates, form the right playpen for your users or whatever to play in.

John McDonnell: Yeah, it's interesting, your point about the fact that it can't say that it doesn't know. It makes me wonder if you could instruct-tune them or something to be able to ask the follow-up question, how confident are you, or were you sure about that, and to have it reliably give you a reasonable response or that kind of thing.

Cinjon Resnick: It's unclear, though, that would help, right? Because it's the same problem. At its core, it just doesn't have the ability to do this, again, unless something emerges that's different. But we haven't seen that. Instead, I've seen really good evidence to suggest that [00:25:00] what it has figured out is the ability to follow your intent, the conversational partner's intent. So if Bryan's talking to this agent and it knows what he's looking for is some answer along these lines, like, why do you not know what you know? What do you mean, what I don't? Is it because you are trying to track this fact? Oh yes, that's why. And then you roll the dice again and it says, oh no, that's not why, it's actually this. But it seems to have some understanding of your intent, of where you're going with it.

Bryan Davis: Yeah.
That's interesting when I phrase this is they seem to be very agreeable and I, yeah. Yes. I suppose that's because a lot of the to your point earlier, like these are trained on data that exists, not data of just denial of existence. And I think it's interesting, it's perhaps an interesting point to think about. The negative case of not knowing is not represented very well in the data that it's trained on. Because the [00:26:00] overwhelmingly the internet is full of information. Even if that information is false, it's not full of people. Or I guess we have an underrepresentation of questions not being. because the questions that it's being trained on are content. So it's almost as if we have this bias towards the things obviously that do exist, the training sample that do exist, and perhaps there's a, there would be a benefit to generating false or negations as part of its, as part of its training sample. One other strategy in this domain that I'm curious to hear feedback on is relatively annotation heavy. And that would require basically taking the input of something like Wikipedia and annotating it as requiring citation or being basically, Labeling as this particular statement coming from a, needing a source or being an example of something that is a timely, factual piece of information and thereby [00:27:00] perhaps teaching a model in its process of training that it needs to basically inject some citation or inject some sort of timely fact. And knowing that and being able to output that as part of its response to then be filled in by so we can imagine, for instance, a tag that indicates a timely fact or a sort of citation needed that's actually in the data as it's ingested. That's just one strategy to throw out there. I understand that it might underlie some of the experimentations with the FLA model from Google. But curious if you have any reactions to that sort of strategy or Cinjon Resnick: others. 
I think that it's a great strategy for targeting it towards your use case, if you care a lot about that. Let me put it this way: the model size isn't changing, so you still have this more meta limitation around whether you can actually put all the data you need into [00:28:00] this thing. It's comparable to you as a human. Actually, it even has fewer parameters and abilities right now than you as a human, but at some point it'll be comparable in terms of the number of parameters in its head. And you yourself can't remember everything that happened. There's just too much data out there. So the answer must be that it has to compress it in compositional ways, and then use those compositions to meld into new concepts and then explain them. But even when we do that, we still don't remember facts, because there's too much information there and it's too long-tail. I don't anticipate that changing. Why would it change? Bryan Davis: What do you see as the limits of AI-driven storytelling? What's the boundary in its capacity to proactively create? Cinjon Resnick: That's interesting. In the long term, I don't think that there is [00:29:00] a bound; it's not bounded. I think that it'll gain all the faculties that we want it to have. Today, one way that it's bounded is definitely in the empathy and understanding of what's going on. If you say, "Okay, you're playing the role of a teacher for a five-year-old," it's not going to remember the entire time that it's playing with a five-year-old child. That's one thing that comes up. But at some point the child is going to say something, and the model is not going to know that the tone means something. There's no ability to take that in. If you say the kid was excitedly saying it, are you saying it in the right way, the way that we gauge it? 
There's a lot of lossy information there in how humans receive empathy and give empathy to get to where the child is at. And I expect that'll be true for us as well. Put that back to adult language learners. When I've taken language classes [00:30:00] or just had one-on-one experiences with teachers, they have this ability to automatically slow down the way that they speak when they figure out that you're not thinking about the right thing, or they can stop and say, "Oh, you didn't get that word, did you?" On the flip side, they can speed up when they recognize, "Oh, you're fine. You've got this. Let's go faster." It's not going to be able to do that automatically. So there's going to be this little bit of extra friction every time you use it, where you need to account for that with design. I think it's possible, and it's a very interesting journey in the next 10 years, getting from here to the next step. John McDonnell: How would you try to get it there? Cinjon Resnick: The answers that come to mind feel like a combination of getting the data and getting the right design today. And also, we need to reduce the latency in things like speech conversations. In a past life, I worked a lot with audio data. If it's 16 kilohertz, then you're talking about a tremendous number of samples per second. Models today can deal with that, [00:31:00] but it's a whole other modality compared to text, which is much fewer words per second, because each of them contains a lot more information. So if you want to get to the experience that we are having right now, where I'm talking and you immediately understand it, because there are no extra steps from taking this audio to text, to sensory, to reasoning, and then reasoning back out to text, that pathway needs to be smoothed. 
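To put rough numbers on the audio-versus-text gap described above, here is a minimal back-of-the-envelope sketch. The 16 kHz sample rate is the figure mentioned in the conversation; the speaking rate of 150 words per minute is an assumption added purely for illustration, and the function names are hypothetical.

```python
# Back-of-the-envelope comparison of raw audio samples vs. spoken words.
# SAMPLE_RATE_HZ comes from the conversation; WORDS_PER_MINUTE is an
# assumed conversational speaking rate, not a figure from the episode.
SAMPLE_RATE_HZ = 16_000
WORDS_PER_MINUTE = 150


def audio_samples(seconds: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of mono audio samples in a clip of the given length."""
    return int(seconds * sample_rate_hz)


def spoken_words(seconds: float, wpm: int = WORDS_PER_MINUTE) -> float:
    """Approximate number of words spoken in the same clip."""
    return seconds * wpm / 60


def samples_per_word(sample_rate_hz: int = SAMPLE_RATE_HZ,
                     wpm: int = WORDS_PER_MINUTE) -> float:
    """How many raw samples the model sees for each word of text."""
    return sample_rate_hz * 60 / wpm


if __name__ == "__main__":
    print(audio_samples(1.0), "samples in one second of audio")
    print(spoken_words(1.0), "words in one second of speech")
    print(samples_per_word(), "samples per spoken word")
```

Under these assumptions a model working on raw audio sees thousands of samples for every word a text model would see, which is one way to read the point about audio being a whole other modality.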
And there is some really interesting work going from audio to audio. But most of the big labs are not focused on that, because they're seeing so much power right now go into these straight-up text-to-text models that they're going to focus on them for a while. If you want to get to a place where it feels like real-time understanding, and maybe even real-time empathy, it's possible that emerges from the text, but it's going to do it in a medium that doesn't feel the same as it does with you and me right now, or what happens when you play with a child. So I do believe you [00:32:00] almost surely have to go to an audio-to-audio experience to get there. John McDonnell: It feels like there's almost a multimodality to this, where you can think of the text itself as one mode, and then all the meta-text, the audio information about the way the person's talking, or their speed of speech, or their accent or whatever, is this other stream that you're actually going to want to co-process as you're making judgments. Cinjon Resnick: I agree. Yeah. I think that there are so many interesting questions around prosody that we don't really understand. Sure, there are other fields that look at the affect that comes with different prosody, but bringing that into the end-to-end experience that's in the advances today, we don't have good answers yet. There are teams working on this. There are teams at Google or Facebook that I'm familiar with, and they're not even private about it. They're fairly public about the fact they're trying this, because it's all early research. John McDonnell: So Bryan and I, for our hackathon project, made a voice [00:33:00] chatbot that you could call on the phone. That's awesome. And honestly, I think the coolest thing about this project was having that experience of talking to a bot by voice and then seeing how it's cool and also how it's broken. 
When we first turned it on, we used Curie as the model, and the response time was about a couple hundred milliseconds. It really felt like Curie was listening to us and then answering back. It is really magical just to have the bot talking in a conversational flow and cadence that matches yours at least a little bit. But Curie was difficult, because Curie hallucinates a lot. It's fun that Curie hallucinates, actually, but I had weird conversations with it where it told me there was a terrorist attack going on and stuff, and it's like, OK, I can't really ship that. But it was real, and it was creepy. There's a freakiness to that. So then, to get reliability, we switched to Davinci, and then it was a three-second lag time or something, and you just kind of feel like you're giving instructions to [00:34:00] Alexa or something. Yeah. Magical. And of course, to your point, man, if you could get that Curie speed and then also have the bot be attentive to your prosody and the other aspects of your speech that are reflecting your state of mind, I could imagine that, even if it was just not very smart, being really magical-feeling. Cinjon Resnick: Yeah, it really would. And that doesn't even account for the TTS and ASR on the other side, the text-to-speech and the speech recognition on both sides of that. That probably adds another few hundred milliseconds each way, at least. Yeah. 
Bryan Davis: This is, I think, evidence of how amazing it is to be a social animal and to have a brain that is capable of interacting in real time: perceiving, understanding, reacting, all happening so quickly. It speaks to the fact that our compressed representations of the world are extremely efficient, [00:35:00] both from a memory standpoint and from a computation standpoint. From where I'm standing, that seems to be one of the main limitations in our current understanding of how AI will progress: we are very far away from being able to represent the world in an efficient way that will allow for real-time communication, real-time speech, real-time video, that sort of thing. Cinjon Resnick: You're right, and it's tempting to make predictions that this is very far away, though we've seen things move fairly fast sometimes. The stuff that's coming out with respect to music is really incredible, but it's also not real-time. Maybe one benchmark to consider here is whether you can do simultaneous translation. People care about having simultaneous translation. There are big companies that care about it, because it means you don't have to take translators with you. There are large organizations that care about it, because the UN can then have a much more efficient experience [00:36:00] on their floor. But wow, is that a hard problem. The idea that I can be talking right now and that there's someone right next to me, or in terms of the order here, maybe someone right below me, who can, with only half a second or a second of delay, be translating the concepts that I'm saying. It's extraordinarily different from what machine translation does. Machine translation is trying to do almost a sentence-by-sentence experience, but here it's more of a conceptual experience, in order to make it fast enough. 
And we do not have any good solutions for this. I think this is probably akin to all of the problems that we've described here with respect to understanding empathy, et cetera. Bryan Davis: What do you believe are the constraints on solving that problem? Do you think it's our understanding of model architecture? Is it a hardware issue? Do you feel like any of these things will be breakthrough points? Cinjon Resnick: I think it's largely data. We just don't have it. We have tremendous amounts of data for doing machine [00:37:00] translation. For doing simultaneous translation, we have the UN; UN data might actually be good for this kind of stuff. I don't know how much. Bryan Davis: Tens of thousands of hours of recorded UN simultaneous translations. For context here, I used to be a translator. I was never a simultaneous translator, but I was a... what's the other variety? I can't recall. But basically I took part in meetings with lawyers, translating back and forth between lawyers and clients. This was a profession I was pursuing, and I was fairly close with some people who did become simultaneous translators. It is amazing to think of the sort of computational training involved: they are enhancing one specific part of their brain to be able to instantly code-switch in their heads at the speed of human language. It's very unique, and it requires years of training to get right. Cinjon Resnick: Yeah. It's wild what happens in that, when they're working on it themselves. Bryan Davis: It's like practice. It's like somebody trying to become a concert pianist: they [00:38:00] just perform and perform. Of course, they're also working to close gaps in any vocabulary that they might be missing, and to become domain experts in the fields in which they really want to concentrate, whether that be politics or economics or specific business experience. 
So there's vocabulary acquisition that goes along with that training, but a lot of it is sitting in a booth and doing the work over and over. Cinjon Resnick: That's cool. Yeah, I really respect that a lot. Tens of thousands of hours sounds like enough, but I don't know. I've not worked on the problem. I haven't really thought about translation seriously as a research endeavor in four years, but I do perceive that it hasn't reached a place where people could say, "Hey, this is almost ready to tip over. Let's now just add compute to it." There's no service that offers this. There's nothing that's good enough. John McDonnell: Switching gears a little bit, one thing that I really wanted to ask you about was this world of multi-agent RL models. You've done some work in this, right? [00:39:00] It's funny: now, in the Bay Area, there's all this excitement about LLMs, and OpenAI's brand name is just infinitely high. And of course they started off doing a lot of these multi-agent models. My impression of how this went was that the transformer paper came out, then they built GPT, and then they realized, "Oh, this is amazing. We're going to pivot to really focusing on this." But I guess, what has become of that multi-agent work? What were they hoping to get out of it? Did it just not work? Are other people achieving their aims? What's the state of that field? Cinjon Resnick: Yeah, also a good question. I don't know their motivations in particular. I will say that there was a long period where people thought that the way to get to general intelligence was through RL, that the reward function was the most important thing, et cetera, and that you can learn everything through the reward function. 
[00:40:00] Theoretically, that remains true, but in practice it appears to not be as important as having a ton of data and a simple enough objective that still works for what you need, which is what the language models have. With respect to where multi-agent stuff is happening, FAIR is still doing quite a bit of it, as you can see with Cicero, led by Noam and team. Then you have DeepMind, which of course has a bunch of people still working on this stuff all the time. I saw they recently put out a paper that was really interesting, the AdA one, which is all about adaptive learning and being able to do it with a small number of samples. A lot of this work is now building on top of foundation models and then adding RL to it, which is what's happening with RLHF as well. I think that for the near future, a lot of the core multi-agent RL type stuff is going to be relegated to academic [00:41:00] labs. I don't know how much is going to happen, because everything is just super hot right now in working with foundation models and then pushing on that. There's also this feeling in academia that more and more people do the thing that's hot. It's pretty common. And then every once in a while you have this offshoot that comes around and pushes things forward. It's surprising, it works, and then it takes over a little bit more. There are labs I can point to that will continue on this path, because it's not going to be run over by the computational steamroller so much. There are important problems to think about, say around cooperation, like involving humans. It's rare that you're going to be run over by a computational steamroller if you have to involve humans in the loop. It's just too hard to do it. Now, maybe RLHF will lead to some route where you can have people who [00:42:00] every second are updating something, but you're going to have to have a huge team doing that. 
And there's a bet here. Say, the Foerster lab at Oxford is kind of making a bet that this is going to continue to be the case and that there are important problems to solve. Frankly, right now it just looks like all of research is being dominated by this stuff. My old advisors at NYU are certainly seeing that too. And every once in a while you see something on Twitter, a comic or something, about how all of ML research is now being taken over by these things. That said, I think there's one direction which maybe answers your question positively, and that's robotics. OpenAI doesn't really do robotics anymore. They stopped because it's a different use of the resources that isn't going to scale as well as everything they're doing right now. There are teams that are focusing on robotics. There's a robotics team at DeepMind, a team at Brain. There are teams all over the world that are focusing on robotics still, and they have to bring together so many different parts of the stack. So they're [00:43:00] starting to use LLMs to guide the progress of the robot, to make it do things that are human-controllable. The paper called SayCan is an example. They're also starting to do immense amounts of simulation, using all that data, and figuring out how to do that in a proper way. So we're going to start seeing many more papers come out with a high amount of simulation and then a little bit of sim-to-real, using the fact that these LLMs have so much understanding of the real world. That's really cool, and that's happening as well. So in terms of multi-agent RL, I think you're going to start seeing it seep into robotics more than it has, because some of the other problems that they've been focusing on will be easier to address. 
Given the LLMs. John McDonnell: It does actually remind me of how... so I'm so old that I had the opportunity to take Yann LeCun's class in, like, 2010. And I remember thinking, [00:44:00] "Oh, I know neural nets aren't cool, and he's just so obsessed with neural nets. I'm just going to go take a class that's doing PAC learnability with SVMs and stuff." And that's what I did. Huge regrets. It was obviously, in retrospect, a dumb decision. But Yann really had to fight through, and Bengio, these people who kept working on neural nets. The field really abandoned that direction, other stuff got trendier, and they had to just say, "I'm just going to work on this anyway." And in the end they were right. I do kind of wonder about a lot of this multi-agent stuff, or RL: the trend is going away, but you can see what the potential is, and some of the people who just really stick with it might end up being right. Cinjon Resnick: That's causality today. Oh my gosh. I think of causality as being that today. Look, Facebook just dropped their causality teams recently. It was part of the firing. If you want to have something that will push these things to the next level, it's having causal understanding. But we don't have really good ideas of how to bring causal [00:45:00] understanding into neural nets. A lot of work on it. Bryan Davis: Cinjon, can you take a moment and define what causality is as a field of research? Cinjon Resnick: Oh, just ask me to define causality. What is causality as a field? I'll start by saying that there are really good conferences for causality. There's also a part of it which is in the fairness conferences; it collides somewhat with that direction. And I'll put in a quick pitch here for Jonas Peters's work. He's an amazing professor in Zurich who's been doing this stuff for a while. 
Christina Heinze-Deml, Martin Arjovsky, David Lopez-Paz: these are really good researchers in AI. What you're looking for here is the ability to give the model a causal understanding of what it's doing. There are some toy examples you could throw out. One of them is if you have a data set that has a really weird distortion around, say, women mostly having blonde hair and men mostly having brunette hair, [00:46:00] and then in your test set it's flipped around. The models will tend to do a correlation there, and if they see something with blonde hair, they'll predict it's a male. Whereas what you really want is for it to have an understanding that hair does not predict gender, or sorry, hair does not predict sex. One way you could think about this is that the causal link is broken there. In terms of the graphical model, it would look different if sex was predictable by hair color. So that's the toy problem that people oftentimes use for this. You can go even more toy by just using some Gaussian models and then making predictions about that. And we just really don't have good ways to scale this up. For that exact toy problem, I can point to a solution. Martin's got a great one in his thesis for how to solve that one. But scaling it up to full data sets, [00:47:00] real data, et cetera, doesn't work. And going from A to B on this is really important if you want these things to actually have some sort of core understanding of what they're doing. I think that there's this general hope right now in the field that when you go from 200 billion parameters to maybe 2 trillion parameters, it just solves it. It just happens. But there's no scientific reason to think that's true. Interesting. 
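The hair-color toy problem described above can be sketched in a few lines. This is only an illustration of the failure mode, not the solution from Martin Arjovsky's thesis: the 90/10 split, the deterministic data construction, and the "majority label per hair color" model are all assumptions made up for this example.

```python
from collections import Counter


def make_split(n_per_class: int, n_blonde_women: int):
    """Build (hair, sex) rows; men get the mirror-image hair distribution.

    The counts are hypothetical and chosen only to make the skew obvious.
    """
    rows = [("blonde", "F")] * n_blonde_women
    rows += [("brunette", "F")] * (n_per_class - n_blonde_women)
    rows += [("brunette", "M")] * n_blonde_women
    rows += [("blonde", "M")] * (n_per_class - n_blonde_women)
    return rows


def fit_majority_by_hair(rows):
    """A purely correlational 'model': predict the most common sex per hair color."""
    counts = {}
    for hair, sex in rows:
        counts.setdefault(hair, Counter())[sex] += 1
    return {hair: c.most_common(1)[0][0] for hair, c in counts.items()}


def accuracy(model, rows):
    """Fraction of rows where the hair-based prediction matches the label."""
    return sum(model[hair] == sex for hair, sex in rows) / len(rows)


# Training data: women are mostly blonde, men mostly brunette.
train = make_split(n_per_class=100, n_blonde_women=90)
# Test data: the same features, but the correlation is flipped.
test = make_split(n_per_class=100, n_blonde_women=10)

model = fit_majority_by_hair(train)
train_acc = accuracy(model, train)
test_acc = accuracy(model, test)

if __name__ == "__main__":
    print("learned rule:", model)
    print("train accuracy:", train_acc)
    print("test accuracy:", test_acc)
```

A model that only latches onto the correlation looks good on the training split and does worse than chance once the correlation flips, which is the gap a causal understanding is supposed to close.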
And so I think a lot of this has been forgotten, because the work is just working so well right now. And sorry, by forgotten I just mean it's been put to the side. Bryan Davis: Yep. So what we're talking about here is perhaps an embedded notion of how the world works, perhaps an idea of internalized physics or an understanding of the structure of the environment in which it's being trained, rather than just [00:48:00] correlations about entities. To be fair, LLMs really seem amazing, but they are at their core really just predictions of the next token. Cinjon Resnick: Yes. And what you're pointing at is something a little different as well. What you're pointing at is having this embedded world model and being able to condition on some world model. That's different from having a learned-from-data causal understanding. There are different things we can point to and say which is better, which is worse. The RL world will oftentimes say that what you actually want is these world models. That's not always true, either. You're talking about model-free versus model-based here. But in terms of causal understanding, I think what you're saying is a great next step, if we can get to the point where we can use these world models in a reliable way. A-plus. Involving physics in things, A-plus. Awesome. But what we ultimately want is for it to be able to causally learn from data. When you go about the world as a human, you can learn that this mirror sits upon this desk. If the desk moves out of the way, the mirror will fall. It's not [00:49:00] clear at all that we bring into that experience any sense of this world model of physics. Instead, we just have some causal understanding that this desk is upholding this mirror. John McDonnell: Is it even, conceptually... this actually seems philosophically difficult, right? Was it Hume who had that thing about how causality just can't be inferred from data? 
Because you're really reading into your data. You're saying, "Oh, I've seen this correlational structure before, but really there are these rules underlying it. Some things that I see are because of the rules, and some things that I see are due to randomness in the environment, and I'm going to go through and decide which things are which, and infer this rule set, which I'm then going to believe. But even my rule set might have problems, so I also have to have uncertainty about my rule set." Right? This almost seems like a philosophical problem. Bryan Davis: I guess to some degree it's also empirical, because I do believe there's evidence that some sort of core understanding of physics is baked into the model, like pre-baked [00:50:00] into the model. So it's not being derived actively from interaction with the world. It's very interesting to me to think about what elements of this are hardwired in the circuitry and what elements are learned through interacting and getting that feedback from the world. Cinjon Resnick: Yeah. These are the questions. With respect to the philosophical thing, another one you could ask is: do we even need it to be causal, or can you just have strong enough correlative things that it actually ends up being fine, and there's no real issue? We don't have the answers to this, and we also don't have the answers to what humans are doing with this. I have a suspicion that if you want to get to a place where you can have reliable answers come out of a model as to what it knows and what it doesn't know, you want it to have some underlying facility for doing this. Perhaps that facility is not purposefully done. It's not built in as a prior in the model. Perhaps it just emerges. But you need to be confident that it exists, and it doesn't exist today. 
[00:51:00] But the research into how it could exist, that's what I mean by the field of causality. Yes. That's a really interesting question. Bryan Davis: Perhaps we could wrap up with a question about a recommendation for listeners: something that you've read or are reading or have been watching, or a game that you're interested in sharing. What's a takeaway from our conversation that you'd recommend to somebody? Cinjon Resnick: I think there are cool book recommendations, clearly things like The Diamond Age and Ender's Game, which we discussed earlier. Another one along those lines, which I was reminded about by a friend recently, is Rousseau's take on education, Emile. Those are some pretty clear ones if you want to think about this direction. With respect to the causality stuff, Martin Arjovsky's thesis is fantastic. Very approachable. Yes, there's a lot of math in it, but even if you want to ignore the math, it's very approachable regardless. John McDonnell: [00:52:00] Those are great. I also just wanted to ask, what's next for you? What do you want to build or research next? Cinjon Resnick: Yeah, it's fun. I'm right now figuring out exactly where I'm going to commit, but I'm spending a lot of time thinking about childhood companions and how to use the modern tooling to really make something that can grow with a kid. Let's just imagine you have a child, and when they're five or six, you get them to love an experience, an interactive companion, and you grow with them over time. I think this will properly ride the wave of research, such that the clean ones around understanding more memory, understanding more empathy. I think that what you would have from this is the ability to form a lifelong companion. If the kid forms a lifelong companion, it's something that can really help them a lot. And all the tooling is on... it's very cool. John McDonnell: I really love that idea. 
I feel like I want a lifelong companion. [00:53:00] Bryan Davis: Thanks so much for being part of Pioneer Park. Cinjon Resnick: Pleasure. Thanks. Appreciate it. John McDonnell: Thanks so much. This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit pioneerpark.substack.com [https://pioneerpark.substack.com?utm_medium=podcast&utm_campaign=CTA_1]

23 Feb 2023 - 53 min

The frontiers of clinical AI with Vivek Natarajan

Check out our interview with Vivek Natarajan, a member of South Park Commons and coauthor of the recent paper on Med-PaLM, an adaptation of large language models to the medical domain. Topics: * From India to UT Austin to FAIR to Google * Integration of AI into products * Organizing research orgs in large companies * Applications of AI to medicine * Med-PaLM and the limitations of LLMs * Risks and rewards of AI-driven products Links: Med-PaLM paper: https://arxiv.org/abs/2212.13138 Follow Vivek on Twitter here: https://twitter.com/vivnat Follow your hosts: John: https://twitter.com/johnvmcdonnell Bryan: https://twitter.com/GilbertGravis Interview Transcript [00:00:00] Hi, I'm Bryan. And I'm John. And we are hosting the Pioneer Park Podcast, where we bring you in-depth conversations with some of the most innovative and forward-thinking creators, technologists, and intellectuals. We're here to share our passion for exploring the cutting edge of creativity and technology, and we're excited to bring you along on the journey. Tune in for thought-provoking conversations with some of the brightest minds of Silicon Valley and beyond. Bryan: Welcome to today's episode. I'm Bryan, and this is John. 
And today we're joined by Vivek, an AI researcher at Google who has been working on translating and adapting AI for use in clinical medicine. He's a co-author of a recent paper from Google about Med-PaLM, which is an adaptation of the PaLM model from Google to the domain of medicine. We're looking forward to talking to Vivek about all the ways that these models are powerful and useful in select domains like medicine, and also their limitations. So we're looking forward to [00:01:00] talking about spoonerisms, confabulation, and hallucination, and how all of these words apply for the purposes of AI. Vivek, welcome. Vivek: Hi Bryan. Hi, John. Excited to be here and talk all things AI and medicine. Bryan: Cool. Yeah, welcome. So, Vivek, just to ground your background: you did your undergrad in India, then you went to UT Austin, and then you came out to the Bay Area after finishing your master's degree. Is that correct? Vivek: Yeah, that's right. Bryan: Cool. I think we may have overlapped in Austin. I lived there for a number of years, and I miss Austin a lot of the time. But I'm curious to hear about your own migration as you gradually made your way west to California. Vivek: Yeah. I think Austin's a beautiful little city, and I think, Bryan, you wouldn't disagree with me if I say that. The school, UT Austin, adds to the charm as well. For me, coming from India, which is a warm-weather place, moving straight to Texas and Austin, which was equally warm, was good. And I enjoyed the scenery over there. It was a very welcoming environment, I would say, for graduate students. I [00:02:00] was also transitioning my major from electrical and electronics, more hardware, over to more computer science and AI. And UT at that time felt like a very good place to be. It had a number of good professors who were doing some amazing research in natural language processing, computer vision, graphical models, and robotics as well. 
So yeah, I really enjoyed my time over there. Bryan: Awesome. And you find yourself now working at the absolute frontier, I think, of artificial intelligence at Google. Tell us a little bit about your experience. How did you find this domain of medicine as your application of interest? What drew you to it? Vivek: Yeah, it's a funny story, because even before deep learning, when I was doing my undergrad back in 2010, 2011, in the final year of my undergrad, we were asked to do thesis projects or pitch ideas, and the idea that I pitched together with my team was actually an AI doctor. At that point in time, deep learning wasn't a common term. It wasn't invented. So my presentation decks all had support vector machines and all those kinds of things. But I still believed in the potential of the technology, because it was very clear that if we did not have tech [00:03:00] and AI scaling medicine, we were not going to be able to scale world-class healthcare to everyone. It was quite obvious to me even back then, especially coming from a place like India, where the medical facilities aren't the greatest. It's getting there. There have been massive improvements over the last decade, but reaching the remote villages and towns is still a huge challenge, I would say. It felt very natural to me that tech and AI would be the place to be. So at the back of my mind, I think that was always an area that I wanted to work in. And obviously I had a huge interest in machine learning and AI back then. I remember back in undergrad we didn't have the best of internet connections, so with whatever bandwidth I could secure, I would download these courses from the Caltech professor Yaser Abu-Mostafa and learn about machine learning. It wasn't taught in our curriculum back then, so it was all on the sidelines, but that drew me into the field. 
And so when I came for my master's, I wanted to take as many machine learning courses as possible. And when I joined the industry full-time, again, I wanted to do machine learning. And I got really fortunate that as soon as I came out of grad [00:04:00] school and went over to Facebook, it was when Facebook AI Research was started and I got this incredible opportunity to work at the intersection of research and product. So I had this nice role where I could take the latest and greatest models from FAIR and put them into production. I learned a ton over there. It involved learning all the machine learning frameworks of that time, Torch and Caffe. Not easy to use by any means. But it was fun, and getting them into products with millions of users, that was incredible learning. And when you work at that low level, you learn all the details of these models, both at training time and at inference time, and you learn about optimizing. And so it was lucky for me in the sense that I got this opportunity as someone who was a relative nobody who had just taken a few courses. I did not have a PhD, but that just set me up back in 2014, 2015. And yeah, I would say I have been very fortunate to work at the frontier of AI and deep learning since then. Bryan: That's awesome. John: Yeah. And so you have worked at both Facebook and Google. How are the cultures of those two companies [00:05:00] different and how would you describe them both? Vivek: Yeah, it's a great question. I think the answer that I would give with respect to Facebook is also kind of outdated because when I was at Facebook, it was still, you know, a few thousand employees. FAIR was just getting started, and maybe you could even think about it as a research lab or a startup within this big company.
And my experience over there, yeah, it was incredible, because there are all these stories about, you know, Mark Zuckerberg having the AI team sit right next to him. Those are all true. We used to sit right where the exec team was, and we would be observers to all the meetings and everything that was going on over there. And at the end of the day, sometimes he would just pop over and ask questions. And he was also doing some side projects at that point of time. I think back in '15, his project was like an Iron Man kind of thing, a speech recognition system or voice command system that could do house tasks for him. And so it was kind of fun that the systems that he was using at that point of time to build this voice-controlled assistant were actually the speech recognition systems that we were building at FAIR and [00:06:00] Facebook AI at that point of time. And so, yeah, high visibility, I would say. But at the same time I think Facebook is one of those very interesting companies, at least back when I was there, where it did not feel like a big company. It felt like a startup, with the culture of a startup. The leadership did an exceptional job in scaling it. And even until 2017, 2018 there weren't, you know, as many processes or as much bureaucracy in terms of getting things done. Yeah, the goal was always to, you know, just ship things. If you have an argument, don't waste time on quantification; rather, just build and ship and show me that things work, and then, yeah, that's it. And so I really enjoyed that culture where it was all about shipping. So I would say that was probably the best part about Facebook. If I were to say what wasn't great, it was that sometimes you get so lost in the weeds and details that maybe you don't zoom out and look at the big picture.
You may be micro-optimizing for certain metrics and you just keep on moving fast without, like, considering all the potential issues that might crop up as you're making advancements on certain [00:07:00] technologies or building certain things. And so that has obviously manifested itself in different ways since then. But yeah, I think that environment was really awesome for builders in the period that I was there, and especially for AI researchers as well, who were really well taken care of and provided with all the resources. I remember there's one particular interaction between the CTO of Facebook and Ross, who's a very famous computer vision researcher, the builder of some of the greatest object detection systems. It was at one of these gatherings that Schrep came up and he basically said, "I would literally sweep the floor for you to do whatever you want." And so that was the level of privilege or access to resources that you had as an AI researcher at Facebook, at least when I was there. Bryan: Is there a difference in the way that, I guess, you feel AI has been integrated into the products at Facebook versus at Google? It sounds like perhaps Google, and I realize that you're still there so we can't just dive into the weeds, but I'm curious, just at a high level, do you feel that one has been a little bit more strategic or meticulous [00:08:00] about the choices of when to adopt systems, and Facebook maybe shooting a little bit more from the hip? What would you say maybe is a thematic difference in the way that those companies have integrated AI into their products? Vivek: Yeah, it's super interesting. For a long period of time while I was at Facebook, the FAIR computer vision team was probably among the best. It still is the best, but there were many other areas where it felt like we were playing catch-up to Google.
So I was at Facebook when TensorFlow was released, and I remember one of the most amazing machine learning framework hackers just saying, "Oh, this is something I wanted to build, but it would've taken me like another year. And if I were to ask a fairy or a genie to bring me something and put it in front of my house, this is what I would want." And there were similar reactions when, for example, the Transformer paper came out. We were all shocked: oh my God, how well does this work? And at that point of time, I don't think people realized how important this Transformer paper was going to be. But still it was quite obvious that this was going to change things. So for a long period of time at Facebook, it almost felt like [00:09:00] we were catching up to Google. All the inventions were mostly happening there, except maybe in computer vision. I think that has maybe changed a little bit now. FAIR has its own amazing research in a few areas. I think some of the work on proteins that has happened is really awesome. And also the work on embodied agents and Habitat, the environment. I think that's all really cool. With respect to product integration, yeah, I think one of the cool things was that there was this explicit goal of moving as fast as possible from research to production. So I remember one of these conversations where I think it was probably Soumith Chintala, someone who would say, "I want to take the model from here and put that into production in two weeks," and who actually went out and executed and enabled that. So when some of the Mask R-CNN models came out, I think in three to four weeks it was in one of the internal build demos. So that was really, really cool. And so Facebook had, at least on the computer vision side, built up this mechanism or way to productionize things really, really fast. And I [00:10:00] think PyTorch and Caffe both played a huge role. PyTorch more recently.
But Caffe's role should not be underestimated. Then. Who there at along Andrew? They were all incredible, and I think Facebook was visionary on that front. TensorFlow is great, but I think the life cycle from research to production is probably longer than, say, with PyTorch, at least the version of PyTorch that I'm talking about back in 2017, 2018. And with respect to computer vision, Facebook had that incredible advantage where they could, you know, immediately ship things to the app straight from whatever Kaiming He, or Ross, or other people at FAIR were cooking up. And I think that has slowly caught on in other areas as well, with speech, with NLP. But that I thought was incredible. That was visionary. John: Yeah. That's really cool. 'Cause actually a question I was gonna ask, and maybe that partly answers it: it's a challenge at these big companies when you have a kind of skunkworks team that's doing very cool research and then you have a completely different product team that's like, okay, I'm in charge of serving up a great news feed or something. And then how do you move the insights and innovation from the skunkworks [00:11:00] to that production-side team? How did that actually work? I mean, obviously from a technical perspective, it sounds like PyTorch made it easier. It's easier, but it's not free. And then there's also kind of a communication challenge, 'cause it's like, oh well, what should the PMs be building? And if you're in a research team you might not exactly know what the product really needs, and vice versa: people on the product side might not realize the impact they could have with research. I'm curious about how that's been solved at places you've worked. And I'm also kind of curious about your perspective on how it should work.
Vivek: I think it depends on the goals of the research organization and the wider AI organization as well, right? So, if the charter of the AI organization is to actually serve the bottom line of the company and be integrated into products as soon as possible, then yeah, for sure, I think you need to be organizationally aligned with this goal and build up everything, the communication systems, the infrastructure, everything, to ensure that you can rapidly deploy and get feedback and improve the models as much as possible. And I wouldn't say this was uniformly happening throughout Facebook, but the computer vision team was, I think, very unique in that [00:12:00] sense because there were very good relationships between the FAIR researchers and the applied ML teams as well as downstream customers. And they were all pulling together in the same direction in the sense that they all wanted to have the latest and greatest advances shipped in the apps as soon as possible. But in maybe some other areas like, you know, News Feed ranking, I think it took a few years, for example, to transition over from some of the logistic regression models, or even for ads, to deep neural nets. Obviously they were huge wins, huge lifts in metrics, but it wasn't like two weeks, it was more like two years. And I think that is a mixture of how the organization is set up, as well as that in some of these areas you are a little bit more reticent to try new stuff because there's risk associated with it. Whereas some of the computer vision stuff, for example, was more playful features or features where if you go wrong, it's okay, right? Whereas I think if you go wrong with your ads, that hurts your bottom line. So you probably don't wanna screw that up, and so you really wanna be meticulous.
So at the end of the day, I think if you really care about getting your innovation to people as soon as possible, then I think at all levels of your organization, you need to be aligned. And one [00:13:00] thing that really helped was, I think, that leadership was great. I think they finally also re-orged so that research and the applied ML teams were all reporting to the same VP, Jerome. And so I think that also really helped. So yeah, I think you have to be very intentional about your organization if you want to move fast and deploy. And I think Facebook got that right for a long period of time. Bryan: And on that topic, I'm curious: we're talking about an article where the application is medicine, and obviously what's at stake when you're giving people medical advice is, you know, equivalent to if not far more serious than anything having to do with the company's revenue. What is the kind of ambitious context that maybe caused the research group at Google to pursue the application of medicine? What do you think was the sort of pie-in-the-sky goal? Vivek: Sure. I think Google has always been at the forefront of medical AI. Along with, say, the Transformer paper and TensorFlow, one of the papers that I was really, really inspired by was this computer vision paper from my current teammates which showed that you could detect diabetic retinopathy from fundus images as well as [00:14:00] eye care specialists. And so that I would say was a very big personal moment for me as well, in the sense that it really showed me that with AI we can do some amazing things in medicine. And it goes back to the same story where it is very obvious that healthcare systems worldwide have different sets of challenges. But one of the key solutions to these challenges is tech, and AI in particular. So.
In developing countries there is just a shortage of specialists and care providers, and probably the best way for us to be able to scale world-class healthcare to everyone is through AI. And in places like the UK and the US it's more that we do have providers, but their time is occupied not in providing care, but in everything else around it. And they are experiencing levels of burnout never seen before. And again, AI is the solution to help them have a much better experience in providing care. So yeah, at Google one of the great things is that the [00:15:00] investment in medical care has stayed consistent or increased over a period of time. Different efforts have been made. And that is really inspiring for me. And I would even go out and say that probably the most important application of AI is medicine. And in the next decade, we are going to have a transformative impact using AI in medicine. John: I mean, one thing I was curious about is, you know, there's that challenge within a company of, okay, how do you get the research moved into production. In medicine it's actually way more complicated, right? I mean, you have this diverse group of healthcare providers, you have certain companies that are investing deeply in medical AI, or researchers that are investing in medical AI. And I'm curious, you've kind of got this front-row seat: what does that process look like? How does the research turn into clinical practice? Vivek: It's definitely an interdisciplinary process, right? I think you can't just have engineers and machine learning scientists working on this. So if you look at [00:16:00] our team, we have expertise. We have some of the best clinicians in the world, and it's not just from the US but also the UK and Australia and a bunch of other places. We have people who have expertise with respect to regulation, who have worked at the FDA or equivalent bodies elsewhere. We have legal folks and everything.
And so you need all those perspectives to come in just because of the nature of the field that we are in. And I think it's a mix of what are the most interesting things that you can solve in this space, as well as, okay, we have this technology where we have this unique advantage or this superpower, and how do we make best use of that? And so at the intersection lies the magic, and that's why we kind of focus on, okay, finding out what are the most interesting problems that you can solve with this technology that we have access to. And that's how we generally end up picking the problems to solve or whatever projects we work on, and generally you are looking for the biggest impact that you can make. And so the kind of diseases that you go after, if you look at it, they're, you know, diabetes or cancer or neurological diseases, [00:17:00] which probably have the highest footprint across the world. So if you make a dent over there, then the quality-adjusted life years that you can improve, that's significant. So that's how we end up choosing our projects or the work that we do. John: Yeah, well actually, here's another way to put it. For example, if you think about the work that's already been done in AI with medical applications, what are some of the big wins so far, and how did those get into clinical practice? Vivek: It's a great question. I don't think I would be wrong if I say that actually the promise of AI in medicine has not really translated into real-world applications. There have been tons of research papers. I think there's a 150x increase in the number of research papers here in the US since 2016 in the medical AI field in particular. But if you look at, say, the number of clinical trials, that's lagging behind. More recently there have been quite a few FDA approvals, especially in radiology, for using AI applications.
But I would say with respect to the research and the promise and the hype, the [00:18:00] translation hasn't necessarily been there. The ones that are maybe most prominent so far have been in medical imaging. And I think that's probably due to the paradigm of AI and deep learning that we've been using so far to build medical AI systems, which is still based on supervised learning and acquiring large amounts of data. And computer vision, at least until, you know, GPT-3 came out, was probably the most advanced field of AI, and medical imaging had this nice cleaned up... okay, not necessarily cleaned up by natural image standards, but generally you had data numbering in the millions from different hospitals, probably easy to homogenize and clean up. And so it was very well suited to the supervised learning paradigm. And so that's why you saw a lot of activity and momentum and applications in the medical imaging slash computer vision space. And so, I would say that's probably the most advanced. We've seen applications in radiology; you know, there are different startups doing breast cancer detection models and lung cancer detection. And then other modalities like the ophthalmology modality that I talked about, like [00:19:00] diabetic retinopathy and a bunch of other eye diseases that you can predict from these kinds of images. Dermatology: I think there are a lot of startups that are building apps that can diagnose skin conditions from smartphone images. Ultrasound is another important modality that's becoming prominent just because of the cost-effective nature of the sensor. And so you can do a lot of interesting things with it. For example, recently from our team at Google, we showed that you could predict gestational age from ultrasound and you can do it very accurately.
And so this is cool because it's a cheap sensor and it's a cheap model that you can put on the edge and give to community health workers, and you can empower them so they don't need to have access to an expert. Overall I would say medical imaging has probably been the one field in medical AI that has had the most advances with respect to the research that has been done, the number of papers, and also the number of products that are going through or have gone through FDA approval. So that is there. I think EHR is another modality where people have been trying to come up with operative insights from your EHR data, typically in hospital settings like the ICU. [00:20:00] But one of the challenges is if you work at a typical ICU... we have this recording at Google which shows what it feels like to be in an ICU setting. And there are thousands of buzzes, right? Every minute you're bombarded with notifications and everything. It's really, really challenging. And so if you have an AI system, you don't want it to add to the noise; rather, it should give you a very unique insight. And that I think is still challenging. So I would say the applications on that front, like using EHR to predict tests or medications or predict some interesting stuff like sepsis monitoring from records or something like that, I think that hasn't been that successful just because of the nature of the problem and also the workflow. So it's important that you not only consider how good of a model you can build, right? I think the key aspect is also to consider the workflow that the model will sit in. You can have a very amazing model, but if it is inappropriate in the workflow, then it's [00:21:00] not gonna be helpful at all. And I think that's the real challenge.
Like, if the research is done without, you know, accounting for the perspectives of doctors or people who are actually on the ground, then you're gonna miss out on this insight. A lot of research that has been done to date has probably missed out on this insight. And that includes things like, for example, selecting the operating point of your model. You want to ensure that you send the right number of alerts or notifications or do the right number of recalls, because with anything less or more, you're adding more burden to the system rather than actually helping out. Bryan: You know, we think about the application of models and one thing that's a bit of a takeaway for me is we often need a fairly risk-tolerant or fault-tolerant setting, because we need to be able to, you know, ascertain when the models are making mistakes and we need to be able to offer points of intervention and confirmation from professionals who are practicing. I'm curious, if we're thinking about the balance of the opportunity to improve people's health versus the risk of making wrong decisions: specifically for medicine, we often have a very conservative threshold [00:22:00] and a conservative approach to this, where we are very, very risk averse and not very opportunity seeking. And that might make sense in an environment like the United States where, as you mentioned before, the established system functions more or less, in that people are able to get healthcare. And that's probably true in much of the industrialized world. But I'm curious if you think that in countries where the medical infrastructure is less established, there's a benefit to being more opportunity seeking, even if that does potentially raise risks or the risk of making mistakes is sometimes higher. Vivek: Yeah. Great question. It's a hard one as well, right? I am all for more medical AI, but you want to be responsible as researchers.
And so if, for example, you built a model using only data from Western institutions and you're gonna put that in, say, a place like Africa or India, it's pretty obvious it's not going to work. And that's irresponsible on your part. So if you've done the legwork where you've actually built a model, [00:23:00] sourced diverse training data and actually validated in the appropriate settings, and you've seen that it works, then yeah, for sure, we should, I think, maybe dial down our risk aversion a little bit more and be more proactive in terms of deploying these technologies. Yeah, with every opportunity comes responsibility. And there's no free pass. I think you still have to do a good job at validating, but I'm with you. I would say there's a lot more opportunity maybe beyond the US or places with more established healthcare systems in terms of deploying these systems ahead of time and getting feedback data. And it's quite possible that you might actually see these countries adopt AI faster and actually leapfrog in how care is delivered in these countries. It's the same with, for example, the financial infrastructure, right? So 10 to 15 years back, I would say China and India were lagging behind the US, and credit cards were dominating in the US, but now I feel like the US is further behind. I haven't been to China, but have heard stories, and in India we don't have credit cards, but it's all digital. And the ease of [00:24:00] doing transactions, including micro-transactions, is orders of magnitude higher than in the US. And so this is an opportunity for these countries as well, I feel like, by adopting AI, to have an improvement in their healthcare systems. And maybe they go even above what's available in Western countries. And I can totally see that happening. Bryan: I'm realizing that we've gotten this far in our conversation without describing what Med-PaLM actually does.
So maybe for listeners out there, we should ask: what exactly is Med-PaLM doing? How do you interact with it? How does a researcher interact with it, given that it's not open to the public? Vivek: So I will start off by giving the motivation for this work. Obviously large language models have been the rage in the wider AI community. And in medical AI in particular, to date, if you look at a lot of the models that have been developed, those are all narrow, single-task supervised systems. But on the other hand, medicine is an inherently humane endeavor where language is at the center, facilitating interactions between patients, between clinicians, between researchers. [00:25:00] And generally if you ask clinicians or patients who interact with medical AI in different settings, one of the chief complaints or concerns would be, "Oh, I wanted to better understand the model, I wanted to interact with it more. But all this model gives me is a prediction with a probability, and I don't understand why the model is giving me this prediction." Right? So that needs to be solved if you want broader uptake of medical AI, and that can be solved through language-based interactions, and that's what language models help us to do. So that was, I would say, the chief motivation of this work, along with the fact that obviously there is this cool technology and we have access to these models. And if you look at the work in general, we have considered a broad variety of medical question answering tasks. These include medical exam tasks, medical research questions, and also consumer medical question answering. And we wanted to benchmark and see, okay, how effective are these models in these different potential end-user applications? And so the target user could be a medical student, could be a medical researcher, could actually be a consumer who has a medical information need. [00:26:00] So that's where we started. That was the motivation.
And we had access to this model called PaLM, which is amazing. It's not open sourced. I don't think it's going to get open sourced. But yeah, the paper is a way for us to communicate and get feedback as to what we are doing. And I would say the results that we report with respect to performance on certain datasets are maybe not as important as, say, the evaluation benchmark that we are setting up, or the different axes that we propose to evaluate the answers. And I think this is an iterative process which involves multiple rounds of dialogue between not just AI researchers, but also clinicians, social scientists, ethicists, and even patients, because ultimately, at the end of the day, I think you require participation from all these folks if you want to really advance and accelerate the adoption of these technologies and models in medicine. And if you leave out even one community, then that's going to come back and bite us. And so we wanna ensure that, okay, this paper is not meant as "we have this fancy model"; it's more like, you know, we have this model, and these models are going to come, so now let's build them out in the right way so that they're applicable to all. John: [00:27:00] I find it even interesting, kind of the approach of training a model on medical data as opposed to, say, giving PaLM access to a bunch of information that it would read and use to answer questions. And something that's kind of interesting there is the distinction between approaches that involve embedding a lot of information in the model parameters versus having the information be external in some way. And how do you feel like that's gonna end up coming together to make systems that are really useful and robust? Vivek: I think it's always going to be a mix of both. It's this classic system one versus system two thinking debate that goes on, right?
I don't think it's one versus the other; rather, it's a mix of both. Large language models are more of the system one kind, but I think over the next few months, over the next year, what you're going to see is more retrieval-style models, which are going to allow you to do more system two style thinking and inference. I think with these models, obviously when you are training on an internet corpus, internet text, there's obviously gonna be medical content in there in different flavors. Some of it may be accurate, some of it may not be. But the model has seen this. So that's [00:28:00] good, because out of the box we do see that these models can answer, and they do understand medical terminology. So if you ask a model, "Okay, can you explain this condition?", yeah, it gives a decent response. But the challenge is really that medicine is an evolving field. And so there's always new research being published, new guidelines being published, and so we want to feed that context information into the model, and then teach the model how to use that context or additional information and integrate it with what it already knows, which is encoded in the parameters of the model, and then come up with the appropriate responses. And so, yeah, it's gonna be a mix of both. I don't think it's one versus the other. John: Yeah, I'm just wondering if there's a way of thinking about it. Like, one way you could think about it could be, oh, should I think about it like the vocabulary? Like, okay, I really need the model to have the right vocabulary and I can't just teach it vocabulary in context very well, and so that's what I'm getting out of tuning it on the domain. Or is there more to it? Are there different types of reasoning that happen in a medical domain? I mean, I know people have had this theory that, oh, maybe chain of thought comes from code, and so training your model on code is important for that.
You know, I'm just kinda curious [00:29:00] what sorts of things you feel like the model's really getting from the fine-tuning that you wouldn't be able to get from, say, context? Vivek: So I wanna clarify that the amount of fine-tuning that we do with the model over here, the Med-PaLM model, is actually not that big. We're using on the order of a few hundred examples and we are doing prompt tuning. So it's not even the end-to-end model that's fine-tuned. It's just these additional soft prompt parameters that we learn. And our hope was that doing this would help condition the model. One of the assumptions that we have is that a lot of the medical data is already encoded within the parameters of the model. But then at test time, we want to do two things. One is actually to point the model to use that information. So this is like looking for a needle in the haystack. The model knows about science, it knows about, you know, random stuff on the internet, but you want to condition the model so that, for this set of questions, it's: use your medical information, use your clinical knowledge. Yeah. Right. And so that's one of the things that this helps with. And the second thing is that in the medical domain, there is a very unique way of [00:30:00] answering things. There's a very unique way of reasoning about things, and we also want to encode that information as much as possible in these soft prompt vectors. And yeah, it's possible that there's a limit to what you can achieve with these soft prompt vectors, because at the end of the day, it's still like a few hundred tokens, millions of parameters. But at least the impression that I get is that you can get the model to understand the style of the domain. So if you look at the responses the model generates, it's not overconfident; rather, it's more subdued. It clearly says: this is what I know; for anything beyond this, you should probably go and seek specialist care.
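To make the prompt-tuning idea Vivek describes a bit more concrete, here is a minimal toy sketch in plain Python. It is not the Med-PaLM setup: the "model" is a made-up frozen linear readout, the data is random, and every name (`FROZEN_W`, `forward`, `train_step`, `run_demo`) is hypothetical. The only point it illustrates is that a handful of learned soft-prompt vectors are prepended to the input and updated by gradient descent while the model's own weights stay untouched.

```python
import math
import random

DIM = 4          # toy embedding width (assumption)
PROMPT_LEN = 2   # number of learned soft-prompt vectors (assumption)
FROZEN_W = (0.5, -0.3, 0.8, 0.1)  # stands in for frozen model weights; never updated


def forward(soft_prompt, tokens):
    """Frozen toy model: sigmoid of a fixed linear readout over the mean vector."""
    seq = soft_prompt + tokens  # the soft prompt is prepended to the input embeddings
    mean = [sum(v[i] for v in seq) / len(seq) for i in range(DIM)]
    logit = sum(w * m for w, m in zip(FROZEN_W, mean))
    return 1.0 / (1.0 + math.exp(-logit))


def mean_loss(soft_prompt, data):
    """Average binary cross-entropy over (tokens, label) pairs."""
    total = 0.0
    for tokens, y in data:
        p = forward(soft_prompt, tokens)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def train_step(soft_prompt, data, lr=1.0):
    """One gradient step that changes only the soft-prompt vectors."""
    grad = [0.0] * DIM  # in this toy model every prompt vector has the same gradient
    for tokens, y in data:
        p = forward(soft_prompt, tokens)
        scale = (p - y) / (len(soft_prompt) + len(tokens))
        for i in range(DIM):
            grad[i] += scale * FROZEN_W[i] / len(data)
    return [[v[i] - lr * grad[i] for i in range(DIM)] for v in soft_prompt]


def run_demo(steps=100, n_examples=200, seed=0):
    rng = random.Random(seed)
    # A few hundred made-up examples; label 1 means "answer in the target style".
    data = [([[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(3)], 1)
            for _ in range(n_examples)]
    prompt = [[0.0] * DIM for _ in range(PROMPT_LEN)]
    before = mean_loss(prompt, data)
    for _ in range(steps):
        prompt = train_step(prompt, data)
    return before, mean_loss(prompt, data), prompt
```

Calling `run_demo()` returns the loss before and after training along with the learned prompt vectors. In a real system the frozen part would be a full language model and the gradients would come from an autodiff framework; the sketch only shows which parameters move and which do not.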
And it also learns to trim down the length of its responses, because it knows that anything extra that it's uncertain about could be incorrect, and that could have downstream consequences. So those are the kinds of stylistic aspects of the domain that you can also encode. I think that's what is probably happening more: it's not knowledge that's being encoded, it's more like conditioning to work well within the domain.
Bryan: I'm curious, are there any techniques that you are really excited about [00:31:00] for grounding things in external information? So for instance, teaching these things to basically not rely on their system one thinking, but to kind of know, oh, I do need to go fetch this, I do need to go look this up and verify that. Are there any approaches to that that you are particularly excited about, or that you think are going to make progress on these problems?
Vivek: Sure. I think there's been a class of models which point in this direction: WebGPT, RETRO, and a few others. And there have been demos from a few startups, Neeva and Perplexity as well, which are going towards using search as additional context to answer questions, and then also citing and attributing the sources. And so there are a few different approaches, but I think overall they're kind of all the same: retrieve the right information, feed that into the model, and let the model integrate that information with whatever is already encoded in its parameters, right? I think the cool part is, it seems to me that teaching this kind of behavior to these LLMs is probably quite data efficient. You don't require a lot of examples. It [00:32:00] seems like even with maybe a few hundred or a few thousand examples, you can teach the model to learn this generalized behavior. So that seems pretty cool. And so this goes into tool use, right? Search and retrieval is one of the tools that's in the model's repertoire.
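[Editor's illustration] The "retrieve the right information, feed that into the model, cite the sources" loop described above can be sketched without any model at all, since the retrieval and prompt assembly steps are plain text processing. This is a minimal sketch under stated assumptions: the corpus, the document ids, the word-overlap scoring, and the function names (`retrieve`, `build_grounded_prompt`) are all hypothetical and are not taken from WebGPT, RETRO, or any product mentioned in the conversation. The resulting prompt would then be handed to whatever language model is in use.

```python
import re

# Hypothetical mini-corpus keyed by source id.
CORPUS = {
    "guideline-2023": "Updated guideline: drug A should not be combined with drug B.",
    "review-2019": "Drug C is commonly used for seasonal allergies.",
    "note-misc": "The clinic is closed on public holidays.",
}


def tokenize(text):
    """Lowercase word set; a deliberately crude stand-in for real retrieval."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, corpus, k=2):
    """Return up to k source ids ranked by word overlap with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(text)), doc_id)
              for doc_id, text in corpus.items()]
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [doc_id for score, doc_id in ranked[:k] if score > 0]


def build_grounded_prompt(query, corpus, k=2):
    """Assemble a prompt that carries the retrieved text and its source ids."""
    doc_ids = retrieve(query, corpus, k)
    lines = ["Answer using only the sources below and cite them by id."]
    for doc_id in doc_ids:
        lines.append(f"[{doc_id}] {corpus[doc_id]}")
    lines.append(f"Question: {query}")
    return "\n".join(lines), doc_ids


if __name__ == "__main__":
    prompt, sources = build_grounded_prompt(
        "Can drug A be combined with drug B?", CORPUS)
    print(prompt)
    print("sources:", sources)
```

Keeping the source ids alongside the retrieved text is what makes attribution possible afterwards: the answer can be checked against exactly the passages that were supplied.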
But you can also imagine this being generalized to, say, any expert in the loop. And that expert could be a human in the loop, or it could be another AI, or it could be anything else. It could be a calculator, for example. For me, the most exciting part is that it feels like this kind of behavior is learnable without a lot of examples, and it also generalizes. But I think, yeah, we need some research papers, or maybe someone at Google or OpenAI will publish this. But I feel like that's one of the cool things that's coming up right now.
John: When you're choosing your instruction examples, are there domains where you expect this to be deployed that you're disproportionately representing or choosing from? So it's like, we don't literally need the model to take the MCAT, right? I mean, it's impressive if it's good at the MCAT, but we kind of want to potentially deploy this in a clinical setting. I mean, are you imagining doctors wanting to double-check their [00:33:00] understanding of a certain condition, and so they're going to potentially ask the model, "I was wondering if this medication interacts with that medication," or, "I'm doing a diagnosis here, but I have a kind of strange combination of symptoms. What do you think?" What specifically are you imagining the clinical application dialogues to be?
Vivek: Yeah. I think there are a fair few intended potential applications over here. Probably the ones that we'll see the earliest are more like educational aids for researchers and students and trainees. I think we are already seeing evidence of ChatGPT being used for educational purposes. And I've actually learned a few topics just by interacting with it. And you can imagine this happening in the medical domain quite a lot, especially with a model that's specialized to that.
So I feel like those sorts of use cases, which are non-diagnostic, and that means also not safety critical, are going to be the first that we'll see, and probably that'll happen in a few months. The second set of applications is around aiding researchers and scientists with respect to information retrieval and citations and similar stuff. I think that could also be incredibly [00:34:00] powerful. If I am writing a paper and a model could retrieve 10 more related papers with respect to this paragraph, that'll make my job really easy. Because right now, every time before a paper deadline, the mad scramble is to get your references right, and it takes a few hours. I think a model that can do that and summarize it will be amazing. I think that'll be a game changer for researchers, and not just medical researchers, but all kinds of researchers and scientists. I think the final set of applications you would see are more in clinical settings. And again, I think this is going to be different. There are going to be certain applications and clinical workflows which might involve extracting information from notes or different documents in clinical settings and summarizing them, either to patients or maybe to the doctors or people who are working in those settings, and just giving them a very simple, intuitive interface to the data under the hood, and not the clunky systems that they have right now. Again, that I feel could happen in a two-year timeline. You could see a lot of different applications. I think we are already seeing a lot of interest from healthcare companies in using these models to do such things. And also there's a lot of [00:35:00] documentation that clinicians generate that's all fairly templatized, all non-diagnostic. Those can also be automated, with an LLM-generated summary, or a prescription, or a medication authorization letter, or a referral letter.
So again, those sorts of applications are completely non-diagnostic, and it's totally possible that those things happen within, again, a two-year timeframe, if not less. Diagnostic is further down the line. And I think the first set of applications we would see would be where there is a human in the loop, a clinician in the loop. And it would be more like an information aid system for them, where they have a chat interface to a database. Similar to Google search, but a more interactive, conversational system where they can ask about interactions of medicines, or an interface to EHR records. And I think those are the kinds of applications we would see first. And I think ultimately, down the line, we'll have more diagnostic systems where it's going to be an AI and maybe a clinician, or an AI alone, coming up with a diagnosis based on all the context information. But that, I feel, is further down the line. And we have all this research. My hope is this all gets translated very soon, but I feel like that's probably a few years [00:36:00] away, just because of the number of challenges that we have to solve before we get there.
Bryan: Back in 2020, and I'm pulling from your tweet history here, you shared an opinion that for some of these applications, the publicly accessible LLMs might be doing more harm than good. And I think at the time you probably had in mind the potential to be a source of misinformation, to be a source of deep fakes, all that sort of stuff. I'm curious if you're still thinking about these things. Obviously, since then ChatGPT has completely exploded, and there's a brand new generation of interest in the applications of these models. Do you still feel that sort of sense of caution? I mean, obviously Google has been very cautious about releasing any of its LLMs to the public.
There's obviously a very storied history of Microsoft and Facebook having to take models offline because of how quickly they became negative. What do you think about the potential of these things to be open to the public as APIs?
Vivek: It's interesting, and I'm glad you pulled that out. I would say I was a bit naive with that. I was assuming the [00:37:00] worst, and maybe that hasn't necessarily happened. Stable Diffusion is a very good example of that: a model that's out there openly, and people are using it mostly for creative applications. I haven't heard horror stories or anything about that. And so that does point to a future where maybe these models can be open. And honestly, I would love for these models to be open and democratized. But it would be nice to assume everything is good and everyone has good intentions, and that's not true. And so it's very important that you consider what can go wrong, and different organizations have different levels of risk tolerance. Maybe if you're a startup, you don't worry about that so much, because you're not going to be a legal target. But if you're a big tech company, obviously you have to worry about it a lot. So yeah, I've been very pleasantly surprised by how Stable Diffusion has gone and how GPT-3 has also been put to use. But maybe that also has got to do with the fact that these models... yeah, I mean, it's all over your Twitter feed, my Twitter feed, but that's a very small fraction of the people who interact with the tech or are on the internet. It's still [00:38:00] probably like 0.1% or even less than that. So it's not a mass adoption or a mass feature just yet. Yeah, it's hard for me to predict how exactly someone in India, or in some other part of the world, who is maybe five years behind where we are, would use these technologies, and it's very likely that people are all going to put this to amazing uses.
And I hope that is the case, but we need to also, at the same time, be aware of what can go wrong and build tools and systems to ensure that that happens as little as possible, so that we can be more open and democratic about these systems, because these are amazing. The more people we can get these into the hands of, that's amazing. Actually, just maybe one final point over here. I actually don't know how things are going to evolve, because it also feels like there are these two competing forces. One is the OpenAI model, or maybe you can even say Google's model, where the model is centralized, sitting in some server or in some cloud somewhere. And then if you look at Apple, on the other hand, they're trying to put Stable Diffusion kinds of models on the phone. And so those seem to be two competing trends with respect to how these models are going to [00:39:00] evolve. LLMs are a different beast. With Stable Diffusion, I didn't expect that the model could be compressed and put on phones so quickly, but that did happen. But with LLMs, I think it would be a little bit more tricky to do that. And so it may also come down to which technology wins out, because if you can have a personal LLM sitting on your phone, then that's really cool, and that's another way of democratizing this technology and giving access to more people. And that might end up happening ultimately.
John: Something that's interesting about Google's deployment strategy is, they've been very public actually about what they're doing. So they have these papers. They have not released parameters, which is pretty understandable for most models, with a couple of exceptions. I'm kind of excited about Flan-T5, for example. It's cool that they released its parameters. They haven't really released a product in the way that, say, OpenAI has.
How is Google hoping to have a big impact in the world with the approach they have of not really releasing models, either for inference or the parameters?
Vivek: I think it's hard to predict how things would [00:40:00] evolve. But if you look at it, OpenAI has also not released any of their model parameters. It's an API. I would say it's very hard to predict. I think what big tech companies in general have is distribution. Yeah. And so perhaps what they're all going to be looking at is, how do we integrate it into our existing sphere of products and just make them more delightful and more magical for people to use? And that might mean a different strategy for Meta or Microsoft or Google, because they all own different kinds of surfaces, different kinds of products. Cloud is a different question over here. I think people would be hoping that these models stay more centralized and you have a lot more cloud customers, and that's probably a very natural evolution of cloud. But I don't know if that will necessarily play out, just looking at how Stable Diffusion has evolved. I think what we need to watch out for is how quickly there is a ChatGPT equivalent in open source. And if that comes out very soon, before, say, a GPT-4 comes out, then I think the trends are kind of obvious. But what that might also trigger is maybe [00:41:00] OpenAI would want to talk even less about its research and be more secretive. And that's not great for all, and that might further slow down the open source application. But open source is an amazing thing. I mean, this will be people just coming together and creating. So it's hard to predict open source dynamics really well, but I think that's the one thing that I will watch out for: how quickly do we get a ChatGPT equivalent?
And if that comes out rather soon, performing as well as, say, whatever we have right now, then I think that changes the calculus for everyone. So people are, at this point in time, still not sure, and it's mostly a wait and watch game for everyone, not just Google, but for all.
Bryan: So you've been able to play with Google's internal tools, and you've also obviously played with ChatGPT. I just want to know, from your subjective opinion, which one's cooler. But also, after you answer that question, I want to know what brought you to SPC. How did you find yourself becoming a member of SPC?
Vivek: I think ChatGPT is awesome. For me it was one of the most magical experiences that I've had with AI. So I was working on [00:42:00] conversational AI five years back. And one of the projects that I was tasked with at that point in time was to build a system that can help you set an alarm. And what that entailed was me writing out thousands of rules and thousands of different ways in which someone can say, set an alarm. And one night I just said, I don't want to work in conversational AI anymore, because it's not going to scale. Yeah. And so if at that point in time you had told me that in five years we're going to have this kind of system, I would've said, you're kidding me. And so for me, just thinking about where we were as a field, and NLP was far behind computer vision at that point in time, to where we are right now, I think this is one of the most incredible advances that I've ever seen. I can't really compare with Google's systems, but I can just say that it is incredible, and I hope to see more and more of these systems. And with respect to SPC, no, it's just an incredible community, I felt. I've always wanted to be a part of SPC. I knew people who are at OpenAI or DeepMind. A lot of the people who were very early into deep learning and AI back in 2016, 2017, I know that they have SPC connections.
[00:43:00] So I've always found the community interesting and exciting. And so that was one of the motivations: just to meet more interesting people and share knowledge, learn about what people are doing, and also be exposed to opportunities. I've had this pretty incredible opportunity to work with a few non-profits. One of them that comes to mind is Rocket Learning in India, which is trying to scale education, primary care education, to school children. And through SPC I got connected to them, and I've been advising them on using AI in their product stack. We've been trying to use that for grading assignments, but we want to do more personalized content generation and curriculum generation and so on and so forth. And again, just similar to medicine, I think AI is going to have a huge impact on education, maybe even sooner. So those sorts of opportunities, where you have this domain expertise that you have built, and you can share it more freely with people who are trying to do some incredible things in the world. I think that's one of the unique value props that SPC has, where people are trying to do amazing things and you can tag along with the journey. And it could [00:44:00] be directly as a co-founder, or it could also be more indirectly, where you're an advisor. And sometimes all you need is maybe just a few hours where you just say, oh, you know, take this model and do this thing. And I've had a few of those interactions as well, where people have come back and said, oh, you saved days or months for me. And so those sorts of things. And it's the same in reverse as well. What we call SPC is, this is where you're going from minus one to zero, right? And you're looking for new ideas. And for me, beyond medicine and AI, I am very interested in biotechnology.
And I actually think that the next few decades are the decades of bio and AI, and a few people have said this before: the amount of biological data that we are generating, and how, for example, our sequencing technology has progressed, is progressing faster. For example, I think at all levels of the stack, from single cell data to clinical data to genomics data. So there's this incredibly rich amount of data that's being generated, and biology is messy enough that you can't have hard rules like in math or physics. The perfect description language for that is AI. [00:45:00] And for me, SPC felt like a very natural place to engage and learn more about this field. And I've been fortunate enough to meet a few people who have product expertise in biotech. And so that's been amazing as well. We are putting together a tech bio forum now. It's coming up later this month, where we are going to host a series of talks from researchers, founders, and venture capitalists in the biotech space. And the hope is that SPC also becomes the go-to place for people who are interested in biotech, as much as, say, in AI and crypto. And if that happens, I would be really delighted. I think that would make my time at SPC really worthwhile. And hopefully, with these connections and networks, the day I go down the entrepreneurial path, I don't have to look too far to find a co-founder.
Bryan: We're glad you're here, Vivek. Thank you so much for being part of our show today, and thank you so much for staying and answering some great questions. We'd like to finish things up by asking you for a recommendation on something you're either reading, watching, or listening to. What is one recommendation you'd give people who are listening to [00:46:00] you now?
Vivek: Actually, the book that I'm reading right now is a neuroscience textbook, so maybe I'll stay away from recommending that to... I think our last one...
Bryan: Yeah, I think the last interview may have also recommended a textbook.
Vivek: Principles of Neural Design. That's maybe... but yeah. Yeah, I'm super interested in neuroscience and trying to get inspiration for building more low-power AI systems, because while I work at a place which promotes large models, I like just looking at how the human body is engineered, how low power it is, how efficient it is. I think we can do better. So I'm just trying to get more inspiration. So yeah, but I don't know if that's for a general audience though.
Bryan: That's fine. I think this is an audience of a lot of nerds, so it'll fall on familiar ears.
John: Yeah. Sounds cool to me.
Vivek: It's good to know. Maybe we should do a reading group session for this one. I dunno.
John: Yeah, you should make an SPC forum about neuroscience.
Vivek: That'd be amazing. This is why SPC is awesome. Yeah. Yeah.
Bryan: Well, thanks so much for being a part of Pioneer [00:47:00] Park. We're so happy to have spoken with you today.
Vivek: Thank you so much. This was great. And as I said, the reason for being at SPC is the opportunity to have these kinds of interactions, where we can go deep into certain topics or learn more about stuff. And with the peer group and the peer network, I'm just glad that we have SPC, and I hope more smart people decide to come and join us over here. Thanks so much, Vivek. Thanks, John. Thanks, Bryan. Take care. See you around. Bye. Bye.
This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit pioneerpark.substack.com [https://pioneerpark.substack.com?utm_medium=podcast&utm_campaign=CTA_1]

8 Feb 2023 - 47 min