The reasons to open-source and the future of AI bootstrapping with Tiezhen Wang

Description

Joining me today is Tiezhen Wang (Tom), formerly of Hugging Face, where he worked with researchers in China, Australia, South Korea, Japan and across APAC, to help make open-source models more discoverable, usable, and visible to the global developer community. In this conversation, Tiezhen explains why Hugging Face became the GitHub for models and why open source is not just a distribution mechanism but a different way of coordinating research. We discuss why Chinese AI labs have leaned so aggressively into open models, how DeepSeek changed the commercial logic of open source, and why Qwen, Kimi, GLM, MiniMax, and others are using openness as a way to win attention, recruit talent, and accelerate the whole ecosystem. His core argument is that China’s open-source AI push has three layers. At the researcher level, open source preserves attribution and career mobility. At the company level, open models can become benchmark-led marketing, developer distribution, and a recruiting advantage. At the ecosystem level, government and university incentives are beginning to cultivate open-source culture among younger engineers. We also discuss why US frontier labs have pulled back from openness as research and business have become more tightly coupled, why distillation is much murkier than the public debate suggests, and how DeepSeek’s releases increasingly function as shared R&D for the broader AI ecosystem. The conversation then turns to monetization: why open-weight labs can still make money through API tokens, base-model access, post-training services, and inference optimization. Finally, he lays out his current thinking on AI bootstrapping: the idea that agents may eventually help improve their own harnesses, generate training data, and even improve the models they rely on. We close on a more philosophical question: if a handful of closed labs control access to frontier capability, open source becomes more than a technical preference. 
It becomes a check on the concentration of power.

Tiezhen (Tom) is based in Sydney, Australia. Feel free to reach out to him on X to chat. [https://x.com/Xianbao_QIAN]

To find the previous episodes of Differentiated Understanding, see here. [https://aiproem.substack.com/podcast]

Every episode, I bring in a guest with a unique point of view on a critical matter, phenomenon, or business trend, someone who can help us see things differently. Season two will host a series of guests from early-stage investing, as well as builders, researchers, founders, and product managers. For more information on the podcast series, see here. [https://aiproem.substack.com/p/launch-of-differentiated-understanding]

Chapters

04:07 The Philosophy of Open Source at Hugging Face
12:51 Challenges and Opportunities in Open Source
17:12 The Role of Collaboration in Research
21:50 The Future of Open Source and AI
33:58 What Constitutes Distillation in AI
37:18 Navigating Copyright and AI Distillation
37:43 The APAC AI Landscape: Insights Beyond China
43:08 Understanding the Ecosystem: Labs vs. Hyperscalers
46:21 Monetizing Open Source AI Models
52:02 The Future of AI: Bootstrapping and Self-Evolution

Transcript (AI-generated for reference only)

Grace Shao (00:00) Tiezhen, thank you so much for joining us today. I’m really excited to have you on. We’ve been trying to make this happen for a while and I’m just so glad the timing’s finally worked out. To start, can you tell us a bit about yourself, your journey, and where you’re at right now in your career and how you see the whole ecosystem? And also, just help us understand Hugging Face a little bit as well.

Tiezhen Wang (00:19) Yeah, thanks, Grace, for inviting me. I know, sorry for the long delay. It has been a while, but I’m currently in transition because I just left Hugging Face. So to give you a very high-level overview, you can think of Hugging Face as the GitHub for AI.
If you are not familiar with GitHub, you can think of Hugging Face as Amazon, where you can find all kinds of models in one store. My job is to help researchers get their models, the open source models, onto Hugging Face. They can use all the tools and all the services on Hugging Face to make their models more discoverable and available to everyone. We also offer all kinds of technologies. For example, we allow them to create demos so that developers do not need to download the whole model, and they are able to try it out and see how it goes. We also offer services so you can create your own agent using open source models. We do all kinds of scaffolding on top of open source models. Another part of the work that we do is to help them get more traction. We use LinkedIn. I use Twitter mostly to help them become well known by the public. And we write analyses of their models, letting people know what the new inventions in the model are, et cetera. We work with researchers across the world. I myself am focused on APAC, especially Chinese researchers. Yeah, that’s pretty much the goal, a quick overview of what I do. If you have any questions, just let me know.

Grace Shao (02:03) And how did you get to this role? Because I understand you were with Google for quite a while as well.

Tiezhen Wang (02:07) Yes, I was with Google as an engineer. I worked on ML frameworks. But then we had a bunch of reorgs, and I was assigned to a project which was not open source. But I really like talking to people in the open source world. It’s very different. When you are paid to work on something versus when you want to work on something yourself, you have a very different mentality and very different feelings. So when I was working on the open source machine learning framework, I talked to people outside Google, and I could see the stars in their eyes. They do want to work on something they want.
And even though they may not get paid, et cetera, I really like this feeling. So after I was assigned to the non-open-source project, I wanted to try something new but also in open source, and I was talking to people at Hugging Face and I really liked them. At that time, Hugging Face was not part of the mainstream. It was a niche product for researchers, where researchers could upload models. But I did see there was a huge potential for Hugging Face to grow, because first, I believe in open source, and second, Hugging Face is going to be the entry point where all people will come in and search for open source models. But most important of all, I feel that Hugging Face is a company that understands how open source works. Open source is a huge leverage. If you use it well, it’s going to be very powerful. Hugging Face is like 200 people, a very small company compared to other companies growing up in the same area. But they are able to use open source as leverage, call for collaborations across the world, and do very impactful things. A lot of people, a lot of big companies are doing open source, but they just don’t understand this age. That’s the essence of open source. And I do feel that Hugging Face is doing really well there. That’s one of the reasons why I wanted to join Hugging Face.

Grace Shao (04:06) Yeah, I think that’s amazing. I think that’s something we definitely will double click on later, especially when we talk about why China’s labs seem to have been embracing open source. Just one last question on the whole ecosystem and how Hugging Face fits into it. What was the philosophy really held by the whole company? Because I actually listened to one of the founders’ interviews, Clem’s interview, recently. And during the interview, he talked about how Chinese scientists have always been long-term contributors to open source technology.
And then he said it was really kind of a pivotal moment around 2022, where American open source contributors kind of took a step back and then there was a sentiment shift in the ecosystem. Why is that, and how does Hugging Face view the whole ecosystem?

Tiezhen Wang (04:47) Yeah, there are several questions. Let me try to address them one by one. The first one is the philosophy behind Hugging Face. I think it’s really the mindset. For anything that we see where we can have a collaboration, Hugging Face will just reach out and see if we can collaborate. If you look at a lot of the work released by researchers, they will have a paper on arXiv and also their project on GitHub. And you’ll see me on issue number one of all of these, which is the first issue after the repository has been released. We just write something saying, we offer blah, blah, blah, do you want to collaborate on something? So for anything that we can collaborate on, we will just call for collaboration. Some will go through, some will not. But this collaborative mindset is very, very different from a business point of view. From a business point of view, you will first think: what is my edge, how do I win the market, how do I compete with others, and what are the end areas? After the competition, what’s the end game, how will it go? That’s how you justify the investment and everything. In the open source world, it’s totally different. It’s like, I want to do something. I just say it and I do it, and there are developers who want to join in, and we do it together and we grow the pie gradually. Let me put it another way. If you see an open source model coming from one of the Chinese labs, for example, GLM 5.1 is released, you may think Kimi or MiniMax or other open source model providers in China would compete with them. But actually not. You will see they are commenting on Twitter saying, congratulations, et cetera.
This is a collaborative mindset where everyone is stepping up on each other, and as a group we can continue to push the frontier forward. So I think this is very, very different. Yeah, and talking about your second question, the Chinese, well, I wouldn’t say labs. Chinese researchers, labs, companies, et cetera, they all want open source. I think there are three different folds. The first one is on the researcher side. A researcher would always prefer that their work is open source. That comes from their academic background, because in the CS world, when you write a paper, you have to show that it’s actually working. You have to show that all the numbers are real. Other people should be able to verify that. And you can only do that by releasing your code and releasing your models to the community so that other people can evaluate. So when researchers graduate and go to a company, they bring this mindset forward. By default, they are open source people. Another perspective is their own career development. As an engineer in a big company, it’s very often the case that you are working on some project and nobody knows that you are working on that project until you say that out on the game or on your resume. But open source is very different. We know precisely who has contributed to DeepSeek before. And that’s very attractive for researchers, because if I have done great work, I want the whole world to know that I’m doing excellent work. This will help me have better branding, help me do more collaboration, and help me in future steps in my career. So a researcher would always love open source by default. That’s the first part, at the researcher level. The second one is the business level. While for an individual it is quite easy to embrace open source, at the manager level, at the executive level, they need to justify the investment in open source.
I have to spend tens of millions training a model and you want me to give it away for free. That’s crazy, right? That’s how people thought before DeepSeek. We had a lot of open source models before DeepSeek, but the trend has completely changed. Before DeepSeek, people were thinking, oh, maybe the model is not that good, maybe I’ll just open source it. But if the model is good enough, maybe I’ll keep it private. That’s one of the reasons why you saw a lot of people saying open source is not that good, especially from Robin. And a lot of people do not understand how open source works. But then people did realize that if they do not open source, they do not even have a chance to stand in the market. Because first, their model is not really good. If they just compete on the marketing level, on the business level, they do not stand a chance, not even a chance. So you spend tens of millions and you get nothing. But if you open source, at least you have some sharing and people will remember. And you can also win the market for researchers. I think the Qwen team was one of the first teams to understand this at the business level and start open sourcing their work. And the result is very, very good. They have almost taken the ecosystem from Llama, and now they are becoming the default for researchers to do research, which is huge branding for Alibaba. And I guess if Alibaba wants to do any kind of business, it’s quite easy for them to approach researchers saying, we are not nobody, right? We are the provider of Qwen, and everyone wants to talk with them. Another side for the business is that they find it really hard to attract top talent if they do not do open source, because all these talents want their names on papers, et cetera. They can pay a lot of money, but they still do not have the best talent.
But on the other side, if they do open source and the researchers know that they can come to this group and have their names marked in history, it’s going to be very attractive. So these companies, even when not releasing their best models, try to release something to make researchers happy. It’s kind of like their company perk. So that’s another route. But after DeepSeek, everything changed. People know that if I do open source, I can have huge branding for my company. DeepSeek is not doing any kind of commercial stuff, like alteration to customers. Yet they still have a huge valuation of, I think the most recent number is [unclear: “14 million HKD” in transcript; confirm figure]. That’s a lot of money. So by doing open source alone, they can make money. And that changed the mindset for a lot of people. So after DeepSeek, Kimi, GLM, MiniMax, and StepFun all came into this open source world. And actually, they have had a lot of success stories. For GLM and Kimi, by doing open source, a lot more people understand them, and they have kind of opened up the global market, not just the market in China. For them, I feel that it’s not like losing a lot of money, because they are doing advertisement in a different way. Kimi was spending tens of millions of RMB per year on advertisement. And the result was very short retention. People knew them, came to their side, did not feel any difference, and just moved away. Now the research team, the managers, and the executives know that the best score on open source benchmarks is the best advertisement. So they can concentrate all their power, not wasting it on advertisement, but concentrating all their money and resources on training the best model. The best model itself is the best marketing, and they can create great models and start earning money. So I feel that on the business level, everything starts to make sense. But now there is a new challenge, which is how you can stop people from taking the free ride.
It’s a longstanding problem for open source. I did something, for example, I made a database. I spent a ton of engineering hours. I open sourced it. But I’m not making any money, because the cloud provider is taking that for free and starting to make money by monetizing it. It’s happening in the open-source world as well. I open-source the model and all these inference providers and chipmakers, NVIDIA, AMD, are making money, but not the researcher who created the initial model. That’s why you see some licensing changes and discussion on that. Kimi did the first non-commercial license, and then MiniMax made a more restrictive version.

Tiezhen Wang (13:40) But I don’t think that’s the final version. People are still trying different things. And I believe maybe in one or two years, we will have a more standard way of balancing open source and commercialization, et cetera. So that’s the second level. The third level is the society level. The Chinese government is really encouraging people to do open source. If you do open source, you get extra credits in your bachelor’s education, et cetera. And Shenzhen recently announced a very interesting policy: you can get housing points if you do open source on GitHub. Basically, they are categorizing.

Grace Shao (14:21) So the incentives, yeah, go straight to the students, like even in academia, while they’re still in university.

Tiezhen Wang (14:27) Yeah, so it’s kind of cultivating this open source culture while researchers and developers are still in universities, which is really good. So I do feel that if the culture of open source is winning the young students, we are going to see more open source projects. And to be honest, I do feel that that’s the right approach. Because if you’re not thinking about open source, you are thinking of the traditional way of collaborating with people, which is a company or corporation.
And I feel that the essence of why we had the corporation or company is not keeping pace with how we evolve now. Think about it: you set up a company in Hong Kong 200 years ago. Why? Because you have a group of people. That’s why it’s called a company. You have a group of people and you want them to work together. And how can you make sure that everyone gets their benefits? Everyone is doing a lot of work. Obviously, they want to have a return. And you do that by setting up shares and also a voting system. That’s how a group of people works together. But now the word company has changed. It’s more like a multinational company where the workers in the company have no say in deciding how the company runs. Whereas open source work is more like the original version of a company. You have GitHub, you know who has contributed what. Everyone knows your contribution, and you can have your name listed. A group of people coming from all around the world can collaborate on something. They do not need to be part of a big company, going through the whole interview process. They can just collaborate. So I think that’s very, very interesting. And now with Zoom, Tencent Meeting, and Google Docs, it’s much easier to collaborate internationally. I don’t need to know who is contributing the PR, but I know someone is interested in my project, and we can work together. And I feel that’s probably the future of how people will collaborate. So that ends the last point, on the society level. I think society is advocating for open source, and open source is probably the way society will evolve.

Grace Shao (16:48) Thank you. That is so insight-packed that I have to digest it. You mentioned quite a few different topics, and I can definitely take this conversation in different directions. To start, I have two questions, and they’re actually unrelated. So one at a time.
Number one is you really make a point about China being a really strong advocate of open sourcing LLMs. Could you tell us the history of open source in China in general? Was there a tradition of wanting open source technology even pre-LLMs? That’s number one, the first half of that question. The second half is, you say there’s a lot of incentive for researchers to actually want to open source everything, right? And then therefore they can claim their contribution. Well, in the recent interview between Zhang Xiaojun and DeepMind’s Yao Shunyu, I think maybe you’ve also listened to it, one thing that really stood out to me was how he was saying people need to be responsible. And for someone who’s not technical, I actually really struggled to understand what he meant at first, until Zhang Xiaojun actually asked him to clarify as well. His whole point is that in academia, people are so used to only claiming a certain section of what they contribute. So for example, for a big paper or piece of research, you would take credit for what you contributed, right? And you want to make sure that it’s best optimized, known, heard, seen, whatever, right? Recognized. However, in terms of how an LLM can work properly in the long run, whether it’s further in post-training and further usage and whatnot, it’s important that people don’t claim so much credit for their own part of the work. It’s more important that people work collaboratively. But kind of to your point on open source, that they can work collaboratively and make sure that each piece works together better instead of each piece working best on its own. So it kind of contradicts your comment on why people want open source, because in that sense, wouldn’t it make sense for people to not want open source? I don’t know. That’s another question.
And the third part of this is really, if open source makes so much sense for tech companies and makes so much sense for academics, then why are the American labs so anti open source right now? What is driving that? Is it purely commercial reasons or philosophical reasons? This is very big, but you did throw a lot at me. So I’m going to throw these questions back at you.

Tiezhen Wang (19:07) Yes, sorry for my very long answer. I think it’s probably by itself worth writing a blog post with enough content, and I can elaborate more. But great questions for the story. Can you remind me? I guess we can go through them one by one. Can you do mine? Yeah.

Grace Shao (19:25) Just like in general, open source in China. What’s the sense on that? Beyond LLMs, right? Like why did Chinese companies always contribute to open source technology? Clem talked about this in his interview, but he didn’t go into depth about it, right? So number two was just about, yeah, number two was just about why do these academics want to claim their names, right? Is it better for the company in the end, or is it just best for them, like for selfish reasons?

Tiezhen Wang (19:37) Yeah, okay. Let’s try it. Yes. Mm-hmm. Yep.

Grace Shao (19:52) And number three is why are American labs kind of anti open source right now?

Tiezhen Wang (19:56) Yeah, so let’s try to address the first one. I think it’s a great question. And I do see the shift. I feel that AI is probably one of the very few areas where Chinese open source contributors dominate. If you look back, I would say the initial days of modern open source come from Linux or Apache or databases and everything. There you do see a lot of individual contributors from China, but you are not seeing enough Chinese companies creating a project and then the project getting adopted globally. You start seeing that gradually when we move to the area of cloud native, like when Kubernetes comes out.
And a lot of Chinese cloud providers are really trying to pay attention to this whole open source world. And you will see that this grows. But now it’s like this. So it grows exponentially. So I think it comes from two folds. The first one is the Chinese participation in the global market. It needs time to warm up. For example, a lot of Chinese contributors can only contribute to projects in Chinese because of the language barrier. So that kind of limits how much they can actually do. And now with large language models, and with better education in the new generation of developers, the language barrier is not that strong. That’s how the Chinese open source contributors can make a better impact. Another one is, in the traditional way a company structures itself, if you do an open source project, it’s kind of hard to justify your credits, because open source by itself is not the core business of the company. There are very, very few companies that had their core business built on open source. PingCAP could be one of them. PingCAP started with open source and then found a monetization plan. But there are so few of them. And in the new areas, for a lot of companies, their core business is open source plus monetization. Even for IPO companies, for publicly listed companies, MiniMax is basically one such example. They have their best models open source and then try to make money. So this is very different. If your core business is open source, of course you will put more resources into open source. And it’s more likely for your project to gain a lot of developers. And I feel that the third one is that international collaboration has never been easier, apart from the language barrier. After the pandemic, I feel that all of a sudden everyone is used to Zoom and Hangouts and collaborating with someone you don’t see face to face. And this is a great chance for open source projects to ramp up.
Because before that, you had to meet face to face, and the bandwidth and the people you could meet were kind of limited. And now, as long as your project is great, you have a huge pool of potential developers. And the last one is probably AI by itself, like coding agents themselves. Although it does make code review much harder because there is probably a lot of AI slop. But it really lowers the barrier of who can contribute to an open source project. Before, if you wanted to contribute to a project, it was probably developed in a language you do not know. And also, the code base is pretty strong. A developer might not be able to contribute to the project until he has a very thorough understanding. And that’s probably months of work. Now you can just ask AI how this part works. I only need this feature, so what is the code I need to modify? And I can just give it a test locally, and it works. I contributed to some Rust project without being a Rust expert. So that’s how AI makes everything better. So I think it has all these reasons. There are probably more, but I think someone at the end of the day, in two or three years, may start to write some history about how every single aspect of technology, moment and everything, people’s mindset shift, how to cultivate the open source spirit. But I think, yeah, those are the top ones coming to my mind.

Grace Shao (24:26) And then the second question was just, would researchers focusing on their own name, and frankly ego in this sense, actually be the best way to help cultivate the best LLM or whatever product is the end product to be shipped? Does that make sense? Because it kind of contradicts what Yao Shunyu was saying on Zhang Xiaojun’s podcast, right? He was saying in that sense a lot of researchers will try so hard to only own what they are working on, but they have less of a sense of responsibility for the bigger project.
Tiezhen Wang (25:00) Yeah, I haven’t really listened to the podcast entirely. But to your point, I feel that open source or not, or having researchers’ names listed on papers or not, is actually a game changer. Say you are a researcher and you might want to stay in academia. Why? Because all the papers you publish are very important to your career. Everyone sees which papers you have published and with whom, and the paper will also mention which part of the work you contributed. Are you the corresponding author? Are you the main person, or are you just contributing a part of it? And there’s the h-index to measure the impact of a researcher. So you have all that infrastructure working for you if you stay in academia. But if all of a sudden you want to start working for a company, you lose all of that, because the company might say, we have a policy where all your open source work and all your paper publications, even writing a blog, are controlled by the PR team, and they decide whether you can do certain things. So your exposure is reduced. You might be very happy because you earn much more, like 10x the salary compared to being an assistant professor. But after five years, if you ever want to go back to academia, that’s impossible because you have lost all of your track record. With open sourcing, things are very different. The top AI labs want to hire the best talent from academia, and the people from academia want to work for the labs. But at the end of the day, the two systems are the same system, because all the record is public. So it’s very easy for people to come in and come out, come in and come out. That’s different from people coming from academia and then losing track and getting lost in the company world. So I do feel that having your name listed on the work you publish is very important. It is kind of the concept of [Chinese phrase unclear]. So your branding grows with whatever you have done.
So your reputation is based on whatever you contribute. So you need to pay extra attention to that.

Grace Shao (27:16) I see what you mean. And the last bit of what we were talking about just now was, if you believe open source makes so much sense to these researchers, why is it that so many researchers in the US, or at least some of the companies, the entities, right now are not willing to go open source?

Tiezhen Wang (27:31) Yeah, there are a lot of companies that changed their position. Google used to be the company that embraced open source the most. Google open sourced TensorFlow, Kubernetes, and a bunch of other important open source projects. They open sourced the Transformer, which is the cornerstone of our modern AI systems. So Google was really embracing open source. But I feel that at some point in time, Google stopped doing all the open source work because they kind of did not like OpenAI and other companies taking the free ride. Google did a lot of fundamental work, and then it was kind of taken by other companies for free. And also, OpenAI and Anthropic, I feel that they are still contributing to open source, but that’s not their main project. Their main projects are hidden secrets that they do not want to share so that other people can catch up. And research and business are getting so coupled. For example, a researcher at OpenAI found a way to improve the intelligence level by 10%. Let’s take o1, for example. They managed to find out how to make the model think. And with this chain-of-thought and thinking process, the model is way, way smarter. But they do not want to share the gist. As soon as they do, other people will catch up with them. So it’s kind of a very restricted environment. Researchers may still want to open source some of their work, have their names listed there, and share very detailed observations. But the business just doesn’t allow them to, because you are trying to. Yeah.
Also, researchers can talk at conferences. I know how open source, sorry, I know how o1 was roughly made by watching a bunch of YouTube videos on the internet made by OpenAI researchers. But after that, you see less and less very detailed research sharing, even in videos or recordings by them. I guess they kind of learned the lesson. But yeah, that could be part of the reason. In the open source world, it’s kind of different. Before DeepSeek R1 was released, a lot of people were speculating about how OpenAI was doing o1, and they were trying things in different ways. And after DeepSeek R1 was released with all the recipes and all the data they shared, the open source world seems to have converged on the path. Although that path might

Grace Shao (29:46) There can’t be compliance reasons.

Tiezhen Wang (30:12) be different from o1, because we never know how o1 was made. But then because of DeepSeek’s contribution and sharing, everyone knows how to make the thinking chain. And the whole ecosystem is evolving really, really fast. That’s one of the real values of open source, because everyone can just collaborate. No one is holding secrets. Well, there are still a lot of secrets about how you can run as efficiently as DeepSeek, but that’s too technical, that’s not too much on the research side. Yeah, like I...

Grace Shao (30:44) So on that note, yeah, I want to follow up on that. I recently wrote about something, which is, speaking to researchers, I got a sense that DeepSeek in a way is now becoming essentially like a foundation for everyone, because a lot of the labs in China are looking to DeepSeek to see if there’s any engineering breakthrough, to your point, and they build on top of each other. Help us understand each of the labs, because you said they’re cost constrained, they’re compute constrained.

Tiezhen Wang (31:06) Thank you.

Grace Shao (31:12) They’re talent constrained, right?
Their resources are constrained on every single asset you can think of compared to their American peers. Now, why does it make sense that they all open source, and how are they all optimizing for their own goals at this Tiezhen Wang (31:24) Yeah. So open source by itself, as we just talked about, is an accelerator of the whole ecosystem. DeepSeek shared all their knowledge and discoveries: what works and what doesn’t work. This by itself is accelerating the whole industry, not just Chinese open source, but also US open source and US closed source. They just don’t say how much they learn from DeepSeek, but I believe everyone is learning from DeepSeek. Not just that, DeepSeek also contributed GRPO, which has become the most used reinforcement learning algorithm in the industry. So they made a lot of contributions. If you check recent model architecture evolution, what’s proposed by DeepSeek is becoming the standard and getting adopted by many people. For example, Kimi 2.5 was using a model architecture very similar to DeepSeek’s. And GLM 5.1 was adopting a lot of components from the DeepSeek architecture as well. So this kind of sharing, learning, and co-evolution is one of the, I would say, secrets of how China is able to catch up with the US in certain areas despite having restricted compute and restricted capital. If US open source were working again, with the whole ecosystem, with everyone trying to open source, I would say the human race would be evolving much faster than it is now. Grace Shao (33:01) So on that, how do we understand the accusations of what is being distilled? What is technically shared? How do I understand the gray area there? Like the accusations from a lot of American labs, Chinese labs right now; like you just said, a lot of American labs are learning from Chinese labs. 
Frankly, within the researcher community, it’s not even Chinese versus US, it’s really just labs with each other and against each other if they have to, right? Intellectually competing. So then how do we understand what the, industry agreement is on a distillation, why is it so contentious right Tiezhen Wang (33:32) On distillation, yeah, that’s a great question. I can only give you my perspective. first, distillation is a very broad word. We are distilling from each other as well. I learn from you, you’re learning from me, and we are all learning from books and papers and all this public information. So I would say, distillation is a very common practice, like basically how you learn from others. Like you might have a model which summarizes the books and like doing bunch of explorations. And the way for the model itself to move forward and evolve is to distill from its historical data and historical experiments. And like that works for like another model trying to learn like your model as well. And on the research field, distillation is very common. DeepSeek R1 was released with MIT license. Specifically, so I actually asked the team about it. They choose MIT license because they want their model to be distilled by others. Because that was the only model that works really well with the thinking chain. And they want all the open source model to be able to have that. Like they have shared all the recipes, but others do not have data. So DeepSeek design like they’re like small models so that and also the recipes so that other people can easily distill DeepSeek, getting the thinking chain and use that on their own models. like this distillation is happening like everywhere. And I think like US companies are distilling from each other as well. Like I’ve seen like the recent discussion on Twitter in public. where Elon Musk and Sam Altman were kind of battle on that. yeah. 
And if you think about it the other way, so if you do not allow a model to distill, I mean, the output of a model to be able to train a model which is from a competitor, it’s kind of a very interesting point. Like if we say, I’m reading a book. I’m telling you the story. So you, after reading the output from me, which I think of me as a model, you’re reading my summary and you are not allowed to share the summary to others. You have to read the book, the initial book, not using my summary because of the license, et cetera. That’s kind of ridiculous. That’s not how human transfer knowledge in the past a few thousand years. Like I have a very bold argument. I think that anything like generated by AI should not be copyrightable. So like it should be in public domain, like anything generated by AI, because like anything generated by AI is a distillation of like human entire history and everything that human has created. And if you just take that for free and asking other people do not use that. Like it’s kind of a waste and it’s kind of like blocking people from evolving forward. Because like human content do not have this restriction and why you are putting this restriction on something not copyrightable and generated by machine. So that’s something I do not really understand. So I do see there are terms and conditions saying that my model output cannot be used to improve other models. But I don’t think that’s kind of valid. I’m not sure if someone eventually will file something on the court and we can have a case on that. currently, think there are a lot of things to discuss, but it’s not about if we can distill a model or not, but about something bigger. Should the model creator even have this right to restrict others from distilling from their models? Grace Shao (37:18) That’s really interesting. I think that a lot of the discussions in the public space is really about whether you can use copyright work of human output. 
And then the argument is always just that you cannot distill because the company said no distillation is allowed. But to your point, there is no actual clear black-and-white rule or regulation around this right now. In fact, it’s a bit murky. Yeah, that’s interesting. Tiezhen Wang (37:37) I’m not a lawyer, but I can’t find a clear answer on that. Grace Shao (37:43) Okay, I want to kind of go to China. We’ve talked a bit about the big picture. Well, a lot about the big picture. But let’s look at just the China labs. I mean, I know that you represented APAC back then with Hugging Face, and you worked around APAC and lived in Australia. But for the sake of this, you know, Chinese labs right now are probably the most relevant in APAC. Do you think I’m missing anything on the APAC conversation? Do you think anyone else in the region is relevant in this space that we can talk about? Tiezhen Wang (38:08) Korea is doing really, really well. One of the best is probably Upstage. Initially they created a Korean model leaderboard, an open source version of a model leaderboard. Well, no, no, the leaderboard was not funded by government. The, the, the, Grace Shao (38:10) Yeah, yeah, give us some picture on that. That’s funded by their government, right? That’s their government funded. Tiezhen Wang (38:26) So Korea is actually a very impactful country, but in the early days there wasn’t enough Korean data. Even for ChatGPT, I think until ChatGPT 4, the model didn’t speak good Korean. The model was able to speak very good Chinese from day one, from ChatGPT 3.5, but because of the data volume, et cetera, speaking Korean was always a challenge until ChatGPT 4. Now the open source models are able to speak Korean. So Upstage created a leaderboard. The way they solve the problem is very interesting. They’re not solving the problem by solving the problem. 
They’re solving problem by helping others to solve the problem. So instead of creating a model right away, they create. Grace Shao (39:09) I heard about Upstage from VC in Korea as well, but I don’t know the detail about it. Tell us more about who they are, what they’re doing. Tiezhen Wang (39:15) Well, I don’t know too much about who they are, but I only see their open source contribution. I think the founder is a professor in Guangzhou, but he’s Korean and moved to US. Correct me if I’m wrong. I’m sorry. I’m not really up to date with that information. But I just want to call out because I think that’s a very interesting paradigm. For example, if you are a company, you have your own problem you want to solve. Tiezhen Wang (39:42) Like, how do you want to solve it? Like, you are going to hire some people and define a problem and try to use your own people to solve it, right? So that’s the old way. What’s the open source way? Is you publicly define the problem. You have a leaderboard. Like, you might do a private eval or public eval. It all depends on you. the problem is you have to list your problem. publicly and you have to tell everyone that you can contribute to this problem by submitting a model to a URL and we will do evaluation and see how each model is evolving on this area. So they basically have a leaderboard. And you will like a lot of researchers would be very interested because now they have a problem to solve before they do not even know Korean what’s the problem. So now they have a problem to solve and you will see that the curve goes like. it goes up because there are more and more researchers coming in and all their work are open sourced. So a new researcher wants to jump in the field. They will first have a look on the leaderboard to see how far away from a really usable benchmark. And then he can investigate all the previous attempts and find his own way of kind of just changing something a tiny bit. 
and apply that to past people’s work and submit to the leaderboard. And now we are seeing people making progress on the leaderboard. So that’s a very, very clever way, because it’s not one company solving the problem. It’s like we are opening the door for everyone to come into this playground and try to solve the problem together. I think within a few months they were able to get thousands of submissions, which is really massive, because just imagine you hire 10 people: you won’t get that. And now, with this new way of doing things, building in public and evolving in public, you’re getting a lot more submissions and you are educating people, et cetera. So they had this very impactful and inspiring leaderboard, and then they released a model called Upstage something. I can’t remember, it has been a while. And the Korean datasets and the Korean models are accelerating very fast on Hugging Face. I think it is now the fourth largest for models speaking Korean. Yeah. Grace Shao (42:04) Very interesting. Yeah, I’m going to shamelessly self-plug. People can listen to the episode I recorded with one of the leading Korean VCs as well, which was published last week. He gave a good breakdown of the AI ecosystem. Tiezhen Wang (42:12) Okay. Yeah, could you help me do some DD first and just make sure those are correct? Yeah, you can. Grace Shao (42:22) Yeah. No, no, no, he did talk about Upstage as well. It’s very interesting. Yeah, I want to... sorry, go on. Tiezhen Wang (42:29) Yeah. And also, you asked about APAC. In Singapore, there are a lot of great researchers; a lot of Chinese researchers will go to Singapore as well, like Cancun too. Yeah. Grace Shao (42:43) Yeah, I think the ecosystem is a bit overlooked by Western markets, but there’s definitely a lot happening around Asia. APAC includes Australia as well as Southeast Asia, East Asia, and Northeast Asia. Okay, I want to bring it back to China. 
We’ve been talking about China more in a high-level sense. Now, looking at the companies themselves, or the labs, we want to break it down. Just give us a sense: how do we understand Moonshot, MiniMax, DeepSeek, and Zhipu, if you have to put them in one bracket, versus the hyperscalers, Tencent, Alibaba, and ByteDance, in terms of their strategy and their capabilities? How should we understand this ecosystem right now? Are there other relevant players that you think I’ve missed, maybe like Xiaomi or anyone else? Tiezhen Wang (43:25) You mean how the model creators, the model labs, are collaborating with the hyperscalers? Is that your question? Grace Shao (43:32) No, no, I just think it’s the people who are creating LLMs, who are researching how to deploy LLMs. These are the main players, right? Now, how do they differ? How are they similar? What are we seeing on the ground? Are some of them becoming more irrelevant? Are some of them becoming, say, we just talked about DeepSeek becoming almost an infrastructure provider for the whole ecosystem. Tiezhen Wang (43:39) Yep. Grace Shao (43:58) You know, MiniMax is very focused on multimodality. Zhipu is very focused on coding capabilities. You know, Alibaba is really trying to push commercialization through their existing applications. How successful that is, that’s a different question. Just an overview of these players. Tiezhen Wang (44:14) Yeah, I do think they’re kind of converging, because everyone knows that the whole market for coding is booming. If you have a good coding model, you can sell it for a profit, a large profit. And I do feel that everyone is rushing into coding. There are people exploring different things. For example, Tencent is putting a lot of effort into Hunyuan and doing OCR stuff. And a lot of other companies are doing video generation. 
But at the end of the day, think from a strategy level, I don’t feel that there are a lot of difference. It’s more likely a case where, you have data? For example, it makes a lot of sense for ByteDance and Kuaishou to work on video generation models because they have a ton of data. And also, do you have? like a large enough scale. Like for example, Kimi is not very active in making all the apps. Like Tencent is making models. They are making like great apps. Like for example, Yuanbao, like a QA app, like based on all the Tencent data, it’s very popular. Like they make QClaw. Like Tencent is able to do that because Tencent has a huge talent pool. Like Tencent is a huge company, Whereas like if you look at the Kimi, Kimi is very conservative in... doing all that because Kimi is still a very small company. So I think from a very high level, everyone was on the same page about the strategy. It’s just more, how much resource do you have? What are the advantage of you? Do you have data? Do you have distribution channel? Do you have product design, success story, et cetera? So yeah, I’m not sure if I answer your questions. Grace Shao (45:57) No, no, that’s good. So we kind of talked about why researchers want to open source. We talked about these companies are somewhat doing the same thing. So then this leads me to the question. We know that open source, open weight does not actually mean they don’t make money. However, obviously means that it’s harder to commercialize as we like alluded to with the US labs, why they make those decisions. Then how do these companies find ways to monetize and sustain their businesses then? Tiezhen Wang (46:21) Well, in the US, are also labs dedicated in making open source models and still making money from other donations or from other parts, like selling apps, et cetera. It’s basically the same way in China, too. For example, DeepSeek is run by, I would say, donations from the people who play the stock market. 
like there are labs run by VCs and lot of labs are already profitable by like selling tokens like GLM has recently raised the token price because like they see a huge number of demand and they’re like running short on compute. yeah, like open source can make money. Like there are a ton of ways for open source model provider to make money. I have a lot of ideas. if in case you are interested in like making your not profitable, can contact me. But honestly, there are lot of ways. The simplest way is to sell token. If you have the best model, you can sell a token for profit and people will actually buy your token. so it’s very interesting because when we combine science and technology, always consider it’s the same thing. Grace Shao (47:15) Yes, everybody find Tiezhen Wang. Tiezhen Wang (47:37) For model, it’s the same. When we think about models, we just think of a model that generates tokens, et cetera. But actually, there are two different parts. The first one is training, where you have the model. And after you get training, open source the weight you trained. Another part is the inference. So you need to run a lot of optimized CUDA kernels in order to make your token cheap and fast. Either bracket can make a lot of money. For example, you can open source the fine-tuned model, not the base model. So if a company want to use open source model for fine-tuning on their own data, they cannot be building on a fine-tuned model. cannot build. They have to find the base model. And if the base model is not open sourced, you can sell that for profit. And also different clients might have different requirements on the model. The NeoLab can collaborate with the client directly and provide some kind of training and post-training support. So that’s a way of making a lot of money, actually, because training is very expensive. It involves very expensive researchers and data and compute. On the inference side, too. Grace Shao (48:45) Yeah. 
Tiezhen Wang (48:52) Because the inference is tightly coupled with the data center you own. So your optimization strategy does not, there’s no guarantee that your optimization will work on a different cluster. So a lot of people just do not open source the inference recipe because it’s not that useful. And also it’s kind of a moat. So the model provider who creates the model, they know how to optimize the model best. when the model is released because they have seen the model for four months and they have done a lot of optimization on the model inference. And when the model is out, like everyone else, it’s just starting to know the model and doing some optimizations. So of course, the model provider will sell token in a much efficient way compared to all other competitors. Three months later, when the outside inference provider gets to know all the secrets and do very optimized kernels, there’s a new model coming up. So the model maker, the people who know the model from day zero, always have an advantage on selling the tokens. So that’s one of the very important ways how they can make money. Grace Shao (49:59) I see what you mean. Mm-hmm. Yeah. And does DeepSeek v4 coming out have an impact on how the GLMs of the world or Kimi make money? Like essentially their strategy with the fact that you just said they just raise prices on their tokens. Tiezhen Wang (50:16) Yeah, so GLM and Kimi doesn’t sell DeepSeek or Qwen. So they are not competing with each other directly. I would say the capabilities are on par with each other. So it’s more like a user test. Which one is better? There is no clearly winning between all three models. So we’ll see like Zhipu’s stock price was getting down because people were so worried about DeepSeek. But then they realized that like the Zhipu token selling is not quite impacted, so the stock price bounced back. But at the end of the day, I would say it’s actually a good thing for them. 
So GLM 5.1 is adopting a lot of the core design of DeepSeek’s V3.2 model architecture, I think. And they were able to cut down the cost by adopting all these explorations from DeepSeek. And now V4 Pro is out. I don’t know the details, but a very simple guess is that Zhipu will be able to cut down the cost because they can adopt new things from the DeepSeek architecture. So Zhipu, on one side, can increase the token price because the demand is so high, and on the other side they can learn from DeepSeek and cut down the cost. So Zhipu is going, yeah, exactly, exactly. Grace Shao (51:34) You have a higher margin. Yeah, this is something I think they’ve talked about as well, really being able to learn from the engineering breakthroughs that DeepSeek puts out every time. Okay, I’m mindful of time. I just want to ask a few questions on the future outlook. You posted on X recently saying that you’ve been thinking a lot about how we make AI bootstrap itself. And you’re going through this transition yourself, you’re thinking about the future of AI. What does it mean for the open source future as well? Tell us a bit about where you stand right now and how you think of this bigger picture. Tiezhen Wang (52:06) Yeah, I’m still doing some exploration on my side. I think this whole AI bootstrapping logic has already been implemented by a lot of big labs internally. The idea is very simple. In the compiler world, you can design a programming language and write a compiler for it, probably in a well-known language like C. So you first implement this language using C code. In the next iteration, or after a few iterations, you are able to implement this language using your own language. That’s called bootstrapping. You are basically evolving on your own. You are not relying on something which is not from your language. Putting it another way, look at how a normal living creature replicates itself and evolves. 
I don’t need to have a screwdriver somewhere to engineer my kid, right? My kid’s just born. All by itself. But how far are we from AI to do similar things? Now we have a coding agent very powerful. We have our AI training pipeline recipe kind of stabilized, at least for small sized models. So are we really far from like AI able to get one of my idea, like I give him the direction, and he’s able to like first bootstrap a very simple version and gradually evolve towards that goal. Like I think it’s like highly possible. So at the end of the day, we might be able to like just tell him what I’m going to do without like giving him all the harness and all the like detailed guidance and. I’m not talking to him 100 times, and he’s able to first lay out what he needs to do and have a plan, and then probably design a DSL or agent all by himself. And probably he will create a model ways to help him to get adapted to this goal. And then he can just keep evolving. All I need to do is to give him more fuel. which is compute, and he’s able to do some evolution and all by himself. It’s kind of like if you have recently read Andrej Karpathy’s Twitter, there’s a concept called auto-research. But auto-research is just evolving on the model weight. It’s not evolving on the agent and harness. I think on the agent level and harness level, there are also a lot of things to do too. So I’m quite new on this journey. What I was able to do is to bootstrap a very simple agent and I can use that agent to optimize the agent. But I think eventually we will get the weights involved too. When the model realized, okay, I’m not just needing an agent, I can create a bunch of data and improve my weights. He’s able to evolve from that too. Grace Shao (55:05) So in the future, how important is the capability of the models versus the harness and then the industry expertise then? Because right now, so much of conversation is still about, you know, the models are very strong. 
We are seeing what you’re saying already: agents starting to build things out automatically. But we still need the taste. We still need the industry expertise to guide them. I find it hard to imagine that, you know, you can plug in something and just say, I want this to be done, and the agent just starts doing it exactly to your taste and your imagination. Do you really think that’s happening? Tiezhen Wang (55:34) Yeah, I do feel that it’s happening. We are using agents, especially coding agents, to code something that we are completely unfamiliar with. And I’m quite confident that it will actually work. The reason is that I have defined a set of goals, and as long as I see that it is moving in that direction, I’m good. I do not need to understand the code line by line. It’s just a box. Grace Shao (55:58) Mm-hmm. Tiezhen Wang (56:02) But the difference is that there I’m using the coding agent to do something else. What I can do is use the coding agent to improve the coding agent itself, and use the coding agent to generate the data and train the model that the coding agent is using. I would call that a bootstrap, not just using it. I think I’m already quite happy with the coding agent doing something else. But just, yeah, yeah. Grace Shao (56:23) Interesting. I want to end on a more philosophical note. So how do you view the argument that AI is going to replace humans, then? Or do you think AI is going to be in a role that supports humans if we keep going down this path? Tiezhen Wang (56:35) Well, it’s actually a very, very interesting question. And I feel that people do have different feelings. But from a pure technology point of view, I do feel that it’s one condition. Technology is not the only thing that will decide everything. You asked whether it’s going to help humans. Well, it’s not really for technology by itself to decide. 
Like it can be used in different ways, in different social structure, in different like tradition and such. It’s like giving you a gun and you can do things in good. Yeah, it’s not. Yeah, yeah. But like just imagine that you’re going back to history with all your knowledge of modern society. Are you going to help the Grace Shao (57:12) It’s not a good analogy. But yeah. Tiezhen Wang (57:26) like the society, like the history you go back to? Or are you able to help? Like I think it’s basically the same. If you have AI that knows everything, you can just think of it as human being in like 2000 years in the future. And now you have it. And what it is going to help on the society. Like it really... Grace Shao (57:31) Yeah. It’s like that saying, your own capability of using it is the cap of itself. Also, I think there’s a lot of argument and discussion around the fact that even the society as we know it today, the knowledge work that we all have, that we normalize, are not even created until the recent 100 years. And if AI is to disrupt that and replace human in that sense. Why is it so bad? Because it alleviates us to do other things that human multifaceted beings that we are can do. Is that kind of part of the argument as well where like, even if it replace us or helps us, it’s only helping us actually alleviate some of the things, if you take a step back, the things that we don’t want to do, right? Where we can maybe go touch grass. I don’t know, maybe this is very optimistic view of it, but there has been people saying like, The cap on AI capability is a cap of your own intellectual, your own cap of your own ability to navigate or use AI. So the more you can use AI, the more it can help you. The less you can use it actually, the more it will replace you. Tiezhen Wang (58:45) Well, I think it’s very interesting to define what is you. Are you defining you as everyone, or are you defining you as people who have compute? Well, no, it’s not. 
It’s actually a very, very, very interesting question happening right now. You know, Anthropic’s coding agent is able to do a lot of things. Grace Shao (58:53) We’re getting really philosophical now. Tiezhen Wang (59:09) People are kind of imagining that everyone is getting a lot more powerful with models. But what if one day Anthropic just says, you cannot use your coding agent to do certain things? It already happened. Anthropic said you cannot use your agent to do automated tasks. And there could be other limitations. I have a very bold argument, which is that the reason why we are able to use AI so cheaply, that even we, who do not own a data center, right, even we can use it, is that our data is still valuable. You know, if you use subscriptions, your data is going to be distilled by Anthropic to further improve the model. And they are able to give us a discount because they still need our data. Grace Shao (59:52) So your point is that one day, when they capture enough data, they will not even give us this kind of access for free or for a cheap price. Tiezhen Wang (59:59) It depends on how they define you. You ask them, like, “you” or something. How they define their user. How they define who could be part of the game. Like, if one day, like... Grace Shao (1:00:09) So then the question is, no, but then my question is, there’s another argument where they’re saying too much power is in the hands of a few companies right now, right? Or a few founders, whatnot. We need open source, that’s your point, right? No, that’s really interesting. And that was actually going to be the last question I was going to ask you: what is one differentiated view you hold? And I think you’ve already answered that in that sense, right? Yeah, I think it’s for us to really think about. But then, as the average user, my question is, how do you actually boycott Tiezhen Wang (1:00:18) That’s why we need open source. 
Grace Shao (1:00:36) these companies, or if not boycotting, how do you actually make an impact? Because if I’m not the developer creating an open source model for the average person to use, as an average user, what do I do? Tiezhen Wang (1:00:47) Well, just use the model to do the thing you want to do. Try to embrace the model and be more patient with open source models, because obviously open source models are not as good as top-tier closed source models. But, I mean, with open source models you keep all your secrets to yourself, so you have better security and better control. An open source model will never betray you if you just run it on your local laptop. So although the model is not performing as well, because it’s not distilling you, right, you can still trust your open source models and give them more tasks to do. Grace Shao (1:01:20) You host it yourself. Tiezhen Wang (1:01:34) I do feel that a lot of open source models are actually capable of doing things. But the expectation might be, think of it as six months, like the Claude version of, sorry. Let me put it another way. Think of them as older closed source models and be patient with them. And you can grow up together with the open source model. Grace Shao (1:01:54) That’s very interesting. Thank you so much for your time, Tiezhen Wang. Tiezhen Wang (1:01:56) And thank you, Grace.

Nathan Lambert Reflects on China’s AI Labs: DeepSeek, Open Models, and the 'Race' with the U.S.

Joining me today is Nathan Lambert [https://substack.com/profile/10472909-nathan-lambert], author of Interconnects AI and a post-training lead at the Allen Institute for AI. Nathan recently returned from a major tour of China’s leading AI labs, where he met with researchers and teams building some of the most impressive open models in the world. In this conversation, we discuss what Nathan saw on the ground: how Chinese AI labs differ from their U.S. counterparts, why open models have become such an important part of China’s AI strategy, and how labs like DeepSeek, Alibaba, ByteDance, Kimi, Z.ai, MiniMax, and others are navigating compute constraints, data access, and commercialization. We also dig into some of the most debated questions in AI today: Are Chinese labs really 6-9 months behind U.S. frontier labs? How meaningful are distillation accusations? Can domestic chips like Huawei’s make up for restricted access to Nvidia GPUs? And is China’s AI ecosystem actually government-directed, or is the reality more fragmented and commercially driven? Ultimately, this episode is a more nuanced look at China’s AI ecosystem that looks beyond simplistic narratives about subsidies, copying, or geopolitics, and instead examines the technical, cultural, and economic forces shaping the future of open models. Check out his two recent articles here: * Notes from inside China’s AI labs [https://www.interconnects.ai/p/notes-from-inside-chinas-ai-labs] * How open model ecosystems compound [https://www.interconnects.ai/p/how-open-model-ecosystems-compound] To find the previous episodes of Differentiated Understanding, see here. [https://aiproem.substack.com/podcast] Every episode, I bring in a guest with a unique point of view on a critical matter, phenomenon, or business trend: someone who can help us see things differently. Season two will host a series of guests from early-stage investing, as well as builders, researchers, founders, and product managers. 
For more information on the podcast series, see here. [https://aiproem.substack.com/p/launch-of-differentiated-understanding]

Chapters
00:00 Insights from the China Trip
11:51 Cultural Differences in AI Research
18:15 The Role of DeepSeek in China’s AI Ecosystem
25:26 Overview of Major Chinese AI Labs
30:56 The Future of Open Source in AI
37:50 Market Dynamics and Consolidation in AI
42:28 Distillation and Model Convergence Controversies
51:58 The Gap in AI Performance: US vs China
61:09 Monetization Strategies in AI: A Comparative Analysis
62:32 Government Influence and Misconceptions in AI

Transcript (AI-generated for reference only)

Grace Shao (00:00) Nathan, thank you so much for joining us today. Yeah, really, really excited to finally hear your thoughts on your big China trip, on what’s happening between the Chinese AI labs and the U.S. AI labs, what you think the potential compute constraints might mean for these labs and their performance in the future, and obviously the open-source ecosystem. So before we get into all of that, could you... Nathan Lambert (00:02) Yeah, thanks for having me. Grace Shao (00:23) Briefly tell us about how you ended up actually working on post-training and open language models. Just a bit about yourself. Nathan Lambert (00:29) Yeah. So I actually started my PhD at Berkeley in 2017, not working on AI things. I was an electrical engineer by training in undergrad, which is funny looking back, because that’s the same year that the Transformer paper came out. And I was like, I think I should do this AI thing, and tried to get the famous advisors to mentor me. And they’re like, we can’t take you. So I had my PhD as this wandering path to become an AI researcher. And then I ended up at Hugging Face after that, which was, realistically, the only industry research job that I had, but also a very hot startup and very fun to learn kind of at the intersection of these tools that people use a lot for AI and research, which is what I was doing. 
And then when ChatGPT hit, the kind of RLHF thing blew up as the hot word on the technical side of things. My PhD had ended up being in reinforcement learning, which is just the first half of reinforcement learning from human feedback. So it was kind of a natural pivot to be like, well, I might just do that. And Hugging Face was a good place for doing that, because the whole company is kind of all for that, which is like: figure out how to support the community on the hot thing and build platforms there. So they were very happy about that. And I helped build a team at Hugging Face. And then I was kind of burnt out on the remote-work time-zone thing and found out that the Allen Institute was doing such similar stuff. And I was like, wow, I have people that could be in-person friends and do similar things. I was like, quality of life — I need to do this. And a few years later, I ended up building a bunch of models. And I think being at a nonprofit opened me to this ecosystem vacuum of information, where there aren’t many people who can talk about what they’re doing. So then, with some luck and committing to write every week, I just feel like my influence filled the vacuum of nobody saying reasonable things. And it is this nice synergy between what I write about and what I work on in my day job, and it just kind of got bigger and bigger in a very fun way. I think that, generally, at the highest level, I’m motivated by wanting AI to go well on this trajectory. And I worry about a lot of near-term things, whether it’s social unrest in the U.S. and just kind of the massive hatred for AI — I think is a very big near-term problem — and then, medium term, concentration of power, because I think AI will be super powerful in ways that people don’t expect. So generally, open models are a nice way to curb both of them by being a bit more transparent to people, and it naturally is a hedge against concentration of power. 
There have been different reasons throughout that, but that’s kind of a recurring theme in my life in the last few years. Grace Shao (02:50) Definitely. I love your work because I think you help non-technical people like myself really understand what’s behind what’s happening in these labs a lot better. And then I actually just spoke to your former colleague, Tiezhen Wang, who was with Hugging Face in APAC, just last week. He was saying the same thing. Open source, in many ways, is kind of the best way to go forward as we know that this technology will not stop evolving, but it’s the best way to kind of put up guardrails and checks and balances for the monopolies. Okay, I don’t want to take up too much time on that side of things today because our focus really is about your China trip. Before we get into the weeds of all that, I want to hear about the trip itself. Most people who are writing about Chinese AI are getting their information secondhand. You really went there, you spent time with the researchers, you met with people who are building the models. Tell us about what you meant when you said you came back with great humility, right? Your eyes are a bit more open, whether it’s the good or the bad. Tell us about your trip. Nathan Lambert (03:50) I feel like I kind of went in, I mean, I had this horrible English phrase in my writing, which was like, “I knew I knew nothing about China,” which kind of tried to indicate that I knew going into the trip that I knew nothing. And it was still the fact in my current writing. This is a horribly written sentence that I had in there. And I only talk about it because somebody called me out on it. It’s like, what is this? And it’s like, leaving, which is knowing that it’s such a big country, there are just such vast amounts of talent working on these problems, and how unpredictable it is as a human to model people with very different worldviews and upbringings and training systems. 
Realistically, the way that people are trained in China is very different. And I just think that even being there, you can’t fully grasp: what are the pockets of three to six researchers doing that is actually a bit different than in the West, even if they’re working on the same goal? I think you could get down to that level of granularity and a sociological study and actually see differences in what they’re working on, and that’ll always change the output. I didn’t get to that level of granularity, but it’s just to start having real experiences and understanding how people explain how they work on these problems. And for me, realistically, a lot of it is coalition building, which is just like: I want there to not be vitriol at the level of the technical companies doing things in international bodies. So just meeting all the labs on both sides is really nice, because you need to do that for them to talk to you about more sensitive issues in the future. I got some criticism on the piece, which is like, this is how you shouldn’t visit China. And it’s like, well, what are you going to do if you’re going on an official visit to a bunch of companies? How do you expect to get in the door without being nice? You have to start somewhere, and I think it’s important to be respectful. Grace Shao (05:31) I think the piece was, frankly — I don’t think the criticism was fair, to be honest, because I think you were really transparent with the fact that you’re not a China person, right? It’s not like you’re going there and exoticizing everything. And if anything, a lot of people, even with China backgrounds, like to use certain dragons and tigers to describe things. I feel like you actually were really humble going and being like, I’m just a technical dude meeting with these labs, talking about their technical research, right? And then because you were physically there, you had observations of the culture and the people. So yeah, I actually thought your piece was quite good. 
And yeah, sorry. Nathan Lambert (06:05) I agree. I was willing to let that sail past, but I think it’s important for people who listen to realize how actively these companies are trying to court Western audiences, which is why we could get in the door. I mean, we had some prominent people on this trip, but that’s why we got all of them in the days that we wanted them, except for DeepSeek. So essentially some, like Catherine Rintel, who works with me at Interconnects, and some other creators... Grace Shao (06:23) How did you get everyone? Yeah, how did you get everyone? Nathan Lambert (06:29) He used to live in China and has connections in China. So he kind of orchestrated the mix of his connections and leveraging my connections to labs. We had some bigger names on the trip as well. Just stringing all of these together to get all the various labs in place is a few months of networking to make sure the trip lines up with people with established networks and contacts with the various labs. But these people want to look good to Western audiences, so they’re only going to say yes to the right researchers. And the researchers know that there are two to four comms/ops people in the room, hanging out, making sure that it goes well. Especially the bigger the company, the more comms people. You go to Alibaba and there are three to five various people, from the head of comms to some special offices. You’re not going to get these people in the office, or at all, without accepting the cost of these types of handlers. It’s the same thing in the U.S. You’re not going to just plop a senior executive into a chair. So it’s also good because now I have the WeChats of a bunch of researchers from China that I could just text about things. It’s like, hey, congrats on the new model release. It’s like Lay Lee works at Xiaomi, Xiaomi MiMo. It’s like, talk to this guy for an hour at a mall — I don’t remember the name of the tea store — but it’s like... Grace Shao (07:27) No, of course. 
No, of course. Nathan Lambert (07:49) Now we have these relationships, which is very useful, and that helps information spread across the ecosystem to these trusted parties, which doesn’t really exist. There are not that many, I think. And the opposite direction of the trip is very hard because Chinese researchers can’t really enter the U.S.; the visa purgatory is too complicated. A lot of us on the trip were either Canadian or entered on a transit-without-visa entry, which makes it very easy for American technical talent to go to China right now, which is why I think there are so many trips. I think there’ll be more of them. We’ve got a lot of inbound from VCs and open-source labs in the U.S. that want to establish collaborations with these various labs because they’re the best open-weight models, and they want to build a stack for companies in the U.S. building open-weight models. So I think there are going to be more prominent, but not gigantic, U.S. startups going to try to build these relationships, which I think is a really interesting technological development because we’ve never seen this type of professional work trip in China from U.S. tech companies. Most tech companies have a “bring a device to China, it auto-bricks itself, and you have to hand it into IT.” So to actually proactively send people in a professional capacity is a really big change. There are a lot of angles you could take this, and I think it’s cool to see how it unfolds. This isn’t even really about the trip. This is the follow-on that we’re hearing from people that are like, hey, how’d you do this? We want to do this trip. Grace Shao (09:06) Yeah, definitely. Actually, from my end, I hear about VCs or investors always being quite active going to China because previously American funds were very, very active during the internet era. People were kind of always trying to find a way to either get into these good deals or potentially keep their pulse on it. 
But I think it’s really, really positive for the whole AI ecosystem to have this kind of fair, transparent exchange in some capacity. But to your point, there’s no way that star researchers can come out and talk to you off the record without any compliance, because that doesn’t happen in the U.S. either. That’s just companies protecting themselves. I just think your trip was quite meaningful, and I want to bring it back to your observations. You talked a lot about the cultural aspects of it. You talked about how you felt like in China there was less of this star-researcher celebrity status around people. People were more humble, or there was more humility. It was very focused on execution. You argue that Chinese labs are particularly well suited to the current LM-building game because they’re very focused on meticulous stack-level work. And there’s less ego sometimes to work on the dirty work, or the non-sexy work. So kind of unpack that for us. Why do you think that is? You kind of touched on it — you said they were brought up differently, they were taught differently — but what’s so different? Nathan Lambert (10:27) So essentially, an interesting part that synergizes on this trip is that we stopped by some academic institutions. I think it was like AIR and Tsinghua and stuff. And you hear all of these academic leaders talk about how they’re pushing hard to try to change it. So yes, they know China is producing more papers than anyone else, but they still think that it’s not as transformative of research. And they think that they’re trying to cultivate the academic domestic ecosystem to change just the type of work it works on, and the distribution, and take more risk. And then you would talk to some industry leaders off the record behind closed doors, and you would hear things like, it’s never going to change because the education system is so structured. 
There are so many layers of the funnel that reward things like memorization and stuff that they’re just like, this research culture is not going to emerge. And then the follow-on with the AI labs is that these labs are doing fast-following. They kind of have a proof of concept, and they know what it needs to look like. Therefore, in that domain, you’re not trying to invent the new paradigm. You’re not trying to make the model that is o1 or o3, or the first model to work in Claude Code. You’re like, I see it, and I’m going to try to do that and make it the best thing. And I’m going to try to make it cheaper and just maximize that goal. A lot of companies don’t need to invent the new paradigm. OpenAI has done this so many times. That’s their bread and butter: never doubt OpenAI’s ability to release a blog post and a plot that changes how people think about AI. I still think it’s going to happen a few times in this massive boom over the next four years. OpenAI just kind of has that sense of what is the thing that you can push on a bit earlier and just transform things. But I don’t expect — and other people wouldn’t expect — the Chinese companies to do that as much, because it’s just such a culture of, I guess, building. I don’t know how to describe the positive version of this. Maybe it’s slightly more practical-minded, in terms of: it’s your job to build this thing. A lot of the researchers, maybe because they knew their managers — some of them had managers in the room — see their role in the company as being to make the models excellent. And especially for students, I work with students and that’s what they say. I work at the Allen Institute and we have students that will co-lead our language models. It’s not that surprising, because if you do an industry research job in the U.S., a lot of mentors will tell you that you’re kind of free of the burden of bureaucracy and politics. 
So the naivety of students, and the simplifying, is actually so good at just getting a lot of technical work done. There’s also the life-stage side. If you’re younger, you don’t have as much family, and you normally haven’t built up as many habits and other things you do with your life. Language models are so complex, and the amount of context that you need to absorb to understand what the bottleneck is — there’s so much information, and you have to be able to pick what the bottleneck is and break it. If you just don’t have the mental space to absorb all the context, you kind of end up doing things that are cute but don’t make breakthroughs on the model. So that’s kind of a difference that I’ve seen in people who were both very successful academically before language models. Some of them are able to pivot to this practical mind, which is: what is the state of the system? How do I improve it? And then some try to make kind of these abstract frames of what’s happening and approach it like an academic, and it normally doesn’t improve the model as much. So I just kind of see, if the academic system is a bit more practical-minded, a bit more structured, and the work you’re doing is structured in the language model — make this kernel implementation faster, make this idea work — then maybe it can be... I think it’s an oversimplification. I push on that a bit in the piece just to really contrast what you could think a U.S. lab would look like. And I have a few anecdotes. I’ve heard a U.S. lab paying off a researcher to be quiet about their thing not being in the model. All of these one-off things are more storytelling devices than anything, because most one-off things don’t matter at all. But also Llama 4 imploded, and that was because it was described as a Game-of-Thrones political-style environment, with all the VPs vying for influence and showing that their thing made the benchmarks go up. It kind of fell. Many, many people will tell you that. 
And we’ve had the Qwen turnover, but it doesn’t seem like it was quite the same type of thing as Llama 4 or xAI. xAI barely exists now. There have been some dramatic things in the U.S. with how these companies have kind of come and gone out of the fold. Grace Shao (14:55) Yeah, I kind of agree with you, but also I would push back on that. I think there’s obviously a more rigid and competitive academic system, which by default in East Asia results in a culture of students following the bureaucracy and authority a bit more. So I agree with you in the sense that they’re very pragmatic. They focus on the task that is given to them. However, I wonder if things will change with how AI will disrupt education. That’s number one. But also, a lot of the young researchers that you’re working with today seem quite different. At least a lot of the entrepreneurs I meet today are born in the ‘80s and ‘90s, some even younger and born in the 2000s. And I think there’s a kind of aura or confidence coming from them. If anything, you want to say they’re a bit more individualistic-minded. You went to Shanghai, right? They are dressed very, very uniquely. They have these outrageous outfits on the streets. People are seeking individual ways to showcase their personality. So I wonder if that will shift. But for sure, for the academic institutions like the Tsinghua and the Beida of the world, they are still very old-school. But I would say that is the same maybe in some academic institutions in the West still. Okay, I think on this topic we can go off on a tangent on academics, but let’s go back to China’s ecosystem. When DeepSeek V4 came out, we talked about it offline, the two of us, quickly about a piece I wrote saying how DeepSeek is starting to look a bit more like a base layer for China. And if anything, some of the labs kind of admitted to that. They’re like, we have very limited resources. And to your point earlier... 
Nathan Lambert (16:11) Yeah, you could take that in so many tangents. Grace Shao (16:34) Limited people — these labs are tiny. They’re run by 100 to 200 people max. Limited capital, obviously limited compute. They have constraints all around. And in that sense, in a way, the ecosystem’s looking less like a zero-sum game and more like different players optimizing their own strengths. So correct me if I’m wrong, but DeepSeek is providing a base layer where a lot of labs will quickly follow and basically adopt a lot of their engineering breakthroughs. And then Zhipu, Z.ai, will focus on the coding; MiniMax focusing on the multimodality, et cetera. There are a lot of these different players. ByteDance, obviously, very, very focused on their video models. And Qwen, like you mentioned, had the whole open-source saga break apart with Lin Junyang leaving. But in general, they’re still kind of the leader in hyperscalers on that front. So everyone’s doing their own thing almost, instead of really... Nathan Lambert (17:27) I agree with the people specializing, which I think is normal business evolution. You figure out a bit where you’re good at. And there’s so much opportunity that they are like, okay, I’ll follow this because they see that they’re good at it. I just am more skeptical of DeepSeek as a base because I have no idea what DeepSeek is doing. And some of the labs when we were there, because DeepSeek V4 had just come out, were like, yeah, we look at the things they’re doing, but they seem more intricate than needed. And if you read the paper, there’s just so much going on in this model. As a researcher, I’m like, some of it seems a little fake or a little dependent on their setup and not necessarily going to work in every model. Grace Shao (18:04) What does that mean? Break it down for me. 
Nathan Lambert (18:18) Essentially, I will say that building an LLM is dependent on where you have your GPUs, your pre-training dataset, your intended deployment setup, and stuff like this. So you make decisions based on your constraints, and you build the model. DeepSeek has these constraints and they end up with their model, but Moonshot and Zhipu have different constraints, maybe more flexibility, and they ended up building a different model. They will test the DeepSeek innovations. So they’ll say things like, X innovation doesn’t improve our model. These two organizations are on different development paths that have core similarities, like these large mixture-of-experts models and the general methods are similar, but a lot of the parts end up being a bit different. That’s why I’m like, I don’t know exactly. If DeepSeek was a base, you would see the Chinese labs just do post-training. We just take the base model that’s out there and we adapt it to our domain of specialty. And we have users that do that, which is something that I think about a lot. I’m thinking about starting a post-training lab and how to format post-training research better. So I think about this a lot. I think about what a shared base actually would be. They go through — some of these labs put an extreme cost on creating their base model. And if they didn’t need to do that, they wouldn’t. One of the labs told us how long their pre-training run was, and my jaw dropped. I was like, that’s way too long. Any U.S. advisor would be like, you’re taking way too much risk on this pre-training run. If they didn’t land that pre-training run from one of these past big MoEs at a Chinese lab, I don’t know if the company’s dead, but that’s a huge amount of time. Most U.S. companies now know that you don’t want your big pre-training run to be more than a few months because it’s just so much risk and time to put all your eggs in that basket. 
That’s a sign that, in that case, they don’t have as big of a peak-size cluster. Essentially, pre-training time can come down a lot when you have a bigger overall cluster; you can just get more throughput on it. But if your biggest cluster is smaller, it’s harder to get a certain amount of throughput, so you use that one for longer. That’s a compute constraint. To loop it back, I think the specialization is real, but I’m more like, I have no idea what DeepSeek is doing. I know they’re raising money now. I don’t know what the plan is there. They seem the most without a specialty in the Chinese ecosystem. Grace Shao (19:59) Dependency on. Yeah. Mm-hmm. No one knows, though. No one knows. They’re secretive. But that’s my point, right? I feel like they’ve been kind of nationalized, whether willingly or not, because they’re taking the Chinese government’s money. They’ve kind of gone secretive. And it’s not like there’s a secret that they prefer Chinese-educated researchers. They’re keeping a very domestic stack, from talent to capital to the whole stack. So to me, it seems like they’re being Huawei’d, in some ways, because they did well and they got their name globally, and then by default they’re becoming the next Huawei, willingly or not. Nathan Lambert (21:01) I don’t think nationalization makes you a base for the other companies, at least not at this stage. There could be something, but it’s hard to force. Grace Shao (21:06) But then you have some incentive, right? But then it is some incentive. You’re like, well, if you can propel one of the teams and propel the whole industry as a whole, it could be in your KPI or some kind of unspoken expectation. Nathan Lambert (21:17) The coordination problem is so hard. Essentially, both in the U.S. and China, even the open labs, what they do is they fork open-source code and match it to their internals, and every company does this. 
Therefore, all the improvements that could potentially be going to the open code and forming this base that is far more efficient — they’re not completing the feedback loop. I think China could be closer to it. If people really lean into DeepSeek as a standard architecture and DeepSeek shared their training code and all the specifics and how to do this, from a Chinese economic perspective, that would be a huge win because you’re just saving compute. But I think it’s too decentralized and too competitive to have that happen. It wouldn’t happen in the U.S. either. Grace Shao (22:04) It’s so cutthroat. Yeah. Nathan Lambert (22:08) Even though I think for open models to be closer to the frontier, it would be better. I talk about open models in the U.S. needing a consortium. But there’s definitely enough money to make a consortium in the U.S.; then you fail because the model won’t be good because you’re feeding too many asks into the model. That’s the only way to create a shared base. Grace Shao (22:25) Interesting. So it’s not really just commercial. Yeah. It’s not the commercial reason. Okay. So if you had to give a high-level commentary on each of the major labs, what would it be? If you look at ByteDance, Alibaba, Tencent Hunyuan, if they’re relevant, DeepSeek, Moonshot, Zhipu, MiniMax, Meituan, Xiaomi now being part of the ecosystem too. Nathan Lambert (22:46) You might have to prompt it or say more, but I could just kind of ramble through them, which is kind of fun. Alibaba: cloud-focused, understands that open models can enable more usage of platform. So I would say Alibaba is very, very cloud-focused. ByteDance: mostly characterized by everybody else being intimidated by them, and very user-focused, including multimodal. Kimi: vibes of the office were great. It would be one of the best startup vibes that you would visit among U.S. or China. 
Zhipu: very AGI-pilled, surprisingly cautiously excited about being entity-listed, even though they have no idea why they are, because they’re like, it stamps them as a big deal. And then there’s some... Grace Shao (23:27) I think they previously worked with SOEs. That’s the main reason. Or they still do, but that was one of their main sources of income. And unfortunately, because a lot of these labs spun out of Tsinghua, and Tsinghua is, for people’s context, in Beijing. It’s really close to the government, obviously. But the thing is, when it’s close to the government, it could mean there are three layers of agency underneath the actual government apparatus. But then people like to link it to the fact that it’s taking government money, so therefore they are suspicious. It’s very unfortunate, I think. A lot of companies get thrown into that category. Even companies like Lenovo and a few other Chinese companies have previously been called out by U.S. senators saying, they’re taking Chinese government money, but really it’s that their scientists or their research labs spun out of a certain government-affiliated or government-funded academic institution. That’s what it is. Anyway, yes, go on. Nathan Lambert (24:23) Yeah. Some more would be: Xiaomi — surprisingly great research vibes for a new team at a random company. They seem to be crushing it. Grace Shao (24:31) What do you think of Luo Fuli? The star researcher. Nathan Lambert (24:31) I didn’t get to meet her. I think she’s as close as they have to a star researcher right now. There’s the tier of star CEO, which there are obviously others — Dario and Sam, the analogies are there — but the star researchers, like the Sholtos of the world in the U.S., obviously you can come up with many more. She’s the closest you have to this. I need to watch more interviews. We’ll see. But she wasn’t in our meeting. But they just seem to be doing the right thing. They’re making general models. 
They don’t really have specialization yet. Florian, the person who helps me write about open models on Interconnects, and I took a detour to go see Meituan because we’re like, why is Meituan building these models? And they’re very practical about it. It was a less glamorous visit at a normal tech office. It wasn’t an official visit for them. They were like, yeah, we’re a major online platform. We obviously are going to use LLMs everywhere, so we need to build our own LLM and specialize it to our products, which, surprise, is very practical-minded. I’m guessing there are many more companies in China like this. Grace Shao (25:39) That’s what Tencent’s saying too. It’s because they want to serve their existing consumers and optimize their LLMs for their own distribution and their own basic interface or activity loop. Nathan Lambert (25:52) Yeah. After I left, some people in the group went to Xiaohongshu (RedNote), and they’re there. They’ve released some language models that are multimodal. They’re like multimodal data-processing things. So a lot of them are not that surprising. The startups just have different cultures. I have met some MiniMax people before, so I left the trip early before MiniMax on this one. But MiniMax was quirky. They have a ton of women in their company, which was very fun. And they have products. They’re maybe slightly more product-focused, but I feel like the quirkiness of the company kind of matches maybe Western confusion over what their products are doing and what they’re trying to do. But it kind of matches their language models that are a bit more efficient. Grace Shao (26:35) Well, they came out with a lot of very consumer-focused applications, right? They had Hailuo and Talkie, all these character companion-bot products before. Nathan Lambert (26:45) Yeah. 
And then the last one I went to was Ant Ling, which is also very corporate, but in a less intense way, because I think they see it as serving their own products, whereas Alibaba Cloud is like, this is the gold mine we have to win. It’s a much bigger deal for them than Ant Group. But a lot of these things, when you list them — I don’t know, eight to 10 companies — they’re all pretty reasonable with respect to the age of the company and what the company does best. There’s not as much confusion. Grace Shao (27:14) Yeah. And Ant is low-key best at medical chatbots right now, which I guess makes sense because everyone has access to Alipay. And then for seniors, apart from WeChat, it might be the only application they’re using on a regular basis. So it became the default medical consultation app, which is really random, but it’s their niche now. Yeah, I think you’re pretty spot-on. It’s pretty cool that you got those takeaways, even just meeting with them for a couple hours. Nathan Lambert (27:41) I have been reading about them for so long, so a lot of these priors are easy to confirm when they kind of fit with things you have seen. The Chinese showroom culture is so interesting, and also one of the most surprising things to have at software companies. It’s so funny. They’re definitely appealing to Western audiences. Z.ai had poorly translated merch. What was it? Something so — it would be borderline inappropriate translation in the U.S. It was like “ship big, go hard,” or something. Just some really weird translations. And they have live API statistics in their showroom. So Z.ai was like, we’re serving 5.5 trillion tokens a day. All the U.S. companies are so closely watched for when they announce token statistics. I know at least one of these numbers is wrong. It’s something like Fireworks does either 30 or 300 trillion tokens a day — or I meant Together for that one — and then one of Fireworks or Together, and the other one, are like 100 trillion tokens a day. 
Don’t take these as sourced; go look them up. There were some public announcements recently, but those were the first updates that anyone has had on major infra companies in the U.S. Inference is a huge market. You don’t hear anything from Fireworks because they’re just struggling to keep up with demand and they’re making bank, because inference is a much better thing to sell than bare metal. Essentially, inference is selling the software implementation to serve tokens more efficiently, and you can just get more margin when you improve the stack for a fixed model. So a model comes out and you host it, and then you can make your stack more and more efficient on that model. You just get more margin and hopefully growing usage. That’s way different than GPUs, where the best case is that you lock in a huge commitment for a long term. Just being able to walk into an office and learn about their API is interesting because they also had geographic distribution, which was like: China was, I don’t know, two-thirds; U.S.A., 20%; and then the rest was Singapore, Korea, Japan on the Z.ai API. So that’s cool. This is s**t that I always want to know about the companies, and I have no idea. One of the things I always want to know is: how are open models being used outside of the U.S. and China, and has this decades-long process of technological diffusion started to kick in in a way that any company can measure? I don’t think anyone has good data on it yet, but I think it’s obvious that at some point, open models that are cheap to run are going to have some interesting playbook across the globe for the long tail of countries. Maybe I’ll just walk into the front door of a Chinese open-weight company and get my answer. Grace Shao (30:30) But actually, I think the culture of these labs, a lot of them, because they’re run by really young, passionate people, you would feel like they’re a lot less commercialized or less corporate, or at least less sleek.
They’re not sophisticated with, you can say, the capital-market side of things, but you can also say that they’re just really naive and open-minded and passionate about the product they’re working on, with less of a corporate guardrail built around them. Nathan Lambert (30:56) Yeah. Grace Shao (30:57) Okay, I want to talk about... Nathan Lambert (30:57) Yeah, go ahead. It’s like one of the people at Z.ai who’s known on X — I don’t know, 9,000 followers — it’s like Lu. She came up and was like, hi, I’m a student, I’m 20. I’m Lu from X. And I was like, that’s hilarious. There was a lot of s**t like that. It was like, oh, okay. I don’t want to call her a kid, but it’s like... Grace Shao (31:06) Yeah, yeah, yeah. And I think the one that runs Moonshot’s developer ecosystem or something is literally a girl fresh out of school, right? And she just posts hilarious memes all day long. There’s no filter on her social media. It’s funny. Okay, we go on these tangents, Nathan. We need to come back on track. Open source, open weight. Why? Why do you think Chinese labs are adopting it or embracing it, however you want to put it, especially after visiting them? Is it because they simply have to, because of what we talked about — they are leaning on each other because of all the constraints they have? Or do you think the philosophical drive is actually bigger in that ecosystem? Or is this a bigger strategic thinking for diffusion in the long run? Nathan Lambert (31:54) I actually don’t feel like it’s that special ideologically. I think it’s easy to say the ideological line when you are doing it. Now you can look at Zuckerberg: he said the ideological line when he was doing it, and then he stopped. I think it’s mostly just that, for one, distributing within the U.S. ecosystem, especially to enterprises, is the highest-value market, and they can’t sign many enterprise deals. And the closest best thing is things like Cursor adopting Kimi’s model. 
Even if Kimi doesn’t get paid for that, they’re happy. That’s the biggest sign of credibility for them, and they can figure it out in selling tokens or whatever in the future. Practically speaking, one, the only way to influence the U.S. market is by releasing these models. And two, it seems like they don’t feel like they’re losing as much if they release and share things. If the model was closed, they just think they would get less influence, they would be seen less, fewer people would use the model, their actual paid offerings would be adopted less. It just seems almost overwhelmingly obvious, because there are all these benefits and not as obvious of a drawback. There will always be better models, and just keep going. But I think every scientist loves... Grace Shao (33:08) Then why are so many U.S. labs against it, or not willing to? Nathan Lambert (33:12) Because they can make as much money without it. Anthropic and OpenAI make more money by not releasing them. They can just make so much money, so why bother thinking about an open model that doesn’t make money? There are different scales of influence. Same with Google. Google’s making so much money. I think Meta will make a lot of money by having good AI models in their products, if they get their act together. Even Google could release more models. They have so many surfaces other than Gemini that need AI to be commoditized and used, like the cloud and all of this. Meta could release the models. It’s just not worth the effort for some of them. They’re like, we need to do this high revenue target; it’s too much of a pain to go through legal and make it ready to release. Why bother? I don’t know, maybe it’s a little bit of a cynical take, but I think Microsoft and Meta could release their best models openly because they benefit if it’s a commodity layer. But I don’t expect them to, because it’s just kind of like the benefits of focus are so high, and they just kind of see it as something they don’t have to do. 
Grace Shao (33:56) And it’ll be good for them. But then eventually, we will see some consolidation in the market as well, assuming — because you can’t really have 10 labs in each dominant country right now all exist. Nathan Lambert (34:26) I do expect consolidation. I think this is potentially a subtle cultural point, which is that the U.S. labs are more likely to buy into “we’re special, we need to go fast, keep it closed,” and the Chinese labs are not. There could be something there. That’s also who the decisions funnel up to. I don’t know. I talked to the Alibaba people that make these decisions. I can’t say all the things that they say about them. Some of these were two-on-one and off the record, so I can’t say all these things. But at all the other labs, there is a person that makes the call, I’m guessing. I think those are senior leadership that we’re not talking to. So it’s kind of hard to know exactly what they really think. I definitely expect consolidation. My thing is that I expected it in China faster because the capital markets aren’t as strong as in the U.S., but I don’t have a model for that. I think you can model it, which is: what do you think the revenue growth would be? What do they need to do to raise to keep training bigger models? What is the compute cost? Then you look at the potential raises and think about which country would not be able to do that race first. But also, it’s this wild thing with OpenAI raising $120 billion. Are you kidding me? What is that? Grace Shao (35:47) Yeah, the valuations in the U.S. are not really understandable by anyone else right now. I think in China — so on your point on that, I’ve been writing about this and I think it would make sense for Tencent just to buy out one of the labs. They have the money, they need the capabilities, and frankly, they’ve really been struggling to compete with their LLMs, with all the labs talked about just now. So my... Nathan Lambert (36:05) Their licenses are so bad. 
They release all these models that have horrible licenses. They’re not that good, and the licenses are just horrible. Grace Shao (36:13) So I feel like it financially makes sense for a company like that to optimize and just buy out a lab. Then the labs can also lean on their distribution, because at the end of the day, how are they going to win consumer mindshare or distribution in China right now when it’s really just dominated by Alibaba, ByteDance, and Tencent? That’s my spiel. But when I spoke to some of the researchers... Nathan Lambert (36:33) I think big companies have a lot of inertia, and the senior leadership has the call, and they can have inertia. I still think Apple just ends up buying some lab for $25 to $50 billion. It’s not the worst thing. Just golden-handcuff the researchers. Some will still quit. Grace Shao (36:43) Yeah. But I think right now they don’t want to. The labs still have a dream. Some of the researchers still have a dream. So when I spoke to a lot of them, they’re like, no, we don’t want to do that. We want to commit to our own frontier research. If I wanted to join one of the big tech companies, I could have. So why would I want to sell? That’s what the researchers think. But to your point, we don’t know what actually the one person or two people at the very top think, especially if they continue to have hurdles with compute access and capital access, which brings me to the question. Nathan Lambert (37:14) It also depends on your view of inference. You can ask your next question. I don’t need to cut you off. It depends on your view of inference. If these agents are just so much inference, I do think it’s going to be an oligopoly-style market, not a monopoly-style market. And what’s the difference financially between two and four or five big companies with great models? Is that actually not sustainable if there’s so much demand? 
There are a lot of cases where we have two or three, like the cloud, but what’s stopping that from being four? Grace Shao (37:40) I think they will be the infrastructure providers. Yeah, yeah. And they would kind of lean into each of their existing ecosystems or distribution, whatever you want to call it, and serve certain specific models for specific uses. So enterprises can choose what matches their needs the best as well. I do want to bring the conversation to a more contentious topic, which is on distillation and model convergence. You raise the question of whether Chinese models are structurally different. Often we are hearing claims saying a lot of these labs are about three to six months or six to nine months behind U.S. labs. There’s obviously a lot of noise or allegations and accusations from certain U.S. labs saying Chinese labs are distilling them. How do you actually see that accusation or that kind of dynamic? Nathan Lambert (38:36) The biggest unknown that I don’t have an answer to, which actually has a lot of sway, is how much of the Chinese companies are actively trying to hack APIs versus just showing up as a customer and paying. If you’re trying to hack the APIs, normally you get reasoning traces out so that you can create a reasoning foundation that would be similar to the model that you’re trying to do this from. That’s very different than the API standard form, which is just the output of the model, which is a less direct process for learning from. I don’t know the magnitudes. If it’s more just like, I walk up to an Anthropic API and I use it as intended, but I’m making a competitive model, I’m not very sympathetic to Anthropic. They could ban it if they want to. And I think the impacts are kind of a standard practice. You can do it with many different models and so on. The evidence Anthropic provided is not large enough scale where I’m like, this is industry IP theft at mass scale going on 24/7/365. 
So there’s definitely some gray area to what is actually happening in distillation. That’s why, on the policy side, I try to push people to not call all of it the same thing. Essentially, using any API endpoint to make synthetic data to train your model is some form of distillation, but it’s very different if you’re trying to break this model so that it gives us a different behavior that is hyper-useful for training and not get caught. Those are pretty different actions, and they’re all looped into this common phrase of “distillation” right now. That’s my biggest problem, which is that academic researchers and small companies use distillation extensively as the core of their business and the core of research methods. So if the U.S. government nukes that as a thing that could be done in the AI ecosystem, it’s mostly bad for small players, bad for U.S.-China tensions, and bad for academics. That’s my primary concern. And then trying to get the labs to actually say more. There’s a distillation side and then performance is the other side, which on benchmarks, it does seem like the Chinese labs tend to be six to nine months behind. When it comes to general use, I’ve always found the closed models to be better in ways that are hard to measure. So I go very back and forth on whether the closed models are better. I think we will especially see Anthropic and OpenAI pull ahead on knowledge-work tasks like legal, healthcare, financial services, because I just don’t see the Chinese labs paying for that data. All that data is going to be people that charge hundreds of dollars an hour to annotate and create these environments. So it’s a whole new capital build-out that goes on there right now. It’s going to be billions of dollars if you’re going to buy a billion dollars of data and a billion dollars of compute and a billion dollars of talent to train your model. Grace Shao (41:30) They don’t have the money. Nathan Lambert (41:30) I don’t think they have that. 
Mercor has some of these evals, and I think there is a bigger gap there. So it’s very interesting. Florian, the guy that helps me, and I disagree on it. It’s this fine line: yes, on the evals (coding and lots of these things, and even random evals that surely the Chinese labs aren’t training on), the open models really do post genuinely crazy impressive scores. So I think there’s also a tester’s bias, where I don’t use the open models as much. Maybe it’s hard to ground in my head what I was doing with AI six to nine months ago. I wasn’t even using Claude Code as extensively. I guess the question is, at the end of this year, can I use an open model in something like Claude Code and feel like it works at all? That’s the test on the performance gap, starting in June, June to August, and whether or not that hits. I don’t think the open models have hit that yet. I think it would be way more of a narrative if all the companies spending billions of dollars on Claude are like, oh, we can spend 1% and just use DeepSeek. These CIOs and all the big companies... some companies spend more on tokens for their employees than on headcount. These are normally startups. But they would happily reduce that token cost to 1% of the expenditure if it really was that similar, because then you could just use 10x the tokens. I don’t expect that to happen. And I expect things like the latest Claude and GPT-5.5. I expect more of these things through the year, and we’ll see if I end up being right. Both are right in the middle of us, as a world, getting more clarity on them. They’re like 18-month-long stories unfolding, and I feel like we’re just in the middle of the performance gap and distillation and learning more. Grace Shao (43:25) Yeah, it’s interesting. What you mentioned helped me recall a conversation I had with other people as well. The point on distillation is that I just had a conversation with your former colleague at Hugging Face who leads APAC, Tiezhen Wang.
He was just saying, look, the distillation accusations don’t really make sense because we’re all distilling off of each other as we speak. I’m learning from you; you learn from me. We’re distilling. It’s so vague of a terminology to just use that to accuse all these various behaviors. So to your point, I think people in the technical world who understand what’s happening actually want more clarity on what is the gray area, what is actually black and white, and what is not appropriate or unethical. That needs, I think, the industry to come together to really put guardrails and rules around. Now, number two on the compute side and the data side. Something anecdotally will be interesting to you is that when I spoke to one of the lab researchers in Beijing, I think in February around Chinese New Year, they were saying, look, they want to get better data, but they can’t because usually a lot of American labs would pay tens of millions, if not even more, like a hundred million dollars, for a set of very obscure or niche datasets, but they would have an exclusivity contract. What the Chinese labs will do is that they will literally wait out the exclusivity contract and then, say two or three months later, pay for it at one-tenth or one-twentieth of the price for that same dataset. So then once they start post-training on that dataset, that’s where the three to six months or six to nine months come in as well. Yeah. On that note, I want to... Nathan Lambert (45:00) Yeah. I think the data industry in the U.S. has two things. One, the lab asks the data vendor, we need this specific type of data. And the data vendor is a network that connects the people to the lab. The other thing is the data vendors know evals that are important, so they try to create good data for hill-climbing on specific evals. That data could be sold to multiple people, but is less expensive because they make it once and expect to eat margin or take margin on it. 
There could be a pipeline where once OpenAI is at the cutting edge, creates this new thing, they create deep research, then the data industry is like, let’s make things that are a little bit cheaper to sell. So there is time lag in these various things. But I heard the same thing on the ground, where they have a negative view of the data industry. It’s like, quality is bad, we don’t really have access, we do some in-house. That’s a very big difference from today, which is that you have the data companies in the U.S., which is insane. Grace Shao (45:53) Yeah, the American data companies are so mature. It’s its own sophisticated ecosystem. Before we get into data, I actually want to ask you this question. I think recently a lot of the narrative is now saying, look, Anthropic and OpenAI have kind of proven that pre-training scaling laws continue to hold, especially with the recent models. There’s an obvious compute constraint on the China side that we talked about. And then it will likely be even more amplified with the absence of Blackwells in the coming months. So as we move forward in this race, per se, if you have to put it in China versus U.S. in that sense, will we see a wider gap between the performance and benchmarks between the Chinese labs and U.S. labs? As in, will we see the gap going to 12 months, 24 months, as Chinese labs are very, very constrained on compute for pre-training breakthroughs? Nathan Lambert (46:40) I think it’s more of pre-training as a thing that you could actually finish. How big can you pre-train a model that you can finish and serve? The Chinese labs could train models that look like GPT-4.5, which is this giant model, but you can’t serve it. They end up training a model that is 2.5 trillion parameters and they release it, and no one can use it. They could barely serve it on their API because they don’t have Blackwell NVL72 racks or something — these racks that are definitely what are serving these large MoE models. 
They just don’t have the quantity of these. So there’s a difference between models that you can build and models that are actually useful. I think some of the Chinese labs are definitely like, we don’t need to release the gigantic models because nobody is going to use them in open weight. The biggest models end up getting served via API. So there might be some segmentation in that market. But I do think the inference and amount of economic resources that you have to serve your customers is becoming a thing that dictates what models are built. That’s why I think the gap will continue to rise. All signs point to GPT-5.5 being a bigger model, and I don’t expect that to stop. And then the economics of it is just the basics of: you need a certain volume to have the margin to support the research, because you can’t keep raising these ridiculous rounds forever. I think OpenAI, Anthropic, and Google are the only people with that AI usage volume to keep marching down the scaling laws to another 10x of training compute, which is mind-boggling amounts of investment in a model. That’s why, when the economic markets slow for fundraising, the model gap between these big three will just show a lot more. That’s the distilled way to say my prediction of when things will look different. It’s like these labs can’t fundraise, they go public, they can’t generate revenue more on their paid services, and then it’s just: look at how much training compute can be allocated or can’t be allocated. Grace Shao (48:41) Yeah. Basically, we’ll see a bigger gap, I think, in the coming months. Then what can make up for that? Domestic chips, or, like you said, better data. And why is it that sometimes people assume China has a very strong data ecosystem or data products, but actually the data vendor ecosystem is very weak in China? Nathan Lambert (48:41) So generally, I think I agree with what you said. 
I don’t know on the data side, but the way domestic chips could help is that if Huawei chips are fine for inference, and if they have sufficient volume to support the inference economics, which then trickles back into revenue, my read is that they just don’t have the volume of the chips, especially spread out across the amount of companies that they have. Essentially, the total FLOPs of Huawei, all the things produced, and it’s going to all these different places — it’s just not big enough. It could be something like ByteDance and Alibaba, with offshore data centers, can keep up a lot longer because they have access to Nvidia compute and have for a long time through this kind of offshoring. Maybe that stabilizes the ecosystem, and we’ll see what the AI startup, the younger startups like Kimi and Z.ai, end up doing. No one wants to do this, but if they pool resources, they last an extra year. You get another order of magnitude if they all pool together, but I don’t see them doing it. Grace Shao (50:00) But that’s the thing we were just talking about, right? MiniMax and Zhipu, how can they possibly compete with the hyperscalers at this point if you need offshore data centers? And the fact that Zhipu is on the Entity List doesn’t help, right? It’s not going to be easy for them to access these data centers either. Nathan Lambert (50:12) Yeah, I think they can’t. I think they won’t. Human nature will make it so they won’t collaborate. They’ll just do something smaller. They’ll just have successful businesses that are different. Grace Shao (50:22) They just have smaller ambitions, want a smaller piece of the pie. Yeah. Okay, so you wrote something like, nothing’s a secret, but everyone wants Nvidia chips. They want it, they don’t know how to get it, they’re fighting over it. Nathan Lambert (50:34) Yeah. They’re the only thing that works for training. All the models are trained on Nvidia. I don’t believe the DeepSeek propaganda that it’s trained on Huawei. 
The only models that are trained on Huawei are tiny. Inference on Huawei works. Every lab is like, inference on Huawei works. The labs that don’t have meaningful inference are like, we are told to get Huawei, so we buy them, but we don’t use them. Earlier research labs are like, we don’t have any inference and we don’t have a need for Huawei. Any company that has meaningful use of their models has figured out how to run them on Huawei for inference, which, to Jensen’s credit, is like — it’s happening when he said it was going to happen, but it’s not that surprising. Grace Shao (51:11) Yeah, the Dwarkesh interview. I don’t actually understand why he got so much hate for it because even without your political stance, what he said actually made sense logically by saying, if you don’t sell them the crappier versions of what we have, they will have an equally quite crappy version to serve themselves, or they would just want... Nathan Lambert (51:28) I think they would buy both. Buying both is actually true. The amount of Nvidia chips that you would have to sell to China for them to stop buying Huawei — because Huawei is almost surely way cheaper because Nvidia margins are insane — when would they actually stop buying both? Grace Shao (51:43) But then you have to go on CANN. You have to reroute everything back on CANN. The developer ecosystem is not there. That’s Jensen’s point, right? Or the habits are not there. So I think that’s what, when I talked to a research lab... Nathan Lambert (51:50) Yeah. But I’m saying they would also use Huawei. I think they are so supply-limited, they would use both. Anthropic uses everything. A lot of companies in the U.S. will use multi-platform. Meta is a huge buyer of AMD. Demand is so high that any chip that is potentially viable on the models within a few generations is very valuable. And the fact that you can run some reasonably large model on any Huawei chip is a big line crossed for Huawei. 
I don’t know if they can produce the volume of chips and scale that quickly, especially as they try to move to lower nodes. That’s the standard semi debate. But the question is: can Huawei scale production? That’s the only question. And if Huawei can manage to scale production, Jensen will just look really right. If Huawei can’t scale production, Jensen will look a little bit like a lunatic, but it will be outside of his hands. Grace Shao (52:43) And we don’t really know what happened during this trip. It seemed like nothing really substantial happened after this big Trump delegation. It was more like a high-profile tourism trip versus an actual deal trip. Okay, I want to ask you something you wrote about that’s a bit niche, not something you usually write about. It’s on the SaaS side of things. You said that there’s a common argument that China struggled to monetize AI because they’re unwilling to pay for enterprise software. We looked at how China tries to monetize on consumer AI, but clearly that’s not really been proven yet. In your piece, you push back on the claim and say that there’s a distinction between SaaS spend and cloud or inference spend. Tell us about what you think about that ecosystem and how Chinese AI labs are trying to make money maybe a bit differently from American AI labs. Nathan Lambert (53:32) I don’t know if it’s necessarily different, but I ask a lot of researchers about this. They say that everybody is trying the new AI tools when they come out. If they don’t like them, they stop using them. If they like them, they keep using them on the consumer side. So something like Claude Code would be an example: tons of people tried it. I’m guessing lots of them churn in China, just like in the U.S., but consumers are very quick to adopt and try new things, but won’t stick if it’s not actually serving them. And then the enterprise is like: there’s definitely cloud that exists. Digital services are gigantic. 
They essentially think that there’s more runway for making money on AI models that falls into that. And they all use coding agents; they all use Claude. It’s a hilarious thing. They’re all very Claude-pilled. There’s almost no mention of Codex, where in the Western media, Claude versus Codex is this whole thing. They all use Claude. And that is obviously a paid service. So I think there are cracks in the argument, and I expect AI models to be seen as a bit of cloud, but potentially it is the thing that changes some of the expectations, where it’s just so transformative because they’re so competitive, and it could be seen as a bit of a phase shift. Grace Shao (54:41) Yeah, and I think it’s a generational shift, a phase shift. Also, actually, recently Doubao raised their prices on Seedance usage and whatnot, and it’s a shift into trying to capture the prosumer market. You can say the average uncle and auntie on the streets still don’t want to pay for a consumer app, but I think there’s more prosumer market share that could be captured in China, maybe not fully enterprise either. I want to ask you about government roles and geopolitics. I know there is a common narrative that usually people assume Chinese AI labs are heavily subsidized. Actually, when I was in San Fran in March, I was at a dinner with a couple of investors, mostly public investors, and one guy asked me, “Hey, are all labs just basically subsidized by the government?” I was like, definitely not. The majority of them are not. If not, they frankly don’t want to take money from the government. It was really hard for him to understand that, because I think the misconception is all Chinese labs or Chinese tech are just funded by the government. Kind of to our point earlier, where any affiliation to any government agency, just by default, is assumed to be therefore backed. First of all, the government, I don’t even know if they have that much money to give out. 
Number two, I don’t think that’s how competition works, right? So what’s your thought on all of this? Nathan Lambert (55:55) It seemed more like a provincial government trying to help the companies do stuff, which is like get offices, get talent. I don’t know what the provincial government can do. In Beijing, there’s Beijing Academy for AI or whatever, which is a real research institute that’s just funded by a certain neighborhood in Beijing. It was like, okay, the U.S. could do that. But much less of the Ant Group-style thing, which is government takes major ownership stake in an investment round and goes on. Maybe Kimi’s latest round, there were mentions of government-backed VCs, and I don’t know how that kind of intermediary works. So I still think it’s very indirect. And because the government system is so competitive across the different layers, each of those layers are competing to help the companies, but they don’t have piles of cash sitting around to buy GPUs. Grace Shao (56:44) No, they don’t. And they frankly don’t know what they’re doing half the ti

May 19, 2026 · 1 h 3 min

AI x education, a contentious but unavoidable future. Designing tech for children with Dex's Reni Cao

I spoke with Reni Cao, the CEO and co-founder of Dex. Dex Camera is a language-learning camera for kids. Reni is a dad, a former product lead at YouTube, and on a mission to build technology that does good for kids and gives digital autonomy back to parents. We dive into his personal story from his high school days that drives his passion for AI, and why he believes the current education system is a “cookie-cutter” that fails curious kids. We get really into the nitty-gritty of what makes “good” tech versus “bad” tech for kids and why the category of ‘children-first tech’ is very overlooked. Reni explains why most children’s apps are built on an “attention economy” model that forces them to compete with addictive content, and why his team needed to build physical hardware to break that cycle. We tackle the hard questions, including the pushback from parents who believe in “no tech” childhoods. And he shared his most non-consensus view: that the era of standardized, industrial education is over. He believes we are entering a golden age of “scaled homeschooling” where AI meets kids where they are. Whether you’re a tech investor or an anxious parent, this conversation about nature versus nurture, “nei juan” (involution), and raising resilient humans in an AI world is a must-listen. Every episode, I bring in a guest with a unique point of view on a critical matter, phenomenon, or business trend—someone who can help us see things differently. Season two will host a series of guests from early-stage investing, as well as builders, founders, and product managers. For more information on the podcast series, see here. [https://aiproem.substack.com/p/launch-of-differentiated-understanding] To find the previous episodes of Differentiated Understanding, see here. 
[https://aiproem.substack.com/podcast] Chapters 00:00 Reni’s Journey to Dex Camera 03:48 Designing for Children: Principles and Insights 08:05 Technology’s Impact on Child Development 12:09 Bridging the Gap: Business and Product Design 15:36 The Role of Parents in Tech Development 25:20 Leveraging AI and Language Models 29:48 Value-Driven Pricing Strategy 32:05 Defining the Product Category 34:33 Subscription Models and Content Delivery 37:58 AI and Parenting: Balancing Technology and Safety 43:29 Unexpected Use Cases and Impact 47:29 Personalized Education and Parenting Philosophy AI-generated Transcript Grace Shao (00:00) Reni, let’s start with your personal story. Who are you and who are your team members? Because when I met you in SF, I was so enamored by the product and I thought your story was so interesting. So please share that. Reni Cao (00:11) Hi everyone, my name is Reni, CEO and co-founder of Dex. We’re a technology company in San Francisco, an almost all-parent company, which is pretty special in a startup setting. We’re a bunch of parents having trouble with the same kind of reality, where our education system is sort of cookie-cutter and our entertainment is also cookie-cutter for children. So we’re like, can we harness technology, especially the latest developments in AI, in a different way for families, one that really gives children a chance to become the best version of themselves and gives digital autonomy back to parents, rather than accepting the fact that they have to struggle between technology versus no technology? So yeah, we’re parents, kind of like a bunch of missionaries on this journey together, exploring how we can make the best use of AI. Our first product is called the Dex language-learning camera, where kids can take pictures and turn the whole world into language immersion. And it’s a product targeting young children, three to eight.
And we’ve sold 10,000 pieces so far, and ratings have been high, and we’re pretty excited about this. But yeah, that’s pretty much us. Grace Shao (01:24) But Reni, tell us a bit about what you did before Dex, actually. What kind of led you to this path? I know becoming a parent really did inspire you. You have a young daughter, I think a similar age to mine, around three years old. But before that, what really led you to this path? Were you always passionate about children’s tech or education? Reni Cao (01:41) I actually have been a product management guy for the last decade in Silicon Valley, at some big companies like YouTube and LinkedIn, some smaller s***, ZFS, Wish. But I have been a builder since the beginning. I would actually say that my passion for Dex originated much earlier than the start of my career. It actually started right when I was at school, but happy to say more if you’re interested. Grace Shao (02:07) Yeah, no, do tell us a personal story there. Reni Cao (02:09) So I was always this random kid with tons of questions back in high school. And very unfortunately, I think the education system, especially in East Asian countries, is not designed to meet kids where they are. So every time I came up with a random question, my teachers were usually a little bit impatient and would be like, can you just go back and finish your quiz, et cetera, et cetera. So the moment GPT-4 came out, I was thrilled, and I posted a long blurb on LinkedIn, basically saying, you know, if I’d had this as a kid, I would have grown into a more complete human. So I feel like generative AI’s capability to meet kids where they are, especially to meet their need for curiosity, is game-changing. So I feel like I’m building this product first and foremost for a younger me that could have benefited so much from this. That’s pretty much the story about me.
Yeah, and of course, as parents right now, we see there is a tectonic shift in terms of the skill landscape and what the future of the workforce is going to be, and even the existential challenge of what being human means in a future society. So we do want to build something that’s centered around children, centered around the family, to help them find what they love and build agency around it at the end of the day. So yeah, those are the two main driving forces of me coming to Dex. But I’ll be honest about it: it’s, like, very random. When I wanted to start a company, a lot of my colleagues were very surprised, being like, oh my god, Reni, you’re getting into this field? But yeah, I guess I finally found the work of my life. Grace Shao (03:48) I love it. I think you need to understand the passion and the personal reason behind the business to really understand why the design was frankly so intuitive, and why you’re so passionate about building this and leaving such a comfy, you know, cushy corporate role. I think that’s the one thing that stuck out to me. The product itself is actually so natural to how children behave. To your point, my three-year-old, from morning to night, you know, in the morning she wakes up and it’s like, mommy, what’s this? What’s this? What’s this? What’s this? How do you say this? Why do you know that? Sometimes she gets angry at me. If I don’t know something, she’ll be like, but you’re an adult, you should know everything. But the reality, especially with languages, is that it’s really difficult. So for example, yesterday she was coming back from her Mandarin class and she said, liu shu, and she was pointing at a random tree. And I was like, that’s not liu shu. All I know is it’s not liu shu, but I actually don’t know what liu shu is in English, because I think it’s only really common in the mainland. I’ve never seen that kind of tree. Well, I guess it’s a willow tree. You don’t see it very commonly elsewhere.
And then she kept on pointing at trees, but in Hong Kong, you clearly don’t have liu shu because Hong Kong is, like, tropical. And then she got really, really mad at me. And in that moment I was like, wow, if we had a Dex camera, that would have been perfect. But I was literally trying to take a picture of it while we were in a moving car and trying to upload it to GPT, like, what tree is this? What’s the name of it? So anyway, I think it’s really great product design. And I want to kind of get into that a little bit. When you were designing it, what was the thinking? Like, what does it mean to be children-first? Reni Cao (05:10) I think there are three layers to children-first as a principle. The first layer we already touched upon. So young children, their hand dexterity is very different from adults. They tend to use one hand to operate a device, and the other hand they want to use for sensory exploration, like they want to touch. Sometimes they want to just move things around. So this requires a different form factor, one that’s one-handed, very tactile, very intuitive for young children, such that they can explore the world while harnessing the power of AI in this case. So this is kind of like the special thing about the user, and it calls for a different design. I think that’s layer number one. I think layer number two is that the device itself is a metaphor for the market as well. In the market, we want to build something that’s drastically different from the so-called adult-centric smart devices, namely phones and tablets, to send the market a message that there could be a different option. There could be a good technology. There could be a family-centric technology. And we’ve picked this form factor utilizing the metaphor of a magnifying glass. It is something you use to see hidden wonders you otherwise cannot see. I do think that’s the second layer, which is, like, metaphor and category creation.
And at the end of the day, I do think we intentionally make the device kind of find this fine balance between engagement and learning, or kind of like the healthy aspect of the technology, meaning we add an assistive screen, but we make it really limited and not the center of the whole user journey. And we want to find a new way to put together all the components in our consumer electronics world such that it strikes a more delicate balance and lets the device itself be, you know, retentive for children without getting them addicted. So we intentionally make it a little less stimulating, actually much less stimulating, than a lot of adult-centric products. So those are the three main principles around the product design. There are a lot of conflicting constraints here, as you can see, but we do our best to find the answer. And here we go. What you see right now is the first answer we have thought through, and I think the market has validated it quite well so far. Grace Shao (07:39) Yeah, definitely. I think, exactly to your point, you know, a lot of times when we as young parents look at introducing technology to children, we’re really worried about the big screens, the addictive nature, or, even though a lot of them allow parental controls, the unlimited access to a wild, wild internet out there. All of these things are basically concerns or reasons why we hold back technology from our kids. So actually, on that note, I do think you kind of mentioned it, right? Technology over the years, especially big tech, frankly, has garnered a bit of a bad reputation. And I think that was really tied to the rise of social media and all of the mental illness that came with it. And obviously, like you mentioned, the addictive nature.
So what do you think is actually harmful to children’s development when we are looking at tech? And what are the areas where we can really embrace technology? I think you kind of touched on it lightly; maybe explain it to us in an even deeper, more technical way. Reni Cao (08:37) Yeah, our thesis is that a lot of parents think technology is negative for a good reason. And the reason is that all the main status quo technology for children is built on top of the attention economy, as we call it. Everything revolves around time spent and how much attention, how much engagement, in terms of minutes, seconds, sessions, you can get. That is the reality because, think about it, you build an app on an iPad, and immediately you’re entering a competition with Roblox, with YouTube Kids, with all the videos, all sorts of things out there. You could do well. You can try to do good for society, for families, but you’re effectively competing against a more addictive form factor of information, and it’s a losing battle, and, as we call it, a rat race. So no matter what type of educational apps or content you’re trying to deliver, at the end of the day you have to deliver them in a more and more engaging way, more and more gamified, with more and more animation, et cetera, et cetera. I think that’s another reason why we need hardware at the end of the day. I think the first step in creating an alternative reality is that we need to create a new world, a new kingdom, where the business is built upon outcomes rather than attention. Meaning it’s not the time-spent logic anymore. It’s like, can you use this device, for example, for Dex, and see that the child speaks better after two months, or that your kid starts to have a love of speaking Mandarin that is preserved for the rest of their childhood?
I do think there is a business model there, but I believe that business model warrants a totally different design of the experience from the ground up, from the device layer to the software, to the content, all the way to the user interaction. So I do think that’s why the current technology is considered bad: because it races towards attention. And I think ultimately, inside Dex, I believe the final answer to creating that alternate reality is: can we deliver something that’s purpose-built for children before we build a general, you know, time-spent-logic kind of product? In the case of Dex, it’s something that’s purpose-built around languages. It cannot do a lot of things. It’s not a chatbot. It cannot play videos. But I do think even doing one thing super well with frontier technology already delivers so much value to families, such that you can build a viable business model on top of that while creating value for families. I think being courageous enough to limit our scope to begin with, to really hold onto our principles, deliver on a promise, and create value there, is another internal operating principle for how to harness the technology. And I want to say that it’s very interesting. What we’ve noticed is that a lot of people are trying to use AI in quite an all-in-one way. So you can see a little device with tons of features in there. You can generate pictures. You can talk to celebrity chatbots. You can talk to Elon Musk on that device. And we think, actually, that would be a very slippery slope in terms of harnessing the technology at the end of the day. So yeah, purpose-built is another very critical principle we’re holding on to in order to create good technology. Grace Shao (12:09) No, I love that. But I mean, from a business perspective, sometimes people might not have purpose-built businesses, right? Unfortunately, some are not.
So then how do we basically help the industry align the business incentive with the product design incentive? Because, you know, what you’re saying right now makes a lot of sense. And I think once I saw the Dex camera, I was like, wow, why is there not something like this on the market? But it does feel like there’s a huge gap where, like you said, there are the big devices and big tech, and there are these tiny, niche little products, whether software or hardware, for children’s use, but it doesn’t feel like people are taking it seriously, even though we all know parents are willing to spend on children if it’s for their good. It’s not like the economics don’t make sense. So why is there that gap right now? Reni Cao (12:56) I think you’re hitting on one of our most recent realizations, that parenting needs and children’s needs are quite long-tail or, as we call it, very versatile, right? Different parents have different parenting needs. Even when you look at language as an example, there are tons of different languages and even more dialects you want to learn. Let’s say you want to learn Mandarin; you’ve still got so many dialects there. There hasn’t been a real technology that can enable a venture-scale business, one that attracts talent and good capital backing, to build something that’s generational. But I do think this is the moment: AI is strong, and we can finally build one system that can consolidate all those long-tail needs. Even for Dex, very specifically, you can learn a lot of languages and even more dialects with just a nine-person team building the hardware plus software. I think it’s a catalyst that’s much bigger than Dex itself. And I’m really excited about that. But I think another very interesting angle is, despite the technology being there, you have another question: why are there not more companies like Dex?
I have a personal opinion here. When new technology comes out, people tend to use it in the sloppiest way possible. They just go, like, OK, you can chat with the AI, so why don’t we just shovel AI into a little box and put it into a talking fluffy and call it an AI toy. And that’s it. That’s my business. I do think it is like a gravity that’s pulling people away from thinking deeply about how to harness technology and pulling them towards something that’s so trivial it’s almost like a shortcut. I would call that a trap on the entrepreneur side: the technology is changing so fast, and everyone’s FOMO-ing, everyone just wants to use it in some way. But in this sense, we as Dex, the company, believe that we need to think very deeply about how we should use this technology to meet users where they are, and deploy AI in certain ways so that it can deliver the value. So that’s why we start small, but we’re going to expand from there. Grace Shao (15:09) Yeah. No, it makes a lot of sense, but I wonder if you guys all being parents, like you just said, has made a huge difference. I hate to overgeneralize, but, you know, I’ve been in the tech space for 10 years, and usually either I meet men who are like 20 years older than me, or they’re very young men who have not, you know, settled into a family yet. And I’m just saying, when I tell people I’m a mom, it scares people. They’re like, I don’t know what to say. I’m like, OK, I’m not trying to scare you off by telling you I’m a mother, but the reality is most of us will one day have families. And when we do, we start thinking about the things around us very differently; our perspectives shift. And to your point about you guys having a lot of purpose in designing this product, I wonder if a lot of the reason is that you guys are parents.
Whereas if someone is an entrepreneur for the sake of being a business person, they might not have the nuanced understanding of what a kid needs and what they even think is good for a kid. So to your point, they create little stuffed animals with an LLM plugged into them, which is horrendously scary. I would never introduce that to my kid, right? I’m getting very, very agitated about this. But you know, another one that we talked about kind of offline was, like, I should be an ambassador and be paid by Toniebox at this point, because I’ve probably gifted at least like 20 of them to friends with kids. I think on the surface, you think about it and you’re like, oh, a little box that plays music. You’re like, this is so easy. I can just use my iPhone. Reni Cao (16:13) Me neither. Grace Shao (16:32) But exactly to your point, it gives the kids agency, allows the kids to start navigating the world themselves and have preferences. For context, for people who don’t have Tonieboxes or kids at this point: you put these little miniature IP figures, essentially they’re Disney or whatnot, on the little box with a magnet. And then the box starts singing and has like seven or eight pre-programmed songs or stories. And then you can control it with your little hands. And basically, like, you press the Reni Cao (16:54) stories. Grace Shao (16:58) big ear, and the volume goes up; small ear, the volume goes down. It’s really, really great. So basically it introduces technology to kids where they’re like, oh mom, I can control what I want to listen to today, but I don’t need to nag you to control the iPhone. I don’t get exposed to a screen. And I can sit there and be entertained for like half an hour myself. So I think Dex really falls into that category for me. Like, you know, we kind of skipped the part where we explain how your technology works, really, in a very day-to-day way.
It’s basically like you hold a camera, you point at things, you click the button, and you can say, what is this? And you choose default languages, right? Actually, you explain it better than me, please. Reni Cao (17:35) So there are actually four questions here, and I want to react to all of them one by one. I think there are a lot of good insights here. I think Toniebox and Dex share one thing in common, which is that they are children-led, or child-led in this case. Think from the POV of a child. The world is kind of a scary place where you’re told to do this or that. You are brought here or there; there’s not much, quote unquote, autonomy you can have. But now there’s a device that your parents are actually willing to let you operate, and you can decide what type of content, media, or interactions you get. That is just a huge reward for children’s unlimited curiosity and their strong need to be considered, you know, a big kid or even a grown-up in a way. I think that’s the intricate magic that, if you are not a parent and haven’t interacted with children a lot, you will miss. So instead of saying being parents made us better product builders, I think at the end of the day it goes back to product 101: you really need to know your user. You really need to know who is using your product. We spend such a long time with our kids every day. And in the early days, which is very funny, the first group of users using Dex was just our own children. And that gives us a huge edge there, right? And you mentioned that a lot of startup founders in this category are sometimes doing something that raises the eyebrows of parents. I do think being a little bit distant from kids is one reason. And another reason is that I do think there is a misconception that children are less. At the end of the day, a lot of founders think, you know, those are toys or some gimmicky stuff.
Kids, you know, you just give them something that can flash, that can make some sound, and children would love to use it. But I reject that answer. I think that assumption is completely wrong. Children are actually smarter than adults in certain ways. They just cannot verbalize it. But as I said, they’ve already got their little taste. As the popular word goes, they’ve got taste, and they can sometimes tell what’s a soulful story versus a very sloppy kind of story. So children actually know that, and they want a quality experience, they want a quality product, and they can actually absorb something that’s really built well for them. I think that just gives us this endless sort of motivation to polish our product as if we’re building it for the most critical set of adult users, because we think children are actually more, and they deserve more. Now, coming back to how Dex works at the end of the day, I think the core loop of Dex is quite simple. You just take the little camera. I’m happy to actually send a video to be the B-roll here. You just take a picture. And it would just literally tell you... let me actually take a selfie here. Hi. Let’s see what I can learn about this. Look at that big smile. It’s like spreading happiness everywhere. Can you say a smile? Just smile. Smile. Yeah. This is like, you get unlimited... like, smile comes with some laughing too. It’s when you make happy sounds like, ha. Can you say laughing? Laughing. ...like the ones we use to listen to music. Do you like music too? Can you say headphones? This is actually English immersive mode. So you can improve your vocabulary there. Grace Shao (21:06) How many languages do you have now? Reni Cao (21:08) We have 16 languages and more than 30 dialects, and it’s still expanding. And an interesting observation here is that the smaller, the more niche the language is, the stronger the demand is, which we find super interesting.
Grace Shao (21:21) Probably because it’s just harder to find offline solutions otherwise, right? Or, like, harder to find the communities. Assuming you’re in SF, finding a Mandarin community is not that difficult. You know, if you’re in England, finding a French community is probably not as difficult. But, you were saying, maybe Arabic, languages like that are not as mainstream, and maybe you have people in San Fran wanting to do that, right? Or, like you said last time, people in Dallas trying to learn Mandarin, where again, you don’t have a huge community. Very interesting. I’m sorry, I got very passionate about the topic. So I want to kind of swerve back to our conversation here about raising children with technology. I’m sure you get pushback. I think right now there’s the other side of the argument, where everything should be organic. Everything should be very simple. Reni Cao (21:53) Yeah, of course. Grace Shao (22:08) And I myself am a big fan of a lot of the Montessori toys. You know, they don’t have buttons, they’re not even powered. They’re just little wooden blocks, but they’re designed very well for kids to, you know, develop motor skills. So how do you explain it to parents today who are saying technology should be rejected in childhood, kids should just be reading physical books, they should learn the way that we learned or even the way the previous generation learned, we should go back to touching grass only? So, like, yeah, what’s your argument there? Reni Cao (22:37) First of all, you are completely right. Every once in a while, we get a comment on our social media like, why don’t you talk to your own daughter to teach that language? Why do you need a device to do that? So your assumption is completely right. And my response to that is, first of all, I actually respect that parent a lot. I believe in the most ideal world, organic human-to-human interaction and free play in the real world is great.
There’s a lot of research that actually proves that, right? However, I do think that parent misses some constraints here. Number one, you may want to talk to your daughter, but you don’t know Cantonese, for example. So there’s no way for you to teach some subjects or skills that you want them to learn or want to immerse them in. And second, all of us know that contemporary society is more and more fast-paced. Not all parents enjoy the privilege of saying, let’s slow down, set up dedicated time for the children, go out to places. This whole ideal family style from back in the 80s and 90s has changed a lot, I would say. So rather than just blaming parents for not spending enough organic time with their children, I do believe that technology should be introduced more as an option, as kind of like a stopgap, as one of the extra tools on the table. That’s why when we designed Dex, we didn’t introduce chatbots, but we spent so much time on sharing insights with parents through the parent app: what are your children interested in? What did they take a picture of? What do they want to geek out on? What did they learn today? And just giving them this little window to see the world through their children’s eyes, giving them good downtime topics, giving them a way to reconnect, even asynchronously. So I do think the concern is real and the overall, you know, judgment is very well reasoned. But I think the approach here is much more nuanced than saying, let’s use technology to replace humans. It’s not; it’s actually using technology to connect humans, to connect parents and kids better. That’s the nuance I have to add. Grace Shao (24:42) I see what you... Yeah. No, no, I love it, because actually I’ve seen some parents even give kids little Kodak cameras these days, and these little toddlers go around the world taking pictures of how they see the world, and they’re so cute.
My own daughter sometimes takes my phone and takes pictures around the home, and I come back to a lot of selfies and pictures of her sister’s foot. It’s just very cute because you see the world through their eyes, right? And it’s like, technology doesn’t take all connection away. On technology, I wanted to ask you about the technology. How do we understand that? Like, how are you actually leveraging LLMs? How do you route through different LLMs for different languages? This is something we talked about briefly, but I wanted to understand that a bit more. Reni Cao (25:20) to share details. Where should we start? Grace Shao (25:22) Like, how does it work right now? So basically, for the little Dex camera, I can’t ask it... like, to your point, you didn’t build a chatbot. So I can’t ask a question. I can’t have a conversation. It’s not a companion, but I can ask it, what’s this? How does all that work in terms of the back-end technology and the guardrails you built? Reni Cao (25:38) Yeah, I think in a 30,000-foot view, Dex is utilizing basically all the multimodal LLM capabilities to understand what the children are looking at. And on top of that, we build sort of an interest profile for the children and a parenting-need profile for the parents to help contextualize, you know, what responses we should give in that case. To give an example, if you’re a three-year-old just starting to learn Cantonese and you’re interested in a bunch of museum topics, or you love dinosaur skeletons and stuff like that, we will render you more challenges around, like, hey, let’s bring Dex to a museum and learn about different terms there, and it will be English as the primary language, teaching entry-level Cantonese there. So basically, the visual understanding certainly uses a multimodal LLM. The response definitely uses kind of a conversation API from a lot of the LLMs.
And I think building out this context layer, or this memory layer, of children’s interests and parenting needs is actually more complex. That takes kind of a full agent system to try to understand what matters, condensing or distilling insights into a profile and gradually injecting that into our responses. On a very high level, that’s it. We do use a wide range of LLMs, mostly Gemini and OpenAI. So yeah, that’s kind of the high level. Grace Shao (27:08) I’m going to ask a question you might not like, but I’m going to put you on the spot. When we talked last time, you said that specifically for Cantonese and Mandarin, you do use different LLMs, but the accents can be quite funny. Like, they’re a bit off. They’re not native-sounding. Why is that? And how do you overcome something like that? Or maybe for other non-English languages. Yeah. Reni Cao (27:12) No, ask me. First of all, you need to try again, because we have a solution already. But definitely, you hit it. We are already squeezing them so hard that we’re hitting the boundary of a lot of, in this case, the TTS of the leading providers. Because, think about it, I’m pretty sure you’re using English plus Cantonese. It’s basically using English to learn Cantonese. Is that the case? Grace Shao (27:50) Yes. Reni Cao (27:51) That is a mixed-language case. The challenge there is that, without fine-tuning, there are very limited samples of someone who speaks very good English and very good Cantonese and mixes them in one sentence. So the training data to begin with is a little flawed. Either you have accented English or accented Cantonese as the more common cases. That’s the fundamental root cause of this. And we’re doing some heavy lifting to solve that. And with the foundation models getting better and better, I think one day we’ll get there. And we can see that being fully fleshed out in the next six months. You can definitely hold us accountable.
And I think this is the right observation for mixed languages. It’s really hard. Yeah. Grace Shao (28:32) Yeah, I bet. So how does it actually work right now in terms of economics? Like, people pay you about $249, right? That’s the price of the product pre-tax. That’s not cheap. Like, it’s much more expensive than a toy, but obviously a bit cheaper than an iPad. How do I understand the pricing decision there? And then how does that relate to, I guess, how you pay for your token usage right now? Does that cover it? Reni Cao (28:57) Yeah. Yeah. Oh, big time. We actually have a pretty healthy margin, and tokens are getting incredibly cheap, much cheaper than where we started. I’m talking about, like, 96, 97. It’s a fraction of the token cost compared to when we were just getting started, which was back in February 2024. At that time we didn’t even have GPT-4, we had GPT-3.5. Actually, at that time we had GPT-4, but we didn’t have GPT-4o. So it was very expensive at that time. So now, pricing. Let’s talk about the user-centric view and the business-centric view. On the user side, we’re actually adopting this value-based pricing model, which is, like, at the end of the day, language is a high-value skill to acquire. I sent my daughter to a language immersion school in the US. I’m very embarrassed to mention how much I spent on that school. Grace Shao (29:33) Okay. Reni Cao (29:49) And if Dex can offer a 1% lift or enhancement on top of that school, the price is fully justified, and much more than that. So this is what I mean. And it’s very funny that you mentioned toys, right? A toy is something that you get, you play with for a couple of days, and then you don’t see it, you don’t worry about it. And this is not what we’re trying to do. What we’re trying to do is use a relatively high price to keep ourselves honest about the value we’re delivering to the parent.
Do we really teach a language, or do we really get the kids to fall in love with speaking that language? If we do, that price is well justified. If not, we give you a 90-day free return period. No questions asked, just return it to us. I do want to use this pricing model to push us to deliver more value for the user. So that’s one aspect of it. And on the business side, it’s very funny you mentioned it: I hate when people box us into the toy category. I don’t blame them. It’s a natural reaction, but I want to send a signal to the market that if a team of talented people, hardworking parents, put their heart and soul into building a purpose-built device that harnesses AI and delivers concrete results, we can get out of the typical, stereotypical toy average-order-value band and go much higher. Above that, it’s less about, I... it’s more that I want to send a signal to prove to the market that we have enough parents waiting anxiously for something similar to this who want to pay a perceptibly higher price for it, a premium for it. But yeah, that’s how we landed on that price. And it’s so funny that so many people in the early days told us, you’ve got to do $199, because anything that starts with a one Grace Shao (31:22) Premium, yes. Reni Cao (31:36) is night and day different from something that starts with a two. But I actually, it’s like I’m launching a suicide mission, I was like, let’s actually make it start with a two, but let’s deliver more value there, because it’s not a retail business at the end of the day. We’re trying to create a new paradigm of digital parenthood and childhood. We need to hold a high bar for ourselves. And the price is very telling in that case. Grace Shao (31:59) No, I actually agree. Would you categorize yourself in the same box as the Toniebox vertical? Would you? Reni Cao (32:06) Not really. Toniebox is, I would say, a content business. Same thing with Yoto.
Actually, their founders have deep backgrounds in labels, music labels specifically, and IPs. So they are effectively a distribution business: they are creating a new channel to distribute those IPs from Disney, from Spin Master, et cetera, et cetera. On the other side, you can see that at Dex, we’re not Grace Shao (32:19) I see. Reni Cao (32:31) thinking about IP partnerships or putting characters on our device. And we actually optimize for value and outcomes, like I promised you in one of our principles. So I would put ourselves in, I don’t know, the de facto smart device for families. Just very honestly, the family device, the family technology, maybe this is where we’re trying to go, but it’s a completely non-existent category that we’re still exploring. Grace Shao (32:33) Yeah. Yeah, yeah. Okay, like a family tech device. Reni Cao (32:59) And I may change what I call it. Grace Shao (33:00) I think there are some more similar things maybe in East Asia, because of the audio learning. Even when I was very young, I remember my grandma had a 步步高步伏机 (a BBK-brand learning device). I don’t know if you know what that is, it’s one of those tiny little, yeah, yeah. Basically, people learn English with it, and I think it was very, very mainstream in China for a while. These things have been around, I think, in East Asia because everyone is using them to literally learn English. Reni Cao (33:11) The steps are fine. Grace Shao (33:24) But it’s very one-dimensional. It’s one language to one language. They basically embed a dictionary, make the dictionary into a digital one. And you can ask search questions. You can ask what this word is. A more advanced one might even come with images, but I think, I don’t know, in the 90s, I didn’t see any images. But yeah, it does remind me of that technology and that vertical. I haven’t seen something like that too mainstream in the West growing up, you know?
I think when I was learning French and German growing up, that would have been so helpful, to your point. But yeah, sorry, I just want to bring it back to the business. On the Toniebox comment, I do believe their business actually could be really high margin, because their product is only, say, 199 or something like that, right? Like, the box. But each character is another 20 or 30 bucks. My daughter is drying me up here, because every two months she asks for a new figure. But my point is, it’s a great business, right? That thing just keeps selling. It’s like Spotify in a physical thing. So would you guys have any add-on services, software, hardware, anything? Reni Cao (34:32) We do. A lot of investors have been pushing us on this razor-and-blades business model. I think for us, though, what we are ultimately delivering is a business more like an app store. It’s where you can get personalized content and software for your parenting needs and for your children’s growth needs at the end of the day. We’re launching, or rather we’ve launched, two tiers of subscription so far to validate that. One tier is $10 per month: you get unlimited LTE, plus you actually get a curriculum packed in, like a content library. Every day we give you one topic, and in the topic you can explore a lot of new vocabulary, expressions, you know, new languages, and it’s good content to consume. And I think what’s most interesting is our future vision, which is actually a $20 per month tier. In that tier, you can actually create personalized activities for your children. Grace, you can be like, I run this podcast. I’m a podcast host. How do I explain that to my kid and make it a little bit fun, exciting, and even adventurous, as if recording a podcast is a little journey? And by the way, my kid likes this way of storytelling. You could give a lot of prompts there.
There, based on the profile, the context layer, we’re going to build interactive content that involves taking pictures, speaking, and looking at the device, to explain what podcasting means. And this tier actually got really good traction. When we look at its subscription retention, it is above 90% at three months, which shows early signs of product-market fit. But this is what I mean by our business at the end of the day. We are a channel to deliver harnessed intelligence to parents, such that they can build whatever content and software adapts to their needs, rather than purely search, then filter, or control with something like a timer. I really want the digital world to revolve around them, rather than the other way around. So in this case, to put it in a simple way, we give them a tool to build whatever they want, and we charge on the usage of the tool, pretty much. Grace Shao (36:41) No, I actually really see that. I love it. Because I think my husband was trying to use ChatGPT for a while to create stories with my daughter. Like, add a pig, add a dog, add a whatever in this. And obviously, it’s not made natively for this. So the stories don’t come out as, I guess, natively understandable for children. So I see where this can go. And the funny thing is, you used my profession as an example. Reni Cao (36:49) Exactly. Grace Shao (37:05) For example, my daughter just thinks I talk all day, that’s my job, and she thinks that her dad sits at a computer and presses buttons all day. So between the two of us, neither of us is doing much, just talking and pressing buttons. So it’d be really great if, you know, I can, I guess, lean on technology to find a better way to explain to children modern-day careers, you know, that may be not as easy to explain as, you know, mommy’s a doctor, and doctors go help people and save lives, which is like what my family has.
you know, explained to us when we were growing up, you it was very clear. I want to kind of go on a little bit more about AI and parenting. I think there’s a huge discourse right now in the US, especially, I think from my point of view, where I sit in Hong Kong, in Asia, even yesterday, I was speaking to someone from South Korea, venture capitalist, they’re saying that parents and society seems to be a lot more open to bring technology into their day to day lives. They’re much more open to the idea of leaning into technology for personal use and less worried about privacy and you know these kind of issues I guess. So at a high level, what do you think, should we be concerned when we introduce technology to children? will they, you know, for example, taking pictures themselves that automatically goes into one of the LLMs. Is that something that... he should be mindful of or are there guardrails that can be built in? Reni Cao (38:26) We should be definitely mindful. That’s why we enforce ZDR, zero data retention across our stack for images. So even let’s say your kid take a picture of themselves, you cannot retrieve that picture even you want. You can ping me through my personal email. You cannot find that picture anymore. And OpenAI and Google signed a contract with us to burn a picture immediately, like zero data retention on all the usages. But overall, I do think Grace Shao (38:48) See. Reni Cao (38:51) It’s the company’s responsibility to introduce technologies to family and the family should hold a high bar there for sure. Because like the AI is so early and it’s way too powerful in certain way. And it’s like a kind of like a black box in certain way in a lot of different ways. that I definitely, I’m not a, I’m not that one of the technologies that wanted to like, you know, glorify AI and it is the future and stuff like that. comes with a lot of risk, especially like unproven. aspect how it impacts the children’s cognitive development and something like that. 
That’s also a reason why we work with researchers and professors ⁓ closely like in Mount Eucalon from UCSF and Harvard professors doing education and doing research using text. I do think there is a substantial risk here such that the And we as the entrepreneurs and we as the parents, we need to hold a high bar for ourselves and roll out things one by one. So I guess that’s why you will hear more about like, oh, that’s like, you could have done this. You could have made it more engaging. You will hear this much more often than you’d be like, oh, there is like an incident because, you know, we always prioritize, you know, safety first. We’d rather the device to be boring in certain way rather than introducing consequences that we don’t understand. So I think there’s a very interesting dynamic between the Western and Eastern in terms of their views about technology. And I don’t think it’s a family parenting only. It’s also even a whole society, general perceptions. Happy to chat about that, but maybe it’s a little bit off topic here. Yeah. Grace Shao (40:12) comes from the mindful design as well. ⁓ No, we can definitely talk about that a little bit, but I kind of just follow up on what you just said. So how should a parent evaluate an AI device or tech device when they are purchasing for children, right? ⁓ I’m sure there are different devices out there, maybe not exactly doing the same thing as what you’re doing, but other devices are tech native ⁓ or AI enabled for children. How should parents kind of go about this? Reni Cao (40:52) I’m not a parenting coach. I will share my views. Number one, do think we should bias, we should start from our needs first. Maybe let me put it this way. Don’t get carried away with all the possibilities of the AI. Ask yourself, what is the unresolved parenting needs you have and find solution there. Rather than, this AIX, then that’s just to buy that AI device and give it a try. That’s number one, I would adopt that. 
Number two, I do think it’s important to see what a company’s method is. They definitely put their methodology somewhere, their beliefs somewhere, their principles somewhere, like how you asked me about how we use LLMs. I believe that parents should definitely hold the company accountable to explain those details, and ask, and verify. That’s a crucial step. That’s the due diligence on them, right? And finally, for any sort of AI product, I actually even think parents don’t have to get into this searching-and-validating mental model. They could literally build their own, in some sense. With agentic coding on the rise and driving the cost of software so low, I do think for a lot of stuff they should try to accommodate their own parenting needs in certain ways. I’ve seen tons of parents go into Claude Code and generate, you know, almost like a story writer for their daughter. That’s actually my previous colleague at Wish. And it was awesome. It’s just different blanks to fill in. It’s a kind of Mad Libs type of story. I do think parents can also change their approach: they have the autonomy right now to build whatever they want to build. Having said that, it’s still a little bit of a Silicon Valley bubble type of answer, because honestly, in the world, the adoption of Claude Code is probably less than 2% or 1%, I’m pretty sure. But I would encourage parents to use AI themselves and explore the boundary: what it can do, what it does well, what it doesn’t. Then they can make a decision from there. Grace Shao (42:45) Yeah, no, I appreciate that. It’s a very thoughtful answer, because it’s not just A or B. At the end of the day, I think parenting itself is so personal. It depends on how your family dynamics work, how you prioritize your time, how you want to parent.
So when you want to buy technology for your children or incorporate that into their lives, it’s also a personal decision. I wanted to ask, actually, do you have any good case studies to share with us just a little bit? Reni Cao (43:11) We have quite a lot. What aspect, what, what type of case does he want? Grace Shao (43:14) Just like, I don’t know, like things that unexpected people use. For me, I mean, by default, just assume, yeah, people use it in urban areas, right? But then I think when I met you, you said, actually a lot of people use them, you know, in unexpected places, like orders come through all over. Reni Cao (43:19) Alright, I’ll give you one. One of, I immediately think about one thing, one, almost like ⁓ close to 5 % of our users, they bought Dex to help with speech delay. That’s something we never anticipated, but those parents are very frustrated with all the, as we call it, sometimes autism tech or the speech therapy tech there. It’s not meeting their bar and they saw Dex, they’d be like, I would try everything right now for my kid. And surprisingly Dex helped them. and it makes them real happy. And you can find actually all those real reviews in our review sections. Quite a few family mentioned that their kids refuse to speak certain languages or just even English, but that’s kind of necessitate the language as a fun activities. And all of a sudden, the kids start to open up and speak more, and the parents are really happy about it. This same exact story happened with my co-founder, who is really worried about his, at that time, two-year-old young son having speech delay. But I want to disclose the name, but the song he actually first time spoken like coherent like ⁓ Chinese phrases using Dex and he caught it on a video. That was one of the most wholesome moment of our kind of like a user feedback in our channel. And right now we’re actually ⁓ volunteering to develop this special need mode. 
That’s kind of like, you know, customizing to special needs children. especially like April is the world kind of like autism awareness month. And yeah, we just want to do it. And we want to donate to Dextre researchers and speech therapists to help us do it. This is a totally kind of like a side quest, but it just like give us it gives us so much kind of like energy. You’re thinking about technology can be used in a way that’s like immensely helpful. Grace Shao (45:00) That’s amazing. Yeah, and something unexpected, right? Okay, I think I want to wrap up our conversation because I don’t take up too much of your time, but I do want to ask you one big macro question. With you working on whether you like to call it physical AI or not, essentially like a physical product hardware time software, how do we understand that trend going forward? Do you think AI will be essentially integrated, plugged in to more more hardware devices? What’s your view on that? Reni Cao (45:33) I do think there is a consensus that every wave of software technology revolution, there will be kind of like a device revolution following that. We are at the tipping point there. That’s like people starts to reimagine, where is this? this cloud? Is this the recording card? Maybe it should be a separate, like an ⁓ AI. Or this is a sort of like a little pendant that can kind of like ultimately listen to your life, help you organizing. I do think we’re at the ⁓ the dawn of a next wave of hardware. But it’s less about we’re doing the hardware because of the, I do think this is a software or technology driven type of hardware revolution out there. I do anticipate that. I do at least what I’m 100 % sure is like smartphones are not designed for children. Tablets are not designed for children. Families deserve something built with their interest. their needs in the center of the spotlight. And I see that happening. And that’s why we started this company. 
And I bet there’s going to be tons more use cases there. Grace Shao (46:33) No, amazing. Thank you. I think ⁓ one last thing. Is there anything I missed or anything you would like to share with us? Reni Cao (46:39) By the way, I time. If you want to turn through all the questions, I’m happy to be here. I don’t have anything else after this meeting. Grace Shao (46:44) no, don’t worry. think it’s a lot of times like I use them as prompts. But you know, when we’re chatting, like we actually covered most of it, you know. ⁓ Yeah, is there anything you think we missed? But from my end, like I feel like I covered most of it. You know, we did technology, we talked about children, AI philosophy, talk a bit about your business model. Reni Cao (46:50) Yeah. Yeah. Yeah. I do think you would want to talk about. Yeah, go ahead. Go with one last one, and I have one for you. Yes, go ahead. Ask yours first. Grace Shao (47:04) I think one last one. You go. So I want to ask you one last question, which is a question I ask every guest that comes on the show. What is one differentiated view you hold? I feel like your whole thesis around devices right now on the market are not made for children is already a differentiated view. But is there anything else you think that you hold that’s non-consensus? Reni Cao (47:29) Yes, with this view, I got beaten up so many times, but I still got to say it, right? I believe that education should not be cookie cutter. It should be highly personalized. So is entertainment. So is the parenting software. And we’re about to enter the golden age. Finally, this is becoming the reality. And let me say it this way. You look at a school in the US, how you tell the school is good or not, you look at one ratio. It’s called a teacher-student ratio. One teacher taking care of less, but why? Because then the teacher can accommodate, individualize the needs. 
I actually have a very radical view in terms of our education system is definitely lagging, significantly lagging against how our society evolves, how the technology evolves. It’s still a one size fit all and industrial way. to handle education, handle like, you know, testing, standard testing. It hasn’t really changed in the past couple of decades, but the world is a different place now. And I guess my view is like, it shouldn’t be that. The default shouldn’t be that. The default is like every kid should almost have their personalized tutor and the playmate that deeply understand them. Unfortunately, that’s impossible before, resources-wise. But I guess we need to strive to get there. as a race, as a humanity. Because each kids just come up, come with their own spark. that will miss out the window to make that spark their lifelong journey. But I’m not trying to attack on educators or school systems, something like that. I just feel like there needs to be more forces from the society, especially from the tech side, to help together build this alternative, enhanced of like a system that really delivers individualized education. Sometimes I use the word scaled homeschooling. And you cannot imagine how much people hate that. people are like, homeschooling, you’re taking away the social aspect of it. People are very constrained on the vocabulary of how they describe things. But I guess when I say homeschooling, it’s not about keeping the kids at school and hiring a teacher. And that specific process right now, I’m talking about really meet children where they are in terms of their growth, in terms of their needs, in terms of the skills they’re going to develop. I call that a differentiator, but maybe actually lot of people will share the same views. I’ll be happy to know who shared the same view and please join us in the journey. Follow us along. Grace Shao (49:49) I definitely think that view is definitely, feel anecdotally a lot more prevalent in SF when I visit. 
I’ve met other people like yourself, other people in the tech space or, you know, investors who are embracing this idea of modern homeschooling. And they say the same thing. They’re like, we don’t like to use the word homeschooling because, it sounds like a bit more cultish, but it really isn’t right. Like it’s really focusing on individual ⁓ growth. I think it’s amazing because I also think it’s because Silicon Valley itself kind of harbors this kind of growth and mentality and that the fact that people can succeed without degrees, people can succeed by building different things, people can succeed in just being different and being themselves, but the best version of themselves have always been, I think, what drives a lot of people who want to go to Silicon Valley because it’s like in many ways, as a mayor, talk to see like the best version of my talk to see, right? I think in East Asia, even as I put my kid in school right now, I find people definitely a lot less like that minded. ⁓ I don’t know if it’s a cultural thing because like, know, for you, you know, I grew up in Canada for me, I always felt like, you know, having that freedom to learn, explore when you’re young, which is more Western kind of way of, I guess, education was good. But I think a lot of peers here actually believe that, you know, for the first like say eight to 10 years, that foundational education should be drilled in. know, ⁓ grit should be taught, discipline should be taught. But it’s very interesting because it does kind of, I guess, manufacture different kinds of stereotypes. And I think it’s fascinating. And I think one more comment on that, I know this conversation has been more personal than we thought it would be, but I love it, you know. I don’t really get to talk about motherhood that much in my podcast. It’s usually about tech and bros and tech bros and about, ⁓ and about finance. but I think even, you know, when you have kids, people talk so much about nature versus nurture. 
And what I realized is I was shocked to see the nature come through, as young as like six to eight months in a child. Reni Cao (51:36) You Grace Shao (51:53) their personality starts coming through and by the time they’re one to one and a half, they start kind of babbling, start demanding things. I realize 80 % of it is all nature. It’s like their preferences for how they socialize, their preferences of even noise, even you can realize like your point, your taste. You’re going to find a six months old who just wants to sit in a corner in a play group who just wants to flip through books literally and just undisturbed. You’re going to find someone who’s screaming in the middle of the whole group. you’re gonna find my daughter who’s rolling over everyone and just like trying to knock everyone out. And I don’t know why. You know, you’re gonna realize all of it is nature. And even I believe agency, autonomy, grit, and desire to actually succeed, that itself is nature. And I don’t think you’d be taught. And I think this is a bit controversial. But definitely I think my husband and I have been thinking a lot about this. We’re like, we can just provide them what we can. But there is no... point of even pushing them when they don’t want certain things. the best is to push them in a direction that they want to be pushed and they will tell you. I think this is like kind of the difference in our generation of parents. yeah. Reni, thank you so much. ⁓ Yeah, go on. Reni Cao (52:49) Exactly. Yeah, but can I, I know this is over time, but can I add one last comment towards what you say? But I think what you said, especially growing up in East Asian, like, you know, education system, it has been industrial for a good reason, right? At a time where stuff like AI doesn’t exist, the most effective way, Grace Shao (53:03) No, of course, of course. 
Reni Cao (53:19) to develop fundamental knowledge workers, plus finishing the job of dividing the children into different segments and give them different levels of education. That education system works perfectly. I entrance exam, as I’m talking about, taking standard tests and stuff like that. But all we know that is AI is sweeping through all the knowledge works. and specialized in knowledge works, honestly, Asian parents like favorite jobs, like being a doctor, especially radiologist, you know, and or being a lawyer, you’ve got to start somewhere as associate. Now it’s getting kind of like his hardest. The world has already changed. The tsunami already hits. But I don’t think people actually understand the level of the s**t. A lot of like everyday people in the world, they haven’t felt. this like a tsunami, right? So when you say you want to kind of like, you know, like find define your children’s nature and push them towards kind of like what they are intrinsically motivated about and give them resources to set them up for success, building grades are on the way. I do believe that I think I will 100 % agree with you that it will become the most fundamental aspect or element of education in the next like five years or even sooner to be fair. That’s why I don’t send my... to put it in a simple term. I don’t send my daughter to Kumon. I don’t want my daughter to do Russian math. I never benchmark her against like, oh, like the other kids can read at the age of like three and a half. Why don’t you? Actually, I don’t because I fully understand that kids have their own time zone. Kids have their own spark. All you need to do is think deeply to define that, to understand that, understand why my daughter sometimes is super sensitive, understand why sometimes she got frustrated and want to hit. Grace Shao (54:31) I feel very validated. Reni Cao (55:00) Don’t take that on a surface level with the other tools you have. 
Go deep, understand that, and build these programs that are personalized to her and help her. And I think this is why, I mean, we talk about the word neijuan (内卷) a lot. If I have to juan on anything, right, if I have to ruthlessly compete on anything, I compete on the depth of understanding of my daughter rather than anything else. Grace Shao (55:03) 100%. Reni Cao (55:26) Because I actually think that’s the thing people gloss over. People are like, education is just a checklist. You’ve got to check, check, check, check. And there is a better checkbox, like an Ivy League school, there’s an OK checkbox, there’s a worse checkbox. Forget about the checklist. That checklist is obsolete already. So, respect. I think we vibe together in terms of our schools of parenting. Grace Shao (55:33) Yeah. Yeah, 100%. No, I agree with you. Parenting style. Yeah, yeah, yeah. Reni Cao (55:51) But it’s all fully intuitive. I don’t know whether I’m right or wrong, but this is what I firmly believe in. And I believe someone’s going to join this journey. Grace Shao (55:58) I think there are more people who are aware, especially people who are more plugged in with the technology, because they realize how fundamentally society will change. I just thought about when we were young. I’m sure your parents also told you to go to university, go to this, go to that, right? For sure there was a hierarchy in their minds of what kind of school you should go to, what kind of degree you should get. Now I really don’t think that’s the case. Actually, a lot of my readers would even know my dad really forced me, well, pushed me, encouraged me to go into finance. And at one point he was like, if you don’t study finance and don’t work in finance, you’re not following in my footsteps, and blah, blah, blah, blah. Right. And it became a personal reason to do it. It wasn’t because I wanted to, or because I was good at it. And there was actually a battle between us, with me being like, I want to go into journalism.
And he’s like, no, I was like, no, I’m going to go to journalism. He’s like, I’m not going to pay for it. You figure it out. But the beauty of it is actually found a way resourceful enough to get a full time, full scholarship. And I still want to journalism. Again, I recognize how lucky I was. I I found the opportunity to do that. But most kids actually just end up then doing what their parents told them to do and they never, and they never actually live their best life or become the best versions of themselves because they’re doing something not actually fundamental. Reni Cao (57:08) you hit a very critical, I think it’s a background or context. There wasn’t an abundance before, right? Growing up, let’s say in the eighties, It’s a relatively kind of like a society. It’s relatively kind of like not that sort of like, you you wouldn’t call it abundance at the time. Let me just put it that way, right? You still need to compete for stability, compete for resources. That’s why there’s a rat race in education, which I totally understand. That’s kind of like. It’s like a whole

May 18, 2026 · 1 h 0 min
