Differentiated Understanding
Joining me today is Tiezhen Wang (Tom), formerly of Hugging Face, where he worked with researchers in China, Australia, South Korea, Japan and across APAC, to help make open-source models more discoverable, usable, and visible to the global developer community. In this conversation, Tiezhen explains why Hugging Face became the GitHub for models and why open source is not just a distribution mechanism but a different way of coordinating research. We discuss why Chinese AI labs have leaned so aggressively into open models, how DeepSeek changed the commercial logic of open source, and why Qwen, Kimi, GLM, MiniMax, and others are using openness as a way to win attention, recruit talent, and accelerate the whole ecosystem. His core argument is that China’s open-source AI push has three layers. At the researcher level, open source preserves attribution and career mobility. At the company level, open models can become benchmark-led marketing, developer distribution, and a recruiting advantage. At the ecosystem level, government and university incentives are beginning to cultivate open-source culture among younger engineers. We also discuss why US frontier labs have pulled back from openness as research and business have become more tightly coupled, why distillation is much murkier than the public debate suggests, and how DeepSeek’s releases increasingly function as shared R&D for the broader AI ecosystem. The conversation then turns to monetization: why open-weight labs can still make money through API tokens, base-model access, post-training services, and inference optimization. Finally, he lays out his current thinking on AI bootstrapping: the idea that agents may eventually help improve their own harnesses, generate training data, and even improve the models they rely on. We close on a more philosophical question: if a handful of closed labs control access to frontier capability, open source becomes more than a technical preference. 
It becomes a check on the concentration of power.

Tiezhen (Tom) is based in Sydney, Australia. Feel free to reach out to him on X to chat. [https://x.com/Xianbao_QIAN]

To find the previous episodes of Differentiated Understanding, see here. [https://aiproem.substack.com/podcast]

Every episode, I bring in a guest with a unique point of view on a critical matter, phenomenon, or business trend, someone who can help us see things differently. Season two will host a series of guests from early-stage investing, as well as builders, researchers, founders, and product managers. For more information on the podcast series, see here. [https://aiproem.substack.com/p/launch-of-differentiated-understanding]

Chapters
04:07 The Philosophy of Open Source at Hugging Face
12:51 Challenges and Opportunities in Open Source
17:12 The Role of Collaboration in Research
21:50 The Future of Open Source and AI
33:58 What Constitutes Distillation in AI
37:18 Navigating Copyright and AI Distillation
37:43 The APAC AI Landscape: Insights Beyond China
43:08 Understanding the Ecosystem: Labs vs. Hyperscalers
46:21 Monetizing Open Source AI Models
52:02 The Future of AI: Bootstrapping and Self-Evolution

Transcript (AI-generated for reference only)

Grace Shao (00:00) Tiezhen, thank you so much for joining us today. I’m really excited to have you on. We’ve been trying to make this happen for a while and I’m just so glad the timing finally worked out. To start, can you tell us a bit about yourself, your journey, where you’re at right now in your career, and how you see the whole ecosystem? And also, just help us understand Hugging Face a little bit as well.

Tiezhen Wang (00:19) Yeah, thanks, Grace, for inviting me. I know, sorry for the long delay. It has been a while, but I’m currently in transition because I just left Hugging Face. So to give you a quick, very high-level overview, you can think of Hugging Face as the GitHub for AI.
If you are not familiar with GitHub, you can think of Hugging Face as Amazon, where you can find all kinds of models in one store. My job is to help researchers get their open source models onto Hugging Face, so they can use all the tools and services on Hugging Face to make their models more discoverable and available to everyone. We also offer all kinds of technologies. For example, we allow them to create demos so that developers do not need to download the whole model and are able to try it out and see how it goes. We also offer services so you can create your own agent using open source models. We do all kinds of scaffolding on top of open source models.

Another part of the work that we do is to help them get more traction. We use LinkedIn, and I use Twitter mostly, to help them become well known to the public. We write analyses of their models, letting people know what the new inventions in the model are, et cetera. We work with researchers across the world. I myself am focused on APAC, especially Chinese researchers. Yeah, that’s pretty much a quick overview of what I do. If you have any questions, just let me know.

Grace Shao (02:03) And how did you get to this role? Because I understand you were with Google for quite a while as well.

Tiezhen Wang (02:07) Yes, I was with Google as an engineer. I worked on ML frameworks. But then we had a bunch of reorgs, and I was assigned to a project which was not open source. But I really like talking to people in the open source world. It’s very different. When you are paid to work on something versus when you want to work on something yourself, you have a very different mentality and very different feelings. So when I was working on the open source machine learning framework, I talked to people outside Google, and I could see the stars in their eyes. They do want to work on something they want.
And even though they may not get paid, et cetera, I really like this feeling. So after I was assigned to the non-open-source project, I wanted to try something new but also in open source. I was talking to people at Hugging Face and I really liked them. At that time, Hugging Face was not part of the mainstream. It was a niche product for researchers, where researchers could upload models. But I did see a huge potential for Hugging Face to grow, because first, I believe in open source, and second, Hugging Face was going to be the entry point where all people would come in and search for open source models. But most important of all, I feel that Hugging Face is a company that understands how open source works. Open source is a huge leverage. If you use it well, it’s going to be very powerful. Hugging Face is about 200 people, a very small company compared to other companies growing up in the same area. But they are able to use open source as leverage, call for collaborations across the world, and do very impactful things. A lot of big companies are doing open source, but they just don’t understand this edge. That’s the essence of open source. And I do feel that Hugging Face is doing really well there. That’s one of the reasons why I wanted to join Hugging Face.

Grace Shao (04:06) Yeah, I think that’s amazing. I think that’s something we will definitely double-click on later, especially when we talk about why China’s labs seem to have been embracing open source. Just one last question on the whole ecosystem and how Hugging Face fits into it. What was the philosophy held by the whole company? Because I actually listened to one of the founders’ interviews, Clem’s interview, recently. And during the interview, he talked about how Chinese scientists have always been long-term contributors to open source technology.
And then he said there was kind of a pivotal moment around 2022 where American open source contributors took a step back, and then there was a shift in sentiment in the ecosystem. Why is that, and how does Hugging Face view the whole ecosystem?

Tiezhen Wang (04:47) Yeah, there are several questions. Let me try to address them one by one. The first one is the philosophy behind Hugging Face. I think it’s really the mindset. Anywhere we see that we can have a collaboration, Hugging Face will just reach out and see if we can collaborate. If you look at a lot of work released by researchers, they will have a paper on arXiv and also their project on GitHub. And you’ll see me on all of these as issue number one, which is the first issue after the repository has been released. We just write something saying, we offer blah, blah, blah, do you want to collaborate on something? So for anything that we can collaborate on, we will just call for collaboration. Some will go through, some will not.

But this collaborative mindset is very, very different from a business point of view. From a business point of view, you will first think: what is my edge, how do I win the market, how do I compete with others, and what are the end areas. After the competition, what’s the end game, how will it go. That’s how you justify the investment and everything. In the open source world, it’s totally different. It’s like, I want to do something, I just say it and I do it, and there are developers who want to join in, and we do it together and we grow the pie gradually.

Let me put it the other way. If you see an open source model coming from one of the Chinese labs, for example when GLM 5.1 is released, you may think that Kimi or MiniMax or other open source model providers in China would compete with them. But actually not. You will see them commenting on Twitter saying, congratulations, et cetera.
This is a collaborative mindset where everyone is stepping up on each other. As a group, we can continue to push the frontier forward. So I think this is very, very different.

Yeah, and talking about your second question, the Chinese, well, I wouldn’t just say labs. Chinese researchers, labs, companies, et cetera, they all want open source. I think there are three different levels.

The first one is on the researcher side. A researcher would always prefer that their work is open source. That comes from their academic background, because in the CS world, when you write a paper, you have to show that it’s actually working. You have to show that all the numbers are real. Other people should be able to verify that. And you can only do that by releasing your code and releasing your models to the community so that other people can evaluate them. So after researchers graduate and go to a company, they bring this mindset forward. By default, they are open source people.

Another perspective is their own career development. As an engineer in a big company, it’s very often the case that you are working on some project and nobody knows that you are working on it until you say so [unclear: “on the game” in transcript] or on your resume. But open source is very different. We know precisely who has contributed to DeepSeek. And that’s very attractive for researchers, because if I have done great work, I want the whole world to know that I’m doing excellent work. This will help me build a better brand, help me do more collaboration, and help me in the next step of my career. So a researcher would always love open source by default. That’s the first part, at the researcher level.

The second one is at the business level. While for an individual it is quite easy to embrace open source, at the manager and executive level they need to justify the investment in open source.
I have to spend tens of millions training a model and you want me to give it away for free? That’s crazy, right? That’s how people thought before DeepSeek. Although we had a lot of open source models before DeepSeek, the trend has completely changed. Before DeepSeek, people were thinking: oh, maybe the model is not that good, maybe I’ll just open source it. But if the model is good enough, maybe I’ll keep it private. That’s one of the reasons why you saw a lot of people saying open source is not that good, especially from Robin. And a lot of people did not understand how open source works.

But then people realized that if they did not open source, they would not even have a chance to stand in the market. First, their model is not really that good. If they just compete at the marketing level, at the business level, they do not stand a chance. So you spend tens of millions and you get nothing. But if you open source, at least you have shared something and people will remember. And you can also win the market among researchers. I think the Qwen team was one of the first teams to understand this at a business level and start open sourcing their work. And the result is very, very good. They have almost taken the ecosystem from Llama, and now they are becoming the default for researchers to do research on, which is huge branding for Alibaba. I guess if Alibaba wants to do any kind of business, it’s quite easy for them to approach researchers saying, we are not nobody, right? We are the provider of Qwen. And everyone wants to talk with them.

Another side for the business is that they find it really hard to attract top talent if they do not do open source, because all these talents want their names on papers, et cetera. They can pay a lot of money, but they still do not get the best talent.
But on the other side, if they do open source and the researchers know that they can come to this group and have their name marked in history, it’s going to be very attractive. So these companies, even when not releasing their best models, try to release something to make researchers happy. It’s kind of like a company perk. So that’s another route.

But after DeepSeek, everything changed. People know that if I do open source, I can have huge branding for my company. DeepSeek is not doing any kind of commercial stuff, like [unclear: “alteration” in transcript] to customers. Yet they still have a huge valuation of, I think the most recent number is [unclear: “14 million HKD” in transcript; confirm figure]. That’s a lot of money. So by doing open source alone, they can make money. And that changed the mindset for a lot of people. After DeepSeek, Kimi, GLM, MiniMax, and StepFun all came into this open source world. And actually, there have been a lot of success stories, like GLM and Kimi. By doing open source, a lot more people know about them. And they have kind of opened up the global market, not just the market in China.

For them, I feel that it’s not like losing a lot of money, because they are doing advertisement in a different way. Kimi was spending tens of millions of RMB per year on advertisement. And the result was very short retention. People got to know them, came to their site, did not feel any difference, and just moved away. Now the research team, the managers, and the executives know that the best score on open source benchmarks is the best advertisement. So they can concentrate all their power, not wasting it on advertisement, but concentrating all their money and resources on training the best model. This by itself is the best marketing, and they can create great models and start earning money. So I feel that at the business level, everything starts to make sense.

But now there is a new challenge, which is how you can stop people from taking a free ride.
It’s a longstanding problem for open source. Say I did something, for example, I made a database. I spent a ton of engineering hours. I open sourced it. But I’m not making any money, because the cloud provider is taking it for free and starting to monetize it. It’s happening in the open source model world as well. I open source the model, and all these inference providers and chipmakers like NVIDIA and AMD are making money, but not the researcher who created the initial model. That’s why you see some licensing changes and discussion on that. Kimi did the first non-commercial license, and then MiniMax made a more restrictive version.

Tiezhen Wang (13:40) But I don’t think that’s the final version. People are still trying different things. And I believe maybe in one or two years, we will have a more standard way of balancing open source and commercialization, et cetera. So that’s the second level.

The third level is the society level. The Chinese government is really encouraging people to do open source. If you do open source, you get extra credits in your bachelor’s education, et cetera. And Shenzhen recently announced a very interesting policy: you can get housing points if you do open source on GitHub. Basically, they are categorizing...

Grace Shao (14:21) So the incentives go straight to the students, even in academia, while they’re still in university.

Tiezhen Wang (14:27) Yeah, so it’s kind of cultivating this open source culture while researchers and developers are still in university, which is really good. I do feel that if the culture of open source is winning over the young students, we are going to see more open source projects. And to be honest, I do feel that’s the right approach. Because if you’re not thinking about open source, you are thinking about the traditional way of collaborating with people, which is the company or corporation.
And I feel that the essence of why we had corporations or companies is not keeping pace with how we evolve now. Think about it: you set up a company in Hong Kong 200 years ago. Why? Because you have a group of people. That’s why it’s called a company. You have a group of people and you want them to work together. And how can you make sure that everyone gets their benefits? Everyone is doing a lot of work, and obviously they want a return. You do that by setting up shares and also a voting system. That’s how a group of people works together. But now the word company has changed. It’s more like a multinational company where the worker in the company has no say in deciding how the company runs. Whereas open source work is more like the original version of a company. You have GitHub, you know who has contributed what. Everyone knows your contribution, and you can have your name listed. A group of people from all around the world can collaborate on something. They do not need to be part of a big company and go through the whole interview process. They can just collaborate. So I think that’s very, very interesting. And now with Zoom, Tencent Meeting, and Google Docs, it’s much easier to collaborate internationally. I don’t need to know who is contributing the PR, but I know someone is interested in my project, and we can work together. And I feel that’s probably the future of how people collaborate. So that’s to end the last point, on the society level. I think society is advocating for open source, and open source is probably also the way society will evolve.

Grace Shao (16:48) Thank you. That is so insight-packed that I have to digest it. You mentioned quite a few different topics, and I could definitely take this conversation in different directions. To start, I have two questions, and they’re actually unrelated. So one at a time.
Number one: you really made a point about China being a strong advocate for open sourcing LLMs. Could you tell us the history of open source in China in general? Was there a tradition of wanting open source technology even pre-LLMs? That’s the first half of the question. The second half is that you say there’s a lot of incentive for researchers to want to open source everything, right? And therefore they can claim their contribution. Well, in the recent interview between Zhang Xiaojun and DeepMind’s Yao Shunyu, I think maybe you’ve also listened to it, one thing that really stood out to me was how he was saying people need to be responsible. As someone who’s not technical, I actually really struggled to understand what he meant at first, until Zhang Xiaojun asked him to clarify as well. His whole point is that in academia, people are so used to only claiming a certain section of what they contribute. So for example, for a big paper or piece of research, you would take credit for what you contributed, right? And you want to make sure that it’s best optimized, known, heard, seen, recognized. However, in terms of how an LLM can work properly in the long run, whether it’s further post-training or further usage and whatnot, it’s important that people don’t claim so much credit for their own part of the work. It’s more important that people work collaboratively, and, kind of to your point on open source, that they make sure each piece works together better instead of each piece working best on its own. So it kind of contradicts your comment on why people want open source, because in that sense, wouldn’t it make sense for people to not want open source? I don’t know. That’s another question.
And the third part of this is really: if open source makes so much sense for tech companies and makes so much sense for academics, then why are the American labs so anti open source right now? What is driving that? Is it purely for commercial reasons or philosophical reasons? This is very big, but you did throw a lot at me, so I’m going to throw these questions back at you.

Tiezhen Wang (19:07) Yes, sorry for my very long answer. I think it’s probably by itself worth writing a blog post, with enough content, and I can elaborate more. But great questions. I guess we can go through them one by one. Can you remind me of them? Yeah.

Grace Shao (19:25) Just in general, open source in China. What’s your sense of that? Beyond LLMs, right? Why have Chinese companies always contributed to open source technology? Clem talked about this in his interview, but he didn’t go into depth about it, right? And number two was just about why these academics want to claim their names, right? Is it better for the company in the end, or is it just best for them, for selfish reasons?

Tiezhen Wang (19:37) Yeah, okay. Let’s try it. Yes. Mm-hmm. Yep.

Grace Shao (19:52) And number three is why American labs are kind of anti open source right now.

Tiezhen Wang (19:56) Yeah, so let’s try to address the first one. I think it’s a great question, and I do see the shift. I feel that AI is probably one of the very few areas where Chinese open source contributors dominate. If you look back, I would say the initial days of modern open source came from Linux or Apache or databases and everything. There you do see a lot of individual contributors from China, but you don’t see enough Chinese companies creating a project and then the project getting adopted globally. You start seeing that gradually when we move to the era of cloud-native, when Kubernetes comes out.
A lot of Chinese cloud providers started to really pay attention to this whole open source world, and you see that grow. But now it grows exponentially. I think there are several reasons. The first one is that Chinese participation in the global market needed time to warm up. For example, a lot of Chinese contributors could only contribute to projects in Chinese because of the language barrier, which limited how much they could actually do. Now, with large language models and better education in the new generation of developers, the language barrier is not that strong. That’s how Chinese open source contributors can make a bigger impact.

Another one is that, in the traditional way a company structures itself, if you do an open source project, it’s kind of hard to justify your credit, because open source by itself is not the core business of the company. There are very, very few companies whose core business was built on open source. PingCAP could be one of them. PingCAP started with open source and then found a monetization plan. But there are so few of them. In the new era, for a lot of companies the core business is open source plus monetization. That holds even for publicly listed companies: MiniMax is one such example. They have their best models open source and then try to make money. So this is very different. If your core business is open source, of course you will put more resources into open source, and it’s more likely for your project to gain a lot of developers.

The third one is that international collaboration has never been easier, apart from the language barrier. After the pandemic, I feel that all of a sudden everyone got used to Zoom and Hangouts and collaborating with someone you don’t see face to face. And this is a great chance for open source projects to ramp up.
Because before that, you had to meet face to face, and the bandwidth and the number of people you could meet were limited. Now, as long as your project is great, you have a huge pool of potential developers.

And the last one is probably AI itself, coding agents themselves. Although it does make code review much harder, because there is probably a lot of AI slop, it really lowers the barrier for who can contribute to an open source project. Before, if you wanted to contribute to a project, it was probably written in a language you do not know, and the code base is pretty large. A developer might not be able to contribute to the project until he has a very thorough understanding, and that’s probably months of work. Now you can just ask AI how this part works, say I only need this feature, and ask what code I need to modify. I can test it locally, and it works. I contributed to some Rust projects without being a Rust expert. So that’s how AI makes everything better.

So I think it has all these reasons. There are probably more, and I think in two or three years someone may start to write a history of how every single aspect, technology, the moment, people’s mindset shift, cultivated the open source spirit. But yeah, those are the top ones coming to my mind.

Grace Shao (24:26) And then the second question was: would researchers focusing on their own name, and frankly ego, actually be the best way to cultivate the best LLM, or whatever end product is to be shipped? Does that make sense? Because it kind of contradicts what Yao Shunyu was saying on Zhang Xiaojun’s podcast, right? He was saying that a lot of researchers will try so hard to only own what they are working on, but they have less of a sense of responsibility for the bigger project.
Tiezhen Wang (25:00) Yeah, I haven’t really listened to the podcast entirely. But to your point, I feel that open source or not, or having researchers’ names listed on papers or not, is actually a game changer. If you are a researcher, you might want to stay in academia. Why? Because all the papers you publish are very important to your career. Everyone sees which papers you have published and with whom, and the paper will also mention which part of the work you contributed. Are you the corresponding author? Are you the main person, or are you just contributing a part of it? And there’s the h-index to measure the impact of a researcher. So you have all that infrastructure working for you if you stay in academia. But if all of a sudden you want to start working for a company, you lose all of that, because the company might say, we have a policy where all your open source work and all your paper publications, even writing a blog, are controlled by the PR team, and they decide whether you can do certain things. So your exposure is reduced. You might be very happy because you earn much more, like 10x the salary compared to being an assistant professor. But after five years, if you ever want to go back to academia, that’s impossible, because you have lost all of your track record.

With open source, things are very different. The top AI labs want to hire the best talent from academia, and the people from academia want to work for the labs. But at the end of the day, the two systems are the same system, because all the records are public. So it’s very easy for people to come in and come out. That’s different from people coming from academia and then losing their track record and getting lost in the company world. So I do feel that having your name listed on the work you publish is very important. It is kind of the concept of [Chinese phrase unclear]. Your branding grows with whatever you have done.
So your reputation is based on whatever you contribute, and you need to pay extra attention to that.

Grace Shao (27:16) I see what you mean. And the last bit of what we were talking about just now was: if you believe open source makes so much sense to these researchers, why is it that so many researchers in the US, or at least some of the companies, the entities, are not willing to go open source right now?

Tiezhen Wang (27:31) Yeah, there are a lot of companies who changed their position. Google used to be the company which embraced open source the most. Google open sourced TensorFlow, Kubernetes, and a bunch of other important open source projects. They open sourced the Transformer, which is the cornerstone of our modern AI systems. So Google was really embracing open source. But I feel that at some point in time, Google stopped doing all the open source work because they kind of disliked OpenAI and other companies taking a free ride. Google did a lot of fundamental work, and then it was taken by other companies for free. As for OpenAI and Anthropic, I feel that they are still contributing to open source, but that’s not their main project. Their main projects are hidden secrets that they do not want to share, so that other people cannot catch up.

And research and business are getting so coupled. For example, say a researcher at OpenAI found a way to improve the intelligence level by 10%. Let’s take o1, for example. They managed to find out how to make the model think. And with this chain-of-thought and thinking process, the model is way, way smarter. But they do not want to share the gist. As soon as they do, other people will catch up. So it’s a very restricted environment. Researchers may still want to open source some of their work, have their name listed there, and share very detailed observations. But the business just doesn’t allow them to, because you are trying to... Yeah.
Also, researchers can talk at conferences. I know how o1 was roughly made from watching a bunch of YouTube videos on the internet made by OpenAI researchers. But after that, you see less and less very detailed research sharing, even in videos or recordings by them. I guess they kind of learned the lesson. But yeah, that could be part of the reason. In the open source world, it’s different. Before DeepSeek R1 was released, a lot of people were speculating about how OpenAI was doing o1, and they were trying things in different ways. After DeepSeek R1 was released with all the recipes and all the data they shared, the open source world seems to have converged on one path. Although that path might

Grace Shao (29:46) There can’t be compliance reasons.

Tiezhen Wang (30:12) be different from o1, because we never know how o1 was made. But then because of DeepSeek’s contribution and sharing, everyone knows how to make the thinking chain. And the whole ecosystem is evolving really, really fast. That’s one of the real values of open source, because everyone can just collaborate. No one is holding secrets. Well, there are still a lot of secrets about how you can run as efficiently as DeepSeek, but that’s too technical, that’s not so much on the research side. Yeah, like I...

Grace Shao (30:44) So on that note, yeah, I want to follow up on that. I recently wrote about something, which is that speaking to researchers gave me a sense that DeepSeek in a way is now becoming essentially a foundation for everyone, because a lot of the labs in China are looking to DeepSeek to see if there’s any engineering breakthrough, to your point, and they build on top of each other. Help us understand each of the labs, because you said they’re cost constrained, they’re compute constrained.

Tiezhen Wang (31:06) Thank you.

Grace Shao (31:12) They’re talent constrained, right?
Their resources are constrained on every single asset you can think of compared to their American peers. Now, why does it make sense that they all open source, and how are they all optimizing for their own goals at this Tiezhen Wang (31:24) Yeah. So open source by itself, as we just talked about, is an accelerator of the whole ecosystem. DeepSeek shared all their knowledge and discoveries: what works and what doesn’t work. This by itself is accelerating the whole industry, not just Chinese open source, but also US open source and US closed source. They just don’t say how much they learn from DeepSeek, but I believe everyone is learning from DeepSeek. Not just that, DeepSeek also contributed GRPO, which has become the most used reinforcement learning algorithm in the industry. So they made a lot of contributions. If you check recent model architecture evolution, what’s proposed by DeepSeek is becoming the standard and getting adopted by many people. For example, Kimi 2.5 was using a model architecture very similar to DeepSeek’s. And GLM 5.1 adopted a lot of components from the DeepSeek architecture as well. So this kind of sharing, learning, and co-evolution is, I would say, one of the secrets of how China is able to catch up with the US in certain areas, despite having restricted compute and restricted capital. If US open source were working again, with everyone in the whole ecosystem trying to open source, I would say the human race would be evolving much faster than it is now. Grace Shao (33:01) So on that, how do we understand the accusations about what is being distilled? What is technically shared? How do I understand the gray area there? Like the accusations from a lot of American labs toward Chinese labs right now, when, like you just said, a lot of American labs are learning from Chinese labs.
Frankly, within the researcher community, it’s not even Chinese versus US; it’s really just labs working with each other, and against each other if they have to, right? Intellectually competing. So then how do we understand what the industry agreement is on distillation, and why is it so contentious right Tiezhen Wang (33:32) On distillation, yeah, that’s a great question. I can only give you my perspective. First, distillation is a very broad word. We are distilling from each other as well. I learn from you, you learn from me, and we are all learning from books and papers and all this public information. So I would say distillation is a very common practice; it’s basically how you learn from others. You might have a model that summarizes books and does a bunch of explorations. And the way for the model itself to move forward and evolve is to distill from its historical data and historical experiments. That works for another model trying to learn from your model as well. And in the research field, distillation is very common. DeepSeek R1 was released under the MIT license. I actually asked the team about it. They chose the MIT license because they wanted their model to be distilled by others. That was the only model that worked really well with the thinking chain, and they wanted all the open source models to be able to have that. They had shared all the recipes, but others did not have the data. So DeepSeek designed their small models, and also the recipes, so that other people could easily distill DeepSeek, get the thinking chain, and use that in their own models. So this distillation is happening everywhere. And I think US companies are distilling from each other as well. I’ve seen the recent public discussion on Twitter where Elon Musk and Sam Altman were kind of battling over that. Yeah.
And think about it the other way. If you do not allow a model to be distilled, I mean, if you do not allow the output of a model to be used to train a competitor’s model, it’s a very interesting point. Say I’m reading a book and telling you the story. Think of me as a model: you’re reading my summary, and you are not allowed to share the summary with others. You have to read the original book, not my summary, because of the license, et cetera. That’s kind of ridiculous. That’s not how humans have transferred knowledge over the past few thousand years. I have a very bold argument. I think that anything generated by AI should not be copyrightable. It should be in the public domain, because anything generated by AI is a distillation of the entire human history and everything that humans have created. If you just take that for free and ask other people not to use it, it’s kind of a waste, and it’s kind of blocking people from moving forward. Human content does not have this restriction, so why are you putting this restriction on something not copyrightable and generated by a machine? That’s something I do not really understand. I do see there are terms and conditions saying that my model output cannot be used to improve other models. But I don’t think that’s valid. I’m not sure if someone will eventually file something in court and we can have a case on that. Currently, I think there are a lot of things to discuss, but it’s not about whether we can distill a model or not; it’s about something bigger. Should the model creator even have the right to restrict others from distilling from their models? Grace Shao (37:18) That’s really interesting. I think that a lot of the discussion in the public space is really about whether you can use copyrighted work of human output.
And then the argument is always, you just cannot distill because the company said there’s no distillation allowed. But to your point, there is no clear black-and-white rule or regulation around this right now. In fact, it’s a bit murky. Yeah, that’s interesting. Tiezhen Wang (37:37) I’m not a lawyer, but I can’t find a clear answer on that. Grace Shao (37:43) Okay, I want to go to China. We’ve talked a bit about the big picture. Well, a lot about the big picture. But let’s look at just the China labs. I mean, I know that you represented APAC back then with Hugging Face, you worked around APAC, and you lived in Australia. But for the sake of this, you know, Chinese labs right now are probably the most relevant out of APAC. Do you think I’m missing anything in the APAC conversation? Do you think anyone else in the region is relevant in this space that we can talk about? Tiezhen Wang (38:08) Korea is doing really, really well. One of the best models is probably from Upstage. Initially they created a Korean model leaderboard, an open source version of a model leaderboard. Well, no, no, the leaderboard was not funded by the government. The, the, the, Grace Shao (38:10) Yeah, yeah, give us some picture on that. That’s funded by their government, right? That’s government funded. Tiezhen Wang (38:26) So Korea is actually a very impactful country, but in the earlier days, there wasn’t enough Korean data. Even for ChatGPT, I think until ChatGPT 4, the model didn’t speak good Korean. The model was able to speak very good Chinese from day one, from ChatGPT 3.5, but because of the data volume, et cetera, speaking Korean was always a challenge until ChatGPT 4. By now, open source models are able to speak Korean. So Upstage created a leaderboard. The way they solve problems is very interesting. They’re not solving the problem by solving the problem.
They’re solving the problem by helping others to solve the problem. So instead of creating a model right away, they create. Grace Shao (39:09) I heard about Upstage from a VC in Korea as well, but I don’t know the details. Tell us more about who they are and what they’re doing. Tiezhen Wang (39:15) Well, I don’t know too much about who they are; I only see their open source contributions. I think the founder is a professor in Guangzhou, but he’s Korean and moved to US. Correct me if I’m wrong. I’m sorry, I’m not really up to date with that information. But I just want to call them out because I think that’s a very interesting paradigm. For example, if you are a company, you have your own problem you want to solve. Tiezhen Wang (39:42) How do you want to solve it? You are going to hire some people, define a problem, and try to use your own people to solve it, right? That’s the old way. What’s the open source way? You publicly define the problem. You have a leaderboard. You might do a private eval or a public eval; it all depends on you. The point is you have to list your problem publicly, and you have to tell everyone that they can contribute to this problem by submitting a model to a URL, and we will do the evaluation and see how each model is evolving in this area. So they basically have a leaderboard. And a lot of researchers will be very interested, because now they have a problem to solve; before, they did not even know what the problem with Korean was. So now they have a problem to solve, and you will see that the curve goes up because there are more and more researchers coming in and all their work is open sourced. So when a new researcher wants to jump into the field, they will first have a look at the leaderboard to see how far the models are from a really usable benchmark. And then they can investigate all the previous attempts and find their own way of just changing something a tiny bit
and apply that to previous people’s work and submit to the leaderboard. And now we are seeing people making progress on the leaderboard. So that’s a very, very clever way, because it’s not one company solving the problem. It’s like we are opening the door for everyone to come into this playground and try to solve the problem together. I think within a few months, they were able to get thousands of submissions, which is really massive, because just imagine you hire 10 people; you won’t get that. And now, with this new way of doing things, building in public and evolving in public, you’re getting a lot more submissions and you are educating people, et cetera. So they have this very impactful and inspiring leaderboard, and then they released a model called Upstage something. I can’t remember; it has been a while. And the Korean datasets and the Korean models are growing very fast on Hugging Face. I think Korean is now the fourth largest in terms of models. Yeah. Grace Shao (42:04) Very interesting. Yeah, I’m going to shamelessly self-plug. People can listen to the episode I recorded with one of the leading Korean VCs as well, which was published last week. He gave a good breakdown of the AI ecosystem. Tiezhen Wang (42:12) Okay. Yeah, could you help me do some DD first and just make sure those details are correct? Yeah, you can. Grace Shao (42:22) Yeah. No, no, no, he did talk about Upstage as well. It’s very interesting. Yeah, I want to... sorry, go on. Tiezhen Wang (42:29) Yeah. And also, you asked about APAC. In Singapore, there are a lot of great researchers; a lot of Chinese researchers will go to Singapore as well, like Cancun too. Yeah. Grace Shao (42:43) Yeah, I think the ecosystem is a bit overlooked by Western markets, but there’s definitely a lot happening around Asia. APAC includes Australia as well as Southeast Asia, East Asia, and Northeast Asia. Okay, I want to bring it back to China.
We’ve been talking about China more in the high-level sense. Now, looking at the companies or the labs themselves, we want to break it down. Just give us a sense: how do we understand Moonshot, MiniMax, DeepSeek, and Zhipu, if you have to put them in one bracket, versus the hyperscalers, Tencent, Alibaba, and ByteDance, in terms of their strategy and their capabilities? How should we understand this ecosystem right now? Are there other relevant players that you think I’ve missed, maybe like Xiaomi or anyone else? Tiezhen Wang (43:25) You mean how the model creators, the model labs, are collaborating with the hyperscalers? Is that your question? Grace Shao (43:32) No, no, I just think of the people who are creating LLMs and researching how to deploy LLMs. These are the main players, right? Now, how do they differ? How are they similar? What are we seeing on the ground? Are some of them becoming more irrelevant? Are some of them becoming, say, we just talked about DeepSeek becoming almost an infrastructure provider for the whole ecosystem. Tiezhen Wang (43:39) Yep. Grace Shao (43:58) You know, MiniMax is very focused on multimodality. Zhipu is very focused on coding capabilities. You know, Alibaba is really trying to push commercialization through its existing applications. How successful that is, that’s a different question. Just give us an overview of these players. Tiezhen Wang (44:14) Yeah, I do think they’re kind of converging, because everyone knows that the whole market for coding is booming. If you have a good coding model, you can sell it for a large profit. And I do feel that everyone is rushing for coding. There are people exploring different things. For example, Tencent is putting a lot of effort into Hunyuan and doing OCR stuff. And a lot of other companies are doing video generation.
But at the end of the day, I think at the strategy level, I don’t feel that there is a lot of difference. It’s more a question of: do you have data? For example, it makes a lot of sense for ByteDance and Kuaishou to work on video generation models because they have a ton of data. And also, do you have a large enough scale? For example, Kimi is not very active in making apps. Tencent is making models, and they are also making great apps. For example, Yuanbao, a QA app based on all the Tencent data, is very popular. They also made QClaw. Tencent is able to do that because Tencent has a huge talent pool; Tencent is a huge company. Whereas if you look at Kimi, Kimi is very conservative in doing all that because Kimi is still a very small company. So I think from a very high level, everyone is on the same page about the strategy. It’s more about how much resource you have. What are your advantages? Do you have data? Do you have a distribution channel? Do you have product design, success stories, et cetera? So yeah, I’m not sure if I answered your question. Grace Shao (45:57) No, no, that’s good. So we talked about why researchers want to open source. We talked about how these companies are somewhat doing the same thing. So this leads me to the question. We know that open source or open weight does not actually mean they don’t make money. However, it obviously means that it’s harder to commercialize, as we alluded to with the US labs and why they made those decisions. Then how do these companies find ways to monetize and sustain their businesses? Tiezhen Wang (46:21) Well, in the US, there are also labs dedicated to making open source models that still make money from donations or from other parts, like selling apps, et cetera. It’s basically the same in China, too. For example, DeepSeek is run by, I would say, donations from the people who play the stock market.
There are labs run by VCs, and a lot of labs are already profitable by selling tokens. GLM has recently raised its token price because they see huge demand and they’re running short on compute. So yeah, open source can make money. There are a ton of ways for an open source model provider to make money. I have a lot of ideas. If you are interested in making yours profitable, you can contact me. But honestly, there are a lot of ways. The simplest way is to sell tokens. If you have the best model, you can sell tokens for profit and people will actually buy your tokens. So it’s very interesting, because when we put science and technology together, we always consider them the same thing. Grace Shao (47:15) Yes, everybody find Tiezhen Wang. Tiezhen Wang (47:37) For models, it’s the same. When we think about models, we just think of a model that generates tokens, et cetera. But actually, there are two different parts. The first one is training, where you produce the model, and after training you open source the weights you trained. The other part is inference, where you need to run a lot of optimized CUDA kernels in order to make your tokens cheap and fast. Either bracket can make a lot of money. For example, you can open source the fine-tuned model, not the base model. If a company wants to fine-tune an open source model on their own data, they cannot build on a fine-tuned model. They have to find the base model. And if the base model is not open sourced, you can sell that for profit. Also, different clients might have different requirements for the model. The NeoLab can collaborate with the client directly and provide some kind of training and post-training support. That’s a way of making a lot of money, actually, because training is very expensive. It involves very expensive researchers, data, and compute. On the inference side, too. Grace Shao (48:45) Yeah.
Tiezhen Wang (48:52) Because inference is tightly coupled with the data center you own, there’s no guarantee that your optimization strategy will work on a different cluster. So a lot of people just do not open source the inference recipe because it’s not that useful. And also it’s kind of a moat. The model provider who creates the model knows best how to optimize it when the model is released, because they have seen the model for four months and have done a lot of optimization on the model inference. When the model is out, everyone else is just starting to get to know the model and do some optimizations. So of course the model provider will sell tokens in a much more efficient way compared to all other competitors. Three months later, when the outside inference providers get to know all the secrets and write very optimized kernels, there’s a new model coming out. So the model maker, the people who know the model from day zero, always have an advantage in selling the tokens. That’s one of the very important ways they can make money. Grace Shao (49:59) I see what you mean. Mm-hmm. Yeah. And does DeepSeek V4 coming out have an impact on how the GLMs of the world or Kimi make money? Essentially on their strategy, given, as you just said, that they just raised prices on their tokens. Tiezhen Wang (50:16) Yeah, so GLM and Kimi don’t sell DeepSeek or Qwen, so they are not competing with each other directly. I would say the capabilities are on par with each other, so it’s more like a user test: which one is better? There is no clear winner among the three models. We did see Zhipu’s stock price go down because people were so worried about DeepSeek. But then they realized that Zhipu’s token sales were not really impacted, so the stock price bounced back. But at the end of the day, I would say it’s actually a good thing for them.
So GLM 5.1 adopts a lot of the core design of, I think, the DeepSeek V3.2 model architecture. And they were able to cut down the cost by adopting all this exploration from DeepSeek. And now V4 Pro is out. I don’t know the details, but a very simple guess is that Zhipu will be able to cut down the cost because they can adopt new things from the DeepSeek architecture. So on one side, because demand is so high, Zhipu can increase the token price, and on the other they can learn from DeepSeek and cut down the cost. So Zhipu is going, yeah, exactly, exactly. Grace Shao (51:34) You have a higher margin. Yeah, this is something I think they’ve talked about as well, really being able to learn from the engineering breakthroughs that DeepSeek puts out every time. Okay, I’m mindful of time. I just want to ask a few questions on the future outlook. You posted on X recently saying that you’ve been thinking a lot about how we make AI bootstrap itself. And you’re going through this transition yourself; you’re thinking about the future of AI. What does it mean for the open source future as well? Tell us a bit about where you stand right now and how you think of this bigger picture. Tiezhen Wang (52:06) Yeah, I’m still doing some exploration on my side. I think this whole AI bootstrapping logic has already been implemented by a lot of big labs internally. The idea is very simple. In the compiler world, you can design a programming language and write its compiler in a well-known language like C. So you first implement this language using C code. In the next iteration, or after a few iterations, you are able to implement this language using the language itself. That’s called bootstrapping. You are basically evolving on your own. You are not relying on something that is not from your language. Or, putting it another way, look at how normal living creatures replicate themselves and how they evolve.
I don’t need to have a screwdriver somewhere to engineer my kid, right? My kid’s just born, all by itself. But how far are we from AI doing similar things? Now we have very powerful coding agents. We have AI training pipeline recipes that are kind of stabilized, at least for small models. So are we really far from AI being able to take one of my ideas, where I give him the direction, and he’s able to first bootstrap a very simple version and gradually evolve towards that goal? I think it’s highly possible. So at the end of the day, we might be able to just tell him what I’m going to do without giving him all the harness and all the detailed guidance. I’m not talking to him 100 times, and he’s able to first lay out what he needs to do and have a plan, and then probably design a DSL or an agent all by himself. And probably he will create model weights to help him adapt to this goal. And then he can just keep evolving. All I need to do is give him more fuel, which is compute, and he’s able to do some evolution all by himself. If you have recently read Andrej Karpathy’s Twitter, there’s a concept called auto-research. But auto-research is just evolving the model weights. It’s not evolving the agent and harness. I think at the agent level and harness level, there are also a lot of things to do. I’m quite new on this journey. What I have been able to do is bootstrap a very simple agent, and I can use that agent to optimize the agent. But I think eventually we will get the weights involved too. When the model realizes, okay, I don’t just need an agent, I can create a bunch of data and improve my weights, he’s able to evolve from that too. Grace Shao (55:05) So in the future, how important is the capability of the models versus the harness and the industry expertise? Because right now, so much of the conversation is still about how, you know, the models are very strong.
We are seeing what you’re saying already, the agents starting to build out things automatically. But we still need the taste. We still need the industry expertise to guide them. I find it hard to imagine that, you know, you can plug in something and just say, I want this to be done, and the agent just starts doing it exactly to your taste and your imagination. Do you really think that’s happening? Tiezhen Wang (55:34) Yeah, I do feel that it’s happening. We are using agents, especially coding agents, to code something that we are completely unfamiliar with. And I’m quite confident that it will actually work. The reason is that I have defined a set of goals, and as long as I see that it is moving in that direction, I’m good. I do not need to understand the code line by line. It’s just a box. But the difference is I’m using the coding agent Grace Shao (55:58) Mm-hmm. Tiezhen Wang (56:02) to do something else. What I can do is use the coding agent to improve the coding agent itself, and use the coding agent to generate the data and train the model that the coding agent is using. I would call that a bootstrap, not just using it. I think I’m already quite happy with using a coding agent to do something else. But just like, yeah, yeah. Grace Shao (56:23) Interesting. I want to end on a more philosophical note. So how do you view the argument that AI is going to replace humans? Or do you think AI is going to be in the role of supporting humans if we keep going down this path? Tiezhen Wang (56:35) Well, it’s actually a very, very interesting question. And I feel that people do have different feelings. But from a pure technology point of view, I do feel that it’s one condition of, like, so technology is not the only thing that will decide everything. You asked whether it’s going to help humans. Well, it’s not really for technology by itself to decide.
It can be used in different ways, in different social structures, in different traditions and such. It’s like giving you a gun, and you can do things for good. Yeah, it’s not. Yeah, yeah. But just imagine that you’re going back in history with all your knowledge of modern society. Are you going to help the Grace Shao (57:12) It’s not a good analogy. But yeah. Tiezhen Wang (57:26) society, the history you go back to? Or are you able to help? I think it’s basically the same. If you have AI that knows everything, you can just think of it as a human being from 2000 years in the future. And now you have it. And how is it going to help society? Like it really... Grace Shao (57:31) Yeah. It’s like that saying, your own capability of using it is the cap of itself. Also, I think there’s a lot of argument and discussion around the fact that even society as we know it today, the knowledge work that we all have and have normalized, was not even created until the last 100 years. And if AI is to disrupt that and replace humans in that sense, why is it so bad? Because it frees us to do the other things that we, as multifaceted human beings, can do. Is that part of the argument as well, where, whether it replaces us or helps us, it’s only relieving us of some of the things that, if you take a step back, we don’t want to do, right? Where we can maybe go touch grass. I don’t know, maybe this is a very optimistic view of it, but there have been people saying that the cap on AI’s capability is the cap of your own intellect, your own ability to navigate or use AI. So the more you can use AI, the more it can help you. The less you can use it, actually, the more it will replace you. Tiezhen Wang (58:45) Well, I think it’s very interesting to define what “you” is. Are you defining “you” as everyone, or are you defining “you” as people who have compute? Well, no, it’s not.
It’s actually a very, very, very interesting question happening right now. You know, Anthropic’s coding agent is able to do a lot of things. People are kind of imagining that we are Grace Shao (58:53) We’re getting really philosophical now. Tiezhen Wang (59:09) Everyone is getting a lot more powerful with models. But what if one day Anthropic just says, you cannot use your coding agent to do certain things? It already happened. Anthropic said you cannot use your agent to do automated tasks. And there could be other limitations. I have a very bold argument, which is that the reason why we are able to use AI so cheaply, even us, who do not own a data center, right? Even we can use it. It’s because our data is still valuable. You know, if you use subscriptions, your data is going to be distilled by Anthropic to further improve the model. And they are able to give us a discount because they still need our data. Grace Shao (59:52) So your point is that one day, when they capture enough data, they will not even give us this kind of access for free or for a cheap price. Tiezhen Wang (59:59) It depends on how they define “you.” You ask them, like, you or something. How they define their user. How they define who could be part of the game. Like, if one day, like... Grace Shao (1:00:09) So then the question is, no, but then my question is, there’s another argument where they’re saying too much power is in the hands of a few companies right now, right? Or a few founders, whatnot. We need open source, that’s your point, right? No, that’s really interesting. And that was actually going to be the last question I was going to ask you: what is one differentiated view you hold? And I think you’ve already answered that in that sense, right? Yeah, I think it’s for us to really think about it. But then as the average user, my question is, how do you actually boycott Tiezhen Wang (1:00:18) That’s why we need open source.
Grace Shao (1:00:36) these companies, or if not boycotting, how do you actually make an impact? Because if I’m not the developer creating an open source model for the average person to use, what do I do as an average user? Tiezhen Wang (1:00:47) Well, just use the model to do the thing you want to do. Try to embrace the model and be more patient with open source models, because obviously open source models are not as good as top-tier closed source models. You kind of, well, I mean, with open source models, you keep all your secrets to yourself. So you can have better security and better control. An open source model will never betray you if you just run it on your local laptop. So although the model is not performing as well, because he’s not distilling you, right? You can still trust your open source models and give them more tasks to do. Grace Shao (1:01:20) You host yourself. Tiezhen Wang (1:01:34) I do feel that a lot of open source models are actually capable of doing things. But the expectation might be, think of it as six months, like cloud version of, sorry. Let me put it another way. Think of them as older closed source models and be patient with them. And you can grow up together with the open source models. Grace Shao (1:01:54) That’s very interesting. Thank you so much for your time, Tiezhen Wang. Tiezhen Wang (1:01:56) And thank you, Grace. AI Proem is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber. Get full access to AI Proem at aiproem.substack.com/subscribe [https://aiproem.substack.com/subscribe?utm_medium=podcast&utm_campaign=CTA_4]