Bilawal Sidhu Podcast

How Canva Built a $32B Design Empire (and Why Adobe Should Be Worried)

Canva quietly built a $32B empire, and now it's redefining how the world creates with AI. In this deep dive, I interview Canva Co-founder & CPO Cameron Adams about how the company went from a web-based underdog to the OS for visual content, used by over 240M people every month. You can also watch it on YouTube [https://youtu.be/u4pOjXth7EU], Spotify [https://open.spotify.com/episode/2LUmivtPNOAtMnhYNIALHe?si=f5659caba3b1478b], or X/Twitter [https://x.com/bilawalsidhu/status/1952426099297644917].

Some topics we explore:
1. Why Canva is used by over 240M people every month.
2. The 3-tier Canva AI strategy: deep tech, partnerships, and marketplace.
3. How they're bringing AI video and code generation to their platform.
4. How batch creation and real-time analytics enable mass personalization.
5. Why Canva believes the future of design isn't about features, but outcomes.

Whether you're a founder, designer, or product nerd, this is a masterclass in scaling creativity at global scale. Hope you enjoy!

Transcript formatted for reading:

Bilawal Sidhu: What happens when you take professional design tools out of the hands of the 1% and give them to everyone? You get Canva, a company that's quietly amassed 240 million monthly active users, powering everything from TikToks to Fortune 500 pitch decks. Now, imagine having a graphic designer, video editor, and creative director at your fingertips, all powered by AI. So, how did Canva pull this off, and where are they going next? Today, I'm sitting down with Cameron Adams, Canva's co-founder and chief product officer, to dive deep into how they're building the everything app for visual communication, what the future of human-AI collaboration looks like, and why AI might just create more creative jobs, not fewer. Let's get into it.

Cameron Adams: The vision for Canva has always been about reducing this job that could take months down to minutes, and AI is letting us take it from minutes to seconds.
You had tools that 1% of the world could possibly use. So we set out to create this product that enabled the other 99% of the world to access design.

Bilawal Sidhu: More than 230 million people design with Canva every month. 373 designs created every single second. 240 million monthly active users. That is a flabbergasting number if you think about it. Now it's just table stakes for making a pitch deck, like, that's the tool you use.

Cameron Adams: So that quickly takes one design that you're creating and multiplies it by 500 and then lets you do that in the space of minutes.

Bilawal Sidhu: Do you expect it to change how people build interactive experiences?

Cameron Adams: It is literally months to create some of these things that you can now generate through Canva Code in minutes. Companies like Adobe very much approach it from the opposite direction. So when they try and bring something like a Canva to life, they just don't have the same DNA.

Bilawal Sidhu: Cameron, thank you so much for coming on the show.

Cameron Adams: Such a pleasure to be here, Bilawal.

Bilawal Sidhu: Congratulations, hot off the presses: Canva is number five on the CNBC Disruptor 50 list. Sounds like an incredible validation of the journey you've been on.

Cameron Adams: That is a pretty good reflection of where we've been and also where we're going. Still disrupting.

Bilawal Sidhu: Well, speaking of the journey, let's go back to the genesis story. How did Canva go from being a niche design tool to one of the largest SaaS companies in the world, almost reframing the entire market for design and creation tools?

Cameron Adams: We started out in 2012. Design for everyone was a category that did not exist. We all realized the immense power of design to enable people to achieve their goals, but also that it was highly inaccessible to most of the world.
You had tools that 1% of the world could possibly use: tools they could afford, that they could understand, and that they could actually get anything good out of. So we set out to create this product that enabled the other 99% of the world to access design. We started small. We mostly focused on social media in the very early days, but in the intervening decade, we've expanded to presentations; you can create T-shirts, videos, websites; pretty much any visual content can be made on Canva. And it has truly inspired people. We've now got 240 million people that use the product every month, and they use it for an incredible variety of things. It's really fascinating to dive into the design story of every person that uses Canva, because they're using Canva to impact their own lives, whether that's putting a poster up in the window of their store, creating a pitch deck for their first business, or helping their nonprofit get more donations. Every design that's created on Canva has this amazing story behind it.

Bilawal Sidhu: I love that. Yeah, and 240 million monthly active users. That is a flabbergasting number if you think about it, and it goes to show that everyone can be creative if they have the right set of tools. But I have to go back to the start and ask you: one of the unique things about Canva was that it's completely web-based. I still remember back then, of course, there were the Adobe incumbent tools like Photoshop and whatnot. Sketch was super popular; native apps were super hot back then. Y'all went web first. Did it feel risky back then, betting on emerging web standards to build a freaking design editor in the browser?

Cameron Adams: I love that you captured the exact zeitgeist of tech at the time. It didn't feel risky to us. It felt like a really interesting thing to explore, because one of the things about the existing tools was the barriers to access.
Downloading a big piece of software and learning how to use it over six months was just not something that people were going to tackle in order to get a very small job done. So giving them easy access to the tool through their web browser and letting them get up to speed in seconds, not months, was a really critical part of helping Canva grow in the early days. We think that's the only way that you can truly grow a product and scale it. When you're talking about 240 million people, you need to have all these inroads through which they can reach the product, and the only way we could think of doing that was through a web-based product.

Bilawal Sidhu: Yeah, it's funny. My dad uses Canva for his Substack and Twitter account now, and it's kind of wild. The time to first creation is just so low; it feels so accessible that you can create something, versus, like you're saying, grappling with convoluted tutorials and following along just to get to something that feels like the equivalent of a hello world example.

Cameron Adams: I love that your dad has a Substack.

Bilawal Sidhu: It's wild, yeah. For a while he had more followers than me on Twitter. He's got spicy hot takes on there. So I am curious about the web stuff. Did some of your work at Google give you any conviction that web was the way to go?

Cameron Adams: Not specifically my work at Google. I'd been working on the web for probably a decade and a half before I got to Canva. My work at Google touched upon that, and I did a lot of work there prototyping what could actually be done in a web browser. I'd written a few books on JavaScript before that. I'd been in the web standards scene, as you mentioned, deeply in the tech, thinking about the types of technology we needed in a browser to actually create apps and really interactive websites.
I'd been pushing the boundaries of that for a while, and Canva was just kind of the next step beyond that. It was very much a nexus point for me: my experience, what was possible, and the idea of Canva itself all coming together into this neat little package where we could be on the cutting edge of what browsers could do. We were still testing out how many images you could have in a design, how to manipulate text, and how to have all these moving parts rendered on what was essentially a web page at the time. Web apps were still something that was forming. We were just on the cutting edge and at the right time to bring that into the browser and make it accessible to people. Together with our product philosophy of democratizing design, it was the perfect storm of technology innovation, bringing product to market, and having the right customer base that needed this product at just the right time.

Bilawal Sidhu: Now, speaking of democratizing design, I'm curious how Canva users have changed over the years. When it started out, it was bloggers, marketers, and small businesses that wouldn't traditionally have had access to, let's say, a staff designer. But eventually professionals and large enterprises came to the party, and along with them immense growth. Talk me through the evolution of Canva's user base.

Cameron Adams: Canva's user base now is just so broad. It basically encompasses the entire population, and I think that has mirrored the evolution in visual content. In 2012, Instagram was still getting off the ground; it was mildly popular, Pinterest was a thing, but people were mostly consuming visual content, not creating it. In the intervening 13 years, we've seen a lot more participation in the content ecosystem. Now everyone posts photos, everyone posts designs, everyone posts videos, people create websites.
You've got people spinning up businesses overnight. It's become a lot easier to get stuff done and to do it through the medium of visual content. Small to medium businesses started coming on because they saw Canva as a really great cost-saving tool, but also a great tool to start scaling their marketing. From there we introduced presentations, which opened us up to an entirely different audience that wasn't just creating social media graphics, but also creating presentations that got shared with other people and consumed in different ways. We opened up to print products, so people can get business cards, T-shirts, and tote bags printed on Canva, and that opened us up to another audience. We've added videos and websites. We have long-form text documents. We now have Canva Sheets to bring data into the equation. Each time, we're building out these shells that push the audience out even further and basically make our total addressable market the entire planet. A few years ago, we added real-time collaboration into Canva, and that was a game changer. We added it at just the right time: it was pre-COVID. So as COVID kicked off, people were looking for new ways to engage both internally with their teams and externally with their customers, and they needed new tools to really communicate. Visual communication became even more important. So we saw things like video presentations, talking heads, and sharing your presentation decks; more visual ways of getting across the message you're trying to explain became super popular. Our charts for presentations went from about 20 million a month to 50 million a month in the space of three months as COVID started, and we started seeing that mirrored across all of our design types, such as videos and websites.

Bilawal Sidhu: Yeah, I love that.
I mean, it's a creativity suite with multiplayer baked in at a very core level, and I think that makes it so useful. There are so many 3D, VFX, and motion graphics creatives I know who maybe previously would have scoffed at using a tool like Canva. Now it's just table stakes for making a pitch deck. That's the tool you use. Why go into Keynote and have to deal with a bunch of version updates and collaboration changes? It just becomes easier.

Cameron Adams: Yeah, we've seen professional designers really embrace it. We see it as two sides of the same coin. On one side, you create really highly crafted elements: you might create a full brand kit, logo, etc. in professional software like Affinity. Then we see Canva as the other side of the coin, where you scale your design work across your team. We constantly see brand teams and design teams bring Canva into their organizations because they want to scale the work they're doing across all the other teams in the organization. When they create an amazingly well-crafted brand or amazing marketing and want to get that out to the marketing team, the sales team, the internal HR comms team, and everyone else who has to create content and interact with the brand, they go to Canva, because it's the easiest tool for the rest of the team to use and it's the one that enables the brand team to scale the quality that they first envisaged.

Bilawal Sidhu: Absolutely. It makes a ton of sense. I think Figma is having a similar reaction: it started off as a designer-first tool, but then other stakeholders touching the product also got into using tools like FigJam and other collaboration features. But it's interesting comparing y'all to your incumbents, or competitors, if you want to call them that.
Most of what I'm seeing in the market is what I'd describe as the unbundling of the creative suites of Adobe and Autodesk. Clearly, y'all are going in the opposite direction. As you mentioned, you have presentations, photo editing, websites, video, spreadsheets, and now even vibe coding. How do you think about Canva as a product today, and where is it headed?

Cameron Adams: We always see that there's this ever-expanding pie for us. The need for visual content spans all industries, all specialties, and really all goals. Fundamentally, we see Canva as a goal achievement machine. We're helping people reach an end goal that lies beyond the app. When you do a presentation, you're actually trying to land your first customer or get that first investment deal. You're trying to achieve something out there in the big wide world. Behind every design in Canva, we see the goal that people are trying to reach, and we're always trying to move them closer towards it. When we think about designing a presentation, we also think about how to make it easy to present that presentation both synchronously and asynchronously. We also make it easy to get insights on that presentation, so you can see who's logged into your presentation, where they were engaged, how you might improve it, and how you might better get your message out to the end customer. There are all these nuances to the visual content being created in Canva, and we want to be part of all of it, because it's not just moving a few pixels around the screen that determines the success of a design; it's actually about delivering it to customers and producing an outcome, whether that's landing your investor or getting a donation for a nonprofit.

Bilawal Sidhu: I love that. Yeah, because visual communication is a means to an end, so why not help the user with the end itself?

Cameron Adams: Exactly.
Bilawal Sidhu: What do you think about the startups in the space, and also the big AI labs that are doing a bunch of research? Obviously y'all are working with OpenAI and Anthropic and a bunch of others, and we'll definitely come to Leonardo, and you alluded to Affinity as well. But it seems like, similarly, in the startup and AI lab space, everyone's deconstructed the creation problem space into all these little primitives, right? Some folks are working on the best sound generation approach, character animation, video generation, image editing. Y'all are in a very unique position to take all those Lego pieces and pull it all together. Is that the right way to think about your AI strategy? And are there examples of recent AI launches that are resonating the most with users as you push beyond just pushing pixels and towards this goal-oriented, assistive creation suite?

Cameron Adams: We have a pretty multi-layered AI strategy at Canva. We think about it roughly in three buckets. There are foundational models and deep AI tech that we need to produce ourselves. We are the best design tool on the planet. We have an immense understanding of how people design and the needs they have for creating visual content, so we see it as our job to do the deep tech research into how AI can best develop content for that. We've actually got quite a large R&D team who focus on that and push out...

Bilawal Sidhu: Shout out Bhautik Joshi.

Cameron Adams: I'm glad you know him. He's one of our amazing R&D folks, and I always love chatting to him because he always blows my mind with what's possible right now.

Bilawal Sidhu: He's got the mad scientist energy for sure.

Cameron Adams: He definitely does. He's one of my favorite characters at Canva. Folks like Bhautik are pushing the envelope on what's possible with AI tools right now, and doing that right within Canva.
So that's one AI pillar that we have. The second is working with the world's best partners. You mentioned OpenAI and Anthropic. Being able to integrate them into our product, and to do so really rapidly, has been one of the keys to us moving fast in the AI space, capitalizing on the opportunity that's available there, and figuring out how they can plug in. With a large language model like GPT, there are 101 things you can do, but plugging it into your product in the right way and meeting those user needs, helping people achieve their goals faster and at higher quality, is really important when you're developing a product. The third pillar is our ecosystem. We've actually got a full ecosystem team who work on our app marketplace and on how others can integrate into Canva and out of Canva. Having that developer community, who are really passionate about getting their stuff onto Canva because it helps them scale immensely, has really helped as well. An incredible number of AI developers have integrated into Canva. We have talking avatars that you can put in your presentation, AI-generated music that you can include in your video clips, countless image generation apps, and all sorts of different jobs that you can get done through AI. Those three pillars are how we think broadly about our AI strategy, but fundamentally, we see AI as a massive unlock for everyone we're trying to empower to design, because the vision for Canva has always been about reducing this job that could take months down to minutes, and AI is letting us take it from minutes to seconds, really reducing the gap between idea and actual outcome.

Bilawal Sidhu: I love that.
Let's double-click on that last thing you said: you have an intention, a goal that you're trying to achieve, and how do you turn your mind inside out in the fastest way possible and share that with somebody else? Given the breadth of the Canva suite, it's one of the few places where, as you mentioned, you can design a post, edit a promo video, compile data, and build a landing page with an interactive widget, all on one platform. That seems exceedingly valuable. Are there types of AI experiences that will only be possible on Canva because you have that breadth? And what are some examples that you're excited about on that front?

Cameron Adams: I think when you think about AI as a collaborator, it really opens up a whole palette of different interactions and product experiences that you can deliver. At the moment, a lot of people think of AI as this box that you type into and have a bit of a back-and-forth with. But with visual content, there's so much need to craft the experience and allow people to interact in different ways. Sure, you might start in a chat box and type in a broad idea. But then there's refining that thing, remixing those content pieces into videos and websites, and inviting others on your team in to share in that experience, to give their input, to jam with the AI themselves and redesign it, change some of the colors, and apply different brand kits for the different brands that you operate. There's this whole long tail of the experience that needs to be offered, and it goes far beyond a prompt box. We're thinking about all those aspects of that experience, the different types of products that you need to deliver, and the different types of UI that you need to interact with them.
We're making sure that it's AI-first but ultimately outcome-driven. Outcomes can be sharing with your team, so you need a collaborative platform where you can bring people in and share your designs so that they can give their input. You need to be able to quickly make refinements, and typing into a box to get it to move this little thing in the bottom right corner by two pixels often isn't the most effective way of doing that. So having what we call the final mile, being able to generate a design and then edit it just as you would any other Canva design, is super valuable. It's been great that we've been able to build up this visual suite over the last decade: a really in-depth design tool that you can manipulate and invite your team onto to collaborate, but that's also backed by AI, where you can start in AI or call up AI at any time in the process to help you out and deliver the amazing content that you need at the end of the day.

Bilawal Sidhu: You know, when you start describing this vision, it feels like you're almost building a graphic design Jarvis. I'm curious, what does collaboration between human and AI look like? Because you're totally right, it's definitely more than a prompt box. Obviously there are various multimodal interfaces that are super exciting these days, and there's the last-mile editability that you talked about. At some point, it's easier to take your two-degree-of-freedom mouse and just move or edit the thing yourself rather than describe it. But what does it mean to author content at this higher level of abstraction, and what's the right creation experience in your mind as you're interfacing with an increasingly powerful AI that can orchestrate a bunch of functionality for you?

Cameron Adams: I think there are these different levels that you're going to step into, get down to, and then want to come back out of.
At the moment, a lot of people start very high, in a prompt box. They have a very vague idea and they want to give it some shape, so they'll pass that in. It gets to a certain stage where it's kind of getting what you want, but you want to add a bit more of your voice. So you need to go down to a deeper level to start editing it. Once you're editing it, you might reword a paragraph yourself, but then you want some ideas for different words in there. So you need to be able to call out, "Hey, can you give me an idea for this word, or change this paragraph slightly?" You need that much more granular editing interface, but you still need to interact with the AI. Then you also might need to interact with a human who has some knowledge that you don't have and that the AI definitely doesn't have, so they need to bring their context into it. It's now this really fluid workflow between talking to a machine, doing some work yourself, getting other people's view and review, and going back to an AI to refine it a little more. Through that process, you're gradually getting towards the final piece of content that you're going to be happy with. It is this dance, and figuring out the right product and the right UI that maps to that dance is, I think, still under development. We've come a long way towards it in our Canva product, but we're definitely not at the final state it will be in by 2030 or 2050.

Bilawal Sidhu: There's so much mundane work and drudgery that goes into localizing assets and formatting them for different platforms. Do you see a world where personalization becomes key to the Canva experience, where, as you said, pixels or code are a means to an end, or the means for final delivery?
But is there a world in which the content itself is responsive to the person consuming it, based on the device they're consuming it on, and is that something Canva is going to play a role in?

Cameron Adams: It's something we've touched on a bit over the last five or six years of product development. We found it very powerful to personalize our own content. When we're thinking about marketing for Canva, landing pages for Canva, and serving Canva in different countries, personalizing and localizing the experience has been a massive part of our growth. So we started thinking about how we could bring that to our own audience and help them scale and personalize their content. That manifested in things like Magic Translate, where anywhere in Canva you can get it to translate a text string into any of 100 different languages. That immediately lets you take one social post and scale it 100x to different audiences around the world. We've started thinking about how to 10x that experience. How do you go from just a translate tool to enabling people to truly scale their content across multiple axes: not just language, but the visual asset that you use? Instead of a shoe, you might use a purse; instead of a purse, you might have a handbag; then you multiply that by 50 different languages, and it quickly scales. That was part of the impetus behind launching Canva Sheets earlier this year. That's the data layer that sits underneath Canva and enables you to think in a more structured way about the content you're creating. Instead of thinking about a social media post as a bunch of pixels on the page, you can actually structure it in a Canva Sheet, which can include images and visual content.
So you can have the template that you want to base something on, and your big product list of images that you might want to include in that template. You might have taglines that go with each image, and then you might translate each of those taglines into 20 different languages. That quickly scales into something like 500 rows in a Canva Sheet, and each of those rows corresponds to a design that you might export. We allow you to batch export those designs as well. So that quickly takes one design that you're creating, multiplies it by 500, and lets you do that in the space of minutes. That's how we're thinking about scale and personalization, and about allowing people to bring their products to a truly global market. When you throw AI into the mix, AI can produce those sheets for you. It can think about your different markets and about not just one design but the 50 designs you might need, multiplied across your entire product inventory, all the languages you're working in, and the different tagline variations you might want to produce. Then it can ultimately get them out to the channels and make sure they reach the right audience. AI itself offers this massive unlock, and I think combining it with data and visual content creates an amazing content engine that you can now power from within Canva.

Bilawal Sidhu: That sounds really exciting, and it even closes the loop on something you mentioned. You share a deck, you look at which parts people were skipping or spending time on, and you refine your own pitch, for example, with that feedback going back into the process. So the next time you're building your design, those analytics are used to refine how you produce the next round of creative. It sounds amazing and super useful.
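Cameron describes a batch-creation flow. It starts with one template and a product list that has one tagline per image. Each tagline is translated into 20 languages, which gives roughly 500 rows and one exported design per row. At its core, this is a cross product. Below is a minimal Python sketch that assumes a hypothetical 25-item catalog, so that 25 x 20 = 500. Everything here is invented for illustration, not Canva's API or data model: the `Row` type, `expand_rows`, `batch_export`, the filenames, and the bracketed placeholder that stands in for real translation.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Row:
    """One row of a hypothetical data sheet; each row becomes one exported design."""
    template: str
    image: str
    tagline: str
    language: str


def expand_rows(template, catalog, languages):
    """Cross every (image, tagline) pair with every target language."""
    rows = []
    for (image, tagline), lang in product(catalog.items(), languages):
        # Hypothetical placeholder for a real translation step:
        # the text is only tagged with its language code here.
        rows.append(Row(template, image, f"[{lang}] {tagline}", lang))
    return rows


def batch_export(rows):
    """Stand-in for batch export: derive one output filename per row."""
    return [f"{r.template}_{r.image.rsplit('.', 1)[0]}_{r.language}.png" for r in rows]


# Hypothetical catalog: 25 product images, one tagline each.
catalog = {f"product_{i:02d}.jpg": f"Tagline for product {i}" for i in range(25)}
# 20 target language codes (ISO 639-1).
languages = ["en", "fr", "de", "es", "it", "pt", "nl", "sv", "da", "fi",
             "no", "pl", "cs", "tr", "ja", "ko", "zh", "hi", "ar", "id"]

rows = expand_rows("summer_post", catalog, languages)
files = batch_export(rows)
print(len(rows))  # 500
print(files[0])   # summer_post_product_00_en.png
```

Because the expansion is a plain cross product, adding another axis multiplies the row count rather than adding to it. The shoe/purse/handbag asset swap Cameron mentions is an example of such an axis.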
Cameron Adams: Yeah, we have a part of the product called Canva Insights, which allows you to see how people are using your particular design. That might be a presentation or a website, and it gives you the metrics you need to track engagement across it. Throw AI into the mix there, and you can imagine AI analyzing that for you and then automatically optimizing different parts of your design based on the feedback it's getting. It's a really exciting area of development.

Bilawal Sidhu: On this channel, we often talk about the lines between code and content blurring. Where do you see Canva Code as a product evolving in the future? Do you expect it to change how people build interactive experiences? And similarly, do you expect a lot of the vibe-coding-first platforms to start building in design primitives and try to replicate a bit of the magic that you have?

Cameron Adams: Yeah, Canva Code is a pretty exciting part of the product that we launched through Canva AI just two months ago. It came out of us really wanting to democratize the process of bringing interactive ideas to life. Time and time again over the course of Canva, we've seen the value in prototyping: taking a flat, static idea into a real-world situation where people can click on it, engage with it, and get a much higher-fidelity experience. Over the last couple of years, we'd seen a lot of our teams start to use some of these platforms to help them with their coding and to bring their static ideas to life. And much like static design many years ago, we thought we could democratize the process of creating interactive elements on Canva itself. It was a natural extension of our presentation and website products, and being able to integrate these interactive widgets right into your design has proven to be really engaging for people.
In the case of most interactive content, think about the amount of time it takes to engage an engineering team, communicate what you need, get them to do iteration cycles, and actually build the thing. It is literally months to create some of these things that you can now generate through Canva Code in minutes. When you put this product in front of people, it just totally opens up their creativity. We've seen school kids get excited about ways they can create games to help them learn. We've seen small business owners take their workflow and process and turn it into a simple app that they can get their customers involved with. It just unlocks so many different things and brings a new dimension to visual creation.

Bilawal Sidhu: Yeah, the shareability is cool too. You have these widgets embedded in the Canva deck that you can then share around, and other people can comment on them. It feels like the hub where you're building stuff. I want to talk about the ecosystem a little bit more. I think most people in the AI community may not know this, but you've acquired some pretty serious AI and creation tools talent over the past several years: Affinity, as you alluded to, but more recently, Leonardo AI. Yet these companies are operating independently. I'm curious, why is that? Is Leonardo a sandbox to try out the bleeding-edge stuff for AI-native creators before graduating the best ideas into Canva proper? Or what's the right way to think about that strategy?

Cameron Adams: So, we take different approaches to different acquisitions based on what their product is like, what the landscape looks like, and how we think we can best work together. In the case of Affinity, which is a professional design tool, they continue to have their super high-performance desktop apps that professionals can use.
And we very much see that as the place where professional designers can go to really craft the content they're trying to create: detailed illustrations, really complex photo editing, big print layouts. They can do all of that in the Affinity products and then bring it over to Canva when they need to scale it, as I mentioned before. So when you need to take that intricately crafted brand and get it out to the rest of your team and your global audience, you can do that in Canva. Those two products work really well together, and we're quite happy for Affinity to maintain its brand and its products for a kind of separate audience. For Leonardo, we saw immediate synergies from integrating the products together, but also value in keeping them separate as an amazing innovation arm and a really strong research hub. So Leonardo continues to innovate on their product, and they can move quickly on the foundational models they've developed for image generation. They can experiment with different UI patterns for interacting with and creating those images. But they also have an incredible tech stack underlying that, which made it really easy for us to integrate it directly into Canva as well. All the image generation in Canva is now powered by Leonardo's Phoenix model, and we were able to get that up to speed within three months of the acquisition, which is an extremely fast time for completing an integration after an acquisition. The main reason we could do that was that they had well-structured code we could integrate with, and we had a team ready on our side to bring it in and start bringing the volume of Canva's 240 million users to their image generation model. So I think you need to look at each acquisition from different angles and figure out how it fits together, like a jigsaw.
Bilawal Sidhu: I'd be remiss if I didn't ask about the incumbent or legacy competitors, whatever you want to call them. Some of these are new products, right? Microsoft has Designer, Adobe has Express. On paper, they've got immense distribution and resources, yet I haven't really seen these tools take off in the same way as Canva. Why do you think that is? Cameron Adams: I think it's because, fundamentally, the experience we want to give customers is about democratizing design. We truly want to empower the world to design. Whereas companies like Adobe very much approach it from the opposite direction. They're a professional design company; they have a very fixed audience with a particular business model and particular features they need, and it's just a different philosophy of product development. So when they try to bring something like Canva to life, they just don't have the same DNA. We strongly focus on the overarching achievement of goals, which I think is a fundamental philosophy at Canva, and it's somewhat different from how a lot of companies approach software development, which is much more feature-driven. Getting people to that finishing post, where they've created something amazing and then felt its impact, is part of the secret of success for our community, and the Canva community is super passionate about Canva. Go on TikTok or Instagram and there are countless people talking about how they use Canva, their tips and tricks, and what they've achieved with it. So engaging that community, understanding their goals, and creating the product that helps them achieve those goals is ultimately what Canva is there to do. It's not to bring a particular typography feature to 500 million people, or to democratize vector editing for the world.
It's really about creating amazing content that has an impact, and I think that fundamental philosophy behind our product is what drives the difference. Bilawal Sidhu: I love that. And clearly the Canva community, if you will, is very excited about all this stuff. In preparing for this interview, I was going through the last Canva event y'all did, where you showcased Canva AI and all these capabilities: AI in service of the creative workflow. And the comments are disproportionately positive, which is very rare in the AI creation space. There's a lot of negativity, especially from people who've honed that traditional craft. And it's weird to say "traditional," because I'm getting s**t from kids who grew up on Blender, and I'm like, I used Maya and 3ds Max back in the day, y'all don't know what you're talking about. But the response really has been positive. Why do you think that is? Cameron Adams: I was also really pleased to see the comments on our YouTube video, because it was probably the first supremely positive comment thread I have seen on YouTube. It was an amazing reflection of the community, and we focus hugely on the people behind Canva, the people using it every day and getting value out of it. One of the key things we did at Canva Create was something we call closing the loop; we expressed it as granting people's wishes at Canva Create. It's about making sure we're constantly in contact with and listening to our community, hearing what problems they might have with the Canva product and what they wish they could create in Canva. Closing the loop means that when they express a desire or a wish, we listen to it and pass it on to our product teams.
Those product teams develop the right solution, ship it, and then we actually go back and tell every community member who talked about that thing that we now have it. That last bit is one of the most important parts of closing the loop, because it really activates your community. Letting people know that something they said influenced a product they use every single day is incredibly exciting for them, because it shows you're actually a two-way product and they're not just shouting into the ether. It creates this amazing connection with our community, who then feel like they're building the product alongside us. When they see that we've improved it, they go and tell more people about Canva, which brings even more people into the Canva community. It creates this amazing flywheel of positive community interaction and, ultimately, an amazing product, because you can only create an amazing product for people if you're listening to them and responding to them. Bilawal Sidhu: I love that, and you're right, especially if you're closing the loop on both mundane and ambitious things. The mundane part makes me wonder whether some of the other incumbents have a challenge where some of their beloved software isn't as stable as it used to be. People are asking, why are there still stability issues, and now you're pushing AI down our throats? I've definitely seen that as a very common narrative. I think it was in a Patrick Collison interview where you said the real crux of Canva is storytelling. Clearly you're pushing hard into video; I think most people may not even know there's a video editor built into Canva. And you've probably seen this trend of "the Cursor for X." I'm curious: is the Cursor for video editing going to be Canva? Cameron Adams: Well, I hope it's not described as the Cursor for video editing.
I hope it's just the Canva for video editing. But video editing is definitely a really important part of Canva. We've got tens of millions of people who now produce videos in Canva. I think it's at a really interesting stage. We're seeing a lot of video generation models come out. Actually, we've just integrated Veo 3 into Canva; it's one of the crucial partnerships we've developed with Google, giving people powerful access to video generation that can produce audio synced with the video. What you can create with it is amazing. I think video generation as a content type is definitely going to play an important part in video creation. But you also have exactly the same problems and opportunities we talked about earlier around properly creating content: being able to complete your idea, make the fine-grained edits you need on top of the AI generation you asked for at the very top, bring other people in, get their context and their input, and have a true creative workflow that works between machines and humans and creates exactly the content that brands and businesses need at the end of the day. It will be this wonderful marriage of AI, proper editing tools, and collaboration tools. That's what we're aiming for with Canva video: to have the best of all those worlds. It lets you start with some AI brainstorming and have it create some content for you. It then puts that into a properly editable Canva video where you can move the scenes around, extend things, change the music, edit that bit of text, invite your teammate in to drop in that product shot, and play around with all the transitions exactly how you need them. Then you push out the highest-quality video at the end of it: human-created, AI-enabled, and the best piece of content your organization can use.
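The editing operations Cameron lists (moving scenes, extending them, swapping the music, editing text) map naturally onto a timeline data structure. A minimal sketch follows; the `Scene` and `Timeline` classes and their fields are hypothetical and are not Canva's data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Scene:
    name: str
    duration_s: float
    caption: str = ""


@dataclass
class Timeline:
    """Hypothetical editable-video timeline; not Canva's actual data model."""
    scenes: List[Scene] = field(default_factory=list)
    music: Optional[str] = None

    def _index(self, name: str) -> int:
        for i, scene in enumerate(self.scenes):
            if scene.name == name:
                return i
        raise KeyError(name)

    def move(self, name: str, new_index: int) -> None:
        """Reorder: pull a scene out and reinsert it at new_index."""
        scene = self.scenes.pop(self._index(name))
        self.scenes.insert(new_index, scene)

    def extend(self, name: str, extra_s: float) -> None:
        self.scenes[self._index(name)].duration_s += extra_s

    def set_caption(self, name: str, text: str) -> None:
        self.scenes[self._index(name)].caption = text

    def order(self) -> List[str]:
        return [s.name for s in self.scenes]

    def total_duration(self) -> float:
        return sum(s.duration_s for s in self.scenes)


# An AI draft might produce the initial scenes; every later step is a plain edit.
tl = Timeline([Scene("intro", 3.0), Scene("product", 5.0), Scene("outro", 2.0)])
tl.move("outro", 0)
tl.extend("product", 1.5)
tl.set_caption("product", "Now in three colors")
tl.music = "upbeat_track_02"
```

Keeping the AI output as ordinary structured data, rather than a flattened video file, is what makes the fine-grained human edits possible afterwards.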
Bilawal Sidhu: Even as you described that, I was thinking through my current workflow for creating AI-generated content, and you could really collapse a lot of that complexity with the primitives you mentioned, including the spreadsheet. I usually literally have a Google Sheet with the shot list, if you will, and the various iterations of the prompt I'm trying. Then I go to a Figma board where I'm storyboarding the thing. Then I'm doing video generation, and then I'm bringing it into a non-linear editor. If you could do it all in one place, oh my god. Especially if you can tell Canva AI to do a bunch of that for me while I just sip on a peda, that'd be great. Okay, let's end on the big picture. Many people worry that generative AI is going to shrink creative jobs, yet countless businesses credit Canva with giving them a professional voice that they couldn't afford before. Over the next decade, do you foresee a net expansion or contraction of creative work? And if you were advising today's designers, developers, and storytellers, where would you tell them to lean in? Cameron Adams: On the whole, I think it's a net expansion of creative work. We've had exactly the same question asked of us ever since Canva launched: do you think this spells the death of creativity? And the answer is always no, because what we're doing is democratizing creativity. We're bringing more creativity to more people. We now have 240 million people who previously did not think about design at all, and it's now an integral part of their everyday life. So when you put amazingly democratizing tools in the hands of people, it creates more opportunity. We haven't seen designer jobs go off a cliff because of Canva; we've actually seen them increase, because Canva lets designers scale their interactions with their team, their customers, and their clients, and lets them think on a different level.
They might not have to produce every business card that goes out. They can produce the framework that enables great business cards to be created by their team, and then scale up a heap of other content they might not otherwise have got to. That's just the foundation Canva provides, but when you add AI into the mix, we very much see it enabling greater conversations between team members: people being able to realize their ideas and get them out to others on their team faster, so they can jam on that new product idea or that new marketing they want to push out. It's not about constraining people or removing things; it's about making a bigger pie that people can use to communicate, to scale their business, and to get more ideas out into the world. So we're very optimistic about what AI is going to enable. It will engender some change. It will definitely change how designers, sound engineers, podcast interviewers, everyone who's creating stuff, do things and the types of things they need to do. But we ultimately think it's a level-up for everyone that enables them to think of more new ideas, explore them more deeply, and execute them at a higher quality than they could before. Bilawal Sidhu: So for the next generation listening to this, what advice would you give them? Should they lean into the classical tools? Should they learn the new tools? How deep should they go into AI? Any words of wisdom for them? Cameron Adams: I kind of feel like they need to know the start and the end, and probably less of the middle. The start is about a fundamental understanding of what works and of ideas: being able to think creatively, and then think about how you're going to take that creative idea and get it out at the very end. There are elements of editorship and taste that you need in that early stage.
And then in the final output, it's thinking about how you're going to express this to an audience, and what feedback you take from that audience to go back into the creation loop and create more stuff. Whereas the middle is all the slightly boring stuff you normally have to do to get something out, and AI can be hugely helpful in removing that, speeding it up, or really scaling that part. So the main interesting parts are the early stage of coming up with an idea, and then getting it out to an audience and interacting with them. To me, and I think to most people, those are the best parts of the creative process, and the middle part is the part you want to get rid of. So really skill up in both of those areas. In particular, your skills of editorship, being able to look at something and say "that is good" or "that is bad" and know which direction you need to go in, will become increasingly important. I actually wrote an entire blog post about that. Bilawal Sidhu: Cool, we'll put the blog post in the comments below. Cameron, thank you so much for joining us. Cameron Adams: It's been a real pleasure. Thank you.

Get full access to Map the World by Bilawal Sidhu at www.spatialintelligence.ai/subscribe [https://www.spatialintelligence.ai/subscribe?utm_medium=podcast&utm_campaign=CTA_4]

Aug 4, 2025 - 44 min

Roblox’s Cube Model: Creating Interactive 4D Worlds | VP of AI Explains

How does Roblox use AI to power a 3D platform for hundreds of millions of users? VP of AI Anupam Singh dives into their new Cube AI model, integrations with LLMs for complex 3D/4D world-building, and the future of vibe coding and in-experience creation on Roblox. Watch this episode on YouTube [https://youtu.be/S5Vgxj_7Gtg], X/Twitter [https://x.com/bilawalsidhu/status/1922306590377705781], or Spotify [https://open.spotify.com/episode/5QTrHaSpFC3ZWp6d2YLQWl?si=d9b095cf7d884e7b].

Topics Covered:
* Roblox is the "YouTube of 3D"
* The Roblox Vision: 3D Creation & Consumption at Scale
* The AI Backbone: Safety, Moderation & Infrastructure at Roblox
* Cube AI: Towards a Foundational Model for 3D
* Using Cube AI with Third-Party (3P) Large Language Models (LLMs)
* The Journey to 4D: Crafting Truly Interactive Worlds
* The Rise of "Vibe Coding" at Roblox Scale
* Empowering Players: In-Experience 3D Creation
* AI's Impact on the Creator Economy
* Powering Discovery: Roblox's AI Recommendation Engine
* The Evolving Landscape: The Future of UGC and AAA 3D Content
* Anupam's Advice for Aspiring Creators & Developers

Links to Roblox Releases:
* Roblox Cube AI: https://corp.roblox.com/newsroom/2025/03/introducing-roblox-cube
* Cube GitHub Repo: https://github.com/Roblox/cube
* Voice Classifier: https://github.com/Roblox/voice-safety-classifier

Get in touch:
* Join My Newsletter: https://spatialintelligence.ai
* Connect with me on X/Twitter here: https://x.com/bilawalsidhu
* Everywhere else here: https://bilawal.ai
* Business inquiries: team@metaversity.us

Interview Transcript:
Bilawal Sidhu: Ever wonder how the metaverse gets built? Not just the idea, but the worlds and experiences millions dive into daily. I'm not talking hypotheticals. Roblox is a colossal platform. In late 2024 alone, 85 million people jumped into Roblox daily, spending an average of two and a half hours in user-generated worlds. Roblox isn't just a game; I call it the YouTube of 3D experiences.
And the opportunity is massive. Last year, creators earned nearly a billion dollars on Roblox. But here's the catch: making 3D content is still really hard. Imagine what'll happen when you slash those barriers to entry, just like we've seen in video creation, where the phone in your pocket is practically a visual effects studio. Today I've got something special for y'all. We're sitting down with Anupam Singh, VP of Engineering at Roblox, the man leading the charge on AI and ML at this amazing company that is building the literal instantiation of the metaverse. Roblox recently announced their Cube model, as well as their plans for building a 3D foundation model specifically for creating interactive 3D worlds. So stick around to hear why they went with an autoregressive approach to tokenizing 3D, allowing them to predict not just words but shapes, and how they're building toward a future where one AI model can understand geometry, textures, full-body rigging, and interactivity, enabling true 4D creation. We'll also dive into how they're combining their own 3D models with the reasoning power of any large language model to build richer, more complex worlds faster than ever before, allowing you to literally speak 3D worlds into existence. And lastly, scale. This isn't just a toy project; this is Roblox building something for their hundreds of millions of users. So if you want to understand the future of 3D and 4D creation, you're not going to want to miss this conversation. Let's get into it. Anupam Singh: My team is very excited that I'm talking to you. They almost wanted to re-brief me on our technical work, thinking, "You're talking to Bilawal, you need to be briefed." I'm like, I've been there since day one, when we started this effort. Bilawal Sidhu: (Laughs) Anupam Singh: My name is Anupam Singh. I'm VP of Engineering here at Roblox, with responsibility for infrastructure, the AI platform, discovery, ads engineering, and many other things.
I've been here at Roblox for three and a half years, and I was a two-time entrepreneur before that. To summarize my career, it's been about reading some great paper that's super geeky in its time, like MapReduce or the Transformer, and then spending 10 years trying to make it production-worthy and getting it to billions of dollars in revenue and billions of users. Bilawal Sidhu: Yeah, don't tell the researchers that. They think it's all about the zero-to-one innovation, but how do you get that thing out to market at scale? Anupam Singh: We have those on our team. We have the person who wrote the ControlNet paper as an advisor on the Cube team, and I always joke with him that it'll take 10 years for us to even understand all the implications of ControlNet, for example. But for the researchers, it's always very obvious. The future is very clear and obvious to them, and then it falls to engineers like us to make it actually happen. Bilawal Sidhu: Speaking of that, Roblox is such an interesting application and ecosystem. I've been calling Roblox the YouTube for 3D experiences because it has that closed loop between creation and consumption. But unlike video, 3D has historically been super challenging, with a very high barrier to entry. That's changing, though. Tell me about what Roblox is doing to shatter these barriers. Anupam Singh: I think it goes back almost to our founding principle. We have this principle called Long View, and since the founding, Dave Baszucki, our founder, has always tried to make creation easier and easier. Let's say Bilawal wants to create a 3D game today. Of course, the core coding is hard, and the core imagination loop is yours, but then you don't know how to get traffic. But if you publish it on Roblox, the discovery system will start seeding it with some players, see whether they're engaging, and then the flywheel starts happening.
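The discovery seeding Anupam describes can be read as a small exploration step: show a new experience to a slice of players, then decide from engagement whether to widen distribution. Below is a hypothetical sketch; the seed fraction, minimum sample size, and engagement threshold are made-up parameters, not Roblox's.

```python
import random


def pick_seed_audience(players, fraction, rng):
    """Sample a small slice of players to try a newly published experience."""
    k = max(1, int(len(players) * fraction))
    return rng.sample(players, k)


def should_promote(sessions, min_sessions=50, threshold=0.3):
    """Decide whether to widen distribution to more players.

    sessions: one bool per seeded player, True if they engaged. What counts
    as "engaged" (e.g. playing past some time limit) is an assumption here.
    """
    if len(sessions) < min_sessions:
        return False  # not enough evidence yet; keep seeding
    return sum(sessions) / len(sessions) >= threshold


rng = random.Random(0)
audience = pick_seed_audience(list(range(10_000)), fraction=0.01, rng=rng)
```

In a flywheel, a promoted experience would get a larger audience on the next round, which produces more engagement data for the next decision.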
So the proudest thing for us is when somebody creates a game and within 30 days they've found their audience. So distribution and infrastructure are the big things. The third leg of the stool, of course, is AI, which is what we're going to talk about. Bilawal Sidhu: Yeah, I love that. Creating a 3D experience is one thing; distributing it at scale and having a huge audience of folks who can experience it from a plurality of devices is, I think, equally key. A lot of people talked about the metaverse and equated it with AR/VR headsets, but I've always been a fan of the definition that the metaverse needs to be AR/VR optional. So why not include that low-end Android device as much as a kitted-out PC or, in the future, a headset? Anupam Singh: And the technology to enable that, right? If you have a 2 GB phone, or a network connection that isn't strong, and you still want to play one of our games, there's downsampling, upsampling, all of that infrastructure, which we want to be invisible to both our players and our creators. Bilawal Sidhu: Cool. Anupam Singh: Yeah, I've been on calls with some of our top creators, and they're sometimes curious about what happens after they hit publish. I want to tell them, that's where our challenge starts, because some creators are able to get two or three million people into their events. Imagine two or three million people pressing play at the same time. You have to distribute the new update to 40,000 servers worldwide across data centers, match you with your friends, and get you inside the game, because your patience will last no more than three seconds after you press the play button.
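The matchmaking step (put you in an instance with your friends, fast) can be sketched as a scoring rule over open server instances. This is an illustrative assumption, not Roblox's matchmaker: prefer the instance with the most friends in it, break ties toward the emptier instance, and signal the caller to start a new server when everything is full.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Instance:
    instance_id: str
    capacity: int
    players: Set[str] = field(default_factory=set)

    def has_room(self) -> bool:
        return len(self.players) < self.capacity


def match(player: str, friends: Set[str], instances) -> Optional[Instance]:
    """Place a player into the best open instance, or return None if all are full."""
    open_instances = [i for i in instances if i.has_room()]
    if not open_instances:
        return None  # caller would spin up a new server
    # Most friends first; among ties, the emptier instance (more headroom).
    best = max(open_instances,
               key=lambda i: (len(i.players & friends), -len(i.players)))
    best.players.add(player)
    return best
```

A production matchmaker would also weigh region, latency, and device, which this sketch leaves out.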
So, to your point earlier, Bilawal, it's much more complicated than video, because video is one-way, whereas if you and I are playing Roblox, I have to make sure we're synchronized and having a great experience irrespective of whether I'm on a PC and you're on an Android device. Bilawal Sidhu: Absolutely. But let's be honest, the metaverse would be a rather empty place without interesting content. So what is Roblox doing to make it easier to populate these virtual worlds with amazing 3D content? Anupam Singh: The first thing is invisible infrastructure, so that people don't have to worry about where the bits go. The second is matching you to your audience. That starts with matchmaking, which is the ability, after you press play, to put you into the right instance. But a lot of our machine learning and AI work is related to discovery and recommendations, whether you're in the marketplace to buy the latest avatar or on the homepage trying to figure out what game to play next. But one of our core values, and the reason I'm so proud to work at Roblox, is safety. When most people think about ML and AI, they think about recommendations and monetization. But our heavy investment is in safety. Bilawal Sidhu: Is that moderation? Anupam Singh: Yeah, it could be. Let's take the basic stuff. You and I are chatting on the platform; every word you type goes through a text filter. Bilawal Sidhu: Wow. Anupam Singh: And the last public figure we published is roughly 4 billion calls a day, more than 30,000 requests per second. We might be one of the few platforms on the planet where, if our text filter goes down, we actually take our chat down. We don't fall back to unfiltered chat just because filtering is expensive or hard to build. So a lot of our investment has been in safety, and then we open-source it.
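Two points in that answer are easy to check or sketch. First, the arithmetic: 4 billion calls spread evenly over a day's 86,400 seconds averages roughly 46,000 calls per second, which is consistent with "more than 30,000 requests per second." Second, the fail-closed policy: if the filter is unavailable, the message is dropped rather than delivered unfiltered. The sketch below assumes a hypothetical `text_filter` callable that returns `"allow"` or `"block"` and raises a hypothetical `FilterUnavailable` exception when the service is down.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(calls_per_day: float) -> float:
    """Average request rate if calls were spread evenly over the day."""
    return calls_per_day / SECONDS_PER_DAY


class FilterUnavailable(Exception):
    """Raised by the (hypothetical) text filter when the service is down."""


def deliver_message(text, text_filter):
    """Fail closed: a message is delivered only if the filter explicitly allows it."""
    try:
        verdict = text_filter(text)
    except FilterUnavailable:
        return None  # filter down means chat down, never unfiltered chat
    return text if verdict == "allow" else None


print(round(average_rps(4_000_000_000)))  # 46296
```

Real traffic is not uniform across the day, so peak rates would sit above this average.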
So we've open-sourced our voice safety model, which works literally while you and I are talking. The best demo of it is one our founder does with our head of safety engineering: they get on a pretend one-on-one at the town hall, he uses an inappropriate word, and it gives him a warning saying, "Hey, you've just used an inappropriate word, and we're giving you your first nudge," if you will. Doing that in real time, where we take voice, feed it to a machine learning model, and get an answer back in real time, has been very challenging. We decided to open-source it because we think the internet should be a safer place, so why not? We're starting to see more than 10,000 downloads of the open-source voice safety model. Bilawal Sidhu: I love that. It's kind of wild to even think about moderating voice chat at scale, in real time. That's a staggering engineering and infrastructure problem, and it's amazing that y'all are able to pull it off. These massive spaces are only going to grow; these third spaces that kids spend a lot of their time in are only going to grow. There's no possible way for human moderators to handle all of that. This is happening under the hood, but it's still nudging you as needed. Anupam Singh: Yes, yes. And what has happened over the last 24 months that got us excited is that we went from safety, personalization, economy, and user experience to changing creation itself. I think we're still at day zero on how you create 3D objects, how you create 3D worlds, and how you make them 4D, which means making them functional. And I think that would have been extremely difficult to pull off without the transformer architecture. Bilawal Sidhu: So let's dig into that.
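Real-time voice moderation has to make decisions on a stream rather than on a finished recording. A minimal sketch of that shape follows, assuming a hypothetical `classify` function that scores a rolling window of audio chunks. It does not use or reproduce the API of Roblox's open-source voice safety classifier; the window size and threshold are illustrative.

```python
from collections import deque


def moderate_stream(chunks, classify, window=3, threshold=0.8):
    """Score a rolling window of audio chunks; yield (chunk_index, warning_count).

    `classify` is a hypothetical stand-in for a real voice model: it takes the
    current window (a list of chunks) and returns a violation probability.
    """
    buf = deque(maxlen=window)  # oldest chunk drops off automatically
    warnings = 0
    for i, chunk in enumerate(chunks):
        buf.append(chunk)
        if len(buf) < window:
            continue  # wait until a full window of audio is available
        if classify(list(buf)) >= threshold:
            warnings += 1
            yield i, warnings  # first nudge, second nudge, ...
            buf.clear()  # don't flag the same audio twice


# Toy classifier over text "chunks", purely for illustration.
toy_classifier = lambda win: 1.0 if "bad" in win else 0.0
print(list(moderate_stream(["hi", "there", "bad", "ok", "fine", "cool"], toy_classifier)))  # [(2, 1)]
```

The warning count is what an escalation policy (a nudge first, stronger actions later) would key off.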
Y'all published a paper called Cube, and you're open-sourcing some models associated with it too. Basically, you can type in a text prompt and get out a 3D object. What was fascinating to me is that, as you mentioned with the transformer architecture, y'all decided to go with an autoregressive transformer approach, whereas most of the text-to-3D and image-to-3D models we've been seeing in the industry use a diffusion model with some sort of neural radiance field or Gaussian splat optimization step. Why go about trying to tokenize 3D? Anupam Singh: First, amazing question. I know you're as much of an expert in this field as I am, or even more. Full disclosure: we had a lot of debate about whether we should just use a diffusion transformer. That paper, the Vision Transformer paper, which of course led to ideas and products like Sora, seemed so seminal that we were all fans of it. We're still fans of it. But video is about predicting the next pixel. We wanted to build something that enables four-dimensional interaction. We don't just want to build the car; we want to be able to open the door of the car and get inside it. Now, to our research team and our engineers, that will sound like marketing speak; they're going to laugh at me for saying it. But in reality, it's about tokenizing 3D. You take a 3D object and tokenize it such that it can be cross-attentioned, and I know that's not a real English word, with tokens from other modalities. So that's what we set out to do.
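Cross-attention, the operation Anupam refers to, lets tokens from one modality query tokens from another. Below is a minimal pure-Python sketch of scaled dot-product cross-attention, softmax(Q K^T / sqrt(d)) V, where the queries stand in for 3D shape tokens and the keys/values for text tokens. It omits the learned query/key/value projections and multiple heads that a real transformer layer has, and the toy vectors are made up.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: softmax(Q K^T / sqrt(d)) V.

    queries: vectors from one modality (standing in for 3D shape tokens)
    keys/values: vectors from another modality (standing in for text tokens)
    No learned projections or multiple heads; illustration only.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # how much this shape token attends to each text token
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


shape_tokens = [[1.0, 0.0], [0.0, 1.0]]             # toy query vectors
text_tokens = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # toy key/value vectors
mixed = cross_attention(shape_tokens, text_tokens, text_tokens)
```

Each output row is a weighted blend of the text-side vectors, which is how information from the other modality flows into the shape-side representation.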
And honestly, initially it still seemed like vision transformers might be the better idea, because we could see people generating video after video, and vision transformers seemed to be making much more progress than tokenizing 3D. But with video, you might feel that you can play the game, but you're not really inside the game. You're not inside the car; it's giving you the feeling that you're inside the car. Bilawal Sidhu: Just to double-click on that, it sounds fascinating, because you're saying you want future iterations of this model to reason not only about the surface geometry, but also the texture-atlasing detail, the full-body pose and rigging information, and even the scripts. So you can have a model that can do it all? Is that where this is headed? One model to rule them all? Anupam Singh: Yeah. We wanted to solve the hardest problem first, so that if we create a car, if we create a geometry, we should be able to model the interior details. If it's a house, it has rooms; if it's a car, it has a seat inside. We've always believed that if you're native 3D, your objects have interior details that you've already, in quotes, "modeled." That's what we set out to do. Again, it was pretty difficult initially, but now we can say that our objects have meshes, they have parts, they can be part of layouts, they can be textured, they can have scripts, and they can have rigging, as you said. Because we've tokenized 3D, they have all this intelligence built into them. Bilawal Sidhu: I love that. And I loved how y'all talked in the paper about the latent space, the hidden layers of meaning this model learns, being semantically meaningful: things with similar shapes are clustered together in that latent space. And that's kind of wild, right?
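Tokenizing 3D means turning continuous geometry into discrete symbols a transformer can predict one at a time. One standard building block for that is vector quantization: encode the shape into latent vectors, then replace each vector with the index of its nearest entry in a learned codebook. The Cube paper describes its own tokenizer; the sketch below is only plain nearest-neighbor quantization with a tiny hand-written codebook, assumed for illustration.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents, codebook):
    """Replace each continuous latent vector with the index of its nearest code."""
    return [min(range(len(codebook)), key=lambda i: squared_distance(z, codebook[i]))
            for z in latents]


def dequantize(tokens, codebook):
    """Map token ids back to their (approximate) latent vectors."""
    return [codebook[t] for t in tokens]


codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]  # toy 3-entry codebook
latents = [[0.1, -0.2], [4.2, 5.1], [0.9, 1.3]]   # toy encoder outputs
tokens = quantize(latents, codebook)
print(tokens)  # [0, 2, 1]
```

Once shapes are sequences of integer ids like these, they can sit in the same next-token prediction setup a language model uses.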
Because you're teaching a model to have a vocabulary or grammar for the structure of the world. Anupam Singh: Yeah, yeah. And one interesting thing, and again, thanks for reading the paper; we always love to geek out with folks, which is why we made it open source. We wanted to be extremely open about our techniques. There was one other topic of debate, and I can't believe it's already 18 months or more since we were having these debates. One, as you mentioned, was diffusion transformers. But the second was whether we should train our own text model. There's this temptation: we have a lot of CPUs and GPUs, so why don't we train our own text model? We do have a bunch of text that's unique to our platform. But we decided no; we want to tokenize 3D such that it can work with another large language model that's trained on the text modality, and combine that intelligence. So if somebody says, "Build a red racing car with rocket boosters on it," we don't handle the semantics of the actual text. That comes from a large language model, because it has world knowledge and reasoning. We just have to intersperse that with 3D tokens. And voila, now you have something that understands both the geometry and the reasoning behind why a car is a car. Bilawal Sidhu: This is such a crucial point, because I think it's almost the genius part of the approach y'all are taking: build the models for, and of course open-source, the stuff y'all are good at. You have access to amazing 3D data given your ecosystem, plus a bunch of amazing open-source datasets. But with LLMs, there's a bunch of companies in this pseudo arms race to build the biggest, baddest model of them all.
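The idea of interspersing 3D tokens with text can be pictured as a single token sequence where text and shape tokens are interleaved, with marker tokens telling a decoder where a shape begins and ends. The `<shape>`/`</shape>` markers and the `s<id>` token naming below are hypothetical conventions for this sketch, not Cube's vocabulary.

```python
SHAPE_START, SHAPE_END = "<shape>", "</shape>"  # hypothetical marker tokens


def interleave(text_tokens, shape_ids):
    """Append a shape span (discrete 3D token ids) after some text tokens."""
    return text_tokens + [SHAPE_START] + [f"s{t}" for t in shape_ids] + [SHAPE_END]


def extract_shapes(sequence):
    """Recover each shape's token ids from a mixed text/shape sequence."""
    shapes, current = [], None
    for tok in sequence:
        if tok == SHAPE_START:
            current = []
        elif tok == SHAPE_END and current is not None:
            shapes.append(current)
            current = None
        elif current is not None:
            current.append(int(tok[1:]))  # "s17" -> 17
    return shapes


seq = interleave(["a", "red", "racing", "car"], [3, 17, 42])
```

In this framing, the language model supplies the text side and a 3D model fills in the shape spans; text tokens outside a shape span are simply skipped when decoding geometry.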
I mean, just shortly before our conversation today, Google dropped Gemini 2.5, their reasoning model, which is apparently better than DeepSeek and Claude 3.7. And your approach can just take advantage of that, because you can plug in a new LLM. So can we break this down a little for the listeners and the viewers here? How does that work exactly? Because in the paper, y'all talk about text-to-shape, which makes a ton of sense: I give you some text and you predict the shape. But you can also be given a model and end up with a text description. And it sounded like from that, y'all create, to use geek parlance, essentially a scene graph, this JSON-format description of what's in that model. But now it's just text, meaning GPT-4o or some other large language model can manipulate that text, and then you can go back into 3D. If you say, "Hey, I want to build a kitchen or a garage," you can, like you said, lean on the world knowledge of these LLMs: "Well, what kind of objects are typically found in a garage?" "Well, these types of objects." And then you use your text-to-shape model to generate 3D renditions of those objects and very quickly start populating a scene that looks cohesive and put together. And since you always maintain this text representation of what's in the scene, adding stuff becomes more conversational. And I suspect that'll get exciting too, because right now I'm just prompting in text, like, "Oh, add another Porsche 911," or "add a bike rack over there." I could start providing images and maybe other modalities in the future.

Anupam Singh: Yeah, and that took us more time, honestly. Coming up with that architecture where, you know, there is a temptation to say, maybe Bilawal has trained an image transformer, but I'm going to do it better.
Like that arms race that you talked about. But real technology advancement happens by actually respecting what others have done. Personally, I'm a big fan of open source and of building on open source. Operating systems have been built like that, databases have been built like that, networking stacks have been built like that. So why not in AI? And so we had this very important decision point. Kiran Bhat, who you're familiar with and who you're going to talk to, is somebody who really thought hard about it. And we worked with professors at Stanford; we had a very extensive academic team that we worked with. And we said, "You know what? Any other modality we are going to interconnect with, we are going to connect through cross-attention." Now, to your point, that gives us an ability. As a model, I don't really know what a cricket match looks like. We have not trained our model to understand that. To your point, we start with, let's say, a cricket stadium, then we add players to it, then we add countries to it. And the large models are super good at reasoning, so they tell our model what to build. We are very good at building 3D objects, but they tell us what the layout should be, how big the pitch is, how many wickets cricket has. So that's how we think about interspersing with LLMs to build entire scenes rather than just objects.

Bilawal Sidhu: I love that. And so obviously the next step from scenes, right? We talked about objects, you've got scenes. Now we've got to infuse them with interactivity, this notion of 4D creation that you talked about: how people or objects in the scene behave and respond. How do we get to the next level of creativity that is going to be unlocked by something like this? What's on the roadmap for y'all to achieve 4D creation?

Anupam Singh: So with 3D tokenization, things like parts and rigging and scripts get enabled.
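The scene-building flow discussed here, a large model proposing what belongs in a setting, a text-to-shape model generating each object, and the scene kept as a JSON-style scene graph that can be edited conversationally, can be sketched in plain Python. `propose_objects` and `text_to_shape` are hypothetical stand-ins for an LLM call and a text-to-shape model, and the field names and spacing rule are assumptions rather than Roblox's format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class SceneNode:
    name: str              # text description, e.g. "Porsche 911"
    position: List[float]  # x, y, z in scene units
    asset_id: str = ""     # filled in once a mesh has been generated

@dataclass
class SceneGraph:
    setting: str
    nodes: List[SceneNode] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)  # the text form an LLM could read and edit

    @staticmethod
    def from_json(text: str) -> "SceneGraph":
        data = json.loads(text)
        return SceneGraph(data["setting"], [SceneNode(**n) for n in data["nodes"]])

def propose_objects(setting: str) -> List[str]:
    """Hypothetical stand-in for an LLM listing objects typically found in a setting."""
    canned = {"garage": ["workbench", "tool cabinet", "Porsche 911"]}
    return canned.get(setting, [])

def text_to_shape(description: str) -> str:
    """Hypothetical stand-in for a text-to-shape model; returns a fake asset ID."""
    return "mesh:" + description.lower().replace(" ", "-")

def populate(setting: str) -> SceneGraph:
    scene = SceneGraph(setting)
    for i, name in enumerate(propose_objects(setting)):
        # Naive layout: space objects 3 units apart along x.
        scene.nodes.append(SceneNode(name, [i * 3.0, 0.0, 0.0], text_to_shape(name)))
    return scene

scene = populate("garage")
# A conversational edit ("add another Porsche 911") becomes an edit to the text form.
edited = SceneGraph.from_json(scene.to_json())
edited.nodes.append(SceneNode("Porsche 911", [9.0, 0.0, 0.0], text_to_shape("Porsche 911")))
print(len(edited.nodes), edited.nodes[-1].asset_id)  # 4 mesh:porsche-911
```

Keeping the scene as plain data means the same JSON can be handed to whichever language model is current, and every edit round-trips back to objects the 3D side can generate.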
And you mentioned earlier, "Oh, you know, this could be possible." We want to go from "this could be possible" to making it possible this year, where as soon as you build a racing car: how many doors does it have? How fast does it go? What sound does it make? For one, a lot of that intelligence will come from large models that are being developed by the industry. We are not building a model that knows that a car door opens, but we've spent years and years, maybe 15 or more, building a physics engine that knows how a car door opens. And the Cube AI sits in between the great understanding and reasoning that large models have given us and our amazing engine, which runs worldwide and lets you interact with objects. It's essentially the intelligence layer that sits in between: take the reasoning and make the object function. And so this year, we are planning to add more and more interactivity to the objects that you can generate with the Cube. The interesting part is that you as a creator don't have to worry about building that interactivity in. You build your car and leave the rest to us. We will start building interactivity into it. But you can progressively edit the object: if you don't like the way we opened the door, you can change it.

Bilawal Sidhu: I love that. And I think this very nicely distinguishes the direction y'all are taking from the more diffusion- and NeRF-based approaches we're seeing for 3D object creation. Like, yeah, that model looks awesome, but it's not interactive and it's not broken into a bunch of different parts. It sounds like another area y'all want to focus on next is the primitives that you use inside Roblox games: you want the model to be able to say, "Oh yeah, I'll create a very nice mesh for this, but let me use the iconic style of Roblox primitives for the rest of this stuff."
And I think that having multiple parts to an object obviously opens up the ability to infuse it with interactivity as well.

Anupam Singh: Yeah. If I go back to last winter, the biggest "oohs" and "aahs" in our internal meetings came when we took a mesh, let's say a semi-truck. I think a lot of video models can generate semi-trucks; it's a well-understood concept. And then in our internal demos, we showed that we can break it down into 140 parts or 250 parts. Now suddenly, you can change the wheels on that semi-truck. Then we put a bounding box around it, and I can change its dimensions. It could be a very snug 18-wheeler, or it could be a very long 18-wheeler. Given that you're now giving it geometry and parts, you can start rigging it, and you can start giving it behavior. So that was our magic moment. That is when we decided that we had to release this Cube model. We want to make it open source so that the community can do interesting things with it. But the fact that we can give it parts, the fact that we can give it behavior, is a massive difference in how we are approaching 3D AI compared to generating video.

Bilawal Sidhu: And I think what's funny about that is that y'all are obviously building this for the Roblox ecosystem, which is this interactive gaming platform. But a lot of the other 3D and VFX creators that I talk to also want exactly this ability: "Oh, great, I hit the slot machine and I got the perfect-looking semi-truck. But now I just want to change one or two other things," and doing that in video is very, very hard. You can still make it work with inpainting and imagery, but the 3D equivalent of that doesn't exist. To go back to the point you mentioned, that Cube is this middle layer between the intelligence of world models and LLMs on one side and Roblox itself on the other.
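The semi-truck demo, breaking a mesh into parts and then resizing its bounding box to get a snug or a long 18-wheeler, hints at why part structure matters: different parts can respond to a resize differently. A minimal sketch, assuming axis-aligned parts and a hypothetical `rigid` flag so that wheels and the cab are moved rather than stretched; this is an illustration, not how Cube or Roblox implements resizing.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass(frozen=True)
class Part:
    name: str
    center: Vec3
    size: Vec3
    rigid: bool  # hypothetical flag: rigid parts are repositioned but keep their size

def stretch_length(parts: List[Part], factor: float) -> List[Part]:
    """Stretch an object along x, anchored at the bounding box's minimum x."""
    min_x = min(p.center[0] - p.size[0] / 2 for p in parts)
    stretched = []
    for p in parts:
        new_cx = min_x + (p.center[0] - min_x) * factor       # every part center moves
        new_sx = p.size[0] if p.rigid else p.size[0] * factor  # only non-rigid parts scale
        stretched.append(replace(p, center=(new_cx, p.center[1], p.center[2]),
                                 size=(new_sx, p.size[1], p.size[2])))
    return stretched

truck = [
    Part("cab", center=(1.0, 0.0, 1.5), size=(2.0, 2.5, 3.0), rigid=True),
    Part("trailer", center=(8.0, 0.0, 2.0), size=(12.0, 2.5, 3.0), rigid=False),
    Part("front_wheel", center=(1.0, 0.0, 0.5), size=(1.0, 0.3, 1.0), rigid=True),
    Part("rear_wheel", center=(12.0, 0.0, 0.5), size=(1.0, 0.3, 1.0), rigid=True),
]
long_truck = stretch_length(truck, 1.5)
for p in long_truck:
    print(p.name, p.center[0], p.size[0])
```

Because every center is scaled from the same anchor, rigid parts can drift apart slightly (here the cab ends up 0.5 units from the trailer); a fuller version would use the part hierarchy or attachment constraints to keep connected parts together.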
How do you see that evolving? What's the model that's orchestrating all these other things? That's the LLM, it seems like, right? Do you imagine a future where you might want to create a fine-tuned version of an LLM that's the perfect Roblox conductor model, one that conducts Cube and all the various third-party models and does that for you? Or how do you think about that?

Anupam Singh: So, a two-part answer to that. Very good question. Excellent question. We think about this a lot. Number one, zooming out, the biggest orchestrator is you as a creator, and your imagination.

Bilawal Sidhu: Love that. Yes.

Anupam Singh: So I'll tell you this. I've been playing with the model personally. Aesthetics and creativity are not what I'm known for; I'm a database infrastructure person. And we kept playing with the model. And then one fine day, we gave access to our designers. And suddenly we saw designers building these amazing, cohesive worlds, whereas I would create an object, a car, a red car, and then a green tree, and if Bilawal looked at it, he'd say, "Hey Anupam, what story are you trying to tell?" Right?

Bilawal Sidhu: It's your 3D version of the house with the tree next to it and the sky in the background.

Anupam Singh: Exactly. Exactly. Just because you give me something that can reproduce a beautiful painting doesn't mean I'm going to become a great painter. So the creator is still the centerpiece of our plans and our thinking, okay? So that's the orchestrator as far as creativity and imagination go. Now we come to the second level, which is a great part of your question. I think the industry is still trying to figure this out. You know, every weekend I go home with a reading list on, and I know you might mention this later, vibe coding. Honestly, Bilawal, at this point, on March 25th, 2025, I still don't know what vibe coding is, right?
But the MCP stuff is exciting, because we need a protocol so that these LLMs can talk to each other. These large models, they're not necessarily just language models, but these large models can talk.

Bilawal Sidhu: Sure. We really do need a better term there, because it's not quite visual language models either, since a lot of these reason about audio. So it's like multimodal large models?

Anupam Singh: Multimodal large models, right? And we need a protocol for how they talk to each other, right? So, number one, the creator is the most important in all of this. They are the real orchestrator. Number two, orchestrating across APIs is most likely MCP. And then number three, the thing that is my favorite: we run more than 250 models in production today.

Bilawal Sidhu: Wow.

Anupam Singh: As a company. Whether you are chatting with somebody or uploading something, it's all pervasive. And so there's another level of question in AI: what keeps production AI going? That orchestrator will be different. So you have the creator, you have an orchestrator to talk between models, and then there's what you had in your question, which is a fine-tuned or distilled version of a model running in production, because you can't really run a massive model in production where Bilawal needs 40 GPUs just to run his creation, right? It's unaffordable. So I think all three of these are going to happen in the next six months, which is why I'm super excited about it.

Bilawal Sidhu: I love it. I mean, since you brought up vibe coding, let's dive into it. Shout out to Andrej Karpathy for coining the term. It's this idea that you're mostly using voice chat to just talk to your coding editor and ask it to do stuff. And you kind of pretend the code doesn't even exist; it's sort of in the background. Yes, yes.
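Anupam expects orchestration across APIs to run over MCP. MCP messages are JSON-RPC 2.0, and a client invokes a server-side tool with a `tools/call` request whose `params` carry the tool `name` and its `arguments`; the result carries a `content` list and an `isError` flag. The sketch below only builds and parses those messages with the standard library. The `generate_3d_object` tool and the canned response are hypothetical, and transport and session setup (such as the initialize handshake) are left out.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request IDs must be unique per session

def tools_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP-style JSON-RPC 2.0 'tools/call' request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

def text_from_result(raw: str) -> str:
    """Collect the text items from a tools/call response, raising on errors."""
    msg = json.loads(raw)
    if "error" in msg:  # protocol-level JSON-RPC error
        raise RuntimeError(msg["error"].get("message", "unknown JSON-RPC error"))
    result = msg["result"]
    texts = [c["text"] for c in result.get("content", []) if c.get("type") == "text"]
    if result.get("isError"):  # the tool ran but reported a failure
        raise RuntimeError("; ".join(texts) or "tool reported an error")
    return "\n".join(texts)

# Hypothetical tool and canned server reply, for illustration only.
req = tools_call("generate_3d_object", {"prompt": "red racing car with rocket boosters"})
fake_response = json.dumps({
    "jsonrpc": "2.0", "id": 1,
    "result": {"content": [{"type": "text", "text": "asset 42 created"}], "isError": False},
})
print(json.loads(req)["method"], "->", text_from_result(fake_response))
```

Separating protocol errors from tool-reported failures mirrors the split in the message format, so an orchestrator can retry a transport problem differently from a bad prompt.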
And you're just asking it again and again, iterating with it to produce something. Now, for somebody who thinks about productionization and shipping at scale and performance criteria, that might make you well up with a little bit of anxiety. But to your point, it's such a great way to prototype; time to prototype has gone down so drastically. And I think that's really the role vibe coding plays today. One of the demos that I'm going to put up on screen here, one that's amazing, uses these Claude MCP integrations to have a server that can talk to Blender. You basically provide an image reference to Claude, and it says, "Oh, here's what's in this image that you provided, and here's the rough angle. Now I'm going to use some diffusion text-to-3D model generator to make all those objects for you. And then I'm going to recursively take screenshots and try to figure out how to place these models in a scene." And a lot of people are having this aha moment of, "Holy crap, it isn't just about generating video." We can use these models to control and create stuff in the tools we all know and love. And control is the keyword, because that's exactly what creators want. So when you see stuff like this happening, is it exciting to you? Because it feels like, since Roblox is so vertically integrated, you own the authoring tool, you own the rendering engine, and you're obviously doing the distribution and the serving too, there are going to be some very interesting opportunities. So when you see vibe coding taking hold, do you imagine that's also going to happen in the Roblox ecosystem? Will a lot of people who might have been scared to open up the Roblox editor want to pop into it?
Anupam Singh: 100%. I think the thing that we've been doing since the founding of Roblox is to make 3D, and I would even go one step further, make programming approachable. We have an amazing experience called Dress to Impress. You go in there and the interaction is so amazing. And its creators have never had to worry about infrastructure. They've never had to worry about engine graphics and cross-device support. So in that way, vibe debugging, coding, and that was not a slip of the tongue, because that's what…

Bilawal Sidhu: The debugging part is the hardest part.

Anupam Singh: I'm seeing so much code getting generated, and the thing that keeps going through my mind is, how are we going to debug this code? Roblox has a culture where even the senior executives in the company are on a PagerDuty on-call rotation.

Bilawal Sidhu: Oh wow, cool.

Anupam Singh: So if an incident happens, I get paged as much as the frontline engineer or the frontline incident commander.

Bilawal Sidhu: How many sev 3 incidents are you getting all the time?

Anupam Singh: Exactly. And so now it's a quality-of-life issue for me as an executive at Roblox. It's like, oh my god, how many times am I going to get paged because Bilawal just talked to his phone, created a 3D game, hit publish, and it's now on our platform? But more importantly, what are the implications when Bilawal, the creator who made the game, gets their first 10,000 players? You're successful, and now you're trying to update it. Updating a game might be more creative and more complex than getting your initial success.

Bilawal Sidhu: Totally, yeah.

Anupam Singh: And we see this all the time. Our creators are continuously updating their creations. They're adding more to the world, they're changing the interaction patterns. I am as much a student of this field as you are right now.
It's like, what happens to vibe debugging and vibe updating? Vibe refactoring? So, still an open question. Maybe we'll talk next year and we'll say, "Ah, this is how you do vibe debugging," but I do not understand it yet.

Bilawal Sidhu: Well, I'm glad you brought that up, because it almost feels like it's easier to get to that initial prototype, and that's not necessarily a robust foundation for you to ship subsequent iterations. It's almost easier, and this seems to be a bit of a vibe coding maxim right now, to just start from scratch once your project reaches a certain point. But of course, if you're trying to build a world that you iteratively expand upon, an experience that has existing audiences, that's easier said than done.

Anupam Singh: It's harder. And I think we as an industry are pivoting too hard on the code being generated. The way we like to think about it: Bilawal makes an amazing experience, and I want to add to it. I'm one of your players, right? I join the game. Can I do in-experience creation? Let's say, you know, I live in San Francisco, and I'm a big fan of the city that I live in. So let's say Bilawal created a San Francisco experience. But now I want to add fireworks to it, right? I'm one of your players; can I just say, "Add fireworks to it"? And can I propose to you, can we make San Francisco futuristic? Right? Suddenly, it's not really San Francisco anymore. You and I are effectively collaborators, even though you are the creator and I am the player. Now, we of course have games that have millions of players. Imagine millions of people adding to your creation, and suddenly we are all collaborating, but keeping it cohesive, right? You can either accept my creation, or within my instance you can let the creation happen, but maybe not for the other players, who like their San Francisco more in the 1800s.
I like it more in the 2010s. So how do you manage the entire lifecycle? So for us, as much as I think about vibe coding, I think about in-experience creation. And we are seeing this, by the way: since the launch of the Cube, I can't say the exact numbers, but I can say tens of thousands of creations have already happened on our platform.

Bilawal Sidhu: Very cool. And this thing just dropped. And it's funny, in a way, thinking about in-game creation, which is a really fascinating concept. I saw some of the amazing examples that y'all showcased at GDC. It's like the most accessible form of vibe coding, in a sense. You're in a world already, in an experience, and you're like, "Oh, I want to change this about the environment or the setting I'm in," or "I want to spawn a new type of vehicle that I want to drive around," or "create a new type of prop," or "change my outfit for the vibe of the experience that's going on right now." And being able to do that on the fly is pretty freaking magical, right?

Anupam Singh: Yeah. Within 30 hours of the model dropping, we had an avatar created using the Cube and put in our marketplace, and then we were very proud as engineers, because Dave Baszucki, our founder, changed his avatar to that Cube-created avatar.

Bilawal Sidhu: That's cool. That's awesome.

Anupam Singh: It's seldom that this happens in your career, in the industry, where within 30 hours somebody has created something and somebody has adopted it. And if that somebody happens to be your founder and CEO, that is even more fun.

Bilawal Sidhu: I love that. Yeah, and these loops between creation and consumption are just going to get tighter and tighter.

Anupam Singh: Amazing. Yeah.

Bilawal Sidhu: Building on this creation conversation, both authoring experiences and in-game creation and collaboration.
I want to zoom out a little bit and talk about economics, value creation, and the creator power law, if you will. Right now, the creator economy often resembles a very steep power law, as it's been said: very few creators capture an enormous amount of the value. Do you think Roblox's new generative tools could meaningfully shift that balance so that more creators could sustainably thrive, similar to how short-form video broadened the creator base?

Anupam Singh: I think so. Zooming completely out, we have publicly talked about our goal of being 10% of the entire gaming market. Okay? Wow. How does that happen? It happens because we are able to, number one, give you more and more unique experiences if you are a player, right? And so what does that mean for creators? We've had a 61% increase where games can appear in the top 150 within 90 days of creation. So, to your question about the power law, you're 100% right. UGC platforms tend to get concentrated in the top 10 or top 150 or top 10,000. Two years ago, we started the journey of indexing every creation made on the platform so that we are able to see everything that is getting created. As long as it is safe, as long as it has passed moderation, we are able to bring it some impressions to see whether it is getting traction. I know we always talk about the modeling side of AI, but I am equally passionate about the production side of AI, which means that we should be able to rerun this model for you while you're playing a game. So Bilawal pressed play, you're playing something, and I am seeing what you're interested in. By the time you exit, I should be able to give you a different 3D experience. You might have played volleyball and I might recommend you anime, but there must be something that I've seen, or my model has seen, to recommend you that, right?
So, to us, that is the distribution part of it. Democratizing distribution is a huge part, but the other part of it is what you mentioned even at the start of our conversation: the ability to create fast. You know, this is what all of us are excited about. Today you have to open a coding studio, and then you have to build the game, etc. That is going to fundamentally change, which will get more creators on our platform. So both distribution and creation are going to undergo a lot of change, and we are very well positioned because we were UGC from the get-go. It's just that AI 100Xes the speed at which you can create UGC, 3D UGC content.

Bilawal Sidhu: I love that. Yeah, the volume of content is going to go up, but then you have to connect it with audiences, the people on the other end who experience it. And I was reading a stat that about 90% of Roblox's engagement originates from algorithmic homepage recommendations. Yes. Yeah. And so do you imagine these models getting more and more sophisticated about understanding... Obviously, you've got models today that can turn a 3D object or a scene into a very detailed text description, even the scene graph representation. Where do we go from here? Are you going to start looking at other signals of how people are... If you had to give a CliffsNotes summary of how recommendation works today and how you imagine it working in the future, what would that look like?

Anupam Singh: Oh, beautiful. I would give you two words that are buried deep in our technical report: scene understanding. Understanding a 3D scene is much harder than it sounds, because our creations are very unique. You have Dress to Impress, you have Pet Simulator; there are lots of amazing creations on our platform. Each one of them is unique.
So scene understanding gives us the foundation, the Cube layer. That's why we named our model the Cube, because it tells you much more about a 3D world. Now, moving up the stack, when you think about recommendations and personalization, or you think about safety, this scene understanding changes how we think about personalization in a way that many other platforms cannot. So, recommendations in general, CliffsNotes-wise: you played a game for 10 minutes, and I say, "Oh, Bilawal has great engagement for 10 minutes." Then you go to another game and you play it for 30 minutes. Today's models will say the 30-minute game has higher playtime than the 10-minute one, so you obviously had more fun. But there's a chance that you had more joy and fun in the 10 minutes than in the 30 minutes, right? And being able to tell that is only possible if I understand the scene in which you placed yourself, right? So I'm very excited about recommendation, discovery, marketplaces, and safety changing because of this fundamental advance in scene understanding. In a few months, we are going to be able to change a lot of our algorithms to understand scenes.

Bilawal Sidhu: That's super powerful. Yeah, and I can only imagine where you'll go from there. If you have a good understanding of the scene that someone has been navigating, then you can start looking at the interactions, what people are doing within that scene, and on and on it goes. And I think that's the thing that makes YouTube unique compared to X. Obviously I love X and Twitter; it's where a bunch of the AI community is. But on X, there are a couple of days where your content gets attention, and then it falls off a cliff. How do you create a search and discovery experience where a lot of interesting content on the platform can continue to get attention? So, super excited about that.
Anupam Singh: Even for this conversation, I found your videos to be more interesting and engaging than the tweets. The tweets tell me what you are thinking about, but the videos give me deeper context, which is a loaded term in AI right now. But I think that's where we are going. We are such a unique experience for everybody who comes onto our platform. Each one of our creations is unique. We need to reward that uniqueness. We need to recognize that. For that, we need a deeper understanding of the 3D world, which goes back to the idea that if you can tokenize a 3D world, then you can understand it better.

Bilawal Sidhu: Now, you brought me back to a question I was meaning to ask you earlier, so I'll ask it now. You talked about this autoregressive transformer approach, and you just mentioned context windows. Obviously, you need a massive context window to reason over all these different types of data: not just the triangle soup with the textures, but the scripting, the rigging, all that stuff we talked about. I didn't see a public number on this. Are you able to share how big the context window is and what it will be in the future? How big does it need to be to produce totally unbounded scenes, which seems to be where y'all want to take things next?

Anupam Singh: Yeah, stay tuned for that. As we talk more and more about scene generation... Generally I'm very comfortable sharing, but sometimes I have to think about my responsibility as an exec in the company. But your question is very, very valid.

Bilawal Sidhu: I think it does get exciting. Google has their one- or two-million-token window, and I think a lot of people don't know what to do with that much context.
When you're talking about the domain of text, okay, how many Harry Potter books am I going to pop in? But when you come into our world and we're talking about 3D and interactive 3D, we could put that context window to use very, very quickly.

Anupam Singh: 100%. And not just the context window. If the tokens get more and more numerous... tokens are not intelligent, but tokens give us intelligence. Sure. I think the scenes can get bigger and bigger and bigger, right? And we'll be talking more about it as we work on the next versions of our model. We are very excited about this question. So, in quotes, hold that thought, and you will hear more from us.

Bilawal Sidhu: Okay, so to your point about distribution, we talked about the YouTube and Twitter analogy. I want to make the YouTube and Netflix analogy: in other words, UGC 3D and then AAA 3D, okay? When you look at the video landscape right now, two interesting things are happening. On one hand, you've got top creators on YouTube, like MrBeast, spending like a million on every single video they make. So budgets are going up. And there are more people like MrBeast; there are almost too many creators, right? On the other hand, the Netflixes of the world, the OTT platforms, are experiencing tremendous downward pressure, where they're like, "Well, we can't spend 10 million on this one movie. Let's green-light 10 movies for a million each instead so we can hit a broader market." When you fast-forward things a little bit, what is this future? And let's really go into speculation territory now. What's going to happen? Are there going to be AAA games that we all play? Is it all going to be UGC? Is everything going to be like, I come back at the end of a long day and prompt a game into existence that I play with all of my friends?
What does that future look like a decade from now?

Anupam Singh: Well, this might be ducking the question, but I would say all of the above. And let me explain that, right? I should be able to create a game to play with just Bilawal, my friend, located in, let's say, another country, okay? It's a game for only the two of us. It's fun for just the two. You know, if you hang out with your college friends, there are certain jokes that work only among your college friends. Everybody else in your family will be like, "What are they talking about?" Right? That's the minimum. And then the maximum is... I know we make these distinctions in video between extraordinarily high-expense, high-budget content and the content that somebody makes while walking the streets and eating street food. Our thinking is that we are going to improve our engine and our infrastructure so much that it will be almost impossible to distinguish so-called high-end UGC content from so-called low-end UGC content. And if you look at it, the engine is sort of the thing that streams the bits to you on your phone or your device, and the infrastructure is the servers that make sure it runs fast. What I am excited about, sitting in between, is the AI platform that can enable the engine to look and feel like an amazing, your words, not mine, AAA experience, right? Even if it comes from two people who built a chess game, it's indistinguishable, right? And for that, infrastructure, AI platform, and engine all have to work in concert. And that's what we are investing in.

Bilawal Sidhu: I love that you brought up that example too. I think a lot of people may not know that you can make very photorealistic experiences in Roblox. I saw this first-person shooter example recently, and I had to do a triple take. I was like, "What?
Like, you can actually do this in Roblox?" It's kind of wild.

Anupam Singh: Yep. And for that, you need these things to work in concert, because we are trying to deliver a high-quality experience without you downloading the actual game. We truly believe that's possible for 3D content. On a 2-gigabyte Android phone with a weak network, if we have the right AI, if we have the right investment in our physics engine, and if we have the right infrastructure, these three things will work in concert. And you as a creator or a player do not have to worry about it.

Bilawal Sidhu: Building on that point, it sounds like there's a very unique advantage here, since you own the full stack, so to speak, when it comes to Roblox. You've got the creation tools, you've got the content delivery, you've got the social network, you've got moderation, and now you've got your own generative AI core and a plurality of other AI models, the AI platform that's doing a bunch of functions. I'm curious: are there certain types of creative risks or experiments, or product experiences, that only Roblox can build? You described one of them, which is ubiquitous distribution of 3D experiences. But is there something super audacious you haven't attempted yet that you believe this full-stack Roblox approach positions you well to try?

Anupam Singh: If I think deeply about it, I think bigger and more complicated worlds can only be enabled by a streaming 3D platform. So let's take the example of the Cube. We talked about how our objects have intricate details inside them, right? So what does that mean? Let's say there are 10,000 cars in your view. I created a world where there are 10,000 cars. As you approach a car, because I know that you might enter that car, I might quickly put in the parts. Yeah, yeah. But the other 9,900-and-some cars are empty, though I know that they are cars.
And so, depending on your proximity to an object, I can ascribe more and more intelligence to it and make it more and more functional. I am just humbled by what our creators will build with that. We are constantly surprised, by the way. Even when we released Cube, we were just hanging back and seeing what people like you would create on the platform. So that's how I think about it: very complicated, complex worlds can be created. My personal dream is to build a cricket game with 100,000 people, reproduce some of the games, and then place myself in it as the leading batsman, not somebody else. So that's my personal goal.
Bilawal Sidhu: I don't know if you saw this, but during a tennis Open, an Australian network didn't have the rights to stream the matches, so they turned the live stream into a Wii Tennis-style version. You might have seen it. Yeah, so you did see that. I thought that was really, really cool. And I keep imagining it: Disney and Pixar did something like this with an NFL game, where they brought the game into the bedroom from Toy Story, with all the toys coming to life. I could totally imagine that being a thing for cricket. You should do it. And hey, Roblox is probably one of the only platforms that could actually distribute it at scale right now.
Anupam Singh: I know you've talked about this before in your other talks, but bringing the real world and the 3D world together would be amazing. Everything I say about cricket, or you say about tennis, is about bringing the real world and the fantastical world together. And that intersection is going to be amazing.
And video platforms did that very well. You're walking around talking about street food in Brazil, and somehow, when I see your video, I feel like I've been to Brazil with you. Now imagine that in a 3D world I can actually walk with you, which is very different from "Oh, Bilawal went to this country and did this." Instead, I'm walking with you and chatting with you, even though you might have gone to that country a year ago. So there's a lot of fantastic stuff that can be enabled, but the tech has to work, right? Because it's very expensive to run 3D, and as somebody who's responsible for infrastructure cost, that keeps coming back.
Bilawal Sidhu: I think that makes total sense, and I share your vision. I used to work on 3D maps at Google, and one of my dreams was that eventually, since Google is making these immaculate 3D replicas of reality, you could explore San Francisco and get a guided walking tour in avatar form, exploring maybe a Roblox-ified version of San Francisco.
Anupam Singh: And going from your idea, to distributing it, to somebody playing it should happen in seconds, not minutes, not hours.
Bilawal Sidhu: I love that. I think that's another reason more people in the vibe coding community should go try vibe coding on Roblox. If you've seen some of the most viral vibe coding demos, you've probably seen the flight simulator. Most of the tweets coming out are like, "Holy crap, how do I deal with updating everyone's position in the world?" Pieter Levels is not a game dev, so he's crudely going, "Every second, I update the XYZ coordinates of every single person." And there are so many other optimizations that y'all are taking care of. It's one hell of a canvas: you create an idea and have it work not just on a high-end MacBook Pro with WebGPU, but on low-end Android devices too. That's amazing.
Anupam Singh: And we will take care of all of that for our creators.
Bilawal Sidhu: I want to talk to you a little bit about forward-looking stuff and personal reflections. One of the themes I often explore is the idea that technologists and creators like yourself have their own frontier: the bleeding edge, the territories you're personally excited or maybe even slightly intimidated to venture into next. As somebody guiding Roblox's AI growth and platform evolution, what's your own personal frontier right now? What feels almost a little mysterious or especially exciting to you?
Anupam Singh: Oh, thank you for asking that question. I'll tell you what I'm excited about these days. Four years ago, a friend of mine got me into an autonomous vehicle in San Francisco. To me, more and more systems will become autonomous. If a car can drive itself through San Francisco, what else can you do? Can a system do debugging on its own? I know the idea is getting a little polluted because everybody's talking about agents, but when that physical, 5,000-pound object moves through Union Square, I think we're just at day zero. There's a lot that's going to happen. That's what I'm excited about.
Bilawal Sidhu: I'm also curious: what's your advice for the young creator, the independent developer, or even somebody outside traditional game dev hearing our conversation? From your unique vantage point at the edge of AI and, dare I say, the metaverse, how can they best navigate, and also meaningfully contribute to, this frontier as it unfolds and we build the future together?
Anupam Singh: The first one would be: use AI. Just get hands-on experience. You're a great example. I'm almost intimidated by your ability to take new technology, understand it, and then explain it to people. But I think all of us should try that. It looks and sounds intimidating, right? Understanding NeRF, understanding Cube, understanding vibe coding. It seems very overwhelming. And yet my advice would be: engage with it. Just engage with the technology. Don't be intimidated by it, and don't believe all the negative stuff around it. Part two is: read the papers, not the tweets. Okay? So…
Bilawal Sidhu: Go to the source, the primary source.
Anupam Singh: The primary source, right. As I was preparing for our conversation, I loved watching your long-form videos because they go deep into things rather than, you know, a tweet. A tweet is only useful as a jumping-off point to something deeper in learning. So I would advise builders who are just getting into technology to go deep into one topic. This week, for example, I spent time reading our own paper. I've seen every version of it, and yet I found something more interesting about scene understanding, or how we should think about 3D tokens.
Bilawal Sidhu: I love that. You're totally right, and it's the same experience here, to be honest. When I saw the Twitter coverage of Cube, it was all the usual: "Oh, 3D modelers are cooked. Something, something, Roblox has a new thing," and then shiny visuals.
And then I went into it, and I was like, "Oh, now I get why they're doing autoregressive modeling. Oh my god, they can do text-to-scene, text-to-shape, and shape-to-text, and that enables scenes. And oh my god, that could enable 4D." You have a lot of these revelations that you don't have unless you go to the source material. So that's very well said.
Anupam Singh: Just go engage with the open source community for Cube. That's what I would ask all your listeners to do: download the model and play with it. We have a Hugging Face application; play with that, or go into Roblox Studio and use our generative AI tools, which of course use Cube.
Bilawal Sidhu: I love that. Y'all heard it here. Vibe coding is cool, but try vibe coding in Roblox. You've got a lot of the primitives at your disposal to make something interactive, not just cool enough to screen-record and share on Twitter, but something that could actually get hundreds of thousands, if not millions, of people to play.
Anupam Singh: Yes. Thank you very much, Bilawal.
Bilawal Sidhu: That's a wrap on our conversation with Anupam Singh. I absolutely enjoyed this, because we journeyed from Roblox's massive infrastructure challenges to the very frontier of AI-driven 3D creation. I find it very clever how they use their own foundation models that tokenize 3D, along with the plurality of large language models out there, which are really more like multimodal large language models, pulling from their world knowledge to make it easier for you to create 3D worlds. From some of our Blender MCP videos and other things we've talked about, I would have thought that these models don't fully understand how to create a scene, but it turns out that if you just give them a couple of examples of scene graphs, that's enough. That kind of blows my mind.
It tells me that few-shot in-context learning is probably more than enough to give us a true 3D creation workflow. And of course, their vision isn't just static objects; it's fully interactive 4D experiences. They've also got a very interesting dataset to train on. So it is very clear to me: on a platform like Roblox, which already has ubiquitous distribution and where the vast majority of users are playing user-generated content, if you can unlock creation, we might see the rise of 3D in a similar fashion to the rise of short-form video. As that barrier to entry got commoditized, instead of being a YouTuber who had to have a big camera, a microphone, and a bunch of editing, you could just use the phone in your pocket. And we're increasingly headed toward a future where you can literally just describe the kind of world and interactivity you want and iterate with these systems to generate it automatically. Anyway, I hope you enjoyed this conversation. It's a little bit different from the kind of content I do on this channel. If you like this interview, let me know who you'd like to see me interview next. I want to carve out time and space to talk to the people who are actually building the future we talk about every single week. With that, Bilawal signing out, and I'll see y'all in the next one. Cheers.
Get full access to Map the World by Bilawal Sidhu at www.spatialintelligence.ai/subscribe [https://www.spatialintelligence.ai/subscribe?utm_medium=podcast&utm_campaign=CTA_4]

May 13, 2025 - 55 min