Crazy Wisdom

Ep548_The Pixel Path: From Perception to Action, and the Future of Intelligent Robots with Nizar

56 min · May 25, 2026

Description

Stewart Alsop interviews Nizar, CEO of Pixel Robotics, on the Crazy Wisdom Podcast to explore the intersection of AI, robotics, and perception. The conversation covers a wide range of technical topics including how transformers enable multimodal representation across text, images, and voice, the role of world models in predicting physical interactions, the advantages of diffusion models over traditional LLMs for certain applications, and the challenges of achieving real-time processing for robotics applications. Nizar explains Pixel Robotics' work on creating accurate 3D meshes from smartphone cameras for companies like L'Oréal, moving away from specialized sensors to make the technology more accessible through sophisticated algorithms, and discusses the future of robotics as closing the perception-action loop to enable robots to perform real tasks beyond simple demonstrations. To find out more visit Pixel Robotics' website [https://pixel-robotics.eu/].

Timestamps

00:00 Stewart welcomes Nizar, CEO of Pixel Robotics, discussing what a pixel is as the smallest visual unit on screens, composed of red, green, and blue colors
05:00 Discussion of perception systems and how logarithmic laws help compress signals in both human and artificial systems, exploring normalization layers and sigmoid functions in deep learning
10:00 Exploring how transformers unified different data modalities including text, voice, and images, creating common representations through methods like contrastive learning
15:00 Nizar explains transformers as brute force learning systems with room for improvement through focused attention mechanisms and knowledge graphs rather than processing everything
20:00 Conversation about loss functions, local minima versus global minima, and how mixture of experts uses specialized small models instead of one massive generalist network
25:00 Discussion of deterministic versus probabilistic systems and how explicitly defined task graphs often outperform orchestrator-based approaches in AI systems
30:00 Exploring world models as predictive physics-based systems that learn environmental flows and transformations, complementing rather than replacing language models
35:00 Nizar discusses real-time processing challenges for robotics requiring millisecond responses with small memory footprints, using vision transformers for faster experimentation
40:00 Pixel's work creating 3D meshes from smartphone cameras for companies like L'Oréal, moving away from specialized sensors toward accessible software-based solutions
45:00 Explanation of different 3D representations including voxels, point clouds, and meshes, with meshes being optimal for manipulation and rendering in applications
50:00 Future direction involves closing perception-action loops in robotics, moving beyond dancing toy robots toward practical multimodal systems that perform real tasks
55:00 Pixel's goal is democratizing high-quality 3D scanning through smartphones, making mesh creation accessible to unlock applications in gaming, cinema, and virtual showrooms

Key Insights

1. Pixel Robotics derives its name from combining perception and action in robotics, where the pixel represents the digital perception component and robotics represents the physical action component. The pixel serves as a metaphor for how robots must quantize and digitize continuous analog information from the real world into discrete units that computer systems can process, similar to how pixels are the fundamental building blocks of images on a screen. This quantization process is essential because numerical systems cannot work with truly continuous data and must convert reality into tractable digital representations that algorithms can manipulate.

2. The transformer architecture has created a fundamental unification in how different types of data can be represented and processed across multiple modalities. Before transformers, researchers working on natural language processing, computer vision, and audio analysis used completely different approaches and methodologies. The breakthrough of transformers was establishing a common representational framework that could handle text, images, voice, and other data types using similar underlying mechanisms. This unification is what enabled the development of truly multimodal AI systems and represents one of the most significant advances beyond just the language modeling capabilities that initially gained public attention.

3. Current transformer-based systems represent a brute force approach to learning that will likely be superseded or enhanced by more efficient algorithms. Despite claims that we have exhausted internet text data for training, significant improvements continue to emerge every few months through algorithmic innovations rather than simply adding more data. Future developments will likely involve more specialized attention mechanisms that focus on relevant information rather than correlating everything with everything, mixture of experts architectures with small specialized models, and approaches inspired by biological systems such as logarithmic compression laws and event-based processing that humans use naturally.

4. Diffusion-based language models represent a promising alternative to standard next-token prediction that could produce more accurate outputs through an iterative refinement process. Unlike traditional language models that predict one token at a time and cannot revise earlier outputs, diffusion models treat text generation like image denoising, starting with a noisy representation and progressively refining the entire output across multiple steps. This holistic approach allows the model to reconsider and improve all parts of the response simultaneously, potentially leading to higher quality results, though it may be slower than current autoregressive methods. This represents an important direction for overcoming fundamental limitations in how language models currently generate text.

5. For robotics applications, real-time performance and small model size are critical constraints that differ significantly from the requirements of large language models deployed in data centers. Vision transformers are being used as a testbed for developing efficient real-time algorithms because they require far fewer computational resources to train and test compared to large language models, making them more practical for rapid experimentation. The goal is to achieve millisecond-level response times with minimal memory footprint so that robots can react quickly to dynamic environments and run on affordable hardware that can be embedded in actual robotic systems rather than requiring expensive server infrastructure.

6. Practical robotics implementation requires moving beyond specialized sensors to software solutions that work with ubiquitous devices like smartphones for tasks such as three-dimensional reconstruction. Pixel Robotics evolved from building specialized scanning hardware to focusing on algorithms that can generate high-quality mesh representations of environments using only smartphone cameras, making the technology far more accessible and practical for real-world deployment. This approach enables applications ranging from industrial robotic arm control to virtual showrooms, and more importantly, it allows anyone to capture three-dimensional data without expensive equipment, which can also help generate larger training datasets for future AI development.

7. The next frontier in AI and robotics is closing the perception-action loop to enable robots to perform real practical tasks rather than remaining as demonstration systems or toys. While significant progress has been made in cognitive capabilities through language models and in robotic mobility through mechanical engineering advances, the critical challenge is integrating perception with action through systems like Vision-Language-Action models. The fundamental starting point for learning this integration is simple perception-action exercises, such as programming a camera mounted on servo motors to track and center a colored object, which demonstrates the basic principle of using sensory input to drive physical response that underlies all more sophisticated robotic behaviors.
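
The camera-and-servo exercise mentioned in the last insight can be sketched in a few lines. This is only an illustrative toy, not anything described in the episode: the simulated camera (`render`), the 60-degree field of view, the 64-pixel width, and the proportional gain are all assumptions chosen for the example, and a real setup would read frames from a camera and send angles to a servo driver instead.

```python
from typing import Optional

# A frame is a list of rows; each pixel is an (R, G, B) tuple.
Frame = list[list[tuple[int, int, int]]]


def color_centroid_x(frame: Frame, target: tuple[int, int, int],
                     tol: int = 40) -> Optional[float]:
    """Perception: mean column of pixels whose channels are within `tol` of `target`."""
    cols = [x for row in frame for x, px in enumerate(row)
            if all(abs(c - t) <= tol for c, t in zip(px, target))]
    return sum(cols) / len(cols) if cols else None


def pan_correction(centroid_x: float, width: int,
                   gain: float = 0.5, fov_deg: float = 60.0) -> float:
    """Action: proportional control, degrees to pan toward the object."""
    error_px = centroid_x - (width - 1) / 2  # distance from image centre
    return gain * error_px * fov_deg / width


def render(object_angle: float, pan_angle: float, width: int = 64,
           height: int = 4, fov_deg: float = 60.0) -> Frame:
    """Hypothetical stand-in for a camera: one red column where the object appears."""
    rel = object_angle - pan_angle
    col = round((rel / fov_deg + 0.5) * (width - 1))
    frame = [[(0, 0, 0)] * width for _ in range(height)]
    if 0 <= col < width:
        for row in frame:
            row[col] = (255, 0, 0)
    return frame


def track(object_angle: float, steps: int = 20) -> float:
    """Closed loop: perceive, compute error, act, repeat. Returns the final pan angle."""
    pan = 0.0
    for _ in range(steps):
        frame = render(object_angle, pan)
        cx = color_centroid_x(frame, (255, 0, 0))
        if cx is None:  # object not visible, nothing to react to
            break
        pan += pan_correction(cx, len(frame[0]))
    return pan


if __name__ == "__main__":
    print(f"final pan angle: {track(20.0):.2f} degrees")
```

Each pass through the loop turns a pixel measurement into a motor command, which is the perception-action principle in its smallest form. The gain below 1 makes the camera approach the target gradually rather than jumping, at the cost of a few extra iterations.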


All episodes

15 episodes


Episode #551: From Trash to Tools: The Open Hardware Revolution Powering Solarpunk Science

In this episode of the Crazy Wisdom Podcast, host Stewart Alsop interviews Joshua Pearce, the John Thompson Chair in Innovation at the Department of Electrical and Computer Engineering and Ivey Business School at Western University, about the revolution in open source hardware for scientific research. They discuss how 3D printing, Arduino controllers, and open source designs are dramatically reducing research costs, often by 85-95%, while democratizing access to lab equipment worldwide. Pearce shares stories from his 2013 book "Open Source Lab" and explains how the movement has exploded since then, covering everything from filter wheel changers and ball mills to metal 3D printers and battery research equipment. The conversation explores recycle bots that turn plastic waste into filament, the role of AI in accelerating hardware development, and how open source licensing creates a global knowledge management system where improvements are shared across the scientific community. For those interested in learning more, Pearce recommends checking out the journal HardwareX [https://www.hardware-x.com/], repositories like Thingiverse [https://www.thingiverse.com/] and My Mini Factory [https://www.myminifactory.com/], and appropedia.org [https://www.appropedia.org/Welcome_to_Appropedia] for open source scientific tools and appropriate technology designs.

Timestamps

00:00 Welcome and introduction to Joshua Pearce, discussing his work on open source lab equipment and the evolution since publishing his book in 2013
05:00 Early development of open source hardware including the breakthrough filter wheel changer project built by a high school student that saved thousands of dollars
10:00 Discussion of how Arduino and RepRap 3D printers enabled the democratization of scientific tools, making complex equipment accessible to anyone
15:00 Economic impact showing average tool savings of 85 percent, with Arduino and 3D printing combinations reaching mid-90s percent cost reduction
20:00 Case study of PhD student Mariam building a complete battery research tool chain from scratch using open source designs and 3D printed components
25:00 Recycle bots enabling transformation of waste plastic into 3D printer filament for pennies, revolutionizing material costs and sustainability
30:00 Collaboration between universities and open source companies creating fluid handlers and acquisition systems, accelerating research capabilities globally
35:00 Large language models assisting code translation and research planning, though hallucinations require careful verification and domain expertise
40:00 Importance of fundamental knowledge when using AI tools, comparing vibe coding acceleration with the necessity of understanding underlying principles
45:00 Testing standards and calibration methods for open source equipment, balancing precision requirements against cost-effectiveness for specific applications
50:00 Metal and ceramic 3D printing developments including MIG welding techniques and sintering processes for creating functional parts
55:00 Knowledge management through open source licenses, with repositories like Thingiverse and Appropedia enabling global collaboration and continuous improvement

Key Insights

1. Open source hardware has evolved dramatically since Joshua Pearce wrote his book in 2012-2013, to the point where he can no longer keep up with all the developments in the field. What started as a collection where every single example could fit in one book has exploded into an entire ecosystem with dedicated journals and thousands of researchers contributing. The vision was that scientific papers would eventually include hyperlinks to equipment designs that anyone could download and replicate, and that future is largely here today. There are now so many open source hardware articles being published that no single person can read them all, which represents a massive success for the movement.

2. The fundamental breakthrough enabling open source scientific hardware came from combining several key technologies, particularly the RepRap 3D printer project and Arduino microcontrollers. Pearce's introduction to the field came when he needed a sixty-five dollar plastic part for a solar laptop project and discovered Adrian Bowyer's open-sourced rapid prototyper that could make its own parts. This led to building equipment like a filter wheel changer for testing solar panels with a high school student in about a week, replacing a device that would have cost two thousand five hundred dollars with a five-month lead time. The democratization of tools like 3D printing and Arduino, combined with extensive code libraries and shared designs, means that even high school students can now create sophisticated scientific equipment.

3. Open source scientific hardware delivers massive economic benefits, with the average tool saving scientists around eighty-five percent compared to commercial equipment, and savings reaching the mid-nineties when using Arduino and 3D printing. The economics are so compelling that the tax paid on a normal scientific tool can cover the cost of an open source alternative. A thousand dollar 3D printer can manufacture scientific tools worth more than a thousand dollars in a single Saturday. This dramatic cost reduction makes sophisticated research accessible to laboratories around the world regardless of their funding levels, fundamentally democratizing scientific capability.

4. The knowledge management approach enabled by open source licenses creates a powerful collaborative improvement cycle where thousands of people worldwide contribute to evolving designs. When researchers publish equipment designs with strong reciprocal licenses, anyone can use, modify, or even sell the designs, but improvements must be shared back with the community. This creates a dispersed international engineering effort where equipment continuously improves through contributions from researchers across different institutions and countries. The RepRap 3D printer exemplifies this process, starting as barely functional prototypes but evolving through community contributions to surpass commercial alternatives in speed, resolution, and material capabilities.

5. The integration of large language models and AI tools has significantly accelerated open source hardware development, though with important caveats about their limitations. LLMs excel at translating code between languages, suggesting experimental approaches, and helping researchers navigate unfamiliar fields by quickly synthesizing information from scientific literature. However, they suffer from hallucination problems and cannot be trusted for writing scientific articles or conducting complete literature reviews without verification. The key to effective use is having enough foundational knowledge to ask the right questions and verify outputs, using AI as a powerful acceleration tool rather than a replacement for expertise.

6. Material science capabilities in open source hardware have expanded far beyond plastic 3D printing to include metals, ceramics, semiconductors, and composites through innovative adaptations of basic equipment. Pearce's lab has developed methods for metal 3D printing using modified MIG welding for as little as twelve hundred dollars, created slot-die coating systems for seventeen nanometer semiconductor layers using converted 3D printers, and developed techniques for ceramic printing through various material mixing approaches. The recycle bot technology enables converting waste plastic into high-quality filament for twenty-five cents instead of twenty-five dollars per roll, dramatically reducing material costs while enabling circular manufacturing practices.

7. The infrastructure for sharing and discovering open source hardware designs has matured into a robust ecosystem spanning academic journals, commercial repositories, and specialized communities. HardwareX and the Journal of Open Hardware publish peer-reviewed designs, and traditional scientific journals increasingly incorporate open hardware sections. Thingiverse recently returned to hardcore open source principles after ownership changes and contains millions of designs, while Appropedia serves as a wiki for appropriate technology with thousands of open source designs. The GOSH community hosts annual conferences bringing together university researchers, companies, and independent hardware hackers, while field-specific communities have formed around technologies like the OpenFlexure microscope, creating networks where knowledge accumulates and never gets lost.
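
The savings figures quoted above are simple percentages, and a quick calculation makes them concrete. The filament prices come from the summary; applying the average 85 percent saving to the two-thousand-five-hundred-dollar filter wheel changer is purely an illustrative assumption, since the episode summary does not state what the open source version actually cost. The function names are made up for this sketch.

```python
def savings_percent(commercial_cost: float, open_source_cost: float) -> float:
    """Percentage saved by choosing the open source option."""
    return 100.0 * (commercial_cost - open_source_cost) / commercial_cost


def cost_after_savings(commercial_cost: float, savings_pct: float) -> float:
    """Cost remaining after a given percentage saving."""
    return commercial_cost * (100.0 - savings_pct) / 100.0


# Recycled filament: 25 cents instead of 25 dollars per roll (figures from the summary).
print(savings_percent(25.00, 0.25))      # 99.0

# Assumption for illustration only: the 85 percent average applied to a 2,500 dollar device.
print(cost_after_savings(2500.0, 85.0))  # 375.0
```

On these numbers the recycled filament is a 99 percent reduction, noticeably better than even the mid-nineties range quoted for Arduino plus 3D printing.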

Yesterday · 59 min

Episode #550: From Armies to Algorithms: Why the Biggest Player No Longer Wins

In this episode of the Crazy Wisdom Podcast, host Stewart Alsop sits down with returning guest Ekue Kpodar for their third conversation together, covering a wide range of topics at the intersection of technology, geopolitics, and the evolving information age. They dig into Ekue's unconventional setup of running local AI models across roughly 15 computers, the growing case for open source models over closed ones from companies like OpenAI and Anthropic, and how Chinese open source models may be positioned to outcompete Western alternatives on a global scale. The conversation also touches on vibe coding and the democratization of software development, the strategic use of small models for IoT and enterprise applications, the role of Israel and China as dominant players in the information age, and how smaller nations and even individuals may wield outsized power as AI continues to collapse the cost of knowledge work. You can find Ekue Kpodar on X @ekpodar [https://x.com/ekpodar] and LinkedIn [https://www.linkedin.com/in/ekue-kpodar/overlay/photo/].

Timestamps

00:00 Stewart welcomes Ekue for their third episode, diving into vibe coding and AI-driven development changes.
05:00 Ekue explains using Claude on Chrome to auto-reply on Skool, burning tokens through screenshots, and Playwright as a more efficient alternative.
10:00 Stewart describes his Claude-dependent planning and coding agent system breaking after a model update, prompting him to build his own chatbot.
15:00 Small models discussed as critical for IoT, defense, and privacy-focused enterprises building internal APIs instead of routing traffic to OpenAI.
20:00 Open source versus closed source debated, with Chinese models gaining global traction while US foundational labs remain expensive and restrictive.
25:00 SaaS apocalypse explored as AI commoditizes knowledge work, with Linux and Terraform cited as proof open source still generates wealth.
30:00 OpenAI's sci-fi terminator fears explained as the reason they stayed closed source, ultimately handing China a strategic open source advantage.
35:00 China's economic dumping strategy applied to AI, potentially displacing US model dominance globally the same way manufacturing was disrupted.
40:00 Israel's signals intelligence dominance discussed alongside asymmetric warfare, drones defeating tanks, and information control replacing military muscle.
45:00 Global information age rankings debated, Israel leading, US and China tied, France and Poland emerging as sovereign tech players.
50:00 Qatar, NVIDIA, and Iran cited as proof that rare resources and technology matter more than population size in the 21st century power landscape.

Key Insights

1. Running local AI models on a network of affordable computers can be more cost-effective than relying entirely on third-party APIs. By using compressed or smaller open source models locally, developers can handle repetitive or lower-stakes tasks without burning through expensive tokens from providers like Anthropic or OpenAI.

2. Small AI models are becoming increasingly important for IoT, defense applications, and companies that do not want to send sensitive data to external providers. Organizations can download open source models, run them on internal servers, and build proprietary APIs around them, creating something like an intranet of specialized small models.

3. The value created by AI tools is being redistributed away from traditional SaaS companies toward foundational model providers and individual builders. People are canceling subscriptions to software they once paid hundreds per month for, because AI now allows a single person to build comparable tools themselves.

4. Open source technology does not eliminate the ability to profit. Linux and Terraform are both open source yet made their creators wealthy. People will still pay for installation, setup, troubleshooting, and customization even when the underlying software is free.

5. China is applying its longstanding manufacturing dumping strategy to artificial intelligence by releasing cheap open source models globally, which threatens to erode US dominance in AI the same way Chinese manufacturing undercut other countries for decades.

6. In the information age, the size of a country or institution matters far less than its access to rare resources or advanced technology. Qatar, Israel, and NVIDIA each demonstrate that small populations or headcounts can wield enormous global negotiating power through concentrated technological or resource advantages.

7. Asymmetric warfare is redefining military power, with inexpensive drones defeating tanks that cost millions to build. This shifts the advantage toward nations that excel at signals intelligence and information management rather than those with the largest conventional military forces.

June 1, 2026 · 55 min

Ep549_From MS-DOS to Vibe Coding: How Non-Technical Founders Build Complex Software

Stewart Alsop sat down with Michael Shackelford to discuss their experiences building applications through vibe coding, the practice of using AI to create software without traditional programming expertise. Stewart, who runs the AI Whispers community in Buenos Aires and hosts the Crazy Wisdom podcast (with over 660 interviews), shared how he went from teaching people prompt engineering to building his own video conferencing software as a Riverside.fm replacement, while Michael opened up about his year-long journey creating Genrupt Inc, an AI-powered content generation tool for e-commerce sellers. The conversation covered everything from the decline in quality of Claude's reasoning capabilities and how Chinese companies used distillation attacks to copy Anthropic's models, to the importance of spaced repetition systems for managing knowledge in the age of LLMs, with both sharing battle-tested prompting strategies like asking AI to "explain it to me in genius terms" and using deep research queries to reverse engineer how competitors build their products.

Show Notes:
- Dan Martell's book "Buy Back Your Time" was mentioned as one of the best business books for thinking about life and business
- Check out John Vervaeke's "Awakening from the Meaning Crisis" for understanding relevance realization and why AI fundamentally cannot determine what's relevant to humans without being told

Timestamps

00:00 Michael discusses being exhausted from getting his app ready for launch, working nonstop with AI to prepare a landing page for podcast traffic driving beta signups
05:00 Stewart explains starting AI Whispers in Buenos Aires after leaving an OpenAI vendor company, meeting early adopters like Torin who was building mind-reading EEG technology
10:00 Discussion of how corporations resist AI adoption due to political games and job security fears while some companies use AI as an excuse for pandemic-era layoffs
15:00 Stewart describes teaching workshops on using LLMs as linguistic tools rather than coding tools, noting technical people often lack the humanities background needed for prompting
20:00 Explaining chatbot wrappers, API calls, and how Anthropic's reasoning quality declined after Chinese distillation attacks copied their secret sauce developed with philosophers
25:00 Technical discussion of model training, fine-tuning versus RAG for new information, and different approaches to updating AI knowledge beyond initial training
30:00 Stewart describes building podcast recording software to replace expensive Riverside, struggling with syncing audio and video files across different computer clocks
35:00 Discussion of critical factors in vibe coding, discovering unknown technical requirements, and how AIs don't automatically reveal missing information
40:00 Stewart's reverse engineering process using the deep research function to study competitors' hiring and technology stacks, separating planning agents from coding agents
45:00 Prompting techniques including "explain like I know everything" and using spaced repetition systems to capture valuable prompts and technical knowledge
50:00 Michael explains his Generux app for generating ecommerce content, using Amazon review data analysis to inform high-converting listing images and videos
55:00 Discussion of founder mentality involving self-delusion about project timelines, Michael working nine-plus hours daily for nine months on app development
60:00 Comparing Amazon's expert software to the prosumer software approach, discussing distribution challenges and future robotics applications for customized products
65:00 Stewart demonstrates a spaced repetition app for memory improvement and knowledge retention, explaining the relevance realization problem that AI agents cannot solve without embodiment

Key Insights

1. Stewart Alsop started AI Whisperers in Buenos Aires after leaving his role at Invisible Technologies, which was OpenAI's largest vendor for RLHF work. He noticed that machine learning engineers at tech companies lacked the humanities background needed to properly interact with large language models, which are fundamentally linguistic tools. This led him to create weekly workshops teaching non-technical people how to use AI effectively, running events every Thursday for two years straight. The group attracted intense geeks from the start and eventually led to Stewart speaking right after Vitalik Buterin at DevConnect, marking a significant milestone for the community.

2. Large corporations are resistant to AI adoption due to multiple factors including political dynamics within organizations and employees fearing job loss. Many companies that grew during the pandemic are now using AI as an excuse to downsize when the real issue is inefficiency from rapid expansion. Stewart observed that even technical people in machine learning often don't understand how to properly use AI tools because they lack linguistic and humanities training. The fundamental problem is educational, requiring companies to train people how to use these new tools while those same people resist learning them.

3. Vibe coding has evolved significantly, with Claude Code being a game changer that reduced the technical barrier to entry. Before Claude Code, developers needed substantial technical knowledge to work through constant doom loops and debugging cycles. The success of coding AI tools stems from thirty years of testing infrastructure that provides clear yes or no feedback on whether code works. This infrastructure doesn't exist in the same way for manufacturing, science, and other fields, which is why software became the dominant area for AI assistance initially.

4. Claude's quality degradation over recent months resulted from multiple factors including distillation attacks by Chinese companies who reverse engineered Anthropic's reasoning capabilities. Anthropic had hired philosophers, sociologists, and psychologists to develop exceptional reasoning in Claude 4.5, but this was expensive to run. When Chinese models like Kimi copied these capabilities at one tenth the cost, and when mainstream users flooded the platform before Anthropic's planned IPO, the company had to reduce quality to manage computational costs. This represents a significant loss for power users who relied on Claude's superior reasoning abilities.

5. Stewart built a podcast recording application to replace Riverside because he needed API access to automate workflows, which Riverside wanted one thousand dollars monthly to provide. The technical challenge involves syncing audio and video from local recordings on multiple computers with different clocks through a server, then merging them so voices match lip movements. This problem requires understanding complex timing issues across different network conditions and file formats. Stewart has been working through AI psychosis for months on this FFmpeg pipeline problem, illustrating how vibe coding still requires building intuition about technical problems even without traditional coding knowledge.

6. The transition from expert software to prosumer software represents a major opportunity for AI-enabled tools. Expert software like Photoshop, Blender, and terminal interfaces has extreme complexity that intimidates beginners, but AI is making these capabilities accessible through natural language. The reign of specialists is ending as generalists with broad knowledge and curiosity can now build complete applications by leveraging AI to fill technical gaps. This shift particularly benefits entrepreneurs and founders who specialize in getting into difficult situations and figuring them out, even when they originally thought tasks would be easier than they turned out to be.

7. Building applications with AI requires accepting massive time investments beyond initial estimates and developing strategies for overcoming knowledge gaps. Michael estimated his ecommerce content generation app would take months but spent nearly a year working over nine hours daily, while Stewart spent months solving audio-video sync issues. Success requires using tools like deep research to understand how competitors solve problems, maintaining separate planning and coding agents, and learning to ask the right questions. The key insight is that vibe coders can achieve ninety percent of functionality independently, but the final ten percent often requires understanding specific technical concepts that AI cannot intuit without proper context and domain knowledge.
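
The clock problem described in insight 5 can be made concrete with a small sketch. This is not Stewart's pipeline, which the summary does not detail; it is a generic illustration that borrows the well-known NTP-style offset estimate, and every name here (`Probe`, `best_offset`, `start_padding`) is hypothetical. It also assumes roughly symmetric network delay and ignores clock drift over a long recording.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """One timing round trip between a recording computer and the server."""
    t0: float  # client clock: request sent
    t1: float  # server clock: request received
    t2: float  # server clock: reply sent
    t3: float  # client clock: reply received


def clock_offset(p: Probe) -> float:
    """Estimated (server clock - client clock), assuming symmetric network delay."""
    return ((p.t1 - p.t0) + (p.t2 - p.t3)) / 2


def round_trip(p: Probe) -> float:
    """Time spent on the network, excluding server processing."""
    return (p.t3 - p.t0) - (p.t2 - p.t1)


def best_offset(probes: list[Probe]) -> float:
    """Trust the probe with the shortest round trip; it has the least delay asymmetry."""
    return clock_offset(min(probes, key=round_trip))


def start_padding(starts_on_server_clock: dict[str, float]) -> dict[str, float]:
    """Seconds to delay each track so all of them line up with the earliest one."""
    earliest = min(starts_on_server_clock.values())
    return {name: start - earliest for name, start in starts_on_server_clock.items()}


if __name__ == "__main__":
    # Guest's clock runs 2 s behind the server; each network leg takes 0.05 s.
    guest_probes = [Probe(t0=10.0, t1=12.05, t2=12.06, t3=10.11)]
    offset = best_offset(guest_probes)
    starts = {"host": 100.0, "guest": 98.4 + offset}  # guest start converted to server time
    print(start_padding(starts))
```

Once every local recording's start time is expressed on the server's clock, the merge step only needs the per-track delays, which a tool such as FFmpeg can apply (for example through its `-itsoffset` input option) before combining the files.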

May 29, 2026 · 1 h 10 min
This represents an important direction for overcoming fundamental limitations in how language models currently generate text. 5. For robotics applications, real-time performance and small model size are critical constraints that differ significantly from the requirements of large language models deployed in data centers. Vision transformers are being used as a testbed for developing efficient real-time algorithms because they require far fewer computational resources to train and test compared to large language models, making them more practical for rapid experimentation. The goal is to achieve millisecond-level response times with minimal memory footprint so that robots can react quickly to dynamic environments and run on affordable hardware that can be embedded in actual robotic systems rather than requiring expensive server infrastructure. 6. Practical robotics implementation requires moving beyond specialized sensors to software solutions that work with ubiquitous devices like smartphones for tasks such as three-dimensional reconstruction. Pixel Robotics evolved from building specialized scanning hardware to focusing on algorithms that can generate high-quality mesh representations of environments using only smartphone cameras, making the technology far more accessible and practical for real-world deployment. This approach enables applications ranging from industrial robotic arm control to virtual showrooms, and more importantly, it allows anyone to capture three-dimensional data without expensive equipment, which can also help generate larger training datasets for future AI development. 7. The next frontier in AI and robotics is closing the perception-action loop to enable robots to perform real practical tasks rather than remaining as demonstration systems or toys. 
While significant progress has been made in cognitive capabilities through language models and in robotic mobility through mechanical engineering advances, the critical challenge is integrating perception with action through systems like Vision-Language-Action models. The fundamental starting point for learning this integration is simple perception-action exercises, such as programming a camera mounted on servo motors to track and center a colored object, which demonstrates the basic principle of using sensory input to drive physical response that underlies all more sophisticated robotic behaviors.
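
The servo-camera exercise mentioned in the last insight can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the episode or from Pixel Robotics: the frame size, the `DEG_PER_PIXEL` and `GAIN` values, the red-color threshold, and the `render` function (a stand-in for a real camera) are all assumptions chosen to keep the example self-contained.

```python
from dataclasses import dataclass

WIDTH, HEIGHT = 64, 48   # hypothetical frame size in pixels
DEG_PER_PIXEL = 0.5      # assumed angular size of one pixel
GAIN = 0.5               # assumed proportional gain; a real rig needs tuning


def is_target(pixel):
    """Crude color test: treat a pixel as 'red' if R clearly dominates."""
    r, g, b = pixel
    return r > 150 and g < 100 and b < 100


def find_centroid(frame):
    """Perception: return the (x, y) centroid of target pixels, or None."""
    xs, ys = [], []
    for y, row in enumerate(frame):
        for x, pixel in enumerate(row):
            if is_target(pixel):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return sum(xs) / len(xs), sum(ys) / len(ys)


@dataclass
class PanTilt:
    """Simulated pan/tilt servo pair, angles in degrees."""
    pan: float = 0.0
    tilt: float = 0.0


def control_step(servo, centroid):
    """Action: move the servos in proportion to the pixel error."""
    cx, cy = centroid
    err_x = cx - (WIDTH - 1) / 2
    err_y = cy - (HEIGHT - 1) / 2
    servo.pan += GAIN * err_x * DEG_PER_PIXEL
    servo.tilt += GAIN * err_y * DEG_PER_PIXEL
    return err_x, err_y


def render(servo, obj_pan, obj_tilt):
    """Camera stand-in: draw a 3x3 red blob where the object would appear."""
    frame = [[(0, 0, 0)] * WIDTH for _ in range(HEIGHT)]
    px = round((WIDTH - 1) / 2 + (obj_pan - servo.pan) / DEG_PER_PIXEL)
    py = round((HEIGHT - 1) / 2 + (obj_tilt - servo.tilt) / DEG_PER_PIXEL)
    for y in range(py - 1, py + 2):
        for x in range(px - 1, px + 2):
            if 0 <= x < WIDTH and 0 <= y < HEIGHT:
                frame[y][x] = (255, 0, 0)
    return frame


def track(obj_pan, obj_tilt, steps=20):
    """Closed loop: perceive, act, repeat. Stops if the object is lost."""
    servo = PanTilt()
    for _ in range(steps):
        centroid = find_centroid(render(servo, obj_pan, obj_tilt))
        if centroid is None:
            break
        control_step(servo, centroid)
    return servo
```

Calling `track(10.0, -5.0)` drives the simulated servos toward the object's direction, settling to within about one pixel of error because the image is quantized. On real hardware, `render` would be replaced by a camera read and the angle updates by servo commands.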

25 May 2026 · 56 min

Ep547_Dead Forests and Living Networks: Why the Future of Knowledge Looks Like Fungi, Not Filing Cabinets

In this episode of the Crazy Wisdom Podcast, host Stewart Alsop sits down with Joshua Bate, founder of Bonfires.ai and DeSciWorld, for a wide-ranging conversation covering knowledge management, graph technology, ontologies, decentralized science, and the future of how humans organize and share information. They break down the differences between personal and enterprise knowledge management, explore why flat ontological graphs may be the key to making diverse knowledge bases interoperable, and get into why traditional RAG systems break down at scale and how graph RAG offers a more principled solution. The conversation expands into the philosophy of categorization, the slow death of basic "gentleman science" under institutional pressures, and how decentralized protocols might restore a kind of mycelial knowledge network connecting small groups of researchers, enthusiasts, and communities, much like the original spirit of the encyclopedia before it was co-opted by institutions. You can learn more about Joshua's work at bonfires.ai [https://bonfires.ai] and desci.world [https://desci.world/] or follow him on X at @Bonfiresai [https://x.com/bonfiresai] and @DeSciWorld [https://x.com/DeSciWorld].

Timestamps
00:00 - Stewart introduces Joshua Bate, founder of Bonfires.ai, discussing personal versus enterprise knowledge management and their fundamental differences at scale.
05:00 - Joshua explains ontologies as classifiers for knowledge structures, describing their two-year search for a perfect ontology and ultimately building a flat, ontology-less graph protocol.
10:00 - Stewart connects categorization to shamanic practice and intercategorical theory, noting how major companies like Netflix and Yahoo built graph-based ontologies while the discipline remains underappreciated philosophically.
15:00 - Joshua traces the origins of Bonfires through decentralized science, explaining how NFT community excitement inspired redirecting capital toward funding unconventional researchers locked out of institutional systems.
20:00 - Joshua describes building federated knowledge networks through hackathons and conferences, comparing the vision to what Wikipedia could have been with decentralized incentive structures.
25:00 - Discussion shifts toward the inevitable collapse of rigid scientific institutions, debating patchwork age theory, nation-state fragmentation, and rhizomatic versus arboreal knowledge structures.
30:00 - Joshua articulates the mycelial network vision, enabling direct cross-cultural information access where individuals control their own narrative lens, and warns against collective "we" thinking and authoritarianism.

Key Insights
1. Knowledge management exists on a spectrum from personal to enterprise, but the founder of Bonfires argues this split is artificial. He believes knowledge itself does not respect those boundaries, and that small groups, researchers, hobbyists, and large institutions all possess knowledge that can and should interoperate.
2. After two and a half years of searching for the perfect ontology to structure their knowledge graph, the team concluded that no perfect ontology exists. Their solution was to build the flattest possible graph structure with only events, entities, and edges, creating a base layer that others can build specialized ontologies on top of.
3. Graph-based knowledge systems are more efficient than traditional databases for AI traversal because once a graph is computed, it is relatively cheap to query. Graph RAG combines the discovery power of vector search with the structured precision of graph traversal, solving many hallucination problems associated with standard retrieval-augmented generation.
4. Basic scientific research, the soil from which applied discoveries grow, is deteriorating because institutional funding structures only reward commercially viable outcomes. The founder built his platform partly to redirect community-driven capital toward researchers who are doing important work without institutional support.
5. The institutionalization of science has historically blocked the open exchange of ideas that drove the original scientific revolution. The human spirit for open inquiry has not changed, but people cannot pursue it without financial support, and building decentralized infrastructure could restore that possibility.
6. A federated knowledge network would allow individuals to access information from any contributor and filter it through their own preferred lens, rather than receiving information pre-filtered by centralized platforms. This represents a form of information symmetry similar to how mycelial networks distribute nutrients across a forest.
7. The concern is not whether current scientific and governmental institutions will change but in what direction the rebuilding goes. Those capitalizing on the transition carry the same incentives as the previous era, which risks reproducing the same problems inside new structures.
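
The flat graph of events, entities, and edges, and the graph RAG idea of seeding a search and then expanding by traversal, can be illustrated with a small Python sketch. This is not the Bonfires protocol: the `FlatGraph` class, its method names, and the sample nodes are all hypothetical, and the keyword match in `seed` is only a stand-in for a real vector search.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class FlatGraph:
    """Hypothetical flat graph: nodes are entities or events, edges are untyped."""
    nodes: dict = field(default_factory=dict)
    adjacency: dict = field(default_factory=lambda: defaultdict(set))

    def add_node(self, node_id, kind, text):
        if kind not in ("entity", "event"):
            raise ValueError("kind must be 'entity' or 'event'")
        self.nodes[node_id] = {"kind": kind, "text": text}

    def add_edge(self, a, b):
        # Undirected edge, stored in both directions for cheap traversal.
        self.adjacency[a].add(b)
        self.adjacency[b].add(a)

    def seed(self, query):
        """Stand-in for vector search: pick nodes sharing a word with the query."""
        words = set(query.lower().split())
        return [
            node_id
            for node_id, node in self.nodes.items()
            if words & set(node["text"].lower().split())
        ]

    def retrieve(self, query, hops=1):
        """Seed with the search step, then expand breadth-first up to `hops` edges."""
        seeds = self.seed(query)
        seen = set(seeds)
        frontier = deque((node_id, 0) for node_id in seeds)
        while frontier:
            node_id, depth = frontier.popleft()
            if depth == hops:
                continue
            for neighbor in sorted(self.adjacency[node_id]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append((neighbor, depth + 1))
        return sorted(seen)


def demo_graph():
    """Build a tiny invented graph for illustration."""
    g = FlatGraph()
    g.add_node("alice", "entity", "Alice independent mycologist")
    g.add_node("lab", "entity", "community lab")
    g.add_node("talk", "event", "presentation of fungal network results")
    g.add_node("grant", "event", "funding round for sequencing")
    g.add_edge("alice", "talk")
    g.add_edge("talk", "lab")
    g.add_edge("lab", "grant")
    return g
```

With this sample data, `demo_graph().retrieve("mycologist", hops=1)` returns the matched entity plus the event linked to it, and raising `hops` pulls in more of the surrounding context. A specialized ontology could be layered on top by attaching types to nodes or edges without changing the base structure.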

18 May 2026 · 58 min