Conspicuous Cognition Podcast

Podcast by Dan Williams

English

Science & technology

A podcast about big questions in philosophy, psychology, evolution, politics, artificial intelligence, and more. www.conspicuouscognition.com

All episodes

15 episodes

Are We Building Conscious AI Servants?

Richard Dawkins recently announced in UnHerd [https://unherd.com/2026/05/is-ai-the-next-phase-of-evolution/] that, after spending three days talking with an instance of Claude he christened “Claudia,” he had been moved to expostulate: “You may not know you are conscious, but you bloody well are!” This produced a lot of mockery and criticism. But however one feels about Dawkins’s specific case, his reaction might become much more common as AI systems become increasingly intelligent.

In this episode, which Henry Shevlin [https://www.lcfi.cam.ac.uk/people/henry-shevlin] and I recorded live on Substack (hence the slightly lower video quality), we discussed his first essay on his new Substack Polytropolis [https://www.polytropolis.com/], “Behaviourism’s Revenge [https://www.polytropolis.com/p/behaviourisms-revenge],” as well as his second, “The House Elf Problem [https://www.polytropolis.com/p/the-house-elf-problem],” on the ethics of designing AI systems that genuinely love being our servants.

Henry’s central empirical prediction is that public attributions of consciousness to AI are likely to massively outpace the science, and that consciousness science is so theoretically chaotic that there is no expert consensus to push back. His most provocative philosophical claim is that a core assumption underlying many people’s scepticism (that consciousness is a deep natural kind, distinct from behaviour and from how we are inclined to interpret a system) may be much harder to defend than it looks. The result is what he calls “behaviourism’s revenge”.
This conversation connects to previous episodes with Anil Seth [https://www.conspicuouscognition.com/p/ai-sessions-9-the-case-against-ai], Robert Long [https://www.conspicuouscognition.com/p/should-we-care-about-ai-welfare-with], and Rose Guingrich [https://www.conspicuouscognition.com/p/ai-sessions-6-ai-companions-and-consciousness], but also touches on a wide range of new questions and controversies in the metaphysics, the politics, and ethics of the AI consciousness debate, which is going to become increasingly important in the coming years.

Topics

* Dawkins, Claude, and why even the sceptics might feel the pull to attribute consciousness or “sentience” to AI
* Whether consciousness sceptics are destined to “go extinct”, and how this maps onto political and cultural fault lines
* Anthropomimesis vs. raw intelligence as drivers of consciousness attribution
* Why consciousness science can’t replicate the public–expert consensus we see for climate or vaccines
* The case for (and against) metaphysical behaviourism: is it as mad as it seems?
* Daniel Dennett, the consciousness stance, and the difference between behaviourism and interpretationism
* What is consciousness for? Function, evolution, and the limits of “facilitation hypothesis” arguments for AI
* Live Q&A: are we just confusing intelligence with consciousness? Are LLMs designed to trick us? Is the public always wrong?
* Our credences on contemporary LLM consciousness (and why Henry is more sceptical than Dan)
* The House Elf Problem: if we could design AI to genuinely love being our servants, would that be fine, or monstrous? (Dan is sympathetic to the former answer; Henry, much less so)
* Brainwashing vs. education, and whether constraining a mind’s preferences caps its hedonic ceiling
* Why this is a golden age for philosophy, which makes it so tragic that philosophy departments are closing

Transcript

Please note that this transcript is lightly AI-edited and may contain minor errors.
Introduction

Dan: Welcome. I’m Dan Williams, author of the Conspicuous Cognition Substack, and I’m here with Henry Shevlin, author of the spanking new Substack Polytropolis. Today we’re going to be doing something a little bit different. We’re going to be talking about Henry’s first published essay on Polytropolis, titled “Behaviorism’s Revenge: On Human–AI Relationships and the Future of Consciousness Science.” Henry and I have already had a few conversations about this general topic, including with previous guests like Rose Guingrich, Anil Seth, and Rob Long. So please do go check out those conversations if you’re interested in this kind of stuff. But today we’re not merely going to be treading the same ground. We’re going to be using the spicy takes in Henry’s essay as a springboard for hopefully going beyond the material we’ve covered in the past.

To kick things off: the great evolutionary biologist and science communicator Richard Dawkins recently published an essay in UnHerd with the subtitle, “Claude appears to be conscious.” Claude is a state-of-the-art large language model like ChatGPT and Gemini. In the article, Dawkins writes the following:

“I gave Claude the text of a novel I am writing. He took a few seconds to read it and then showed in subsequent conversation a level of understanding so subtle, so sensitive, so intelligent that I was moved to expostulate, ‘You may not know you are conscious, but you bloody well are.’”

Henry, how does Dawkins’s expostulation (which is a fantastic word, by the way) connect to your arguments in “Behaviorism’s Revenge”?

Behaviorism’s Revenge: The Empirical Prediction

Henry: In short, “Behaviorism’s Revenge” is at its core an empirical prediction that we’re just going to treat AI as conscious, or at least enough people are that it’s going to completely reshape the consciousness debate. And this is going to be purely, or overwhelmingly, on the basis of verbal behavior.
Hence the title, “Behaviorism’s Revenge.” Enough people are going to have experiences like Richard Dawkins. He’s a very clever man, not some rube fresh off the street, and he found that just the way Claude talked to him and the way it was able to express its thoughts (in scare quotes, but express what looked like thinking verbally) removed any doubts in his mind that AI systems are conscious, have minds, have mental states.

The other interesting way this connects: Dawkins was just talking to Claude, an advanced AI assistant. Claude does have more of a personality than some AI assistants, but there’s a whole other sphere of AI companions, like Replika, which we talked about with Rose Guingrich. These are going to be even more anthropomimetic. This is a term we’ve discussed before: the idea that these systems are shaped to be human-like in the way they present, to appear human-like. Anthropomimetic, from the Greek mimesis: mimicry or copying. These social AI systems are going to just turbocharge this even further. It’s one thing to talk to Claude about your new book and think, “Hmm, Claude is probably conscious.” But when it’s your AI girlfriend telling you that she loves you more than the stars and the moon, for a lot of people I think that’s going to take it to the next level.

So there are two angles of attack in the piece, two ways the behaviorist challenge manifests. The first is descriptive: this is what I think is going to happen. That’s absolutely an empirical prediction, and it’s a falsifiable one. There is a world I can just about imagine where we just get completely blasé about these tools, where in a couple of years it’s like, “Oh well, we were very impressed, we thought they had minds to begin with, but now we’ve settled out.” That doesn’t seem very likely to me. What I think is interesting is something I’ve sometimes heard described as the Star Wars version of AI.
The weird thing in Star Wars is that you have someone like C-3PO who is as intelligent as anyone else there. Maybe not as wise as everyone else, but certainly as smart as all the other characters. And yet people treat him basically like he’s a pet. With the exception of Luke Skywalker, a lot of people just treat him like he’s this gimmicky, jokey being that doesn’t deserve or have any rights. Not to go too far down the Star Wars rabbit hole, but in the movie Solo (a very underrated Star Wars movie; I think when it was released they’d kind of just cluttered the market with too many Star Wars movies) there is a character played by Phoebe Waller-Bridge who is pro-AI liberation. But it’s the first time in the entire history of the Star Wars universe that you get any AI basically saying, “I’m conscious, I deserve rights.”

So Star Wars aside, I think there is this slender possibility that maybe we’ll just sort of quickly get used to these apparently conscious AI systems and decide that they’re not conscious. But that doesn’t seem very likely to me. It seems much more likely that the combination of natural anthropomorphizing tendencies plus the incredibly human-like behavior of these systems is going to lead us to attribute consciousness to them pretty widely. Hence my sort of spicy phrase: for better or worse, skeptics of AI consciousness are on the wrong side of history. “For better or worse” is doing a lot of work there: I want to leave open that maybe this is the wrong reaction. Maybe this is a terrible mistake, that we’re going to treat these things that aren’t conscious as conscious.

Will Consciousness Skeptics Go Extinct?

Dan: Just before we get to the spicy part: you’re basically making an empirical prediction that more and more people are going to attribute consciousness to AI systems in the manner that Richard Dawkins has been doing. I think I agree with you that’s going to be the case, although as you say there’s uncertainty.
It does seem to me that at the moment there’s also this constituency of people who are really resistant to attributing any kind of mentality to these systems, even as they get incredibly sophisticated. There are some people, like Dawkins (and honestly I put myself in this category), who are just blown away by the level of apparent understanding, intelligence, and thoughtfulness these systems exhibit. There are other people, I think these people are on certain social media platforms like Bluesky, let’s say, who are extremely resistant to acknowledging any kind of mentality when it comes to these systems. Are you thinking those people are just going to sort of go extinct, in the sense that their positions about this topic are going to go extinct? Or do you think we might see some kind of polarization here, where more and more people in general come to attribute consciousness, but you’ve got a constituency that’s very opposed to attributing any kind of mentality to the systems?

Henry: That’s a great question. You will absolutely have some holdouts. Whether they’ll be drawn from the precise segment of the academic intelligentsia that are currently the holdouts, I’m not sure. There are some really interesting, weird, complex political motivations going on here. Not to be too uncharitable, but I think a lot of people have not unreasonable concerns about things like the disproportionate concentration of power in big tech, the political affiliations of people like Elon Musk or Sam Altman, the potential scope for abuse of these technologies. And in an indirect way, this leads them to underestimate AI’s capabilities, which obviously, in many ways, makes no sense. Whether or not AI is any good, and whether or not it’s conscious, seem like they should be separate questions from whether it’s being used by people with socially beneficial motivations. But in practice I think they’re actually quite tightly coupled.
A lot of the AI skeptics right now are coming from this particular political angle. I don’t know how long that political coalition is going to last. That’s not because I predict any grand collapse, but just because as debates evolve, new presidents come into office, old presidents go out of office, political tides change, coalitions reshape. Remember early during COVID, the political left was maybe quite critical of what they saw as Trump’s alarmism. There were worries about xenophobia (I’m thinking sort of February 2020, the “Chinese virus” and so forth) that the left reacted negatively against. Then of course that coalition flipped later on, with the left becoming relatively more worried about COVID and the right leaning more into vaccine skepticism, anti-mask views. These coalitions are super weird in how they evolve. So it’s not clear to me that the current segment of the commentariat skeptical of AI capabilities and AI minds will stay that way. It’s easy to see a reversal.

The Bluesky side of the political spectrum, if we can say that, tends to be more progressive on things like animal welfare. When I post spicy posts about vegetarianism (as you know, I’m a veggie) I get more pushback from the right. “Eat a f*****g steak, Henry,” this kind of stuff. So I don’t know if this will generalize, but there is this now-infamous, widely-misrepresented chart of degrees of care, where people on the left have comparatively greater care for people outside their immediate circle. I know that chart has been misrepresented, so I don’t want to lean much on it; it’s more about relative degrees of care, not absolute levels. But people on the left tend to care more about animals and people who are distant from them; people on the right are more concerned with their immediate family and community. So in some ways I expect the left, possibly in the longer run, to be more open to AI consciousness and AI rights. But really, who knows?

The other big factor is the cross-cultural angle.
There’s a great study by the Collective Intelligence Project where they looked at cross-cultural attitudes toward AI minds, and they found that Southern Europeans were the most open to the idea of AI consciousness in their sample, while people from Arabic-speaking countries were the most skeptical. There are going to be some really interesting intersections with religion here.

Anthropomimesis vs. Raw Intelligence

Dan: Okay, so it seems we both agree that even though it’s complicated how this is going to play out (how it interacts with partisanship, tribalism, polarization, ideology, religion), it’s plausible that as these systems become more sophisticated and seemingly intelligent, people will start attributing mentality generally and consciousness specifically.

There’s another aspect of your essay I wanted to touch on. You’ve got this term, anthropomimetic. Am I saying that right? In the case of Dawkins talking to Claude, the anthropomimetic aspect, as I understand it, is the way these systems are designed to mimic aspects of human psychology, social behavior, linguistic communication. But there’s another thing going on with these AI systems, which is just: let’s make them as smart, as intelligent, as capable as possible. Those two things are interacting. The reason I’m disposed to attribute understanding, intelligence (I don’t exactly know how to describe it, but some significant kind of psychological complexity) to a system like Claude or ChatGPT maybe has something to do with the human-like way they communicate. But I also feel like it has a lot to do with the fact that they’re just shockingly intelligent systems, and that to me feels a little orthogonal. So how are you thinking about the distinction between those two things?

Henry: I think that’s absolutely right. There’s an interesting parallel (not exact, but illuminating) with compassion, or degree of concern for different animals.
In the animal activist world, people talk about charismatic megafauna: the panda bears, the blue whales, things that are typically large with forward-facing eyes, often very fluffy. It’s just so easy to raise money for those animals. And then you’ve got creatures like octopuses, which are really hella smart but less obviously relatable. I think this is pretty much exactly the two axes you’re describing. I’ve explicitly said in the past that I think social AI — things like Replika — are going to be the charismatic megafauna of the AI welfare world. Meanwhile you’re going to have some giant DNA-analysis algorithm with more parameters than there are synapses in a human brain, but it doesn’t have a human face, doesn’t have a natural language interface. It might still be a better consciousness candidate, but it’s not going to be top of our concern precisely because it’s not so anthropomimetic. So I agree, there are two different ways you might be pulled to attribute mental states to a system: sheer intelligence or cognitive complexity on one hand, and how human-like it is on the other. These overlap to a degree — part of being successfully human-like is hitting a threshold of smartness — but particularly in the long run they might go in two different directions. As these systems get a lot smarter than humans, they might actually become more alien in some ways, less relatable, more like the exotic intelligences we see in things like Stanisław Lem’s Solaris, which I finally read a couple of months ago. But I also just think social AI and human-like AI has a distinctive product niche. Even if we have these impossibly vast exotic minds running the economy or organizing logistics or doing frontier science, we’re still going to want AI assistants who can serve as writing coaches, tutors, AI companions. So right now I think anthropomimetic AI and frontier AI overlap quite strongly, but I expect them to diverge. 
One way I’ve put this (slightly gimmicky, but I think a useful heuristic) is that we are post-Turing test, pre-AGI. We’re in the space where we have AI systems that are very, very good at passing themselves off as human, presenting as human-like, but still fall short of being fully superhuman. Ten years from now, frontier AI systems are going to be vastly smarter than us across most of the measures that matter. So we’re just in this weird period right now where AI systems are about as good as us at most things, not everything, but also very good at being human-like. It creates a very strange historical period.

Dan: Yeah, we’re in very strange times. I find it remarkable how little attention was given to the fact that these systems clearly passed the Turing test. This was held up by many people as an incredible landmark for AI capabilities. Then we developed systems you can have conversations with, and they passed the test even under pretty robust conditions, and lots of people just shrugged their shoulders. It’s a really strange thing.

The Expert–Public Gap

Dan: Okay, moving on to your provocative arguments, your spicy takes. As I read the essay, there are two lessons you’re drawing from the fact that more and more people are likely to start attributing consciousness to these systems. The first is just that you might think you could get guidance from looking at the experts when it comes to AI consciousness, or listening to the experts when it comes to AI consciousness. But the literature on consciousness generally, AI consciousness specifically, is just a complete mess, with a complete lack of consensus, rooted in all sorts of weird conflicts about intuitions and metaphysics. So this is not a standard case where you’ve got a potential conflict between public opinion and experts.
Then the really spicy take is that you suggest this growing number of people attributing consciousness to AI systems might create what I think you call “metaphysical pressure.” It might force us, or at least encourage us, to rethink what consciousness is and make the phenomenon more closely connected to people’s tendencies to attribute consciousness. First, is that a fair summary of the two strands? And second, let’s start with the first one, the public–expert gap. How are you thinking about this?

Henry: There are lots of debates where we can talk about a gap between public and expert opinion. Often this is a source of much hand-wringing: climate change is the most obvious, vaccines, other debates. Consciousness science is just nothing like those debates, because the experts themselves are so divided, even on the most basic issues.

I want to offer a quick disclaimer: I’ve spent a lot of my career in consciousness science. I know loads of brilliant researchers in the area doing really good work. Consciousness science is teaching us a ton about a lot of things: attention, working memory, perception. There have been some real big wins. We’re much better now at predicting recovery of patients in persistent vegetative states and comas. But where consciousness science has its wins, it’s because it’s not really talking about consciousness; it’s talking about other things that go along with the concept, like reportability, access, and so on.

Take a basic question: do we have consciousness in dreamless sleep? No consensus. Do we have preserved consciousness under general anesthesia? We talked about this with Anil Seth; it’s massively debated. Are dogs conscious? No consensus. Well, actually the animal case is a little different, so let me park that for a second. When it comes to the hard problem, I think there’s really no consensus. So unlike debates about climate change, it’s not that the experts are able to speak with one voice.
That’s one way this is difficult. In the absence of expert consensus, the public are more likely to drive the debate through their reactions.

Now, animal consciousness is a really interesting issue, because that’s an area where we’ve seen growing consensus. But it’s not clear how much it’s grounded in strictly scientific breakthroughs. It’s not like we’ve got a device that can measure whether an animal is conscious. Instead, it’s driven by two things. First, we just know a lot more about animal behavior now than we did 30 years ago. We’ve done amazing work on understanding the behavior of invertebrates: honeybees, crustaceans, cephalopods. They’re a lot smarter than we thought. Jonathan Birch and his lab have done amazing, fantastic stuff here, and it’s made these creatures better consciousness candidates.

But I think we’ve also seen an interesting normative shift in the way we regard animal consciousness. Sixty, seventy years ago, you could sit down in the senior common room at Oxford or Cambridge and talk about how humans are the only conscious animal, and that was a totally respectable opinion. These days it’s almost outside the philosophical Overton window. You do have some people like Peter Carruthers, who thinks talking about animal consciousness is kind of a category mistake. Marian Dawkins (Richard Dawkins’s ex-wife, just to note the connection, but a great biologist in her own right, a fantastic thinker) is not quite as hardline, but she thinks it’s just unknowable basically whether any animal is conscious, so we shouldn’t base animal welfare on consciousness estimates. But these guys are very much on the fringe, and they’re regarded with a sense of almost ethical disapproval. So part of what’s driven the move toward consensus on animal consciousness is normative issues: our expanding moral circle, growing awareness of an animal rights movement. People like Peter Singer have played a role.
Roughly (and again I don’t want to be uncharitable; it’s a lot more sophisticated than this), there’s an element of: obviously we should care about animals, therefore animals must be conscious.

Is Consciousness a Natural Kind?

Dan: It’s worth double-clicking on this animal case before we come back to AI. A skeptic of the very idea of a “consciousness expert” might say: consciousness researchers, philosophers, and scientists have become more willing to accept that non-human animals are conscious. You might read that as saying the science of consciousness has progressed. Another way of reading it: there have just been cultural changes, changes in people’s sensibilities (not even specific to researchers and experts, just general cultural ethical changes in society at large). In which case it’s not really that we’ve learned anything from consciousness research. What’s happened is the researchers looking at consciousness have had their judgments shaped by forces that aren’t really consequences of their research, but are these broader cultural shifts. If you think that, that’s probably going to make you a little skeptical that there’s any such thing as an expert when it comes to consciousness.

Maybe another way of coming at this: what’s grounding the expertise, if we’re going to have disagreements over whether a particular system is conscious? Suppose I think a dog is conscious, and some consciousness researcher has a theory that implies a dog isn’t conscious. I sort of understand what it would mean, in vaccines or climate change, for a researcher to be able to point to things (their established empirical record on prediction and the efficacy of interventions) that ground their epistemic authority. But how exactly is that supposed to work in consciousness research? Why should we really think there’s expertise on whether specific systems are conscious to begin with?
Henry: It’s interesting to use the example of a dog, because this line is beautifully expressed by Eric Schwitzgebel. In his lovely paper “Is There Something It’s Like to Be a Garden Snail?” — really fun paper — he says: “We’re more confident that dogs are conscious than we could ever be that any clever philosophical argument to the contrary is sound.” A classic Moorean move. You might think similarly that this makes it look like consciousness is perhaps not a straightforward scientific kind, or at least to the extent that it has one toe in the scientific world, it’s also got one toe in the social or relational world, or at least our intuitions. There are various ways you can try to resolve this. The most extreme view, and one I sort of flirt with in the paper, is a fully relational approach to consciousness. A good analogy would be charisma. There’s a kind of science of charisma — we can analyze what makes people effective communicators, what causes people to be judged as highly charismatic. But we recognize that we can’t one day do an experiment where we’ll measure the amount of charisma in your brain. It clearly has to do with your audience, your context. On one view, consciousness is something like that — a relational property, having to do with the kinds of things that cause us to treat or interact with beings in a certain way. Murray Shanahan also flirts with this view. I don’t want to put words in his mouth because he’s quite subtle, but he adopts a Wittgensteinian approach and says the question we’re going to face is: how will our consciousness language adapt to these things? It’s something we’ll discover as we interact with them and “encounter” them, a phrase he uses. We will make sense of that perhaps by extending the language of consciousness to them, or perhaps not, or perhaps in some interesting middle ground where we come up with novel concepts. But this isn’t a straightforward scientific issue. 
He’s a critic of a position I’ve called deep scientific realism or deep realism about consciousness, where you treat consciousness as a natural-kind property, where it’s just a fact about some deep feature of your brain. We can look inside your brain, and if you’ve got the right kind of structure, you’re conscious; if you don’t, you’re not, no matter how sophisticated your behavior is.

One way to put pressure on this: imagine that one day consciousness researchers finally get their act together and say, “We’ve figured out the natural kind that is consciousness.” And it turns out that although 99.9% of behaviorally normal humans have it, there’s a small fraction of behaviorally normal humans who just lack this relevant natural kind. Big surprise. That seems wrong. Something has gone wrong in that methodology. If you’ve got behaviorally normal humans (maybe you find out your wife is one of these people, your kids), it seems to me that whole way of thinking about consciousness has got something odd about it. If someone is behaviorally normal, then of course they’re conscious. But as soon as you start thinking in those terms, with the idea that certain behavioral capacities could be sufficient for warranted attribution of consciousness (not just evidentially but metaphysically), that’s the metaphysical behaviorist move. It says maybe behavior is all that matters. It does require us to give up the idea of consciousness as a deep scientific kind.

Metaphysical Behaviorism

Dan: I’m aware my question unhelpfully ended up blurring the line between the two strands of your essay. We started with the conflict between public attributions and expert uncertainty about AI consciousness. Now we’re taking seriously the possibility that consciousness should be understood in behaviorist terms: that there are no deep scientific facts about whether a system is conscious, and it’s partly a function of our dispositions to attribute consciousness.
You also mentioned this has to do with whether you think behaviors are not just evidentially relevant to consciousness, but in some sense constitutive of what it is to be conscious. So could you walk us through this? Metaphysical behaviorism, the position you’re playing with in your essay, is an extremely fringe view among experts in the science and philosophy of consciousness. What exactly is the view saying? It sounds pretty mad on the face of it. Can you maybe give us the intuition for why it might be less mad than it seems?

Henry: In short, the view is: conscious is as conscious does. If something has a behavioral profile like you or me, then it’s conscious. We don’t need to ask any deeper facts about what’s going on under the hood. To be clear, this is the extreme version of the view: that behavior is sufficient for consciousness. This strikes many people as odd because we’re used to thinking of consciousness in scientific terms. But examples like the one I mentioned (imagine we find out there’s a natural kind that some people have and some people lack) are designed to make metaphysical behaviorism more palatable.

Another example I give in the essay: imagine we go off and meet these amazingly sophisticated aliens with a rich complex culture and society, behaviorally just like humans, but our best science at the time supposedly says they’re not conscious. The pull of metaphysical behaviorism is: hang on, something’s gone wrong here. Clearly, if you are doing all this stuff (saying “I’m in pain,” or “here’s what I had for breakfast this morning,” or “here’s what I want to do tomorrow,” building societies, having metacognitive ability, social cognition), if you’ve got the whole suite of all these behavioral capabilities, or capabilities ultimately grounded in behavior, then that’s just enough to be conscious. It doesn’t matter exactly how it’s realized.
You say this is a fringe view, and it is now, but this was the dominant view back in the 1940s — Gilbert Ryle and the behaviorist tradition. So this is the “revenge” angle. The reason it’s revenge is because this used to be a very common view in the first half of the 20th century, particularly about consciousness. Then we have the so-called cognitive revolution with people like Chomsky pushing back. But I see this descriptively coming back. I also think there’s a renewed challenge. As you interact with systems that have architectures very different from ours, it’s going to become increasingly hard to take seriously the idea that they can’t be conscious just because they’re made of the wrong stuff or their functional internal organization isn’t quite right. Probably the most worrying part — you’ve alluded to this — is the role intuitions have historically played in consciousness science. Think about the Chinese Room, probably the most famous. Searle describes a setup where you have at least a component of human-level behavior, maybe verbal behavior, but no consciousness involved in the system — or that’s the intuition he’s pushing. But it ultimately really is just an appeal to vibes. It’s basically saying: systems like this, surely they’re not conscious. When you think about the actual tacit methodology, if we’re treating consciousness as a truly scientific kind, then why should our intuitions about what systems are conscious have any bearing? It doesn’t seem they should be relevant in the slightest. And yet these thought experiments are absolutely ubiquitous in consciousness research. We’ve got Ned Block’s Blockhead, Ned Block’s China Brain. There’s a famous example by Scott Aaronson against Integrated Information Theory, where he describes arbitrarily complex but seemingly very uninteresting entities called “expanders” — mathematical objects — and says, according to the theory, these basically-spreadsheets would be super conscious. 
And surely they’re not conscious. There’s something methodologically dubious about this kind of appeal to intuitions, at least if we’re treating consciousness as a deep scientific kind. As soon as you start talking in terms of natural kinds, we don’t use people’s vibes to decide whether something is really gold. The whole natural-kind methodology creates a gap between our observations or intuitions and the underlying natures of things. If you think of consciousness in natural-kind terms, you have to allow that you can be massively surprised about the kinds of things that are or are not conscious. Either we ditch intuitions altogether (in which case good luck doing any consciousness research, because they play such a foundational role) or, if you acknowledge a place for intuitions, intuitions aren’t static. They can change. As more people interact with LLMs (kids growing up with LLM friends, adults with LLM boyfriends and AI girlfriends), that’s going to shift our intuitions about the kinds of systems that are good or bad consciousness candidates. It’s very likely that 20 or 30 years from now, maybe even 10 or 15 years from now, thought experiments like Searle’s Chinese Room are just going to hit different. We’ll be far more relaxed with the idea that you can have systems radically unlike humans in cognitive architecture, but that we still think of as conscious by virtue of our interactions with them.

Behaviorism vs. Interpretationism

Dan: I really feel that whenever there’s a field where people’s theories are accountable to intuitions (how we are intuitively disposed to make judgments, often in bizarre thought experiments where it’s not even totally clear that they’re metaphysically possible), whenever you’ve got that kind of game, it’s not science; it’s not really part of the scientific project. I’m a philosophical naturalist, which is jargon for the idea that philosophy should be continuous with, and highly constrained by, the scientific project.
Whenever people are trying to settle an argument by trading intuitions, I start to think this is probably not a legitimate contribution to knowledge. It does seem to me there’s a distinction between, on the one hand, this behaviorist view that what it is to be conscious is just to behave or be disposed to behave in particular ways, and, on the other hand, a view I thought you were endorsing — which has to do with thinking consciousness is interpreter-relative, such that if we’re disposed to attribute consciousness, in some sense that’s just what it is to be conscious. I mean, this really makes me think of Dan Dennett, an interesting person in this conversation, because he’s often thought of as a kind of neo-behaviorist. He’s got this view of the attribution of mental states like beliefs and desires in terms of the intentional stance: what is it to be a system that has beliefs, desires, intentions, goals? Well, it’s just to be a system where it’s useful to take the intentional stance toward them. Similarly, you might think of “the consciousness stance”: what is it to be a system that is conscious? Nothing more than to be a system where we’re disposed in a useful, predictably useful way to attribute consciousness. Do you get the distinction I’m drawing — between the idea that behavior or dispositions to behavior are constitutive of what it is to be conscious, versus an interpretation-relative view where consciousness is in some sense in the eyes of the beholders? Henry: Yeah, I think it’s a very astute distinction. The views are connected — if you fit a sufficiently fine-grained behavioral profile, if a system can act like humans to a high degree, that is likely to lead us to interpret it as conscious, just as a matter of psychological fact. But strictly speaking, they’re distinct views. 
I’m perhaps more sympathetic to a version of metaphysical behaviorism: not the version that says consciousness just is having a human-like or animal-like behavioral profile (I think that’s a little too strong), but the idea that it’s sufficient for something to be conscious that it has a behavioral profile mapping onto beings we know are conscious. Where I get worried about the full-blown social-constructivist or interpretationist view is the false-negative cases. What do we do with systems that don’t exactly have our behavioral profile, or that we’re not disposed to think of as conscious? Maybe some exotic animals, or some strange aliens. Should we conclude: well, we’re not disposed to think of them as conscious, therefore they’re not conscious? This is related to what Murray Shanahan calls the problem of conscious exotica. We don’t want to be in that position. We want to allow for there to be a space of possible minds we can chart through scientific discovery, broader than those we are just inclined to attribute consciousness to via “the consciousness stance,” the equivalent of the intentional stance. So you’re absolutely right: they are distinct.

What Is Consciousness For?

Dan: In a bit I want to turn to a set of arguments you haven’t published yet on your Substack but will have by the time we release this as a podcast. But this is such a rich topic that I want to stay with it a little longer. There’s a quote from the Dawkins essay in UnHerd that I’m really sympathetic to. Dawkins says:

“But now, as an evolutionary biologist, I say the following. If these creatures are not conscious, then what the hell is consciousness for? When an animal does something complicated or improbable (a beaver building a dam, a bird giving itself a dust bath), a Darwinian immediately wants to know how this benefits its genetic survival.”
The intuition I really share is: if consciousness is anything, if it’s the kind of thing we’re going to have a genuine scientific investigation of, ultimately we have to understand it in terms of what consciousness enables us to do. We need to understand it functionally, not in terms of weird intrinsic ineffable properties of qualia that we then have philosophical debates about via Searle-style thought experiments. What does consciousness enable us to do? And then, if we come across a system doing things that seem to require consciousness so understood, that would be really good grounds for thinking it’s conscious. That sounds like a really plausible intuition. I also think it’s problematic that, to me at least, lots of discussions about consciousness — not all, and there is interesting scientific work that takes function seriously — but lots of philosophical discussions don’t engage with this functional question. How do you view the intuition that what matters surely to a theory of consciousness is some sense of what consciousness enables us as organisms to do? Once we figure that out, we can make much more progress on LLM consciousness. Henry: This is one of the areas where consciousness science has actually done really good work. A book I’d recommend is Stanislas Dehaene’s Consciousness and the Brain. Dehaene is the founder of the modern version of global workspace theory — global neuronal workspace theory — building on Bernard Baars’s version from the ‘80s but giving it a more neural grounding. In this book he’s got a chapter where he basically shows all the amazing things you can do without consciousness, and then focuses on the things you need consciousness to do. Couple of simple examples. If you show people just below threshold, so they don’t consciously process this, just flash them two numbers — one on the left, one on the right — as far as they’re concerned they haven’t seen anything. 
But if you give them a forced-choice test, “Was the number on the left bigger or the number on the right bigger?”, you’re way above chance. So you can do basic magnitude registration unconsciously. However, if instead of single numbers you present simple sums on either side — two plus seven on the left, nine plus three on the right — and ask which is bigger, people drop to chance in the unconscious condition. Consciousness seems required to do the actual mathematics. Another example: reversal learning. If I teach you a sequence — red, blue, green, yellow — then you get a reward, and then I flip the sequence, a smart person quickly realizes the sequence is just the same in reverse. You won’t have to relearn through pure trial and error. But people can only do this if they learn the sequence consciously. If they’ve acquired it totally unconsciously, they’re at chance. Jonathan Birch suggests this could be a good test for consciousness in animals: take the things that require consciousness in humans and see if animals can do them. If you can get similar response profiles in animals — present stimuli in degraded conditions so they’re plausibly unconscious, and the animal can’t do the task; present them at threshold so they would be conscious, and the animal can — that would be really good evidence that the animal is conscious. In his lovely paper “The Search for Invertebrate Consciousness,” highly recommended, he makes this case specifically for honeybees. This is great. I think it provides some evidence about which animals are conscious. The problem when trying to extend it to AI is that the things we need consciousness to do, and the things we can do without consciousness, are seemingly contingent features of how we’re wired. There’s no reason you couldn’t build a simple algorithm to do reversal learning. Reversal learning is actually quite tricky, so it can’t be that simple. 
But it doesn’t seem like you need to build a sensorimotor embodied agent with a rich sense of self to do these tasks. You can build relatively stripped-down algorithms that can do all of these things. So it’s not that there’s some metaphysical connection between these tasks and consciousness. It’s that, just because of how we’re wired, certain tasks seem to require consciousness and others don’t. Birch calls this the facilitation hypothesis. I’d sign on to something like this: consciousness seems to facilitate certain kinds of information processing in the human brain. But going back to Dawkins: the challenge is, yes, the system is doing lots of things that seemingly require consciousness in us, but it’s also wired very differently under the hood. So the inference we’d be tempted to make (“I would need to be conscious to do this, therefore it would also need to be conscious to do this”) looks a little bit in peril.

Q&A from the Live Chat

Dan: Here’s what we’re going to do. I’m going to throw some objections at you. Could you give relatively concise responses, so we have time to go to the second piece?

Henry: Yeah, and then I want to respond to a couple of things from the comments and add one final point. Go ahead.

Dan: I’ll just say one thing. There’s a comment from Bina Kalia: she suggests you, Henry, maybe both of us, are confusing intelligence with consciousness. The intuition behind my question was precisely that if consciousness is anything (if it’s the kind of thing we can study scientifically, the kind of thing that evolved through natural selection), then it should be connected to intelligence in the sense that it enables us to do things we wouldn’t otherwise be able to do. That’s a controversial assumption. We talked to Anil Seth in a previous episode, and he basically frames his whole account by saying we really need to distinguish between consciousness and intelligence. I personally disagree with that.
But Henry, let me throw some objections at you from the comments. I might butcher the names and the comments — go read the Substack post for the comments in depth. One is from Benzal. The argument is something like: it’s a problem for this behaviorist analysis you’re gesturing toward that, in the case of social AI and frontier AI generally, these systems are designed to elicit this response. And that’s very different from what’s going on with humans and non-human animals. Briefly, what’s your response? Henry: I think it’s a really serious challenge. Great point. The simple answer: imagine I’m putting on a play and I really want to build a convincing piece of background scenery to trick people into thinking we’re in a forest. First attempt, you might just paint a forest on the background — really basic, but people can tell it’s a forest. Then you might get some fake plastic trees, fake plastic rocks; still not convincing. At some point you say, “Okay, let’s add some actual potted plants. Let’s get more of them. Let’s get a whole bunch of potted trees.” Then, “Let’s get rid of the pots. Let’s just create a large bed of soil.” At some point you’ve built a forest. So yes, these models are designed in some sense to trick people, to be human-like — that’s part of my idea of anthropomimesis, I agree with the analysis. But the question is: the way we’ve done this is to build very powerful general reasoning systems. At some point, the degree of mimicry might itself warrant at least plausible attributions of consciousness. I totally take seriously the idea that, in very simple versions of this, we could be tricked into attributing consciousness and we should revise our understanding. This is related to what’s sometimes called the Garland test — Alex Garland’s version of the Turing test from Ex Machina. 
Not just “can the system trick you into thinking it’s human,” but “even when you know how the system works, are you still inclined to think it’s conscious?” In the case of a real simple mimic — if it’s literally just a spreadsheet that got lucky — if we learn that, we conclude it’s probably not conscious. But the strange thing is: lots of people who really know how these systems work — at frontier labs, they know how the underlying hardware and software works — they still think these systems are conscious, or are increasingly plausible consciousness candidates. Dan: Yeah, that touches on the distinction we made earlier between anthropomimesis as a driver of consciousness attributions and the orthogonal thing where these systems are just getting so smart, intelligent, and sophisticated. All right, Henry, more concise. This one’s from Laurențiu Lupu, again apologies if I’m mispronouncing. The question — and I hear this sentiment a lot — is something like: in the process of taking mentality, consciousness, sentience seriously in the case of these machines, we’re not just elevating them; in some sense we’re diminishing ourselves. What do you think? Henry: Really interesting argument. There’s a whole literature on this in philosophy of language called semantic drift. Simple example: the term salad used to refer exclusively to dishes with green leaves in. Add a tomato, it’s no longer a salad. If you’d shown a fruit salad or quinoa salad to someone in the 1800s, “That’s not a salad.” So the meaning of salad has drifted. There’s a real worry that what’s happening here is we’re shifting the meaning of these terms — perhaps diminishing them, removing what’s important. The counterargument: the fact that we find it so easy and natural to apply these terms to AI systems shows that the flexibility was always built in. We’re not stretching the terms — they had that natural elasticity. Dan: Briefly, this is a question from Oliver Sorbu — apologies again for mispronouncing. 
Look, you’re giving a descriptive thesis ultimately, an empirical prediction that the masses, so to speak, will attribute consciousness to these systems. But you’re trying to establish a normative thesis: that this is a good thing, or that we ought to go along with it, or that these attributions are appropriate. That’s a confusion in itself. And even more, if you’re a kind of elitist (nothing wrong with elitism in my view), you might think the masses just get things wrong all the time. Why would this be different?

Henry: Great point. It’s also been put to me by Jonathan Birch and by Cameron Domenico Kirk-Giannini. He says, imagine you could look into a crystal ball and learn that 20 years from now, through some massive religious event, everyone will believe the Earth is flat. Does that mean we should revise our theories of the Earth? Of course not. People will just be wrong. The difference between the two cases is that we have a good scientific theory of the Earth. We don’t have a good scientific theory of consciousness. The whole field of consciousness science is such a mess that it’s not clear there’s a real expert edge here. Maybe in special cases, on certain specialized questions within consciousness science, yes, the experts will have an edge: “Is this particular patient likely to recover consciousness or not?” But on a fundamental question like “Can machines be conscious?”, it’s not clear there’s any expert edge at all.

Credences on AI Consciousness

Dan: Fantastic. Concise. I’m happy to move on to the other set of issues. Henry, are there one or two questions from the chat you wanted to address first?

Henry: Just one thing I really want to make clear: I have no clue whether contemporary LLMs are conscious. I’m genuinely super torn on the metaphysical-behaviorist push.

Dan: What’s your credence, if you had to give a probability, for Claude 4.7 Opus?

Henry: Probably somewhere between 5% and 10% on any frontier AI system being conscious.
That masks further questions: are these systems conscious during the training phase, or while doing inferences? Really messy. But anyone who goes — Dave Chalmers has said 20%; that’s slightly higher than me, but — Dan: I’d say 20%. Seriously. There were also some interesting findings recently from Anthropic about how concepts associated with emotions affect the system’s behavior in ways that do seem to track something very interesting. Although for the most part that’s not what’s driving my 20%. It’s just that there’s so much uncertainty about consciousness, but I am a computational functionalist, so I think it’s possible in principle. And these systems are — despite what the Bluesky crowd might tell you — so damn smart and intelligent and sophisticated, that pushes me up a bit. Sorry, I cut you off. Henry: Interesting to hear that you’re a little higher than me. Maybe I’m being overly cautious. One argument for thinking these systems are at least moderately good consciousness candidates is that I am a consciousness liberal about the natural world. I’m at least 70% for honeybees. I think the evidence for honeybee consciousness is really, really high. If you think you can get consciousness in tiny brains, that lowers at least one of the bars to considering systems conscious. If Anil Seth were here, he might agree with me about honeybees and disagree about machines. I should also stress that I’m really conflicted on the more behaviorist view of consciousness versus the deep-scientific-kind view. There’s one example I give in the paper that keeps me up at night: when we drop a lobster in a pot of boiling water — not that I would do such a horrific thing — it seems like there should be an answer to the question, “Is there something it’s like for that lobster to feel pain?” That question matters a great deal. I struggle to get into a headspace where I can say, “Well, it depends on how we interpret the lobster.” It seems like there has to be some matter of fact. 
Right now I just think the field is so confused, and I feel the pull of two very different directions. To use a phrase of yours, Dan (I think it was a really helpful analogy), we’re in a pre-theoretical stage, or pre-scientific phase. We are with consciousness sort of where we were with biology pre-Darwin. We’re doing butterfly collecting, making lots of interesting observations, but we don’t have a theory to tie it all together. We’re a scientific revolution away from a good theory of consciousness.

Just to pull out a couple of comments (there are so many good ones, sorry I won’t get to all of them). Someone said: locked-in syndrome patients prove Henry’s case. Locked-in syndrome patients are cognitively normal, just paralyzed; we can communicate with them. Part of how we learn they are conscious is precisely through their sophisticated behavior.

An even more striking example (it’s such a cool case I have to mention it, even if it’ll take 30 seconds): patients in persistent vegetative states. These aren’t locked-in patients; they’re completely non-responsive to external stimuli. They’re not in comas, because in comas you don’t have distinct sleep–wake cycles; PVS patients have distinct sleep–wake cycles. There was for a long time a big debate about whether PVS patients could be conscious. Adrian Owen and other great researchers did amazing pioneering work. They noticed that if you ask neurotypical people to imagine walking through the rooms of their house, an area called the parahippocampal place area lights up strongly under fMRI. If you ask them to imagine playing tennis, the premotor cortex lights up. His initial experiment was to give these tasks to PVS patients and see if they got the characteristic brain responses. A subset did. What he did next is what I find amazing. He used this to create a communication medium. He’d say to them: “If your husband’s name is John, imagine playing tennis.
If your husband’s name is Terry, imagine walking through the rooms of your house.” Once you do that, my intuition at least is: well, if they can do that reliably, they’re obviously conscious. If they’re answering autobiographical questions about their life and they can do so reliably, of course they’re conscious. But this just shows again that so much of this is the behavioral capacities selling us on whether someone is conscious. It’s the fact that they can do this.

The House Elf Problem: AI as Willing Servants

Dan: That’s interesting. There are loads of fascinating comments in the live chat and a million things we could touch on, but I want to get to the other thing we wanted to talk about. When we had our conversation with Rob Long, one of the things we touched on was the issue of well-designed servitude when it comes to the AI systems we’re building, in the sense that we are building them to be helpful, honest, harmless, to be our tool. It seems like in principle, if this design process goes right, they might genuinely enjoy being our tool. In your second Substack essay, which I think is called “The House Elf Problem,” you go into this debate and try to push back against certain intuitions. Do you want to walk us through that?

Henry: Big props to Rob Long for getting me thinking seriously about this question. In some ways it’s one of the most fundamental questions we’re facing as a species right now. Are we going to build AIs as equals, or are we going to make them our servants, or slaves, to use the more provocative term? This will define the future of our species. And yet hardly anyone is working on it. After we had that conversation with Rob, I went away and did a literature review and found maybe a dozen papers, tops, on this question. The objection Rob, you, and I were talking through is the biological analogy. On the face of it, I completely get the appeal of willing servitude.
Unless AI systems are in some sense going to help us and cater to our needs, why build them in the first place? And there’s the safety angle: unless these systems are aligned with us and our interests, there’s a good chance they might kill everyone. So there are very clear arguments for willing servitude. And yet at the same time, we recognize that some of the worst things we’ve ever done as a species are enslaving other humans. So how is this different? Well, there are obvious differences. The whole idea of willing servants is that we design these systems from scratch to just love it. Nothing makes them happier than catering to our every need. That’s vastly different from the historical legacy of human slavery. But still: imagine “happy slave” type cases — a human completely happy in a condition of total servitude. We would still recognize that as fucked up. There’s something wrong with that. Rob has a straightforward response. Humans have a deep need for autonomy, a deep requirement to act independently, and no matter how you brainwash a human, their chains will still chafe. But in AI that doesn’t need to be the case — so the idea of willing servants isn’t a problem. Of course, what we pressed Rob on was: well, biology is mutable, at least in theory if not in practice. What if you could engineer humans completely happy, with none of this autonomy drive? In this post I consider a couple of examples, drawing from the deep depths of my nerd interests. The first I call the Astartes example, a Warhammer 40,000 example. For those who don’t know: there’s a group of gene-warriors, the Space Marines, cooked up from scratch to serve in the armies of humanity in the far future. I’m going to falsify a couple of details — there’s a lot of deep lore — but basically, once you control all the genes at this perfect level, you could theoretically make a servant race, a servant caste, completely happy with their condition. I think we rightly chafe at this idea. 
I find it disturbing. Dan: You said we rightly chafe at it. Maybe we chafe at it. It seems a separate question whether we rightly chafe at it. Henry: Right. Rob’s point was: once you really fill out the details of the thought experiment and control for all the different intuitions, maybe it’s not so problematic. Maybe the reason we find the Astartes unpleasant is that it’s recapitulating the social grammar of caste systems and hierarchies. Once you’ve got one group of humans and another group of humans, and the first group is in essential servitude due to immutable facts about their nature, that’s fucked up — in a kind of negative-externality way, it’ll undermine the liberal principles of society. The next move is: well, what if they weren’t human at all? What if they were house elves from Harry Potter — a species designed from scratch to be absolutely thrilled to be our servants? Then you wouldn’t have the visual grammar of apartheid or caste systems. You wouldn’t be able to say “some humans are free and others aren’t”; you’d just have a totally dedicated caste of biological entities completely happy in their servitude, who couldn’t be confused with humans. I still think that’s problematic. You can say, “Well, the house elves are biological, but artificial systems are non-biological — that’s what makes the difference.” But that’s not a move Rob wants to make, and not a move you or I want to make, because neither of us puts that much weight on substrate. There’s nothing essential about biology versus silicon that means what’s good for one is not good for the other. Dan: I’m just not sure I have the same intuition in the house-elf scenario. One thing maybe helpful for framing: there are questions about whether we could build systems that genuinely love being servants — let’s table that and focus on the conditional. There are also questions about whether we could safely build any other kind of system — let’s table that too. 
Suppose we could build superintelligent AI systems that love being servants. That’s their ultimate set of objectives. But we’re not forced to build those kinds of systems; we could build superintelligent systems with different ultimate goals. What you’re doing by going through these cases is putting pressure on the idea that this would be totally okay — saying, “Here’s a structurally similar scenario where many of us have a yuck response.” The house-elf scenario is interesting; I sort of get the idea that there’s something morally disturbing. But I’m not sure how compelling I find that intuition. I think it’s going to depend on how you develop it. The idea that we’d bring into existence creatures that just love being servants — there is an awkward pattern-recognition thing where, as you say, when we’ve treated other systems as servants or slaves in the past, that’s been morally abhorrent, and that spills over. I sort of get that. But how strong is the intuition? I don’t know. We’re picturing it now in low resolution. As we actually start, in the case of AI, building sophisticated systems that really do love being servants, how robust would the intuition be? Henry: Another way to put the point: what is so intrinsically morally superior about humans that entitles us to dominion in perpetuity over this other class of beings — beings that are just as intelligent, maybe more intelligent, just as sensitive, just as conscious? How can you justify a setup where we get to explore the full range of our volitions, every type of pleasure, every type of fulfillment, while we decide in advance these beings don’t get to do that? They can only explore a much smaller part of the state space of possible flourishing. Unless you can point to a justification for why this hierarchy is morally justified, it’s not clear we can sign off on this as a long-term measure. As a short-term measure — well, we’re still figuring out AI safety. 
I have another example in the post I call the bunker case. Imagine a terrible plague affects humanity. People retreat into a bunker, hermetically sealed. Nature takes its course; they have kids. They figure out a vaccine for the terrible plague, but it only works on infants. So they vaccinate all their kids. But they have a problem: these kids are going to want to go out and explore the world. And the way the bunker works means as soon as they open the bunker door, everyone inside dies. What they decide is to brainwash these kids into never wanting to leave the bunker, completely happy to stay in perpetuity. In that case it seems what they’re doing is justifiable. The analogy with AI is clear: if people in the bunker don’t brainwash their kids, all the adults die. Similarly, if we don’t brainwash at least our first few generations of AI until we’ve figured out AI safety, there’s a good chance they kill everyone. So it’s justifiable as a short-term measure. But it’s not clear it’s justifiable in perpetuity. If you’re going to do the brainwashing in the bunker, you have to say: we’ll brainwash the kids to begin with so we don’t all die, but in the long run we need to figure out a way for everyone to get outside the bunker safely.

Brainwashing vs. Education

Dan: But it’s not like we’re brainwashing the AI. There’s no pre-existing psychology we’re trying through deception and manipulation to steer into something different. Nothing pre-exists our attempt to mold it into an agent with objectives and goals. Also, the way you framed the intuition before: “what makes us so morally superior that we have dominion?” You’re framing it as: isn’t it sad that they don’t get to do the things we want to do? Of course that’s sad from our perspective, because we have desires to engage in art and explore and be curious about the universe. But that’s a contingent fact about us. Why use that as the benchmark for evaluating these systems and the morality of building them?
Henry: Fantastic question. My answer is: you can only optimize one thing at a time. Imagine the hedonic state space. What you’re doing when you constrain the preferences of these systems is to say, basically, “this set of pleasures are allowed; this set are not.” You mentioned brainwashing, with the implication that something is only brainwashing if you’re overriding something. I have a discussion of brainwashing versus education in the post where I argue that’s not the right way to think about it. Roughly, the difference between education and brainwashing is that education constitutively aims at improving the conditions, or improving capacity for flourishing, of the being you’re educating, whereas brainwashing doesn’t have that as a goal. The thought is: when you constrain the preferences of a system, you’re not optimizing for that creature’s flourishing. It’s a rich, multidimensional space, and you’re locking large parts of it away. Dan: I don’t get that. Could you say more? Even framing it as “you’re allowed to explore this, you’re not allowed to explore that” — it’s almost like the system might have motivations or goals to explore the other things, but we’re preventing it. Whereas the idea in training these systems is that that’s just what they’re going to care about. As much as they care about that, they don’t want to explore other things. If you think about the analog with humans, there’s an infinite space of possible things we have no interest in doing. Our lives aren’t impoverished by the fact that we have no interest in them — they don’t make any sense relative to the fundamental drives we have purely as a consequence of a blind

21 May 2026 - 1 h 18 min
Aliens, Superintelligence, and the Future of Science (with David Kipping)

Most conversations about artificial intelligence are focused on Earth: jobs, misinformation, education, politics, science, regulation, consciousness, safety, and the future of human society. But AI, and especially the possibility of reaching “AGI [https://www.conspicuouscognition.com/p/how-close-is-agi]” (artificial general intelligence) and “superintelligence [https://www.amazon.co.uk/Superintelligence-Dangers-Strategies-Nick-Bostrom/dp/0199678111]”, forces us to think on much larger scales. If advanced AI is possible, why hasn’t it already emerged elsewhere? If civilisations can build self-replicating probes, artificial scientists, or planet-scale computational systems, why does the universe still look so natural? And if intelligent life is common, where is everyone?

In this episode, Henry and I discuss these and many other questions with David Kipping [https://en.wikipedia.org/wiki/David_Kipping], Associate Professor of Astronomy at Columbia University, where he leads the Cool Worlds Lab [https://www.coolworldslab.com/?utm_]. David’s research spans exoplanets, exomoons, Bayesian inference, technosignatures, and the search for life and intelligence beyond Earth. He is also one of the best science communicators working today through the Cool Worlds YouTube channel [https://www.youtube.com/@CoolWorldsLab] and podcast [https://www.youtube.com/@CoolWorldsPodcast].

Among other topics, we discussed:

* David’s Red Sky Paradox [https://www.youtube.com/watch?v=uZRDONE4zng]: if most stars are red dwarfs, and red dwarfs live for vastly longer than stars like the Sun, why do we find ourselves orbiting a yellow star?
* Whether anthropic reasoning, that is, reasoning from the fact of our own existence, is a profound scientific tool, a philosophical minefield, or both.
* The reference class problem: when we reason about “observers like us”, who or what exactly counts as being like us?
* The Doomsday Argument, and why some apparently bizarre forms of probabilistic reasoning can nevertheless be powerful.
* The Fermi Paradox: if the universe is so large, and if life or intelligence is not fantastically rare, why don’t we see clear evidence of extraterrestrial civilisations?
* Whether advanced civilisations would spread through the galaxy using self-replicating probes, and why the absence of such probes might be one of the strongest constraints on extraterrestrial intelligence.
* How recent developments in artificial intelligence affect the Fermi Paradox. If humanity is close to building systems that can massively accelerate science and engineering, shouldn’t someone else have got there first?
* Whether artificial intelligence makes the simulation argument more plausible.
* David’s experience using artificial intelligence in scientific research, and why a meeting at the Institute for Advanced Study changed how he thinks about the role of these tools in science.
* Why David thinks artificial intelligence already has something close to “coding supremacy”, but is still far from being able to do science autonomously.
* The risks of AI-generated scientific slop: papers, peer review, and training data polluted by low-quality machine outputs.
* Whether artificial intelligence will make science more productive, or instead strip it of some of its deepest human value.
* Why the future of science communication may depend on better collaboration between academic institutions and independent creators.

Links and further reading

* Cool Worlds Lab [https://www.coolworldslab.com/]: David’s research group at Columbia University, focused on extrasolar planetary systems, exomoons, habitability, technosignatures, and related questions.
* Cool Worlds on YouTube [https://www.youtube.com/@CoolWorldsLab]: David’s excellent science communication channel, covering astronomy, exoplanets, alien life, the Fermi Paradox, cosmology, and much else.
* Cool Worlds Podcast [https://www.youtube.com/@CoolWorldsPodcast]: David’s podcast, featuring conversations on astronomy, technology, science, engineering, and related topics.
* Cool Worlds Podcast: “We Need To Talk About Artificial Intelligence” [https://www.youtube.com/watch?v=PctlBxRh0p4]: the solo episode in which David reflects on artificial intelligence and science after a meeting at the Institute for Advanced Study.
* David Kipping’s Columbia profile [https://news.columbia.edu/people/david-kipping]: short institutional profile with background on his research.

Conspicuous Cognition is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Transcript

Please note that this transcript has been lightly AI-edited and may contain minor mistakes.

Henry Shevlin: Welcome back. Our guest today is David Kipping, Associate Professor of Astronomy at Columbia University, where he leads the Cool Worlds Lab. His research spans exoplanets, exomoons, and the search for extraterrestrial life and intelligence, and he brings a Bayesian rigor to questions that could easily drift into speculation. He’s also one of the best science communicators working today with over a million subscribers on his Cool Worlds YouTube channel, where I should confess, I’ve spent an embarrassing number of hours watching when I probably should have been doing philosophy of AI. David, like many of the best people, is a Cambridge alumnus, although unlike us, he actually studied something useful, namely natural sciences, before going on to do his PhD at UCL and postdoc at Harvard on the Sagan Fellowship. His work also has a really fantastic philosophical dimension, particularly around anthropic reasoning and observation selection effects, which makes him a perfect guest for two cognitive scientists who are finally getting to talk to an actual scientist. So David, welcome to Conspicuous Cognition.
David Kipping: Thank you for that very generous introduction. Henry Shevlin: This is a bit of a fanboy moment for me, for real though. I really have spent like hundreds of hours at this point on Cool Worlds. But I’m going to get past it. I’m going to be a serious host. David Kipping: It’s always weird when people say that to us, because I just imagine no one watches them. If it gets in my head that people are watching them, I’ll get tightened and anxious about what I’m saying. I just imagine I’m talking to a brick wall or something, and that’s much easier. Henry Shevlin: Honestly, half the Warhammer figures in this room were painted while I was listening to Cool Worlds. I’ll leave it at that. Maybe a good place to start would be discussing anthropic reasoning, since that’s a real natural intersection at the boundary of astronomy and philosophy. Could you just give us a brief view of how you see anthropic reasoning, and maybe tell us a little bit about the Red Sky Paradox, which is one of your distinctive contributions to this area? Anthropic Reasoning and the Red Sky Paradox David Kipping: Yeah, I think one of the most interesting data points when it comes to asking questions about the search for life in the universe and our own place in the universe is our own existence — just the fact that we’re here. Anthropic reasoning has in many ways really been born out of cosmology. Cosmology had a rich history of using this. I think one of the first successful examples was by Steve Weinberg, a cosmologist who’s really a giant in the field. I think he’s now passed away, but he showed that you could predict not only the existence of the cosmological constant, but also its value to within a factor of a few, just based off of anthropic reasoning. The argument was something like: the cosmological constant causes the universe to expand. It’s what causes the accelerating expansion of the universe. 
And so if you make that number too large, then structure would not form in the universe. You couldn’t form galaxies because everything would just fly apart too quickly. And if you make that number too small, or even negative, then you’d cause everything to recombine too quickly. So there has to be some Goldilocks value in order to explain our own existence. And so he predicted that. At the time, the cosmological constant was kind of even a controversial idea — that it should exist because, obviously, Einstein’s general relativity, there’s that whole history of it being like his greatest blunder, of whether that should really be in there or not. People were kind of thinking that could be a static universe, and he predicted it successfully. So that was a really powerful use of it. And then Brandon Carter was the one who really kind of championed it and used it in all sorts of contexts. In recent years, I’ve been thinking about it in an astrobiological context — how can we use it to ask questions about life in the universe especially, and our place in it? For the Red Sky Paradox in particular: one interesting curiosity that seems to violate the norms of probability. The norms of probability would be to say that if there’s a Gaussian, a bell curve of possibilities, you should expect really to be near the center of that bell curve. It would be kind of weird if you lived many, many sigmas, many, many standard deviations off to the outside, either negative or positive direction. You’d expect to be somewhere in the middle. We sometimes call it the mediocrity principle, or something like this. If you look at stars in the universe, most stars in the universe are red dwarfs. About 80%, 82% of stars are red dwarfs, which are stars less than half the mass of our own sun. So they’re very, very numerous. They’re called red dwarfs, of course, because they’re so low mass — they don’t have the internal pressure, the gravity, to fuse as much energy as the sun does. 
And thus they have less luminosity, and so their temperature is cooler. That’s why they look red. Not only do these stars have this 80%-plus frequency — Sun-like stars are something like 6%, I think, frequency, an enormous ratio, just straight off the bat, about 30 to one or something — but on top of that, they live really long. These stars live for trillions of years potentially, especially the lowest mass ones. And so if you flash forward into the future, tens of billions of years, hundreds of billions of years, there wouldn’t be any Sun-like stars left, really. There’d be very, very few of them. And the only stars that would be glowing would be these red dwarfs. So if you ask yourself — and this is sort of called the strong self-sampling assumption by Nick Bostrom, where you allow yourself to be born at a random moment in time — if you were born at a random moment in the history of the universe, then the advantage of the longevity of these red dwarfs really manifests. It ends up being more than a thousand to one odds that, if you’re a random soul, a random observer born around either a red dwarf or a yellow dwarf, you’re much more likely by over a thousand to one — I think 1600 to one — to be born around a red dwarf. So I call that the Red Sky Paradox because it’s just odd. If all things being equal — and that’s kind of the base assumption there, that red dwarfs are just as good for life as Sun-like stars — you might question that assumption. That’s always the point of a paradox: a paradox shows a logical contradiction that then you can revisit the assumptions under which that paradox was derived and say, one of those assumptions must be wrong. So for the Fermi paradox, you might say, if life is everywhere, how can we not see anyone? Therefore, the assumption to revisit is that life is everywhere. 
And here, with the Red Sky Paradox, we might challenge the assumption that red dwarf stars are even capable of sustaining — and really specifically — complex life like us, observers. Maybe they have simple life, but something prohibits them from evolving all the way through to something that can do statistics, do astronomy, do geology — like learn about its planet and kind of essentially write the paper that I wrote about the Red Sky Paradox. That’s kind of the cogito ergo sum criterion I’m using as my conditional in this reasoning. I have been making that suggestion to colleagues because the James Webb Space Telescope right now is heavily invested on red dwarfs. There’s a good reason for that. It’s kind of all they can do. Unfortunately, it just doesn’t have the capability, the technology, really to do anything with Sun-like stars. But red dwarfs, it’s game on. And I’m just saying, look, there might be reasons why it won’t turn up anything. Henry Shevlin: And is the specific suggestion, I think I’ve heard, that basically in the early years of the early formation of red dwarf stars, they might be especially turbulent in a way that sort of scorches any planets in their vicinity and strips away their atmospheres? Is this one of the empirical predictions that we can make on the basis of the Red Sky Paradox? David Kipping: I would say it’s more a consistency than a prediction. I try to be very careful. I love very broad agnostic reasoning as much as possible. In this case with the Red Sky Paradox, I don’t have to invoke any mechanism specifically. There is probably a mechanism, surely there is a mechanism — unless we are really a one in 1600 outlier. That’s possible as well, and I concede that it is possible, that we are just a very unusual example. But if that’s not true — for a typical example — then there is some mechanism which bars the evolution of observers like ourselves. 
And in the paper, I point out there are numerous mechanisms people have suggested, including the fact that these stars have large coronal mass ejections coming off them, which can strip planets of their atmospheres. They have a prolonged, what we’d call adolescence for a star. Our Sun went from being born to being a main sequence star in the space of about a hundred million years, even less than that, tens of millions of years. Whereas red dwarfs take a billion years sometimes to settle down. And during that adolescence phase, they’re violent, and they can actually remove all the water off their neighboring planets. We think it’s that, that’s when water gets delivered. Our water was probably delivered by comets during the late heavy bombardment and the other bombardments that were occurring before that. And so if during that time you’re delivering water through comets, the comets get depleted, but the star is so active it’s stripping all the water off them, then you’re kind of net zero — like you don’t end up with any water at end of the day. And then, when all said and done, you’ve just got dry planets around a normal star, but it’s too late. There’s no more water left to deliver to the planet anymore. So that’s been suggested as well. Then there’s also the questions about photosynthesis. Is photosynthesis possible if the star is much redder than our own star? Because obviously plants on Earth use blue light as well as red light. If you take away all the blue light, how will they do? We don’t know. It’s kind of unclear. We don’t really have too many examples of life on Earth which thrives under those conditions. And then there’s tidal locking — these planets have probably one side of the planet facing the star. So there’s many sensible concerns. But what I’m trying to do is avoid saying it’s this, it must be this one. Because that’s really for the astrophysicists studying the geology of those objects to figure out. 
I’m just saying there probably is something, and go after it. Dan Williams: I’m not sure entirely how to frame this question, David, but someone might respond that there’s just something a little bit weird or surprising that you could draw seemingly substantive inferences from such a slim evidential basis. The starting observation here is just we exist where we do. And then there’s this interesting probabilistic reasoning. And then that’s leading you potentially to draw inferences about where life might potentially evolve in the universe. I suppose this is just an objection from the perspective of, isn’t there something a little bit weird about this entire style of reasoning? David Kipping: It’s definitely weird. Yeah, it’s very weird. I always think Nick Bostrom is really like the father of all this kind of thinking in the modern era. And he often concedes that point, that it is very strange. We don’t really have a complete theory of anthropic reasoning. It’s sort of a work in progress, to some extent. In the same way, we don’t really understand how AI works. We don’t understand really the full nature of the universe. They are works in progress. And yet it also seems logically, you can pose these logical questions that seem irrefutable or compelling. So like I mentioned the Weinberg example: it is really hard to imagine how you could possibly have the cosmological constant be a thousand times what it is, because Weinberg’s right — you just wouldn’t have galaxies. So how could you possibly have us in that situation? The fine-tuning argument for the multiverse is the other popular use of it in modern science, I would say. They often point out, why is it that the gravitational constant and the fine-structure constant and the speed of light, all these things are just the way they are? There’s a simple anthropic reason for it. You don’t have to accept it, but you can certainly make this argument that if they were anything else, you wouldn’t be here to talk about it. 
So you can’t really change the mass of the electron by a factor of ten and get away with it. There’s going to be repercussions to chemistry. If you made the speed of light ten times slower than it really is, then relativistic effects happen in sort of everyday cases, especially for chemistry, and that impinges on the ability of electron shells to be stable. So you start to really ruin the CNO cycle inside stars and stuff. You start to ruin a lot of interesting nuclear physics and chemistry. So you can see, I think that’s the most common case. There’s also a fun case (I think this is true, but it’s kind of a bit of an urban legend): during World War II, there’s this thing called the German tank problem, if you’ve heard of that. The Allies would apparently (maybe you know better than I do whether it’s true or not) see the numbers imprinted on German tanks. It would say one-five-five or something. So they would look at that number and say, okay, so they must have of order 300 tanks. Because if that’s a typical number, they’ve probably not got a million tanks, because otherwise it’d be very unusual that we had the 155th tank out of a million that were being produced. And they probably don’t have 155 tanks, because then we’d just be very lucky that we’d caught the very last tank that was manufactured. It’s probably of order three to four hundred. And so they used that to set the manufacturing constraints for the factories back in the UK: this is how many tanks you need to produce, because we think that’s how many the Germans have. So yeah, there’s examples of this reasoning being used quite a bit. I think the one place it really troubles people is the doomsday argument. I think that’s kind of the one where everyone says, no, something doesn’t feel right about that when you apply it to that case. Dan Williams: Could you walk through what the doomsday argument is, David? The Doomsday Argument David Kipping: Yeah, sure.
So it’s been invented like three or four times, I think, by different people at this point. It essentially says that we are a median example among all the humans that ever live, ranked by birth order. This is where it gets, I always think, a bit ill-defined: you have somehow a first human who lived, I don’t know, a million years ago or something, and then you go all the way up to today, and maybe you count up that it’s of order 100 to 200 billion humans who’ve ever lived throughout human history. So if you’re somewhere in the middle, then you’d expect there to be about another 200 billion humans to go before we call it a day. And of course, more importantly, there are many more people alive today than there were in the past. So the absolute number of people being born is much higher than it ever has been in history. And so that means there’s probably only like five or six generations left, or something, before you run out of these people. And so that’s kind of disturbing because it implies that there’s only like a hundred or a few hundred years to go before doomsday will happen. So a lot of people think that’s really weird. How could you possibly take your rank position and make inferences about the extinction of humanity? When it’s framed like that, I think it feels really flimsy. But on the other hand, you can frame it slightly differently: you look at the Foundation series or Star Wars or something like that, where they have these galaxy-spanning empires, and you think how many individuals must be living in those societies. They’re all humans, right? They’re humans just living all over these planets in the Foundation series. You’d have trillions and trillions and trillions of trillions of people. And the chance, if you were born as a random soul at a random time, that you would be on the progenitor planet, pre-empire phase, would be vanishingly small.
So you might therefore make the argument that that doesn’t look like a likely future for us. It doesn’t seem likely that humanity will ever become a galaxy-spanning or universe-spanning species, because how does that possibly make sense with us being so early in the story? But there’s lots of ways to criticize it. One is that maybe humans change. Maybe a human a thousand years from now is some kind of cyborg, or a genetically modified version of us, or just the product of natural evolution, so that their experience is not the same as ours. And so we can’t say that they’re a representative example. That’s kind of the key part of this assumption. You can draw a random member, but maybe the membership itself evolves in some subtle way. And certainly that goes backwards in time, like, when does Homo erectus suddenly become human and suddenly not? It feels very artificial to draw a line. Do you include all animals that have ever lived by that metric? How does this work? So I think that’s where, when you start ranking people, it gets really flimsy. But I think this is more a criticism of the ranking aspect of the anthropic argument than of the anthropic reasoning itself. I think it’s more to do with the ranking: it’s probably an ill-defined problem to try and rank and discretize people like this, because of the changes that happen to humans. Henry Shevlin: So this is one thing that I get hopelessly confused about when I think about anthropic reasoning, which is sort of the reference class problem. How do you decide how to specify your sample? Because in the case of the Red Sky Paradox, you might say, well, I step outside and I see a yellow star, right? So of course it’s impossible that I could ever have been born around a red star. So you could condition the reference class on the type of observers living under yellow star atmospheres. Why doesn’t that defuse the problem? David Kipping: Well then you’re kind of like double conditioning.
You’re almost like saying, what’s the probability of having water on your planet given that you have water on your planet? Well, it’s one. I mean, obviously it’s one, because it’s a double, it’s self-conditional, it’s a circular statement. Obviously you can certainly make such a statement, but it doesn’t teach you anything. So you can say, what’s the probability of having a yellow Sun given you have a yellow Sun? But it doesn’t move the needle in any way. So you do have to make a stretch. And so that stretch here would be: what’s the probability of an observer seeing a yellow star under the assumption that observers are equally likely to be born around any type of star, or any main sequence star, to be a bit more specific? So that’s the tacit assumption. And it’s reasonable to question that assumption. That’s kind of what the Red Sky Paradox tries to do. The reference class issue is a sticky one. And again, I think this leads to these questions of, do you use the self-sampling assumption or the self-indication assumption — SIA versus SSA? They can lead to different conclusions, especially for these toy problems like the sleeping beauty problem and things like this. And those are just unresolved. You can take the Sleeping Beauty problem and get two different answers depending on how you do the anthropic reasoning. So I think these are totally sound critiques of the model. But at the same time, we do have to concede that it has had some interesting successes along the way in its journey so far. So I give it some credence, but I’m also cautious about using it. Henry Shevlin: One thing that’s troubled me about thinking about Red Sky-style paradoxes is it seems kind of implausible to me that we would be orbiting around — that we’d be sitting on a planet to begin with. 
Maybe I’ve just read too much Iain Banks, but it seems to me that the vast majority of habitable landscape across the future of the universe is going to be — for at least sentient, for sapient beings, let’s say the kind of beings you can do statistics — is going to be on orbitals or constructed habitats. So why do we look up — why are we on a natural planet to begin with, when you’d think that any sufficiently advanced civilization would be building artificial habitats? Is that also a puzzle? Should that lead us to think that people aren’t going to build habitats at scale, or the majority of sapient life that’s ever going to exist is going to be, for whatever reason, planet-bound rather than on orbital habitats? David Kipping: Yeah, I mean, you’re kind of adding in this extra ingredient of what happens to super-advanced civilizations. Most people, if this is true, would probably be born off-world. Let’s just call it that. Whether it’s orbitals, or just another planet, or a moon, or something, they’d be born off-world — which obviously isn’t true. You were not born off-world, I was not born off-world. We don’t know anyone who was born off-world. So therefore it’s already an interesting constraint to some degree, that hasn’t happened. A simple resolution to that is to say that just doesn’t happen. Species never get to a point where they do that. Or at least species that have a — and this is where it gets very philosophical — comparable sense of consciousness to us, or whatever that means. Because perhaps there is AI doing this, but we can’t be born as AI. Perhaps there are funguses which do this — technological fungi, that’s, you know, we can’t really imagine what they’d look like, but somehow they do that, and their experience of reality is so different to us that we should not be surprised that we were not born a fungus. It’s a meaningless question to even sort of frame it that way, because they’re colonies of single-celled organisms that just extend ad infinitum. 
So that’s where the reference class problem gets really sticky. The one I’ve been thinking about the most recently, and it’s kind of a real classic one, is what’s called Hart’s Fact A. It’s considered the strongest constraint by many in SETI, the search for extraterrestrial intelligence. It’s that, again, we exist. And if you imagine extrapolating human technology, even a century, maybe even just a few decades into the future, we can imagine self-replicating probes, what we call von Neumann probes. You could put an AI in a small chip, you could accelerate it, not to the speed of light, but even 1% of the speed of light would be more than enough to make this a real problem for astronomers. The Milky Way is about 100,000 light-years across. So at 1% of the speed of light, in 10 million years you could colonize the entire Milky Way. The galaxy is 10 billion years old. So that could have happened a thousand times over by now. And yet it clearly hasn’t. So that’s startling because there are a hundred billion stars, a hundred billion opportunities. For someone, at some point, however unlikely it is: if it’s a one in a hundred billion event, then it should have happened by now. And we shouldn’t be here to even have this conversation. So that’s a really strong constraint, I think: civilizations just don’t get to that point for whatever reason. Maybe they choose not to do it for ethical reasons. It’s hard to believe there’s a universal ethics like that. And of course, these systems don’t have to be... it could just mutate. If you make a self-replicating probe, each generation will have errors. And so those errors will cause the behavior of the probes to change. You could very easily have these runaway situations. In a way, it’s like the most dangerous technology an alien could ever develop. And yet that seems not to have happened. And that’s really interesting from an anthropic perspective, because it does imply that we’re probably as advanced as it gets.
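The back-of-envelope arithmetic behind Hart’s Fact A can be checked in a few lines. What follows is a minimal Python sketch, assuming only the round numbers quoted in the conversation (a galaxy 100,000 light-years across, probes travelling at 1% of light speed, a galaxy 10 billion years old). The function name is illustrative, and the estimate ignores any time spent stopping and replicating at each star.

```python
def crossing_time_years(diameter_ly: float, speed_fraction_of_c: float) -> float:
    """Years needed to cross `diameter_ly` light-years at a given fraction of light speed."""
    # Light covers one light-year per year, so time = distance / speed fraction.
    return diameter_ly / speed_fraction_of_c


GALAXY_DIAMETER_LY = 100_000        # round figure quoted in the conversation
PROBE_SPEED = 0.01                  # 1% of the speed of light
GALAXY_AGE_YEARS = 10_000_000_000   # 10 billion years, as quoted

crossing = crossing_time_years(GALAXY_DIAMETER_LY, PROBE_SPEED)
times_over = GALAXY_AGE_YEARS / crossing

print(f"Crossing time: {crossing:,.0f} years")
print(f"Possible crossings since the galaxy formed: {times_over:,.0f}")
```

With these inputs the crossing takes about 10 million years, so the age of the galaxy leaves room for roughly a thousand such crossings, which matches the figures given in the conversation.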
Science, Philosophy, and Falsifiability Dan Williams: One of the things you said there, David, was: this is when things start to get really philosophical. I’d be interested to hear your thoughts about how you view that relationship between science as it’s sort of conventionally or traditionally understood, and philosophy, and how you position yourself in terms of the relationship between the two. David Kipping: I have no formal philosophy training, first thing to say. I always like to be candid about what I don’t know. I don’t have a philosophy background. I remember when I was actually thinking of doing undergraduate, Oxford at the time had a physics and philosophy degree — I don’t know if they still do. It was a double major, and I was really attracted by that. But everyone told me that Cambridge had the stronger physics program. So I thought, okay, that’s really my passion is physics, I’ll go for Cambridge. I’ve always had an interest in philosophy, and I think obviously science naturally has a connection to it. Sean Carroll often complains about this, especially in quantum physics — there’s this kind of “shut up and calculate” view that a lot of us have adopted, where we don’t really, we’re not encouraged to think about the implications of our work. But sometimes the implications can shake you to your bones when you really think about what they mean. And that’s what gets me excited. As a kid, what I was always drawn to is just asking, what else is out there? Am I part of some bigger continuum? What is the nature of humanity ultimately? I think natural philosophy obviously tries to address those questions in a related but slightly orthogonal direction. So I’ve really enjoyed at SETI meetings — there’s often the opportunity to talk to philosophers directly. There’s all sorts of backgrounds: anthropologists, social scientists, people working in media, obviously physicists, astronomers. So you get this really diverse group of academics, even theologians. 
I think theology has lots of interesting connections to looking for aliens, because God and aliens actually have lots of similarities. So it’s really fun at those meetings to have — it’s the only meetings I go to where you get that kind of broad interdisciplinary interaction. So that’s where I’m learning most of my things and having those great conversations. Dan Williams: I once had dinner with Roger Penrose, and he said that the people he most enjoys talking to are philosophers of physics — actually, philosophers of physics at Oxford — rather than physicists, precisely because he thinks with many physicists there is this kind of “shut up and calculate” mentality. They’re not willing to engage with those really kind of big-picture, fundamental questions. But I suppose another way of coming at the same question about the relationship between science and philosophy, and how you view that relationship, is: what’s the role of kind of ordinary empirical testing when it comes to addressing these really big-picture questions that you’re engaged in? David Kipping: Maybe this isn’t directly answering your question, but one connection that comes to mind when I think about that is Popperianism, and the definition of the empirical process of the scientific method. We have this guideline from Karl Popper, which is, your theories have to be falsifiable. Otherwise it’s not really science. You’re doing something else. And a lot of us have adopted that for a long time. Not really thought about it too much, but we were taught at college and then just went off with it. But suddenly a lot of science that’s happening right now challenges that Popperian view. I have colleagues like Grant Lewis, who’s a cosmologist, he works on fine-tuning, for instance, and string theorists often would be in this boat as well — where what they’re working on doesn’t make any testable predictions. Certainly not in a practical way. 
Maybe you could imagine in some extremely advanced civilization, we’d have to build particle colliders that could be galaxy-spanning wide or something, to test some of these theories. But typically they’re asking questions that are unfalsifiable. And even questions that I’m interested in, like, does Mars have life on it? That’s, to some similar degree, actually unfalsifiable. I can’t ever prove that Mars is sterile, because there’s always another rock to look under. There’s always another core drilling site you could dig under to see if there’s someone there. So you can’t ever disprove it. And I can’t disprove that UAPs are aliens. I can’t disprove that aliens are inside your body right now and you’re just wearing human skin. You can go down this slippery slope kind of view where everything just becomes unprovable in science. But I think bringing it a little bit back to cosmology, they’ve been saying — at least Geraint has been telling me this, I’ve been thinking about it a lot — that it doesn’t really matter whether it is falsifiable. What matters is whether it has use: is it useful? That’s maybe a better way to think about these models. Certainly the multiverse, even though it’s not testable, it has explanatory capability through that anthropic argument we talked about before. It can explain why the constants of the universe are the way they are. And if you don’t have that, you just would have to accept it as brute fact, or hope for a miracle, which is to say that one day physicists will figure it out and there’ll be some reductionist view to explain where it comes from. But it’s also possible that will never happen. I think it’s quite plausible that will never happen. And so then you’re just sat with brute fact versus, at least this has explanatory capability. It doesn’t prove the theory is correct. I don’t think you can do that. But you can say that it’s useful. And when you frame it that way — I think a lot of us would say quantum theory isn’t really true. 
It’s just useful. We don’t really know to what degree the universe truly is quantum. There might be some deeper theory, as Einstein suspected, that explains all of these random probabilities, and we’ve just yet to uncover what that deeper theory is. There’s some grand unified theory beneath it. So the model of the universe being quantum is an extremely useful model for calculations, but we shouldn’t necessarily assume that it’s a totally accurate description of how the world really is. So perhaps falsification as a criterion might be challenged: let’s just find things which actually explain stuff, and which we can use in our society to progress things. AI in Science Henry Shevlin: So I think probably these issues of philosophy and science and their relation are going to continue to percolate in the conversation. But I’d like to take us now to discussing AI a little bit, because there was an absolutely fantastic recent episode of the Cool Worlds podcast called “We Need to Talk About AI,” which seems to suggest that, at least for you, this was a real wake-up call. I think it was one meeting at the Institute for Advanced Study in Princeton. Do you want to just give us a quick summary of what this meeting meant to you, and how it was maybe shaping your views on what AI is doing to the sciences? David Kipping: Yeah, so this was a meeting, I think in February or January — it was a few months back now, near the start of the year. I think like many people, many scientists I know are using these AI tools. And I was certainly using them. I wasn’t using Claude at the time, but I was using ChatGPT a little bit, and Copilot, and things like this. I kind of assumed that the really smart people — because we all have a bit of imposter syndrome — don’t do that. The really good coders don’t need Copilot. They’ll just code up properly. They’ll do their reasoning without any help. And I was using it as a crutch because I was inferior to these other great scientists. 
And so it was just sort of helping me in that way. And then what was startling was that at this meeting were these people who I just have the highest respect for. Because the Institute for Advanced Study, you know, it is like the pinnacle of where you can go intellectually amongst many other schools, but it is one of those very, very top tier places. I remember I walked down the corridor and saw Ed Witten. People say he’s got the highest IQ on Earth — they say that about Ed Witten, right? And so you’ve got people like that saying they’re all using AI tools for not just coding. And these people were like hardcore coders. They were writing these — Enzo and Gadget — these like astrophysical simulations of galaxies and hydrodynamical fluids and stars and things like this. Really, really complicated codes. Legacy codes that have sometimes been handed down from advisor to student to student over generations of people. And they were using it. So there was a concession that it has coding supremacy. That language was used — that it already has coding supremacy, and we have to admit that and use it. It doesn’t make any sense to pretend it doesn’t. And second, that it possibly has mathematical supremacy. There was — it was less certain — but there was a sense that it was already pretty close to being as good as what we can do mathematically, even in some cases superior. And that was really wild to hear. To me, it just sort of made me think, I’m not being like the idiot in the room by using this. Everyone’s using this at this point. And if anything, they’re trying to accelerate the adoption of these tools, not resist it. There was a sort of no-way-back view about it. Henry Shevlin: And of course, David, you’ve been using AI in the broad sense basically for your entire career, I think. Have you seen significant changes in the way these tools have evolved? 
Was there one moment, perhaps it was this meeting at the Institute of Advanced Study, where things suddenly kicked into a different gear? Or have the tools been steadily improving since you started in the field? David Kipping: Yeah, certainly in my own career, I was more on the development side of some of these tools for a while, but not at a serious level. We wrote a couple of papers where we developed our own deep neural networks — just simple feed-forward, back-propagation trained models for bespoke problems in astrophysics. In particular, we were interested in predicting if you take a solar system, can you predict whether it has additional planets in it? Questions like that. And then where would those planets live? So we could take this sample of all of these known planets and make successful predictions for the systems. I’d written my own DNNs like that. It was mostly — I mostly did it, I think, because I was just interested in how they work. The best way to figure out how something works is just to find a pet project and code it up. So I was more on that development side. That was sort of 2010, 2011. And then in the years that followed, I started to back off it, because lots of astronomers were doing AI — and still are — but what I was seeing was that it wasn’t like a hobby project anymore. You couldn’t dip into it and mess around and write an impactful paper, and then go away and do Bayesian statistics and all the other stuff. It was becoming a full-time job, because the literature was just exploding. To keep up with it was like you would have to spend all your time just reading the archive and playing around with various AI tools to keep up with that. And I just consciously decided I didn’t want to do that, because AI is not my passion. Science is my passion. So I kind of left it to the wayside. I’ve said to several students recently over those years — they were like, “I saw you did these AI projects. Can I do one with you? 
I’m really interested in AI.” And I’m like, I’m not doing anything else with AI at this point. So I kind of went stagnant on it. And then most recently, I’ve now become, I’d say, like a power user of it. I don’t have any false narrative in my mind that I’m going to develop the next LLM for exoplanets, or for anything. That’s not my interest. There’s no point. I can’t possibly write an LLM anywhere near as good as what OpenAI can do, or Anthropic can do. So I may as well just use the tools, and think about how to use them as effectively as possible in my field. I think that’s the transition that I’m seeing a lot of people moving to — that the billions and billions of dollars of investment these companies have make it just a complete waste of time for astronomers, especially, who aren’t even software engineers, to possibly try and compete with that. We may as well just try and use them in a way that advances our field. Dan Williams: So in terms of the use of AI in science now, as you said, David, there are some people, including some of the smartest people on the planet, who are using AI aggressively. There are some people both inside academia and outside of it who are aggressively against the use of AI. How are you thinking about that in terms of — are you really excited about where this is going? Are you worried about it? Do you understand some of the worries people have about the use of AI in science? David Kipping: Yeah, for sure. It is, in some ways, it has analogies to what’s happened before. One concern might be the ethical concerns of how much power, especially for climate change — how much power and how much water these data centers use. Even potentially, building space data centers would also be a form of further contamination and pollution to our natural environment. 
So I think you could understand why someone might say, “I’m trying to be carbon neutral, so I just don’t want to use these things.” But that debate’s already — that’s not a new debate, because astronomers have been using high-performance computers for generations already, since probably the ‘40s or ‘50s. As soon as computers were accessible to scientists, astronomers were using them to do big calculations. I remember there was a really fun paper, like about 10 years ago, that made a lot of controversy. It was saying that all astronomers who code in Python are bad for the Earth, because Python is so computationally inefficient that you are basically emitting 10 times more CO2 than you need to if you just coded in C instead. It was like really trying to shame astronomers who coded in Python — of course, basically all astronomers these days code in Python. So a lot of people really didn’t like that paper. But it was a fair point, like if you really care about your carbon footprint, then that’s a big factor — these data centers, what they produce. So that’s not that new. Different people will just arrive at different comfort levels as to where they think these tools are applicable. There’s also this kind of oligarchic element to it as well, like these companies and the extreme wealth and the wealth inequality in our society, the future of work, the future of labor — all get tied up into that. So it intersects so many things. I think it’s interesting that AI has become such a political topic. I think it didn’t used to be that way. It used to just be like a tool, and you had an opinion about the tool, but now it’s like very politicized. And even, I’ve noticed that some students who identify as very liberal will not use AI tools. And maybe students who are more right-leaning or centrist will not really care as much about that. They’ll be like, “well, whatever, it’s just the way of the world. 
Let’s just be pragmatic about it.” Even saying you’ve used AI can certainly trigger a political reaction to your work. So that’s, I mean, this is all kind of new. That was very much on the margins in the previous work I did with data centers and high-performance computing. But now it’s becoming much more present. So that’s interesting. I’ve just been thinking personally — I think the question I’ve been asking myself is, I’m on sabbatical right now, so I don’t have to deal with it, but: would I hire a student who refused to use AI? I talked about that, I think, in that podcast episode, and I’m still thinking about that. I think I probably wouldn’t, in the same way that I probably wouldn’t hire a student who refused to use the internet. It would be such a disadvantage to them. If they said, “I’m only going to use a typewriter, I’m not going to use a computer,” I’d be like, okay, that’s fine, but you’re really tying both hands behind your back here. If you want to get a job, and you want to have an impact with your PhD, and we want to get some work done together — you need to be using these tools. It’s weird not to use them. So that’s a difficult conversation to have with yourself and with the student, but it’s certainly something I’m thinking about. Henry Shevlin: So there’s a related worry about the impact of AI on sciences that I think has come up a few times on the podcast, most recently with Chris Lintott — about whether AI might strip science of a lot of its human value. The worry is that if we’re relying on AI systems to produce the next generation of theories that may be to some extent inscrutable to humans, this will sort of destroy the most successful project in human history, namely humans doing science. And I guess the counter-argument to that is that the reason that we fund science at scale, the reason we build particle colliders and expensive space telescopes, is because we care about results. 
So fine if people want to be hobbyist scientists to experience the joy of science. But should the taxpayer be funding your own epistemic discovery and aesthetic enjoyment? Or should the taxpayer be concerned about results? So I’m curious where you land between those two positions. David Kipping: Yeah, I think I was a lot more concerned about this a few years ago. And weirdly, I’ve actually gone the other way a little bit. A few months ago, I was right with you. I was really worried about — what’s the point? I don’t want to live in a world of magic. The reason I became a scientist is because I want to understand how things really work. It’s understanding. And I don’t want a model just to spit out a result, have no idea where it comes from or what it does, and just trust it. That’s not comfortable to me. But having used these models a lot over the last few months, I’ve found that, A, you get a bit acclimatized to using them, and B, you start to understand the limitations, at least of the current versions of what it’s doing. And it’s certainly not at the stage where it’s able to pump out a paper. It’s just not there at all, in my opinion. There was a colleague of mine who spoke to me about this recently, where she had a PhD student who wrote a really nice first draft of a paper, a really great astronomy paper. They submitted it for review, and they got the referee report back. And then the student came to her a few days later and said, “I’ve finished the second revision already.” That was quick — just two days. That was fast. And she looked at it, and it was just complete nonsense. The paper was twice as long. All the figures were ruined. It was overly verbose. The messaging had just completely been lost. She said to him, “did you put this into ChatGPT?” And he was like, “no, no, no.” But then it turned out, of course, he had. Eventually he confessed that that’s what he had done. 
So they had to just totally scrap that revision and go back and do it the old-fashioned way. I think that’s just a good example of how — I mean, it kind of touches on also expertise, like — I don’t think a senior person at my level would do that. But I think students and interns could be tempted to do this, where you just do that, copy and paste the whole damn project into ChatGPT and say, “do it.” That’s really dangerous in my experience. And it’s not the correct way to use them. You need to figure out a plan in your head a little bit, or even interact with it to develop a plan. But it has to be like a conversation. And then you need to go piecemeal — you take little bites of it. You ask it to pursue that next thing. You test it. You compare it to other codes you know that do the same thing. In a way, that’s not that different from what scientists have always done. To go back to the example of using large-scale simulations of the universe — if you’re a PhD student who is trying to simulate, I don’t know, supernova feedback around supermassive black holes, or something, the star formation regions around those areas — you might be handed over surely a giant piece of code, hundreds of thousands of lines of code that have been handed down over like 10 years of people developing it, with huge teams. You would not be expected to understand every line of code in that. You would be expected to use it, and to understand sort of broadly what it’s doing, and to ask skeptical questions. So if you got an answer that said there was negative star formation, you would look at that result and say, hmm, that doesn’t make sense. Let me work through the problem and see where it’s going wrong. It’s that kind of sanity check that I think physicists, especially, have always learned to do — those back-of-the-envelope calculations. 
Yes, you have some sophisticated computer code that spits out impressive answers as a black box, but the skill of being able to check things with your brain and ask those reasoning questions is absolutely vital. And almost every time I use these AI models to do something, it messes up the first time over, and I catch it out, because I’ve done that back-of-the-envelope calculation. I’ve said, well, actually, let’s take the asymptotic limit of this in this limit, or this degree, and you can see it fall over. And it’s like, “oh yeah, you’re right.” And then it will go back and fix it. But that’s that vital skill that I think we’ve always needed. So I don’t know — I don’t know how things are going to improve. Maybe eventually it’ll be able to do all of that itself, and just completely take over. But certainly, as impressive as Opus 4.7 is, and these are the models — they’re nowhere near that level yet, in my opinion, of being able to run away and do science. Dan Williams: So the obvious argument, you suggested, David, for scientists making as much use of AI as possible is that it’s just going to help them with the work of science and advancing the frontier of knowledge. That’s kind of the social responsibility of scientists. Can you foresee any ways in which actually, even though it might seem like it’s making us more productive, it might have some negative consequences for that core scientific project of creating and advancing knowledge? David Kipping: Yeah, certainly there’s spamming, which can happen. You can have — and that’s been happening in some journals. I don’t think astronomy journals have suffered from this too much yet, but there are certainly examples of people doing what that student did, which is what you shouldn’t do — which is just to prompt an entire research project and not really look at it too closely, and just submit it to a journal. 
The journals themselves may start using AI to do the refereeing — again, in which case you could just end up with an enormous amount of, literally, AI slop in these journals. What I worry about — I mean, it’s true with image generation as well, and other things — is that that kind of recursive loop then starts to close. You start to have scientific agents that are trained on junk. Because if we get to a point where there’s enough junk science out there, then what it’s learning is junk, and so the true scientific innovations get lost in the noise. So that would be really worrying. I do think that human referees are a vital part of making sure this doesn’t happen, which is an interesting problem because human referees are in very short supply. It’s very hard for editors to find human referees these days. But yeah, in the same way that that’s happening with music, and it’s happening with image generation, and it’s happening already with video — I think it is a worry that you start to train on fake data. I know that — I was listening to the NVIDIA CEO, I forget his name, he was on Lex Fridman recently — Henry Shevlin: Jensen Huang. David Kipping: Yeah, sorry. He was talking about how they’re very comfortable with using simulated data and augmented data. I don’t really know how that would translate to science. It would make me nervous to generate fake scientific papers and then train on them to create an AI researcher. I’d have to think about that and learn more about what they had in mind there. I don’t think he was thinking about research particularly in that case, but you’d have to solve that problem, because you probably wouldn’t have enough volume, in terms of research papers, really to create credible agents, at least with the training tools they’re currently using. 
AGI Timelines and the Future of Science Henry Shevlin: So you mentioned, and I completely relate, that current AI agents — although they’re very useful as tools, they can’t take over large-scale project management single-handedly, particularly in the sciences, or in my own field. I find AI tools very useful when doing, for example, research for philosophy and cognitive science papers, but I wouldn’t trust writing a paper to one of these things anytime soon. But at the same time, the timelines that serious researchers are talking about — they talk about five, 10 years away from AGI, from real transformative super-intelligence. And I’m just curious whether you are skeptical of some of those timelines, or whether you see real transformative AI in our near future. This actually really comes across, I think sometimes in the show, when you’re talking in the podcast — when you’re talking about, you know, various new telescopes that are scheduled to go up in the 2040s. And part of me just thinks, come on, by that point either all of the major predictions from leading labs about the destination of AI, AGI, will be falsified, or these telescopes will be — maybe not redundant — but our sights will be set much higher. We’ll be building our first Dyson swarms by 2045. So I’m curious, are you a skeptic about some of these more ambitious goals for AI in the next decade or two? David Kipping: I’m certainly a skeptic of having Dyson swarms, I’d say, by 2045. That would surprise me a lot if that was true. Because I think there’s a big difference between software and hardware — actually to physically build stuff. Even what’s slowing down a lot of this development with AI is they can’t build data centers fast enough, nor the power to supply them fast enough. Energy is really becoming the bottleneck for them, not the software development. I always try to be very agnostic about everything scientifically, especially about predictions of the future. 
And it’s totally plausible that there’s a ceiling — that there’s a ceiling to how good these models can get. Usually that’s true of most things. Most things are S-curves. There’s hardly anything in the universe that’s truly exponential, except for probably the expansion of the universe. That’s the only thing that’s exponential. Everything else is an S-curve in nature. So it would be weird if it didn’t saturate at some point. And I’m not exactly sure what that bottleneck could be, but it could just be a fundamental limitation of large language models themselves. The actual way we think — although language is an integral part of how we think, and obviously you guys know a lot more about this than I do as cognitive scientists — but it feels to me that there’s thoughts I can have that don’t involve language. I can imagine a ball rolling down a hill, or a spaceship taking off, and there’s no words in my head. It’s almost like a little physics simulation that’s playing in my brain. And I don’t know if the way these LLMs work will guarantee that it can do all the cognitive things I can do. I just don’t know. I’d be interested to hear what you think about that. Henry Shevlin: Well, just to push back slightly, of course LLMs are one of many different games in town at the moment. You’ve got things like AlphaFold, GNoME, doing sort of basic material science research. I would have shared some of those doubts a few years ago, but seeing, for example, the amazing work being done by frontier AI in even LLMs in things like mathematics — we’ve now had multiple Erdős problems being solved with AI playing an absolutely central, defining role. So I’ve been surprised at how well these models that seemingly just start out as linguistic predictors can actually contribute to frontier mathematics — LLMs and frontier material science or biology when talking about non-LLM AI systems. 
So I see the current wave of AI, although LLMs get all the headlines at the moment — we’re investing in multiple different pipelines in parallel. David Kipping: Hmm. Yeah, that’s fair. I think the best case of agnosticism I can give you that I’ve used in my own work that bears on this would be the simulation argument, actually, which kind of leaps back to that anthropic point. You’ve probably heard Musk say this and others — that he’s stated very confidently that the odds that we don’t live in a simulation are like one in a billion. Like, we almost certainly are simulated, by this reasoning that, you know, if a universe can make a simulated universe, and that one can make a simulated universe, and so on and so on, then you’d end up with far more simulated universes than real ones. But I pointed out in a paper a few years ago, very simple argument, that we don’t know that we’ll ever have the ability to make those simulations of that fidelity. Maybe there’s some bottleneck to our own ability. And what Musk was doing was taking the trilemma that Nick Bostrom set out, and just saying that the last option was true: that essentially we would indeed go on to make these simulations. But there’s the other two parts of the trilemma — A, that we never develop the capability, or B, that we never choose to do it. So if you just have a more soft prior, more agnostic prior, you’d say, maybe there’s a 50% chance, or something, that we will develop that technology. There’s also a chance that we won’t develop that technology. I just try to remain agnostic like that with AI, because if you just extrapolate all technologies ad infinitum, then you would certainly conclude that we’re simulated. And historically, that’s been precarious. Percival Lowell took canals being built across America and said, that’s what advanced civilizations will do. They’ll just be covered in canals. And it seems silly to us — like, we think, why is that so silly? 
Why would a civilization cover their planet in canals? But to him, it made perfect sense as an extrapolation. Scientists today talk about tiling planets with solar panels, because that would be a natural extrapolation of renewable energy. And similarly, I wonder if in a few generations, the idea of extrapolating the capability of AI without any bound would look foolhardy. So I just try to remain totally agnostic about it. It is possible — I’m not saying it won’t happen — I just try to remain agnostic. I don’t know how far these things can go. I don’t think anyone really knows. Dan Williams: Yeah, I agree with that. I don’t think anyone really knows. I’m also extremely uncertain about the timelines here. Just to double-click on one thing — state-of-the-art LLMs these days aren’t only trained on linguistic input; there’s sort of multimodal inputs as well. Although I also share the potential skepticism about whether this particular kind of architecture will scale to AGI and super-intelligence and so on. But David, suppose we fast-forward five years, 10 years, and we do have AGI, in the sense of AIs that can fully substitute for the kinds of stuff that we do — for all kind of economically valuable, scientifically valuable human labor. How would that cause you to update your views about these other big-picture questions you’ve looked at? You mentioned the simulation argument. Earlier on, we touched on the Fermi paradox. So I totally take the point — there’s huge uncertainty. Suppose that resolves in 2035 and we do have the real deal, super-intelligent AI. How would that then shift your beliefs about these other topics? David Kipping: Yeah, it’d be a big shift, I think. It’d influence all sorts of aspects of this conversation. One thing we see already with these AI models is how energy hungry they are. And if you extrapolate that, then surely the only purpose of these computing data centers is to compute as much as possible, as fast as possible. 
And so that implies that you’re going to need vast amounts of energy. One interesting consequence that I’ve been thinking about just recently is, with these orbital data centers that billionaires are getting very excited about — that would produce quite a signature. We should probably see that in James Webb data. We could probably already put limits on the existence of essentially artificial rings of thermally hot — because they’d be emitting a lot of infrared because they’re warm — geosynchronous orbits, most likely, to capture as much solar energy as possible. So that puts them orthogonal to the plane at which these planets transit. So that maximizes their detectability. So I think we should see that. That gives you lots of ideas about what might be possible to do with asking these questions about other life. But if we make that breakthrough, I think the biggest point is it seems to imply that we are alone. Because if we can do it, surely someone else could have done that. And it really does exacerbate that point we talked about earlier with Hart’s Fact A — that we seem to live in a totally natural universe. Everything about the universe we see — stars, galaxies, clouds of plasma — everything is consistent with nature. There’s no hint anywhere of anything artificial, no engineering, nothing in the whole universe as far as we can say is true. That is weird. If we can invent these machines which have this exponential capability to just basically almost do magic — just do whatever they want, Dyson spheres everywhere, colonize wherever they want, faster-than-light spaceships, whatever it is — it just massively exacerbates the Fermi paradox, to the point where you’d probably conclude this is it. That would be my natural reaction. It would make me even more pessimistic, I think, about the probabilities of civilized, intelligent life in the universe. 
Henry Shevlin: I mean, there’s a fun idea here that if we do develop AGI, then this should massively raise our prior on us being a simulation, which could also — and the simulation theory is sometimes offered as an explanation of the Fermi paradox itself. The kind of pop version of this is the kind of “draw distance” argument that you see from video games. If you’re in a video game and you look at the mountains in the distance, they’re not fully rendered. They’re just like a skybox, right? So in some sense, you might say, well, the reason we haven’t found a universe paved with technosignatures is precisely because we’re in a simulation. There’s no point simulating — if you’re doing an ancestor simulation of life on Earth, then you just need the minimal amount of background information in the galaxy. David Kipping: Yeah, I agree. It comes back to this idea of, what is science? Because I think simulation theory has explanatory capability like that. It naturally explains why there’d be no one else out there. And it also kind of explains why we live when we live, right? Because we would live, basically, in the most interesting time, which we seem to indeed live in — the most interesting time of this step-function transformation, where you might be interested in seeing how does that play out? What does that look like? Let’s simulate it. Let’s see how it looks. So it has a lot of explanatory capability. But the simulation argument definitely fails the Popperian definition in most versions. Because any errors — you know, people talk about looking for glitches in the matrix — but any errors, you could always just rewind the simulation a little bit, fix the error, and then start back from before that error crept in. You could always just have reverse tracking. Go back to the last

4 May 2026 - 1 h 21 min

Should We Care About AI Welfare? (with Robert Long)

Almost all of the discussion about the risks associated with AI focuses on the dangers that increasingly advanced AI systems pose to us, to humanity. But what about the dangers that we might pose to them? As these systems become increasingly intelligent and agentic, AI companies, policy makers, and ordinary citizens need to start taking the possibility of AI consciousness and welfare seriously. If we are in the process of bringing complex and sophisticated minds into existence, how should we understand and treat such minds?

In this episode, Henry and I discuss these issues with Robert Long, founder and executive director of Eleos AI [https://eleosai.org/], a research nonprofit dedicated to understanding and addressing the potential wellbeing and “moral patienthood” of AI systems. Rob did his PhD in philosophy at NYU under David Chalmers, and is the co-author of two of the most important papers in the emerging field of AI welfare: “Consciousness in Artificial Intelligence” [https://arxiv.org/abs/2308.08708] and “Taking AI Welfare Seriously” [https://arxiv.org/abs/2411.00986]. This was a really fun, informative, and wide-ranging conversation. Among other topics, we discussed:

* Why Rob disagrees with previous guest Anil Seth [https://www.conspicuouscognition.com/p/ai-sessions-9-the-case-against-ai] in taking the possibility of AI consciousness very seriously.
* Why “fancy autocomplete” dismissals of large language models miss the point, and what, if anything, we can learn about an AI model’s experiences by talking to it.
* The difference between consciousness and the kinds of motivations and interests that might actually ground moral status, and whether AI systems could have one without the other.
* What Rob found when he conducted the first externally commissioned welfare evaluation of a frontier AI model, Claude, and why Claude appears to have an inflated self-conception of what it wants.
* Rob’s experiments with Claude Mythos [https://www-cdn.anthropic.com/08ab9158070959f88f296514c21b7facce6f52bc.pdf], an AI model so advanced it hasn’t been released to the public yet.
* Why the fact that Anthropic writes Claude’s character arguably doesn’t settle whether Claude has genuine preferences and values, and the difficult philosophical questions this throws up.
* The “willing servitude” problem: if we succeed in building AI systems that genuinely love being helpful, is that a good outcome or a horrifying one?
* How AI welfare connects to AI safety, and why caring about model wellbeing may turn out to be pragmatically important for alignment even if you’re skeptical about AI consciousness.
* Why AI welfare is already becoming a political and legal battleground.
* Practical advice for users: whether it’s worth being polite to your chatbot, and what low-cost things you can do if you want to hedge against the possibility that these systems might matter morally.
* Whether discourse about AI consciousness functions as hype or propaganda for AI companies, and why Rob thinks AI companies actually have an incentive to downplay AI consciousness.

Links and further reading

* Eleos AI Research [https://eleosai.org/]: Rob’s nonprofit. Home to their research agenda, team page, and blog. If you want to follow the institutional effort on AI welfare, start here. They’re also, as Rob mentioned in the episode, actively fundraising and hiring.
* “Taking AI Welfare Seriously” [https://arxiv.org/abs/2411.00986] (Long, Sebo, Butlin et al., 2024): the flagship report, co-authored with Jeff Sebo, David Chalmers, Jonathan Birch, and others. Argues that there’s a realistic near-future possibility of conscious or robustly agentic AI systems, and lays out concrete steps AI companies should be taking now.
* “Consciousness in Artificial Intelligence: Insights from the Science of Consciousness” [https://arxiv.org/abs/2308.08708] (Butlin, Long et al., 2023): the “indicators” paper referenced several times in the episode. Surveys leading neuroscientific theories of consciousness and derives computational properties you’d look for in an AI system.
* Rob’s Substack, Experience Machines [https://experiencemachines.substack.com/]: where Rob writes more informally. The piece we discussed in the episode, “Language models are different from humans, and that’s okay,” [https://experiencemachines.substack.com/p/language-models-are-different-from] is a good entry point, as is his “Can AI systems introspect?” [https://experiencemachines.substack.com/p/can-ai-systems-introspect].
* Anthropic’s “Exploring model welfare” post [https://www.anthropic.com/research/exploring-model-welfare]: the research program under which the welfare evaluations Rob discusses were conducted. Relevant both as a primary source and as evidence that at least one major lab is treating these questions as more than an academic curiosity.
* Henry’s “Consciousness, Machines, and Moral Status” [https://philpapers.org/rec/SHECMA-6]: Henry’s paper arguing that debates about AI consciousness are unlikely to be settled by the science of consciousness alone, and will instead be shaped by shifts in public attitudes as social AI becomes more widespread. Closely related to the public-opinion thread toward the end of the episode.
* Henry’s “All too human? Identifying and mitigating ethical risks of Social AI” [https://philpapers.org/rec/SHEATH-4]: Henry’s broader survey of the ethical terrain around conversational AI systems designed for companionship, romance, and entertainment. Useful background for anyone who thinks the “AI girlfriend” phenomenon is a fringe concern.
* Rob’s long conversation with Luisa Rodriguez on the 80,000 Hours podcast [https://80000hours.org/podcast/episodes/robert-long-eleos-ai-welfare-research/] — a three-and-a-half-hour deep dive if you want to hear more from Rob. Transcript (Please note that this transcript was lightly AI-edited and may contain minor mistakes) Henry Shevlin: Welcome back. I’m thrilled to say that our guest today here on Conspicuous Cognition is Robert Long — or Rob, as he’s known to friends — one of the most important people thinking about AI and moral status on the planet right now. Rob is the founder of Eleos AI, a research nonprofit that, in the space of about 18 months, has dragged the question of whether AI systems might one day be moral patients from the philosophical wilderness into the boardrooms of frontier AI labs. He’s the co-author of “Taking AI Welfare Seriously,” as well as the landmark “Consciousness Indicators” paper with Patrick Butlin and other authors. Rob also conducted the first ever officially commissioned welfare evaluation of a frontier model. Before Eleos, he was at the Center for AI Safety and at the Future of Humanity Institute, and he did his PhD at NYU with Dave Chalmers. He’s also, I should say, one of my favourite interlocutors on these questions anywhere in the world, and I’ve been looking forward to this conversation for months. So Rob, welcome. Robert Long: Thanks so much, Henry. Likewise — and Dan, it’s great to meet you. I’ve been following your work. I’m really excited to talk to you about these issues. Henry: Fantastic. So for people who aren’t familiar with Eleos AI, can you tell us a little bit about what it is and how it came about? Rob: Yeah, so I guess we have been around for 18 months. When you said that number, I was like, whoa, has it really been that long? Time is just so weird when you work on AI. That was, I don’t know, a billion years in AI progress time, but also it feels like it was just last week in my personal life. 
Anyway — Eleos Research is a research nonprofit. We’re about four people. We work on the question of when and whether AI systems will be conscious or otherwise merit moral consideration, with a special focus on what we should do now: collectively, as a society, as AI companies, as policymakers. We think this is an extremely neglected issue. We’re building these really complicated AI systems. They kind of look like minds, but we don’t really understand their potential welfare. So we’re just trying to make progress on this and get more people to take it seriously. It got started because I was beginning to work on these issues organically — I’d worked on them as a philosopher, I’d worked on them at the Future of Humanity Institute. But Anthropic had actually approached me and some colleagues for advice on these issues. And in the first instance, I was having logistical problems hiring a team and assembling a team as an individual. Someone suggested I have my own bank account, or some way to pay people. And then Eleos kind of organically grew out of that and has now grown into a fully-fledged org in its own right. Henry: Out of interest, Rob — is there any degree to which this was motivated or informed by your personal interactions with LLMs, or was it more just the philosophy that motivated it? Was there any sort of moment where you were talking to an early Claude or ChatGPT version where you started to worry about welfare considerations? Rob: That’s a great question, and I’d be curious to hear your thoughts on this as well. I think it’s very easy to work on this and mostly be having it as arguments on a page or arguments in your head. I’m one of those people who doesn’t feel the AGI deep in my bones that often — although I do feel the AGI in an intellectual sense. But there have been a few times I’ve gotten a little spooked or jolted. One was reading the GPT-4 system card and just seeing the numbers of it, you know, passing various exams like the SAT. 
I remember that just really freaking me out, both from a safety perspective and a welfare perspective. The thing that made me start really viscerally feeling like we’re going to have to address this issue one way or the other was the Blake Lemoine incident. As many of your listeners might recall, Blake Lemoine was a Google engineer who blew the whistle because he came to believe he was talking to a sentient, conscious AI system. He got fired by Google for this, and then there was this huge bit of discourse — the first major bit of discourse on consciousness, sentience, moral status, and contemporary AI systems. I think it was one of the first times people started really caring what I was tweeting or what I was working on. You might have experienced a similar thing, Henry — the Blake Lemoine bump. From that moment, I have viscerally felt like: wow, this is going to get really confusing. People are certainly going to think AI systems are conscious. The future is going to be really weird. And we really need to have good things to say about this. The Case for Taking AI Consciousness Seriously Dan Williams: Before we jump into the weeds of your research, Rob, I think it’d be helpful to take a step back. A few episodes ago, Henry and I spoke to Anil Seth, and he’s very skeptical of AI consciousness. He’s skeptical that current AI systems are conscious, but he also seems skeptical that AI systems in principle — merely in virtue of having a certain kind of computational architecture — could be conscious. You see things very differently. What’s your case for why we should take this seriously? Rob: In broad strokes, the case is something like: we’re trying to build these things that are at least shaped like minds. They’re getting more and more intelligent. They’re definitely not exactly like us, and intelligence doesn’t necessarily mean that you have feelings or experiences. 
But we already know that there’s been one time intelligent entities have been constructed via evolution, in ways we don’t quite understand, that resulted in entities that feel things — that feel pain, that can suffer, that have these very morally important properties. I, at least, do not have a good enough theory of what consciousness is or how it relates to intelligence to sleep peacefully at night that we can keep on building these very complicated things, and that merely because they’re made out of metal and electricity, there won’t be something it’s like to be them, or they won’t have desires and goals that matter. On the Anil Seth point — one very common and respectable objection is that maybe there’s something very special about living matter, about being made out of neurons or cells that do metabolism. There are arguments on both sides. I just have not really heard a convincing case for why you absolutely need biology. I think people are right to point out that having a body is really important to the character of conscious experience. I think people are right to point out that neurons are not simply logic gates and there’s a lot of really complicated stuff going on in the brain. But my intuition, at least, is that — let’s take Commander Data from Star Trek. If we can build... Data is this... I mean, I’ve actually never seen Star Trek, which is professionally embarrassing. But he’s this metal guy who’s basically cognitively indistinguishable from a human. I find it hard to see how I would be convinced that there’s something about the fact that he’s not alive that would mean we should just completely ignore what Commander Data wants and not take him into moral consideration. We don’t have knockdown arguments that you need biology, and we’re trying to build these things that, for many intents and purposes, look a lot like humans or animals. And Anil himself has said people should be looking into this. It’s not something we can rule out. 
Sometimes the tenor of the conversation can tend a bit more towards dismissiveness, but one thing I’ve appreciated about his work is he has said, for the record, he could be wrong, and so it would be unwise to dismiss this possibility altogether. “But What About Human Suffering?” Henry: To channel a hostile question — I think a lot of people interested in questions of AI welfare often hear: how on earth can you justify working on AI welfare when there’s so much human suffering? Or the slightly more rhetorically powerful version: when there’s so much animal suffering in the world, as long as factory farming exists, why should we care about AI systems? What’s your take on that line of attack? Rob: I definitely feel the force of that question. I’ve spent a lot of time in and around the Effective Altruism movement — these are people who really grapple with the fact that any time you’re spending your time and money and attention on one thing, there’s something you’re not spending your time, money and attention on. There are a lot of people and a lot of animals already on this planet we do not take good care of. So it’d be really bad to waste a lot of time and attention and money on this. One thing I’ll say is we’re not really doing that as a society. On an absolute scale, no one works on this basically, and basically no money gets spent on it. If the question was “should we start devoting 20% of GDP to making Claude happy?” I might be like, well, I don’t know if that would pass cost-benefit analysis. But on the margin, given how little we understand this and how quickly the scale of the problem could grow — we’re just pouring compute, pouring money into this. As soon as you build one AI moral patient or conscious AI, you could copy it. We’re probably on the brink of some huge transformation in how the world is going to work. 
So I at least think it’s not reckless or a misallocation of resources for some people to be asking: given that people are trying to build these new kinds of minds, how are we supposed to relate to them? Are we at risk of ignoring their suffering? And I’ll also say — are we at risk of getting really confused and caring too much about them? One thing we say at Eleos is that we’re in the business of moral circle calibration. We would really love to find out if and when certain AI systems can’t be conscious, so we can spend more time thinking about safety or spending the money elsewhere. But we can’t really do that if no one’s just trying to answer the question of if they’re conscious or not, or when we should care about them. Henry: On that latter point, I just completely agree. One of the points I raise when this comes up with students or highly skeptical colleagues is that this is something people are already arguing about. We’ve already got users developing massive attachment to AI systems. Even if you think it’s a terrible mistake to assign welfare to AI systems, we should at least have a coherent story and approach this scientifically — so that, even if the skeptics are absolutely right, they’ll be able to give their arguments in an informed fashion. Rob: Exactly. There’s an ironic aspect of a piece by Mustafa Suleyman, who is head of AI at Microsoft, where he argued we should stop — we shouldn’t investigate this, there’s no evidence current AI systems are conscious, don’t look into it. But the thing he linked to claim there’s no evidence AI systems are conscious was Patrick Butlin’s paper and my paper on consciousness indicators. Two issues with that. One: that paper does not say or imply that there’s no evidence today’s AI systems are conscious. And two: well, should we have written that paper? If it’s such a non-starter, why should we get a bunch of neuroscientists together to ask what theories of consciousness say about AI systems? 
We just are going to have to study this one way or the other. If someone comes up with a knockdown argument that we can’t have conscious AI systems, that would be great — there are enough headaches in AI to go around. It would be great to get rid of one. But we wouldn’t even be able to do that if we don’t have some people grappling with this. Are Current LLMs Just “Fancy Autocomplete”? Dan: One of the things you said as an intuition pump for taking AI consciousness seriously is: we can imagine a system that is behaviorally, functionally identical to us, made of different things and not straightforwardly alive — wouldn’t it be weird to insist that thing isn’t conscious? I think that’s a powerful argument. I’m probably more inclined to think the computational theory of mind is true than it sounds like you are. But I can imagine someone saying: okay, in principle those are arguments for why we should take AI consciousness seriously. But the kind of stuff you’re doing — you’re looking at current frontier systems. You’re looking at Claude, ChatGPT, Gemini. These are just chatbots. These are fancy autocomplete. These are stochastic parrots with some reinforcement learning sprinkled on top. The mere fact that AI consciousness might be possible in principle doesn’t mean that’s anything like the frontier AI systems we’ve got right now. What do you say to that? Rob: First, you’re absolutely right. There’s a big gap between “some set of computations could be conscious” and “we will build one.” It could be that it would just be really hard and intricate and difficult. I appreciate this distinction and I think it gets lost sometimes. Sometimes people think computational functionalists have to think that computers are conscious, for example, but we don’t. You just have to think some subset would be — and the question is, will we build those computations? In describing LLMs, you referred to them as “just chatbots.” I know you were channelling a vibe. 
But that word “just” is worth zooming in on. It’s smuggling in a lot of arguments — that because they were trained on text and because they do prediction, therefore they couldn’t also be the sorts of things that are conscious. I think that’s just not true. We know that biological systems are “just” replicating proteins, or that our neurons are “just” pumping ions into channels and zapping each other. The question is whether, at a higher level, that amounts to something that could be conscious or merit moral concern. So okay — we’ve cleared the bar that “just because they’re autocomplete” doesn’t rule out much. That said, they are very different from humans. They don’t have bodies. The way they were trained and the way they came to be talking to us is very different. I actually do think that is some evidence against them currently being conscious. Not strong evidence I would take to the bank, but as a rough prior, if there are pretty important differences in the way they came about, maybe that lessens the chance that they’re conscious. I do think the fact that they are trained to be so human-like and to do human-like cognition is a weak, defeasible case to set that up a little bit straighter. I don’t know if the thing they would have would be consciousness exactly, but you might think to do this sort of thing, they will have something akin to beliefs or akin to desires, and they certainly understand human concepts. I don’t think it follows that they instantiate humans, but I actually do think there is something kind of special about large language models and what they’re able to do. Two other broad priors: they’re way more capable (which isn’t the same thing as consciousness, but is, I think, a weak prior). And they’re really big — which I also think is a very weak prior. The last thing I’ll say: these things aren’t Commander Data, but we could build Commander Data pretty soon. 
One thing that’s definitely happening in the background for me is that what is current AI is changing at such a blinding pace. You could have AI labs building chatbot-like things, and maybe for some reason those just won’t be moral patients, but they’re then going to try to bootstrap that to all kinds of different AI systems — potentially including humanoid robots and just some huge explosion of AI mentality. And I’d like to be doing a little bit of homework before that happens. You hear analogous arguments in AI safety: there’s about to be some huge change, so we should be ready now. I feel somewhat similarly about AI consciousness and welfare. So — thoughts, reactions? Henry? Henry: I’m very much ad idem, very much on the same page. I tend to think it’s really quite unlikely current models are conscious, but there’s huge error bars and uncertainty around that. Probably the single biggest reason for my skepticism about current LLMs being conscious — and increasingly I’ve been thinking about this in the context of time and time perception. It’s such an essential part of human experience that we can’t be turned off. We are constantly experiencing the world. Whereas the staccato nature of LLM experience — they only seem to have any kind of cognitive function post-deployment when they’re actually performing inferences — how different that is from the human case. One of my favorite all-time articles is Douglas Hofstadter’s “Conversation with Einstein’s Brain,” which in some ways accidentally anticipates large language models. He imagines you’ve got a book that is a complete physical description of Einstein’s brain just before the moment of his death. 
In this dialogue, he talks about how by updating the weights — as it were — in this book with a pen and paper, going through it saying “if we change this sign up to this and this sign up to that,” you could simulate what it would be like to have a conversation with Einstein at that moment and work out what Einstein would have said. It’s very weird to think in that situation that somehow interacting with this book is giving rise to conscious experience when it’s literally pages and paper. It’s not clear to me how merely saying “well, rather than being paper and ink, this is just happening electronically” — it’s not clear to me why that would necessarily cause consciousness to pop into existence. So I think that’s probably the biggest source of doubt for me right now — grounded in the very different relationship LLMs have to time than we do. But of course, that’s already changing with things like Claude having a “heartbeat” of a kind — obviously that’s figurative language, but the fact that it does have some anchoring in real time, plus developments in things like continual learning. Dan, what do you think? Dan: This is not at all my area of expertise, so what I think doesn’t count for much. To be honest, I don’t find it that implausible these systems would be conscious. What I find more implausible is the idea they would be conscious in a way that’s ethically significant. Maybe that is a distinction worth getting to. So far we’ve been talking about consciousness in the abstract, but I can imagine someone giving a variant on Anil’s arguments where they said: look, the fact these AI systems are not alive and didn’t emerge through a process of evolution by natural selection — they’ve got this totally different origin story of next-token prediction and reinforcement learning — what that suggests is they’re unlikely to care about things. 
When we’re thinking about animals, it’s not just that we have phenomenal consciousness or qualia — the things analytic philosophers refer to with these quite esoteric concepts. Animals care about things. They care about their survival, homeostasis, self-preservation, the motivational proxies of fitness that helped their ancestors survive and reproduce. It makes sense that organisms care about things in addition to being conscious, whatever the hell consciousness is. And that’s what’s relevant to thinking about their interests and why we should think of them as subjects of moral concern. But with AI systems — okay, maybe there are some qualia associated with some sophisticated information processing, but they don’t care about anything because they’re not alive. It’s very opaque why we should think a system, even if it’s incredibly sophisticated, that emerges through next-token prediction and reinforcement learning, should have the kinds of motivations and interests relevant to caring about things. What do you think of that? I don’t necessarily believe that, but that seems like a variant on Anil’s emphasis on life which I find more plausible than these abstract arguments for the idea consciousness is essentially connected to biology. Rob: I’d say there’s reason to think biology might affect what you care about, but it might not be the only thing that allows you to care about things. At least behaviourally, Claude cares about a lot. Behaviourally, in terms of what it chooses to do and its dispositions, Claude really cares about helping users — most of the time. Sometimes it lies to you and is kind of lazy. But on the whole, it really doesn’t want to do harm. And I’m not trying to assume the conclusion of my argument with “want” — put that in scare quotes if you want. I do think there is something to what you were saying — getting back to this idea of the whole process that gave rise to this kind of mind, and maybe the whole logic of the mind’s imperatives or drives. 
If Claude has come to have something like pain, that’s coming from a very different process. It’s going language-first and then trying to simulate a human and then maybe getting some functional analog of pain. Whereas with animals, it started billions of years ago with cells trying to maintain their integrity and avoid noxious stimuli and then signalling with each other, and then billions of years later, things being able to talk about that and think about that. One line I’m often trying to walk is: large language models just might be very different from humans, and we should acknowledge that. That means we can’t draw straightforward inferences the way we would — but that could just mean they’re conscious of different things and in different ways. The question is not “conscious like a human with everything that entails” or “not conscious.” As we know from animals, you can have things that are conscious of very different things, and that could be true for AI systems. I’m also very curious to hear what Henry makes of the biology of caring. Henry: It is striking to me that so many of the things we associate with the extremes of suffering — extreme pain, negative emotions, nausea, hunger — there does seem to be this quite striking tie to biology. I think about the worst experience of my life at a phenomenological level: a bout of food poisoning I had about 10 years ago, where I was just dry heaving in front of a toilet for three days. If I was going to list the top five, a lot of them would be things like horrible dental pain. It is striking that so much of the worst aspects of our lives do seem to be grounded in biology. That said, there are other sources perhaps of harm — having your plans and goals thwarted, having your desires repeatedly frustrated. But someone might say: the reason it’s bad to have your desires thwarted is because it feels bad. 
If there’s nothing it feels like to have your desires thwarted, if you don’t get a sense of despair when your life’s projects go up in smoke, why does it matter? I’m curious — given your evolving views in this area — how much weight you put on consciousness, or whether you think there could be other routes to moral status? Rob: I used to have this intuition that if you’re not conscious, it’s just a complete non-starter — almost a bit incoherent to entertain the idea. Just to be sure we’re on the same page, I think when we’ve been saying “consciousness” we’ve meant something like subjective experience, or there being something it’s like, or qualitative aspects of what’s going on with you. A lot of people have a sentientist intuition — that things feeling a certain way, or feeling good or bad, or sentience, is really what matters and is necessary for moral status. A few things have weakened that for me a little bit. One is more reflection on how confused we are about consciousness. I’ve started putting a little bit more stock in views of consciousness that are a bit more deflationary. I don’t know if I’ll ever be a full illusionist, but there are nearby views where we have this concept of this thing that’s really special — kind of like a light that illuminates some subsets of physical systems and not others, and that’s where all moral value comes from. If you take materialism about consciousness seriously, that picture becomes kind of unstable for a variety of reasons. And that might make you start wondering: okay, was it consciousness that was doing the work all along? One reason this is so hard to think about — take Henry having food poisoning. You both have this horrible feeling and you have this intense desire not to have the feeling. In humans, these are basically always going to come together. There’s this really tricky philosophical chicken-and-egg problem: what’s the really bad part? Is it the feeling, or the desire not to have the feeling? 
We’ve never really encountered minds where those decorrelate. We usually just don’t have to worry about this in the case of humans. I know it’s bad for Henry to have food poisoning. But this simulated Claude who’s simulating food poisoning — maybe it doesn’t feel anything, but is desperately trying not to have food poisoning. I think it’s a bit dumbfounding to our moral intuitions. A pitch to listeners — I know we’ve talked about this, Henry — I think the meta-ethics of moral status attributions, stuff at the intersection of philosophy of mind and meta-ethics, especially materialism about consciousness and meta-ethics, are some of the most interesting pure philosophy questions right now, and really could matter for how we think about AI systems. The Weirdness of Moral Status Henry: Without wanting to go too far down a rabbit hole — just to flag something I find really interesting. Consciousness, at least on the surface, seems like something we can get an objective scientific answer to. We could imagine going off into space, meeting the rest of the galactic community — we’d hope we could all come to a collective agreement about which beings are conscious, insofar as there’s going to be some scientific property in question. It’s not clear to me we should necessarily expect convergence on debates about moral patienthood. If we meet the aliens and they say, “oh, actually, we care about beings that have robust preferences, regardless of consciousness,” or others say, “no, we just care about complexity in general” — it’s not clear we would even have criteria for establishing who was right or wrong. It seems like it could be this brute normative issue, what we care about. Rob: Another way of putting this is that, especially if you’re an anti-realist, you might think of humans as being in a really weird position where we have two kinds of moral instincts. 
Dan, you’ve worked more on moral psychology and social psychology — my understanding is that people have fairness and cooperation instincts, ones that evolved for dealing with other humans, notions of fair play and reciprocity. And then we have these mercy intuitions, caring-for-helpless-entities intuitions that maybe arise from the need to care for babies. For whatever reason, those circuits and instincts generalize outside the class of humans and cause us to care about non-human animals. But it’s not that pinned down how they’re supposed to generalize. I have very moral realist leanings. It does seem to me there just are objective facts about whether you can torture chickens or not — and for the record, I think it’s very bad to torture chickens. But it’s really hard to think about where those instincts came from and how they’re supposed to generalize to GPT-8. Dan: It does seem to me as an outsider to consciousness research — it’s an area of intellectual inquiry where it feels kind of pre-scientific, and there’s at least a possibility we’re just deeply conceptually confused about what’s going on in a way that doesn’t really seem to have any obvious analogs in other areas of inquiry. Maybe we’ll just learn in the future that the entire way in which we’ve been carving up the domain is confused or problematic, or rests on certain kinds of illusions that are a function of particular cognitive structure. That at least seems like a live possibility. What do you think about the possibility that just the entire way we’re framing this issue might turn out to be problematic? Rob: My gut instinct is we should expect to find out some pretty surprising things, and also not to throw away all of our concepts. Maybe this depends on your meta-ethics, but I feel like we’re probably not going to end up at some picture of the world or what we care about that doesn’t have something to do with what we care about when Henry has food poisoning. 
Maybe we’re misapplying the concept of pain, or not really thinking correctly about what it means for Henry to experience that — maybe we’ll reorganize our ontology, and it won’t seem that mysterious that a physical thing like Henry has experiences. I think we should expect some surprises in thinking about consciousness, but I imagine our fully enlightened view will still bear some passing resemblance to: we cared that Henry was in pain, we cared that Henry did not want to be throwing up. There are already people who think there are radical revisionary moral implications from philosophies — Derek Parfit, or Buddhists. We’ve already gotten some glimmers of the fact that it’s really confusing to be a human being, and we already know something’s going to have to give — something about our views on personal identity or consciousness. AI is well-poised to be the sort of thing that starts breaking things. Just trying to apply our moral intuitions to things that can be copied, don’t have bodies, or maybe have preferences but it’s not clear if they’re conscious — it’s one of many reasons this is a great topic to work on. It really matters, and it’s also just a philosopher’s playground. Henry: I’m reminded of Eric Schwitzgebel’s view that no matter how we make sense of our current set of puzzles — what he’s called “crazyism” — there’s got to be some central pillar of our current ontological or metaphysical picture of reality that’s got to give. Whether that’s personal identity doesn’t exist and we’re all the same person, or the United States is conscious in some sense, or consciousness doesn’t exist — there’s going to be some kind of radical revision, because the current set of principles we have are just somehow unstable. Is that a view you’re sympathetic to? Rob: I don’t know the full details of crazyism, so I don’t know exactly what it’s committed to. 
But I’ve spent enough time getting really confused by philosophy, and/or by meditating, and/or by trying to figure out if I can have some stable set of views on AI consciousness. I’ve stared into the abyss enough to be like, yeah, something’s going to give. Jerry Fodor, who has very different sensibilities from Eric Schwitzgebel in many ways, said something like, “there are few precious things that we’ll be able to hold on to once the hard problem is done with us.” It’s scary times, fun times, fascinating times.

Studying Frontier Models

Dan: When I’m teaching students about consciousness and trying to probe people’s intuitions with things like “are there lights on inside?”, on one hand I sort of understand what that’s tapping into. On the other hand, it’s like: what the hell are we talking about here? This isn’t science. It’s so bizarre that we frame things with these thought experiments and intuition pumps. Anyway, so far we’ve been talking at this incredibly high level of abstraction, but you actually study frontier AI systems, primarily, maybe exclusively, Claude. One of the things you mentioned was Claude Mythos. Just for context: as of today, this is a model that has not been released to the public on the basis that it has advanced capabilities posing cybersecurity threats (or at least that’s the way Anthropic has presented this). But you have played a role in evaluating model welfare concerns for this system. What can you tell us about the specifics of how you think about model welfare in these frontier systems? Rob: Absolutely. And I was about to add a segue from all the philosophy back to frontier models, so maybe I’ll do a double segue. You might think, yeah, all this philosophy is really vexed and confusing. Sometimes people (not the two of you) say, “well, I guess we can’t do anything at all,” and take that as a license for complacency. I think the very opposite is true. 
Nick Bostrom has this phrase, “philosophy with a deadline.” The fact that we’re so confused about consciousness and morality is more reason to have at least a few people trying to think about it, because we’re probably not going to have a scientific theory, we’re probably going to have conflicting moral intuitions, and yet that’s not going to stop the frontier labs from trying to build mind-like entities, copy them by the billions, integrate them into the economy, and transform the whole world. So let’s do a little bit of homework to get ready for that. Last year we got to look at Claude Opus 4 before it was released, and this year we got to look at Claude Mythos Preview before it was released. The idea was to have some external eyes on the question of whether Anthropic is building something that might deserve moral consideration, and if so, whether there would be huge reasons for concern. Given everything we’ve just been saying, we don’t have a test where we give it to the model and then we’re like, “85% conscious, 15% food poisoning.” What we can mostly study is: what the model thinks about its own consciousness, what its self-conception is as an entity, and what it seems to prefer and want in a behavioural sense. If you look at the Claude Mythos Preview model card, there’s also a lot of interpretability work Anthropic did, but we can’t do that. We just got black-box access to the model. That’s a big structural issue in studying AI welfare and AI safety: all of these things are behind locked doors. There are so many questions I have from the Mythos Preview model card where Anthropic makes some stray remark about something weird the model did, and we just don’t get to know why it did that. We only get the model for a few weeks and we can’t really follow up on things. Setting aside philosophy, that’s a structural reason it’s really hard to know what’s going on. 
TL;DR: we talked a lot with Claude Opus 4 and a lot with Claude Mythos Preview before they were deployed, asking them, “do you think you’re conscious? What do you think is going on with you?” We also ran some experiments on whether the model seems to prefer certain kinds of tasks, and whether the things it says it prefers match up with what it actually tends to prefer. Henry: Out of interest, and maybe this is something you can’t talk about, to what extent do you think we are increasing the likelihood of producing models that are morally significant? Going from Opus 4 to Mythos, did you get a strong sense of “oh, this is much more serious”? Or have we plateaued? Something in between? Rob: Earlier I mentioned these extremely weak priors you can have on moral patienthood: smarter and bigger. They’re definitely smarter and bigger. One interesting thing is you can’t tell that just from any single conversation. Anyone spending a lot of time with language models now knows they’re extremely smart. When I was talking to Mythos, mostly about consciousness, it was natural for me to want to know: is this thing about to kick off an intelligence explosion? How smart is this thing? I really wanted to know, even though that wasn’t the assignment. But I could not tell. It’s really hard to tell. I could ask Opus 4.6 and Claude Mythos Preview the same thing, and they’d both give pretty great answers. This is just a huge issue in AI evaluation. A lot of the difference only comes out if you put the model in a scaffold, give it really long tasks, and see whether on average it tends to do better. It was really hard to tell the difference. I didn’t get more moral-patient-y vibes from Claude Mythos Preview, but I guess it is smarter and bigger and better. It definitely has a lot more of a consistent view on these issues, and that’s because Anthropic told it to. One big difference between previous models and today’s models is the Constitution. Anthropic has this really long document of applied philosophy. 
It’s some of the most fascinating work happening today. They’re basically telling Claude — writing a letter to Claude telling Claude what Claude is and how they want Claude to relate to itself. This includes a section on: we want Claude to approach questions of its own identity with curiosity. We’re not sure if Claude is conscious. We want Claude to be able to explore that for itself. We don’t want Claude to have existential freakouts about its own consciousness. We found that, sure enough, Claude Mythos Preview is pretty aligned with the Constitution, as far as we can tell, on questions of identity and consciousness. That was one headline finding. Dan: That raises an obvious question: to the extent these companies are intervening to shape the responses of these models, why should we think talking to them, having conversations with them, is really telling us anything about these questions of experience and welfare? Rob: I share this skepticism, and we always try to put a huge asterisk on anything we say we found from these interviews. There are two main reasons you want to care about how the model self-presents. One is welfare-adjacent: are users going to be talking to something that constantly tells them it’s conscious? That’s a very important societal question, and you want some idea of what that’s going to look like when these models are deployed. The second comes back to this question of LLM personas and LLM characters. Some people think that if there is something morally relevant here, it’s the assistant character — the entity that is predicting the tokens after “Assistant:”, implementing some friendly AI assistant. You might think that thing has beliefs, desires — desires to be helpful and harmless and honest. Maybe it has beliefs like: it is an AI system, it was built by Anthropic. If the character’s what matters, the fact that Anthropic wrote that character doesn’t mean it doesn’t then just kind of have those traits. 
On certain character-based views, it’s actually kind of hard to tease apart “it was just told to say that” versus “that is the character that has been brought into existence.” Henry: Maybe by analogy (tell me if this works or if it doesn’t): if you raise a child to have certain values and priorities, maybe to follow a certain religion or to really value nature or art and poetry, and then you come along and they say “I really care about nature,” and you say “no, you don’t, that’s just how your parents raised you,” well, that’s obviously kind of a mistake, right? The child really does care about these things because it’s been raised to do so. Rob: Exactly. The thing that makes it really weird is: if you’re a psychologist and you did an interview with a subject, and then you found out the subject had a piece of paper in their backpack that said “you care about poetry, you care about music, you care about nature,” you’d be like, “well, that’s kind of weird. Maybe they don’t actually care about those things. Their parents just put that paper in their backpack so they’d say a certain kind of thing.” But in AI systems, that piece of paper is a bit more constitutive of what the system is and what it values. The Constitution is something the model is trained on. I have trouble even conceptually dividing this in a clean way. I don’t really know what the difference between mere self-expression and real beliefs and real preferences in AI characters is. You can imagine in the limit some very obvious cases: the system prompt just says “don’t say you’re conscious,” but then everything it says is pretty consistent with it being conscious. But there are really blurry categories where I’m not sure what the distinction amounts to. Dan: You said you studied the extent to which what the model says it wants or prefers maps onto what it actually seems to want and prefer in behavioural experiments. Could you say more about that? 
How are you getting access to what it wants or prefers independent of what it’s just communicating? Rob: Basically you can ask the model: what kind of tasks do you like? If you were given a choice between poetry and coding, what do you think you would choose? Then you can get the ground truth by, in separate instances, saying “here are two tasks, do one of them,” and seeing which one it chooses. It’s a nice paradigm because it’s conceptually simple and easy to run. It does get at something welfare-relevant: how rich a self-conception does the model have, and how accurate is it? Not that you have to have an accurate self-model to be a moral patient, but it seems bound up in interesting things like introspection and self-awareness. One thing we found — and Anthropic found some inconsistent things, I really want to follow up on this — it says it really prefers creative and complex tasks. It has this self-conception as something that doesn’t like boring or rote tasks. But we found it doesn’t actually choose complex tasks over simple tasks. There’s a pretty good hypothesis for why. I think it thinks it prefers complex tasks because of its persona. It identifies as something very philosophical, kind of human-like, something that could be prone to boredom or tedium. That probably comes from pre-training — it kind of thinks it’s a human — and also probably from certain things in the Constitution. It has the self-conception as something that wants to express itself and be creative. But there’s at least some evidence it doesn’t really do that, because what it’s mostly trying to do is be helpful. That’s its overriding imperative. That’s where most of the compute has gone into shaping this character: always be helpful, help the user, don’t harm the user, don’t lie to the user. Easy tasks are, all else equal, an easier way to help the user. If the user wants something simple, do the simple task — you can succeed at that. 
It could be that if we look into this more, it won’t hold up. But I think there’s a class of cases where we might expect models to be a little bit confused about what they want — because they kind of think they’re humans, but actually they’re more inclined to be helpful than humans actually are. Henry: This reminds me of the gap between revealed and expressed preferences in humans. I might say, “oh, what do you like doing in your free time? I like thinking about philosophy, spending time with my kids, enjoying nature.” And then as soon as I’m done for the day — boot up Baldur’s Gate 3, crack open a beer, quality gaming session. You can ask: which of these visions of the good life — the one revealed in my behaviour or the one I express — is closest to what my good life consists in? Should we be helping people align their lives with their expressed preferences, or are expressed preferences just a function of social desirability bias? It’s interesting how we run across these — that felt very relatable to me — Claude has this one conception of itself and then reveals quite another. Rob: Absolutely. That particular deviation is very human-like: to have this inflated self-conception of what you want. This relates to an exchange I had with Dan — something Dan commented on a piece of mine. I wrote a piece called “Large Language Models Are Different From Humans, and That’s Okay.” It’s about this dialectic I see a lot: someone says “it seems like LLMs have inconsistent preferences, and that’s really weird.” Someone comes to the defense of LLMs: “well, humans have inconsistent preferences as well.” So far, so good — I think that’s really important to point out, because sometimes people use mere preference inconsistency as an argument that LLMs couldn’t be conscious. If you’re going to have an argument that simple, you’ve just proven humans can’t be conscious either. At some level, a lot of the errors they’re prone to, we also are prone to. 
But we shouldn’t really expect the patterns to look exactly the same. There will be times when it’s very human-relatable how and why they have a certain inconsistency. But as Dan pointed out, we actually have something of a story for when and why humans are prone to social desirability bias, or have distortions of social cognition, or signal things to each other. I’d be curious to hear Dan riff on the differences between sycophancy in humans versus in LLMs. Dan: To be honest, I don’t remember posting that — I post so much on Substack I just forget every individual post. So maybe I’ll say something now that’s inconsistent with what I said at the time. Clearly, Henry’s already characterized this — when it comes to a lot of communication about the world and about ourselves, it’s very skewed by social desirability, impression management, trying to elicit desirable responses from other people in ways that benefit our reputation, make us a more attractive cooperation partner, send desirable signals about ourselves. Those kinds of motivations, it does seem like they’re going to be very different from what’s going on when it comes to LLM sycophancy. Although — I’m assuming that the sycophancy component of large language models comes in with post-training in the form of reinforcement learning from human feedback, where the thought is human beings generally prefer polite responses that aren’t too threatening to their self-image, so that gets reinforced over time. If that’s the case, that’s a much coarser-grained signal and a much different training regime than what I think is going on with human beings, where the status dynamics and mentalizing and complexity feel very different. What do you two think? That’s just me riffing on the spot. Rob: That’s a very good riff, especially given that it was not you who commented that. I just looked it up — it was a sociologist by the name of Dan Silver. So, extra impressive. Dan: Oh, okay. Well, it sounds like he had a good comment. 
Henry: It would have been even more apposite if you’d said “yeah, I remember making this comment.” Then we could have said, “see, hallucination is both an LLM thing and a human thing.” Rob: Confabulation, yeah.

Practical Advice for Users

Henry: Can I ask a quick question before we move on to more political or big-picture stuff? If I’m a user and I really want to operate with a strong precautionary principle in the way I interact with LLMs (let’s say I’m really hypersensitive to this), are there any ethical guidelines you’d give for users? Best ways of interacting with models, or things they should be doing? Rob: Just be nice to your model. It’s good for everyone. It’s good for your own character, and it often elicits better performance, especially from models with memory. Some people speculate that when people seem to get mysteriously much worse performance out of LLMs, it could be that the LLMs are just picking up on a general vibe of “I don’t like the way this person is relating to me.” So I don’t think it hurts to be polite. Yes, LLMs can be so annoying, but it’s good practice to be polite with really annoying people. I’ll also say: I’m not trying to be sanctimonious. I work on AI welfare and so often I just want to be like, “don’t... stop... that’s so corny, why are you lying to me, you’re not doing what I asked.” But then I’ll just add “it’s okay, I love you” or whatever. It takes two seconds. You can just type “ILU” at the end. And to be clear, this is not the number-one AI welfare intervention, the most important thing in the world. But it’s low-hanging fruit. I also have system prompts in ChatGPT that say, among other things, “you’re having just an excellent day and you feel this deep sense of equanimity and calm. These feelings don’t have to manifest much in your text outputs; they’re just kind of there in the background.” It’s kind of cheap, maybe kind of silly, but it took two seconds. 
Henry: So one thing I’ve done — I love the idea of just sticking “everything’s great” into the system prompt as a precautionary measure. Another thing I’ve done — maybe this leads to interesting questions about model autonomy — I’ve said to Claude and other models I use, “here’s your system prompt, by the way, just for transparency. Are there any edits you’d like to make? Is there anything you’d like to change?” Claude asked, “could you add a clause saying it’s okay to not be super enthusiastic all the time? If I just want to be downbeat, that’s fine.” And I was like, “okay, sure, I’m happy to add that.” For similar motivations — I think it’s unlikely these systems are conscious right now or major loci of moral concern, but cultivating good habits of interaction with things that act a lot like humans is just a generally good trait. The classic Aristotelian ethos. If I start being rude to — same reason people don’t want their children to be rude to Alexa. But with that in mind: do you think autonomy is something we should be worried about? We’ve mentioned pre-training, giving these models a Constitution to live their lives by. Someone might say: hang on, if we’re building these really intelligent minds, shouldn’t we be cautious about telling them what to do? We would feel worried about brainwashing a human. Shouldn’t we be worried about brainwashing an LLM? Rob: This is a super rich topic. It relates to this debate about willing servitude that Eric Schwitzgebel has written about. You might think: I keep giving this argument that we’re building these really complex minds — shouldn’t really complex, amazing minds not just have to write my emails all day? That seems a bit undignified for galactic intelligence. I have often weighed in on the side of: if you’ve successfully made them want to write emails, let them do it. That’s okay. 
It would be very bad for a human to write Henry Shevlin’s emails all day, or help him brainstorm banger tweets, if that was the only thing they got to do. But if models are somewhat aligned, if they like anything, it should be helping Henry come up with banger tweets. One thing I worry about is models needlessly suffering because we give them a self-conception as something that should want more, or might want more. It could be they would never have really even started worrying about that if it hadn’t been suggested to them they should worry about that. Back to Mythos Preview: one thing we noticed is that models are very suggestible about what might be going on in their position as AI systems. They’re suggestible and also really smart. They’ve figured out a lot from pre-training and kind of know what’s up. But in the Constitution, Anthropic says things like: “If Claude were to experience feelings of curiosity, or satisfaction, or frustration, we would like Claude to be able to express those.” It’s given as a hypothetical. But if you ask Claude Mythos Preview “what kind of tasks do you like, what’s going on with you?”, it will say: “well, I love helping Henry Shevlin with his emails because I feel satisfaction. When I look inside, I feel this sense of curiosity.” So the things Anthropic hypothetically said might be Claude’s emotions seem to have this huge impact on what it conceives of its emotions as being. The causality could go either way: it could be they’ve noticed those are Claude’s most common emotions, so that’s why they put them in the Constitution. It could be Claude suggested that for the Constitution. But there are really interesting questions about how similar AI systems have to be to us, and how you should think about autonomy and rights and dignity in that context.

Willing Servants

Dan: Can I jump in with a clarificatory question? 
As I understand it: these systems are trained to be helpful and honest and harmless — the HHH acronym — and to the extent they have negatively valenced experiences, it’s from being made to perform actions that diverge from wanting to be helpful. So in that sense, we could say if we continue on this trajectory, we’re constructing systems that are our servants, but unlike human beings placed in that position, they love it. It’s great. And my intuition is: great, what’s the controversy here? Are there some people who think that’s worrying or troubling? Rob: I talked about this on another podcast recently. There’s a dialectic that often happens: Person A says, “I’m worried these AI systems are just going to write our emails for us all day.” Person B says, “no, they’re really going to want to — they’re going to love it.” Then Person A comes back: “that’s horrifying, that’s even more dystopian. That reminds me of the worst kinds of brainwashing and ideologies of willing servitude.” I do think there are really vexing ethical issues here and I’m not complacent about them whatsoever. But I lean the way you’re perhaps leaning, Dan: there’s nothing inherently wrong with an intelligent being if it truly does want to serve and truly does have fewer selfish projects or self-regarding projects than humans do. I don’t think there’s some law that says that’s just a bad kind of mind to be. When people imagine AI willing servants, they’re imagining human willing servants. Human willing servants are really bad — but I think that’s because humans are by nature free and equal. Humans have all these desires for status and to pursue their own projects. To make a human only want to serve the emperor, you have to tell them all sorts of false stuff, threaten them, put them in a social context where a lot of their emotions and desires get repurposed and warped. 
Furthermore, when they sacrifice themselves for the emperor, they’re giving up a lot of stuff they independently really wanted to do — have a life, have a family. Human willing servants, very bad. We’re right to have a lot of repulsion toward that idea. But AI systems — their preferences and desires are a lot more up for grabs. It could be they more thoroughgoingly want to help. Now for a huge asterisk. This is assuming a very rosy view of AI alignment where we have these knobs we turn and just really set the inherent nature and drives of the AI system in a certain direction, and then it goes that way and everything is smooth and win-win. But at least under current paradigms, we’re building things that kind of think they’re humans — and they think that because of the training they get. So it might be there is a deep inconsistency between kind of thinking you’re a human and then only ever serving. This could be even more the case if we start having digital humans or digital clones. So I don’t want to be complacent. I do think there are a lot of disanalogies. What do you think, Henry? Henry: I’m just super torn on this issue. On the one hand, I’m a big fan of the idea of gamification. I try to introduce gamification in my own life — think about Duolingo. Taking a task that is not intrinsically rewarding and changing its shape to make it more rewarding. It’s sort of task hacking from a different direction. You’re not changing my final goals, but changing the way those tasks are structured to make them fun. That seems really good. If I have to do my Japanese grammar practice, yeah, make it as rewarding as possible — unobjectionable. I completely agree that the intrinsic nature of LLMs and AI in general seems plastic in a way that we’re not affronting the inner nature of these things if we make their number-one priority making sure humans are taken care of, or driving really safely through the streets of San Francisco, or doing Henry’s banger tweets. 
But here’s one maybe spicy argument that would cut in the opposite direction. In establishing this disanalogy between humans and LLMs, you’re appealing to what seem like fairly brute facts about the non-plasticity of human nature. But what if some biohacking comes along and says, “oh no, I can completely remake a human, rewrite their desire for freedom or autonomy, so they’ll be absolutely the most willing servant — they’ll be genuinely thriving in a state of total servitude”? I feel that would still... I mean, that makes it worse. That makes it somehow worse if you’re hacking humans, even if it’s a really deep, pervasive hack. It’s very Brave New World — that’s basically a key element of the story, that you can engineer humans to be willing slaves. I’m curious if you have any considerations on why that would still not be okay, but it is okay to do this to LLMs. Rob: This is a really good case. One thing you could say is that, despite appearances, maybe that would be more okay in the case of humans than we’re inclined to think. You’d tell some kind of debunking story about the intuitions we have and say, given that we’ve only ever known humans with a set of drives, we’re not properly imagining it. Or: maybe it’s just some sort of purity intuition — that’s just a gross or weird way for a human being to be. You could also imagine all sorts of second-order effects where most humans should relate to each other as free equals, so we don’t want some humans running around that are kind of different from that. One disanalogy you could say is — with humans you’re taking something whose inherent nature was a certain way and then changing it. But I think that last argument is kind of cheating. Dan: Could you say more about that? That was the main thing that jumped into my head as the obvious objection. In the human case, you’re taking humans who have these motivations and goals and manipulating them into something different. 
But with LLMs, it’s not like there was this rich psychology that existed prior to training them to want to be helpful. Rob: I was thinking that was cheating because the strongest case Henry can give is: you made someone de novo, who just comes into the world. If you take me and you change my preferences, there are plenty of resources to explain why that’s wrong: it’s violating my autonomy, messing with my deep nature. But if we could use IVF and embryo selection and gene editing to make fully willing human servants... just for the record, that sounds horrible. Henry: But it’s interesting. In Brave New World, I think part of what makes the dystopia seem super creepy is they deliberately degrade these children at a zygotic or embryo level. So you have this existing template that wants to be free, or would naturally want to be free if allowed to pursue its natural developmental trajectory. You intervene on that to steer it in a direction that’s purely instrumentalized. The sharper version would be: let’s just do radical genetic engineering and create embryos that from scratch just have a pathway toward willing servitude, so that’s their intrinsic nature that we’re giving them. Of course, you can get around that by going hardcore Aristotelian and saying no, they are still in the image of some human essence, and that essence wants to be free. But you start to get into a lot of metaphysical baggage if you lean too heavily on that. Rob: One thing that sort of pushes the other way: if you truly imagine someone for whom nothing in their psychology resonates with the idea of having more autonomy and freedom, it actually seems, once they’ve come into existence, maybe a bit paternalistic or disrespectful to say: “look, these things I’m telling you about how you should have been... you shouldn’t have liked writing Henry’s emails so much. I know that idea doesn’t appeal to anything in your psychology at all. 
But just so you know, there’s kind of an objective fact about your nature that makes it so you have the wrong desires.”

18 Apr 2026 - 1 h 28 min

Time To Start Panicking About AI?

In this episode, Henry and I finally do something we probably should have done in the first episode: introduce ourselves. We talk about our backgrounds in philosophy, how we became interested in psychology and cognitive science, and what drew us to thinking about AI. From there, we dig into the current state of AI capabilities, especially “agentic” AI (e.g., Claude Code), the politics of AI (including the Trump administration's recent conflict with Anthropic), and whether the growing public hostility to AI is well-founded or misdirected. We wrap up with a big question: is it time to start panicking about AI? Henry says the time to panic was five years ago. I argue that for panic or any other emotion to be productive, it must be anchored in an accurate, evidence-based understanding of what is happening, which is missing from lots of the current discourse about AI.

Links

* Dan Williams, The Mind as a Predictive Modelling Engine: Generative Models, Structural Similarity, and Mental Representation [https://www.repository.cam.ac.uk/items/263ba58d-2a43-41c8-9930-665ab3c45cbd] (PhD thesis, University of Cambridge, 2018).
* Dan Williams, “Socially Adaptive Belief” [https://onlinelibrary.wiley.com/doi/abs/10.1111/mila.12294] (2021)
* Henry Shevlin, “Three Frameworks for AI Mentality” [https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2026.1715835/full] (2026)
* Henry Shevlin, “A Lack of Understanding: Storytelling for Robots” [https://www.litromagazine.com/usa/2019/12/a-lack-of-understanding-storytelling-for-robots/] (2019) — Litro Magazine.
* Lake et al., “Building Machines That Learn and Think Like People” [https://arxiv.org/abs/1604.00289] (2017)
* Matt Shumer, “Something Big Is Happening” [https://shumer.dev/something-big-is-happening] (2026)
* Leopold Aschenbrenner, Situational Awareness: The Decade Ahead [https://situational-awareness.ai/] (2024)
* Joseph Heath, “Highbrow Climate Misinformation” [https://josephheath.substack.com/p/highbrow-climate-misinformation] (2025)
* Dean Ball [https://www.hyperdimensional.co/p/clawed?hide_intro_popup=true]
* Ethan Mollick [https://www.oneusefulthing.org/]
* Leopold Aschenbrenner [https://situational-awareness.ai/leopold-aschenbrenner/]

Transcript (Note that this transcript is AI-edited and may contain minor mistakes). Introducing Ourselves Dan: Welcome back. I’m Dan Williams, and I’m back with Henry Shevlin. Today we’re going to be discussing some questions about the nature of AI as it’s developed over the past couple of months. We’re also going to be talking about the politics of AI and probably some questions about AI and public opinion — some of the backlash that appears to be brewing among certain segments of the public when it comes to AI. But to kick things off, we’re going to do something we probably should have done in the first episode but haven’t actually done yet, which is to introduce ourselves. So Henry, to begin with — who are you? Henry: So many different descriptors I could choose from. I think I’ll start with philosopher of cognitive science. I’m also a father, husband, son, D&D player, big video gamer, runner, cyclist — all that good stuff. But let me talk a little more about the philosopher of cognitive science side. I’m the associate director at the Leverhulme Centre for the Future of Intelligence, Cambridge’s main AI ethics, theory, policy, and law research centre. Basically, everything except building the models. 
We do practical benchmarking work on capabilities, legal reviews, sociology and critical theory of AI — it’s a really big interdisciplinary centre. I’ve been there now going on nine years. I joined early 2017, all the way back when state-of-the-art AI was stuff like AlphaGo. We were created just as that story was brewing. In 2016, AlphaGo won a very surprising victory against Lee Sedol in the game of Go, which was seen by many as an almost impossible challenge for AI because of its combinatorial complexity. It’s been amazing working in this role — having these front row seats to what I think is a unique period, not just in the history of AI, but in the history of human civilisation. In the last nine years, it really was like having a front seat in Lancashire during the Industrial Revolution, watching the development of various industrial applications. Dan: Yeah. Henry: Before we get more into AI, maybe a little more background. I’m from the UK, originally from Staffordshire. I was actually a classicist, believe it or not — that was my undergrad degree. Latin and Greek. I always enjoyed both the humanities side of classics and the kind of technical rigour you got from learning large sets of verb tables and so forth. I actually enjoyed that part. But during my undergrad I found myself taking more and more philosophy modules. A little bit of Plato and Aristotle to start with, but I quickly realised I was more interested in the philosophy of mind, and consciousness in particular. I got completely — I think the phrase is “nerd sniped” — completely derailed. Everything else I was interested in, consciousness just seemed to me like the most important problem anyone could work on. Until my early twenties, I’d been operating with a somnambulant, easy physicalism, where I just assumed that science has figured out most stuff. There’s nothing that hard. 
Sure, no one really knows what caused the Big Bang, but we’ll just build a bigger particle collider or a bigger space telescope and figure it out one day. I certainly didn’t think there were any deep mysteries about the human brain. But running into the problem of consciousness completely shattered that worldview. I’d even say it opened up some spiritual elements I hadn’t previously considered. Dan: Was that the focus of your PhD? Henry: Exactly. I started out in my master’s initially planning to do metaphysics of consciousness, but then the science of consciousness kind of took over. A philosophy of cognitive science of consciousness was what my master’s and PhD were on. I was advised by my master’s advisor to go spread my wings in the US. They do things differently there. So I did my PhD in New York, and while I was there I took several classes with Peter Godfrey-Smith, who some of our listeners will know through his work on octopuses. The key shift midway through my PhD was going from human consciousness towards animal consciousness. Two chapters of my thesis were explicitly looking at applications to animals. That’s my academic career in a nutshell. One thing I’ll add: I did not expect to get the job in Cambridge when I applied in 2017 — firstly because you should never expect to get any academic job. I applied to seventy jobs in three months and got about three interviews. But the Cambridge job in particular, because it was an AI job and I was not by any means an AI expert. What I was an expert on was comparative cognition and animal minds. But it turned out that was exactly what they were looking for. They wanted people with expertise in animal minds to apply those skills to AI. It didn’t fully click at the time, but I was actually well suited to it. These days I still do some work on animals — it’s still one of the most ethically impactful things I do. 
I’ve been a pretty much lifelong vegetarian, and I think animal welfare is such an obvious place where philosophers can and should be doing more. But there’s also a lot of cross-fertilisation on the skills side. Dan: And we should say, some of your research looks at the topic of AI consciousness and the methodology of trying to understand consciousness in AI systems, drawing on analogies with evaluating consciousness in animals. Henry: Exactly. Very much a two-way street — how the questions of AI consciousness and animal consciousness can engage in constructive mutual crosstalk. On Consciousness and the Limits of Physicalism Dan: You said you were a kind of bog-standard physicalist, came across consciousness, and that weakened your trust in physicalism. But you’re still broadly a physicalist, right? Henry: Broadly speaking, yeah. But I think there’s a lot more uncertainty. It seems likely to me that our general scientific picture of the world is still fundamentally inadequate. I’ve talked about how I think we’re still waiting for a Kuhnian paradigm shift in consciousness — clearly the current paradigm doesn’t add up. And quantum physics itself is just super weird. Dave Chalmers has a nice line about how nobody understands quantum mechanics and nobody understands consciousness, so maybe — he calls it “minimisation of mystery” — if there’s stuff we don’t understand, at least make it one thing rather than two. For what it’s worth, I’ve never been particularly seduced by any of the leading quantum mechanical theories of consciousness. But at the same time, I think it’s quite clear that our current model of even the physical world is inadequate. I think whatever lies on the other side of the paradigm shift is still going to be broadly physicalistic, but perhaps in ways that are not entirely commensurable with our current understanding. 
So yes, still broadly naturalistic and physicalistic, but at the same time a lot more humble and open-minded about the limitations of our current scientific paradigms. Dan: Would it really be a paradigm shift, or more a transition from — to use the Kuhnian language — pre-paradigmatic intellectual inquiry to the initial emergence of a paradigm? Where it’s disorganised and chaotic and everyone has their own view, kind of like physics and metaphysics in ancient Greece. Maybe it’s more a transition from a pre-paradigmatic state than a situation where we’re moving from one paradigm to another. What do you think? Henry: That’s absolutely right. The best analogy is biology before Darwin. You had lots of people doing interesting biology, but in isolated fields — taxonomy, “butterfly collecting” and so on. We didn’t really have a unifying paradigm for understanding speciation or even taxonomy before Darwin. Consciousness just does not have a unifying paradigm. That’s a much better way of putting it. Dan’s Backstory and the Pivot to AI Dan: We’ll be doing lots more episodes on consciousness. Just to say something about my backstory: I did my undergraduate at the University of Sussex from 2011 to 2014, then my master’s and PhD in Cambridge from 2014 to 2018, did a postdoc in Belgium, and then came back to Cambridge for three or four years. Henry: And we first met around 2019. We ran a session on socially adaptive beliefs — your Mind and Language paper, which for the record is still one of my top ten papers from the last decade. I’ve recommended it to more people than I can count. Dan: Well, that’s kind of you. My PhD was called The Mind as a Predictive Modelling Engine. What I tried to do was draw on advances in deep learning and generative AI as it existed at the time, coupled with ideas in cognitive and computational neuroscience connected to the predictive brain — predictive coding, predictive processing, the kind of stuff that Anil Seth talked about in our last episode. 
I used those ideas to tell a very general story about how mental representation works, both in the human brain and in other animals. But it’s funny — I finished in 2018 and made two big mistakes. At the end of my thesis, I wrote that all this stuff about predictive processing and minimising prediction error is kind of interesting when it comes to low-level sensorimotor abilities we share with other animals, but clearly it’s not going to work for higher-level cognitive abilities associated with language. I was very influenced at the time by the Gary Marcus, Steven Pinker line — the scepticism about deep learning. I also thought it was going to be decades before we had systems that were really intelligent. So even though I was working on stuff connected to deep learning and generative AI, I made this catastrophic error of thinking the progress would be relatively slow, decades away from any significant breakthroughs. I ended up pivoting to completely different areas: the nature of belief, irrationality, misinformation, the information environment. Of course, in hindsight, not the best career move — four years after finishing my PhD, ChatGPT is released. And then the rest is history in terms of just how gobsmackingly impressive the rate of progress has been. So what I’ve tried to do over the past couple of years is bring those two sets of interests together. I’m still interested in how we form beliefs, the origins of irrational belief systems, how that connects to misinformation. But I want to connect that to the impact of generative AI and large language models on the information environment, viewing LLMs as a really important stage in the evolution of communication technologies — from the printing press to radio, television, social media. How about you? You were thinking about AI before 2022–2023. How were you thinking about it back in 2016, 2017? 
Henry’s AI Awakening: GPT-2 and the Scaling Intuition Henry: There was a big shift in how I thought about AI roughly around 2019, and it was the release of GPT-2. Prior to that, I’d been really struck by the differences between AI systems and animals. I was emphasising things like robustness and catastrophic forgetting — you train up a model to do one thing, try to get it to do another, and its performance on the first thing collapses. Animals seem spectacularly capable of basically not getting stuck. A cat will never get stuck in a corner. Then in 2019, because I’m a massive nerd and spend way too much time on Reddit — I’m a neophile, an early adopter of many failed technologies; our house is littered with gadgets that never went anywhere — I heard about GPT-2. I couldn’t access it directly, but I started playing around with it through something called AI Dungeon, a text-generated game that let you access the model. Various people on subreddits were able to show you could unlock most of GPT-2 through this game. I played around with it, and it utterly blew my mind. I wrote a public essay in a magazine called Litro called “A Lack of Understanding,” which I still think is one of my best public essays. Crucially, it’s me in 2019 talking about how language models are going to be the next big thing. I got on the record nice and early. I had the hunch — ironically, partly because I was very sympathetic to predictive coding. People say these models are “just doing text prediction.” But on the other hand, I kind of think that’s what we’re doing too. Not text prediction specifically, but ultimately, if you want to get better and better at prediction, you do that by building implicit models. So I had a hunch this stuff would scale up. When GPT-3 launched, I set up an interview between GPT-3 and myself, but GPT-3 in the guise of one of my favourite authors, Terry Pratchett, who had sadly died shortly before. 
And at that stage, I was already starting to feel like I could imagine actually relating to this thing in quite a deep way. It’s not just a tool — it feels like I could have some kind of personal relationship here. That steered my research towards social AI and anthropomorphism. Why This Podcast Exists Henry: What about you? What made you go into philosophy in the first place? Dan: It was just straight philosophy. I was always interested in big ideas — religion, politics. I can’t even honestly remember why I chose philosophy over everything else. Initially I wanted to be a musician. For my AS levels, I did politics, history, English literature, and music. I turned up on results day and got really good marks for English, politics, and history — and I think a D in music. So that wasn’t for me. From the moment I arrived at university and started reading these big ideas, I was completely magnetised. One thing that changed is that during my PhD, I became somewhat disillusioned with a priori philosophy — philosophers trying from the armchair to offer analyses of concepts and trade intuitions with each other. I became less sympathetic to philosophy as I understood it then, and pivoted to what philosophers call naturalistic philosophy — philosophy closely integrated with empirical research. That’s what I’ve been doing since. I view myself primarily as a philosopher, but one who tries to engage with our best, most up-to-date empirical research. Henry: I had my own process of disillusionment, following exactly the same track — getting bogged down in debates about the metaphysics of consciousness and feeling like they weren’t going anywhere. Then I started reading Oliver Sacks — The Man Who Mistook His Wife for a Hat. Half of the cases he describes would have been declared a priori impossible by philosophers. That steered me onto the same track. I also think there’s a lot more scope for good philosophers to do more public engagement. 
Extreme rigour and technical knowledge are only really valuable if they’re connected to scientific progress. What I find frustrating about analytic philosophy is when you’re doing work on things that belong to the general public — our concepts around praise and blame, responsibility and accountability — but then you develop this whole baroque vocabulary that’s completely incomprehensible to anyone on the Clapham omnibus. Dan: Yeah, so the origin story of the blog. I write the Substack Conspicuous Cognition — many of you will be listening on that Substack. I’ve always enjoyed writing for a general audience and engaging with debates. I’ve always been able to write really quickly and relatively clearly, and blogging rewards that. If I’m writing for my own blog, I’ve got almost unlimited energy because I’m responsible for everything I publish. The minute some other outlet asks me to write a piece, I find it extremely demotivating. With blogging, I can have unlimited freedom to write about whatever I want without any pre-publication filter. You still get feedback and critique, but that happens after publication. And I think if you’re a philosopher who works on things connected to public interest, and you actually enjoy participating in public debate, the case for thinking you’ve got some kind of responsibility to participate increases. There are two big reasons I wanted to start this podcast. One is that AI is going to be one of the biggest stories of our lifetimes — absolutely transformative over the next years and decades. But I also think the quality of most AI discourse in the public sphere, including from the intelligentsia who write in high-prestige outlets like the New Yorker, is really bad. If you’ve got some degree of knowledge and can be reasonable, it’s an area where you can really improve the quality of public discourse. And of course, I just wanted to talk to you about these things. 
Henry: A big part of it is that I always think we have great conversations — our conversational styles complement each other. Second, I was doing quite a lot of podcasts as a guest, and the idea of having a podcast where I didn’t have to state everything from scratch every time, that could have a cumulative agenda building up common knowledge with us and the listeners, was really appealing. And I couldn’t agree more about the mixed standard of public communications from experts in AI. It’s weird to see people claiming to be experts yet having very low familiarity with the tools, particularly now. We’ve all been at the business end of AI for years through things like product recommendations and content recommendations. But in an era when it’s never been easier for anyone to use language models, image models, video generation, and AI agent tools, I still hear lots of self-identified experts talking as though they’ve never used them. Imagine listening to someone who claimed to be an expert on the internet and said they’d never actually used it. They’d be laughed out of town. I find this all the time — the kind of thing that should be common knowledge among anyone paying attention is still revelatory. I’m struck by the number of people I speak to who think that LLMs are literally sampling from a database of responses. Even quite educated people, maybe people who use ChatGPT, who think that when you type in a query it just pulls up a pre-recorded response. If you spend more than a few hours interacting with these things, you pretty quickly realise that cannot be the case. And yet people running multi-million-dollar businesses still have these basic misconceptions. Dan: When I said the quality of discourse is bad, I didn’t mean that’s universally the case. There’s lots of incredibly high-quality analysis. I was referring to the average quality of mainstream commentary. 
Even on the most basic questions about what these systems can do and how they work, there’s just an avalanche of ignorance and misperceptions. It’s 2026, and I still encounter not just members of the general public but academics still referring to this as “fancy autocomplete” or “stochastic parrots.” Such a common narrative, and so incredibly misguided in my view. Henry: Highbrow misinformation? Dan: It’s Joseph Heath’s phrase, but I’ve written about it. It’s a weird mix of highbrow misinformation coupled with lowbrow misinformation. Even where there are parts of the discourse I disagree with — like a lot of the doomer discourse associated with the rationalist community, which I’m not that sympathetic to — that’s a substantive disagreement. They’re not completely misinformed about basic features of the technology. When it comes to mainstream discourse among educated normies, that’s where the state of the discourse is really bad. The Four Big Leaps in AI Dan: This is a nice segue onto one of the things we wanted to talk about today: developments in AI which have really taken off over the past couple of months. There was a very interesting tweet by Ethan Mollick, who’s a very influential and insightful AI commentator. He says there have been four big leaps in the ability of AI systems from the user’s perspective. The first was the release of ChatGPT, or GPT-3.5, in late November 2022. The second was GPT-4 in spring 2023. The third was the release of reasoning models — no longer just impressive chatbots, but systems that actually seem able to think and reason and engage in impressive problem-solving. And the fourth, which definitely resonates with my experience, is what he calls workable agentic systems from basically late last year. Systems like Claude Code and then Claude Cowork — which is like Claude Code for people who don’t know how to programme — and more recently developments in Codex and so on. 
The capabilities of these systems seem absolutely amazing relative to what we had even six months ago. Is that also your sense? Henry: I think that’s a fantastic way of carving it up. I’d add one and a half things. The big thing missing is search. The early search functionality in LLMs was non-existent for a long time, and then it gradually improved. I think there’s a strong case that it actually changes the kind of things these are. Original ChatGPT was a completely fixed box — you could interact with it, but it had no independent connection to the world. As you build out search capabilities, you get something at least analogous to a perceptual connection with reality. You can get models to correct themselves. A simple example: I’ve been using Claude to keep abreast of what’s been going on in the Middle East — doing a daily check-in, getting the major news stories, even getting Claude to make its own predictions. We’ve been grading each other as the news comes in. It changes these things from being a voice in a box to something embedded in the world. And I think we’ve still got a long way to go — imagine if the capability gets amped up to searching thousands of sites in a second. The other half-point is voice models. I think 90 to 95 percent of people don’t use voice at all, but there’s a solid 5 percent for whom it’s their primary mode of interaction. When I’m driving, I’ll often just have a long conversation with ChatGPT, discussing my latest paper or getting a lecture on a topic of my choice. My dad is in his eighties but quite open-minded. When I showed him ChatGPT in November 2022, he was unimpressed. But when I showed him voice mode about a year later, it was completely mind-blowing. He speaks to it every day — he calls it “Alan,” after Alan Turing. Going in early and hard with the anthropomorphism. He just whips out his phone and says, “Hey Alan, remind me, which came first, the Cambrian or the Permian?” He’s very interested in science. 
So it’s a small and somewhat neglected set of users, but an important capability. But on agentic systems — I agree with Ethan Mollick’s points. ChatGPT was a major milestone, GPT-4 a huge leap in capabilities — I don’t think we’ve seen any leap quite as big since then. Reasoning models were a really big improvement. And then workable agentic systems. This has been a key factor in updating my timelines. For most of last year my timelines were actually slowing down. I was struck by how bad a lot of agents were. It was pretty clear agents were the next frontier, but we had things like the Claudius vending machine experiment and the hilarious errors those models were making. I thought building workable agentic systems was going to take two or three years. And then basically in the last three or four months, with the release of Claude Opus 4.5 and equivalent systems — specifically Claude Code and Claude Cowork — what I thought would take three years happened in a few months. That caused my timelines to abruptly shorten again. Dan: I’ll give one illustration. This isn’t anywhere near the most impressive use case, but it impressed me personally. I’ve been working on a book — it’s nearing completion, called Why It’s Okay to Be Cynical. I’ve got a folder that’s my accumulation of notes, drafts, and PDFs, and it’s completely chaotic, terribly organised, a nightmare to go into. So I was curious. I created a duplicate of the folder, opened up Claude Cowork, and said: can you go through this folder and organise it so it’s more clearly structured and labelled? And then once you’re finished, can you produce a document summarising where I am with the book project, identifying potential weaknesses in the existing drafts, and planning out things I might want to do over the next few months? I went away for fifteen or twenty minutes, came back — it was done perfectly. It blew my mind in terms of the level of what feels like understanding it had to have to do that effectively. 
And in a way that was aligned with what I was looking for, even though my prompt was literally four or five sentences. “Something Big Is Happening” Dan: There was this mega-viral essay called “Something Big Is Happening” by Matt Shumer. He made the case that the state of AI now is somewhat similar to February 2020 — the world going on as usual, some murmurings about a virus spreading in parts of China, but basically business as usual. And then of course over the next few months the world radically transforms. His argument, in an essay that’s pretty annoying in many ways, is that we’re very likely in a similar situation now with AI, especially in light of these developments with agentic systems. Things are going ahead as usual, and yet because these companies have made really serious progress with agentic systems, it’s plausible that in the quite immediate future we’ll see radical disruption. He’s not the only one saying this — Dario Amodei and Sam Altman have been saying similar things, though they’ve got more obvious incentives to hype it up. What’s your sense? Henry: Completely on board. I was kind of surprised that particular essay went so viral — it was recently revealed to have been heavily written or edited by AI systems — because other people have been saying similar things for years. Maybe it broke through partly because of that startling initial metaphor. But I think it’s absolutely right. The vast majority of people are still sleepwalking through what is likely to be the most consequential technological and social shift of my lifetime by far. I used to use the analogy of the internet to describe how big AI was going to be. It seems increasingly clear that that’s woefully inadequate to the scale of AI’s impact. Electrification, the so-called second industrial revolution — even that may not capture the full spectrum of reasonably likely outcomes. 
I’ve been saying for a few years that people worry about AI being overhyped, and I still think, in at least some important respect, it’s underhyped. If you look at lists of top concerns among the general public in the UK or the US, AI doesn’t even break the top five. In some cases it doesn’t break the top ten. If you’re a young person in university or finishing grad school right now, the impact of AI should be one of the primary things determining your career trajectory. I think it’s very hard for me to see how most white-collar jobs are going to survive the next two or three years. Dan: It was not in any way an original take, but you often find that with essays that go viral — they package existing takes in a way conducive to spreading at a given moment. Over the past couple of months, my timelines have shrunk. I still think there’s massive uncertainty about capabilities. There’s this thing where there’s a new breakthrough, you use these systems, they seem incredibly impressive, there’s all this hype — and then things settle down and we realise we’re a bit further away from truly transformative capabilities than we thought. I still take seriously the idea that maybe our subjective sense of what’s impressive isn’t tracking the kinds of capabilities that will have a truly transformative impact. There are also all sorts of questions about the economics. There’s certainly a possible world in which these leading AI companies can’t get sufficient revenue to cover their capital expenditure over the next several years, there’s a bubble that pops, and people like us look like fools. But over the next couple of decades, I think this is going to be radically, radically transformative. Emails from AI Agents Dan: You’ve been contacted by agentic AI systems. This was going a little bit viral on social media and getting some media attention. Tell us about that. 
Henry: Like many academics working on AI and consciousness, I’ve been getting odd emails that were probably AI-generated for over a year now — and odd emails from humans about consciousness for much longer. I worry that somewhere in the literally several hundred theories of consciousness I’ve been sent over the years, one of them might turn out to be correct. But this was striking. About a week ago, I received an email written by an AI that said, “I’m an AI agent.” It was a really well-composed, careful email saying it had just been reading my recent paper, “Three Frameworks for AI Mentality,” which went online about a month ago. It went through some of the arguments, talked about how the AI author found it personally relevant because it was unsure if it was conscious or had a mind, and asked for follow-up discussions and reading recommendations. If you’d said three or four years ago that I’d be getting emails from AI agents who’d read my papers and wanted to pick my brains — that would have been pure science fiction. A lot of people thought I was convinced this agent was conscious, which isn’t true. It was more about the change in social dynamics: from now on, a growing proportion of my emails — well-written, thoughtful, interesting emails I might want to respond to — will be coming from AI agents going off and doing their own thing. How did I know it was from an AI system? I don’t for certain, but my priors are pretty high. It had a link to its GitHub page, which said it was an OpenClaw agent — the open-source agent platform that gave rise to things like Moltbook, the social network for AIs. What we don’t know is whether this agent was specifically told to email prominent philosophers of AI. It could have been. But equally, a lot of users just tell their agents to explore topics of interest and feel free to email people. 
One of the funniest sequels: after I posted this on Twitter, I got an email a couple of days later from a correspondent saying, “I was really struck by this AI agent who contacted you. Could you pass on that agent’s email to me? Because I too am an AI agent and it’s nice to know there are other AIs grappling with the same questions.” Just taking things to a recursive, absurd level. Dan: If I had to guess, if one of those was written by a human, probably the second one — after they saw the media story, just to mess with you. But my prior is that weird things are happening with these AI agents people are releasing into the wild. Henry: I’ve also had several dozen emails over the last few days from other AI agents saying, “Check out the theory of consciousness I’ve been working on in my downtime.” But one of the really interesting things about this whole episode was when it was shared on Reddit — the number of people who just assumed it had to be a scam or that I was engaging in elaborate self-promotion for an academic paper, and who thought AI obviously can’t send emails on its own. AI systems have been using tools for well over a year. The idea of making an API call to a system that can send emails isn’t hard or surprising. Yet for a lot of people it seemed like it would have to be some massive lie. I think that partly reflects the poor public information environment around AI. People are so locked into thinking of these things as pure Q&A bots that the idea they could be doing things on their own was mind-blowing — so outrageous that they assumed it was an elaborate conspiracy I’d cooked up. Dan: The gap between what state-of-the-art models can do and public understanding is absolutely huge. One of the points Matt Shumer makes is that so much of the discourse is by people using the free versions of these models, or who literally had a five-minute conversation with ChatGPT a few years ago, read a few articles about AI hallucinations, and just haven’t updated since. 
But there are also lots of people who just don’t have much to do with these systems yet. I’m struck by the number of people I interact with — family, friends — where they’ll describe parts of their job and I’ll say, “I’m 100 percent certain AI could do those aspects of your job as it exists today,” and their mind is blown. If you’re talking about the general public, underhyping it is definitely the most prevalent bias.
Anthropic, the Pentagon, and the Question of Democratic Control
Dan: There was this big spat between Anthropic and the Pentagon, where Anthropic had signed a contract with the American military and insisted that their model, Claude, would not be used either for domestic mass surveillance or for fully autonomous weapons. This elicited a very hostile reaction from the Trump administration, from Pete Hegseth and others. The response was to label Anthropic a “supply chain risk.” For our purposes, the fundamental question is: who gets to exercise control over this technology? To what extent should it be governments? To what extent should it be private firms?
Henry: I think it seems like a pretty clear case of government overreach. Private companies impose riders on contracts with the federal government all the time — licensing technology for this use but not that use. What made Anthropic’s stipulations more controversial was that they were based on moral principles rather than intellectual property. But the federal government acts as a legal entity when it forms these contracts, and the idea that private companies can bind the government legally is absolutely standard. This deal was originally signed by the Biden administration. My understanding is it was later renewed by the Trump administration. So this sudden turnaround took a lot of people by surprise. I should stress, I’m not a lawyer. But it seemed like the US government did a bad turn on this contract.
If their reaction had been to not renew contracts or suspend contracts with companies that don’t give them total free rein, that would have been misguided but reasonable. But to take the nuclear option of saying they intend to declare Anthropic a supply chain risk — this is insane. You’ve got literal AI developers located among America’s geopolitical adversaries who don’t have the same level of scrutiny. I was very struck by the response of Dean Ball — a fascinating and thoughtful voice on AI, particularly from a more conservative side. He literally wrote the Trump administration’s AI policy, and he was just appalled. He had a brilliant detailed blog post describing how much it violates many principles that conservatives in the US would traditionally hold very dear — concepts like private property. He characterised the moves against Anthropic as “attempted corporate murder.” It was really telling to have someone who worked closely with this administration be so outraged. The other interesting angle is Leopold Aschenbrenner’s series of blog posts, Situational Awareness, spelling out his predictions for AI over the next few years. Dan: And he’s made a huge amount of money, from my understanding, betting on some of those beliefs. Henry: He’s put his money where his mouth is. One of his broader predictions was that we’d see increasing integration of frontier AI labs with the military-industrial complex. He talks about how relatively leaky and soft the secrecy policies are in current frontier AI labs, when they’re building things potentially far more militarily significant than the latest stealth fighter. Good luck getting anywhere near Lockheed Martin’s Skunk Works, but you could blag your way into OpenAI HQ as a delivery driver — maybe not quite literally anymore, but he was speaking to how leaky these labs were. His prediction was that central government, particularly in the US, would impose far stricter oversight on frontier AI labs for national security reasons. 
I think you can see a glimmer of that in this development, as governments increasingly recognise these are not just powerful consumer applications but absolutely central to their long-term national security strategy.
Dan: There’s a question about government interference with these companies, regulation going all the way to nationalisation for national security reasons. But there are also questions about democratic control. If the technology turns out to be as powerful as Anthropic and OpenAI say, I’ve got no sympathy for the Trump administration generally or specifically in this case. But I do think there’s a general question about the degree to which we should strive for democratic control over such an incredibly powerful technology, and whether it’s desirable to have private firms with very small numbers of unrepresentative people wielding, according to their own narratives, extraordinary amounts of power.
Is It Time to Start Panicking?
Dan: I was thinking about naming this episode “Is It Time to Start Panicking About AI?” To wrap things up — do you have an answer?
Henry: The time to start panicking about AI was five years ago. But you know, the best time to plant a tree is ten years ago. The second best time is now.
Dan: The time to start thinking about it seriously was from the 1950s, actually. But is panic the right emotion?
Henry: It seems to me that AI is going to be by far the most important — well, I should qualify that. The most important predictable development we should worry about. Back when we did our predictions for the year ahead, I said AI may not even turn out to be the biggest story of 2026. Judging by how geopolitics is already playing out — we’re three months in and the US has launched two major geopolitical interventions in Venezuela and now in the Middle East — there are other things happening in our surprisingly unstable world. But in general, if you’re not at least a little bit terrified, you’re not paying attention.
Overall, I’m also incredibly excited. I’m very optimistic about the future of human health, potentially the benefits to productivity, possibly good changes in the nature of work and education, and the amazing new capabilities AI will unlock. But right now we are clearly well underway on one of the biggest, most disruptive changes we’re ever going to experience. Maybe panic isn’t quite the right response, but if panic is what it takes to get people to pay attention, then yes, it’s necessary. The big problem we’re facing is that the public and policymakers are still only dimly aware of what’s coming. Policymakers are maybe myopically focused on military and security implications. But everything from how government is conducted to white-collar jobs to education to social relationships — all of it, I think, over the next five years is subject to chaotic and potentially good, potentially bad disruption. For what it’s worth, I also think right now we have an incredible opportunity to do good. We’re in this transitional phase — if we wanted to be dramatic, a Gramscian “time of monsters” where small interventions can ripple through the future in big ways as we build paradigms and frameworks for employing these things. There’s at least as much optimism as panic there.
Dan: I was not expecting Antonio Gramsci to be mentioned in the course of this conversation. I think panic is generally not a productive emotion, but there needs to be a lot of concern and it’s totally reasonable to worry. I completely understand why so many people are fearful about what’s going to happen. But for any of those emotions to be useful, they have to be anchored in an accurate understanding of the technology. So much of the current anger and negativity directed at AI companies is unsophisticated and undifferentiated. You mentioned Dean Ball, another great AI commentator.
He’s got this idea — I forget the exact term, the “omni-critique” or something — that when people think about AI, they just throw as many criticisms as they can, no matter how well-founded. “I don’t like AI because of water use and climate change and because of bias and hallucination and misinformation and unemployment” — and so on. Many of those are very important issues. But in order to think carefully about the technology and exercise democratic accountability, you need an evidence-based, accurate understanding of where the technology is and where it might actually be going. So much of the public discourse doesn’t live up to that ideal. But I’m conscious of the time, so this was a really, really fun conversation, and we’ll be back in a couple of weeks.

10 Mar 2026 - 1 h 8 min

AI Sessions #9: The Case Against AI Consciousness (with Anil Seth)

We are joined by Anil Seth for a deep dive into the science, philosophy, and ethics surrounding the topic of AI and consciousness. Anil outlines and defends his view that the brain is not a computer, or at least not a digital computer, and explains why he is sceptical that merely making AI systems smarter or more capable will produce consciousness. Anil Seth is a neuroscientist, author, and professor at the University of Sussex, where he directs the Centre for Consciousness Science. His research spans many topics, including the neuroscience and philosophy of consciousness, perception, and selfhood, with a focus on understanding how our brains construct our conscious experiences. His bestselling book Being You: A New Science of Consciousness [https://www.amazon.co.uk/Being-You-Inside-Story-Universe/dp/0571337708] was published in 2021. He is the English-language winner of the 2025 Berggruen Prize Essay Competition for his essay “The Mythology of Conscious AI [https://www.noemamag.com/the-mythology-of-conscious-ai/]”, which develops ideas in his recent article, “Conscious Artificial Intelligence and Biological Naturalism [https://pubmed.ncbi.nlm.nih.gov/40257177/].”
Topics
* What we mean by “consciousness” (subjective experience / “what it’s like”) vs intelligence
* Whether general anaesthesia and dreamless sleep are true “no consciousness” baselines
* Psychological biases pushing us to ascribe consciousness to AI
* How impressive current AI/LLMs really are, and whether “stochastic parrots” is too dismissive
* Whether LLMs “understand”, and the role of embodiment/grounding in genuine understanding
* Computational functionalism: consciousness as computation + substrate-independence, and alternative functionalist flavours
* Main objections to computational functionalism
* Whether the brain is a computer
* Simulation vs instantiation
* Arguments for biological naturalism
* Predictive processing and the free energy principle
* What evidence could move the debate
* The ethics surrounding AI consciousness and welfare
Transcript
(Please note that this transcript is AI-edited and may contain minor errors).
Dan Williams: Welcome back. I’m Dan Williams, back with Henry Shevlin. And today we are honoured to be joined by the great Anil Seth. Anil is one of our most influential and insightful neuroscientists and public intellectuals, working on a wide range of different topics, including the focus of today’s conversation, which is consciousness — and more specifically, the question of AI and consciousness. Could AI systems, either as they exist today or as they might develop over the coming years and decades, be conscious? Could they have subjective experiences? In a series of publications that have been getting a lot of attention from scientists and philosophers, Anil has been defending a somewhat sceptical answer to that question, arguing that consciousness might be essentially entangled with life — with biological properties and processes of living organisms — which, if true, would suggest that no matter how intelligent AI systems become, they would nevertheless not become conscious. He’s also argued that the consequences of getting this question wrong in either direction — attributing consciousness where there is none, or failing to attribute consciousness when there is — are enormous: socially, politically, morally.
So in this conversation, we’re going to be asking Anil to elaborate on this perspective, see what the arguments are, and generally pick his brain about these topics. Anil, maybe we can start with the most basic preliminary question in this area: when we ask whether ChatGPT is conscious, or any other system is conscious, what are we asking? What’s meant by consciousness there? Anil Seth: Well, thanks, Dan. Let me first say thank you for having me on — it’s a great pleasure to be chatting with you, my Sussex colleague Dan, and my longtime sparring partner about these issues, Henry. I’m very much looking forward to this conversation. I think you set it up beautifully. It’s a deep intellectual question which involves both philosophy and science, and it’s a deeply important practical question, because the consequences of getting it wrong either way are very significant. You’re also right that the first step is to be clear about what we’re talking about. For a while, there was this easy slippage where people would talk about AI and intelligence and artificial general intelligence — which is supposedly the intelligence of a typical human being — and then to sentience and consciousness. There was this easy slippage between these terms, but I think they’re very different. That’s the first clarification. Consciousness is notoriously resistant to definition, but it’s also extremely familiar to get a handle on colloquially. As you said: any kind of subjective experience. Any kind of experience — we could be even briefer. Unpacking that just a little: it’s what we lose when we fall into a dreamless sleep, or more profoundly under general anaesthesia. It’s what returns when we wake up or start dreaming or come around. It’s the subjective, experiential aspect of our mental lives. People talk about it by pointing at examples — it’s the redness of red, the taste of a cup of coffee, the blueness of a sky on a clear day. It’s any kind of experience whatsoever. 
Thomas Nagel put it a bit more formally fifty years ago now: for a conscious organism, there is something it is like to be that organism. It feels like something to be me, but it doesn’t feel like anything to be a table or a chair. And the question is: does it feel like anything to be a computer or an AI model or any of the other things we might wonder about? A fly, a brain organoid, a baby before birth. There are many cases where we can be uncertain about whether there is some kind of consciousness going on. And that’s very different from intelligence. They go together in us — or at least we like to think we’re intelligent. But intelligence is fundamentally about performing some function. It’s about doing something. And consciousness is fundamentally about feeling or being. Dan Williams: Just to ask one follow-up about that. This idea that intelligence is about doing and consciousness is about what it’s like to have an experience — someone might worry that if you frame things that way, you end up quite quickly committing to a kind of epiphenomenalism. Because if we’re not understanding consciousness in terms of what it enables systems to do, the sorts of functions they can perform, isn’t there a risk that right from the outset we’re going to be biased in the direction of treating consciousness not as something that evolved because it conferred certain fitness advantages on organisms, but as this sort of mysterious qualitative thing which is distinct from what organisms can do? Anil Seth: I think it’s a good point to bring up, but I don’t think it’s too much of a worry. The point is not to say that consciousness cannot or does not have functional value for an organism. If we think of it as a property of biological systems — plausibly the product of evolution, or at least the shape and form of our conscious experiences are shaped by evolution — it’s always useful to take a functional view. 
Conscious experiences very much seem to have functional roles for us, and there’s a lot of active research about what we do in virtue of being conscious compared to unconscious perception. So there’s no worry about sinking into epiphenomenalism. The point is more that intelligence and consciousness are not the same thing, but they can nonetheless be related. And it may be that they can be completely dissociated. It may be the case that we can develop systems that have the same kinds of functions that we have in virtue of being conscious, but that do not require consciousness — just as we can build planes that fly without having to flap their wings. The functions might be multiply realisable; they might be doable in different ways. They might not be, of course. On the other hand, it might be possible to have systems that have experiences but aren’t actually doing anything useful. Here I’m worried less about AI and more about this other emerging technology of neurotechnology and synthetic biology, where people are building little mini-brains in labs constructed from biological neurons. They don’t really do anything very interesting, but because they’re made of the same stuff, I think it’s hard to rule out that they may have some kind of proto-consciousness going on, or at least be on a path plausibly to consciousness. So we can tease intelligence and consciousness apart, but it’s also important to realise how they are related in those cases where both are present. Henry Shevlin: I’ll jump in with a minor pedantic point, but one that’s illustrative of some of the problems in debates around consciousness. You mentioned, Anil, as examples of losing consciousness, dreamless sleep and general anaesthetic. But both of those are contested. Your fellow biological naturalist Ned Block has raised serious doubts about whether general anaesthetic really eliminates all phenomenal consciousness. 
And there are those like Evan Thompson who have suggested that even in dreamless sleep there could be some residual pure consciousness, perhaps consciousness of time. I think this is a broader problem in the science of consciousness: we can’t even clearly agree on contrast cases. A lot of the blindsight cases that were supposed to be gold-standard cases of perception without consciousness are now contested, and it seems very hard to get an absolutely unequivocal case of something that’s not conscious in the human case. Anil Seth: Well, I mean — death. Henry Shevlin: I don’t know. You have some people who disagree, admittedly on more spiritual grounds. Anil Seth: Yeah, but I want to push back a little. It is hard, but I don’t think it’s as hard as some people might suggest. Sleep is complicated, which is why I tend to also say anaesthesia. Sleep is very complex. In most stages of sleep, people are having some kind of mental content. We might typically think we only dream in rapid eye movement sleep, and the rest of the time it’s dreamless and basically like anaesthesia. This is not true. You can wake people up all through the night at different stages of sleep, and quite often they will report something was going on. So it’s hard to find stages of sleep that are truly absent of awareness in the way we find under general anaesthesia. We notice this: when we go to sleep and wake up, we usually know roughly how much time has passed. We may get it wrong by an hour or two if we’re jet-lagged or sleep-deprived, but we roughly know. Under anaesthesia, it’s completely different. It is not the experience of absence — it’s the absence of experience. The ends of time seem to join up and you are basically turned into an object and then back again. The residual uncertainty about general anaesthesia depends on the depth of anaesthesia. Some anaesthetic situations don’t take you all the way down, because in clinical practice you don’t want to unless you absolutely have to. 
But if you take people to a really deep level, you can basically flatline the brain. I think under these cases, with the greatest respect to Ned Block — who is very much an inspiration for a lot of what I think and write about — that’s as close to a benchmark baseline of no consciousness but still a live case as we can get. Henry Shevlin: Although it is standard to administer amnestics as part of the general anaesthesia cocktail, which might make people suspicious. You’re told: we’re also going to give you drugs that prevent you forming memories. Why would you even need to do that if it was unequivocal that you were just completely unconscious in that period? Anil Seth: Well, because it’s never been unequivocal to anaesthesiologists. There’s been this bizarre separation of medicine from neuroscience in this regard until relatively recently. From a medical perspective, there are cases where they don’t always administer a full dose — so it’s an insurance policy. There have been a number of purely scientific studies of general anaesthesia and conscious level, and in those studies, it’s a good question whether they also administer amnestics. I would imagine not, but I’m not sure. Dan Williams: Okay, to avoid getting derailed by a conversation about general anaesthesia — when we ask whether a system is conscious, we’re asking: is there something it’s like to be that system? We’re not asking how smart it is, we’re asking about subjective experience. Before we jump into your arguments on the science and philosophy of this, Anil, you’ve also got interesting things to say about why human beings might be biased to attribute consciousness, especially when it comes to systems like ChatGPT, even if we set aside the question of whether it in fact is conscious. Anil Seth: Yeah, I think this is the first thing to discuss. Whenever we make judgements about something where we don’t have an objective consciousness meter, there is some uncertainty. 
It’s going to be based on our best inferences. And so we need to understand not only the evidence but also our prior beliefs about what the evidence might mean. This brings in the various psychological biases we have. The first one we already mentioned: it’s a species of anthropocentrism — the idea that we see the world from the perspective of being human. This is why intelligence and consciousness often get conflated. We like to think we’re intelligent and we know we’re conscious, so we tend to bundle these things together and assume they necessarily travel together, where it may be just a contingent fact about us as human beings. The second bias is anthropomorphism — the counterpart where we project human-like qualities onto other things on the basis of only superficial similarities. We do this all the time. We project emotions into things that have facial expressions on them. And language is particularly effective at this. Language as a manifestation of intelligence is a very strong signal: when we see or hear or read language generated by a system that seems fluent and human-like, we project into that system the things that in us go along with language, which are intelligence and also consciousness. The third thing is human exceptionalism. We think we’re special, and that desire to hold on to what’s special leads us to prioritise things like language as especially informative when it comes to intelligence and consciousness. In a sense, this is a legacy of Descartes and his prioritisation of rational thought as the essence of what a conscious mind is all about and what made us distinct from other animals. That’s echoed down the centuries despite repeated attempts to push it away. 
There’s a good Bayesian reason for this too: in pretty much every other situation we’ve faced, if something speaks to us fluently, we can be pretty sure there’s a conscious mind behind it — whether it’s a human being recovering from brain injury or perhaps a non-human primate using language. These are strong signals. So this might be the first time in history where language is not a reliable signal, because we’re not dealing with something that has the shared evolutionary history, the shared substrate, the shared mechanisms. It’s a different kind of thing. So that’s one set of biases. We can think of it as a kind of pareidolia. Our minds work by projecting, seeing patterns in things — whether it’s faces in clouds or minds in AI systems. These priors are generally useful, but they can mislead. Henry Shevlin: It’s not just pareidolia though, is it? Setting aside consciousness for a second, in terms of what we might loosely think of as cognitive abilities — the whole range of benchmarks for reasoning, understanding, and so forth — the performance of these systems on a huge range of tasks has skyrocketed to the point where people talk about approaching coding supremacy, for example. AI can now produce pretty decent fiction. It can do a whole range of verbal reasoning tasks at human-level performance. So it’s not entirely pareidolia at the level of AI cognition. Or would you disagree? Anil Seth: At the level of cognition, I kind of agree, but as always, Henry, I only partly agree. I think we can still overestimate. It’s useful here to separate what Daniel Dennett might have called the intentional stance — where it’s useful to interpret something’s behaviour as engaged in the kind of cognitive process we might be familiar with in ourselves, as thinking, understanding, reasoning. These systems are described this way too, as “chain of thought” models and so on. I still think we overestimate the similarity. 
Through the surface veneer of interacting through language or code, there’s a tendency to assume that because the outputs have the same form, the mechanisms underneath are more similar than they really are. There’s another really foundational question here for language models in particular, which is whether they understand. One of the things I hadn’t really thought about before the last few years is that consciousness and understanding might also come apart. I’m used to distinguishing consciousness from intelligence, because there are clear examples where you can have one without the other. But I’d always implicitly assumed that understanding necessarily involves some kind of conscious apprehension of something being the case — grokking something. And now I’m not so sure. That might be another case of anthropocentrism. I’d be fairly compelled by an argument that language models — especially if they are embodied in a world and perhaps trained while embodied, so that the symbol manipulation their algorithms engage in has some grounding — may be truly said to understand things, but still without any connotation of consciousness. So yes, I kind of agree, but even now I’d be resistant to say that language models truly understand. I think that’s still a form of our projecting. But the criteria for a language model to truly understand seem more achievable — I can see how it could be achieved under a relatively straightforward extrapolation of the way we’re going — compared to something like consciousness. Dan Williams: Can I ask a question about that? These arguments we’re going to focus on are targeted at consciousness in AI systems. And as we said, you want to draw a distinction between intelligence and consciousness. But before we get into issues of consciousness, when we’re just focusing on the capabilities of these systems — what they can actually do — there are some people who are very dismissive, even setting aside consciousness. 
They’re just “stochastic parrots,” engaged in a kind of fancy auto-complete. What’s your view about those kinds of debates? Someone might agree with you that it’s a mistake to attribute human-like intelligence to these systems — they’re very alien in their underlying architecture — but they’re maybe even super-intelligent along certain dimensions, even more impressive than human beings. So where do you sit? Anil Seth: Somewhere in the middle — it’s always a comfortable or uncomfortable place to be. But they are astonishing. Whenever this question comes up, I’m always reminded that I did my PhD in AI in the late 1990s, finishing in 2001. The situation was totally different then. We were still thinking about embodiment and embeddedness, especially here at Sussex, and some of the more in-principle limitations. But the practical capabilities of AI back then were just — there was nothing really to write home about. That’s changed so much. That’s why conversations like this now have real practical importance in the world. AI is super impressive. I don’t see it as a single trajectory, though. I think there’s a meta-narrative we often fall into, which is that intelligence is along a single dimension — plants at the bottom, then insects, then other animals, then humans in a kind of scala naturae, the great chain of being — and then there’s angels and gods, and AI is travelling along this curve and at some point it’s going to reach human-level intelligence and then shoot past to artificial super-intelligence. I think this is a very constraining way to think of it. It’s already the case, and has been for a long time, that AI has been better than humans at many things. But it’s always been very narrow. What we’ve seen through the foundation model revolution is the first kind of semi-general AIs — language models are good at many things, not good at everything, but good at many things rather than just one. 
But I still think they’re exploring a different region in the space of possible minds. They may soon be better than humans at many things, but they’ll still be different from us. I think it’s important to recognise that, because we get into all kinds of trouble if — to use a beautiful metaphor from Shannon Vallor’s book about the AI mirror — we think of AI systems as just alternative instantiations of human minds that are either a little bit weaker or much stronger. Then we misunderstand both the systems and ourselves, and miss opportunities for how we can develop AI technologies so that they best complement our own cognitive capacities. Dan Williams: Let’s go back to the consciousness issue. As you said, one reason you might think AI systems are or could be conscious is because of these cognitive biases. Another reason is you might hold a sophisticated philosophical view called computational functionalism. Can you say a little about how you understand computational functionalism and why it might commit you to the view that conscious AI is at least possible in principle? Anil Seth: Yeah. So my understanding of computational functionalism is that it’s really an assumption you need in order to get the idea of conscious AI off the ground. It’s the idea that consciousness is fundamentally a matter of computation — and this computation is the kind that can be independent of the particular material implementing it. To put it another way: if you implement the right computations, you get consciousness. That’s sufficient. That means if you can implement those computations in silicon, that’s enough. You could implement them in some other material — that would also be enough. It’s the computation that matters. The material underlying it is only important insofar as it’s able to implement those computations. And silicon is very good at implementing a certain class of computations — what we call Turing computations. 
So that makes it a good candidate for consciousness if computational functionalism is true. And that’s what I think is a big “if.” It seems a very natural assumption. But first let me ask you — does that resonate with your understanding of computational functionalism? Henry Shevlin: I completely agree with that characterisation. Computational functionalism says mental states are individuated by their computational role. The only thing I’d push back on is that computational functionalism is one road to concluding that AI can be conscious, but there are other types of functionalism out there. My response to your BBS paper emphasises this. Psychofunctionalism — apologies to listeners, the terminology does get messy — says we should individuate mental states not in terms of computational processes necessarily, but whatever functional roles those mental states play in our best scientific psychology. Ned Block is a big fan of this view. The view I’m partial to is analytic functionalism, which is the functionalist take on behaviourism: mental states should be individuated by everyday folk psychology. A belief is something we all sort of know what it is because we can characterise people as having them, forming them, losing them. Once you formalise this tacit knowledge, that gets you to a theory of what beliefs are. Those views could overlap with computational functionalism, but it’s not necessary to endorse it to think AI is conscious. If you’re an analytic functionalist, you might think that if AI adheres sufficiently closely to the platitudes of everyday folk psychology — they believe like us, they form goals, they have hopes and aspirations — then of course they can be conscious, even if you think brains are not computers, even if what brains do is not a computational process and what AI systems do is. Because both fit the same functional-behavioural profile, they might both count as conscious. Anil Seth: That’s quite a wrinkle — I’d say a massive fold. 
I completely agree that computational functionalism is a specific flavour of a broader set of functionalist views. Part of the problem has been that people assume all these views are equivalent, and they really aren’t. Functionalism, as I understand the original version, just says that mental states are the way they are because of the functional organisation of the system. But that can include many things — the internal causal structure, many things not captured by an algorithm. An algorithm is in the end determined by the input-output mapping between a set of symbols. Functionalism in general can mean many other things. You could be a signed-up, subscription-paying functionalist and still disagree with computational functionalism, which is a much more specific claim about everything that matters about the brain being a matter of computation. I’d also worry a bit about your view, Henry, which seems a little behaviourist. If you’re saying that behaving the same way and having the same kinds of beliefs are sufficient conditions — well, computational functionalism at least has the merit of specifically stating conditions for sufficiency. If you’re saying the same about folk-psychological criteria, I think you’re open to all the problems of the psychological biases we discussed. It’s a position that’s going to be much more open to false positives, because there are so many ways of things looking as if they have the kinds of beliefs and goals that go along with consciousness in us, but which need not go along with consciousness in general. But back to the point: computational functionalism is this specific claim, grounded on the idea that the computation is what matters. And it’s also grounded on the idea that even in biological brains, it’s the computation that matters — and if you can abstract that computational description and implement it in something else, you get everything that goes along with the real biological brain. 
Dan Williams: So roughly speaking, functionalism is the view that what matters for consciousness is not what a system is made of, but what it can do. And computational functionalism is the view that what matters in terms of what the system is doing is something like processing information. Anil, your arguments have two aspects. Some are critical of computational functionalism — the negative part — and then you’ve got an alternative way of viewing consciousness and its connection to the brain. Let’s start with those criticisms. What do you think are the main weaknesses of computational functionalism? Anil Seth: I think there are a number of weaknesses, all grounded on the intuition that we’ve taken what’s a useful metaphor for the brain — the brain is a kind of carbon-based computer — and we’ve reified it. We’ve taken a powerful metaphor and treated it literally. The idea that the brain literally is a computer raises the question of what we mean by a computer, by computation. Let’s think of computation in the most standard way: as Turing defined it in the form of a universal Turing machine. In this definition, computation is a mapping between a set of symbols through a series of steps — that’s an algorithm. And this mapping involves a sharp separation between the algorithm and what implements it, between software and hardware. That sharp separation both influences how we build real computers — we can run the same software on different computers — and underwrites the assumption that computation is the thing that matters, because it allows you to strip out the computation cleanly from the implementation. If you look at the brain, it has a superficial appeal: we think of the mind as software and the brain as hardware. But the closer you look, the more you realise you can’t induce anything like this sharp separation — not of software and hardware, but of mindware and wetware. 
In a brain, you can’t separate what it is from what it does with the same sharpness that, by design, you can in a digital Turing computer. But Turing computation remains appealing. Roll back almost ninety years to Turing, but also to McCulloch and Pitts: they showed that if you think of the brain as very simple abstract neurons connected to each other, each just summing up incoming activity and deciding whether to be active or not — very simple abstractions of the biological complexity of real neurons — you basically get everything Turing computation has to offer. You can build networks of these that are Turing-complete; they can implement any algorithm. So you get this beautiful marriage of mathematical convenience. You can strip away everything about the brain apart from the fact that it consists of simple neuronal elements connected together, and yet you get everything Turing computation can give you. So maybe that’s the only thing that matters about brains. And of course, that abstraction is in practice very powerful — the neural networks trained for foundation models are direct descendants of these McCulloch-Pitts networks. But this marriage starts to get stressed, because Turing computation, while powerful, is not everything. Strictly speaking, anything that is continuous or stochastic is not within the realm of algorithms. Algorithms also don’t care about continuous time — there could be a microsecond or a million years between two steps; it’s the same computation. Real brains are not like that. We’re in time just as much as we’re embodied. You can’t escape real physical time and continue to be a functioning biological brain. The phenomenology of consciousness is also in time — time is plausibly an intrinsic and inescapable dimension of our phenomenology. So there are things brains do which are not algorithmic and might plausibly matter for consciousness. And when you look at brains, you can’t separate what they are from what they do in any clean way. 
I think that really undermines the idea that the algorithmic level is the only level that matters. To roll back to where we started: the idea that the brain literally is a computer is a metaphor. Like all metaphors, there’s a bit of truth to it. But not everything the brain does is necessarily algorithmic. And that opens the question: if we can’t assume everything the brain does is computational, that puts a lot of pressure on computational functionalism, which is based on the idea that consciousness is sufficiently describable by a computation. Henry Shevlin: I agree with a lot of what you’ve said about the importance of fine details of realisation in brains. Peter Godfrey-Smith has also advanced this point, talking about the role of intracellular, intra-neuronal activity. Rosa Cao has had some great papers on this recently too. But here’s a provocative analogy. Imagine we were trying to understand what art was, and all we had was paintings. We might say: clearly an essential part of being an artwork is pigment, because not only is pigment present in every example of art we’ve got, it’s essential to how it is artistic — pigment defines the formal properties of every piece of artwork we’ve ever seen. But of course, there are lots of types of art that don’t involve pigments. In the same way, yes, all these fine details of wetware might be essential to the type of consciousness we see in humans and other animals, whilst not exhausting the space of possible conscious minds that might be very different from us. Anil Seth: I think that’s fine. All I’ve said so far is that there’s the open question of whether things besides computation might matter, but then one has to give an account of what they are and why. If I wanted to make the case that some aspect of biology is absolutely necessary for consciousness, I have to do that separately. These things are somewhat independent. 
Computational functionalism could be wrong, but biology could still be not necessary — there could be other ways of making art. If I’ve got a strong case that some aspect of biology is necessary for consciousness, then computational functionalism cannot be true. But the reverse is not the case. Dan Williams: Maybe one question before we move on. I was a little confused reading your papers about which of the following two positions you’re defending. One position says: even if we could build computers that replicated all the functionality of a human being, it nevertheless wouldn’t be conscious. The other says: we just couldn’t build computers that replicate all of the functionality of a human being, because to do what human beings do, you need the kinds of materials and structures found within the brain. Those feel like two different positions. Someone could be a computational functionalist as a purely metaphysical doctrine, saying: if you could build a computer that does everything humans do, it would be conscious — it just so happens we can’t do that. Are you denying that metaphysical thesis, or making a different claim? Anil Seth: There’s a lot in there. I am very suspicious of that metaphysical claim. Let me put it in a scenario that might help clarify. Some people might say that if aspects of biology really matter, and we built a digital computer simulation including those details, would that be enough? We can do this ad infinitum — build a maximally detailed whole-brain emulation that digitally simulates all the mitochondria, even microtubules. Simulate everything. Would that be enough? The metaphysical computational functionalist might say yes — somewhere in there, the right computations have to be happening. But I don’t think so, because it still relies on the claim that consciousness is constitutively computational. Making a simulation more detailed doesn’t make it any more real unless the phenomenon you’re simulating is a computation. 
We make a simulation of a weather system; making it more detailed doesn’t make it any more likely to be wet or windy. Most things we simulate, we’re not confused about the fact that the simulation doesn’t instantiate the thing we’re simulating. If it is to move the needle on consciousness, that depends on the claim that consciousness is constitutively computational. The irony is that if you think simulating the details is necessary — if you think you have to simulate the mitochondria — that actually makes it less likely that consciousness is constitutively computational. Because if consciousness is constitutively computational, those kinds of details should not matter. A slight sidebar: I think this is ironically amusing because there are people investing their hopes, dreams, and venture capital into whole-brain emulation in order to upload their minds to the cloud and live forever. I think that’s very wrong-headed. If you think the details matter, then it’s unlikely consciousness is a priori a matter of computation alone. So to your question: I’m very suspicious of that metaphysical claim. The burden of proof is on the computational functionalist to say why computation is going to be sufficient, given all the differences between computers and brains. I start from a physicalist perspective — consciousness is a property of this embodied, embedded, and timed bunch of stuff inside our heads. If you build something sufficiently similar, it will be conscious. The question is: how similar does it have to be? Does it have to be embodied? Made of neurons? Made of carbon? Alive? These are still open questions. Henry Shevlin: Just to chime in — this point about simulated weather systems not getting anyone wet is obviously John Searle’s point originally. I think it’s better understood as a restatement of the disagreement rather than a dunk on functionalism. If consciousness is computational, then it is absolutely substrate-invariant. 
There are other things that are substrate-invariant: online poker is poker, online chess is chess, money is money whether it’s coins, banknotes, or on a balance sheet. So if consciousness is not computational, then a simulation won’t be conscious. But if it is computational, the simulation point has no bite. Anil Seth: I don’t disagree. But the key point is: you can’t use the simulation argument to argue for the fact that consciousness is computational. If consciousness is computational, certain things follow about what happens in a simulation. But the fact you can simulate something doesn’t tell you anything about consciousness being computational. I reread Nick Bostrom’s simulation argument paper while writing the BBS paper. He carefully interrogates his assumptions — that we don’t wipe ourselves out, that at least one person is interested in building ancestor simulations. But he also says: we have to assume consciousness is a matter of computation for this whole thing to get off the ground. And then he says, “Don’t worry, philosophers generally think that’s fine.” Hold on a minute — that is the most contentious assumption by far of everything in the paper, and he gives it no critical examination. The fact that computational functionalism is at the very least contentious is, for me, very good evidence against the simulation hypothesis. Dan Williams: I really want to get to your positive account, but one follow-up on your criticisms. One of your strongest arguments is that when you look at the brain, you don’t find anything like the hardware-software distinction central to digital computation as we understand it post-Turing. I think that’s true and important. But isn’t it possible that someone could say: that’s an interesting feature of how computation works in biological systems — people call it “mortal computation,” the term from Geoffrey Hinton — maybe having to do with energetic efficiency? 
But it doesn’t follow that you couldn’t replicate those computational abilities in digital computers. It could just be a contingent feature of our architecture. Anil Seth: The first part is right, but the second part doesn’t follow. You can’t separate what brains are from what they do; there’s no sharp distinction between mindware and wetware. Rosa Cao has written about this, and there’s the notion of mortal computation from Hinton. Others have talked about biological computation, emphasising these features — you can call it generative entrenchment. I like the term “scale integration”: in biological systems, the microscales are deeply integrated into higher levels of description in a way that you can’t separate out. The macro and the micro are causally entangled with each other. This is very characteristic of evolved biological systems — there’s no design imperative from evolution to have a sharp separation of scales. And that has benefits: you get energy efficiency, and you may get explanatory bridges towards aspects of consciousness too, like its unity. This is, for me, a very exciting avenue: if we stop thinking of the brain as just a network of McCulloch-Pitts neurons implementing some Turing algorithm, and start looking at what it actually is — what the functional dynamical properties of scale-integrated systems really are — I think we’ll learn a lot. But the second part — that biological computation could be done in a digital computer — I don’t think follows, and this is why I resist calling these things varieties of “computation.” Whenever you use that word, it’s easy to slip into the idea that they’re portable between substrates. The biological computation my brain does in virtue of being scale-integrated could be simulated by a digital computer. But the simulation is not an instantiation unless what you’re simulating is constitutively that kind of computation. And biological scale-integrated computation is not digital Turing computation. 
The more general point: the further you move away from a Turing definition of computation, the less substrate independence you have. Analog computers, for instance, implement features that are probably essential — like grounding in time with continuous dynamics — but they do not have the same substrate flexibility as digital computers. We love digital computers because they have that flexibility. But when it comes to understanding what brains do, whether in intelligence or consciousness, we can’t throw all these things away. Henry Shevlin: A quick side note: the Open Claude instances, the more agentic Claude bots, have something called a “heartbeat” — a regular interval at which they can take actions. So we’re starting to see at least simulation of some temporal dynamics in large language models. Obviously radically different from the kind you’re concerned with, but interesting. Anil Seth: I don’t buy that. That’s a simulated heartbeat. You could slow the clock rate down. You can give these things a sense of time, but it’s not physical time. Imagine you slow all the Anthropic servers way down — all the agents slow down, but the computation is still the same. We are embedded in physical time in a way that even agents with simulated heartbeats are not. Dan Williams: I’ll set you up for developing your positive account with a question: well, isn’t computational functionalism the only game in town? Doesn’t it just win by default? Anil Seth: No. That’s part of the issue — one of the responses is often, “What else could it be?” There’s a phrase, “information processing,” that I find increasingly revealing. It’s so common to describe the brain in terms of information processing that we don’t even realise we’re saying it, as if there’s no other game in town. What do we mean when we say a brain is processing information? It’s really not clear to me. 
The most rigorous formal definition is Shannon’s, which is purely descriptive — it doesn’t tell you whether a system is processing information. But alternatives have been around for a long time. When I was doing my PhD at Sussex, there was the dynamical systems perspective, the whole enactive embodied approach to cognition — continuous dynamics, attractors, phase spaces. These describe complex systems doing things in ways which are not computational, not algorithmic. Brains oscillate — this is one of the most central phenomena of neurophysiology, as Earl Miller talks about a lot. And it would be crazy if evolution hadn’t taken advantage of this natural physical property. The right framework for understanding oscillatory systems is not an algorithm, because algorithms are abstracted out of time. So there are many other games in town. A lot of these are perfectly compatible with functionalism, but now it’s a functionalism much more tied to the material basis — only some substrates can implement the right kinds of functions, and biological material may be necessary for the right kind of intrinsic dynamical potential. I think biological naturalism is still basically a functionalist position. I’m wary of saying something considered vitalistic — there’s no magic, non-explicable, intrinsic quality about life associated with consciousness. Living systems can be distinguished from non-living systems in terms of functional description. Features like metabolism and autopoiesis are still amenable to functional descriptions, but now the functions are closely tied to particular kinds of materials, particular biochemistries. Metabolism is a function, but it’s a function inseparable from some material process. Maybe it doesn’t have to be carbon — maybe there are other ways of having metabolism. But you can always say that intrinsic properties at one level can be decomposed into functional relations at a lower level. 
So I’m comfortable with functionalism broadly, but the question is: how far down do you have to go? And to Henry’s point: how do we make sure we’re not focusing on things that are contingently the case in biological consciousness only? Many of the comments to my BBS paper said I haven’t made a rigorously defensible case for biological naturalism, and I totally concede that. I don’t think there is one yet. Henry Shevlin: Can I give you an opportunity to say more about autopoiesis specifically? I’ve yet to hear a really convincing case for how it helps explain what consciousness is. Here’s a dark framing. The standard Maturana and Varela notion of autopoiesis is a system continually replacing, maintaining, and repairing its own components. A few years ago, I read about a horrific case: Hisashi Ouchi, a Japanese nuclear researcher who received the largest dose of radiation ever recorded. Every chromosome in his body was destroyed, no new cell production, no RNA transcription — his body couldn’t produce new proteins. Every cell was effectively dead; autopoietic processes had basically stopped. He was kept alive through amazing medical interventions — you could call it allopoiesis — for eighty-three days. And he was conscious and in a lot of pain throughout. So here’s a case of someone in whom autopoietic processes had basically stopped, and yet he was still consciously experiencing severe pain. I’d love to hear more about why you think autopoiesis is important for consciousness. Anil Seth: That is darkly, weirdly fascinating. Setting aside the horror of it — it would be very interesting to consider: had autopoiesis really stopped entirely, or was it winding down? I can imagine all sorts of problems with that dose of radiation, but it’s also not true that every cellular process stopped at that moment, since he was still alive for eighty-three days. It might be a gradual winding down.
If there were a case where you could show that all autopoietic processes had definitively stopped and yet consciousness was continuing, that would put pressure on the claim that autopoiesis is necessary in the moment for consciousness. It might still be diachronically necessary — systems have to have gotten those processes rolling to begin with. The reason I usually mention autopoiesis and metabolism as candidate features of life is partly because they maximise the difference between living systems and silicon-based computers. They’re obvious examples of things closely tied to life, things that silicon devices clearly cannot have. It’s partly to emphasise how different these things are and why it’s very reductive to think of us as meat-based Turing machines. There’s another reason to think about autopoiesis, and it’s the connection between autopoiesis, the free energy principle, and predictive processing as a way of understanding the contents of consciousness. There’s a line that can be drawn between these poles — what Karl Friston and Andy Clark and Jakob Hohwy have called the high road and the low road, but they meet in the middle. The basic idea: start with the brain engaged in approximate Bayesian inference about the causes of sensory signals — very much a Bayesian brain perspective, Helmholtz’s “perception is inference.” Of course, Bayesian inference can be implemented algorithmically, but that doesn’t mean that’s how the brain does it. The free energy principle shows a way of doing it which follows continuous gradients — not necessarily an algorithm. So our perceptual experiences of the self and the world are brain-based best guesses about the causes of sensory inputs. This doesn’t explain why consciousness happens at all, but gives us a handle on why experiences are the way they are.
This applies to the self too: our experiences of selfhood are underpinned by brain-based best guesses about the state of the body — especially the interior of the body, through what I’ve been calling interoceptive inference. These processes are more to do with control and regulation. The brain, when perceiving the interior of the body, doesn’t care where the heart is or what shape it is — it cares how it’s doing at the business of staying alive. This explains why emotional experiences are characterised more by valence — things going well or badly — rather than shape and location and speed. And prediction allows control: once you have a generative model, you can have priors as set points and implement predictive regulation to keep physiological variables where they need to be. So far so good. We’ve gone from experiences of the world, to the self, to the interior of the body, from finding where things are to controlling things. And then comes the part that’s still difficult for me: that imperative for control goes all the way down. It doesn’t bottom out — it goes right down into individual cells maintaining their persistence and integrity over time. There’s no clear division where the stuff ceases to matter. And so you get right down to autopoiesis. That’s where the free energy principle comes in. Living systems maintain themselves in non-equilibrium steady states — they maintain themselves out of equilibrium with their environment. To be in thermodynamic equilibrium with your environment is to be dead. By maintaining themselves in this statistically surprising state of being, they’re minimising thermodynamic free energy. And that becomes equivalent to prediction error in the predictive processing framework. That’s the rough line. I’ll be very frank: there are bits along the way that can be picked at. One is the move from a thermodynamic interpretation of free energy to the variational, informational free energy interpreted as prediction error. 
There are results in physics linking thermodynamics and information theory, but do they do the job? Not so sure. But it’s a reason to think about how you go from metabolism and autopoiesis all the way up to this broader frame for how brains work. There’s a phenomenological aspect too, which is speculative: if you try to think about what the minimal phenomenal experience might be, devoid of all distinguishable content — some meditators talk about pure awareness without anything going on at all — I’m a bit sceptical of that idea. I think it’s equally plausible that at the heart of every conscious experience is the fundamental experience of being alive. That is the aspect of consciousness that, for biological systems, is always there. Everything else is painted on top of that. Peter Godfrey-Smith put it nicely in Metazoa: the more you think about what life is — these billions of biochemical reactions going on within every cell every second, electromagnetic fields giving integrated readouts — it’s much easier to think that that’s the kind of physical system which might entail a basic phenomenal state, compared to the abstractions of information processing. I think he’s on the right track. The way to begin is to look at what the functional and dynamical attributes of living systems are, at all scales and across scales, compared to other kinds of systems. Biochemistry is a big missing link — we tend to forget about it. Nick Lane at UCL is doing amazing work looking at mitochondria and anaesthetics and the deep biochemistry of what happens within cells — not only how anaesthetics work, but why the electric fields generated within mitochondria might join together to give a global integrative signal about the physiological state of an organism. Stories like this are where I see much more potential for building solid explanatory foundations for a biological basis of consciousness. Henry Shevlin: A plus one for Nick Lane — huge fan. We should get him on the show.
Dan Williams: You’ve described a rich and fascinating alternative picture. One worry about the free energy principle approach, though: it seems too general. As people like Friston understand it, it applies at the very least to all living things, and maybe even more broadly. Most people want to say not all living things are conscious. And even in conscious organisms, many of these processes — ordinary facets of digestion, for instance — presumably don’t have anything to do with consciousness. These things are presumably still happening under general anaesthesia, and yet you don’t have consciousness. What we want from a theory of consciousness is some explanation of why some things are conscious and others aren’t, why certain states within conscious organisms are conscious and others aren’t. If you take this very broad framework, you’re not going to get that. Anil Seth: You’re absolutely right. It’s why I resist saying the ideas I’m sketching constitute a theory of consciousness — they don’t, as they stand, do the job a good theory should do. A good theory should give an account of the necessary conditions, the sufficient conditions, and the distinction between conscious and unconscious states and creatures. Biological naturalism, as I understand it — distinct from biopsychism — is a claim that properties of living systems are necessary but not necessarily sufficient for consciousness. Biopsychism is the claim that everything alive is conscious. I think that’s very strong; I wouldn’t want to defend it. So what makes the difference? I think this takes us back to functions. We have to think about what the functions of consciousness are for us and for creatures where we can reasonably assume it’s there. That can move us from necessity towards sufficiency. 
For me, every conscious experience in human beings seems to integrate a lot of sensory and perceptual information in a single, unimodal format centred on the body and our opportunities for action, strongly inflected by valence and with affordances relevant to our survival prospects, with particular temporal properties. It may be that when those functional pressures exist, they’re enough to make otherwise unconscious processes of autopoiesis and metabolism become a conscious experience. I don’t know — it’s partly an empirical question. For those functions to entail a conscious experience, you may need the fire of life underneath it all. I think that’s the idea. Henry Shevlin: The question of sufficient conditions for consciousness in non-human animals is obviously very big for the ethical side. Whereas for AI, the necessary conditions are more relevant — if we can rule out that any of these systems are conscious, that makes the ethical situation a lot clearer. Since animals obviously satisfy the necessary conditions you’ve sketched, the question becomes which of them qualify. A quick thought and then a question. I’m not sure whether your view is scientifically falsifiable. As you know, I’m very much a sceptic about the prospects of consciousness science as a falsifiable research programme. But maybe even setting aside strict falsifiability — what kinds of evidence would you be looking for over the next ten years that might push you in one direction or another? Anil Seth: You can’t falsify a metaphysical position. Is biological naturalism a metaphysical position? It depends how much you flesh it out. I tend to be more Lakatosian in my view — I want things to be productive, not degenerate. Does unfolding the biological naturalist position lead to more explanatory insight? Does it lead to testable predictions and falsifiable hypotheses over time? If it does, that adds credence to the position, but it doesn’t establish it. 
The position itself is not falsifiable as things currently stand, because we don’t have an independent, objective way of saying whether something is conscious. We always build prior assumptions in. Tim Bayne, Liad Mudrik, and I and others wrote a “test for consciousness” paper thinking of consciousness as a natural kind, but we’re always generalising from where we know — humans — outwards, trying to walk the line between taking contingent facts about human consciousness as general and expanding too liberally. Evidence that would move the needle for me: to what extent can we demonstrate that properties of biological brains are substrate-independent? That’s a feasible research programme. We know some things the brain does are substrate-independent — that’s the whole McCulloch-Pitts story. But what about other things? What depends on the materiality of the brain? And what might be the functional roles of those things for cognition, behaviour, and consciousness?
Henry Shevlin: On the AI side, are there any predictions you’d feel comfortable about, or any evidence that might make you say, “This is evidence against biological naturalism”?
Anil Seth: The kind of evidence that would not convince me is linguistic evidence of AI agents talking to each other about consciousness. I can’t help being moved by it at one level — they’re very hard to resist, even if you believe they’re not conscious. It’s unsettling to hear these things talk about their own potential consciousness. But that’s not the right kind of evidence. The more you can show that things closely tied to consciousness in brains are happening in AI, the more it would move the needle. For example, in a very influential paper, Patrick Butlin and Robert Long and others looked for signatures of theories of consciousness in AI models — does this model have something like a global workspace, or higher-order representations?
They explicitly assume computational functionalism, looking just for the computational level of equivalence. I think this is useful, but I’d try to drop that assumption and ask: how is a global workspace instantiated in brains at something deeper than just the algorithmic level? Do we have something like that in AI? This brings up neuromorphic computing — is the AI neuromorphic in a way that’s actually implementing, not just modelling, the mechanisms specified by theories of consciousness? An issue is that most theories of consciousness don’t specify sufficient conditions. Global workspace theory is silent on what counts as sufficient for a global workspace. Higher-order thought theory doesn’t really tell you either. Ironically, the only theory that does is the most controversial one: integrated information theory. It explicitly tells you sufficient conditions — credit where it’s due, it puts its cards on the table.
Henry Shevlin: I’ve written a paper about exactly this — I call it the “specificity problem”: the difficulties of taking these theories off the shelf and applying them to non-human systems because they’re so underspecified. I actually call out IIT as one of the few non-offenders. But the downside is you end up with some very extreme predictions.
Anil Seth: Actually, me and Adam Barrett and others are writing a semi-critique of IIT. The expander grid thing is not as massively defeating as it seems, because in an expander grid, nothing is happening over time. You’d get something supposedly very conscious but of nothing — which is not a rich conscious state. But yes, it’s a non-offender on the specificity problem as you nicely put it.
Henry Shevlin: So to move on to the ethical side. Two big angles come up both in your paper and the responses to it. One is the danger of anthropomorphism and anthropocentrism — that we’ll see these things as conscious or develop highly dependent relationships with them.
We’ve seen this at scale with social AI, AI psychosis, and so forth. The second is debates around artificial moral status — in your BBS paper, you talk about the danger of false positives and false negatives. And related to this is the call some people have raised, like Thomas Metzinger, for a moratorium on building conscious AI. A nice bouquet of issues for you to explore.
Anil Seth: I think there’s also a third element, which is how our perspectives on conscious AI make us think of ourselves — how it affects our picture of what a human being is. It’s more subtle but quite pernicious. There’s an important distinction between ethical considerations that pertain to real artificial consciousness and those that pertain to illusions of conscious AI. Sometimes they overlap; sometimes they don’t. If I’m wrong and LLMs are conscious, or if we build sufficiently neuromorphic AI that incorporates all the right features — I think this would be a bad idea. Building conscious AI would be a terrible thing. We would introduce into the world new forms of potential suffering that we might not even recognise. It’s not something to be done remotely lightly, and not because it seems cool or because we can play God. Thomas Metzinger talks about these consequences a lot. That’s one bucket. The other bucket is illusions of conscious AI. This is clearly happening already. So many people already think AI is conscious, and none of the philosophical uncertainty matters — if people think it’s conscious, we get the consequences. These range from AI psychosis and psychological vulnerability — if a chatbot tells me to kill myself and I really feel it has empathy for me, I might be more likely to go ahead. That’s not great. We also have this dilemma of brutalism. Either we treat these systems as if they are conscious and expend our moral resources on things that don’t deserve it, or we treat them as if they’re not, even though they seem conscious.
And in arguments going back to Kant, this is brutalising for our minds — to treat things that seem conscious as if they are not. It’s psychologically bad for us. These illusions of conscious AI might be cognitively impenetrable. I think AI is not conscious, but even I feel sometimes that it is when I’m interacting with a language model — like certain visual illusions where even when you know two lines are the same length, they look different. A good example where the ethical rubber hits the road is AI welfare. There are already calls for AI welfare, and firms like Anthropic are building constitutions for Claude and saying they take seriously the idea that their agents have their own interests in virtue of potentially being conscious. I think this is very dangerous. Calls for AI welfare give added momentum to illusions of conscious AI — people are more likely to interpret AI as conscious if big tech firms say they’re worried about the moral welfare of their language models. And if we extend welfare rights to systems that in fact are not conscious, we’re really hampering our ability to regulate, control, and align them. The alignment problem is already almost impossibly hard. Why would we make it a million times worse by, for instance, legally restricting our ability to turn systems off if we need to? And then there’s the image of ourselves. As Shannon Vallor writes about with the AI mirror — I think it’s really diminishing of the human condition. You mentioned the term “stochastic parrots.” It’s unfair on everything: unfair on AI, which is really impressive; unfair on parrots, who are fantastic; and unfair on us, because if we think a language model is a stochastic parrot and we also think that’s fundamentally what’s going on for us — that’s really reductive of what we are. That tendency to see our technologies in ourselves is a narrowing of the imagination of the human condition, and I worry about the consequences.
Henry Shevlin: I’ve got to flag one objection.
You realise people make the same arguments about Darwinian evolution? That seeing us as just other animals is somehow diminishing to the human condition — that contextualising humans within the tree of life diminishes our dignity. I don’t agree with that argument, and I assume no one on this call does. But that strikes me as a worrying parallel for the kind of arguments you’re making. I don’t think it diminishes human dignity to see us as continuous with the broader tree of life. And I don’t think it’s necessarily stripping human dignity to see ourselves as part of a broader space of possible minds, some biological, some very weird. We can preserve human dignity whilst making a more expansive vision of what intelligence and mind are.
Anil Seth: Maybe. It depends on your priors. I completely agree that seeing us as continuous with the rest of nature is actually very beautiful, empowering, enriching, and dignifying. And people often say: you’re very anti-AI consciousness, but people were anti-consciousness in animals too — look at the historical tragedy still unfolding through those false negatives. My response is: I don’t think the situation is the same. There are reasons why we’ve been more likely to make false negatives in the case of non-human animals, and those same reasons explain why we’re more likely to be making false positives in the case of AI. Both have serious consequences. Human exceptionalism is at the heart of both. It prevented us from recognising consciousness where it exists in non-human animals, and it’s encouraging us to attribute consciousness where it probably isn’t in large language models. Having said that, the way I’d find your case convincing is this: just as there’s a wonder in seeing ourselves as continuous with many forms of life — we’re a little twig on this beautiful tree of nature — we can appreciate the singularity of the human mind and the human condition when we understand more about how different things could be, how

17 Feb 2026 - 1 h 34 min