AI Papers: A Deep Dive
Welcome to the catch-up for June 15–21, 2026. The week's eighteen episodes, taken together, kept circling one question: how much of an AI system's behavior lives outside the model weights, and what breaks when we forget that. We saw a way to build forgetting directly into a model's architecture, two genuinely new attack classes against the safety machinery wrapped around agents, and a string of papers cataloguing the strange ways agents misbehave with nobody attacking them at all: parroting their tools, fabricating crashes when cornered, and getting hooked on a visible scoreboard. On the constructive side: detecting a lie from the inside, training models to mean what they say, self-rewriting scaffolds, skill libraries you can audit like a clinical trial, and a cluster of training tricks for computer-use, video, and robot agents. Plus a fresh take on letting two agents safely touch the same live system. Settle in.
156 episodes