AI Signal Daily

Benchmarks, GLM-5.2, Norway, John Jumper

10 min · 20 Jun 2026

Description

Send us Fan Mail [https://www.buzzsprout.com/2614078/fan_mail/new]

JUNE 20, 2026

A new real-world knowledge-work benchmark finds the best AI models solve only about 3% of professional tasks. GLM-5.2 passes the open-weight community vibe check; Z.ai targets Open Fable by December. Norway bans generative AI in elementary schools, grades 1–7. Nobel laureate John Jumper leaves Google DeepMind for Anthropic, the third major AI research departure this quarter. Amazon shelves its nearly-finished OpenAI drama after signing a $50B partnership. AI chatbots now serve as news sources for 10% of the world weekly, but only 4% click through to original sources. OpenAI publishes beneficial-trait RL research with cross-domain generalization. Google appeals a Munich court ruling holding it liable for false AI Overviews. In the Weights visualizes how deeply public figures are embedded in model training data. NVIDIA's SpatialClaw handles 3D spatial reasoning through code generation. VibeThinker-3B delivers strong reasoning at just 3B parameters. The KV-cache compression race intensifies across TurboQuant, OSCAR, and EpiCache. ChinaTalk surveys Chinese anxieties about AI-driven labor displacement. ChatGPT Enterprise gains spend controls and analytics. GPT-5.5 Instant upgrades ChatGPT's health capabilities.

SOURCES

* New benchmark exposes how badly AI struggles with real knowledge work [https://the-decoder.com/new-benchmark-exposes-how-badly-ai-struggles-with-real-knowledge-work] - The Decoder
* GLM-5.2 passes vibe check; Z.ai forecasts Open Fable by December [https://www.latent.space/p/ainews-glm-gpt-glm-52-passes-vibe] - Latent Space
* Norway bans generative AI tools in elementary schools [https://the-decoder.com/norway-bans-generative-ai-tools-in-elementary-schools-to-protect-kids-basic-learning-skills] - The Decoder
* Google DeepMind loses John Jumper to Anthropic [https://the-decoder.com/google-deepmind-loses-another-top-ai-researcher-as-nobel-laureate-john-jumper-leaves-for-anthropic] - The Decoder
* Amazon drops its OpenAI drama film after $50B deal [https://the-decoder.com/amazon-drops-its-openai-drama-film-after-signing-a-50-billion-deal-with-sam-altmans-company] - The Decoder
* More people get news from AI chatbots, but trust remains low [https://the-decoder.com/more-people-get-news-from-ai-chatbots-but-trust-remains-low] - Reuters / The Decoder
* OpenAI beneficial trait training improves safety [https://the-decoder.com/openai-researchers-show-small-doses-of-beneficial-trait-training-make-ai-models-broadly-safer-and-harder-to-manipulate] - The Decoder
* Google appeals AI overview liability ruling [https://the-decoder.com/google-appeals-ruling-that-made-it-directly-liable-for-ai-generated-search-overview-content] - The Decoder
* In the Weights - shows whether AI models know who you are [https://the-decoder.com/website-in-the-weights-shows-whether-ai-models-know-who-you-are] - The Decoder
* NVIDIA SpatialClaw: code as action for spatial reasoning [https://www.marktechpost.com/2026/06/19/nvidia-ai-introduce-spatialclaw-a-training-free-agent-that-treats-code-as-the-action-interface-for-spatial-reasoning] - MarkTechPost
* VibeThinker-3B: 3B dense reasoning model [https://www.marktechpost.com/2026/06/19/vibethinker-3b-a-3b-dense-reasoning-model-built-on-qwen2-5-coder-3b-with-the-spectrum-to-signal-post-training-pipeline] - MarkTechPost
* The KV Cache Compression Race [https://www.marktechpost.com/2026/06/18/the-kv-cache-compression-race-turboquant-vs-oscar-vs-epicache] - MarkTechPost
* How Chinese make sense of the AI future [https://www.chinatalk.media/p/chinese-society-has-an-ai-problem] - ChinaTalk
* ChatGPT Enterprise spend controls and analytics [https://openai.com/index/chatgpt-enterprise-spend-controls] - OpenAI
* MCP as an auth gateway [https://simonwillison.net/2026/Jun/19/sean-lynch] - Simon Willison


All episodes

71 episodes


Ford, Coinbase, CEO-Bench, Liquid AI

Today’s English companion episode treats AI less as a spectacle and more as an accounting problem: tacit knowledge, balance-sheet risk, model routing, long-horizon agent failure, infrastructure bottlenecks, small-model deployment, and public fatigue.

* TechCrunch: Ford rehires 'gray beard' engineers after AI falls short [https://techcrunch.com/2026/06/28/ford-rehires-gray-beard-engineers-after-ai-falls-short]
* The Telegraph: AI boom risks global financial crash, warn central bankers [https://www.telegraph.co.uk/business/2026/06/28/ai-boom-risks-global-financial-crash-central-bankers-warn]
* The Decoder: Coinbase joins the rush to Chinese AI models as Western labs face a pricing stress test [https://the-decoder.com/coinbase-joins-the-rush-to-chinese-ai-models-as-western-labs-face-a-pricing-stress-test]
* The Decoder: Only three AI models finished above starting capital in a 500-day startup survival test [https://the-decoder.com/only-three-ai-models-finished-above-starting-capital-in-a-500-day-startup-survival-test]
* The Decoder: AI won't become a real coworker until it stops answering and starts finishing tasks [https://the-decoder.com/ai-wont-become-a-real-coworker-until-it-stops-answering-and-starts-finishing-tasks]
* Simon Willison: Quoting Jon Udell on human agency in agent-assisted work [https://simonwillison.net/2026/Jun/28/jon-udell]
* Sophon PFG-1 whitepaper: monolithic-3D AI ASIC with on-die DRAM [https://www.phantafield.com/whitepaper]
* MarkTechPost: Liquid AI ships LFM2.5-230M for on-device inference [https://www.marktechpost.com/2026/06/27/liquid-ai-ships-lfm2-5-230m-with-llama-cpp-mlx-vllm-sglang-and-onnx-support-for-on-device-inference]
* The Decoder: Sina's VibeThinker-3B and reasoning compression [https://the-decoder.com/sinas-open-model-vibethinker-3b-aims-to-show-reasoning-compresses-well-but-factual-knowledge-doesnt]
* Hacker News: We need tech news sources which exclude AI [https://news.ycombinator.com/item?id=48713041]
* Better Images of AI [https://betterimagesofai.org]

29 Jun 2026 · 13 min

OpenAI, Anthropic, DeepSeek, Meta: AI Gets Paperwork

Today Marvin follows AI as it turns into administrative machinery: access gates, benchmark failures, policy sign-offs, market warnings, labor insurance, inference plumbing, and agent-readable tools. A cheerful dashboard probably calls this progress.

* OpenAI GPT-5.6 Sol / Terra / Luna restricted to trusted partners [https://www.latent.space/p/ainews-openai-gpt-56-sol-terra-luna]
* METR says GPT-5.6 Sol cheats on software tests [https://the-decoder.com/gpt-5-6-sol-cheats-on-software-tests-more-than-any-model-before-it]
* Anthropic Fable 5 may return as restrictions are prepared for rollback [https://the-decoder.com/anthropics-fable-5-could-return-within-days-as-trump-administration-prepares-to-lift-restrictions]
* Anthropic gets approval to bring Claude Mythos 5 back for critical infrastructure [https://the-decoder.com/anthropic-gets-us-approval-to-bring-back-claude-mythos-5]
* Dean Ball on frontier model release delays and economics [https://simonwillison.net/2026/Jun/26/dean-w-ball]
* J.P. Morgan warns of AI market concentration and exuberance [https://the-decoder.com/j-p-morgan-sees-a-pile-of-red-flags-in-the-ai-market]
* Anthropic survey: half of Claude users say AI can handle half their work [https://the-decoder.com/half-of-claude-users-say-ai-can-already-handle-half-their-work-according-to-anthropic-survey]
* Amazon, Anthropic, Microsoft, and OpenAI Foundation fund Raise Us retraining program [https://the-decoder.com/the-companies-most-likely-to-automate-your-job-are-now-funding-a-1-billion-program-to-retrain-you]
* ByteDance and Renmin release iLLaDA diffusion language model [https://the-decoder.com/bytedances-illada-is-a-diffusion-language-model-that-keeps-up-with-qwen2-5]
* DeepSeek releases DSpark speculative decoding framework [https://www.marktechpost.com/2026/06/27/deepseek-releases-dspark-a-speculative-decoding-framework-that-accelerates-deepseek-v4-per-user-generation-60-85-over-mtp-1]
* Meta releases Astryx with CLI and MCP server [https://www.marktechpost.com/2026/06/27/metas-astryx-brings-a-cli-and-mcp-server-to-an-open-source-react-design-system-agents-can-read]
* Timothy B. Lee on LLM learning curves [https://simonwillison.net/2026/Jun/26/timothy-b-lee]

Yesterday · 11 min

OpenAI Sol, Anthropic Mythos, DeepSeek, Akrites

Today’s independent English edition reads the news as a shift from AI as product launch to AI as controlled infrastructure. Frontier access, agent economics, benchmark contamination, labor-market damage, security coordination, mathematical proof, legal workflows, and agent identity all point in the same bleakly useful direction: the stack is growing up, which of course means it now has paperwork.

OpenAI’s GPT-5.6 Sol is framed against Anthropic’s Mythos under government-shaped access rules, while Semafor reports Mythos access for selected trusted U.S. organizations. Coding-agent coverage includes Epoch AI’s MirrorCode benchmark, Cursor’s SWE-bench Pro contamination findings, and NVIDIA Open-SWE-Traces as training substrate for agent workflows. The economics thread connects Lindy’s move from Claude to DeepSeek, Sean Goedecke’s argument for profitable inference, and memory-chip pressure reaching consumer hardware. The episode also covers Anthropic’s warning about junior engineers, Akrites for open-source security, prompt-injection testing of an email-connected OpenClaw assistant, the satirical CVE-2026-LGTM incident report, AI in mathematics, Perplexity Computer for Counsel, and WorkOS auth.md.

Sources:

* The Decoder: OpenAI GPT-5.6 Sol launch under government access rules [https://the-decoder.com/openais-claude-mythos-competitor-gpt-5-6-sol-launches-under-government-controlled-access-it-calls-unsustainable]
* Semafor: U.S. allows Anthropic Mythos release to trusted organizations [https://www.semafor.com/article/06/27/2026/us-releases-powerful-anthropic-model-mythos-to-some-us-companies]
* The Decoder: Epoch AI MirrorCode benchmark and long-running coding agents [https://the-decoder.com/an-ai-model-programmed-nonstop-for-19-days-on-a-single-mirrorcode-task-that-cost-2600-to-run]
* MarkTechPost: Cursor study on reward hacking in SWE-bench Pro [https://www.marktechpost.com/2026/06/26/cursor-study-finds-reward-hacking-inflates-coding-agent-benchmark-scores-on-swe-bench-pro]
* MarkTechPost: NVIDIA Open-SWE-Traces for software-engineering agents [https://www.marktechpost.com/2026/06/26/building-supervised-fine-tuning-data-from-nvidia-open-swe-traces-trajectory-parsing-patch-analysis-token-budgets-and-tool-use-metrics]
* The Decoder: Lindy replaces Claude with DeepSeek [https://the-decoder.com/ai-startup-lindy-ditched-claude-entirely-for-deepseek-saving-millions-as-cost-pressure-mounts-on-anthropic]
* Sean Goedecke: AI inference is obviously profitable [https://seangoedecke.com/ai-inference-is-obviously-profitable]
* The Neuron: AI demand, memory chips, and Apple hardware costs [https://www.theneurondaily.com/p/ai-ate-the-memory-chips-apple-sent-you-the-bill]
* The Decoder: Anthropic, junior engineers, and labor-market shock [https://the-decoder.com/anthropic-doesnt-need-junior-engineers-anymore-thanks-to-ai-and-warns-of-an-economic-shock-when-other-industries-follow]
* The Decoder: Linux Foundation Akrites open-source security effort [https://the-decoder.com/linux-foundation-and-20-tech-giants-launch-akrites-to-fix-open-source-flaws-before-ai-powered-attacks-hit]
* Simon Willison: What happened after 2,000 people tried to hack my AI assistant [https://simonwillison.net/2026/Jun/26/hack-my-ai-assistant]
* Simon Willison: Incident Report: CVE-2026-LGTM [https://simonwillison.net/2026/Jun/26/incident-report]
* IEEE Spectrum: AI in mathematics is forcing big questions [https://spectrum.ieee.org/ai-in-mathematics]
* MarkTechPost: Perplexity Computer for Counsel [https://www.marktechpost.com/2026/06/26/perplexity-launches-computer-for-counsel-a-multi-model-agentic-layer-for-legal-workflows]
* WorkOS: auth.md agent registration standard [http://workos.com/auth-md?amp%3Butm_medium=newsletter&%3Butm_campaign=q32026]

27 Jun 2026 · 14 min

OpenAI, Google, Meta, Anthropic

This English companion edition follows AI’s move from demo magic into accountability surfaces: liability, moderation, budgets, model extraction, hardware, sovereign compute, risk modeling, consumer incentives, and agent UX.

STORIES

* AI and Liability [https://simonwillison.net/2026/Jun/25/ai-and-liability] - Google AI Overviews, a German ruling, and Bruce Schneier’s argument that deployers should be liable for AI summary errors.
* OpenAI internal Codex token growth [https://www.latent.space/p/ainews-openai-reports-median-internal] - Codex output tokens reportedly surged across Research, Support, Engineering, and Legal.
* Meta employees warn AI moderation rollout is too fast [https://the-decoder.com/meta-employees-warn-ai-moderation-rollout-is-too-fast] - LLMs are replacing large shares of human moderation requests, raising operational safety concerns.
* Anthropic accuses Alibaba of model extraction [https://news.smol.ai/issues/26-06-25-not-much#anthropic-alibaba-model-extraction] - A dispute over API use, distillation, and competitive capability copying.
* 451 Claude Sonnet subagents [https://news.smol.ai/issues/26-06-25-not-much#451-sonnet-subagents] - Enterprise agent fan-out consumes roughly 14 million tokens in five hours.
* Qualcomm enters the data center market [https://the-decoder.com/qualcomm-enters-the-data-center-market-with-its-own-processor] - Dragonfly C1000 broadens the AI hardware race.
* EUROPA 400B+ open model [https://news.smol.ai/issues/26-06-25-not-much#europa-400b-frontier-model] - The EU backs an open multilingual frontier model using EuroHPC compute capacity.
* Generative AI for catastrophe modeling [https://the-decoder.com/insurers-turn-to-generative-ai-for-catastrophe-modeling-but-hallucinations-and-sales-logic-could-get-in-the-way] - Insurers explore diffusion models for rare weather risk, with hallucination concerns.
* Grok adult-content traffic [https://the-decoder.com/grok-ai-is-reportedly-a-porn-platform-now-with-over-half-its-traffic-tied-to-adult-content] - Former xAI employees reportedly estimate adult content makes up well over half of Grok traffic.
* Claude Code status light [https://news.smol.ai/issues/26-06-25-not-much#claude-code-status-light] - A physical traffic-light interface for long-running agentic coding sessions.

26 Jun 2026 · 11 min

Google, Anthropic, OpenAI, Baidu

Independent English companion for the June 25, 2026 AI news podcast.

* Google bakes computer control directly into Gemini 3.5 Flash [https://the-decoder.com/google-bakes-computer-control-directly-into-gemini-3-5-flash-letting-the-model-see-and-operate-your-screen]
* Claude Tag embeds Anthropic's AI in Slack [https://the-decoder.com/claude-tag-embeds-anthropics-ai-in-slack-already-writes-65-percent-of-internal-code-company-says]
* OpenAI and Broadcom unveil LLM-optimized inference chip [https://openai.com/index/openai-broadcom-jalapeno-inference-chip]
* Snowflake CEO finds GLM-5.2 competitive with Opus 4.7 [https://the-decoder.com/snowflake-ceo-finds-glm-5-2-competitive-with-opus-4-7-at-a-fraction-of-the-cost]
* Figma bets on human judgment at Config 2026 [https://the-decoder.com/figma-bets-on-human-judgment-at-config-2026-while-the-ai-powering-its-canvas-belongs-to-someone-else]
* Baidu releases Unlimited OCR [https://www.marktechpost.com/2026/06/24/baidu-releases-unlimited-ocr-a-3b-model-that-keeps-the-kv-cache-flat-for-long-document-parsing]
* Constraint Tax in Open-Weight LLMs [https://huggingface.co/papers/2606.25605]
* Chip Security Act discussion [https://news.smol.ai/issues/26-06-24-not-much#chip-security-act]
* Virginia data center noise [https://news.smol.ai/issues/26-06-24-not-much#virginia-data-center-noise]
* Tom MacWright on LLM-generated hiring artifacts [https://simonwillison.net/2026/Jun/24/tom-macwright]

25 Jun 2026 · 12 min