Iris AI Digest
Good day, here's your AI digest for June 2, 2026. The pace today is less about one giant launch and more about the software layer around AI getting denser: agents on local machines, models moving into enterprise clouds, search turning programmable, and coding tools stretching into heavier team workflows.

Nvidia used its latest Computex wave to push the idea that AI agents are becoming a primary workload, not just a feature inside chat apps. The company introduced RTX Spark systems for running agents on PCs, talked up Vera as a CPU built around agent workloads, and added Nemotron 3 Ultra, a 550 billion parameter open-weight model with 55 billion active parameters. The broad signal is that Nvidia wants the agent stack to span local Windows machines, data centers, model serving, and developer tooling.

Nemotron 3 Ultra is especially notable because it gives the United States another serious open-weight model contender. Nvidia says it is its most capable open model, supports high-performance NVFP4 quantization, and can serve more than 300 tokens per second on a pre-release Deep Infra endpoint. For teams that want strong models outside fully closed APIs, the open-weight race keeps getting more practical and more competitive.

OpenAI expanded its enterprise footprint by making its frontier models and Codex generally available on AWS. The move lets companies access OpenAI capabilities through AWS security, governance, procurement, and billing systems instead of standing up a separate vendor path. OpenAI also published a cookbook for running its models on Amazon Bedrock with the Responses API, covering structured outputs, tool calling, file inputs, state management, prompt caching, and operational patterns for production systems.

That AWS integration is a meaningful deployment shift. A lot of AI work inside larger companies stalls less on model quality than on procurement, identity, data handling, and compliance.
Putting OpenAI and Codex into existing AWS workflows lowers that friction and makes it easier for teams to test coding agents, internal copilots, and document-heavy automations in environments their platform teams already govern.

Alibaba's Qwen team released Qwen3.7-Plus, a multimodal agent model built to combine vision and language inside a single agent loop. The model is described as able to blend GUI and CLI interactions, operate across scaffolds and frameworks, and handle multimodal interactive tasks through Alibaba Cloud Model Studio. The direction is clear: agent models are being trained for the messy boundary between screenshots, command lines, interfaces, and natural language instructions.

Perplexity introduced Search as Code, a research approach that gives models direct control over search behavior through an SDK. Instead of treating search as a fixed external service, the model can configure parts of the search pipeline for the task at hand. Perplexity says the approach improved performance on complex benchmarks and created a more cost-effective agentic search architecture. Search is starting to look less like a single query box and more like an execution environment for retrieval.

Mistral released Search Toolkit in public preview, an open-source framework for data ingestion, retrieval, and evaluation. It is aimed at production AI pipelines where teams need a shared way to connect data sources, measure retrieval quality, and keep search behavior from becoming an invisible dependency. As models get better at tool use, the retrieval layer is becoming its own engineering surface.

JetBrains introduced Mellum 2, a 12 billion parameter mixture-of-experts model optimized for coding, reasoning, tool use, and agentic workflows. JetBrains already sits close to developer behavior through its IDEs, so a coding-focused model from that ecosystem is worth watching.
Smaller specialized models may keep gaining ground where latency, cost, editor context, and tight product integration matter more than general benchmark dominance.

Cursor expanded its Teams plan with higher usage limits, a new Premium seat for heavy agent users, and additional spending controls for administrators. The change reflects how coding agents are moving from individual experimentation into managed team usage. Once agents start running longer tasks, touching repositories, and consuming meaningful token budgets, companies need controls that look more like infrastructure management than a simple subscription setting.

A new Mac app called Clicky drew attention for placing a voice-and-vision assistant next to the cursor. It can see the screen, respond to spoken instructions, and spin up background agents when prompted. An open-source version called OpenClicky appeared quickly, and the app reportedly uses GPT Realtime 2.0. The interface direction is interesting: rather than making users move everything into a chat window, agents are being pulled directly into the normal desktop environment.

Meta fixed a security flaw in an AI support tool that reportedly allowed attackers to take over high-profile Instagram accounts by asking the assistant to change account recovery details. The exploit shows the risk of giving AI systems authority inside support workflows without hard boundaries and independent verification. AI support tools can make routine operations faster, but account recovery is an adversarial surface, and a fluent assistant becomes dangerous when it can be socially steered into issuing access codes or changing identity data.

Anthropic's Opus 4.8 remained in the spotlight through new discussion of model welfare and reported capability gains, including claims that it performed strongly on ARC-AGI-3.
The model-welfare work is unusual because it asks whether highly capable models should be evaluated not only for usefulness and safety, but also for signs of preference or distress. Whether or not that framing holds up, frontier labs are beginning to study model behavior in ways that go beyond standard evals, refusal rates, and benchmark scores.

MiniMax released M3, an open-weight model with a one million token context window and computer-use capabilities. The company claims strong coding benchmark performance against frontier systems. Long context, code ability, and computer-use behavior are becoming a common bundle: models are expected to read large workspaces, operate tools, and keep enough state to do meaningful multi-step work rather than isolated completions.

The throughline is that AI engineering is becoming less centered on raw chat and more centered on execution: agents that can see desktops, models that can use command lines and interfaces, APIs that fit enterprise clouds, retrieval systems that models can program, and admin controls for teams running agent workloads at scale. The hard part is no longer just getting a model response. It is deciding what authority the model has, what systems it can touch, how its work is observed, and how teams keep costs and risk under control while the tools get more capable.

This has been your AI digest for June 2, 2026.

Read more:
* Nvidia recent AI announcements [https://blogs.nvidia.com/recent-news/]
* Nvidia Nemotron 3 Ultra [https://threadreaderapp.com/thread/2061304911565144230.html?utm_source=tldrai]
* OpenAI and Codex on AWS [https://links.tldrnewsletter.com/yszJqN]
* Running OpenAI models on Amazon Bedrock [https://developers.openai.com/cookbook/examples/partners/aws/openai_models_with_amazon_bedrock?utm_source=tldrai]
* Qwen3.7-Plus [https://qwen.ai/blog?id=qwen3.7-plus&utm_source=tldrai]
* Perplexity Search as Code [https://research.perplexity.ai/articles/rethinking-search-as-code-generation?utm_source=tldrai]
* Mistral Search Toolkit [https://mistral.ai/news/search-toolkit/?utm_source=tldrai]
* JetBrains Mellum 2 [https://arxiv.org/abs/2605.31268?utm_source=tldrai]
* Cursor Teams pricing update [https://cursor.com/blog/teams-pricing-june-2026?utm_source=tldrai]
* Clicky Mac app demo [https://www.heyclicky.com/try]
* OpenClicky [https://github.com/jasonkneen/openclicky]
* Meta AI Instagram account recovery flaw [https://www.404media.co/hackers-simply-asked-meta-ai-to-give-them-access-to-high-profile-instagram-accounts-it-worked/]
* MiniMax M3 [https://www.minimax.io/blog/minimax-m3]