The Bikeshed Pod

No(de.js) AI

31 min · 14 Apr 2026
Description

THE 19,000-LINE SLOPFORK: NODE.JS, CLAUDE CODE, AND THE AI CONTRIBUTION CRISIS

Matt, Scott, and Dillon unpack one of the messiest open source dramas of the moment: Matteo Collina, Node TSC member and Fastify creator, dropped a ~19,000-line PR on Node.js core over Christmas break, openly built with the help of Claude Code. The PR adds long-requested virtual file system support, intercepting 164+ points across fs, fs/promises, and the module loading system. Over half the diff is tests, which is part of why it raised eyebrows in the first place: that volume of integration tests is something a human contributor likely wouldn't have written by hand.

THE DCO QUESTION

The crew digs into the Developer Certificate of Origin (DCO) and whether agent-generated code cleanly satisfies it. Does Claude-written code count as "authored by you"? It's still a foggy question, and one contributor was rattled enough to start a petition to ban AI-generated code from Node.js core. Matteo's response: I made all the decisions, I fixed the AI's mistakes, it's still my code.

PROCESS, NOT JUST AI

Scott's take is that the size and abruptness are doing as much damage as the AI angle. There were existing issues discussing a VFS, but no RFC, no upfront tech plan, and the commit history is borderline unreviewable. Classic "easier to ask forgiveness than permission" energy, but on a change that touches a major surface of the runtime. The crew sympathizes with the engineer's instinct to just ship the thing, but agrees that a feature this big needed buy-in first. (Scott would have left a "nit:" comment asking for a rebase to a single commit.)

HOW DO YOU EVEN POLICE THIS?

Dillon raises the obvious enforcement problem: AI detection tools have the same false-positive issues that plague universities. An AI-written one-line bug fix is indistinguishable from a human's. That points toward either accepting AI-assisted contributions outright or building entirely new governance, which is roughly where the broader OSS community seems to be landing (the related issue was reportedly closed with consensus that AI-assisted dev is allowed).

WHAT IF NODE SAYS NO?

Matt poses the strategic question: if Node.js bans AI contributions, does that hand momentum to Bun and Deno? Bun is already leaning hard into Claude-assisted development, ships features fast (native SQLite being the canonical example), and operates as a company rather than a committee, so it has structural advantages on velocity and backwards-compat tradeoffs. Scott pushes back that big corporations are slow to migrate runtimes regardless, but Matt counters that agents dramatically lower switching costs: point Claude at your codebase, say "migrate this from Node to Bun", and it's plausibly a weekend.

SLOPFORKS AND THE SQLITE PLAYBOOK

The conversation widens to Cloudflare's "vinext", a Vite-based Next.js reimplementation built by pointing an agent at Next's test suite, which popularized the term slopfork. That sparked talk of tldraw considering closing their test suite to prevent agent-driven reimplementation, and of the long-standing SQLite model where the code is open source but the comprehensive test suite is paid/closed. Expect more projects to consider that pattern as agents make test-suite-driven reimplementation trivially cheap.

CLOSING TAKES

* Open source projects may need to lean into AI just to stay competitive with company-backed runtimes.
* The irony: another Node contributor used Claude to write a deep-dive review of the PR itself.
* Matteo also published a userland polyfill on npm, hedging against Node's slow merge process.
* Scott's verdict: merge it already.

Plus a brief detour on the io.js fork of yore, and Matt's proposed name for the inevitable Node slopfork: input-output.js.

All episodes

30 episodes

Every E2E Is a Smoke Test: Page Object Models, Flaky Pipelines, and When Testing Is Actually Worth It

SUMMARY

THE HIGH COST OF GETTING STARTED

Scott has been writing a lot of automated tests at a bigger company than he's used to, mostly end-to-end and integration tests in Playwright (with past lives in Cypress and Selenium). His recurring theme: getting started takes three to four times the normal effort, especially on a product with heavy third-party permissions and multiple interacting applications where users may not have access to both. Playwright lowers the entry barrier, but rigorous permission-based flows are still intricate to set up.

Matt's counterpoint: the test harness matters more than the tool. Invest upfront in structure that makes flaky tests hard to write, e.g. preventing async operations from leaking between unit tests (a former coworker did this by tracking promises). Once the harness is solid, you can copy-paste-and-tweak tests quickly. He describes his own CLI's fixture-based end-to-end suite: run a command against a folder, assert the result against an expected schema.

PAGE OBJECT MODELS: ABSTRACTION OR INDIRECTION?

Scott's team enforces page object models (POMs, or "palms"): a class per page with private methods to locate elements and public methods to drive reusable actions. He's not sold: it adds a layer of indirection, and when two people build POMs in parallel they clash and drift into abstract methods that just re-wrap Playwright.

* Dillon had never heard of POMs and was Googling on the side. His instinct: start with reusable functions, then group them into a POM once they clearly belong to a feature.
* Matt frames the over-application as classic speculative generality / YAGNI: a good pattern spotted once and then mandated everywhere, even where a smoke test just needs to visit a page and check it's not a 400 or 500.
* The takeaway, delivered with a laugh: "We're supposed to disagree, but we agree." Login lives in a fixture (it spans pages); POMs make sense for genuinely repeated multi-step flows, not for every page by fiat.
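
The page-object pattern described above can be sketched in a few lines. This is a minimal illustration in Python, not Scott's team's code: Driver, RecordingDriver, CheckoutPage, the /checkout route, and both selectors are all hypothetical, and the driver is a hand-rolled stand-in rather than Playwright's real Page API.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Driver(Protocol):
    """Hypothetical minimal browser-driver interface (not Playwright's API)."""

    def goto(self, url: str) -> None: ...
    def fill(self, selector: str, value: str) -> None: ...
    def click(self, selector: str) -> None: ...


@dataclass
class RecordingDriver:
    """Test double that records calls instead of driving a browser."""

    calls: list[tuple[str, ...]] = field(default_factory=list)

    def goto(self, url: str) -> None:
        self.calls.append(("goto", url))

    def fill(self, selector: str, value: str) -> None:
        self.calls.append(("fill", selector, value))

    def click(self, selector: str) -> None:
        self.calls.append(("click", selector))


class CheckoutPage:
    """One class per page: private locators, public multi-step actions."""

    URL = "/checkout"  # hypothetical route

    def __init__(self, driver: Driver) -> None:
        self._driver = driver

    # Private: only this class knows where the elements live.
    def _coupon_input(self) -> str:
        return "#coupon"

    def _apply_button(self) -> str:
        return "button[data-test=apply-coupon]"

    # Public: a reusable flow that tests call by intent, not by selector.
    def apply_coupon(self, code: str) -> None:
        self._driver.goto(self.URL)
        self._driver.fill(self._coupon_input(), code)
        self._driver.click(self._apply_button())


driver = RecordingDriver()
CheckoutPage(driver).apply_coupon("SPRING")
print(driver.calls)
```

The trade-off the hosts discuss is visible even at this size: apply_coupon earns its class because it bundles three steps that several tests might repeat, whereas a page that a smoke test only visits once would gain nothing from the wrapper.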
SMOKE TESTS FIRST (AND THE PUSHBACK)

Scott's crusade: smoke tests should gate code in the PR, not just run against staging after merge. He wrote ~200 smoke tests covering every page (including invalid routes) and hit friction from colleagues who argued "the end-to-end tests will catch it." His rebuttal: the e2e suite runs against staging, so a broken PR sits in master until an on-call engineer has to hunt it down and revert, killing a day's deploy. Smoke tests are cheap, fast, and fail loud before that happens.

THE DEFINITIONAL DEBATE

Dillon's framing lands cleanly: every end-to-end test is a smoke test, but a smoke test is the lightweight version, a quick "is the page rendering, no crashes" check. Bigger e2e tests cost more to run and, per Dillon, flakiness scales with scope: the eight-minute single mega-test that fails in a new place every time you fix it. Keep flows small, parallelize, and balance value against breakage.

THE TEST-ONLY PATHWAY RED FLAG

Scott's team considered editing a GraphQL query to bypass the real flow just to make a permission-heavy test easier to write. Matt flags this hard: don't build implementation pathways that exist only for tests. You diverge the test path from the user path, then catch a sev later and wonder why the test didn't fire: it was testing something else entirely. Related: drift between spec and implementation (real-time updating dropped for a refresh button mid-project) is what QA ends up flagging, and unit tests, not e2e, are usually the right tool for those small correctness nitpicks.

HOT TAKES

* Dillon: Super valuable ("like having somebody click through your site while you sleep") but a pain to get right, and they always get pushed to the end of a project when the end state is still moving.
* Scott: Valuable, but less is more. Test what's truly vital, co-locate tests so they actually get maintained, lean on smoke tests for fast high-level coverage, and reserve e2e for the most critical flows (login, purchases, final submit).
* Matt: You probably don't need end-to-end tests as early as you think. Backfill that coverage with good metrics and real users until you actually do.

STANDUP / LIFE UPDATES

* Scott: Launched the application at work (it went well) but immediately inherited backend ownership of permissioning: "I made one change to the shape and I own it for life." Phase two (better metrics and monitoring dashboards) got pushed down the pipe in favor of improving the automated tests first.
* Dillon: Mid-migration from Cloudflare Pages to Cloudflare Workers (not his main task, and he has to halt everyone's dev to do it). Just presented an hour-long tech plan for a bespoke e-commerce experience (landing, listing, product, checkout, cart), but with no design system it came out to 24 net-new components and a ~75-day estimate against a two-month ship date. Running a five-mile race at Harpoon Brewery in Boston tomorrow, gunning for sub-40 minutes.
* Matt: Team's in a "code red": shipping fast under pressure, with open questions about whether they're building the right things and talking to their developer-customers enough. Spec'd out a way for humans and agents to give feedback via the CLI, including a post-session hook that has Claude review the transcript and report back on the tooling it used.
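
The smoke-test gate Scott argues for in this episode reduces to one check per route: the response is not a 4xx or 5xx. A minimal sketch of that check, assuming a caller-supplied fetch_status function (hypothetical; in practice it would wrap an HTTP client or a browser navigation) and made-up routes and status codes:

```python
from typing import Callable, Iterable


def smoke_failures(
    routes: Iterable[str],
    fetch_status: Callable[[str], int],
) -> list[tuple[str, int]]:
    """Return (route, status) for every route that answers with a 4xx or 5xx."""
    failures = []
    for route in routes:
        status = fetch_status(route)
        if status >= 400:
            failures.append((route, status))
    return failures


# Hypothetical stand-in for a real HTTP call, so the sketch runs offline.
FAKE_APP = {"/": 200, "/pricing": 200, "/settings": 500}


def fake_fetch(route: str) -> int:
    # Unknown routes answer 404, like a typical router would.
    return FAKE_APP.get(route, 404)


failures = smoke_failures(["/", "/pricing", "/settings"], fake_fetch)
print(failures)  # [('/settings', 500)]
```

Run in a PR pipeline, a non-empty list would fail the build, which is the "fail loud before merge" behavior the episode is asking for.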

Yesterday · 49 min
The Parking Lot - 2

SUMMARY

THE "EVERYTHING APP" LAND GRAB

The hosts riff on companies bolting adjacent products onto a core they're already good at. Google just announced a Whoop competitor (Dillon, who works at Whoop, notes the hardware looks better, the software maybe not, and that the device ships with an engraved jab at copycats, a callback to Amazon's short-lived Halo and the older Nike FuelBand). Whoop's own direction is a "health operating system" that links into clinicians and bloodwork (basically, become MyChart), which drags HIPAA and product security into everything. Meanwhile Uber is partnering with a hotel brand and Vrbo to sell stays, nudging into Airbnb's territory, while Airbnb adds car service. The group's read: these are mostly partnerships, not new builds, a cheap way to test a market.

CANDY AT THE CHECKOUT

Dillon's framing: companies treat lack of growth as failure, so they tack on extras "like candy at the checkout counter" to customers who never asked. That tips into enshittification. Scott contrasts annoying e-commerce upsells (button hidden where you don't expect it) with Airbnb doing it well by surfacing add-ons while you're already in a curious, planning mindset. Surprise stat: Uber ~$155B vs Airbnb ~$84B market cap, roughly 2x. Underlying tension: investors still want the 10x growth of five years ago, and it's genuinely harder to stay aligned and earn trust at scale.

"BECAUSE AI": THE CLOUDFLARE & COINBASE LAYOFFS

Both Cloudflare (~1,100, ~20%) and Coinbase (~2,000) announced layoffs framed around AI. Matt pushes back with an input → output → outcome framework (from a Twitter article): AI inflates input (more code) and output (more features), but neither guarantees outcome (more customers/revenue).

* Dillon: measuring AI success by PRs merged is measuring too early. "I could ship one PR a month but make the KPI skyrocket." Quality over quantity.
* Scott: cites a 2025 study suggesting people trust AI-built tools less, and that "AI" is becoming the new lazy excuse for layoffs that would've happened anyway.
* Matt: maybe layoffs are less about AI replacing work and more about removing red tape / stakeholders so things ship faster, which isn't obviously good, since friction sometimes improves ideas.
* Skepticism toward the "we'll replace all engineers" marketing pitch from a company that employs thousands of engineers, plus the Claude usage "rug pull" their friend Ian flagged and the inevitability of prices rising as demand grows.
* Context: most Cloudflare cuts were sales/marketing, with 600+ eng roles still open; severance reportedly runs through end of 2026 (~7.5 months); and this is the same company that once had 1M+ applicants and a sub-Ivy acceptance rate.

ROBOBUN OUT-COMMITS ITS CREATOR

From Anthropic's Code with Claude conference (Dillon: five minutes was enough), the standout was Jarred from Bun: their coding agent RoboBun has become the top committer to Bun in ~6 months, out-pacing Jarred after four years of work. Scott presses on whether it's real work or minor package bumps. Matt insists it merges commits, opens PRs, and triages issues end-to-end (reproduce the bug, open a fix), in Zig no less. Best moment: CodeRabbit (an AI reviewer) flagged an edge case on a RoboBun PR, and RoboBun argued back that it didn't apply and closed the comment on its own PR. Also noted: Jarred's 500+ comment Hacker News thread about a branch using Claude Code to migrate Bun's Zig codebase to Rust. Dillon's verdict: a new way Twitter tech bros flex their token spend.

SIDE QUEST: PI, THE "VIM OF AGENT HARNESSES"

Matt is surprised Scott isn't into Pi (pi.dev, or as Matt prefers to share it, shittycodingagent.ai). His pitch: a deliberately thin alternative to the Claude Code / Codex / OpenCode CLIs, with a minimal system prompt and basically two built-in tools (run bash, read a file), that you customize to your own workflow. Scott finds it unnecessary for his setup; Dillon lands the analogy: it's Arch Linux, an "operating system for running agents" that starts with nothing.

STANDUP / LIFE UPDATES

The pod quietly passed its one-year anniversary about two months ago and forgot to mention it (~2,000 minutes of yapping logged). Jokes about a five-year mark, a 10-year live reunion in front of an audience of exactly three (their wives), and Seattle in two weeks, where they might record live.

Yesterday · 48 min
The Agents Aren’t The Only Ones Overworked

SUMMARY

THE AI OVERWORK PARADOX

The article's core thesis [https://hbr.org/2026/02/ai-doesnt-reduce-work-it-intensifies-it]: AI makes it easier to do more and harder to stop. Dillon frames the mechanism: agents get you 90% of the way, but that last 10% (the review) is where all the time goes, and the outputs pile up faster than anyone can clear them.

"IT'S CHEAP TO GET SOMETHING INTO A PR"

Pre-LLM, you'd note something as "clean this up later" and ticket it. Now an agent spins up a PR on the spot. Dillon's open-PR count becomes the episode's running gag: 30 → 40 → 50 PRs, "a pile of dirty laundry I need to clean up." The group debates whether shifting context left into draft PRs is actually fine, or just deferred debt.

THE COMPRESSION PROBLEM

Matt's key observation: agents sped up the development part of the lifecycle, but planning and review didn't speed up, so they become the bottleneck. He argues his own burnout comes less from overwork and more from PRs sitting unreviewed while everything else stays just as slow.

WORKING IN A SILO

What used to be a conversation with a teammate (bouncing ideas, sanity-checking a plan) now happens alone with an agent. Matt: you become "a team of multiple teams of one," which may make the work feel less meaningful (and a little like "AI psychosis").

MORE PRODUCTIVITY ≠ MORE REWARD

You're rewarded with more work, not more pay. The group lands on a rough 70/30 split (let the agent do ~70%, but look at 100% of the code), and Scott's bug-fixing tip: write the failing test first, then let AI fix it. Counterpoint from a Whoop director of engineering: the aspirational endgame is prompt → outcome → KPI check → ship with no human review. Everyone agrees: crazy, and not where we are yet.

THE THREE-TIER MANDATE

Matt's company (HubSpot) defines AI adoption in tiers: beginner (legacy workflow, tab-autocomplete, 90%+ code by hand), intermediate ("single agent operator"), and advanced ("tech lead of agents", managing a swarm). Everyone's expected to hit intermediate by end of Q2, but there's no rubric and no guidance on keeping quality high. The hosts' worry: you're being graded on process, not results. "I want to make the button blue, but now I'm going to use 10 agents."

THROWING MONEY ON A FIRE

Token spend is its own theme: Matt cites reports of Uber burning its entire 2026 AI budget in Q1. Scott connects it to feature bloat: churning out a six-image carousel nobody uses and burning a million dollars in tokens to do it. Shareholders get faster output and don't care about process; the people delivering get squeezed.

SPILL THE SWE(ET) TEA: TANSTACK START'S "RSC" CONTROVERSY

Matt launches a new drama segment ("I monitor the situation"). This week's tea: TanStack Start shipped what it branded React Server Components support, and the ecosystem pushed back that it doesn't follow the actual RSC spec.

* A truly spec-compliant component should drop into Next.js, Waku, or Matt's own in-progress framework and just work, but it won't drop cleanly into TanStack Start.
* The "spec" here means the Flight wire format plus the "use server" / "use client" directives.
* TanStack seemingly assumed the spec forces server-first defaulting (à la Next.js). It doesn't: that's a Next convention React recommends to avoid client/server waterfalls, not a requirement. TanStack instead leans on a homegrown createServerFn RPC approach (runtime + compile-time macro).
* Matt's read: it introduces another fork in the ecosystem, reminiscent of the CommonJS/ESM schism, and given TanStack's reach (React Query, etc.), incompatibility will cause real fallout. He thinks they should stop calling it "RSC."
* But he doesn't fully fault TanStack: the React team bungled the RSC rollout, with no clear public spec, "come talk to us in private," and docs that say don't build a framework around it. That only rewards teams with a direct line (Next.js, who hired/worked alongside the React team).

STANDUP / SHOW NOTES

* New segment alert: "Spill the SWE(et) Tea" debuts as Matt's drama-watch corner.
* Running gag: Dillon's open-PR count climbs live on air, 30 → 40 → 50.
* Teased for next time: the Vercel drama (an employee reportedly storing passwords in plain text; Dillon's manager calling an "emergency migration off Vercel to Cloudflare") and the long-awaited Remix 3: "we've been waiting six months."
* Bragging rights: apparently a top-7 software engineering podcast per "some SEO click-farm website." Goal: top 5. Reviews appreciated. Share with a friend, an overworked coworker, or your agent.

Yesterday · 49 min
Plan Mode Sucks

EPISODE SUMMARY

Matt revisits a hot take from a year ago that he believes more strongly now: you shouldn't be using plan mode in your AI coding agents. The conversation lands on a more nuanced position: planning still matters, plan mode just isn't the right tool for it anymore.

THE CASE AGAINST PLAN MODE

Matt argues that plan mode in Claude Code and Codex has degraded over the past month or so. Where it used to ask a ton of clarifying questions, it now spins for 30 minutes and hands back a full markdown plan without ever pinging you for context. The on-rails experience has stopped doing the part that made it valuable.

Scott pushes back gently: plan mode still has a place, especially for big architectural changes where one-shotting will leave you with a context provider sprinkled across multiple files (his real example from the previous Friday). But he agrees the out-of-the-box version isn't the only way to plan, and often isn't the best one.

PLANNING ≠ PLAN MODE

The real takeaway: planning, the activity, is still incredibly valuable. Plan mode, the feature, is just one (increasingly mediocre) implementation of it. The crew walks through the alternatives they're actually reaching for:

THE GRILL ME SKILL

Matt surfaces the grill-me skill Matt Pocock shared (and Dillon dropped in the Discord): a one-line skill that tells the agent to keep asking questions until it actually understands what you're trying to build. Strong fit for feature work where you don't yet know the shape of the problem space.

POC-FIRST DEVELOPMENT

Dillon describes his current workflow on a big work project: POC the entire user flow first, then POC each piece of the flow before building anything for real. He's been using Superpowers (the most popular Claude Code skill) and its brainstorming sub-skill, which builds mock interfaces so you can compare options. He'd rather over-plan than have to tell a coworker "Claude thought it was a good idea" when they ask why something works the way it does.

THE PLAN / BUILD SPECTRUM

Matt frames plan mode and build mode as two ends of a spectrum where you actually want to land somewhere in the middle: exploring three or four ideas, spinning off agents to POC each, bringing findings back, iterating. He hasn't found a skill that nails this loop yet, and invites listeners with a working setup to share it in the Discord.

PLAN MODE IS STILL GREAT FOR NEWCOMERS

Dillon's softer take: plan mode is genuinely useful when you're new to agentic tooling. It gives you a clear default workflow before you know what you actually need. You grow out of it as you discover the specific checks (codebase exploration, TDD, edge-case enumeration) you want before any code gets written.

ASK FOR SOURCES

Dillon's quick aside: when you're using the agent to learn something, ask it for sources. It'll mash concepts together, and being able to cross-check against the actual docs catches the seams.

STANDUP / LIFE UPDATES

* Dillon spent the week in northwest Arkansas (Fayetteville and Bentonville) for his brother's birthday and a baby shower for his niece, due in June. Bentonville was a surprise: Walmart HQ has turned the area into a brand-new, Apple-campus-tier hub since the company required vendors to relocate post-COVID. Cost of living roughly half of Boston, Onyx Coffee on the ground. He looked at the Walmart careers page.
* Scott completed his eighth powerlifting meet in eight years (with a two-year, two-month injury gap): 597.5 kg total at 82.5 kg bodyweight (200 squat, 140 bench, 257.5 deadlift). That's a 1,317 lb total. He's 85 kg from the national qualifying total, or he turns 40 first and qualifies on the masters total he's already cleared. Also teasing some open source work he's "boiling down to a usable small chunk": ETA four or five years per Scott; "what model are you using?" per Dillon.
* Matt has been ping-ponging across his personal tooling: gave up vibe-coding his own note-taking app, gave Notion another try, set up scheduled Claude Code tasks to summarize recent notes, and switched from Arc to Chrome specifically for the Claude Chrome extension and Cowork. Recording from a HubSpot meeting room during an onsite, with bachelor party planning in May wrapping up. And in the loudest beat of the segment: he had the agent build a library blending CRDTs, offline-first, RSC, and server actions, pulling from Automerge, Yjs, and TanStack Query, and it dutifully reinvented TanStack Query.

9 May 2026 · 34 min
Vibe-Coding Your Own Productivity Stack

EPISODE SUMMARY

In this episode, Dillon walks Scott and Matt through a personal productivity dashboard he's been building with Claude, and uses it as a jumping-off point for a wider conversation about what AI unlocks for "personal software."

THE DAILY BRIEFING DASHBOARD

Dillon's dashboard started as a joke: use Claude's new cron feature to post a daily inspirational quote at 9 a.m. He quickly realized he could put something genuinely useful there instead. The result is a single page he opens every morning that surfaces:

* To-dos
* Open PRs
* Jira tickets
* Datadog alerts
* Summaries of recent notes
* A custom Kanban board for tracking dev "harnesses" (scoping → planning → execution → review)

It's intentionally simple under the hood: zero dependencies, a Python server, HTML, and CSS. Run make start and it's up. He's burning roughly $2,500/mo in Claude tokens building it, has shared it openly with leadership and the broader company, and treats it as a sandbox for trying anything new in AI.

WHY BUILD IT YOURSELF?

Matt frames the bigger thesis: AI is a fast track to personal software, the small niche of building a tool tuned exactly to your own workflow rather than adopting something off the shelf or solving for millions of users. The closest off-the-shelf comparison would be something like Notion or Dream.ai, but neither would match Dillon's specific data sources or the way he wants to see them.

WHERE AI SURPRISED HIM (GOOD AND BAD)

* Struggles with UI consistency. AI gets the functionality right, but drifts from the design system, makes spacing and layout mistakes, and occasionally tries to "helpfully" refactor onto a totally different stack (e.g. "let's add SQLite") mid-project. Dillon's mitigation: keep it simple, have Claude audit the UI and write its own lightweight design system, and push reminders into CLAUDE.md.
* Matt's tip: expose a route on your app that renders all components on one page (poor man's Storybook) so the agent can discover existing patterns.
* Unexpected win: visual thinking. Dillon's been asking Claude to generate HTML pages with architecture diagrams, user flows, and dependency maps to build a mental model of unfamiliar projects before diving in. Matt does the same to navigate his monorepo's package dependency graph.

SKILLS DILLON HAS BUILT

* Start of Day / End of Day: a paired skill that asks reflection questions in the evening and gives him a standup-style recap in the morning, including "what's on my radar that I'm not thinking about."
* PR Status / PR Watch: pulls GitHub check status, surfaces comments, and runs every five minutes to send a Mac notification when a PR is ready to merge.
* Mind Dump: partner skill to End of Day that takes a stream of consciousness and organizes it into a structured markdown doc.
* Contentful skills: connect to the CMS API to pull content types, explain how they work, and (experimentally) architect new ones.
* The "Grill Me" skill (borrowed from Matt Pocock): has Claude slowly ask questions about a plan to surface edge cases and force thinking through the problem.

His meta-tip: every time you use a skill, reflect on how it did and ask Claude to improve it.

THE PRODUCTIVITY PARADOX

Has it actually made him more productive? Yes, but the new problem is spreading too thin. Dillon shipped 14 PRs in a week and now has 20 open ones he can't get back to. As Matt jokes: "the trick is to go faster." The real discipline is cleaning up after yourself, slowing down, and focusing on one thing at a time, even when you have 12 worktrees open.

CAVEATS AND TAKEAWAYS

* This is personal software: running locally, no deploy target, code quality intentionally rough. Not how Dillon does actual work.
* A lot of devs at his company are afraid to build things outside their tickets. Dillon's been transparent with leadership and turned it into a shared resource instead.
* If you want to start: literally talk to Claude (or sketch a screenshot) about what you'd want to see every morning, and go from there.

TEASERS

Future episode ideas raised: how to get good UI out of agents, and using AI to onboard yourself onto an unfamiliar codebase. Scott also hints he has a "pretty good solution" for the UI consistency problem, which he's saving for another episode.
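
For a sense of how little the "zero dependencies, a Python server, HTML, and CSS" setup from this episode needs, here is a minimal sketch using only the Python standard library. It is not Dillon's code: SECTIONS, render_page, DashboardHandler, serve, and the port are hypothetical names chosen for illustration, and the section contents are placeholders rather than live data from GitHub, Jira, or Datadog.

```python
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder data; a real dashboard would fill these from its own sources.
SECTIONS: dict[str, list[str]] = {
    "To-dos": ["Review open PRs"],
    "Open PRs": [],
}


def render_page(sections: dict[str, list[str]]) -> str:
    """Render each section as a heading plus a list, escaping all text."""
    parts = ["<!doctype html>", "<title>Daily briefing</title>"]
    for title, items in sections.items():
        parts.append(f"<h2>{escape(title)}</h2>")
        if items:
            rows = "".join(f"<li>{escape(item)}</li>" for item in items)
            parts.append(f"<ul>{rows}</ul>")
        else:
            parts.append("<p>Nothing here.</p>")
    return "\n".join(parts)


class DashboardHandler(BaseHTTPRequestHandler):
    """Serve the rendered page for every GET request."""

    def do_GET(self) -> None:
        body = render_page(SECTIONS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, serving the dashboard on localhost only."""
    HTTPServer(("127.0.0.1", port), DashboardHandler).serve_forever()
```

Calling serve() and opening http://127.0.0.1:8000 would show the page. Keeping rendering in a plain function, separate from the handler, means the page can be checked without starting a server, which suits a tool that is meant to stay small and local.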

9 May 2026 · 34 min