"How Roy Resh Scaled Retail AI by Moving from Custom Pipelines to Configurable Computer Vision Systems"
Episode Summary:
In this episode of Engineering Choices You Have to Defend, host Nicola Onassis sits down with Roy Resh, VP of Engineering at Trax Retail [https://traxretail.com?utm_source=chatgpt.com], to explore a pivotal architectural decision that reshaped how large-scale computer vision systems are built and scaled in retail environments.
At Trax, Roy and his team built a computer vision platform that analyzes shelf images captured in retail stores, identifying products, pricing, and point-of-sale materials to generate a digital representation of store shelves. This enables brands to measure execution, shelf share, and product availability in near real time. But as the platform scaled across enterprise clients, complexity began to compound rapidly.
What started as a unified recognition pipeline evolved into a heavily customized system, with per-client logic for attributes like expiration dates, display detection, reporting formats, and KPI calculations. Each new customer introduced new requirements, leading to custom code per client, duplicated processing flows, and increasingly long onboarding cycles that stretched from weeks to months.
Roy explains how the system eventually reached a breaking point: onboarding delays of 30–60 days, rising operational overhead, and microservices becoming entangled with client-specific logic. In some cases, the platform even processed the same image multiple times to satisfy different customer requirements, driving up cost and complexity.
The team made a strategic decision to move away from custom implementations and toward a configurable, JSON-driven workflow architecture. Built on event-driven microservices, queues, and coordination barriers, this new system allowed engineering teams to define and version entire processing flows through configuration rather than code.
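To make the idea of a configuration-defined flow with a coordination barrier concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Trax's implementation: the schema (`steps`, `after`, `version`), the step names, and the in-process runner are all hypothetical, and a production system would dispatch steps over queues rather than call functions directly.

```python
import json

# Hypothetical workflow definition; field and step names are illustrative only.
WORKFLOW_JSON = """
{
  "name": "shelf_analysis",
  "version": 3,
  "steps": [
    {"id": "detect_products", "after": []},
    {"id": "read_prices", "after": []},
    {"id": "build_shelf_model", "after": ["detect_products", "read_prices"]},
    {"id": "compute_kpis", "after": ["build_shelf_model"]}
  ]
}
"""


def run_workflow(config, handlers, image_id):
    """Run steps in dependency order.

    A step listing several ids in `after` behaves like a barrier:
    it only starts once every listed step has finished.
    """
    pending = {s["id"]: set(s["after"]) for s in config["steps"]}
    done, order, results = set(), [], {}
    while pending:
        # Steps whose dependencies are all complete are ready to run.
        ready = sorted(sid for sid, deps in pending.items() if deps <= done)
        if not ready:
            raise ValueError("cycle or unknown dependency in workflow")
        for sid in ready:
            results[sid] = handlers[sid](image_id, results)
            done.add(sid)
            order.append(sid)
            del pending[sid]
    return order, results


def make_handler(name):
    """Stand-in for a real microservice call; just records what ran."""
    def handler(image_id, results):
        return f"{name}({image_id})"
    return handler


config = json.loads(WORKFLOW_JSON)
handlers = {s["id"]: make_handler(s["id"]) for s in config["steps"]}
order, results = run_workflow(config, handlers, "img-001")
print(order)
```

In this sketch, changing the flow for a client means editing the JSON document, not the runner, which is the property the episode describes.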
This shift enabled safer deployments, faster experimentation, and gradual rollouts per client without affecting the entire platform. It also introduced a standardized KPI layer, reducing the need for bespoke reporting logic across customers.
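One way a per-client gradual rollout could work when flows are versioned configuration is to pin each client to a workflow version and move clients one at a time. The Python sketch below assumes a hypothetical in-memory registry; the version numbers, client ids, and step names are invented for illustration and are not from the episode.

```python
# Hypothetical registry of versioned workflow definitions.
WORKFLOW_VERSIONS = {
    3: {"steps": ["detect_products", "compute_kpis"]},
    4: {"steps": ["detect_products", "detect_displays", "compute_kpis"]},
}

DEFAULT_VERSION = 3                  # what every client gets unless overridden
CLIENT_OVERRIDES = {"client_a": 4}   # clients opted in to the newer flow


def resolve_workflow(client_id):
    """Return (version, definition) for a client, honoring any override."""
    version = CLIENT_OVERRIDES.get(client_id, DEFAULT_VERSION)
    return version, WORKFLOW_VERSIONS[version]


def rollback(client_id):
    """Send a client back to the default version without a code deploy."""
    CLIENT_OVERRIDES.pop(client_id, None)


print(resolve_workflow("client_a")[0])  # pinned to the new version
print(resolve_workflow("client_b")[0])  # falls back to the default
```

Because only the override table changes, a problem with the new flow affects the opted-in client alone and can be undone by removing one entry.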
Roy also discusses the importance of human-in-the-loop validation in production AI systems. In a constantly evolving retail environment, human annotators help generate training data, validate model outputs, and maintain accuracy for high-stakes enterprise use cases where precision is critical.
For engineering leaders, this episode highlights a key lesson: when every customer forces new code paths, you’re not scaling a product—you’re scaling complexity.
Key Takeaways:
* Over-customization is a clear signal of architectural scaling limits
* Long onboarding cycles often reveal hidden system fragmentation
* Configurable workflows reduce dependency on per-client code changes
* Event-driven, JSON-based orchestration improves flexibility and deployment safety
* Gradual migration strategies reduce risk in enterprise system rewrites
* Standardizing KPI logic is as important as standardizing AI pipelines
* Human-in-the-loop systems remain essential in dynamic real-world AI environments
* Scalable platforms reduce variability instead of multiplying it
Connect with Roy Resh:
* LinkedIn: Roy Resh [https://www.linkedin.com/in/roy-resh/]
Listen Now & Subscribe:
Apple Podcasts, Spotify, Amazon Music, or wherever you get your podcasts.
"Engineering Choices You Have to Defend explores the real technical decisions behind regulated software, compliance, and AI integration, helping leaders build secure, auditable, and user-friendly systems."