Senior SDET, a decade building test platforms.
01. GenAI
View All
Series
Evals Are Checks, Not Tests
GENAI_TESTING
The Golden Dataset: Building the Oracle You Test Against
Nothing downstream beats the oracle. Before a metric, before a judge, before a regression gate, you build the small set of human-verified cases that says what correct means, and every choice inside it carries a measured cost.
28 MIN READ
GENAI_TESTING The Harness Is the Product: Models Are a Commodity
The teams shipping agents don't have a better model; they have a better harness. Five properties that make one trustworthy around a LangGraph and MCP agent.
GENAI_TESTINGThe System Under Test: A Broadband Support Agent
Meet Atlas: a broadband support agent on LangGraph and MCP, mapped against OWASP's 10 agentic risks and the real incidents that prove each part can fail.
GENAI_TESTINGYour Evals Are Checks, Not Tests
Air Canada's chatbot cost CAD $812 for an answer its evals scored as faithful. Five classical software-testing patterns catch what your eval dashboard misses.