
Trajectly – deterministic regression tests for AI agents

Posted by ashmawy | 2 hours ago | 1 comment

ashmawy 2 hours ago

Hi HN — I built Trajectly, a tool for deterministic regression testing of AI agents.

Problem: agent “evals” are often flaky (network, time, tool nondeterminism, model drift), so it’s hard to tell if a change actually broke behavior.

What Trajectly does:

records an agent run once (inputs, tool calls, outputs)

replays it deterministically offline as a test fixture (so CI is stable)

checks a TRT “contract” (allowed tools/sequence, budgets, invariants, etc.)

when something breaks, it pinpoints the earliest violating step and can shrink the run to a minimal counterexample

You can try it locally (no signup):

pip install trajectly

run one of the standalone demos:

procurement approval agent demo

support escalation agent demo (or clone the main repo and run the GitHub Actions example)

Repo: https://github.com/trajectly/trajectly

I’m around to answer questions. I’d love feedback on:

what contract checks would be most useful in real agent deployments?

integrations you’d want first (LangGraph / LangChain / custom tool runners)?

whether the “shrink to minimal failing trace” output is understandable.