Show HN: Nyx – multi-turn, adaptive, offensive testing harness for AI agents

Posted by zachdotai | 3 hours ago | 8 comments

ibrahim-fab 3 hours ago

Nice. It's definitely true that evaluating agent behavior is by far the toughest part of building agents. Also, most eval cases are added without much thought and aren't maintained when the agent's behavior changes. Interesting approach.

azhassan1 an hour ago

Where do you draw the line between this and coverage-guided fuzzing? A lot of what you describe (parallel, adaptive, finds edge cases in unbounded input spaces) maps cleanly onto the fuzzing playbook, which has decades of theory behind it: corpus management, mutation scheduling, and minimization of found crashes.

Are you borrowing from that literature or treating agent testing as a distinct problem? Feels like there's real transfer available if you're not already pulling from it.
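
For readers who haven't seen the playbook referenced above, here is a minimal sketch of a coverage-guided loop with a corpus, random mutation, and greedy crash minimization. Everything in it is a toy assumption made up for illustration: `target`, its branch ids, and the `BUG` crash are hypothetical and say nothing about how Nyx works.

```python
import random


def target(data: bytes) -> set:
    """Hypothetical system under test: returns the branch ids it hit.

    Raises ValueError (the toy 'crash') when the input starts with b"BUG".
    """
    hit = {"start"}
    if len(data) > 0 and data[0] == ord("B"):
        hit.add("b")
        if len(data) > 1 and data[1] == ord("U"):
            hit.add("bu")
            if len(data) > 2 and data[2] == ord("G"):
                raise ValueError("crash")
    return hit


def mutate(rng: random.Random, data: bytes) -> bytes:
    """Apply one random byte-level edit: insert, replace, or delete."""
    buf = bytearray(data)
    op = rng.choice(("insert", "replace", "delete"))
    if op == "insert" or not buf:
        buf.insert(rng.randint(0, len(buf)), rng.randrange(256))
    elif op == "replace":
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def fuzz(seed_inputs, iterations: int, rng: random.Random):
    """Keep any mutated input that reaches a branch not seen before."""
    corpus = list(seed_inputs)
    seen = set()
    crashes = []
    for item in corpus:
        seen |= target(item)  # assumes the seeds themselves do not crash
    for _ in range(iterations):
        candidate = mutate(rng, rng.choice(corpus))
        try:
            coverage = target(candidate)
        except ValueError:
            crashes.append(candidate)
            continue
        if not coverage <= seen:  # new coverage: promote to the corpus
            seen |= coverage
            corpus.append(candidate)
    return corpus, seen, crashes


def minimize(data: bytes, still_crashes) -> bytes:
    """Greedily drop single bytes while the crash still reproduces."""
    i = 0
    while i < len(data):
        shorter = data[:i] + data[i + 1:]
        if still_crashes(shorter):
            data = shorter
        else:
            i += 1
    return data


def crashes_target(data: bytes) -> bool:
    """Helper predicate for minimize(): True if target raises on data."""
    try:
        target(data)
    except ValueError:
        return True
    return False
```

Calling `fuzz([b"x"], 10_000, random.Random(0))` grows the corpus only when an input reaches a new branch, and `minimize(crash, crashes_target)` shrinks a found crash. The open question for agent testing is what plays the role of "coverage" when the input is a multi-turn conversation rather than a byte string.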

AmineAfia 2 hours ago

Can I integrate this into my CI/CD pipeline?

aacudad 3 hours ago

I'm not sure this will work. It seems like added complexity for something simple.

ljhasdr 3 hours ago

I need to try this before mythos comes to attack our service. Thanks!

adam_rida 3 hours ago

Very cool!