How to Make a Good Terminal Bench Task

Posted by neversupervised | 2 hours ago | 1 comment

2 hours ago

Comment deleted

neversupervised 2 hours ago

I've been a contributor and reviewer for Terminal Bench since last August, and this post is about what I've learned designing and reviewing tasks. The guidance is broadly applicable to anyone building an agentic benchmark. I would love feedback from the HN community.