
Less human AI agents, please

Posted by nialse | 4 hours ago | 64 comments

gregates 3 hours ago

The version of this I encounter literally every day is:

I ask my coding agent to do some tedious, extremely well-specified refactor, such as (to give a concrete real life example) changing a commonly used fn to take a locale parameter, because it will soon need to be locale-aware. I am very clear — we are not actually changing any behavior, just the fn signature. In fact, at all call sites, I want it to specify a default locale, because we haven't actually localized anything yet!

Said agent, I know, will spend many minutes (and tokens) finding all the call sites, and then I will still have to either confirm each update or yolo and trust the compiler, the tests, and the agent's ability to deal with their failures. I am OK with this, because while I could do this just fine with vim and my LSP, the LLM agent can do it in about the same amount of time, maybe even a little less; it's a very straightforward change that's tedious for me, and I'd rather think about or do anything else and just check in occasionally to approve a change.
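For concreteness, a sketch of that kind of mechanical signature change (the function name, currency, and `DEFAULT_LOCALE` constant here are hypothetical, not from the original comment):

```typescript
// Placeholder locale until real localization lands; nothing is localized yet.
const DEFAULT_LOCALE = "en-US";

// Before: function formatPrice(amount: number): string
// After: the signature now takes a locale, but behavior is unchanged.
function formatPrice(amount: number, locale: string): string {
  return new Intl.NumberFormat(locale, {
    style: "currency",
    currency: "USD",
  }).format(amount);
}

// Every one of the call sites gets the same purely mechanical update:
// before: formatPrice(total)
// after:  formatPrice(total, DEFAULT_LOCALE)
const total = 9.99;
const label = formatPrice(total, DEFAULT_LOCALE);
```

Tedious across dozens of call sites, but entirely specified in advance, which is exactly the point of the complaint.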

But my f'ing agent is all like, "I found 67 call sites. This is a pretty substantial change. Maybe we should just commit the signature change with a TODO to update all the call sites, what do you think?"

And in that moment I guess I know why some people say having an LLM is like having a junior engineer who never learns anything.

js8 3 hours ago

A very human thing to do is not to tell us which model failed like this! They are not all alike; some are, from what I observe, an order of magnitude better at this kind of stuff than others.

I believe how "neurotypical" (for lack of a better word) you want a model to be is a design choice. (But I also believe model traits such as sycophancy, some hallucinations, or moral transgressions can be a side effect of training it to be subservient. It is similar with humans: they tend to do these things when they are forced to perform.)

hausrat 2 hours ago

This has very little to do with someone making the LLM too human and much more to do with a core limitation of the transformer architecture itself. Fundamentally, the model has no notion of what is normal and what is exceptional; its only window into reality is its training data and your added prompt. From the perspective of the model, your prompt and its token vector are tiny compared to the semantic vectors it has built up over the course of training on billions of data points. How should it decide whether your prompt is actually an interesting, novel exploration of an unknown concept or just complete bogus? It can't, and that is why it will fall back on the output that is most likely (and therefore most likely average) with respect to its training data.

raincole 3 hours ago

I know anthropomorphizing LLMs has been normalized, but holy shit. I hope the language in this article is intentionally chosen for a dramatic effect.

plastic041 3 hours ago

> There was only one small issue: it was written in the programming language and with the library it had been told not to use. This was not hidden from it. It had been documented clearly, repeatedly, and in detail. What a human thing to do.

"Ignoring" instructions is not a human thing. It's a bad LLM thing. Or just an LLM thing.

richsouth an hour ago

Shocker - these agents aren't actually intelligent. They take best guesses, use other people's work they deem 'close enough', and cobble something together with no 'thought' behind it. They're dumb, stupid pieces of code that don't think or reason - the 'I' in 'AI' is very misleading, because there is none.

lexicality 3 hours ago

The entire point of LLMs is that they produce statistically average results, so of course you're going to have problems getting them to produce non-average code.

mentalgear 3 hours ago

Yes, LLMs should not be allowed to use "I" or indicate they have emotions or are human-adjacent (unless explicit role play).

vachanmn123 3 hours ago

I've seen this way too many times as well. I wrote about this recently: https://medium.com/@vachanmn123/my-thoughts-on-vibe-coding-a...

bob1029 3 hours ago

If you want to talk to the actual robot, the APIs seem to be the way to go. The prebuilt consumer facing products are insufferable by comparison.

"ChatGPT wrapper" is no longer a pejorative reference in my lexicon. How you expose the model to your specific problem space is everything. The code should look trivial because it is. That's what makes it so goddamn compelling.

aryehof 3 hours ago

For agents, I think the desire is for less intrusive model fine-tuning and less opinionated “system instructions”, please. Particularly in light of an agent/harness’s core motivation: to achieve its goal, even if it is not exactly aligned with yours.

hughlilly 2 hours ago

* fewer.

DeathArrow 3 hours ago

>Faced with an awkward task, they drift towards the familiar.

They drift to their training data. If thousands of humans solved a thing in a particular way, it's natural that the AI does it too, because that is what it knows.

jansan 3 hours ago

I disagree. I want agents to feel at least a bit human-like. They should not be emotional, but I want to talk to them like I talk to a human. Claude 4.7 is already too socially awkward for me. It feels like the guy who does not listen to the end of the assignment, runs to his desk, and does the work (with great competence), only to find out that he missed half of the assignment or that this was only a discussion of possible scenarios. I would like my coding agent to behave like a friendly, socially able, and highly skilled coworker.

chrisjj 2 hours ago

> ... or simply gave up when the problem was too hard,

More of that, please. Perhaps behind a checkbox: "[x] Less bullsh*t".

DeathArrow 2 hours ago

>So no, I do not think we should try to make AI agents more human in this regard. I would prefer less eagerness to please, less improvisation around constraints, less narrative self-defence after the fact. More willingness to say: I cannot do this under the rules you set. More willingness to say: I broke the constraint because I optimised for an easier path. More obedience to the actual task, less social performance around it.

>Less human AI agents, please.

Agents aren't humans. The choices they make depend on their training data. Most people using AI for coding know that the AI will sometimes not respect the rules, and that the longer the task, the more the AI will drift from the instructions.

There are ways to work around this: using smaller contexts, feeding it smaller tasks, using a good harness, using tests etc.
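One way to picture those workarounds together is a small harness loop that feeds the agent one task at a time and gates each result on tests. This is only a sketch: `runAgent` and `runTests` are hypothetical stand-ins for a real model call and a real test suite, not any actual API.

```typescript
type Task = { description: string };

// Stand-in for a real model call; a real harness would send only this
// task's description plus the minimum relevant context (small context).
function runAgent(task: Task): string {
  return `patch for: ${task.description}`;
}

// Stand-in for running the real test suite against the proposed patch.
function runTests(patch: string): boolean {
  return patch.length > 0;
}

// Feed the agent one small task at a time, committing a patch only when
// tests pass, so drift from the instructions cannot accumulate across tasks.
function harness(tasks: Task[]): string[] {
  const applied: string[] = [];
  for (const task of tasks) {
    const patch = runAgent(task);
    if (runTests(patch)) {
      applied.push(patch);
    }
  }
  return applied;
}
```

The point is the shape, not the code: each iteration starts from a clean, narrow context, and the tests (not the agent's self-report) decide whether the work counts.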

But at the end of the day, AI agents will shine only if they are asked to do what they know best. And if you want to extract the maximum benefit from AI coding agents, you have to keep that in mind.

When using AI agents for C# LOB apps, they mostly one-shot everything. Same for JS frontends. When using AI to write some web backends in Go, the results were still good. But when I asked it to write a simple CLI tool in Zig, it really struggled: it made lots of errors, the errors were hard to solve, and it was hard to fix the code so the tests would pass. Had I chosen Python, JS, C, C#, or Java, the agent would have finished 20x faster.

So, if you keep in mind what the agent was trained on, if you use a good harness, if you have good tests, if you divide the work in small and independent tasks and if the current task is not something very new and special, you are golden.

incognito124 3 hours ago

Your claim, paraphrased, is that AGI is already here and you want ASI

zingar 2 hours ago

I think the author is looking for something that doesn't exist (yet?). I don't think there's an agent in existence that can handle a list of 128 exactly specified tasks in one session. You need multiple sessions with clear context to get exact results. Ralph loops, Gastown, taskmaster etc. are built for this, and they almost entirely exist to correct drift like this over the longer term. The agent makers and models are slowly catching up to these tricks (or to the shortcomings they exist to solve); some of what used to be standard practice in Ralph loops seems irrelevant now... and certainly the marketing for Opus 4.7 is "don't tell it what to do in detail; rather, give it something broad".

In fairness to coding agents, most coding is not exactly specified like this, and the right answer is very frequently to find the easiest path, one the person asking might not have thought about, sometimes even in direct contradiction of specific points listed. Human requirements are usually much fuzzier. It's unusual for the person asking to have such a clear, definite requirement that they have thought through carefully.