
Autoresearch on an old research idea

Posted by ykumards | 2 hours ago | 35 comments

the_arun an hour ago

Try this if the main link is not responsive - https://archive.is/6xLiU

carlsborg an hour ago

> “The agent acted like a hyperparameter optimization algorithm with some basic reasoning baked in.”

Good lens.

The crux of the autoresearch repo is basically one file, program.md: a system prompt that can be summarized as “do this in a loop: improve train.py, run the training, run the evals, record the result. Favor simplicity.” The other files are an arbitrary ML model that is being trained.
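That loop is simple enough to sketch in a few lines of Python. Everything below is a hypothetical stand-in: `propose_change`, `train_and_eval`, and the toy scoring function are mine for illustration, not the repo's code.

```python
import random

random.seed(0)  # deterministic toy run

def propose_change(history):
    """Stand-in for the LLM step ("improve train.py"); here it just picks a config."""
    return {"lr": random.choice([1e-2, 1e-3, 1e-4]),
            "batch_size": random.choice([32, 64])}

def train_and_eval(config):
    """Stand-in for "run the training, run evals"; returns a made-up score."""
    return 1.0 - config["lr"] * 10 + config["batch_size"] / 1000

def research_loop(iterations=5):
    history, best = [], None
    for _ in range(iterations):
        config = propose_change(history)
        score = train_and_eval(config)
        history.append((config, score))   # "record result"
        if best is None or score > best[1]:
            best = (config, score)
    return best, history
```

The real agent edits source code and shells out to actual training runs, but the shape is the same: propose, run, record, keep the best.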

_pdp_ 40 minutes ago

Take some working code. Ask an LLM to fix bugs. Measure performance and test coverage. Feed the results back into the LLM. Repeat.

This has been the standard approach for more complex LLM deployments for a while now in our shop.

Using different models across iterations is also something I've found useful in my own experiments. It's like getting a fresh pair of eyes.
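The "fresh pair of eyes" trick can be as simple as cycling through model names between iterations. This is a sketch with made-up model identifiers and a stubbed-out LLM call, not any particular vendor's API.

```python
from itertools import cycle

MODELS = cycle(["model-a", "model-b", "model-c"])  # hypothetical identifiers

def review_round(code, model):
    """Stub for a real LLM call; a real version would send `code` to `model`."""
    return f"{model} reviewed {len(code)} chars"

def iterate_with_fresh_eyes(code, rounds=3):
    """Run each review round with a different model from the rotation."""
    return [(m, review_round(code, m))
            for m in (next(MODELS) for _ in range(rounds))]
```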

datsci_est_2015 an hour ago

I often use LLMs to explore prior art and maybe find alternative ways of thinking about problems. About 90% of what they tell me is useless or inapplicable to my domain due to a technicality they could not have known, but the other 10% is nice and has helped me learn some great new things.

I can’t imagine letting an agent try everything the LLM chatbot recommended ($$$). The recommendations often include very poorly maintained or niche libraries that have quite a lot of content written about them but, I can only imagine, see very limited use in real production environments.

On the other hand, we have domain expert “consultants” in our leadership’s ears making equally absurd recommendations that we constantly have to disprove. Maybe an agent can occupy those consultants and let us do our work in peace.

jpcompartir an hour ago

There are better techniques for hyper-parameter optimisation, right? I fear I have missed something important: why has Autoresearch blown up so much?

The bottleneck in AI/ML/DL is always data (volume & quality) or compute.

Does/can Autoresearch help improve large-scale datasets? Is it more compute efficient than humans?
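For context, the "better techniques" start as low as plain random search, which needs no agent at all. The objective below is a toy function I made up to stand in for a validation metric; nothing here comes from Autoresearch.

```python
import random

random.seed(42)  # reproducible toy search

def objective(lr, depth):
    """Toy stand-in for a validation metric; peaks at lr=0.01, depth=4."""
    return -((lr - 0.01) ** 2) - (depth - 4) ** 2

def random_search(n_trials=50):
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {"lr": random.uniform(1e-4, 1e-1),
                  "depth": random.randint(1, 8)}
        score = objective(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Smarter samplers (e.g. Bayesian optimization via Optuna, or scikit-learn's RandomizedSearchCV) are the usual next step; the point is that none of this needs an LLM in the loop.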

1970-01-01 16 minutes ago

> The original paper used several medical X-ray datasets which I don’t have access to anymore, so I needed a new dataset with spatial annotations to test the expert attention mechanism. I picked the Ukiyo-eVG dataset: ~11K Japanese woodblock prints

That's such a weird switch. There's lots of free medical imaging online. Example: https://www.cancerimagingarchive.net/

love2read an hour ago

So... It did work. It found bugs (that he didn't know about) and it did optimization (that he hadn't done).

dvt an hour ago

Ok, so looking at the commit log[1], I was mostly interested in seeing what the "moonshot ideas" implementations looked like, but basically everything is just hyperparameter tuning. Which is nice, but likely not worth the $$$ spent on the tokens. Am I missing something here?

[1] https://github.com/ykumards/eCLIP/commits/main/autoresearch

motbus3 12 minutes ago

I've done something similar with a small project of mine and had very similar results overall.

lucasay 29 minutes ago

This feels less like automated research and more like structured trial and error with a decent feedback loop. Still useful, but I think the real bottleneck is how good your eval metric is. If that’s weak, the whole loop just optimizes for the wrong thing faster.

lamroger an hour ago

Awesome breakdown! It really feels like a hyper-hyper parameter search + bug fixer.

I started looking at Kaggle again and autoresearch seems to converge to many of the solution vibes there.

Wild ensembles, squeezing a bit of loss out. More engineering than research IMO

BrokenCogs an hour ago

Does autoresearch work for projects that are not LLM-based? E.g., in Karpathy's example he is optimizing nanoGPT. What if I wanted to improve a U-Net for image segmentation?
