AI Coding at Home Without Going Broke

Posted by sbochins | 3 hours ago | 108 comments

tunesmith 2 hours ago

I feel like I must have plateaued and don't know what to do next to level up. I'm currently on the $100/month codex plan and it seems fine using 5.5-xhigh all the time. I think of what to do next, have a chat session to determine exactly what to ask for up to the point of being ready to implement, and then codex churns on a commit-sized task, whereupon I briefly check it on my local dev server. If necessary I ask for a change. Then I ask it to commit and recommend the next step based on the spec. Oftentimes I have to "approve" an out-of-sandbox request anyway.

I haven't found anything that requires running all night. I could tell it to one-shot a big plan but given how often I realize I want an intermediary thing to be slightly different it seems like a waste of effort.

I'm guessing the next thing I should probably look into is some sort of machine vm I can tunnel my codex-gui requests to so I don't have to deal with the sandbox approvals (I don't want to give it "dangerous" access to my entire mac).

I don't understand what people are doing with their side projects that is leading them to churn through tokens so quickly, to the point of requiring two $200/month subscriptions and a bunch of token charges besides.

closeparen 4 minutes ago

>Around $400 a month of plans buys roughly $2800 of API usage at list prices, which is a real bargain right up until you hit the ceiling. The plans are metered, and any large AI native workflow will chew through the included tokens fast

I don't think that's true at all. I'm doing 8-12 PRs a week at work, all primarily Claude Code, and the usage at API billing has never broken $500/mo.
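To make the arithmetic in that quote concrete, here is a minimal Python sketch. The function names are made up for illustration; the only figures taken from the thread are the $400 of plans, the roughly $2800 of list-price usage, and the $500/mo API bill mentioned in this comment.

```python
def plan_leverage(plan_cost: float, api_equivalent: float) -> float:
    """Dollars of list-price API usage bought per dollar of plan."""
    if plan_cost <= 0:
        raise ValueError("plan_cost must be positive")
    return api_equivalent / plan_cost


def cheaper_option(plan_cost: float, actual_api_spend: float) -> str:
    """Compare a flat plan against what the same month would cost on the API."""
    return "plan" if actual_api_spend > plan_cost else "api"


# Figures quoted from the article: $400 of plans, about $2800 of API usage.
print(plan_leverage(400, 2800))   # 7.0

# A month that would cost $500 at API prices is still cheaper on $400 of plans,
# but only barely; the 7x figure matters only if the whole allowance is used.
print(cheaper_option(400, 500))   # plan
print(cheaper_option(400, 300))   # api
```

The point of the second function is that the headline ratio only pays off for someone who actually uses most of the allowance.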

isatty 3 hours ago

> The first is to self host. You buy the machine, run open source models locally, and pay nothing per token after that.

Power is not free.

What I’ve found is that you’re basically paying a premium for privacy, and that’s worth it for me.
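A rough way to put a number on "power is not free" is to multiply average draw by hours of use and the electricity rate. The sketch below is only an illustration: the 300 W draw, 8 hours a day, and $0.30/kWh are assumed values, not figures from this thread.

```python
def monthly_power_cost(avg_watts: float, hours_per_day: float,
                       usd_per_kwh: float, days: int = 30) -> float:
    """Electricity cost of running a machine for one month."""
    kwh = avg_watts / 1000 * hours_per_day * days
    return kwh * usd_per_kwh


# All three inputs are assumptions chosen for illustration.
cost = monthly_power_cost(avg_watts=300, hours_per_day=8, usd_per_kwh=0.30)
print(f"${cost:.2f} per month")
```

Swapping in a local electricity rate and the measured draw of the machine gives the per-month floor that a self-hosted setup never gets below.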

mikgp 44 minutes ago

What are people doing at home? I have like 5 different apps I code on the $20/month Claude plan and like sure I can hit rate limits but - What are people doing to burn through $3k in tokens?

mwcampbell 2 hours ago

I invested about $4,000 in an NVIDIA DGX Spark several months ago. 128 GB of unified RAM, and the NVIDIA GB10 chip. With the RAM, the several CPU cores, and the 4 TB NVMe SSD, it's a very capable ARM64 Linux computer even without the GPU, and so far I've mostly been using it as such. But I wonder, what's the most capable model, specifically for coding, that can run well on that hardware?

atreids 3 hours ago

I find just going via Deepseek's platform API directly, using their V4 flash model, and hooking into a harness like Opencode more than acceptable. Think I've spent maybe $10 over a couple of weeks.

I did explore self-hosting models but hardware right now is just too expensive.

bachmeier an hour ago

> The upfront cost is steep and the models you can actually run at home are weaker than what the frontier labs ship, so this only pays off if you can keep the rig busy with long running tasks where a slower, cheaper model grinds away overnight. Most people can’t keep a home machine that loaded, and the hardware you buy today may look like a bad bet in a year.

Oh, so this is not a post about AI coding at home. It's about vibe coding at home.

There's a lot I disagree with in this post, but I'm posting this from a home computer with 64 GB of RAM and no GPU. I do lots of AI coding while spending very little money. I run Gemma 4 26b (mixture of experts) and Qwen 3 coder with Ollama. I use GitHub Copilot code completions. I use the Gemini and Mistral API free tiers. I have a Gemini paid API account. It's now prepaid, so you don't have to worry about an accidental $1000 bill. You can do a lot of things with Gemini Flash Lite 3.1.

None of this is burning through tokens to create an expensive blob of spaghetti code, but it does qualify as AI coding.

esalman 2 hours ago

For me, investing in hardware seems to be the way to go.

I learned coding nearly 24 years ago and am still learning new stuff all the time. At no point did I have to rely on a subscription model to learn and do new stuff.

If LLMs and agents are the default tools for coding and building software, at least for the next few years, it seems like a no-brainer to invest $2000-3000 in hardware, like a Strix Halo PC.

vadansky 2 hours ago

Can I run something comparable to Opus 4.6 locally yet? I keep hearing conflicting things. If I can spend 10k to do that I would cancel my subscription. The problem is I don’t wanna spend the money to find out myself.
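The money side of that question can at least be sketched, even if the quality side cannot. Below is a minimal break-even calculation in Python. The $10k hardware figure comes from the comment; the $200/month subscription and $20/month power cost are assumptions for illustration, since the comment does not state them.

```python
import math
from typing import Optional


def breakeven_months(hardware_cost: float, monthly_subscription: float,
                     monthly_power: float = 0.0) -> Optional[int]:
    """Months until cancelled subscription fees cover the hardware price.

    Returns None when running the machine costs as much as the subscription,
    in which case the purchase never pays for itself.
    """
    monthly_saving = monthly_subscription - monthly_power
    if monthly_saving <= 0:
        return None
    return math.ceil(hardware_cost / monthly_saving)


# $10k is from the comment; $200/month and $20/month are assumed.
print(breakeven_months(10_000, 200))       # 50
print(breakeven_months(10_000, 200, 20))   # 56
```

This says nothing about whether a local model matches the hosted one; it only shows how long the hardware has to stay useful before the swap breaks even.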

0xB0D 12 minutes ago

If your job becomes writing complex specs to make an LLM write code, you've not optimised anything.

In fact all you've done is add a business cost.

geophph an hour ago

> Do that well and you can build what a team of twenty engineers would put out in a month for around a thousand dollars.

What does this look like after 6-12 months? Like, how much code are you trying to write total?

Maybe it just doesn’t click in my mind, but sometimes I wonder about how much work people are trying to do and how they actually have enough to get done so quickly in such a short amount of time.

RomanPushkin 2 hours ago

AI coding at home literally costs $100/month. I'm wondering where $400 is coming from? $100 is more than enough for "coding at home", IMO. I rarely hit the limits, and when I do it's just time for a quick walk anyway.

hillj23 an hour ago

I think this is only going to become more relevant. I'm personally a $200/mo Claude Maxer and I know that the usage I'm getting on Opus 4.8 Max and (until they yanked it out from under me) Fable 5 is worth way, way more than what I'm paying them. At some point, this will turn usage-based and I will be hammered on it and probably forced to look at self-hosting. I think while the caps are there, even at $200, it's honestly not too bad if you're coding value into the market, but as soon as those caps come off for retail AI users, we're all going to have some tough choices to make.

thomasjb 31 minutes ago

Opencode's free models have been fine for me, they're what I tried after Gemma 4 8B proved hard to persuade into usefulness (I want to revisit with 12B and messing with harnesses, but I'm happy for now).

pianopatrick 2 hours ago

I think someone could find some way to use the smaller local models to write code. Some kind of framework or harness or language or something. But not too many people are working on that because the big models are pretty cheap and a lot better.

impure 2 hours ago

I recently made an AI Agent and surprisingly coding with DeepSeek V4 Flash is quite cheap. It probably has to do with the aggressive prompt caching. I'm using OpenRouter with Novita AI as the preferred provider.

MemoryHoleHQ 2 hours ago

I've been thinking a lot about this, and my personal take right now is that at some point in the near-to-medium future the models available to run at home, and the hardware needed to run them, will be enough.

My baseline is sonnet 4.6. I honestly think it's good enough for most tasks. So, from what I see, we are already at a point where we don't need frontier models for serious coding and debugging. Give it a couple of years and that level will fit in 120B models.

At the same time, we saw the rise of direct-access memory systems like DGX or Strix Halo that will make it possible to run models of this size for "cheap" in the medium term.

That's what I'm betting on: that in 2 years I can buy a system for about $2500 that will run a model similar to Sonnet 4.6 locally.

I might be spectacularly wrong though. But I'm willing to wait and use subscriptions/API calls for now.

pshirshov an hour ago

> and the hardware you buy today may look like a bad bet in a year.

3090s and 7900s are going well so far.

Next year an Arc Pro B70 won't produce fewer tokens for you than it does today.

They aren't fast, but if you have flows where you can make money with them, they are a bargain in terms of price per GB.

abc42 2 hours ago

What kind of usage chews through Claude Max x20? I use several agents with max effort in parallel and usually end up with something like 50% weekly usage. Fable almost allowed me to get to 70% but then they started resetting the limits mid-week and of course now ended the whole thing.

quickthoughts 2 hours ago

Ha just wrote a post[1] about a sort of 4th option - max out cheap compute to create more tangible things that can be used/run locally.

1: https://news.ycombinator.com/item?id=48519181

spgorbatiuk 2 hours ago

Hardware and provider juggling is one way to go, although I think it is also worth mentioning that the cost is not only the price per token but, first of all, the number of tokens used.

Depending on what one builds, comprehensive documentation and applicable skills and memory tools often allow for a substantial reduction in the tokens the agent previously spent comprehending and remembering what is being built.
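The point that cost is tokens times price, not price alone, can be shown with a short Python sketch. Everything numeric here is hypothetical: the per-million-token prices and the token counts are made up to illustrate the shape of the calculation, not taken from any provider's price list.

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Bill for a month given token counts and per-million-token prices."""
    return (input_tokens / 1e6 * usd_per_m_input
            + output_tokens / 1e6 * usd_per_m_output)


# Hypothetical prices: $1 per million input tokens, $4 per million output.
baseline = monthly_cost(50_000_000, 5_000_000, 1.0, 4.0)

# Same prices, but better docs and memory tools halve the context the agent
# has to re-read, so input tokens drop from 50M to 25M.
with_docs = monthly_cost(25_000_000, 5_000_000, 1.0, 4.0)

print(baseline, with_docs)
```

With these made-up numbers, halving the input tokens saves more than a modest price cut would, which is the comment's argument in miniature.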

WhiteOwlLion 2 hours ago

There are a lot of Xeon chips for $10 on eBay. Too bad there's no drive for CPU-based inference. The data centers will need to swap out their older GPU clusters, so what does that do for hardware pricing on data center GPUs? H100s are cheap enough, but the power requirements make them a long-term net negative given how much we pay for power in California.

Kuyawa an hour ago

This month I've spent only 15 cents using DeepSeek API and my own coding agent. Three apps delivered to clients and currently working on a tournament management app for pickleball, padel and beach tennis. I love DeepSeek.

dempedempe 2 hours ago

Did you just copy-and-paste an AI response and post it on your blog?

13415 2 hours ago

I use copy & paste with a pro subscription. I guess I'm a bit behind in terms of tool use but it works great for me.

jacobgold 2 hours ago

"Around $400 a month of plans buys roughly $2800 of API usage at list prices, which is a real bargain right up until you hit the ceiling."

I realize this text is just slop but it never stops being a "real bargain" at any point.

And it's more like $200/mo for $4000+/mo in tokens. You can also buy additional subscriptions.

There's no sense in running local models or doing anything else as long as VCs (and soon the public markets) are willing to pay your bill.

OutOfHere 2 hours ago

Fixed-price monthly plans ought to be sufficient for most people who actually review their spec and code, for building production-grade software that stands the test of time. A careful spec+review+iteration cycle takes time, during which the usage quota resets. Granted, security audits use tokens too.

If you still need more tokens, odds are that you're vibecoding unmaintainable throwaway trash.

sesm 2 hours ago

> Do that well and you can build what a team of twenty engineers would put out in a month for around a thousand dollars.

As usual, an extraordinary claim without extraordinary evidence: https://stephen.bochinski.dev/apps/

jrm4 2 hours ago

Is metered spending even worth it? For most people I mean "beyond like 30 bucks a month," but for me, I'm literally not spending any money beyond my very cheapo 16 GB video card.

No clue what y'all are doing, perhaps because I'm hobbying, and also I'm old and can perhaps do more of this by hand.

But I'm basically just doing what I did before, plus ollama self hosted and sometimes gemini and I feel like I'm going lightspeed beyond what I've ever done.

And I suppose this is still very fine-grained. I have it make a draft, then just have it fix/change things step by step.

I tried one of the bigger boys that can one-shot apps, which I guess is cool, but I'm finding the result is just as hard to modify as if I had just grabbed someone else's repo on GitHub.

tamimio 2 hours ago

You can use opencode and switch between multiple providers on the fly based on the task you are doing: for example, use deepseek for normal tasks and gpt5 or opus4 for hard ones, and track the usage with something like codexbar. Openrouter seems to charge extra on top of the API costs, same with zen ide, so keep that in mind.
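The routing idea can be sketched independently of any particular tool. The Python below is a toy illustration, not how opencode or codexbar work: the model labels are borrowed loosely from the comment and are placeholders rather than real API model identifiers, and `pick_model` and `UsageLog` are hypothetical names.

```python
from collections import Counter

# Task difficulty -> model label. Placeholder labels, not real model IDs.
ROUTES = {"normal": "deepseek", "hard": "opus4"}


def pick_model(difficulty: str) -> str:
    """Return the model label for a task, defaulting to the cheap one."""
    return ROUTES.get(difficulty, ROUTES["normal"])


class UsageLog:
    """Accumulate token counts per model so spend can be reviewed later."""

    def __init__(self) -> None:
        self.tokens = Counter()

    def record(self, model: str, tokens_used: int) -> None:
        self.tokens[model] += tokens_used


log = UsageLog()
# Made-up workload: two routine tasks and one hard one.
for difficulty, tokens_used in [("normal", 12_000), ("hard", 40_000), ("normal", 8_000)]:
    log.record(pick_model(difficulty), tokens_used)

print(dict(log.tokens))
```

Defaulting unknown task types to the cheap model keeps mistakes in classification from silently running up the expensive bill.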

gaigalas 2 hours ago

> The first is to self host. You buy the machine, run open source models locally, and pay nothing per token after that.

In the good ol' days, we bought machines not only to run stuff, but to experiment.

I understand today experiments are limited. Inference is reasonable, fine-tuning is either niche or a stretch, and base training is impossible.

*That is bound to change*, and when it does, there will be an avalanche of hobbyists and amateurs poking at base training. They'll find optimizations no one found before, synthesize data no one ever imagined synthesizing, and when that happens we'll start getting libre models.

So, yeah. Right now, buying the machine doesn't pay off that well, unless you want to pioneer this stuff in severe adverse conditions (hardware prices inflated, etc). Eventually, it will.

zuzululu 2 hours ago

Another update for codex users: they let you accumulate resets, which greatly adds to the mileage.

I don't think it's feasible to have something comparable to these frontier models when they are increasing usage and lowering token costs.
