
The 1979 Design Choice Breaking AI Workloads

Posted by za_mike157 | 3 hours ago | 19 comments

pocksuppet 2 hours ago

Clickbait title. Summary: Their AI docker containers are slow to start up because they are 10GB layers that have to be gunzipped, and gzip doesn't support random access.
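The no-random-access point is easy to demonstrate: in a gzip stream, "seeking" to plaintext offset N just means inflating everything before N. A minimal Python sketch (a small in-memory blob standing in for a multi-GB layer):

```python
import gzip
import io

# Stand-in for a large compressed layer.
payload = b"x" * (8 * 1024 * 1024)
blob = gzip.compress(payload)

# There is no way to start decompressing at an arbitrary compressed offset:
# GzipFile.seek() works by inflating and discarding everything up to the target.
with gzip.GzipFile(fileobj=io.BytesIO(blob)) as f:
    f.seek(len(payload) - 1)   # decompresses ~all 8 MB just to reach the last byte
    last = f.read(1)

assert last == b"x"
```

So pulling one file out of a gzipped layer tarball costs roughly a full decompression of everything before it, which is exactly the startup tax the article is complaining about.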

andrewvc 2 hours ago

They say an ideal container system would download portions of layers on demand, however it seems far from ideal for many production workloads. What if your service starts, works fine for an hour, then needs to read one file that is only available over the network, but that endpoint is unreachable? What if it is reachable but it is very, very slow?

The current system has issues with network failures, but in a deploy process you can confine all of them to the container deployment step. Perhaps you try to deploy a new container and it fails because the network is slow or broken: rollback is simple there. Spreading network issues out over the container's lifetime makes debugging much harder.

The current system is simple and resilient but clearly not fast. Trading speed for more complex failure modes for such a widely distributed technology is hardly a clear win.

The de-duplication seems like a neat win however.

MontyCarloHall 2 hours ago

I ran into a similar issue years ago, where the base infrastructure occupied the lion's share of the container size, very similar to the sizes shown in the article:

   Ubuntu base      ~29 MB compressed
   PyTorch + CUDA   7–13 GB
   NVIDIA NGC       4.5+ GB compressed
The easy solution that worked for us was to bake all of these into a single base container, and force all production containers built within the company to use that base. We then preloaded this base container onto our cloud VM disk images, so that pulling the model container only needed to download comparatively tiny layers for model code/weights/etc. As a benefit, this forced all production containers to be up-to-date, since we regularly updated the base container which caused automatic rebuilding of all derived containers.
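A rough sketch of the derived-image side of this setup (image names and files are illustrative, not from the comment):

```dockerfile
# Hypothetical shared base, built separately and preloaded onto VM disk
# images company-wide, with PyTorch/CUDA/NGC already baked in.
FROM registry.internal/company/ml-base:2024.06

# Only these small layers actually get pulled at deploy time; the multi-GB
# base layers are already on disk.
COPY model_weights/ /opt/model/
COPY serve.py /opt/
CMD ["python", "/opt/serve.py"]
```

The trade-off is coupling: every team rebuilds when the base moves, which is exactly the forced-freshness property described above.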

dsr_ an hour ago

The problem: "containers that take far too long to start".

Somehow, they don't hit upon the solution other organizations use: having software running all the time.

I suppose if you have a lousy economic model where the cost of running your software is a large percentage of your overall costs, that's a problem. I can only advise them to move to a model where they provide more value for their clients.

cosmotic an hour ago

Why does the model data need to be stored in the image? Download the model data on container startup using whatever method works best.

alanfranz 2 hours ago

Looks like they'd like something like git repositories (maybe with transparent compression on top) rather than .tar.gz files. Just pull the latest head and you're done.

formerly_proven 2 hours ago

The gzip compression of layers is actually optional in OCI images, but iirc not in legacy docker images. The two formats are not the same. On SSDs, the overhead of building an index for a tar is not that high, if we're primarily talking about large files (the data/weights/cuda layers rather than the system layers). The approach from the article is of course still faster, especially for running many minor variations of containers, though I am wondering how common it is for only some parts of the weights to change? I would've assumed that most things you do with weights change close to 100% of them when viewed through 1M chunks. The lazy pulling probably has some rather dubious/interesting service latency implications.
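The tar index mentioned above is cheap to build: one sequential pass over the archive records each member's data offset and size, after which any file is a single seek away. A rough Python sketch (toy in-memory tar; `offset_data` is tarfile's recorded offset of a member's payload within the archive):

```python
import io
import tarfile

# Build a toy uncompressed tar in memory (stand-in for a layer tarball).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    for name, data in [("weights.bin", b"W" * 4096), ("config.json", b"{}")]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

# One sequential pass builds the index: member name -> (payload offset, size).
buf.seek(0)
index = {}
with tarfile.open(fileobj=buf, mode="r:") as tf:
    for member in tf:
        index[member.name] = (member.offset_data, member.size)

# With the index, random access is a single seek + read, no scanning.
offset, size = index["weights.bin"]
buf.seek(offset)
assert buf.read(size) == b"W" * 4096
```

This only works on an uncompressed (or seekably-compressed) tar, which is why the optional-gzip point matters: with plain `.tar` layers the index gets you random access, with `.tar.gz` it doesn't.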

The main annoyance imho with gzip here is that it was already slow when the format was new (unless you have Intel QAT and bothered to patch and recompile that into all the go binaries which handle these, which you do not).

notyourbiz an hour ago

Super helpful.

aplomb1026 2 hours ago

Comment deleted

PaulHoule 3 hours ago[1 more]

I remember dealing with this BS back in 2017. It was clear to me that containers were, more than anything else, a system for turning 15MB of I/O into 15GB of I/O.

But it was new and shiny, so if you told people that, they would just plug their ears with their fingers.