Show HN: Docker pulls more than it needs to - and how we can fix it

Posted by a_t48 | 2 hours ago | 4 comments

PaulHoule an hour ago

Back in the early 2010s I couldn't bring up Docker images at all on my 2 Mbps DSL because any attempt to download images would time out.

theamk an hour ago

Reminds me of OSTree and casync.

danudey an hour ago

If you're interested in implementing this directly in your Dockerfiles with some minimal changes, Docker already supports this to a degree:

https://docs.docker.com/reference/dockerfile/#copy---link

The TL;DR:

If you change your Dockerfile to use `COPY --link <foo> <bar>`, Docker will create a layer containing only the files that would be copied, and that layer is treated as independent of the layers that come before it. The only caveat is that you need a build cache from previous builds and have to point at it with `--cache-from`, which means saving build state.

That said, there are a lot of benefits you can get very quickly if you can implement it. For example, say you have a Dockerfile that builds your golang application in one stage and then copies the result into a fresh alpine:3.23.3 image, and you use a local cache for that build. When you update to alpine 3.23.4, the build will see that the build layers have not changed, and therefore that the `COPY --link` layer has not changed either. Thus, it can just apply that layer directly on top of the new alpine image without doing any extra work.
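
For anyone who wants to try it, here is a minimal sketch of the kind of Dockerfile described above. The Go image tag, the paths, and the binary name are placeholders made up for illustration; only `COPY --link` and the alpine tag come from the comment itself.

```dockerfile
# syntax=docker/dockerfile:1

# Build stage: compile the Go application.
# (golang:1.24-alpine, /src and /out/app are illustrative assumptions.)
FROM golang:1.24-alpine AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Runtime stage: a fresh alpine image plus the compiled binary.
FROM alpine:3.23.3
# --link puts the binary in its own layer that does not depend on the
# alpine layers beneath it, so changing the FROM tag can reuse it from cache.
COPY --link --from=build /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
```

Something like `docker buildx build --cache-from type=local,src=.buildcache --cache-to type=local,dest=.buildcache -t myapp .` should then keep the cache between builds (`.buildcache` and `myapp` are placeholder names, and depending on your builder setup the local cache exporter may need a non-default builder).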

Apparently it can even be smart enough to realize that it doesn't need to pull down the new alpine:3.23.4 image at all. The new alpine image layers are already in the registry and so are the original 'my application' layers, so it can just create a new manifest that references both sets of layers and publish that. No bandwidth used at all!

> How many copies of `python3.10` do I have floating around `/var/lib/docker`.

Well, if you use 'FROM python:3.10' for your images then only one.

If you're careful, you can sort of pull together the contents of multiple images by using `COPY --link`, and then even if you have 10 layers, changing from python:3.10 to python:3.14 only changes one of them.
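
A rough sketch of what that could look like, assuming a hypothetical tools image and made-up paths (none of these names are from the article):

```dockerfile
# syntax=docker/dockerfile:1

# Only this line changes when moving between Python versions.
FROM python:3.10

# Each COPY --link below becomes its own layer, independent of the base
# image and of the other copied layers.
# ghcr.io/example/tools:1.0 is a hypothetical image used for illustration.
COPY --link --from=ghcr.io/example/tools:1.0 /opt/tools /opt/tools
COPY --link requirements.txt /app/requirements.txt
COPY --link src/ /app/src/
```

Note that this only holds for copied files. A `RUN pip install ...` step runs on top of the filesystem below it, so that layer would still be rebuilt when the base image changes.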

Again, this does require that you maintain a cache, but that cache can live in a lot of places; it doesn't have to be on the local filesystem: https://docs.docker.com/reference/cli/docker/buildx/build/#c...