Show HN: Kage – Shadow any website to a single binary for offline viewing

Posted by tamnd |2 hours ago |27 comments

maxloh 2 hours ago[4 more]

I find SingleFile [0] to be a much more robust version of this.

It strips out all the JavaScript too, but also packs everything into a single HTML file that is easy to transfer. Binary assets (like web fonts and images) are packed as base64 strings.

They also offer a CLI powered by Puppeteer. [1]

[0]: https://github.com/gildas-lormeau/singlefile

[1]: https://github.com/gildas-lormeau/single-file-cli

wolttam an hour ago[1 more]

One use I'd have for this is company wikis that you want to give folks easy offline access to (maybe the wiki has documentation that's useful at sites that don't have cellular coverage).

Cool!

It would be especially cool to have a version that didn't require the separate serving process - even though it's nifty you can package up a whole site as a single binary.

Maybe a single HTML entrypoint shim with a bit of javascript that could index into an archive (potentially embedded) of the site's content?

21 minutes ago

Comment deleted

gregwebs an hour ago[1 more]

This seems like it has potential to create a lot of load on a site- are there settings to set how fast it clones or avoid images/videos? Is there a way to only get a subset of a website?

telesilla 20 minutes ago

I've been using httrack (https://www.httrack.com) to download wikis to read on flights, which isn't perfect but better than I'd found previously. I'll try this out, I'd be delighted to have good results. Thanks for the post.

Igor_Wiwi 30 minutes ago

This is quite useful tool, especially for the cases where internet access is limited (the flights for example). I implemented it as a separate feature in mdview.io: for example you can export a document as a html file for offline usage, with all the presentation features like reach tables, mermaid and etc built in. Example https://mdview.io/s/why-markdown-became-default-format-for-a... then try to Export - Export HTML

sanqui an hour ago[2 more]

Cool concept. I would like to see this combined with mitmproxy for archive grade fidelity. You could be saving exactly the data served and at the same time a representation by a modern (contemporary) browser, with all JS having run. This combination would be my perfect replacement for the WARC format.

dimiprasakis an hour ago

Neat project, I like the idea. One thing from a quick read: you launch Chrome with --no-sandbox. Is there a good reason for that? Security wise it's probably not a good idea. If there is no reason, I'd suggest leaving the sandbox on!

In any case, cool stuff :)

lolpython an hour ago

This is cool. I could see myself downloading the articles behind the first couple pages of hacker news with this, for viewing on a flight or long distance train ride with spotty internet

29 minutes ago

Comment deleted

daviding an hour ago

Nice idea! fwiw, false positives and all, but the Windows 11 default Windows Security doesn't like it: `leakless.exe: Operation did not complete successfully because the file contains a virus or potentially unwanted software.`

rahimnathwani an hour ago[1 more]

So this is like using wget --mirror except that it works on pages that require javascript, right?

delduca 36 minutes ago

curl can do this

grahamstanes17 an hour ago

nice