Freeze the Web Into a Single Self-Contained Binary
Kage uses headless Chrome to clone websites, strip their JavaScript, and pack them into zero-dependency executables.
We have all been there. You hit "Save As" on a valuable technical resource, a documentation page, or an essay, only to open it six months later and find a blank screen, a broken layout, or an infinite loading spinner. The modern web is no longer made of documents; it is made of thin clients executing complex, ephemeral JavaScript. When the third-party API or tracking server goes dark, the saved page dies with it.
Enter Kage (影, meaning "shadow"), an open-source tool written in Go designed to solve this exact problem. Instead of simply downloading raw source HTML, Kage drives a real browser to capture a fully rendered snapshot of a website, strips out every single line of JavaScript, and packages the remaining static assets into a single, self-contained binary or archive that you can run offline forever.
The Headless Execution Strategy
Traditional web scrapers often fail on modern single-page applications (SPAs) because they do not execute client-side JavaScript. Kage takes the opposite approach. It spins up a headless instance of Chromium or Chrome, navigates to the target URL, and waits for the page to settle.
Once the page has fully rendered, Kage snapshots the DOM exactly as a human reader would see it. It then performs a series of sanitization steps:
- JavaScript Stripping: Every <script> tag and inline event handler is completely removed.
- Asset Localization: It rewrites URLs for CSS, images, and fonts, downloading those assets to local paths.
- Zero Network Footprint: The resulting files run zero code, make no external API calls, and contain no tracking scripts.
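To make the stripping step concrete, here is a deliberately naive sketch in Go. It is not Kage's implementation: the function name stripJS and both regular expressions are assumptions for illustration, and a real tool would work on the rendered DOM or with a proper HTML parser instead of pattern matching on text.

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative patterns only (assumed, not taken from Kage).
// Regular expressions cannot parse HTML reliably; for example, the
// handler pattern would also touch matching text outside of tags.
var (
	// Matches a whole <script ...>...</script> element, across lines.
	scriptTag = regexp.MustCompile(`(?is)<script\b[^>]*>.*?</script\s*>`)
	// Matches inline handlers such as onclick="..." or onload='...'.
	inlineHandler = regexp.MustCompile(`(?i)\s+on[a-z]+\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)`)
)

// stripJS removes script elements first, then inline event handlers.
func stripJS(html string) string {
	html = scriptTag.ReplaceAllString(html, "")
	return inlineHandler.ReplaceAllString(html, "")
}

func main() {
	page := `<p onclick="track()">Hello</p><script src="app.js"></script>`
	fmt.Println(stripJS(page))
}
```

The order matters slightly: removing script elements first means the handler pattern never has to look inside script bodies.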
Because Kage relies on a real browser, it requires Chrome or Chromium on the host system. It automatically detects system installations, but developers can specify a custom path using the --chrome flag or the KAGE_CHROME environment variable. For environments without a local browser, Kage is also distributed as a Docker container that bundles Chromium out of the box.
A Polite, Idempotent Crawler
Cloning a single page is useful, but archiving an entire site requires a robust crawling engine. Kage implements a breadth-first crawler that is designed to be both efficient and polite.
By default, the crawler respects robots.txt rules and seeds its queue using the site's sitemap.xml. The crawl is also idempotent: pages are keyed by the output file they write, so URL variants that map to the same file (such as HTTP versus HTTPS, or with and without a trailing slash) are fetched only once.
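One way to picture keying pages by their output file is the sketch below. The mapping rules (drop the scheme, lowercase the host, treat extensionless paths as directories holding an index.html) and the names fileKey and crawlSet are assumptions made for illustration, not Kage's documented layout.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// fileKey maps a URL to the relative file it would be written to.
// The scheme is ignored and a trailing slash is trimmed, so several
// URL variants collapse onto one key. (Assumed layout.)
func fileKey(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	p := strings.TrimSuffix(u.Path, "/")
	if path.Ext(p) == "" {
		p += "/index.html" // extensionless paths become directory indexes
	}
	return strings.ToLower(u.Host) + p, nil
}

// crawlSet remembers which output files have already been claimed.
type crawlSet struct{ seen map[string]bool }

// shouldFetch reports true only the first time a file key is seen.
func (c *crawlSet) shouldFetch(raw string) bool {
	key, err := fileKey(raw)
	if err != nil || c.seen[key] {
		return false
	}
	c.seen[key] = true
	return true
}

func main() {
	c := &crawlSet{seen: map[string]bool{}}
	for _, u := range []string{
		"http://example.com/docs/",
		"https://example.com/docs",
		"https://example.com/style.css",
	} {
		fmt.Println(u, c.shouldFetch(u))
	}
}
```

Because the key is the destination file and not the URL string, re-running the same crawl claims the same keys, which is what makes repeated runs safe.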
If you interrupt a crawl with Ctrl-C, Kage gracefully saves its state. Running the command again resumes the crawl from where it stopped. For updating existing archives, the --refresh flag re-renders pages in place to capture updates, while --force wipes the local mirror and starts clean.
To handle modern web design patterns, Kage includes several specialized crawling flags:
- --scroll: Automatically scrolls down each page during rendering to trigger lazy-loaded images and dynamic content.
- --workers: Controls concurrency (defaulting to 4 parallel workers).
- --max-depth and --max-pages: Prevents infinite crawl loops on highly dynamic sites.
- --scope-prefix: Restricts the crawl to specific subpaths (e.g., /doc).
Packaging to ZIM and Self-Serving Binaries
Once a site is cloned, Kage writes a standard directory structure of HTML, CSS, and images. You can preview this folder locally using kage serve, which launches a lightweight static file server on port 8800.
However, managing thousands of loose files is not ideal for long-term archiving or sharing. Kage solves this with its pack command, which compresses the entire cloned directory into a single file.
Developers can choose between two output formats:
- ZIM Archives: A standard format for offline content. These can be read back using kage open <file.zim> or other third-party ZIM readers.
- Self-Contained Binaries: By passing the --format binary flag, Kage compiles the cloned site and a minimal web server into a single executable binary.
This compiled binary has zero external dependencies. You can copy it to a server, hand it to a colleague, or store it on a thumb drive. Running the binary immediately spins up a local server and hosts the archived site, ensuring that your documentation or reference material remains readable decades into the future, completely independent of the original host.
Lenn writes about cloud platforms, Kubernetes internals, and the infrastructure decisions that quietly make or break engineering organizations. Based in Berlin's vibrant tech scene, they have a talent for turning dense platform-engineering topics into prose that people actually finish reading.