--archive-cdx
Write a CDXJ index of the WARC’s records. Each line carries a canonicalized (SURT) URL key, a 14-digit timestamp, and a JSON block with the record’s URL, MIME, HTTP status, payload digest, byte offset, length, and the WARC’s filename. Lines are sorted, so replay tools like pywb can serve the archive by binary search instead of re-indexing it:
zshot -f site.warc --archive-cdx site.cdxj https://zshot-cli.com
Requires a WARC: either warc output as above, or a --warc sidecar alongside another output. Offsets address whole gzip members in the default per-record-gzip WARC and raw byte spans with --warc-no-gzip. Response, revisit, and resource records are indexed; request and warcinfo records are not.
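
To illustrate how a consumer might use the sorted index and its offsets, here is a minimal Python sketch. It is not part of zshot. The helper names (parse_cdxj_line, lookup, read_record) are hypothetical, and the JSON field names "offset" and "length" are an assumption based on the CDXJ convention used by pywb; check an actual index line before relying on them. The sketch binary-searches the sorted lines for a SURT key, then reads one record from the WARC by byte offset and length.

```python
import bisect
import gzip
import json


def parse_cdxj_line(line):
    """Split one CDXJ line into (surt_key, timestamp, fields_dict)."""
    key, timestamp, json_block = line.rstrip("\n").split(" ", 2)
    return key, timestamp, json.loads(json_block)


def lookup(lines, surt_key):
    """Binary-search a sorted list of CDXJ lines for all captures of one key."""
    prefix = surt_key + " "
    # Every line that starts with the prefix sorts at or after the prefix itself.
    start = bisect.bisect_left(lines, prefix)
    matches = []
    for line in lines[start:]:
        if not line.startswith(prefix):
            break
        matches.append(parse_cdxj_line(line))
    return matches


def read_record(warc_path, offset, length, gzipped=True):
    """Read one record's bytes from the WARC.

    With the default per-record-gzip WARC the span is a whole gzip member,
    so it can be decompressed on its own. With --warc-no-gzip the span is
    the raw record, so pass gzipped=False.
    """
    with open(warc_path, "rb") as f:
        f.seek(offset)
        raw = f.read(length)
    return gzip.decompress(raw) if gzipped else raw
```

A caller would load the index with something like `lines = open("site.cdxj").read().splitlines()`, call `lookup(lines, key)`, and pass `int(fields["offset"])` and `int(fields["length"])` to `read_record`. The `int()` conversion is there because some CDXJ writers store these values as strings; whether zshot does is an assumption.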
This is a Standard-tier flag.
On the HTTP server, request this as with_archive_cdx=true instead of a path. The index is
returned as an additional asset in a Link response header, like with_warc. See the API
reference for reading those links.