Providers

The same connection string configures both the llm formatter (--llm-format-model) and the navigator (--navigator-type). It selects an LLM backend and the model that runs on it. This is a Pro-tier capability.

Connection string format

protocol://[key@][host]/model[?options]

protocol picks the backend (see below).
key is the API key for hosted providers, sent as key@.
host overrides the default endpoint, for self-hosted or proxied APIs.
model is the model name, or a file path for local.
options are query parameters that tune the request.

A bare protocol with no URL is also valid and uses that backend’s defaults, e.g. local.

Providers

The provider set is identical for the formatter and the navigator.

Protocol	Backend	Key
`local`	llama-cpp, running a GGUF model file on the host	none
`ollama`	Ollama, local or remote	none
`anthropic`	Anthropic Claude	required
`openai`	OpenAI	required
`google`	Google Gemini	required
`mistral`	Mistral	required
`deepseek`	DeepSeek	required
`groq`	Groq	required
`huggingface`	HuggingFace Inference	required

Query options

Option	Applies to	Effect
`max_tokens`	all	Cap on the completion length.
`temperature`	all	Sampling temperature. Omitted unless set, so the provider’s own default applies.
`top_p`	all	Nucleus sampling cutoff. Omitted unless set, so the provider’s own default applies.
`vision`	all	Force vision input on or off.
`insecure`	hosted	Use `http` instead of `https` for a custom `host`.
`gpu_layers`	`local`	Model layers to offload to the GPU.
`ctx_size`	`local`	Context window size.

max_tokens caps the model’s completion length. When omitted, most backends send no cap and the provider applies its own high default. Anthropic is the exception: its Messages API requires the field, so zshot supplies a default of 4096 — generous enough that ordinary page summaries finish cleanly.¹ Raise it with ?max_tokens=8192 for very long extractions.

Examples

A hosted Anthropic model:

zshot -t llm_json \
  --llm-format-model "anthropic://$ANTHROPIC_API_KEY@/claude-haiku-4-5" \
  --llm-format-prompt "Summarize this page." \
  https://example.com -f out.json

A local GGUF model driving the navigator:

zshot --navigate "Click the News link" \
  --navigator-type "local:///path/to/model.gguf?gpu_layers=1" \
  https://example.com

When a completion does hit the cap, zshot reports “LLM output truncated at the N-token completion limit” and suggests a higher value, rather than a confusing parse error. ↩︎