Concepts

hev ask rests on one idea: the digest is just a directory. Your docs are distilled into a committed tree of markdown files, one per section, and that tree is read two ways: a coding agent navigates it with the file tools it already has, and a ⌘K overlay reads it server-side to synthesize a grounded answer for a human. Both climb the same tree progressively. This page explains the tree, the ladder over it, and the two readers.

  ask digest build
      glob collections → chunk by headings → distil each section (Opus 4.8)
      one markdown file per section · hash-gated, incremental
      │
      ▼
  .hev-ask/                a committed, distilled mirror of your docs
  ├─ _meta.md              overview · context · suggestions · content hash
  ├─ _glossary/
  │  └─ digest.md          a term · its aliases · its definition
  ├─ overview/
  │  ├─ quick-start.md     one markdown + frontmatter file per section:
  │  └─ limits.md          title · summary · body · facts · url#anchor
  └─ api/
     └─ cli.md
      │
      ▼
  read through three surfaces
  ├─ ⌘K overlay · humans   keyword search + a grounded answer · synthesis
  ├─ ask CLI    · agents   tree · ls · head · cat · facts · grep · keyless
  └─ ask mcp    · agents   one tool hydrates the tree; the agent reads it

The split that matters is that the tree is built offline with a strong model and committed to git, while the readers run on demand: the overlay at the edge, the agent on its own machine. No durable state lives in the running site.

Chunks and anchors

hev ask indexes sections, not pages. Each document is split on its headings (up to a configurable depth, default ## and ###). Content before the first heading becomes the intro chunk whose URL is the page itself.

Each chunk carries the section’s heading, its cleaned prose, and a URL of the form basePath + slug + #anchor. The anchor is generated with github-slugger by default, the same slugger Astro and GitHub-flavored renderers use, so the link lands on a heading that actually exists in the rendered HTML. Each framework adapter declares its slug scheme, so the rule generalizes: the Docusaurus adapter, for instance, also honors explicit {#custom-id} headings.

Both code paths chunk through the same function: the offline build feeds it disk-parsed markdown, and the Astro runtime index feeds it getCollection entries. One source of truth for slugs means the anchors agree — and because the build reads files, not a renderer, the same digest comes out whatever framework renders the docs.
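
As a rough illustration of the chunking rule, the sketch below splits a markdown string on ## and ### headings, keeps pre-heading content as an intro chunk at the page URL, and derives an anchor per heading. It is not hev ask's actual chunker: the names Chunk, makeSlugger, and chunkByHeadings are hypothetical, and the slugger is a simplified stand-in for github-slugger, assumed here only to show the shape of the output.

```typescript
// Hypothetical names and shapes; this is not hev ask's actual chunker.
interface Chunk {
  heading: string; // "" for the intro chunk
  text: string;
  url: string;
}

// Simplified stand-in for github-slugger: lowercase, strip punctuation,
// turn each whitespace character into "-", and suffix repeats with -1, -2, ...
function makeSlugger(): (heading: string) => string {
  const seen = new Map<string, number>();
  return (heading: string): string => {
    const base = heading
      .trim()
      .toLowerCase()
      .replace(/[^a-z0-9\s_-]/g, "")
      .replace(/\s/g, "-");
    const count = seen.get(base) ?? 0;
    seen.set(base, count + 1);
    return count === 0 ? base : `${base}-${count}`;
  };
}

// Splits on "##" through maxDepth "#"s; deeper headings stay in their parent chunk.
// Fenced code blocks are not special-cased here, which a real chunker would need.
function chunkByHeadings(markdown: string, basePath: string, slug: string, maxDepth = 3): Chunk[] {
  const slugify = makeSlugger();
  const pageUrl = basePath + slug;
  const headingRe = new RegExp(`^#{2,${maxDepth}}\\s+(.+)$`);
  let current: Chunk = { heading: "", text: "", url: pageUrl }; // intro chunk
  const chunks: Chunk[] = [current];
  for (const line of markdown.split("\n")) {
    const match = headingRe.exec(line);
    if (match) {
      const heading = (match[1] ?? "").trim();
      current = { heading, text: "", url: `${pageUrl}#${slugify(heading)}` };
      chunks.push(current);
    } else {
      current.text += line + "\n";
    }
  }
  for (const chunk of chunks) chunk.text = chunk.text.trim();
  // Drop the intro chunk when the page starts directly with a heading.
  return chunks.filter((c) => c.heading !== "" || c.text !== "");
}
```

Because the offline build and the runtime index would both call one function like this, a heading can only ever produce one anchor, which is the property the paragraph above relies on.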

Host-neutral: one digest, any framework

Everything above happens before a renderer touches your docs. The build reads markdown off the filesystem, chunks on headings, derives anchors in code, and writes the .hev-ask/ tree — none of it imports Astro. So the digest is the same artifact whether Astro, Docusaurus, VitePress, or MkDocs renders the pages. What differs per framework is only the adapter — the thin glue that builds the digest at the right time and wires the overlay in:

- Astro (the flagship adapter). The hevAsk() integration is batteries included: it builds the digest during astro build, mounts the /api/ask endpoint as an on-demand route, and ships SearchOverlay.astro. One config block, one component.
- Every other framework. Two host-neutral primitives stand in for the integration:
  - The static overlay: a <script> plus web component that reads the committed digest tree and a prebuilt keyword index. No server, no API key, no framework binding; it drops into Docusaurus, VitePress, MkDocs, or hand-written HTML. This is the keyword path, fully static.
  - The hostable endpoint: the bounded agentic loop (below) as a standalone deployable, such as a Cloudflare Worker or a Node or Vercel function. The overlay points at it with one config attribute. This is the agentic path, decoupled from any one framework’s routing.

The CLI and MCP surfaces are host-neutral already: they read the committed tree directly, so a coding agent gets the same tree/cat/grep over your docs no matter what renders them. See the drop-in overlay for the concrete per-framework setup.

The ask digest directory

The build distils each section into a small markdown + frontmatter file and writes the whole .hev-ask/ tree — a committed, reviewable mirror of your docs:

overview/, api/, … — one file per section, mirroring your doc paths. The body is the distilled summary and prose; the frontmatter carries the title, the url+anchor to cite, the verbatim facts (flags, code, identifiers), the sources, the terms, and the section’s content hash.
_glossary/ — one file per term, with its aliases and definition.
_meta.md — the overview, orientation context, suggested questions, and the digest’s content hash and version.

There is no committed JSON. The artifact is markdown all the way down, so a section’s distilled prose and its grounded facts change together in one diff, reviewable in the PR that changes the docs. The in-memory representation the overlay and CLI build from the tree is the same either way — the tree is the format, not a new runtime.
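
To make "markdown + frontmatter" concrete, here is a minimal sketch of reading one section file into an in-memory shape. The key names (title, url, hash), the sample values, and the helper splitFrontmatter are assumptions for illustration; the real files also carry list-valued fields (facts, sources, terms) that this flat key: value reader does not attempt to parse.

```typescript
// Hypothetical in-memory shape for one section file.
interface SectionFile {
  meta: Record<string, string>; // flat frontmatter keys only (assumption)
  body: string;                 // the distilled summary and prose
}

// Splits a "---" delimited frontmatter block from the markdown body.
// Only single-line "key: value" pairs are read; nested YAML is ignored.
function splitFrontmatter(source: string): SectionFile {
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(source);
  if (!match) return { meta: {}, body: source.trim() };
  const meta: Record<string, string> = {};
  for (const line of (match[1] ?? "").split("\n")) {
    const colon = line.indexOf(":");
    if (colon > 0) meta[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return { meta, body: (match[2] ?? "").trim() };
}

// A made-up section file; the hash is a placeholder, not a real digest value.
const exampleSectionFile = [
  "---",
  "title: Quick start",
  "url: /docs/overview/quick-start#install",
  "hash: 0000placeholder",
  "---",
  "Install the package, then run the build.",
].join("\n");

const parsedSection = splitFrontmatter(exampleSectionFile);
```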

Progressive disclosure as a directory

Because the digest is a real directory, progressive disclosure is just the directory’s own cost model. An agent already knows a listing is cheap and cat might be large; hev ask leans on that intuition instead of inventing a new one. The reads form a disclosure ladder, each rung a strictly larger slice of one section file:

  progressive disclosure as a directory — each verb a larger slice of one section

  tree [--depth] ▸  titles only, the whole map          ·  cheap, safe to skim first
       │              (it's a directory — real ls / head work too)
       ▼
  cat <path>     ▸  the full distilled section body     ·  opt-in, one section at a time
       │
       ▼
  facts <path>   ▸  verbatim flags / code / identifiers ·  grounded literals + url#anchor
                    + sources + terms                     to cite back to the live page

Two properties make the ladder real. A listing reads frontmatter only, so tree can never leak a body — its output is bounded by the number of sections, not the size of the docs, which is what makes it safe to call speculatively; it defaults to a couple of levels deep so even a large site stays skimmable. And every deeper rung is an explicit verb: cat for the body, facts for the grounded literals. Nothing larger than a title is ever returned by surprise.
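
The ladder can be sketched as three functions over an in-memory list of sections. The names listTree, catSection, and factsOf are hypothetical and only mirror the verbs above; the depth rule (counting path segments) is likewise an assumption. The point of the sketch is the first property: the listing is built from path and title alone, so no body text can appear in it.

```typescript
// Hypothetical in-memory section; field names are illustrative.
interface LadderSection {
  path: string;   // e.g. "overview/quick-start.md"
  title: string;
  url: string;    // url#anchor to cite
  body: string;
  facts: string[];
}

function findSection(sections: LadderSection[], path: string): LadderSection {
  const section = sections.find((s) => s.path === path);
  if (!section) throw new Error(`no such section: ${path}`);
  return section;
}

// Rung 1: titles only. Output size is bounded by the number of sections.
function listTree(sections: LadderSection[], depth = 2): string[] {
  return sections
    .filter((s) => s.path.split("/").length <= depth)
    .map((s) => `${s.path}  ${s.title}`);
}

// Rung 2: the full distilled body, one section at a time.
function catSection(sections: LadderSection[], path: string): string {
  return findSection(sections, path).body;
}

// Rung 3: verbatim literals plus the link to cite.
function factsOf(sections: LadderSection[], path: string): { facts: string[]; url: string } {
  const section = findSection(sections, path);
  return { facts: section.facts, url: section.url };
}
```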

A coding agent climbs this with its own tree/cat/grep once it has the tree on disk (see MCP) — the digest is plain markdown, so ls and head work on it directly. The ask CLI offers one verb per operation for the keyless and remote cases (see the CLI reference).

Keyword search and the glossary

The overlay’s instant path, like the CLI’s grep over a remote tree, runs a dependency-free prefilter over the chunks:

1. Expand the query. Each term picks up its glossary aliases (k8s → kubernetes) and the tokens of any matched glossary term. The expansion is additive and capped.
2. Score by token overlap between the expanded query and each section, widened by the digest: a match also counts against that section’s distilled summary, its terms, and its verbatim facts, so the sections the digest considers central to a term rank above an incidental mention buried in body prose.
3. Cap per document (default 2 sections per doc) so one long page can’t monopolize the results, then take the top pool.
4. Excerpt around the first matched term for the snippet.

This needs no API key and no embeddings — just the section index and the committed tree. With no tree it degrades to plain token overlap over the raw chunk text, so keyword search always works.
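
The prefilter steps above can be sketched as follows. This is an illustration under assumptions, not the shipped ranking: the names, the one-point-per-match weights, the expansion cap of 8 tokens, and the pool size of 10 are all invented here. When a section has no summary, terms, or facts, the digest bonus is simply zero, which corresponds to the plain token-overlap fallback.

```typescript
interface SearchSection {
  doc: string; title: string; text: string;
  summary?: string; terms?: string[]; facts?: string[]; // present only with a digest
}
type Glossary = Record<string, string[]>; // term -> aliases

const tokenize = (s: string): string[] => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Step 1: additive, capped expansion through glossary terms and aliases.
function expandQuery(query: string, glossary: Glossary, maxAdded = 8): string[] {
  const original = tokenize(query);
  const expanded = new Set(original);
  const baseSize = expanded.size;
  for (const term of Object.keys(glossary)) {
    const names = [term, ...(glossary[term] ?? [])].map((n) => n.toLowerCase());
    if (!original.some((t) => names.includes(t))) continue;
    for (const name of names) {
      for (const token of tokenize(name)) {
        if (expanded.size - baseSize >= maxAdded) return Array.from(expanded);
        expanded.add(token);
      }
    }
  }
  return Array.from(expanded);
}

// Step 2: token overlap with the raw text, plus a bonus for digest fields.
function scoreSection(queryTokens: string[], s: SearchSection): number {
  const raw = new Set(tokenize(`${s.title} ${s.text}`));
  const digest = new Set(tokenize([s.summary ?? "", ...(s.terms ?? []), ...(s.facts ?? [])].join(" ")));
  let total = 0;
  for (const t of queryTokens) total += (raw.has(t) ? 1 : 0) + (digest.has(t) ? 1 : 0);
  return total;
}

// Step 4: a window of text around the earliest matched token.
function excerptAround(text: string, tokens: string[], radius = 40): string {
  const lower = text.toLowerCase();
  const hits = tokens.map((t) => lower.indexOf(t)).filter((i) => i >= 0);
  const first = hits.length > 0 ? Math.min(...hits) : 0;
  return text.slice(Math.max(0, first - radius), first + radius);
}

// Steps 1 to 4 together, with the per-document cap (step 3) applied after ranking.
function keywordSearch(query: string, sections: SearchSection[], glossary: Glossary, perDoc = 2, pool = 10) {
  const q = expandQuery(query, glossary);
  const ranked = sections
    .map((s) => ({ s, score: scoreSection(q, s) }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score);
  const taken = new Map<string, number>();
  const results: { title: string; score: number; excerpt: string }[] = [];
  for (const r of ranked) {
    const n = taken.get(r.s.doc) ?? 0;
    if (n >= perDoc) continue;
    taken.set(r.s.doc, n + 1);
    results.push({ title: r.s.title, score: r.score, excerpt: excerptAround(r.s.text, q) });
    if (results.length === pool) break;
  }
  return results;
}
```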

Asking is the default

The overlay is ask-first. A single word is treated as a keyword lookup and answered instantly from the index. The moment the query grows past one word (the reader types a space) the overlay stops the keyword type-ahead and switches to ask mode: pressing Enter sends the question to the agentic loop. On open it also shows a few suggested questions (baked into _meta.md at build time, so they cost nothing at runtime) to make asking the obvious move.

None of this is forced. A reader can flip the overlay to keyword-only (persisted in localStorage); then a space just searches for a phrase and the model is never called. See the overlay reference for the exact interaction.
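
The switching rule is small enough to state as a function. This sketch assumes leading whitespace is ignored and that the keyword-only preference has already been read from localStorage; the name overlayMode is hypothetical.

```typescript
type OverlayMode = "keyword" | "ask";

// A single word stays in keyword mode; the first space after a word
// switches to ask mode unless the reader opted into keyword-only.
function overlayMode(query: string, keywordOnly: boolean): OverlayMode {
  if (keywordOnly) return "keyword";
  const withoutLeading = query.replace(/^\s+/, "");
  return /\s/.test(withoutLeading) ? "ask" : "keyword";
}
```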

The agentic search loop

The overlay answers humans, so it does the synthesis an agent would otherwise do itself. When the reader asks (presses Enter on a multi-word query, or clicks a suggested question, with an API key present), the query goes to a bounded tool-use loop that ends by streaming a grounded answer. The loop is the same disclosure ladder, climbed server-side over the in-memory tree. It runs in two phases.

Phase 1 — gather. The model is given the title-tree of every section up front, plus one tool:

open_section({ id }) — opens a section to read its distilled summary, its verbatim facts, and, for reference sections, its source text. The model opens the sections it needs; each open is streamed to the overlay as a faint activity line. It may only cite sections it opened.

The model decides when it has enough context. It gets up to maxIterations rounds of opening sections (default 4); when it stops opening, it’s ready to answer.

Phase 2 — answer. The accumulated sources are sent to the overlay (so it can validate links), then the model is called once more with no tools — so it can only write prose — and its answer is streamed token-by-token. The system prompt instructs it to ground every claim in the retrieved sections, link to them inline using their exact url, and say plainly when the docs don’t cover the question. Dropping the tools on the final turn is what guarantees the model answers instead of searching again.
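
A simplified, synchronous sketch of the two phases follows. The real loop talks to a model API and streams; here the model is reduced to a hypothetical GatherModel interface with two methods so the control flow is visible: a bounded gather phase that can only open known sections, then one answer call that receives the opened sections and nothing else. All names are assumptions.

```typescript
interface LoopSection { id: string; title: string; url: string; summary: string; facts: string[] }

// Hypothetical stand-in for the model. A real implementation would call an API.
interface GatherModel {
  // Ids to open this round; an empty list means "ready to answer".
  pickSections(question: string, titleTree: string[], opened: LoopSection[]): string[];
  // The answer turn: no tools are offered, so the only possible output is prose.
  writeAnswer(question: string, sources: LoopSection[]): string;
}

function runAskLoop(
  question: string,
  sections: LoopSection[],
  model: GatherModel,
  maxIterations = 4,
  onActivity: (line: string) => void = () => {}
): { sources: string[]; answer: string } {
  const byId = new Map<string, LoopSection>();
  for (const s of sections) byId.set(s.id, s);
  const titleTree = sections.map((s) => `${s.id}: ${s.title}`);
  const opened: LoopSection[] = [];

  // Phase 1: gather, bounded by maxIterations rounds.
  for (let round = 0; round < maxIterations; round++) {
    const ids = model.pickSections(question, titleTree, opened);
    if (ids.length === 0) break;
    for (const id of ids) {
      const section = byId.get(id);
      if (!section || opened.includes(section)) continue; // unknown or already open
      opened.push(section);
      onActivity(`opened ${section.title}`);
    }
  }

  // Phase 2: answer. Sources are fixed before the answer is written,
  // so the answer can only cite sections that were actually opened.
  const sources = opened.map((s) => s.url);
  const answer = model.writeAnswer(question, opened);
  return { sources, answer };
}
```

Passing only the opened sections to the answer call is one way to enforce the "may only cite sections it opened" rule structurally rather than by instruction alone.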

The system prompt is cached

The title-tree and section summaries are injected into the system prompt with a cache_control marker, so on every gather round after the first the prefix is served from the prompt cache instead of being processed again as fresh input. The answer turn changes the tool set (it has none), so it can’t reuse the search rounds’ cache, but it’s the last call anyway. The loop model defaults to Claude Haiku 4.5 and is configurable.
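
For orientation, this is roughly what the two request bodies could look like. The system block with cache_control of type "ephemeral" and the tool definition shape follow Anthropic's Messages API; everything else (function names, prompt wording, the max_tokens value, passing sources as plain text) is an assumption for illustration, and the model id is left as a parameter rather than guessed.

```typescript
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}
interface ToolDef { name: string; description: string; input_schema: Record<string, unknown> }
interface MessagesRequest {
  model: string;
  max_tokens: number;
  system: SystemBlock[];
  tools?: ToolDef[];
  messages: { role: "user" | "assistant"; content: string }[];
}

const OPEN_SECTION_TOOL: ToolDef = {
  name: "open_section",
  description: "Open a section to read its summary, facts, and source text.",
  input_schema: { type: "object", properties: { id: { type: "string" } }, required: ["id"] },
};

// The large, stable part of the prompt carries the cache marker.
function cachedSystem(titleTree: string): SystemBlock[] {
  return [{
    type: "text",
    text: `Answer only from these docs.\n\n${titleTree}`, // illustrative wording
    cache_control: { type: "ephemeral" },
  }];
}

// Gather rounds: same system prefix and same tools every round, so the prefix can be cached.
function gatherRequest(model: string, titleTree: string, question: string): MessagesRequest {
  return {
    model,
    max_tokens: 1024, // arbitrary value for the sketch
    system: cachedSystem(titleTree),
    tools: [OPEN_SECTION_TOOL],
    messages: [{ role: "user", content: question }],
  };
}

// Answer turn: no tools at all, so the model can only write prose.
function answerRequest(model: string, titleTree: string, question: string, sourcesText: string): MessagesRequest {
  return {
    model,
    max_tokens: 1024,
    system: cachedSystem(titleTree),
    messages: [{ role: "user", content: `${question}\n\nSources:\n${sourcesText}` }],
  };
}
```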

Two ways to build the tree

Only the section summary, the glossary, the orientation context, and the suggestions are model-authored; the tree structure, the verbatim facts, the overview, the per-section hashes, and the anchors are derived deterministically in code. So the model only ever supplies the distillation — which is exactly the part worth doing in a Claude Code skill.

- Claude Code skill (recommended). A skill walks Claude through reading the corpus and writing the distillation, then assembles the .hev-ask/ tree locally. It runs inside your existing Claude Code subscription (no provider API key, no per-build token spend on your own key) and fits the editor workflow most authors are already in.
- ask digest build (fallback). It calls Claude Opus 4.8 (the default) with a provider API key to do the same distillation unattended, which makes it the right choice for CI or for anyone not using Claude Code.

Either way the build is incremental and hash-gated: each section carries the hash of the content it was distilled from, so a rebuild only re-distils the sections whose content actually changed, and a clean tree does no model work at all. The tree is reviewed in pull requests like any other change. See the digest reference and the CLI reference.
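
The hash gate amounts to a comparison between the hash of each section's current source and the hash stored with its committed file. In the sketch below, contentHash is a dependency-free FNV-1a stand-in chosen only so the example runs anywhere; the actual hash function hev ask uses is not stated on this page, and the names are hypothetical.

```typescript
// FNV-1a (32-bit) as a stand-in content hash. Not necessarily what hev ask uses.
function contentHash(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

interface SourceSection { path: string; content: string }

// Returns the paths whose source changed (or that are new) since the
// committed tree was built. An empty result means no model work is needed.
function sectionsToDistil(sources: SourceSection[], committedHashes: Map<string, string>): string[] {
  return sources
    .filter((s) => committedHashes.get(s.path) !== contentHash(s.content))
    .map((s) => s.path);
}
```

A clean tree yields an empty list, which is the "no model work at all" case described above.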

Degradation, by design

hev ask is built to keep working as pieces drop away:

No key at runtime → keyword mode only. The overlay still searches.
No key at build → the committed tree is kept; the build warns but never fails for lack of a key.
No .hev-ask/ tree → the agentic loop falls back to keyword-style retrieval; keyword ranking falls back to raw token overlap; and the overlay simply shows no suggested questions. Everything still works.
Stale tree → the runtime logs a one-line warning when the live index hash differs from _meta.md’s, but still serves.
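
The runtime cases above can be read as a small decision table. The sketch below encodes them under assumed names (RuntimeState, Capabilities, resolveCapabilities) and an invented warning message; it covers the runtime rows only, not the build-time key check.

```typescript
interface RuntimeState {
  hasApiKey: boolean;
  hasTree: boolean;
  liveIndexHash?: string; // hash of the live section index
  metaHash?: string;      // hash recorded in _meta.md
}

interface Capabilities {
  ask: boolean;                                   // agentic answers available
  retrieval: "tree" | "keyword-fallback";         // what the loop retrieves from
  ranking: "digest-widened" | "raw-token-overlap";
  suggestions: boolean;                           // suggested questions shown
  warnings: string[];
}

function resolveCapabilities(state: RuntimeState): Capabilities {
  const warnings: string[] = [];
  const stale =
    state.hasTree &&
    state.liveIndexHash !== undefined &&
    state.metaHash !== undefined &&
    state.liveIndexHash !== state.metaHash;
  // A stale tree is reported but still served. The message text is illustrative.
  if (stale) warnings.push("digest is stale: live index hash differs from _meta.md");
  return {
    ask: state.hasApiKey,
    retrieval: state.hasTree ? "tree" : "keyword-fallback",
    ranking: state.hasTree ? "digest-widened" : "raw-token-overlap",
    suggestions: state.hasTree,
    warnings,
  };
}
```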

For the boundaries of what it can do, read Limits; for what you’re choosing by adopting it, read Tradeoffs.