Concepts
hev ask rests on one idea: the digest is just a directory. Your docs are
distilled into a committed tree of markdown files, one per section, and that
tree is read two ways: a coding agent navigates it with the file
tools it already has, and a ⌘K overlay reads it server-side to synthesize
a grounded answer for a human. Both climb the same tree progressively. This page
explains the tree, the ladder over it, and the two readers.
ask digest build
  glob collections → chunk by headings → distil each section (Opus 4.8)
  one markdown file per section · hash-gated, incremental
        │
        ▼
.hev-ask/                  a committed, distilled mirror of your docs
├─ _meta.md                overview · context · suggestions · content hash
├─ _glossary/
│  └─ digest.md            a term · its aliases · its definition
├─ overview/
│  ├─ quick-start.md       one markdown + frontmatter file per section:
│  └─ limits.md            title · summary · body · facts · url#anchor
└─ api/
   └─ cli.md
        │
        ▼
read three ways
├─ ⌘K overlay · humans     keyword search + a grounded answer · synthesis
├─ ask CLI · agents        tree · ls · head · cat · facts · grep · keyless
└─ ask mcp · agents        one tool hydrates the tree; the agent reads it
The split that matters is that the tree is built offline with a strong model and committed to git, while the readers run on demand: the overlay at the edge, the agent on its own machine. No durable state lives in the running site.
Chunks and anchors
hev ask indexes sections, not pages. Each document is split
on its headings (up to a configurable depth, default ## and ###). Content
before the first heading becomes the intro chunk whose URL is the page itself.
Each chunk carries the section’s heading, its cleaned prose, and a URL of the
form basePath + slug + #anchor. The anchor is generated with
github-slugger by default, the same
slugger Astro and GitHub-flavored renderers use, so the link lands on a heading
that actually exists in the rendered HTML. Each framework adapter declares its
slug scheme, so the rule generalizes: the Docusaurus adapter, for instance, also
honors explicit {#custom-id} headings.
Both code paths chunk through the same function: the offline build feeds it
disk-parsed markdown, and the Astro runtime index feeds it getCollection
entries. One source of truth for slugs means the anchors agree — and because the
build reads files, not a renderer, the same digest comes out whatever
framework renders the docs.
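As a rough illustration of the chunking rule described above, the sketch below splits a markdown string on ## and ### headings and builds each URL as basePath + slug + #anchor. The names chunkByHeadings, slugifyHeading, and HeadingChunk are hypothetical, and the slug rule is a simplified approximation rather than github-slugger itself.

```typescript
// Hypothetical names throughout; a simplified stand-in, not hev ask's chunker.
interface HeadingChunk {
  heading: string; // "" for the intro chunk
  text: string;
  url: string;
}

// Simplified slug rule: lowercase, strip punctuation, spaces to hyphens.
// github-slugger covers more cases (duplicate headings, Unicode).
function slugifyHeading(heading: string): string {
  return heading
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9 _-]/g, "")
    .replace(/ /g, "-");
}

// Note: this sketch does not skip headings that appear inside fenced code.
function chunkByHeadings(
  markdown: string,
  basePath: string,
  slug: string,
  maxDepth = 3, // split on ## and ### by default
): HeadingChunk[] {
  const pageUrl = basePath + slug;
  const chunks: HeadingChunk[] = [];
  let current: HeadingChunk = { heading: "", text: "", url: pageUrl };

  const flush = (): void => {
    // Skip an empty intro chunk; keep every headed chunk.
    if (current.heading !== "" || current.text.trim() !== "") {
      chunks.push({ ...current, text: current.text.trim() });
    }
  };

  for (const line of markdown.split("\n")) {
    const match = /^(#{2,6})\s+(.+)$/.exec(line);
    const hashes = match ? match[1] ?? "" : "";
    const title = match ? match[2] ?? "" : "";
    if (hashes !== "" && hashes.length <= maxDepth) {
      flush();
      current = { heading: title, text: "", url: pageUrl + "#" + slugifyHeading(title) };
    } else {
      current.text += line + "\n"; // deeper headings stay inside the chunk
    }
  }
  flush();
  return chunks;
}
```

Content before the first heading keeps the bare page URL, and a heading deeper than maxDepth stays inside its parent chunk instead of starting a new one.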
Host-neutral: one digest, any framework
Everything above happens before a renderer touches your docs. The build
reads markdown off the filesystem, chunks on headings, derives anchors in code,
and writes the .hev-ask/ tree — none of it imports Astro. So the digest is the
same artifact whether Astro, Docusaurus, VitePress, or MkDocs renders the pages.
What differs per framework is only the adapter — the thin glue that builds
the digest at the right time and wires the overlay in:
- Astro (the flagship adapter). The hevAsk() integration is batteries included: it builds the digest during astro build, mounts the /api/ask endpoint as an on-demand route, and ships SearchOverlay.astro. One config block, one component.
- Every other framework. Two host-neutral primitives stand in for the integration:
  - The static overlay: a <script> plus web component that reads the committed digest tree and a prebuilt keyword index. No server, no API key, no framework binding; it drops into Docusaurus, VitePress, MkDocs, or hand-written HTML. This is the keyword path, fully static.
  - The hostable endpoint: the bounded agentic loop (below) as a standalone deployable (a Cloudflare Worker, a Node or Vercel function). The overlay points at it with one config attribute. This is the agentic path, decoupled from any one framework’s routing.
The CLI and MCP surfaces are host-neutral already: they read the committed tree
directly, so a coding agent gets the same tree/cat/grep over your docs no
matter what renders them. See
the drop-in overlay
for the concrete per-framework setup.
The ask digest directory
The build distils each section into a small markdown + frontmatter file and
writes the whole .hev-ask/ tree — a committed, reviewable mirror of your docs:
- overview/, api/, …: one file per section, mirroring your doc paths. The body is the distilled summary and prose; the frontmatter carries the title, the url + anchor to cite, the verbatim facts (flags, code, identifiers), the sources, the terms, and the section’s content hash.
- _glossary/: one file per term, with its aliases and definition.
- _meta.md: the overview, orientation context, suggested questions, and the digest’s content hash and version.
There is no committed JSON. The artifact is markdown all the way down, so a section’s distilled prose and its grounded facts change together in one diff, reviewable in the PR that changes the docs. The in-memory representation the overlay and CLI build from the tree is the same either way — the tree is the format, not a new runtime.
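To make the "markdown + frontmatter" shape concrete, here is a minimal sketch of how one section record might be rendered to a file. The field names follow the description above; the SectionFile type, the renderSectionFile helper, the split of url and anchor into two fields, and the exact frontmatter syntax are all assumptions for illustration.

```typescript
// Field names follow the prose above; types and on-disk syntax are assumptions.
interface SectionFile {
  title: string;
  url: string;
  anchor: string;
  facts: string[]; // verbatim flags, code, identifiers
  sources: string[];
  terms: string[];
  hash: string; // hash of the content the section was distilled from
  body: string; // distilled summary and prose
}

function renderSectionFile(section: SectionFile): string {
  const keys: (keyof SectionFile)[] = [
    "title", "url", "anchor", "facts", "sources", "terms", "hash",
  ];
  // JSON-encode each value; JSON strings and arrays are also valid YAML flow syntax.
  const frontmatter = keys.map((key) => `${key}: ${JSON.stringify(section[key])}`);
  return ["---", ...frontmatter, "---", "", section.body, ""].join("\n");
}
```

Because the prose and the facts live in the same file, a docs change that alters both shows up as a single diff.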
Progressive disclosure as a directory
Because the digest is a real directory, progressive disclosure is just the
directory’s own cost model. An agent already knows ls is cheap and cat
might be large; hev ask leans on that intuition instead of inventing a new one.
The reads form a four-rung ladder, each rung a strictly larger slice of one
section file:
progressive disclosure as a directory: each verb a larger slice of one section
tree · ls      ▸ titles only, the whole map · cheap, safe to skim first
   │
   ▼
head <path>    ▸ title + one-line summary · bounded, the decision rung
   │
   ▼
cat <path>     ▸ the full distilled section body · opt-in
   │
   ▼
facts <path>   ▸ verbatim flags / code / identifiers · grounded literals + url#anchor
                 + sources + terms to cite back to the live page
Two properties make the ladder real. A listing reads frontmatter only, so
tree and ls can never leak a body — their output is bounded by the number of
sections, not the size of the docs, which is what makes them safe to call
speculatively. And every deeper rung is an explicit verb: head for the
summary, cat for the body, facts for the grounded literals. Nothing larger
than a title is ever returned by surprise.
A coding agent climbs this with its own tree/cat/grep once it has the
tree on disk (see MCP); the ask CLI offers the same verbs for
the keyless and remote cases (see the CLI reference).
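The ladder can be pictured as four small read functions over one section record. This is only a sketch: DigestSection and the function names listTree, readHead, readBody, and readFacts are hypothetical stand-ins for the tree, head, cat, and facts verbs, not the CLI's implementation.

```typescript
// Hypothetical in-memory shape for one section file.
interface DigestSection {
  path: string;
  title: string;
  summary: string;
  body: string;
  facts: string[];
  url: string; // url#anchor to cite
}

// Rung 1 (tree / ls): titles only. Output grows with the number of
// sections, never with their size.
function listTree(sections: DigestSection[]): string[] {
  return sections.map((s) => `${s.path}  ${s.title}`);
}

// Rung 2 (head): title plus the one-line summary.
function readHead(s: DigestSection): string {
  return `${s.title}\n${s.summary}`;
}

// Rung 3 (cat): the full distilled body, only on explicit request.
function readBody(s: DigestSection): string {
  return `${readHead(s)}\n\n${s.body}`;
}

// Rung 4 (facts): verbatim literals plus the link to cite.
function readFacts(s: DigestSection): string[] {
  return [...s.facts, s.url];
}
```

The point of the split is that the cheap calls cannot return a body by accident: only readBody ever touches the body field.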
Keyword search and the glossary
The overlay’s instant path, like the CLI’s grep over a remote tree, runs a
dependency-free prefilter over the chunks:
- Expand the query. Each term picks up its glossary aliases (k8s → kubernetes) and the tokens of any matched glossary term. The expansion is additive and capped.
- Score by token overlap between the expanded query and each section, widened by the digest. A match also counts against that section’s distilled summary, its terms, and its verbatim facts, so the sections the digest considers central to a term rank above an incidental mention buried in body prose.
- Cap per document (default 2 sections per doc) so one long page can’t monopolize the results, then take the top pool.
- Excerpt around the first matched term for the snippet.
This needs no API key and no embeddings — just the section index and the committed tree. With no tree it degrades to plain token overlap over the raw chunk text, so keyword search always works.
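The four steps can be sketched in a few dozen lines. Everything here is an assumption made for illustration: the IndexedSection shape, the function names, the expansion cap of 8, the pool size of 5, and the choice to add one point per token for a digest match. Only the per-document default of 2 comes from the text above.

```typescript
// Hypothetical index entry; field names are assumptions.
interface IndexedSection {
  doc: string; // owning document
  title: string;
  summary: string;
  terms: string[];
  facts: string[];
  body: string;
}

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Step 1: additive, capped expansion through glossary aliases.
function expandQuery(query: string, aliases: Record<string, string[]>, cap = 8): string[] {
  const expanded = new Set<string>(tokenize(query));
  for (const token of tokenize(query)) {
    for (const alias of aliases[token] ?? []) {
      if (expanded.size < cap) expanded.add(alias);
    }
  }
  return Array.from(expanded);
}

// Step 2: token overlap, with digest fields counted on top of the body.
function scoreSection(tokens: string[], s: IndexedSection): number {
  const body = new Set<string>(tokenize(`${s.title} ${s.body}`));
  const digest = new Set<string>(tokenize([s.summary, ...s.terms, ...s.facts].join(" ")));
  let total = 0;
  for (const token of tokens) {
    if (body.has(token)) total += 1;
    if (digest.has(token)) total += 1; // weighting is an assumption
  }
  return total;
}

// Step 3: rank, cap per document, take the top pool.
function searchSections(
  query: string,
  sections: IndexedSection[],
  aliases: Record<string, string[]>,
  perDoc = 2,
  pool = 5,
): IndexedSection[] {
  const tokens = expandQuery(query, aliases);
  const ranked = sections
    .map((section) => ({ section, score: scoreSection(tokens, section) }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score);
  const seen = new Map<string, number>();
  const results: IndexedSection[] = [];
  for (const { section } of ranked) {
    const count = seen.get(section.doc) ?? 0;
    if (count >= perDoc || results.length >= pool) continue;
    seen.set(section.doc, count + 1);
    results.push(section);
  }
  return results;
}

// Step 4: snippet around the earliest matched token.
function excerptAround(text: string, tokens: string[], radius = 40): string {
  const lower = text.toLowerCase();
  const hits = tokens.map((t) => lower.indexOf(t)).filter((i) => i >= 0);
  const at = hits.length > 0 ? Math.min(...hits) : 0;
  return text.slice(Math.max(0, at - radius), at + radius).trim();
}
```

With no tree, the digest fields would simply be empty and scoreSection would reduce to plain overlap against the body text.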
Asking is the default
The overlay is ask-first. A single word is treated as a keyword lookup and
answered instantly from the index. The moment the query grows past one word
(the reader types a space) the overlay stops the keyword type-ahead and switches
to ask mode: pressing Enter sends the question to the agentic loop. On open
it also shows a few suggested questions (baked into _meta.md at build time,
so they cost nothing at runtime) to make asking the obvious move.
None of this is forced. A reader can flip the overlay to keyword-only
(persisted in localStorage); then a space just searches for a phrase and the
model is never called. See the overlay reference for
the exact interaction.
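The switch between the two modes comes down to one check on the query. A minimal sketch, assuming a hypothetical modeFor helper and a keywordOnly flag that mirrors the persisted preference:

```typescript
type OverlayMode = "keyword" | "ask";

// keywordOnly stands in for the reader's persisted preference (hypothetical name).
function modeFor(query: string, keywordOnly: boolean): OverlayMode {
  if (keywordOnly) return "keyword"; // a space just searches for a phrase
  const typed = query.replace(/^\s+/, ""); // ignore leading whitespace
  return /\s/.test(typed) ? "ask" : "keyword";
}
```

For example, modeFor("digest", false) stays in keyword mode, while modeFor("how do", false) switches to ask mode as soon as the space is typed.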
The agentic search loop
The overlay answers humans, so it does the synthesis an agent would otherwise do itself. When the reader asks (presses Enter on a multi-word query, or clicks a suggested question, with an API key present), the query goes to a bounded tool-use loop that ends by streaming a grounded answer. The loop is the same disclosure ladder, climbed server-side over the in-memory tree. It runs in two phases.
Phase 1 — gather. The model is given the title-tree of every section up front, plus one tool:
- open_section({ id }): opens a section to read its distilled summary, its verbatim facts, and, for reference sections, its source text. The model opens the sections it needs; each open is streamed to the overlay as a faint activity line. It may only cite sections it opened.
The model decides when it has enough context. It may open sections for up to
maxIterations rounds (default 4); when it stops opening, it’s ready to answer.
Phase 2 — answer. The accumulated sources are sent to the overlay (so it can
validate links), then the model is called once more with no tools — so it
can only write prose — and its answer is streamed token-by-token. The system
prompt instructs it to ground every claim in the retrieved sections, link to
them inline using their exact url, and say plainly when the docs don’t cover
the question. Dropping the tools on the final turn is what guarantees the model
answers instead of searching again.
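The control flow of the two phases can be sketched with the model stubbed out. LoopModel, LoopSection, and askLoop are hypothetical names; the sketch is synchronous and leaves out streaming, activity lines, and the real API calls, so it shows only the bounded gather step followed by a tool-free answer step.

```typescript
interface LoopSection {
  id: string;
  title: string;
  summary: string;
  facts: string[];
  url: string;
}

// Hypothetical stand-in for the model calls.
interface LoopModel {
  // Gather turn: returns section ids to open, or [] when ready to answer.
  chooseSections(question: string, titles: string[], opened: LoopSection[]): string[];
  // Answer turn: no tools are offered, so the model can only write prose.
  answer(question: string, sources: LoopSection[]): string;
}

function askLoop(
  question: string,
  sections: LoopSection[],
  model: LoopModel,
  maxIterations = 4,
): { sources: LoopSection[]; answer: string } {
  const titles = sections.map((s) => `${s.id}: ${s.title}`);
  const opened: LoopSection[] = [];

  // Phase 1: gather, bounded by maxIterations rounds.
  for (let round = 0; round < maxIterations; round++) {
    const ids = model.chooseSections(question, titles, opened);
    if (ids.length === 0) break; // the model has enough context
    for (const id of ids) {
      const section = sections.find((s) => s.id === id);
      if (section && opened.indexOf(section) === -1) opened.push(section);
    }
  }

  // Phase 2: answer from the opened sections only.
  return { sources: opened, answer: model.answer(question, opened) };
}
```

Because the answer step receives only the opened sections, anything the model cites is something it actually read during the gather phase.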
The system prompt is cached
The title-tree and section summaries are injected into the system prompt with a
cache_control marker, so across the rounds it’s a prompt-cache hit rather than
re-sent tokens. The answer turn changes the tool set (it has none), so it can’t
reuse the search rounds’ cache — but it’s the last call anyway. The loop model
defaults to Claude Haiku 4.5 and is configurable.
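For readers unfamiliar with the marker, this is roughly what a cached system prompt looks like in an Anthropic Messages API request. The SystemBlock type and buildSystem helper are hypothetical, and how hev ask actually splits its prompt into blocks is an assumption; only the cache_control field shape is taken from the API.

```typescript
// Shape of one entry in the `system` array of a Messages API request.
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Hypothetical helper: instructions first, then the large, stable title-tree.
function buildSystem(instructions: string, titleTree: string): SystemBlock[] {
  return [
    { type: "text", text: instructions },
    // The marker goes on the last stable block; the cached prefix runs
    // up to and including it.
    { type: "text", text: titleTree, cache_control: { type: "ephemeral" } },
  ];
}
```

The cached prefix also covers the tool definitions that precede the system prompt, which is why a turn with a different tool set cannot hit the same cache entry.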
Two ways to build the tree
Only the section summary, the glossary, the orientation context, and the
suggestions are model-authored; the tree structure, the verbatim facts, the
overview, the per-section hashes, and the anchors are derived deterministically
in code. So the model only ever supplies the distillation — which is exactly the
part worth doing in a Claude Code skill.
- Claude Code skill (recommended). A skill walks Claude through reading the corpus and writing the distillation, then assembles the .hev-ask/ tree locally. It runs inside your existing Claude Code subscription (no ANTHROPIC_API_KEY, no per-build token spend on your own key) and fits the editor workflow most authors are already in.
- ask digest build (fallback). One ANTHROPIC_API_KEY call to Claude Opus 4.8 (default) does the same distillation unattended: the right choice for CI or anyone not using Claude Code.
Either way the build is incremental and hash-gated: each section carries the hash of the content it was distilled from, so a rebuild only re-distils the sections whose content actually changed, and a clean tree does no model work at all. The tree is reviewed in pull requests like any other change. See the digest reference and the CLI reference.
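Hash gating reduces to comparing each section's current content hash with the one stored in the committed tree. The sketch below uses FNV-1a only to stay dependency-free; the hash function hev ask actually uses is not stated here, and sectionsToRedistil is a hypothetical name.

```typescript
// FNV-1a (32-bit), chosen only to keep the sketch self-contained.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("0000000" + h.toString(16)).slice(-8);
}

// current: path → source text on disk now.
// committed: path → hash recorded when the section was last distilled.
function sectionsToRedistil(
  current: Record<string, string>,
  committed: Record<string, string>,
): string[] {
  return Object.keys(current).filter(
    (path) => committed[path] !== fnv1a(current[path] ?? ""),
  );
}
```

A new section has no committed hash and an edited one has a stale hash, so both are selected; when nothing has changed the list is empty and no model call is needed.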
Degradation, by design
hev ask is built to keep working as pieces drop away:
- No key at runtime → keyword mode only. The overlay still searches.
- No key at build → the committed tree is kept; the build warns but never fails for lack of a key.
- No .hev-ask/ tree → the agentic loop falls back to keyword-style retrieval; keyword ranking falls back to raw token overlap; and the overlay simply shows no suggested questions. Everything still works.
- Stale tree → the runtime logs a one-line warning when the live index hash differs from _meta.md’s, but still serves.
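The runtime fallbacks in this list can be read as a small decision table. A sketch under assumed names (RuntimeState, RuntimeBehavior, resolveBehavior, and the string labels are all hypothetical):

```typescript
interface RuntimeState {
  hasApiKey: boolean;
  hasTree: boolean;
}

interface RuntimeBehavior {
  answers: "agentic" | "keyword-only";
  retrieval: "tree" | "keyword-style";
  ranking: "digest-widened" | "raw-token-overlap";
  suggestions: boolean;
}

function resolveBehavior(state: RuntimeState): RuntimeBehavior {
  return {
    // No key at runtime: the overlay still searches, but never asks.
    answers: state.hasApiKey ? "agentic" : "keyword-only",
    // No tree: the loop retrieves keyword-style and ranking uses raw overlap.
    retrieval: state.hasTree ? "tree" : "keyword-style",
    ranking: state.hasTree ? "digest-widened" : "raw-token-overlap",
    // Suggested questions come from _meta.md, so they need the tree.
    suggestions: state.hasTree,
  };
}
```

Each missing piece narrows one field and leaves the others untouched, so no combination of missing pieces disables search outright.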
For the boundaries of what it can do, read Limits; for what you’re choosing by adopting it, read Tradeoffs.