Organizing Agent-Friendly Docs with the Progressive Exposure Principle

I wrote an agent skill that organizes a repo’s documentation into an agent-friendly form. See GitHub for the code, and see the example crawled and processed document.

Motivation: Progressive Exposure

It is impractical and inefficient to dump a large documentation set (e.g., NVIDIA’s PTX ISA, with 14 sections and 700+ subsections) into an agent’s context window. This skill organizes crawled docs into a tree that an agent can drill down level by level:

  1. Read a parent’s README to see 20-word summaries of each child.
  2. Pick interesting children and read their READMEs.
  3. At a leaf, read the 50-word README summary. If it’s worth it, read the full doc.md.

Nothing has to be read in full. Only the docs the agent finds relevant are loaded into the context window.
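
The drill-down can be sketched in Python. This is a minimal illustration under assumptions, not the skill’s own code: `drill_down` and the `is_interesting` callback are hypothetical names, the README file is assumed to be called `README.md`, and each directory is judged by its own README rather than by the per-child summaries in its parent, which simplifies steps 1 and 2.

```python
from pathlib import Path
from typing import Callable, List


def drill_down(root: Path, is_interesting: Callable[[str], bool]) -> List[Path]:
    """Return the doc.md files worth loading, reading only READMEs on the way down.

    Assumed layout: every directory has a README.md; leaves also have a doc.md.
    `is_interesting` is a hypothetical stand-in for the agent's judgment.
    """
    readme = root / "README.md"
    summary = readme.read_text(encoding="utf-8") if readme.exists() else ""
    if not is_interesting(summary):
        return []  # prune this subtree without reading anything below it

    doc = root / "doc.md"
    if doc.exists():
        return [doc]  # leaf: the summary looked promising, so load the full doc

    selected: List[Path] = []
    for child in sorted(p for p in root.iterdir() if p.is_dir()):
        selected.extend(drill_down(child, is_interesting))
    return selected
```

The cost of a lookup then scales with the number of READMEs on the chosen paths, not with the size of the whole documentation tree.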

Pipeline

crawl  ->  collapse  ->  merge  ->  summary writing & format fix

Crawl (crawl.py): Parses the Sphinx sidebar TOC into a tree of arbitrary depth. Each leaf gets its own doc.md with content converted from HTML to markdown. Images are downloaded into img/ dirs.
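
As a rough idea of the TOC-parsing step, the sketch below turns nested list markup into a tree of arbitrary depth using only the standard library. It is not taken from crawl.py: the `TocParser` class and the node dictionaries are hypothetical, and it assumes the sidebar is well-formed nested `<ul>`/`<li>`/`<a>` markup with explicit closing tags. HTML-to-markdown conversion and image download are left out.

```python
from html.parser import HTMLParser


class TocParser(HTMLParser):
    """Build a tree from nested <ul>/<li>/<a> markup (assumed sidebar shape)."""

    def __init__(self):
        super().__init__()
        self.root = {"title": "", "href": None, "children": []}
        self._stack = [self.root]  # _stack[-1] is the innermost open <li> (or root)
        self._link = None          # node whose <a> text is currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            node = {"title": "", "href": None, "children": []}
            self._stack[-1]["children"].append(node)
            self._stack.append(node)
        elif tag == "a" and len(self._stack) > 1 and self._stack[-1]["href"] is None:
            # the first link inside an <li> names that TOC entry
            self._stack[-1]["href"] = dict(attrs).get("href")
            self._link = self._stack[-1]

    def handle_data(self, data):
        if self._link is not None:
            self._link["title"] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._link = None
        elif tag == "li" and len(self._stack) > 1:
            self._stack.pop()
```

Feeding the sidebar HTML to `TocParser().feed(...)` leaves the tree in `root["children"]`; entries with no children correspond to the leaves that would each get a doc.md.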

Collapse (built into crawl.py): Eliminates single-child directories bottom-up. If a parent has exactly one child subdirectory and no doc.md of its own, the child’s contents are pulled up into the parent.
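
A minimal sketch of the collapse rule, assuming the on-disk layout described above. The function name `collapse` and the decision to ignore `img/` when counting child subdirectories are assumptions for illustration; the real logic lives in crawl.py and may differ.

```python
import shutil
from pathlib import Path


def collapse(directory: Path) -> None:
    """Pull up single-child directories, handling the deepest levels first."""
    # Assumption: img/ holds assets and does not count as a child section.
    subdirs = [p for p in directory.iterdir() if p.is_dir() and p.name != "img"]
    for sub in subdirs:
        collapse(sub)  # bottom-up: descendants are collapsed before this level

    if len(subdirs) == 1 and not (directory / "doc.md").exists():
        child = subdirs[0]
        # move everything the child holds (doc.md, img/, ...) into the parent
        for item in list(child.iterdir()):
            shutil.move(str(item), str(directory / item.name))
        child.rmdir()  # the child is empty now and can be removed
```

Because the recursion runs before the check, a chain of single-child directories collapses all the way up in one pass.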

Merge (merge.py): Consecutive leaf docs under 300 words are merged into a single doc.md, capped at 2000 words per merged doc. If all children of a parent are short, the parent itself becomes a leaf.
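
The grouping decision can be sketched as a greedy pass over the word counts of sibling leaf docs. The 300-word and 2000-word limits come from the description above; the function name `plan_merges`, the greedy strategy, and the handling of the cap are assumptions, since how merge.py counts words and splits runs is not spelled out here.

```python
from typing import List

SHORT_DOC_WORDS = 300    # docs below this count as "short"
MERGED_CAP_WORDS = 2000  # upper bound on the size of one merged doc


def plan_merges(word_counts: List[int]) -> List[List[int]]:
    """Group sibling indices, in order, into runs that would share one doc.md.

    A group holding a single index means that doc is left as it is.
    """
    groups: List[List[int]] = []
    run: List[int] = []
    run_words = 0
    for i, words in enumerate(word_counts):
        if words >= SHORT_DOC_WORDS:
            # a long doc ends the current run and stays on its own
            if run:
                groups.append(run)
            groups.append([i])
            run, run_words = [], 0
            continue
        if run and run_words + words > MERGED_CAP_WORDS:
            groups.append(run)  # adding this doc would exceed the cap
            run, run_words = [], 0
        run.append(i)
        run_words += words
    if run:
        groups.append(run)
    return groups
```

For example, `plan_merges([80, 95, 70, 60])` returns `[[0, 1, 2, 3]]`. When the plan is a single group covering every child, as here, the parent can take the merged doc.md and become a leaf.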

Summary writing & format fix (done by the skill agent): Reads every doc.md, writes README summaries, and fixes markdown formatting issues. This is not scripted - the agent reads and writes each doc.

Example: Collapse + Merge

Suppose crawl.py produces this tree for a section on Directives:

11-Directives/
  11-4-Performance-Tuning-Directives/
    11-4-1-maxnreg/              (doc.md: 80 words)
    11-4-2-maxntid/              (doc.md: 95 words)
    11-4-3-reqntid/              (doc.md: 70 words)
    ...
    11-4-9-abi_preserve_ctrl/    (doc.md: 60 words)
  11-6-Linking-Directives/
    11-6-1-extern/               (no doc.md of its own)
      11-6-1-1-details/          (doc.md + img/)

After collapse: 11-6-1-extern/ has one child 11-6-1-1-details/. Since the parent has no doc.md, the child’s contents (doc.md, img/) are pulled up into 11-6-1-extern/ and the child dir is removed:

11-6-Linking-Directives/
  11-6-1-extern/    (doc.md + img/)  <-- collapsed

After merge: All 9 children of 11-4-Performance-Tuning-Directives/ are under 300 words and their total is under 2000 words. They are all merged into the parent, which becomes a leaf:

11-Directives/
  11-4-Performance-Tuning-Directives/    (doc.md: ~700 words, merged from 9 docs)
  11-6-Linking-Directives/
    11-6-1-extern/                       (doc.md + img/)
    ...

The merged doc.md contains all 9 original docs separated by --- dividers, preserving their original headings as subtitles.
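
Assembling the merged file could look like the sketch below. The helper name `join_docs` is hypothetical, and it leaves each doc’s headings untouched; whether merge.py adjusts heading levels when it merges is not stated, so that part is an assumption.

```python
from typing import List


def join_docs(docs: List[str]) -> str:
    """Concatenate markdown docs in order, separated by '---' dividers."""
    # drop empty docs so the output never contains two dividers in a row
    parts = [doc.strip() for doc in docs if doc.strip()]
    return "\n\n---\n\n".join(parts) + "\n"
```

The blank lines around each `---` matter in markdown: without them, a `---` directly under a line of text is read as a heading underline instead of a divider.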