Organizing Agent-Friendly Docs with the Progressive Exposure Principle
I wrote an agent-friendly skill that organizes the documentation of a repo. See GitHub. See an example of a crawled and processed document.
Motivation: Progressive Exposure
It is impractical and inefficient to dump a large body of documentation (e.g., NVIDIA’s PTX ISA, with 14 sections and 700+ subsections) into an agent’s context window. This skill organizes crawled docs into a tree that an agent can drill down level by level:
- Read a parent’s README to see 20-word summaries of each child.
- Pick interesting children and read their READMEs.
- At a leaf, read the 50-word README summary. If it’s worth it, read the full doc.md.
No one reads everything. Only the doc contents the agent judges relevant are loaded into the context window.
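The drill-down above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: the file name `README.md` and the two callbacks `pick_children` and `wants_full_doc` (standing in for the agent's judgment) are assumptions made for the example.

```python
from pathlib import Path
from typing import Callable, Iterator, List


def drill_down(
    node: Path,
    pick_children: Callable[[str, List[Path]], List[Path]],
    wants_full_doc: Callable[[str], bool],
) -> Iterator[Path]:
    """Yield only the doc.md files the agent decides to load."""
    readme_path = node / "README.md"  # assumed file name for the summaries
    readme = readme_path.read_text(encoding="utf-8") if readme_path.exists() else ""
    doc = node / "doc.md"
    if doc.exists():
        # Leaf: the README summary decides whether the full doc is worth loading.
        if wants_full_doc(readme):
            yield doc
        return
    # Parent: the README holds short summaries of each child directory.
    children = sorted(p for p in node.iterdir() if p.is_dir() and p.name != "img")
    for child in pick_children(readme, children):
        yield from drill_down(child, pick_children, wants_full_doc)
```

Because the function is a generator, nothing below an unpicked child is ever read from disk, which is the point of progressive exposure.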
Pipeline
crawl -> collapse -> merge -> summary writing & format fix
Crawl (crawl.py): Parses the Sphinx sidebar TOC into a tree of arbitrary depth. Each leaf gets its own doc.md with content converted from HTML to markdown. Images are downloaded into img/ dirs.
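As a rough illustration of the TOC-parsing step, the sketch below turns nested `<ul><li><a>` markup into a tree of arbitrary depth using only the standard library. It assumes the sidebar is plain nested lists with one link per item; the class name `TocParser` and the node layout are hypothetical, and the real crawl.py may use a different parser and also handles the HTML-to-markdown conversion and image downloads, which are not shown.

```python
from html.parser import HTMLParser


class TocParser(HTMLParser):
    """Build a nested {title, href, children} tree from nested <ul><li><a> markup."""

    def __init__(self):
        super().__init__()
        self.root = {"title": "root", "href": None, "children": []}
        self._stack = [self.root]  # innermost open <li> node is last
        self._current_link = None  # node whose <a> text is being collected

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            # Each list item becomes a child of the innermost open item.
            node = {"title": "", "href": None, "children": []}
            self._stack[-1]["children"].append(node)
            self._stack.append(node)
        elif tag == "a" and len(self._stack) > 1 and self._stack[-1]["href"] is None:
            # The first link inside an item names that item.
            self._stack[-1]["href"] = dict(attrs).get("href")
            self._current_link = self._stack[-1]

    def handle_data(self, data):
        if self._current_link is not None:
            self._current_link["title"] += data.strip()

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_link = None
        elif tag == "li" and len(self._stack) > 1:
            self._stack.pop()
```

After `parser = TocParser(); parser.feed(sidebar_html)`, the tree is available as `parser.root`, and nodes with an empty `children` list are the leaves that would each get a doc.md.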
Collapse (built into crawl.py): Eliminates single-child directories bottom-up. If a parent has exactly one child subdirectory and no doc.md of its own, the child’s contents are pulled up into the parent.
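The collapse rule can be sketched on an in-memory tree. This is an illustration under assumptions: the `Node` class is hypothetical, and the real step works on directories, where img/ moves up together with doc.md.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    doc: Optional[str] = None  # contents of doc.md, or None if the dir has none
    children: List["Node"] = field(default_factory=list)


def collapse(node: Node) -> Node:
    """Pull a lone child up into a doc-less parent, working bottom-up."""
    # Collapse the subtrees first so chains of single children fold in one pass.
    node.children = [collapse(child) for child in node.children]
    if node.doc is None and len(node.children) == 1:
        only = node.children[0]
        node.doc = only.doc            # the child's doc.md becomes the parent's
        node.children = only.children  # the child directory itself disappears
    return node
```

The parent keeps its own name, so paths stay stable for anything that linked to it. A parent that already has a doc.md is left alone, since pulling the child up would overwrite it.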
Merge (merge.py): Consecutive leaf docs under 300 words are merged into a single doc.md, capped at 2000 words per merged doc. If all children of a parent are short, the parent itself becomes a leaf.
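The grouping logic might look roughly like the sketch below. The function names are hypothetical, and two details are assumptions rather than documented behavior of merge.py: "short" is taken to mean strictly fewer than 300 words, and a group is closed as soon as adding the next doc would push it past 2000 words.

```python
from typing import List, Tuple

Doc = Tuple[str, str]  # (name, markdown text)


def word_count(text: str) -> int:
    return len(text.split())


def group_consecutive_short(
    docs: List[Doc], short_limit: int = 300, cap: int = 2000
) -> List[List[Doc]]:
    """Group runs of consecutive short docs, keeping each group within `cap` words."""
    groups: List[List[Doc]] = []
    current: List[Doc] = []
    current_words = 0
    for name, text in docs:
        n = word_count(text)
        if n >= short_limit:
            # A long doc ends the current run and stays on its own.
            if current:
                groups.append(current)
                current, current_words = [], 0
            groups.append([(name, text)])
            continue
        if current and current_words + n > cap:
            # Adding this doc would exceed the cap, so start a new group.
            groups.append(current)
            current, current_words = [], 0
        current.append((name, text))
        current_words += n
    if current:
        groups.append(current)
    return groups
```

If the result is a single group that covers every child, that corresponds to the case where the parent itself becomes a leaf.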
Summary writing & format fix (done by the skill agent): Reads every doc.md, writes README summaries, and fixes markdown formatting issues. This is not scripted - the agent reads and writes each doc.
Example: Collapse + Merge
Suppose crawl.py produces this tree for a section on Directives:
11-Directives/
  11-4-Performance-Tuning-Directives/
    11-4-1-maxnreg/ (doc.md: 80 words)
    11-4-2-maxntid/ (doc.md: 95 words)
    11-4-3-reqntid/ (doc.md: 70 words)
    ...
    11-4-9-abi_preserve_ctrl/ (doc.md: 60 words)
  11-6-Linking-Directives/
    11-6-1-extern/ (no doc.md)
      11-6-1-1-details/ (doc.md + img/)
After collapse: 11-6-1-extern/ has one child 11-6-1-1-details/. Since the parent has no doc.md, the child’s contents (doc.md, img/) are pulled up into 11-6-1-extern/ and the child dir is removed:
11-6-Linking-Directives/
  11-6-1-extern/ (doc.md + img/) <-- collapsed
After merge: All 9 children of 11-4-Performance-Tuning-Directives/ are under 300 words and their total is under 2000 words. They are all merged into the parent, which becomes a leaf:
11-Directives/
  11-4-Performance-Tuning-Directives/ (doc.md: ~700 words, merged from 9 docs)
  11-6-Linking-Directives/
    11-6-1-extern/ (doc.md + img/)
  ...
The merged doc.md contains all 9 original docs separated by --- dividers, preserving their original headings as subtitles.
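Assembling the merged file could be as simple as the sketch below. The helper name `join_docs` is hypothetical, and the exact spacing around the dividers is an assumption. Headings are left untouched, so each original title still introduces its own part.

```python
from typing import List


def join_docs(texts: List[str]) -> str:
    """Concatenate docs in their original order, separated by --- dividers."""
    # Blank lines around the divider keep it from being read as a heading underline.
    return "\n\n---\n\n".join(text.strip() for text in texts) + "\n"
```

The blank line before each `---` matters in markdown: a `---` directly under a line of text would turn that line into a heading instead of producing a divider.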