
Batch API

Batch workflows are useful for precompressing docs, notes, research folders, or benchmark inputs before sending them to a model.

CLI Batch

Compress every Markdown file under docs into compressed-docs:

```shell
contextcrumb batch docs --glob "*.md" --out compressed-docs
```

ContextCrumb recursively matches the glob and mirrors the input directory structure in the output directory.

Example:

```text
docs/
  guide.md
  api/
    python.md

compressed-docs/
  guide.md
  api/
    python.md
```

To keep a fixed fraction of the original content instead of using the default threshold mode, pass an explicit keep ratio:

```shell
contextcrumb batch docs --glob "*.md" --out compressed-docs --target-keep-ratio 0.5
```
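To see roughly how much of each file a run actually kept, you can compare each output file against its input. The sketch below is a stdlib-only check, not a ContextCrumb feature. It assumes that whitespace-separated words are a fair proxy for size, although ContextCrumb may measure the keep ratio in model tokens, so expect some drift. `achieved_keep_ratio` and `report` are hypothetical helper names.

```python
from pathlib import Path


def achieved_keep_ratio(original: str, compressed: str) -> float:
    """Rough keep ratio: compressed word count divided by original word count.

    Assumption: whitespace-separated words stand in for tokens; the tool's
    own measure may differ.
    """
    original_words = len(original.split())
    if original_words == 0:
        return 1.0  # nothing to compress, so treat everything as kept
    return len(compressed.split()) / original_words


def report(input_dir: Path, output_dir: Path, pattern: str = "*.md") -> None:
    """Print the approximate keep ratio for every mirrored input/output pair."""
    for source in sorted(input_dir.rglob(pattern)):
        target = output_dir / source.relative_to(input_dir)
        if source.is_file() and target.is_file():
            ratio = achieved_keep_ratio(
                source.read_text(encoding="utf-8"),
                target.read_text(encoding="utf-8"),
            )
            print(f"{source.relative_to(input_dir)}: {ratio:.2f}")
```

After a `--target-keep-ratio 0.5` run, calling `report(Path("docs"), Path("compressed-docs"))` should print values in the neighborhood of 0.5 if the word proxy tracks the tool's own measure.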

Batch can also process supported source files with code-aware compression:

```shell
contextcrumb batch src --glob "*.py" --out compressed-src --content-mode code-comments
```

In this mode, executable source is preserved exactly; only comments and docstrings are compressed.
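If you want to confirm independently that a `code-comments` run left your Python logic untouched, you can compare syntax trees. This is a stdlib-only check we're suggesting, not part of ContextCrumb. It parses both versions with `ast`, blanks out docstrings, and compares `ast.dump` output. Comments never reach the AST, and line numbers are excluded from the dump by default, so comment edits and line shifts do not count as differences. The function names are hypothetical.

```python
import ast


def _blank_docstrings(tree: ast.AST) -> ast.AST:
    """Replace every module, class, and function docstring with an empty string."""
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            body = node.body
            if (
                body
                and isinstance(body[0], ast.Expr)
                and isinstance(body[0].value, ast.Constant)
                and isinstance(body[0].value.value, str)
            ):
                body[0].value.value = ""
    return tree


def same_executable_code(original: str, compressed: str) -> bool:
    """True if the two sources differ only in comments, docstrings, and layout."""
    before = ast.dump(_blank_docstrings(ast.parse(original)))
    after = ast.dump(_blank_docstrings(ast.parse(compressed)))
    return before == after
```

Running this over each `src/*.py` and `compressed-src/*.py` pair should return `True` whenever only comments and docstrings changed.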

To emit machine-readable summaries, add `--json`:

```shell
contextcrumb batch docs --glob "*.md" --out compressed-docs --json
```

Python Batch

For custom behavior, write the loop in Python and reuse a single `ContextCompressor` instance for every file:

```python
from pathlib import Path

from contextcrumb import ContextCompressor

input_dir = Path("docs")
output_dir = Path("compressed-docs")

# Create the compressor once and reuse it for every file.
compressor = ContextCompressor()

for source in input_dir.rglob("*.md"):
    # Mirror the input layout: docs/api/python.md -> compressed-docs/api/python.md
    relative = source.relative_to(input_dir)
    target = output_dir / relative
    target.parent.mkdir(parents=True, exist_ok=True)

    result = compressor.compress_file(source, target_keep_ratio=0.5)
    target.write_text(result.text, encoding="utf-8")
```

Service-Backed Batch

If you are compressing many files from multiple processes, start the warm service once and pass `--use-service` to each batch run:

```shell
contextcrumb service start --allow-root docs
contextcrumb batch docs --glob "*.md" --out compressed-docs --use-service
```

This avoids repeatedly loading the model.

Practical Defaults

| Input type | Suggested setting |
| --- | --- |
| Internal docs | Default threshold mode |
| Research dumps | `--target-keep-ratio 0.5` |
| Meeting notes | `--target-keep-ratio` between 0.5 and 0.7 |
| Logs with prose | Default threshold mode first, then inspect the output |
| Supported source files | `--content-mode code-comments` |
| Code-heavy Markdown | `--target-keep-ratio 0.75`, or the raw file when snippets must stay exact |

Skip empty files when you write your own batch loop; the CLI already does this during batch runs.
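A minimal sketch of the Python loop with that rule added. It assumes that "empty" means the file has no non-whitespace characters; the CLI's exact rule is not documented here. To keep the sketch testable without ContextCrumb installed, the compression step is passed in as a callable. `batch_compress` and its `compress` hook are hypothetical names, not part of the library.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def batch_compress(
    input_dir: Path,
    output_dir: Path,
    pattern: str,
    compress: Callable[[Path], str],
) -> list[Path]:
    """Mirror input_dir into output_dir, compressing each non-empty matching file.

    `compress` is a hypothetical hook that takes a source path and returns the
    compressed text. Returns the list of output files written.
    """
    written = []
    for source in sorted(input_dir.rglob(pattern)):
        if not source.is_file():
            continue
        # Assumption: whitespace-only files count as empty and are skipped.
        if not source.read_text(encoding="utf-8").strip():
            continue
        target = output_dir / source.relative_to(input_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(compress(source), encoding="utf-8")
        written.append(target)
    return written
```

With ContextCrumb, the hook could be `lambda path: compressor.compress_file(path, target_keep_ratio=0.5).text`, which uses only the calls shown in the Python example above.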