Core Concepts

ContextCrumb turns text into per-token keep/delete decisions. Each token receives a keep probability, and the output is built by preserving the kept tokens in their original order, joined with minimal separators taken from the original text.
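The reconstruction step can be pictured with a short sketch. This is not ContextCrumb's implementation: the whitespace tokenization, the hand-written scores, and the rule of reusing the separator that immediately precedes each kept token are all assumptions made for illustration.

```python
import re
from typing import List, Tuple

# (start, end, keep_probability): character offsets into the original text.
TokenScore = Tuple[int, int, float]

def rebuild(text: str, tokens: List[TokenScore], threshold: float = 0.5) -> str:
    """Join kept tokens in original order, reusing original separators.

    Assumption: the separator placed before a kept token is whatever
    sat directly in front of it in the original text.
    """
    pieces: List[str] = []
    prev_end = 0  # end offset of the previous token, kept or not
    for start, end, prob in tokens:
        if prob >= threshold:
            if pieces:
                pieces.append(text[prev_end:start])
            pieces.append(text[start:end])
        prev_end = end
    return "".join(pieces)

# Illustrative input: whitespace tokens with made-up keep probabilities.
text = "the quick brown fox jumps"
probs = [0.2, 0.9, 0.4, 0.8, 0.7]
tokens = [(m.start(), m.end(), p) for m, p in zip(re.finditer(r"\S+", text), probs)]
compressed = rebuild(text, tokens)
```

Because kept tokens are sliced from the original string, their spelling and order are never altered; only the deleted tokens and their surrounding separators disappear.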

Input Boundary

ContextCrumb expects natural-language text, or source files where only comments/docstrings should be compressed. It can be used on many context surfaces, not only files:

Prompt text
Older conversation turns
Retrieved document chunks
Subagent reports
Natural-language tool output
MCP descriptions

Keep exact data outside the compression boundary. For JSON, YAML, tables, identifiers, commands, and schemas, preserve the structure and compress only the prose values that are safe to shorten. For supported source files, set content_mode to "auto" or "code-comments"; executable code stays exact while only comments and docstrings are compressed.

For example, compress summary, body, description, or comment fields, but leave id, url, status, score, created_at, and schema fields unchanged.
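One way to apply this rule to a parsed JSON record is to walk the structure and pass only the prose fields to a compressor. The helper below is a hypothetical sketch, not part of ContextCrumb; compress_text is any callable you supply (for instance a thin wrapper that calls ContextCrumb and returns the compressed text), and a toy stand-in is used here so the example runs on its own.

```python
from typing import Any, Callable, FrozenSet, Optional

# Field names taken from the guidance above; extend to match your own schema.
PROSE_FIELDS: FrozenSet[str] = frozenset({"summary", "body", "description", "comment"})

def compress_prose_fields(
    value: Any,
    compress_text: Callable[[str], str],
    prose_fields: FrozenSet[str] = PROSE_FIELDS,
    key: Optional[str] = None,
) -> Any:
    """Return a copy of `value` with only string values under prose keys compressed."""
    if isinstance(value, dict):
        return {k: compress_prose_fields(v, compress_text, prose_fields, k)
                for k, v in value.items()}
    if isinstance(value, list):
        return [compress_prose_fields(v, compress_text, prose_fields, key) for v in value]
    if isinstance(value, str) and key in prose_fields:
        return compress_text(value)
    return value  # ids, urls, numbers, timestamps, and so on stay exact

def toy_compress(s: str) -> str:
    """Stand-in compressor for the demo: drops words of three letters or fewer."""
    return " ".join(w for w in s.split() if len(w) > 3)

record = {
    "id": "doc-17",
    "url": "https://example.com/doc-17",
    "status": "open",
    "score": 0.82,
    "summary": "This is a fairly long summary of the issue",
}
compact = compress_prose_fields(record, toy_compress)
```

The structure and every non-prose value come back unchanged, so the result can be serialized again without breaking consumers that rely on exact identifiers.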

Compression Modes

Threshold Mode

Threshold mode is the default.

ContextCrumb is trained as a binary keep/delete classifier. After subtokens and sliding-window repeats are aggregated back to original deletion units, threshold mode keeps tokens whose KEEP probability is greater than or equal to threshold. The default threshold is 0.5.

Use the default when you want the model's direct keep/delete boundary:

contextcrumb load notes.md
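The aggregation and threshold rule can be sketched as follows. The document does not say how repeated scores are combined, so averaging is an assumption here, and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def aggregate_keep_probs(scores: List[Tuple[int, float]]) -> Dict[int, float]:
    """Combine per-subtoken KEEP probabilities into one score per deletion unit.

    Each item is (unit_index, keep_probability). A unit can appear more than
    once when it is split into subtokens or covered by overlapping windows.
    Assumption: repeated scores are averaged.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for unit, prob in scores:
        buckets[unit].append(prob)
    return {unit: sum(ps) / len(ps) for unit, ps in buckets.items()}

def threshold_decisions(unit_probs: Dict[int, float], threshold: float = 0.5) -> Dict[int, bool]:
    """Keep a unit when its KEEP probability is greater than or equal to the threshold."""
    return {unit: prob >= threshold for unit, prob in unit_probs.items()}

# Unit 0 and unit 1 were each scored twice; unit 2 sits exactly on the boundary.
scores = [(0, 0.9), (0, 0.7), (1, 0.2), (1, 0.4), (2, 0.5)]
decisions = threshold_decisions(aggregate_keep_probs(scores))
```

Note that the comparison is inclusive: a unit scoring exactly 0.5 is kept under the default threshold.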

Target Keep Ratio

target_keep_ratio keeps the highest-scoring tokens until approximately the requested fraction of the input is retained. When set, it overrides threshold mode.

Use this when you have a budget:

contextcrumb load notes.md --target-keep-ratio 0.5

In Python:

from contextcrumb import compress

result = compress(text, target_keep_ratio=0.5)
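Conceptually, ratio mode is a top-k selection over the keep probabilities. The sketch below shows that idea under stated assumptions: the rounding rule and tie-breaking are guesses, and keep_mask_for_ratio is a hypothetical name, not a ContextCrumb function.

```python
from typing import List

def keep_mask_for_ratio(probs: List[float], target_keep_ratio: float) -> List[bool]:
    """Keep the top-scoring tokens so that about `target_keep_ratio` of them survive."""
    if not 0.0 <= target_keep_ratio <= 1.0:
        raise ValueError("target_keep_ratio must be between 0 and 1")
    n_keep = round(len(probs) * target_keep_ratio)  # assumption: nearest integer
    # Rank token indices by score, highest first; ties keep their original order.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set(ranked[:n_keep])
    # Emit the mask in original token order so the text can be rebuilt in sequence.
    return [i in kept for i in range(len(probs))]

probs = [0.9, 0.1, 0.6, 0.4]
half = keep_mask_for_ratio(probs, 0.5)
three_quarters = keep_mask_for_ratio(probs, 0.75)
```

With a ratio of 0.75 the token scored 0.4 is kept even though it falls below the default threshold of 0.5, which illustrates why the ratio takes precedence over the threshold.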

Custom Threshold

Use a custom threshold when you want direct model-score control:

contextcrumb load notes.md --threshold 0.6

File Content Mode

File compression also has content_mode, configured by contextcrumb config or overridden per command:

contextcrumb config set compression.content_mode auto
contextcrumb load src/app.py --content-mode code-comments

Modes:

auto: prose files use normal compression; supported code files use code-comments
prose: compress the whole file as natural language
code-comments: preserve executable code and compress only comments/docstrings
raw: return the file unchanged
refuse: reject file compression

Supported code-aware languages are Python, JavaScript, TypeScript, JSX, TSX, Go, and Rust.
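The way auto resolves to a concrete mode can be sketched as a lookup on the file extension. How ContextCrumb actually detects the language is not stated here, so the extension table and the resolve_content_mode helper are assumptions for illustration only.

```python
from pathlib import Path

MODES = {"auto", "prose", "code-comments", "raw", "refuse"}

# Assumption: conventional extensions for the supported languages listed above.
CODE_EXTENSIONS = {".py", ".js", ".ts", ".jsx", ".tsx", ".go", ".rs"}

def resolve_content_mode(path: str, content_mode: str = "auto") -> str:
    """Return the concrete mode that would apply to `path`."""
    if content_mode not in MODES:
        raise ValueError(f"unknown content_mode: {content_mode!r}")
    if content_mode != "auto":
        return content_mode  # explicit modes are used as given
    suffix = Path(path).suffix.lower()
    return "code-comments" if suffix in CODE_EXTENSIONS else "prose"

code_mode = resolve_content_mode("src/app.py")
prose_mode = resolve_content_mode("notes.md")
```

An explicit mode always wins over detection, which mirrors the per-command override shown above.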

Sliding Windows

The model processes long inputs with sliding windows.

Defaults:

| Setting | Default |
| --- | --- |
| `max_length` | `1024` |
| `stride` | `64` |
| `backend` | `onnx` |
| `model_id` | `ymao20/contextcrumb-32m` |

model_windows in the result stats tells you how many windows were used.
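To get a feel for how many windows a given input needs, the sketch below lays out window spans over a token sequence. It assumes stride is the number of tokens shared by consecutive windows (the convention used by Hugging Face tokenizers) and ignores any special tokens; ContextCrumb may define these differently, so treat the counts as estimates.

```python
from typing import List, Tuple

def window_spans(n_tokens: int, max_length: int = 1024, stride: int = 64) -> List[Tuple[int, int]]:
    """Return (start, end) token spans for overlapping windows.

    Assumption: `stride` is the overlap between neighbouring windows,
    so each new window starts `max_length - stride` tokens after the last.
    """
    if stride >= max_length:
        raise ValueError("stride must be smaller than max_length")
    if n_tokens <= 0:
        return []
    step = max_length - stride
    spans: List[Tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + max_length, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        start += step
    return spans

short_input = window_spans(1000)
long_input = window_spans(2000)
```

Tokens that fall in an overlap are scored more than once, which is why scores are aggregated back to deletion units before any keep/delete decision is made.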

Result Shape

Every API path returns or can emit the same conceptual payload:

{
  "text": "compressed text",
  "original_text": "original text",
  "stats": {
    "input_tokens": 100,
    "kept_tokens": 55,
    "deleted_tokens": 45,
    "token_keep_ratio": 0.55,
    "mode": "threshold",
    "backend": "onnx"
  }
}

When return_tokens=True, token decisions are included as tokens.
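If you consume this payload as a parsed dictionary, a small sanity check can catch malformed results early. The relationships tested below (kept plus deleted equals input, and the ratio equals kept divided by input) are inferred from the example numbers above rather than documented guarantees, and stats_problems is a hypothetical helper.

```python
from typing import Any, Dict, List

def stats_problems(result: Dict[str, Any], tol: float = 1e-6) -> List[str]:
    """Return a list of inconsistencies found in result["stats"] (empty if none)."""
    stats = result["stats"]
    total = stats["input_tokens"]
    kept = stats["kept_tokens"]
    deleted = stats["deleted_tokens"]
    problems: List[str] = []
    # Assumption: every input token is either kept or deleted.
    if kept + deleted != total:
        problems.append("kept_tokens + deleted_tokens != input_tokens")
    # Assumption: token_keep_ratio is kept_tokens / input_tokens.
    if total and abs(stats["token_keep_ratio"] - kept / total) > tol:
        problems.append("token_keep_ratio != kept_tokens / input_tokens")
    return problems

example = {
    "text": "compressed text",
    "original_text": "original text",
    "stats": {
        "input_tokens": 100,
        "kept_tokens": 55,
        "deleted_tokens": 45,
        "token_keep_ratio": 0.55,
        "mode": "threshold",
        "backend": "onnx",
    },
}
problems = stats_problems(example)
```

If the real output rounds the ratio, pass a looser tol rather than treating small differences as errors.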
