Python API

Use the Python API when ContextCrumb is part of an application, script, evaluation pipeline, or local agent tool. The common pattern is to compress natural-language context before adding it to an LLM request.

ContextCrumb is not limited to files. Use it on any prose-heavy model input: user prompts, retrieved chunks, older chat turns, subagent reports, planner output, and natural-language tool results.

Middleware Pattern

from contextcrumb import ContextCompressor

compressor = ContextCompressor()

def prepare_context(raw_context: str) -> str:
    result = compressor.compress(raw_context, target_keep_ratio=0.6)
    return result.text

context = prepare_context(open("notes.md", encoding="utf-8").read())

Create the compressor once, then reuse it. This avoids loading the model for every request.
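
In a long-running service, one way to enforce the create-once rule is a cached factory. The sketch below is only illustrative: `StubCompressor` is a hypothetical stand-in for `ContextCompressor` so the example runs without the model, and `get_compressor` is an assumed helper name, not part of the library.

```python
from functools import lru_cache


class StubCompressor:
    """Hypothetical stand-in for ContextCompressor so the sketch runs offline."""

    instances = 0  # counts how many times the (expensive) constructor ran

    def __init__(self) -> None:
        StubCompressor.instances += 1

    def compress(self, text: str) -> str:
        # Placeholder behavior only: collapse repeated whitespace.
        return " ".join(text.split())


@lru_cache(maxsize=1)
def get_compressor() -> StubCompressor:
    """Build the compressor on first use, then return the same instance."""
    # In real code this would return ContextCompressor() instead of the stub.
    return StubCompressor()


def prepare_context(raw_context: str) -> str:
    return get_compressor().compress(raw_context)


first = prepare_context("alpha   beta")
second = prepare_context("gamma\n\ndelta")
```

Both calls share one instance, so the constructor cost is paid once per process.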

Compress Prompt Text

from contextcrumb import ContextCompressor

compressor = ContextCompressor()

def compress_user_prompt(prompt: str) -> str:
    result = compressor.compress(prompt, target_keep_ratio=0.75)
    return result.text

Use a conservative keep ratio for user prompts. Prompts often contain constraints that should survive.
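
If a prompt carries constraints that must not be lost, a simple guard can check that they survived and fall back to the raw prompt otherwise. This is a minimal sketch, not a library feature: `compress_with_guard`, `compress_text`, and `must_keep` are hypothetical names, and the compression step is passed in as a plain callable.

```python
from typing import Callable, Iterable


def compress_with_guard(
    prompt: str,
    compress_text: Callable[[str], str],
    must_keep: Iterable[str],
) -> str:
    """Return the compressed prompt only if every required phrase survived."""
    compressed = compress_text(prompt)
    for phrase in must_keep:
        # Only phrases that were in the original prompt can be "lost".
        if phrase in prompt and phrase not in compressed:
            return prompt  # fall back to the raw prompt
    return compressed
```

With a real compressor, `compress_text` could be `lambda t: compressor.compress(t, target_keep_ratio=0.75).text`.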

Compress Conversation History

def compress_history(messages: list[dict[str, str]]) -> list[dict[str, str]]:
    compressed = []
    for message in messages:
        if message["role"] in {"user", "assistant"} and len(message["content"]) > 2000:
            result = compressor.compress(message["content"], target_keep_ratio=0.6)
            compressed.append({**message, "content": result.text})
        else:
            compressed.append(message)
    return compressed

Keep system messages and short instruction turns raw unless you have a reason to compress them.
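
A variation that may suit chat applications is to leave the most recent turns raw and compress only older ones. The sketch below assumes a hypothetical helper `compress_older_turns` and takes the compression step as a callable; the `keep_recent` default of 4 is an arbitrary illustration, not a recommendation from the library.

```python
from typing import Callable

Message = dict[str, str]


def compress_older_turns(
    messages: list[Message],
    compress_text: Callable[[str], str],
    keep_recent: int = 4,
    min_chars: int = 2000,
) -> list[Message]:
    """Compress long user/assistant turns, except the most recent ones."""
    cutoff = max(len(messages) - keep_recent, 0)
    result = []
    for index, message in enumerate(messages):
        is_old = index < cutoff
        is_chat_turn = message["role"] in {"user", "assistant"}
        is_long = len(message["content"]) > min_chars
        if is_old and is_chat_turn and is_long:
            result.append({**message, "content": compress_text(message["content"])})
        else:
            # System messages, short turns, and recent turns stay raw.
            result.append(message)
    return result
```

To wire in a real compressor, pass `lambda t: compressor.compress(t, target_keep_ratio=0.6).text` as `compress_text`.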

Compress Structured Tool Output

Do not compress raw JSON as one string when structure matters. Walk the object and compress only natural-language values.

PROSE_KEYS = {"summary", "body", "description", "comment", "notes"}

def compress_tool_output(value):
    if isinstance(value, dict):
        return {
            key: (
                compressor.compress(text, target_keep_ratio=0.6).text
                if key in PROSE_KEYS and isinstance(text, str) and len(text) > 500
                else compress_tool_output(text)
            )
            for key, text in value.items()
        }
    if isinstance(value, list):
        return [compress_tool_output(item) for item in value]
    return value

This preserves keys, ids, numbers, URLs, and schema shape while shrinking long prose fields.
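
When the tool output arrives as a JSON string, the same walk can sit between `json.loads` and `json.dumps`, so the result is still valid JSON. The following is a sketch under assumptions: `compress_json_text` and `compress_prose_fields` are hypothetical names, and the compression step is injected as a callable so the example runs without the model.

```python
import json
from typing import Any, Callable

PROSE_KEYS = {"summary", "body", "description", "comment", "notes"}


def compress_prose_fields(
    value: Any, compress_text: Callable[[str], str], min_chars: int = 500
) -> Any:
    """Recursively compress long strings stored under prose keys."""
    if isinstance(value, dict):
        return {
            key: (
                compress_text(item)
                if key in PROSE_KEYS and isinstance(item, str) and len(item) > min_chars
                else compress_prose_fields(item, compress_text, min_chars)
            )
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [compress_prose_fields(item, compress_text, min_chars) for item in value]
    return value  # numbers, booleans, None, and non-prose strings pass through


def compress_json_text(raw_json: str, compress_text: Callable[[str], str]) -> str:
    """Parse JSON, compress prose fields, and serialize back to valid JSON."""
    parsed = json.loads(raw_json)
    return json.dumps(compress_prose_fields(parsed, compress_text), ensure_ascii=False)
```

Parsing first means keys, quotes, and escapes are never seen by the compressor, so they cannot be damaged.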

One-Off Compression

from contextcrumb import compress

text = "Agents spend context on long notes, docs, tickets, and logs."
result = compress(text)

print(result.text)
print(result.stats)

compress() loads a compressor for the call. For repeated calls, create a ContextCompressor once and reuse it.

Reuse A Warm Compressor

from contextcrumb import ContextCompressor

compressor = ContextCompressor()

first = compressor.compress("First long input.")
second = compressor.compress("Second long input.")

print(first.text)
print(second.stats["token_keep_ratio"])

Compress A File

from contextcrumb import compress_file

result = compress_file("notes.md")
print(result.text)

Or reuse a compressor:

from contextcrumb import ContextCompressor

compressor = ContextCompressor()
result = compressor.compress_file("notes.md", encoding="utf-8")

File compression defaults to content_mode="auto". For supported source files, auto behaves like code-comments: executable code is preserved exactly and only comments and docstrings are compressed.

result = compressor.compress_file("src/app.py", content_mode="code-comments")
print(result.stats["preserved_code_exact"])

Supported code-aware languages are Python, JavaScript, TypeScript, JSX, TSX, Go, and Rust.
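
If you prefer to pass content_mode explicitly instead of relying on auto, a small extension lookup is one option. This is a rough sketch: the extension set is an assumption derived from the language list above, `guess_content_mode` is a hypothetical helper, and the library's own detection rules may differ.

```python
from pathlib import Path

# Assumed extensions for the languages named above; not taken from the library.
CODE_EXTENSIONS = {".py", ".js", ".ts", ".jsx", ".tsx", ".go", ".rs"}


def guess_content_mode(path: str) -> str:
    """Return "code-comments" for likely source files, otherwise "prose"."""
    suffix = Path(path).suffix.lower()
    return "code-comments" if suffix in CODE_EXTENSIONS else "prose"
```

The returned string can then be passed as `content_mode` to `compress_file`.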

Explicit Keep Ratio

result = compressor.compress(
    text,
    target_keep_ratio=0.5,
)

Setting target_keep_ratio overrides threshold mode: the compressor keeps the highest-scoring tokens until roughly the requested fraction of tokens remains.
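
Conceptually, a fixed keep ratio is a top-k selection over per-token keep probabilities. The sketch below illustrates that idea only; it is not the library's implementation, `select_by_ratio` is a hypothetical name, and rounding the budget up with `math.ceil` is an assumption.

```python
import math


def select_by_ratio(keep_probs: list[float], target_keep_ratio: float) -> list[bool]:
    """Keep the highest-probability tokens until the budget is filled."""
    if not 0.0 <= target_keep_ratio <= 1.0:
        raise ValueError("target_keep_ratio must be between 0 and 1")
    budget = math.ceil(len(keep_probs) * target_keep_ratio)
    # Rank token indices by probability, highest first; ties favor earlier tokens.
    ranked = sorted(range(len(keep_probs)), key=lambda i: (-keep_probs[i], i))
    kept = set(ranked[:budget])
    # Return a mask in the original token order.
    return [i in kept for i in range(len(keep_probs))]


mask = select_by_ratio([0.9, 0.2, 0.7, 0.4], target_keep_ratio=0.5)
```

The output size depends only on the input length and the ratio, not on how confident the model is about any single token.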

Threshold Mode

result = compressor.compress(
    text,
    threshold=0.6,
)

Threshold mode keeps every token whose keep probability is greater than or equal to the threshold. It applies when target_keep_ratio is not set, and the default threshold is 0.5.
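
The rule can be pictured as a simple per-token comparison. This sketch is only an illustration of that rule, with a hypothetical `select_by_threshold` helper and made-up probabilities; it does not call the library.

```python
def select_by_threshold(keep_probs: list[float], threshold: float = 0.5) -> list[bool]:
    """Keep every token whose probability is at least the threshold."""
    return [prob >= threshold for prob in keep_probs]


probs = [0.9, 0.2, 0.6, 0.4]
mask = select_by_threshold(probs, threshold=0.6)
kept_fraction = sum(mask) / len(mask)
```

Unlike a fixed keep ratio, the kept fraction here varies with the input: text the model scores as mostly important stays mostly intact.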

Token Decisions

result = compressor.compress(
    text,
    return_tokens=True,
)

for token in result.tokens:
    print(token.text, token.keep_prob, token.keep)

Each TokenDecision has:

| Field | Meaning |
| --- | --- |
| text | Token text |
| start | Character start offset |
| end | Character end offset |
| keep_prob | Model probability that the token should be kept |
| keep | Final keep/delete decision |
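
The offsets make it possible to see what was removed. The sketch below uses a stand-in dataclass that mirrors the fields listed above (it is not the library's class) and assumes that `start` and `end` index into the original text with an exclusive `end`; `dropped_spans` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class TokenDecision:
    """Stand-in mirroring the documented fields; not the library's class."""

    text: str
    start: int
    end: int
    keep_prob: float
    keep: bool


def dropped_spans(original: str, tokens: list[TokenDecision]) -> list[str]:
    """Merge consecutive dropped tokens and return the removed substrings."""
    spans: list[list[int]] = []
    previous_dropped = False
    for token in tokens:
        if not token.keep:
            if previous_dropped:
                spans[-1][1] = token.end  # extend the current run
            else:
                spans.append([token.start, token.end])
        previous_dropped = not token.keep
    return [original[start:end] for start, end in spans]
```

With real results, `dropped_spans(result.original_text, result.tokens)` would be the equivalent call, provided the offset assumption holds.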

Result Object

CompressionResult contains:

| Field | Meaning |
| --- | --- |
| text | Compressed output |
| original_text | Original input |
| stats | Compression and runtime metadata |
| tokens | Optional token decisions |

Convert to a dict:

payload = result.to_dict()
payload_with_tokens = result.to_dict(include_tokens=True)
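
A dict payload is convenient for logging, for example one JSON line per compression in an evaluation run. The sketch below uses a hypothetical `StubResult` in place of `CompressionResult`; the exact keys returned by `to_dict()` are an assumption based on the fields listed above, and `to_log_line` is an illustrative name.

```python
import json


class StubResult:
    """Hypothetical stand-in for CompressionResult with the documented fields."""

    def __init__(self, text: str, original_text: str, stats: dict) -> None:
        self.text = text
        self.original_text = original_text
        self.stats = stats

    def to_dict(self) -> dict:
        # Assumed shape; the real to_dict() may include more or different keys.
        return {
            "text": self.text,
            "original_text": self.original_text,
            "stats": self.stats,
        }


def to_log_line(result) -> str:
    """Serialize a result as a single JSON line."""
    # default=str is a fallback in case some stats values are not JSON types.
    return json.dumps(result.to_dict(), ensure_ascii=False, default=str)


line = to_log_line(StubResult("a", "a b", {"token_keep_ratio": 0.5}))
```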

Constructor Options

compressor = ContextCompressor(
    model_id="ymao20/contextcrumb-32m",
    backend="onnx",
    device="auto",
    revision=None,
    cache_dir=None,
    max_length=1024,
    stride=64,
    window_batch_size=None,
)

Use backend="torch" only after installing:

pip install "contextcrumb[torch]"
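
Code that must run with or without the extra installed could choose the backend at startup. This is a rough check, not a library feature: `pick_backend` is a hypothetical helper, and it only tests whether a package named like the backend is importable, which does not guarantee that every dependency of the extra is present.

```python
import importlib.util


def pick_backend(preferred: str = "torch", fallback: str = "onnx") -> str:
    """Return the preferred backend only if its package can be found."""
    # Assumption: the backend name matches the importable package name,
    # which holds for "torch" but is not guaranteed for other backends.
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback
```

The result can be passed straight to the constructor, for example `ContextCompressor(backend=pick_backend())`.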

API Reference

compress(text, ...)

Loads a compressor, compresses one string, and returns CompressionResult.

Important keyword arguments:

| Argument | Default | Meaning |
| --- | --- | --- |
| model_id | ymao20/contextcrumb-32m | Hugging Face model id or local model path |
| backend | onnx | Inference backend: onnx or torch |
| device | auto | Inference device |
| target_keep_ratio | None | Fixed token budget; overrides threshold mode |
| golden | True | Deprecated compatibility flag; threshold mode is used by default |
| threshold | 0.5 | Probability cutoff used when target_keep_ratio is not set |
| return_tokens | False | Include token-level decisions |

compress_file(path, ...)

Reads a text file and compresses it. It accepts the same compression options plus encoding and content_mode.

content_mode values:

| Value | Behavior |
| --- | --- |
| auto | Prose files use normal compression; supported code files use code-comments |
| prose | Compress the whole file as natural language |
| code-comments | Preserve executable code and compress only comments/docstrings |
| raw | Return the file unchanged |
| refuse | Reject file compression |

ContextCompressor

Use this class for repeated calls. It exposes:

| Method | Use |
| --- | --- |
| compress(text, ...) | Compress one string |
| compress_file(path, ...) | Read and compress one text file |
| score_keep_probabilities(text) | Inspect raw token scores |

CompressionResult

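| Field | Use |
| --- | --- |
| text | Compressed context |
| original_text | Original input |
| stats | Counts, mode, backend, model window metadata |
| tokens | Optional token decisions when return_tokens=True |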
tokensOptional token decisions when return_tokens=True