Python API
Use the Python API when ContextCrumb is part of an application, script, evaluation pipeline, or local agent tool. The common pattern is to compress natural-language context before adding it to an LLM request.
ContextCrumb is not limited to files. Use it on any prose-heavy model input: user prompts, retrieved chunks, older chat turns, subagent reports, planner output, and natural-language tool results.
Middleware Pattern
from contextcrumb import ContextCompressor
compressor = ContextCompressor()
def prepare_context(raw_context: str) -> str:
result = compressor.compress(raw_context, target_keep_ratio=0.6)
return result.text
context = prepare_context(open("notes.md", encoding="utf-8").read())
Create the compressor once, then reuse it. This avoids loading the model for every request.
Compress Prompt Text
from contextcrumb import ContextCompressor
compressor = ContextCompressor()
def compress_user_prompt(prompt: str) -> str:
result = compressor.compress(prompt, target_keep_ratio=0.75)
return result.text
Use a conservative keep ratio for user prompts. Prompts often contain constraints that should survive.
Compress Conversation History
def compress_history(messages: list[dict[str, str]]) -> list[dict[str, str]]:
compressed = []
for message in messages:
if message["role"] in {"user", "assistant"} and len(message["content"]) > 2000:
result = compressor.compress(message["content"], target_keep_ratio=0.6)
compressed.append({**message, "content": result.text})
else:
compressed.append(message)
return compressed
Keep system messages and short instruction turns raw unless you have a reason to compress them.
Compress Structured Tool Output
Do not compress raw JSON as one string when structure matters. Walk the object and compress only natural-language values.
PROSE_KEYS = {"summary", "body", "description", "comment", "notes"}
def compress_tool_output(value):
if isinstance(value, dict):
return {
key: (
compressor.compress(text, target_keep_ratio=0.6).text
if key in PROSE_KEYS and isinstance(text, str) and len(text) > 500
else compress_tool_output(text)
)
for key, text in value.items()
}
if isinstance(value, list):
return [compress_tool_output(item) for item in value]
return value
This preserves keys, ids, numbers, URLs, and schema shape while shrinking long prose fields.
One-Off Compression
from contextcrumb import compress
text = "Agents spend context on long notes, docs, tickets, and logs."
result = compress(text)
print(result.text)
print(result.stats)
compress() loads a compressor for the call. For repeated calls, create a ContextCompressor once and reuse it.
Reuse A Warm Compressor
from contextcrumb import ContextCompressor
compressor = ContextCompressor()
first = compressor.compress("First long input.")
second = compressor.compress("Second long input.")
print(first.text)
print(second.stats["token_keep_ratio"])
Compress A File
from contextcrumb import compress_file
result = compress_file("notes.md")
print(result.text)
Or reuse a compressor:
from contextcrumb import ContextCompressor
compressor = ContextCompressor()
result = compressor.compress_file("notes.md", encoding="utf-8")
For supported source files, file compression defaults to content_mode="auto":
executable code is preserved exactly and only comments/docstrings are compressed.
result = compressor.compress_file("src/app.py", content_mode="code-comments")
print(result.stats["preserved_code_exact"])
Supported code-aware languages are Python, JavaScript, TypeScript, JSX, TSX, Go, and Rust.
Explicit Keep Ratio
result = compressor.compress(
text,
target_keep_ratio=0.5,
)
target_keep_ratio overrides threshold mode and keeps the top-scoring tokens near the requested ratio.
Threshold Mode
result = compressor.compress(
text,
threshold=0.6,
)
Threshold mode keeps tokens with keep probability greater than or equal to the threshold.
Token Decisions
result = compressor.compress(
text,
return_tokens=True,
)
for token in result.tokens:
print(token.text, token.keep_prob, token.keep)
Each TokenDecision has:
| Field | Meaning |
|---|---|
text | Token text |
start | Character start offset |
end | Character end offset |
keep_prob | Model probability that the token should be kept |
keep | Final keep/delete decision |
Result Object
CompressionResult contains:
| Field | Meaning |
|---|---|
text | Compressed output |
original_text | Original input |
stats | Compression and runtime metadata |
tokens | Optional token decisions |
Convert to a dict:
payload = result.to_dict()
payload_with_tokens = result.to_dict(include_tokens=True)
Constructor Options
compressor = ContextCompressor(
model_id="ymao20/contextcrumb-32m",
backend="onnx",
device="auto",
revision=None,
cache_dir=None,
max_length=1024,
stride=64,
window_batch_size=None,
)
Use backend="torch" only after installing:
pip install "contextcrumb[torch]"
API Reference
compress(text, ...)
Loads a compressor, compresses one string, and returns CompressionResult.
Important keyword arguments:
| Argument | Default | Meaning |
|---|---|---|
model_id | ymao20/contextcrumb-32m | Hugging Face model id or local model path |
backend | onnx | Inference backend: onnx or torch |
device | auto | Inference device |
target_keep_ratio | None | Fixed token budget; overrides threshold mode |
golden | True | Deprecated compatibility flag; threshold mode is used by default |
threshold | 0.5 | Probability cutoff used when target_keep_ratio is not set |
return_tokens | False | Include token-level decisions |
compress_file(path, ...)
Reads a text file and compresses it. It accepts the same compression options plus encoding and content_mode.
content_mode values:
| Value | Behavior |
|---|---|
auto | Prose files use normal compression; supported code files use code-comments |
prose | Compress the whole file as natural language |
code-comments | Preserve executable code and compress only comments/docstrings |
raw | Return the file unchanged |
refuse | Reject file compression |
ContextCompressor
Use this class for repeated calls. It exposes:
| Method | Use |
|---|---|
compress(text, ...) | Compress one string |
compress_file(path, ...) | Read and compress one text file |
score_keep_probabilities(text) | Inspect raw token scores |
CompressionResult
| Field | Use |
|---|---|
text | Compressed context |
original_text | Original input |
stats | Counts, mode, backend, model window metadata |
tokens | Optional token decisions when return_tokens=True |