API Reference

Tokko API

Compress prompts programmatically. One endpoint, one API key. Add prompt compression to any app in under 5 minutes.


Base URL

https://tokko-seven.vercel.app/api/v1

Authentication

All requests require a Bearer token in the Authorization header. Generate your token from the Settings page.

Authorization: Bearer tkk_your_token_here

Keep your token secret. Do not expose it in client-side code or public repositories.
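One way to keep the token out of source code is to read it from the environment when building request headers. The sketch below assumes an environment variable named TOKKO_API_TOKEN and a helper called build_headers; both names are hypothetical and not part of the Tokko API.

```python
import os


def build_headers(env=None):
    """Build request headers from an environment mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # TOKKO_API_TOKEN is a hypothetical variable name chosen for this sketch.
    token = env.get("TOKKO_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError("TOKKO_API_TOKEN is not set")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

Passing the mapping explicitly, as in build_headers({"TOKKO_API_TOKEN": "tkk_your_token_here"}), makes the helper easy to exercise without touching the real environment.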

POST /v1/compress

POST /api/v1/compress

Compresses a prompt and returns the shorter version with token usage stats.

Request body

prompt (string, required)

The text to compress. Maximum 200,000 characters.

mode (string, default: "balanced")

Compression level. Options: balanced (keeps full meaning, ~50% reduction), aggressive (keyword-only, ~75% reduction), smart (AI-powered, best quality).

model (string, default: "claude")

Target model for cost calculation. Options: claude, gpt4, gemini. This affects the cost_saved_usd calculation only.
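Checking the request body locally can save a round trip that would end in a validation_error or prompt_too_large response. The following is a minimal sketch based on the limits and options listed above; build_payload is a hypothetical helper, and the server may count characters differently from Python's len().

```python
MAX_PROMPT_CHARS = 200_000
MODES = {"balanced", "aggressive", "smart"}
MODELS = {"claude", "gpt4", "gemini"}


def build_payload(prompt, mode="balanced", model="claude"):
    """Return a request body dict, raising ValueError for values the docs rule out."""
    if not isinstance(prompt, str):
        raise ValueError("prompt must be a string")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    if model not in MODELS:
        raise ValueError(f"model must be one of {sorted(MODELS)}")
    return {"prompt": prompt, "mode": mode, "model": model}
```

The returned dict can be passed directly as the json= argument of requests.post.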

Response

A successful response returns the compressed text along with usage statistics.

{
  "compressed": "Python function: filter even numbers from list.",
  "usage": {
    "original_tokens": 64,
    "compressed_tokens": 18,
    "tokens_saved": 46,
    "reduction_pct": 72,
    "cost_saved_usd": 0.000138
  },
  "meta": {
    "mode": "balanced",
    "model": "claude",
    "id": "ps_1711234567_abc1234"
  }
}

Response fields

compressed (string)

The compressed prompt text. Ready to send to any AI model.

usage.original_tokens (number)

Token count of the original prompt.

usage.compressed_tokens (number)

Token count of the compressed result.

usage.tokens_saved (number)

How many tokens were removed.

usage.reduction_pct (number)

Percentage reduction (0 to 100).

usage.cost_saved_usd (number)

Estimated dollar savings based on the selected model.
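The usage fields are related to one another: in the sample response, 64 - 18 = 46 tokens saved, and 46 / 64 is about 71.9%, reported as 72. The sketch below checks that relationship on a usage object. It assumes reduction_pct is rounded to the nearest integer, which is inferred from the sample and not stated by the API; check_usage is a hypothetical helper.

```python
def check_usage(usage):
    """Return True if tokens_saved and reduction_pct agree with the token counts."""
    original = usage["original_tokens"]
    compressed = usage["compressed_tokens"]
    saved = original - compressed
    # Assumption: the percentage is rounded to the nearest whole number.
    pct = round(100 * saved / original) if original else 0
    return saved == usage["tokens_saved"] and pct == usage["reduction_pct"]


sample = {
    "original_tokens": 64,
    "compressed_tokens": 18,
    "tokens_saved": 46,
    "reduction_pct": 72,
    "cost_saved_usd": 0.000138,
}
print(check_usage(sample))
```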

Rate limit headers

X-RateLimit-Limit (header)

Your daily compression limit (e.g. 20 on free plan).

X-RateLimit-Remaining (header)

Compressions remaining today.
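These headers can be read from any response to track how much of the daily quota is left. A minimal sketch, assuming the values arrive as integer strings; remaining_quota is a hypothetical helper. With the requests library, response.headers can be passed in directly and matches header names case-insensitively, whereas a plain dict needs the exact header names.

```python
def remaining_quota(headers):
    """Return (limit, remaining) as ints, using None for any header that is absent."""
    def as_int(name):
        value = headers.get(name)
        return int(value) if value is not None else None

    return as_int("X-RateLimit-Limit"), as_int("X-RateLimit-Remaining")


limit, remaining = remaining_quota(
    {"X-RateLimit-Limit": "20", "X-RateLimit-Remaining": "3"}
)
print(f"{remaining} of {limit} compressions left today")
```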

Errors

Errors return a consistent format with an error code you can handle programmatically.

{
  "error": {
    "message": "Daily limit reached (20/day on free plan)",
    "code": "rate_limit_exceeded",
    "status": 429
  }
}

Error codes

Status	Code	Meaning
400	validation_error	Invalid request body or parameters
400	prompt_too_large	Prompt exceeds 200,000 characters
401	auth_required	Missing Authorization header
401	invalid_token	Token is invalid or expired
429	rate_limit_exceeded	Daily compression limit reached
503	service_overloaded	AI is temporarily busy, retry
500	internal_error	Something went wrong on our end
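Because every error shares the same shape, client code can turn the body into an exception and branch on the code field. The sketch below is one possible approach: TokkoError and parse_error are hypothetical names, and treating only service_overloaded as retryable is an assumption drawn from the table above (the daily rate limit is unlikely to clear on an immediate retry).

```python
RETRYABLE_CODES = {"service_overloaded"}  # assumption: only the 503 is worth retrying


class TokkoError(Exception):
    """Exception carrying the code and status from an error response body."""

    def __init__(self, message, code, status):
        super().__init__(message)
        self.code = code
        self.status = status

    @property
    def retryable(self):
        return self.code in RETRYABLE_CODES


def parse_error(body):
    """Convert a parsed JSON error body into a TokkoError."""
    err = body.get("error", {})
    return TokkoError(
        err.get("message", "Unknown error"),
        err.get("code"),
        err.get("status"),
    )


error = parse_error({
    "error": {
        "message": "Daily limit reached (20/day on free plan)",
        "code": "rate_limit_exceeded",
        "status": 429,
    }
})
print(error.code, error.status, error.retryable)
```

A caller would typically raise the returned exception when the HTTP status is 400 or above, and retry with a delay only when retryable is true.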

Examples

cURL

curl -X POST https://tokko-seven.vercel.app/api/v1/compress \
  -H "Authorization: Bearer tkk_your_token" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Could you please help me write a Python function that filters even numbers from a list?",
    "mode": "balanced"
  }'

Python

import requests

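response = requests.post(
    "https://tokko-seven.vercel.app/api/v1/compress",
    headers={"Authorization": "Bearer tkk_your_token"},
    json={
        "prompt": "Could you please help me write a Python function that filters even numbers from a list?",
        "mode": "balanced"
    },
    timeout=30,  # avoid hanging indefinitely if the service is slow
)
response.raise_for_status()  # raise on 4xx/5xx instead of reading a missing field

data = response.json()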
print(data["compressed"])
# "Python function: filter even numbers from list."
print(f"Saved {data['usage']['reduction_pct']}% tokens")

JavaScript / Node.js

const response = await fetch("https://tokko-seven.vercel.app/api/v1/compress", {
  method: "POST",
  headers: {
    "Authorization": "Bearer tkk_your_token",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    prompt: "Could you please help me write a Python function that filters even numbers from a list?",
    mode: "balanced",
  }),
});

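if (!response.ok) {
  // Error responses carry { error: { message, code, status } }
  throw new Error(`Tokko request failed with status ${response.status}`);
}

const data = await response.json();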
console.log(data.compressed);
// "Python function: filter even numbers from list."
console.log(`Saved ${data.usage.reduction_pct}% tokens`);

Use with OpenAI / Anthropic SDK

Compress your prompt before passing it to any AI SDK. This example uses the Anthropic SDK; the same two-step pattern applies to other SDKs.

import Anthropic from "@anthropic-ai/sdk";

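// Placeholder input; replace with the raw prompt from your application
const userInput = "Could you please help me write a Python function that filters even numbers from a list?";

// 1. Compress the prompt first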
const tokko = await fetch("https://tokko-seven.vercel.app/api/v1/compress", {
  method: "POST",
  headers: {
    "Authorization": "Bearer tkk_your_token",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ prompt: userInput, mode: "smart" }),
});
const { compressed } = await tokko.json();

// 2. Send the compressed prompt to Claude
const anthropic = new Anthropic();
const message = await anthropic.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [{ role: "user", content: compressed }],
});
