POST /v1/memories/consolidate

The consolidate endpoint retrieves and compresses the memories in a scope into a single, token-budgeted context string. Rather than fetching raw memories and stitching them together yourself, you hand the engine a query focus and a token budget, and it returns a ready-to-use string you can inject directly into your model call as a system message. This is the recommended way to ground an agent’s responses in accumulated prior knowledge without overflowing your context window.

The same operation is available as the atomicmemory package command in the CLI and as the memory_package tool on the Model Context Protocol (MCP) server. All three surfaces share the same underlying logic.

Consolidation retrieves relevant memories from the specified scope, optionally filtered by a topic query, and compresses them into a coherent context string within the requested token budget. The engine applies tiered compression (L0/L1/L2) to fit as much relevant context as possible while staying under the budget.
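The reference does not say what each tier contains or how the engine chooses between them. Purely as an illustration of the idea, the sketch below assumes every memory comes with three pre-rendered forms of decreasing length (L0 most detailed, L2 most compressed) and packs them greedily in relevance order. The function name `packWithinBudget`, the `tiers` shape, and the precomputed token counts are all hypothetical and are not the engine's actual algorithm.

```javascript
// Illustrative only: tier meanings and the packing policy are assumptions,
// not the engine's documented behavior.
function packWithinBudget(memories, tokenBudget) {
  const parts = [];
  const tiersUsed = [];
  let used = 0;
  // Assumes `memories` is already sorted by relevance, most relevant first.
  for (const memory of memories) {
    // Pick the most detailed tier that still fits in the remaining budget.
    const tier = ["L0", "L1", "L2"].find(
      (name) => used + memory.tiers[name].tokens <= tokenBudget
    );
    if (tier === undefined) continue; // no rendering fits, so skip this memory
    parts.push(memory.tiers[tier].text);
    tiersUsed.push({ id: memory.id, tier });
    used += memory.tiers[tier].tokens;
  }
  return {
    context: parts.join(" "),
    memories: tiersUsed.map((entry) => entry.id),
    tokenCount: used, // sum of the per-rendering counts; separators are ignored
    tiersUsed,
  };
}
```

With a budget of 12 and two memories whose renderings cost 10, 5, and 2 tokens, the first memory keeps its full form and the second falls back to its shortest form, so both are represented without exceeding the budget.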

Request

POST /v1/memories/consolidate

scope

object

required

The scope to consolidate memories from. At least one of the following keys must be present:

user

string

Consolidate memories belonging to this user identifier.

agent

string

Consolidate memories belonging to this agent identifier.

namespace

string

Consolidate memories in this logical namespace.

thread

string

Consolidate memories from this conversation thread or session.
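Because at least one scope key is required, it can help to check the object before sending the request. The following is a minimal client-side sketch; `checkScope` is a hypothetical helper, not part of the API, and the rule that values must be non-empty strings is an assumption based on the field types listed above.

```javascript
// The four scope keys listed in this reference.
const SCOPE_KEYS = ["user", "agent", "namespace", "thread"];

// Hypothetical helper: returns a list of problems, empty when the scope looks valid.
function checkScope(scope) {
  if (scope === null || typeof scope !== "object" || Array.isArray(scope)) {
    return ["scope must be an object"];
  }
  const problems = [];
  const present = SCOPE_KEYS.filter((key) => scope[key] !== undefined);
  if (present.length === 0) {
    problems.push(`scope needs at least one of: ${SCOPE_KEYS.join(", ")}`);
  }
  for (const key of present) {
    // Assumption: identifiers are non-empty strings.
    if (typeof scope[key] !== "string" || scope[key] === "") {
      problems.push(`scope.${key} must be a non-empty string`);
    }
  }
  return problems;
}
```

For example, `checkScope({ user: "alice" })` returns an empty list, while `checkScope({})` returns one problem naming the four accepted keys.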

query

string

An optional topic or question to focus the consolidation. When provided, the engine biases memory selection toward content relevant to this query, producing a more targeted context string. When omitted, the engine selects broadly across all memories in scope.

tokenBudget

number

default: 2000

The maximum number of tokens the output context string may contain. The engine will not exceed this limit. Lower values produce shorter, more compressed context; higher values allow more detail to be preserved.
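The three request parameters can be assembled as in the sketch below. `buildConsolidateBody` is a hypothetical helper, and the check that `tokenBudget` is a positive integer is an assumption; the reference only states that the field is a number with a default of 2000.

```javascript
// Hypothetical helper: assembles the JSON body for POST /v1/memories/consolidate.
function buildConsolidateBody({ scope, query, tokenBudget }) {
  const body = { scope };
  // Leave `query` out entirely to get broad selection across the scope.
  if (typeof query === "string" && query.trim() !== "") {
    body.query = query.trim();
  }
  // When `tokenBudget` is omitted, the documented default (2000) applies.
  if (tokenBudget !== undefined) {
    // Assumption: the budget should be a positive whole number of tokens.
    if (!Number.isInteger(tokenBudget) || tokenBudget <= 0) {
      throw new RangeError("tokenBudget must be a positive integer");
    }
    body.tokenBudget = tokenBudget;
  }
  return body;
}
```

Calling `buildConsolidateBody({ scope: { user: "alice" } })` yields a body with only `scope`, which requests a broad consolidation at the default budget.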

Example Request

curl -X POST http://127.0.0.1:17350/v1/memories/consolidate \
  -H "Authorization: Bearer local-dev-key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "user preferences and settings",
    "scope": {"user": "alice"},
    "tokenBudget": 1200
  }'

Response

context

string

A ready-to-use context string containing the consolidated memories, formatted for direct injection into a model prompt. The string fits within the requested tokenBudget.

memories

array

The list of memory IDs that were included in the consolidated context. Use this to trace which source records contributed to the output.

tokenCount

number

The approximate token count of the returned context string.

Example Response

{
  "context": "Alice prefers TypeScript over JavaScript. She uses VS Code with the Vim keybinding extension. Her preferred formatting style is 2-space indentation with no semicolons.",
  "memories": ["mem_abc123", "mem_def456", "mem_ghi789"],
  "tokenCount": 42
}
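Before passing the result to a model, a client may want to confirm that the parsed JSON has the three documented fields. The sketch below is one way to do that; `parseConsolidateResponse` is a hypothetical helper, and treating memory IDs as strings is an assumption drawn from the example response.

```javascript
// Hypothetical guard: checks a parsed response for the three documented fields.
function parseConsolidateResponse(json) {
  if (json === null || typeof json !== "object") {
    throw new TypeError("response must be a JSON object");
  }
  const { context, memories, tokenCount } = json;
  if (typeof context !== "string") {
    throw new TypeError("context must be a string");
  }
  // Assumption: IDs are strings, as in the example response.
  if (!Array.isArray(memories) || !memories.every((id) => typeof id === "string")) {
    throw new TypeError("memories must be an array of ID strings");
  }
  if (typeof tokenCount !== "number") {
    throw new TypeError("tokenCount must be a number");
  }
  return { context, memories, tokenCount };
}
```

The returned `memories` list can then be logged alongside the model call so that each answer can be traced back to its source records.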

Using the Context String

Inject the context string as a system message at the beginning of your model call to ground the response in what the agent already knows about the user:

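const res = await fetch("http://127.0.0.1:17350/v1/memories/consolidate", {
  method: "POST",
  headers: {
    "Authorization": "Bearer local-dev-key",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    query: userMessage,
    scope: { user: userId },
    tokenBudget: 1200,
  }),
});
// Fail on a non-2xx status instead of destructuring an error body.
if (!res.ok) throw new Error(`consolidate request failed: ${res.status}`);
const { context } = await res.json();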

const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    { role: "system", content: `What you know about this user:\n\n${context}` },
    { role: "user", content: userMessage },
  ],
});

Because the engine returns a finished string, you never need to pass raw memory records to the model yourself. Tune tokenBudget to leave headroom for your base system prompt, the user message, and the expected length of the response.
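The headroom calculation can be sketched as simple arithmetic over token counts. `consolidationBudget` and its parameter names are hypothetical, and all inputs are the caller's own estimates; the `cap` default of 2000 simply mirrors the endpoint's documented default.

```javascript
// Hypothetical helper: derives a tokenBudget from the space left in the context window.
// All inputs are token counts estimated by the caller.
function consolidationBudget({
  contextWindow,
  systemPromptTokens,
  userMessageTokens,
  maxResponseTokens,
  cap = 2000,
}) {
  const headroom =
    contextWindow - systemPromptTokens - userMessageTokens - maxResponseTokens;
  // Never request more than the cap, and never a negative budget.
  return Math.max(0, Math.min(cap, headroom));
}
```

With a 4000-token window, a 1500-token system prompt, a 500-token user message, and 1000 tokens reserved for the response, the result is 1000. A result of 0 means there is no room for memory context, and the consolidate call can be skipped.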