Memory Server API

Port: 48912 (internal)

The memory server runs as a separate process and handles all persistent memory operations. It is not intended for direct external access; the main server proxies memory-related requests to it.

Internal endpoints

The memory server provides endpoints for:

- Storing new conversation turns with timestamps and embeddings
- Querying recent context for LLM prompt construction
- Searching semantically similar past conversations
- Compressing old conversations into summaries
- Managing memory review settings
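
This page does not list endpoint paths or payload shapes, so the following is only a sketch of how the main server might build a request for the "store a conversation turn" operation. The port comes from this page; the host, the `/store` path, the payload field names, and the helper `build_store_turn_request` are all hypothetical and should be replaced with the real values from the memory server source.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Port 48912 is from this page; binding to localhost is an assumption.
MEMORY_SERVER_URL = "http://127.0.0.1:48912"


def build_store_turn_request(role, text, path="/store"):
    """Build (but do not send) a POST request that stores one conversation turn.

    The path and the payload field names are hypothetical placeholders.
    """
    payload = {
        "role": role,
        "text": text,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return urllib.request.Request(
        MEMORY_SERVER_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_store_turn_request("user", "Hello")
# To actually send it (requires a running memory server):
# urllib.request.urlopen(req, timeout=5)
```

Building the request separately from sending it keeps the sketch runnable without a live memory server.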

Storage backend

| Store | Purpose |
| --- | --- |
| `time_indexed_original` | Full conversation history |
| `time_indexed_compressed` | Summarized conversation history |
| Embedding store | Vector embeddings for semantic search |
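
To make the split between the two time-indexed tables concrete, here is a minimal sketch using an in-memory SQLite database. Only the table names come from this page; the choice of SQLite, every column name, and the helper `recent_turns` are assumptions for illustration. The embedding store is not modeled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table names are from this page; all columns are hypothetical.
conn.executescript(
    """
    CREATE TABLE time_indexed_original (
        id INTEGER PRIMARY KEY,
        timestamp TEXT NOT NULL,
        role TEXT NOT NULL,
        content TEXT NOT NULL
    );
    CREATE TABLE time_indexed_compressed (
        id INTEGER PRIMARY KEY,
        start_time TEXT NOT NULL,
        end_time TEXT NOT NULL,
        summary TEXT NOT NULL
    );
    """
)

conn.executemany(
    "INSERT INTO time_indexed_original (timestamp, role, content) VALUES (?, ?, ?)",
    [
        ("2024-01-01T10:00:00Z", "user", "Hello"),
        ("2024-01-01T10:00:05Z", "assistant", "Hi there"),
        ("2024-01-01T10:01:00Z", "user", "How are you?"),
    ],
)


def recent_turns(conn, limit):
    """Return the `limit` most recent turns, oldest first, for prompt building."""
    rows = conn.execute(
        "SELECT role, content FROM time_indexed_original "
        "ORDER BY timestamp DESC, id DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return rows[::-1]


context = recent_turns(conn, 2)
```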

Models used

| Task | Default model |
| --- | --- |
| Embeddings | `text-embedding-v4` |
| Summarization | `qwen-plus` (`SUMMARY_MODEL`) |
| Routing | `qwen-plus` (`ROUTER_MODEL`) |
| Reranking | `qwen-plus` (`RERANKER_MODEL`) |
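
The names in parentheses look like configuration keys. Assuming they can be overridden through environment variables (this page does not confirm the mechanism), resolving the effective model per task could be sketched as follows. The helper `resolve_models` is hypothetical.

```python
import os

# Defaults taken from the table on this page.
EMBEDDING_MODEL = "text-embedding-v4"
DEFAULT_MODELS = {
    "SUMMARY_MODEL": "qwen-plus",
    "ROUTER_MODEL": "qwen-plus",
    "RERANKER_MODEL": "qwen-plus",
}


def resolve_models(env=None):
    """Return the model for each key, preferring a non-empty override in `env`.

    Reading overrides from environment variables is an assumption.
    """
    env = os.environ if env is None else env
    return {name: env.get(name) or default for name, default in DEFAULT_MODELS.items()}


defaults = resolve_models({})
overridden = resolve_models({"SUMMARY_MODEL": "my-summary-model"})
```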

Communication

The main server communicates with the memory server over HTTP and through a persistent sync connector thread (`cross_server.py`).
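
As a rough illustration of the "persistent sync connector thread" idea, the sketch below runs a background thread that drains a queue and forwards each message through a send function. It is not the contents of `cross_server.py`; the class `SyncConnector`, its methods, and the message shape are hypothetical, and a real connector would send over HTTP to the memory server instead of appending to a list.

```python
import queue
import threading

_SHUTDOWN = object()  # sentinel that tells the worker thread to exit


class SyncConnector(threading.Thread):
    """Background thread that forwards queued messages one at a time."""

    def __init__(self, send_fn):
        super().__init__(daemon=True)
        self._send_fn = send_fn
        self._outbox = queue.Queue()

    def submit(self, message):
        """Queue a message without blocking the caller."""
        self._outbox.put(message)

    def close(self):
        """Flush everything queued so far, then stop the thread."""
        self._outbox.put(_SHUTDOWN)
        self.join()

    def run(self):
        while True:
            message = self._outbox.get()
            if message is _SHUTDOWN:
                break
            try:
                self._send_fn(message)
            except Exception as exc:
                # Keep the thread alive if a single send fails.
                print(f"sync failed: {exc!r}")


sent = []  # stand-in for the memory server
connector = SyncConnector(sent.append)
connector.start()
connector.submit({"event": "turn_end"})
connector.close()
```

A queue-backed thread keeps request handlers in the main server from blocking on memory operations, at the cost of those operations completing asynchronously.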