
Memory Server API

Port: 48912 (internal)

The memory server runs as a separate process and handles all persistent memory operations. It is not intended for direct external access; instead, the main server proxies memory-related requests to it.
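As a rough illustration of this proxy pattern (the route name, payload shape, and helper function below are assumptions for illustration, not documented behavior), the main server might forward a memory request like this:

```python
import requests

MEMORY_SERVER = "http://127.0.0.1:48912"  # internal port; never exposed to clients


def proxy_memory_request(path: str, payload: dict) -> dict:
    """Forward a memory-related request from the main server to the memory server.

    The path and payload shape are hypothetical; this page does not document exact routes.
    """
    resp = requests.post(f"{MEMORY_SERVER}/{path.lstrip('/')}", json=payload, timeout=10)
    resp.raise_for_status()
    return resp.json()
```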

Internal endpoints

The memory server provides endpoints for the following operations (a usage sketch follows the list):

  • Storing new conversation turns with timestamps and embeddings
  • Querying recent context for LLM prompt construction
  • Searching semantically similar past conversations
  • Compressing old conversations into summaries
  • Managing memory review settings
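A minimal sketch of how a few of these operations might be called. The route names (`/store`, `/recent`, `/search`) and request fields are assumptions chosen for illustration; the page does not list the actual paths or parameters.

```python
import time

import requests

BASE = "http://127.0.0.1:48912"

# Store a new conversation turn with a timestamp (hypothetical route and fields)
requests.post(f"{BASE}/store", json={
    "role": "user",
    "content": "Remind me what we decided about the release date.",
    "timestamp": time.time(),
}, timeout=10)

# Fetch recent context for LLM prompt construction (hypothetical route and parameters)
recent = requests.get(f"{BASE}/recent", params={"limit": 10}, timeout=10).json()

# Semantic search over past conversations (hypothetical route)
similar = requests.post(
    f"{BASE}/search",
    json={"query": "release date decision", "top_k": 5},
    timeout=10,
).json()
```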

Storage backend

| Table | Purpose |
| --- | --- |
| time_indexed_original | Full conversation history |
| time_indexed_compressed | Summarized conversation history |
| Embedding store | Vector embeddings for semantic search |
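A minimal sketch of what these stores might look like, assuming a SQLite-style backend; the actual storage engine, column names, and embedding serialization are not specified on this page.

```python
import sqlite3

conn = sqlite3.connect("memory.db")  # hypothetical database file

conn.executescript("""
-- Full conversation history, indexed by time
CREATE TABLE IF NOT EXISTS time_indexed_original (
    id        INTEGER PRIMARY KEY,
    timestamp REAL NOT NULL,   -- Unix time of the turn
    role      TEXT NOT NULL,   -- e.g. "user" or "assistant"
    content   TEXT NOT NULL    -- full turn text
);

-- Summarized conversation history produced by compression
CREATE TABLE IF NOT EXISTS time_indexed_compressed (
    id         INTEGER PRIMARY KEY,
    start_time REAL NOT NULL,  -- first turn covered by the summary
    end_time   REAL NOT NULL,  -- last turn covered by the summary
    summary    TEXT NOT NULL
);

-- Embedding store for semantic search (vectors serialized as BLOBs here)
CREATE TABLE IF NOT EXISTS embeddings (
    turn_id   INTEGER REFERENCES time_indexed_original(id),
    embedding BLOB NOT NULL
);
""")
conn.commit()
```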

Models used

| Task | Default model |
| --- | --- |
| Embeddings | text-embedding-v4 |
| Summarization | qwen-plus (SUMMARY_MODEL) |
| Routing | qwen-plus (ROUTER_MODEL) |
| Reranking | qwen-plus (RERANKER_MODEL) |
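The parenthesized names read like configuration variables. A sketch of how they might be resolved, assuming environment variables with those names (the embedding variable name is an additional assumption):

```python
import os

# Defaults mirror the table above; each can be overridden via its environment variable.
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "text-embedding-v4")  # variable name is an assumption
SUMMARY_MODEL = os.getenv("SUMMARY_MODEL", "qwen-plus")
ROUTER_MODEL = os.getenv("ROUTER_MODEL", "qwen-plus")
RERANKER_MODEL = os.getenv("RERANKER_MODEL", "qwen-plus")
```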

Communication

The main server communicates with the memory server via HTTP requests and a persistent sync connector thread (cross_server.py).
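A rough sketch of what a persistent sync connector thread could look like. This is not the actual cross_server.py; the queue, route name, and retry behavior are assumptions made for illustration.

```python
import queue
import threading
import time

import requests

MEMORY_SERVER = "http://127.0.0.1:48912"
pending_turns: queue.Queue = queue.Queue()


def sync_loop() -> None:
    """Forward queued conversation turns to the memory server, retrying on failure."""
    while True:
        turn = pending_turns.get()  # block until the main server enqueues a turn
        try:
            requests.post(f"{MEMORY_SERVER}/store", json=turn, timeout=5)  # hypothetical route
        except requests.RequestException:
            pending_turns.put(turn)  # re-queue and back off before retrying
            time.sleep(1)


# Started once when the main server boots, alongside its normal request handling.
threading.Thread(target=sync_loop, daemon=True).start()
```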
