Architecture Overview

Project N.E.K.O. is built as a multi-process microservice system where three main servers cooperate through WebSocket, HTTP, and ZeroMQ messaging.

System diagram

[Figure: architecture diagram]

Three-server design

Server          Port    Entry point         Role
Main Server     48911   main_server.py      Web UI, REST API, WebSocket chat, TTS
Memory Server   48912   memory_server.py    Semantic recall, time-indexed history, memory compression
Agent Server    48915   agent_server.py     Background task execution (MCP, Computer Use, Browser Use)

The main server is the user-facing entry point. It serves the Web UI, handles all REST API requests, and maintains WebSocket connections for real-time voice/text chat. The memory and agent servers are internal services that the main server communicates with.
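As a rough illustration, the server layout above can be written down as data. The class name `ServerSpec`, the `internal` flag, and the `127.0.0.1` host are assumptions made for this sketch; only the ports and entry points come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSpec:
    """Hypothetical record describing one server process."""
    name: str
    port: int
    entry_point: str
    internal: bool  # True if only the main server talks to it

    def base_url(self, host: str = "127.0.0.1") -> str:
        # The host is an assumption; the source only lists ports.
        return f"http://{host}:{self.port}"


# Ports and entry points are taken from the table above.
SERVERS = {
    "main": ServerSpec("Main Server", 48911, "main_server.py", internal=False),
    "memory": ServerSpec("Memory Server", 48912, "memory_server.py", internal=True),
    "agent": ServerSpec("Agent Server", 48915, "agent_server.py", internal=True),
}


def user_facing() -> list:
    """Return the keys of servers that clients connect to directly."""
    return [key for key, spec in SERVERS.items() if not spec.internal]
```

With this layout, `user_facing()` yields only the main server, which matches its role as the single entry point.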

Communication patterns

┌──────────────────────────────────────────┐
│              Main Server (:48911)         │
│                                          │
│  FastAPI ─── REST Routers                │
│  WebSocket ─── LLMSessionManager         │
│  ZeroMQ PUB ───┐                         │
│  ZeroMQ PULL ──┼── AgentEventBridge      │
│  HTTP Client ──┤                         │
└────────────────┼─────────────────────────┘

        ┌────────┼────────┐
        │        │        │
        ▼        ▼        ▼
   Memory     Agent    Monitor
   Server     Server   Server
   (:48912)   (:48915) (:48913)
  • Main ↔ Memory: HTTP requests for storing/querying memories
  • Main ↔ Agent: ZeroMQ pub/sub for task delegation and result streaming
  • Main ↔ Monitor: WebSocket for real-time status updates
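The task-delegation and result-streaming flow between the main and agent servers can be approximated in-process. This is a sketch only: standard-library queues stand in for the ZeroMQ PUB and PULL sockets so that it runs without extra dependencies, and `EventBridgeSketch`, `fake_agent`, and the message fields are hypothetical names, not the project's actual `AgentEventBridge` API.

```python
import queue
import threading


class EventBridgeSketch:
    """In-process stand-in for a PUB/PULL bridge; queues replace sockets."""

    def __init__(self) -> None:
        self.outbound = queue.Queue()  # plays the role of the PUB socket
        self.inbound = queue.Queue()   # plays the role of the PULL socket
        self.results = []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._recv_loop, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def publish_task(self, task_id: str, payload: str) -> None:
        # Delegate a task to the agent side.
        self.outbound.put({"task_id": task_id, "payload": payload})

    def _recv_loop(self) -> None:
        # Background receive loop: collect result events as they arrive.
        while not self._stop.is_set():
            try:
                event = self.inbound.get(timeout=0.05)
            except queue.Empty:
                continue
            self.results.append(event)
            self.inbound.task_done()

    def stop(self) -> None:
        self.inbound.join()  # wait until every queued event was handled
        self._stop.set()
        self._thread.join()


def fake_agent(bridge: EventBridgeSketch) -> None:
    """Hypothetical agent: takes one task and streams two events back."""
    task = bridge.outbound.get(timeout=1)
    for status in ("started", "done"):
        bridge.inbound.put({"task_id": task["task_id"], "status": status})
```

In a real deployment the two queues would be replaced by sockets bound to separate endpoints, and the agent would run in its own process.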

Key architectural patterns

Hot-swap sessions

The LLMSessionManager prepares a new LLM session in the background while the current session is still active. When the user ends a conversation turn, it swaps to the pre-warmed session, so the user does not wait for a new session to start. Audio that arrives during the transition is cached and flushed once the swap completes.
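A minimal sketch of the hot-swap idea, assuming an asyncio-based manager. `Session`, `HotSwapSketch`, and every method name here are hypothetical; the real LLMSessionManager may be structured quite differently.

```python
import asyncio
from typing import Optional


class Session:
    """Hypothetical stand-in for one LLM session."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.received = []

    async def send_audio(self, chunk: bytes) -> None:
        self.received.append(chunk)


class HotSwapSketch:
    def __init__(self) -> None:
        self.current = Session("s1")
        self._next_task: Optional[asyncio.Task] = None
        self._swapping = False
        self._cache = []
        self._counter = 1

    async def _warm_up(self) -> Session:
        await asyncio.sleep(0.01)  # placeholder for connection setup
        self._counter += 1
        return Session(f"s{self._counter}")

    def prepare_next(self) -> None:
        # Start warming the next session while the current one stays active.
        if self._next_task is None:
            self._next_task = asyncio.create_task(self._warm_up())

    async def feed_audio(self, chunk: bytes) -> None:
        if self._swapping:
            self._cache.append(chunk)  # hold audio during the transition
        else:
            await self.current.send_audio(chunk)

    async def end_turn(self) -> None:
        if self._next_task is None:
            return
        self._swapping = True
        try:
            self.current = await self._next_task
            # Flush cached audio, in arrival order, into the new session.
            while self._cache:
                await self.current.send_audio(self._cache.pop(0))
        finally:
            self._next_task = None
            self._swapping = False


async def demo_hot_swap():
    mgr = HotSwapSketch()
    mgr.prepare_next()
    await mgr.feed_audio(b"a")                  # delivered to s1
    swap = asyncio.create_task(mgr.end_turn())
    await asyncio.sleep(0)                      # let the swap begin
    await mgr.feed_audio(b"b")                  # cached during the swap
    await swap
    await mgr.feed_audio(b"c")                  # delivered directly to s2
    return mgr.current.label, mgr.current.received
```

Keeping the swap flag set until the cache is empty means chunks that arrive while flushing are still delivered in order.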

Per-character isolation

Each character (identified by lanlan_name) gets its own:

  • LLMSessionManager instance
  • Sync connector thread
  • WebSocket lock
  • Message queue
  • Shutdown event
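The per-character resources listed above could be grouped into one context object per lanlan_name. This is a sketch under assumptions: `CharacterContext` and `CharacterRegistry` are hypothetical names, the session manager is left as a placeholder, and the WebSocket lock is modeled as a `threading.Lock`, although the project may use a different lock type.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CharacterContext:
    """Hypothetical bundle of the resources owned by one character."""
    lanlan_name: str
    session_manager: Optional[object] = None  # placeholder for an LLMSessionManager
    connector_thread: Optional[threading.Thread] = None
    websocket_lock: threading.Lock = field(default_factory=threading.Lock)
    message_queue: queue.Queue = field(default_factory=queue.Queue)
    shutdown_event: threading.Event = field(default_factory=threading.Event)


class CharacterRegistry:
    def __init__(self) -> None:
        self._contexts = {}
        self._guard = threading.Lock()

    def get_or_create(self, lanlan_name: str) -> CharacterContext:
        # Each name maps to exactly one context; nothing is shared across names.
        with self._guard:
            ctx = self._contexts.get(lanlan_name)
            if ctx is None:
                ctx = CharacterContext(lanlan_name)
                self._contexts[lanlan_name] = ctx
            return ctx

    def shutdown(self, lanlan_name: str) -> None:
        # Signal only this character's workers to stop.
        with self._guard:
            ctx = self._contexts.pop(lanlan_name, None)
        if ctx is not None:
            ctx.shutdown_event.set()
```

Because every field uses a `default_factory`, two characters never share a queue, lock, or event, and shutting one down leaves the others untouched.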

Async/sync boundary

FastAPI handlers are async, so blocking work is moved off the event loop: TTS synthesis runs in a dedicated thread and communicates through queues, audio processing uses executor thread pools, and the ZeroMQ event bridge runs a background receive thread.
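One way to picture this boundary is an async handler that hands text to a worker thread through a queue and offloads processing to a thread pool. The functions `synthesize`, `normalize`, `tts_worker`, and `handle_message` are placeholders invented for this sketch; only the overall pattern reflects the description above.

```python
import asyncio
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def synthesize(text: str) -> bytes:
    """Placeholder for blocking TTS synthesis."""
    return text.encode("utf-8")


def normalize(chunk: bytes) -> bytes:
    """Placeholder for blocking audio processing."""
    return chunk.upper()


def tts_worker(requests: queue.Queue, responses: queue.Queue) -> None:
    # Dedicated thread: block on the request queue until a None sentinel arrives.
    while True:
        text = requests.get()
        if text is None:
            break
        responses.put(synthesize(text))


async def handle_message(text, requests, responses, pool) -> bytes:
    loop = asyncio.get_running_loop()
    requests.put(text)
    # Wait for the worker's reply in a thread so the event loop stays free.
    audio = await loop.run_in_executor(None, responses.get)
    # Offload audio processing to the executor pool.
    return await loop.run_in_executor(pool, normalize, audio)


async def demo_boundary() -> bytes:
    requests, responses = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=tts_worker, args=(requests, responses), daemon=True
    )
    worker.start()
    with ThreadPoolExecutor(max_workers=2) as pool:
        result = await handle_message("hello", requests, responses, pool)
    requests.put(None)  # tell the worker thread to exit
    worker.join()
    return result
```

The async handler never calls a blocking function directly; every blocking step either lives in the worker thread or is wrapped in `run_in_executor`.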

Released under the MIT License.