Three-Server Design

Main Server (`main_server.py`, port 48911)

The main server is a FastAPI application that serves as the user-facing entry point for all interactions.

At startup, the main server performs the following steps:

1. Configuration loading: load `config_manager` and initialize character data.
2. Session creation: create an `LLMSessionManager` for each defined character.
3. Static file mounting: mount `/static`, `/user_live2d`, `/user_vrm`, and `/workshop`.
4. Router registration: include all 10 API routers.
5. Event handlers: initialize Steamworks, start the ZeroMQ bridge, preload audio modules, and detect the language.
6. Uvicorn launch: bind to `127.0.0.1:48911`.

Memory Server

The memory server manages persistent conversation history and semantic recall. It stores memory in four layers:

| Layer | Purpose | Backend |
| --- | --- | --- |
| Recent memory | Last N messages per character | JSON files (`recent_*.json`) |
| Time-indexed original | Full conversation history | SQLite table |
| Time-indexed compressed | Summarized history | SQLite table |
| Semantic memory | Embedding-based recall | Vector store |

The server supports five operations:

- Store: Save new conversation turns with timestamps
- Query: Retrieve recent context for LLM prompts
- Search: Semantic similarity search across all history
- Compress: Periodically summarize old conversations to save context window space
- Review: Allow users to browse and correct stored memories
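
The Store and Query operations across the first two layers can be sketched in a few lines of Python. This is a minimal illustration, not the memory server's actual code: the class name `MemoryStore`, the database file `history.db`, the table name `history` and its columns, and the `recent_limit` parameter are all assumptions. Only the `recent_*.json` file pattern and the JSON/SQLite split come from the layer table. The compressed and semantic layers are left out.

```python
from __future__ import annotations

import json
import sqlite3
import time
from pathlib import Path


class MemoryStore:
    """Hypothetical two-layer store: full history in SQLite, last N turns in JSON."""

    def __init__(self, data_dir: Path, recent_limit: int = 10) -> None:
        self.data_dir = Path(data_dir)
        self.data_dir.mkdir(parents=True, exist_ok=True)
        self.recent_limit = recent_limit
        # Database file, table, and column names are assumptions for illustration.
        self.db = sqlite3.connect(str(self.data_dir / "history.db"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS history ("
            "character TEXT NOT NULL, ts REAL NOT NULL, "
            "role TEXT NOT NULL, content TEXT NOT NULL)"
        )

    def _recent_path(self, character: str) -> Path:
        # Follows the recent_*.json naming pattern from the layer table.
        return self.data_dir / f"recent_{character}.json"

    def store(self, character: str, role: str, content: str) -> None:
        """Store: save one turn, with a timestamp, in both layers."""
        ts = time.time()
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT INTO history VALUES (?, ?, ?, ?)",
                (character, ts, role, content),
            )
        recent = self.query_recent(character)
        recent.append({"ts": ts, "role": role, "content": content})
        # Keep only the last N turns in the recent-memory file.
        trimmed = recent[-self.recent_limit:]
        self._recent_path(character).write_text(json.dumps(trimmed), encoding="utf-8")

    def query_recent(self, character: str) -> list[dict]:
        """Query: return the recent turns used to build an LLM prompt."""
        path = self._recent_path(character)
        if not path.exists():
            return []
        return json.loads(path.read_text(encoding="utf-8"))

    def full_history(self, character: str) -> list[tuple]:
        """Return every stored turn for a character, oldest first."""
        rows = self.db.execute(
            "SELECT ts, role, content FROM history "
            "WHERE character = ? ORDER BY ts, rowid",
            (character,),
        )
        return rows.fetchall()
```

Writing each turn to both layers keeps prompt building cheap (one small JSON read) while the SQLite table retains the full history that a later compression pass could summarize.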

Agent Server

The agent server handles background task execution triggered by conversation context. It communicates with the main server over three ZeroMQ sockets:

| Socket | Address | Direction | Purpose |
| --- | --- | --- | --- |
| PUB/SUB | `tcp://127.0.0.1:48961` | Main → Agent | Session events |
| PUSH/PULL | `tcp://127.0.0.1:48962` | Agent → Main | Task results |
| PUSH/PULL | `tcp://127.0.0.1:48963` | Main → Agent | Analyze requests |
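
As a rough sketch of how these three channels could be wired with pyzmq: the addresses, socket types, and directions come from the table, but the function names, the dictionary keys, and the choice of which side binds and which connects are assumptions (here the main server binds all three endpoints).

```python
import zmq

# Addresses taken from the socket table.
SESSION_EVENTS = "tcp://127.0.0.1:48961"    # PUB/SUB,   Main -> Agent
TASK_RESULTS = "tcp://127.0.0.1:48962"      # PUSH/PULL, Agent -> Main
ANALYZE_REQUESTS = "tcp://127.0.0.1:48963"  # PUSH/PULL, Main -> Agent


def open_main_sockets(ctx, events=SESSION_EVENTS, results=TASK_RESULTS,
                      analyze=ANALYZE_REQUESTS):
    """Main-server side. Assumption: the main server binds every endpoint."""
    pub = ctx.socket(zmq.PUB)    # publishes session events
    pub.bind(events)
    pull = ctx.socket(zmq.PULL)  # receives task results
    pull.bind(results)
    push = ctx.socket(zmq.PUSH)  # sends analyze requests
    push.bind(analyze)
    return {"events": pub, "results": pull, "analyze": push}


def open_agent_sockets(ctx, events=SESSION_EVENTS, results=TASK_RESULTS,
                       analyze=ANALYZE_REQUESTS):
    """Agent-server side. Assumption: the agent connects to every endpoint."""
    sub = ctx.socket(zmq.SUB)
    sub.setsockopt(zmq.SUBSCRIBE, b"")  # empty prefix: receive all events
    sub.connect(events)
    push = ctx.socket(zmq.PUSH)  # sends task results
    push.connect(results)
    pull = ctx.socket(zmq.PULL)  # receives analyze requests
    pull.connect(analyze)
    return {"events": sub, "results": push, "analyze": pull}
```

PUB/SUB suits session events because a late or absent subscriber simply misses them, while PUSH/PULL queues each request or result for delivery to exactly one receiver.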

A task moves through the agent server as follows:

1. The main server publishes a task via ZeroMQ.
2. The agent server receives it and creates a task plan (`planner.py`).
3. Actions execute through adapters:
   - MCP Client: Model Context Protocol tool calls
   - Computer Use: screenshot analysis, mouse/keyboard actions
   - Browser Use: web browsing automation
4. Results are analyzed (`analyzer.py`) and deduplicated (`deduper.py`).
5. Final results stream back via ZeroMQ (`task_result`, `proactive_message`).
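
The plan, execute, analyze, and deduplicate stages can be illustrated with a small, self-contained pipeline. Everything in it is hypothetical: the function names `plan`, `analyze`, `dedupe`, and `run_task`, the `Action` type, and the `ADAPTERS` registry stand in for whatever `planner.py`, `analyzer.py`, and `deduper.py` actually do, and the adapters are stubs that return canned strings rather than calling MCP tools, the desktop, or a browser. ZeroMQ transport is left out; `run_task` simply returns the messages that would be sent back.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    adapter: str   # "mcp", "computer_use", or "browser_use"
    argument: str


def plan(task: str) -> list[Action]:
    """Stand-in for planner.py: turn a task into a list of adapter actions."""
    return [
        Action("browser_use", task),
        Action("mcp", task),
        Action("browser_use", task),  # deliberate repeat, removed by dedupe()
    ]


# Stub adapters (assumed names); real ones would perform side effects.
ADAPTERS = {
    "mcp": lambda arg: f"tool result for {arg}",
    "computer_use": lambda arg: f"screen result for {arg}",
    "browser_use": lambda arg: f"web result for {arg}",
}


def analyze(raw: list[str]) -> list[str]:
    """Stand-in for analyzer.py: drop empty output and normalize whitespace."""
    return [" ".join(r.split()) for r in raw if r.strip()]


def dedupe(results: list[str]) -> list[str]:
    """Stand-in for deduper.py: drop repeats while keeping first-seen order."""
    return list(dict.fromkeys(results))


def run_task(task: str) -> list[dict]:
    """Run every stage and build the task_result messages to send back."""
    raw = [ADAPTERS[action.adapter](action.argument) for action in plan(task)]
    results = dedupe(analyze(raw))
    return [{"type": "task_result", "task": task, "result": r} for r in results]
```

Keeping each stage a plain function over lists makes the stages easy to test in isolation, with the transport layer added around `run_task` afterwards.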