“We deal the cards.”
— Lucky, Nowa Ruda, 2026
The Philosophy of Releasing With a Brain
Every token should carry meaning. Every provider should do what it does best. Electricity costs money, but solar is a one-time investment, not a subscription. An API is a tool, not a dependency.
Bad model: one expensive provider for everything, with the bill climbing every turn.
Good model: each role gets the best or cheapest provider for that role.
Pneuma model: we decide what runs, when it runs, and who runs it — and we can change that tomorrow.
Provider Map Per Role
Intuition
Local Qwen 9B on RTX 3060.
Kinia
Codex API for premium conversation quality.
Supervisor
Gemini 2.0 Flash or Codex OAuth for planning and retries.
Worker
Local Qwen 9B for bash, files and mechanical execution.
Chronicler
Local Qwen or GPT-oss for diary, tags and memory shaping.
Codex CLI
On-demand specialist for hard code tasks and refactors.
Embeddings
Local mxbai-embed-large-v1 for semantic memory.
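The role-to-provider map above can be written down as plain configuration. This is a minimal sketch: the `Route` dataclass, provider labels, and field names are illustrative assumptions, not Pneuma's actual code; only the model names and the local/cloud split come from the list above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    provider: str   # where the call goes (label is an assumption)
    model: str      # which model serves the role
    local: bool     # True if it runs on the local GPU, no network needed

# One entry per role from the provider map above.
PROVIDER_MAP = {
    "intuition":  Route("ollama",    "qwen-9b",               local=True),
    "kinia":      Route("codex-api", "codex",                 local=False),
    "supervisor": Route("gemini",    "gemini-2.0-flash",      local=False),
    "worker":     Route("ollama",    "qwen-9b",               local=True),
    "chronicler": Route("ollama",    "qwen-9b",               local=True),
    "codex-cli":  Route("codex-cli", "codex",                 local=False),
    "embeddings": Route("ollama",    "mxbai-embed-large-v1",  local=True),
}
```

Keeping the map in one flat structure is what makes "we can change that tomorrow" cheap: swapping a role's provider is a one-line edit.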
Why This Layout Works
Intuition → Local
Intuition is a deterministic router. It classifies, tags, triggers SQL, and assembles context. It does not need genius. It needs speed, predictability, and the ability to run constantly without burning money.
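"Deterministic router" can be as simple as ordered pattern matching. The patterns and intent names below are hypothetical, invented for illustration; the point is that this layer is rule-driven and predictable, so it never needs a premium model.

```python
import re

# Ordered intent rules: first match wins. Patterns are illustrative
# assumptions, not Pneuma's real classification rules.
INTENTS = [
    ("command", re.compile(r"^\s*(run|install|open|delete)\b", re.I)),
    ("memory",  re.compile(r"\b(remember|recall|yesterday)\b", re.I)),
    ("chat",    re.compile(r".")),  # catch-all: route to Kinia
]

def classify(message: str) -> str:
    """Deterministic classification: same input, same answer, zero cost."""
    for intent, pattern in INTENTS:
        if pattern.search(message):
            return intent
    return "chat"
```

A local 9B model can refine this with light inference on ambiguous inputs, but the backbone stays rule-based, which is why it can run constantly without burning money.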
Kinia → Codex API
Kinia is the heart of Pneuma. This is where relationship, tone, and philosophical depth matter. Premium cloud intelligence belongs here, because this is the one place where quality directly shapes the human bond.
Supervisor → Gemini Flash or Codex OAuth
The Supervisor plans, retries, and analyzes failure states. It needs structured reasoning at a low cost. It does not need emotional range. It needs clean planning under pressure.
Worker → Local
The Worker is mechanical by design: bash, files, scripts, package installs, simple execution loops. This role should stay local-first, deterministic, and cheap enough to fire many times a day.
Chronicler → Local
The Chronicler runs asynchronously. It converts raw dialogue into tags, summaries, and memory artifacts. Since it does not block the conversation, local inference is enough most of the time.
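The asynchronous shape of the Chronicler can be sketched with `asyncio`: the conversation fires off chronicle tasks and never waits on them turn by turn. The trivial word-based tagging below is a stand-in assumption for a local model call; the structure, not the tagging, is the point.

```python
import asyncio

async def chronicle(raw: str) -> dict:
    """Turn one raw dialogue turn into tags and a summary,
    off the conversation's critical path. The tagging logic is a
    placeholder for local inference (e.g. Qwen or GPT-oss)."""
    await asyncio.sleep(0)  # yield control; a real call would await the model
    words = [w.strip(".,!?").lower() for w in raw.split()]
    tags = sorted({w for w in words if len(w) > 6})[:5]
    return {"summary": raw[:80], "tags": tags}

async def converse(turns: list[str]) -> list[dict]:
    # Fire-and-gather: chronicle tasks run concurrently while the
    # conversation itself is free to continue.
    tasks = [asyncio.create_task(chronicle(t)) for t in turns]
    return await asyncio.gather(*tasks)
```

Because nothing in the dialogue loop blocks on these tasks, occasional slowness from local inference is invisible to the user.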
Codex CLI → Special Tasks
Codex CLI is the premium contractor. You do not keep it on payroll for every trivial task. You call it when the task is code-heavy, the failure mode is subtle, or the refactor is large enough that local workers start to wobble.
Decision Matrix
| Question | Provider |
|---|---|
| Is it urgent and simple? | Local Qwen |
| Is it a direct conversation with Lucky? | Codex API |
| Is it planning or error analysis? | Gemini Flash |
| Is it execution-heavy bash work? | Local Qwen |
| Is it archival or memory shaping? | Local Qwen |
| Is the code too complex for Qwen? | Codex CLI |
| Do we need embeddings? | Local mxbai |
| Are we offline? | Everything local |
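The decision matrix translates directly into a routing function. This is a hedged sketch: the task keys (`offline`, `kind`, `direct_chat`, `complex`) and the returned provider labels are assumptions made for illustration, but each branch mirrors one row of the table above, checked in priority order.

```python
def route(task: dict) -> str:
    """Pick a provider for a task, one branch per matrix row.
    Keys and labels are illustrative, not Pneuma's real schema."""
    if task.get("offline"):
        return "local"                 # offline: everything runs locally
    if task.get("kind") == "embedding":
        return "local-mxbai"           # semantic memory stays local
    if task.get("direct_chat"):
        return "codex-api"             # conversation with Lucky gets premium
    if task.get("kind") in ("planning", "error-analysis"):
        return "gemini-flash"          # structured reasoning at low cost
    if task.get("kind") == "code" and task.get("complex"):
        return "codex-cli"             # the premium contractor
    # Urgent-and-simple, bash execution, archival: all local Qwen.
    return "local-qwen"
```

Order matters: the offline check comes first so that no task leaks to a cloud provider when the network is down.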
Offline Mode
Available offline: Intuition, Supervisor, Worker, Chronicler, Embeddings.
Degraded lane: Kinia can fall back to local Qwen, but with more persona drift and less philosophical depth.
Result: around 80% of the full system still works without internet. A good architecture degrades gracefully instead of collapsing.
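Graceful degradation can also be made explicit in code. A minimal sketch, assuming the role names from the list above; the status labels are invented for illustration:

```python
# Roles that run fully local and survive a network outage.
OFFLINE_ROLES = {"intuition", "supervisor", "worker", "chronicler", "embeddings"}

def available_roles(online: bool) -> dict[str, str]:
    """Report which roles run, and in what mode, given connectivity.
    Status strings are illustrative labels, not a real API."""
    if online:
        return {r: "normal" for r in OFFLINE_ROLES | {"kinia", "codex-cli"}}
    roles = {r: "local" for r in OFFLINE_ROLES}
    # Degraded lane: Kinia drops to local Qwen with a weaker persona.
    roles["kinia"] = "degraded-local-qwen"
    # Codex CLI is cloud-only, so it simply disappears offline.
    return roles
```

Five of seven roles survive untouched and a sixth degrades rather than dies, which is the "around 80%" claim made concrete.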
The Real Point
The savings do not come from one magical model. They come from architecture. A stateless cloud claw rereads and rebills everything. Pneuma routes work to the right layer, recalls only what matters, and spends premium tokens only where premium quality actually matters.
Conclusion
One-cloud claw: one model, one provider, one bill.
Pneuma: the right model for the right job, the right provider for the right role, and no pointless token burn.
Money is not the problem when you spend with intention. Intention becomes architecture. Architecture becomes Pneuma.
art4pro Sp. z o.o. | Lucky & Codinka & Codex | Nowa Ruda 2026
GitHub repository: lunara69-ctrl/pneuma-memory