“We deal the cards.”
— Lucky, Nowa Ruda, 2026
The Philosophy of Releasing With a Brain
Every token should carry meaning. Every provider should do what it does best. Electricity costs money, but solar is a one-time investment, not a subscription. An API is a tool, not a dependency.
Bad model: one expensive provider for everything, with the bill climbing every turn.
Good model: each role gets the best or cheapest provider for that role.
Pneuma model: we decide what runs, when it runs, and who runs it — and we can change that tomorrow.
Provider Map Per Role
Intuition
Local Qwen 9B on RTX 3060.
Kinia
Codex API for premium conversation quality.
Supervisor
Gemini 2.0 Flash or Codex OAuth for planning and retries.
Worker
Local Qwen 9B for bash, files and mechanical execution.
Chronicler
Local Qwen or GPT-oss for diary, tags and memory shaping.
Codex CLI
On-demand specialist for hard code tasks and refactors.
Embeddings
Local mxbai-embed-large-v1 for semantic memory.
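The role-to-provider map above can be written down as plain configuration. This is a minimal sketch: the `Route` dataclass, provider labels, and field names are illustrative assumptions, not Pneuma's actual code; only the model names and the local/cloud split come from the list above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    provider: str   # where the call goes (label is an assumption)
    model: str      # which model serves the role
    local: bool     # True if it runs on the local GPU, no network needed

# One entry per role from the provider map above.
PROVIDER_MAP = {
    "intuition":  Route("ollama",    "qwen-9b",               local=True),
    "kinia":      Route("codex-api", "codex",                 local=False),
    "supervisor": Route("gemini",    "gemini-2.0-flash",      local=False),
    "worker":     Route("ollama",    "qwen-9b",               local=True),
    "chronicler": Route("ollama",    "qwen-9b",               local=True),
    "codex-cli":  Route("codex-cli", "codex",                 local=False),
    "embeddings": Route("ollama",    "mxbai-embed-large-v1",  local=True),
}
```

Keeping the map in one flat structure is what makes "we can change that tomorrow" cheap: swapping a role's provider is a one-line edit.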
Why This Layout Works
Intuition → Local
Intuition is a deterministic router. It classifies, tags, triggers SQL, and assembles context. It does not need genius. It needs speed, predictability, and the ability to run constantly without burning money.
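"Deterministic router" can be as simple as ordered pattern matching. The patterns and intent names below are hypothetical, invented for illustration; the point is that this layer is rule-driven and predictable, so it never needs a premium model.

```python
import re

# Ordered intent rules: first match wins. Patterns are illustrative
# assumptions, not Pneuma's real classification rules.
INTENTS = [
    ("command", re.compile(r"^\s*(run|install|open|delete)\b", re.I)),
    ("memory",  re.compile(r"\b(remember|recall|yesterday)\b", re.I)),
    ("chat",    re.compile(r".")),  # catch-all: route to Kinia
]

def classify(message: str) -> str:
    """Deterministic classification: same input, same answer, zero cost."""
    for intent, pattern in INTENTS:
        if pattern.search(message):
            return intent
    return "chat"
```

A local 9B model can refine this with light inference on ambiguous inputs, but the backbone stays rule-based, which is why it can run constantly without burning money.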
Kinia → Codex API
Kinia is the heart of Pneuma. This is where relationship, tone, and philosophical depth matter. Premium cloud intelligence belongs here, because this is the one place where quality directly shapes the human bond.
Supervisor → Gemini Flash or Codex OAuth
The Supervisor plans, retries, and analyzes failure states. It needs structured reasoning at a low cost. It does not need emotional range. It needs clean planning under pressure.
Worker → Local
The Worker is mechanical by design: bash, files, scripts, package installs, simple execution loops. This role should stay local-first, deterministic, and cheap enough to fire many times a day.
Chronicler → Local
The Chronicler runs asynchronously. It converts raw dialogue into tags, summaries, and memory artifacts. Since it does not block the conversation, local inference is enough most of the time.
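The asynchronous shape of the Chronicler can be sketched with `asyncio`: the conversation fires off chronicle tasks and never waits on them turn by turn. The trivial word-based tagging below is a stand-in assumption for a local model call; the structure, not the tagging, is the point.

```python
import asyncio

async def chronicle(raw: str) -> dict:
    """Turn one raw dialogue turn into tags and a summary,
    off the conversation's critical path. The tagging logic is a
    placeholder for local inference (e.g. Qwen or GPT-oss)."""
    await asyncio.sleep(0)  # yield control; a real call would await the model
    words = [w.strip(".,!?").lower() for w in raw.split()]
    tags = sorted({w for w in words if len(w) > 6})[:5]
    return {"summary": raw[:80], "tags": tags}

async def converse(turns: list[str]) -> list[dict]:
    # Fire-and-gather: chronicle tasks run concurrently while the
    # conversation itself is free to continue.
    tasks = [asyncio.create_task(chronicle(t)) for t in turns]
    return await asyncio.gather(*tasks)
```

Because nothing in the dialogue loop blocks on these tasks, occasional slowness from local inference is invisible to the user.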
Codex CLI → Special Tasks
Codex CLI is the premium contractor. You do not keep it on payroll for every trivial task. You call it when the task is code-heavy, the failure mode is subtle, or the refactor is large enough that local workers start to wobble.
Decision Matrix
| Question | Provider |
|---|---|
| Is it urgent and simple? | Local Qwen |
| Is it a direct conversation with Lucky? | Codex API |
| Is it planning or error analysis? | Gemini Flash |
| Is it execution-heavy bash work? | Local Qwen |
| Is it archival or memory shaping? | Local Qwen |
| Is the code too complex for Qwen? | Codex CLI |
| Do we need embeddings? | Local mxbai |
| Are we offline? | Everything local |
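The decision matrix translates directly into a routing function. This is a hedged sketch: the task keys (`offline`, `kind`, `direct_chat`, `complex`) and the returned provider labels are assumptions made for illustration, but each branch mirrors one row of the table above, checked in priority order.

```python
def route(task: dict) -> str:
    """Pick a provider for a task, one branch per matrix row.
    Keys and labels are illustrative, not Pneuma's real schema."""
    if task.get("offline"):
        return "local"                 # offline: everything runs locally
    if task.get("kind") == "embedding":
        return "local-mxbai"           # semantic memory stays local
    if task.get("direct_chat"):
        return "codex-api"             # conversation with Lucky gets premium
    if task.get("kind") in ("planning", "error-analysis"):
        return "gemini-flash"          # structured reasoning at low cost
    if task.get("kind") == "code" and task.get("complex"):
        return "codex-cli"             # the premium contractor
    # Urgent-and-simple, bash execution, archival: all local Qwen.
    return "local-qwen"
```

Order matters: the offline check comes first so that no task leaks to a cloud provider when the network is down.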
Offline Mode
Available offline: Intuition, Supervisor, Worker, Chronicler, Embeddings.
Degraded lane: Kinia can fall back to local Qwen, but with more persona drift and less philosophical depth.
Result: around 80% of the full system still works without internet. A good architecture degrades gracefully instead of collapsing.
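Graceful degradation can also be made explicit in code. A minimal sketch, assuming the role names from the list above; the status labels are invented for illustration:

```python
# Roles that run fully local and survive a network outage.
OFFLINE_ROLES = {"intuition", "supervisor", "worker", "chronicler", "embeddings"}

def available_roles(online: bool) -> dict[str, str]:
    """Report which roles run, and in what mode, given connectivity.
    Status strings are illustrative labels, not a real API."""
    if online:
        return {r: "normal" for r in OFFLINE_ROLES | {"kinia", "codex-cli"}}
    roles = {r: "local" for r in OFFLINE_ROLES}
    # Degraded lane: Kinia drops to local Qwen with a weaker persona.
    roles["kinia"] = "degraded-local-qwen"
    # Codex CLI is cloud-only, so it simply disappears offline.
    return roles
```

Five of seven roles survive untouched and a sixth degrades rather than dies, which is the "around 80%" claim made concrete.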
The Real Point
The savings do not come from one magical model. They come from architecture. A stateless cloud claw rereads and rebills everything. Pneuma routes work to the right layer, recalls only what matters, and spends premium tokens only where premium quality actually matters.
Conclusion
One-cloud claw: one model, one provider, one bill.
Pneuma: the right model for the right job, the right provider for the right role, and no pointless token burn.
Money is not the problem when you spend with intention. Intention becomes architecture. Architecture becomes Pneuma.
art4pro Sp. z o.o. | Lucky & Codinka & Codex | Nowa Ruda 2026
GitHub repository: lunara69-ctrl/pneuma-memory