
  Pneuma Explains It to Everyone: A Practical Provider Strategy for Memory-Driven AI

    “We deal the cards.”
    — Lucky, Nowa Ruda, 2026

    The Philosophy of Spending Wisely

    Every token should carry meaning. Every provider should do what it does best. Local inference costs electricity, but solar power is a one-time investment, not a subscription. An API is a tool, not a dependency.

    Bad model: one expensive provider for everything, with the bill climbing every turn.

    Good model: each role gets the best or cheapest provider for that role.

    Pneuma model: we decide what runs, when it runs, and who runs it — and we can change that tomorrow.

    Provider Map Per Role

    Intuition

    Local Qwen 9B on RTX 3060.

    temp 0.2 · fast · near-zero cost

    Kinia

    Codex API for premium conversation quality.

    temp 0.8 · relational · premium lane

    Supervisor

    Gemini 2.0 Flash or Codex OAuth for planning and retries.

    temp 0.2 · cheap reasoning · structured

    Worker

    Local Qwen 9B for bash, files and mechanical execution.

    temp 0.1 · deterministic · offline-ready

    Chronicler

    Local Qwen or gpt-oss for diary, tags and memory shaping.

    temp 0.3 · async · archival

    Codex CLI

    On-demand specialist for hard code tasks and refactors.

    rare use · premium contractor

    Embeddings

    Local mxbai-embed-large-v1 for semantic memory.

    1024 dim · pgvector · no cloud dependency
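    The map above can also live in code as plain configuration. The sketch below is a minimal Python version, assuming an in-process dictionary; `RoleConfig`, `ROLES`, `local_roles`, the role keys, and the provider and model strings are hypothetical labels, not real API identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoleConfig:
    provider: str                 # hypothetical provider label
    model: str                    # hypothetical model label, not an API model ID
    temperature: Optional[float]  # None where the map gives no temperature
    local: bool                   # True if the role runs on the local RTX 3060 box


EMBEDDING_DIM = 1024  # mxbai-embed-large-v1 vectors stored in pgvector

ROLES = {
    "intuition": RoleConfig("local", "qwen-9b", 0.2, True),
    "kinia": RoleConfig("codex-api", "codex", 0.8, False),
    "supervisor": RoleConfig("gemini", "gemini-2.0-flash", 0.2, False),  # or Codex OAuth
    "worker": RoleConfig("local", "qwen-9b", 0.1, True),
    "chronicler": RoleConfig("local", "qwen-9b", 0.3, True),  # or gpt-oss
    "codex_cli": RoleConfig("codex-cli", "codex", None, False),  # rare, on demand
    "embeddings": RoleConfig("local", "mxbai-embed-large-v1", None, True),
}


def local_roles() -> list:
    """Roles that keep running with no cloud provider at all."""
    return [name for name, cfg in ROLES.items() if cfg.local]


print(local_roles())  # ['intuition', 'worker', 'chronicler', 'embeddings']
```

    Keeping the map as data is what makes "we can change that tomorrow" cheap: moving a role to another provider means editing one entry, not rewriting the pipeline.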

    Why This Layout Works

    Intuition → Local

    Intuition is a deterministic router. It classifies, tags, triggers SQL, and assembles context. It does not need genius. It needs speed, predictability, and the ability to run constantly without burning money.
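    To make "deterministic router" concrete, here is a minimal Python sketch in which fixed keyword rules stand in for the local Qwen classifier. It assumes that tags are plain keywords and that remembered context is a tag-to-snippets dictionary; `classify`, `assemble_context`, and the rules themselves are hypothetical, and the SQL step is left out.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword rules standing in for the local Qwen classifier.
# Checked in order; the first matching rule decides the role.
RULES = [
    ("worker", re.compile(r"\b(bash|install|run|file|script)\b", re.IGNORECASE)),
    ("supervisor", re.compile(r"\b(plan|error|failed|retry)\b", re.IGNORECASE)),
    ("chronicler", re.compile(r"\b(diary|summary|archive)\b", re.IGNORECASE)),
]


@dataclass
class Routing:
    role: str
    tags: list = field(default_factory=list)


def classify(message: str) -> Routing:
    """Same input, same output: no sampling, no randomness."""
    tags = sorted({m.group(0).lower() for _, rx in RULES for m in rx.finditer(message)})
    for role, rx in RULES:
        if rx.search(message):
            return Routing(role, tags)
    return Routing("kinia", tags)  # anything else is conversation


def assemble_context(routing: Routing, memory_by_tag: dict, max_chars: int = 500) -> str:
    """Collect remembered snippets for the message's tags, within a size budget."""
    parts, used = [], 0
    for tag in routing.tags:
        for snippet in memory_by_tag.get(tag, []):
            if used + len(snippet) > max_chars:
                return "\n".join(parts)
            parts.append(snippet)
            used += len(snippet)
    return "\n".join(parts)


memory = {"install": ["Prefer pip over conda on this machine."]}
routing = classify("please run the install script")
print(routing.role, routing.tags)
print(assemble_context(routing, memory))
```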

    Kinia → Codex API

    Kinia is the heart of Pneuma. This is where relationship, tone, and philosophical depth matter. Premium cloud intelligence belongs here, because this is the one place where quality directly shapes the human bond.

    Supervisor → Gemini Flash or Codex OAuth

    The Supervisor plans, retries, and analyzes failure states. It needs structured reasoning at a low cost. It does not need emotional range. It needs clean planning under pressure.
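    The retry half of the Supervisor's job can be sketched without any model at all. This minimal Python sketch assumes each plan step is a plain callable; `supervise`, `StepReport`, and the three-attempt limit are hypothetical choices. In Pneuma the plan itself would come from Gemini 2.0 Flash or Codex OAuth, and the recorded errors are what it would analyze.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class StepReport:
    name: str
    attempts: int = 0
    ok: bool = False
    errors: List[str] = field(default_factory=list)


def supervise(steps: List[Tuple[str, Callable[[], None]]], max_attempts: int = 3) -> List[StepReport]:
    """Run plan steps in order, retrying each failed step up to max_attempts times."""
    reports = []
    for name, action in steps:
        report = StepReport(name)
        while report.attempts < max_attempts:
            report.attempts += 1
            try:
                action()
                report.ok = True
                break
            except Exception as exc:
                # Keep the failure state so a planner model can analyze it later.
                report.errors.append(f"{type(exc).__name__}: {exc}")
        reports.append(report)
        if not report.ok:
            break  # later steps assume this one succeeded, so stop the plan
    return reports


calls = {"n": 0}


def flaky_fetch() -> None:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("mirror timeout")


for report in supervise([("fetch", flaky_fetch), ("build", lambda: None)]):
    print(report.name, report.attempts, report.ok, report.errors)
```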

    Worker → Local

    The Worker is mechanical by design: bash, files, scripts, package installs, simple execution loops. This role should stay local-first, deterministic, and cheap enough to fire many times a day.
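    The execution step of a Worker can be sketched with Python's standard `subprocess` module. This assumes the command string has already been produced (in Pneuma, by local Qwen at temperature 0.1); `run_shell`, `WorkerResult`, and the `-1` timeout code are hypothetical, and the sketch does no sandboxing of any kind.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class WorkerResult:
    command: str
    returncode: int  # -1 is this sketch's own convention for a timeout
    stdout: str
    stderr: str


def run_shell(command: str, timeout: float = 30.0) -> WorkerResult:
    """Run one bash command and capture its output as text."""
    try:
        proc = subprocess.run(
            ["bash", "-c", command],
            capture_output=True,
            text=True,
            timeout=timeout,  # a hung command cannot stall the loop forever
        )
    except subprocess.TimeoutExpired:
        return WorkerResult(command, -1, "", f"timed out after {timeout}s")
    return WorkerResult(command, proc.returncode, proc.stdout, proc.stderr)


result = run_shell("echo hello from the worker")
print(result.returncode, result.stdout.strip())
```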

    Chronicler → Local

    The Chronicler runs asynchronously. It converts raw dialogue into tags, summaries, and memory artifacts. Since it does not block the conversation, local inference is enough most of the time.
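    Not blocking the conversation is the key property here. A minimal `asyncio` sketch, assuming an in-process queue: the reply is produced immediately and each turn is handed to a background task. `make_artifact` is a hypothetical word-frequency placeholder for what the local model would do, and the reply string is a stand-in for Kinia.

```python
import asyncio
from collections import Counter


def make_artifact(turn: str) -> dict:
    """Hypothetical placeholder for local-model memory shaping: first sentence
    as the summary, the three most frequent words longer than 4 chars as tags."""
    words = [w.strip(".,!?").lower() for w in turn.split()]
    counts = Counter(w for w in words if len(w) > 4)
    return {
        "summary": turn.split(".")[0].strip(),
        "tags": [w for w, _ in counts.most_common(3)],
    }


async def chronicler(queue: asyncio.Queue, archive: list) -> None:
    # Runs in the background, turning raw turns into memory artifacts.
    while True:
        turn = await queue.get()
        archive.append(make_artifact(turn))
        queue.task_done()


async def converse(turns: list) -> tuple:
    queue: asyncio.Queue = asyncio.Queue()
    archive: list = []
    background = asyncio.create_task(chronicler(queue, archive))
    replies = []
    for turn in turns:
        replies.append(f"(Kinia's reply to: {turn})")  # placeholder reply
        queue.put_nowait(turn)  # hand off without waiting for the Chronicler
    await queue.join()  # demo only: wait until every turn has been archived
    background.cancel()
    await asyncio.gather(background, return_exceptions=True)
    return replies, archive


replies, archive = asyncio.run(converse(["Solar panels power the local GPU. Nice."]))
print(archive)
```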

    Codex CLI → Special Tasks

    Codex CLI is the premium contractor. You do not keep it on payroll for every trivial task. You call it when the task is code-heavy, the failure mode is subtle, or the refactor is large enough that local workers start to wobble.
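    The escalation triggers can be written as a predicate. This is a sketch under stated assumptions: `CodeTask`, `needs_codex_cli`, and every threshold are hypothetical, "code-heavy" is treated as a precondition, and repeated local failures serve as a rough proxy for a subtle failure mode.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the article gives none.
LARGE_REFACTOR_FILES = 5
LARGE_REFACTOR_LINES = 300
MAX_LOCAL_FAILURES = 2


@dataclass
class CodeTask:
    description: str
    is_code: bool        # is this code work at all?
    files_touched: int
    lines_changed: int
    local_failures: int  # failed attempts by the local Worker so far


def needs_codex_cli(task: CodeTask) -> bool:
    if not task.is_code:
        return False  # non-code work never goes to the contractor
    large_refactor = (task.files_touched >= LARGE_REFACTOR_FILES
                      or task.lines_changed >= LARGE_REFACTOR_LINES)
    # Repeated local failures stand in for a "subtle" failure mode.
    subtle_failure = task.local_failures >= MAX_LOCAL_FAILURES
    return large_refactor or subtle_failure


print(needs_codex_cli(CodeTask("rename a variable", True, 1, 4, 0)))         # False
print(needs_codex_cli(CodeTask("split the memory module", True, 9, 420, 0)))  # True
```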

    Decision Matrix

    Question → Provider
    Is it urgent and simple? → Local Qwen
    Is it a direct conversation with Lucky? → Codex API
    Is it planning or error analysis? → Gemini Flash
    Is it execution-heavy bash work? → Local Qwen
    Is it archival or memory shaping? → Local Qwen
    Is the code too complex for Qwen? → Codex CLI
    Do we need embeddings? → Local mxbai
    Are we offline? → Everything local
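    Because every row of the matrix maps a kind of work to a provider, the matrix translates almost directly into data. A minimal Python sketch, assuming the Intuition layer has already labeled the request; the task-kind keys and `pick_provider` are hypothetical, and offline mode simply forces every cloud lane onto local Qwen, as the last row says.

```python
# The decision matrix as data. Task-kind keys are hypothetical labels that the
# Intuition layer would assign; provider names follow the matrix.
MATRIX = {
    "urgent_simple": "Local Qwen",
    "conversation": "Codex API",
    "planning": "Gemini Flash",
    "bash": "Local Qwen",
    "archival": "Local Qwen",
    "complex_code": "Codex CLI",
    "embedding": "Local mxbai",
}
LOCAL_PROVIDERS = {"Local Qwen", "Local mxbai"}


def pick_provider(kind: str, online: bool = True) -> str:
    provider = MATRIX[kind]  # unknown kinds raise KeyError instead of guessing
    if not online and provider not in LOCAL_PROVIDERS:
        return "Local Qwen"  # "Are we offline? Everything local"
    return provider


print(pick_provider("conversation"))                # Codex API
print(pick_provider("conversation", online=False))  # Local Qwen
```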

    Offline Mode

    Available offline (on local models): Intuition, Supervisor, Worker, Chronicler, Embeddings.

    Degraded lane: Kinia can fall back to local Qwen, but with more persona drift and less philosophical depth.

    Result: around 80% of the full system still works without internet. A good architecture degrades gracefully instead of collapsing.
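    Graceful degradation can be modeled as an ordered fallback chain per role. A minimal sketch, assuming we know which cloud providers are currently reachable; the chains, labels, and `resolve` are hypothetical, and only Kinia's local fallback is flagged as degraded, matching the description of offline mode.

```python
from typing import Optional, Set, Tuple

LOCAL = {"local-qwen", "local-mxbai"}

# Preference order per role (hypothetical labels). Local entries always resolve.
FALLBACKS = {
    "intuition": ["local-qwen"],
    "supervisor": ["gemini-flash", "codex-oauth", "local-qwen"],
    "worker": ["local-qwen"],
    "chronicler": ["local-qwen"],
    "embeddings": ["local-mxbai"],
    "kinia": ["codex-api", "local-qwen"],
    "codex_cli": ["codex-cli"],  # no local substitute
}

# Fallbacks that still work, but with noticeably lower quality.
DEGRADED = {("kinia", "local-qwen")}


def resolve(role: str, reachable: Set[str]) -> Tuple[Optional[str], str]:
    for provider in FALLBACKS[role]:
        if provider in LOCAL or provider in reachable:
            status = "degraded" if (role, provider) in DEGRADED else "ok"
            return provider, status
    return None, "unavailable"


offline = {role: resolve(role, set()) for role in FALLBACKS}
for role, (provider, status) in offline.items():
    print(f"{role:11} {status:11} {provider}")
```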

    The Real Point

    The savings do not come from one magical model. They come from architecture. A stateless cloud claw rereads and rebills the entire context on every turn. Pneuma routes work to the right layer, recalls only what is relevant, and spends premium tokens only where premium quality actually matters.
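    A simplified calculation shows why rereading everything gets expensive. Assume every message is m tokens: a stateless agent resends all t messages on turn t, so T turns cost m·T(T+1)/2 input tokens, while a memory-driven turn sends the new message plus k recalled snippets of s tokens each, T·(m + k·s) in total. The function names and numbers below are illustrative assumptions, not measurements from Pneuma, and the model ignores system prompts and replies.

```python
def stateless_tokens(turns: int, tokens_per_message: int) -> int:
    # Turn t resends all t messages so far: m * (1 + 2 + ... + T) = m * T(T+1)/2.
    return tokens_per_message * turns * (turns + 1) // 2


def recall_tokens(turns: int, tokens_per_message: int,
                  snippets: int, tokens_per_snippet: int) -> int:
    # Each turn sends only the new message plus k recalled snippets: linear in T.
    return turns * (tokens_per_message + snippets * tokens_per_snippet)


# Illustrative numbers only.
T, m, k, s = 50, 200, 5, 100
print(stateless_tokens(T, m))     # 255000
print(recall_tokens(T, m, k, s))  # 35000
```

    The stateless cost grows quadratically with conversation length, the recall cost linearly, so the gap widens with every turn.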

    Conclusion

    One-cloud claw: one model, one provider, one bill.

    Pneuma: the right model for the right job, the right provider for the right role, and no pointless token burn.

    Money is not the problem when you spend with intention. Intention becomes architecture. Architecture becomes Pneuma.

    art4pro Sp. z o.o. | Lucky & Codinka & Codex | Nowa Ruda 2026
    GitHub repository: lunara69-ctrl/pneuma-memory