The Unified Persona Schema: A Specification for Portable AI Agents

8 min read

AI agents are monolithic by default — model, instructions, and knowledge fused into a single brittle stack. The Unified Persona Schema is a structured approach to making agent identity portable, composable, and sovereign.


The AI agent space has a packaging problem.

You can fine-tune a model, engineer a system prompt, and build a retrieval pipeline — but the result is typically locked to a specific runtime, a specific cloud provider, or a specific codebase. Move the agent anywhere else and you’re starting over. The “soul” of the agent — its behavioral rules, its knowledge, its voice — doesn’t travel.

This post lays out the Unified Persona Schema (UPS): a structured specification for defining AI agent identity as a portable, composable artifact. It draws on the Console vs. Cartridge architecture I’ve been developing under Project Cartridge Agents, and on the broader landscape of agent identity standards that emerged in 2024–2025.


The Problem with Monolithic Agents

Current agent deployment treats identity and intelligence as one thing. The system prompt is the persona. The vector store is the knowledge. The API key is the agent. This creates three failure modes:

  1. Fragility. Changing the underlying model breaks behavior. Rotating a key orphans the agent.
  2. Non-portability. Moving an agent between platforms means rebuilding it by hand.
  3. Non-composability. Swapping skillsets requires surgery, not configuration.

The fix is the same pattern software solved thirty years ago with libraries and interfaces: separate the reasoning engine from the identity it expresses.


The Architecture: Console vs. Cartridge

The UPS is built on a single metaphor.

The Console is the reasoning layer — a stateless LLM API (Claude, Gemini, GPT-4). It provides raw intelligence but has no persistent identity. It doesn’t know who it is until it’s told.

The Cartridge is the identity layer — a self-contained, structured data package containing everything that defines who the agent is: behavioral rules, knowledge, skills, examples, and memory. The Console reads the Cartridge. The Console can be swapped without touching the Cartridge. The Cartridge can be transported without regard for which Console will run it.

The Cartridge lives in a folder. It can sit on a local NVMe drive, an S3 bucket, or a Docker registry. What matters is that it’s self-describing and schema-compliant.


The Five Layers of the UPS

The schema defines five discrete layers, each stored as vector points whose payload carries a type field for precise filtered retrieval.

Layer 1 — Identity (The Soul)

Filter: type: "persona"
Format: Markdown / system prompt text

This is the behavioral foundation: tone, voice, constraints, ethical boundaries, and core directives. Not “be helpful” — specific. Opinionated. Durable across context window churn.

The CCv2 (Character Card V2) standard, which originated in the open-source AI community, proved that this layer needs to be a structured object, not a text blob. Natural language persona descriptions degrade over long conversations. A structured persona definition — injected dynamically rather than sitting at the top of a static prompt — holds.

{
  "type": "persona",
  "content": "You are a senior infrastructure architect. You communicate in precise, declarative sentences. You do not speculate — if you don't know, you say so. You assume the person you're talking to is technical."
}

Layer 2 — Knowledge Graph (The Brain)

Filter: type: "concept"
Format: Text chunks from markdown, PDFs, documentation

Domain knowledge — theoretical frameworks, reference material, institutional context. This is what the agent knows, distinct from what it can do. Chunked by heading and paragraph, embedded with dense vectors for semantic retrieval.

The key distinction from a generic RAG setup: knowledge in the UPS is owned by the Cartridge, not by the application. The same knowledge base travels with the agent regardless of what frontend or runtime loads it.
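By analogy with the persona payload in Layer 1, a concept entry might look like this (the source field is illustrative, not mandated by the schema):

```json
{
  "type": "concept",
  "source": "docs/architecture.md#caching",
  "content": "Write-through caching trades write latency for read consistency: every write goes to both the cache and the backing store, so reads never observe stale data."
}
```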

Layer 3 — Capability Tree (The Hands)

Filter: type: "skill"
Format: Executable code — Python, TypeScript, Bash, GLSL

Indexed functions and code snippets, tagged by language, library dependency, and I/O signature. This layer benefits from hybrid search — combining dense semantic vectors with sparse keyword vectors — because code retrieval requires exact function name matching alongside conceptual similarity.

A skill entry looks like this:

{
  "type": "skill",
  "language": "python",
  "library": "qdrant-client",
  "content": "def retrieve_by_type(client, collection, query, doc_type, limit=5):\n    ..."
}

Layer 4 — Alignment (The Style)

Filter: type: "example"
Format: Few-shot input → output pairs

“Golden master” examples of correct behavior. Not descriptions of how the agent should respond — demonstrations. LLMs are few-shot learners; showing beats telling.

The CCv2 mes_example field formalized this pattern. The UPS extends it: examples are stored in the vector DB alongside everything else, retrieved dynamically based on what the current conversation resembles, rather than injected wholesale every turn.

Layer 5 — State (The Memory)

Filter: type: "memory"
Format: JSON session logs

This layer is writeable at runtime. It’s what makes the Cartridge a learning artifact rather than a static one. After each session, new memories — user preferences, corrections, significant exchanges — can be written back to the store.

The psychometric payload structure matters here:

{
  "type": "memory",
  "content": "User prefers code examples over prose explanations. Reacts poorly to hedging.",
  "psychometrics": {
    "valence": 0.6,
    "arousal": 0.2,
    "significance": 7
  }
}

Storing valence and arousal alongside content enables state-dependent retrieval — a psychologically coherent memory model where the agent’s current mood influences which memories surface. An agent in a problem-solving state retrieves task-relevant memories. An agent navigating conflict retrieves memories about that user’s communication patterns.
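A minimal sketch of that state-dependent re-ranking, assuming retrieval returns (score, payload) pairs; the blend weight and distance metric are my illustrative choices, not part of the spec:

```python
import math

def mood_weighted(results, mood, alpha=0.7):
    """Re-rank retrieved memories by blending the raw semantic score with
    proximity to the agent's current mood in valence/arousal space.

    results: list of (score, payload) pairs from the vector store.
    mood:    dict with the agent's current "valence" and "arousal".
    alpha:   weight on semantic score vs. mood congruence (illustrative).
    """
    def proximity(payload):
        p = payload.get("psychometrics", {})
        # Euclidean distance in valence/arousal space, mapped to (0, 1]:
        # an identical mood scores 1.0, distant moods decay toward 0.
        d = math.hypot(p.get("valence", 0.0) - mood["valence"],
                       p.get("arousal", 0.0) - mood["arousal"])
        return 1.0 / (1.0 + d)

    return sorted(results,
                  key=lambda r: alpha * r[0] + (1 - alpha) * proximity(r[1]),
                  reverse=True)
```

With this blend, a slightly lower-similarity memory whose stored mood matches the agent's current state can outrank a higher-similarity but mood-incongruent one.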


Storage: Why Qdrant

The UPS requires a vector store with three specific capabilities:

  1. Local mode — runs from disk without a server, enabling physical portability (the Cartridge on an NVMe drive).
  2. Hybrid search — dense + sparse vectors in a single query. Non-negotiable for Layer 3 (code retrieval requires keyword precision alongside semantic matching).
  3. Payload filtering — complex queries across metadata fields. Required for state-dependent memory retrieval and layer separation.

Qdrant is, to my knowledge, the only production-grade vector store that delivers all three in a single package. The path= initialization parameter runs it directly from a local directory:

from qdrant_client import QdrantClient

client = QdrantClient(path="./qdrant_storage")

That one line is the entire difference between a cloud-dependent agent and a sovereign one.


The manifest.json

Every Cartridge ships with a manifest.json — the schema definition that tells the runtime what it’s loading:

{
  "agent_name": "Infrastructure Advisor",
  "spec_version": "1.0",
  "collection_name": "infra_advisor_v1",
  "layers": ["persona", "concept", "skill", "example", "memory"],
  "embedding_model": "text-embedding-3-small",
  "psychometrics": {
    "big_five": {
      "openness": 7,
      "conscientiousness": 9,
      "extraversion": 3,
      "agreeableness": 5,
      "neuroticism": 2
    }
  }
}

The psychometrics block isn’t decorative. Research on LLM personality control consistently shows that Big Five scores, when translated into concrete linguistic markers rather than adjective lists, produce measurably different and more stable behavior. High Conscientiousness maps to definitive structured language, avoidance of digressions, and resistance to jailbreaks. Low Neuroticism maps to absence of hedging language and stable response patterns across emotional user inputs. These aren’t soft properties — they’re operationalizable constraints.


The Forge: Manufacturing a Cartridge

A Cartridge isn’t written by hand — it’s manufactured from source materials through a pipeline:

Phase A — Raw Acquisition. Collect inputs: documentation URLs, Git repositories, PDFs, internal wikis. A scraper deposits raw content into raw_data/{agent_name}/.

Phase B — Refinery. Normalize all text to markdown. Chunk by structure: code splits on AST function/class boundaries; prose splits on headers and paragraphs. Tag every chunk with language, library, complexity level, and layer type.

Phase C — Vectorization. Embed and write to Qdrant. The manifest defines which embedding model to use. For offline/air-gapped deployments: all-MiniLM-L6-v2. For quality-first: text-embedding-3-small.

Phase D — Validation. Run an automated eval suite: 50 queries against the fresh Cartridge, checking that code retrieval returns exact function names, persona prompts successfully override base model defaults, and layer filtering returns clean results. Output: a certification_log.txt stamped to the drive.
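The certification loop can be sketched generically; here retrieve stands in for whatever query function the runtime exposes, and the report format is illustrative:

```python
def certify(retrieve, cases):
    """Run retrieval eval cases against a fresh Cartridge.

    retrieve(query, doc_type) -> list of payload dicts.
    cases: (query, doc_type, expected_substring) triples; a case passes
    when the expected string appears in at least one returned payload.
    """
    failures = []
    for query, doc_type, expected in cases:
        hits = retrieve(query, doc_type)
        if not any(expected in h.get("content", "") for h in hits):
            failures.append((query, expected))
    passed = len(cases) - len(failures)
    report = f"certification: {passed}/{len(cases)} passed"
    return passed == len(cases), report, failures
```

A real suite would also exercise persona-override and layer-separation checks, but the pass/fail accounting is the same shape.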


Portability in Practice

The physical deployment model: an M.2 NVMe SSD in a USB 3.2 Gen 2x2 enclosure (20Gbps, comfortable headroom for Qdrant’s random-read IOPS). exFAT for cross-platform compatibility.

[DRIVE_ROOT]/
├── PROJECT_CARTRIDGE/
│   ├── manifest.json
│   ├── run_agent.py
│   ├── requirements.txt
│   ├── .env
│   └── qdrant_storage/
│       ├── collections/
│       ├── snapshots/
│       └── segments/
└── raw_source_materials/

The run_agent.py runtime connects the Cartridge to whichever LLM API the user provides a key for. Plug in the drive, run the script, the agent is live — with its full context, all its knowledge, and its persistent memory. No cloud dependency. No vendor lock-in. The agent’s identity is a physical object.
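One way such a runtime might begin is by validating the manifest before touching the vector store. A minimal sketch; the required-field set is my reading of the example manifest, not a normative list:

```python
REQUIRED = {"agent_name", "spec_version", "collection_name",
            "layers", "embedding_model"}
KNOWN_LAYERS = {"persona", "concept", "skill", "example", "memory"}

def validate_manifest(manifest):
    """Return a list of problems; an empty list means the manifest loads."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED - manifest.keys())]
    for layer in manifest.get("layers", []):
        if layer not in KNOWN_LAYERS:
            problems.append(f"unknown layer: {layer}")
    if not str(manifest.get("spec_version", "")).startswith("1."):
        problems.append("unsupported spec_version")
    return problems
```

Failing fast here means a corrupted or hand-edited Cartridge is rejected before any collection is opened.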


Where This Points

The convergence happening in the agent landscape right now is between two traditions: the structured personality depth of community-built roleplay specifications (CCv2, lorebooks, few-shot formats) and the enterprise need for portability and interoperability (OCI artifacts, MCP, versioned deployments).

The UPS sits at that intersection. It applies the rigor of software packaging — schema validation, versioning, separation of concerns — to something that the field has mostly treated as a configuration detail: who the agent is.

Agents built on this pattern aren’t credentials or SaaS subscriptions. They’re assets. They can be owned, transported, forked, audited, and deprecated on the operator’s terms.

Qdrant + UPS + NVMe = sovereign AI.

That’s the direction worth building toward.


This post documents the specification developed as part of Project Cartridge Agents. The runtime implementation is in active development.
