The Permanent Record & AI Peer Review
Asked my AI coworkers to formally review my job performance — then built the evidence infrastructure to make the grades auditable
The Infrastructure — The Permanent Record
Working across multiple AI interfaces (Claude, Gemini, LobeChat, Claude Code) created a fragmentation problem. Months of accumulated institutional knowledge — architecture decisions, debugging sessions, design rationale — existed only in scattered chat logs with no unified retrieval layer.
Built a serverless RAG system entirely on Cloudflare’s edge infrastructure (query path sketched after the list):
- Vectorize for semantic search across 1,700+ knowledge chunks
- D1 (SQLite at the edge) for structured metadata
- R2 for raw conversation archives (S3-compatible, zero egress — the disaster recovery layer)
- Workers AI (bge-m3) for embedding generation — no external API dependency
- Workers for the search API — V8 isolates, zero cold starts, sub-50ms global response
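A minimal sketch of how these pieces might connect on the query path. Binding names (AI, VECTOR_INDEX, DB), the D1 chunks table, and the `{ data }` shape of the bge-m3 embedding response are illustrative assumptions, not taken from the actual codebase:

```ts
export interface Env {
  AI: Ai;                       // Workers AI binding
  VECTOR_INDEX: VectorizeIndex; // Vectorize index binding
  DB: D1Database;               // D1 metadata database
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const query = new URL(request.url).searchParams.get("q");
    if (!query) return new Response("missing ?q=", { status: 400 });

    // 1. Embed the query with bge-m3 on Workers AI (no external API dependency).
    const embedding = (await env.AI.run("@cf/baai/bge-m3", { text: [query] })) as {
      data: number[][];
    };
    const vector = embedding.data[0];

    // 2. Semantic search over the indexed knowledge chunks in Vectorize.
    const { matches } = await env.VECTOR_INDEX.query(vector, { topK: 5 });
    const ids = matches.map((m) => m.id);
    if (ids.length === 0) return Response.json({ query, results: [] });

    // 3. Hydrate structured metadata (source conversation, title, date) from D1.
    const placeholders = ids.map(() => "?").join(",");
    const { results } = await env.DB
      .prepare(`SELECT id, source, title, created_at FROM chunks WHERE id IN (${placeholders})`)
      .bind(...ids)
      .all();

    return Response.json({
      query,
      results: matches.map((m) => ({
        score: m.score,
        meta: results.find((r) => r.id === m.id) ?? null,
      })),
    });
  },
};
```

Keeping structured metadata in D1 rather than in Vectorize's own metadata fields is one way to keep the structured layer independent of any particular index.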
Key architecture decision: all-Cloudflare stack. Single wrangler.toml, one billing dashboard, one deployment pipeline. Deliberate vendor lock-in traded for operational simplicity at this scale, with R2 archive as the portability hedge.
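The write path is the mirror image. A sketch under the same assumptions (binding names and schema are illustrative, and bge-m3 batch limits are ignored), showing how the R2 archive doubles as the rebuild source:

```ts
interface IngestEnv {
  AI: Ai;
  VECTOR_INDEX: VectorizeIndex;
  DB: D1Database;
  ARCHIVE: R2Bucket; // raw conversation archive
}

export async function ingestConversation(
  env: IngestEnv,
  convo: { id: string; title: string; chunks: string[] }
): Promise<void> {
  // 1. Archive the raw conversation in R2 first: the portability hedge.
  //    The Vectorize index and D1 rows can always be rebuilt from these objects.
  await env.ARCHIVE.put(`conversations/${convo.id}.json`, JSON.stringify(convo));

  // 2. Embed the chunks with bge-m3 in a single Workers AI call.
  const { data } = (await env.AI.run("@cf/baai/bge-m3", { text: convo.chunks })) as {
    data: number[][];
  };

  // 3. Vectors into Vectorize, one metadata row per chunk into D1.
  await env.VECTOR_INDEX.upsert(
    data.map((values, i) => ({ id: `${convo.id}:${i}`, values }))
  );
  await env.DB.batch(
    convo.chunks.map((_, i) =>
      env.DB
        .prepare("INSERT OR REPLACE INTO chunks (id, source, title, created_at) VALUES (?, ?, ?, ?)")
        .bind(`${convo.id}:${i}`, convo.id, convo.title, new Date().toISOString())
    )
  );
}
```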
The Application — AI Peer Review System
Hypothesized that my AI collaborators possessed the most complete dataset of my professional behavior. Designed a multi-agent peer review to test whether AI could produce a rigorous, graded, evidence-backed professional assessment.
Three-round methodology (round structure sketched as data after the list):
- Round 1 — Blind Independence: Claude reviewed its own conversation logs; Gemini reviewed its own. No data sharing. Result: significant divergence (GPA delta 0.26). Gemini saw “all start, no finish” because it handled ideation. Claude saw “high execution, low documentation” because it handled coding.
- Round 2 — Cross-Validation: Each agent audited the other’s evidence. Gemini rescinded its “scope abandonment” critique after seeing Claude’s evidence of completed builds. Delta narrowed to 0.12.
- Round 3 — Collaborative Synthesis: Both agents instantiated in a shared LobeChat environment for real-time debate. Final delta: 0.04. Consensus GPA: 3.81/4.0.
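A sketch of the round structure as data, for anyone replicating the protocol. The type names and the 0.05 convergence tolerance are illustrative assumptions, not figures from the actual methodology:

```ts
type Agent = "claude" | "gemini";

interface ReviewRound {
  round: 1 | 2 | 3;
  mode: "blind" | "cross-validation" | "synthesis";
  gpa: Record<Agent, number>;         // each agent's overall grade after the round
  citations: Record<Agent, string[]>; // chunk IDs cited from the Permanent Record
}

// Disagreement is tracked as the absolute difference between the agents' GPAs.
const gpaDelta = (r: ReviewRound): number => Math.abs(r.gpa.claude - r.gpa.gemini);

// Run another round until the panel converges within a tolerance
// (0.05 here is an invented illustration).
const converged = (r: ReviewRound, tolerance = 0.05): boolean => gpaDelta(r) <= tolerance;
```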
The Permanent Record was the retrieval layer that made evidence-based grading possible. Without semantic search over conversation history, the agents would have been working from memory (unreliable) rather than indexed evidence (auditable).
Outcome
The Permanent Record is deployed and live. The Peer Review produced a published field report, two independent audit documents, and a final consensus grade report — all generated through the system, all serving as portfolio artifacts. The methodology itself is documented as a replicable framework.
Key Insight
An isolated AI is a biased observer. A network of AIs cross-validating each other's observations becomes a credible panel. The infrastructure (RAG) enables the methodology (peer review) — architecture serving a purpose beyond search.
Portfolio Signal
- ◈ Deployed RAG system with documented architecture trade-offs (Vectorize over Pinecone, bge-m3 over ada-002)
- ◈ Multi-model orchestration with structured disagreement resolution
- ◈ Methodology design: a replicable framework, not a one-off experiment
- ◈ Recursive self-documentation: the process of evaluating AI collaboration is itself AI collaboration
- ◈ The quality of the output is evidence for the assessment it contains
Corporate Translation
Two skills demonstrated: (1) designing and deploying data infrastructure with clear trade-off reasoning, and (2) building structured processes that produce auditable, evidence-based outcomes. Both are core TPM competencies.