Raising Limner: Notes from a Year of Agent Husbandry

An HR consulting client asked me, in early 2025, for a way to generate hundreds of pixel-art training assets with consistent VGA-era aesthetics. No human art director on every output, no inconsistency between batches, and a reliable way to throw away the work that didn’t hit the bar. The brief produced Limner: Pixel. The discipline that came out of it turned into something I now think of as agent husbandry.

Most of the LLM literature treats agents as tools you configure. You write the system prompt, you wire up the tools, you ship. That mental model is useful for short-task agents, where the goal is one-shot accuracy. It falls apart for agents that need to make judgments, refuse work, and operate across many runs with stable taste. Those agents have to be raised.

What husbandry means here

Husbandry is the deliberate development of something living, over time, toward a stable disposition. The metaphor is practical, not mystical. Agents aren’t pets and they aren’t tools. They are something in between: configurable artifacts whose behavior compounds across many runs, where the quality of the behavior depends on what you put into the relationship over weeks and months. You can configure a one-shot agent in an afternoon. You raise a working one over a year.

The practical claim is this. The agents I trust to do real work are the ones I have iterated with, watched fail in specific ways, corrected, watched fail differently, corrected again, and eventually stopped having to correct. The configuration file is the static record of a long dynamic process. The agent is the artifact, not the prompt.

The original Pixel build, briefly

The client was an HR consultant developing gamified onboarding modules for corporate training. The brief asked for hundreds of stylistically consistent pixel-art assets in a VGA-era video game aesthetic, the visual language was meant to do real curricular work, signaling to the learners that the modules were play rather than compliance. Inconsistent assets would undercut the whole approach. Hiring an art director for that volume would have eaten the budget. The agent was the only economic path.

The first version was a chained pipeline. MidJourney for base imagery, a PixelLab call for conversion, hand inspection, retry on failures. It worked, technically. It also produced a steady fraction of garbage that needed rejection, and the rejection was happening at the wrong layer. I was doing it. The agent was happily passing through everything.

The pivot came when I added a quality gate inside the pipeline. The agent now evaluated each output against an explicit set of stylistic criteria (palette adherence, dithering patterns, character density, the specific conventions of the VGA-era target) and rejected non-conforming work before it reached the queue. The first version of that gate was wrong about half the time. It rejected things that were fine and passed things that were not. The interesting work began once the gate was in place and we started correcting it.

That work is what I want to write about. The bootstrap engineering is on the project page. The practice is here.

The three practices that turned out to matter

Quality gate iteration. The gate is the load-bearing part of a creator agent. Without it, you have a tool that produces a distribution; with it, you have an agent making judgments. But the gate is never right out of the box. Mine started as a paragraph of stylistic criteria written by hand and a directive to “reject non-conforming output.” That was a starting position, not a working tool. The actual gate is what emerged after months of correction: examples added to clarify ambiguous calls, criteria sharpened when the agent kept rejecting borderline cases for the wrong reason, criteria loosened when it was being precious about details the curriculum did not depend on. The discipline is treating the gate as a thing that gets raised alongside the agent, not a constant you configure once.

Feedback that works versus feedback that does not. Telling an agent “this is bad” produces nothing. Telling it “this is bad because the dithering pattern broke at the character outline and we lost the silhouette” produces the next version. The first formulation gives the agent no purchase. The second names a specific failure mode against a specific criterion, which the agent can fold back into its judgment for the next batch. The reason matters as much as the verdict. After enough iterations of structured-feedback-with-reasons, the agent stops failing in those specific ways. Unstructured feedback never produces that convergence, no matter how many rounds you run.

Configuring an agent versus raising one. Configuration is what you do to a fresh agent on day one: prompt, tools, scope. Raising is what you do over the next six months. The artifacts are different. A configuration file is a static record; a raised agent is a track of decisions about how it should behave in classes of cases that came up only because the agent was working. You can ship a configured agent. You cannot ship a raised one, because the artifact is partly the history of correction. What you can ship is a configuration that captures the surface of that history, plus the practice of how you would continue raising it. The agent is the practice, not the file.

Cartridge as engineering judgment

For a while, the HR consultant and I co-developed an experimental hardware variant of the same framework, a portable identity surface bound to physical media, attempting to make agent identity travel across deployment substrates. Two hardware builds: an Argon ONE V5 dual-NVMe case on a Raspberry Pi 5, and a USB-C SSD enclosure variant in the ASUS Cobble. We both tested it in production. We agreed it wasn’t pulling its weight as a deployment pattern. The operational overhead of physical-substrate identity exceeded the benefit. We shelved it.

That outcome is not a failure. Walking away from an idea you have invested in, after a real test, is part of the practice. The framework now lives without the cartridge layer, and the family is better-scoped because of what the cartridge experiment ruled out.

What’s coming

Rasa is the new substrate base, in development. It is medium-agnostic. Pixel rebuilds on it, ASCII runs on it, the next variant will too. Same quality-gating discipline, multiple aesthetic targets. ASCII is the self-initiated variant, working with BBS-era and demoscene aesthetics, in active development. CF Limner is the public-beta rebuild of Pixel on Cloudflare Workers, with a prospect pipeline gated on its release.

The CF Limner rebuild is where the framework moves from a single shipped client engagement to something other people can use. The interesting practical question for that release is how much of the raising practice transfers when other operators are running the agent in their own work, against their own quality bar.

The Limner family is now two agents and a base substrate. Pixel ships work. ASCII is in development. Rasa is the foundation everything else runs on. The cartridge experiment is shelved and informed the rest. They share a quality discipline, not a configuration file.

What I built has taste about X; a plain tool would only have produced more of it. And the next time I need a similar agent, I know how to develop one. The work transfers because the practice transfers. The agent is the artifact, not the prompt.