PHINEAS

Custom-built on a speculative basis for an ESL academy in Singapore. The client relationship is active: the head of English champions the work, and beta-user trials are running with students and faculty.

PHINEAS AI teaching assistant
vinsonconsulting/phineas-app
Private

A six-step state machine that holds Gemini to a CEFR level for ESL/ELL instructors. A part-of-speech-aware vocabulary database carries the primary signal, approved trainer corrections tune the model through many-shot examples, and per-word levels come from the database rather than the model, so they stay deterministic.

Next.js, React, Firebase, Google Gemini, Vercel, Sentry

Context

The head of English at an ESL academy in Singapore needed to produce reading material at defined CEFR levels (the Common European Framework of Reference for Languages, the standard scale for language proficiency), at scale and with deterministic adherence to the level. Off-the-shelf LLM output couldn’t be trusted to stay on-level, even approximately: a passage requested at B1 would drift into B2 vocabulary, slip back to A2 grammar, or include culture-specific references that broke the assessment.

I took on the build on a speculative basis. The head of English championed the work internally and committed beta-trial time from students and faculty.

Approach

The approach is a six-step state machine that forces deterministic output from a probabilistic model. Corpus grounding does the heavy lifting: a custom CEFR corpus of roughly 14,000 core words and phrases, extended with extrapolated morphological variants to speed processing and sharpen determinism.

  1. Topic seed and CEFR target lock
  2. Word-frequency check against the corpus for level-appropriate vocabulary
  3. Grammar constraint check against CEFR descriptors
  4. Draft generation with frozen vocabulary and grammar windows
  5. Self-review by a second agent against the CEFR rubric
  6. Final pass that re-checks frequency drift and outputs a confidence score
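
The vocabulary checks in steps 2 and 6 could be sketched roughly as below. This is a minimal illustration, not the PHINEAS code: the `VOCAB` entries, their levels, and the names `Token`, `VocabReport`, and `checkVocabulary` are all hypothetical. It only shows why a lookup keyed on lemma plus part of speech returns the same level every time, with no model call involved.

```typescript
// Hypothetical sketch: the entries, levels, and names below are illustrative only.
type Cefr = "A1" | "A2" | "B1" | "B2" | "C1" | "C2";
type Pos = "noun" | "verb" | "adj" | "adv";

// Ordered scale, so two levels can be compared by index.
const ORDER: Cefr[] = ["A1", "A2", "B1", "B2", "C1", "C2"];

// Keyed by `${lemma}|${pos}` so one spelling can carry a different level
// for each part of speech ("book" as a noun vs. "book" as a verb).
const VOCAB = new Map<string, Cefr>([
  ["book|noun", "A1"],
  ["book|verb", "A2"],
  ["travel|verb", "A2"],
  ["itinerary|noun", "B2"],
]);

interface Token {
  lemma: string;
  pos: Pos;
}

interface VocabReport {
  aboveTarget: Token[]; // known words whose level is above the target
  unknown: Token[];     // words missing from the database
}

// Pure lookup: the same tokens and target always produce the same report.
function checkVocabulary(tokens: Token[], target: Cefr): VocabReport {
  const limit = ORDER.indexOf(target);
  const report: VocabReport = { aboveTarget: [], unknown: [] };
  for (const token of tokens) {
    const level = VOCAB.get(`${token.lemma}|${token.pos}`);
    if (level === undefined) {
      report.unknown.push(token);
    } else if (ORDER.indexOf(level) > limit) {
      report.aboveTarget.push(token);
    }
  }
  return report;
}
```

Running the same function over the finished draft is one plausible way to do the drift re-check in step 6. How unknown words should be treated (rejected or sent for trainer review) is left open here.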

Each step has a defined input/output schema. The LLM produces probabilistic text inside hard guardrails.
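
One way to picture the schema-per-step idea is a typed transition table, sketched below under stated assumptions. The state names paraphrase the six steps; the `Model` interface, the stubbed step bodies, the hard-coded word and grammar windows, and the confidence formula are all invented for illustration and are not taken from the PHINEAS codebase. The calls are synchronous to keep the sketch short, whereas real model calls would be asynchronous.

```typescript
// Hypothetical sketch: state names paraphrase the six steps; every step body is a stub.
type CefrLevel = "A1" | "A2" | "B1" | "B2" | "C1" | "C2";
type StepState = "seed" | "vocabulary" | "grammar" | "draft" | "review" | "final";
type MachineState = StepState | "done" | "failed";

// Shared context: each step reads some fields and fills in others.
interface PassageContext {
  topic: string;
  target: CefrLevel;
  allowedWords: string[]; // frozen vocabulary window (step 2)
  grammarRules: string[]; // frozen grammar window (step 3)
  draft: string;          // produced in step 4
  confidence: number;     // produced in step 6
  trace: MachineState[];  // states visited, kept for auditing
}

// Stand-in for the model calls; a real version would call an LLM.
interface Model {
  write(ctx: PassageContext): string;
  review(ctx: PassageContext): boolean;
}

type Step = (ctx: PassageContext, model: Model) => { ctx: PassageContext; next: MachineState };

const STEPS: Record<StepState, Step> = {
  seed: (ctx) => ({ ctx, next: ctx.topic.trim() === "" ? "failed" : "vocabulary" }),
  vocabulary: (ctx) => ({ ctx: { ...ctx, allowedWords: ["we", "take", "the", "train"] }, next: "grammar" }),
  grammar: (ctx) => ({ ctx: { ...ctx, grammarRules: ["present simple"] }, next: "draft" }),
  draft: (ctx, model) => ({ ctx: { ...ctx, draft: model.write(ctx) }, next: "review" }),
  review: (ctx, model) => ({ ctx, next: model.review(ctx) ? "final" : "failed" }),
  final: (ctx) => {
    // Invented confidence formula: share of draft words inside the vocabulary window.
    const words = ctx.draft.toLowerCase().match(/[a-z']+/g) ?? [];
    const inWindow = words.filter((w) => ctx.allowedWords.includes(w)).length;
    const confidence = words.length === 0 ? 0 : inWindow / words.length;
    return { ctx: { ...ctx, confidence }, next: "done" };
  },
};

// Runs the table from "seed" until a terminal state is reached.
function runMachine(topic: string, target: CefrLevel, model: Model): { state: MachineState; ctx: PassageContext } {
  let state = "seed" as MachineState;
  let ctx: PassageContext = {
    topic, target, allowedWords: [], grammarRules: [], draft: "", confidence: 0, trace: [],
  };
  while (state !== "done" && state !== "failed") {
    ctx = { ...ctx, trace: [...ctx.trace, state] };
    const result = STEPS[state](ctx, model);
    ctx = result.ctx;
    state = result.next;
  }
  return { state, ctx };
}
```

In this sketch the model is only consulted in the `draft` and `review` states, and its output cannot change which state comes next except through the boolean review verdict. A failed review ends the run here; a bounded retry edge back to `draft` would be an equally reasonable design.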

Outcome

Now at phineas.app, in formal product development. The pilot bar was outcome-based: a generated passage had to be classroom-ready without edits beyond formatting, and it had to clear a unanimous staff review. The target was 60% of passages clearing that bar; the pilot reached just under 85%, which is what moved it into open beta. Beta-user trials are running with students and faculty at the original academy.

The pattern (a rubric-grounded, multi-step state machine for deterministic LLM output) generalizes to any domain where the output needs to hit a precise level or category: medical literacy, legal writing, regulatory compliance, technical documentation tiered by audience.