PHINEAS

PHINEAS

AI teaching assistant using corpus linguistics to rewrite complex text at target reading levels — now in beta at an ESL academy in Singapore

AI Product Development — Education/Accessibility Beta (Singapore) OpenAI API Embeddings Fine-tuning COCA Corpus Google Gemini (Gem) Anvil GCP
PHINEAS AI teaching assistant

Published:

Context

English learners and readers with accessibility needs struggle with text complexity. Existing “simplification” tools are crude — they swap long words for short ones without understanding usage frequency or semantic context. A word isn’t hard because it’s long; it’s hard because learners haven’t encountered it yet.

Approach

Built on the COCA (Corpus of Contemporary American English) word frequency database — 1 billion words of real English usage data:

  1. Corpus Architecture: Embedded COCA frequency database via OpenAI for semantic search across 60,000+ ranked vocabulary items
  2. Analysis Engine: Model identifies words above target frequency thresholds based on CEFR proficiency levels (A1–C2)
  3. Rewrite System: Intelligent substitution replaces complex vocabulary with accessible alternatives, preserving meaning and sentence structure
  4. SME Workflow: Fine-tuning pipeline designed for subject matter experts (ESL teachers) to contribute training examples without requiring technical skills — currently compiling 120-example training batch

Outcome

Currently deployed as a Google Gem (prompt + lexical database) in beta testing at a partner ESL academy in Singapore. Shared with select staff and students while training samples are compiled for the first batch fine-tuning run. Core analysis and rewrite functionality validated. Exceeds original project KPIs for accuracy.

Key Insight

The hard problem isn't the AI — it's the corpus data architecture. And making fine-tuning accessible to non-technical SMEs is a product design challenge, not an engineering challenge.

Portfolio Signal

  • Domain expertise (education/linguistics — COCA corpus, CEFR standards)
  • Embeddings implementation on real-world structured data
  • Fine-tuning workflow design for non-technical contributors
  • Phased delivery: working product in users' hands before optimization
  • International deployment with real user feedback loop

Corporate Translation

Three skills in one project: (a) technical build with embeddings and fine-tuning, (b) PM discipline with phased delivery and measurable KPIs, (c) product management with a real beta program generating real user data.