AI coding agents like Claude Code write excellent Python, but ML is stateful, iterative, and unforgiving. Agents lose context after compaction and fall victim to silent failures, with no way to verify results.
Goldfish is the ML platform built for agents. Contract-based runs. Deterministic validation. AI-powered review. Everything documented automatically. A backbone that transforms a coding agent into a research assistant with perfect recall, infinite patience, and documentation you'd never write yourself.
"A memory I can actually trust."
"Reproducibility by default."
"The missing link between code and insight."
Tools like W&B and MLflow weren't designed for agents. Goldfish is — built around agents' strengths (tireless, precise, great at documentation) and weaknesses (no persistent memory, no intuition for "normal"). One server handles the MLOps so agents can focus on research.
Every decision, result, and rationale is captured. After compaction, agents resume with full context — what they tried, why, and what to do next.
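What a compaction-proof decision log could look like, as a minimal sketch — `log_decision` and `resume_context` are illustrative names, not Goldfish's actual API:

```python
import json

# Hypothetical sketch: an append-only JSONL decision log that an agent
# can replay after compaction to recover what it tried, why, and what
# to do next. Names are illustrative, not Goldfish's real interface.

def log_decision(path, run_id, decision, rationale, next_step):
    entry = {
        "run": run_id,
        "decision": decision,
        "rationale": rationale,
        "next": next_step,
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

def resume_context(path):
    # Replay every entry in order so the agent resumes with full history.
    with open(path) as f:
        return [json.loads(line) for line in f]
```

Because the log is append-only and stored outside the agent's context window, nothing is lost when the window is compacted.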
Every run is versioned before execution. Full lineage from raw data to final model. "What changed between v3 and v4?" is always answerable.
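One way to sketch versioning-before-execution: hash the config to get a version, keep a parent pointer per run, and walk the chain for lineage. Function and field names here are assumptions for illustration:

```python
import hashlib
import json

# Hypothetical sketch: a run is versioned by a content hash of its
# config before execution; a parent pointer gives full lineage.

def version_run(config, parent=None):
    digest = hashlib.sha256(
        json.dumps(config, sort_keys=True).encode()
    ).hexdigest()[:12]
    return {"version": digest, "config": config, "parent": parent}

def lineage(run, registry):
    # Walk parent pointers back to the root (e.g. the raw-data run).
    chain = [run["version"]]
    while run["parent"] is not None:
        run = registry[run["parent"]]
        chain.append(run["version"])
    return chain
```

Content-hashing means two runs with identical configs get the same version, so "what changed between v3 and v4?" reduces to diffing the configs behind two hashes.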
Deterministic checks catch shape mismatches, NaN propagation, and data leakage. AI review catches logic errors. Problems surface before they corrupt results.
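The deterministic side of this can be sketched with framework-free pre-flight checks — shape and finiteness validation on nested-list batches. These helpers are illustrative, not Goldfish's validators:

```python
import math

# Hypothetical sketch: deterministic pre-flight checks that fail loudly
# before a bad batch can corrupt downstream results.

def check_shapes(batch, expected):
    # Verify a 2-D batch matches the expected (rows, cols) shape.
    actual = (len(batch), len(batch[0]))
    if actual != expected:
        raise ValueError(f"shape mismatch: got {actual}, want {expected}")

def check_finite(batch):
    # Reject NaN/Inf values before they propagate through training.
    for row in batch:
        for x in row:
            if math.isnan(x) or math.isinf(x):
                raise ValueError(f"non-finite value {x!r} in batch")
    return True
```

Checks like these are cheap, exact, and need no judgment — which is why they run before the AI review pass that handles logic errors.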
Failed experiments become searchable knowledge. Patterns are extracted, approved, and applied. The same mistake never happens twice.
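A toy version of a searchable failure base — a real system would likely use embeddings, but keyword matching keeps the sketch self-contained. All names here are hypothetical:

```python
# Hypothetical sketch: index each failed run with its error and the
# lesson learned, then search past failures before repeating a mistake.

def index_failure(kb, run_id, error, lesson):
    kb.append({"run": run_id, "error": error, "lesson": lesson})

def search_failures(kb, query):
    # Naive keyword match over error text and extracted lessons.
    terms = query.lower().split()
    return [
        entry for entry in kb
        if any(t in (entry["error"] + " " + entry["lesson"]).lower()
               for t in terms)
    ]
```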
What made run B better than run A? Config diff, metric deltas, outcome tracking — all captured automatically, recallable instantly.
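The comparison described above can be sketched as a config diff plus metric deltas over two run records; the `config`/`metrics` record shape is an assumption for illustration:

```python
# Hypothetical sketch: answer "what made run B better than run A?"
# with a config diff and per-metric deltas.

def compare_runs(run_a, run_b):
    config_diff = {
        key: (run_a["config"].get(key), run_b["config"].get(key))
        for key in set(run_a["config"]) | set(run_b["config"])
        if run_a["config"].get(key) != run_b["config"].get(key)
    }
    metric_deltas = {
        key: run_b["metrics"][key] - run_a["metrics"][key]
        for key in run_a["metrics"].keys() & run_b["metrics"].keys()
    }
    return {"config_diff": config_diff, "metric_deltas": metric_deltas}
```

The diff shows only what changed, so an agent (or a human) can attribute the metric delta to a specific knob instantly.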
Write `profile: h100-spot` and run. Local Docker for iteration, cloud GPUs for training. Agents focus on research, not DevOps.
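As a sketch only — the field names below are illustrative, not Goldfish's actual profile schema — a compute profile might look like:

```yaml
# Hypothetical compute profile; every field name is an assumption.
profile: h100-spot
provider: aws
instance: p5.48xlarge   # H100 class
spot: true
fallback: local-docker  # iterate locally, train in the cloud
max_hourly_cost: 35.00
```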