Careers

ML Platform / MLOps Engineer (Receipts + Model Governance)

Focus: reproducible ML pipelines, dataset/model lineage, staged releases, monitoring, and “receipts” for every prediction.

About the Role

Our platform is built on a hard rule: don’t learn on lies. We co-time streams, gate windows for validity, cancel nuisances, and only then let ML learn patterns in the residual. ML is not a standalone sandbox; it is one part of a contract-driven system that must be auditable end-to-end.

The ML Platform / MLOps Engineer makes this real by building the infrastructure that turns ML into a reliable product: reproducible pipelines, strict lineage, model versioning, safe deployments, and monitoring that tells us when a model should abstain, retrain, or be rolled back.

What You’ll Own

  • Reproducible ML pipelines: deterministic training + evaluation workflows that can be rerun from a receipt.
  • Dataset lineage + artifacts: dataset versioning/hashing, feature snapshots, artifact storage tying models to exact inputs.
  • Model registry + release process: versioning, promotion (dev → stage → prod), canary/rollback, “what model is serving where?”
  • Inference plumbing: batch + near-real-time scoring, attaching receipts (model version, data version, validity gates) to predictions.
  • ML observability: monitoring for data drift, model drift, performance decay, calibration degradation, validity-uptime regressions.
  • Governance guardrails: enforce “train only on valid windows,” require evaluation receipts, block un-audited models from production.

What You’ll Do

  • Build a standard ML workflow: residual datasets → validity filters/weights → train + evaluate → package → register + deploy → monitor + retrain/rollback.
  • Implement receipt generation for ML: dataset snapshot hashes, preprocessing code version, hyperparameters, seeds, gate definitions, eval suite version.
  • Create CI/CD for ML: automated backtests, metric gates, reproducibility checks, scheduled retraining jobs (when appropriate).
  • Build drift + health monitoring: feature drift per site/zone, validity uptime changes, alert-quality proxies, confidence calibration checks.
  • Support portal integration: prediction APIs, model metadata endpoints (“why trust this?”), explanations + evidence artifacts.
  • Work with scientific ML and backend/data teams to keep schemas evolution-safe and pipelines robust under change.
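
To give a feel for the receipt work described above, here is a minimal sketch in Python. It is an illustration only, not our production schema: the `TrainingReceipt` class, its field names, and the `receipt_id` helper are hypothetical, chosen to mirror the items listed in the receipt bullet (dataset snapshot hash, preprocessing code version, hyperparameters, seed, gate definitions, eval suite version). The idea it shows is that a receipt can be serialized canonically and hashed, so identical inputs always produce the same identifier.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class TrainingReceipt:
    """Hypothetical receipt for one training run (field names are assumptions)."""

    dataset_snapshot_hash: str
    preprocessing_code_version: str
    hyperparameters: dict
    seed: int
    gate_definitions: dict
    eval_suite_version: str

    def receipt_id(self) -> str:
        # Canonical JSON: sorted keys and fixed separators, so the same
        # content hashes identically regardless of dict insertion order.
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return sha256_hex(canonical.encode("utf-8"))


# Example values below are made up for illustration.
receipt = TrainingReceipt(
    dataset_snapshot_hash=sha256_hex(b"example dataset bytes"),
    preprocessing_code_version="abc1234",
    hyperparameters={"learning_rate": 0.01, "max_depth": 6},
    seed=42,
    gate_definitions={"min_validity_uptime": 0.95},
    eval_suite_version="v1",
)
print(receipt.receipt_id())
```

Changing any single field (for example the seed) yields a different identifier, which is what lets a rerun be checked against the receipt it claims to reproduce.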

Concrete Deliverables

  • A working ML pipeline skeleton: one workflow from dataset → model → evaluation report → registered artifact.
  • A model registry + promotion flow with staged rollouts and rollback support.
  • An inference scorer/service that writes predictions + receipts to the backend store.
  • A monitoring dashboard: data drift, model drift, calibration health, retraining triggers, validity-uptime regressions.
  • A policy gate that enforces: no training/inference without validity metadata, no deploy without evaluation receipt.
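
As a rough picture of the policy gate in the last deliverable, the two rules can be sketched in a few lines of Python. This is a sketch under stated assumptions, not a description of an existing service: `GateRequest`, `policy_violations`, and `enforce` are hypothetical names, and a real gate would likely validate the contents of the metadata rather than just its presence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GateRequest:
    """Hypothetical request checked by the gate (names are assumptions)."""

    action: str  # one of "train", "infer", "deploy"
    validity_metadata: Optional[dict] = None
    evaluation_receipt_id: Optional[str] = None


def policy_violations(req: GateRequest) -> List[str]:
    """Return the list of rules the request breaks (empty means allowed)."""
    violations = []
    # Rule 1: no training/inference without validity metadata.
    if req.action in ("train", "infer") and not req.validity_metadata:
        violations.append(f"{req.action} blocked: missing validity metadata")
    # Rule 2: no deploy without an evaluation receipt.
    if req.action == "deploy" and not req.evaluation_receipt_id:
        violations.append("deploy blocked: missing evaluation receipt")
    return violations


def enforce(req: GateRequest) -> None:
    """Raise PermissionError if any rule is violated."""
    violations = policy_violations(req)
    if violations:
        raise PermissionError("; ".join(violations))


# A deploy request with no evaluation receipt is rejected.
print(policy_violations(GateRequest(action="deploy")))
```

Returning the violations as a list, and raising only in `enforce`, keeps the rules easy to test and lets a CI job report every failed rule at once.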

Required Qualifications

  • Strong experience in MLOps / ML platform engineering: training pipelines, model packaging, deployment, monitoring.
  • Experience with experiment tracking and artifact management (MLflow, W&B, SageMaker/Vertex, or self-hosted equivalents).
  • Solid backend/data engineering skills: APIs, queues, storage patterns, schema evolution, observability.
  • Comfort with cloud infrastructure and CI/CD: containers, orchestration, secrets management, reliable automation.
  • Ability to reason about reproducibility and leakage risks in time-series ML.

Preferred Qualifications

  • Experience with time-series ML in production (drift is the default state).
  • Familiarity with governed ML: model cards, evaluation gates, audit trails, compliance-friendly logs.
  • Experience with multi-tenant SaaS constraints and cost control (per-customer models vs global models).
  • Comfort with probabilistic outputs and calibration monitoring.
  • DSP/measurement intuition helpful for validity uptime and abstention semantics (not required).

How You’ll Be Measured (First 60–90 Days)

  • A training run is fully reproducible from a receipt (another engineer can rerun and get the same artifact).
  • The first pilot ML feature can be deployed with versioned artifacts, attached prediction receipts, rollback capability, and monitoring.
  • Drift/health dashboards exist and catch at least one meaningful issue (data shift, validity collapse, calibration drift).
  • ML releases become safer and faster because pipeline gates prevent silent regressions.

Working Style

  • You treat reproducibility as a product feature.
  • You prefer boring, deterministic pipelines over clever notebooks.
  • You build guardrails that let the team move fast without breaking trust.

Title & Level

ML Platform / MLOps Engineer (Receipts + Model Governance). This is a senior IC role that can scale to Staff, owning the ML platform architecture. You will partner with Scientific ML, backend/data, validation, and product/UI.

Apply

Send a short note and your resume.

