About the Role
Our platform is built on a hard rule: don’t learn on lies. We co-time streams, gate windows for validity, cancel
nuisances, and only then let ML learn patterns in the residual. ML is not a standalone sandbox—it’s part of a
contract-driven system that must be auditable end-to-end.
The ML Platform / MLOps Engineer makes this real by building the infrastructure that turns ML into a reliable
product: reproducible pipelines, strict lineage, model versioning, safe deployments, and monitoring that tells
us when a model should abstain, retrain, or be rolled back.
What You’ll Own
- Reproducible ML pipelines: deterministic training + evaluation workflows that can be rerun from a receipt.
- Dataset lineage + artifacts: dataset versioning/hashing, feature snapshots, artifact storage tying models to exact inputs.
- Model registry + release process: versioning, promotion (dev → stage → prod), canary/rollback, “what model is serving where?”
- Inference plumbing: batch + near-real-time scoring, attaching receipts (model version, data version, validity gates) to predictions.
- ML observability: monitoring for data drift, model drift, performance decay, calibration degradation, validity-uptime regressions.
- Governance guardrails: enforce “train only on valid windows,” require evaluation receipts, block un-audited models from production.
What You’ll Do
- Build a standard ML workflow: residual datasets → validity filters/weights → train + evaluate → package → register + deploy → monitor + retrain/rollback.
- Implement receipt generation for ML: dataset snapshot hashes, preprocessing code version, hyperparameters, seeds, gate definitions, eval suite version.
- Create CI/CD for ML: automated backtests, metric gates, reproducibility checks, scheduled retraining jobs (when appropriate).
- Build drift + health monitoring: feature drift per site/zone, validity uptime changes, calibration/alert-quality proxies, confidence calibration checks.
- Support portal integration: prediction APIs, model metadata endpoints (“why trust this?”), explanations + evidence artifacts.
- Work with scientific ML and backend/data teams to keep schemas evolution-safe and pipelines robust under change.
Concrete Deliverables
- A working ML pipeline skeleton: one workflow from dataset → model → evaluation report → registered artifact.
- A model registry + promotion flow with staged rollouts and rollback support.
- An inference scorer/service that writes predictions + receipts to the backend store.
- A monitoring dashboard: data drift, model drift, calibration health, retraining triggers, validity-uptime regressions.
- A policy gate that enforces: no training/inference without validity metadata, no deploy without evaluation receipt.
Required Qualifications
- Strong experience in MLOps / ML platform engineering: training pipelines, model packaging, deployment, monitoring.
- Experience with experiment tracking and artifact management (MLflow, W&B, SageMaker/Vertex, or self-hosted equivalents).
- Solid backend/data engineering skills: APIs, queues, storage patterns, schema evolution, observability.
- Comfort with cloud infrastructure and CI/CD: containers, orchestration, secrets management, reliable automation.
- Ability to reason about reproducibility and leakage risks in time-series ML.
Preferred Qualifications
- Experience with time-series ML in production (drift is the default state).
- Familiarity with governed ML: model cards, evaluation gates, audit trails, compliance-friendly logs.
- Experience with multi-tenant SaaS constraints and cost control (per-customer models vs global models).
- Comfort with probabilistic outputs and calibration monitoring.
- DSP/measurement intuition helpful for validity uptime and abstention semantics (not required).
How You’ll Be Measured (First 60–90 Days)
- A training run is fully reproducible from a receipt (another engineer can rerun and get the same artifact).
- The first pilot ML feature can be deployed with versioned artifacts, attached prediction receipts, rollback capability, and monitoring.
- Drift/health dashboards exist and catch at least one meaningful issue (data shift, validity collapse, calibration drift).
- ML releases become safer and faster because pipeline gates prevent silent regressions.
Working Style
- You treat reproducibility as a product feature.
- You prefer boring, deterministic pipelines over clever notebooks.
- You build guardrails that let the team move fast without breaking trust.
Title & Level
ML Platform / MLOps Engineer (Receipts + Model Governance). Senior IC; can scale to Staff, owning the ML platform architecture.
Partners with Scientific ML, backend/data, validation, and product/UI.
Apply
Send a short note and your resume.