Careers

DevOps / Platform Engineer

Focus: deployments, observability, secure updates, fleet management, and “receipts” that prove what ran where.

About the Role

We run a measurement platform where correctness means more than “the service is up.” It also means the data is valid, assumptions are enforced, and we can prove what ran where. The DevOps / Platform Engineer owns the infrastructure and operational systems that keep deployments reliable, observable, and secure across cloud services and (where relevant) device fleets.

This role is for someone who treats reliability, auditability, and secure updates as product features.

What You’ll Do

  • Deployments & environments: build predictable deploy pipelines (dev/stage/prod), safe rollouts (canary/blue-green), environment parity, and rollback paths that actually work.
  • Observability by default: metrics, logs, traces, dashboards, and alerts that cover domain signals (ingest lag, dropped samples, contract failure rates, calibration staleness, drift anomalies) as well as uptime.
  • Fleet management: provisioning, identity, config rollout, and lifecycle ops for devices/edge nodes (where applicable).
  • Secure updates: OTA update mechanisms (firmware + edge software) with version pinning, rollout controls, rollback, and update attestations.
  • Security & compliance fundamentals: secrets management, least-privilege IAM, network policies, vulnerability scanning, dependency hygiene, incident response readiness.
  • Data platform reliability: ensure storage and queues are resilient, backed up, and recoverable; manage retention and cost without breaking reproducibility.
  • CI/CD for scientific software: wire algorithm regression gates (golden datasets, “if notch does not dip…” diagnostics) into pipelines.
  • Run receipts & provenance plumbing: make sure every pipeline run captures build versions, configs, dataset hashes, and execution context.
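
To make “run receipts” concrete, here is a minimal sketch in Python of what a single receipt might capture. It is an illustration under stated assumptions, not a description of our implementation: the function name `build_receipt`, the field names, and the example version string and config values are all hypothetical.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def build_receipt(build_version: str, config: dict, dataset_bytes: bytes) -> dict:
    """Assemble a receipt for one pipeline run (hypothetical schema)."""
    # Serialize the config with sorted keys so that the same settings
    # always hash to the same value, regardless of key order.
    canonical_config = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return {
        "build_version": build_version,
        "config": config,
        "config_sha256": sha256_hex(canonical_config.encode("utf-8")),
        "dataset_sha256": sha256_hex(dataset_bytes),
        "context": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        },
    }


# Example values are made up for illustration.
receipt = build_receipt(
    build_version="1.4.2+abc123",
    config={"sample_rate_hz": 48000, "notch_hz": 60},
    dataset_bytes=b"example dataset bytes",
)
print(json.dumps(receipt, indent=2))
```

A real receipt would likely also be signed and stored next to the run's outputs, so that “what ran where” can be answered later without relying on memory or logs that have rotated away.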

Concrete Deliverables

  • A CI/CD pipeline with lint/unit tests + algorithm regression tests + artifact capture + gated releases.
  • An observability stack: dashboards for service health and measurement integrity, alerts with actionable runbooks.
  • A secure update pipeline for edge/device software: signed artifacts, staged rollout, rollback, audit trails.
  • A fleet operations toolkit: device registry, config rollout, heartbeat/telemetry monitoring, “what version is running where?” queries.
  • A disaster recovery plan: backups, restore tests, and clear RTO/RPO assumptions.
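
As a rough idea of the algorithm regression gate in the first deliverable, the Python sketch below compares a candidate run's metrics against stored golden values within a tolerance. The function `regression_gate`, the metric names, the numbers, and the tolerances are hypothetical placeholders chosen for illustration.

```python
import math


def regression_gate(golden: dict, candidate: dict,
                    rel_tol: float = 1e-6, abs_tol: float = 1e-9) -> list:
    """Return a list of failure messages; an empty list means the gate passes."""
    failures = []
    for name, expected in golden.items():
        if name not in candidate:
            failures.append(f"{name}: missing from candidate output")
            continue
        actual = candidate[name]
        # math.isclose returns False for NaN, so a NaN result fails the gate.
        if not math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol):
            failures.append(f"{name}: expected {expected}, got {actual}")
    return failures


# Made-up golden and candidate metrics for illustration.
golden = {"notch_depth_db": -40.0, "passband_ripple_db": 0.1}
candidate = {"notch_depth_db": -39.2, "passband_ripple_db": 0.1}

failures = regression_gate(golden, candidate, rel_tol=1e-3)
for message in failures:
    print("REGRESSION:", message)
```

In a pipeline, a non-empty `failures` list would typically translate into a non-zero exit code so the release stays blocked until someone either fixes the regression or deliberately updates the goldens.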

Required Qualifications

  • Strong experience with cloud infrastructure and deployment automation (Terraform/Pulumi, Kubernetes or managed services, CI/CD systems).
  • Practical observability experience: metrics/logs/traces, alert tuning, dashboarding, incident response.
  • Security fundamentals: secrets management, IAM, network security, supply chain security (signing, SBOMs), vulnerability management.
  • Ability to design operational systems that support both backend services and edge/device components.

Preferred Qualifications

  • Experience with IoT/edge fleets: provisioning, OTA updates, device identity, certificate rotation, intermittent or offline connectivity.
  • Experience operating data pipelines: streaming ingest, queues, object storage + metadata DB patterns, retention/cost controls.
  • Familiarity with enterprise requirements (SSO, audit logs, change management) and platform guardrails.
  • Experience building regression gates for scientific/numerical systems (goldens, metrics, tolerances).

How You’ll Be Measured (First 60–90 Days)

  • Deployments become safer and more repeatable (clear promotion path, rollbacks that work).
  • Observability improves: the team can detect and diagnose failures faster (including data validity failures, not just 500s).
  • Secure update capability exists for edge/device software with audit trails and controlled rollout.
  • Fleet visibility improves: you can answer “what’s running where?” and spot unhealthy devices quickly.

Working Style

  • You assume things will fail and design for fast detection + fast recovery.
  • You like immutable artifacts, signed builds, and audit logs that settle arguments.
  • You prefer automation over heroics, and runbooks over tribal knowledge.

Title & Level

DevOps / Platform Engineer (mid-to-senior, with room to grow into a Staff role owning platform architecture). You will partner with backend, embedded, and validation teams.

Apply

Send a short note and your resume.

We use your note and resume only to respond to your application. No spam.