HermodLabs — Home Page

About the Role

Our platform’s core idea is: align first, then cancel, then interpret. The most valuable data we produce is not raw telemetry—it’s the residual after co-timing, validity gating, and nuisance cancellation, plus the receipts that explain exactly how that residual was produced.

This role owns the data layer that makes that residual usable: dataset construction pipelines, feature views, schema discipline, and lineage that lets anyone answer “which data went into this output, under what gates, with what versions?”

What You’ll Own

Residual dataset pipelines: raw streams → co-timed streams → canceled residuals → windowed features, at scale and with clear contracts.
Lineage + receipts in data form: every dataset row carries pointers to inputs, versions, configs, calibration state, and validity decisions.
Feature tables / analytics views: stable feature sets for ML and product analytics (validity uptime, alarm quality, commissioning outcomes).
Data quality & validation: prevent silent corruption (missingness, skew, misaligned timestamps, schema drift) with tests and checks.
Backfills & schema evolution: reprocess and version datasets when algorithms or configs change, without breaking consumers.

What You’ll Do

Design canonical schemas for runs/windows, validity verdicts (valid/borderline/invalid + reasons), residual artifacts, and derived features.
Build reproducible pipelines (batch + incremental) with deterministic windowing and feature computation.
Implement dataset receipts: snapshot IDs/hashes, code + config versions, upstream algorithm versions, calibration state, gating rules used.
Create queryable analytics: validity uptime by site/zone/device, abstain reason distributions, commissioning pass rates, pipeline health.
Partner with ML + backend to support training-ready datasets, inference joins, and portal drilldowns (“show the evidence behind this alert”).
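To make "deterministic windowing" and "dataset receipts" concrete, here is a minimal Python sketch. It is illustrative only and not our actual schema: every name in it (window_bounds, DatasetReceipt, config_hash, and all field names) is hypothetical, and it assumes integer timestamps and JSON-serializable configs.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


def window_bounds(t_start: int, t_end: int, width: int, hop: int) -> list[tuple[int, int]]:
    """Return half-open [start, end) windows whose starts lie on multiples of `hop`.

    Anchoring windows to a fixed grid (not to the first sample seen) means a
    batch run and an incremental run over overlapping ranges agree on windows.
    """
    first = -(-t_start // hop) * hop  # round t_start up to the grid
    return [(s, s + width) for s in range(first, t_end - width + 1, hop)]


def config_hash(config: dict) -> str:
    """Hash a config independently of key order (assumes JSON-serializable values)."""
    payload = json.dumps(config, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


@dataclass(frozen=True)
class DatasetReceipt:
    """Hypothetical receipt: the fields mirror the list above, not a real schema."""
    input_snapshot_ids: tuple[str, ...]
    code_version: str
    config_hash: str
    algorithm_versions: tuple[tuple[str, str], ...]  # (algorithm name, version)
    calibration_state_id: str
    gating_rules_version: str

    def receipt_id(self) -> str:
        """Short content-derived ID: identical inputs give an identical ID."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()[:16]
```

In this sketch, two runs over the same snapshots, code, config, and gating rules produce the same receipt ID, so a changed ID signals that something upstream changed.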

Concrete Deliverables

A Residual Dataset Spec: schemas, partitions, naming conventions, versioning strategy, lineage fields.
A working pipeline that produces window-level feature tables suitable for ML training plus validity gate tables with reason codes.
A data quality test suite (CI + scheduled): missingness, timestamp sanity, schema drift detection, outlier flags.
A backfill & migration playbook: recompute safely when co-timing/cancellation code changes while keeping old versions accessible.
A simple analytics dashboard: validity uptime, abstain reasons, pipeline freshness.
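As a rough idea of what the data quality test suite might check, here is a small Python sketch covering missingness, timestamp sanity, and schema drift. It is illustrative only: the function names, the row-as-dict representation, and the "ts" column name are assumptions, and a real suite would likely run on a dataframe or warehouse engine.

```python
from typing import Any

Row = dict[str, Any]


def missing_fraction(rows: list[Row], column: str) -> float:
    """Fraction of rows where `column` is absent or None (0.0 for no rows)."""
    if not rows:
        return 0.0
    missing = sum(1 for r in rows if r.get(column) is None)
    return missing / len(rows)


def non_increasing_timestamps(rows: list[Row], ts_column: str = "ts") -> list[int]:
    """Indices of rows whose timestamp is not strictly greater than the previous one."""
    bad: list[int] = []
    prev = None
    for i, r in enumerate(rows):
        ts = r.get(ts_column)
        if ts is None:
            continue  # missingness is reported separately
        if prev is not None and ts <= prev:
            bad.append(i)
        prev = ts
    return bad


def schema_drift(expected: dict[str, type], row: Row) -> dict[str, list[str]]:
    """Compare one row against an expected {column: type} mapping."""
    return {
        "missing": sorted(k for k in expected if k not in row),
        "unexpected": sorted(k for k in row if k not in expected),
        "wrong_type": sorted(
            k for k, t in expected.items()
            if k in row and row[k] is not None and not isinstance(row[k], t)
        ),
    }
```

Checks like these can run in CI against fixture data and on a schedule against fresh partitions.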

Required Qualifications

Strong experience with data engineering: reliable pipelines, data models, production-grade datasets.
Proficiency with common stacks (choose what fits): SQL + warehouses (Postgres/BigQuery/Snowflake), object storage patterns, orchestration (Airflow/Dagster/Prefect), Python for transforms/validation.
Ability to design schema evolution and versioning strategies that don’t break downstream consumers.
Comfort with time-series data realities: ordering, late arrivals, missing segments, and clock weirdness.

Preferred Qualifications

Experience with lineage/provenance systems (manifests, immutable logs, dataset versioning).
Familiarity with ML feature engineering workflows (feature tables, splits, leakage prevention).
Experience with IoT/device telemetry ingestion and fleet-scale data quality issues.
Basic DSP literacy (windowing, spectral summaries) helpful for feature correctness and sanity checks.

How You’ll Be Measured (First 60–90 Days)

You ship a training-ready residual dataset (and feature view) for at least one pilot type.
Lineage is queryable: given a feature row or alert, the team can trace it back to raw inputs + versions.
Pipeline quality improves: fewer mystery gaps, clearer freshness and missingness reporting.
Backfills are safe and repeatable when co-timing/cancellation code changes.
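"Lineage is queryable" can be pictured as walking parent pointers from an output back to its raw inputs. The Python sketch below assumes lineage is stored as a mapping from artifact ID to parent IDs. That layout, the function name trace_to_raw, and the example IDs are all hypothetical.

```python
def trace_to_raw(artifact_id: str, parents: dict[str, list[str]]) -> set[str]:
    """Follow parent pointers and return the artifacts with no recorded parents.

    In this sketch, an artifact without parents is treated as a raw input.
    """
    raw: set[str] = set()
    seen: set[str] = set()
    stack = [artifact_id]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # shared ancestors are visited once
        seen.add(node)
        upstream = parents.get(node, [])
        if upstream:
            stack.extend(upstream)
        else:
            raw.add(node)
    return raw


# Hypothetical lineage: feature row <- residual <- co-timed stream <- raw streams
lineage = {
    "feature:w42": ["residual:r7"],
    "residual:r7": ["cotimed:c3"],
    "cotimed:c3": ["raw:a", "raw:b"],
}
raw_inputs = trace_to_raw("feature:w42", lineage)
```

Here raw_inputs contains "raw:a" and "raw:b", the two raw streams behind the feature row.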

Working Style

You treat every dataset as a product: documented, versioned, tested, reproducible.
You don’t tolerate silent schema drift or “unknown joins.”
You design for audits: the system can always answer “how was this computed?”

Title & Level

Data Engineer / Analytics Engineer (Residual Datasets + Lineage). Mid-to-senior; can scale to Staff if owning the data model and lineage architecture. Partners with backend/data, Scientific ML, validation, and product/UI.

Apply

Send a short note and your resume.