Alpha — surfaces stabilizing

Adaptive Data Pipelines

Batch and streaming pipelines in a single decorator surface. One binary, no cluster service.

Documentation

Install ematix-flow, declare a connection, and ship your first job + workflow. About 10 minutes end-to-end.

How ematix-flow is put together. Why it exists, what's inside, and the mental model behind jobs, workflows, modes, and streaming.

How-to guides

Recipes for common tasks — schedule with composite triggers, run streaming pipelines, drive the Web UI from the operator's seat.

Reference

Authoritative tables — what's shipped this release, and the TPC-H numbers vs DuckDB / Polars / PySpark.

Featured concepts


Workflows + Jobs

A workflow names a DAG of jobs and declares when it fires. Member jobs declare where they sit inside the DAG.

Composite triggers

Cron, event, message, and boolean combinations (AND, OR, nested), evaluated against the last successful run.
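To make the idea concrete, here is a minimal sketch of how a boolean trigger tree could be evaluated against the last successful run. It is an illustration only, not ematix-flow's implementation: the `Fired`, `And`, `Or`, and `evaluate` names, the `orders_topic` source, and the timestamps are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical trigger tree; none of these classes come from ematix-flow.
@dataclass(frozen=True)
class Fired:
    """Leaf: satisfied if the named source fired after the last successful run."""
    source: str

@dataclass(frozen=True)
class And:
    parts: tuple

@dataclass(frozen=True)
class Or:
    parts: tuple

def evaluate(node, last_fired, last_success):
    """Return True if the trigger tree is satisfied since last_success."""
    if isinstance(node, Fired):
        at = last_fired.get(node.source)
        return at is not None and at > last_success
    if isinstance(node, And):
        return all(evaluate(p, last_fired, last_success) for p in node.parts)
    if isinstance(node, Or):
        return any(evaluate(p, last_fired, last_success) for p in node.parts)
    raise TypeError(f"unknown trigger node: {node!r}")

# cron AND (upstream workflow finished OR a message arrived)
trigger = And((Fired("cron"), Or((Fired("upstream_workflow"), Fired("orders_topic")))))

last_success = datetime(2024, 1, 1, 21, 0)
last_fired = {
    "cron": datetime(2024, 1, 2, 21, 0),
    "upstream_workflow": datetime(2024, 1, 2, 20, 30),
}
should_run = evaluate(trigger, last_fired, last_success)
cron_only = evaluate(trigger, {"cron": datetime(2024, 1, 2, 21, 0)}, last_success)
```

Comparing each leaf's latest firing time with the last successful run means a signal that arrived before that run does not trigger a second one.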

Streaming pipelines

Long-running Kafka / RabbitMQ / Pub/Sub / Kinesis consumers with at-least-once delivery and DLQ support.

Operator Web UI

Workflows / Jobs / Runs / DAG tabs, restart-from-step, live throughput on streaming pipelines.

Benchmarks

TPC-H SF=1, 22 queries, single Apple M3 Pro. Reproducer commands included.

What's shipped

Backend matrix and v0.7.0 surface. What's stable, what's still in motion.

Quick peek


A workflow with a composite trigger (event + cron) plus within-DAG ordering.

from ematix_flow import ematix, ManagedTable, Annotated, BigInt, Text, pk

@ematix.connection
class warehouse:
    kind = "postgres"
    url = "${WAREHOUSE_URL}"

class OrdersExtracted(ManagedTable):
    __schema__ = "analytics"
    __tablename__ = "orders_extracted"
    order_id: Annotated[BigInt, pk()]
    customer_id: BigInt
    amount_cents: BigInt

class OrdersEnriched(ManagedTable):
    __schema__ = "analytics"
    __tablename__ = "orders_enriched"
    order_id: Annotated[BigInt, pk()]
    amount_bucket: Text

@ematix.job(name="extract_orders",
            target=OrdersExtracted, target_connection="warehouse",
            mode="merge", keys=("order_id",))
def extract_orders(conn):
    return "SELECT order_id, customer_id, amount_cents FROM raw.orders"

@ematix.job(name="enrich_orders",
            target=OrdersEnriched, target_connection="warehouse",
            mode="merge", keys=("order_id",),
            depends_on=["extract_orders"])
def enrich_orders(conn):
    return (
        "SELECT order_id, "
        "CASE WHEN amount_cents < 10000 THEN 'small' ELSE 'large' END AS amount_bucket "
        "FROM analytics.orders_extracted"
    )

# Workflow declares the trigger; member jobs declare their DAG position.
ematix.workflow(
    name="orders_etl",
    triggered_by=["upstream_workflow"],
    schedule="0 21 * * *",
    timezone="America/New_York",
    jobs=["extract_orders", "enrich_orders"],
)
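The within-DAG ordering that `depends_on` implies can be sketched with the standard library alone. This illustrates the concept and is not how ematix-flow schedules jobs internally: the `depends_on` dictionary and the `run_order` helper are hypothetical, written to mirror the two job declarations above.

```python
from graphlib import TopologicalSorter

# Hypothetical mirror of the declarations above: job name -> upstream job names.
depends_on = {
    "extract_orders": [],
    "enrich_orders": ["extract_orders"],
}

def run_order(deps):
    """Return job names in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())

order = run_order(depends_on)
print(order)  # ['extract_orders', 'enrich_orders']
```

`TopologicalSorter` raises `graphlib.CycleError` if the declarations form a cycle, the kind of check a DAG needs before any job runs.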

Currently in alpha

On PyPI as ematix-flow. Surfaces are stabilizing; pin the exact version while you try it out. Bug reports and design pushback during the alpha window are exactly what we want — file issues on GitHub.