Batch and streaming pipelines in a single decorator surface. One binary, no cluster service.
Install ematix-flow, declare a connection, and ship your first job + workflow. About 10 minutes end-to-end.
How ematix-flow is put together. Why it exists, what's inside, and the mental model behind jobs, workflows, modes, and streaming.
Recipes for common tasks — schedule with composite triggers, run streaming pipelines, drive the Web UI from the operator's seat.
Authoritative tables — what's shipped this release, and the TPC-H numbers vs DuckDB / Polars / PySpark.
A workflow names a DAG of jobs and declares when it fires. Member jobs declare where they sit inside the DAG.
Cron, event, message, and boolean combinations — AND, OR, nested — evaluated against last successful run.
Long-running Kafka / RabbitMQ / Pub/Sub / Kinesis consumers with at-least-once delivery and DLQ support.
Workflows / Jobs / Runs / DAG tabs, restart-from-step, live throughput on streaming pipelines.
TPC-H SF=1, 22 queries, single Apple M3 Pro. Reproducer commands included.
Backend matrix and v0.7.0 surface. What's stable, what's still in motion.
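As a rough illustration of the composite-trigger idea above (boolean combinations evaluated against the last successful run), here is a minimal sketch in plain Python. It is not ematix-flow code: `Leaf`, `AllOf`, `AnyOf`, and `fired_since` are hypothetical names invented for this example, and "fired after the last successful run" is only one plausible reading of the evaluation rule.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple

# Hypothetical model for illustration only; none of these classes
# come from ematix-flow.

@dataclass(frozen=True)
class Leaf:
    """A single condition, such as a cron tick or an upstream event."""
    name: str
    last_fired: Optional[datetime]  # None means it has never fired

    def fired_since(self, last_success: datetime) -> bool:
        # The condition counts only if it fired after the last successful run.
        return self.last_fired is not None and self.last_fired > last_success

@dataclass(frozen=True)
class AllOf:
    """AND: every child must have fired since the last successful run."""
    children: Tuple

    def fired_since(self, last_success: datetime) -> bool:
        return all(c.fired_since(last_success) for c in self.children)

@dataclass(frozen=True)
class AnyOf:
    """OR: at least one child must have fired since the last successful run."""
    children: Tuple

    def fired_since(self, last_success: datetime) -> bool:
        return any(c.fired_since(last_success) for c in self.children)

last_success = datetime(2024, 1, 1, 21, 0)
cron = Leaf("cron 0 21 * * *", datetime(2024, 1, 2, 21, 0))
event = Leaf("upstream_workflow", None)            # has not fired yet
message = Leaf("orders_topic", datetime(2024, 1, 2, 20, 0))

both = AllOf((cron, event))                         # cron AND event
either = AnyOf((cron, event))                       # cron OR event
nested = AllOf((cron, AnyOf((event, message))))     # cron AND (event OR message)

print(both.fired_since(last_success))    # False: the event is still missing
print(either.fired_since(last_success))  # True: the cron tick is enough
print(nested.fired_since(last_success))  # True: cron fired and so did the message
```

Because each node exposes the same `fired_since` method, nesting needs no special handling: a combination can hold leaves or other combinations.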
A workflow with a composite trigger (event + cron) plus within-DAG ordering.
from ematix_flow import ematix, ManagedTable, Annotated, BigInt, Text, pk


@ematix.connection
class warehouse:
    kind = "postgres"
    url = "${WAREHOUSE_URL}"


class OrdersExtracted(ManagedTable):
    __schema__ = "analytics"
    __tablename__ = "orders_extracted"
    order_id: Annotated[BigInt, pk()]
    customer_id: BigInt
    amount_cents: BigInt


class OrdersEnriched(ManagedTable):
    __schema__ = "analytics"
    __tablename__ = "orders_enriched"
    order_id: Annotated[BigInt, pk()]
    amount_bucket: Text


@ematix.job(name="extract_orders",
            target=OrdersExtracted, target_connection="warehouse",
            mode="merge", keys=("order_id",))
def extract_orders(conn):
    return "SELECT order_id, customer_id, amount_cents FROM raw.orders"


@ematix.job(name="enrich_orders",
            target=OrdersEnriched, target_connection="warehouse",
            mode="merge", keys=("order_id",),
            depends_on=["extract_orders"])
def enrich_orders(conn):
    return "SELECT order_id, CASE WHEN amount_cents < 10000 THEN 'small' ELSE 'large' END AS amount_bucket FROM analytics.orders_extracted"


# Workflow declares the trigger; member jobs declare their DAG position.
ematix.workflow(
    name="orders_etl",
    triggered_by=["upstream_workflow"],
    schedule="0 21 * * *",
    timezone="America/New_York",
    jobs=["extract_orders", "enrich_orders"],
)
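To make the within-DAG ordering concrete: the `depends_on` declarations in the example above imply that `extract_orders` runs before `enrich_orders`. The sketch below derives that order with the standard library's `graphlib`. It is an illustration under assumptions, not a description of how ematix-flow resolves the DAG internally; the `depends_on` dictionary and `run_order` helper are hypothetical stand-ins.

```python
from graphlib import TopologicalSorter

# Hypothetical stand-in for the depends_on declarations in the example:
# each job maps to the list of jobs it must wait for.
depends_on = {
    "extract_orders": [],
    "enrich_orders": ["extract_orders"],
}


def run_order(deps):
    """Return job names so that every job comes after its dependencies."""
    # TopologicalSorter treats each mapping value as the node's predecessors
    # and raises graphlib.CycleError if the declarations form a cycle.
    return list(TopologicalSorter(deps).static_order())


order = run_order(depends_on)
print(order)  # ['extract_orders', 'enrich_orders']
```

A cycle in the declarations (for example, two jobs that each depend on the other) would surface as a `graphlib.CycleError` rather than a silent misordering.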
On PyPI as ematix-flow. Surfaces are stabilizing; pin the exact version while you try it out.
Bug reports and design pushback during the alpha window are exactly what we want; file issues on GitHub.