EMATIX
WORKFLOWS AND JOBS — DECLARATIVE PYTHON DATA PIPELINES.
Rust + Apache Arrow under the hood. Declare a job with one decorator; group jobs into a named workflow with a DAG between them. Move data between databases, files, and streams. Cron schedules, watermarks, schema evolution, restart-safe state, at-least-once delivery — built in. No extra scheduler service to deploy.
User Guide
Install, connections, pipelines, modes, scheduling, streaming, stream processing, CLI. Each chapter is a copy-paste-runnable example.
Specs & Benchmarks
Why ematix-flow exists, what's shipped, TPC-H numbers (1.75× DuckDB, 2.77× Polars, 13.4× PySpark geomean), and how it stacks up against the field.
The Workflows tab ships with flow web. It shows one card per workflow, with the member jobs laid out as an inline SVG flowchart. Arrows show DAG dependencies, and the bar width on each node encodes the duration of its latest run. Click any node to focus the full DAG view on that job. Jobs that do not belong to a workflow show up as workflow-of-one cards with kind: single.
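To make the DAG idea concrete, here is a minimal sketch of how a dependency mapping resolves to a run order. It uses Python's standard graphlib module and the two job names from the example on this page. The run_order helper is hypothetical and is not part of ematix-flow; how the framework schedules jobs internally is not shown here.

```python
from graphlib import TopologicalSorter

# Hypothetical stand-in for a workflow's dependency mapping:
# each key is a job, each value lists the jobs that must finish first.
jobs = ["ingest_events", "rollup_daily"]
depends_on = {"rollup_daily": ["ingest_events"]}


def run_order(jobs, depends_on):
    """Return one valid execution order for the jobs in a workflow DAG."""
    sorter = TopologicalSorter()
    for job in jobs:
        # Jobs with no entry in depends_on have no prerequisites.
        sorter.add(job, *depends_on.get(job, []))
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(sorter.static_order())


print(run_order(jobs, depends_on))  # ['ingest_events', 'rollup_daily']
```

Each arrow in the flowchart corresponds to one predecessor entry in such a mapping: a job becomes runnable only once every job it depends on has completed.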
from ematix_flow import ematix, ManagedTable, Annotated, BigInt, Text, TimestampTZ, pk

@ematix.connection
class app_db:
    kind = "postgres"
    url = "${APP_DB_URL}"

@ematix.connection
class warehouse:
    kind = "postgres"
    url = "${WAREHOUSE_URL}"

# Target tables: typed Python; the framework migrates on first run.
class Events(ManagedTable):
    __schema__ = "analytics"; __tablename__ = "events"
    event_id: Annotated[BigInt, pk()]
    name: Text | None
    received_at: TimestampTZ

class DailyEvents(ManagedTable):
    __schema__ = "analytics"; __tablename__ = "events_daily"
    day: Annotated[Text, pk()]
    count: BigInt

# Two jobs, with the same decorator surface as v0.5.
@ematix.job(
    name="ingest_events",
    source_connection="app_db",
    target=Events,
    target_connection="warehouse",
    schedule="*/5 * * * *",
    mode="append",
)
def ingest_events(conn):
    return "SELECT event_id, name, received_at FROM public.events"

@ematix.job(
    name="rollup_daily",
    source_connection="warehouse",
    target=DailyEvents,
    target_connection="warehouse",
    schedule="0 1 * * *",
    mode="merge", keys=("day",),
)
def rollup_daily(conn):
    return ("SELECT date_trunc('day', received_at)::text AS day, "
            "COUNT(*) AS count FROM analytics.events GROUP BY 1")

# Workflow: names the group and the DAG between its jobs.
ematix.workflow(
    name="events_etl",
    jobs=["ingest_events", "rollup_daily"],
    depends_on={"rollup_daily": ["ingest_events"]},
)

ematix-flow. Surfaces are stabilizing; minor APIs may still shift before beta.