COPYRIGHT 2026 EMATIX SYSTEMS — ALL RIGHTS RESERVED

Architecture

What's actually inside — Rust core, Arrow plane, Python surface.


ematix-flow is a Rust core wrapped in a Python surface. Three concentric layers:

┌───────────────────────────────────────────────────────────┐
│  PYTHON SURFACE                                           │
│   @ematix.pipeline / @ematix.streaming_pipeline           │
│   @ematix.connection / @ematix.table                      │
│   @ematix_flow.udf / .udaf                                │
│   `flow` CLI                                              │
├───────────────────────────────────────────────────────────┤
│  RUST CORE — ematix-flow-core                             │
│   • DataFusion execution plan                             │
│   • Custom physical optimizer rules                       │
│   • Arrow record-batch streaming                          │
│   • Backend trait (Postgres, MySQL, Kafka, …)             │
│   • Run-history store + watermarks                        │
├───────────────────────────────────────────────────────────┤
│  ARROW DATA PLANE                                         │
│   Every byte crossing a backend boundary is Arrow.        │
│   No row-by-row serialization. No intermediate files.     │
└───────────────────────────────────────────────────────────┘
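The backend and watermark bullets can be illustrated with a small Python sketch. It shows the contract the diagram implies: backends exchange columnar batches, never rows, and a run-history entry records a high-water mark so the next run reads only newer data. This is not ematix-flow's API. `Backend`, `read_batches`, `write_batch`, `RunHistory`, `run_incremental`, and the `updated_at` watermark column are hypothetical names, and plain dicts of lists stand in for Arrow record batches.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Protocol

# Hypothetical stand-in for an Arrow RecordBatch: one list per column,
# all columns the same length. Data stays columnar end to end.
Batch = Dict[str, List[object]]


class Backend(Protocol):
    """Hypothetical Python analogue of the Rust-side Backend trait."""

    def read_batches(self, watermark: Optional[int], batch_size: int) -> Iterator[Batch]: ...

    def write_batch(self, batch: Batch) -> int: ...


@dataclass
class ColumnarMemoryBackend:
    """Toy backend that keeps its table as columns in memory."""

    columns: Batch = field(default_factory=dict)

    def read_batches(self, watermark: Optional[int], batch_size: int) -> Iterator[Batch]:
        ts = self.columns.get("updated_at", [])
        # Incremental extraction: keep only positions newer than the watermark.
        keep = [i for i, t in enumerate(ts) if watermark is None or t > watermark]
        for start in range(0, len(keep), batch_size):
            idx = keep[start:start + batch_size]
            # Gather each column as a whole; no per-row objects are built.
            yield {name: [col[i] for i in idx] for name, col in self.columns.items()}

    def write_batch(self, batch: Batch) -> int:
        for name, values in batch.items():
            self.columns.setdefault(name, []).extend(values)
        return len(next(iter(batch.values()), []))


@dataclass
class RunHistory:
    """Per-pipeline high-water marks kept between runs."""

    watermarks: Dict[str, int] = field(default_factory=dict)


def run_incremental(name: str, source: Backend, target: Backend,
                    history: RunHistory, batch_size: int = 2) -> int:
    """Stream new batches from source to target, then advance the watermark."""
    watermark = history.watermarks.get(name)
    loaded = 0
    for batch in source.read_batches(watermark, batch_size):
        loaded += target.write_batch(batch)
        batch_max = max(batch["updated_at"])
        watermark = batch_max if watermark is None else max(watermark, batch_max)
    # Persist the new high-water mark only after every batch has been written.
    if watermark is not None:
        history.watermarks[name] = watermark
    return loaded
```

Running `run_incremental` twice against the same `RunHistory` loads everything the first time and only rows with a newer `updated_at` the second time. In the real system the batches would be Arrow arrays and the history would live in the run-history store, not in memory.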

Why Rust + Arrow?

Why Python on top?

Sibling projects

ematix-parquet — the Parquet codec

ematix-parquet is the hand-rolled Rust Parquet codec that powers ematix-flow's fast scan path. Its main features are hand-tuned SIMD kernels for NEON and AVX2, predicate-fused decoding, adaptive dispatch based on predicate selectivity, full read and write coverage of the Parquet spec, and a small dependency footprint. It ships independently on crates.io as the `ematix-parquet-codec` and `ematix-parquet-io` crates, so you can use the codec on its own without ematix-flow.
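Predicate-fused decoding and selectivity-based dispatch are general columnar-scan techniques. The sketch below shows one common form of both on a dictionary-encoded column chunk. It is plain Python meant to illustrate the idea and is not ematix-parquet's Rust/SIMD implementation. `fused_dict_scan` and `sparse_threshold` are hypothetical names, and choosing between a row-id list and a bitmap is only one possible dispatch decision; the real codec may dispatch on something else.

```python
from typing import Callable, List, Sequence, Tuple, Union


def fused_dict_scan(
    dictionary: Sequence[object],
    indices: Sequence[int],
    predicate: Callable[[object], bool],
    sparse_threshold: float = 0.1,
) -> Tuple[str, Union[List[int], List[bool]], List[object]]:
    """Filter a dictionary-encoded column chunk without decoding rejected rows."""
    # 1. Evaluate the predicate once per distinct value, not once per row.
    dict_mask = [bool(predicate(v)) for v in dictionary]
    # 2. Map each row's dictionary index straight to a pass/fail bit.
    row_mask = [dict_mask[i] for i in indices]
    selectivity = sum(row_mask) / len(row_mask) if row_mask else 0.0
    # 3. Adaptive dispatch: few survivors -> row ids; many survivors -> bitmap.
    if selectivity <= sparse_threshold:
        row_ids = [r for r, keep in enumerate(row_mask) if keep]
        values = [dictionary[indices[r]] for r in row_ids]
        return "sparse", row_ids, values
    # Dense path: still only rows that pass are decoded into values.
    values = [dictionary[i] for i, keep in zip(indices, row_mask) if keep]
    return "dense", row_mask, values
```

The fusion is in steps 1 and 2: the filter runs on the compact dictionary and the integer indices, so values for rejected rows are never materialized.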

See Advantages: hand-tuned Parquet scan path for the performance details behind the TPC-H benchmark results.

ematix-probe — data quality + load testing

ematix-probe is a separate but related framework for data-quality and load testing. In Python, you declare a target (a Postgres table, a Parquet file, a SQL query, or an HTTP endpoint) and the assertions it must satisfy; the framework runs the checks and returns a structured verdict.
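The declare-then-verify flow can be pictured with a minimal sketch. `Probe`, `expect`, `run`, and `Verdict` are hypothetical names for illustration, not ematix-probe's API, and the target is an in-memory list of rows standing in for a Postgres table or Parquet file.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]
Check = Callable[[Sequence[Row]], bool]


@dataclass
class Verdict:
    """Structured result: which target was checked and which assertions failed."""

    target: str
    passed: bool
    failures: List[str] = field(default_factory=list)


@dataclass
class Probe:
    """A target plus the named assertions it must satisfy (hypothetical API)."""

    target: str
    assertions: List[Tuple[str, Check]] = field(default_factory=list)

    def expect(self, name: str, check: Check) -> "Probe":
        # Register an assertion; returning self allows chaining.
        self.assertions.append((name, check))
        return self

    def run(self, rows: Sequence[Row]) -> Verdict:
        # Run every check and collect the names of the ones that fail.
        failures = [name for name, check in self.assertions if not check(rows)]
        return Verdict(self.target, passed=not failures, failures=failures)
```

Returning named failures instead of raising on the first one is what makes the verdict "structured": a caller can report every broken assertion at once. A real framework would fetch rows from the declared target rather than take them as an argument.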

It pairs naturally with ematix-flow: a `@probe` can fire as a `pre_load_transform` or `post_load_transform` step in a pipeline, gating the load on the shape of the data. A pytest plugin is also included, so the same probes can run in CI.
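Gating works because a failed check aborts the step before anything reaches the target. The sketch below assumes, for illustration only, that a pre-load step is a plain callable that receives the rows and returns them. `gate_on`, `load`, and `ProbeFailed` are hypothetical names, not how ematix-flow actually wires `pre_load_transform`.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]
Check = Callable[[Sequence[Row]], bool]
Step = Callable[[List[Row]], List[Row]]


class ProbeFailed(Exception):
    """Raised to abort the load when a gating check fails."""


def gate_on(checks: Dict[str, Check]) -> Step:
    """Build a pre-load step that passes data through only if every check holds."""

    def step(rows: List[Row]) -> List[Row]:
        failed = [name for name, check in checks.items() if not check(rows)]
        if failed:
            # Abort before anything is written to the target.
            raise ProbeFailed(f"load blocked by: {', '.join(failed)}")
        return rows

    return step


def load(rows: List[Row], target: List[Row], pre_load: Step) -> int:
    """Toy loader: run the gate, then append to an in-memory target."""
    checked = pre_load(rows)
    target.extend(checked)
    return len(checked)
```

A post-load variant would run the same checks against the target after the write and would have to roll back or alert instead of blocking.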

ematix-probe is CLI-driven, and neither project ships a web UI today. Run history is queryable via `flow runs ...` (ematix-flow) and `ematix-probe runs ...` (ematix-probe).

