Benchmarks
TPC-H SF=1, 22 queries, Apple M3 Pro — median ms ± σ vs DuckDB, Polars, PySpark.
Same-machine TPC-H benchmark (Apple M3 Pro, single-node) over all 22 queries
against SF=1 Parquet data. ematix-flow / DuckDB / Polars run in-process;
PySpark runs in local[*] mode against the same files.
Scope: every number on this page is single-node. ematix-flow also has an auto-detected distributed mode (an Arrow Flight peer mesh; see Advantages §5). Cluster-scale TPC-H runs at SF≥100 will land in a later release; the bench harness (`tpch_distributed`) is already in the repo.
- ematix-flow / DuckDB / Polars: 10 timed trials after 3 warmups (v0.4.0–v0.5.0; unchanged, refreshed 2026-05-20).
- PySpark: 3 trials after 1 warmup, Spark 4.1.1 on JDK 23 (refreshed on the same machine, same data, same day).
- Data: `examples/tpch/data/sf1`.
Each ematix-flow / DuckDB / Polars cell is median ms ± σ across the 10 timed trials; PySpark cells are median ms across 3 trials. “—” means the engine couldn’t parse or execute the query (dialect gap).
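The per-cell methodology is to discard the warmups and then report the median and σ over the timed trials. In code, that amounts to roughly the following. This is a sketch, not the `tpch_triangulation_bench` harness. `bench` and `run_query` are hypothetical names. The page also doesn't say whether σ is the sample or population standard deviation, so the sketch assumes sample σ.

```python
import statistics
import time
from typing import Callable


def bench(run_query: Callable[[], object], warmups: int = 3, trials: int = 10) -> tuple[float, float]:
    """Return (median_ms, sigma_ms) for one engine/query cell.

    Hypothetical helper: `run_query` executes one query end to end.
    """
    # Warmup runs fill caches (page cache, allocator pools, JIT) and are discarded.
    for _ in range(warmups):
        run_query()

    samples_ms = []
    for _ in range(trials):
        start = time.perf_counter()
        run_query()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    # Assumption: sample standard deviation (needs trials >= 2).
    return statistics.median(samples_ms), statistics.stdev(samples_ms)


# Usage with a stand-in workload; the PySpark column would use warmups=1, trials=3.
median_ms, sigma_ms = bench(lambda: sum(range(100_000)))
print(f"{median_ms:.2f} ± {sigma_ms:.2f} ms")
```

The median keeps a single slow trial (GC pause, background I/O) from skewing a cell. σ is reported alongside it so noisy cells such as Q10 (± 10.44) stand out.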
Headline
- Geomean speedup of ematix-flow (v0.4.0–v0.5.0; unchanged, 10-trial refresh 2026-05-20):
  - 1.75× vs DuckDB (was 1.69× at v0.3.0)
  - 2.77× vs Polars (was 2.71×)
  - 13.4× vs PySpark local[*] (was 12.9×)
- Win counts (lowest median per query): ematix-flow 19, DuckDB 1, Polars 2, PySpark 0.
The geomean improvement over v0.3.0 comes mainly from the ematix-parquet v0.13.0 bump: full SIMD specialisation for bw=1..=32, with the Q06 scan kernel at -18.7% and Q17 at -9.5% when measured in isolation. The Σ.F.1 shape-catalog substrate auto-loads the previously hand-wired optimizer rules. It helped by making the 10-trial / 3-warmup bench config practical, not by speeding up execution. The features that closed out the “What’s not shipped” list (warehouse backends, Web UI, secrets, distributed peer auto-detection) are orthogonal to the scan/aggregate hot path.
v0.5.0 ships the same query-execution surface as v0.4.0. The kernel work lives in the sibling ematix-parquet codec. v0.5.0's own changes are operational (CLIs, Web UI, alerters, observability), so per-query times match v0.4.0.
Full table
| Query | ematix-flow (ms) | DuckDB (ms) | Polars (ms) | PySpark (ms) | Best |
|---|---|---|---|---|---|
| Q01 | 28.63 ± 0.61 | 45.24 ± 0.20 | 38.52 ± 0.84 | 196.5 | ematix-flow |
| Q02 | 9.85 ± 0.21 | 19.07 ± 0.62 | 46.07 ± 0.65 | 290.7 | ematix-flow |
| Q03 | 13.96 ± 1.40 | 32.70 ± 0.65 | 46.00 ± 0.86 | 288.2 | ematix-flow |
| Q04 | 13.21 ± 0.43 | 23.07 ± 2.21 | 25.28 ± 1.51 | 226.1 | ematix-flow |
| Q05 | 21.59 ± 0.93 | 31.48 ± 0.70 | 11150.72 ± 689.69 | 364.2 | ematix-flow |
| Q06 | 11.04 ± 1.41 | 11.94 ± 0.20 | 10.16 ± 0.27 | 68.3 | Polars |
| Q07 | 28.79 ± 1.15 | 32.65 ± 0.93 | 115.31 ± 3.89 | 286.8 | ematix-flow |
| Q08 | 20.41 ± 0.67 | 38.26 ± 0.41 | 93.62 ± 7.78 | 209.8 | ematix-flow |
| Q09 | 26.30 ± 1.36 | 60.67 ± 1.63 | 47.96 ± 1.36 | 461.3 | ematix-flow |
| Q10 | 28.83 ± 10.44 | 68.29 ± 2.23 | 111.80 ± 8.15 | 421.9 | ematix-flow |
| Q11 | 8.65 ± 0.31 | 11.62 ± 0.62 | 9.35 ± 5.04 | 139.1 | ematix-flow |
| Q12 | 14.85 ± 0.37 | 24.37 ± 0.68 | 19.06 ± 0.86 | 288.4 | ematix-flow |
| Q13 | 41.68 ± 0.73 | 147.33 ± 2.06 | 117.00 ± 4.13 | 694.2 | ematix-flow |
| Q14 | 12.13 ± 1.00 | 24.22 ± 1.04 | 13.01 ± 0.78 | 138.3 | ematix-flow |
| Q15 | 16.25 ± 0.92 | 15.69 ± 1.87 | 11.48 ± 0.22 | 166.4 | Polars |
| Q16 | 8.76 ± 1.48 | 26.00 ± 4.35 | 21.29 ± 0.71 | 211.5 | ematix-flow |
| Q17 | 36.85 ± 2.24 | 28.48 ± 1.62 | 42.04 ± 2.96 | 239.4 | DuckDB |
| Q18 | 51.21 ± 3.06 | 52.37 ± 1.31 | 59.19 ± 2.32 | 569.1 | ematix-flow |
| Q19 | 17.79 ± 1.89 | 36.82 ± 3.48 | 106.55 ± 9.04 | 111.4 | ematix-flow |
| Q20 | 16.34 ± 0.85 | 39.11 ± 3.04 | 23.30 ± 2.39 | 148.8 | ematix-flow |
| Q21 | 41.08 ± 1.67 | 87.04 ± 2.18 | 730.68 ± 39.43 | 648.5 | ematix-flow |
| Q22 | 8.62 ± 0.52 | 22.40 ± 0.65 | 12.97 ± 1.67 | 280.2 | ematix-flow |
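The Headline geomeans and win counts can be recomputed from the medians above. The geomean speedup is the geometric mean, over all 22 queries, of baseline_ms / ematix-flow_ms. A win is the lowest median for a query. Below is a minimal Python sketch, not the repo's reporting code. With these rounded medians it should land on the 1.75×, 2.77×, and 13.4× headline figures after rounding, and on the 19 / 1 / 2 / 0 win split.

```python
import math

ENGINES = ("ematix-flow", "DuckDB", "Polars", "PySpark")

# Median ms per query, copied from the full table, in ENGINES order.
MEDIANS_MS = {
    "Q01": (28.63, 45.24, 38.52, 196.5),
    "Q02": (9.85, 19.07, 46.07, 290.7),
    "Q03": (13.96, 32.70, 46.00, 288.2),
    "Q04": (13.21, 23.07, 25.28, 226.1),
    "Q05": (21.59, 31.48, 11150.72, 364.2),
    "Q06": (11.04, 11.94, 10.16, 68.3),
    "Q07": (28.79, 32.65, 115.31, 286.8),
    "Q08": (20.41, 38.26, 93.62, 209.8),
    "Q09": (26.30, 60.67, 47.96, 461.3),
    "Q10": (28.83, 68.29, 111.80, 421.9),
    "Q11": (8.65, 11.62, 9.35, 139.1),
    "Q12": (14.85, 24.37, 19.06, 288.4),
    "Q13": (41.68, 147.33, 117.00, 694.2),
    "Q14": (12.13, 24.22, 13.01, 138.3),
    "Q15": (16.25, 15.69, 11.48, 166.4),
    "Q16": (8.76, 26.00, 21.29, 211.5),
    "Q17": (36.85, 28.48, 42.04, 239.4),
    "Q18": (51.21, 52.37, 59.19, 569.1),
    "Q19": (17.79, 36.82, 106.55, 111.4),
    "Q20": (16.34, 39.11, 23.30, 148.8),
    "Q21": (41.08, 87.04, 730.68, 648.5),
    "Q22": (8.62, 22.40, 12.97, 280.2),
}


def geomean_speedup(baseline: str, target: str = "ematix-flow") -> float:
    """Geometric mean over all queries of baseline_ms / target_ms; > 1 means target is faster."""
    b, t = ENGINES.index(baseline), ENGINES.index(target)
    logs = [math.log(row[b] / row[t]) for row in MEDIANS_MS.values()]
    return math.exp(sum(logs) / len(logs))


def win_counts() -> dict[str, int]:
    """Per engine, the number of queries where it has the lowest median."""
    wins = {name: 0 for name in ENGINES}
    for row in MEDIANS_MS.values():
        wins[ENGINES[row.index(min(row))]] += 1
    return wins


for engine in ENGINES[1:]:
    print(f"vs {engine}: {geomean_speedup(engine):.2f}x")
print(win_counts())
```

Averaging in log space is the standard way to summarise ratios across queries, because no single query dominates the way it would in an arithmetic mean of ratios. Even so, Q05's outsized Polars time still pulls the Polars figure up noticeably.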
Release-over-release perf history
v0.4.0 vs v0.3.0
v0.4.0 is the alpha milestone — warehouse backends, Web UI, pluggable secrets, distributed peer auto-detection. All four are orthogonal to the scan / aggregate hot path. The geomean still moved:
| ematix-flow speedup vs | v0.3.0 | v0.4.0 | Δ geomean |
|---|---|---|---|
| DuckDB | 1.69× | 1.75× | +3.6% |
| Polars | 2.71× | 2.77× | +2.2% |
| PySpark | 12.9× | 13.4× | +4.0% |
Win count rose 18 → 19 / 22 (Q18 flipped to ematix-flow as σ tightened under 10-trial medians). Per-query times shifted within ±10%, which is noise-band movement rather than a directional trend. Two non-headline changes contributed lift:
- ematix-parquet v0.13.0: full SIMD specialisation for bw=1..=32 landed. The Q06 scan kernel improved by -18.7% and Q17 by -9.5%, both measured in isolation (kernel-only, not end-to-end query wall time).
- Σ.F.1 shape-catalog substrate: perf is identical to the hand-wired `Inject*Rule` set it replaced, but it is stable enough to allow the 10-trial / 3-warmup bench config that surfaced the gain.
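The page doesn't define `bw`. We read it as the bit width of bit-packed integers, which is an assumption. In Parquet's RLE / bit-packing hybrid encoding, values are packed least-significant bit first. A "specialised" decoder ships one unrolled / SIMD kernel per bit width instead of one generic loop. The scalar Python reference below shows the layout every such kernel has to decode. It is an illustration, not ematix-parquet code.

```python
def pack_lsb(values: list[int], bw: int) -> bytes:
    """Bit-pack unsigned ints of width `bw`, least-significant bits first."""
    acc = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << bw):
            raise ValueError(f"{v} does not fit in {bw} bits")
        # Value i occupies stream bits [i*bw, (i+1)*bw).
        acc |= v << (i * bw)
    nbytes = (len(values) * bw + 7) // 8
    # Little-endian bytes: stream bit 0 is the low bit of byte 0.
    return acc.to_bytes(nbytes, "little")


def unpack_lsb(data: bytes, bw: int, count: int) -> list[int]:
    """Scalar reference decoder: the layout each per-bw kernel must reproduce."""
    acc = int.from_bytes(data, "little")
    mask = (1 << bw) - 1
    return [(acc >> (i * bw)) & mask for i in range(count)]


# The Parquet spec's own example: 0..7 at bw=3 packs to 0x88 0xC6 0xFA.
packed = pack_lsb(list(range(8)), 3)
print(packed.hex())
assert unpack_lsb(packed, 3, 8) == list(range(8))
```

Because the shift and mask pattern depends only on `bw`, a codec can generate a fixed, branch-free kernel for each width from 1 to 32. That per-width kernel generation is what "full specialisation" would buy over this generic loop.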
v0.3.0 vs v0.2.1
Historical record of the big jump — when ematix-parquet replaced the parquet-rs scan path:
| Query | v0.2.1 (ms) | v0.3.0 (ms) | Δ |
|---|---|---|---|
| Q01 | 78.19 | 28.11 | -64% |
| Q03 | 20.38 | 15.11 | -26% |
| Q05 | 34.09 | 20.93 | -39% |
| Q07 | 75.56 | 28.96 | -62% |
| Q08 | 35.66 | 20.76 | -42% |
| Q09 | 50.16 | 28.13 | -44% |
| Q10 | 39.73 | 28.16 | -29% |
| Q13 | 44.73 | 41.36 | -8% |
| Q14 | 19.45 | 11.28 | -42% |
| Q16 | 18.29 | 8.60 | -53% |
| Q18 | 157.55 | 52.02 | -67% |
| Q19 | 99.76 | 18.81 | -81% |
| Q21 | 75.48 | 38.08 | -50% |
v0.3.0 win count rose from 15 → 18 / 22.
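The Δ column is the relative change in median time, (v0.3.0 - v0.2.1) / v0.2.1, so a negative value means faster. A quick Python check over the rows above follows. It is a sketch for readers verifying the table, not part of the bench tooling.

```python
# (v0.2.1 ms, v0.3.0 ms) per query, copied from the table above.
HISTORY = {
    "Q01": (78.19, 28.11),
    "Q03": (20.38, 15.11),
    "Q05": (34.09, 20.93),
    "Q07": (75.56, 28.96),
    "Q08": (35.66, 20.76),
    "Q09": (50.16, 28.13),
    "Q10": (39.73, 28.16),
    "Q13": (44.73, 41.36),
    "Q14": (19.45, 11.28),
    "Q16": (18.29, 8.60),
    "Q18": (157.55, 52.02),
    "Q19": (99.76, 18.81),
    "Q21": (75.48, 38.08),
}


def delta_pct(old_ms: float, new_ms: float) -> int:
    """Relative change in whole percent; negative means the newer release is faster."""
    return round((new_ms - old_ms) / old_ms * 100)


for query, (old_ms, new_ms) in HISTORY.items():
    # Also show the equivalent speedup factor, which is easier to compare with the headline.
    print(f"{query}: {delta_pct(old_ms, new_ms):+d}%  ({old_ms / new_ms:.1f}x faster)")
```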
Caveats
- ematix-flow’s late-materialization path (`read_column_*_masked_into`) is enabled for `lineitem`. Late materialization helps queries with a selective filter on a dictionary- or PLAIN-decodable scalar column. On aggregate-heavy queries with low filter selectivity (Q01) it’s effectively a no-op.
- Polars’s SQL frontend rejects several canonical TPC-H query shapes, so hand-translated `q??.polars.sql` variants ship under `examples/tpch/queries/`. Q05 still blows up Polars’s planner (11,150.72 ms median in the full table).
- DuckDB runs at default settings (in-memory `read_parquet` views). ematix-flow runs with `target_partitions=14` and with the `InjectFilterMultiAggRule`, `InjectFilterSumRule`, and `EnableDictGroupCountRule` physical-optimizer rules registered.
- PySpark uses `local[*]`, `spark.sql.shuffle.partitions=8`, and `spark.sql.adaptive.enabled=true`. JVM warmup costs exceed what discarding a single warmup trial can amortize, so treat the PySpark numbers as order-of-magnitude.
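In general terms, late materialization evaluates the filter on its own column first and then decodes the remaining columns only for rows that pass. The sketch below models that control flow in plain Python. The `decoders` callables and `decode_price` are hypothetical stand-ins for column decoding. ematix-flow's `read_column_*_masked_into` path works on encoded Parquet pages, which this does not model.

```python
from typing import Callable, Sequence


def late_materialize(
    filter_col: Sequence[int],
    predicate: Callable[[int], bool],
    decoders: dict[str, Callable[[int], object]],
) -> dict[str, list[object]]:
    """Evaluate the filter first, then decode payload columns only for surviving rows."""
    # Step 1: run the predicate over the filter column to get the selected row ids.
    selected = [i for i, v in enumerate(filter_col) if predicate(v)]
    # Step 2: decode each payload column only at the selected positions.
    return {name: [decode(i) for i in selected] for name, decode in decoders.items()}


decode_calls = 0


def decode_price(row: int) -> float:
    """Hypothetical stand-in for decoding one value of a payload column; counts calls."""
    global decode_calls
    decode_calls += 1
    return row * 1.5


quantity = list(range(1_000))
out = late_materialize(quantity, lambda q: q < 10, {"price": decode_price})
print(len(out["price"]), decode_calls)  # 10 10: 1% selectivity, 1% of the decodes
```

When the predicate keeps nearly every row, as Q01's filter does, the decode count stays close to the full row count. That is why the caveat calls the path a no-op there.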
Reproducing
```bash
# ematix-flow vs DuckDB vs Polars
cargo run --release -p ematix-flow-core \
  --example tpch_triangulation_bench --features triangulation

# PySpark (needs Java 17+; install with `brew install openjdk@23`):
JAVA_HOME=$(/usr/libexec/java_home) python scripts/bench-tpch-pyspark.py \
  --data-dir examples/tpch/data/sf1 --trials 3
```
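To reproduce the PySpark column outside `scripts/bench-tpch-pyspark.py`, a session with the settings listed under Caveats can be built roughly as follows. This is a sketch, not the script's actual code. The app name and the one-`<table>.parquet`-per-table layout under the data directory are assumptions.

```python
# Settings listed in the PySpark caveat, kept as a dict so they are easy to diff against the script.
SPARK_CONF = {
    "spark.sql.shuffle.partitions": "8",
    "spark.sql.adaptive.enabled": "true",
}

# The eight TPC-H tables.
TPCH_TABLES = ("lineitem", "orders", "customer", "part", "partsupp", "supplier", "nation", "region")


def build_session(app_name: str = "tpch-bench"):
    """Create a local[*] SparkSession with the bench settings (needs pyspark installed)."""
    from pyspark.sql import SparkSession  # lazy import: this module loads without pyspark

    builder = SparkSession.builder.master("local[*]").appName(app_name)
    for key, value in SPARK_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def register_views(spark, data_dir: str) -> None:
    """Expose each table as a temp view so the TPC-H SQL can run unchanged.

    Assumption: one `<data_dir>/<table>.parquet` path per table; adjust to the real layout.
    """
    for table in TPCH_TABLES:
        spark.read.parquet(f"{data_dir}/{table}.parquet").createOrReplaceTempView(table)
```

Usage then follows this sequence:

1. `spark = build_session()`
2. `register_views(spark, "examples/tpch/data/sf1")`
3. `spark.sql(query_text).collect()` inside a timing loop

Each query needs a `collect()` (or another action), because Spark evaluates lazily and would otherwise time only plan construction.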