Benchmarks

TPC-H SF=1, 22 queries, Apple M3 Pro — median ms ± σ vs DuckDB, Polars, PySpark.

Same-machine TPC-H benchmark (Apple M3 Pro, single-node) over all 22 queries against SF=1 Parquet data. ematix-flow / DuckDB / Polars run in-process; PySpark runs in local[*] mode against the same files.

Scope: every number on this page is single-node. ematix-flow also has an auto-detected distributed mode (Arrow Flight peer mesh — see Advantages §5). Cluster-scale TPC-H runs at SF≥100 will land in a later release; the bench harness (tpch_distributed) is already in the repo.

ematix-flow / DuckDB / Polars: 10 timed trials after 3 warmups (v0.4.0–v0.5.0; unchanged, refreshed 2026-05-20).
PySpark: 3 trials after 1 warmup, Spark 4.1.1 on JDK 23 (refreshed on the same machine, same data, same day).
Data: examples/tpch/data/sf1.

Each ematix-flow / DuckDB / Polars cell is median ms ± σ across the 10 timed trials; PySpark cells are median ms across 3 trials. “—” means the engine couldn’t parse or execute the query (dialect gap).
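
As a rough illustration of how a "median ms ± σ" cell can be produced from warmups plus timed trials, here is a minimal Python sketch. It is not the harness used for this page: `time_query`, `summarize`, and `format_cell` are hypothetical names, and the choice of population standard deviation for σ is an assumption, since the page does not say which form the harness uses.

```python
import statistics
import time


def summarize(samples_ms):
    """Return (median, sigma) for a list of per-trial timings in ms."""
    # Assumption: sigma is the population standard deviation; the page does
    # not state whether the real harness uses the population or sample form.
    return statistics.median(samples_ms), statistics.pstdev(samples_ms)


def time_query(run_query, warmups=3, trials=10):
    """Call run_query warmups + trials times; only the trials are timed."""
    for _ in range(warmups):
        run_query()  # result discarded: lets caches and lazy init settle
    samples_ms = []
    for _ in range(trials):
        start = time.perf_counter()
        run_query()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return summarize(samples_ms)


def format_cell(median_ms, sigma_ms):
    """Render a table cell in the page's 'median ± σ' style."""
    return f"{median_ms:.2f} ± {sigma_ms:.2f}"
```

With the defaults, `time_query` mirrors the 3-warmup / 10-trial configuration described above; passing `warmups=1, trials=3` mirrors the PySpark configuration.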

Headline

Geomean speedup of ematix-flow (v0.4.0–v0.5.0; unchanged, 10-trial refresh 2026-05-20):
- 1.75× vs DuckDB (was 1.69× at v0.3.0)
- 2.77× vs Polars (was 2.71×)
- 13.4× vs PySpark local[*] (was 12.9×)
Win counts (lowest median per query): ematix-flow 19, DuckDB 1, Polars 2, PySpark 0.
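
For readers who want to recompute the headline figures from the full table, the sketch below shows one way to derive a geomean speedup and the win counts from per-query medians. The function names are hypothetical, and only three rows of the table are included to keep it short, so it will not reproduce the 22-query headline numbers unless the remaining rows are added.

```python
import math
from collections import Counter

# Median ms copied from three rows of the full table on this page.
MEDIANS_MS = {
    "Q01": {"ematix-flow": 28.63, "DuckDB": 45.24, "Polars": 38.52, "PySpark": 196.5},
    "Q06": {"ematix-flow": 11.04, "DuckDB": 11.94, "Polars": 10.16, "PySpark": 68.3},
    "Q17": {"ematix-flow": 36.85, "DuckDB": 28.48, "Polars": 42.04, "PySpark": 239.4},
}


def geomean(values):
    """Geometric mean, computed in log space for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geomean_speedup(medians, baseline, subject="ematix-flow"):
    """Geomean of per-query ratios baseline_ms / subject_ms (>1 = subject faster)."""
    ratios = [row[baseline] / row[subject] for row in medians.values()]
    return geomean(ratios)


def win_counts(medians):
    """Count, per engine, the queries where it has the lowest median."""
    return Counter(min(row, key=row.get) for row in medians.values())
```

A geometric mean is the usual choice for this kind of summary because it weights each query's ratio equally, so a single long-running query cannot dominate the result the way it would in an arithmetic mean of times.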

The geomean improvement over v0.3.0 comes from the ematix-parquet v0.13.0 bump (full SIMD specialisation bw=1..=32 — Q06 scan kernel -18.7%, Q17 -9.5% in isolation) plus the Σ.F.1 shape-catalog substrate that auto-loads previously hand-wired optimizer rules. The “What’s not shipped” closures (warehouse backends, Web UI, secrets, distributed peer auto-detection) are orthogonal to the scan/aggregate hot path.

v0.5.0 ships the same query-execution surface as v0.4.0. The kernel work lives in the sibling ematix-parquet codec; v0.5.0 itself is operational — CLIs, Web UI, alerters, observability — so per-query times match v0.4.0.

Full table

Query	ematix-flow	DuckDB	Polars	PySpark	Best
Q01	28.63 ± 0.61	45.24 ± 0.20	38.52 ± 0.84	196.5	ematix-flow
Q02	9.85 ± 0.21	19.07 ± 0.62	46.07 ± 0.65	290.7	ematix-flow
Q03	13.96 ± 1.40	32.70 ± 0.65	46.00 ± 0.86	288.2	ematix-flow
Q04	13.21 ± 0.43	23.07 ± 2.21	25.28 ± 1.51	226.1	ematix-flow
Q05	21.59 ± 0.93	31.48 ± 0.70	11150.72 ± 689.69	364.2	ematix-flow
Q06	11.04 ± 1.41	11.94 ± 0.20	10.16 ± 0.27	68.3	Polars
Q07	28.79 ± 1.15	32.65 ± 0.93	115.31 ± 3.89	286.8	ematix-flow
Q08	20.41 ± 0.67	38.26 ± 0.41	93.62 ± 7.78	209.8	ematix-flow
Q09	26.30 ± 1.36	60.67 ± 1.63	47.96 ± 1.36	461.3	ematix-flow
Q10	28.83 ± 10.44	68.29 ± 2.23	111.80 ± 8.15	421.9	ematix-flow
Q11	8.65 ± 0.31	11.62 ± 0.62	9.35 ± 5.04	139.1	ematix-flow
Q12	14.85 ± 0.37	24.37 ± 0.68	19.06 ± 0.86	288.4	ematix-flow
Q13	41.68 ± 0.73	147.33 ± 2.06	117.00 ± 4.13	694.2	ematix-flow
Q14	12.13 ± 1.00	24.22 ± 1.04	13.01 ± 0.78	138.3	ematix-flow
Q15	16.25 ± 0.92	15.69 ± 1.87	11.48 ± 0.22	166.4	Polars
Q16	8.76 ± 1.48	26.00 ± 4.35	21.29 ± 0.71	211.5	ematix-flow
Q17	36.85 ± 2.24	28.48 ± 1.62	42.04 ± 2.96	239.4	DuckDB
Q18	51.21 ± 3.06	52.37 ± 1.31	59.19 ± 2.32	569.1	ematix-flow
Q19	17.79 ± 1.89	36.82 ± 3.48	106.55 ± 9.04	111.4	ematix-flow
Q20	16.34 ± 0.85	39.11 ± 3.04	23.30 ± 2.39	148.8	ematix-flow
Q21	41.08 ± 1.67	87.04 ± 2.18	730.68 ± 39.43	648.5	ematix-flow
Q22	8.62 ± 0.52	22.40 ± 0.65	12.97 ± 1.67	280.2	ematix-flow

Release-over-release perf history

v0.4.0 vs v0.3.0

v0.4.0 is the alpha milestone — warehouse backends, Web UI, pluggable secrets, distributed peer auto-detection. All four are orthogonal to the scan / aggregate hot path. The geomean still moved:

Speedup vs	v0.3.0	v0.4.0	Δ geomean
DuckDB	1.69×	1.75×	+3.6%
Polars	2.71×	2.77×	+2.2%
PySpark	12.9×	13.4×	+4.0%

Win count rose 18 → 19 / 22 (Q18 flipped to ematix-flow as σ tightened under 10-trial medians). Per-query times shifted within ±10%, which is noise-band movement rather than a directional change. Two non-headline sources of lift remain:

- ematix-parquet v0.13.0: full SIMD specialisation bw=1..=32 landed; Q06 scan kernel -18.7%, Q17 -9.5% measured in isolation (kernel-only, not end-to-end Q06 wall time).
- Σ.F.1 shape-catalog substrate: bit-identical perf vs the hand-wired Inject*Rule set it replaced, but stable enough to allow the 10-trial / 3-warmup bench config that surfaced the gain.

v0.3.0 vs v0.2.1

Historical record of the big jump — when ematix-parquet replaced the parquet-rs scan path:

Query	v0.2.1 (ms)	v0.3.0 (ms)	Δ
Q01	78.19	28.11	-64%
Q03	20.38	15.11	-26%
Q05	34.09	20.93	-39%
Q07	75.56	28.96	-62%
Q08	35.66	20.76	-42%
Q09	50.16	28.13	-44%
Q10	39.73	28.16	-29%
Q13	44.73	41.36	-8%
Q14	19.45	11.28	-42%
Q16	18.29	8.60	-53%
Q18	157.55	52.02	-67%
Q19	99.76	18.81	-81%
Q21	75.48	38.08	-50%

With v0.3.0, the win count rose from 15 to 18 of 22.
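
The Δ column in the table above is consistent with a plain relative change in median time, rounded to the nearest percent. A small sketch, with `pct_change` as a hypothetical helper name:

```python
def pct_change(old_ms, new_ms):
    """Relative change from old to new, in percent (negative = faster)."""
    return (new_ms - old_ms) / old_ms * 100.0


# Two rows from the table above: (v0.2.1 ms, v0.3.0 ms).
rows = {"Q01": (78.19, 28.11), "Q19": (99.76, 18.81)}
for query, (old_ms, new_ms) in rows.items():
    print(f"{query}: {pct_change(old_ms, new_ms):+.0f}%")
# prints "Q01: -64%" and "Q19: -81%"
```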

Caveats

- ematix-flow’s late-materialization path (read_column_*_masked_into) is enabled for lineitem. Late-mat helps queries with a selective filter on a dict/PLAIN-decodable scalar column; on aggregate-heavy queries with low filter selectivity (Q01) it’s effectively a no-op.
- Polars’s SQL frontend rejects several canonical TPC-H shapes; hand-translated q??.polars.sql variants ship under examples/tpch/queries/. Q05 specifically still blows up Polars’s planner (see its 11150.72 ms median in the full table).
- DuckDB runs at default settings (in-memory read_parquet views). ematix-flow runs with target_partitions=14 and the InjectFilterMultiAggRule + InjectFilterSumRule + EnableDictGroupCountRule physical-optimizer rules registered.
- PySpark uses local[*], spark.sql.shuffle.partitions=8, spark.sql.adaptive.enabled=true. JVM warmup costs exceed what the single discarded warmup trial can amortize, so treat the PySpark numbers as order-of-magnitude.
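
To make the DuckDB and PySpark baseline settings above concrete, here is a hedged sketch of how such a setup might look in Python. It is not taken from the repo's bench scripts: the helper names are hypothetical, and the one-file-per-table layout (`<table>.parquet` under the data directory) is an assumption about how examples/tpch/data/sf1 is organised.

```python
from pathlib import Path

DATA_DIR = Path("examples/tpch/data/sf1")
TPCH_TABLES = ["customer", "lineitem", "nation", "orders",
               "part", "partsupp", "region", "supplier"]

# Spark settings quoted in the PySpark caveat on this page.
SPARK_CONF = {
    "spark.sql.shuffle.partitions": "8",
    "spark.sql.adaptive.enabled": "true",
}


def parquet_path(table, data_dir=DATA_DIR):
    # Hypothetical layout: one <table>.parquet file per TPC-H table.
    return (data_dir / f"{table}.parquet").as_posix()


def duckdb_connection(data_dir=DATA_DIR):
    """In-memory DuckDB at default settings, one read_parquet view per table."""
    import duckdb  # imported lazily so this module loads without duckdb
    con = duckdb.connect()  # no path given: in-memory database
    for table in TPCH_TABLES:
        con.execute(
            f"CREATE VIEW {table} AS "
            f"SELECT * FROM read_parquet('{parquet_path(table, data_dir)}')"
        )
    return con


def spark_session(data_dir=DATA_DIR):
    """Local-mode Spark session with one temp view per table."""
    from pyspark.sql import SparkSession  # lazy import, same reason as above
    builder = SparkSession.builder.master("local[*]").appName("tpch-sf1")
    for key, value in SPARK_CONF.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()
    for table in TPCH_TABLES:
        spark.read.parquet(parquet_path(table, data_dir)).createOrReplaceTempView(table)
    return spark
```

Registering views rather than loading tables means both engines scan the same Parquet files at query time, which matches the "same files" framing at the top of the page.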

Reproducing

# ematix-flow vs DuckDB vs Polars
cargo run --release -p ematix-flow-core \
    --example tpch_triangulation_bench --features triangulation

# PySpark (needs Java 17+; install with `brew install openjdk@23`):
JAVA_HOME=$(/usr/libexec/java_home) python scripts/bench-tpch-pyspark.py \
    --data-dir examples/tpch/data/sf1 --trials 3