Link: https://omlx.ai/my/92e9df628cd488f548c7885cb732afb0f7870d4d7f40ae6024f8d4f85724ca20
gemma-4-26B-A4B-it-QAT-MLX-4bit
oMLX - LLM inference, optimized for your Mac
https://github.com/jundot/omlx
Benchmark Model: gemma-4-26B-A4B-it-QAT-MLX-4bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test           TTFT(ms)  TPOT(ms)  pp TPS        tg TPS       E2E(s)  Throughput    Peak Mem
pp1024/tg128   677.5     12.57     1511.3 tok/s  80.2 tok/s   2.273   506.7 tok/s   14.27 GB
pp4096/tg128   2114.7    13.02     1936.9 tok/s  77.4 tok/s   3.768   1121.0 tok/s  14.95 GB
pp8192/tg128   4036.9    13.86     2029.3 tok/s  72.7 tok/s   5.797   1435.2 tok/s  15.09 GB
pp16384/tg128  8306.2    15.69     1972.5 tok/s  64.2 tok/s   10.299  1603.2 tok/s  15.57 GB
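The derived columns in the table above appear to follow from TTFT and E2E by simple arithmetic. The sketch below is inferred from the numbers, not taken from the oMLX source, so treat the formulas as assumptions: pp TPS as prompt tokens divided by TTFT, tg TPS as generated tokens divided by the decode time (E2E minus TTFT), TPOT as the decode time spread over the tokens after the first, and Throughput as all tokens divided by E2E. The function name `derive_metrics` is hypothetical.

```python
def derive_metrics(prompt_tokens, gen_tokens, ttft_ms, e2e_s):
    """Recompute the derived columns from TTFT and E2E (assumed formulas)."""
    ttft_s = ttft_ms / 1000.0
    decode_s = e2e_s - ttft_s  # time spent generating after the first token
    return {
        "pp_tps": prompt_tokens / ttft_s,
        "tg_tps": gen_tokens / decode_s,
        "tpot_ms": decode_s * 1000.0 / (gen_tokens - 1),
        "throughput_tps": (prompt_tokens + gen_tokens) / e2e_s,
    }


# pp1024/tg128 row from the table above: TTFT 677.5 ms, E2E 2.273 s
m = derive_metrics(1024, 128, ttft_ms=677.5, e2e_s=2.273)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

The recomputed values land within rounding distance of the reported 1511.3 tok/s, 80.2 tok/s, 12.57 ms and 506.7 tok/s, which is what suggests these definitions.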
Continuous Batching
pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     80.2 tok/s   1.00x    1511.3 tok/s  1511.3 tok/s  677.5     2.273
2x     106.5 tok/s  1.33x    1423.7 tok/s  711.9 tok/s   1438.3    3.842
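The Speedup and pp TPS/req columns in the batching table above look like plain ratios. A minimal sketch, assuming Speedup is the batch's tg TPS divided by the 1x tg TPS and pp TPS/req is the aggregate pp TPS divided by the batch size (both inferred from the numbers; `batch_summary` is a hypothetical helper):

```python
def batch_summary(batch_size, tg_tps, pp_tps, baseline_tg_tps):
    """Recompute the ratio columns of a continuous-batching row (assumed formulas)."""
    return {
        "speedup": tg_tps / baseline_tg_tps,
        "pp_tps_per_req": pp_tps / batch_size,
    }


# 2x row from the table above, with the 1x row's tg TPS as the baseline
s = batch_summary(2, tg_tps=106.5, pp_tps=1423.7, baseline_tg_tps=80.2)
print(f"speedup: {s['speedup']:.2f}x")
print(f"pp TPS/req: {s['pp_tps_per_req']:.1f} tok/s")
```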
With TurboQuant
Benchmark Model: gemma-4-26B-A4B-it-QAT-MLX-4bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test           TTFT(ms)  TPOT(ms)  pp TPS        tg TPS       E2E(s)  Throughput    Peak Mem
pp1024/tg128   678.0     12.51     1510.3 tok/s  80.5 tok/s   2.267   508.1 tok/s   14.27 GB
pp4096/tg128   2110.5    13.07     1940.8 tok/s  77.1 tok/s   3.770   1120.5 tok/s  14.95 GB
pp8192/tg128   4027.5    14.11     2034.0 tok/s  71.4 tok/s   5.820   1429.6 tok/s  15.09 GB
pp16384/tg128  8276.4    15.61     1979.6 tok/s  64.6 tok/s   10.259  1609.5 tok/s  15.57 GB
Continuous Batching
pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     80.5 tok/s   1.00x    1510.3 tok/s  1510.3 tok/s  678.0     2.267
2x     108.2 tok/s  1.34x    1431.1 tok/s  715.5 tok/s   1431.0    3.796
4x     136.5 tok/s  1.70x    1704.2 tok/s  426.1 tok/s   2243.9    6.154
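To see how much the TurboQuant run differs from the first run of the same model, the tg TPS columns of the two single-request tables can be compared directly. This is a rough sketch on the figures above; it assumes the first run was made without TurboQuant (the log does not say so explicitly), and a single run per configuration cannot separate a real effect from noise.

```python
# tg TPS per test, copied from the two gemma-4-26B-A4B-it-QAT-MLX-4bit tables above
tests = ["pp1024", "pp4096", "pp8192", "pp16384"]
baseline_tg = [80.2, 77.4, 72.7, 64.2]    # first run (assumed: no TurboQuant)
turboquant_tg = [80.5, 77.1, 71.4, 64.6]  # run labelled "With TurboQuant"


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


changes = [percent_change(b, t) for b, t in zip(baseline_tg, turboquant_tg)]
for name, pct in zip(tests, changes):
    print(f"{name}: {pct:+.2f}%")
```

Every difference stays under 2% in either direction, and the reported peak memory is identical in both tables.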
Qwen3.6-35B-A3B-NSC-ACE-SABER-8bit-MTPLX-Optimized-Speed
With TurboQuant
Benchmark Model: Qwen3.6-35B-A3B-NSC-ACE-SABER-8bit-MTPLX-Optimized-Speed
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test           TTFT(ms)  TPOT(ms)  pp TPS        tg TPS       E2E(s)  Throughput    Peak Mem
pp1024/tg128   742.0     14.60     1380.1 tok/s  69.0 tok/s   2.596   443.7 tok/s   35.40 GB
pp4096/tg128   2218.8    15.52     1846.0 tok/s  64.9 tok/s   4.190   1008.2 tok/s  36.17 GB
pp8192/tg128   3911.1    15.87     2094.6 tok/s  63.5 tok/s   5.927   1403.7 tok/s  36.52 GB
pp16384/tg128  8245.4    17.29     1987.0 tok/s  58.3 tok/s   10.441  1581.5 tok/s  37.26 GB
Continuous Batching
pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     69.0 tok/s   1.00x    1380.1 tok/s  1380.1 tok/s  742.0     2.596
2x     76.0 tok/s   1.10x    1424.4 tok/s  712.2 tok/s   1437.7    4.805
4x     110.2 tok/s  1.60x    1527.6 tok/s  381.9 tok/s   2538.7    7.328
Note: The pp131072 test failed with this error: Prefill context too large for available memory (pre-chunk guard at 83968 tokens, kv_len=83968): predicted peak would exceed prefill safety cap 46.7GB (90% of effective ceiling 51.8GB)
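The error message above describes the cap as 90% of an effective ceiling. The arithmetic can be sketched as follows; this is not oMLX's actual guard code, and the function names are hypothetical. Note that 0.9 × 51.8 gives about 46.6 GB rather than the reported 46.7 GB, presumably because the ceiling shown in the message is itself rounded.

```python
def prefill_safety_cap_gb(effective_ceiling_gb, safety_fraction=0.90):
    """Cap = a fixed fraction of the effective memory ceiling (per the error text)."""
    return effective_ceiling_gb * safety_fraction


def exceeds_cap(predicted_peak_gb, effective_ceiling_gb, safety_fraction=0.90):
    """True if a predicted prefill peak would be rejected under this cap."""
    return predicted_peak_gb > prefill_safety_cap_gb(effective_ceiling_gb, safety_fraction)


cap = prefill_safety_cap_gb(51.8)
print(f"cap: {cap:.2f} GB")
# The 37.26 GB peak measured at pp16384 sits below the cap, so that test could run.
print(exceeds_cap(37.26, 51.8))
```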
gemma-4-31b-it-8bit
Benchmark Model: gemma-4-31b-it-8bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test           TTFT(ms)  TPOT(ms)  pp TPS        tg TPS       E2E(s)  Throughput    Peak Mem
pp1024/tg128   3149.9    121.51    325.1 tok/s   8.3 tok/s    18.581  62.0 tok/s    32.04 GB
pp4096/tg128   12715.1   125.48    322.1 tok/s   8.0 tok/s    28.651  147.4 tok/s   34.49 GB
pp8192/tg128   26383.0   129.68    310.5 tok/s   7.8 tok/s    42.852  194.2 tok/s   35.03 GB
pp16384/tg128  54453.2   134.12    300.9 tok/s   7.5 tok/s    71.486  231.0 tok/s   36.16 GB
Continuous Batching
pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     8.3 tok/s    1.00x    325.1 tok/s   325.1 tok/s   3149.9    18.581
2x     14.2 tok/s   1.71x    206.4 tok/s   103.2 tok/s   7955.1    28.012
4x     18.8 tok/s   2.27x    280.6 tok/s   70.2 tok/s    14186.6   41.799
Note: Fans went wild after pp4096
LFM2-24B-A2B-MLX-4bit
Benchmark Model: LFM2-24B-A2B-MLX-4bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test           TTFT(ms)  TPOT(ms)  pp TPS        tg TPS       E2E(s)  Throughput    Peak Mem
pp1024/tg128   510.1     6.43      2009.5 tok/s  156.8 tok/s  1.326   869.4 tok/s   13.26 GB
pp4096/tg128   1229.4    7.06      3332.4 tok/s  142.7 tok/s  2.126   1986.9 tok/s  13.50 GB
pp8192/tg128   2374.4    7.42      3450.6 tok/s  135.8 tok/s  3.317   2508.5 tok/s  13.66 GB
pp16384/tg128  4743.5    8.18      3454.2 tok/s  123.3 tok/s  5.782   2856.0 tok/s  13.90 GB
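For comparing models side by side it helps to turn rows like the ones above into numbers. A minimal sketch of a parser for one single-request row, assuming the column order shown in the header and whitespace-separated fields; `parse_single_request_row` is a hypothetical helper, not part of oMLX.

```python
def parse_single_request_row(line):
    """Parse one 'Single Request Results' row into a dict of numbers."""
    parts = line.split()
    prompt_part, gen_part = parts[0].split("/")  # e.g. "pp1024", "tg128"
    # Drop the unit tokens so only the seven numeric columns remain.
    numbers = [float(p) for p in parts[1:] if p not in ("tok/s", "GB")]
    keys = ["ttft_ms", "tpot_ms", "pp_tps", "tg_tps",
            "e2e_s", "throughput_tps", "peak_mem_gb"]
    row = dict(zip(keys, numbers))
    row["prompt_tokens"] = int(prompt_part[2:])  # strip the "pp" prefix
    row["gen_tokens"] = int(gen_part[2:])        # strip the "tg" prefix
    return row


row = parse_single_request_row(
    "pp1024/tg128 510.1 6.43 2009.5 tok/s 156.8 tok/s 1.326 869.4 tok/s 13.26 GB"
)
print(row)
```

Because it splits on any whitespace, the same function works whether the columns are padded for alignment or separated by single spaces.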
Continuous Batching
pp1024 / tg128
--------------------------------------------------------------------------------
Batch tg TPS Speedup pp TPS pp TPS/req TTFT(ms) E2E(s)
1x 156.8 tok/s 1.00x 2009.5 tok/s 2009.5 tok/s 510.1 1.326
2x 188.7 tok/s 1.20x 2724.2 tok/s 1362.1 tok/s 751.7 2.109
4x 247.4 tok/s 1.58x 2713.6 tok/s 678.4 tok/s 1465.4 3.579Code language: JavaScript (javascript)
medgemma-27b-text-it-MLX-4bit
Benchmark Model: medgemma-27b-text-it-MLX-4bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test           TTFT(ms)  TPOT(ms)  pp TPS        tg TPS       E2E(s)  Throughput    Peak Mem
pp1024/tg128   2702.4    59.42     379.3 tok/s   17.0 tok/s   10.248  112.5 tok/s   16.00 GB
pp4096/tg128   9072.4    60.54     451.6 tok/s   16.6 tok/s   16.761  252.1 tok/s   17.43 GB
pp8192/tg128   18032.9   61.63     454.3 tok/s   16.4 tok/s   25.859  321.8 tok/s   17.97 GB
pp16384/tg128  37958.5   64.37     431.7 tok/s   15.7 tok/s   46.134  357.9 tok/s   18.91 GB
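The Peak Mem column above gives a rough sense of how memory scales with context length. The sketch below takes the average growth between the shortest and longest prompt. It is only an approximation: peak memory also includes weights and temporary buffers, so the slope is not a pure per-token KV-cache cost, and `mem_growth_gb_per_1k_tokens` is a hypothetical helper.

```python
def mem_growth_gb_per_1k_tokens(mem_small_gb, mem_large_gb, tokens_small, tokens_large):
    """Average peak-memory growth per 1000 extra prompt tokens."""
    extra_tokens_k = (tokens_large - tokens_small) / 1000.0
    return (mem_large_gb - mem_small_gb) / extra_tokens_k


# medgemma-27b-text-it-MLX-4bit: 16.00 GB at pp1024, 18.91 GB at pp16384
growth = mem_growth_gb_per_1k_tokens(16.00, 18.91, 1024, 16384)
print(f"{growth:.3f} GB per 1k prompt tokens")
```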
Continuous Batching
pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     17.0 tok/s   1.00x    379.3 tok/s   379.3 tok/s   2702.4    10.248
2x     28.9 tok/s   1.70x    352.2 tok/s   176.1 tok/s   5815.2    14.671
4x     35.9 tok/s   2.11x    499.9 tok/s   125.0 tok/s   7772.3    22.466
Note: Fans went wild
gemma-4-31b-it-8bit (second run)
Benchmark Model: gemma-4-31b-it-8bit
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test           TTFT(ms)  TPOT(ms)  pp TPS        tg TPS       E2E(s)  Throughput    Peak Mem
pp1024/tg128   3197.7    122.20    320.2 tok/s   8.2 tok/s    18.717  61.5 tok/s    32.04 GB
pp4096/tg128   11781.6   125.59    347.7 tok/s   8.0 tok/s    27.731  152.3 tok/s   34.49 GB
pp8192/tg128   24612.7   128.39    332.8 tok/s   7.8 tok/s    40.918  203.3 tok/s   35.03 GB
pp16384/tg128  56271.4   133.19    291.2 tok/s   7.6 tok/s    73.187  225.6 tok/s   34.74 GB
Continuous Batching
pp1024 / tg128
--------------------------------------------------------------------------------
Batch  tg TPS       Speedup  pp TPS        pp TPS/req    TTFT(ms)  E2E(s)
1x     8.2 tok/s    1.00x    320.2 tok/s   320.2 tok/s   3197.7    18.717
2x     14.3 tok/s   1.74x    141.8 tok/s   70.9 tok/s    14235.9   32.324
4x     19.3 tok/s   2.35x    291.1 tok/s   72.8 tok/s    13649.5   40.657
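Since gemma-4-31b-it-8bit was benchmarked twice, the two sets of single-request TTFT values give a feel for run-to-run variation. A small sketch on those figures; the cause of the spread is not established here, though the fan notes hint that thermal state may play a part.

```python
# Single-request TTFT (ms) for gemma-4-31b-it-8bit, copied from the two runs above
tests = ["pp1024", "pp4096", "pp8192", "pp16384"]
run_a = [3149.9, 12715.1, 26383.0, 54453.2]  # earlier run
run_b = [3197.7, 11781.6, 24612.7, 56271.4]  # later run

# Relative difference of the later run against the earlier one, in percent
diffs = [(b - a) / a * 100.0 for a, b in zip(run_a, run_b)]
for name, pct in zip(tests, diffs):
    print(f"{name}: {pct:+.1f}%")

worst = max(abs(d) for d in diffs)
print(f"largest difference: {worst:.1f}%")
```

The largest gap is a little over 7%, so differences of a few percent between configurations elsewhere in these results are hard to distinguish from noise.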
