Epidemiology & Technology

Benchmarking MLX Models on a Laptop Using oMLX

Link: https://omlx.ai/my/92e9df628cd488f548c7885cb732afb0f7870d4d7f40ae6024f8d4f85724ca20

gemma-4-26B-A4B-it-QAT-MLX-4bit

oMLX - LLM inference, optimized for your Mac
https://github.com/jundot/omlx
Benchmark Model: gemma-4-26B-A4B-it-QAT-MLX-4bit
================================================================================

Single Request Results
--------------------------------------------------------------------------------
Test                TTFT(ms)    TPOT(ms)        pp TPS        tg TPS      E2E(s)    Throughput    Peak Mem
pp1024/tg128           677.5       12.57  1511.3 tok/s    80.2 tok/s       2.273   506.7 tok/s    14.27 GB
pp4096/tg128          2114.7       13.02  1936.9 tok/s    77.4 tok/s       3.768  1121.0 tok/s    14.95 GB
pp8192/tg128          4036.9       13.86  2029.3 tok/s    72.7 tok/s       5.797  1435.2 tok/s    15.09 GB
pp16384/tg128         8306.2       15.69  1972.5 tok/s    64.2 tok/s      10.299  1603.2 tok/s    15.57 GB

Continuous Batching
pp1024 / tg128
--------------------------------------------------------------------------------
Batch           tg TPS   Speedup        pp TPS    pp TPS/req    TTFT(ms)      E2E(s)
1x          80.2 tok/s     1.00x  1511.3 tok/s  1511.3 tok/s       677.5       2.273
2x         106.5 tok/s     1.33x  1423.7 tok/s   711.9 tok/s      1438.3       3.842
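The columns in the single-request table appear to be derived from just the prompt size, the generated token count, TTFT, and E2E time. The sketch below recomputes them for the pp1024/tg128 row. The formulas are inferred from the numbers in this post, not taken from oMLX documentation, and `derive_metrics` is a hypothetical helper name.

```python
def derive_metrics(pp_tokens, tg_tokens, ttft_ms, e2e_s):
    """Recompute the derived columns from prompt size, TTFT, and E2E time.

    Assumed relationships (inferred from the table, not from oMLX docs):
      pp TPS     = prompt tokens / TTFT
      tg TPS     = generated tokens / (E2E - TTFT)
      TPOT       = (E2E - TTFT) / (generated tokens - 1)
      Throughput = (prompt + generated tokens) / E2E
    """
    ttft_s = ttft_ms / 1000.0
    decode_s = e2e_s - ttft_s  # time spent after the first token arrived
    return {
        "pp_tps": pp_tokens / ttft_s,
        "tg_tps": tg_tokens / decode_s,
        "tpot_ms": decode_s / (tg_tokens - 1) * 1000.0,
        "throughput": (pp_tokens + tg_tokens) / e2e_s,
    }


# pp1024/tg128 row for gemma-4-26B-A4B-it-QAT-MLX-4bit
row = derive_metrics(pp_tokens=1024, tg_tokens=128, ttft_ms=677.5, e2e_s=2.273)
for name, value in row.items():
    print(f"{name}: {value:.2f}")
```

The recomputed values agree with the reported 1511.3 tok/s, 80.2 tok/s, 12.57 ms, and 506.7 tok/s to within rounding, which is why the Throughput column climbs with prompt length even though generation speed falls.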

With TurboQuant

oMLX - LLM inference, optimized for your Mac
https://github.com/jundot/omlx
Benchmark Model: gemma-4-26B-A4B-it-QAT-MLX-4bit
================================================================================

Single Request Results
--------------------------------------------------------------------------------
Test                TTFT(ms)    TPOT(ms)        pp TPS        tg TPS      E2E(s)    Throughput    Peak Mem
pp1024/tg128           678.0       12.51  1510.3 tok/s    80.5 tok/s       2.267   508.1 tok/s    14.27 GB
pp4096/tg128          2110.5       13.07  1940.8 tok/s    77.1 tok/s       3.770  1120.5 tok/s    14.95 GB
pp8192/tg128          4027.5       14.11  2034.0 tok/s    71.4 tok/s       5.820  1429.6 tok/s    15.09 GB
pp16384/tg128         8276.4       15.61  1979.6 tok/s    64.6 tok/s      10.259  1609.5 tok/s    15.57 GB

Continuous Batching
pp1024 / tg128
--------------------------------------------------------------------------------
Batch           tg TPS   Speedup        pp TPS    pp TPS/req    TTFT(ms)      E2E(s)
1x          80.5 tok/s     1.00x  1510.3 tok/s  1510.3 tok/s       678.0       2.267
2x         108.2 tok/s     1.34x  1431.1 tok/s   715.5 tok/s      1431.0       3.796
4x         136.5 tok/s     1.70x  1704.2 tok/s   426.1 tok/s      2243.9       6.154
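To compare the two gemma-4-26B-A4B runs side by side, the following sketch computes the percentage change in generation speed and checks peak memory for each prompt size. The values are copied from the two single-request tables; the dictionary layout and the `pct_change` helper are illustrative only.

```python
# prompt size -> (tg TPS, peak memory in GB), copied from the tables above
baseline = {
    1024: (80.2, 14.27),
    4096: (77.4, 14.95),
    8192: (72.7, 15.09),
    16384: (64.2, 15.57),
}
turboquant = {
    1024: (80.5, 14.27),
    4096: (77.1, 14.95),
    8192: (71.4, 15.09),
    16384: (64.6, 15.57),
}


def pct_change(new, old):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


tg_delta = {}
same_peak_mem = {}
for pp in sorted(baseline):
    base_tg, base_mem = baseline[pp]
    tq_tg, tq_mem = turboquant[pp]
    tg_delta[pp] = pct_change(tq_tg, base_tg)
    same_peak_mem[pp] = (tq_mem == base_mem)
    print(f"pp{pp}: tg TPS {tg_delta[pp]:+.2f}%, same peak mem: {same_peak_mem[pp]}")
```

In these runs the generation speed differs by less than 2% in either direction and the reported peak memory is identical at every tested prompt size.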

Qwen3.6-35B-A3B-NSC-ACE-SABER-8bit-MTPLX-Optimized-Speed

With TurboQuant

oMLX - LLM inference, optimized for your Mac
https://github.com/jundot/omlx
Benchmark Model: Qwen3.6-35B-A3B-NSC-ACE-SABER-8bit-MTPLX-Optimized-Speed
================================================================================

Single Request Results
--------------------------------------------------------------------------------
Test                TTFT(ms)    TPOT(ms)        pp TPS        tg TPS      E2E(s)    Throughput    Peak Mem
pp1024/tg128           742.0       14.60  1380.1 tok/s    69.0 tok/s       2.596   443.7 tok/s    35.40 GB
pp4096/tg128          2218.8       15.52  1846.0 tok/s    64.9 tok/s       4.190  1008.2 tok/s    36.17 GB
pp8192/tg128          3911.1       15.87  2094.6 tok/s    63.5 tok/s       5.927  1403.7 tok/s    36.52 GB
pp16384/tg128         8245.4       17.29  1987.0 tok/s    58.3 tok/s      10.441  1581.5 tok/s    37.26 GB

Continuous Batching
pp1024 / tg128
--------------------------------------------------------------------------------
Batch           tg TPS   Speedup        pp TPS    pp TPS/req    TTFT(ms)      E2E(s)
1x          69.0 tok/s     1.00x  1380.1 tok/s  1380.1 tok/s       742.0       2.596
2x          76.0 tok/s     1.10x  1424.4 tok/s   712.2 tok/s      1437.7       4.805
4x         110.2 tok/s     1.60x  1527.6 tok/s   381.9 tok/s      2538.7       7.328
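The continuous-batching columns can be read as simple ratios against the 1x row. The sketch below reproduces Speedup and pp TPS/req for the Qwen run and adds a scaling-efficiency figure (aggregate tg TPS divided by batch size times the 1x tg TPS). That efficiency metric is an addition for illustration, not something oMLX reports, and `batch_summary` is a hypothetical helper.

```python
# batch size -> (aggregate tg TPS, aggregate pp TPS), copied from the table above
batch_rows = {1: (69.0, 1380.1), 2: (76.0, 1424.4), 4: (110.2, 1527.6)}


def batch_summary(rows):
    """Derive per-batch ratios relative to the single-request (1x) row."""
    base_tg, _ = rows[1]
    out = {}
    for n, (tg_tps, pp_tps) in sorted(rows.items()):
        out[n] = {
            "speedup": tg_tps / base_tg,           # matches the Speedup column
            "pp_tps_per_req": pp_tps / n,          # matches the pp TPS/req column
            "tg_tps_per_req": tg_tps / n,          # generation speed each request sees
            "efficiency": tg_tps / (base_tg * n),  # 1.0 would be linear scaling
        }
    return out


summary = batch_summary(batch_rows)
for n, s in summary.items():
    print(f"{n}x: speedup {s['speedup']:.2f}x, "
          f"pp/req {s['pp_tps_per_req']:.1f} tok/s, "
          f"tg/req {s['tg_tps_per_req']:.1f} tok/s, "
          f"efficiency {s['efficiency']:.2f}")
```

At 4x the aggregate generation speed is about 1.6 times the single-request figure, so each individual request generates tokens more slowly than it would alone, at roughly 0.40 scaling efficiency.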

The pp131072 test failed with this error: Prefill context too large for available memory (pre-chunk guard at 83968 tokens, kv_len=83968): predicted peak would exceed prefill safety cap 46.7GB (90% of effective ceiling 51.8GB)
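The numbers in that error message can be checked with a little arithmetic. The sketch below recomputes the safety cap from the quoted ceiling and compares it with the largest peak memory measured for this model. It only reproduces the figures quoted in the log and the table; how oMLX actually predicts prefill memory is not shown here, and the variable names are illustrative.

```python
effective_ceiling_gb = 51.8   # "effective ceiling" quoted in the error message
safety_fraction = 0.90        # "90% of effective ceiling"
measured_peak_gb = 37.26      # peak memory at pp16384/tg128 from the table above

# The log prints 46.7GB; the small gap to this product is presumably
# because the ceiling itself is rounded to one decimal in the message.
safety_cap_gb = effective_ceiling_gb * safety_fraction
headroom_gb = safety_cap_gb - measured_peak_gb

print(f"safety cap: {safety_cap_gb:.2f} GB")
print(f"headroom above the pp16384 peak: {headroom_gb:.2f} GB")
```

Under these figures roughly 9 GB remained between the pp16384 peak and the cap, and the guard tripped at a predicted context of 83968 tokens, well short of the requested 131072.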

gemma-4-31b-it-8bit

oMLX - LLM inference, optimized for your Mac
https://github.com/jundot/omlx
Benchmark Model: gemma-4-31b-it-8bit
================================================================================

Single Request Results
--------------------------------------------------------------------------------
Test                TTFT(ms)    TPOT(ms)        pp TPS        tg TPS      E2E(s)    Throughput    Peak Mem
pp1024/tg128          3149.9      121.51   325.1 tok/s     8.3 tok/s      18.581    62.0 tok/s    32.04 GB
pp4096/tg128         12715.1      125.48   322.1 tok/s     8.0 tok/s      28.651   147.4 tok/s    34.49 GB
pp8192/tg128         26383.0      129.68   310.5 tok/s     7.8 tok/s      42.852   194.2 tok/s    35.03 GB
pp16384/tg128        54453.2      134.12   300.9 tok/s     7.5 tok/s      71.486   231.0 tok/s    36.16 GB

Continuous Batching
pp1024 / tg128
--------------------------------------------------------------------------------
Batch           tg TPS   Speedup        pp TPS    pp TPS/req    TTFT(ms)      E2E(s)
1x           8.3 tok/s     1.00x   325.1 tok/s   325.1 tok/s      3149.9      18.581
2x          14.2 tok/s     1.71x   206.4 tok/s   103.2 tok/s      7955.1      28.012
4x          18.8 tok/s     2.27x   280.6 tok/s    70.2 tok/s     14186.6      41.799

Note: Fans went wild after pp4096

LFM2-24B-A2B-MLX-4bit

oMLX - LLM inference, optimized for your Mac
https://github.com/jundot/omlx
Benchmark Model: LFM2-24B-A2B-MLX-4bit
================================================================================

Single Request Results
--------------------------------------------------------------------------------
Test                TTFT(ms)    TPOT(ms)        pp TPS        tg TPS      E2E(s)    Throughput    Peak Mem
pp1024/tg128           510.1        6.43  2009.5 tok/s   156.8 tok/s       1.326   869.4 tok/s    13.26 GB
pp4096/tg128          1229.4        7.06  3332.4 tok/s   142.7 tok/s       2.126  1986.9 tok/s    13.50 GB
pp8192/tg128          2374.4        7.42  3450.6 tok/s   135.8 tok/s       3.317  2508.5 tok/s    13.66 GB
pp16384/tg128         4743.5        8.18  3454.2 tok/s   123.3 tok/s       5.782  2856.0 tok/s    13.90 GB

Continuous Batching
pp1024 / tg128
--------------------------------------------------------------------------------
Batch           tg TPS   Speedup        pp TPS    pp TPS/req    TTFT(ms)      E2E(s)
1x         156.8 tok/s     1.00x  2009.5 tok/s  2009.5 tok/s       510.1       1.326
2x         188.7 tok/s     1.20x  2724.2 tok/s  1362.1 tok/s       751.7       2.109
4x         247.4 tok/s     1.58x  2713.6 tok/s   678.4 tok/s      1465.4       3.579

medgemma-27b-text-it-MLX-4bit

oMLX - LLM inference, optimized for your Mac
https://github.com/jundot/omlx
Benchmark Model: medgemma-27b-text-it-MLX-4bit
================================================================================

Single Request Results
--------------------------------------------------------------------------------
Test                TTFT(ms)    TPOT(ms)        pp TPS        tg TPS      E2E(s)    Throughput    Peak Mem
pp1024/tg128          2702.4       59.42   379.3 tok/s    17.0 tok/s      10.248   112.5 tok/s    16.00 GB
pp4096/tg128          9072.4       60.54   451.6 tok/s    16.6 tok/s      16.761   252.1 tok/s    17.43 GB
pp8192/tg128         18032.9       61.63   454.3 tok/s    16.4 tok/s      25.859   321.8 tok/s    17.97 GB
pp16384/tg128        37958.5       64.37   431.7 tok/s    15.7 tok/s      46.134   357.9 tok/s    18.91 GB

Continuous Batching
pp1024 / tg128
--------------------------------------------------------------------------------
Batch           tg TPS   Speedup        pp TPS    pp TPS/req    TTFT(ms)      E2E(s)
1x          17.0 tok/s     1.00x   379.3 tok/s   379.3 tok/s      2702.4      10.248
2x          28.9 tok/s     1.70x   352.2 tok/s   176.1 tok/s      5815.2      14.671
4x          35.9 tok/s     2.11x   499.9 tok/s   125.0 tok/s      7772.3      22.466

Note: Fans went wild

gemma-4-31b-it-8bit (second run)

oMLX - LLM inference, optimized for your Mac
https://github.com/jundot/omlx
Benchmark Model: gemma-4-31b-it-8bit
================================================================================

Single Request Results
--------------------------------------------------------------------------------
Test                TTFT(ms)    TPOT(ms)        pp TPS        tg TPS      E2E(s)    Throughput    Peak Mem
pp1024/tg128          3197.7      122.20   320.2 tok/s     8.2 tok/s      18.717    61.5 tok/s    32.04 GB
pp4096/tg128         11781.6      125.59   347.7 tok/s     8.0 tok/s      27.731   152.3 tok/s    34.49 GB
pp8192/tg128         24612.7      128.39   332.8 tok/s     7.8 tok/s      40.918   203.3 tok/s    35.03 GB
pp16384/tg128        56271.4      133.19   291.2 tok/s     7.6 tok/s      73.187   225.6 tok/s    34.74 GB

Continuous Batching
pp1024 / tg128
--------------------------------------------------------------------------------
Batch           tg TPS   Speedup        pp TPS    pp TPS/req    TTFT(ms)      E2E(s)
1x           8.2 tok/s     1.00x   320.2 tok/s   320.2 tok/s      3197.7      18.717
2x          14.3 tok/s     1.74x   141.8 tok/s    70.9 tok/s     14235.9      32.324
4x          19.3 tok/s     2.35x   291.1 tok/s    72.8 tok/s     13649.5      40.657
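As a rough way to compare the models, the sketch below ranks them by single-request generation speed at pp1024 and shows how much that speed drops at pp16384. The figures are copied from the single-request tables in this post (the first gemma-4-31b-it-8bit run and the non-TurboQuant gemma-4-26B run are used); the shortened model labels are for display only.

```python
# model -> (tg TPS at pp1024, tg TPS at pp16384), copied from the tables above
tg_tps = {
    "gemma-4-26B-A4B 4bit": (80.2, 64.2),
    "Qwen3.6-35B-A3B 8bit": (69.0, 58.3),
    "gemma-4-31b 8bit": (8.3, 7.5),
    "LFM2-24B-A2B 4bit": (156.8, 123.3),
    "medgemma-27b 4bit": (17.0, 15.7),
}

# Percentage drop in generation speed going from a 1K to a 16K prompt
drop_pct = {
    model: (short - long) / short * 100.0
    for model, (short, long) in tg_tps.items()
}

# Fastest first, by short-context generation speed
ranking = sorted(tg_tps, key=lambda m: tg_tps[m][0], reverse=True)

for model in ranking:
    short, long = tg_tps[model]
    print(f"{model:<24} {short:>6.1f} -> {long:>6.1f} tok/s "
          f"({drop_pct[model]:.1f}% slower at 16K)")
```

In these runs LFM2-24B-A2B generates fastest and gemma-4-31b-it-8bit slowest, and the faster models lose a larger share of their speed at 16K than medgemma-27b and gemma-4-31b do.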
