Independent analysis of AI. Compare models by intelligence, speed, and price.
| Model | Creator | Context Window | Intelligence | Price ($/M) | Speed (tokens/s) | Latency (first chunk, s) | End-to-End Response (s) |
|---|---|---|---|---|---|---|---|
| Claude Opus 4.8 (max) `claude-opus-4-8-max` |  | 1M | 61.00 | $4.10/M | 58 tok/s | 17.47 s | 26.05 s |
| GPT-5.5 (xhigh) `gpt-5-5-xhigh` |  | 922k | 60.00 | $4.35/M | 57 tok/s | 73.96 s | 82.77 s |
| GPT-5.5 (high) `gpt-5-5-high` |  | 922k | 59.00 | $4.35/M | 51 tok/s | 17.22 s | 26.94 s |
| Claude Opus 4.7 (max) `claude-opus-4-7-max` |  | 1M | 57.00 | $4.10/M | 45 tok/s | 10.79 s | 21.91 s |
| Gemini 3.1 Pro Preview `gemini-3-1-pro-preview` |  | 1M | 57.00 | $1.74/M | 133 tok/s | 19.85 s | 23.60 s |
| GPT-5.5 (medium) `gpt-5-5-medium` |  | 922k | 57.00 | $4.35/M | 50 tok/s | 6.91 s | 16.91 s |
| Qwen3.7 Max `qwen3-7-max` |  | 1M | 57.00 | $1.43/M | 192 tok/s | 2.58 s | 17.71 s |
| Gemini 3.5 Flash `gemini-3-5-flash` |  | 1M | 55.00 | $1.31/M | 183 tok/s | 18.79 s | 21.52 s |
| Gemini 3.5 Flash (medium) `gemini-3-5-flash-medium` |  | 1M | 55.00 | $1.31/M | 174 tok/s | 13.85 s | 16.72 s |
| Kimi K2.6 `kimi-k2-6` | Kimi | 256k | 54.00 | $0.70/M | 44 tok/s | 2.31 s | 114.04 s |
| MiMo-V2.5-Pro `mimo-v2-5-pro` | Xiaomi | 1M | 54.00 | $0.18/M | 50 tok/s | 3.32 s | 53.27 s |
| GPT-5.3 Codex (xhigh) `gpt-5-3-codex-xhigh` |  | 400k | 54.00 | $1.87/M | 80 tok/s | 95.05 s | 101.33 s |
| Grok 4.3 (high) `grok-4-3-high` | xAI | 1M | 53.00 | $0.64/M | 145 tok/s | 10.94 s | 14.39 s |
| Muse Spark `muse-spark` |  | 262k | 52.00 | — | — | — | — |
| Claude Opus 4.7 (Non-reasoning, high) `claude-opus-4-7-non-reasoning-high` |  | 1M | 52.00 | $4.10/M | 42 tok/s | 1.17 s | 13.01 s |
| Claude Sonnet 4.6 (max) `claude-sonnet-4-6-max` |  | 1M | 52.00 | $2.46/M | 46 tok/s | 106.64 s | 117.50 s |
| DeepSeek V4 Pro (Max) `deepseek-v4-pro-max` |  | 1M | 52.00 | $0.18/M | 48 tok/s | 1.78 s | 104.11 s |
| GLM-5.1 `glm-5-1` | Z AI | 200k | 51.00 | $0.90/M | 62 tok/s | 1.51 s | 70.81 s |
| GPT-5.5 (low) `gpt-5-5-low` |  | 922k | 51.00 | $4.35/M | 53 tok/s | 1.67 s | 11.12 s |
| Qwen3.6 Plus `qwen3-6-plus` |  | 1M | 50.00 | $0.43/M | 53 tok/s | 2.93 s | 116.85 s |
| DeepSeek V4 Pro (High) `deepseek-v4-pro-high` |  | 1M | 50.00 | $0.18/M | 46 tok/s | 1.78 s | 55.68 s |
| MiniMax-M2.7 `minimax-m2-7` | MiniMax | 205k | 50.00 | $0.22/M | 108 tok/s | 2.65 s | 30.19 s |
| MiMo-V2.5 `mimo-v2-5` | Xiaomi | 1M | 49.00 | $0.06/M | 93 tok/s | 2.88 s | 29.82 s |
| GPT-5.4 mini (xhigh) `gpt-5-4-mini-xhigh` |  | 400k | 49.00 | $0.65/M | 164 tok/s | 5.09 s | 8.13 s |
| Grok 4.3 (medium) `grok-4-3-medium` | xAI | 1M | 49.00 | $0.64/M | 146 tok/s | 7.28 s | 10.70 s |
| GLM-5-Turbo `glm-5-turbo` | Z AI | 200k | 47.00 | — | — | — | — |
| DeepSeek V4 Flash (Max) `deepseek-v4-flash-max` |  | 1M | 47.00 | $0.06/M | 111 tok/s | 1.22 s | 56.09 s |
| DeepSeek V4 Flash (High) `deepseek-v4-flash-high` |  | 1M | 46.00 | $0.08/M | — | — | — |
| Qwen3.6 27B `qwen3-6-27b` |  | 262k | 46.00 | $0.90/M | 57 tok/s | 3.86 s | 112.65 s |
| Qwen3.5 397B A17B `qwen3-5-397b-a17b` |  | 262k | 45.00 | $0.90/M | 53 tok/s | 2.54 s | 72.50 s |
| MiMo-V2-Omni-0327 `mimo-v2-omni-0327` | Xiaomi | 256k | 45.00 | $0.34/M | 93 tok/s | 2.83 s | 29.70 s |
| Claude Sonnet 4.6 (Non-reasoning) `claude-sonnet-4-6-non-reasoning` |  | 1M | 44.00 | $2.46/M | 42 tok/s | 1.17 s | 13.10 s |
| GPT-5.4 nano (xhigh) `gpt-5-4-nano-xhigh` |  | 400k | 44.00 | $0.18/M | 152 tok/s | 4.25 s | 7.54 s |
| Grok 4.3 (low) `grok-4-3-low` | xAI | 1M | 44.00 | $0.64/M | 113 tok/s | 4.27 s | 8.71 s |
| GLM-5.1 `glm-5-1` | Z AI | 200k | 44.00 | $0.90/M | 49 tok/s | 1.75 s | 11.93 s |
| Qwen3.6 35B A3B `qwen3-6-35b-a3b` |  | 262k | 43.00 | $0.37/M | 174 tok/s | 2.41 s | 16.77 s |
| MiMo-V2-Omni `mimo-v2-omni` | Xiaomi | 256k | 43.00 | $0.00/M | 91 tok/s | 3.71 s | 31.26 s |
| Gemini 3.5 Flash (minimal) `gemini-3-5-flash-minimal` |  | 1M | 43.00 | $1.31/M | 169 tok/s | 0.90 s | 3.86 s |
| Kimi K2.6 `kimi-k2-6` | Kimi | 256k | 43.00 | $0.70/M | 45 tok/s | 2.34 s | 13.50 s |
| GLM 5V Turbo `glm-5v-turbo` | Z AI | 200k | 43.00 | — | — | — | — |
| Claude Sonnet 4.6 (Non-reasoning, Low Effort) `claude-sonnet-4-6-non-reasoning-low-effort` |  | 1M | 43.00 | $2.46/M | 42 tok/s | 1.26 s | 13.07 s |
| Hy3-preview `hy3-preview` | Tencent | 256k | 42.00 | $0.10/M | 98 tok/s | 3.90 s | 29.34 s |
| GPT-5.5 Instant (May 2026) `gpt-5-5-instant-may-2026` |  | 400k | 42.00 | $4.35/M | — | — | — |
| Qwen3.5 122B A10B `qwen3-5-122b-a10b` |  | 262k | 42.00 | $0.68/M | 140 tok/s | 2.55 s | 20.43 s |
| MiMo-V2-Flash (Feb 2026) `mimo-v2-flash-feb-2026` | Xiaomi | 256k | 41.00 | $0.06/M | 131 tok/s | 1.98 s | 21.04 s |
| GPT-5.5 (Non-reasoning) `gpt-5-5-non-reasoning` |  | 922k | 41.00 | $4.35/M | 51 tok/s | 0.96 s | 10.68 s |
| Qwen3.5 397B A17B `qwen3-5-397b-a17b` |  | 262k | 40.00 | $0.90/M | 53 tok/s | 2.50 s | 11.86 s |
| DeepSeek V4 Pro `deepseek-v4-pro` |  | 1M | 39.00 | $0.18/M | 52 tok/s | 1.93 s | 11.54 s |
| Mistral Medium 3.5 `mistral-medium-3-5` |  | 256k | 39.00 | $2.10/M | 148 tok/s | 1.83 s | 18.74 s |
| Gemma 4 31B `gemma-4-31b` |  | 256k | 39.00 | $0.00/M | 35 tok/s | 1.07 s | 64.15 s |
| Qwen3.5 Omni Plus `qwen3-5-omni-plus` |  | 256k | 39.00 | $0.84/M | 55 tok/s | 2.45 s | 11.48 s |
| Step 3.5 Flash 2603 `step-3-5-flash-2603` | StepFun | 256k | 38.00 | $0.00/M | 163 tok/s | 1.20 s | 16.56 s |
| Ring-2.6-1T `ring-2-6-1t` | InclusionAI | 262k | 38.00 | $0.52/M | 120 tok/s | 3.22 s | 24.09 s |
| o3 `o3` |  | 200k | 38.00 | $1.55/M | 131 tok/s | 5.98 s | 9.79 s |
| GPT-5.4 nano `gpt-5-4-nano` |  | 400k | 38.00 | $0.18/M | 148 tok/s | 3.01 s | 6.38 s |
| GPT-5.4 mini (medium) `gpt-5-4-mini-medium` |  | 400k | 38.00 | $0.65/M | 158 tok/s | 4.11 s | 7.26 s |
| Command A+ `command-a` |  | 192k | 37.00 | $0.00/M | 222 tok/s | 0.26 s | 11.54 s |
| Qwen3.6 27B `qwen3-6-27b` |  | 262k | 37.00 | $0.90/M | 58 tok/s | 3.86 s | 12.56 s |
| Claude 4.5 Haiku `claude-4-5-haiku` |  | 200k | 37.00 | $0.82/M | 92 tok/s | 20.98 s | 26.43 s |
| DeepSeek V4 Flash `deepseek-v4-flash` |  | 1M | 36.00 | $0.06/M | 114 tok/s | 1.38 s | 5.78 s |
| JT-35B-Flash `jt-35b-flash` | China Mobile | 256k | 36.00 | — | — | — | — |
| NVIDIA Nemotron 3 Super `nvidia-nemotron-3-super` | NVIDIA | 1M | 36.00 | $0.28/M | 187 tok/s | 1.82 s | 15.19 s |
| Qwen3.5 122B A10B `qwen3-5-122b-a10b` |  | 262k | 36.00 | $0.68/M | 162 tok/s | 2.51 s | 5.59 s |
| Nova 2.0 Pro Preview (medium) `nova-2-0-pro-preview-medium` | Amazon | 256k | 36.00 | $1.47/M | 114 tok/s | 12.99 s | 34.89 s |
| MiMo-V2.5-Pro `mimo-v2-5-pro` | Xiaomi | 1M | 36.00 | $0.58/M | 50 tok/s | 3.20 s | 13.28 s |
| Gemini 2.5 Pro `gemini-2-5-pro` |  | 1M | 35.00 | $1.34/M | 133 tok/s | 22.23 s | 25.99 s |
| Nova 2.0 Lite (high) `nova-2-0-lite-high` | Amazon | 1M | 35.00 | $0.52/M | 153 tok/s | 14.33 s | 30.68 s |
| Hy3-preview `hy3-preview` | Tencent | 256k | 34.00 | $0.10/M | 89 tok/s | 3.98 s | 9.62 s |
| Ling-2.6-1T `ling-2-6-1t` | InclusionAI | 262k | 34.00 | $0.52/M | — | — | — |
| Doubao Seed Code `doubao-seed-code` | ByteDance Seed | 256k | 34.00 | — | — | — | — |
| Gemini 3.1 Flash-Lite `gemini-3-1-flash-lite` |  | 1M | 34.00 | $0.22/M | 270 tok/s | 5.57 s | 7.43 s |
| gpt-oss-120b (high) `gpt-oss-120b-high` |  | 131k | 33.00 | $0.20/M | 329 tok/s | 0.86 s | 8.46 s |
| Mercury 2 `mercury-2` | Inception | 128k | 33.00 | $0.14/M | 785 tok/s | 3.11 s | 3.75 s |
| Qwen3.5 9B `qwen3-5-9b` |  | 262k | 32.00 | $0.11/M | 68 tok/s | 2.28 s | 38.84 s |
| Gemma 4 31B `gemma-4-31b` |  | 256k | 32.00 | $0.17/M | 18 tok/s | 1.39 s | 29.89 s |
| K-EXAONE `k-exaone` | LG AI Research | 256k | 32.00 | — | — | — | — |
| Nova 2.0 Pro Preview (low) `nova-2-0-pro-preview-low` | Amazon | 256k | 32.00 | $2.13/M | 117 tok/s | 11.16 s | 32.50 s |
| Trinity Large Thinking `trinity-large-thinking` | Arcee AI | 512k | 32.00 | $0.24/M | 157 tok/s | 1.15 s | 17.03 s |
| Qwen3.6 35B A3B `qwen3-6-35b-a3b` |  | 262k | 32.00 | $0.56/M | 178 tok/s | 2.52 s | 5.32 s |
| Gemma 4 26B A4B `gemma-4-26b-a4b` |  | 256k | 31.00 | $0.14/M | — | — | — |
| Claude 4.5 Haiku `claude-4-5-haiku` |  | 200k | 31.00 | $0.82/M | 90 tok/s | 0.77 s | 6.34 s |
| Grok 4.3 `grok-4-3` | xAI | 1M | 31.00 | $0.64/M | 110 tok/s | 0.63 s | 5.17 s |
| Qwen3.5 35B A3B `qwen3-5-35b-a3b` |  | 262k | 31.00 | $0.42/M | 154 tok/s | 2.13 s | 5.38 s |
| MiMo-V2-Flash `mimo-v2-flash` | Xiaomi | 256k | 30.00 | $0.12/M | 130 tok/s | 2.07 s | 5.92 s |
| EXAONE 4.5 33B `exaone-4-5-33b` | LG AI Research | 262k | 30.00 | — | — | — | — |
| Nova 2.0 Lite (medium) `nova-2-0-lite-medium` | Amazon | 1M | 30.00 | $0.52/M | 146 tok/s | 21.21 s | 38.33 s |
| ERNIE 5.0 Thinking Preview `ernie-5-0-thinking-preview` | Baidu | 128k | 29.00 | — | — | — | — |
| Nemotron Cascade 2 30B A3B `nemotron-cascade-2-30b-a3b` | NVIDIA | 1M | 28.00 | — | — | — | — |
| Qwen3 Coder Next `qwen3-coder-next` |  | 256k | 28.00 | $0.43/M | 107 tok/s | 1.62 s | 6.27 s |
| Nova 2.0 Omni (medium) `nova-2-0-omni-medium` | Amazon | 1M | 28.00 | $0.52/M | — | — | — |
| Mistral Small 4 `mistral-small-4` |  | 256k | 28.00 | $0.20/M | 181 tok/s | 0.70 s | 14.49 s |
| Qwen3.5 9B `qwen3-5-9b` |  | 262k | 27.00 | — | — | — | — |
| Magistral Medium 1.2 `magistral-medium-1-2` |  | 128k | 27.00 | $2.30/M | 39 tok/s | 1.90 s | 66.75 s |
| Gemma 4 26B A4B `gemma-4-26b-a4b` |  | 256k | 27.00 | $0.16/M | 83 tok/s | 1.59 s | 7.58 s |
| Qwen3.5 4B `qwen3-5-4b` |  | 262k | 27.00 | $0.04/M | 194 tok/s | 0.42 s | 13.31 s |
| Qwen3 Next 80B A3B `qwen3-next-80b-a3b` |  | 262k | 27.00 | $1.05/M | 137 tok/s | 2.31 s | 20.54 s |
| Ling 2.6 Flash `ling-2-6-flash` | InclusionAI | 262k | 26.00 | $0.06/M | — | — | — |
| Solar Pro 3 `solar-pro-3` | Upstage | 128k | 26.00 | — | — | — | — |
| Qwen3.5 Omni Flash `qwen3-5-omni-flash` |  | 256k | 26.00 | $0.17/M | 241 tok/s | 1.87 s | 3.95 s |
| JT-MINI `jt-mini` | China Mobile | 128k | 25.00 | — | — | — | — |
| Nova 2.0 Lite (low) `nova-2-0-lite-low` | Amazon | 1M | 25.00 | $0.52/M | 152 tok/s | 9.97 s | 26.43 s |
| gpt-oss-20B (high) `gpt-oss-20b-high` |  | 131k | 24.00 | $0.07/M | 235 tok/s | 0.74 s | 11.38 s |
| gpt-oss-120b (low) `gpt-oss-120b-low` |  | 131k | 24.00 | $0.20/M | 348 tok/s | 0.87 s | 8.04 s |
| GPT-5.4 nano `gpt-5-4-nano` |  | 400k | 24.00 | $0.18/M | 148 tok/s | 0.63 s | 3.99 s |
| NVIDIA Nemotron 3 Nano `nvidia-nemotron-3-nano` | NVIDIA | 1M | 24.00 | $0.07/M | 132 tok/s | 2.10 s | 21.02 s |
| LongCat Flash Lite `longcat-flash-lite` | LongCat | 256k | 24.00 | $0.00/M | 81 tok/s | 6.33 s | 12.48 s |
| K-EXAONE `k-exaone` | LG AI Research | 256k | 23.00 | — | — | — | — |
| GPT-5.4 mini `gpt-5-4-mini` |  | 400k | 23.00 | $0.65/M | 144 tok/s | 0.64 s | 4.10 s |
| Nova 2.0 Omni (low) `nova-2-0-omni-low` | Amazon | 1M | 23.00 | $0.52/M | — | — | — |
| Nova 2.0 Pro Preview `nova-2-0-pro-preview` | Amazon | 256k | 23.00 | $2.13/M | 123 tok/s | 1.08 s | 5.14 s |
| Mi:dm K 2.5 Pro `mi-dm-k-2-5-pro` | Korea Telecom | 128k | 23.00 | — | — | — | — |
| Mistral Large 3 `mistral-large-3` |  | 256k | 23.00 | $0.60/M | 52 tok/s | 1.09 s | 10.75 s |
| Qwen3.5 4B `qwen3-5-4b` |  | 262k | 23.00 | $0.04/M | 200 tok/s | 0.45 s | 2.95 s |
| INTELLECT-3 `intellect-3` | Prime Intellect | 131k | 22.00 | — | — | — | — |
| Devstral 2 `devstral-2` |  | 256k | 22.00 | $0.00/M | 66 tok/s | 1.21 s | 8.74 s |
| Solar Open 100B `solar-open-100b` | Upstage | 128k | 22.00 | — | — | — | — |
| Nemotron 3 Nano Omni 30B A3B Reasoning `nemotron-3-nano-omni-30b-a3b-reasoning` | NVIDIA | 256k | 21.00 | $0.10/M | 299 tok/s | 1.03 s | 9.40 s |
| gpt-oss-20B (low) `gpt-oss-20b-low` |  | 131k | 21.00 | $0.07/M | 242 tok/s | 0.78 s | 11.11 s |
| Qwen3 Next 80B A3B `qwen3-next-80b-a3b` |  | 262k | 20.00 | $0.65/M | 146 tok/s | 2.28 s | 5.70 s |
| Devstral Small 2 `devstral-small-2` |  | 256k | 19.00 | $0.00/M | 68 tok/s | 1.13 s | 8.50 s |
| Motif-2-12.7B `motif-2-12-7b` | Motif Technologies | 128k | 19.00 | — | — | — | — |
| Nova Premier `nova-premier` | Amazon | 1M | 19.00 | $2.18/M | 35 tok/s | 2.92 s | 17.28 s |
| Gemma 4 E4B `gemma-4-e4b` |  | 128k | 19.00 | — | — | — | — |
| Llama Nemotron Super 49B v1.5 `llama-nemotron-super-49b-v1-5` | NVIDIA | 128k | 19.00 | $0.13/M | 47 tok/s | 1.34 s | 54.61 s |
| Mistral Small 4 `mistral-small-4` |  | 256k | 19.00 | $0.20/M | 157 tok/s | 0.69 s | 3.86 s |
| Llama 4 Maverick `llama-4-maverick` |  | 1M | 18.00 | $0.34/M | 109 tok/s | 1.01 s | 5.61 s |
| Magistral Small 1.2 `magistral-small-1-2` |  | 128k | 18.00 | $0.60/M | 108 tok/s | 0.81 s | 24.00 s |
| Sarvam 105B (high) `sarvam-105b-high` | Sarvam | 128k | 18.00 | $0.04/M | 90 tok/s | 2.09 s | 29.99 s |
| Nova 2.0 Lite `nova-2-0-lite` | Amazon | 1M | 18.00 | $0.52/M | 141 tok/s | 1.34 s | 4.89 s |
| MiniCPM5-1B `minicpm5-1b` | OpenBMB | 128k | 18.00 | — | — | — | — |
| Llama 3.1 405B `llama-3-1-405b` |  | 128k | 17.00 | $3.13/M | 35 tok/s | 2.36 s | 16.67 s |
| EXAONE 4.0 32B `exaone-4-0-32b` | LG AI Research | 131k | 17.00 | — | — | — | — |
| Nova 2.0 Omni `nova-2-0-omni` | Amazon | 1M | 17.00 | $0.52/M | — | — | — |
| Qwen3.5 2B `qwen3-5-2b` |  | 262k | 16.00 | $0.03/M | — | — | — |
| Nanbeige4.1-3B `nanbeige4-1-3b` | Nanbeige | 256k | 16.00 | — | — | — | — |
| Ministral 3 14B `ministral-3-14b` |  | 256k | 16.00 | $0.20/M | 77 tok/s | 0.80 s | 7.30 s |
| Falcon-H1R-7B `falcon-h1r-7b` | TII UAE | 256k | 16.00 | — | — | — | — |
| Qwen3 Omni 30B A3B `qwen3-omni-30b-a3b` |  | 66k | 16.00 | $0.32/M | 91 tok/s | 1.96 s | 29.41 s |
| Step3 VL 10B `step3-vl-10b` | StepFun | 66k | 15.00 | — | — | — | — |
| Gemma 4 E2B `gemma-4-e2b` |  | 128k | 15.00 | — | — | — | — |
| Llama Nemotron Ultra `llama-nemotron-ultra` | NVIDIA | 128k | 15.00 | $0.72/M | 52 tok/s | 2.43 s | 50.77 s |
| ERNIE 4.5 300B A47B `ernie-4-5-300b-a47b` | Baidu | 131k | 15.00 | $0.36/M | 25 tok/s | 3.57 s | 23.42 s |
| Solar Pro 2 `solar-pro-2` | Upstage | 66k | 15.00 | — | — | — | — |
| NVIDIA Nemotron Nano 12B v2 VL `nvidia-nemotron-nano-12b-v2-vl` | NVIDIA | 128k | 15.00 | $0.24/M | — | — | — |
| Ministral 3 8B `ministral-3-8b` |  | 256k | 15.00 | $0.15/M | 97 tok/s | 0.64 s | 5.79 s |
| Gemma 4 E4B `gemma-4-e4b` |  | 128k | 15.00 | — | — | — | — |
| NVIDIA Nemotron Nano 9B V2 `nvidia-nemotron-nano-9b-v2` | NVIDIA | 131k | 15.00 | $0.05/M | 122 tok/s | 0.70 s | 21.19 s |
| Granite 4.1 30B `granite-4-1-30b` | IBM | 131k | 15.00 | — | — | — | — |
| NVIDIA Nemotron 3 Nano 4B `nvidia-nemotron-3-nano-4b` | NVIDIA | 262k | 15.00 | — | — | — | — |
| Qwen3.5 2B `qwen3-5-2b` |  | 262k | 15.00 | $0.03/M | 247 tok/s | 0.42 s | 2.45 s |
| Llama Nemotron Super 49B v1.5 `llama-nemotron-super-49b-v1-5` | NVIDIA | 128k | 15.00 | $0.13/M | 48 tok/s | 1.30 s | 11.67 s |
| Llama 3.3 70B `llama-3-3-70b` |  | 128k | 14.00 | $0.60/M | 81 tok/s | 1.61 s | 7.79 s |
| Kimi Linear 48B A3B Instruct `kimi-linear-48b-a3b-instruct` | Kimi | 1M | 14.00 | — | — | — | — |
| Ring-flash-2.0 `ring-flash-2-0` | InclusionAI | 128k | 14.00 | $0.18/M | — | — | — |
| Solar Pro 2 `solar-pro-2` | Upstage | 66k | 14.00 | — | — | — | — |
| Llama 4 Scout `llama-4-scout` |  | 10M | 14.00 | $0.22/M | 106 tok/s | 0.86 s | 5.58 s |
| Command A `command-a` |  | 256k | 13.00 | $3.25/M | 54 tok/s | 1.80 s | 11.12 s |
| Llama 3.1 Nemotron 70B `llama-3-1-nemotron-70b` | NVIDIA | 128k | 13.00 | $1.20/M | 292 tok/s | 0.50 s | 2.21 s |
| NVIDIA Nemotron 3 Nano `nvidia-nemotron-3-nano` | NVIDIA | 1M | 13.00 | $0.07/M | 87 tok/s | 0.42 s | 6.17 s |
| NVIDIA Nemotron Nano 9B V2 `nvidia-nemotron-nano-9b-v2` | NVIDIA | 131k | 13.00 | $0.06/M | 142 tok/s | 1.03 s | 4.54 s |
| MiniCPM-V 4.6 1.3B `minicpm-v-4-6-1-3b` | OpenBMB | 262k | 13.00 | — | — | — | — |
| Granite 4.1 8B `granite-4-1-8b` | IBM | 131k | 12.00 | $0.06/M | 108 tok/s | 0.79 s | 5.42 s |
| Sarvam 30B (high) `sarvam-30b-high` | Sarvam | 66k | 12.00 | $0.03/M | 163 tok/s | 1.94 s | 17.29 s |
| Gemma 4 E2B `gemma-4-e2b` |  | 128k | 12.00 | — | — | — | — |
| R1 1776 `r1-1776` |  | 128k | 12.00 | — | — | — | — |
| Llama 3.2 90B (Vision) `llama-3-2-90b-vision` |  | 128k | 12.00 | $1.38/M | 58 tok/s | 1.24 s | 9.83 s |
| EXAONE 4.0 32B `exaone-4-0-32b` | LG AI Research | 131k | 12.00 | — | — | — | — |
| Ministral 3 3B `ministral-3-3b` |  | 256k | 11.00 | $0.10/M | 192 tok/s | 0.51 s | 3.12 s |
| Jamba 1.7 Large `jamba-1-7-large` | AI21 Labs | 256k | 11.00 | $2.60/M | 62 tok/s | 1.59 s | 9.60 s |
| Granite 4.0 H Small `granite-4-0-h-small` | IBM | 128k | 11.00 | $0.08/M | 350 tok/s | 10.36 s | 11.79 s |
| Qwen3 Omni 30B A3B `qwen3-omni-30b-a3b` |  | 66k | 11.00 | $0.32/M | 96 tok/s | 2.06 s | 7.26 s |
| Qwen3.5 0.8B `qwen3-5-0-8b` |  | 262k | 11.00 | $0.01/M | — | — | — |
| LFM2 24B A2B `lfm2-24b-a2b` | Liquid AI | 33k | 10.00 | $0.04/M | 120 tok/s | 0.58 s | 4.76 s |
| Phi-4 `phi-4` | Microsoft | 16k | 10.00 | $0.16/M | 38 tok/s | 2.02 s | 15.10 s |
| Nova Micro `nova-micro` | Amazon | 130k | 10.00 | $0.03/M | 290 tok/s | 0.92 s | 2.64 s |
| NVIDIA Nemotron Nano 12B v2 VL `nvidia-nemotron-nano-12b-v2-vl` | NVIDIA | 128k | 10.00 | $0.24/M | 227 tok/s | 1.06 s | 3.26 s |
| Phi-4 Multimodal `phi-4-multimodal` | Microsoft | 128k | 10.00 | $0.00/M | 17 tok/s | 1.06 s | 31.30 s |
| Qwen3.5 0.8B `qwen3-5-0-8b` |  | 262k | 10.00 | $0.01/M | 69 tok/s | 0.44 s | 7.64 s |
| Jamba Reasoning 3B `jamba-reasoning-3b` | AI21 Labs | 262k | 10.00 | — | — | — | — |
| Reka Flash 3 `reka-flash-3` | Reka AI | 128k | 10.00 | $0.26/M | — | — | — |
| Ling-mini-2.0 `ling-mini-2-0` | InclusionAI | 131k | 9.00 | — | — | — | — |
| Llama 3.2 11B (Vision) `llama-3-2-11b-vision` |  | 128k | 9.00 | $0.25/M | 53 tok/s | 0.70 s | 10.18 s |
| Granite 4.1 3B `granite-4-1-3b` | IBM | 131k | 9.00 | — | — | — | — |
| Phi-4 Mini `phi-4-mini` | Microsoft | 128k | 8.00 | $0.00/M | — | — | — |
| Exaone 4.0 1.2B `exaone-4-0-1-2b` | LG AI Research | 64k | 8.00 | — | — | — | — |
| Exaone 4.0 1.2B `exaone-4-0-1-2b` | LG AI Research | 64k | 8.00 | — | — | — | — |
| LFM2.5-1.2B-Thinking `lfm2-5-1-2b-thinking` | Liquid AI | 32k | 8.00 | — | — | — | — |
| Jamba 1.7 Mini `jamba-1-7-mini` | AI21 Labs | 258k | 8.00 | — | — | — | — |
| LFM2 2.6B `lfm2-2-6b` | Liquid AI | 33k | 8.00 | $0.00/M | — | — | — |
| LFM2.5-1.2B-Instruct `lfm2-5-1-2b-instruct` | Liquid AI | 32k | 8.00 | $0.00/M | — | — | — |
| Granite 4.0 H 1B `granite-4-0-h-1b` | IBM | 128k | 8.00 | — | — | — | — |
| Gemma 3 270M `gemma-3-270m` |  | 32k | 8.00 | — | — | — | — |
| Apertus 70B Instruct `apertus-70b-instruct` | Swiss AI Initiative | 66k | 8.00 | $1.03/M | — | — | — |
| Granite 4.0 Micro `granite-4-0-micro` | IBM | 128k | 8.00 | — | — | — | — |
| Granite 4.0 1B `granite-4-0-1b` | IBM | 128k | 7.00 | — | — | — | — |
| LFM2 8B A1B `lfm2-8b-a1b` | Liquid AI | 33k | 7.00 | $0.00/M | — | — | — |
| LFM2.5-VL-1.6B `lfm2-5-vl-1-6b` | Liquid AI | 32k | 6.00 | $0.00/M | — | — | — |
| Granite 4.0 350M `granite-4-0-350m` | IBM | 33k | 6.00 | — | — | — | — |
| Apertus 8B Instruct `apertus-8b-instruct` | Swiss AI Initiative | 66k | 6.00 | $0.11/M | — | — | — |
| Granite 4.0 H 350M `granite-4-0-h-350m` | IBM | 33k | 5.00 | — | — | — | — |
| Tiny Aya Global `tiny-aya-global` |  | 8k | 5.00 | $0.00/M | — | — | — |
| EXAONE 4.5 33B `exaone-4-5-33b` | LG AI Research | 262k | — | — | — | — | — |
| Gemini 3 Deep Think `gemini-3-deep-think` |  | 128k | — | — | — | — | — |
| Mi:dm K 2.5 Pro Preview `mi-dm-k-2-5-pro-preview` | Korea Telecom | 128k | — | — | — | — | — |
| GPT-5.5 Pro (xhigh) `gpt-5-5-pro-xhigh` |  | 922k | — | — | — | — | — |