Pick a model, pick the hardware, and see where the system will bottleneck and how many tokens/sec to expect.
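As a rough illustration of the kind of estimate involved, here is a minimal roofline-style sketch, not this tool's actual method. It assumes each decode step streams all weights from memory once and costs about 2 FLOPs per parameter per token, and it ignores KV-cache traffic, interconnect, and framework overhead. The `Model` and `Device` classes, the `estimate_decode` function, and the numbers in the usage lines are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class Model:
    params_billion: float   # parameter count, in billions
    bytes_per_param: float  # e.g. 2.0 for FP16, roughly 0.5 for 4-bit weights


@dataclass
class Device:
    mem_bandwidth_gbs: float  # memory bandwidth in GB/s
    peak_tflops: float        # peak compute in TFLOP/s


def estimate_decode(model: Model, device: Device, batch_size: int = 1):
    """Return (bottleneck, tokens_per_sec) for one decode step.

    Assumption: a step is limited by whichever is slower, reading the
    weights once or doing ~2 FLOPs per parameter per generated token.
    """
    weight_gb = model.params_billion * model.bytes_per_param
    t_mem = weight_gb / device.mem_bandwidth_gbs           # seconds per step
    flops = 2.0 * model.params_billion * 1e9 * batch_size  # FLOPs per step
    t_compute = flops / (device.peak_tflops * 1e12)        # seconds per step
    step_time = max(t_mem, t_compute)
    bottleneck = "memory bandwidth" if t_mem >= t_compute else "compute"
    return bottleneck, batch_size / step_time


# Illustrative numbers only, not the specs of any real device.
model = Model(params_billion=7, bytes_per_param=2.0)
device = Device(mem_bandwidth_gbs=1000, peak_tflops=100)
print(estimate_decode(model, device, batch_size=1))
print(estimate_decode(model, device, batch_size=256))
```

Under these assumptions a small batch is limited by memory bandwidth, while a large enough batch shifts the limit to compute, which is one reason batch and single-stream token rates are reported separately.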
Pick a template per device, or edit any field to create a custom spec. The device library saves custom builds for reuse.
Real-world token rates from vendor, community, and research sources. Use the filters to find a reference close to your configuration.
| Model | Quantization | Framework | Hardware | Batch Size | Sequence Length | Token Rate, Batch (tokens/sec) | Token Rate, Single (tokens/sec) | Source |
|---|---|---|---|---|---|---|---|---|