Models & Pricing
All prices are per million tokens.
| Model | Tinker ID | Type | Arch | Size | Context | Prefill | Sample | Train |
|---|---|---|---|---|---|---|---|---|
Pricing Terms
- Prefill: Processing input/prompt tokens (forward pass only)
- Sample: Generating output tokens (forward pass + sampling)
- Train: Forward and backward pass for gradient computation
- Context: Maximum sequence length. Models with a `:peft:` suffix support extended context at higher prices.
- Tinker ID: The exact string to pass to `create_lora_training_client(base_model=...)` or `create_sampling_client(base_model=...)`
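Since Prefill, Sample, and Train are each billed per million tokens, a run's cost is a simple linear combination of its token counts. A minimal sketch of that arithmetic, with placeholder rates (not actual Tinker prices — see the console for real figures):

```python
def estimate_cost_usd(prefill_tokens: int, sample_tokens: int, train_tokens: int,
                      prefill_rate: float, sample_rate: float, train_rate: float) -> float:
    """Estimate run cost in USD; rates are USD per million tokens."""
    per_million = 1_000_000
    return (prefill_tokens / per_million * prefill_rate
            + sample_tokens / per_million * sample_rate
            + train_tokens / per_million * train_rate)

# Hypothetical rates: $0.10 prefill, $0.40 sample, $0.60 train per 1M tokens
cost = estimate_cost_usd(2_000_000, 500_000, 1_000_000, 0.10, 0.40, 0.60)
# 2.0 * 0.10 + 0.5 * 0.40 + 1.0 * 0.60 = $1.00
```

Note that training a single example incurs Train pricing on all of its tokens, while inference splits the same tokens across the cheaper Prefill rate and the Sample rate.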
MoE models are priced by active parameters, making them significantly more cost-effective than dense models of similar quality.
Choosing a Model
- Cost-effective: Use MoE models (highlighted in amber)
- Research/post-training: Use Base models
- Task-specific fine-tuning: Start with an Instruction or Hybrid model
- Low latency: Use Instruction models (no chain-of-thought)
- High intelligence: Use Reasoning or Hybrid models (chain-of-thought)
- Vision tasks: Use models with Vision in the type
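The guidance above boils down to a goal-to-type lookup. A sketch of that mapping for use in selection scripts; the goal keys are illustrative shorthand, not Tinker API identifiers:

```python
# Model *type* to prefer for each goal, per the guidance above.
# These are type names from the pricing table, not base_model strings.
MODEL_TYPE_FOR_GOAL = {
    "cost_effective": "MoE",
    "research_post_training": "Base",
    "task_specific_fine_tuning": "Instruction or Hybrid",
    "low_latency": "Instruction",       # no chain-of-thought
    "high_intelligence": "Reasoning or Hybrid",  # chain-of-thought
    "vision": "Vision",
}
```

Whichever type you land on, pass that model's Tinker ID from the table as `base_model` when creating a training or sampling client.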
For the latest pricing, see the Tinker Console.