tinker_cookbook.renderers.Renderer
class tinker_cookbook.renderers.Renderer(ABC)
Abstract base class for rendering message lists into training and sampling prompts.
Subclasses must implement:
- get_stop_sequences(): Return stop tokens/strings for sampling
- render_message(): Break a message into header/output/stop_overlap components
- parse_response(): Convert sampled tokens back into a Message
The default build_generation_prompt and build_supervised_example implementations assume simple concatenation of rendered messages. Override these if your renderer modifies the conversation structure (e.g., stripping thinking blocks from history).
Pickle support: Renderers created via get_renderer() are automatically pickleable.
On deserialization, the tokenizer and image processor are reconstructed from cached
loaders, so the cost is negligible. Renderers created directly (not via get_renderer())
must set _renderer_name and _model_name manually to be pickleable.
Implementations of EnvGroupBuilder must be pickleable to support distributed rollout
execution. Since many builders store a Renderer, this pickle support is critical.
Fields:
- tokenizer (Tokenizer)
-
supports_streaming (bool, default:
False) – Whether this renderer supports streaming response parsing.Renderers that set this to True get a default parse_response_streaming implementation using ReasoningStreamingParser. They must also define
_end_message_tokenand_parse_response_for_streaming.
property has_extension_property
Whether this renderer satisfies the sequence extension property.
A renderer has the extension property if, for any multi-turn conversation, calling build_generation_prompt at each successive assistant turn produces token sequences where each is a prefix of the next. This enables:
- Merging multiple timesteps into a single training datum
- KV-cache reuse during sampling
- O(T) compute scaling instead of O(T^2) for T-turn trajectories
Renderers that strip thinking blocks from history (like Qwen3Renderer with strip_thinking_from_history=True) do NOT have this property because the observation at timestep 2 is not a prefix of timestep 1's full sequence.
See the Tinker documentation on sequence extension for details.
Returns: bool
get_stop_sequences()
Return stop token IDs or strings that signal end-of-generation for the model.
Returns: list[str] | list[int]
Abstract method.
render_message(message, ctx)
Render a single message into its header/output/stop_overlap components.
This method breaks down a message into parts for loss masking. See RenderedMessage for detailed semantics of each component.
Parameters:
- message (Message) – The message to render.
- ctx (RenderContext) – Context about the message's position in the conversation, including index, is_last flag, and prev_message.
Returns: RenderedMessage – Container with header, output, and optionally stop_overlap components for loss masking.
Abstract method.
parse_response(response)
Parse sampled tokens back into a Message.
Parameters:
- response (list[int]) – Token IDs returned from sampling.
Returns: tuple[Message, ParseTermination] – A (message, termination) tuple. termination is an explicit signal — see ParseTermination. A best-effort Message is always returned, even on MALFORMED, so callers can still log / display partial output.
Abstract method.
parse_response_streaming(response)
Parse response tokens with streaming, yielding incremental deltas.
This enables real-time display of model output by yielding partial content as tokens arrive, rather than waiting for the complete response.
Renderers that set supports_streaming = True get a default
implementation using ReasoningStreamingParser. Others raise
NotImplementedError.
Parameters:
- response (list[int]) – Token IDs from the model.
Returns: Iterator[MessageDelta]
to_openai_message(message)
Convert a Message to OpenAI chat completions API format.
The returned object can be passed into the transformers library's apply_chat_template function, which is useful for testing purposes.
It's also useful for querying models that are being served through OpenAI-compatible APIs (OpenRouter, vLLM, etc.).
The base implementation handles:
- Basic role/content conversion
- tool_calls conversion from ToolCall objects to OpenAI dict format
- tool_call_id and name for tool response messages
Subclasses should override this to handle model-specific features like reasoning_content for thinking models.
Parameters:
Returns: dict – A dict in OpenAI API message format.
create_conversation_prefix_with_tools(tools, system_prompt)
Create message(s) with tool specifications to prepend to conversations.
Returns one or more messages to prepend to the conversation. This is the standard way to add tools - the returned messages should be placed at the start of your message list before user/assistant messages.
Parameters:
- tools (list[ToolSpec]) – List of tool specifications.
- system_prompt (str) – The system prompt content.
Returns: list[Message] – List of messages to prepend to the conversation.
Raises:
- NotImplementedError: If the renderer doesn't support tool calling.
build_generation_prompt(messages, role, prefill)
Convert a message list to a token prompt for sampling.
Parameters:
- messages (list[Message]) – A list of messages to render.
- role (Role) – The role of the partial message to be completed. Defaults to
"assistant". - prefill (str | None) – An optional string to prefill in the model's generation. Useful for constraining the start of the model's output.
Returns: tinker.ModelInput – A ModelInput containing the tokenized prompt.
build_supervised_examples(messages, train_on_what)
Build tokens and per-token weights for supervised fine-tuning.
Returns a list of (model_input, weights) tuples. Multiple examples are needed when the renderer does not satisfy the extension property.
Parameters:
- messages (list[Message]) – The conversation to render.
- train_on_what (TrainOnWhat) – Which parts of the sequence to compute loss on.
Returns: list[tuple[tinker.ModelInput, torch.Tensor]] – A list of (ModelInput, weight_tensor) tuples for training.
build_supervised_example(messages, train_on_what)
Build tokens and per-token weights for supervised fine-tuning.
This default implementation concatenates rendered messages in order. Override this method if your build_generation_prompt does anything that breaks the simple concatenation assumption—for example, if it strips thinking blocks from history (like Qwen3Renderer), injects default system prompts (like KimiK2Renderer), or otherwise modifies the token sequence.
The supervised example tokens should match what build_generation_prompt would produce for the same conversation prefix, so the model trains on the same distribution it sees at inference time.
Parameters:
- messages (list[Message]) – A list of messages to render.
- train_on_what (TrainOnWhat) – Controls which tokens receive non-zero training weight:
Returns: tuple[tinker.ModelInput, torch.Tensor] – A (model_input, weights) tuple where weights is a 1-D float tensor with the same length as the total number of tokens.