Tutorial: Rendering
Rendering converts a list of messages into a token sequence that a model can consume. While similar to HuggingFace chat templates, Tinker's rendering system handles the full training lifecycle: supervised learning, reinforcement learning, and deployment.
The renderer sits between your high-level conversation data and the low-level tokens the model sees.
This tutorial covers the Renderer class and its key methods.
Setup
We need a tokenizer (to map between text and token IDs) and a renderer (to apply the model's chat format).
from tinker_cookbook import renderers, tokenizer_utils
tokenizer = tokenizer_utils.get_tokenizer("Qwen/Qwen3-30B-A3B")
renderer = renderers.get_renderer("qwen3", tokenizer)
Example conversation
We will use this conversation throughout the tutorial.
messages = [
    {"role": "system", "content": "Answer concisely; at most one sentence per response"},
    {"role": "user", "content": "What is the longest-lived rodent species?"},
    {"role": "assistant", "content": "The naked mole rat, which can live over 30 years."},
    {"role": "user", "content": "How do they live so long?"},
    {
        "role": "assistant",
        "content": "They evolved multiple protective mechanisms including special hyaluronic acid that prevents cancer, extremely stable proteins, and efficient DNA repair systems that work together to prevent aging.",
    },
]
build_generation_prompt() -- for sampling
Converts a conversation into a token prompt ready for the model to continue. This is used during RL rollouts and at deployment time.
Typically you pass all messages except the final assistant reply, so the model generates its own response.
# Remove the last assistant message so the model can generate one
prompt = renderer.build_generation_prompt(messages[:-1])
print("ModelInput:", prompt)
print()
print("Decoded tokens:")
print(tokenizer.decode(prompt.to_ints()))
Output
ModelInput: ModelInput(chunks=[EncodedTextChunk(tokens=[151644, 8948, 198], type='encoded_text'), EncodedTextChunk(tokens=[16141, 3529, 285, 974, 26, 518, 1429, 825, 11652, 817, 2033, 151645], type='encoded_text'), EncodedTextChunk(tokens=[198, 151644, 872, 198], type='encoded_text'), EncodedTextChunk(tokens=[3838, 374, 279, 22032, 61854, 20589, 306, 9419, 30, 151645], type='encoded_text'), EncodedTextChunk(tokens=[198, 151644, 77091, 198], type='encoded_text'), EncodedTextChunk(tokens=[785, 19020, 34651, 11244, 11, 892, 646, 3887, 916, 220, 18, 15, 1635, 13, 151645], type='encoded_text'), EncodedTextChunk(tokens=[198, 151644, 872, 198], type='encoded_text'), EncodedTextChunk(tokens=[4340, 653, 807, 3887, 773, 1293, 30, 151645], type='encoded_text'), EncodedTextChunk(tokens=[198, 151644, 77091, 198], type='encoded_text')])
Decoded tokens:
<|im_start|>system
Answer concisely; at most one sentence per response<|im_end|>
<|im_start|>user
What is the longest-lived rodent species?<|im_end|>
<|im_start|>assistant
The naked mole rat, which can live over 30 years.<|im_end|>
<|im_start|>user
How do they live so long?<|im_end|>
<|im_start|>assistant
The output is a ModelInput object containing the tokenized chat template. Notice how each message is wrapped in special tokens like <|im_start|> and <|im_end|>, and the final <|im_start|>assistant is left open for the model to fill in.
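If you want to poke at the result, the same to_ints() and decode() calls used above are enough. Here is a minimal sketch (the slice of the last four tokens is just for illustration):
# Minimal sketch: inspect the rendered prompt with the calls used above
prompt_ids = prompt.to_ints()
print(f"Prompt length: {len(prompt_ids)} tokens")
# The prompt ends with the open assistant header the model will continue from
print(repr(tokenizer.decode(prompt_ids[-4:])))  # '\n<|im_start|>assistant\n'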
get_stop_sequences() -- stop tokens
When sampling, we need to know when the model has finished its response. get_stop_sequences() returns the token IDs (or strings) that signal end-of-generation.
stop_sequences = renderer.get_stop_sequences()
print(f"Stop sequences: {stop_sequences}")
# For Qwen3, this is the <|im_end|> token
for tok in stop_sequences:
    if isinstance(tok, int):
        print(f" Token {tok} decodes to: {repr(tokenizer.decode([tok]))}")
parse_response() -- decoding tokens back to a message
After sampling, you get raw token IDs. parse_response() converts them back into a structured message dict.
# Simulate some sampled tokens (in practice these come from the model)
fake_tokens = [45, 7741, 34651, 31410, 614, 4911, 76665, 13, 151645]
parsed_message, parse_success = renderer.parse_response(fake_tokens)
print(f"Parsed message: {parsed_message}")
print(f"Parse success: {parse_success}")
Putting it together: sampling a response
Here is the full pattern for generating a message from a model. This requires a running Tinker service (and TINKER_API_KEY).
import tinker
from tinker.types import SamplingParams
service_client = tinker.ServiceClient()
sampling_client = service_client.create_sampling_client(base_model="Qwen/Qwen3-30B-A3B")
prompt = renderer.build_generation_prompt(messages[:-1])
stop_sequences = renderer.get_stop_sequences()
sampling_params = SamplingParams(max_tokens=100, temperature=0.5, stop=stop_sequences)
output = sampling_client.sample(prompt, sampling_params=sampling_params, num_samples=1).result()
sampled_message, success = renderer.parse_response(output.sequences[0].tokens)
print(sampled_message)
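From here the parsed message can be fed straight back into the conversation for another turn. A minimal sketch, reusing the objects defined above (the follow-up question is invented for illustration):
# Sketch: append the sampled reply and a new user turn, then re-render
conversation = messages[:-1] + [sampled_message]
conversation.append({"role": "user", "content": "Can humans benefit from any of those mechanisms?"})
next_prompt = renderer.build_generation_prompt(conversation)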
build_supervised_example() -- for training
For supervised fine-tuning, we need to distinguish prompt tokens (context the model reads) from completion tokens (what the model should learn to produce). build_supervised_example() returns both the tokens and per-token loss weights.
- Weight 0 = prompt (no loss computed)
- Weight 1 = completion (model trains on these)
model_input, weights = renderer.build_supervised_example(messages)
# Show which tokens are prompt vs completion
token_ids = model_input.to_ints()
for i, (tok_id, w) in enumerate(zip(token_ids, weights.tolist())):
    label = "COMPLETION" if w > 0 else "prompt"
    print(f" [{i:3d}] {label:10s} {repr(tokenizer.decode([tok_id]))}")
Output
[ 0] prompt '<|im_start|>'
[ 1] prompt 'system'
[ 2] prompt '\n'
[ 3] prompt 'Answer'
[ 4] prompt ' conc'
[ 5] prompt 'is'
[ 6] prompt 'ely'
[ 7] prompt ';'
[ 8] prompt ' at'
[ 9] prompt ' most'
[ 10] prompt ' one'
[ 11] prompt ' sentence'
[ 12] prompt ' per'
[ 13] prompt ' response'
[ 14] prompt '<|im_end|>'
[ 15] prompt '\n'
[ 16] prompt '<|im_start|>'
[ 17] prompt 'user'
[ 18] prompt '\n'
[ 19] prompt 'What'
[ 20] prompt ' is'
[ 21] prompt ' the'
[ 22] prompt ' longest'
[ 23] prompt '-lived'
[ 24] prompt ' rod'
[ 25] prompt 'ent'
[ 26] prompt ' species'
[ 27] prompt '?'
[ 28] prompt '<|im_end|>'
[ 29] prompt '\n'
[ 30] prompt '<|im_start|>'
[ 31] prompt 'assistant'
[ 32] prompt '\n'
[ 33] prompt 'The'
[ 34] prompt ' naked'
[ 35] prompt ' mole'
[ 36] prompt ' rat'
[ 37] prompt ','
[ 38] prompt ' which'
[ 39] prompt ' can'
[ 40] prompt ' live'
[ 41] prompt ' over'
[ 42] prompt ' '
[ 43] prompt '3'
[ 44] prompt '0'
[ 45] prompt ' years'
[ 46] prompt '.'
[ 47] prompt '<|im_end|>'
[ 48] prompt '\n'
[ 49] prompt '<|im_start|>'
[ 50] prompt 'user'
[ 51] prompt '\n'
[ 52] prompt 'How'
[ 53] prompt ' do'
[ 54] prompt ' they'
[ 55] prompt ' live'
[ 56] prompt ' so'
[ 57] prompt ' long'
[ 58] prompt '?'
[ 59] prompt '<|im_end|>'
[ 60] prompt '\n'
[ 61] prompt '<|im_start|>'
[ 62] prompt 'assistant'
[ 63] prompt '\n'
[ 64] COMPLETION 'They'
[ 65] COMPLETION ' evolved'
[ 66] COMPLETION ' multiple'
[ 67] COMPLETION ' protective'
[ 68] COMPLETION ' mechanisms'
[ 69] COMPLETION ' including'
[ 70] COMPLETION ' special'
[ 71] COMPLETION ' hy'
[ 72] COMPLETION 'al'
[ 73] COMPLETION 'ur'
[ 74] COMPLETION 'onic'
[ 75] COMPLETION ' acid'
[ 76] COMPLETION ' that'
[ 77] COMPLETION ' prevents'
[ 78] COMPLETION ' cancer'
[ 79] COMPLETION ','
[ 80] COMPLETION ' extremely'
[ 81] COMPLETION ' stable'
[ 82] COMPLETION ' proteins'
[ 83] COMPLETION ','
[ 84] COMPLETION ' and'
[ 85] COMPLETION ' efficient'
[ 86] COMPLETION ' DNA'
[ 87] COMPLETION ' repair'
[ 88] COMPLETION ' systems'
[ 89] COMPLETION ' that'
[ 90] COMPLETION ' work'
[ 91] COMPLETION ' together'
[ 92] COMPLETION ' to'
[ 93] COMPLETION ' prevent'
[ 94] COMPLETION ' aging'
[ 95] COMPLETION '.'
[ 96] COMPLETION '<|im_end|>'
Only the final assistant message has weight 1 (completion). Everything else -- the system prompt, the user messages, and even earlier assistant messages -- has weight 0, so the loss only teaches the model to produce the final response and does not push it to memorize the prompt content (system instructions, user questions).
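A quick sanity check is to count the weighted tokens. This sketch reuses token_ids and weights from above (and assumes weights supports .tolist(), as in the earlier loop):
# Sketch: only the final assistant reply (plus its closing <|im_end|>) should carry loss
n_completion = sum(1 for w in weights.tolist() if w > 0)
print(f"{n_completion} of {len(token_ids)} tokens receive loss")  # 33 of 97 for this conversation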
TrainOnWhat -- controlling loss targets
By default, build_supervised_example trains on the last assistant message. The TrainOnWhat enum gives you more control:
| Value | Trains on |
|---|---|
| LAST_ASSISTANT_MESSAGE | Only the final assistant reply (default) |
| LAST_ASSISTANT_TURN | Final assistant turn including tool calls/responses |
| ALL_ASSISTANT_MESSAGES | Every assistant message in the conversation |
| ALL_MESSAGES | All messages regardless of role |
| ALL_TOKENS | Every token including special tokens |
| CUSTOMIZED | Per-message train flags from the dataset |
# Train on ALL assistant messages instead of just the last one
_, weights_all = renderer.build_supervised_example(
messages,
train_on_what=renderers.TrainOnWhat.ALL_ASSISTANT_MESSAGES,
)
print(f"Tokens with weight > 0: {(weights_all > 0).sum().item()}")
# Compare with default (last assistant message only)
_, weights_last = renderer.build_supervised_example(messages)
print(f"Tokens with weight > 0 (default): {(weights_last > 0).sum().item()}")
Available renderers
Tinker ships renderers for several model families. Use get_renderer() with the appropriate name:
| Name | Model family | Notes |
|---|---|---|
| qwen3 | Qwen3 | Thinking enabled (default) |
| qwen3_disable_thinking | Qwen3 | Thinking disabled |
| llama3 | Llama 3 | Omits the HF preamble |
| deepseekv3 | DeepSeek V3 | Non-thinking mode (default) |
| deepseekv3_thinking | DeepSeek V3 | Thinking mode |
| nemotron3 | NVIDIA Nemotron 3 | Thinking enabled |
| kimi_k2 | Kimi K2 | Thinking format |
Each renderer produces the correct special tokens for its model family. The default renderers match HuggingFace's apply_chat_template output, so models trained with Tinker work with the OpenAI-compatible endpoint.
# Example: switching between renderers
# Each model family needs its own tokenizer
qwen_tokenizer = tokenizer_utils.get_tokenizer("Qwen/Qwen3-30B-A3B")
qwen_renderer = renderers.get_renderer("qwen3", qwen_tokenizer)
test_messages = [{"role": "user", "content": "Hello!"}]
prompt_tokens = qwen_renderer.build_generation_prompt(test_messages)
print("Qwen3 prompt:")
print(qwen_tokenizer.decode(prompt_tokens.to_ints()))
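Because the default renderers are meant to match apply_chat_template, you can spot-check the two side by side. This sketch assumes the tokenizer returned by get_tokenizer() is a HuggingFace tokenizer exposing apply_chat_template(); minor whitespace or thinking-tag differences are possible depending on the template version:
# Sketch: compare renderer output with HuggingFace's chat template (assumes an HF tokenizer)
hf_text = qwen_tokenizer.apply_chat_template(
    test_messages, tokenize=False, add_generation_prompt=True
)
tinker_text = qwen_tokenizer.decode(prompt_tokens.to_ints())
print("Match:", hf_text == tinker_text)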
Vision inputs with ImagePart
For vision-language models (like Qwen3-VL), message content can include images alongside text. Use ImagePart for images and TextPart for text within the same message.
from tinker_cookbook.renderers import ImagePart, Message, TextPart
# A multimodal message with an image and text
multimodal_message = Message(
    role="user",
    content=[
        ImagePart(type="image", image="https://example.com/photo.png"),
        TextPart(type="text", text="What is in this image?"),
    ],
)
print("Multimodal message:", multimodal_message)
# Text-only messages still work as plain strings
text_message = Message(role="user", content="Describe this in one word.")
print("Text message:", text_message)
To use vision renderers, you also need an image processor:
from tinker_cookbook.image_processing_utils import get_image_processor
model_name = "Qwen/Qwen3-VL-235B-A22B-Instruct"
tokenizer = tokenizer_utils.get_tokenizer(model_name)
image_processor = get_image_processor(model_name)
renderer = renderers.get_renderer("qwen3_vl_instruct", tokenizer, image_processor=image_processor)
The VL renderers handle vision special tokens (<|vision_start|>, <|vision_end|>) and image preprocessing automatically.
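As a rough sketch, the same build_generation_prompt() call then works for image-bearing messages; the example.com URL used earlier is a placeholder, so point it at a real image before running this:
# Sketch: render the multimodal message with the VL renderer configured above
# (requires the ImagePart URL to reference a real, reachable image)
vl_prompt = renderer.build_generation_prompt([multimodal_message])
print(vl_prompt)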
Custom renderers with register_renderer()
If you need a format not covered by the built-in renderers, you can register your own. This lets you use get_renderer() with a custom name throughout your codebase.
from tinker_cookbook.renderers.base import Renderer
# Define a factory function that creates your renderer
def my_renderer_factory(tokenizer, image_processor=None):
    # In practice, you would return a custom Renderer subclass here.
    # For demonstration, we just return the Qwen3 renderer.
    from tinker_cookbook.renderers.qwen3 import Qwen3Renderer
    return Qwen3Renderer(tokenizer)
# Register it under a namespaced name
renderers.register_renderer("MyOrg/custom_format", my_renderer_factory)
# Now you can use it via get_renderer
print(f"Registered renderers: {renderers.get_registered_renderer_names()}")
# Clean up
renderers.unregister_renderer("MyOrg/custom_format")
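Once registered, the custom name behaves like a built-in one anywhere get_renderer() is called. A small sketch (re-register first, since the cleanup call above removed it):
# Sketch: use the registered name through get_renderer(), then clean up again
renderers.register_renderer("MyOrg/custom_format", my_renderer_factory)
custom_renderer = renderers.get_renderer("MyOrg/custom_format", tokenizer)
print(type(custom_renderer).__name__)  # Qwen3Renderer, via the factory above
renderers.unregister_renderer("MyOrg/custom_format")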
Summary
The renderer is the bridge between conversations and tokens. Its four key methods cover the full lifecycle:
| Method | Purpose | Used in |
|---|---|---|
| build_generation_prompt() | Messages to prompt tokens | RL, inference |
| get_stop_sequences() | End-of-generation tokens | Sampling |
| parse_response() | Tokens back to a message | RL, inference |
| build_supervised_example() | Messages to tokens + loss weights | SFT, DPO |
Use get_renderer(name, tokenizer) to get the right renderer for your model, and TrainOnWhat to control which parts of the conversation the model trains on.