tinker_cookbook.tokenizer_utils.get_tokenizer

tinker_cookbook.tokenizer_utils.get_tokenizer(model_name)

Get a tokenizer by name.

Checks the custom registry first (see register_tokenizer), then falls back to HuggingFace AutoTokenizer. HuggingFace tokenizers are cached after first load.
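The lookup order described above (custom registry first, cached fallback second) can be sketched roughly as follows. This is an illustrative stand-in, not the tinker_cookbook implementation: `register_custom`, `lookup_tokenizer`, `_load_fallback`, and `fallback_calls` are hypothetical names, and the fallback returns a plain dict where a real loader would presumably call something like `AutoTokenizer.from_pretrained`.

```python
from functools import lru_cache
from typing import Any, Callable, Dict, List

# Hypothetical registry: maps a custom name to a zero-argument factory.
_custom_registry: Dict[str, Callable[[], Any]] = {}

# Records every real fallback load, only to make the caching visible.
fallback_calls: List[str] = []


def register_custom(name: str, factory: Callable[[], Any]) -> None:
    """Register a factory under a custom name (hypothetical helper)."""
    _custom_registry[name] = factory


@lru_cache(maxsize=None)
def _load_fallback(name: str) -> Any:
    # Assumption: a real implementation would load from HuggingFace here,
    # e.g. AutoTokenizer.from_pretrained(name). A dict stands in for it.
    fallback_calls.append(name)
    return {"source": "fallback", "name": name}


def lookup_tokenizer(name: str) -> Any:
    """Check the custom registry first, then use the cached fallback."""
    factory = _custom_registry.get(name)
    if factory is not None:
        return factory()
    return _load_fallback(name)
```

In this sketch, a second `lookup_tokenizer` call with the same fallback name returns the cached object without loading again, while a registered name never reaches the fallback.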

Parameters:

  • model_name (str) – HuggingFace model identifier (e.g. "Qwen/Qwen3-8B") or a custom registered name.

Returns: Tokenizer – A PreTrainedTokenizer instance.

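from tinker_cookbook.tokenizer_utils import get_tokenizer

# Look up the tokenizer by HuggingFace model identifier (or a custom registered name).
tokenizer = get_tokenizer("Qwen/Qwen3-8B")
# encode() returns a list of token IDs for the input text.
tokens = tokenizer.encode("Hello world")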