tinker_cookbook.supervised.FromConversationFileBuilder
class tinker_cookbook.supervised.FromConversationFileBuilder(ChatDatasetBuilder)
Build a supervised dataset from a JSONL file of chat conversations.
Each line of the file must be a JSON object with a "messages" key whose
value is a list of chat messages (dicts with "role" and "content").
builder = FromConversationFileBuilder(
file_path="data/conversations.jsonl",
test_size=50,
common_config=ChatDatasetBuilderCommonConfig(
model_name_for_tokenizer="Qwen/Qwen3-8B",
renderer_name="qwen3",
max_length=2048,
batch_size=8,
),
)
train_ds, test_ds = builder()
Fields:
- file_path (str)
- test_size (int, default:
0) - shuffle_seed (int, default:
0)
__call__()
Load the JSONL file and return (train_dataset, test_dataset).
Returns: tuple[SupervisedDataset, SupervisedDataset | None] – Training dataset and an optional held-out evaluation dataset.
Raises:
- DataFormatError: If any line in the file lacks a
"messages"key.