tinker_cookbook.supervised.StreamingSupervisedDatasetFromHFDataset
class tinker_cookbook.supervised.StreamingSupervisedDatasetFromHFDataset(SupervisedDataset)
A supervised dataset that streams examples from a Hugging Face dataset instead of loading the whole dataset into memory, which reduces memory usage.
Only supports forward iteration; seeking backward raises an error.
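As a rough illustration of why streaming saves memory, the sketch below pulls fixed-size batches from a lazy iterator with `itertools.islice`, so only the current batch is ever materialized. `stream_batches` is a hypothetical helper, not part of `tinker_cookbook`, and it yields plain strings rather than `tinker.Datum` objects.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def stream_batches(examples: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield consecutive batches; only the current batch is held in memory."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(examples)
    while True:
        batch = list(islice(it, batch_size))  # pull at most batch_size rows
        if not batch:  # source exhausted
            return
        yield batch


# A generator expression stands in for a remote stream: rows are produced lazily.
rows = (f"row-{i}" for i in range(7))
for batch in stream_batches(rows, batch_size=3):
    print(batch)
```

Because `it` is a plain iterator, rows that have already been consumed cannot be revisited, which mirrors the forward-only restriction described above.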
get_batch(index)
Return a batch of Datum objects at the given index.
Only forward iteration is supported. Requesting a batch at or before
the most recently returned index raises DataValidationError.
Parameters:
- index (int) – Zero-based batch index (must be strictly greater than the previous call's index).
Returns: list[tinker.Datum] – Training datums for this batch.
Raises:
- DataValidationError – If index is less than or equal to the most recently returned index, i.e. it would require seeking backward.
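The index contract above can be sketched with a small stand-in class. This is a hypothetical illustration, not the library's code: `ForwardOnlyBatches` is an invented name, `DataValidationError` is defined locally because its import path is not given here, and skipping over intermediate batches when the index jumps ahead is an assumption (the documentation only says the index must be strictly greater than the previous one).

```python
from itertools import islice
from typing import Iterable, Iterator


class DataValidationError(Exception):
    """Local stand-in for the library's DataValidationError (import path not assumed)."""


class ForwardOnlyBatches:
    """Hypothetical sketch of the get_batch contract: each index must exceed the last one."""

    def __init__(self, rows: Iterable[str], batch_size: int) -> None:
        self._rows: Iterator[str] = iter(rows)
        self._batch_size = batch_size
        self._last_index = -1  # no batch returned yet

    def get_batch(self, index: int) -> list[str]:
        if index <= self._last_index:
            raise DataValidationError(
                f"requested batch {index}, but batch {self._last_index} was already returned"
            )
        # Assumption: batches between the last index and this one are read and discarded.
        to_skip = (index - self._last_index - 1) * self._batch_size
        for _ in islice(self._rows, to_skip):
            pass
        self._last_index = index
        # Near the end of the stream this may return a short or empty list.
        return list(islice(self._rows, self._batch_size))


ds = ForwardOnlyBatches((f"row-{i}" for i in range(10)), batch_size=2)
print(ds.get_batch(0))  # ['row-0', 'row-1']
print(ds.get_batch(2))  # ['row-4', 'row-5']; batch 1 was skipped
try:
    ds.get_batch(2)  # same index again: rejected
except DataValidationError as err:
    print("rejected:", err)
```

Rejecting equal indices as well as smaller ones follows from the stream design: once a batch has been read from the iterator, its rows are gone, so even repeating the last index would need a backward seek.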
set_epoch(seed)
Reset the stream for a new epoch.
Parameters:
- seed (int) – Epoch seed forwarded to the underlying iterable dataset. Default: 0.
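The documentation states only that the seed is forwarded to the underlying iterable dataset. For intuition, streaming datasets commonly shuffle with a fixed-size buffer (Hugging Face's `IterableDataset.shuffle` takes a `buffer_size` for this reason), so a per-epoch seed can change the order without loading everything. The sketch below is an assumption about what such a reset could look like, not this class's implementation: `buffered_shuffle` and `EpochStream` are hypothetical names, and `make_source` is assumed to reopen the stream from its beginning.

```python
import random
from itertools import islice
from typing import Callable, Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")


def buffered_shuffle(rows: Iterable[T], buffer_size: int, seed: int) -> Iterator[T]:
    """Approximate streaming shuffle using a fixed-size buffer and a seeded RNG."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    rng = random.Random(seed)
    buffer: list[T] = []
    for row in rows:
        if len(buffer) < buffer_size:
            buffer.append(row)  # still filling the buffer
            continue
        # Buffer full: emit a random slot, then refill that slot with the new row.
        i = rng.randrange(buffer_size)
        yield buffer[i]
        buffer[i] = row
    # Source exhausted: drain whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


class EpochStream(Generic[T]):
    """Hypothetical sketch: set_epoch restarts the source and reseeds the shuffle."""

    def __init__(self, make_source: Callable[[], Iterable[T]], buffer_size: int = 4) -> None:
        self._make_source = make_source  # must reopen the stream from its beginning
        self._buffer_size = buffer_size
        self._rows: Iterator[T] = iter(())
        self.set_epoch(0)

    def set_epoch(self, seed: int = 0) -> None:
        # Discard the old iterator and rebuild it from the start of the source.
        self._rows = buffered_shuffle(self._make_source(), self._buffer_size, seed)

    def take(self, n: int) -> list[T]:
        return list(islice(self._rows, n))


stream = EpochStream(lambda: range(10))
for epoch in range(2):
    stream.set_epoch(seed=epoch)  # back to the start, order depends on the seed
    print(epoch, stream.take(10))
```

Calling `set_epoch` with the same seed reproduces the same order, which keeps runs repeatable while still varying the order across epochs.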