
tinker_cookbook.supervised.StreamingSupervisedDatasetFromHFDataset

class tinker_cookbook.supervised.StreamingSupervisedDatasetFromHFDataset(SupervisedDataset)

A supervised dataset that streams examples from a Hugging Face dataset, reducing memory usage.

Batches can only be read in increasing index order; requesting an earlier batch raises DataValidationError.

get_batch(index)

Return a batch of Datum objects at the given index.

Only forward iteration is supported. Requesting a batch at or before the most recently returned index raises DataValidationError.

Parameters:

  • index (int) – Zero-based batch index (must be strictly greater than the previous call's index).

Returns: list[tinker.Datum] – Training datums for this batch.

Raises:

  • DataValidationError: If index is less than or equal to the most recently returned index, which would require seeking backward in the stream.
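The forward-only contract described above can be illustrated with a small stand-in. This is a sketch, not the library's implementation: `ForwardOnlyBatches` and `BackwardSeekError` are hypothetical names (the latter standing in for DataValidationError), the rows are plain integers rather than `tinker.Datum` objects, and the behavior when skipping ahead (consuming and discarding the batches in between) is an assumption.

```python
from itertools import islice
from typing import Iterable, Iterator, List


class BackwardSeekError(ValueError):
    """Hypothetical stand-in for DataValidationError."""


class ForwardOnlyBatches:
    """Toy class that mimics the documented forward-only get_batch contract."""

    def __init__(self, rows: Iterable[int], batch_size: int) -> None:
        self._iter: Iterator[int] = iter(rows)
        self._batch_size = batch_size
        self._last_index = -1  # no batch has been returned yet

    def get_batch(self, index: int) -> List[int]:
        # Reject any index at or before the most recently returned one.
        if index <= self._last_index:
            raise BackwardSeekError(
                f"batch {index} requested after batch {self._last_index}"
            )
        # Assumption: skipping ahead consumes and discards intermediate batches.
        batch: List[int] = []
        for _ in range(index - self._last_index):
            batch = list(islice(self._iter, self._batch_size))
        self._last_index = index
        return batch


batches = ForwardOnlyBatches(range(10), batch_size=2)
first = batches.get_batch(0)  # rows 0 and 1
third = batches.get_batch(2)  # batch 1 is skipped; rows 4 and 5

rejected = False
try:
    batches.get_batch(2)  # same index again: not strictly greater
except BackwardSeekError:
    rejected = True
```

Because a stream cannot be rewound, a caller that needs to revisit a batch would have to keep its own copy or start a new pass over the data.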

set_epoch(seed)

Reset the stream for a new epoch.

Parameters:

  • seed (int) – Epoch seed forwarded to the underlying iterable dataset. Defaults to 0.
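A multi-epoch loop might combine the two methods as sketched below. The helper `iterate_epochs`, its `batches_per_epoch` argument, and the `_FakeStream` test double are all hypothetical. The sketch also assumes that `set_epoch` clears the forward-only position so that batch indices restart at 0 in each epoch; the description above says only that the stream is reset.

```python
from typing import Any, List, Tuple


def iterate_epochs(
    dataset: Any, num_epochs: int, batches_per_epoch: int
) -> List[Tuple[int, int]]:
    """Visit every batch once per epoch, reseeding the stream each time."""
    visited: List[Tuple[int, int]] = []
    for epoch in range(num_epochs):
        # Use the epoch number as the seed forwarded to the underlying stream.
        dataset.set_epoch(seed=epoch)
        for index in range(batches_per_epoch):
            dataset.get_batch(index)  # a training step would use the batch here
            visited.append((epoch, index))
    return visited


class _FakeStream:
    """Hypothetical test double with the same two methods."""

    def __init__(self) -> None:
        self.seeds: List[int] = []
        self._last_index = -1

    def set_epoch(self, seed: int = 0) -> None:
        self.seeds.append(seed)
        self._last_index = -1  # assumed: a reset allows index 0 again

    def get_batch(self, index: int) -> List[int]:
        if index <= self._last_index:
            raise ValueError("backward seek")
        self._last_index = index
        return [index]


fake = _FakeStream()
log = iterate_epochs(fake, num_epochs=2, batches_per_epoch=2)
```

Passing the epoch number as the seed is one common convention; any integer that changes between epochs would serve the same purpose in this sketch.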