tinker_cookbook.stores.FsspecStorage

class tinker_cookbook.stores.FsspecStorage()

Storage backend wrapping any fsspec.AbstractFileSystem.

Supports S3 (via s3fs), GCS (via gcsfs), Azure (via adlfs), and any other filesystem that fsspec supports.

Append strategy: Cloud object stores have no native append operation. This class therefore stages append-only files locally using POSIX atomic writes and uploads them to cloud storage on flush. Reads check the local stage first, so IncrementalReader sees appended data immediately, before it has been uploaded.
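The staging idea can be sketched with the standard library alone. This is a toy model under stated assumptions, not the library's implementation: `StagedAppender`, its `remote` dict (standing in for a cloud bucket), and the `__` key-flattening scheme are hypothetical, and the sketch reads "POSIX atomic writes" as `O_APPEND` writes.

```python
import os
import tempfile


class StagedAppender:
    """Toy model of stage-locally-then-upload; a dict stands in for the cloud."""

    def __init__(self, remote: dict) -> None:
        self.remote = remote  # hypothetical stand-in for a bucket
        self.stage_dir = tempfile.mkdtemp(prefix="stage-")

    def _staged(self, path: str) -> str:
        # Flatten the key so every staged file sits directly in stage_dir.
        return os.path.join(self.stage_dir, path.replace("/", "__"))

    def append(self, path: str, data: bytes) -> None:
        staged = self._staged(path)
        if not os.path.exists(staged) and path in self.remote:
            # First append for this path: seed the stage from the remote copy.
            with open(staged, "wb") as f:
                f.write(self.remote[path])
        # With O_APPEND the kernel positions every write at end of file.
        fd = os.open(staged, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        try:
            view = memoryview(data)
            while view:
                view = view[os.write(fd, view):]
        finally:
            os.close(fd)

    def read(self, path: str) -> bytes:
        staged = self._staged(path)
        if os.path.exists(staged):  # the local stage wins over the remote copy
            with open(staged, "rb") as f:
                return f.read()
        return self.remote[path]


remote = {"logs/metrics.jsonl": b'{"step": 1}\n'}
store = StagedAppender(remote)
store.append("logs/metrics.jsonl", b'{"step": 2}\n')
latest = store.read("logs/metrics.jsonl")  # includes the unflushed append
```

In this model the remote copy is untouched until an upload step runs, which is why a reader that goes through `read` sees new data while the bucket still holds the old content.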

Call flush at checkpoints or training end to persist staged data to cloud. The context manager calls close on exit (which flushes and removes the staging directory).

Data safety: Unflushed staged data lives in a local temp directory. If the process crashes before flush(), unflushed appends are lost. Flush at every checkpoint to minimize data loss on crash.
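A minimal sketch of the flush-at-every-checkpoint pattern follows. It relies only on the documented `append(path, data)` and `flush()` methods; `RecordingStorage`, `train`, and the `metrics.jsonl` path are hypothetical names used to make the example runnable without cloud credentials.

```python
class RecordingStorage:
    """Hypothetical stand-in that records calls instead of touching the cloud."""

    def __init__(self) -> None:
        self.calls = []

    def append(self, path: str, data: bytes) -> None:
        self.calls.append(("append", path))

    def flush(self) -> None:
        self.calls.append(("flush",))


def train(storage, num_steps: int, checkpoint_every: int) -> None:
    for step in range(1, num_steps + 1):
        storage.append("metrics.jsonl", f'{{"step": {step}}}\n'.encode())
        if step % checkpoint_every == 0:
            # A crash after this point loses only the appends made since here.
            storage.flush()
    storage.flush()  # persist whatever remains at training end


storage = RecordingStorage()
train(storage, num_steps=5, checkpoint_every=2)
flush_count = sum(1 for call in storage.calls if call[0] == "flush")
```

With this layout, the amount of data at risk is bounded by the number of steps between checkpoints.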

Pickle-serializable: the pickled state stores the protocol, root, and kwargs. Locally staged data is NOT included in the pickle (each process starts with an empty stage).
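One way to get this behavior is to define `__getstate__` and `__setstate__` so that only configuration is serialized. The sketch below is an assumption about how such a class could be written, not the library's code; `PicklableStore` and its constructor signature are hypothetical.

```python
import pickle
import tempfile


class PicklableStore:
    """Hypothetical class: pickles its configuration, never its local stage."""

    def __init__(self, protocol: str, root: str, **kwargs) -> None:
        self.protocol = protocol
        self.root = root
        self.kwargs = kwargs
        self.stage_dir = tempfile.mkdtemp(prefix="stage-")  # process-local

    def __getstate__(self) -> dict:
        # Only the configuration crosses the process boundary.
        return {"protocol": self.protocol, "root": self.root, "kwargs": self.kwargs}

    def __setstate__(self, state: dict) -> None:
        # Rebuild from configuration, which creates a fresh, empty stage.
        self.__init__(state["protocol"], state["root"], **state["kwargs"])


original = PicklableStore("s3", "bucket/prefix", anon=True)
clone = pickle.loads(pickle.dumps(original))
```

Here `clone` carries the same protocol, root, and kwargs as `original` but points at a different staging directory, so anything staged in the sending process must be flushed there.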

url(path)

Return a URI like s3://bucket/prefix/path.

Parameters:

Returns: str
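As an illustration of the URI shape described above, a helper along these lines would produce it. `build_url` is a hypothetical function, and how the class actually normalizes slashes is not stated in this page.

```python
import posixpath


def build_url(protocol: str, root: str, path: str) -> str:
    """Join protocol, root, and a relative path into a URI (hypothetical helper)."""
    return f"{protocol}://{posixpath.join(root.strip('/'), path.lstrip('/'))}"


example = build_url("s3", "bucket/prefix", "logs/metrics.jsonl")
```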

read(path)

See Storage.read. Reads from local stage if available.

Parameters:

Returns: bytes

write(path, data)

See Storage.write. Writes directly to cloud.

Parameters:

Returns: None

append(path, data)

See Storage.append.

Stages appends locally using POSIX atomic writes. The first append for a given path pulls existing content from cloud (if any), then all subsequent appends are local. Call flush to upload staged data to cloud.

Parameters:

Returns: None

exists(path)

See Storage.exists.

Parameters:

Returns: bool

stat(path)

See Storage.stat.

Parameters:

Returns: StorageStat | None

read_range(path, offset, length)

See Storage.read_range.

Parameters:

Returns: bytes

list_dir(prefix)

See Storage.list_dir. Returns immediate children names only.

Parameters:

Returns: list[str]
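"Immediate children names only" can be illustrated on a flat list of object keys. The helper below is a hypothetical sketch; in particular, sorting the result is a choice made here for determinism, not documented behavior.

```python
def immediate_children(keys: list[str], prefix: str) -> list[str]:
    """Return the names directly under prefix (hypothetical helper)."""
    base = prefix.strip("/")
    base = base + "/" if base else ""
    names = set()
    for key in keys:
        if key.startswith(base):
            # Keep only the first path segment after the prefix.
            name = key[len(base):].split("/", 1)[0]
            if name:
                names.add(name)
    return sorted(names)


keys = ["run/a.jsonl", "run/ckpt/0001.pt", "run/ckpt/0002.pt", "other/b.txt"]
children = immediate_children(keys, "run")
```

Nested objects such as `run/ckpt/0001.pt` contribute only their first segment, `ckpt`, and full paths are never returned.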

remove(path)

See Storage.remove.

Parameters:

Returns: None

remove_dir(path)

See Storage.remove_dir.

Parameters:

Returns: None

flush()

Upload all locally staged files to cloud.

Call this at checkpoints or training end. The context manager calls this automatically on exit (via close). Staged files are removed after upload so that subsequent appends pull fresh data from cloud, avoiding re-uploading the entire file on every flush cycle.

Returns: None
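The upload-then-remove cycle can be sketched as follows. `flush_stage` is a hypothetical function and the `remote` dict stands in for cloud storage; the real class uploads through fsspec rather than assigning into a dict.

```python
import os
import tempfile


def flush_stage(stage_dir: str, remote: dict) -> list:
    """Upload every staged file into remote, then delete the local copy."""
    uploaded = []
    for name in sorted(os.listdir(stage_dir)):
        staged = os.path.join(stage_dir, name)
        with open(staged, "rb") as f:
            remote[name] = f.read()  # stand-in for a cloud upload
        os.remove(staged)  # removed only after the upload step completed
        uploaded.append(name)
    return uploaded


stage_dir = tempfile.mkdtemp(prefix="stage-")
with open(os.path.join(stage_dir, "metrics.jsonl"), "wb") as f:
    f.write(b'{"step": 1}\n')

remote = {}
first = flush_stage(stage_dir, remote)
second = flush_stage(stage_dir, remote)  # nothing left to upload
```

Because the stage is empty after a flush, a second flush with no intervening appends uploads nothing.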

close()

Flush staged data and clean up the local staging directory.

Returns: None
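The relationship between the context manager, close, and flush can be sketched with a small stand-in class. `StagedResource` is hypothetical, and its `flush` only counts calls instead of uploading anything.

```python
import os
import shutil
import tempfile


class StagedResource:
    """Hypothetical resource whose close() flushes, then removes its stage."""

    def __init__(self) -> None:
        self.stage_dir = tempfile.mkdtemp(prefix="stage-")
        self.flush_count = 0

    def flush(self) -> None:
        self.flush_count += 1  # stand-in for uploading staged files

    def close(self) -> None:
        self.flush()
        shutil.rmtree(self.stage_dir, ignore_errors=True)

    def __enter__(self) -> "StagedResource":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()  # runs on normal exit and when the body raises


with StagedResource() as resource:
    stage_dir = resource.stage_dir
    existed_inside = os.path.isdir(stage_dir)
```

A `with` block covers normal exit and Python exceptions, but not a process that is killed outright, so periodic flushes are still needed for crash safety.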

aread(path)

Async version of read.

Parameters:

Returns: bytes

awrite(path, data)

Async version of write.

Parameters:

Returns: None

aappend(path, data)

Async version of append.

Parameters:

Returns: None

aexists(path)

Async version of exists.

Parameters:

Returns: bool

astat(path)

Async version of stat.

Parameters:

Returns: StorageStat | None

aread_range(path, offset, length)

Async version of read_range.

Parameters:

Returns: bytes

alist_dir(prefix)

Async version of list_dir.

Parameters:

Returns: list[str]

aremove(path)

Async version of remove.

Parameters:

Returns: None

aremove_dir(path)

Async version of remove_dir.

Parameters:

Returns: None
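This page does not say how the async variants are implemented. One common approach, shown here purely as an assumption, is to run the blocking method in a worker thread with `asyncio.to_thread`; `BlockingStore` is a hypothetical dict-backed class.

```python
import asyncio


class BlockingStore:
    """Hypothetical synchronous store backed by a dict."""

    def __init__(self) -> None:
        self.data = {}

    def write(self, path: str, data: bytes) -> None:
        self.data[path] = data

    def read(self, path: str) -> bytes:
        return self.data[path]

    async def awrite(self, path: str, data: bytes) -> None:
        # Run the blocking call in a worker thread so the event loop stays free.
        await asyncio.to_thread(self.write, path, data)

    async def aread(self, path: str) -> bytes:
        return await asyncio.to_thread(self.read, path)


async def main() -> bytes:
    store = BlockingStore()
    await store.awrite("a.txt", b"hello")
    return await store.aread("a.txt")


result = asyncio.run(main())
```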