tinker_cookbook.stores.FsspecStorage
class tinker_cookbook.stores.FsspecStorage()
Storage backend wrapping any fsspec.AbstractFileSystem.
Supports S3 (via s3fs), GCS (via gcsfs), Azure (via adlfs),
and any other filesystem that fsspec supports.
Append strategy: Cloud backends don't support native append. This
class stages append-only files locally using POSIX atomic writes, then
uploads them to cloud on flush. Reads check the local stage
first, so IncrementalReader sees appended data immediately.
Call flush at checkpoints or training end to persist staged
data to cloud. The context manager calls close on exit
(which flushes and removes the staging directory).
Data safety: Unflushed staged data lives in a local temp directory.
If the process crashes before flush(), unflushed appends are lost.
Flush at every checkpoint to minimize data loss on crash.
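The staging lifecycle described above can be illustrated with a small stand-in. This is a minimal sketch, not the FsspecStorage implementation: `StagedStore` and its attributes are hypothetical, a plain dict plays the role of the cloud backend, and opening the staged file in append mode is an assumption (the source does not say which mechanism backs its "POSIX atomic writes").

```python
import os
import shutil
import tempfile


class StagedStore:
    """Toy model of stage-locally-then-upload. Hypothetical, for illustration only."""

    def __init__(self, remote):
        self.remote = remote  # stand-in for cloud storage: path -> bytes
        self.stage_dir = tempfile.mkdtemp(prefix="stage-")
        self.staged = {}      # logical path -> local staged file
        self._next_id = 0

    def append(self, path, data):
        local = self.staged.get(path)
        if local is None:
            # First append for this path: seed the stage with existing remote content.
            local = os.path.join(self.stage_dir, f"{self._next_id}.bin")
            self._next_id += 1
            with open(local, "wb") as f:
                f.write(self.remote.get(path, b""))
            self.staged[path] = local
        # Subsequent appends touch only the local file (assumed mechanism: append mode).
        with open(local, "ab") as f:
            f.write(data)

    def read(self, path):
        local = self.staged.get(path)
        if local is not None:
            # The local stage is checked first, so readers see unflushed appends.
            with open(local, "rb") as f:
                return f.read()
        return self.remote[path]

    def flush(self):
        # Upload every staged file, then drop the local copies.
        for path, local in self.staged.items():
            with open(local, "rb") as f:
                self.remote[path] = f.read()
            os.remove(local)
        self.staged.clear()

    def close(self):
        self.flush()
        shutil.rmtree(self.stage_dir, ignore_errors=True)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


remote = {"logs/metrics.jsonl": b'{"step": 1}\n'}
with StagedStore(remote) as store:
    store.append("logs/metrics.jsonl", b'{"step": 2}\n')
    staged_view = store.read("logs/metrics.jsonl")      # includes the new line
    remote_before_flush = remote["logs/metrics.jsonl"]  # still the old content
    store.flush()                                       # e.g. at a checkpoint
    remote_after_flush = remote["logs/metrics.jsonl"]
    stage_dir = store.stage_dir
```

Between the append and the flush, the new line exists only in the temp directory. That is the window in which a crash would lose it, which is why flushing at every checkpoint is recommended.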
Pickle-serializable: the pickled state stores protocol, root, and kwargs. Locally staged data is NOT included in the pickle, so each process that unpickles the object starts with an empty stage.
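One common way to get this behavior is to define `__getstate__` and `__setstate__` so that only the configuration is serialized. The sketch below assumes a hypothetical `PicklableStore` class with an illustrative constructor signature. It is not the FsspecStorage constructor, which is not documented here.

```python
import pickle


class PicklableStore:
    """Hypothetical class showing config-only pickling; names are illustrative."""

    def __init__(self, protocol, root, **kwargs):
        self.protocol = protocol
        self.root = root
        self.kwargs = kwargs
        self.staged = {}  # process-local staged data: path -> bytes

    def __getstate__(self):
        # Serialize configuration only; staged data is deliberately left out.
        return {"protocol": self.protocol, "root": self.root, "kwargs": self.kwargs}

    def __setstate__(self, state):
        # Rebuild from configuration, which yields a fresh, empty stage.
        self.__init__(state["protocol"], state["root"], **state["kwargs"])


store = PicklableStore("s3", "bucket/run-1", anon=False)
store.staged["metrics.jsonl"] = b"unflushed"

clone = pickle.loads(pickle.dumps(store))
```

In this sketch `clone` carries the same protocol, root, and kwargs as `store`, but its `staged` dict is empty. Anything that must survive a hand-off to another process has to be flushed first.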
url(path)
read(path)
write(path, data)
append(path, data)
See Storage.append.
Stages appends locally using POSIX atomic writes. The first append
for a given path pulls existing content from cloud (if any), then
all subsequent appends are local. Call flush to upload
staged data to cloud.
Parameters:
Returns: None
exists(path)
stat(path)
read_range(path, offset, length)
list_dir(prefix)
See Storage.list_dir. Returns immediate children names only.
Parameters:
- prefix (str)
Returns: list[str]
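"Immediate children names only" means that nested paths collapse to their first component below the prefix. The helper below is a hypothetical illustration of that semantics over a flat list of object keys. It is not how FsspecStorage implements list_dir, and the sorted, de-duplicated output is a choice of this sketch.

```python
def immediate_children(keys, prefix):
    """Return the first path component below `prefix` for each matching key."""
    base = (prefix.rstrip("/") + "/") if prefix else ""
    names = set()
    for key in keys:
        if key.startswith(base):
            rest = key[len(base):]
            if rest:
                # Keep only the first component; deeper levels are not listed.
                names.add(rest.split("/", 1)[0])
    return sorted(names)


keys = [
    "runs/a/metrics.jsonl",
    "runs/a/ckpt/0001.pt",
    "runs/b/metrics.jsonl",
    "runs/notes.txt",
]
children = immediate_children(keys, "runs")   # ['a', 'b', 'notes.txt']
nested = immediate_children(keys, "runs/a")   # ['ckpt', 'metrics.jsonl']
```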
remove(path)
remove_dir(path)
flush()
Upload all locally staged files to cloud.
Call this at checkpoints or at the end of training. The context manager calls this automatically on exit.
Staged files are removed after upload, so the next append to a path pulls fresh data from cloud. This avoids re-uploading the entire file on every flush cycle.
Returns: None
close()
Flush staged data and clean up the local staging directory.
Returns: None