tinker_cookbook.rl

| Class | Description |
|---|---|
| `StepResult` | Result returned by `Env.step()`. |
| `Transition` | A single (observation, action, reward) tuple from a trajectory. |
| `ActionExtra` | Extra metadata passed alongside an action to `Env.step()`. |
| `Env` | Stateful environment that a single agent interacts with. |
| `Trajectory` | A complete episode: a sequence of transitions from one agent in one environment. |
| `RolloutError` | A captured error from a failed trajectory rollout. |
| `EnvGroupBuilder` | Builds a group of environments. The group will be used in the following way: … |
| `TrajectoryGroup` | A group of trajectories produced by one `EnvGroupBuilder`. |
| `RLDataset` | A dataset that produces batches of `EnvGroupBuilder` instances. |
| `RLDatasetBuilder` | Abstract builder that constructs a training RL dataset and an optional test RL dataset. |
| `ProblemEnv` | A single-turn Q&A environment that rewards correct answers and valid formatting. |
| `ProblemGroupBuilder` | Builds a group of `ProblemEnv` instances from a factory callable. |
| `MessageStepResult` | Result of a message-level step. |
| `MessageEnv` | Abstract base class for message-level environments. |
| `EnvFromMessageEnv` | Adapter that wraps a `MessageEnv` to implement the token-level `Env` interface. |
| `RolloutStrategy` | Controls how trajectories are collected from a group of environments. |
| `FailFast` | Default strategy: any trajectory error crashes the group. |
| `RetryOnFailure` | Retry failed trajectories with fresh environments. |

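
The core classes fit together in a simple loop: an `Env` receives actions through `step()`, each call yields a `StepResult`, and the recorded `Transition`s make up a `Trajectory`. The following is a minimal sketch of that flow, plus a retry wrapper in the spirit of `RetryOnFailure`. It uses simplified stand-ins, not the real `tinker_cookbook.rl` API: the field names, the toy `CountdownEnv`, and the `rollout` and `rollout_with_retry` helpers are all hypothetical, and the code is synchronous for brevity even though the library's interfaces may differ.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical, simplified stand-ins for the classes above. The real
# tinker_cookbook.rl types carry token-level data and more metadata.


@dataclass
class StepResult:
    reward: float
    episode_done: bool
    next_observation: list[int]


@dataclass
class Transition:
    observation: list[int]
    action: list[int]
    reward: float


@dataclass
class Trajectory:
    transitions: list[Transition] = field(default_factory=list)

    def total_reward(self) -> float:
        return sum(t.reward for t in self.transitions)


class CountdownEnv:
    """Toy stateful env (hypothetical): reward 1.0 if the action sums to the target."""

    def __init__(self, target: int, max_steps: int = 2):
        self.target = target
        self.max_steps = max_steps
        self.steps = 0  # mutable state: this env is single-use

    def initial_observation(self) -> list[int]:
        return [self.target]

    def step(self, action: list[int]) -> StepResult:
        self.steps += 1
        reward = 1.0 if sum(action) == self.target else 0.0
        done = reward > 0 or self.steps >= self.max_steps
        return StepResult(reward, done, [self.target - sum(action)])


def rollout(env: CountdownEnv, policy: Callable[[list[int]], list[int]]) -> Trajectory:
    """Run one episode, recording an (observation, action, reward) per step."""
    traj = Trajectory()
    obs = env.initial_observation()
    while True:
        action = policy(obs)
        result = env.step(action)
        traj.transitions.append(Transition(obs, action, result.reward))
        if result.episode_done:
            return traj
        obs = result.next_observation


def rollout_with_retry(make_env: Callable[[], CountdownEnv],
                       policy: Callable[[list[int]], list[int]],
                       max_retries: int = 2) -> Trajectory:
    """Retry a failed rollout, building a fresh env for every attempt."""
    last_error = None
    for _ in range(max_retries + 1):
        try:
            return rollout(make_env(), policy)
        except Exception as exc:  # capture and try again with a new env
            last_error = exc
    raise RuntimeError("all rollout attempts failed") from last_error
```

The retry wrapper rebuilds the environment through `make_env` on every attempt because a stateful `Env` that failed mid-episode may be left in an inconsistent state. A `FailFast`-style strategy would instead let the first exception propagate.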

| Function | Description |
|---|---|
| `compute_advantages()` | Compute advantages for each trajectory, centered within groups. |
| `trajectory_to_data()` | Return one or more `Datum` objects corresponding to the trajectory. |
| `assemble_training_data()` | Convert trajectories into the training data format. |
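
`compute_advantages()` is described as centering advantages within groups. One common reading of that, assumed here rather than taken from the library source, is to subtract each group's mean total reward from every trajectory's total reward, then spread that value over the action tokens when building training examples. The helpers `group_centered_advantages` and `per_token_advantages` are hypothetical and work on plain Python lists, not on `Trajectory` or `Datum` objects.

```python
from statistics import fmean


def group_centered_advantages(group_rewards: list[list[float]]) -> list[list[float]]:
    """Hypothetical helper: center each trajectory's total reward on its group mean.

    group_rewards[g][i] is the total reward of trajectory i in group g.
    """
    advantages = []
    for rewards in group_rewards:
        baseline = fmean(rewards)  # the group mean serves as the baseline
        advantages.append([r - baseline for r in rewards])
    return advantages


def per_token_advantages(num_obs_tokens: int, num_action_tokens: int,
                         advantage: float) -> tuple[list[float], list[float]]:
    """Hypothetical helper: spread one trajectory advantage over its tokens.

    Observation (prompt) tokens get zero advantage and a zero loss mask;
    each action token gets the full trajectory advantage and a mask of 1.
    """
    advantages = [0.0] * num_obs_tokens + [advantage] * num_action_tokens
    mask = [0.0] * num_obs_tokens + [1.0] * num_action_tokens
    return advantages, mask


groups = [[1.0, 0.0, 0.0, 1.0], [0.5, 0.5]]
print(group_centered_advantages(groups))
# [[0.5, -0.5, -0.5, 0.5], [0.0, 0.0]]
print(per_token_advantages(2, 3, 0.5))
# ([0.0, 0.0, 0.5, 0.5, 0.5], [0.0, 0.0, 1.0, 1.0, 1.0])
```

In this sketch, a group where every trajectory earns the same reward (like the second one) produces all-zero advantages and so contributes no learning signal.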