tinker_cookbook.rl

Class	Description
`StepResult`	Result returned by :meth:`Env.step`.
`Transition`	A single (observation, action, reward) tuple from a trajectory.
`ActionExtra`	Extra metadata passed alongside an action to :meth:`Env.step`.
`Env`	Stateful environment that a single agent interacts with.
`Trajectory`	A complete episode: a sequence of transitions from one agent in one environment.
`RolloutError`	A captured error from a failed trajectory rollout.
`EnvGroupBuilder`	Builds a group of environments.
`TrajectoryGroup`	A group of trajectories produced by one :class:`EnvGroupBuilder`.
`RLDataset`	A dataset that produces batches of :class:`EnvGroupBuilder` instances.
`RLDatasetBuilder`	Abstract builder that constructs training and optional test RL datasets.
`ProblemEnv`	A single-turn Q&A environment that rewards correct answers and valid formatting.
`ProblemGroupBuilder`	Builds a group of :class:`ProblemEnv` instances from a factory callable.
`MessageStepResult`	Result of a message-level step.
`MessageEnv`	Abstract base class for message-level environments.
`EnvFromMessageEnv`	Adapter that wraps a :class:`MessageEnv` to implement the token-level :class:`Env` interface.
`RolloutStrategy`	Controls how trajectories are collected from a group of environments.
`FailFast`	Default strategy: any trajectory error crashes the group.
`RetryOnFailure`	Retry failed trajectories with fresh environments.
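
The difference between the two rollout strategies listed above can be illustrated with a small, library-independent sketch. It does not use the `tinker_cookbook.rl` API: `rollout_with_retries`, `make_env`, `run_rollout`, and `max_retries` are hypothetical names chosen for illustration, and the real `RetryOnFailure` class may be configured and invoked quite differently. The only idea taken from the table is that a failed trajectory is retried with a freshly built environment, while a fail-fast policy lets the first error propagate.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def rollout_with_retries(
    make_env: Callable[[], object],
    run_rollout: Callable[[object], T],
    max_retries: int = 2,
) -> T:
    """Run a rollout, building a fresh environment for every attempt.

    Hypothetical helper, not part of tinker_cookbook.rl.
    With max_retries=0 the first error propagates (fail-fast behaviour).
    """
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")

    last_error: Optional[Exception] = None
    for _attempt in range(max_retries + 1):
        env = make_env()  # never reuse an environment that saw a failure
        try:
            return run_rollout(env)
        except Exception as exc:  # remember the error and try again
            last_error = exc

    # All attempts failed: surface the most recent error to the caller.
    assert last_error is not None
    raise last_error


# Usage: a rollout that fails once, then succeeds on a fresh environment.
attempts = []


def flaky_rollout(env: object) -> str:
    attempts.append(env)
    if len(attempts) < 2:
        raise RuntimeError("simulated rollout failure")
    return "trajectory"


result = rollout_with_retries(make_env=object, run_rollout=flaky_rollout)
```

In this sketch the second attempt receives a different environment object from the first, so any state corrupted by the failed attempt is discarded.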

Function	Description
`compute_advantages()`	Compute advantages for each trajectory, centered within groups.
`trajectory_to_data()`	Return one or more Datum objects corresponding to the trajectory.
`assemble_training_data()`	Convert trajectories to training data format.
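
As a rough illustration of what "centered within groups" means for `compute_advantages()`, the sketch below subtracts each group's mean reward from every reward in that group. It is a standalone example under stated assumptions, not the library's implementation: `center_rewards_within_groups` is a hypothetical name, it operates on plain lists of total rewards rather than on `TrajectoryGroup` objects, and the real function may apply further normalization.

```python
from statistics import fmean
from typing import List, Sequence


def center_rewards_within_groups(
    rewards_by_group: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Subtract each group's mean reward from the rewards in that group.

    Hypothetical helper, not part of tinker_cookbook.rl.
    """
    centered: List[List[float]] = []
    for group_rewards in rewards_by_group:
        if not group_rewards:
            centered.append([])  # an empty group has nothing to center
            continue
        baseline = fmean(group_rewards)  # per-group baseline
        centered.append([reward - baseline for reward in group_rewards])
    return centered


# Two groups: one with mixed outcomes, one where every trajectory ties.
groups = [[1.0, 0.0, 0.0, 1.0], [2.0, 2.0]]
advantages = center_rewards_within_groups(groups)
```

With this kind of centering, a group in which every trajectory earns the same reward yields all-zero advantages, so only differences within a group carry signal.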