tinker_cookbook.rl.TrajectoryGroup

class tinker_cookbook.rl.TrajectoryGroup()

A group of trajectories produced by one EnvGroupBuilder.

Created by the rollout executor after running all environments in a group and computing group rewards. This is the primary data structure consumed by RL training algorithms.

The _G suffix follows the project convention for tensors/lists indexed over the group dimension.

Fields:

trajectories_G (list[Trajectory])
final_rewards_G (list[float]) – Computed by the EnvGroupBuilder from the group as a whole, one value per trajectory.
metrics_G (list[Metrics])
rollout_errors (list[RolloutError], default: field(default_factory=list)) – Empty list means no trajectory errors occurred.
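
As a rough illustration of the field layout above, the following sketch models the group as parallel lists that share one index per trajectory. `GroupSketch` and `group_size` are hypothetical names introduced here, the element types are placeholders for Trajectory, Metrics and RolloutError, and the equal-length check is an assumption drawn from the _G naming convention, not a documented guarantee of the library.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class GroupSketch:
    """Hypothetical stand-in that mirrors the documented field names only."""

    trajectories_G: list[Any]
    final_rewards_G: list[float]
    metrics_G: list[dict[str, float]]
    rollout_errors: list[Any] = field(default_factory=list)


def group_size(group: GroupSketch) -> int:
    """Return G, checking that every _G list has the same number of entries."""
    sizes = {
        len(group.trajectories_G),
        len(group.final_rewards_G),
        len(group.metrics_G),
    }
    if len(sizes) != 1:
        raise ValueError(f"_G fields disagree on group size: {sorted(sizes)}")
    return sizes.pop()


group = GroupSketch(
    trajectories_G=["traj-0", "traj-1", "traj-2"],  # placeholder trajectories
    final_rewards_G=[1.0, 0.0, 0.5],
    metrics_G=[{"turns": 2.0}, {"turns": 1.0}, {"turns": 3.0}],
)
size = group_size(group)
print(size)
```

rollout_errors carries no _G suffix, so the sketch does not assume it holds one entry per trajectory; it only relies on the documented default of an empty list.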

get_total_rewards()

Get the total reward (return) for each trajectory in the group.

The total reward is the sum of the per-timestep rewards (from Env.step) plus the final group reward (from EnvGroupBuilder.compute_group_rewards).

Returns: list[float] – Total rewards, one per trajectory in the group.
