tinker_cookbook.rl.TrajectoryGroup
class tinker_cookbook.rl.TrajectoryGroup()
A group of trajectories produced by one EnvGroupBuilder.
Created by the rollout executor after running all environments in a group and computing group rewards. This is the primary data structure consumed by RL training algorithms.
The _G suffix follows the project convention for tensors/lists indexed
over the group dimension.
Fields:
- trajectories_G (list[Trajectory])
- final_rewards_G (list[float]) – Group-level rewards, one per trajectory, computed by the EnvGroupBuilder from the group as a whole.
- metrics_G (list[Metrics])
- rollout_errors (list[RolloutError], default: field(default_factory=list)) – An empty list means no trajectory errors occurred.
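The field layout above can be illustrated with a small stand-in dataclass. This is only a sketch: TrajectoryGroupSketch is a hypothetical name, the element types are replaced with Any, and treating Metrics as dict-like is an assumption. It shows the _G lists sharing one group index and why rollout_errors uses default_factory, which gives each instance its own empty list.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TrajectoryGroupSketch:
    """Hypothetical mirror of the fields listed above; not the library class."""

    trajectories_G: list[Any]
    final_rewards_G: list[float]
    metrics_G: list[dict[str, Any]]  # assumption: Metrics behaves like a dict
    # default_factory builds a fresh list per instance instead of sharing one.
    rollout_errors: list[Any] = field(default_factory=list)


# Two groups; index i in every _G list refers to the same trajectory.
group_a = TrajectoryGroupSketch(["traj0", "traj1"], [1.0, 0.0], [{}, {}])
group_b = TrajectoryGroupSketch(["traj0"], [0.5], [{}])

# Recording an error on one group leaves the other group's list empty.
group_a.rollout_errors.append("timeout in env 1")
print(group_a.rollout_errors, group_b.rollout_errors)
```

A consumer could then treat an empty rollout_errors list as the signal that every trajectory in the group completed.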
get_total_rewards()
Get the total reward (return) for each trajectory in the group.
The total reward is the sum of the per-timestep rewards (from
Env.step) plus the final group reward (from
EnvGroupBuilder.compute_group_rewards).
Returns: list[float] – Total rewards, one per trajectory in the group.