Skip to content

tinker_cookbook.rl.Transition

class tinker_cookbook.rl.Transition()

A single (observation, action, reward) tuple from a trajectory.

Fields:

  • ob (Observation) – Observation the agent saw before taking the action.
  • ac (TokensWithLogprobs) – Action taken (tokens and their log-probabilities).
  • reward (float) – Immediate reward received after taking the action.
  • episode_done (bool) – Whether this transition ended the episode.
  • metrics (Metrics, default: field(default_factory=dict)) – Numeric values aggregated and reported in training logs.
  • logs (Logs, default: field(default_factory=dict)) – Diagnostic info for display/debugging tools (not aggregated like metrics).