tinker_cookbook.rl.Transition
class tinker_cookbook.rl.Transition()
A single (observation, action, reward) tuple from a trajectory.
Fields:
- ob (Observation) – Observation the agent saw before taking the action.
- ac (TokensWithLogprobs) – Action taken (tokens and their log-probabilities).
- reward (float) – Immediate reward received after taking the action.
- episode_done (bool) – Whether this transition ended the episode.
- metrics (Metrics, default:
field(default_factory=dict)) – Numeric values aggregated and reported in training logs. - logs (Logs, default:
field(default_factory=dict)) – Diagnostic info for display/debugging tools (not aggregated like metrics).