tinker_cookbook.rl.Env

class tinker_cookbook.rl.Env(ABC)

Stateful environment that a single agent interacts with.

Each Env instance is single-use: create it, run one episode, then discard it. Environments are created by EnvGroupBuilder.make_envs.

Implementors must override initial_observation and step.

class MyEnv(Env):
    def __init__(self, question: str, answer: str, renderer):
        self.question = question
        self.answer = answer
        self.renderer = renderer

    async def initial_observation(self):
        # Render the question as a single user turn and return the prompt
        # together with the renderer's stop sequences.
        messages = [{"role": "user", "content": self.question}]
        model_input, _ = self.renderer.build_generation_prompt(messages)
        return model_input, self.renderer.get_stop_sequences()

    async def step(self, action, *, extra=None):
        # Decode the sampled tokens and give reward 1.0 if the answer appears.
        response = self.renderer.tokenizer.decode(action)
        reward = 1.0 if self.answer in response else 0.0
        # Single-turn episode: finish immediately with an empty next observation.
        return StepResult(
            reward=reward,
            episode_done=True,
            next_observation=tinker.ModelInput.from_ints([]),
            next_stop_condition=[],
        )

initial_observation()

Return the starting observation and stop condition for this episode.

Returns: tuple[Observation, StopCondition] | InitialObservationOverflow – The initial observation (model input) and the stop condition for the first generation step. Environments that enforce a token budget may instead return InitialObservationOverflow when the initial prompt already exceeds that budget. In that case the rollout ends immediately and gracefully, and no sampling call is made.

Abstract method.
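The overflow path described above can be pictured with a minimal, self-contained sketch. It does not import tinker_cookbook: OverflowStandIn, BudgetedEnvSketch, start_episode, max_tokens, and the plain list of token IDs are hypothetical stand-ins for InitialObservationOverflow, an Env subclass, the caller, the token budget, and the Observation, whose real definitions are not shown on this page.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class OverflowStandIn:
    """Hypothetical stand-in for InitialObservationOverflow (real fields not shown here)."""
    prompt_tokens: int
    max_tokens: int


class BudgetedEnvSketch:
    """Hypothetical env that checks its prompt against a token budget."""

    def __init__(self, prompt_tokens: list[int], max_tokens: int):
        self.prompt_tokens = prompt_tokens
        self.max_tokens = max_tokens

    async def initial_observation(self):
        # If the prompt alone exceeds the budget, signal overflow instead of
        # returning an (observation, stop_condition) pair.
        if len(self.prompt_tokens) > self.max_tokens:
            return OverflowStandIn(len(self.prompt_tokens), self.max_tokens)
        return self.prompt_tokens, ["\n"]


def start_episode(env):
    """Return the first (observation, stop) pair, or None when the rollout ends at once."""
    first = asyncio.run(env.initial_observation())
    if isinstance(first, OverflowStandIn):
        return None  # overflow: no sampling call is made
    return first
```

A caller written this way has to branch on the return type before unpacking, since the overflow value is not a two-element tuple.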

step(action, extra)

Advance the environment by one step given the agent's action.

Parameters:

action (Action) – Token IDs produced by the agent.
extra (ActionExtra | None) – Optional metadata about the action, such as the stop reason.

Returns: StepResult – The reward, next observation, and whether the episode is done.

Abstract method.
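How initial_observation and step fit together over one episode can be sketched as a simple driver loop. This is an illustration under stated assumptions, not the library's rollout code: StepResultStandIn, CountdownEnvSketch, run_episode, and the policy callable are hypothetical, and plain lists of ints stand in for Observation and Action.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StepResultStandIn:
    """Hypothetical stand-in carrying the StepResult fields used in the example above."""
    reward: float
    episode_done: bool
    next_observation: list
    next_stop_condition: list


class CountdownEnvSketch:
    """Hypothetical multi-step env: the episode ends after `turns` actions."""

    def __init__(self, turns: int):
        self.turns_left = turns

    async def initial_observation(self):
        return [0], []

    async def step(self, action, *, extra=None):
        self.turns_left -= 1
        done = self.turns_left == 0
        # Reward 1.0 only on the final step; the next observation echoes the action.
        return StepResultStandIn(
            reward=1.0 if done else 0.0,
            episode_done=done,
            next_observation=[] if done else list(action),
            next_stop_condition=[],
        )


async def run_episode(env, policy):
    """Drive one episode: observe, act, step, repeat until episode_done."""
    observation, stop = await env.initial_observation()
    total_reward, num_steps = 0.0, 0
    while True:
        action = policy(observation, stop)  # stand-in for a sampling call
        result = await env.step(action)
        total_reward += result.reward
        num_steps += 1
        if result.episode_done:
            return total_reward, num_steps
        observation, stop = result.next_observation, result.next_stop_condition
```

Because each Env instance is single-use, a second episode would need a fresh CountdownEnvSketch rather than a second run_episode call on the same object.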
