
tinker_cookbook.rl.Env

class tinker_cookbook.rl.Env(ABC)

Stateful environment that a single agent interacts with.

Each Env instance is single-use: create it, run one episode, then discard it. Environments are created by EnvGroupBuilder.make_envs.

Implementors must override initial_observation and step.

class MyEnv(Env):
    def __init__(self, question: str, answer: str, renderer):
        self.question = question
        self.answer = answer
        self.renderer = renderer

    async def initial_observation(self):
        messages = [{"role": "user", "content": self.question}]
        model_input, _ = self.renderer.build_generation_prompt(messages)
        return model_input, self.renderer.get_stop_sequences()

    async def step(self, action, *, extra=None):
        response = self.renderer.tokenizer.decode(action)
        reward = 1.0 if self.answer in response else 0.0
        return StepResult(
            reward=reward,
            episode_done=True,
            next_observation=tinker.ModelInput.from_ints([]),
            next_stop_condition=[],
        )
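
The single-use lifecycle described above (create, run one episode, discard) can be sketched as a small driver loop. This is only an illustration under stated assumptions, not the library's rollout code: `StepResult` here is a local stand-in dataclass with the four fields used in the example, observations are plain token lists rather than real model inputs, and `EchoEnv`, `run_episode`, and `constant_policy` are hypothetical names.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StepResult:
    """Local stand-in with the four fields used in the example above."""
    reward: float
    episode_done: bool
    next_observation: list
    next_stop_condition: list


class EchoEnv:
    """Toy single-step environment exposing the same two coroutines as Env."""

    def __init__(self, target):
        self.target = target

    async def initial_observation(self):
        # Plain token list and stop strings; the real types are richer.
        return [1, 2, 3], ["\n"]

    async def step(self, action, *, extra=None):
        reward = 1.0 if action == self.target else 0.0
        return StepResult(reward, True, [], [])


async def run_episode(env, policy) -> float:
    """Drive one episode: observe, act, step, and repeat until episode_done."""
    observation, stop_condition = await env.initial_observation()
    total_reward = 0.0
    while True:
        action = await policy(observation, stop_condition)
        result = await env.step(action)
        total_reward += result.reward
        if result.episode_done:
            return total_reward
        # Feed the next observation and stop condition into the next turn.
        observation = result.next_observation
        stop_condition = result.next_stop_condition


async def constant_policy(observation, stop_condition):
    # Hypothetical policy that ignores its input and always emits the same tokens.
    return [7, 8]


# A fresh env is built for each episode because instances are single-use.
total = asyncio.run(run_episode(EchoEnv(target=[7, 8]), constant_policy))
print(total)
```

The loop never reuses an environment after `episode_done` is set, which matches the single-use contract.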

initial_observation()

Return the starting observation and stop condition for this episode.

Returns: tuple[Observation, StopCondition] – The initial observation (model input) and the stop condition for the first generation step.

Abstract method.

step(action, *, extra=None)

Advance the environment by one step given the agent's action.

Parameters:

  • action (Action) – Token IDs produced by the agent.
  • extra (ActionExtra | None) – Optional metadata about the action, such as the stop reason.

Returns: StepResult – The reward, whether the episode is done, the next observation, and the stop condition for the next generation step.

Abstract method.
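
To illustrate the step contract for an episode that lasts more than one turn, here is a minimal sketch of a multi-step environment. Several things are assumptions made for illustration only: `StepResult` is a local stand-in dataclass, observations are plain token lists, `CountdownEnv` is a hypothetical class, and `extra` is modeled as a dict with a `"stop_reason"` key. The real structure of `ActionExtra` is not specified in this section.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StepResult:
    """Local stand-in with the four fields used in the class example."""
    reward: float
    episode_done: bool
    next_observation: list
    next_stop_condition: list


class CountdownEnv:
    """Toy guessing env: ends on a correct guess or when turns run out."""

    def __init__(self, secret: int, max_turns: int = 3):
        self.secret = secret
        self.turns_left = max_turns

    async def initial_observation(self):
        return [0], []

    async def step(self, action, *, extra=None):
        self.turns_left -= 1
        # Assumption: extra is a dict; a length-truncated action never counts as correct.
        truncated = extra is not None and extra.get("stop_reason") == "length"
        correct = (not truncated) and action == [self.secret]
        done = correct or self.turns_left == 0
        if done:
            # Terminal step: empty observation and stop condition, as in the class example.
            return StepResult(1.0 if correct else 0.0, True, [], [])
        # Non-terminal step: hint token 1 means "guess higher", 2 means "try again".
        hint = [1] if action and action[0] < self.secret else [2]
        return StepResult(0.0, False, hint, [])


async def demo():
    env = CountdownEnv(secret=5)
    await env.initial_observation()
    first = await env.step([3])
    second = await env.step([5], extra={"stop_reason": "length"})
    third = await env.step([5])
    return first, second, third


first, second, third = asyncio.run(demo())
print(first, second, third, sep="\n")
```

Only the final call returns `episode_done=True`; the first two return a non-empty `next_observation` that the agent would be prompted with on its next turn.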