tinker_cookbook.rl.Env
class tinker_cookbook.rl.Env(ABC)
Stateful environment that a single agent interacts with.
Each Env instance is single-use: create it, run one episode, then
discard it. Environments are created by EnvGroupBuilder.make_envs.
Implementors must override initial_observation and step.
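Example:
```python
import tinker
# StepResult is assumed to be importable alongside Env.
from tinker_cookbook.rl import Env, StepResult


class MyEnv(Env):
    """Single-turn QA environment: reward 1.0 if the answer appears in the response."""

    def __init__(self, question: str, answer: str, renderer):
        self.question = question
        self.answer = answer
        self.renderer = renderer

    async def initial_observation(self):
        # Render the question as a chat prompt for the policy to complete.
        messages = [{"role": "user", "content": self.question}]
        model_input, _ = self.renderer.build_generation_prompt(messages)
        return model_input, self.renderer.get_stop_sequences()

    async def step(self, action, *, extra=None):
        # Decode the sampled token IDs and score the text with a substring match.
        response = self.renderer.tokenizer.decode(action)
        reward = 1.0 if self.answer in response else 0.0
        return StepResult(
            reward=reward,
            episode_done=True,  # single-turn: the episode ends after one action
            next_observation=tinker.ModelInput.from_ints([]),
            next_stop_condition=[],
        )
```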
initial_observation()
Return the starting observation and stop condition for this episode.
Returns: tuple[Observation, StopCondition] – The initial observation (model input) and the stop condition for the first generation step.
Abstract method.
step(action, extra)
Advance the environment by one step given the agent's action.
Parameters:
- action (Action) – Token IDs produced by the agent.
- extra (ActionExtra | None) – Optional metadata about the action, such as the stop reason.
Returns: StepResult – The reward, next observation, and whether the episode is done.
Abstract method.
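Single-turn environments such as `MyEnv` always return `episode_done=True`. In a multi-turn environment, `step` instead returns `episode_done=False` along with the next observation and stop condition, and the agent acts again. The self-contained sketch below illustrates this with hypothetical stand-ins (`Env`, `StepResult`, `GuessEnv`, and `Incomplete` are not library classes). It assumes that the next observation carries the full history so far, which is a choice made for this sketch rather than something stated in this reference. It also shows that because `step` is abstract, a subclass that omits it cannot be instantiated.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


# Hypothetical stand-ins for the documented types; NOT the tinker_cookbook classes.
@dataclass
class StepResult:
    reward: float
    episode_done: bool
    next_observation: list[int]
    next_stop_condition: list[str]


class Env(ABC):
    @abstractmethod
    async def initial_observation(self): ...

    @abstractmethod
    async def step(self, action, *, extra=None): ...


class GuessEnv(Env):
    """Multi-turn toy env: the agent has max_turns tries to emit the target token."""

    def __init__(self, target: int, max_turns: int = 3):
        self.target = target
        self.max_turns = max_turns
        self.turn = 0
        self.history: list[int] = []  # per-episode state (Envs are stateful and single-use)

    async def initial_observation(self):
        return [], ["\n"]

    async def step(self, action, *, extra=None):
        self.turn += 1
        self.history.extend(action)
        solved = self.target in action
        return StepResult(
            reward=1.0 if solved else 0.0,
            # End on success or when the turn budget is exhausted.
            episode_done=solved or self.turn >= self.max_turns,
            # Assumption: the next observation is the full history so far.
            next_observation=list(self.history),
            next_stop_condition=["\n"],
        )


async def demo():
    env = GuessEnv(target=7)
    await env.initial_observation()
    return [await env.step([guess]) for guess in (3, 7)]


first, second = asyncio.run(demo())
print(first.episode_done, second.episode_done, second.reward)  # False True 1.0


class Incomplete(Env):
    async def initial_observation(self):
        return [], []


# step is abstract, so a subclass that omits it cannot be instantiated.
try:
    Incomplete()
    step_is_required = False
except TypeError:
    step_is_required = True
print(step_is_required)  # True
```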