Skip to content

tinker_cookbook.rl.ProblemEnv

class tinker_cookbook.rl.ProblemEnv(Env)

A single-turn Q&A environment that rewards correct answers and valid formatting.

class MathEnv(ProblemEnv):
def __init__(self, renderer, question, answer):
super().__init__(renderer)
self.question = question
self.answer = answer
def get_question(self):
return self.question
def check_answer(self, response):
return self.answer in response
def check_format(self, response):
return response.strip() != ""
def get_reference_answer(self):
return self.answer

get_question()

Return the question text for this problem.

Returns: str

Abstract method.

check_answer(sample_str)

Return a reward (0.0 to 1.0) for the model's response.

Parameters:

  • sample_str (str) – The decoded text of the model's response.

Returns: bool – Whether the answer is correct.

Abstract method.

check_format(sample_str)

Return a format compliance reward (0.0 to 1.0).

Parameters:

  • sample_str (str) – The decoded text of the model's response.

Returns: bool – Whether the response follows the expected format.

Abstract method.

get_reference_answer()

Return the reference answer for logging purposes.

Returns: str

Abstract method.

initial_observation()

Build the initial prompt from the conversation prefix and question.

Returns: tuple[Observation, StopCondition]

step(action, extra)

Score the model's response for correctness and format compliance.

Parameters:

  • action (Action) – Token IDs of the model's response.
  • extra (ActionExtra | None) – Optional action metadata (unused).

Returns: StepResult