tinker_cookbook.rl.ProblemEnv
class tinker_cookbook.rl.ProblemEnv(Env)
A single-turn Q&A environment that rewards correct answers and valid formatting.
class MathEnv(ProblemEnv):
def __init__(self, renderer, question, answer):
super().__init__(renderer)
self.question = question
self.answer = answer
def get_question(self):
return self.question
def check_answer(self, response):
return self.answer in response
def check_format(self, response):
return response.strip() != ""
def get_reference_answer(self):
return self.answer
get_question()
Return the question text for this problem.
Returns: str
Abstract method.
check_answer(sample_str)
Return a reward (0.0 to 1.0) for the model's response.
Parameters:
- sample_str (str) – The decoded text of the model's response.
Returns: bool – Whether the answer is correct.
Abstract method.
check_format(sample_str)
Return a format compliance reward (0.0 to 1.0).
Parameters:
- sample_str (str) – The decoded text of the model's response.
Returns: bool – Whether the response follows the expected format.
Abstract method.
get_reference_answer()
Return the reference answer for logging purposes.
Returns: str
Abstract method.
initial_observation()
Build the initial prompt from the conversation prefix and question.
Returns: tuple[Observation, StopCondition]
step(action, extra)
Score the model's response for correctness and format compliance.
Parameters:
- action (Action) – Token IDs of the model's response.
- extra (ActionExtra | None) – Optional action metadata (unused).
Returns: StepResult