tinker_cookbook.distillation.train_off_policy.Config
class tinker_cookbook.distillation.train_off_policy.Config()
Configuration for off-policy distillation with soft teacher targets.
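The idea of "soft teacher targets" can be illustrated with a minimal sketch. It assumes that, at each position, the teacher's top-`n_teacher_targets` log-probabilities are renormalized into a distribution and that the student is trained with cross-entropy against it. This is an assumption: the library's actual loss may differ (for example, it may not renormalize). The helpers `top_k_soft_targets` and `soft_target_loss` are hypothetical names and are not part of tinker_cookbook.

```python
import math


def top_k_soft_targets(teacher_logprobs: dict[int, float], k: int) -> dict[int, float]:
    """Keep the k most probable teacher tokens at one position and renormalize them.

    Hypothetical helper: assumes renormalization over the kept tokens.
    """
    top = sorted(teacher_logprobs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Log-sum-exp over the kept tokens, shifted by the max for numerical stability.
    m = max(lp for _, lp in top)
    log_z = m + math.log(sum(math.exp(lp - m) for _, lp in top))
    return {tok: math.exp(lp - log_z) for tok, lp in top}


def soft_target_loss(student_logprobs: dict[int, float], targets: dict[int, float]) -> float:
    """Cross-entropy of the student's log-probs against the soft targets at one position."""
    return -sum(p * student_logprobs[tok] for tok, p in targets.items())


# Usage: with k=2, the teacher's 0.5 / 0.3 / 0.2 split becomes 0.625 / 0.375 over tokens 0 and 1.
targets = top_k_soft_targets({0: math.log(0.5), 1: math.log(0.3), 2: math.log(0.2)}, k=2)
loss = soft_target_loss({0: math.log(0.5), 1: math.log(0.5), 2: -10.0}, targets)
```

Truncating to the top k tokens keeps the per-position target small (here 20 entries by default) while still conveying more information than the single sampled token used in hard-label distillation.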
Fields:
- learning_rate (float)
- dataset_configs (list[DatasetWithTeacher])
- model_name (str)
- recipe_name (str)
- renderer_name (str | None, default: None)
- lora_rank (int, default: 32)
- n_teacher_targets (int, default: 20) – Number of highest-probability teacher tokens per position used as soft targets.
- teacher_concurrency (int, default: 64) – Maximum number of concurrent teacher forward passes per batch.
- batch_size (int, default: 64) – Number of examples per training step.
- save_every (int, default: 10) – Checkpointing and logging
- eval_every (int, default: 20)
- max_steps (int | None, default: None)
- load_checkpoint_path (str | None, default: None)
- log_path (str)
- wandb_project (str | None, default: None)
- wandb_name (str | None, default: None)
- base_url (str | None, default: None)
- ttl_seconds (int | None, default: 604800) – Server-side checkpoint retention in seconds (the default, 604800, is 7 days). None keeps checkpoints indefinitely.
- enable_trace (bool, default: False)
- evaluator_builders (list[SamplingClientEvaluatorBuilder], default: [])
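A cap like `teacher_concurrency` is commonly enforced with an `asyncio.Semaphore`. The following is a hypothetical sketch of that pattern, not the tinker_cookbook implementation, which may bound concurrency differently. `run_bounded` and `fake_teacher_forward` are illustrative names, and the fake teacher stands in for a real teacher forward pass. The sketch issues one call per example in a 64-example batch (the `batch_size` default) while allowing at most 8 in flight.

```python
import asyncio
from collections.abc import Awaitable, Callable, Sequence
from typing import TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_bounded(
    items: Sequence[T],
    fn: Callable[[T], Awaitable[R]],
    concurrency: int = 64,
) -> list[R]:
    """Apply fn to every item with at most `concurrency` calls in flight.

    Results are returned in the same order as `items`.
    """
    sem = asyncio.Semaphore(concurrency)

    async def guarded(item: T) -> R:
        # Each call waits for a free slot before starting.
        async with sem:
            return await fn(item)

    return list(await asyncio.gather(*(guarded(x) for x in items)))


async def demo() -> tuple[list[int], int]:
    """Run a fake teacher over a 64-example batch and record peak concurrency."""
    in_flight = 0
    peak = 0

    async def fake_teacher_forward(x: int) -> int:
        # Hypothetical stand-in for a teacher forward pass.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return x * 2

    results = await run_bounded(list(range(64)), fake_teacher_forward, concurrency=8)
    return results, peak


results, peak = asyncio.run(demo())
```

Bounding concurrency this way lets a large batch overlap many teacher requests without overwhelming the teacher service. `asyncio.gather` also preserves input order, so teacher outputs stay aligned with their examples.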