tinker_cookbook.distillation.train_on_policy.Config
class tinker_cookbook.distillation.train_on_policy.Config()
Fields:
- learning_rate (float)
- dataset_configs (list[DistillationDatasetConfig])
- model_name (str)
- recipe_name (str)
- renderer_name (str | None, default: None)
- max_tokens (int)
- temperature (float, default: 1.0)
- compute_post_kl (bool, default: False)
- evaluator_builders (list[SamplingClientEvaluatorBuilder], default: [])
- lora_rank (int, default: 32)
- kl_penalty_coef (float, default: 1.0)
- kl_discount_factor (float, default: 0.0)
- loss_fn (LossFnType, default: 'importance_sampling') – See https://tinker-docs.thinkingmachines.ai/losses
- loss_fn_config (dict[str, Any] | None, default: None)
- num_substeps (int, default: 1) – Useful for very large batch sizes.
- wandb_project (str | None, default: None)
- wandb_name (str | None, default: None)
- log_path (str)
- base_url (str | None, default: None)
- enable_trace (bool, default: False)
- span_chart_every (int, default: 0)
- eval_every (int, default: 20)
- save_every (int, default: 20)
- load_checkpoint_path (str | None, default: None)
- max_steps (int | None, default: None) – Maximum number of training steps. If None, train on the full dataset.
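
Example (illustrative sketch): the block below mirrors part of the field list in a plain dataclass to show which fields must be supplied and which fall back to the documented defaults. ConfigSketch, required_field_names, and resolve_num_steps are hypothetical names introduced here; they are not part of tinker_cookbook, and the real Config may be constructed differently. dataset_configs is also listed without a default but is left out because its element type is not described in this section. The max_steps helper assumes that a non-None value simply caps the step count.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any, Optional


@dataclass
class ConfigSketch:
    """Hypothetical stand-in mirroring part of the documented field list."""

    # Fields documented without a default: the caller must supply them.
    learning_rate: float
    model_name: str
    recipe_name: str
    max_tokens: int
    log_path: str
    # Fields documented with a default value.
    renderer_name: Optional[str] = None
    temperature: float = 1.0
    lora_rank: int = 32
    kl_penalty_coef: float = 1.0
    kl_discount_factor: float = 0.0
    loss_fn: str = "importance_sampling"
    loss_fn_config: Optional[dict[str, Any]] = None
    num_substeps: int = 1
    eval_every: int = 20
    save_every: int = 20
    max_steps: Optional[int] = None


def required_field_names(cls) -> list[str]:
    """Return the names of dataclass fields that have no default."""
    return [
        f.name
        for f in fields(cls)
        if f.default is MISSING and f.default_factory is MISSING
    ]


def resolve_num_steps(steps_in_dataset: int, max_steps: Optional[int]) -> int:
    """Assumed reading of max_steps: None means run over the full dataset."""
    if max_steps is None:
        return steps_in_dataset
    return min(steps_in_dataset, max_steps)


cfg = ConfigSketch(
    learning_rate=1e-4,            # placeholder value
    model_name="example-model",    # placeholder, not a real model name
    recipe_name="example-recipe",  # placeholder
    max_tokens=512,                # placeholder value
    log_path="/tmp/distill-example",
)
```

Only the five required fields are passed here; every other attribute on cfg takes the default shown in the list above.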