tinker_cookbook.distillation.sdft.Config
class tinker_cookbook.distillation.sdft.Config()
Configuration for SDFT training.
Key parameters:
- topk: Number of top tokens for distillation (default 20). Set to 0 for the importance-sampling fallback. K=20 matches full-vocabulary KL in practice.
- learning_rate: For LoRA, use 5e-4 to 1e-3. The top-K CE loss produces larger gradients than SFT at the same LR due to more completion tokens per step (on-policy generation), so use the lower end of the range.
- teacher_sync_every: Optional periodic hard-sync of student weights into the teacher (approximating EMA). None = static frozen teacher, which works comparably to EMA in our experiments.
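To make the topk parameter concrete, the following is a minimal sketch of a top-K cross-entropy term for a single token position. It is not taken from the library: the function name `topk_distillation_loss`, the list-based inputs, and the choice to renormalize the teacher's probability mass over the selected K tokens are all assumptions made for illustration. The topk=0 importance-sampling fallback is not covered.

```python
import math


def topk_distillation_loss(teacher_logprobs, student_logprobs, k=20):
    """Cross-entropy of the student against the teacher's top-k tokens.

    Hypothetical helper for illustration only. Both arguments are lists of
    log-probabilities indexed by token id, for one token position.
    """
    if k <= 0:
        raise ValueError("this sketch only covers k >= 1")
    # Token ids the teacher considers most likely, best first.
    top = sorted(
        range(len(teacher_logprobs)),
        key=teacher_logprobs.__getitem__,
        reverse=True,
    )[:k]
    # Assumption: renormalize the teacher's mass over the selected tokens.
    total = sum(math.exp(teacher_logprobs[t]) for t in top)
    loss = 0.0
    for t in top:
        p_teacher = math.exp(teacher_logprobs[t]) / total
        loss -= p_teacher * student_logprobs[t]
    return loss


# Toy 3-token vocabulary where the student already matches the teacher.
probs = [0.5, 0.3, 0.2]
logprobs = [math.log(p) for p in probs]
full_loss = topk_distillation_loss(logprobs, logprobs, k=3)
top2_loss = topk_distillation_loss(logprobs, logprobs, k=2)
```

With k equal to the vocabulary size, the value reduces to the teacher's entropy when the student matches the teacher, which is the minimum of the full cross-entropy. Smaller k keeps only the highest-probability terms.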
See main for the training loop.
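The teacher_sync_every semantics described above can be sketched as a small scheduling check. This is an illustration under stated assumptions, not the library's code: the helper name `should_sync_teacher` is hypothetical, and the convention that steps are counted from 1 and a sync fires on exact multiples is a guess.

```python
def should_sync_teacher(step, teacher_sync_every):
    """Return True when student weights should be copied into the teacher.

    Hypothetical helper: the real training loop may count steps differently.
    """
    if teacher_sync_every is None:
        # None means a static, frozen teacher: never sync.
        return False
    # Assumption: sync after every `teacher_sync_every` completed steps.
    return step > 0 and step % teacher_sync_every == 0


# Steps 1..10 with a sync period of 4.
sync_steps = [s for s in range(1, 11) if should_sync_teacher(s, 4)]
frozen_steps = [s for s in range(1, 11) if should_sync_teacher(s, None)]
```

A shorter period tracks the student more closely, in the spirit of an EMA teacher, while None keeps the original teacher for the whole run.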
Fields:
- model_name (str) – Model
- recipe_name (str)
- renderer_name (str | None, default: None)
- lora_rank (int, default: 128)
- base_url (str | None, default: None)
- learning_rate (float, default: 2e-05) – Training
- max_tokens (int, default: 2048)
- temperature (float, default: 1.0)
- loss_fn (LossFnType, default: 'cross_entropy')
- topk (int, default: 20) – SDFT-specific
- reverse (bool, default: False)
- demo_template (str, default: DEFAULT_DEMO_TEMPLATE)
- system_prompt (str | None, default: None)
- teacher_sync_every (int | None, default: None)
- max_context_length (int, default: 32768)
- evaluator_builders (list[SamplingClientEvaluatorBuilder], default: []) – Evaluation
- eval_every (int, default: 20)
- save_every (int, default: 20)
- num_substeps (int, default: 1) – Standard infra
- log_path (str)
- wandb_project (str | None, default: None)
- wandb_name (str | None, default: None)
- load_checkpoint_path (str | None, default: None)
- max_steps (int | None, default: None)
- enable_trace (bool, default: False)
- span_chart_every (int, default: 0)