tinker_cookbook.distillation.sdft.Config

class tinker_cookbook.distillation.sdft.Config()

Configuration for SDFT training.

Key parameters:

topk: Number of top-probability tokens used for distillation (default 20). Set to 0 to use the importance-sampling fallback. In practice, K=20 matches full-vocabulary KL.
learning_rate: For LoRA, use 5e-4 to 1e-3; note that the default (2e-05) is well below this range. Because on-policy generation yields more completion tokens per step, the top-K CE loss produces larger gradients than SFT at the same LR, so prefer the lower end of the range.
teacher_sync_every: Optional interval for periodically hard-syncing the student weights into the teacher (approximating EMA). None keeps a static frozen teacher, which performed comparably to EMA in our experiments.
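
To make the topk parameter concrete, here is a minimal, self-contained sketch of a top-K cross-entropy target. It is an illustration only, not the library's loss code: the helper names (softmax, topk_targets, topk_cross_entropy), the toy logits, and the choice to renormalize the kept probabilities are all assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topk_targets(teacher_probs, k):
    """Keep the k most probable teacher tokens and renormalize them to sum to 1.

    Renormalizing is an assumption made for this sketch.
    Returns (targets, covered_mass), where covered_mass is the share of the
    full teacher distribution that the kept tokens account for.
    """
    order = sorted(range(len(teacher_probs)),
                   key=lambda i: teacher_probs[i], reverse=True)
    top = order[:k]
    mass = sum(teacher_probs[i] for i in top)
    return {i: teacher_probs[i] / mass for i in top}, mass


def topk_cross_entropy(targets, student_logprobs):
    """Cross-entropy of the student against the truncated teacher targets."""
    return -sum(p * student_logprobs[i] for i, p in targets.items())


# Toy six-token vocabulary (hypothetical values).
teacher_probs = softmax([4.0, 3.0, 2.0, 0.0, -1.0, -2.0])
student_logprobs = [math.log(p) for p in softmax([3.5, 3.0, 2.5, 0.5, -1.0, -1.5])]

targets, covered_mass = topk_targets(teacher_probs, k=3)
loss = topk_cross_entropy(targets, student_logprobs)
print(f"covered mass: {covered_mass:.3f}, top-K CE: {loss:.3f}")
```

When the teacher distribution is peaked, the kept tokens cover almost all of the probability mass, which is one intuition for why a small K can track a full-vocabulary objective.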

See main for the training loop.
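
The teacher_sync_every setting can be pictured as a simple step schedule. The sketch below is an assumption about how such a check might look, not the code in main: the function name should_sync_teacher and the 1-based step counting are hypothetical.

```python
from typing import Optional


def should_sync_teacher(step: int, teacher_sync_every: Optional[int]) -> bool:
    """Return True when the student weights would be copied into the teacher.

    None means a static frozen teacher, so no step ever triggers a sync.
    Step counting starting at 1 is an assumption of this sketch.
    """
    if teacher_sync_every is None:
        return False
    return step > 0 and step % teacher_sync_every == 0


# With teacher_sync_every=3, a sync happens on every third step.
sync_steps = [s for s in range(1, 11) if should_sync_teacher(s, 3)]
# With teacher_sync_every=None, the teacher is never updated.
static_steps = [s for s in range(1, 11) if should_sync_teacher(s, None)]
print(sync_steps, static_steps)
```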

Fields:

Model:
model_name (str)
recipe_name (str)
renderer_name (str | None, default: None)
lora_rank (int, default: 128)
base_url (str | None, default: None)

Training:
learning_rate (float, default: 2e-05)
max_tokens (int, default: 2048)
temperature (float, default: 1.0)
loss_fn (LossFnType, default: 'cross_entropy')

SDFT-specific:
topk (int, default: 20)
reverse (bool, default: False)
demo_template (str, default: DEFAULT_DEMO_TEMPLATE)
system_prompt (str | None, default: None)
teacher_sync_every (int | None, default: None)
max_context_length (int, default: 32768)

Evaluation:
evaluator_builders (list[SamplingClientEvaluatorBuilder], default: [])
eval_every (int, default: 20)
save_every (int, default: 20)

Standard infra:
num_substeps (int, default: 1)
log_path (str)
wandb_project (str | None, default: None)
wandb_name (str | None, default: None)
load_checkpoint_path (str | None, default: None)
max_steps (int | None, default: None)
enable_trace (bool, default: False)
span_chart_every (int, default: 0)
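
As a rough illustration of how the defaults and the LoRA guidance interact, the sketch below uses a stand-in dataclass that mirrors a handful of the fields listed here. ConfigSketch is hypothetical and is not the real Config class; the model name, recipe name, log path, and the sync interval of 50 are placeholder values chosen for the example.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ConfigSketch:
    """Stand-in for illustration only; mirrors a subset of the documented fields."""
    model_name: str
    recipe_name: str
    log_path: str
    lora_rank: int = 128
    learning_rate: float = 2e-05
    topk: int = 20
    teacher_sync_every: Optional[int] = None


# Fields without defaults must be supplied; the values here are placeholders.
base = ConfigSketch(
    model_name="example-model",
    recipe_name="example-recipe",
    log_path="/tmp/sdft-example",
)

# Override the learning rate with the lower end of the suggested LoRA range
# and enable a periodic teacher sync, leaving every other field at its default.
lora_run = replace(base, learning_rate=5e-4, teacher_sync_every=50)
print(lora_run)
```

The real class lives at tinker_cookbook.distillation.sdft.Config; whether it accepts keyword arguments in exactly this way is not confirmed by this page.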