tinker_cookbook.preference.Config
class tinker_cookbook.preference.Config()
Configuration for Direct Preference Optimization (DPO) training.
This is a chz dataclass that holds all hyperparameters, infrastructure
settings, and checkpointing options for a DPO training run.
Example:
    config = Config(
        log_path="~/logs/dpo_run",
        model_name="meta-llama/Llama-3.1-8B-Instruct",
        dataset_builder=my_dpo_dataset_builder,
        dpo_beta=0.1,
        learning_rate=1e-5,
    )
    main(config)
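For intuition about what dpo_beta controls, here is a minimal sketch of the standard DPO objective for a single preference pair. It is not taken from tinker_cookbook: the function name dpo_loss and its arguments are hypothetical, and the library's actual implementation may differ.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one preference pair (hypothetical helper, for illustration)."""
    # Log-ratio of policy to reference for each completion.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # beta scales how strongly the preference margin enters the loss.
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed in a stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss drops below that value once the policy favors the chosen completion more than the reference does.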
Fields:
- log_path (str)
- model_name (str)
- recipe_name (str)
- dataset_builder (ChatDatasetBuilder)
- load_checkpoint_path (str | None, default: None)
- renderer_name (str | None, default: None)
- learning_rate (float, default: 1e-05)
- lr_schedule (LRSchedule, default: 'linear')
- num_epochs (int, default: 1)
- dpo_beta (float, default: 0.1)
- lora_rank (int, default: 32)
- num_replicas (int, default: 8)
- base_url (str | None, default: None)
- evaluator_builders (list[EvaluatorBuilder], default: []) – Checkpointing and evaluation (0 = disabled for *_every fields)
- infrequent_evaluator_builders (list[EvaluatorBuilder], default: [])
- save_every (int, default: 20)
- eval_every (int, default: 10)
- infrequent_eval_every (int, default: 100)
- ttl_seconds (int | None, default: 604800) – 7 days
- rolling_save_every (int, default: 0) – Skips the sampler-weight export, making it cheaper than periodic checkpoints.
- rolling_ttl_seconds (int, default: 7200) – 2 hours
- adam_beta1 (float, default: 0.9)
- adam_beta2 (float, default: 0.95)
- adam_eps (float, default: 1e-08)
- wandb_project (str | None, default: None)
- wandb_name (str | None, default: None)
- enable_trace (bool, default: False) – Profiling
- span_chart_every (int, default: 0)
- reference_model_name (str | None, default: None)
- max_steps (int | None, default: None) – Maximum number of training steps. If None, train for num_epochs * n_batches.
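
The interval, TTL, and step-count fields above can be read as simple arithmetic. The sketch below is illustrative only: total_steps and is_due are hypothetical helpers, not part of tinker_cookbook. It assumes that max_steps acts as a cap on num_epochs * n_batches and that an action fires whenever the step is a multiple of its interval. The library's exact trigger logic may differ.

```python
from typing import Optional

SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR

# The documented TTL defaults, written out as arithmetic.
DEFAULT_TTL_SECONDS = 7 * SECONDS_PER_DAY           # 604800 (7 days)
DEFAULT_ROLLING_TTL_SECONDS = 2 * SECONDS_PER_HOUR  # 7200 (2 hours)


def total_steps(num_epochs: int, n_batches: int, max_steps: Optional[int] = None) -> int:
    """Hypothetical helper: number of training steps implied by the config."""
    full_run = num_epochs * n_batches
    if max_steps is None:
        # Documented behavior: train for num_epochs * n_batches.
        return full_run
    # Assumption: max_steps caps the run rather than extending it.
    return min(max_steps, full_run)


def is_due(step: int, every: int) -> bool:
    """Hypothetical helper: 0 disables the action; otherwise fire on multiples of `every`."""
    return every > 0 and step % every == 0
```

Under these assumptions and the default intervals, a 40-step run would save at steps 20 and 40 (save_every=20) and never write a rolling checkpoint (rolling_save_every=0).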