Examples and Guides

This section provides a high-level overview of the Colab notebooks, scripts, and example directories.

All examples are located in this directory.

| Category | Name/Path | Description |
| --- | --- | --- |
| Colab Notebook | qlora_gemma.ipynb | End-to-end tutorial on supervised fine-tuning (SFT) of the Gemma 270M model for English-French translation using the parameter-efficient LoRA and QLoRA techniques. |
| Colab Notebook | grpo_gemma.ipynb | Reinforcement learning tutorial using Group Relative Policy Optimization (GRPO) to train the Gemma 3 1B-IT model for math reasoning on the GSM8K benchmark. |
| Colab Notebook | dpo_gemma.ipynb | Preference tuning using Direct Preference Optimization (DPO) to tune the Gemma 3 1B-IT model on the GSM8K dataset. |
| Colab Notebook | logit_distillation.ipynb | Demonstrates knowledge distillation from a Gemma 7B-IT teacher to a Gemma 2B-IT student for a translation task. |
| Script | rl/grpo/gsm8k/ | Bash scripts for fine-tuning different models and presets (Gemma, Llama, etc.) on the GSM8K mathematical reasoning task using GRPO. |
| Script | rl/grpo/gsm8k/verl_compatible/ | Bash scripts for GRPO training on the GSM8K dataset with a verl-compatible setup. |
| Script | deepscaler/ | Scripts and notebooks for reproducing the Deepscaler experiment (train_deepscaler_nb.py) and math evaluation. |
| Script | sft/mtnt/ | Bash scripts for SFT examples on the MTNT translation task for Gemma, Llama, and Qwen models. |
| Script | model_load/ | Examples for loading Gemma2 and Gemma3 models from the safetensors format. |
| Script | agentic/ | Examples and scripts for agentic workflows with asynchronous rollout. |
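
Several of the entries above rely on GRPO, so a rough illustration of its core idea may help when reading those examples. GRPO scores each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below is standalone Python and is not taken from the notebooks or scripts listed here: the function name `group_relative_advantages`, the `eps` parameter, and the choice of the population standard deviation are all assumptions made for illustration.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for a single prompt.

    Hypothetical helper for illustration only; the examples in this
    directory may compute advantages differently.
    """
    mean = fmean(rewards)
    # Assumption: population standard deviation; some implementations
    # use the sample standard deviation instead.
    std = pstdev(rewards)
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for one GSM8K-style prompt: 1.0 = correct, 0.0 = incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average receive a positive advantage and the rest a negative one. When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.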