Examples and Guides

This section provides a high-level overview of the Colab notebooks, scripts, and example directories.

All examples are located in this directory.

| Category | Name/Path | Description |
| --- | --- | --- |
| Colab Notebook | `qlora_gemma.ipynb` | End-to-end tutorial on supervised fine-tuning (SFT) of the Gemma 270M model for English-French translation using the parameter-efficient LoRA and QLoRA techniques. |
| Colab Notebook | `grpo_gemma.ipynb` | Reinforcement learning tutorial that uses Group Relative Policy Optimization (GRPO) to train the Gemma 3 1B-IT model for math reasoning on the GSM8K benchmark. |
| Colab Notebook | `dpo_gemma.ipynb` | Preference tuning tutorial that uses Direct Preference Optimization (DPO) to tune the Gemma 3 1B-IT model on the GSM8K dataset. |
| Colab Notebook | `logit_distillation.ipynb` | Demonstrates knowledge distillation from a Gemma 7B-IT teacher to a Gemma 2B-IT student for a translation task. |
| Script | `rl/grpo/gsm8k/` | Bash scripts for fine-tuning different models and presets (Gemma, Llama, etc.) on the GSM8K mathematical reasoning task using GRPO. |
| Script | `rl/grpo/gsm8k/verl_compatible/` | Bash scripts for GRPO training on the GSM8K dataset with a verl-compatible setup. |
| Script | `deepscaler/` | Scripts and notebooks for reproducing the Deepscaler experiment (`train_deepscaler_nb.py`) and for math evaluation. |
| Script | `sft/mtnt/` | Bash scripts for SFT examples on the MTNT translation task for Gemma, Llama, and Qwen models. |
| Script | `model_load/` | Examples for loading Gemma2 and Gemma3 models from the safetensors format. |
| Script | `agentic/` | Examples and scripts for agentic workflows with async rollout. |
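
Several of the entries above use GRPO. As a rough orientation before opening those examples, the core idea of a "group relative" advantage can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the code used in the notebooks or scripts: the function name `group_relative_advantages`, the `eps` parameter, and the 0/1 reward values are assumptions made for this example, and real implementations may differ in details such as which standard deviation estimator they use.

```python
import math


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper for illustration; not part of any library API.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    # Population standard deviation over the group (an assumption here;
    # some implementations use the sample standard deviation instead).
    variance = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(variance)
    # Each completion is scored relative to its siblings in the group,
    # so no separate learned value function is needed as a baseline.
    return [(r - mean) / (std + eps) for r in rewards]


# Made-up rewards for four sampled answers to one math prompt:
# 1.0 if the final answer is correct, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct answers in the group end up with positive advantages and incorrect ones with negative advantages. If every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.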