Training Pipeline
Training Phase
COMPLETE
Multi-Phase Training: Active
Pre-training: 34,027 steps. SFT v1: 4,844 steps. DPO v1: 1,677 steps. SFT MAX: 4,000 steps (loss 22.86). DPO MAX: 1,300 steps (loss 5.32→0.083). Final checkpoint: dpo_step_1000.pt (2.8 GiB). GRPO: running on akun3 (wiwitmikael), Step 50+/2000, A10G.
- aqi_content_quality: PASS
- smar_identity_quality: PASS
- validate_no_hallucination: PASS
- validate_dpo_max: PASS
- pytest: PASS
- py_compile: PASS
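As a quick sanity check on the phase summary above, the reported step counts can be added up. The sketch below is only illustrative: the dictionary layout and the helper names `total_steps` and `grpo_progress` are assumptions made for this example and are not part of the project's code. The GRPO figure uses the lower bound of "Step 50+/2000".

```python
# Step counts copied from the phase summary; the dict layout is hypothetical.
COMPLETED_STEPS = {
    "pre-training": 34_027,
    "SFT v1": 4_844,
    "DPO v1": 1_677,
    "SFT MAX": 4_000,
    "DPO MAX": 1_300,
}


def total_steps(stages: dict) -> int:
    """Sum the optimizer steps across all completed stages."""
    return sum(stages.values())


def grpo_progress(current_step: int, planned_steps: int) -> float:
    """Fraction of the planned GRPO run finished so far."""
    return current_step / planned_steps


if __name__ == "__main__":
    print("completed steps:", total_steps(COMPLETED_STEPS))
    # "Step 50+/2000" is a lower bound, so this is a minimum.
    print(f"GRPO progress: at least {grpo_progress(50, 2000):.1%}")
```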
Training Tiers
Base Model
Active
RAG Growth
In Development
Online DPO
In Development
Self-Play
In Development
Datasets
0
Training Rows
0
Eval Set
ID=1183, EN=317, AR=66
0
DPO Pairs
Teacher critiques + upgrade
0
Reasoning Traces
Target: 500+ RL Distilled
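The per-language breakdown listed under Eval Set (ID=1183, EN=317, AR=66) can be turned into a total and per-language shares. This is a minimal sketch: the name `EVAL_SPLIT` and the helper `language_shares` are hypothetical, and reading the codes as Indonesian, English, and Arabic is an assumption not stated on the page.

```python
# Counts copied from the Eval Set card; the language-code meanings
# (ID/EN/AR as Indonesian/English/Arabic) are assumed, not confirmed.
EVAL_SPLIT = {"ID": 1183, "EN": 317, "AR": 66}


def language_shares(split: dict) -> dict:
    """Return each language's fraction of the whole evaluation set."""
    total = sum(split.values())
    return {lang: count / total for lang, count in split.items()}


if __name__ == "__main__":
    print("eval examples:", sum(EVAL_SPLIT.values()))
    for lang, share in language_shares(EVAL_SPLIT).items():
        print(f"{lang}: {share:.1%}")
```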