Training Pipeline

Monitoring

Training Phase

COMPLETE

Multi-Phase Training: Active

Pre-training: 34,027 steps. SFT v1: 4,844 steps. DPO v1: 1,677 steps. SFT MAX: 4,000 steps (loss 22.86). DPO MAX: 1,300 steps (loss 5.32→0.083). Final checkpoint: dpo_step_1000.pt (2.8 GiB). GRPO: running on akun3 (wiwitmikael), step 50+/2000, A10G.
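The step counts in the summary above can be tallied with a few lines of Python. This is a minimal sketch: the dictionary and function names are illustrative and are not taken from the project's scripts, and the GRPO figure uses 50 as a lower bound because the page reports "50+".

```python
# Step counts copied from the status summary above.
COMPLETED_PHASES = {
    "Pre-training": 34_027,
    "SFT v1": 4_844,
    "DPO v1": 1_677,
    "SFT MAX": 4_000,
    "DPO MAX": 1_300,
}

# GRPO is reported as "step 50+/2000", so 50 is only a lower bound.
GRPO_STEPS_DONE = 50
GRPO_STEPS_PLANNED = 2_000


def total_completed_steps(phases):
    """Sum the step counts across all finished phases."""
    return sum(phases.values())


def progress_fraction(done, planned):
    """Fraction of a phase completed, clamped to the range [0, 1]."""
    if planned <= 0:
        raise ValueError("planned must be positive")
    return min(max(done / planned, 0.0), 1.0)


if __name__ == "__main__":
    total = total_completed_steps(COMPLETED_PHASES)
    grpo = progress_fraction(GRPO_STEPS_DONE, GRPO_STEPS_PLANNED)
    print(f"Completed steps across finished phases: {total:,}")
    print(f"GRPO progress: at least {grpo:.1%}")
```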

GPU: A10G
Script: train_grpo_modal.py
Runtime: Modal (wiwitmikael)
  • aqi_content_quality: PASS
  • smar_identity_quality: PASS
  • validate_no_hallucination: PASS
  • validate_dpo_max: PASS
  • pytest: PASS
  • py_compile: PASS
GPU Target: A10G
Runtime: Modal
Script: phase_b.py
Model: 3B
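The six checks listed above read like a release gate in which every check must pass. The sketch below shows one way such a gate could be aggregated. Only the check names come from this page; the check bodies are placeholders (an assumption), and `run_gate` and `gate_passed` are hypothetical helpers rather than the project's actual code.

```python
def _placeholder_check():
    """Stand-in for a real validator (assumption: real checks return bool)."""
    return True


# Check names mirror the list above; every body is a placeholder.
GATE_CHECKS = {
    "aqi_content_quality": _placeholder_check,
    "smar_identity_quality": _placeholder_check,
    "validate_no_hallucination": _placeholder_check,
    "validate_dpo_max": _placeholder_check,
    "pytest": _placeholder_check,
    "py_compile": _placeholder_check,
}


def run_gate(checks):
    """Run every check and record PASS or FAIL; an exception counts as FAIL."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "PASS" if check() else "FAIL"
        except Exception:
            results[name] = "FAIL"
    return results


def gate_passed(results):
    """The gate passes only when every individual check passed."""
    return all(status == "PASS" for status in results.values())


if __name__ == "__main__":
    results = run_gate(GATE_CHECKS)
    for name, status in results.items():
        print(f"{name}: {status}")
    print("Gate:", "PASS" if gate_passed(results) else "FAIL")
```

Treating an exception as a failure keeps one crashing validator from hiding the results of the others.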

Training Tiers

1. Base Model: Active
2. RAG Growth: In Development
3. Online DPO: In Development
4. Self-Play: In Development

Datasets

0

Training Rows

0

Eval Set

ID=1183, EN=317, AR=66

0

DPO Pairs

Teacher critiques + upgrade

0

Reasoning Traces

Target: 500+ RL Distilled

Categories

Total Categories: 11
Data Pool: 370K
Languages: 3
Curriculum: 1,033

Eval Distribution

Indonesian: 1,183
English: 317
Arabic: 66
Total: 1,555
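The per-language counts above sum to 1,566, while the listed total is 1,555. The sketch below computes each language's share of the eval set and reports the gap between the summed counts and the stated total, without assuming which figure is the correct one. The function names are illustrative.

```python
# Counts and total copied from the eval distribution above.
EVAL_COUNTS = {"Indonesian": 1_183, "English": 317, "Arabic": 66}
STATED_TOTAL = 1_555


def language_shares(counts):
    """Return each language's share of the summed counts."""
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


def total_mismatch(counts, stated_total):
    """Summed counts minus the stated total; 0 means they agree."""
    return sum(counts.values()) - stated_total


if __name__ == "__main__":
    for lang, share in language_shares(EVAL_COUNTS).items():
        print(f"{lang}: {share:.1%}")
    print("Sum minus stated total:", total_mismatch(EVAL_COUNTS, STATED_TOTAL))
```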

Training Flow

graph LR
  A[Data Collection] --> B[Quality Filter]
  B --> C[SFT 15K]
  B --> D[Eval 1.5K]
  C --> E[Pre-training]
  D --> F[Evaluation]
  E --> G[Checkpoint]
  F --> H{Pass}
  H -->|Yes| I[Deploy]
  H -->|No| E
  G --> E
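The train, checkpoint, evaluate, and deploy cycle in the diagram above can be sketched as a loop with an evaluation gate. This is a hedged illustration: `run_training_flow`, its callables, the pass threshold, and the round limit are all hypothetical, and the sketch assumes the evaluation step scores the most recent checkpoint, which the diagram does not state explicitly.

```python
def run_training_flow(train_round, evaluate, pass_threshold, max_rounds):
    """Repeat train -> checkpoint -> evaluate until the gate passes.

    train_round(i) returns a checkpoint name for round i (hypothetical).
    evaluate(checkpoint) returns a score; higher is better (assumption).
    Returns (checkpoint_to_deploy_or_None, rounds_used).
    """
    for round_index in range(1, max_rounds + 1):
        checkpoint = train_round(round_index)
        score = evaluate(checkpoint)
        if score >= pass_threshold:
            # "Pass -> Yes" branch: this checkpoint would be deployed.
            return checkpoint, round_index
        # "Pass -> No" branch: loop back and train further.
    return None, max_rounds


if __name__ == "__main__":
    # Toy stand-ins so the loop can run without a real trainer.
    fake_scores = {"ckpt_1": 0.4, "ckpt_2": 0.7, "ckpt_3": 0.9}
    result = run_training_flow(
        train_round=lambda i: f"ckpt_{i}",
        evaluate=lambda name: fake_scores[name],
        pass_threshold=0.8,
        max_rounds=3,
    )
    print(result)
```

The round limit is there so a model that never clears the gate does not train indefinitely.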

Self-Learning Loop

graph TD
  A[Base Model] -->|Tier 1| B[Pre-training]
  B --> C[Conversations]
  C -->|Tier 2| D[RAG Growth]
  D --> E[Knowledge Base]
  E --> F[User Feedback]
  F -->|Tier 3| G[Online DPO]
  G --> H[Aligned Model]
  H --> I[Self-Generated Data]
  I -->|Tier 4| J[Self-Play]
  J --> A
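The four tiers in the loop above form a cycle: Tier 4 (Self-Play) feeds back into the base model. A minimal way to model that ordering is an enum with a wrap-around successor. The `Tier` enum and the helper functions are illustrative names, not part of the project.

```python
from enum import Enum


class Tier(Enum):
    """Tier numbers and names as shown in the self-learning loop."""
    BASE_MODEL = 1
    RAG_GROWTH = 2
    ONLINE_DPO = 3
    SELF_PLAY = 4


def next_tier(current):
    """Return the following tier; Tier 4 wraps to Tier 1 (the J --> A edge)."""
    return Tier(current.value % len(Tier) + 1)


def one_cycle(start):
    """List the tiers visited in one full pass of the loop from `start`."""
    order = [start]
    while len(order) < len(Tier):
        order.append(next_tier(order[-1]))
    return order


if __name__ == "__main__":
    print([tier.name for tier in one_cycle(Tier.BASE_MODEL)])
```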

Data Distribution

pie showData
  title "Dataset Categories (370K Total)"
  "General" : 107441
  "Coding" : 56608
  "Knowledge" : 35106
  "Advanced" : 29147
  "Islamic" : 25408
  "Medical" : 22712
  "Conversation" : 22142
  "Math" : 21646
  "Agentic" : 20508
  "Indonesian" : 19436
  "Psychology" : 9846
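The row counts in the chart above can be checked against the stated 370K pool and turned into percentage shares. This is a small sketch with illustrative names; the counts are copied from the chart.

```python
# Row counts per category, copied from the pie chart above.
CATEGORY_ROWS = {
    "General": 107_441,
    "Coding": 56_608,
    "Knowledge": 35_106,
    "Advanced": 29_147,
    "Islamic": 25_408,
    "Medical": 22_712,
    "Conversation": 22_142,
    "Math": 21_646,
    "Agentic": 20_508,
    "Indonesian": 19_436,
    "Psychology": 9_846,
}


def category_shares(rows):
    """Return each category's share of the total row count."""
    total = sum(rows.values())
    return {name: count / total for name, count in rows.items()}


if __name__ == "__main__":
    print(f"Categories: {len(CATEGORY_ROWS)}")
    print(f"Total rows: {sum(CATEGORY_ROWS.values()):,}")
    for name, share in category_shares(CATEGORY_ROWS).items():
        print(f"{name}: {share:.1%}")
```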