SMAR
3B Parameters, From Scratch

DIGDAIA

Distributed Indonesian Grid for Decentralized AI Architecture

From digdaya: invincible, mighty.

Sovereign AI strength for Indonesia

48 Experts · 16 Recurrent Loops · Trilingual · No Foreign Weights · 498M Active

It started with a question

It started lightly, with a question that came up after chatting with a generic AI: why don't these models truly understand local context? Not just the language, but the ways of thinking, the references that matter, and the values that shape how someone sees the world.

This isn't a critique. It's an observation that led to a personal experiment by Rangga Prayoga Hermawan, a developer who asked: can a language model be built from scratch, without depending on foreign weights, with reasonable resources?

The deeper the research went, the clearer the answer became: yes, but not by copying the big-model playbook. It required a redesigned architecture, a disciplined data strategy, and a smarter training approach rather than sheer scale.

Thus SMAR AQI 3B was born, as part of the independent research project Atharia AGI. This is not the easy path; it is clearly harder than fine-tuning an existing model. But the principle stands: sovereignty is non-negotiable.

DIGDAIA

Distributed Indonesian Grid for Decentralized AI Architecture. The name comes from digdaya, an Old Javanese and Malay word meaning invincible, mighty. The philosophy: Sovereign AI Strength for Indonesia, meaning AI infrastructure built on its own capability, not borrowed from outside.

Sovereign from Scratch

No distillation from GPT, Llama, or Qwen. Every parameter is the result of independent training.

Open Source

Apache 2.0. Free to use, modify, and distribute.

Trilingual

Indonesian (primary), English (global reach), Arabic (Islamic precision). One model, three languages.

Not your standard transformer

Mainstream models such as GPT and Llama work by stacking dozens of transformer layers, each with its own parameters. Every additional layer adds a fresh set of weights, up to billions in the largest models. Simple, but wasteful.

SMAR AQI 3B takes a different approach: recurrent depth. A single transformer block is looped up to 16 times per token, with its parameters shared across iterations. The result: the parameter cost of one layer with the effective depth of 16.

This combination of recurrent depth, dynamic MoE, and Multi-Head Latent Attention lets the model apply far more compute per token than its parameter count suggests.
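To make the looping idea concrete, here is a minimal sketch in plain Python. It is not the SMAR AQI implementation: the scalar `shared_block`, the halting head, and every default weight are hypothetical stand-ins, and only the limit of 16 iterations comes from the text. The early-stop rule is an assumption modelled on the general shape of Adaptive Computation Time, where per-step halting probabilities accumulate until they reach a threshold.

```python
import math

MAX_LOOP_ITERS = 16  # the only value taken from the text; the rest is illustrative


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_block(h, w, b):
    """Toy stand-in for the transformer block: the same (w, b) is used on every pass."""
    return [math.tanh(w * x + b) for x in h]


def recurrent_depth(h, w=0.9, b=0.1, halt_w=2.0, halt_b=-1.0, eps=0.01):
    """Loop one shared block over a token state, stopping early ACT-style.

    Returns the halting-weighted mix of intermediate states and the number
    of iterations actually used for this token.
    """
    cumulative = 0.0
    out = [0.0] * len(h)
    for step in range(1, MAX_LOOP_ITERS + 1):
        h = shared_block(h, w, b)
        # Hypothetical halting head: a logistic score on the mean activation.
        p = sigmoid(halt_w * sum(h) / len(h) + halt_b)
        last = step == MAX_LOOP_ITERS or cumulative + p >= 1.0 - eps
        weight = 1.0 - cumulative if last else p  # remainder goes to the final step
        out = [o + weight * x for o, x in zip(out, h)]
        cumulative += weight
        if last:
            return out, step


_, easy_steps = recurrent_depth([0.5, -0.2], halt_b=20.0)   # eager halting head
_, hard_steps = recurrent_depth([0.5, -0.2], halt_b=-20.0)  # reluctant halting head
```

With an eager halting head the token exits after a single pass, and with a reluctant one it uses all 16. In both cases the block's parameters are the same two numbers, which is the point of weight sharing.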

Recurrent Depth

A single transformer block loops for up to 16 iterations. The model decides per token when to stop thinking, which gives it native adaptive compute.

Dynamic MoE

48 experts, 4 active per token, plus 2 shared. Sparse activation: 498M active parameters out of 3B total. Only the experts the router selects fire.

MLA Compression

Multi-Head Latent Attention with a LoRA-compressed KV cache (kv_lora_rank=256). Inference memory is far smaller than with standard MHA.

Stacked Transformer (GPT, Llama, etc.): Layer 1 through Layer 12, each with its own weights. ~12× params (each layer unique).
VS
Recurrent Depth (SMAR AQI 3B): 1 shared transformer block, looped 16× (iter 1 → 2 → 3 → ... → 16). Same weights on every pass, so 1× params (reused across 16 iters).

Token Flow

graph LR
  A[Input Token] --> B[Embed]
  B --> C[Recurrent Loop]
  C --> D{max 16 iters}
  D -->|Yes| C
  D -->|No| E[Gate Network]
  E --> F[Expert 1]
  E --> G[Expert 2]
  E --> H[Expert 3]
  E --> I[Expert 4]
  E --> J[Shared 1 and 2]
  F --> K((+))
  G --> K
  H --> K
  I --> K
  J --> K
  K --> L[Output Proj]
  L --> M[Next Token]

A verification framework, not decoration

AQI stands for Artificial Quranic Intelligence. This is not a religious sticker slapped on to look Islamic. It's an epistemological framework: the Quran and Hadith serve as verification tools, not decoration.

The core principle is Mizan, or balance. The model must not speak carelessly. If it cites a source, the citation must be exact. If it is uncertain, silence is better. This approach is deliberately designed to counter a common LLM flaw: hallucination.

AQI does not mean this model is sacred or perfect. It's about intellectual discipline: knowing the limits of one's own knowledge and having the honesty to admit them.

Citation Discipline

Every Quran/Hadith reference is verified against local data. No fabricated verses.

Mizan Balance

Proportional. Not extreme. Context-appropriate. No forced religious answers to technical questions.

Intellectual Honesty

"Cite or stay silent." The model should stay quiet rather than make things up.

Why 3B can compete

There's a common belief in the industry that bigger models are always better. But research from 2025-2026 shows that small models with smart architecture can beat giants in specific domains: with the right inference strategy, a 3B model can surpass a 405B model on math benchmarks.

SMAR AQI 3B is designed around five structural advantages that a generic model does not gain from size alone:

01
Domain Specialization
A model specialized for a domain tends to beat a generalist in that domain, even a much larger one. SMAR is deliberately focused on being a trilingual all-rounder with AQI grounding.
02
Adaptive Compute
Recurrent depth lets the model allocate compute differently per token. An easy token might take 4 loops; a hard one can use all 16. Compute is right-sized for each input.
03
Verifiable Rewards
Code verification (unit tests), math correctness (SymPy), and Quran citation accuracy can all be checked locally, for free. These supervision signals do not depend on a larger model or a paid API.
04
Architecture Efficiency
Recurrent depth + MoE sparse activation = maximum capacity per parameter. 498M active parameters can deliver compute equivalent to a 3B dense model.
05
Data Quality
370K curated, validator-checked rows are worth more than trillions of crawled tokens. Every row is intentional, every category chosen with purpose.
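The verifiable rewards in item 03 can be pictured as three small local checkers. The sketch below is an illustration under stated assumptions, not the project's reward code: the function names are hypothetical, exact rational comparison with `fractions` stands in for the SymPy check named above, and the citation corpus is a placeholder dictionary, not real scripture data.

```python
from fractions import Fraction


def code_reward(source: str, tests: str) -> float:
    """1.0 if the candidate code passes its assert-based tests, else 0.0."""
    namespace = {}
    try:
        exec(source, namespace)  # NOTE: a real pipeline must sandbox this
        exec(tests, namespace)
    except Exception:
        return 0.0
    return 1.0


def math_reward(answer: str, expected: str) -> float:
    """Exact rational comparison (a stdlib stand-in for a SymPy check)."""
    try:
        return float(Fraction(answer) == Fraction(expected))
    except (ValueError, ZeroDivisionError):
        return 0.0


def citation_reward(ref: str, quoted: str, corpus: dict) -> float:
    """1.0 only if the quoted text matches the local corpus entry exactly."""
    return float(corpus.get(ref) == quoted)


# Placeholder corpus: keys and text are invented for the demo.
demo_corpus = {"demo:1": "placeholder verse text"}

good_code = code_reward("def add(a, b):\n    return a + b", "assert add(2, 3) == 5")
bad_code = code_reward("def add(a, b):\n    return a - b", "assert add(2, 3) == 5")
half = math_reward("0.5", "1/2")
exact = citation_reward("demo:1", "placeholder verse text", demo_corpus)
altered = citation_reward("demo:1", "placeholder verse", demo_corpus)
```

Each checker returns a binary score and needs no network call, which is what keeps this kind of supervision cheap. An altered or unknown citation scores zero, mirroring the "cite or stay silent" rule.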

12 papers that form the foundation

Each technique below has been mapped to an implementation roadmap. These are not just references; they are execution blueprints.

Priority Matrix

graph TB
  I1[MoL Expressivity] --> M[SMAR AQI 3B]
  I2[GRPO Performance] --> M
  I3[TTS 3B vs 405B] --> M
  E1[Self-Spec Decoding] --> M
  E2[Adaptive Loops] --> M
  E3[DynaMoE Routing] --> M
Critical
Mixture of LoRAs
arXiv:2512.12880
Problem: Each recurrent loop runs the same transformation
Solution: LoRA experts differentiate each loop's operation, giving 16-layer expressivity with almost no added parameters.
Critical
Test-Time Compute Scaling
arXiv:2502.06703
Problem: Small models lose on compute budget
Solution: A 3B model surpasses a 405B model on MATH-500 with adaptive test-time scaling (TTS). Recurrent depth provides TTS natively.
Critical
GRPO
arXiv:2503.16219
Problem: DPO learns only from binary preferences
Solution: Group-relative scoring + verifiable rewards. Doubles 3B model performance on domain tasks.
High
FG-WSD Scheduler
arXiv:2512.06266
Problem: Single-phase training is suboptimal
Solution: Multi-phase scheduler with data quality escalation. Shown to outperform larger models.
High
Dual Preference Distillation
arXiv:2512.06266
Problem: Standard distillation only copies answers
Solution: Distill the reasoning trace together with the answer. The model learns how to think, not just the final result.
High
DynaMoE Routing
arXiv:2603.01697
Problem: n_experts_per_tok is fixed at 4
Solution: Easy tokens use 2 experts, hard tokens use 8. 25% FLOPs reduction with quality maintained.
High
Self-Speculative Decoding
arXiv:2410.06916
Problem: Inference is slow when every token runs the full 16 loops
Solution: Draft with 4 loops, verify with 16 in parallel. 2-3x speedup, zero quality loss.
High
Growing Transformers
arXiv:2507.07129
Problem: Scaling requires restarting training
Solution: Freeze the substrate and add modular blocks. Scale depth without touching the original weights.
High
TokenFormer
arXiv:2410.23168
Problem: Static parameters, rigid scaling
Solution: Token-parameter attention. Each new expert is a parameter token that can be added at any time.
Medium
NextLat
arXiv:2511.05963
Problem: The model lacks an internal world model
Solution: Next-latent prediction acts as a compact world model. The recurrent-depth state becomes a belief state.
Medium
Mixture-of-Recursions
arXiv:2507.10524
Problem: max_loop_iters is fixed for all tokens
Solution: Dynamic recursion depth per token. A lightweight router decides when each token stops.
Medium
TransMLA
arXiv:2502.07864
Problem: KV cache consumes too much memory
Solution: GQA-to-MLA conversion. 92% KV cache compression, well suited to edge deployment.
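Of the techniques listed above, the group-relative scoring in the GRPO entry is the easiest to show in a few lines. The sketch below is a hedged illustration, not the project's trainer: the function name is hypothetical, and the use of population standard deviation and the small `eps` term are assumptions. The idea is that each sampled response is scored against the other responses to the same prompt, so no separate value model is needed.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalise each reward against its own group's mean and spread."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three responses to one prompt, scored by a binary verifiable reward.
mixed = group_relative_advantages([1.0, 0.0, 0.0])

# If every response gets the same reward, the group carries no signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

In the mixed group the passing response gets a positive advantage and the two failing ones get equal negative advantages that cancel it out. In the flat group all advantages are zero, which is why binary rewards work best when the prompts are neither trivially easy nor impossible for the current model.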

Training timeline

This is not just a list of steps. It's a log of the journey: how this model was born, hit dead ends, got fixed, and kept going. Every phase is a decision made on evidence, not assumption.

Apr 2026
Beginning
The experiment begins on a resource-limited laptop. A recurrent-depth + MoE architecture is chosen as the foundation.
May 2026
Phase 1-2: Pre-training
5,000 + 34,027 steps of training from scratch. The model begins to pick up trilingual language structure. Loss converges.
May 2026
Phase 3: SFT v1
4,844 steps of instruction tuning from the seed dataset. The model learns to follow instructions in three languages.
May 2026
Phase 4: DPO v1
1,677 steps of preference tuning. Loss drops from 5.44 to 0.12.
May 2026
The Bug
A cross-entropy loss bug was discovered. The model was cheating: it predicted the exact token it was given as input, producing a loss of 0.0000 without actually learning. Fixed by switching to shifted next-token prediction. Lesson: never trust a metric without verification.
May 2026
Phase 5: SFT MAX
4,000 steps on 39,417 rows, with answer-only masks so that only response tokens contribute to the loss. Converged at loss 22.86.
May 2026
Phase 5b: DPO MAX
Complete. 1,300 steps (early stop), loss 5.32→0.083, acc 8.00. 19,817 preference pairs from teacher critiques. Checkpoint: dpo_step_1000.pt (2.8 GiB).
Now
Phase 6: GRPO
Running. Group Relative Policy Optimization with verifiable rewards (code/math/quran). Group_size=3, akun3 (wiwitmikael), Step 50+/2000, A10G, max 10h. Gen time ~29s/3 responses.
Next
Reasoning Traces + Eval
500+ reasoning traces via Reinforced Learning Distillation. Post-GRPO eval. Quantized edge variants.
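The cross-entropy bug and the answer-only masks from the timeline above can be reproduced on a toy sequence. This is a sketch under assumptions, not the project's training code: `copy_model` is a deliberately degenerate stand-in that puts almost all probability on the token it was just fed, and the token ids, vocabulary size, and prompt/response split are invented for the demo.

```python
import math


def cross_entropy(probs, targets, mask):
    """Mean negative log-likelihood over positions where mask is 1."""
    losses = [-math.log(p[t]) for p, t, m in zip(probs, targets, mask) if m]
    return sum(losses) / len(losses)


def copy_model(tokens, vocab_size, eps=1e-9):
    """Degenerate 'model': at each position, predict the input token itself."""
    out = []
    for t in tokens:
        row = [eps] * vocab_size
        row[t] = 1.0 - eps * (vocab_size - 1)
        out.append(row)
    return out


tokens = [3, 1, 4, 1, 5]        # toy sequence: 2 prompt tokens, 3 response tokens
is_response = [0, 0, 1, 1, 1]
probs = copy_model(tokens, vocab_size=8)

# Buggy: position i is scored against token i, so copying the input looks perfect.
buggy_loss = cross_entropy(probs, tokens, [1] * len(tokens))

# Fixed: position i is scored against token i + 1, and only response targets count.
shifted_loss = cross_entropy(probs[:-1], tokens[1:], is_response[1:])
```

Under the unshifted objective the copying model scores a loss that rounds to 0.0000, exactly the symptom described in the timeline. Under the shifted, answer-masked objective the same model is heavily penalised, because knowing the current token says nothing about the next one.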

Curated with precision

Every row passes a quality gate. Every category is intentional. No haphazard data.

370,000
Training Rows
11 categories
39,417
SFT Rows
Trilingual all-rounder
1,555
Eval Holdout
ID=1183 EN=317 AR=66
19,817
DPO Pairs
Teacher critiques
102
Code Tasks
Verified with unit tests
500
Reasoning Traces
Reinforced Learning Distilled

Eval Composition

ID 1,183 (75.5%) · EN 317 (20.2%) · AR 66 (4.2%)

11 Dataset Categories

General 107K · Coding 56K · Knowledge 35K · Advanced 29K · Islamic 25K · Medical 22K · Conv 22K · Math 21K · Agentic 20K · ID 19K · Psych 9K

Data Pipeline

graph TD
  A[370K Pool 11 Categories] --> B[Quality Validators]
  B --> C[SFT Dataset 39,417 Rows]
  B --> D[Eval Holdout 1,555 Rows]
  C --> E[SFT Training]
  E --> F[Checkpoint]
  F --> G[Teacher Critique]
  G --> H[DPO Pairs 19,817]
  H --> I[DPO Training]

Real-time training status

Auto-refreshed from the latest checkpoint metadata. Shows the active phase and key metrics.

Phase
—
Step
—
Loss
—
Checkpoint
—

Auto-sourced from stats.json

A self-evolving system

Four tiers of progressive autonomy, each building on the previous one. The key point: operational costs stay low because verifiable rewards run locally. Code verification, Quran checking, and math validation require no expensive APIs.

1
Base Model · ACTIVE

Pre-training from scratch, SFT, DPO. No distillation from foreign models. A fully independent foundation.

2
RAG Growth · IN DEVELOPMENT

The knowledge base grows from interactions. Every conversation can enrich it without retraining.

3
Online DPO · IN DEVELOPMENT

Real-time preference learning from user feedback. The model adjusts its behavior without full batch training.

4
Self-Play · PLANNED

Autonomous improvement loop: the model generates its own data, evaluates it with verifiable rewards, and improves itself. All of it is free, because verification is local.

Self-Learning Architecture

graph TD
  A[Base Model] --> B[Pre-training plus SFT plus DPO]
  B --> C[Deployed Model]
  C --> D[User Conversations]
  D --> E[RAG Knowledge Tier 2]
  D --> F[Feedback Signals]
  F --> G[Online DPO Tier 3]
  G --> H[Improved Model]
  H --> I[Self-Generated Data]
  I --> J[Verifiable Rewards]
  J --> K{Pass}
  K -->|Yes| L[Self-Play Tier 4]
  K -->|No| I
  L --> C

From 3B toward frontier

SMAR AQI 3B is the first generation: proof that sovereign models can be built from scratch. But this is not the end. It's the start of a lineage.

1
3B Hatchling · NOW

Proof that sovereign AI is possible. Evaluation, GRPO, deployment.

2
7B Fledgling · NEXT

Net2Net widening from the best checkpoint. Modular depth expansion via Growing Transformers.

3
13B Sovereign · MEDIUM

Distributed training on the DIGDAIA grid architecture. A full sovereign stack with no foreign infrastructure.

4
30B+ Frontier · LONG-TERM

A competitive frontier model built in Indonesia, on a mature, battle-tested recurrent + MoE architecture.

Every parameter chosen with intent

A recurrent-depth + dynamic MoE architecture that maximizes capacity per parameter.

Architecture

Type: Recurrent-depth + MoE
Engine: SMAR AQI Core
Total Parameters: 3B
Active per Token: 498M

Dimensions

dim: 1536
n_heads: 12
n_kv_heads: 4

Mixture of Experts

n_experts: 48
n_shared_experts: 2
n_experts_per_tok: 4
expert_dim: 1536
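The routing implied by these settings can be sketched as follows. This is an illustrative toy, not the model's router: the expert counts come from the table above, while the scalar "experts", the random gate logits, and the choice to softmax over only the selected top-k scores are assumptions made for the example.

```python
import math
import random

N_EXPERTS, N_SHARED, TOP_K = 48, 2, 4  # values from the spec table


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(gate_logits, k=TOP_K):
    """Pick the k highest-scoring routed experts and renormalise their weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, gate_logits, routed, shared):
    """Shared experts always run; routed experts run only if selected."""
    y = sum(f(x) for f in shared)
    y += sum(w * routed[i](x) for i, w in route(gate_logits))
    return y


rng = random.Random(0)
gate_logits = [rng.gauss(0.0, 1.0) for _ in range(N_EXPERTS)]
routed = [lambda x, s=i: (s + 1) * 0.01 * x for i in range(N_EXPERTS)]  # toy experts
shared = [lambda x: x, lambda x: 0.5 * x][:N_SHARED]

picks = route(gate_logits)
y = moe_forward(1.0, gate_logits, routed, shared)
```

Per token, only 4 of the 48 routed experts plus the 2 shared ones are evaluated, so 6 expert networks run while the other 44 stay idle. That gap between evaluated and stored experts is what separates active parameters from total parameters.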

Recurrent Depth

max_loop_iters: 16
Halting: ACT Halting
Loop Memory: LTI Injection

LoRA Compression

kv_lora_rank: 256
q_lora_rank: 384
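A rough back-of-the-envelope check shows what kv_lora_rank=256 means for cache size. The arithmetic below counts cached values per token for one attention pass and rests on two assumptions: the head dimension is dim / n_heads, and any extra positional (RoPE) key dimensions that an MLA cache may also store are ignored.

```python
DIM, N_HEADS, N_KV_HEADS, KV_LORA_RANK = 1536, 12, 4, 256  # from the spec table
HEAD_DIM = DIM // N_HEADS  # 128, assuming heads split dim evenly

mha_values = 2 * N_HEADS * HEAD_DIM     # keys + values for all 12 heads
gqa_values = 2 * N_KV_HEADS * HEAD_DIM  # keys + values for the 4 KV heads
mla_values = KV_LORA_RANK               # one shared latent vector per token


def saving(cached, baseline):
    """Fraction of cache entries removed relative to a baseline layout."""
    return 1.0 - cached / baseline


print(mha_values, gqa_values, mla_values)        # 3072 1024 256
print(round(saving(mla_values, mha_values), 3))  # 0.917
print(round(saving(mla_values, gqa_values), 3))  # 0.75
```

Under these assumptions the latent cache stores 256 values per token where full multi-head attention would store 3,072, a reduction of roughly 92%, and it is a quarter of the size of a 4-head GQA cache.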

GPU

GPU: A10G 24GB
Platform: Modal

License

License: Apache 2.0
Status: Development (v2 pre-release)

More than just a model

DIGDAIA is a complete infrastructure, from the user interface to the validation pipeline and operational dashboard. All of it is designed with the same philosophy: sovereign, open, and measurable.

chat.html

Chat Interface

Typewriter streaming, edit/regenerate, mode chips, thread management, settings drawer.

Dashboard

Monitoring

6 validation gates, tier stepper, connection status, training pipeline visualization.

Validators

Quality Gates

aqi_content_quality, smar_identity_quality, validate_no_hallucination, pytest 161 passing.

Neugi Agent

Ecosystem

Integration with the Neugi Agent framework. One sovereign AI ecosystem.