An aggregation of large models evaluated on Vietnamese-language benchmarks. Numbers are pulled from public leaderboards; the suite will expand with applied tasks curated by NRL.
Today, VN-Bench aggregates VMLU. The next version will add real-world tasks: contract extraction, official-document parsing, diacritic-aware OCR, EN/VN code-switching.
VN-Bench v0 doesn't run evaluations itself. This page aggregates public numbers from sources trusted by the Vietnamese research community — primarily the VMLU leaderboard maintained by Zalo AI and JAIST, plus results published in the VLSP track and academic papers.
Every row links back to its source. Numbers reflect data as of 2026-04-25. New models are added within a few weeks of public release.
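For concreteness, here is a minimal sketch of what one aggregated row could look like as data. The field names and the placeholder URL are illustrative assumptions, not this page's actual schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class LeaderboardRow:
    """One aggregated score. Field names are illustrative, not the page's real schema."""
    model: str
    organization: str
    track: str           # "fine-tuned" or "from-scratch"
    avg_score: float     # copied verbatim from the source leaderboard
    source_url: str      # every row must link back to its source
    snapshot_date: date  # when the number was copied

row = LeaderboardRow(
    model="QwQ-32B",
    organization="Alibaba Cloud",
    track="from-scratch",
    avg_score=76.13,
    source_url="https://example.org/vmlu-leaderboard",  # placeholder URL
    snapshot_date=date(2026, 4, 25),
)
```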
VN-Bench v1 (in development alongside the Nôm toolkit) will add applied tasks that VMLU does not cover: document extraction, legal QA, diacritic-aware OCR, code-switching. The goal is a measure that tracks the work Vietnamese AI teams actually ship.
VMLU is a multitask suite of 10,880 multiple-choice questions across 58 subjects (STEM, humanities, social sciences, general knowledge). Zero-shot evaluation. Two tracks: from-scratch and fine-tuned models.
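To make "zero-shot, multiple-choice" concrete, here is a minimal sketch of how such scoring typically works. The prompt template and the `ask_model` callable are assumptions for illustration, not VMLU's actual harness.

```python
# Minimal sketch of zero-shot multiple-choice scoring. The prompt template
# and the ask_model callable are illustrative assumptions, not the VMLU
# evaluation harness itself.
from typing import Callable

def format_prompt(question: str, choices: dict[str, str]) -> str:
    # Zero-shot: the question is shown with no worked examples.
    lines = [question] + [f"{k}. {v}" for k, v in sorted(choices.items())]
    lines.append("Answer with the letter only.")
    return "\n".join(lines)

def accuracy(items: list[dict], ask_model: Callable[[str], str]) -> float:
    correct = 0
    for item in items:
        pred = ask_model(format_prompt(item["question"], item["choices"]))
        # Compare the first letter of the reply against the gold label.
        correct += pred.strip()[:1].upper() == item["answer"]
    return correct / len(items)
```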
All scores below are credited to the authors of the VMLU benchmark; this page only aggregates them and links back to the source.
| Rank | Model | Organization | Base | Avg score | Track |
|---|---|---|---|---|---|
| 1 | axis-sovereign | International AXIS | — | 85.75 | fine-tuned |
| 2 | V-LLM v1 | VinSmart Future | — | 85.11 | fine-tuned |
| 3 | MISA-AI-1.0 | MISA JSC | Qwen3 | 81.26 | fine-tuned |
| 4 | Vi-Sovereign-Medium | NLP-CORE-Lab | Qwen3-32B | 80.57 | fine-tuned |
| 5 | VNPTAI.IO-Medium-R1.2 | VNPT AI | — | 79.61 | fine-tuned |
| 6 | BnK-AI-Medium-v2.1 | BnK Solution | — | 78.84 | fine-tuned |
| 7 | Cake-Mochi | BeFinancial | Qwen3-32B | 77.64 | fine-tuned |
| 8 | VNPTAI.IO-Medium-R1 | VNPT AI | — | 77.43 | fine-tuned |
| 9 | MISA-Llama3-v1.1 | MISA JSC | Llama-3 | 76.87 | fine-tuned |
| 10 | BnK-AI-Medium-v2 | BnK Solution | — | 76.66 | fine-tuned |
| Rank | Model | Organization | Base | Avg score | Track |
|---|---|---|---|---|---|
| — | QwQ-32B | Alibaba Cloud | — | 76.13 | from-scratch |
| — | Qwen2.5-72B-Instruct-AWQ | Alibaba Cloud | — | 69.17 | from-scratch |
| — | Llama-3-70B | Meta | — | 66.44 | from-scratch |
| — | KiLM-13b-v24.7.1 | Kiki AI / Zalo | — | 66.07 | from-scratch |
| — | GPT-4 | OpenAI | — | 65.53 | from-scratch |
Rank applies within a single track only. Fine-tuned models tend to score higher because they're tuned to VMLU's format; to compare base capability, look at the from-scratch track.
Source: VMLU Leaderboard (snapshot 2026-04).

VMLU leans academic-MCQ. The Vietnamese community has many other benchmarks for different task shapes. Here's the short list; pick whichever matches your workload.
- VLSP Association: annual workshop with multiple tracks (LLM, ASR, MT, semantic parsing, legal QA).
- Academic (arXiv 2404.11086): comprehensive eval suite covering general knowledge, reading comprehension, reasoning, and conversation.
- Academic (arXiv 2512.14554): Vietnamese legal reasoning (article prediction, summarization, citation).
- VLSP Association: small language models specialized for Vietnamese legal tasks.
- VLSP Association: Vietnamese multimodal legal QA on traffic-sign regulation.
VMLU measures academic knowledge; VN-Bench v1 measures real work. Below are the tasks we're curating with the community, with a sketch of their scoring rules after the list. Submissions open after the v1 release.
- Contract extraction: given a contract PDF, extract the contract number, signing date, parties, total value, and penalty clauses. Scored by F1 on field accuracy.
- Official-document parsing: extract the document number, issue date, issuing body, and key content. Scored by exact match.
- Diacritic-aware OCR: Vietnamese-language scanned document to structured JSON. Scored on character accuracy plus field accuracy.
- Diacritic-faithful generation: generation in Vietnamese, measured on diacritic accuracy across long passages.
- EN/VN code-switching: natural mixed-language dialogue, measuring whether the model understands and responds contextually.
- Legal QA: borrowed from VLegal-Bench; we don't duplicate it, we link to it so the community shares one source.
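As noted above, here is a minimal sketch of the three scoring rules named in this list (field-level F1, exact match, character accuracy), under the assumption that extracted fields arrive as flat dicts of strings. It is an illustration, not the official v1 scorer.

```python
# Illustrative scorers for the tasks above. Assumes predictions and gold
# annotations are flat field dicts / plain strings; the official VN-Bench
# v1 scorer may normalize values differently.

def exact_match(pred: dict[str, str], gold: dict[str, str]) -> float:
    # Official-document parsing: every gold field must match exactly.
    return float(all(pred.get(k) == v for k, v in gold.items()))

def field_f1(pred: dict[str, str], gold: dict[str, str]) -> float:
    # Contract extraction: F1 over (field, value) pairs.
    tp = len(set(pred.items()) & set(gold.items()))
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def char_accuracy(pred: str, gold: str) -> float:
    # Diacritic-aware OCR: 1 - CER, with CER = Levenshtein distance
    # over characters divided by the gold length.
    dp = list(range(len(gold) + 1))
    for i, p in enumerate(pred, start=1):
        prev, dp[0] = dp[0], i
        for j, g in enumerate(gold, start=1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1, prev + (p != g))
            prev = cur
    return max(0.0, 1 - dp[-1] / max(len(gold), 1))

# Toy values for illustration only.
gold = {"contract_number": "01/2026/HDMB", "total_value": "1200000000 VND"}
pred = {"contract_number": "01/2026/HDMB", "total_value": "1200000000 VND"}
assert field_f1(pred, gold) == 1.0 and exact_match(pred, gold) == 1.0
assert char_accuracy("Việt Nam", "Việt Nam") == 1.0
```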
When VN-Bench v1 launches, we'll open submissions. In the meantime, register for updates, propose tasks, or contribute eval data.
This page does not run evaluations itself. All numbers come from the public sources below. Authors of the underlying benchmarks retain full credit for the model scores. If you are an author and would like a correction or removal, please contact [email protected].