§ vn-bench · 2026 · public aggregation · updated 2026-04

Vietnamese language models — the leaderboard.

An aggregation of large language models evaluated on Vietnamese. Numbers are pulled from public leaderboards. The page will expand with applied tasks curated by NRL.

Today, VN-Bench aggregates VMLU. The next version will add real-world tasks: contract extraction, official-document parsing, diacritic-aware OCR, EN/VN code-switching.

§ 01 · methodology

Where do these numbers come from?

VN-Bench v0 doesn't run evaluations itself. This page aggregates public numbers from sources trusted by the Vietnamese research community — primarily the VMLU leaderboard maintained by Zalo AI and JAIST, plus results published in the VLSP track and academic papers.

Every row links back to its source. Numbers reflect data as of 2026-04-25. New models are added within a few weeks of public release.

VN-Bench v1 (in development alongside the Nôm toolkit) will add applied tasks that VMLU does not cover: document extraction, legal QA, diacritic-aware OCR, code-switching. The goal is a measure that tracks the work Vietnamese AI teams actually ship.

§ 02 · vmlu leaderboard

VMLU — top 15 models.

VMLU is a multitask suite of 10,880 multiple-choice questions across 58 subjects (STEM, humanities, social sciences, general knowledge). Zero-shot evaluation. Two tracks: from-scratch and fine-tuned models.

Numbers from VMLU Leaderboard (Zalo AI / JAIST) · Snapshot: 2026-04-25

All scores below belong to the authors of the VMLU benchmark. This page only aggregates and links back to the source.

── Fine-tuned ── 10 models
Rank  Model                  Organization        Base       Avg score  Track
1     axis-sovereign         International AXIS  -          85.75      fine-tuned
2     V-LLM v1               VinSmart Future     -          85.11      fine-tuned
3     MISA-AI-1.0            MISA JSC            Qwen3      81.26      fine-tuned
4     Vi-Sovereign-Medium    NLP-CORE-Lab        Qwen3-32B  80.57      fine-tuned
5     VNPTAI.IO-Medium-R1.2  VNPT AI             -          79.61      fine-tuned
6     BnK-AI-Medium-v2.1     BnK Solution        -          78.84      fine-tuned
7     Cake-Mochi             BeFinancial         Qwen3-32B  77.64      fine-tuned
8     VNPTAI.IO-Medium-R1    VNPT AI             -          77.43      fine-tuned
9     MISA-Llama3-v1.1       MISA JSC            Llama-3    76.87      fine-tuned
10    BnK-AI-Medium-v2       BnK Solution        -          76.66      fine-tuned
── From-scratch ── 5 models
Rank  Model                     Organization    Base          Avg score  Track
-     QwQ-32B                   Alibaba Cloud   from scratch  76.13      from-scratch
-     Qwen2.5-72B-Instruct-AWQ  Alibaba Cloud   from scratch  69.17      from-scratch
-     Llama-3-70B               Meta            from scratch  66.44      from-scratch
-     KiLM-13b-v24.7.1          Kiki AI / Zalo  from scratch  66.07      from-scratch
-     GPT-4                     OpenAI          from scratch  65.53      from-scratch

Rank applies within a single track only. Fine-tuned models tend to score higher because they're tuned to VMLU's format. To compare base capability, look at the from-scratch table.
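The within-track ranking rule can be sketched in a few lines of Python. This is only an illustration, not the VMLU site's own logic: the `Row` type and `rank_within_track` function are hypothetical names, the sample rows are a subset of the snapshot above, and ties are not handled.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Row:
    model: str
    track: str        # "fine-tuned" or "from-scratch"
    avg_score: float


def rank_within_track(rows):
    """Return {track: [(rank, model, score), ...]}, ranking each track separately."""
    ranked = {}
    # Sort by track first so groupby sees each track as one contiguous run,
    # then by descending score inside the track.
    ordered = sorted(rows, key=lambda r: (r.track, -r.avg_score))
    for track, group in groupby(ordered, key=lambda r: r.track):
        ranked[track] = [
            (rank, r.model, r.avg_score) for rank, r in enumerate(group, start=1)
        ]
    return ranked


rows = [
    Row("MISA-AI-1.0", "fine-tuned", 81.26),
    Row("QwQ-32B", "from-scratch", 76.13),
    Row("axis-sovereign", "fine-tuned", 85.75),
    Row("GPT-4", "from-scratch", 65.53),
]
ranks = rank_within_track(rows)
```

A rank of 1 in one track says nothing about position in the other, which is why the result is keyed by track rather than returned as one merged list.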

§ 03 · other benchmarks

Beyond VMLU.

VMLU leans toward academic multiple-choice questions. The Vietnamese community maintains many other benchmarks for different task shapes. Here is the short list; pick whichever matches your workload.

  • VLSP — Vietnamese Language and Speech Processing

    VLSP Association

    Annual workshop with multiple tracks: LLM, ASR, MT, semantic parsing, legal QA.

  • ViLLM-Eval

    Academic (arXiv 2404.11086)

    Comprehensive eval suite: general knowledge, reading comprehension, reasoning, conversation.

  • VLegal-Bench

    Academic (arXiv 2512.14554)

    Vietnamese legal reasoning: article prediction, summarization, citation.

  • VLSP 2025 LegalSLM

    VLSP Association

    Small language models specialized for Vietnamese legal tasks.

  • VLSP 2025 MLQA-TSR

    VLSP Association

    Vietnamese multimodal legal QA on traffic-sign regulation.

§ 04 · vn-bench v1 · contributed by nrl

Applied tasks — in development.

VMLU measures academic knowledge. VN-Bench v1 measures real work. Below are the tasks we're curating with the community. Submissions open after v1 release.

extraction · in development

Contract extraction

Given a contract PDF, extract these fields: contract number, signing date, parties, total value, penalty clauses. Scored by field-level F1.
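The v1 scoring rule is not published yet, so the following is only one plausible reading of field-level F1: micro-averaged F1 over (field, value) pairs, where a field counts as correct only if its value matches exactly. The function name `field_f1`, the field keys, and the sample values are all hypothetical.

```python
def field_f1(predicted: dict, gold: dict) -> float:
    """Micro F1 over (field, value) pairs; values must be hashable (e.g. strings)."""
    pred_pairs = {(k, v) for k, v in predicted.items() if v is not None}
    gold_pairs = {(k, v) for k, v in gold.items() if v is not None}
    true_pos = len(pred_pairs & gold_pairs)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_pairs)
    recall = true_pos / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


# Made-up sample values, for illustration only.
gold = {
    "contract_number": "01/2026/HĐKT",
    "signing_date": "2026-01-15",
    "total_value": "500000000",
}
predicted = {
    "contract_number": "01/2026/HĐKT",
    "signing_date": "2026-01-16",  # wrong value: hurts both precision and recall
    "total_value": "500000000",
}
score = field_f1(predicted, gold)
```

Multi-valued fields such as parties or penalty clauses would need their own matching rule (for example, set overlap per field), which this sketch leaves out.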

parsing · in development

Official-document parsing

Given an official document, extract the document number, issue date, issuing body, and key content. Scored by exact match.
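Exact match on Vietnamese text is sensitive to Unicode form: the same visible string can be stored with precomposed characters or with base letters plus combining marks. The sketch below assumes the scorer normalizes to NFC and collapses whitespace before comparing. Whether VN-Bench v1 will do this is not stated, and `normalize` and `exact_match` are hypothetical names.

```python
import unicodedata


def normalize(text: str) -> str:
    """NFC-normalize and collapse runs of whitespace to single spaces."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def exact_match(predicted: str, gold: str) -> bool:
    """True when the two strings are identical after normalization."""
    return normalize(predicted) == normalize(gold)


composed = unicodedata.normalize("NFC", "Quyết định")    # precomposed characters
decomposed = unicodedata.normalize("NFD", "Quyết định")  # base letters + combining marks

same_text = exact_match(composed, decomposed)        # differs only in Unicode form
missing_marks = exact_match("Quyet dinh", composed)  # diacritics dropped
```

Without the normalization step, `composed == decomposed` is false even though both render identically, so a correct extraction could be scored as a miss.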

ocr · in development

Scan OCR → JSON

Vietnamese-language scanned document → structured JSON. Scored on character accuracy + field accuracy.
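Character accuracy for OCR is commonly reported as one minus the character error rate (CER), where CER is the edit distance between the OCR output and the reference divided by the reference length. The sketch below assumes that definition; the page does not say which formula v1 will use, and `edit_distance` and `char_accuracy` are hypothetical names.

```python
import unicodedata


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, keeping only the previous and current rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(predicted: str, reference: str) -> float:
    """1 - CER after NFC normalization, clamped at 0."""
    pred = unicodedata.normalize("NFC", predicted)
    ref = unicodedata.normalize("NFC", reference)
    if not ref:
        raise ValueError("reference must be non-empty")
    return max(0.0, 1.0 - edit_distance(pred, ref) / len(ref))


# One dropped diacritic ("à" read as "a") in a six-character reference.
accuracy = char_accuracy("Ha Nội", "Hà Nội")
```

Normalizing to NFC first means each accented Vietnamese letter counts as one character, so a single wrong tone mark costs one edit rather than several.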

tone · in development

Tone-mark preservation

Generation task in Vietnamese — measures diacritic accuracy across long passages.
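One way to make diacritic accuracy concrete, assuming a reference text is available and the model output aligns with it character by character: count the reference characters that carry a diacritic and measure how many the output reproduces exactly. This is a simplified sketch rather than the v1 metric. Free-form generation would need an alignment step first, and `strip_marks` and `diacritic_accuracy` are hypothetical names.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Remove combining marks (tones, circumflex, horn, breve); map đ/Đ to d/D."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.replace("đ", "d").replace("Đ", "D")


def diacritic_accuracy(predicted: str, reference: str) -> float:
    """Share of diacritic-bearing reference characters reproduced exactly."""
    pred = unicodedata.normalize("NFC", predicted)
    ref = unicodedata.normalize("NFC", reference)
    if len(pred) != len(ref):
        raise ValueError("texts must align character by character")
    # Keep only positions where the reference character carries a diacritic.
    marked = [(p, r) for p, r in zip(pred, ref) if strip_marks(r) != r]
    if not marked:
        return 1.0
    return sum(p == r for p, r in marked) / len(marked)


# "ế" lost its tone mark (became "ê"); "ệ" is intact: 1 of 2 marked characters correct.
accuracy = diacritic_accuracy("Tiêng Việt", "Tiếng Việt")
```

Scoring only the marked positions keeps unaccented letters and spaces from inflating the number on long passages.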

code-switch · in development

EN/VN code-switching

Natural mixed-language dialogue. Tests whether the model understands the mixed input and responds in context.

legal · partner

Legal QA

Borrowed from VLegal-Bench. We link to it rather than duplicate it, so the community shares one source.

§ 05 · submit

Building a Vietnamese language model?

When VN-Bench v1 launches, we'll open submissions. In the meantime, register for updates, propose tasks, or contribute eval data.

§ 06 · citations

Full source list.

This page does not run evaluations itself. All numbers come from the public sources below. Authors of the underlying benchmarks retain full credit for the model scores. If you are an author and would like a correction or removal, please contact [email protected].

  [1] VMLU: A Benchmark for Vietnamese Multitask Language Understanding. Zalo AI Research · Japan Advanced Institute of Science and Technology (JAIST). Public leaderboard · continuously updated · https://vmlu.ai/leaderboard
  [2] VMLU Benchmarks: A comprehensive benchmark toolkit for Vietnamese LLMs. Zalo AI / JAIST authors. ACL 2025 · https://aclanthology.org/2025.acl-long.563/
  [3] VLSP: Vietnamese Language and Speech Processing. VLSP Association. Annual workshop · https://vlsp.org.vn
  [4] ViLLM-Eval: A Comprehensive Evaluation Suite for Vietnamese Large Language Models. arXiv 2404.11086 · https://arxiv.org/abs/2404.11086
  [5] VLegal-Bench: Cognitively Grounded Benchmark for Vietnamese Legal Reasoning. arXiv 2512.14554 · https://arxiv.org/abs/2512.14554
  [6] VLSP 2025 Challenge on Vietnamese Legal Small Language Models (LegalSLM). VLSP Association. VLSP 2025 · https://vlsp.org.vn/vlsp2025/eval/legalSLM
  [7] VLSP 2025 MLQA-TSR: Vietnamese Multimodal Legal Question Answering on Traffic Sign Regulation. VLSP 2025 Participants. arXiv 2510.20381 · https://arxiv.org/abs/2510.20381
  [8] VinaLLaMA: LLaMA-based Vietnamese Foundation Model. Vilm. arXiv 2312.11011 · https://arxiv.org/abs/2312.11011
  [9] PhoGPT: Generative Pre-training for Vietnamese. VinAI Research. arXiv 2311.02945 · https://arxiv.org/abs/2311.02945