§ nôm · 2026 · Apache 2.0 · v0 coming soon
Nôm
v0 in development

Built for Vietnamese.

Open-source Python toolkit for building Vietnamese AI applications.

Every team building Vietnamese AI re-implements the same OCR, text utilities, and prompts. Nôm packages them as one library: one pip install, and you focus on the product.

§ 01 · modules

Three modules. One pip install.

v0 ships with the headline module: document extraction. Other modules follow as community demand becomes clear. Everything runs against your own LLM (OpenAI, Anthropic, or local Ollama).

nom.doc · Apache 2.0
document extraction · v0 · coming soon

PDF/scan → structured JSON. Vietnamese OCR + diacritic fixes + layout parsing + schema-driven extraction. Contracts, official docs, ID cards, receipts.

nom.text · Apache 2.0
Vietnamese text utilities · v0 · coming soon

NFC normalization, diacritic correction, compound-word splitting, EN/VN code-switching detection. Small, but every pipeline needs it.
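
To make the NFC point concrete: the same Vietnamese syllable can arrive as one precomposed code point or as a base letter plus combining marks, and the two forms do not compare equal. A minimal sketch using only Python's standard `unicodedata` module; `normalize_vi` is a hypothetical helper name, not part of the planned `nom.text` API.

```python
import unicodedata


def normalize_vi(text: str) -> str:
    """Return text in Unicode NFC (precomposed form where one exists)."""
    return unicodedata.normalize("NFC", text)


# "Việt" with ệ typed as e + combining circumflex + combining dot below.
decomposed = "Vie\u0302\u0323t"
# The same word with ệ as the single code point U+1EC7.
precomposed = "Vi\u1ec7t"

print(decomposed == precomposed)                # False: different code points
print(normalize_vi(decomposed) == precomposed)  # True after normalization
print(len(decomposed), len(normalize_vi(decomposed)))  # 6 4
```

Normalizing once at the pipeline boundary means later steps such as search, deduplication, and dictionary lookups can rely on plain string equality.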

nom.prompts · Apache 2.0
battle-tested prompts · v1 · roadmap

Prompt library for contracts, official documents, applications, business email. Tested across Qwen3, Llama-3, GPT-4o, Claude. Versioned.

§ 02 · why nôm

A thousand years of Vietnamese script.

Chữ Nôm was the script Vietnamese people used to write their own language for over a thousand years — until the Latin-based Quốc Ngữ replaced it in the 20th century. Truyện Kiều by Nguyễn Du, the foundational work of Vietnamese literature, was written in Nôm. So were Nguyễn Trãi's poems and Hồ Xuân Hương's verse. The script carries a literary tradition older than most living languages have on paper.

Naming a 2026 language model after a 13th-century script isn't nostalgia. It's a thesis: Vietnam has always written its own language with its own instruments. Nôm-LLM is the next instrument in that tradition — open weights, runs locally, doesn't depend on a foreign cloud.

Released under Apache 2.0. Open weights. Public training recipe. Runs on the hardware you already own.

“Nôm is our script, for our language, by our hand.” — the spirit of reviving a script, applied to a model.

§ 03 · measurement

Measured in numbers, not words.

Nôm doesn't replace your model — it teaches the model Vietnamese context through tuned prompts, schemas, and OCR pipelines. To prove that, we're building VN-Bench v1 with applied tasks: contract extraction, official-doc parsing, diacritic-aware OCR, EN/VN code-switching.

no numbers · no estimates · no placeholders

No numbers here yet — v0 hasn't shipped, and we don't publish numbers before we measure them. The methodology and task list are open on VN-Bench. Follow along there.

§ 04 · what people use nôm for

What Nôm does well.

contracts

Internal contract Q&A

Drop 200 contract PDFs onto a company server. Ask: "How many contracts have penalty clauses above 10%?" Get answers with contract numbers and pages. Nothing leaves your network.
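
Once contracts have been extracted into structured records, a question like the one above reduces to a filter over those records. The sketch below assumes a hypothetical `PenaltyClause` record with made-up sample values; neither the record shape nor the field names come from Nôm itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PenaltyClause:
    """One penalty clause found in a contract (hypothetical record shape)."""
    contract_number: str
    page: int
    rate_pct: float


def clauses_above(clauses: list[PenaltyClause], threshold_pct: float) -> list[PenaltyClause]:
    """Return clauses whose rate is strictly above the threshold, ordered by contract and page."""
    hits = [c for c in clauses if c.rate_pct > threshold_pct]
    return sorted(hits, key=lambda c: (c.contract_number, c.page))


# Illustrative sample data only, not real extraction output.
sample = [
    PenaltyClause("HĐ-2025-002", page=7, rate_pct=12.0),
    PenaltyClause("HĐ-2025-001", page=4, rate_pct=8.0),
    PenaltyClause("HĐ-2025-003", page=9, rate_pct=10.0),
]

hits = clauses_above(sample, threshold_pct=10.0)
contracts = {c.contract_number for c in hits}
print(len(contracts), [(c.contract_number, c.page) for c in hits])
```

Keeping the contract number and page on every record is what lets an answer cite where each clause was found.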

official docs

Official document summarization & extraction

Document number, issue date, issuing body, key content. Vietnamese OCR that handles diacritics, built to stay accurate even on faded faxes and old scans.

assistant

Internal assistant on a company server

Deploy on a single GPU box. Plug into internal docs, calendar, ticketing. Security level: 'never leaves the LAN.'

rag

RAG for Vietnamese documents

Tokenizer that understands tone marks, compound words, EN/VN code-switching. High-quality embeddings for Vietnamese — not English run through translation.
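
As a rough illustration of what diacritic-aware token handling involves, the sketch below flags tokens that carry Vietnamese-style marks. It is a coarse heuristic written for this page, not Nôm's tokenizer: unmarked Vietnamese words such as "anh" look like plain ASCII, and accented words from other languages would also be flagged. The function names are hypothetical.

```python
import unicodedata


def has_diacritics(token: str) -> bool:
    """True if the token contains đ/Đ or any letter that decomposes to base + combining mark."""
    if "đ" in token.lower():
        return True  # đ has no canonical decomposition, so check it directly
    decomposed = unicodedata.normalize("NFD", token)
    return any(unicodedata.combining(ch) for ch in decomposed)


def tag_tokens(text: str) -> list[tuple[str, str]]:
    """Label each whitespace-separated token as 'marked' or 'unmarked'."""
    return [(tok, "marked" if has_diacritics(tok) else "unmarked") for tok in text.split()]


print(tag_tokens("deploy model lên server đi"))
```

A real code-switching detector would need a lexicon or a trained classifier on top of this; the marks alone only give a cheap first signal.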

§ 05 · api preview

Five lines to extract a contract.

This is the planned API — not yet on PyPI. Ships with v0. Join the waitlist for the first build.

# v0 · coming soon
from nom.doc import extract

result = extract("contract.pdf", schema={
    "contract_number": str,
    "signed_date": "date",
    "party_a": "party",
    "party_b": "party",
    "total_value_vnd": "amount_vnd",
})
# {'contract_number': 'HĐ-2025-002', 'signed_date': '2025-03-14',
#  'party_a': {...}, 'party_b': {...}, 'total_value_vnd': 1_500_000_000}
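
The string tags in the schema above ("date", "amount_vnd") imply a coercion step from raw extracted text to typed values. How Nôm will do this is not documented here. The following is a minimal sketch under assumptions: the helper names, the `dd/mm/yyyy` input format, and the whole-đồng amount format are all hypothetical.

```python
import re
from datetime import datetime


def parse_date(raw: str) -> str:
    """Convert a dd/mm/yyyy string (assumed input format) to ISO yyyy-mm-dd."""
    return datetime.strptime(raw.strip(), "%d/%m/%Y").date().isoformat()


def parse_amount_vnd(raw: str) -> int:
    """Convert an amount like '1.500.000.000 đ' to an int, assuming whole-đồng values."""
    digits = re.sub(r"\D", "", raw)  # drop separators and currency marks
    if not digits:
        raise ValueError(f"no digits in amount: {raw!r}")
    return int(digits)


# Hypothetical tag-to-parser table; unknown tags are passed through unchanged.
COERCERS = {"date": parse_date, "amount_vnd": parse_amount_vnd}


def coerce(raw_fields: dict, schema: dict) -> dict:
    """Apply the schema's Python types or string tags to raw extracted values."""
    out = {}
    for name, kind in schema.items():
        value = raw_fields[name]
        if isinstance(kind, type):
            out[name] = kind(value)
        elif kind in COERCERS:
            out[name] = COERCERS[kind](value)
        else:
            out[name] = value
    return out


raw = {
    "contract_number": "HĐ-2025-002",
    "signed_date": "14/03/2025",
    "total_value_vnd": "1.500.000.000 đ",
}
schema = {"contract_number": str, "signed_date": "date", "total_value_vnd": "amount_vnd"}
print(coerce(raw, schema))
```

Keeping the parsers in a lookup table is one way to let a schema mix plain Python types with domain-specific tags.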

§ 06 · waitlist & community

Get v0 first. Help shape it.

Nôm is released under Apache 2.0. v0 is expected in summer 2026. Leave your email to get the first build and an invite to the early-feedback channels.

Track progress, suggest features, submit tasks for VN-Bench:

# After v0 launch:
pip install nom-vn

# To be notified at launch:
curl https://nrl.ai/api/nom/waitlist \
  -d "<your-email>"

# Or follow on GitHub:
# github.com/nrl-ai/nom · star + watch