Scaling Hardware Engineering with Vision-Guided AI: The NemoSiliconMind Workflow

Between a chip spec and working silicon, two human steps eat the schedule. Engineers read a 200-page mix of block diagrams, timing waveforms, signal tables, and text, then hand-write RTL against their personal reading of it. Each step is slow on its own. Together they compound: divergent readings produce incompatible RTL that only collides at integration, when it’s most expensive to fix.
NemoSiliconMind is an open-source, training-free framework that takes both steps off the critical path. It pairs NVIDIA’s Nemotron-3 Nano Omni, which reads the multimodal spec, with SiliconMind-V1, a focused coder that writes the Verilog — two off-the-shelf open-weight models, a structured handoff between them, and a difficulty-aware writing loop.
The Document Gap in IC Design
Moving from spec to silicon by hand has two coupled bottlenecks. First, every engineer spends hours decoding the same heterogeneous document (diagrams, signal tables, text) and emerges with a slightly different reading. Second, each engineer then hand-codes Verilog against that personal reading, where wrong widths and off-by-one strobes get baked in. A shared interpretation produced once, plus an automated path to RTL, collapses both costs. That’s the lever NemoSiliconMind pulls.
Figure 1 — The Document Gap.
NemoSiliconMind: One Pipeline for Both Bottlenecks
Reading — Nemotron-3 Nano Omni (the planner). The full spec, including diagrams and tables, goes to Omni, which doesn’t generate code. It produces a code-ready summary of the design intent — connectivity, signal widths, and the edge cases the raw text left implicit. When diagram and spec disagree, Omni surfaces the conflict instead of silently picking one — a trust-but-verify behavior that’s rare and valuable in hardware design. Omni also emits a difficulty rating that routes the writing step.
Writing — SiliconMind-V1 (the coder). SiliconMind-V1 takes the structured representation and generates the Verilog. It never touches the raw image — only the planner’s output. With reading already done, the coder stays small and fast.
The contract between them is what makes the framework training-free. The two models are used off the shelf and are never jointly fine-tuned; the leverage comes from the handoff format itself.
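To make the handoff idea concrete, here is a minimal sketch of what such a contract could look like. The class names, field names, and JSON layout are assumptions for illustration only; the text above does not specify the actual format NemoSiliconMind uses.

```python
from dataclasses import dataclass, field
import json


@dataclass
class Signal:
    name: str
    direction: str  # "input" or "output"
    width: int      # bit width


@dataclass
class PlannerHandoff:
    """Hypothetical shape of the planner's code-ready summary."""
    module_name: str
    signals: list[Signal]
    connectivity: list[str] = field(default_factory=list)
    edge_cases: list[str] = field(default_factory=list)
    conflicts: list[str] = field(default_factory=list)  # diagram-vs-text disagreements
    difficulty: str = "easy"  # "easy" or "hard"; routes the writing step

    def to_prompt(self) -> str:
        """Serialize the summary as the only input the coder sees (no raw image)."""
        ports = [f"{s.direction} [{s.width - 1}:0] {s.name}" for s in self.signals]
        sections = {
            "module": self.module_name,
            "ports": ports,
            "connectivity": self.connectivity,
            "edge_cases": self.edge_cases,
            "unresolved_conflicts": self.conflicts,
        }
        return json.dumps(sections, indent=2)


# Illustrative values, not taken from any real spec.
handoff = PlannerHandoff(
    module_name="grant_arbiter",
    signals=[Signal("req", "input", 3), Signal("grant", "output", 3)],
    edge_cases=["no request asserted: grant is all zeros"],
    conflicts=["diagram shows active-low reset, text says active-high"],
)
prompt = handoff.to_prompt()
print(prompt)
```

Keeping conflicts as an explicit field, rather than resolving them inside the planner, mirrors the behavior described above: the disagreement is surfaced to whoever consumes the handoff instead of being silently decided.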
Figure 2 — End-to-end NemoSiliconMind pipeline.
Difficulty-Driven Task Routing
Nemotron-3 Nano Omni’s difficulty rating selects between two SiliconMind-V1 generation modes:
- Regular mode (easy modules). A standard combinational block, a basic register file, a 3-bit grant arbiter: SiliconMind-V1 generates the RTL in a single generation pass, with no self-test or self-debug cycle.
- Agentic mode (hard modules). When the planner rates a module as harder, SiliconMind-V1 runs through an internal three-step cycle. No external simulator, no tool calls in the loop:
  - Initial implementation. The model writes a first draft of the RTL.
  - Self-test. The model derives its own test scenarios and reasons through how the draft would behave on them. This step is entirely cognitive; no code is executed.
  - Self-debug. If the self-test surfaced inconsistencies, the model revises the draft.
Extra reasoning only helps when needed — see Figure 4 below.
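The routing logic can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: `generate` is a hypothetical stand-in for a call to the coder model, the prompts and the `CONSISTENT` marker are invented, and the three-step cycle, which the text describes as internal to SiliconMind-V1, is unrolled here into explicit calls purely to make the control flow visible.

```python
from typing import Callable

# Hypothetical stand-in for one call to the coder model: prompt in, text out.
Generate = Callable[[str], str]


def solve(spec: str, difficulty: str, generate: Generate) -> tuple[str, str]:
    """Return (strategy, rtl) for one module, routed by the planner's rating."""
    if difficulty == "easy":
        # Regular mode: one generation pass, no review.
        return "regular", generate(f"Write Verilog for:\n{spec}")

    # Agentic mode, step 1: initial implementation.
    draft = generate(f"Write Verilog for:\n{spec}")

    # Step 2: self-test. The model reasons about its own scenarios;
    # nothing is simulated or executed.
    review = generate(
        "Derive test scenarios for this spec and reason through how the draft "
        "behaves on them. Reply CONSISTENT if no issue is found.\n"
        f"Spec:\n{spec}\nDraft:\n{draft}"
    )
    if review.strip().startswith("CONSISTENT"):
        return "agentic", draft

    # Step 3: self-debug, only when the self-test surfaced inconsistencies.
    revised = generate(
        f"Revise the draft to fix these issues:\n{review}\nDraft:\n{draft}"
    )
    return "agentic", revised
```

With this shape, an easy module costs one model call and a hard module costs two or three, which is the mechanism behind spending extra reasoning only where the planner asks for it.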
Results
We evaluated on ChipGPTV. The upstream test harness ships with broken testbenches and missing reference files that report failure even when the design under test is correct, so we shipped a patch that repairs the harness without altering any design problem. We measure Pass@1 and tokens per problem.
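For readers unfamiliar with the two metrics, the sketch below shows how they could be computed from per-problem results. It assumes one generated sample per problem, in which case Pass@1 reduces to the fraction of problems whose sample passes its testbench. The `Outcome` record and the demo values are made up for illustration and are not the evaluation data.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    problem_id: str
    passed: bool  # did the single generated sample pass the testbench?
    tokens: int   # tokens generated for this problem


def pass_at_1(outcomes: list[Outcome]) -> float:
    """Fraction of problems solved by their one sample."""
    return sum(o.passed for o in outcomes) / len(outcomes)


def mean_tokens(outcomes: list[Outcome]) -> float:
    """Average tokens generated per problem."""
    return sum(o.tokens for o in outcomes) / len(outcomes)


# Made-up results for four hypothetical problems.
demo = [
    Outcome("p1", True, 1000),
    Outcome("p2", True, 2000),
    Outcome("p3", False, 3000),
    Outcome("p4", True, 2000),
]
print(pass_at_1(demo))    # 0.75
print(mean_tokens(demo))  # 2000.0
```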
Functional correctness
NemoSiliconMind — Nemotron-3 Nano Omni planning and SiliconMind-V1 doing the coding — reaches 77.3% Pass@1. The sharpest comparison is Qwen3-VL*: roughly 7× the parameter count of NemoSiliconMind’s combined planner-plus-coder, yet still 7.6 points behind — a structured handoff to a focused coder beats scaling a single direct-prompted model.
Figure 3 — NemoSiliconMind achieves 77.3% Pass@1 on ChipGPTV, edging out models up to 7× larger.
Token efficiency
A uniform-effort SiliconMind-V1 (no planner, no routing) burns 7.60k tokens per problem; NemoSiliconMind’s difficulty-aware routing brings the same problems to 1.84k — 4.1× fewer. Nemotron-3 Nano Omni’s structured representation replaces multi-turn clarification on the reading side; easy problems skip the self-test/self-debug cycle on the writing side.
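As a quick check of the arithmetic, the two token counts quoted above give both the ratio and the share of tokens saved. The variable names are illustrative; the numbers are the ones stated in the text.

```python
uniform_tokens = 7600  # 7.60k tokens per problem, uniform-effort SiliconMind-V1
routed_tokens = 1840   # 1.84k tokens per problem, NemoSiliconMind with routing

ratio = uniform_tokens / routed_tokens      # how many times fewer tokens
saved = 1 - routed_tokens / uniform_tokens  # fraction of tokens avoided

print(f"{ratio:.1f}x fewer tokens, {saved:.0%} saved")
# prints "4.1x fewer tokens, 76% saved"
```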
Figure 4 — Average tokens per problem on the shared easy-task subset.
Quickstart
With Nemotron-3 Nano Omni serving on 4 GPUs and SiliconMind-V1 on the remaining 4 of a single 8-GPU node (both via vLLM), drive the pipeline from Python:
```python
import asyncio

from nemosilicon import run_pipeline

result = asyncio.run(run_pipeline(
    user_text="Implement a 3-bit grant arbiter; see diagram.",
    image_path="/path/to/diagram.png",
))

print(result.difficulty.label)       # "easy" or "hard"
print(result.solve_trace.strategy)   # "regular" or "agentic"
print(result.final_code)             # generated Verilog
```
The router picks the strategy; you just hand it a spec and a diagram.
Getting Started & What’s Next
Everything is open, reproducible, and self-hostable — specs never leave your environment, which matters for design houses with strict data-sovereignty requirements.
- NemoSiliconMind framework — GitHub
- SiliconMind-V1 weights — Hugging Face
- SiliconMind-V1 project page — as-siliconmind.github.io
- Nemotron-3 Nano Omni (BF16) — Hugging Face
The release bundles the Python pipeline, vLLM serve scripts, a live-demo UI, and the ChipGPTV patch needed to reproduce the results above. Tested on H200s with CUDA 12.9.
Recognition. SiliconMind-V1 won the 2026 Best AI Awards (智慧創新大賞), will be presented at the 50th IEEE COMPSAC 2026, and runs live at COMPUTEX 2026 as part of Open Source Team Taiwan (Taipei Nangang Hall 2, map).
What’s next.
- Post-training Nemotron-3 Nano Omni for local-language OCR. Datasheets in our home market mix English signal names with traditional Chinese annotations — adapting the perception layer to those documents directly is the next accuracy win on our roadmap.
- Expanding past RTL into physical design. The same read-plan-write pattern has analogues downstream in floorplanning, placement, and DRC triage. Spec-to-RTL is just the first hop.
Building hardware-design tooling or evaluating multimodal planners? Clone the repo — training-free means you don’t have to take our word for any of it.