SeeleAI Technical Report

EVA01

Unified Native 3D Understanding and Generation via Mixture-of-Transformers

EVA01 treats the mesh as a native language for multimodal models: understand the object, generate its geometry, then keep editing across a long context without losing identity.

Paper · Data (coming soon) · Code (coming soon)
Mode 01: Understand Mesh
Mode 02: Generate Shape
Mode 03: Edit in Context
EVA01 multi-turn 3D understanding, generation, and editing teaser
Native 3D interaction sequence

Capability Loadout

Mesh-native by design.

EVA01 extends the modality boundary of MLLMs so 3D is not an external attachment. Geometry becomes part of the sequence, routed through experts that share global context.

Slot 01

3D Understanding

Answer questions over mesh inputs while preserving the semantic priors of a multimodal backbone.

Slot 02

Text-to-3D Generation

Generate native 3D structure from language without treating geometry as a detached post-process.

Slot 03

Multi-turn Editing

Apply localized structural edits across turns while keeping object identity inside the same interaction history.

Result Stage

Generation that keeps playing.

Instead of a one-shot reconstruction, EVA01 supports a continuous 3D workflow: generate an asset, ask about it, then edit it through the next instruction.

Qualitative text-to-3D and image-to-3D generation comparisons
Qualitative generation. Text- and image-conditioned examples show how EVA01 maps prompts and visual cues into plausible 3D assets.
Versatile multi-turn editing gallery

Editing as a trajectory

Every edit is conditioned on the full interaction context, enabling structural changes without explicit masks.
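As a minimal sketch of this idea, a multi-turn edit context can be viewed as one interleaved sequence that later turns attend over. All token names below are illustrative assumptions, not EVA01's actual vocabulary or format:

```python
# Hypothetical sketch of an interleaved multi-turn context; every token
# name here is an illustrative assumption, not EVA01's real vocabulary.
history = []

def add_turn(instruction_tokens, mesh_tokens):
    # Each turn appends its instruction AND the resulting geometry tokens,
    # so later edits are conditioned on the full interaction history.
    history.extend(instruction_tokens)
    history.extend(mesh_tokens)

add_turn(["<user>", "make", "the", "chair", "taller"], ["<mesh_0>", "<mesh_1>"])
add_turn(["<user>", "add", "armrests"], ["<mesh_2>", "<mesh_3>"])

# The next edit attends over all of `history`; no explicit region mask is
# supplied, and localization emerges from attention over shared context.
print(len(history))  # 12
```

Because the geometry of every previous turn stays in the sequence, identity is carried implicitly by context rather than by an external mask or reference asset.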

System Map

Two experts, one context.

The Understanding Expert and Generation Expert are coupled through shared global self-attention, with hard modality routing separating semantic reasoning from geometric synthesis.

EVA01 Mixture-of-Transformers method overview
Mixture-of-Transformers. EVA01 aligns the MLLM semantic latent space with the 3D geometric manifold through mirrored experts and shared attention.
EVA01 data pipeline for 3D understanding, generation, and editing
Data pipeline. The training path mixes understanding, generation, and interleaved editing examples into a unified sequence curriculum.
Load mesh: Bring 3D structure into the multimodal sequence.
Route experts: Separate semantic and geometric computation.
Share context: Use global attention to keep the interaction coherent.
Emit edit: Generate the next geometry state in context.
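The routing described above can be sketched as a toy layer: one shared self-attention couples the whole interleaved sequence, then each token is dispatched to the feed-forward expert of its own modality. This is a minimal NumPy illustration under assumed names and shapes (`mot_layer`, `shared_attention`, the hidden size), not the EVA01 implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy hidden size (an assumption for illustration)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def shared_attention(x):
    # One global self-attention over the WHOLE interleaved sequence:
    # text and mesh tokens attend to each other, sharing context.
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

def make_ffn():
    # A tiny per-modality feed-forward "expert" with random weights.
    W = rng.normal(size=(D, D)) * 0.1
    return lambda h: np.maximum(h @ W, 0.0)

def mot_layer(x, modality, experts):
    h = x + shared_attention(x)          # shared global attention
    out = np.empty_like(h)
    for name, ffn in experts.items():    # hard modality routing:
        idx = modality == name           # each token visits only the
        out[idx] = h[idx] + ffn(h[idx])  # expert of its own modality
    return out

experts = {"text": make_ffn(), "mesh": make_ffn()}
tokens = rng.normal(size=(6, D))  # interleaved text/mesh sequence
modality = np.array(["text", "text", "mesh", "mesh", "text", "mesh"])
y = mot_layer(tokens, modality, experts)
print(y.shape)  # (6, 8)
```

The hard routing means semantic and geometric parameters never mix inside the feed-forward path, while the shared attention step is what keeps understanding, generation, and editing in one coherent context.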

Gallery Cartridge

From prompt to playable asset.

A compact look at EVA01's visual language: structured meshes, local edits, auxiliary views, and long-horizon identity preservation.

Authors

SeeleAI crew.

Team Leaders: Zhengdong Guo; Shimu Wang.
Algorithm Leader: Zongyuan Yang.
Core Contributors: Wanli Ma; Zongyuan Yang; Mingjing Yi.
Contributors: Chenzhuo Fan; Bocheng Li; Baolin Liu; Yuke Lou; Yingde Song; Qianchi Yang.

Names are alphabetical by last name within each role.

Citation

Reference checkpoint.

Please cite EVA01 if you find the work useful.

@article{eva01seeleai2026,
  title   = {EVA01: Unified Native 3D Understanding and Generation via Mixture-of-Transformers},
  author  = {{SeeleAI Team}},
  journal = {arXiv preprint, forthcoming},
  year    = {2026}
}