SeeleAI Technical Report

EVA01

Unified Native 3D Understanding and Generation via Mixture-of-Transformers

EVA01 treats the mesh as a native language for multimodal models: understand the object, generate its geometry, then keep editing across a long context without losing identity.

Paper · Data (coming soon) · Code (coming soon)
Mode 01: Understand Mesh
Mode 02: Generate Shape
Mode 03: Edit in Context
EVA01 multi-turn 3D understanding, generation, and editing teaser
Native 3D interaction sequence

Capability Loadout

Mesh-native by design.

EVA01 extends the modality boundary of MLLMs so 3D is not an external attachment. Geometry becomes part of the sequence, routed through experts that share global context.

Slot 01

3D Understanding

Answer questions over mesh inputs while preserving the semantic priors of a multimodal backbone.

Slot 02

Text-to-3D Generation

Generate native 3D structure from language without treating geometry as a detached post-process.

Slot 03

Multi-turn Editing

Apply localized structural edits across turns while keeping object identity inside the same interaction history.

Result Stage

Generation that keeps playing.

Instead of a one-shot reconstruction, EVA01 supports a continuous 3D workflow: generate an asset, ask about it, then edit it through the next instruction.

Qualitative text-to-3D and image-to-3D generation comparisons
Qualitative generation. Text- and image-conditioned examples show how EVA01 maps prompts and visual cues into plausible 3D assets.
Versatile multi-turn editing gallery

Editing as a trajectory

Every edit is conditioned on the full interaction context, enabling structural changes without explicit masks.
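As a minimal sketch of this idea, a multi-turn edit context can be viewed as one interleaved sequence that later turns attend over. All token names below are illustrative assumptions, not EVA01's actual vocabulary or format:

```python
# Hypothetical sketch of an interleaved multi-turn context; every token
# name here is an illustrative assumption, not EVA01's real vocabulary.
history = []

def add_turn(instruction_tokens, mesh_tokens):
    # Each turn appends its instruction AND the resulting geometry tokens,
    # so later edits are conditioned on the full interaction history.
    history.extend(instruction_tokens)
    history.extend(mesh_tokens)

add_turn(["<user>", "make", "the", "chair", "taller"], ["<mesh_0>", "<mesh_1>"])
add_turn(["<user>", "add", "armrests"], ["<mesh_2>", "<mesh_3>"])

# The next edit attends over all of `history`; no explicit region mask is
# supplied, and localization emerges from attention over shared context.
print(len(history))  # 12
```

Because the geometry of every previous turn stays in the sequence, identity is carried implicitly by context rather than by an external mask or reference asset.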

System Map

Two experts, one context.

The Understanding Expert and Generation Expert are coupled through shared global self-attention, with hard modality routing separating semantic reasoning from geometric synthesis.

EVA01 Mixture-of-Transformers method overview
Mixture-of-Transformers. EVA01 aligns the MLLM semantic latent space with the 3D geometric manifold through mirrored experts and shared attention.
EVA01 data pipeline for 3D understanding, generation, and editing
Data pipeline. The training path mixes understanding, generation, and interleaved editing examples into a unified sequence curriculum.
Load mesh: Bring 3D structure into the multimodal sequence.
Route experts: Separate semantic and geometric computation.
Share context: Use global attention to keep the interaction coherent.
Emit edit: Generate the next geometry state in context.
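The routing described above can be sketched as a toy layer: one shared self-attention couples the whole interleaved sequence, then each token is dispatched to the feed-forward expert of its own modality. This is a minimal NumPy illustration under assumed names and shapes (`mot_layer`, `shared_attention`, the hidden size), not the EVA01 implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy hidden size (an assumption for illustration)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def shared_attention(x):
    # One global self-attention over the WHOLE interleaved sequence:
    # text and mesh tokens attend to each other, sharing context.
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

def make_ffn():
    # A tiny per-modality feed-forward "expert" with random weights.
    W = rng.normal(size=(D, D)) * 0.1
    return lambda h: np.maximum(h @ W, 0.0)

def mot_layer(x, modality, experts):
    h = x + shared_attention(x)          # shared global attention
    out = np.empty_like(h)
    for name, ffn in experts.items():    # hard modality routing:
        idx = modality == name           # each token visits only the
        out[idx] = h[idx] + ffn(h[idx])  # expert of its own modality
    return out

experts = {"text": make_ffn(), "mesh": make_ffn()}
tokens = rng.normal(size=(6, D))  # interleaved text/mesh sequence
modality = np.array(["text", "text", "mesh", "mesh", "text", "mesh"])
y = mot_layer(tokens, modality, experts)
print(y.shape)  # (6, 8)
```

The hard routing means semantic and geometric parameters never mix inside the feed-forward path, while the shared attention step is what keeps understanding, generation, and editing in one coherent context.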

Gallery Cartridge

From prompt to playable asset.

A compact look at EVA01's visual language: structured meshes, local edits, auxiliary views, and long-horizon identity preservation.

Authors

SeeleAI crew.

Team Leaders: Zhengdong Guo; Shimu Wang.
Algorithm Leader: Zongyuan Yang.
Core Contributors: Wanli Ma; Zongyuan Yang; Mingjing Yi.
Contributors: Chenzhuo Fan; Bocheng Li; Baolin Liu; Yuke Lou; Yingde Song; Qianchi Yang.

Names are alphabetical by last name within each role.

Citation

Reference checkpoint.

Please cite EVA01 if you find the work useful.

@article{eva01seeleai2026,
  title   = {EVA01: Unified Native 3D Understanding and Generation via Mixture-of-Transformers},
  author  = {{SeeleAI Team}},
  journal = {arXiv preprint, forthcoming},
  year    = {2026}
}