MLP Splatting: Object-Centric Neural Fields

Imperial College London
* Denotes equal contribution

ECCV 2026

TL;DR: Primitives composed of local light-fields, each centered on and weighted by an anisotropic Gaussian, learn to decompose the scene at the object or part level.

MLP-Splatting efficiently reconstructs even the most complex objects with just a few hundred primitives: matching Feature-3DGS in scene quality (+0.6 dB PSNR) while using 1/63 the primitives, 1/6 the memory, and rendering 2× faster.

[Teaser gallery: six reconstructed examples, labeled 51, 483, 97, 44, 46, and 143 MLPs.]

MLP-Splatting offers the compactness of a neural scene representation while keeping each primitive’s influence independent, which makes downstream tasks such as scene editing straightforward. For example, adding semantic guidance on top of the proposed method makes language-guided editing possible (see examples).

Abstract

3D representations are fundamental to scene rendering, understanding, and interaction. Recent approaches, such as 3D Gaussian Splatting and Neural Radiance Fields, achieve impressive photorealistic novel-view synthesis, but lack the ability to easily decompose scene elements into a few primitives, requiring additional segmentation or grouping for object-level manipulation. We present MLP-Splatting, a method that enables scene decomposition via a few expressive light-field primitives while providing photorealistic novel-view synthesis.

MLP-Splatting models each primitive as an independent compact MLP with localized spatial support that predicts radiance and opacity. In contrast to low-level Gaussian primitives or a single global radiance field, our neural primitives provide greater expressive capacity while remaining spatially localized. Rendering is performed through efficient sparse volumetric compositing over ray–primitive interactions.
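The paragraph above can be made concrete with a small sketch. It is not the authors' implementation: it assumes the spatial support is an unnormalised anisotropic Gaussian weight, that a primitive's opacity at a point is its predicted opacity scaled by that weight, and that contributions along a ray are merged by standard front-to-back alpha compositing. The names `gaussian_weight`, `primitive_alpha`, and `composite` are hypothetical, and the per-primitive MLP is replaced by fixed values.

```python
import math


def gaussian_weight(x, mean, inv_cov):
    """Unnormalised anisotropic Gaussian weight exp(-0.5 * d^T inv_cov d)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    # Quadratic form d^T inv_cov d for a 3x3 inverse covariance.
    q = sum(d[i] * inv_cov[i][j] * d[j] for i in range(3) for j in range(3))
    return math.exp(-0.5 * q)


def primitive_alpha(x, mean, inv_cov, opacity):
    """Assumed form: predicted opacity scaled by the Gaussian support."""
    return opacity * gaussian_weight(x, mean, inv_cov)


def composite(samples):
    """Front-to-back alpha compositing.

    `samples` is a list of (rgb, alpha) pairs sorted near-to-far along a ray.
    Returns the accumulated colour and the remaining transmittance.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for rgb, alpha in samples:
        w = transmittance * alpha
        color = [c + w * v for c, v in zip(color, rgb)]
        transmittance *= 1.0 - alpha
    return color, transmittance


# Two toy contributions: a red one in front of a green one.
pixel, remaining = composite([([1.0, 0.0, 0.0], 0.5), ([0.0, 1.0, 0.0], 0.5)])
```

In this sketch only primitives whose Gaussian weight is non-negligible along a ray would need to be evaluated, which is one plausible reading of "sparse" compositing.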

Our primitives are trained with RGB supervision alone and come to represent local scene regions that often correspond to objects or object parts. This enables interactive object-level editing without segmentation masks, simply by selecting a handful of primitives. Augmented with optional semantic feature distillation, our method also enables open-vocabulary scene interaction and open-set instant segmentation. Compared to state-of-the-art semantic 3DGS methods, our experiments show substantially lower memory usage ($1/15\times$) and faster rendering ($3\times$).
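One way to picture open-vocabulary selection is as a similarity search over primitives. The sketch below assumes, purely for illustration, that each primitive carries a single distilled feature vector and that a text query has already been embedded into the same space; the function names, the toy 2-D features, and the threshold are hypothetical and not taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0.0 and nb > 0.0 else 0.0


def select_primitives(features, query, threshold=0.8):
    """Return indices of primitives whose feature is close to the query embedding."""
    return [i for i, f in enumerate(features) if cosine(f, query) >= threshold]


# Toy features for three primitives and a query aligned with the first axis.
features = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
selected = select_primitives(features, [1.0, 0.0])
```

Because each primitive's influence is independent, the returned indices could then be deleted, recolored, or moved as a group.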

Interactive Demo

Below is an interactive demo of the easy object selection, editing, and language-guided editing enabled by our object-level primitives.

Easy object selection.

Per-object SE(3) and color edits.

Language-guided segmentation and deletion.
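A per-object SE(3) edit of the kind shown in the demo can be sketched as a rigid transform applied to each selected primitive's Gaussian: the mean maps to R·μ + t and the covariance to R·Σ·Rᵀ. This is the standard transformation rule for a Gaussian; whether the actual method also re-expresses MLP inputs in a local frame is not stated here, so the sketch covers only the Gaussian part. All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    """Transpose of a 3x3 matrix."""
    return [[a[j][i] for j in range(3)] for i in range(3)]


def rot_z(theta):
    """Rotation by `theta` radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transform_gaussian(mean, cov, R, t):
    """Apply the rigid transform x -> R x + t to a Gaussian (mean, covariance)."""
    new_mean = [sum(R[i][k] * mean[k] for k in range(3)) + t[i] for i in range(3)]
    new_cov = matmul(matmul(R, cov), transpose(R))
    return new_mean, new_cov


# Rotate a primitive elongated along x by 90 degrees about z, then lift it by 1.
mean = [1.0, 0.0, 0.0]
cov = [[4.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
new_mean, new_cov = transform_gaussian(mean, cov, rot_z(math.pi / 2), [0.0, 0.0, 1.0])
```

After the edit the primitive sits near (0, 1, 1) and its long axis points along y, which is the expected behaviour for a rigidly moved object part.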

Results

**Table 1.** Novel-view synthesis evaluation on Replica [Straub et al. 2019] and ScanNet [Dai et al. 2017] datasets.

| Dataset | Method       | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
|---------|--------------|--------|--------|---------|
| Replica | Feature-3DGS | 36.18  | 0.964  | 0.079   |
| Replica | Ours         | 36.25  | 0.971  | 0.090   |
| ScanNet | Feature-3DGS | 23.32  | 0.817  | 0.362   |
| ScanNet | Ours         | 25.35  | 0.830  | 0.403   |
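For readers less familiar with the PSNR column, the usual definition and its relation to mean squared error can be written as follows. Here $\mathrm{MAX}$ is the peak pixel value and $\Delta$ is the PSNR difference between two methods on the same image; these symbols are introduced for illustration and do not appear in the source.

$$
\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)\ \text{dB},
\qquad
\frac{\mathrm{MSE}_{\text{ours}}}{\mathrm{MSE}_{\text{baseline}}} = 10^{-\Delta/10},
\quad
\Delta = \mathrm{PSNR}_{\text{ours}} - \mathrm{PSNR}_{\text{baseline}}.
$$

As a rough reading, the ScanNet gap of $25.35 - 23.32 = 2.03$ dB would correspond to about $10^{-0.203} \approx 0.63\times$ the baseline MSE if it held on a single image. Since reported PSNR is typically averaged over many views, this is an indication only, not an exact statement about average MSE.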

**Figure 4.** 3D semantic segmentation with LSeg-guided embeddings rendered on novel views. The four columns show Ground Truth, Rendered Image, Rendered Feature, and Segmentation, comparing Feature-3DGS and Ours against ground truth. The top two rows are from the Replica dataset [Straub et al. 2019], and the bottom two rows are from the ScanNet dataset [Dai et al. 2017].

BibTeX

@InProceedings{kim2026mlpsplatting,
  title     = {MLP Splatting: Object-Centric Neural Fields},
  author    = {Kim, Shinjeong and Cheng, Yuzhou and Kong, Xin and Kelly, Paul H. J. and Davison, Andrew J.},
  booktitle = {The European Conference on Computer Vision (ECCV)},
  year      = {2026}
}
