Output list
Preprint
Advances and Trends in the 3D Reconstruction of the Shape and Motion of Animals
Posted to a preprint site 22/08/2025
ArXiv.org
Reconstructing the 3D geometry, pose, and motion of animals is a long-standing problem, which has a wide range of applications, from biology, livestock management, and animal conservation and welfare to content creation in digital entertainment and Virtual/Augmented Reality (VR/AR). Traditionally, 3D models of real animals are obtained using 3D scanners. These, however, are intrusive, often prohibitively expensive, and difficult to deploy in the natural environment of the animals. In recent years, we have seen a significant surge in deep learning-based techniques that enable the 3D reconstruction, in a non-intrusive manner, of the shape and motion of dynamic objects just from their RGB image and/or video observations. Several papers have explored their application and extension to various types of animals. This paper surveys the latest developments in this emerging and growing field of research. It categorizes and discusses the state-of-the-art methods based on their input modalities, the way the 3D geometry and motion of animals are represented, the type of reconstruction techniques they use, and the training mechanisms they adopt. It also analyzes the performance of some key methods, discusses their strengths and limitations, and identifies current challenges and directions for future research.
Preprint
Posted to a preprint site 2025
ArXiv.org
This paper introduces a novel computational framework for modeling and analyzing the spatiotemporal shape variability of tree-like 4D structures whose shapes deform and evolve over time. Tree-like 3D objects, such as botanical trees and plants, deform and grow at different rates. In this process, they bend and stretch their branches and change their branching structure, making their spatiotemporal registration challenging. We address this problem within a Riemannian framework that represents tree-like 3D objects as points in a tree-shape space endowed with a proper elastic metric that quantifies branch bending, stretching, and topological changes. With this setting, a 4D tree-like object becomes a trajectory in the tree-shape space. Thus, the problem of modeling and analyzing the spatiotemporal variability in tree-like 4D objects reduces to the analysis of trajectories within this tree-shape space. However, performing spatiotemporal registration and subsequently computing geodesics and statistics in the nonlinear tree-shape space is inherently challenging, as these tasks rely on complex nonlinear optimizations. Our core contribution is the mapping of the tree-like 3D objects to the space of the Extended Square Root Velocity Field, where the complex elastic metric is reduced to the L2 metric. By solving spatial registration in the ESRVF space, analyzing tree-like 4D objects can be reformulated as the problem of analyzing elastic trajectories in the ESRVF space. Based on this formulation, we develop a comprehensive framework for analyzing the spatiotemporal dynamics of tree-like objects, including registration under large deformations and topological differences, geodesic computation, statistical summarization through mean trajectories and modes of variation, and the synthesis of new, random tree-like 4D shapes.
Preprint
Dynamic Neural Surfaces for Elastic 4D Shape Representation and Analysis
Posted to a preprint site 2025
ArXiv.org
We propose a novel framework for the statistical analysis of genus-zero 4D surfaces, i.e., 3D surfaces that deform and evolve over time. This problem is particularly challenging due to the arbitrary parameterizations of these surfaces and their varying deformation speeds, necessitating effective spatiotemporal registration. Traditionally, 4D surfaces are discretized, in space and time, before computing their spatiotemporal registrations, geodesics, and statistics. However, this approach may result in suboptimal solutions and, as we demonstrate in this paper, is not necessary. In contrast, we treat 4D surfaces as continuous functions in both space and time. We introduce Dynamic Spherical Neural Surfaces (D-SNS), an efficient smooth and continuous spatiotemporal representation for genus-0 4D surfaces. We then demonstrate how to perform core 4D shape analysis tasks such as spatiotemporal registration, geodesics computation, and mean 4D shape estimation, directly on these continuous representations without upfront discretization and meshing. By integrating neural representations with classical Riemannian geometry and statistical shape analysis techniques, we provide the building blocks for enabling full functional shape analysis. We demonstrate the efficiency of the framework on 4D human and face datasets. The source code and additional results are available at https://4d-dsns.github.io/DSNS/.
Preprint
Posted to a preprint site 2025
ArXiv.org
Neural fields such as DeepSDF and Neural Radiance Fields have recently revolutionized novel-view synthesis and 3D reconstruction from RGB images and videos. However, achieving high-quality representation, reconstruction, and rendering requires deep neural networks, which are slow to train and evaluate. Although several acceleration techniques have been proposed, they often trade off speed for memory. Gaussian splatting-based methods, on the other hand, accelerate the rendering time but remain costly in terms of training speed and memory needed to store the parameters of a large number of Gaussians. In this paper, we introduce a novel neural representation that is fast, both at training and inference times, and lightweight. Our key observation is that the neurons used in traditional MLPs perform simple computations (a dot product followed by ReLU activation) and thus one needs to use either wide and deep MLPs or high-resolution and high-dimensional feature grids to parameterize complex nonlinear functions. We show in this paper that by replacing traditional neurons with Radial Basis Function (RBF) kernels, one can achieve highly accurate representation of 2D (RGB images), 3D (geometry), and 5D (radiance fields) signals with just a single layer of such neurons. The representation is highly parallelizable, operates on low-resolution feature grids, and is compact and memory-efficient. We demonstrate that the proposed novel representation can be trained for 3D geometry representation in less than 15 seconds and for novel view synthesis in less than 15 mins. At runtime, it can synthesize novel views at more than 60 fps without sacrificing quality.
Preprint
Posted to a preprint site 2025
ArXiv.org
Explainable Artificial Intelligence (XAI) methods, such as Local Interpretable Model-Agnostic Explanations (LIME), have advanced the interpretability of black-box machine learning models by approximating their behavior locally using interpretable surrogate models. However, LIME's inherent randomness in perturbation and sampling can lead to locality and instability issues, especially in scenarios with limited training data. In such cases, data scarcity can result in the generation of unrealistic variations and samples that deviate from the true data manifold. Consequently, the surrogate model may fail to accurately approximate the complex decision boundary of the original model. To address these challenges, we propose a novel Instance-based Transfer Learning LIME framework (ITL-LIME) that enhances explanation fidelity and stability in data-constrained environments. ITL-LIME introduces instance transfer learning into the LIME framework by leveraging relevant real instances from a related source domain to aid the explanation process in the target domain. Specifically, we employ clustering to partition the source domain into clusters with representative prototypes. Instead of generating random perturbations, our method retrieves pertinent real source instances from the source cluster whose prototype is most similar to the target instance. These are then combined with the target instance's neighboring real instances. To define a compact locality, we further construct a contrastive learning-based encoder as a weighting mechanism to assign weights to the instances from the combined set based on their proximity to the target instance. Finally, these weighted source and target instances are used to train the surrogate model for explanation purposes.
Preprint
SVR-GS: Spatially Variant Regularization for Probabilistic Masks in 3D Gaussian Splatting
Posted to a preprint site 2025
ArXiv.org
3D Gaussian Splatting (3DGS) enables fast, high-quality novel view synthesis but typically relies on densification followed by pruning to optimize the number of Gaussians. Existing mask-based pruning, such as MaskGS, regularizes the global mean of the mask, which is misaligned with the local per-pixel (per-ray) reconstruction loss that determines image quality along individual camera rays. This paper introduces SVR-GS, a spatially variant regularizer that renders a per-pixel spatial mask from each Gaussian's effective contribution along the ray, thereby applying sparsity pressure where it matters: on low-importance Gaussians. We explore three spatial-mask aggregation strategies, implement them in CUDA, and conduct a gradient analysis to motivate our final design. Extensive experiments on Tanks\&Temples, Deep Blending, and Mip-NeRF360 datasets demonstrate that, on average across the three datasets, the proposed SVR-GS reduces the number of Gaussians by 1.79\(\times\) compared to MaskGS and 5.63\(\times\) compared to 3DGS, while incurring only 0.50 dB and 0.40 dB PSNR drops, respectively. These gains translate into significantly smaller, faster, and more memory-efficient models, making them well-suited for real-time applications such as robotics, AR/VR, and mobile perception.
Preprint
Posted to a preprint site 2024
ArXiv.org
Neural implicit representations have emerged as a powerful paradigm for 3D reconstruction. However, despite their success, existing methods fail to capture fine geometric details and thin structures, especially in scenarios where only sparse multi-view RGB images of the objects of interest are available. This paper shows that training neural representations with first-order differential properties (surface normals) leads to highly accurate 3D surface reconstruction, even with as few as two RGB images. Using input RGB images, we compute approximate ground-truth surface normals from depth maps produced by an off-the-shelf monocular depth estimator. During training, we directly locate the surface point of the SDF network and supervise its normal with the one estimated from the depth map. Extensive experiments demonstrate that our method achieves state-of-the-art reconstruction accuracy with a minimal number of views, capturing intricate geometric details and thin structures that were previously challenging to capture.
Preprint
Box It to Bind It: Unified Layout Control and Attribute Binding in T2I Diffusion Models
Posted to a preprint site 2024
ArXiv.org
While latent diffusion models (LDMs) excel at creating imaginative images, they often lack precision in semantic fidelity and spatial control over where objects are generated. To address these deficiencies, we introduce the Box-it-to-Bind-it (B2B) module - a novel, training-free approach for improving spatial control and semantic accuracy in text-to-image (T2I) diffusion models. B2B targets three key challenges in T2I: catastrophic neglect, attribute binding, and layout guidance. The process encompasses two main steps: i) Object generation, which adjusts the latent encoding to guarantee object generation and directs it within specified bounding boxes, and ii) attribute binding, guaranteeing that generated objects adhere to their specified attributes in the prompt. B2B is designed as a compatible plug-and-play module for existing T2I models, markedly enhancing model performance in addressing the key challenges. We evaluate our technique using the established CompBench and TIFA score benchmarks, demonstrating significant performance improvements compared to existing methods. The source code will be made publicly available at https://github.com/nextaistudio/BoxIt2BindIt.
Preprint
A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures
Posted to a preprint site 2024
arXiv.org
We propose the first comprehensive approach for modeling and analyzing the spatiotemporal shape variability in tree-like 4D objects, i.e., 3D objects whose shapes bend, stretch, and change in their branching structure over time as they deform, grow, and interact with their environment. Our key contribution is the representation of tree-like 3D shapes using Square Root Velocity Function Trees (SRVFT). By solving the spatial registration in the SRVFT space, which is equipped with an L2 metric, 4D tree-shaped structures become time-parameterized trajectories in this space. This reduces the problem of modeling and analyzing 4D tree-like shapes to that of modeling and analyzing elastic trajectories in the SRVFT space, where elasticity refers to time warping. In this paper, we propose a novel mathematical representation of the shape space of such trajectories, a Riemannian metric on that space, and computational tools for fast and accurate spatiotemporal registration and geodesics computation between 4D tree-shaped structures. Leveraging these building blocks, we develop a full framework for modelling the spatiotemporal variability using statistical models and generating novel 4D tree-like structures from a set of exemplars. We demonstrate and validate the proposed framework using real 4D plant data.