Output list
Conference proceeding
Normal-guided Detail-Preserving Neural Implicit Function for High-Fidelity 3D Surface Reconstruction
Published 22/05/2025
Proceedings of the ACM on Computer Graphics and Interactive Techniques, 8(1), 12
Neural implicit representations have emerged as a powerful paradigm for 3D reconstruction. However, despite their success, existing methods fail to capture fine geometric details and thin structures, especially in scenarios where only sparse multi-view RGB images of the objects of interest are available. This paper shows that training neural representations with first-order differential properties (surface normals) leads to highly accurate 3D surface reconstruction, even with as few as two RGB images. Using the input RGB images, we compute approximate ground-truth surface normals from depth maps produced by an off-the-shelf monocular depth estimator. During training, we directly locate surface points on the zero level set of the SDF network and supervise their normals with those estimated from the depth maps. Extensive experiments demonstrate that our method achieves state-of-the-art reconstruction accuracy with a minimal number of views, recovering intricate geometric details and thin structures that were previously challenging to capture. The source code and additional results are available at https://graphics-research-group.github.io/sn-nir.
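To make the supervision concrete, here is a minimal sketch (PyTorch; the function names and the finite-difference normal estimate are illustrative assumptions, not the paper's implementation) of the two ingredients the abstract describes: normals approximated from a monocular depth map, and SDF normals obtained by automatic differentiation and aligned with a cosine loss.

```python
import torch
import torch.nn.functional as F

def normals_from_depth(depth):
    """Approximate per-pixel normals from a depth map (H, W) by finite
    differences. Assumes a simplified projection; a real pipeline would
    unproject with the camera intrinsics first."""
    dz_dx = F.pad(depth[:, 2:] - depth[:, :-2], (1, 1))        # (H, W)
    dz_dy = F.pad((depth[2:, :] - depth[:-2, :]).T, (1, 1)).T  # (H, W)
    n = torch.stack([-dz_dx, -dz_dy, torch.ones_like(depth)], dim=-1)
    return F.normalize(n, dim=-1)                              # (H, W, 3)

def sdf_normal(sdf_net, x):
    """Normal of the implicit surface at points x (N, 3): the normalized
    gradient of the SDF, obtained by automatic differentiation."""
    x = x.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(sdf_net(x).sum(), x, create_graph=True)
    return F.normalize(grad, dim=-1)

def normal_loss(sdf_net, surface_pts, target_normals):
    """Cosine loss aligning SDF normals at located surface points with the
    normals estimated from the monocular depth map."""
    n_pred = sdf_normal(sdf_net, surface_pts)
    return (1.0 - (n_pred * target_normals).sum(dim=-1)).mean()
```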
Conference proceeding
Dynamic Neural Surfaces for Elastic 4D Shape Representation and Analysis
Published 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 21783 - 21792
2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10/06/2025–17/06/2025
We propose a novel framework for the statistical analysis of genus-zero 4D surfaces, i.e., 3D surfaces that deform and evolve over time. This problem is particularly challenging due to the arbitrary parameterizations of these surfaces and their varying deformation speeds, necessitating effective spatiotemporal registration. Traditionally, 4D surfaces are discretized, in space and time, before computing their spatiotemporal registrations, geodesics, and statistics. However, this approach may result in suboptimal solutions and, as we demonstrate in this paper, is not necessary. In contrast, we treat 4D surfaces as continuous functions in both space and time. We introduce Dynamic Spherical Neural Surfaces (D-SNS), an efficient, smooth, and continuous spatiotemporal representation for genus-0 4D surfaces. We then demonstrate how to perform core 4D shape analysis tasks, such as spatiotemporal registration, geodesic computation, and mean 4D shape estimation, directly on these continuous representations without upfront discretization and meshing. By integrating neural representations with classical Riemannian geometry and statistical shape analysis techniques, we provide the building blocks for enabling full functional shape analysis. We demonstrate the efficiency of the framework on 4D human and face datasets. The source code and additional results are available at https://4d-dsns.github.io/DSNS/.
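The paper's D-SNS architecture is not reproduced here, but a minimal, hypothetical sketch of the underlying idea, a coordinate network mapping a point on the unit sphere and a time value to a 3D surface point so the 4D surface can be queried at any resolution and instant without meshing, could look as follows (PyTorch; all names and hyperparameters are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicSurfaceMLP(nn.Module):
    """Toy continuous 4D surface f(u, t): a point u on the unit sphere and a
    time t in [0, 1] are mapped to a 3D surface point. Illustrative only."""
    def __init__(self, hidden=256, depth=4):
        super().__init__()
        layers, dim = [], 4                       # (x, y, z) on the sphere + t
        for _ in range(depth):
            layers += [nn.Linear(dim, hidden), nn.Softplus(beta=100)]
            dim = hidden
        layers.append(nn.Linear(dim, 3))
        self.net = nn.Sequential(*layers)

    def forward(self, u, t):
        return self.net(torch.cat([u, t], dim=-1))   # (N, 3)

# Because the representation is continuous in space and time, the surface can
# be sampled at any density and any instant without upfront discretization:
model = DynamicSurfaceMLP()
u = F.normalize(torch.randn(1024, 3), dim=-1)        # samples on the sphere
t = torch.full((1024, 1), 0.5)                       # query time t = 0.5
points = model(u, t)                                 # surface points at t = 0.5
```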
Conference proceeding
Published 2024
Proceedings of the 25th International Conference on Digital Image Computing: Techniques and Applications (DICTA 2024), 9 - 16
25th International Conference on Digital Image Computing: Techniques and Applications (DICTA 2024), 27/11/2024–29/11/2024, Perth, WA
Learning to generate motions of thin structures, such as plant leaves, in dynamic view synthesis is challenging because thin structures usually undergo small but fast, non-rigid motions as they interact with air and wind. Given a set of RGB images or videos of a scene with moving thin structures as input, existing methods that map the scene to a corresponding canonical space for rendering novel views fail because the object movements are too subtle relative to the background. Disentangling objects with thin parts from the background scene is also challenging when those parts move quickly. To address these issues, we propose a Neural Radiance Field (NeRF)-based framework that accurately reconstructs thin structures such as leaves and captures their subtle, fast motions. The framework learns the geometry of a scene by mapping the dynamic images to a canonical scene in which the scene remains static. We propose a ray masking network to further decompose the canonical scene into foreground and background, enabling the network to focus on foreground movements. We conducted experiments on a dataset containing thin structures such as leaves and petals, comprising image sequences collected by us and one public image sequence. Experiments show superior results compared to existing methods. Video outputs are available at https://dythinobjects.com/.
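As a rough illustration of the ray masking idea (the architecture and feature inputs below are assumptions, not the paper's network), a per-ray mask can gate the compositing of separately rendered foreground and background so that gradients from the subtle foreground motion are not swamped by the static background:

```python
import torch
import torch.nn as nn

class RayMaskNet(nn.Module):
    """Toy per-ray mask: predicts the probability that a ray hits the thin,
    moving foreground. Hypothetical stand-in for the paper's network."""
    def __init__(self, ray_feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(ray_feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, ray_feats):                 # (N_rays, ray_feat_dim)
        return self.mlp(ray_feats)                # (N_rays, 1) in [0, 1]

def composite(c_fg, c_bg, mask):
    """Blend separately rendered foreground/background colors with the mask,
    concentrating the photometric loss for subtle motions on foreground rays."""
    return mask * c_fg + (1.0 - mask) * c_bg      # (N_rays, 3)
```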
Conference proceeding
Statistical 3D and 4D Shape Analysis: Theory and Applications in the Era of Generative AI
Published 2024
Proceedings of the 1st International Workshop on Multimedia Computing for Health and Medicine, 5 - 6
MM '24: The 32nd ACM International Conference on Multimedia, 28/10/2024–01/11/2024, Melbourne, VIC
The need for 3D and 4D (i.e., 3D + time) shape analysis arises in many branches of science ranging from anatomy, bioinformatics, medicine, and biology to computer graphics, multimedia, and virtual and augmented reality. In fact, shape is an essential property of natural and man-made 3D objects. It deforms over time as a result of many internal and external factors. For instance, anatomical organs such as bones, kidneys, and subcortical structures in the brain deform due to natural growth or disease progression; human faces deform as a consequence of talking, executing facial expressions, and aging. Similarly, human body actions and motions such as walking, jumping, and grasping are the result of the deformation, over time, of the human body shape. The ability to understand and model (1) the typical shape and deformation patterns of a class of 3D objects, and (2) the variability of these shapes and deformations within and across object classes has many applications. For example, in medical diagnosis and biological growth modeling, one is interested in measuring the intensity of pain from facial deformations, and in distinguishing between normal growth and disease progression using the shape of the body and its deformation over time. In computer vision and graphics, the ability to statistically model such spatiotemporal variability can be used to summarize collections of 3D objects and their animation, and to simulate animations and motions. Similar to 3D morphable models, these tools can also be used in a generative model for synthesizing large corpora of labeled longitudinal 3D shape data, e.g., 4D faces, virtual humans, and various objects. In this talk, I will share the research undertaken by my group and collaborators in the area of statistical analysis and modeling of static (i.e., 3D) and dynamic (i.e., 4D) shapes. I will first highlight the importance of this topic for various applications ranging from biology and medicine to computer graphics and virtual/augmented reality. I will then structure my talk into three parts. The first one focuses on 3D shapes that bend, stretch, and change in topology. I will introduce our mathematical framework, termed Square-Root Normal Fields (SRNF) [6, 10-12, 15], which provides (1) an efficient representation of 3D shapes, (2) an elastic metric for quantifying shape differences between objects, (3) mechanisms for computing correspondences and geodesics between such shapes, and (4) methods for characterizing populations of 3D shapes using generative models. I will consider both shapes that bend and stretch [6, 10-12, 15] and those that change their structure and topology [18-22]. The second part of the talk will focus on 4D shapes, i.e., 3D shapes that move and deform as the result of normal growth or disease progression [9, 14]. I will summarize the latest solutions we developed for the statistical analysis of the spatiotemporal variability in such 4D shape data and highlight their applications in various fields. The third part of this talk will focus on the role statistical 3D and 4D shape models have played, and will continue to play, in the era of Deep Learning and Generative AI. I will particularly highlight their importance and role in advancing the field of 3D and 4D reconstruction and generation from images, videos, and text [1-5, 7, 8, 13, 16, 17]. I will conclude the talk by sharing insights into potential future developments in and applications of statistical 3D and 4D shape models.
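For readers unfamiliar with the SRNF representation mentioned above, a minimal discrete sketch (NumPy; assuming triangle meshes that are already in correspondence, with all names hypothetical) computes, per face, the area-weighted normal n scaled as q = n / sqrt(|n|), so that the L2 distance between two SRNFs serves as a proxy for the elastic shape distance:

```python
import numpy as np

def srnf(vertices, faces):
    """Discrete Square-Root Normal Field of a triangle mesh: per face, the
    area-weighted normal n scaled as q = n / sqrt(|n|), so |q|^2 equals the
    face area. Assumes the meshes are already registered."""
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    n = 0.5 * np.cross(v1 - v0, v2 - v0)                # (F, 3)
    norm = np.linalg.norm(n, axis=1, keepdims=True)
    return n / np.sqrt(np.maximum(norm, 1e-12))

def srnf_distance(verts_a, verts_b, faces):
    """L2 distance between SRNFs of two surfaces in correspondence."""
    return np.linalg.norm(srnf(verts_a, faces) - srnf(verts_b, faces))
```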
Conference proceeding
[DEMO] Comprehensive workspace calibration for visuo-haptic augmented reality
Published 2014
2014 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 333 - 334
2014 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 10/09/2014–12/09/2014, Munich, Germany
Visuo-haptic augmented reality systems enable users to see and touch digital information that is embedded in the real world. Precise colocation of computer graphics and the haptic stylus is necessary to provide a realistic user experience. PHANToM haptic devices are often used in such systems to provide haptic feedback. They consist of two interlinked joints, whose angles define the position of the haptic stylus, and three sensors at the gimbal that sense its orientation. Previous work has focused on calibration procedures that align the haptic workspace within a global reference coordinate system, and on algorithms that compensate for the non-linear position error caused by inaccuracies in the joint angle sensors. In our science and technology paper “Comprehensive Workspace Calibration for Visuo-Haptic Augmented Reality” [1], we present an improved workspace calibration that additionally compensates for errors in the gimbal sensors. This enables us to also align the orientation of the haptic stylus with high precision. To reduce the time required for calibration and to increase the sampling coverage, we utilize time-delay estimation to temporally align external sensor readings. This enables users to continuously move the haptic stylus during the calibration process, as opposed to the commonly used point-and-hold procedures. This demonstration showcases the complete workspace calibration procedure described in our paper, including a mixed reality demo scenario that allows users to experience the calibrated workspace. Additionally, we demonstrate an early stage of our proposed future work on improved user guidance during the calibration procedure using visual guides.
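One standard way to realize the time-delay estimation step mentioned above (a hedged sketch, not necessarily the paper's exact estimator) is to locate the peak of the normalized cross-correlation between the haptic and external sensor streams:

```python
import numpy as np

def estimate_delay(signal_a, signal_b, rate_hz):
    """Estimate the time offset between two uniformly sampled sensor streams
    (e.g., haptic joint readings vs. an external tracker) from the peak of
    their normalized cross-correlation."""
    a = (signal_a - signal_a.mean()) / signal_a.std()
    b = (signal_b - signal_b.mean()) / signal_b.std()
    corr = np.correlate(a, b, mode="full")
    lag = int(np.argmax(corr)) - (len(b) - 1)     # positive: a lags behind b
    return lag / rate_hz                          # delay in seconds

# Once the delay is known, the external readings can be shifted onto the
# haptic device's timeline before pairing samples for calibration.
```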
Conference proceeding
EG 3DOR 2011: Eurographics 2011 Workshop on 3D Object Retrieval: Llandudno, UK, April 10th, 2011
Published 2011
Eurographics 2011 Workshop on 3D Object Retrieval, 10/04/2011, Llandudno, UK
The 3DOR workshop series provides a unique venue for researchers, students, and practitioners interested in the definition, evaluation, and application of 3D object analysis and retrieval. Eight papers were selected for oral presentation, giving an acceptance rate of 50%. This year, like last year, 3DOR'11 hosts the Shape Retrieval Evaluation Contest (SHREC'11), which plays an important role in the evolution of 3D object retrieval research. SHREC'11 contributes four additional papers to the proceedings, detailing the results of the competition. The proceedings furthermore comprise six papers presenting timely, high-quality research results, which were included in the workshop program in the form of poster presentations.
Conference proceeding
Tongue Detection from Video for Human-Computer Interaction
Date presented 03/2008
Proceedings of the 2008 IEICE General Conference, 274
2008 IEICE General Conference, 19/03/2008–20/03/2008, Kitakyushu Science and Research Park, Japan