Ferdous Sohel

Professor, Information Technology

Computer vision and multimedia computation

Artificial intelligence

Digital Agriculture

AI in Health and Medicine

AI in Environmental Monitoring

Preprint Open access

A Riemannian Framework for the Elastic Analysis of the Spatiotemporal Variability in the Shape and Structure of Tree-like 4D Objects

by Tahmina Khanam, Hamid Laga, Mohammed Bennamoun, Guanjin Wang, Ferdous Sohel, Farid Boussaid, Guan Wang and Anuj Srivastava

Posted to a preprint site 2025

arXiv.org

This paper introduces a novel computational framework for modeling and analyzing the spatiotemporal shape variability of tree-like 4D structures whose shapes deform and evolve over time. Tree-like 3D objects, such as botanical trees and plants, deform and grow at different rates. In this process, they bend and stretch their branches and change their branching structure, making their spatiotemporal registration challenging. We address this problem within a Riemannian framework that represents tree-like 3D objects as points in a tree-shape space endowed with a proper elastic metric that quantifies branch bending, stretching, and topological changes. With this setting, a 4D tree-like object becomes a trajectory in the tree-shape space. Thus, the problem of modeling and analyzing the spatiotemporal variability in tree-like 4D objects reduces to the analysis of trajectories within this tree-shape space. However, performing spatiotemporal registration and subsequently computing geodesics and statistics in the nonlinear tree-shape space is inherently challenging, as these tasks rely on complex nonlinear optimizations. Our core contribution is the mapping of the tree-like 3D objects to the space of the Extended Square Root Velocity Field, where the complex elastic metric is reduced to the L2 metric. By solving spatial registration in the ESRVF space, analyzing tree-like 4D objects can be reformulated as the problem of analyzing elastic trajectories in the ESRVF space. Based on this formulation, we develop a comprehensive framework for analyzing the spatiotemporal dynamics of tree-like objects, including registration under large deformations and topological differences, geodesic computation, statistical summarization through mean trajectories and modes of variation, and the synthesis of new, random tree-like 4D shapes.

Preprint Open access

FlexiCrackNet: A Flexible Pipeline for Enhanced Crack Segmentation with General Features Transfered from SAM

by Xinlong Wan, Xiaoyan Jiang, Guangsheng Luo, Ferdous Sohel and Jenqneng Hwang

Posted to a preprint site 2025

arXiv.org

Automatic crack segmentation is a cornerstone technology for intelligent visual perception modules in road safety maintenance and structural integrity systems. Existing deep learning models and "pre-training + fine-tuning" paradigms often face challenges of limited adaptability in resource-constrained environments and inadequate scalability across diverse data domains. To overcome these limitations, we propose FlexiCrackNet, a novel pipeline that seamlessly integrates traditional deep learning paradigms with the strengths of large-scale pre-trained models. At its core, FlexiCrackNet employs an encoder-decoder architecture to extract task-specific features. The CNN-based encoder of the lightweight EdgeSAM is used exclusively as a generic feature extractor, decoupled from the fixed input size requirements of EdgeSAM. To harmonize general and domain-specific features, we introduce the Information-Interaction Gated Attention Mechanism (IGAM), which adaptively fuses multi-level features to enhance segmentation performance while mitigating irrelevant noise. This design enables the efficient transfer of general knowledge to crack segmentation tasks while ensuring adaptability to diverse input resolutions and resource-constrained environments. Experiments show that FlexiCrackNet outperforms state-of-the-art methods and excels in zero-shot generalization, computational efficiency, and segmentation robustness under challenging scenarios such as blurry inputs, complex backgrounds, and visually ambiguous artifacts. These advancements underscore the potential of FlexiCrackNet for real-world applications in automated crack detection and comprehensive structural health monitoring systems.
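The abstract does not spell out how IGAM is built, so the following is only a minimal sketch of generic gated feature fusion, the broad family the description suggests. It is not the authors' mechanism. The function name `gated_fuse`, the per-channel weights, and the bias are all hypothetical, chosen purely for illustration.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_fuse(general, specific, w_general, w_specific, bias):
    """Blend two feature vectors channel by channel.

    For each channel, a gate in (0, 1) is computed from both inputs.
    The output is gate * general + (1 - gate) * specific, so the gate
    decides how much of the general (pre-trained) feature passes through.
    All parameters here are hypothetical stand-ins for learned values.
    """
    fused = []
    for g_feat, s_feat, wg, ws, b in zip(general, specific, w_general, w_specific, bias):
        gate = sigmoid(wg * g_feat + ws * s_feat + b)
        fused.append(gate * g_feat + (1.0 - gate) * s_feat)
    return fused


if __name__ == "__main__":
    general = [2.0, 4.0]   # e.g. features from a generic encoder
    specific = [0.0, 0.0]  # e.g. features from a task-specific encoder
    zeros = [0.0, 0.0]
    # With zero weights and bias the gate is 0.5, so the output is the average.
    print(gated_fuse(general, specific, zeros, zeros, zeros))
```

In a trained network the gate parameters would be learned, which lets the model suppress general features in regions where they only add noise.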

Preprint Open access

A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures

by Tahmina Khanam, Hamid Laga, Mohammed Bennamoun, Guanjin Wang, Ferdous Sohel, Farid Boussaid, Guan Wang and Anuj Srivastava

Posted to a preprint site 2024

arXiv.org

We propose the first comprehensive approach for modeling and analyzing the spatiotemporal shape variability in tree-like 4D objects, i.e., 3D objects whose shapes bend, stretch, and change in their branching structure over time as they deform, grow, and interact with their environment. Our key contribution is the representation of tree-like 3D shapes using Square Root Velocity Function Trees (SRVFT). By solving the spatial registration in the SRVFT space, which is equipped with an L2 metric, 4D tree-shaped structures become time-parameterized trajectories in this space. This reduces the problem of modeling and analyzing 4D tree-like shapes to that of modeling and analyzing elastic trajectories in the SRVFT space, where elasticity refers to time warping. In this paper, we propose a novel mathematical representation of the shape space of such trajectories, a Riemannian metric on that space, and computational tools for fast and accurate spatiotemporal registration and geodesic computation between 4D tree-shaped structures. Leveraging these building blocks, we develop a full framework for modeling the spatiotemporal variability using statistical models and generating novel 4D tree-like structures from a set of exemplars. We demonstrate and validate the proposed framework using real 4D plant data.
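To illustrate why an L2 metric is convenient here, the sketch below computes the standard Square Root Velocity Function of a single sampled curve, q(t) = c'(t) / sqrt(|c'(t)|), and compares two curves with a plain L2 distance. It covers one branch-like curve only and is not the SRVFT or ESRVF representation of the papers. It performs no registration or reparameterization, and the function names are illustrative assumptions.

```python
import math


def srvf(points, dt):
    """Discrete SRVF of a sampled curve.

    points: list of equal-length coordinate tuples sampled every dt.
    Returns one vector per segment: q_i = v_i / sqrt(|v_i|), where v_i is
    the forward-difference velocity. A zero velocity maps to a zero vector.
    """
    q = []
    for p0, p1 in zip(points, points[1:]):
        v = [(b - a) / dt for a, b in zip(p0, p1)]
        speed = math.sqrt(sum(c * c for c in v))
        scale = math.sqrt(speed) if speed > 0 else 1.0
        q.append([c / scale for c in v])
    return q


def l2_distance(q1, q2, dt):
    """L2 distance between two SRVFs sampled on the same grid."""
    total = sum((a - b) ** 2 for u, w in zip(q1, q2) for a, b in zip(u, w))
    return math.sqrt(total * dt)


if __name__ == "__main__":
    dt = 0.25
    # A straight segment of length 2 along the x-axis, sampled at 5 points.
    line = [(0.5 * i, 0.0, 0.0) for i in range(5)]
    q = srvf(line, dt)
    zero = [[0.0, 0.0, 0.0] for _ in q]
    # The squared L2 norm of an SRVF equals the length of the curve.
    print(l2_distance(q, zero, dt) ** 2)
```

The property checked at the end, that the squared L2 norm of the SRVF equals the curve length, is a known feature of this representation and a quick sanity check for any implementation.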

Preprint Open access

An Intra-BRNN and GB-RVQ Based END-TO-END Neural Audio Codec

by Xu Linping, Jiawei Jiang, Dejun Zhang, Xianjun Xia, Li Chen, Yijian Xiao, Piao Ding, Shenyi Song, Sixing Yin and Ferdous Sohel

Posted to a preprint site 2023

arXiv.org

Recently, neural networks have proven effective at speech coding at low bitrates. However, under-utilization of intra-frame correlations and quantizer error degrade the reconstructed audio quality. To improve the coding quality, we present an end-to-end neural speech codec, namely CBRC (Convolutional and Bidirectional Recurrent neural Codec). An interleaved structure using 1D-CNN and Intra-BRNN is designed to exploit the intra-frame correlations more efficiently. Furthermore, a Group-wise and Beam-search Residual Vector Quantizer (GB-RVQ) is used to reduce the quantization noise. CBRC encodes audio every 20 ms with no additional latency, which makes it suitable for real-time communication. Experimental results demonstrate the superiority of the proposed codec when comparing CBRC at 3 kbps with Opus at 12 kbps.
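For readers unfamiliar with residual vector quantization, the sketch below shows the plain greedy form: each stage quantizes whatever error the previous stages left behind. It is not the GB-RVQ of the paper, since it has no grouping and no beam search, and the tiny codebooks are made-up values for illustration only.

```python
def nearest(codebook, x):
    """Index of the codeword closest to x in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], x)),
    )


def rvq_encode(x, codebooks):
    """Greedy residual VQ: pick the best codeword per stage, then pass on the residual."""
    residual = list(x)
    indices = []
    for codebook in codebooks:
        i = nearest(codebook, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct the vector by summing the selected codeword of every stage."""
    out = [0.0] * len(codebooks[0][0])
    for i, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


if __name__ == "__main__":
    # Two hypothetical stages: a coarse codebook and a finer one for the residual.
    codebooks = [
        [[0.0, 0.0], [1.0, 1.0]],
        [[0.0, 0.0], [0.25, -0.25]],
    ]
    indices = rvq_encode([1.2, 0.8], codebooks)
    print(indices, rvq_decode(indices, codebooks))
```

A beam-search variant would keep several candidate index sequences per stage instead of committing to the single nearest codeword. As a side calculation, 20 ms frames at 3 kbps leave 3000 × 0.02 = 60 bits per frame for all quantizer indices.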
