Output list
Journal article
QTopic: A novel quantum perspective on learning topics from text
Published 2026
Neurocomputing (Amsterdam), 669, 132483
Topic modeling is an unsupervised technique in natural language processing (NLP) used to identify hidden topic structures within large text datasets. Among established approaches to topic modeling, latent Dirichlet allocation, BERTopic, and Top2Vec are widely adopted. However, these methods often struggle in scenarios involving limited data availability or high-dimensional textual features. In this research, we propose QTopic, a novel hybrid quantum-classical topic modeling architecture that leverages quantum properties through parameterized quantum circuits. By integrating quantum-enhanced sampling into the inference pipeline, the proposed model captures richer topic distributions by mapping textual data into a higher-dimensional space. Benchmark experiments demonstrate that QTopic consistently outperforms classical approaches in terms of coherence, diversity, and topic distinctiveness, particularly when modeling a small number of topics. This study demonstrates the promise of quantum techniques in advancing unsupervised NLP, while also highlighting hardware limitations that present challenges for future research.
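The abstract does not spell out QTopic's circuit design, so the following is only a toy sketch of the mechanism it relies on: a parameterized quantum circuit whose measurement probabilities are read as a distribution over candidate topics. The rotation angles, qubit count, and the product-state simplification are all illustrative assumptions, simulated here in plain NumPy.

```python
import numpy as np

def pqc_topic_distribution(thetas):
    """Toy parameterized 'circuit': independent RY rotations on n qubits.
    Measurement probabilities over the 2**n basis states are read as a
    distribution over 2**n candidate topics. Illustrative only; not the
    QTopic architecture."""
    state = np.array([1.0])
    for th in thetas:                  # RY(t)|0> = cos(t/2)|0> + sin(t/2)|1>
        qubit = np.array([np.cos(th / 2), np.sin(th / 2)])
        state = np.kron(state, qubit)  # product state over all qubits
    return state ** 2                  # Born rule: measurement probabilities

probs = pqc_topic_distribution([0.3, 1.2, 2.0])  # 3 qubits -> 8 "topics"
print(probs, probs.sum())                        # probabilities sum to 1
```

In a real hybrid pipeline, the angles would be trained against a topic-quality objective and the circuit executed on quantum hardware or a full state-vector simulator rather than this product-state shortcut.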
Journal article
Published 2026
Complex & intelligent systems, 12, 2, 62
Robust fault detection and diagnosis (FDD) in multirotor unmanned aerial vehicles (UAVs) remains challenging due to limited actuator redundancy, nonlinear dynamics, and environmental disturbances. This work introduces two lightweight deep learning architectures: the Convolutional-LSTM Fault Detection Network (CLFDNet), which combines multi-scale one-dimensional convolutional neural networks (1D-CNN), long short-term memory (LSTM) units, and an adaptive attention mechanism for spatio-temporal fault feature extraction; and the Autoencoder LSTM Multi-loss Fusion Network (AELMFNet), a soft attention–enhanced LSTM autoencoder optimized via multi-loss fusion for fine-grained fault severity estimation. Both models are trained and evaluated on UAV-Fault Magnitude V1, a high-fidelity simulation dataset containing 114,230 labeled samples with motor degradation levels ranging from 5% to 40% in the take-off, hover, navigation, and descent phases, representing the most probable and recoverable fault scenarios in quadrotor UAVs. Including coupled faults enables models to learn correlated degradation patterns and actuator interactions while maintaining controllability under standard flight laws. CLFDNet achieves 96.81% precision in fault severity classification and 100% accuracy in motor fault localization with only 19.6K parameters, demonstrating suitability for real-time onboard applications. AELMFNet achieves the lowest reconstruction loss of 0.001 with Huber loss and an inference latency of 6 ms/step, underscoring its efficiency for embedded deployment. Comparative experiments against 15 baselines, including five classical machine learning models, five state-of-the-art fault detection methods, and five attention-based deep learning variants, validate the effectiveness of the proposed architectures. These findings confirm that lightweight deep models enable accurate and efficient diagnosis of UAV faults with minimal sensing.
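As a rough illustration of the CLFDNet ingredients named above (1D convolutions, an LSTM, and attention over time), here is a minimal PyTorch sketch; the channel counts, layer sizes, and class count are placeholders rather than the paper's configuration, and the multi-scale and adaptive-attention details are omitted.

```python
import torch
import torch.nn as nn

class TinyCLFD(nn.Module):
    """Minimal 1D-CNN + LSTM + temporal attention classifier (illustrative)."""
    def __init__(self, in_ch=6, n_classes=8, hidden=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_ch, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)   # scores each time step
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                        # x: (batch, channels, time)
        h = self.conv(x)                         # (batch, 32, time)
        h, _ = self.lstm(h.transpose(1, 2))      # (batch, time, hidden)
        w = torch.softmax(self.attn(h), dim=1)   # attention over time
        ctx = (w * h).sum(dim=1)                 # attention-weighted summary
        return self.head(ctx)

model = TinyCLFD()
out = model(torch.randn(8, 6, 200))   # 8 sequences, 6 sensor channels, 200 steps
```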
Journal article
Automatic pixel-level annotation for plant disease severity estimation
Published 2026
Computers and electronics in agriculture, 241, 111316
Plant disease adversely impacts food production and quality. Alongside detecting a disease, estimating its severity is important in managing it. Deep learning-based artificial intelligence techniques for plant disease detection are emerging. Unlike most of these techniques, which focus on disease recognition, this study addresses various plant disease-related tasks, including annotation, severity classification, lesion detection, and leaf segmentation. We propose a novel approach that learns the disease symptoms, which are then used to segment disease lesions for severity estimation. To demonstrate the work, a dataset of barley images was used. We captured images of barley plants inoculated with diseases on test-bed paddocks at various growth stages. The dataset was automatically annotated at a pixel level using a trained vision transformer to obtain the ground truth labels. The annotated dataset was used to train salient object detection (SOD) methods. Two top-performing lightweight SOD models were used to segment the disease lesion areas. To evaluate the performance of the SODs, we tested them on our dataset and several other datasets, including the Coffee dataset, which has expert pixel-level labels that were unseen during the training step. Several morphological and spectral disease symptoms are learned, including those akin to the widely used ABCD rule for human skin-cancer detection, i.e., asymmetry (A), border irregularity (B), colour variance (C), and diameter (D). To the best of our knowledge, this is the first study to incorporate these ABCD features in plant disease detection. We further extract visual and texture features using the grey level co-occurrence matrix (GLCM) and fuse them with the ABCD features. On the Coffee dataset, our method achieved over 82% accuracy on the severity classification task. The results demonstrate the performance of the proposed method in detecting plant diseases and estimating their severity.
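The GLCM texture features mentioned above can be extracted with scikit-image's standard API; a minimal sketch follows. The patch size, distances, angles, and the set of properties are assumptions for illustration, not the paper's exact feature set.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_patch):
    """Texture descriptors from a uint8 grayscale lesion patch."""
    glcm = graycomatrix(gray_patch, distances=[1],
                        angles=[0, np.pi / 2], levels=256,
                        symmetric=True, normed=True)
    # average each property over the distance/angle combinations
    return {p: graycoprops(glcm, p).mean()
            for p in ("contrast", "homogeneity", "energy", "correlation")}

patch = (np.random.rand(64, 64) * 255).astype(np.uint8)  # stand-in lesion patch
print(glcm_features(patch))
```

These descriptors would then be concatenated with the ABCD shape and colour features before classification.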
Journal article
A semi-supervised approach for classifying insect developmental phases from repurposed IP102
Published 2026
Computers and electronics in agriculture, 242, 111337
Identifying insect pests, whether as adults, larvae, or eggs, is critical in pest management. Computational learning algorithms have demonstrated strong potential in achieving high identification performance, but these methods typically require large, balanced, and well-annotated datasets. This creates a challenge for insect pest identification, as rare species, despite often being the most damaging to crops, are underrepresented in available datasets. Moreover, annotating large-scale datasets is both costly and labour-intensive. To address this issue, we develop a semi-supervised learning approach, Cost-Focal FixMatch, which extends the widely used FixMatch framework by integrating class-aware reweighting and focal loss to better handle class imbalance. Specifically, we introduce a simple yet robust method for applying class weighting in cross-entropy and focal loss functions. The proposed method generates higher-quality pseudo labels than the baseline, ensuring better learning. We evaluate our approach using a repurposed IP102 dataset, which comprises four primary insect life stages, and a Mixed IP102 dataset, where the class labels jointly represent insect species and their corresponding life stages. Our method considerably improves the classification of minority classes, increasing recall for the Larva class from 64% under the baseline FixMatch to 82% with a MobileNetV3Small backbone. On the Mixed IP102 dataset, our approach improves average recall by almost 9% over the baseline FixMatch built upon the EfficientNetV2S network.
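The exact reweighting scheme is the paper's contribution and is not detailed in the abstract; the sketch below shows the standard pattern it builds on: a class-weighted focal loss applied to confidence-thresholded FixMatch pseudo labels. The threshold and gamma values are conventional defaults, assumed here.

```python
import torch
import torch.nn.functional as F

def weighted_focal_loss(logits, targets, class_weights, gamma=2.0):
    """Cross-entropy with per-class weights, down-weighted on easy
    examples by the focal term (1 - p_t)^gamma."""
    ce = F.cross_entropy(logits, targets, weight=class_weights,
                         reduction="none")
    pt = F.softmax(logits, dim=1).gather(1, targets[:, None]).squeeze(1)
    return ((1.0 - pt) ** gamma * ce).mean()

def unlabeled_loss(logits_weak, logits_strong, class_weights, tau=0.95):
    """FixMatch-style pseudo-labeling: weak-augmentation predictions above
    a confidence threshold supervise the strong-augmentation branch."""
    probs = F.softmax(logits_weak.detach(), dim=1)
    conf, pseudo = probs.max(dim=1)
    mask = conf >= tau                    # keep only confident pseudo labels
    if mask.sum() == 0:
        return logits_strong.sum() * 0.0  # no confident samples this batch
    return weighted_focal_loss(logits_strong[mask], pseudo[mask],
                               class_weights, gamma=2.0)
```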
Journal article
3D-CDNeT: Cross-domain learning with enhanced speed and robustness for point cloud recognition
Published 2026
Neurocomputing (Amsterdam), 662, 131939
Despite progress in 3D object recognition using deep learning (DL), challenges such as domain shift, occlusion, and viewpoint variations hinder robust performance. Additionally, the high computational cost and lack of labeled data limit real-time deployment in applications such as autonomous driving and robotic manipulation. To address these challenges, we propose 3D-CDNeT, a novel cross-domain deep learning network designed for unsupervised learning, enabling efficient and robust point cloud recognition. At the core of our model is a lightweight graph-infused attention encoder (GIAE) that enables effective feature interaction between the source and target domains. It not only improves recognition accuracy but also reduces inference time, which is essential for real-time applications. To enhance robustness and adaptability, we introduce a feature invariance learning module (FILM) using contrastive loss for learning invariant features. In addition, we adopt a Generative Decoder (GD) based on a Variational Auto-Encoder (VAE) to model diverse latent spaces and reconstruct meaningful 3D structures from the point cloud. This reconstruction process acts as a self-supervised generative objective that complements the discriminative recognition task, guiding the encoder to learn structure-preserving and domain-invariant features that improve recognition under occlusion and cross-domain conditions. Our proposed model unifies generative and discriminative tasks by using self-attention on the object covariance matrix to facilitate efficient information exchange, enabling the extraction of both local and global features. We further develop a self-supervised pretraining strategy that learns both global and local object invariances through GIAE and GD, respectively. A new loss function, combining contrastive loss and Chamfer distance, is proposed to strengthen cross-domain feature alignment. Experimental results on three benchmark datasets demonstrate that 3D-CDNeT outperforms existing state-of-the-art (SOTA) methods in recognition accuracy and inference speed, offering a practical solution for real-time 3D perception tasks. It achieves accuracies of 90.6% on ModelNet40, 95.2% on ModelNet10, and 76.4% on the ScanObjectNN dataset in linear evaluation tasks, all while reducing runtime by 45% without compromising performance. Detailed qualitative comparisons and ablation studies are provided to validate the effectiveness of each component and demonstrate the superior performance of our proposed method.
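The combined loss pairs a contrastive term with Chamfer distance. Below are textbook PyTorch forms of both; the NT-Xent formulation and the 0.1 weighting stand in for details the abstract does not specify.

```python
import torch
import torch.nn.functional as F

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a: (N, 3), b: (M, 3)."""
    d = torch.cdist(a, b)                      # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def nt_xent(z1, z2, tau=0.5):
    """Contrastive (NT-Xent) loss between two aligned embedding batches."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.T / tau                        # cosine similarities
    n = z1.shape[0]
    sim.fill_diagonal_(-1e9)                   # exclude self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)       # positive pair = other view

# combined objective (0.1 is a placeholder weight, not the paper's value)
# loss = nt_xent(z_src, z_tgt) + 0.1 * chamfer(recon_pts, orig_pts)
```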
Journal article
Towards building robust models for unimodal and multimodal medical imaging data
Published 2026
Information fusion, 127, 103822
Deep neural network (DNN) models applied to medical image analysis are highly vulnerable to adversarial attacks, at both the example (input) and feature (model) levels. Ensuring DNN robustness against these adversarial attacks is crucial for accurate diagnostics. However, existing example-level and feature-level defense strategies, including adversarial training and image-level preprocessing, struggle to achieve effective adversarial robustness in medical image analysis. This challenge arises primarily from difficulties in capturing complex texture features in medical images and the inherent risk of changing intrinsic structural information in the input data. To overcome this challenge, we propose a novel medical imaging protector framework named MI-Protector. This framework comprises two defense methods for unimodal learning and one for multimodal fusion learning, addressing both example-level and feature-level vulnerabilities to robustly protect DNNs against adversarial attacks. For unimodal learning, we introduce an example-level defense mechanism using a generative model with a purifier, termed DGMP. The purifier comprises a trainable neural network and a pre-trained generator from the generative model, which automatically removes a wide variety of adversarial perturbations. For a combined example- and feature-level defense mechanism, we propose the unimodal attention noise injection mechanism (UMAN) to protect learning models at the example and feature layers. To protect the multimodal fusion learning network, we propose the multimodal information fusion attention noise (MMIFAN) injection method, which offers protection at the feature layers while the non-learnable UMAN is applied at the example layer. Extensive experiments conducted on 16 datasets across various medical imaging modalities demonstrate that our framework provides superior robustness compared to existing methods against adversarial attacks. Code: https://github.com/misti1203/MI-Protector.
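UMAN and MMIFAN are not specified beyond "attention noise injection", so the following is only a generic stand-in: Gaussian noise added at a feature layer, scaled element-wise by an attention map. The shapes and noise scale are assumptions, not the framework's actual design.

```python
import torch

def attention_noise_injection(features, attention, sigma=0.1):
    """Add Gaussian noise to a feature map, scaled by an attention map
    in [0, 1]. A generic illustration of feature-layer noise injection,
    not the paper's UMAN/MMIFAN mechanisms."""
    noise = sigma * torch.randn_like(features)
    return features + attention * noise

feats = torch.randn(4, 64, 16, 16)   # (batch, channels, H, W) feature map
attn = torch.rand(4, 1, 16, 16)      # spatial attention, broadcast over channels
protected = attention_noise_injection(feats, attn)
```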
Journal article
Published 2026
Expert systems with applications, 298, Part C, 129907
Detection of glaucoma progression is crucial to managing patients, permitting individualized care plans and treatment. It is a challenging task requiring the assessment of structural changes to the optic nerve head and functional changes based on visual field testing. Artificial intelligence, especially deep learning techniques, has shown promising results in many applications, including glaucoma diagnosis. This paper proposes a two-stage computational learning pipeline for detecting glaucoma progression using only fundus photographs. In the first stage, a deep learning model takes a time series of fundus photographs as input and outputs a vector of predictions where each element represents the overall rate of change in visual field (VF) sensitivity values for a sector (region) of the optic nerve head (ONH). We implemented two deep learning models, ResNet50 and InceptionResNetV2, for this stage. In the second stage, a binary classifier (weighted logistic regression) takes the predicted vector as input to detect progression. We also propose a novel method for constructing annotated datasets from temporal sequences of clinical fundus photographs and corresponding VF data suitable for machine learning. Each dataset element comprises a temporal sequence of photographs together with a vector-valued label. The label is derived by computing the pointwise linear regression of VF sensitivity values at each VF test location, mapping these locations to eight ONH sectors, and assigning the overall rate of change in each sector to one of the elements of the vector. We used a retrospective clinical dataset with 82 patients collected at multiple timepoints over five years in our experiments. The InceptionResNetV2-based implementation yielded the best performance, achieving detection accuracies of 97.28 ± 1.10% for unseen test data (i.e., each dataset element is unseen but originates from the same set of patients appearing in the training dataset), and 87.50 ± 0.70% for test data from unseen patients (training and testing patients are entirely different). The testing throughput was 11.60 ms per patient. These results demonstrate the efficacy of the proposed method for detecting glaucoma progression from fundus photographs.
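The label construction lends itself to a compact NumPy sketch: fit a pointwise linear regression of sensitivity against time at each VF test location, then aggregate slopes by ONH sector. Averaging within a sector is an assumption here; the abstract says only "overall rate of change", and the location-to-sector mapping follows a structure-function correspondence not reproduced below.

```python
import numpy as np

def vf_sector_slopes(vf_series, times, sector_of_point):
    """Per-sector rate of change of VF sensitivity.
    vf_series: (n_visits, n_points) dB values; times: (n_visits,) years;
    sector_of_point: (n_points,) sector index in 0..7."""
    # pointwise linear regression: slope of sensitivity vs. time per location
    slopes = np.polyfit(times, vf_series, deg=1)[0]   # (n_points,) dB/year
    # aggregate pointwise slopes per ONH sector (mean is an assumption)
    return np.array([slopes[sector_of_point == s].mean() for s in range(8)])

# 10 visits over 5 years, 52 VF test points mapped to 8 sectors (toy data)
t = np.linspace(0, 5, 10)
vf = np.random.normal(28, 2, size=(10, 52))
sectors = np.random.randint(0, 8, size=52)
print(vf_sector_slopes(vf, t, sectors))   # dB/year per ONH sector
```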
Journal article
Joint Adversarial Attack: An Effective Approach to Evaluate Robustness of 3D Object Tracking
Published 2026
Pattern recognition, 172, Part A, 112359
Deep neural networks (DNNs) have been widely used in 3D object tracking, thanks to their superior capability to learn from geometric training samples and locate tracking targets. Although DNN-based trackers show vulnerability to adversarial examples, their robustness in real-world scenarios with potentially complex data defects has rarely been studied. To this end, a joint adversarial attack method against 3D object tracking is proposed, which simulates defects of the point cloud data in the form of point filtration and perturbation simultaneously. Specifically, a voxel-based point filtration module is designed to filter points of the tracking template, described by a voxel-wise binary distribution over the density of the point cloud. Furthermore, a voxel-based point perturbation module adds voxel-wise perturbations to the filtered template, whose direction is constrained by the local geometrical information of the template. Experiments conducted on popular 3D trackers demonstrate that the proposed joint attack decreases the success and precision of existing 3D trackers by 30.2% and 35.4%, respectively, on average, an improvement of 30.5% over existing attack methods.
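A simplified version of the voxel-based point filtration idea can be sketched in NumPy: bucket the template points into voxels, then drop points with a probability that grows with voxel density. In the paper the binary distribution is part of the optimized attack; the density heuristic below is an assumption for illustration.

```python
import numpy as np

def voxel_filter(points, voxel=0.2, max_drop=0.5):
    """Randomly drop points per voxel, biased toward dense voxels.
    A simplified stand-in for the paper's voxel-wise binary filtration."""
    idx = np.floor(points / voxel).astype(np.int64)   # voxel coordinates
    _, inv, counts = np.unique(idx, axis=0, return_inverse=True,
                               return_counts=True)
    drop_p = max_drop * counts[inv] / counts.max()    # denser -> more drops
    keep = np.random.rand(len(points)) >= drop_p
    return points[keep]

template = np.random.randn(2048, 3)   # toy tracking template
print(voxel_filter(template).shape)   # filtered template, fewer points
```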
Journal article
Published 2025
Smart agricultural technology, 12, 101624
Adult Tuta absoluta tracking is a crucial prerequisite to automatically analyzing their circadian behavior in a cage-rearing environment. However, individual nonlinear motions and occlusions pose significant challenges to the effective tracking of multiple insects. In this study, an infrared multi-object tracking method (YOLO-OCSORT) is proposed to mitigate these issues for circadian behavior analysis of adult Tuta absoluta. In the proposed approach, a Haar wavelet down-sampling module is integrated into the backbone of YOLO11 to enhance feature extraction under complex motion and occlusion conditions. Additionally, a multi-scale spatial and attention aggregation module is incorporated to preserve the detailed features of small objects by fusing spatial and channel information, while also enhancing the attention to target-relevant channels in the aggregation path. Lastly, a new detection head is added to the YOLO11 model to further boost detection performance for small adult Tuta absoluta. In the multi-object association stage, a GIoU-based association strategy is integrated into the OCSORT algorithm to improve matching accuracy in complex scenarios such as occlusion and dense environments. Experimental results show that the proposed method achieves gains in all tracking metrics, with HOTA improving by 3.3%, MOTA by 8.9%, MOTP by 2.3%, and IDF1 by 3.0%, compared with the original model in the target tracking phase. Moreover, aside from a few false positives, no missed detections are observed under different density conditions. These results demonstrate that the proposed YOLO-OCSORT method delivers robust performance in the multi-object tracking of adult Tuta absoluta, offering technical support for contactless behavior monitoring of small insects.
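The GIoU used in the association stage is the standard generalized IoU; a minimal implementation for axis-aligned boxes follows. In the tracker it would replace plain IoU in the cost matrix of the OCSORT matching step, though the integration details are not given in the abstract.

```python
def giou(a, b):
    """Generalized IoU for axis-aligned boxes [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # smallest enclosing box penalizes distant, non-overlapping boxes
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    return inter / union - (c_area - union) / c_area

print(giou([0, 0, 2, 2], [1, 1, 3, 3]))   # overlapping boxes -> positive
print(giou([0, 0, 1, 1], [4, 4, 5, 5]))   # distant boxes -> negative
```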
Journal article
Published 2025
Analytica chimica acta, 1365, 344225
Pooled quality control (PQC) samples are the gold standard for data quality monitoring in metabolic phenotyping studies. Typically composed of equal parts from all study samples, PQCs can be challenging to generate in large cohorts or when sample volumes are low. As an alternative, externally sourced matrix-matched surrogate QCs (sQC) have been proposed. This study evaluates the performance of sQCs against PQCs for assessing analytical variation, data pre-processing, and downstream data analysis in a targeted lipidomics workflow.
Plasma samples (n = 701) from the Microbiome Understanding in Maternity Study, along with PQC (n = 80) and sQC (n = 80) samples, were analyzed using a lipidomics assay targeting 1162 lipids. QC samples were injected throughout acquisition, and data pre-processing was performed using each strategy. For simplicity, a subset (n = 381) of the study samples was used to assess differences in downstream statistical analyses.
Both QC approaches demonstrated high analytical repeatability. Although PQC and sQC compositions differed, PQC-based pre-processing retained fewer than 4% more lipid species. Univariate analysis identified more statistically significant lipids with PQC-based pre-processing, but multivariate model performance was similar between datasets.
This study provides a comprehensive comparison of QC strategies and emphasizes the importance of careful QC workflow selection. While PQCs offer advantages, sQCs serve as a suitable alternative for quality assessment and pre-processing. Their commercial availability also supports use as intra- and inter-laboratory long-term references, aiding data harmonization across studies and laboratories.
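QC-driven pre-processing of the kind compared above typically includes a relative standard deviation (RSD) filter computed on the QC injections; a minimal pandas sketch follows. The 30% cut-off is a common lipidomics convention, assumed here rather than taken from this study.

```python
import pandas as pd

def rsd_filter(intensities, qc_mask, max_rsd=30.0):
    """Keep lipid features whose relative standard deviation across QC
    injections (PQC or sQC) is below max_rsd percent.
    intensities: DataFrame (injections x lipids); qc_mask: boolean per row."""
    qc = intensities[qc_mask]
    rsd = 100 * qc.std() / qc.mean()          # per-lipid %RSD over QC runs
    return intensities.loc[:, rsd < max_rsd]  # drop irreproducible features
```

Running the same filter once with the PQC rows and once with the sQC rows as `qc_mask` reproduces the kind of feature-retention comparison reported above.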
•Comparison of two quality control workflows: pooled study and surrogate QC samples.
•In-depth assessment of lipid composition, precision, and filtering.
•OPLS-DA model predictive power maintained with both QC pre-processing strategies.
•Surrogate QC samples are a robust alternative to a pooled QC in targeted lipidomics.