Output list
Conference presentation
Universal Adversarial Attack for Trustworthy LiDAR based Object Detection in Embedded Applications
Date presented 20/09/2024
2024 4th International Conference on Intelligent Technology and Embedded Systems (ICITES), 7–14
4th International Conference on Intelligent Technology and Embedded Systems (ICITES) 2024, 20/09/2024–24/09/2024, Chengdu, China
3D object detection based on deep neural networks (DNNs) has been widely adopted in embedded applications such as autonomous driving. Nonetheless, recent studies have shown that LiDAR data is prone to severe corruptions, which can cause 3D object detection to fail. Considering the vulnerability of existing DNNs and the wide use of 3D object detection in safety-critical scenarios, this work investigates the robustness of deep 3D detection models under adversarial attacks. The proposed universal adversarial attack is encoded into a perturbation voxel that adds point-wise perturbations to benign LiDAR scenes. The detector-level perturbation voxel is generated by suppressing the detector's predictions on training samples and covers the detector's entire perceptual range. The designed perturbation voxels can be applied to the whole scene to simulate the global perturbation inherent in LiDAR, and they can be adapted to detectors with various point cloud representations, which makes the attack universal. To test its effectiveness, the proposed attack was launched against a number of deep 3D detectors on several datasets. The results demonstrate its superiority over existing methods.
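The perturbation-voxel idea can be pictured with a small sketch. The grid shape, perturbation budget, and function names below are illustrative assumptions rather than the paper's implementation: a dense grid of per-voxel XYZ offsets is indexed by each point's voxel coordinates and added to the scene.

```python
# Minimal sketch (not the paper's implementation) of applying a learned
# "perturbation voxel" to a raw LiDAR scene: each point receives the offset of
# the voxel it falls into, clipped to an epsilon budget to keep it subtle.
import numpy as np

def apply_perturbation_voxel(points, delta, pc_range, voxel_size, eps=0.05):
    """points: (N, 3) xyz; delta: (Dx, Dy, Dz, 3) learned per-voxel offsets."""
    mins = np.array(pc_range[:3])
    idx = np.floor((points - mins) / voxel_size).astype(int)
    idx = np.clip(idx, 0, np.array(delta.shape[:3]) - 1)     # stay inside the grid
    offsets = delta[idx[:, 0], idx[:, 1], idx[:, 2]]          # (N, 3) lookup
    offsets = np.clip(offsets, -eps, eps)                      # enforce the budget
    return points + offsets

# Toy usage: a 10 m cube discretised into 0.5 m voxels, random perturbation grid.
scene = np.random.uniform(0, 10, size=(1000, 3))
delta = np.random.uniform(-0.1, 0.1, size=(20, 20, 20, 3))
adv_scene = apply_perturbation_voxel(scene, delta,
                                     pc_range=(0, 0, 0, 10, 10, 10), voxel_size=0.5)
```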
Conference paper
Adversary distillation for One-Shot attacks on 3D target tracking
Date presented 05/2022
2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 23/05/2022–27/05/2022, Singapore
Given the vulnerability of existing deep models in adversarial settings, the robustness of 3D target tracking is not guaranteed. In this paper, we present an efficient generation-based adversarial attack, termed Adversary Distillation Network (AD-Net), which is able to distract a victim tracker in a single shot. In contrast to existing adversarial attacks based on point perturbations, the proposed method uses a generative network to distill an adversarial example from a tracking template through point-wise filtration. A dedicated binary distribution encoding layer filters points: the per-point decision is modeled as a Bernoulli distribution and approximated in a differentiable formulation. To boost the performance of adversarial example generation, a feature extraction module based on the PointNet++ architecture learns hierarchical features for the template points as well as their similarities with the search areas. Experimental results on the KITTI vision benchmark show that the proposed attack can effectively mislead popular deep 3D trackers.
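As a rough illustration of point-wise filtration (the straight-through relaxation, names, and shapes below are assumptions, not AD-Net's code), a per-point Bernoulli "keep" mask can be made differentiable so a generator can learn which template points to drop:

```python
# Illustrative sketch only: a per-point keep/drop mask with a straight-through
# estimator, so gradients from the tracker's loss can reach the mask logits.
import torch

def bernoulli_keep_mask(logits):
    probs = torch.sigmoid(logits)                  # per-point keep probability
    hard = (probs > 0.5).float()                   # binary decision in the forward pass
    return hard + probs - probs.detach()           # backward pass uses the soft probs

template = torch.randn(1, 1024, 3)                 # (batch, points, xyz) template
logits = torch.randn(1, 1024, requires_grad=True)  # would come from the generator
mask = bernoulli_keep_mask(logits)                 # (batch, points) in {0, 1}
adv_template = template * mask.unsqueeze(-1)       # dropped points zeroed out
adv_template.sum().backward()                      # gradients flow to the logits
```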
Conference paper
Leveraging auxiliary tasks with affinity learning for weakly supervised semantic segmentation
Published 2021
2021 IEEE/CVF International Conference on Computer Vision (ICCV), 6964–6973
IEEE/CVF International Conference on Computer Vision (ICCV) 2021, 2021, Montreal, QC, Canada
Semantic segmentation is a challenging task in the absence of densely labelled data. Relying only on class activation maps (CAMs) derived from image-level labels provides deficient segmentation supervision. Prior works therefore use pre-trained models to produce coarse saliency maps that guide the generation of pseudo segmentation labels. However, the commonly used offline heuristic generation process cannot fully exploit the benefits of these coarse saliency maps. Motivated by the significant inter-task correlation, we propose a novel weakly supervised multi-task framework, termed AuxSegNet, that leverages saliency detection and multi-label image classification as auxiliary tasks to improve the primary task of semantic segmentation using only image-level ground-truth labels. Inspired by their similar structured semantics, we also propose to learn a cross-task global pixel-level affinity map from the saliency and segmentation representations. The learned cross-task affinity can be used to refine saliency predictions and to propagate CAMs, providing improved pseudo labels for both tasks. The mutual boost between pseudo-label updating and cross-task affinity learning enables iterative improvements in segmentation performance. Extensive experiments demonstrate the effectiveness of the proposed auxiliary learning network structure and the cross-task affinity learning method. The proposed approach achieves state-of-the-art weakly supervised segmentation performance on the challenging PASCAL VOC 2012 and MS COCO benchmarks.
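The affinity-based propagation can be sketched as a random walk of CAM scores over a pixel affinity matrix; the shapes, normalisation, and iteration count below are illustrative assumptions rather than AuxSegNet's implementation:

```python
# Minimal sketch of the general idea: a row-normalised pixel affinity matrix acts
# as a random-walk transition matrix that spreads coarse CAM scores to
# semantically similar pixels.
import numpy as np

def propagate_cam(cam, affinity, n_iters=2):
    """cam: (C, H, W) class scores; affinity: (H*W, H*W) non-negative similarities."""
    c, h, w = cam.shape
    trans = affinity / (affinity.sum(axis=1, keepdims=True) + 1e-8)  # row-stochastic
    flat = cam.reshape(c, h * w)
    for _ in range(n_iters):                       # t-step random walk over pixels
        flat = flat @ trans.T
    return flat.reshape(c, h, w)

cam = np.random.rand(21, 32, 32)                   # e.g. 21 PASCAL VOC classes
aff = np.random.rand(32 * 32, 32 * 32)             # stand-in for the learned affinity
refined = propagate_cam(cam, aff)
```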
Conference paper
PD-Net: Point Dropping Network for Flexible Adversarial Example Generation with L0 Regularization
Published 2021
2021 International Joint Conference on Neural Networks (IJCNN)
International Joint Conference on Neural Networks (IJCNN) 2021, 18/07/2021–22/07/2021, Shenzhen, China
Generating adversarial point clouds is a challenging task, given the irregular structure of a point cloud, the large search space, and the requirement that perturbations be imperceptible to humans. In this paper, a flexible adversarial point cloud generation method, named Point Dropping Network (PD-Net), is proposed, which can be trained to craft adversarial examples in a single forward pass. In contrast to the widely adopted point perturbation methods, the network launches untargeted black-box attacks on deep 3D models through point dropping regularized by the L0 norm. To enable incorporation into a deep neural network, the probability of a point being dropped, which follows a Bernoulli distribution, is approximated by a hard concrete distribution. PD-Net consists of an encoder and a decoder: the former encodes the geometric information of each point and the latter learns to drop points based on their local features in an unsupervised way. Experiments on two popular deep 3D models (PointNet and PointNet++) show that the proposed PD-Net substantially degrades recognition accuracy while achieving high flexibility.
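The hard concrete relaxation referred to in the abstract (introduced by Louizos et al. for L0 regularization) can be sketched as follows; the parameter values are common defaults, not necessarily those used by PD-Net:

```python
# Standalone sketch of hard concrete sampling: a differentiable surrogate for
# per-point Bernoulli drop gates, yielding exact zeros and ones while remaining
# trainable by gradient descent.
import torch

def hard_concrete_sample(log_alpha, beta=2/3, gamma=-0.1, zeta=1.1):
    u = torch.rand_like(log_alpha).clamp(1e-6, 1 - 1e-6)
    s = torch.sigmoid((torch.log(u) - torch.log(1 - u) + log_alpha) / beta)
    s_bar = s * (zeta - gamma) + gamma             # stretch to (gamma, zeta)
    return s_bar.clamp(0.0, 1.0)                   # hard-clip into [0, 1]

log_alpha = torch.zeros(1024, requires_grad=True)  # one gate per point
gates = hard_concrete_sample(log_alpha)            # many exact 0s/1s, still differentiable
gates.sum().backward()                             # gradients w.r.t. the gate parameters
print(float((gates == 0).float().mean()))          # fraction of points dropped this sample
```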
Conference paper
Performance evaluation of anomaly detection in imbalanced system log data
Published 2020
2020 Fourth World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4)
2020 Fourth World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4), 27/07/2020–28/07/2020, London, UK
An administrator needs to examine operating system log files for anomalous events. In real-life log data, anomalies are usually far less frequent than normal events. This imbalance degrades the performance of anomaly detectors because the training of the classifier is dominated by normal events. In this paper, we evaluate popular machine learning methods while taking this data imbalance into account. We compare data oversampling and undersampling approaches applied before the data is fed to the classifier. Experimental results demonstrate that accounting for data imbalance improves performance in terms of precision and recall.
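The oversampling/undersampling comparison can be illustrated with a toy sketch; the synthetic features, classifier, and class ratio below are assumptions, not the paper's experimental setup:

```python
# Toy comparison: random oversampling of the minority (anomaly) class versus
# random undersampling of the majority (normal) class before training a classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))
y = (rng.random(5000) < 0.02).astype(int)            # ~2% anomalies
X[y == 1] += 1.0                                      # give anomalies a weak signal
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def resample(X, y, mode):
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    if mode == "oversample":                          # replicate anomalies
        pos = rng.choice(pos, size=len(neg), replace=True)
    else:                                             # drop normal events
        neg = rng.choice(neg, size=len(pos), replace=False)
    idx = np.concatenate([pos, neg])
    return X[idx], y[idx]

for mode in ("oversample", "undersample"):
    Xb, yb = resample(X_tr, y_tr, mode)
    pred = LogisticRegression(max_iter=1000).fit(Xb, yb).predict(X_te)
    print(mode, precision_score(y_te, pred), recall_score(y_te, pred))
```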
Conference paper
Automatic event log abstraction to support forensic investigation
Published 2020
Proceedings of the Australasian Computer Science Week Multiconference
ACSW '20: Australasian Computer Science Week 2020, 03/02/2020–07/02/2020, Swinburne University of Technology, Melbourne
Event log abstraction creates a template containing the most common words that represent all members of a group of event log entries. Abstraction helps forensic investigators obtain an overall view of the main events in a log file. Existing log abstraction methods require user-supplied parameters, and identifying the best values manually is time-consuming, especially when a log file is large. We propose an automatic event log abstraction method that avoids the need for the user to identify suitable parameters. We model event logs as a graph and propose a new graph clustering approach to group log entries; the abstraction is then extracted from each cluster. Experimental results show that the proposed method outperforms existing approaches, achieving an F-measure of 95.35%.
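The overall pipeline can be sketched roughly as below; the token-overlap edge weighting, threshold, and connected-component clustering are illustrative stand-ins for the paper's own graph clustering approach:

```python
# Illustrative only: log entries become graph nodes, edges connect entries whose
# token overlap exceeds a threshold, connected components act as clusters, and
# each cluster's template keeps only the tokens shared by all of its members.
import networkx as nx

logs = [
    "Accepted password for alice from 10.0.0.5 port 5022",
    "Accepted password for bob from 10.0.0.9 port 6021",
    "Failed password for root from 10.0.0.7 port 4433",
    "Failed password for admin from 10.0.0.3 port 4111",
]

def token_overlap(a, b):
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

g = nx.Graph()
g.add_nodes_from(range(len(logs)))
for i in range(len(logs)):
    for j in range(i + 1, len(logs)):
        if token_overlap(logs[i], logs[j]) > 0.4:
            g.add_edge(i, j)

for cluster in nx.connected_components(g):
    entries = [logs[i].split() for i in cluster]
    template = [tok if all(tok == e[k] for e in entries) else "<*>"
                for k, tok in enumerate(entries[0])]
    print(" ".join(template))      # e.g. "Accepted password for <*> from <*> port <*>"
```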
Conference paper
Bi-SAN-CAP: Bi-Directional Self-Attention for Image Captioning
Published 2019
2019 Digital Image Computing: Techniques and Applications (DICTA)
Digital Image Computing: Techniques and Applications (DICTA) 2019, 02/12/2019–04/12/2019, Hyatt Regency Perth, Australia
In a typical image captioning pipeline, a Convolutional Neural Network (CNN) is used as the image encoder and a Long Short-Term Memory (LSTM) network as the language decoder. LSTMs with an attention mechanism have shown remarkable performance on sequential data, including image captioning, and can retain long-range dependencies. However, LSTM computations are hard to parallelize because of their inherently sequential nature. To address this issue, recent works have shown the benefits of self-attention, which is highly parallelizable and does not require temporal dependencies. However, existing techniques apply attention in only one direction to compute the context of the words. We propose an attention mechanism called Bi-directional Self-Attention (Bi-SAN) for image captioning, which computes attention in both forward and backward directions and achieves performance comparable to state-of-the-art methods.
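The forward/backward idea can be sketched with plain scaled dot-product self-attention under two opposite masks; the masking scheme and concatenation below express the general idea and are not the Bi-SAN architecture itself:

```python
# Rough sketch: run the same self-attention twice, once letting each position see
# only earlier positions (forward) and once only later positions (backward), then
# concatenate the two context vectors.
import torch
import torch.nn.functional as F

def masked_self_attention(x, mask):
    d = x.size(-1)
    scores = x @ x.transpose(-2, -1) / d ** 0.5        # (B, T, T) similarities
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ x               # (B, T, d) context

x = torch.randn(2, 12, 64)                             # (batch, words, features)
T = x.size(1)
fwd = torch.tril(torch.ones(T, T, dtype=torch.bool))   # attend to past + self
bwd = torch.triu(torch.ones(T, T, dtype=torch.bool))   # attend to future + self
context = torch.cat([masked_self_attention(x, fwd),
                     masked_self_attention(x, bwd)], dim=-1)   # (2, 12, 128)
```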
Conference paper
An improved approach to weakly supervised semantic segmentation
Published 2019
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019, 12/05/2019–17/05/2019, Brighton, United Kingdom
Weakly supervised semantic segmentation with image-level labels is of great significance since it alleviates the dependency on dense annotations. However, it is a challenging task because it requires mapping high-level semantics to low-level features. In this work, we propose a three-step method to bridge this gap. First, we exploit the interpretability of deep neural networks to generate attention maps with class localization information by back-propagating gradients. Second, we employ an off-the-shelf object saliency detector with an iterative erasing strategy to obtain saliency maps that capture the spatial extent of objects. Finally, we combine these two complementary maps to generate pseudo ground-truth images for training the segmentation network. With the help of a model pre-trained on the MS-COCO dataset and a multi-scale fusion method, we obtained mIoU scores of 62.1% and 63.3% on the PASCAL VOC 2012 val and test sets, respectively, achieving new state-of-the-art results for weakly supervised semantic segmentation.
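The final combination step can be sketched as below; the thresholds, ignore label, and fusion rule are illustrative assumptions rather than the paper's exact procedure:

```python
# Simplified fusion sketch: strongly activated, salient pixels take the class of
# the attention map; confidently non-salient pixels become background; everything
# else is marked 'ignore' (255) so it does not contribute to the segmentation loss.
import numpy as np

def make_pseudo_label(cams, saliency, fg_thr=0.3, bg_thr=0.1, ignore=255):
    """cams: (C, H, W) normalised attention maps; saliency: (H, W) in [0, 1]."""
    label = np.full(saliency.shape, ignore, dtype=np.uint8)   # start as 'ignore'
    cls = cams.argmax(axis=0) + 1                  # class ids; 0 is background
    fg = (cams.max(axis=0) > fg_thr) & (saliency > bg_thr)
    label[fg] = cls[fg]                            # confident, salient foreground
    label[saliency < bg_thr] = 0                   # confidently non-salient => background
    return label

cams = np.random.rand(20, 64, 64)                  # e.g. 20 foreground classes
saliency = np.random.rand(64, 64)
pseudo = make_pseudo_label(cams, saliency)
```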
Conference paper
Published 2019
Neural Information Processing, 1143
26th International Conference, ICONIP 2019, 12/12/2019–15/12/2019, Sydney, NSW
Existing basic artificial neurons merge multiple weighted inputs and generate a single activated output. This paper explores a new neuron structure that merges multiple weighted inputs like existing neurons but, instead of generating a single output, generates multiple outputs. The proposed "Multiple Output Neuron" (MON) can reduce computation in a basic XOR network. Furthermore, a MON-based convolutional neural network layer (MONL) is described. The proposed MONL can backpropagate errors and can therefore be used alongside other CNN layers. MONL reduces network computation by reducing the number of filters; because a reduced number of filters limits network performance, a MON-based neuroevolution technique (MON-EVO) is also proposed. MON-EVO evolves MONs into single-output neurons to further improve training. Unlike existing neuroevolution techniques, which do not utilize backpropagation, MONs can be trained with backpropagation. Experimental networks trained on the CIFAR-10 classification dataset show that the proposed MONL and MON-EVO provide reduced training computation and neuroevolution with backpropagation.
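One possible reading of a multiple-output neuron, purely for illustration (the shared weighted sum and per-output scaling below are this sketch's assumptions, not the authors' definition), is a layer in which each neuron computes one weighted sum of its inputs and emits several activated outputs from it:

```python
# Illustrative interpretation only: each neuron computes a single weighted sum and
# produces k activated outputs, so k responses share one set of input weights.
import torch
import torch.nn as nn

class MultiOutputNeuronLayer(nn.Module):
    def __init__(self, in_features, n_neurons, outputs_per_neuron):
        super().__init__()
        self.shared = nn.Linear(in_features, n_neurons)           # one sum per neuron
        self.out_scale = nn.Parameter(torch.randn(n_neurons, outputs_per_neuron))

    def forward(self, x):
        s = self.shared(x)                                         # (B, n_neurons)
        out = s.unsqueeze(-1) * self.out_scale                     # (B, n_neurons, k)
        return torch.relu(out).flatten(1)                          # (B, n_neurons * k)

layer = MultiOutputNeuronLayer(in_features=2, n_neurons=2, outputs_per_neuron=2)
print(layer(torch.randn(4, 2)).shape)                              # torch.Size([4, 4])
```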
Conference paper
Improving follicular lymphoma identification using the class of interest for transfer learning
Published 2019
2019 Digital Image Computing: Techniques and Applications (DICTA)
Digital Image Computing: Techniques and Applications (DICTA) 2019, 02/12/2019–04/12/2019, Hyatt Regency Perth, Australia
Follicular Lymphoma (FL) is a type of lymphoma that grows silently and is usually diagnosed in its later stages. Fast diagnosis is therefore needed to increase patients' survival rates. While the diagnosis is traditionally performed by visual inspection of Whole Slide Images (WSIs), recent advances in deep learning provide an opportunity to automate this process. The main challenge, however, is that WSIs often exhibit large variations across different operating environments, hereinafter referred to as sites. As a result, deep learning models usually require retraining with labelled data from each new site, which is not feasible because labelling requires pathologists to visually inspect and annotate each sample. In this paper, we propose a deep learning model that uses transfer learning with fine-tuning to improve the identification of Follicular Lymphoma in images from sites different from those used during training. Our results show that the proposed approach improves prediction accuracy by 12% to 52% compared to the model's initial predictions on images from a new site in the target environment.
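Transfer learning with fine-tuning of this kind can be sketched generically; the backbone, frozen/trainable split, and hyperparameters below are assumptions rather than the paper's configuration:

```python
# Generic fine-tuning sketch: keep a pretrained feature extractor frozen and
# retrain only the classification head on a small labelled set from the new site.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False                        # freeze the pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 2)      # FL vs. non-FL head, trainable

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# one illustrative training step on a fake batch of target-site patches
images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 2, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```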