Output list
Conference proceeding
Intelligent Intrusion Detection in IoT: Integrating Machine Learning and Feature Automation
Date presented 20/03/2025
Proceedings of the International Conference on Computer and Automation Engineering (ICCAE), 374 - 379
17th International Conference on Computer and Automation Engineering (ICCAE) 2025, 20/03/2025–22/03/2025, Perth, Australia
The rapid growth of the internet of things (IoT) raises serious security concerns, demanding effective protection from cyber-attacks. Intrusion detection depends on identifying key features, but despite advances in automation, manual feature selection remains necessary, limiting scalability. To address this limitation, we introduced a hybrid feature selection method that combines filter and wrapper techniques to automatically select important features and enhance the efficiency of machine learning (ML) models for intrusion detection tasks. We utilized the mutual information (MI) algorithm as the filter method and recursive feature elimination (RFE) as the wrapper method. We evaluated the performance of the proposed model on publicly available datasets, HIKARI2021 and UNSW-NB15. We compared the results with several existing methods, and our approach outperformed the state-of-the-art (SOTA) methods in terms of accuracy and training time. We presented comprehensive results, including both quantitative and qualitative analysis, to demonstrate the effectiveness and efficiency of our proposed methods.
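As a rough illustration of the hybrid filter-plus-wrapper pipeline sketched in this abstract, the following minimal Python example chains mutual information ranking into recursive feature elimination with scikit-learn; the synthetic stand-in dataset, random-forest estimator, and cutoffs are illustrative assumptions, not the paper's exact configuration.

    # Minimal sketch, assuming an sklearn-style tabular dataset (X, y);
    # thresholds and the estimator are illustrative, not the paper's setup.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import mutual_info_classif, RFE

    X, y = make_classification(n_samples=1000, n_features=40,
                               n_informative=8, random_state=0)

    # Stage 1 (filter): rank features by mutual information, keep the top half.
    mi = mutual_info_classif(X, y, random_state=0)
    keep = np.argsort(mi)[-20:]

    # Stage 2 (wrapper): recursive feature elimination down to a compact subset.
    rfe = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
              n_features_to_select=10)
    rfe.fit(X[:, keep], y)
    selected = keep[rfe.support_]  # features surviving both stages
    print("selected features:", selected)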
Conference proceeding
Published 2025
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4317
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025), 06/04/2025–11/04/2025, Hyderabad, India
The advancements in light detection and ranging (LiDAR) sensors and 3D object detection techniques have boosted their deployment in a wide range of applications, particularly autonomous driving. However, 3D object detection models based on deep neural networks have been shown to be susceptible to adversarial attacks, yet defensive strategies explicitly tailored to mitigating adversarial attacks on 3D object detection remain scarce. In this paper, we introduce LiDAR-SPD, a novel approach to defend against adversarial attacks targeting LiDAR-based 3D object detectors. Specifically, a spherical purification unit is designed, which encompasses two pivotal processes: spherical projection and spherical diffusion. The former leverages a spatial projection strategy to eliminate adversarial point clouds inserted in occluded regions, while the latter employs a diffusion model to regenerate points, rendering the result closer to a pristine LiDAR scene. Comprehensive experiments conducted on the KITTI dataset demonstrate that our proposed LiDAR-SPD method effectively thwarts various types of adversarial attacks, decreasing the attack success rates against 3D object detectors by 60%.
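To make the spherical projection step more concrete, here is a minimal sketch of projecting a point cloud into an azimuth-elevation grid and keeping only the nearest return per angular bin; the grid resolution and this particular occlusion rule are assumptions for illustration, not the paper's exact unit.

    import numpy as np

    def spherical_project(points, h_bins=2048, v_bins=64):
        # Convert (N, 3) LiDAR points to spherical coordinates.
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        r = np.linalg.norm(points, axis=1)
        azimuth = np.arctan2(y, x)  # [-pi, pi]
        elevation = np.arcsin(np.clip(z / np.maximum(r, 1e-6), -1.0, 1.0))
        u = ((azimuth + np.pi) / (2 * np.pi) * h_bins).astype(int) % h_bins
        v = ((elevation - elevation.min()) /
             (np.ptp(elevation) + 1e-6) * (v_bins - 1)).astype(int)
        # Keep only the nearest return per angular cell; points hidden behind
        # it (e.g. adversarial insertions in occluded regions) drop out.
        depth = np.full((v_bins, h_bins), np.inf)
        keep = np.zeros(len(points), dtype=bool)
        for i in np.argsort(r):  # nearest returns first
            if r[i] < depth[v[i], u[i]]:
                depth[v[i], u[i]] = r[i]
                keep[i] = True
        return points[keep], depth

    pts, depth_map = spherical_project(np.random.randn(10000, 3) * 10)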
Conference proceeding
Superpoints Guided Local Explanation For Deep 3D Trackers
Published 2025
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
3D object tracking has become a popular research topic because of its broad application prospects. However, improving the trustworthiness of deep trackers remains challenging owing to the complex network structure of these black-box models. In this paper, a local explanation method for 3D object tracking is proposed, which trains an interpretable surrogate model to reveal the contribution of superpoints in the search area. Specifically, local points of the search area with comparable geometric features are aggregated into superpoints, which serve as the fundamental units of the explanation. In contrast to the commonly used voxels, superpoints capture semantic information of the search area and facilitate an intuitive understanding of the predictions. In addition, a distance-aware masking strategy is proposed for generating the sample set used to train the surrogate model, which reflects the latent contributions of superpoints to predictions and improves the efficacy of the explanation. Experiments demonstrate that the proposed explainability approach can effectively provide explanations for deep tracking models.
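A minimal sketch of the surrogate idea, in the spirit of LIME: perturb the scene by keeping or dropping whole superpoints, score the tracker on each perturbation, and fit a weighted linear model whose coefficients read as per-superpoint contributions. The track_score callable and superpoint_ids labeling are hypothetical placeholders, and the distance-aware weighting shown is only one plausible choice.

    import numpy as np
    from sklearn.linear_model import Ridge

    def explain(points, superpoint_ids, track_score, n_samples=200, seed=0):
        rng = np.random.default_rng(seed)
        n_sp = superpoint_ids.max() + 1
        masks = rng.integers(0, 2, size=(n_samples, n_sp))  # keep/drop superpoints
        scores, weights = [], []
        for m in masks:
            kept = points[m[superpoint_ids] == 1]
            scores.append(track_score(kept))  # hypothetical tracker score
            # weight samples closer to the unperturbed scene more heavily
            weights.append(np.exp(-(n_sp - m.sum()) / n_sp))
        surrogate = Ridge(alpha=1.0)
        surrogate.fit(masks, scores, sample_weight=np.asarray(weights))
        return surrogate.coef_  # per-superpoint contribution estimates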
Conference proceeding
RSM: Refined Saliency Map For Explainable 3D Object Tracking
Published 2025
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 06/04/2025–11/04/2025
Saliency maps play a major role in understanding the decision-making process of 3D models by illustrating the importance of individual input points to model predictions. However, saliency maps typically suffer from inaccuracies because they do not distinguish whether a point's contribution is positive or negative. In this paper, a two-stage explainability method for 3D object tracking is proposed to generate a refined saliency map (RSM), which classifies point contributions as positive or negative based on their actual effects on tracking performance. Specifically, in stage I, a point-wise growing downsampling algorithm is developed to generate subsets of the search area, under which the model's behavior is evaluated to precisely identify the points with negative contributions. Subsequently, in stage II, a voxel-wise downsampling algorithm is performed along with a deviation metric to select points with positive contributions. Experiments demonstrate that RSM can generate high-quality explanations for popular 3D trackers.
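As one concrete ingredient of stage II, here is a minimal voxel-grid downsampling sketch; the voxel size is illustrative, and the paper's point-wise growing algorithm and deviation metric are not reproduced.

    import numpy as np

    def voxel_downsample(points, voxel=0.2):
        # Assign each point to a voxel cell and keep one point per occupied cell.
        keys = np.floor(points / voxel).astype(int)
        _, idx = np.unique(keys, axis=0, return_index=True)
        return points[np.sort(idx)]

    subset = voxel_downsample(np.random.rand(5000, 3) * 10, voxel=0.5)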
Conference proceeding
Published 2025
Proceedings of the International Conference on Computer and Automation Engineering (ICCAE), 53 - 58
17th International Conference on Computer and Automation Engineering (ICCAE) 2025, 20/03/2025–22/03/2025, Perth, Australia
Electrocardiograms (ECGs) are essential tools for diagnosing cardiac conditions. However, analyzing 12-lead ECG signals manually is time-consuming, making automated classification crucial for efficient and accurate diagnosis. This study investigates both handcrafted and deep learning (DL)-based feature extraction techniques for classifying 12-lead ECG signals, with the aim of enhancing diagnostic accuracy and efficiency. During handcrafted feature extraction from the 12-lead ECG signals, we extracted the Q, R, S, T, and P peaks and computed various time-domain features based on the R-peaks, including heart rate, heart rate variability (HRV), R-R intervals, median R-R intervals, SDNN, RMSSD, and pNN60. Additionally, we applied several DL models, including a CNN, ResNet18, VGG16, and DenseNet83, to extract new global features from the raw ECG signals. We used a merged dataset from five different sources: the "CPSC Database" and "CPSC-Extra Database" from the China Physiological Signal Challenge 2018 (CPSC2018), the "St Petersburg INCART 12-lead Arrhythmia Database," the "PTB Diagnostic ECG Database" and "PTB-XL" from the Physikalisch-Technische Bundesanstalt (PTB), the "Georgia Database," and an undisclosed American database. The dataset comprises 22,797 12-lead ECG recordings. We conducted a series of experiments to evaluate the feature extraction methods and training models. The experimental results show that combining handcrafted and DL features outperforms DL-only methods in classification performance. Both quantitative and qualitative studies, along with ablation experiments, further validate our approach.
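The time-domain HRV features listed above follow directly from R-peak locations; a minimal sketch (assuming R-peak sample indices and a 500 Hz sampling rate, both stand-ins):

    import numpy as np

    def hrv_features(r_peaks, fs=500):
        rr = np.diff(r_peaks) / fs * 1000.0        # R-R intervals in ms
        diffs = np.diff(rr)
        return {
            "HR": 60000.0 / rr.mean(),             # beats per minute
            "median_RR": np.median(rr),
            "SDNN": rr.std(ddof=1),                # overall variability
            "RMSSD": np.sqrt(np.mean(diffs ** 2)), # short-term variability
            "pNN60": np.mean(np.abs(diffs) > 60) * 100,  # % diffs > 60 ms
        }

    r_peaks = np.cumsum(np.random.normal(400, 20, 50).astype(int))
    print(hrv_features(r_peaks))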
Conference proceeding
Unsupervised Symbolization with Adaptive Features for LoRa-Based Localization and Tracking
Date presented 18/12/2024
2024 International Conference on Sustainable Technology and Engineering (i-COSTE)
International Conference on Sustainable Technology and Engineering (i-COSTE), 18/12/2024–20/12/2024, Perth, WA
While LoRa overcomes the high power consumption and deployment costs of GPS and mobile networks, it faces challenges in accuracy. This paper presents a method for LoRa-based localization and tracking that uses unsupervised symbolization to analyze received signal features. We use partitioning, D-Markov machines, and the Chinese restaurant process to achieve unsupervised symbolization. In particular, a novel adaptive feature extraction technique is proposed within the partitioning step to overcome the problems of over-tracking and under-tracking. Mean spectral kurtosis analysis is performed across several partitioning techniques to assess their symbolization effectiveness, enabling selection of the most appropriate technique and enhancing the localization and tracking accuracy of target objects through robustness to noise and multipath effects. The proposed method learns and estimates the distance range simultaneously, thereby eliminating the need for a separate offline training phase and the storage of reference coordinates. Experimental results using LoRa highlight the proposed method's efficacy in real-time localization and tracking, and its superiority over the state-of-the-art method.
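For readers unfamiliar with symbolization, a minimal sketch of the underlying machinery follows: uniform partitioning of a received-signal feature into symbols and a depth-1 D-Markov transition matrix. Both choices are deliberate simplifications; the paper's adaptive partitioning and Chinese-restaurant-process step are not shown.

    import numpy as np

    def symbolize(signal, n_symbols=4):
        # Uniform partitioning of the signal range into symbol cells.
        edges = np.linspace(signal.min(), signal.max(), n_symbols + 1)[1:-1]
        return np.digitize(signal, edges)

    def d_markov_matrix(symbols, n_symbols=4):
        # Depth-1 D-Markov machine: row-normalized transition counts.
        counts = np.zeros((n_symbols, n_symbols))
        for a, b in zip(symbols[:-1], symbols[1:]):
            counts[a, b] += 1
        return counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)

    rssi = -60 + 5 * np.random.randn(1000)  # stand-in received-signal feature
    print(d_markov_matrix(symbolize(rssi)))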
Conference proceeding
DEER: Deep Emotion-Sets for Fine-Grained Emotion Recognition
Published 2024
Proceedings - 2024 International Conference on Digital Image Computing: Techniques and Applications (DICTA), 158 - 165
25th International Conference on Digital Image Computing: Techniques and Applications (DICTA 2024), 27/11/2024–29/11/2024, Perth, WA
For robots to effectively interact with humans in the wild, it is essential that they accurately recognize human emotions. To achieve this, important facial features must be captured to reliably comprehend human emotions. Most facial emotion recognition (FER) studies have used single-shot images for classifying emotions, and in certain instances, several networks have been utilized to vote on each image. These approaches have functioned well; however, there is potential for improvement in precision. In this paper, we propose emotion-sets as a unique encoding for face image data (with various people and face angles) to classify emotion classes, as opposed to conventional single-image-based classification. For each image in an emotion-set, the prediction confidence for each emotion is utilized as a vote. The results are generated by combining two distinct voting methods, majority voting and weighted voting. The proposed method achieves state-of-the-art accuracy on the Facial Emotion Recognition 2013 (FER2013), Cohn-Kanade (CK+), and Facial Expression Research Group (FERG) datasets without using techniques such as data augmentation, feature extraction, or extra training data, which are used by most state-of-the-art works. Our experimental findings indicate that the proposed emotion-set classification yields more accurate results than current state-of-the-art FER methods.
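A minimal sketch of the two voting schemes over an emotion-set: probs stands in for the (n_images, n_classes) confidence matrix produced by any FER classifier, and the equal-weight combination of the two vote tallies is an assumption, not necessarily the paper's exact fusion rule.

    import numpy as np

    def emotion_set_vote(probs):
        n_classes = probs.shape[1]
        majority = np.bincount(probs.argmax(axis=1),
                               minlength=n_classes)   # one vote per image
        weighted = probs.sum(axis=0)                  # confidence-weighted votes
        combined = majority / majority.sum() + weighted / weighted.sum()
        return combined.argmax()

    probs = np.random.dirichlet(np.ones(7), size=16)  # 16 images, 7 emotions
    print("predicted class:", emotion_set_vote(probs))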
Conference proceeding
Published 2024
Proceedings - 2024 International Conference on Digital Image Computing: Techniques and Applications (DICTA), 9 - 16
25th International Conference on Digital Image Computing: Techniques and Applications (DICTA 2024), 27/11/2024–29/11/2024, Perth, WA
Learning to generate motions of thin structures, such as plant leaves, in dynamic view synthesis is challenging because thin structures usually undergo small but fast, non-rigid motions as they interact with air and wind. Given a set of RGB images or videos of a scene with moving thin structures as input, existing methods that map the scene to a corresponding canonical space for rendering novel views fail because the object movements are too subtle relative to the background. Disentangling objects with thin parts from the background scene is also challenging when the parts exhibit fast, abrupt motions. To address these issues, we propose a Neural Radiance Field (NeRF)-based framework that accurately reconstructs thin structures such as leaves and captures their subtle, fast motions. The framework learns the geometry of a scene by mapping the dynamic images to a canonical space in which the scene remains static. We propose a ray masking network to further decompose the canonical scene into foreground and background, enabling the network to focus on foreground movements. We conducted experiments using a dataset of thin structures such as leaves and petals, comprising image sequences collected by us and one public image sequence. Experiments show superior results compared to existing methods. Video outputs are available at https://dythinobjects.com/.
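As a loose illustration of the ray-masking idea, the sketch below blends foreground and background radiance per ray using a predicted mask probability; the mask logits and radiance values are stand-ins, not the paper's network outputs.

    import numpy as np

    def composite(fg_rgb, bg_rgb, mask_logits):
        # Sigmoid turns per-ray logits into foreground probabilities.
        m = 1.0 / (1.0 + np.exp(-mask_logits))
        return m[:, None] * fg_rgb + (1.0 - m[:, None]) * bg_rgb

    rays = 1024
    out = composite(np.random.rand(rays, 3), np.random.rand(rays, 3),
                    np.random.randn(rays))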
Conference proceeding
An Intra-BRNN and GB-RVQ Based End-to-End Neural Audio Codec
Published 2023
Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH 2023), 800 - 803
Interspeech 2023, 20/08/2023–24/08/2023, Dublin, Ireland
Recently, neural networks have proven effective at speech coding at low bitrates. However, underutilization of intra-frame correlations and quantization error degrade the reconstructed audio quality. To improve coding quality, we present an end-to-end neural speech codec, namely CBRC (Convolutional and Bidirectional Recurrent neural Codec). An interleaved structure using 1D-CNN and Intra-BRNN is designed to exploit intra-frame correlations more efficiently. Furthermore, a Group-wise and Beam-search Residual Vector Quantizer (GB-RVQ) is used to reduce the quantization noise. CBRC encodes audio every 20 ms with no additional latency, making it suitable for real-time communication. Experimental results demonstrate the superiority of the proposed codec, with CBRC at 3 kbps outperforming Opus at 12 kbps.
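A minimal sketch of plain residual vector quantization, the base mechanism behind GB-RVQ; greedy nearest-codeword search stands in for the group-wise and beam-search variants, and all shapes are illustrative.

    import numpy as np

    def rvq_encode(x, codebooks):
        residual, codes = x.copy(), []
        for cb in codebooks:  # one codebook per stage
            d = np.linalg.norm(residual[:, None, :] - cb[None, :, :], axis=-1)
            idx = d.argmin(axis=1)         # nearest codeword per frame
            codes.append(idx)
            residual = residual - cb[idx]  # quantize what remains
        return codes, x - residual         # indices and reconstruction

    frames = np.random.randn(50, 16)       # 50 latent frames, dim 16
    codebooks = [np.random.randn(256, 16) for _ in range(4)]
    codes, frames_hat = rvq_encode(frames, codebooks)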
Conference proceeding
Published 2018
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1493 - 1497
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 15/04/2018–20/04/2018, Calgary, AB, Canada
Coral species, with complex morphology and ambiguous boundaries, pose a great challenge for automated classification. CNN activations extracted from the fully connected layers of deep networks (FC features) have been successfully used as powerful universal representations in many visual tasks. In this paper, we investigate the transferability and combined performance of FC features and CONV features (extracted from convolutional layers) in the coral classification of two image modalities (reflectance and fluorescence), using a typical deep network (e.g., VGGNet). We exploit vector of locally aggregated descriptors (VLAD) encoding and principal component analysis (PCA) to compress dense CONV features into a compact representation. Experimental results demonstrate that encoded CONV3 features achieve superior performance on reflectance and fluorescence coral images compared to FC features. The combination of these two features further improves the overall accuracy and achieves state-of-the-art performance on the challenging EFC dataset.
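A minimal sketch of the VLAD-plus-PCA compression step described above; the cluster count, feature dimensions, and random stand-in descriptors are assumptions, not the paper's settings.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    def vlad(descriptors, kmeans):
        # Sum of residuals to the assigned cluster center, per cluster.
        assign = kmeans.predict(descriptors)
        v = np.zeros_like(kmeans.cluster_centers_)
        for k in range(kmeans.n_clusters):
            d = descriptors[assign == k]
            if len(d):
                v[k] = (d - kmeans.cluster_centers_[k]).sum(axis=0)
        v = v.flatten()
        return v / (np.linalg.norm(v) + 1e-12)  # L2-normalized VLAD vector

    local = np.random.rand(200, 64)             # dense CONV descriptors
    km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(local)
    encoded = np.stack([vlad(np.random.rand(200, 64), km) for _ in range(20)])
    compact = PCA(n_components=16).fit_transform(encoded)  # compressed features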