Output list
Book chapter
A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-Shaped Structures
Published 2024
Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LXVII, 326 - 341
We propose the first comprehensive approach for modeling and analyzing the spatiotemporal shape variability in tree-like 4D objects, i.e., 3D objects whose shapes bend, stretch and change in their branching structure over time as they deform, grow, and interact with their environment. Our key contribution is the representation of tree-like 3D shapes using Square Root Velocity Function Trees (SRVFT) [21]. By solving the spatial registration in the SRVFT space, which is equipped with an $\mathbb{L}^2$ metric, 4D tree-shaped structures become time-parameterized trajectories in this space. This reduces the problem of modeling and analyzing 4D tree-like shapes to that of modeling and analyzing elastic trajectories in the SRVFT space, where elasticity refers to time warping. In this paper, we propose a novel mathematical representation of the shape space of such trajectories, a Riemannian metric on that space, and computational tools for fast and accurate spatiotemporal registration and geodesics computation between 4D tree-shaped structures. Leveraging these building blocks, we develop a full framework for modelling the spatiotemporal variability using statistical models and generating novel 4D tree-like structures from a set of exemplars. We demonstrate and validate the proposed framework using real 4D plant data.
Book chapter
RCNN for region of interest detection in whole slide images
Published 2020
Neural Information Processing, 1333, 625 - 632
Digital pathology has attracted significant attention in recent years. Analysis of Whole Slide Images (WSIs) is challenging because they are very large, i.e., of gigapixel resolution. Identifying Regions of Interest (ROIs) is the first step for pathologists to further analyse the regions of diagnostic interest for cancer detection and other anomalies. In this paper, we investigate the use of RCNN, which is a deep machine learning technique, for detecting such ROIs using only a small number of labelled WSIs for training. For experimentation, we used real WSIs from a public hospital pathology service in Western Australia. We used 60 WSIs for training the RCNN model and another 12 WSIs for testing. The model was further tested on a new set of unseen WSIs. The results show that RCNN can be effectively used for ROI detection from WSIs.
Book chapter
Deep learning for scene understanding
Published 2019
Advances in Computational Intelligence, 509, 21 - 51
With the progress in the field of computer vision, we are moving closer and closer towards the ultimate aim of human-like vision for machines. Scene understanding is an essential part of this research. It seeks the goal that any image should be as understandable and decipherable for computers as it is for humans. The stall in the progress of the different components of scene understanding, due to the limitations of the traditional algorithms, has now been broken by the introduction of neural networks for computer vision tasks. The advancements in parallel computational hardware have made it possible to train very deep and complex neural network architectures. This has vastly improved the performance of algorithms for all the different components of scene understanding. This chapter analyses these contributions of deep learning and also presents the advancements of high-level scene understanding tasks, such as caption generation for images. It also sheds light on the need to combine these individual components into an integrated system.
Book chapter
Automatic Graph-Based clustering for security logs
Published 2019
Primate Life Histories, Sex Roles, and Adaptability, 914 - 926
Computer security events are recorded in several log files. It is necessary to cluster these logs to discover security threats, detect anomalies, or identify a particular error. A problem arises when large quantities of security log data need to be checked, as existing tools do not provide sufficiently sophisticated grouping results. In addition, existing methods need user input parameters and it is not trivial to find optimal values for these. Therefore, we propose a method for the automatic clustering of security logs. First, we present a new graph-theoretic approach for security log clustering based on maximal clique percolation. Second, we add an intensity threshold to the obtained maximal clique to consider the edge weight before proceeding to the percolation. Third, we use the simulated annealing algorithm to optimize the number of percolations and the intensity threshold for maximal clique percolation. The entire process is automatic and does not need any user input. Experimental results on various real-world datasets show that the proposed method achieves superior clustering results compared to other methods.
Book chapter
Deep learning for marine species recognition
Published 2019
Advances in Computational Intelligence, 509, 129 - 145
Research on marine species recognition is an important part of the actions for the protection of the ocean environment. It is also an under-exploited application area in the computer vision community. However, with the development of deep learning, there has been increasing interest in this topic. In this chapter, we present a comprehensive review of the computer vision techniques for marine species recognition, mainly from the perspectives of classification and detection. In particular, we focus on capturing the evolution of various deep learning techniques in this area. We further compare the contemporary deep learning techniques with traditional machine learning techniques, and discuss the complementary issues between these two approaches. This chapter examines the attributes and challenges of a number of popular marine species datasets (which involve coral, kelp, plankton and fish) on recognition tasks. Finally, we highlight a few potential future application areas of deep learning in marine image analysis, such as segmentation and enhancement of image quality.
Book chapter
Global regularizer and temporal-aware cross-entropy for skeleton-based early action recognition
Published 2019
Computer Vision – ACCV 2018, 11364, 729 - 745
In this paper, we propose a new approach to recognize the class label of an action before this action is fully performed based on skeleton sequences. Compared to action recognition which uses fully observed action sequences, early action recognition with partial sequences is much more challenging mainly due to: (1) the global information of a long-term action is not available in the partial sequence, and (2) the partial sequences at different observation ratios of an action contain a number of sub-actions with diverse motion information. To address the first challenge, we introduce a global regularizer to learn a hidden feature space, where the statistical properties of the partial sequences are similar to those of the full sequences. We introduce a temporal-aware cross-entropy to address the second challenge and achieve better prediction performance. We evaluate the proposed method on three challenging skeleton datasets. Experimental results show the superiority of the proposed method for skeleton-based early action recognition.
Book chapter
Computer vision for human-machine interaction
Published 2018
Computer Vision for Assistive Healthcare, 127 - 145
Human-machine Interaction (HMI) refers to the communication and interaction between a human and a machine via a user interface. Nowadays, natural user interfaces such as gestures have gained increasing attention as they allow humans to control machines through natural and intuitive behaviours. In gesture-based HMI, a sensor such as Microsoft Kinect is used to capture the human postures and motions, which are processed to control a machine. The key task of gesture-based HMI is to recognize the meaningful expressions of human motions using the data provided by Kinect, including RGB (red, green, blue), depth, and skeleton information. In this chapter, we focus on the gesture recognition task for HMI and introduce current deep learning methods that have been used for human motion analysis and RGB-D-based gesture recognition. More specifically, we briefly introduce convolutional neural networks (CNNs), and then present several deep learning frameworks based on CNNs that have been used for gesture recognition using RGB, depth and skeleton sequences.
Book chapter
Deep learning for coral classification
Published 2017
Handbook of Neural Computation, 383 - 401
This chapter presents a summary of the use of deep learning for underwater image analysis, in particular for coral species classification. Deep learning techniques have achieved state-of-the-art results in various computer vision tasks such as image classification, object detection, and scene understanding. Marine ecosystems are complex scenes and hence difficult to tackle from a computer vision perspective. Automated technology to monitor the health of our oceans can facilitate the detection and identification of marine species while freeing up experts from the repetitive task of manual annotation. Classification of coral species is a challenging task in itself, and deep learning has the potential to solve this problem efficiently.
Book chapter
Deep neural networks for mobile person recognition with audio-visual signals
Published 2017
Mobile Biometrics, 97 - 129
This chapter starts with a general and brief introduction to biometrics and audio-visual person recognition using mobile phone data. It begins with a discussion of what constitutes a biometric recognition system, and it then details the steps followed when audio-visual signals are used as inputs. This is followed by a review of the existing speaker and face recognition systems which have been evaluated on a mobile biometric database. We then discuss the key motivations for using deep neural networks (DNNs) for person recognition. Finally, we introduce a Deep Boltzmann Machine (DBM)-DNN based framework, DBM-DNN for short, for person recognition. An overview of the sections and sub-sections of this chapter is shown in Figure 4.1.
Book chapter
Video coding for mobile communications
Published 2008
Mobile Multimedia Communications, 109 - 150
With the significant influence and increasing requirements of visual mobile communications in our everyday lives, low bit-rate video coding to handle the stringent bandwidth limitations of mobile networks has become a major research topic. With both processing power and battery resources being inherently constrained, and signals having to be transmitted over error-prone mobile channels, coders are required to be both low-complexity and robustly error-resilient. To support multilevel users, any encoded bit-stream should also be both scalable and embedded. This chapter presents a review of appropriate image and video coding techniques for mobile communication applications and aims to provide an appreciation of the rich and far-reaching advancements taking place in this exciting field, while concomitantly outlining both the physical significance of popular image and video coding quality metrics and some of the research challenges that remain to be resolved.