Abstract
The Enzyme Commission (EC) classification, which assigns EC numbers that link protein sequences to the biochemical reactions they catalyse, facilitates the understanding of biological reproduction and cellular metabolism. In recent years, several methods have been proposed for predicting enzyme function, but they still face challenges. First, traditional methods rely on manually designed enzyme features, which are complex and cumbersome to construct, and there is no effective general-purpose method for embedding enzyme sequences. Second, the distribution gap between different enzymes is large, which makes it difficult for existing methods to predict enzyme function at multiple levels. Third, traditional enzyme function prediction models extract features from only a single view of the enzyme, so there is still room to improve how much information these models can extract from enzyme data. To address these challenges, a new multilevel enzyme function prediction model based on multi-view semantics, SMENET, is proposed. SMENET uses a protein large language model to extract semantic information from enzyme sequences. This semantic information is then fed into multiple information extraction network modules, and Biologic Semantic Attention is used to integrate the information from these views. Finally, a multi-view adaptive fusion network is designed to extract the best common representation across the multiple semantic views. Extensive experiments were conducted on multiple datasets to validate the effectiveness of SMENET. The code and dataset of this study are available at https://github.com/zerohanwen/SMENET.
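The abstract does not define how Biologic Semantic Attention or the multi-view adaptive fusion network are computed. To illustrate the general idea of attention-weighted fusion of several views, the following is a minimal sketch. It assumes that each view yields a fixed-length embedding vector and that a single query vector scores each view. The names `attention_fuse` and `query` are hypothetical. This is a generic technique, not SMENET's actual module. In a trained model the query would be learned, whereas here it is passed in as fixed values.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    views: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Fuse several same-length view embeddings into one vector (hypothetical sketch).

    Each view is scored by its dot product with `query` (a learned parameter in a
    real model, fixed here by assumption). The scores are softmax-normalised into
    weights, and the fused representation is the weighted sum of the views.
    """
    dim = len(query)
    if not views or any(len(v) != dim for v in views):
        raise ValueError("need at least one view, each with the query's dimension")
    # One relevance score per view.
    scores = [sum(q * x for q, x in zip(query, v)) for v in views]
    weights = softmax(scores)
    # Weighted sum, computed coordinate by coordinate.
    fused = [sum(w * v[i] for w, v in zip(weights, views)) for i in range(dim)]
    return fused, weights


if __name__ == "__main__":
    # Three toy 2-d "views" of the same enzyme (made-up numbers).
    fused, weights = attention_fuse([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], [1.0, 1.0])
    print(weights, fused)
```

When every view receives the same score, the weights are uniform and the fused vector is the plain average of the views. A view that aligns better with the query receives a larger share of the fused representation. This adaptive weighting is the kind of behaviour that multi-view fusion modules aim for, although the real network's scoring function may differ.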