Preprint
An Intra-BRNN and GB-RVQ Based END-TO-END Neural Audio Codec
ArXiv.org
Cornell University
2023
Abstract
Recently, neural networks have proven effective at speech coding at low bitrates. However, under-utilization of intra-frame correlations and quantizer error degrade the reconstructed audio quality. To improve coding quality, we present an end-to-end neural speech codec, CBRC (Convolutional and Bidirectional Recurrent neural Codec). An interleaved structure of 1D-CNN and Intra-BRNN is designed to exploit intra-frame correlations more efficiently. Furthermore, a Group-wise and Beam-search Residual Vector Quantizer (GB-RVQ) is used to reduce quantization noise. CBRC encodes audio every 20 ms with no additional latency, which makes it suitable for real-time communication. Experimental results demonstrate the superiority of the proposed codec when comparing CBRC at 3 kbps with Opus at 12 kbps.
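As a rough illustration of the quantization idea named in the abstract, the following is a minimal sketch of a plain greedy residual vector quantizer, plus the per-frame bit budget implied by the stated bitrates and 20 ms frame. It is not the authors' GB-RVQ: the group-wise split and beam search are omitted, and the function names, codebook values, and vector sizes are hypothetical choices made only for this example.

```python
def nearest(codebook, vec):
    """Return the index of the codeword closest to vec (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((c - v) ** 2 for c, v in zip(code, vec))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def rvq_encode(codebooks, vec):
    """Greedy residual VQ: each stage quantizes what the previous stages left over."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        # Subtract the chosen codeword so the next stage sees only the remaining error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(codebooks, indices):
    """Reconstruct the vector by summing the selected codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, indices):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def bits_per_frame(bitrate_bps, frame_ms):
    """Bit budget of one frame at a given bitrate (integer arithmetic)."""
    return bitrate_bps * frame_ms // 1000


# Toy two-stage codebooks (hypothetical values, two codewords per stage).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.5, -0.5]],
]
indices = rvq_encode(codebooks, [1.4, 0.6])
reconstruction = rvq_decode(codebooks, indices)

cbrc_bits = bits_per_frame(3000, 20)   # CBRC at 3 kbps, 20 ms frames
opus_bits = bits_per_frame(12000, 20)  # Opus at 12 kbps, same frame length
```

In this sketch each stage commits to its single nearest codeword. A beam-search variant would instead carry several candidate index sequences across stages and keep the one with the lowest final error, at the cost of extra computation.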
Details
- Title: An Intra-BRNN and GB-RVQ Based END-TO-END Neural Audio Codec
- Authors/Creators: Xu Linping; Jiawei Jiang; Dejun Zhang; Xianjun Xia (The University of Western Australia); Li Chen; Yijian Xiao; Piao Ding; Shenyi Song; Sixing Yin; Ferdous Sohel (Murdoch University, Centre for Crop and Food Innovation)
- Publication Details: ArXiv.org
- Publisher: Cornell University
- Identifiers: 991005634368607891
- Murdoch Affiliation: Centre for Crop and Food Innovation; School of Information Technology
- Resource Type: Preprint