This pull request significantly extends the project's hardware compatibility by integrating a dedicated SpacemiT Execution Provider for ONNX Runtime. The changes enable efficient model inference on SpacemiT RISC-V CPUs, leveraging their RVV1.0 capabilities. This involves updates to the build system, new CMake modules for toolchain and ONNX Runtime package handling, and modifications to the core provider and session management logic to recognize and configure the SpacemiT EP.
This PR adds RK NPU support for SenseVoice non-streaming ASR models by implementing a new RKNN backend with greedy CTC decoding.
- Adds offline RKNN implementation for SenseVoice models including model loading, feature processing, and CTC decoding
- Introduces export tools to convert SenseVoice models from PyTorch to ONNX and then to RKNN format
- Implements provider-aware validation to prevent mismatched model and provider usage
This PR adds support for Wenet non-streaming CTC models to sherpa-onnx by introducing the SherpaOnnxOfflineWenetCtcModelConfig struct and integrating it across all language bindings and APIs. The implementation follows the same pattern as other CTC model types like Zipformer CTC.
- Introduces SherpaOnnxOfflineWenetCtcModelConfig struct with a single model field for the ONNX model path
- Adds the new config to SherpaOnnxOfflineModelConfig and updates all language bindings (C++, Pascal, Kotlin, Java, Go, C#, Swift, JavaScript, etc.)
- Provides comprehensive examples and tests across all supported platforms and languages
This PR adds support for streaming T-one Russian ASR models across various language bindings in the sherpa-onnx library. The changes enable T-one CTC (Connectionist Temporal Classification) model integration by adding new configuration structures and example implementations.
- Introduces OnlineToneCtcModelConfig structures across all language bindings (C, C++, Swift, Java, Kotlin, Go, etc.)
- Adds T-one CTC model support to WASM implementations for both ASR and keyword spotting
- Provides comprehensive example implementations demonstrating T-one model usage in multiple programming languages
This PR adds support for T-one streaming Russian ASR models in both C++ and Python APIs. The T-one model is a CTC-based Russian speech recognition model with specific characteristics including float16 state handling, 300ms frame lengths, and 8kHz sampling rate.
- Added new OnlineToneCtcModel implementation with specialized processing for T-one models
- Integrated T-one support into the existing CTC model pipeline and Python bindings
- Added Python example and test scripts for the new functionality
This PR exports models from the T-one repository (https://github.com/voicekit-team/T-one) to sherpa-onnx format, creating a complete pipeline for Russian speech recognition using streaming CTC models.
- Adds scripts to download, process, and test T-one models in sherpa-onnx format
- Creates GitHub workflow for automated model export and publishing
- Updates kaldi-native-fbank dependency to version 1.22.1
This PR simplifies the usage of the non-Android Java API by providing platform-specific JAR files that include native shared libraries, eliminating the need for users to manually manage native dependencies.
- Refactored LibraryUtils.java to support multiple library loading methods including extracting from JAR resources
- Added build infrastructure to create platform-specific native library JAR files
- Introduced debug capabilities and improved error handling for library loading