97 Commits

Author SHA1 Message Date
Fangjun Kuang
b1db3eaa8d
Export Whisper to RK NPU (#2983) 2026-01-05 19:46:30 +08:00
Fangjun Kuang
13b8b84a89
Add C and CXX API for Google MedASR model (#2946) 2025-12-29 09:56:08 +08:00
Fangjun Kuang
afa59281c1
Build APKs for MatchaTTS Chinese+English (#2882) 2025-12-10 15:57:56 +08:00
Fangjun Kuang
8fac37f7d1
Load QNN context binary for faster startup (#2877) 2025-12-09 17:55:19 +08:00
alex-spacemit
586cd19e22
Add spacemit ort ep for spacemit riscv cpus (#2837)
This pull request significantly extends the project's hardware compatibility by integrating a dedicated SpacemiT Execution Provider for ONNX Runtime. The changes enable efficient model inference on SpacemiT RISC-V CPUs, leveraging their RVV1.0 capabilities. This involves updates to the build system, new CMake modules for toolchain and ONNX Runtime package handling, and modifications to the core provider and session management logic to recognize and configure the SpacemiT EP.
2025-12-02 14:36:31 +08:00
Fangjun Kuang
d1c458b95d
Add C++ QNN support for Zipformer CTC models. (#2809) 2025-11-24 18:14:22 +08:00
Fangjun Kuang
16d62b6a08
Add Android demo with QNN (Qualcomm NPU) for SenseVoice ASR (#2803) 2025-11-20 17:22:07 +08:00
Fangjun Kuang
2fcde7d3c6
Support hotwords with byte level bpe (#2802) 2025-11-19 18:21:16 +08:00
Fangjun Kuang
1832b35070
Add C# API for Omnilingual ASR CTC models (#2775) 2025-11-13 15:12:20 +08:00
Fangjun Kuang
c691318b95
Support RK NPU for SenseVoice non-streaming ASR models (#2589)
This PR adds RK NPU support for SenseVoice non-streaming ASR models by implementing a new RKNN backend with greedy CTC decoding.

- Adds offline RKNN implementation for SenseVoice models including model loading, feature processing, and CTC decoding
- Introduces export tools to convert SenseVoice models from PyTorch to ONNX and then to RKNN format
- Implements provider-aware validation to prevent mismatched model and provider usage
2025-09-12 10:46:38 +08:00
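
For readers unfamiliar with the term, the greedy CTC decoding mentioned in #2589 can be sketched roughly as follows. This is a minimal illustration of the general technique, not the RKNN backend's actual code: the function name `ctc_greedy_decode`, the `blank_id=0` default, and the toy score matrix are all assumptions made for the example.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    blank_id: int = 0,  # assumption: blank token has id 0
) -> List[int]:
    """Pick the best token per frame, merge repeats, then drop blanks."""
    tokens: List[int] = []
    prev = None  # argmax of the previous frame
    for scores in frame_scores:
        # Index of the highest-scoring token in this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit only when the token is not blank and differs from the
        # previous frame's argmax (consecutive repeats collapse to one).
        if best != blank_id and best != prev:
            tokens.append(best)
        prev = best
    return tokens


# Toy input: 5 frames, vocabulary of 3 tokens (id 0 is the blank).
frames = [
    [0.1, 0.9, 0.0],  # argmax 1
    [0.2, 0.7, 0.1],  # argmax 1 (repeat, merged)
    [0.8, 0.1, 0.1],  # argmax 0 (blank)
    [0.1, 0.8, 0.1],  # argmax 1 (emitted again, separated by a blank)
    [0.1, 0.2, 0.7],  # argmax 2
]
print(ctc_greedy_decode(frames))  # → [1, 1, 2]
```

The resulting token ids would then be mapped to text through the model's token table, which is outside the scope of this sketch.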
Fangjun Kuang
7e42ba2c0c
Add various language bindings for Wenet non-streaming CTC models (#2584)
This PR adds support for Wenet non-streaming CTC models to sherpa-onnx by introducing the SherpaOnnxOfflineWenetCtcModelConfig struct and integrating it across all language bindings and APIs. The implementation follows the same pattern as other CTC model types like Zipformer CTC.

- Introduces SherpaOnnxOfflineWenetCtcModelConfig struct with a single model field for the ONNX model path
- Adds the new config to SherpaOnnxOfflineModelConfig and updates all language bindings (C++, Pascal, Kotlin, Java, Go, C#, Swift, JavaScript, etc.)
- Provides comprehensive examples and tests across all supported platforms and languages
2025-09-10 18:52:18 +08:00
Fangjun Kuang
686b909e2f
Add various language bindings for streaming T-one Russian ASR models (#2576)
This PR adds support for streaming T-one Russian ASR models across various language bindings in the sherpa-onnx library. The changes enable T-one CTC (Connectionist Temporal Classification) model integration by adding new configuration structures and example implementations.

- Introduces OnlineToneCtcModelConfig structures across all language bindings (C, C++, Swift, Java, Kotlin, Go, etc.)
- Adds T-one CTC model support to WASM implementations for both ASR and keyword spotting
- Provides comprehensive example implementations demonstrating T-one model usage in multiple programming languages
2025-09-09 16:51:18 +08:00
Fangjun Kuang
858b5052a2
Add C++ and Python support for T-one streaming Russian ASR models (#2575)
This PR adds support for T-one streaming Russian ASR models in both C++ and Python APIs. The T-one model is a CTC-based Russian speech recognition model with specific characteristics including float16 state handling, 300ms frame lengths, and 8kHz sampling rate.

- Added new OnlineToneCtcModel implementation with specialized processing for T-one models
- Integrated T-one support into the existing CTC model pipeline and Python bindings
- Added Python example and test scripts for the new functionality
2025-09-09 12:07:34 +08:00
Fangjun Kuang
e4f48ce6a6
Export models from https://github.com/voicekit-team/T-one to sherpa-onnx (#2571)
This PR exports models from the T-one repository (https://github.com/voicekit-team/T-one) to sherpa-onnx format, creating a complete pipeline for Russian speech recognition using streaming CTC models.

- Adds scripts to download, process, and test T-one models in sherpa-onnx format
- Creates GitHub workflow for automated model export and publishing
- Updates kaldi-native-fbank dependency to version 1.22.1
2025-09-08 17:22:23 +08:00
Fangjun Kuang
7c9d071ef7
Simplify the usage of our non-Android Java API (#2533)
This PR simplifies the usage of the non-Android Java API by providing platform-specific JAR files that include native shared libraries, eliminating the need for users to manually manage native dependencies.

- Refactored LibraryUtils.java to support multiple library loading methods including extracting from JAR resources
- Added build infrastructure to create platform-specific native library JAR files
- Introduced debug capabilities and improved error handling for library loading
2025-08-26 20:13:07 +08:00
Fangjun Kuang
e8dd5cd2a0
Split sherpa-onnx Python package (#2521) 2025-08-25 10:16:58 +08:00
yangjun
6eac1af8ac
Fix ctrl+c may lead to coredump (#2511) 2025-08-19 18:31:34 +08:00
Fangjun Kuang
bfbd603342
Add Kotlin and Java API for KittenTTS (#2461) 2025-08-07 22:19:11 +08:00
Fangjun Kuang
6b16c0b864
Export https://github.com/KittenML/KittenTTS to sherpa-onnx (#2456) 2025-08-07 11:59:40 +08:00
Fangjun Kuang
9d25c90a59
Add JavaScript API (node-addon) for homophone replacer (#2158) 2025-04-28 20:52:42 +08:00
Fangjun Kuang
eee5575836
Add Kotlin and Java API for Dolphin CTC models (#2086) 2025-04-02 21:16:14 +08:00
Fangjun Kuang
3420c89883
Export silero_vad v4 to RKNN (#2067) 2025-03-30 12:00:52 +08:00
cjsdurj
b87fce9a7f
c-api add wave write to buffer. (#1962)
Co-authored-by: jian.chen03 <jian.chen03@transwarp.io>
2025-03-10 17:21:23 +08:00
ivan provalov
94728bfbee
Fixing Whisper Model Token Normalization (#1904) 2025-02-21 12:58:01 +08:00
Fangjun Kuang
316424b382
Add C++ and Python API for FireRedASR AED models (#1867) 2025-02-16 22:45:24 +08:00
Fangjun Kuang
c84a833863
Add C++ and Python API for Kokoro 1.0 multilingual TTS model (#1795) 2025-02-06 22:57:13 +08:00
Fangjun Kuang
08cefe8488
Export Kokoro 1.0 to sherpa-onnx (#1788) 2025-02-05 08:24:43 +08:00
Fangjun Kuang
af671e2b63
Add C API for Kokoro TTS models (#1717) 2025-01-16 15:07:26 +08:00
Fangjun Kuang
a00d3b4821
Add Java API for Matcha-TTS models. (#1673) 2025-01-02 15:15:30 +08:00
Fangjun Kuang
3422b9388d
Add Kotlin API for Matcha-TTS models. (#1668) 2024-12-31 19:20:52 +08:00
Fangjun Kuang
314545f938
Add speaker identification APIs for HarmonyOS (#1607)
* Add speaker embedding extractor API for HarmonyOS

* Add ArkTS API for speaker identification
2024-12-09 19:23:18 +08:00
Fangjun Kuang
bd4b223920
Add Kotlin and Java API for Moonshine models (#1474) 2024-10-26 22:30:29 +08:00
Fangjun Kuang
d468527f62
C API for speaker diarization (#1402) 2024-10-09 17:10:03 +08:00
Fangjun Kuang
70165cb42d
Speaker diarization example with onnxruntime Python API (#1395) 2024-10-06 16:37:29 +08:00
Lim Yao Chong
3bffc24d64
Add Python binding for online punctuation models (#1312) 2024-09-09 10:26:53 +08:00
Fangjun Kuang
6b8877f185
Downgrade flutter sdk versions. (#1305) 2024-08-30 11:47:27 +08:00
Fangjun Kuang
65f1c0fab2
Add Pascal API for reading wave files (#1243) 2024-08-11 22:43:42 +08:00
Fangjun Kuang
94e256244d
Add blank penalty for various language bindings. (#1234) 2024-08-08 10:43:31 +08:00
Fangjun Kuang
994c3e7c96
Add VAD + Non-streaming ASR example for JavaScript API. (#1170) 2024-07-26 12:42:08 +08:00
Fangjun Kuang
25f0a10468
Add C++ runtime for SenseVoice models (#1148) 2024-07-18 22:54:18 +08:00
Fangjun Kuang
dd0ff2ca06
Support onnxruntime 1.18.0 (#906) 2024-07-10 17:05:26 +08:00
Fangjun Kuang
1fe12c5107
Support the platform iOS for Flutter (#1079) 2024-07-06 19:43:37 +08:00
Fangjun Kuang
f5e9a162d1
Publish flutter packages for Android (#1074) 2024-07-04 20:07:07 +08:00
Fangjun Kuang
6e09933d99
Inverse text normalization API for other programming languages (#1019) 2024-06-17 17:02:39 +08:00
Fangjun Kuang
fd5a0d1e00
Add C++ runtime for Tele-AI/TeleSpeech-ASR (#970) 2024-06-05 00:26:40 +08:00
Fangjun Kuang
031134b4d4
Add TTS for node-addon-api (#871) 2024-05-13 19:24:09 +08:00
Fangjun Kuang
17cd3a5f01
Add C++ runtime for non-streaming faster conformer transducer from NeMo. (#854) 2024-05-10 12:15:39 +08:00
Fangjun Kuang
2f9553d838
Begin to add node-addon-api for sherpa-onnx (#826) 2024-05-03 14:47:40 +08:00
Fangjun Kuang
88202f05bb
Add Java API for audio tagging (#820) 2024-04-28 22:26:04 +08:00
Fangjun Kuang
f2d074aea9
Fix a bug for offline paraformer (#816) 2024-04-26 16:40:42 +08:00