1393 Commits

Author SHA1 Message Date
Fangjun Kuang
aecc39418d
Fix building wheels (#2619) 2025-09-22 16:52:55 +08:00
colourmebrad
ef5c23e6c9
exposing online punctuation model support in node-addon-api (#2609)
* exposing online punctuation model support in node-addon-api

* renaming nodejs-addon-examples/test_punctuation.js to test_offline_punctuation.js

* adding test_online_punctuation to nodejs-addon-examples and updating CI to run test_offline_punctuation and test_online_punctuation
2025-09-19 23:29:55 +08:00
Fangjun Kuang
26aa2fa932
Release v1.12.14 (#2608) v1.12.14 2025-09-18 15:09:10 +08:00
Fangjun Kuang
86af28157b
Add a C++ example for simulated streaming ASR (#2607) 2025-09-18 14:59:50 +08:00
Fangjun Kuang
9102f34179
Fix TDT decoding for NeMo TDT transducers (#2606) 2025-09-18 10:52:29 +08:00
Fangjun Kuang
a45384b874
Provide pre-compiled whls for cuda 12.x on Linux x64 and Windows x64 (#2601) 2025-09-15 17:12:45 +08:00
Fangjun Kuang
aa66810c5c
Provide pre-compiled sherpa-onnx libs/binaries for CUDA 12.x + onnxruntime 1.22.0 (#2599) 2025-09-15 12:05:21 +08:00
Fangjun Kuang
bff2691e8c
Add CI tests for dart spoken language identification example (#2598) 2025-09-15 09:28:34 +08:00
Kirill Bukaev
12b96ac2da
Add Dart API for spoken language identification (#2596) 2025-09-15 09:12:11 +08:00
Fangjun Kuang
1b9987dc42
Fix setting rknn core mask (#2594)
We need to set the core mask after `rknn_dup_context()`.
2025-09-12 21:18:17 +08:00
Fangjun Kuang
32c248b8a0
Release v1.12.13 (#2593) v1.12.13 2025-09-12 16:03:15 +08:00
Fangjun Kuang
c415092fef
Upload RKNN models for sense-voice (#2592) 2025-09-12 15:54:03 +08:00
Fangjun Kuang
c691318b95
Support RK NPU for SenseVoice non-streaming ASR models (#2589)
This PR adds RK NPU support for SenseVoice non-streaming ASR models by implementing a new RKNN backend with greedy CTC decoding.

- Adds offline RKNN implementation for SenseVoice models including model loading, feature processing, and CTC decoding
- Introduces export tools to convert SenseVoice models from PyTorch to ONNX and then to RKNN format
- Implements provider-aware validation to prevent mismatched model and provider usage
2025-09-12 10:46:38 +08:00
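For readers unfamiliar with the greedy CTC decoding mentioned in the entry above, here is a minimal sketch of the general technique. It is not the RKNN backend code from the PR: the function name `ctc_greedy_decode`, the `blank_id` default of 0, and the input format (one argmax token ID per frame) are assumptions made for illustration.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_ids: Sequence[int], blank_id: int = 0) -> List[int]:
    """Collapse per-frame argmax token IDs into a CTC output sequence.

    Hypothetical helper: blank_id=0 is an assumption, not taken from the PR.
    """
    tokens: List[int] = []
    prev = blank_id
    for token in frame_ids:
        # Emit a token only if it is not blank and differs from the previous frame.
        if token != blank_id and token != prev:
            tokens.append(token)
        prev = token
    return tokens


if __name__ == "__main__":
    # A blank between two identical IDs keeps both; adjacent repeats are merged.
    print(ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0]))
```

A blank frame resets the repeat check, which is why the same token can legitimately appear twice in a row in the output.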
Fangjun Kuang
926b288525
Fix initializing symbol table for OnlineRecognizer. (#2590) 2025-09-12 09:37:06 +08:00
Fangjun Kuang
04a98ca8bf
Release v1.12.12 (#2586) v1.12.12 2025-09-10 22:55:01 +08:00
Fangjun Kuang
7e42ba2c0c
Add various language bindings for Wenet non-streaming CTC models (#2584)
This PR adds support for Wenet non-streaming CTC models to sherpa-onnx by introducing the SherpaOnnxOfflineWenetCtcModelConfig struct and integrating it across all language bindings and APIs. The implementation follows the same pattern as other CTC model types like Zipformer CTC.

- Introduces SherpaOnnxOfflineWenetCtcModelConfig struct with a single model field for the ONNX model path
- Adds the new config to SherpaOnnxOfflineModelConfig and updates all language bindings (C++, Pascal, Kotlin, Java, Go, C#, Swift, JavaScript, etc.)
- Provides comprehensive examples and tests across all supported platforms and languages
2025-09-10 18:52:18 +08:00
Fangjun Kuang
71f87e1808
Export ASLP-lab/WSYue-ASR/tree/main/u2pp_conformer_yue to sherpa-onnx (#2582) 2025-09-10 14:27:09 +08:00
Fangjun Kuang
19b01899cb
Upload new sense-voice models (#2580) 2025-09-10 09:41:33 +08:00
Fangjun Kuang
9a73770eab
Export KittenTTS mini v0.1 to sherpa-onnx (#2578) 2025-09-09 18:33:37 +08:00
Fangjun Kuang
a1d6592d48
Fix the missing online punctuation in android aar (#2577) 2025-09-09 18:01:43 +08:00
Fangjun Kuang
686b909e2f
Add various language bindings for streaming T-one Russian ASR models (#2576)
This PR adds support for streaming T-one Russian ASR models across various language bindings in the sherpa-onnx library. The changes enable T-one CTC (Connectionist Temporal Classification) model integration by adding new configuration structures and example implementations.

- Introduces OnlineToneCtcModelConfig structures across all language bindings (C, C++, Swift, Java, Kotlin, Go, etc.)
- Adds T-one CTC model support to WASM implementations for both ASR and keyword spotting
- Provides comprehensive example implementations demonstrating T-one model usage in multiple programming languages
2025-09-09 16:51:18 +08:00
Fangjun Kuang
858b5052a2
Add C++ and Python support for T-one streaming Russian ASR models (#2575)
This PR adds support for T-one streaming Russian ASR models in both C++ and Python APIs. The T-one model is a CTC-based Russian speech recognition model with specific characteristics including float16 state handling, 300ms frame lengths, and 8kHz sampling rate.

- Added new OnlineToneCtcModel implementation with specialized processing for T-one models
- Integrated T-one support into the existing CTC model pipeline and Python bindings
- Added Python example and test scripts for the new functionality
2025-09-09 12:07:34 +08:00
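The 300 ms frame length and 8 kHz sampling rate quoted in the entry above imply 2400 samples per frame. The sketch below works through that arithmetic and shows one possible way to cut audio into such frames. The helper names and the zero-padding of the final frame are assumptions for illustration, not the behaviour of OnlineToneCtcModel.

```python
from typing import Iterator, List

SAMPLE_RATE_HZ = 8000  # from the PR description
FRAME_MS = 300         # from the PR description


def samples_per_frame(sample_rate_hz: int = SAMPLE_RATE_HZ,
                      frame_ms: int = FRAME_MS) -> int:
    """Number of samples in one frame: 8000 * 300 / 1000 = 2400."""
    return sample_rate_hz * frame_ms // 1000


def split_into_frames(samples: List[float]) -> Iterator[List[float]]:
    """Yield fixed-size frames; the last one is zero-padded (an assumption)."""
    n = samples_per_frame()
    for start in range(0, len(samples), n):
        chunk = samples[start:start + n]
        if len(chunk) < n:
            chunk = chunk + [0.0] * (n - len(chunk))
        yield chunk


if __name__ == "__main__":
    frames = list(split_into_frames([0.1] * 5000))
    print(samples_per_frame(), len(frames))
```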
Fangjun Kuang
e4f48ce6a6
Export models from https://github.com/voicekit-team/T-one to sherpa-onnx (#2571)
This PR exports models from the T-one repository (https://github.com/voicekit-team/T-one) to sherpa-onnx format, creating a complete pipeline for Russian speech recognition using streaming CTC models.

- Adds scripts to download, process, and test T-one models in sherpa-onnx format
- Creates GitHub workflow for automated model export and publishing
- Updates kaldi-native-fbank dependency to version 1.22.1
2025-09-08 17:22:23 +08:00
Fangjun Kuang
e870afc0e6
Update README to include https://github.com/Mentra-Community/MentraOS (#2565)
This PR adds documentation for MentraOS, a smart glasses operating system that integrates sherpa-onnx for speech recognition functionality. The addition showcases another real-world application using the sherpa-onnx library.

- Adds a new section documenting MentraOS integration with sherpa-onnx
- Includes description of MentraOS features and platform support
- References related pull request for implementation details
2025-09-05 16:23:28 +08:00
Fangjun Kuang
4167b86ca1
Add hint for loading model files from SD card on Android. (#2564)
This PR adds a helpful hint for Android developers who are trying to load model files from the SD card instead of the app's assets. The change detects when an absolute path is provided while an asset manager is still being used, which is a common configuration mistake.

- Adds validation to detect absolute paths when using Android asset manager
- Provides clear error messages guiding users to set assetManager to null for SD card file access
- References the related issue for additional context (#2562)
2025-09-05 16:06:42 +08:00
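The check described in the entry above can be sketched roughly as follows. This is a language-neutral illustration written in Python, not the code from the PR: the function name `sd_card_hint`, its parameters, and the message wording are all hypothetical.

```python
from typing import Optional


def sd_card_hint(model_path: str, asset_manager: Optional[object]) -> Optional[str]:
    """Return a hint if an absolute path is combined with an asset manager.

    Hypothetical helper: an absolute path suggests the file lives on the
    file system (e.g. the SD card), not in the app's assets.
    """
    if asset_manager is not None and model_path.startswith("/"):
        return (
            f"'{model_path}' is an absolute path, but an asset manager is set. "
            "To load files from the SD card, set assetManager to null."
        )
    return None


if __name__ == "__main__":
    print(sd_card_hint("/sdcard/models/model.onnx", asset_manager=object()))
```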
Fangjun Kuang
1568ac27eb
Avoid appending blanks for Cantonese vits tts. (#2559) 2025-09-04 15:01:20 +08:00
Fangjun Kuang
e254c38f08
Fix cantonese vits tts (#2558) 2025-09-04 14:00:14 +08:00
Fangjun Kuang
0823ddcbbb
Disable loading libs from jar on Android. (#2557)
This PR disables loading native libraries from JAR resources specifically on Android platforms. The change prevents potential issues with JAR-based library loading on Android while maintaining compatibility with other platforms.
2025-09-04 12:13:27 +08:00
凌封
daac04bdaf
Support armv8l in Java API (#2556) 2025-09-02 20:13:19 +08:00
Fangjun Kuang
b0f355721b
Update kaldifst and kaldi-decoder (#2551) 2025-09-01 16:59:03 +08:00
Fangjun Kuang
c2cad93ef4
Fix using sherpa-onnx as a cmake sub-project. (#2550) 2025-09-01 15:29:19 +08:00
Fangjun Kuang
0b5af832ec
Fix building for risc-v (#2549) 2025-09-01 15:04:51 +08:00
Fangjun Kuang
a9187d5c75
Release v1.12.11 (#2547) v1.12.11 2025-09-01 14:09:24 +08:00
Fangjun Kuang
f0e68cdee1
Fix linking (#2546) 2025-09-01 11:59:46 +08:00
Fangjun Kuang
27311b8aea
Fix c api (#2545)
This PR fixes the C API by adding proper support for durations in offline recognition results. It addresses problems introduced by a previous PR, which added the durations field to the C API struct but did not handle it properly across all language bindings.

Key changes:

- Adds durations field handling across multiple language bindings (Swift, Kotlin, Java, C#)
- Fixes field ordering in C API struct to ensure ABI compatibility
- Updates JNI implementation to properly extract and pass durations data
2025-09-01 11:23:49 +08:00
Wei Kang
c149696cb3
Add Zipvoice (#2487)
Co-authored-by: yaozengwei <yaozengwei@outlook.com>
2025-08-27 19:50:00 +08:00
Fangjun Kuang
6768ca7893
Fix uploading win32 libs to huggingface (#2537)
This PR fixes the uploading process for win32 libraries to Hugging Face by updating Windows OS detection and correcting the file copy destination path.

- Replaces deprecated wmic command with PowerShell-based OS detection for better reliability
- Adds fallback mechanism using cmd /c ver when PowerShell is unavailable
- Corrects the destination path for win32 library archives to include version subdirectory
2025-08-27 16:47:53 +08:00
Fangjun Kuang
d30aa980b7
Add one more German tts model from OpenVoiceOS. (#2536) 2025-08-26 23:19:31 +08:00
Fangjun Kuang
408808b30a
Fix wasm for kws (#2535) 2025-08-26 22:30:04 +08:00
Fangjun Kuang
7c9d071ef7
Simplify the usage of our non-Android Java API (#2533)
This PR simplifies the usage of the non-Android Java API by providing platform-specific JAR files that include native shared libraries, eliminating the need for users to manually manage native dependencies.

- Refactored LibraryUtils.java to support multiple library loading methods including extracting from JAR resources
- Added build infrastructure to create platform-specific native library JAR files
- Introduced debug capabilities and improved error handling for library loading
2025-08-26 20:13:07 +08:00
Fangjun Kuang
9d0adcd3f5
Support BPE models with byte fallback. (#2531) 2025-08-26 12:03:02 +08:00
Fangjun Kuang
f45cd87a24
Add license info about tts models from OpenVoiceOS (#2530) 2025-08-26 07:24:02 +08:00
Fangjun Kuang
eaf2eb2ed5
Fix releasing go packages (#2529) 2025-08-25 20:01:02 +08:00
Fangjun Kuang
f1f8149a47
Generate tts samples for MatchaTTS (English). (#2527) 2025-08-25 16:04:50 +08:00
Fangjun Kuang
4694d675bd
Add two more Piper tts models (#2525)
This PR adds support for two new Piper TTS (Text-to-Speech) models: an Indonesian model (id_ID-news_tts-medium) and a Hindi model (hi_IN-rohan-medium).
2025-08-25 14:42:25 +08:00
Fangjun Kuang
6b1fbdedd2
Release v1.12.10 (#2523) v1.12.10 2025-08-25 11:49:31 +08:00
Fangjun Kuang
3d5d1b9b3c
Fix kokoro tts for punctuations (#2522) 2025-08-25 11:06:28 +08:00
Fangjun Kuang
e8dd5cd2a0
Split sherpa-onnx Python package (#2521) 2025-08-25 10:16:58 +08:00
Fangjun Kuang
44a92efbdc
Support 16KB page size for Android (#2520) 2025-08-25 10:00:51 +08:00
Brad Murray
06ae4a7c15
Add tdt duration to APIs (#2514) 2025-08-21 10:55:04 +08:00