k2-fsa_sherpa-onnx

mirror of https://github.com/k2-fsa/sherpa-onnx.git synced 2026-01-09 07:41:06 +08:00

Author	SHA1	Message	Date
Fangjun Kuang	aecc39418d	Fix building wheels (#2619 )	2025-09-22 16:52:55 +08:00
colourmebrad	ef5c23e6c9	exposing online punctuation model support in node-addon-api (#2609 ) * exposing online punctuation model support in node-addon-api * renaming nodejs-addon-examples/test_punctuation.js to test_offline_punctuation.js * adding test_online_punctuation to nodejs-addon-examples and updating CI to run test_offline_punctuation and test_online_punctuation	2025-09-19 23:29:55 +08:00
Fangjun Kuang	26aa2fa932	Release v1.12.14 (#2608 ) v1.12.14	2025-09-18 15:09:10 +08:00
Fangjun Kuang	86af28157b	Add a C++ example for simulated streaming ASR (#2607 )	2025-09-18 14:59:50 +08:00
Fangjun Kuang	9102f34179	Fix TDT decoding for NeMo TDT transducers (#2606 )	2025-09-18 10:52:29 +08:00
Fangjun Kuang	a45384b874	Provide pre-compiled whls for cuda 12.x on Linux x64 and Windows x64 (#2601 )	2025-09-15 17:12:45 +08:00
Fangjun Kuang	aa66810c5c	Provide pre-compiled sherpa-onnx libs/binaries for CUDA 12.x + onnxruntime 1.22.0 (#2599 )	2025-09-15 12:05:21 +08:00
Fangjun Kuang	bff2691e8c	Add CI tests for dart spoken language identification example (#2598 )	2025-09-15 09:28:34 +08:00
Kirill Bukaev	12b96ac2da	Add Dart API for spoken language identification (#2596 )	2025-09-15 09:12:11 +08:00
Fangjun Kuang	1b9987dc42	Fix setting rknn core mask (#2594 ) We need to set the core mask after `rknn_dup_context()`.	2025-09-12 21:18:17 +08:00
Fangjun Kuang	32c248b8a0	Release v1.12.13 (#2593 ) v1.12.13	2025-09-12 16:03:15 +08:00
Fangjun Kuang	c415092fef	Upload RKNN models for sense-voice (#2592 )	2025-09-12 15:54:03 +08:00
Fangjun Kuang	c691318b95	Support RK NPU for SenseVoice non-streaming ASR models (#2589 ) This PR adds RK NPU support for SenseVoice non-streaming ASR models by implementing a new RKNN backend with greedy CTC decoding. - Adds offline RKNN implementation for SenseVoice models including model loading, feature processing, and CTC decoding - Introduces export tools to convert SenseVoice models from PyTorch to ONNX and then to RKNN format - Implements provider-aware validation to prevent mismatched model and provider usage	2025-09-12 10:46:38 +08:00
Fangjun Kuang	926b288525	Fix initializing symbol table for OnlineRecognizer. (#2590 )	2025-09-12 09:37:06 +08:00
Fangjun Kuang	04a98ca8bf	Release v1.12.12 (#2586 ) v1.12.12	2025-09-10 22:55:01 +08:00
Fangjun Kuang	7e42ba2c0c	Add various language bindings for Wenet non-streaming CTC models (#2584 ) This PR adds support for Wenet non-streaming CTC models to sherpa-onnx by introducing the SherpaOnnxOfflineWenetCtcModelConfig struct and integrating it across all language bindings and APIs. The implementation follows the same pattern as other CTC model types like Zipformer CTC. - Introduces SherpaOnnxOfflineWenetCtcModelConfig struct with a single model field for the ONNX model path - Adds the new config to SherpaOnnxOfflineModelConfig and updates all language bindings (C++, Pascal, Kotlin, Java, Go, C#, Swift, JavaScript, etc.) - Provides comprehensive examples and tests across all supported platforms and languages	2025-09-10 18:52:18 +08:00
Fangjun Kuang	71f87e1808	Export ASLP-lab/WSYue-ASR/tree/main/u2pp_conformer_yue to sherpa-onnx (#2582 )	2025-09-10 14:27:09 +08:00
Fangjun Kuang	19b01899cb	Upload new sense-voice models (#2580 )	2025-09-10 09:41:33 +08:00
Fangjun Kuang	9a73770eab	Export KittenTTS mini v0.1 to sherpa-onnx (#2578 )	2025-09-09 18:33:37 +08:00
Fangjun Kuang	a1d6592d48	Fix the missing online punctuation in android aar (#2577 )	2025-09-09 18:01:43 +08:00
Fangjun Kuang	686b909e2f	Add various language bindings for streaming T-one Russian ASR models (#2576 ) This PR adds support for streaming T-one Russian ASR models across various language bindings in the sherpa-onnx library. The changes enable T-one CTC (Connectionist Temporal Classification) model integration by adding new configuration structures and example implementations. - Introduces OnlineToneCtcModelConfig structures across all language bindings (C, C++, Swift, Java, Kotlin, Go, etc.) - Adds T-one CTC model support to WASM implementations for both ASR and keyword spotting - Provides comprehensive example implementations demonstrating T-one model usage in multiple programming languages	2025-09-09 16:51:18 +08:00
Fangjun Kuang	858b5052a2	Add C++ and Python support for T-one streaming Russian ASR models (#2575 ) This PR adds support for T-one streaming Russian ASR models in both C++ and Python APIs. The T-one model is a CTC-based Russian speech recognition model with specific characteristics including float16 state handling, 300ms frame lengths, and 8kHz sampling rate. - Added new OnlineToneCtcModel implementation with specialized processing for T-one models - Integrated T-one support into the existing CTC model pipeline and Python bindings - Added Python example and test scripts for the new functionality	2025-09-09 12:07:34 +08:00
Fangjun Kuang	e4f48ce6a6	Export models from https://github.com/voicekit-team/T-one to sherpa-onnx (#2571 ) This PR exports models from the T-one repository (https://github.com/voicekit-team/T-one) to sherpa-onnx format, creating a complete pipeline for Russian speech recognition using streaming CTC models. - Adds scripts to download, process, and test T-one models in sherpa-onnx format - Creates GitHub workflow for automated model export and publishing - Updates kaldi-native-fbank dependency to version 1.22.1	2025-09-08 17:22:23 +08:00
Fangjun Kuang	e870afc0e6	Update README to include https://github.com/Mentra-Community/MentraOS (#2565 ) This PR adds documentation for MentraOS, a smart glasses operating system that integrates sherpa-onnx for speech recognition functionality. The addition showcases another real-world application using the sherpa-onnx library. - Adds a new section documenting MentraOS integration with sherpa-onnx - Includes description of MentraOS features and platform support - References related pull request for implementation details	2025-09-05 16:23:28 +08:00
Fangjun Kuang	4167b86ca1	Add hint for loading model files from SD card on Android. (#2564 ) This PR adds a helpful hint for Android developers who are trying to load model files from the SD card instead of the app's assets. The change detects when an absolute path is provided while an asset manager is still being used, which is a common configuration mistake. - Adds validation to detect absolute paths when using Android asset manager - Provides clear error messages guiding users to set assetManager to null for SD card file access - References the related issue for additional context (#2562)	2025-09-05 16:06:42 +08:00
Fangjun Kuang	1568ac27eb	Avoid appending blanks for Cantonese vits tts. (#2559 )	2025-09-04 15:01:20 +08:00
Fangjun Kuang	e254c38f08	Fix cantonese vits tts (#2558 )	2025-09-04 14:00:14 +08:00
Fangjun Kuang	0823ddcbbb	Disable loading libs from jar on Android. (#2557 ) This PR disables loading native libraries from JAR resources specifically on Android platforms. The change prevents potential issues with JAR-based library loading on Android while maintaining compatibility with other platforms.	2025-09-04 12:13:27 +08:00
凌封	daac04bdaf	Support armv8l in Java API (#2556 )	2025-09-02 20:13:19 +08:00
Fangjun Kuang	b0f355721b	Update kaldifst and kaldi-decoder (#2551 )	2025-09-01 16:59:03 +08:00
Fangjun Kuang	c2cad93ef4	Fix using sherpa-onnx as a cmake sub-project. (#2550 )	2025-09-01 15:29:19 +08:00
Fangjun Kuang	0b5af832ec	Fix building for risc-v (#2549 )	2025-09-01 15:04:51 +08:00
Fangjun Kuang	a9187d5c75	Release v1.12.11 (#2547 ) v1.12.11	2025-09-01 14:09:24 +08:00
Fangjun Kuang	f0e68cdee1	Fix linking (#2546 )	2025-09-01 11:59:46 +08:00
Fangjun Kuang	27311b8aea	Fix c api (#2545 ) This PR fixes the C API by adding proper support for durations in offline recognition results. The issue addresses problems introduced in a previous PR where the durations field was added to the C API struct but not properly handled across all language bindings. Key changes: - Adds durations field handling across multiple language bindings (Swift, Kotlin, Java, C#) - Fixes field ordering in C API struct to ensure ABI compatibility - Updates JNI implementation to properly extract and pass durations data	2025-09-01 11:23:49 +08:00
Wei Kang	c149696cb3	Add Zipvoice (#2487 ) Co-authored-by: yaozengwei <yaozengwei@outlook.com>	2025-08-27 19:50:00 +08:00
Fangjun Kuang	6768ca7893	Fix uploading win32 libs to huggingface (#2537 ) This PR fixes the uploading process for win32 libraries to Hugging Face by updating Windows OS detection and correcting the file copy destination path. - Replaces deprecated wmic command with PowerShell-based OS detection for better reliability - Adds fallback mechanism using cmd /c ver when PowerShell is unavailable - Corrects the destination path for win32 library archives to include version subdirectory	2025-08-27 16:47:53 +08:00
Fangjun Kuang	d30aa980b7	Add one more German tts model from OpenVoiceOS. (#2536 )	2025-08-26 23:19:31 +08:00
Fangjun Kuang	408808b30a	Fix wasm for kws (#2535 )	2025-08-26 22:30:04 +08:00
Fangjun Kuang	7c9d071ef7	Simplify the usage of our non-Android Java API (#2533 ) This PR simplifies the usage of the non-Android Java API by providing platform-specific JAR files that include native shared libraries, eliminating the need for users to manually manage native dependencies. - Refactored LibraryUtils.java to support multiple library loading methods including extracting from JAR resources - Added build infrastructure to create platform-specific native library JAR files - Introduced debug capabilities and improved error handling for library loading	2025-08-26 20:13:07 +08:00
Fangjun Kuang	9d0adcd3f5	Support BPE models with byte fallback. (#2531 )	2025-08-26 12:03:02 +08:00
Fangjun Kuang	f45cd87a24	Add license info about tts models from OpenVoiceOS (#2530 )	2025-08-26 07:24:02 +08:00
Fangjun Kuang	eaf2eb2ed5	Fix releasing go packages (#2529 )	2025-08-25 20:01:02 +08:00
Fangjun Kuang	f1f8149a47	Generate tts samples for MatchaTTS (English). (#2527 )	2025-08-25 16:04:50 +08:00
Fangjun Kuang	4694d675bd	Add two more Piper tts models (#2525 ) This PR adds support for two new Piper TTS (Text-to-Speech) models: an Indonesian model (id_ID-news_tts-medium) and a Hindi model (hi_IN-rohan-medium).	2025-08-25 14:42:25 +08:00
Fangjun Kuang	6b1fbdedd2	Release v1.12.10 (#2523 ) v1.12.10	2025-08-25 11:49:31 +08:00
Fangjun Kuang	3d5d1b9b3c	Fix kokoro tts for punctuations (#2522 )	2025-08-25 11:06:28 +08:00
Fangjun Kuang	e8dd5cd2a0	Split sherpa-onnx Python package (#2521 )	2025-08-25 10:16:58 +08:00
Fangjun Kuang	44a92efbdc	Support 16KB page size for Android (#2520 )	2025-08-25 10:00:51 +08:00
Brad Murray	06ae4a7c15	Add tdt duration to APIs (#2514 )	2025-08-21 10:55:04 +08:00

1393 Commits