212 Commits

Author SHA1 Message Date
colourmebrad
ef5c23e6c9
exposing online punctuation model support in node-addon-api (#2609)
* exposing online punctuation model support in node-addon-api

* renaming nodejs-addon-examples/test_punctuation.js to test_offline_punctuation.js

* adding test_online_punctuation to nodejs-addon-examples and updating CI to run test_offline_punctuation and test_online_punctuation
2025-09-19 23:29:55 +08:00
Fangjun Kuang
bff2691e8c
Add CI tests for dart spoken language identification example (#2598) 2025-09-15 09:28:34 +08:00
Fangjun Kuang
7e42ba2c0c
Add various language bindings for Wenet non-streaming CTC models (#2584)
This PR adds support for Wenet non-streaming CTC models to sherpa-onnx by introducing the SherpaOnnxOfflineWenetCtcModelConfig struct and integrating it across all language bindings and APIs. The implementation follows the same pattern as other CTC model types like Zipformer CTC.

- Introduces SherpaOnnxOfflineWenetCtcModelConfig struct with a single model field for the ONNX model path
- Adds the new config to SherpaOnnxOfflineModelConfig and updates all language bindings (C++, Pascal, Kotlin, Java, Go, C#, Swift, JavaScript, etc.)
- Provides comprehensive examples and tests across all supported platforms and languages
2025-09-10 18:52:18 +08:00
Fangjun Kuang
686b909e2f
Add various language bindings for streaming T-one Russian ASR models (#2576)
This PR adds support for streaming T-one Russian ASR models across various language bindings in the sherpa-onnx library. The changes enable T-one CTC (Connectionist Temporal Classification) model integration by adding new configuration structures and example implementations.

- Introduces OnlineToneCtcModelConfig structures across all language bindings (C, C++, Swift, Java, Kotlin, Go, etc.)
- Adds T-one CTC model support to WASM implementations for both ASR and keyword spotting
- Provides comprehensive example implementations demonstrating T-one model usage in multiple programming languages
2025-09-09 16:51:18 +08:00
Fangjun Kuang
858b5052a2
Add C++ and Python support for T-one streaming Russian ASR models (#2575)
This PR adds support for T-one streaming Russian ASR models in both C++ and Python APIs. The T-one model is a CTC-based Russian speech recognition model with specific characteristics including float16 state handling, 300ms frame lengths, and 8kHz sampling rate.

- Added new OnlineToneCtcModel implementation with specialized processing for T-one models
- Integrated T-one support into the existing CTC model pipeline and Python bindings
- Added Python example and test scripts for the new functionality
2025-09-09 12:07:34 +08:00
Fangjun Kuang
283d8fed70
Add Swift API for computing speaker embeddings (#2492) 2025-08-14 20:42:33 +08:00
Fangjun Kuang
353658eabb
Add C# API for KittenTTS (#2477) 2025-08-08 20:22:05 +08:00
Fangjun Kuang
c726f9d7ee
Add Swift API for KittenTTS (#2476) 2025-08-08 20:16:33 +08:00
Fangjun Kuang
9f3e70e598
Add Dart API for KittenTTS (#2475) 2025-08-08 20:14:54 +08:00
Fangjun Kuang
26b0e8162d
Add JavaScript API (WebAssembly) for KittenTTS (#2471) 2025-08-08 16:31:28 +08:00
Fangjun Kuang
6a61791066
Add JavaScript API (node-addon) for KittenTTS (#2470) 2025-08-08 16:27:51 +08:00
Fangjun Kuang
9122c9d272
Add Python API for KittenTTS. (#2466) 2025-08-08 15:01:43 +08:00
Fangjun Kuang
8ab5cba598
Add APIs for Online NeMo CTC models (#2454) 2025-08-07 09:28:16 +08:00
Fangjun Kuang
0514aeeb0c
Add Swift API for ten-vad (#2387) 2025-07-12 19:45:42 +08:00
Fangjun Kuang
7f1d71fed3
Add Dart API for ten-vad (#2386) 2025-07-12 19:41:01 +08:00
Fangjun Kuang
71aea2f19c
Add C# API for ten-vad (#2385) 2025-07-12 18:39:18 +08:00
Fangjun Kuang
fd9a687ec2
Add Pascal/Go/C#/Dart API for NeMo Canary ASR models (#2367)
Add support for the new NeMo Canary ASR model across multiple language bindings by introducing a Canary model configuration and setter method on the offline recognizer.

- Define Canary model config in Pascal, Go, C#, Dart and update converter functions
- Add SetConfig API for offline recognizer (Pascal, Go, C#, Dart)
- Extend CI/workflows and example scripts to test non-streaming Canary decoding
2025-07-10 14:53:33 +08:00
Askars Salimbajevs
f0960342ad
Add LODR support to online and offline recognizers (#2026)
This PR integrates LODR (Low-Order Density Ratio) support from Icefall into both online and offline recognizers, enabling LODR for LM shallow fusion and LM rescoring.

- Extended OnlineLMConfig and OfflineLMConfig to include lodr_fst, lodr_scale, and lodr_backoff_id.
- Implemented LodrFst and LodrStateCost classes and wired them into RNN LM scoring in both online and offline code paths.
- Updated Python bindings, CLI entry points, examples, and CI test scripts to accept and exercise the new LODR options.
2025-07-09 16:23:46 +08:00
Fangjun Kuang
df4615ca1d
Add C/CXX/JavaScript API for NeMo Canary models (#2357)
This PR introduces support for NeMo Canary models across C, C++, and JavaScript APIs by adding new Canary configuration structures, updating bindings, extending examples, and enhancing CI workflows.

- Add OfflineCanaryModelConfig to all language bindings (C, C++, JS, ETS).
- Implement SetConfig methods and NAPI wrappers for updating recognizer config at runtime.
- Update examples and CI scripts to demonstrate and test NeMo Canary model usage.
2025-07-07 23:38:04 +08:00
Fangjun Kuang
0e738c356c
Add C++ runtime and Python API for NeMo Canary models (#2352) 2025-07-07 17:03:49 +08:00
Fangjun Kuang
3bf986d08d
Support non-streaming zipformer CTC ASR models (#2340)
This PR adds support for non-streaming Zipformer CTC ASR models across multiple language bindings, WebAssembly, examples, and CI workflows.

- Introduces a new OfflineZipformerCtcModelConfig in C/C++, Python, Swift, Java, Kotlin, Go, Dart, Pascal, and C# APIs
- Updates initialization, freeing, and recognition logic to include Zipformer CTC in WASM and Node.js
- Adds example scripts and CI steps for downloading, building, and running Zipformer CTC models

Model doc is available at
https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-ctc/icefall/zipformer.html
2025-07-04 15:57:07 +08:00
Fangjun Kuang
bda427f4b2
Add API to get version information (#2309) 2025-06-25 00:22:21 +08:00
Fangjun Kuang
d57e4f84de
Add Python API for source separation (#2283) 2025-06-05 20:44:26 +08:00
Fangjun Kuang
2b2788332e
Add C++ support for UVR models (#2269) 2025-06-01 17:22:08 +08:00
Fangjun Kuang
99defc5b90
Add nodejs example for parakeet-tdt-0.6b-v2. (#2219) 2025-05-15 11:27:22 +08:00
Fangjun Kuang
85df96d528
Add Dart API for homophone replacer (#2167) 2025-04-30 23:15:28 +08:00
Fangjun Kuang
63d01a9534
Add Swift API for homophone replacer. (#2164) 2025-04-29 18:50:41 +08:00
Fangjun Kuang
51f8824219
Add homophone replacer example for Python API. (#2161) 2025-04-29 15:59:34 +08:00
Fangjun Kuang
9d25c90a59
Add JavaScript API (node-addon) for homophone replacer (#2158) 2025-04-28 20:52:42 +08:00
Fangjun Kuang
a0aef1f6cd
Add JavaScript API (WASM) for homophone replacer (#2157) 2025-04-28 20:47:49 +08:00
Fangjun Kuang
f64c58342b
Support replacing homophonic phrases (#2153) 2025-04-27 15:31:11 +08:00
Fangjun Kuang
be0f382a54
Support Giga AM transducer V2 (#2136) 2025-04-20 10:15:20 +08:00
Fangjun Kuang
07a5701af6
Add Dart API for Dolphin CTC models (#2095) 2025-04-03 15:59:38 +08:00
Fangjun Kuang
903e825eba
Add JavaScript (node-addon) API for Dolphin CTC models (#2094) 2025-04-03 15:03:33 +08:00
Fangjun Kuang
639ad1744f
Add JavaScript (WebAssembly) API for Dolphin CTC models (#2093) 2025-04-03 15:02:06 +08:00
Fangjun Kuang
74f402e490
Add Swift API for Dolphin CTC models (#2091) 2025-04-03 00:03:11 +08:00
Fangjun Kuang
2dc0f91904
Add C# API for Dolphin CTC models (#2089) 2025-04-02 23:36:22 +08:00
Fangjun Kuang
0de7e1b9f0
Add C++ and Python API for Dolphin CTC models (#2085) 2025-04-02 19:09:00 +08:00
Fangjun Kuang
0aacf02dd8
Add C++ runtime for vocos (#2014) 2025-03-17 17:05:15 +08:00
Fangjun Kuang
c972554ad1
Add JavaScript API (wasm) for speech enhancement GTCRN models (#2007) 2025-03-15 17:41:23 +08:00
Fangjun Kuang
6a97f8adcf
Add JavaScript (node-addon) API for speech enhancement GTCRN models (#1996) 2025-03-12 15:52:01 +08:00
Fangjun Kuang
fd78a482df
Add Dart API for speech enhancement GTCRN models (#1993) 2025-03-12 12:39:08 +08:00
Fangjun Kuang
d3e27d5e21
Add C# API for speech enhancement GTCRN models (#1990) 2025-03-11 18:58:17 +08:00
Fangjun Kuang
c12d1d88c0
Add Swift API for speech enhancement GTCRN models (#1989) 2025-03-11 18:03:13 +08:00
Fangjun Kuang
5d2d792b1d
Add Python API for speech enhancement GTCRN models (#1978) 2025-03-10 19:02:17 +08:00
Fangjun Kuang
488a6e687c
Add C++ runtime for speech enhancement GTCRN models (#1977)
See also https://github.com/Xiaobin-Rong/gtcrn
2025-03-10 18:11:16 +08:00
Fangjun Kuang
1e2328242d
Test using sherpa-onnx as a cmake subproject (#1961) 2025-03-06 12:12:56 +08:00
Fangjun Kuang
ed922e61b5
Fix publishing pre-built windows libraries (#1905) 2025-02-21 11:59:27 +08:00
Fangjun Kuang
b5d89d7bcb
Add Dart API for FireRedAsr AED Model (#1877) 2025-02-17 15:17:08 +08:00
Fangjun Kuang
b03f6e6e8c
Add Swift API for FireRedAsr AED Model (#1876) 2025-02-17 15:16:23 +08:00