Mirror of https://github.com/k2-fsa/sherpa-onnx.git (synced 2026-01-09 07:41:06 +08:00)

FunASR-nano: switch to unified KV-cache LLM (#2995)

This PR updates FunASR-nano inference from the prefill+decode dual-model pipeline to a single unified KV-cache model.

## Summary

Previously, FunASR-nano required two separate ONNX models:
- `llm_prefill.onnx`
- `llm_decode.onnx`

This PR switches to a single model:
- `llm.onnx`

The new pipeline uses a static KV cache + KV-delta incremental update mechanism, and relies on `cache_position` to differentiate prefill vs. decode steps. This significantly simplifies model/session management and reduces deployment complexity.
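The PR text does not show the model internals, so the following is only a minimal NumPy sketch of the idea, assuming a toy single-layer, single-head attention. It shows how a fixed-shape ("static") KV cache, KV-delta writes, and `cache_position` let one set of weights serve both prefill and decode. `StaticKVCache`, `attend`, and `is_prefill` are hypothetical names, and treating a step whose first position is 0 as prefill is an assumption for illustration, not necessarily how `llm.onnx` decides.

```python
import numpy as np


class StaticKVCache:
    """Toy single-layer, single-head KV cache with a fixed shape (hypothetical).

    Buffers are allocated once for `max_len` positions and never resized.
    Each step supplies only the K/V rows for its new tokens (the "delta"),
    which are written in place at the slots named by `cache_position`.
    """

    def __init__(self, max_len: int, head_dim: int) -> None:
        self.k = np.zeros((max_len, head_dim), dtype=np.float32)
        self.v = np.zeros((max_len, head_dim), dtype=np.float32)
        self.length = 0  # number of filled positions

    def next_positions(self, num_new: int) -> np.ndarray:
        # Absolute sequence positions for the incoming tokens.
        return np.arange(self.length, self.length + num_new)

    def update(self, cache_position: np.ndarray, k_delta: np.ndarray, v_delta: np.ndarray) -> None:
        # KV-delta: write only the new rows; earlier rows stay untouched.
        self.k[cache_position] = k_delta
        self.v[cache_position] = v_delta
        self.length = int(cache_position[-1]) + 1


def is_prefill(cache_position: np.ndarray) -> bool:
    # Assumption for illustration: a step that starts at position 0 is prefill.
    return int(cache_position[0]) == 0


def attend(cache: StaticKVCache, q: np.ndarray, cache_position: np.ndarray) -> np.ndarray:
    """Causal attention of the new queries over all filled cache rows."""
    keys = cache.k[: cache.length]
    values = cache.v[: cache.length]
    scores = q @ keys.T / np.sqrt(q.shape[-1])
    # A query at position p may only see keys at positions <= p.
    mask = np.arange(cache.length)[None, :] > cache_position[:, None]
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values


rng = np.random.default_rng(0)
cache = StaticKVCache(max_len=16, head_dim=4)

# Prefill: all 5 prompt tokens in one call.
pos = cache.next_positions(5)
cache.update(pos, rng.standard_normal((5, 4)), rng.standard_normal((5, 4)))
out = attend(cache, rng.standard_normal((5, 4)), pos)
print(is_prefill(pos), out.shape, cache.length)  # True (5, 4) 5

# Decode: one token per call, same buffers, same code path.
pos = cache.next_positions(1)
cache.update(pos, rng.standard_normal((1, 4)), rng.standard_normal((1, 4)))
out = attend(cache, rng.standard_normal((1, 4)), pos)
print(is_prefill(pos), out.shape, cache.length)  # False (1, 4) 6
```

Because the buffers never change shape, the same graph handles a multi-token prefill and single-token decode steps; only `cache_position` and the number of new rows differ between the two.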

## Key changes

- **Single LLM session / single model file**: `llm.onnx` replaces `llm_prefill.onnx` + `llm_decode.onnx`.

- **Unified KV-cache implementation**:
  - static KV cache layout
  - KV-delta update for decode
  - `cache_position` distinguishes prefill vs. decode behavior

- **Config changes (breaking)**:
  - `funasr_nano.llm_prefill` and `funasr_nano.llm_decode` are deprecated/removed
  - use only `funasr_nano.llm`

- **Not backward compatible**:
  - users must re-export models in KV-delta/unified-KV format

- **Trade-off**: inference is slightly slower, but VRAM usage is lower because the LLM weights are no longer duplicated across two models

2026-01-07 10:41:53 +08:00

| Name | Last commit | Date |
|------|-------------|------|
| `web` | Support TDNN models from the yesno recipe from icefall (#262) | 2023-08-12 19:50:22 +08:00 |
| `add-punctuation-online.py` | Add Python binding for online punctuation models (#1312) | 2024-09-09 10:26:53 +08:00 |
| `add-punctuation.py` | Add Python API for punctuation models. (#762) | 2024-04-13 13:28:17 +08:00 |
| `audio-tagging-from-a-file-ced.py` | Add Python API example for CED audio tagging. (#793) | 2024-04-19 18:33:18 +08:00 |
| `audio-tagging-from-a-file.py` | Add Python API and Python examples for audio tagging (#753) | 2024-04-11 11:12:48 +08:00 |
| `generate-subtitles.py` | Add Parakeet TDT model for generating subtitles (#2649) | 2025-10-07 13:39:11 +08:00 |
| `http_server.py` | add streaming-server with web client (#164) | 2023-05-30 22:46:52 +08:00 |
| `inverse-text-normalization-offline-asr.py` | Support onnxruntime 1.18.0 (#906) | 2024-07-10 17:05:26 +08:00 |
| `inverse-text-normalization-online-asr.py` | Add inverse text normalization for online ASR (#1020) | 2024-06-17 18:39:23 +08:00 |
| `keyword-spotter-from-microphone.py` | Fix keyword spotting. (#1689) | 2025-01-20 16:41:10 +08:00 |
| `keyword-spotter.py` | Fix keyword spotting. (#1689) | 2025-01-20 16:41:10 +08:00 |
| `non_streaming_server.py` | Modify the model used (#1855) | 2025-02-13 15:08:04 +08:00 |
| `offline-decode-files.py` | Add LODR support to online and offline recognizers (#2026) | 2025-07-09 16:23:46 +08:00 |
| `offline-dolphin-ctc-decode-files.py` | Add C++ and Python API for Dolphin CTC models (#2085) | 2025-04-02 19:09:00 +08:00 |
| `offline-fire-red-asr-decode-files.py` | Add C++ and Python API for Omnilingual ASR models. (#2772) | 2025-11-13 12:25:30 +08:00 |
| `offline-funasr-nano-decode-files.py` | FunASR-nano: switch to unified KV-cache LLM (#2995) | 2026-01-07 10:41:53 +08:00 |
| `offline-medasr-ctc-decode-files.py` | Add C++ runtime and Python API for Google MedASR models (#2935) | 2025-12-25 17:25:56 +08:00 |
| `offline-moonshine-decode-files.py` | Add C++ and Python API for Omnilingual ASR models. (#2772) | 2025-11-13 12:25:30 +08:00 |
| `offline-nemo-canary-decode-files.py` | Add C++ and Python API for Omnilingual ASR models. (#2772) | 2025-11-13 12:25:30 +08:00 |
| `offline-nemo-ctc-decode-files.py` | Add C++ runtime for non-streaming faster conformer transducer from NeMo. (#854) | 2024-05-10 12:15:39 +08:00 |
| `offline-nemo-parakeet-decode-file.py` | Add C++ and Python API for Omnilingual ASR models. (#2772) | 2025-11-13 12:25:30 +08:00 |
| `offline-nemo-transducer-decode-files.py` | Add C++ runtime for non-streaming faster conformer transducer from NeMo. (#854) | 2024-05-10 12:15:39 +08:00 |
| `offline-omnilingual-asr-ctc-decode-files.py` | Add C++ and Python API for Omnilingual ASR models. (#2772) | 2025-11-13 12:25:30 +08:00 |
| `offline-sense-voice-ctc-decode-files-with-hr.py` | Remove cppjieba (#2664) | 2025-10-10 10:54:32 +08:00 |
| `offline-sense-voice-ctc-decode-files.py` | Add C++ and Python API for Omnilingual ASR models. (#2772) | 2025-11-13 12:25:30 +08:00 |
| `offline-source-separation-spleeter.py` | Add Python API for source separation (#2283) | 2025-06-05 20:44:26 +08:00 |
| `offline-source-separation-uvr.py` | Add Python API for source separation (#2283) | 2025-06-05 20:44:26 +08:00 |
| `offline-speaker-diarization.py` | Alex/feat add python example (#2490) | 2025-08-14 20:44:38 +08:00 |
| `offline-speech-enhancement-gtcrn.py` | Add Python API for speech enhancement GTCRN models (#1978) | 2025-03-10 19:02:17 +08:00 |
| `offline-telespeech-ctc-decode-files.py` | Add C++ runtime for Tele-AI/TeleSpeech-ASR (#970) | 2024-06-05 00:26:40 +08:00 |
| `offline-tts-play.py` | Remove cppjieba (#2664) | 2025-10-10 10:54:32 +08:00 |
| `offline-tts.py` | Remove cppjieba (#2664) | 2025-10-10 10:54:32 +08:00 |
| `offline-websocket-client-decode-files-paralell.py` | Add non-streaming websocket server for python (#259) | 2023-08-11 15:56:24 +08:00 |
| `offline-websocket-client-decode-files-sequential.py` | Add non-streaming websocket server for python (#259) | 2023-08-11 15:56:24 +08:00 |
| `offline-whisper-decode-files.py` | Add C++ and Python API for Omnilingual ASR models. (#2772) | 2025-11-13 12:25:30 +08:00 |
| `offline-zeroshot-tts.py` | Use a shorter name for Zipvoice models. (#2894) | 2025-12-11 20:30:19 +08:00 |
| `offline-zipformer-ctc-decode-files.py` | Support non-streaming zipformer CTC ASR models (#2340) | 2025-07-04 15:57:07 +08:00 |
| `online-decode-files.py` | Add LODR support to online and offline recognizers (#2026) | 2025-07-09 16:23:46 +08:00 |
| `online-nemo-ctc-decode-files.py` | Add C++ support for streaming NeMo CTC models. (#857) | 2024-05-10 16:26:43 +08:00 |
| `online-t-one-ctc-decode-files.py` | Add C++ and Python support for T-one streaming Russian ASR models (#2575) | 2025-09-09 12:07:34 +08:00 |
| `online-websocket-client-decode-file.py` | Support streaming zipformer CTC (#496) | 2023-12-22 13:46:33 +08:00 |
| `online-websocket-client-microphone.py` | Add non-streaming websocket server for python (#259) | 2023-08-11 15:56:24 +08:00 |
| `online-zipformer-ctc-hlg-decode-file.py` | Add HLG decoding for streaming CTC models (#731) | 2024-04-03 21:31:42 +08:00 |
| `README.md` | Add VAD + Non-streaming ASR Python example. (#332) | 2023-09-22 11:53:47 +08:00 |
| `simulate-streaming-paraformer-microphone.py` | Add simulate streaming ASR Python example for Paraformer (#2839) | 2025-12-01 11:54:11 +08:00 |
| `simulate-streaming-sense-voice-microphone.py` | Remove cppjieba (#2664) | 2025-10-10 10:54:32 +08:00 |
| `speaker-identification-with-vad-dynamic.py` | Add Java/Kotlin API and Android support for ten-vad (#2389) | 2025-07-12 19:55:37 +08:00 |
| `speaker-identification-with-vad-non-streaming-asr-alsa.py` | Add C++ and Python API for Omnilingual ASR models. (#2772) | 2025-11-13 12:25:30 +08:00 |
| `speaker-identification-with-vad-non-streaming-asr.py` | Add Java/Kotlin API and Android support for ten-vad (#2389) | 2025-07-12 19:55:37 +08:00 |
| `speaker-identification-with-vad.py` | Add Java/Kotlin API and Android support for ten-vad (#2389) | 2025-07-12 19:55:37 +08:00 |
| `speaker-identification.py` | Replace torchaudio with soundfile in python-api-examples (#765) | 2024-04-13 23:39:07 +08:00 |
| `speech-recognition-from-microphone-with-endpoint-detection-alsa.py` | Remove cppjieba (#2664) | 2025-10-10 10:54:32 +08:00 |
| `speech-recognition-from-microphone-with-endpoint-detection.py` | Remove cppjieba (#2664) | 2025-10-10 10:54:32 +08:00 |
| `speech-recognition-from-microphone.py` | Remove cppjieba (#2664) | 2025-10-10 10:54:32 +08:00 |
| `speech-recognition-from-url.py` | Remove cppjieba (#2664) | 2025-10-10 10:54:32 +08:00 |
| `spoken-language-identification.py` | Support spoken language identification with whisper (#694) | 2024-03-24 22:57:00 +08:00 |
| `streaming_server.py` | 'update20241203' (#1589) | 2024-12-04 09:22:24 +08:00 |
| `streaming-paraformer-asr-microphone.py` | Fix displaying streaming speech recognition results for Python. (#2196) | 2025-05-09 21:48:49 +08:00 |
| `two-pass-speech-recognition-from-microphone.py` | Fix displaying streaming speech recognition results for Python. (#2196) | 2025-05-09 21:48:49 +08:00 |
| `two-pass-wss.py` | Add C++ and Python API for Omnilingual ASR models. (#2772) | 2025-11-13 12:25:30 +08:00 |
| `vad-alsa.py` | Add Java/Kotlin API and Android support for ten-vad (#2389) | 2025-07-12 19:55:37 +08:00 |
| `vad-microphone.py` | Add Java/Kotlin API and Android support for ten-vad (#2389) | 2025-07-12 19:55:37 +08:00 |
| `vad-remove-non-speech-segments-alsa.py` | Add Java/Kotlin API and Android support for ten-vad (#2389) | 2025-07-12 19:55:37 +08:00 |
| `vad-remove-non-speech-segments-from-file.py` | Add Java/Kotlin API and Android support for ten-vad (#2389) | 2025-07-12 19:55:37 +08:00 |
| `vad-remove-non-speech-segments.py` | Add Java/Kotlin API and Android support for ten-vad (#2389) | 2025-07-12 19:55:37 +08:00 |
| `vad-with-non-streaming-asr.py` | Remove cppjieba (#2664) | 2025-10-10 10:54:32 +08:00 |

README.md

| File | Description |
|------|-------------|
| `./http_server.py` | It defines which files to serve. The files are saved in `./web`. |
| `non_streaming_server.py` | WebSocket server for non-streaming models. |
| `vad-remove-non-speech-segments.py` | It uses silero-vad to remove non-speech segments and concatenate all speech segments into a single one. |
| `vad-with-non-streaming-asr.py` | It shows how to use VAD with a non-streaming ASR model for speech recognition from a microphone. |
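The last two rows describe two VAD patterns: dropping non-speech and concatenating the remaining speech, and handing each speech segment to a non-streaming recognizer. The example scripts run silero-vad through sherpa-onnx; the sketch below instead swaps in a simple frame-energy detector (not silero-vad) so it runs with NumPy alone. `energy_vad_segments`, `remove_non_speech`, `transcribe_segments`, the 30 ms frames, the RMS threshold, and the `recognize` callable are all hypothetical stand-ins, not sherpa-onnx APIs.

```python
import numpy as np


def energy_vad_segments(samples, sample_rate=16000, frame_ms=30,
                        threshold=0.01, min_silence_frames=10):
    """Hypothetical energy VAD (stand-in for silero-vad).

    Returns a list of (start, end) sample indices of speech spans.
    A span ends after `min_silence_frames` consecutive quiet frames.
    """
    samples = np.asarray(samples, dtype=np.float32)
    frame_len = sample_rate * frame_ms // 1000
    num_frames = len(samples) // frame_len
    segments, start, silence = [], None, 0
    for i in range(num_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        is_speech = float(np.sqrt(np.mean(frame ** 2))) > threshold
        if is_speech:
            if start is None:
                start = i * frame_len
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                # The last speech frame was i - silence.
                segments.append((start, (i - silence + 1) * frame_len))
                start, silence = None, 0
    if start is not None:  # speech ran up to (or near) the end of the audio
        segments.append((start, (num_frames - silence) * frame_len))
    return segments


def remove_non_speech(samples, segments):
    """Concatenate all speech spans into a single array."""
    samples = np.asarray(samples, dtype=np.float32)
    if not segments:
        return np.zeros(0, dtype=np.float32)
    return np.concatenate([samples[s:e] for s, e in segments])


def transcribe_segments(samples, segments, recognize):
    """Pass each speech span to `recognize` (any callable: array -> text)."""
    return [recognize(samples[s:e]) for s, e in segments]


sr = 16000
audio = np.concatenate([
    np.zeros(4800), np.full(4800, 0.5),   # 0.3 s silence, 0.3 s "speech"
    np.zeros(9600), np.full(4800, 0.5),   # 0.6 s silence, 0.3 s "speech"
    np.zeros(4800),                       # 0.3 s silence
]).astype(np.float32)

segs = energy_vad_segments(audio, sr)
print(segs)  # [(4800, 9600), (19200, 24000)]
print(len(remove_non_speech(audio, segs)))  # 9600
```

Keeping segmentation separate from recognition means the same span list can feed either concatenation or a per-segment recognizer, which mirrors how the two scripts differ only in what they do after the VAD.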