WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
翻译 - NeMo:用于对话式AI的工具包
PaddleSpeech 是基于飞桨 PaddlePaddle 的语音方向的开源模型库,用于语音和音频中的各种关键任务的开发,典型的应用包括:语音识别、语音翻译、语音合成等
#安卓# Vosk 是一个离线的语言识别工具。支持 Python, Java, Node.JS, C#, C++ ,能识别20+种语言,包括中文、英语、法语等。
🤖 wukong-robot 是一个简单、灵活、优雅的中文语音对话机器人/智能音箱项目,支持ChatGPT多轮对话能力,还可能是首个支持脑机交互的开源智能音箱项目。
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
翻译 - Silero模型:经过预先训练的STT模型和基准测试非常简单
Production First and Production Ready End-to-End Speech Recognition Toolkit
翻译 - 生产优先和生产就绪的端到端语音识别工具包
#大语言模型# Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), auto-speech-recognition (ASR), and text-to-speech ...
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
#安卓# Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployme...
This project provides an API with user level access support to transcribe speech to text using a finetuned and processed Whisper ASR model.
Synchronized Translation for Videos. Video dubbing
#计算机科学# 💬 An On-Premises, Streaming Speech Recognition System
翻译 - :speech_balloon:本地流式语音识别系统