WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
翻译 - NeMo:用于对话式AI的工具包
PaddleSpeech 是基于飞桨 PaddlePaddle 的语音方向的开源模型库,用于语音和音频中的各种关键任务的开发,典型的应用包括:语音识别、语音翻译、语音合成等
#计算机科学#A PyTorch-based Speech Toolkit
翻译 - 基于Pytorch的语音工具包
#安卓#Vosk 是一个离线的语言识别工具。支持 Python, Java, Node.JS, C#, C++ ,能识别20+种语言,包括中文、英语、法语等。
#大语言模型#🤖 wukong-robot 是一个简单、灵活、优雅的中文语音对话机器人/智能音箱项目,支持ChatGPT多轮对话能力,还可能是首个支持脑机交互的开源智能音箱项目。
#安卓#Sherpa-ONNX 是一个轻量级语音识别框架, 基于 Kaldi 和 onnxruntime,无需联网即可实现语音转文本、文本转语音、说话人分离以及语音活动检测(VAD)。支持嵌入式系统、安卓、iOS、鸿蒙系统、树莓派、RISC-V、x86_64 服务器、WebSocket 服务器 / 客户端,以及 C/C++、Python、Kotlin、C#、Go、NodeJS、Java、Swift、Dart、JavaScript、Flutter、Object Pascal、Lazarus、Rust 等编程语言。
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
翻译 - Silero模型:经过预先训练的STT模型和基准测试非常简单
#大语言模型#TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaking...
#大语言模型#Multilingual Voice Understanding Model
#大语言模型#Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR...
Production First and Production Ready End-to-End Speech Recognition Toolkit
翻译 - 生产优先和生产就绪的端到端语音识别工具包
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
This is a python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require an API key nor a headless b...
#大语言模型#Streamer-Sales 销冠 —— 卖货主播 LLM 大模型🛒🎁,一个能够根据给定的商品特点从激发用户购买意愿角度出发进行商品解说的卖货主播大模型。🚀⭐内含详细的数据生成流程❗ 📦另外还集成了 LMDeploy 加速推理🚀、RAG检索增强生成 📚、TTS文字转语音🔊、数字人生成 🦸、 Agent 使用网络查询实时信息🌐、ASR 语音转文字🎙️、Vue 生态搭建前端🍍、F...
OpenAI Whisper ASR Webservice API
#计算机科学#pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are ...
翻译 - pytorch-kaldi是一个用于开发最先进的DNN / RNN混合语音识别系统的项目。 DNN部分由pytorch管理,而特征提取,标签计算和解码则通过kaldi工具箱执行。
#计算机科学#🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
翻译 - TSTT-用于语音转文本的深度学习工具包,在研发和生产中经过了实战测试