#Computer Science# 🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real time
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
#Computer Science# EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
#Natural Language Processing# ModelScope: bring the notion of Model-as-a-Service to life.
#Natural Language Processing# Officially maintained and supported by PaddlePaddle, covering CV, NLP, Speech, Rec, TS, large models, and more.
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
#Computer Science# Foundational model for human-like, expressive TTS
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
#Large Language Models# 💬 SpeechGPT is a web application that enables you to converse with ChatGPT.
#Computer Science# Data manipulation and transformation for audio signal processing, powered by PyTorch
Voice Recognition to Text Tool: an offline, locally run tool that converts audio and video to subtitles, with JSON, SRT subtitle, and plain-text output
#Computer Science# Noise suppression using deep filtering
Community list of startups working with AI in audio and music technology
#Computer Science# Multilingual and Controllable Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart.
SALMONN: Speech Audio Language Music Open Neural Network
State-of-the-art (ranked #1 Aug 2022) German Speech Recognition in 284 lines of C++. This is a 100% private 100% offline 100% free CLI tool.