asr · GitHub Topics

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

asr speech speech-recognition speech-to-text Whisper

Python 14.93 k

17 hours ago

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

machine-translation speaker-recognition asr tts generative-ai multimodal deep-learning neural-networks speaker-diariazation speech-translation speech-synthesis large-language-models

Python 13.62 k

5 hours ago

PaddlePaddle / PaddleSpeech

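PaddleSpeech is an open-source speech model library built on PaddlePaddle, used to develop a range of key tasks in speech and audio. Typical applications include speech recognition, speech translation, and speech synthesis.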

transformer conformer speech-translation streaming-asr speech-alignment punctuation-restoration streaming-tts speech-synthesis tts asr speech-recognition voice-cloning vocoder voice-recognition self-supervised-learning Whisper

Python 11.77 k

5 days ago

speechbrain / speechbrain

#Computer Science# A PyTorch-based Speech Toolkit

speech-recognition speech-toolkit speaker-recognition speech-to-text speech-enhancement speech-separation audio audio-processing speech-processing speechrecognition asr voice-recognition speaker-diarization speaker-verification PyTorch huggingface transformers language-model deep-learning

Python 9.67 k

2 days ago

alphacep / vosk-api

#Android# Vosk is an offline speech recognition toolkit. It supports Python, Java, Node.JS, C#, and C++, and recognizes 20+ languages, including Chinese, English, and French.

speech-recognition asr voice-recognition speech-to-text Android iOS raspberry-pi deep-learning deep-neural-networks speech-to-text-android speaker-verification Python offline privacy kaldi deepspeech vosk stt

Jupyter Notebook 9.23 k

1 month ago

wzpan / wukong-robot

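#Large Language Models# 🤖 wukong-robot is a simple, flexible, and elegant Chinese voice dialogue robot / smart speaker project. It supports multi-turn ChatGPT conversations and may be the first open-source smart speaker project to support brain-computer interaction.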

ai speaker asr tts unit Home Assistant raspeberry-pi amazon-echo alexa snowboy google-home anyq muse bci ChatGPT gpt3 openai

Python 6.78 k

6 months ago

k2-fsa / sherpa-onnx

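#Android# Sherpa-ONNX is a lightweight speech recognition framework based on Kaldi and onnxruntime. It provides speech-to-text, text-to-speech, speaker diarization, and voice activity detection (VAD) without an internet connection. It supports embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, x86_64 servers, and WebSocket server/client, along with programming languages such as C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, and Rust.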

asr onnx Windows Linux macOS C++ Android iOS raspberry-pi aarch64 arm32 C# .NET mfc speech-to-text text-to-speech vits RISC-V lazarus object-pascal

C++ 5.59 k

3 days ago

TEN-framework / TEN-Agent

#Large Language Models# TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaking...

agent gemini gpt-4 llm multimodal nextjs14 openai realtime voice-assistant C++ Go Python ai gpt-4o rag vision real-time asr low-latency tts

Python 5.59 k

1 day ago

FunAudioLLM / SenseVoice

#Large Language Models# Multilingual Voice Understanding Model

ai asr gpt-4o speech-recognition speech-to-text aigc llm Python PyTorch multilingual

Python 5.31 k

20 days ago

snakers4 / silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

speech-recognition speech-to-text stt asr pretrained-models english german spanish stt-benchmark PyTorch colab onnx text-to-speech speech speech-synthesis tts

Jupyter Notebook 5.22 k

1 year ago

xiangyuecn / Recorder

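HTML5 JS audio recording in mp3, wav, ogg, webm, amr, g711a, and g711u formats. Supports PC, some Android and iOS browsers, Hybrid Apps (Android and iOS app source code provided), and WeChat. Provides ASR speech-to-text, an H5 voice call and chat example, and DTMF encoding and decoding.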

recorder record JavaScript HTML h5 luyin mp3 wav amr ogg webm WebRTC audio recording asr

JavaScript 5.16 k

13 days ago

NexaAI / nexa-sdk

#Large Language Models# Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR...

asr edge-computing llm on-device-ai on-device-ml SDK stable-diffusion transformers tts vlm language-model sdk-python Whisper audio

Python 4.49 k

1 month ago

wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

e2e-models PyTorch asr transformer conformer production-ready automatic-speech-recognition speech-recognition Whisper

Python 4.44 k

15 days ago

MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

asr speaker-diarization speech speech-recognition speech-to-text Whisper

Jupyter Notebook 4.37 k

1 month ago

jdepoix / youtube-transcript-api

This is a python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require an API key nor a headless b...

youtube-api subtitles YouTube transcripts Python subtitle cli captions asr

Python 3.73 k

18 days ago

PeterH0323 / Streamer-Sales

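#Large Language Models# Streamer-Sales (Sales Champion): a sales livestreamer LLM 🛒🎁 that, given a product's features, presents the product in a way designed to spark users' desire to buy. 🚀⭐ Includes a detailed data generation pipeline ❗ 📦 Also integrates LMDeploy accelerated inference 🚀, RAG retrieval-augmented generation 📚, TTS text-to-speech 🔊, digital human generation 🦸, an Agent that queries the web for real-time information 🌐, ASR speech-to-text 🎙️, a front end built on the Vue ecosystem 🍍, F...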

chat-application internlm2 llm chatbot text-generation chat ChatGPT gpt rag tts asr digital-human

Python 3.16 k

1 month ago

tensorflow / lingvo

#Natural Language Processing# Lingvo

speech-recognition translation speech-to-text machine-translation mnist seq2seq language-model tts asr lm nlp Tensorflow speech research distributed gpu-computing speech-synthesis

Python 2.84 k

1 month ago

ahmetoner / whisper-asr-webservice

OpenAI Whisper ASR Webservice API

automatic-speech-recognition speech-recognition speech-to-text openai-whisper Docker asr speech

Python 2.52 k

2 months ago

coqui-ai / STT

#Computer Science# 🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

stt speech-to-text Tensorflow deep-learning automatic-speech-recognition asr voice-recognition speech-recognition

C++ 2.41 k

1 year ago

mravanelli / pytorch-kaldi

#Computer Science# pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.

speech-recognition gru dnn kaldi rnn-model PyTorch timit deep-learning deep-neural-networks recurrent-neural-networks multilayer-perceptron-network lstm speech asr rnn

Python 2.38 k

3 years ago