A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
NeMo: a toolkit for conversational AI
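PaddleSpeech is an open-source speech model library built on PaddlePaddle for developing key speech and audio tasks; typical applications include speech recognition, speech translation, and speech synthesis.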
#Computer Science# End-to-End Speech Processing Toolkit
#Computer Science# Speech To Speech: an effort for an open-sourced and modular GPT4-o
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
#Natural Language Processing# Paper list of simultaneous translation / streaming translation, including text-to-text machine translation and speech-to-text translation.
A real-time speech transcription and translation application using OpenAI Whisper and a free translation API. Interface built with Tkinter. Written entirely in Python.
Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voice i...
#Natural Language Processing# Tracking the progress in end-to-end speech translation
#Large Language Models# MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering but not lim...
Zero -- A neural machine translation system
Code for the paper "Cross-modal Contrastive Learning for Speech Translation" (NAACL 2022)
Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation".
#Computer Science# Repository containing the open source code of works published at the FBK MT unit.
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation
#Awesome#List of direct speech-to-speech translation papers.
Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".
Source code for ACL 2023 paper "End-to-End Simultaneous Speech Translation with Differentiable Segmentation"