text-to-audio · GitHub Topics

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, ...

audio-generation audio-synthesis audioldm music-generation naturalspeech2 singing-voice-conversion speech-synthesis text-to-audio text-to-speech vall-e voice-conversion audit fastspeech2 vits emilia maskgct vocoder

Python 8.93 k

11 hours ago

hkchengrex / MMAudio

#Computer Science# [CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

audio audio-synthesis computer-vision deep-learning text-to-audio

Python 1.3 k

4 days ago

declare-lab / tango

A family of diffusion models for text-to-audio generation.

audio-generation diffusion diffusion-models language-models large-language-models text-to-audio

Python 1.16 k

3 months ago

gitmylo / audio-webui

A webui for different audio related Neural Networks

artificial-intelligence audioldm bark rvc text-to-audio text-to-speech voice-cloning audiocraft music generative-music tts aio all-in-one

Python 1.15 k

8 months ago

ictnlp / StreamSpeech

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

speech speech-recognition speech-synthesis speech-to-text speech-translation translation all-in-one machine-translation streaming-audio text-to-speech asr tts voice text-to-audio non-autoregressive speech-enhancement audio-processing speech-processing

Python 1.06 k

8 months ago

declare-lab / TangoFlux

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching

generative-ai text-to-audio

Jupyter Notebook 702

1 month ago

Text-to-Audio / Make-An-Audio

PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model

diffusion-models latent-diffusion latent-space text-to-audio

Python 645

1 year ago

ivcylc / OpenMusic

OpenMusic: SOTA Text-to-music (TTM) Generation

artificial-intelligence diffusion-models music-generation text-to-audio ai-music audioldm diffusion-transformer dit hifi-gan vall-e

Python 550

2 months ago

lucidrains / nuwa-pytorch

#Computer Science# Implementation of NÜWA, a state-of-the-art attention network for text-to-video synthesis, in PyTorch

artificial-intelligence deep-learning transformers attention-mechanism text-to-video text-to-audio

Python 546

2 years ago

YingqingHe / Awesome-LLMs-meet-Multimodal-Generation

#Large Language Models# 🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).

aigc large-language-models large-vision-language-models multimodal-generation multimodal-large-language-models multimodal-models multimodality text-to-3d text-to-audio text-to-image text-to-speech text-to-video llm mllm

HTML 453

9 days ago

AMAAI-Lab / mustango

Mustango: Toward Controllable Text-to-Music Generation

diffusion-models large-language-models text-to-audio

Python 357

24 days ago

haidog-yaqub / EzAudio

High-quality Text-to-Audio Generation with Efficient Diffusion Transformer

diffusion-models generative-ai text-to-audio

Python 261

1 month ago

happylittlecat2333 / Auffusion

Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"

audio-generation diffusion diffusion-models large-language-models text-to-audio

Jupyter Notebook 181

1 year ago

ilaria-manco / word2wave

Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.

text-to-audio audio-generation music-generation ai-music

Python 119

3 years ago

bnsantoso / sub-to-audio

Subtitle to audio: generate audio from any subtitle file using Coqui-ai TTS and synchronize the audio timing to the subtitle timestamps.

text-to-audio text-to-speech Python tts audio-processing

Python 114

1 year ago

sony / soundctm

PyTorch implementation of SoundCTM

audio-generation diffusion-models PyTorch text-to-audio

Python 87

13 days ago

keonlee9420 / WaveGrad2

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

text-to-speech neural-tts audio synthesis non-autoregressive score-matching duration robust PyTorch tts speech-synthesis text-to-audio end-to-end

Python 69

4 years ago

RhythrosaLabs / soundstorm

#Large Language Models# Soundstorm is a cutting-edge AI-powered audio manipulation application designed to provide a rich yet simplified experience for sound designers, algorithmic composers, and experimental audio enthusias...

algorithmic-composition audio-processing chat-gpt chatbot ChatGPT gpt gpt-4 MIDI sound sound-processing text-to-audio

Python 32

1 year ago