Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, ...
#Computer Science#[CVPR 2025] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
A family of diffusion models for text-to-audio generation.
A web UI for different audio-related neural networks
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
#Computer Science#Implementation of NÜWA, a state-of-the-art attention network for text-to-video synthesis, in PyTorch
OpenMusic: SOTA Text-to-music (TTM) Generation
#Large Language Models#🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).
Mustango: Toward Controllable Text-to-Music Generation
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
Official code and models for the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"
Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.
Subtitle to audio: generate audio from any subtitle file using Coqui-ai TTS and synchronize the audio timing with the subtitle timestamps.
PyTorch implementation of SoundCTM
PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
#Large Language Models#Soundstorm is a cutting-edge AI-powered audio manipulation application designed to provide a rich yet simplified experience for sound designers, algorithmic composers, and experimental audio enthusias...
#Computer Science#Creative Text-to-Audio Generation via Synthesizer Programming @ ICML'24