#Computer Science#StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Self-Supervised Speech Pre-training and Representation Learning Toolkit
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
#Computer Science#A low-bitrate single-codebook 16 kHz speech codec based on focal modulation
This repo contains the source code of the first deep learning-based singing voice beat tracking system. It leverages WavLM and DistilHuBERT pre-trained speech models to create vocal embeddings and trai...
A neural speech codec based on discrete WavLM representations
A collection of audio codecs with a standardized API
In this repository, the WavLM model is applied to good-quality and poor-quality data for a speaker verification task, and the PyCM library is used for evaluation.
#Computer Science#This repository contains the code for the main part of my master's thesis at Politecnico di Torino in Data Science & Engineering
SOTA method for self-supervised speaker verification leveraging a large-scale pretrained ASR model.
This repository combines `WavLM`, a powerful speech representation model from Microsoft, with `MSDD` (Multi-Scale Diarization Decoder), a state-of-the-art approach for speaker diarization from Nvidia...
Universal Pooling Method for Speaker Verification Utilizing Pre-trained Multi-layer Features, 2025 preprint
This repo contains code used in the paper "Characterizing the temporal dynamics of universal speech representations for generalizable deepfake detection"
WavLM Large + RawNetX Speaker Verification Base: End-to-End Speaker Verification Architecture
Acoustic Transformer Models for Audio Classification
CryCeleb2023 experiments