A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
#Computer Science#Human Emotion Understanding using a multimodal dataset.
#Computer Science#Transformer-based online speech recognition system with TensorFlow 2
#Large Language Models#Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
[ICASSP 2025] Official PyTorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners".
Official source code for the paper "Tailored Design of Audio-Visual Speech Recognition Models using Branchformers"
#Computer Science#End to End Multiview Lip Reading
(SLT 2024) Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition
(ICLR 2025) Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Kaldi-based audio-visual speech recognition
🤖 📼 Command-line tool for remixing videos with time-coded transcriptions.
Real-Time Audio-visual Speech Recognition
In this repository, I try to use k2, icefall and Lhotse for lip reading. I will modify it for the lip reading task. Many different lip-reading datasets should be added. -_-
Code related to the fMRI experiment on the contextual modulation of the McGurk Effect
#Computer Science#Human Emotion Understanding using a multimodal dataset