Official implementation of the paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models
[ECCV 2024 Oral] EDTalk - Official PyTorch Implementation
[ECCV 2022] & [IJCV 2024] Official implementation of the paper: Audio-Visual Segmentation (with Semantics)
We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into existing MFMs.
Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018
[CVPR 2022] Code for "Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation"
Co-Separating Sounds of Visual Objects (ICCV 2019)
[CVPR 2021] PyTorch implementation of the paper: Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
[CVPR 2021] Positive Sample Propagation along the Audio-Visual Event Line
ABAW3 (CVPRW): A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
IEEE T-BIOM: "Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention"
[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
Unofficial implementation of the paper "Objects that Sound" (ECCV 2018)
Official Codebase of "Localizing Visual Sounds the Easy Way" (ECCV 2022)
[TPAMI 2023] Contrastive Positive Sample Propagation along the Audio-Visual Event Line
Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".
FG 2021: Cross-Attentional Audio-Visual Fusion for Dimensional Emotion Recognition