Official implementation of the paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models
[ECCV 2024 Oral] EDTalk - Official PyTorch Implementation
[ECCV 2022] & [IJCV 2024] Official implementation of the paper: Audio-Visual Segmentation (with Semantics)
We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into existing MFMs.
Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018
[CVPR 2022] Code for "Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation"
Co-Separating Sounds of Visual Objects (ICCV 2019)
[CVPR 2021] PyTorch implementation of the paper: Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
[CVPR 2021] Positive Sample Propagation along the Audio-Visual Event Line
ABAW3 (CVPRW): A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
IEEE T-BIOM: "Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention"
[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
Unofficial implementation of the paper "Objects that Sound" (ECCV 2018)
Official Codebase of "Localizing Visual Sounds the Easy Way" (ECCV 2022)
[TPAMI 2023] Contrastive Positive Sample Propagation along the Audio-Visual Event Line
Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".
FG 2021: Cross-Attentional Audio-Visual Fusion for Dimensional Emotion Recognition