A Doctor for your data
#Computer Science# A curated, but incomplete, list of data-centric AI resources.
#Natural Language Processing# PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets
#Large Language Models# A list of data-efficient and data-centric LLM (Large Language Model) papers. Our survey paper: Towards Efficient LLM Post Training: A Data-centric Perspective
#Computer Science# Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data (NeurIPS 2022)
#Computer Science# Enhancing Efficiency in Multidevice Federated Learning through Data Selection
TRIAGE: Characterizing and auditing training data for improved regression (NeurIPS 2023)
#Computer Science# Data-SUITE: Data-centric identification of in-distribution incongruous examples (ICML 2022)
You can’t handle the (dirty) truth: Data-centric insights improve pseudo-labeling
#Computer Science# Code for our paper "Towards Trustworthy Dataset Distillation" (Pattern Recognition 2025)
Collaboratively Learning Federated Models from Noisy Decentralized Data
#Computer Science# A multi-view panorama of Data-Centric AI: Techniques, Tools, and Applications (ECAI Tutorial 2024)
#Computer Science# Implementation of data typology for imbalanced datasets.