Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
A framework for prompt tuning using Intent-based Prompt Calibration
#Natural Language Processing# DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
#Computer Science# Perception toolkit for sim2real training and validation in Unity
#Natural Language Processing# Synthetic data curation for post-training and structured data extraction
#Large Language Models# A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
#Computer Science# Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips
#Natural Language Processing# [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!
A curated list of awesome projects which use Machine Learning to generate synthetic content.
#Computer Science# NVIDIA Deep learning Dataset Synthesizer (NDDS)
#Computer Science# Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
#Computer Science# Generate large synthetic datasets using an LLM
#Computer Science# SynthDet - An end-to-end object detection pipeline using synthetic data
#Computer Science# Unity's privacy-preserving human-centric synthetic data generator
Random dataframe and database table generator
#Data Warehouse# [IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
Compose multimodal datasets 🎹
[NeurIPS D&B Track 2024] Official implementation of HumanVid
#Data Warehouse# awesome synthetic (text) datasets
#Computer Science# A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.