Context-aware, pluggable, and customizable data protection and de-identification SDK for text, images, and structured data.
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
A framework for prompt tuning using Intent-based Prompt Calibration
#Natural Language Processing# Synthetic data curation for post-training and structured data extraction
#Natural Language Processing# DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
#Computer Science# Perception toolkit for sim2real training and validation in Unity
#Large Language Models# A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
#Computer Science# Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips
#Natural Language Processing# [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!
A curated list of awesome projects which use Machine Learning to generate synthetic content.
#Computer Science# NVIDIA Deep learning Dataset Synthesizer (NDDS)
#Computer Science# Generate large synthetic data using an LLM
#Computer Science# Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
#Computer Science# SynthDet - An end-to-end object detection pipeline using synthetic data
Compose multimodal datasets 🎹
#Computer Science# Unity's privacy-preserving human-centric synthetic data generator
Random dataframe and database table generator
#Data Warehouse# [IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
[NeurIPS D&B Track 2024] Official implementation of HumanVid
#Data Warehouse# awesome synthetic (text) datasets