synthetic-dataset-generation

Context-aware, pluggable and customizable data protection and de-identification SDK for text, images and structured data.

data-loss-prevention dlp Python pii anonymization privacy data-protection data-anonymization presidio text-anonymization de-identification data-masking privacy-protection Microsoft transformers pii-detection llms synthetic-dataset-generation

Python 4.43 k

6 days ago

argilla-io / distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

artificial-intelligence huggingface llms openai Python rlhf synthetic-data synthetic-dataset-generation

Python 2.63 k

5 days ago

Eladlev / AutoPrompt

A framework for prompt tuning using Intent-based Prompt Calibration

prompt-engineering prompt-tuning synthetic-dataset-generation

Python 2.47 k

2 days ago

bespokelabsai / curator

#Natural Language Processing# Synthetic data curation for post-training and structured data extraction

synthetic-data agents large-language-models prompt Python synthetic-dataset-generation deep-learning fine-tuning instruction-tuning machine-learning natural-language-processing

Python 1.2 k

1 day ago

datadreamer-dev / DataDreamer

#Natural Language Processing# DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤

deep-learning machine-learning natural-language-processing nlp-library Python PyTorch transformers alignment fine-tuning gpt instruction-tuning large-language-models llmops llms openai synthetic-data synthetic-dataset-generation

Python 1.01 k

2 months ago

Unity-Technologies / com.unity.perception

#Computer Science# Perception toolkit for sim2real training and validation in Unity

perception object-detection detection computer-vision deep-learning synthetic-dataset-generation domain-randomization pose-estimation machine-learning segmentation

C# 954

5 months ago

BatsResearch / bonito

#Large Language Models# A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.

large-language-models synthetic-data synthetic-dataset-generation zero-shot-learning domain-adaptation gpt task-adaptation

Python 764

1 month ago

nicolas-hbt / pygraft

#Computer Science# Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips

data-generator graph-generator knowledge-base knowledge-graph ontology schema Semantic Web ontology-generation Python synthetic-data synthetic-dataset-generation contributions-welcome owl RDF (Resource Description Framework) artificial-intelligence benchmarking linked-data machine-learning semantics

Python 681

9 months ago

magpie-align / magpie

#Natural Language Processing# [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!

alignment llama2 llama3 large-language-models natural-language-processing Bukkit phi3 qwen2 synthetic-data synthetic-dataset-generation dataset gemma

Python 673

1 month ago