training-data · GitHub Topics

snorkel-team / snorkel

#Computer Science#A system for quickly generating training data with weak supervision

machine-learning artificial-intelligence weak-supervision labeling data-science Python snorkel training-data data-augmentation data-slicing

Python 5.84 k

1 year ago

diffgram / diffgram

#Data Warehouse#The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.

annotation annotation-tool training-data video-annotation data-annotation Kubernetes data-science data-analytics image-annotation machine-learning deep-learning data annotations dataset labeling datastore

Python 1.86 k

5 months ago

ydataai / ydata-synthetic

#Computer Science#Synthetic data generators for tabular and time-series data

Generative Adversarial Network deep-learning synthetic-data tensorflow2 machine-learning training-data Python timeseries gans PyTorch time-series

Jupyter Notebook 1.53 k

1 month ago

NorskRegnesentral / skweak

#Natural Language Processing#skweak: A software toolkit for weak supervision applied to NLP tasks

weak-supervision natural-language-processing distant-supervision nlp-library spaCy Python data-science training-data

Python 922

7 months ago

OvidijusParsiunas / myvision

#Computer Science#Computer vision based ML training data generation tool 🚀

machine-learning computer-vision object-detection training-data annotation labelling annotation-tool coco vgg Tensorflow yolo model vision image-annotation labeling-tool tagging Image artificial-intelligence

JavaScript 597

2 months ago

alteryx / compose

#Computer Science#A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels for supervised learning.

machine-learning automl data-science labeling-tool labeling artificial-intelligence training-data data-labeling

Python 505

12 days ago

a-maliarov / amazoncaptcha

Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.

captcha captcha-solver amazon Python pillow training-data data-extraction

Python 467

10 months ago

sparkfish / augraphy

#Computer Science#Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

data-augmentation deep-neural-networks training-data machine-learning data-pipeline image-processing synthetic-data synthetic-dataset-generation computer-vision

Python 404

13 days ago

Slava / label-tool

#Computer Science#Web application for image labeling and segmentation

image-labeling image-labeling-tool computer-vision machine-learning training-data segmentation computer-vision-tools image-annotation boundingbox data-labeling

JavaScript 352

2 years ago

d5555 / TagEditor

#Natural Language Processing#🏖TagEditor - Annotation tool for spaCy

annotation-tool spaCy coreference-resolution text-annotation labeling-tool natural-language-processing annotation machine-learning data-science neural-networks training-data named-entity-recognition

192

3 years ago

Geocene / trainset

#Computer Science#A lightweight web application for brushing labels onto time series data; useful for building training sets.

labeling-tool machine-learning training-data labeling painting time-series-classification

JavaScript 170

2 years ago

KennethEnevoldsen / augmenty

#Natural Language Processing#Augmenty is an augmentation library based on spaCy for augmenting texts.

augmentation spacy-extension spaCy natural-language-processing nlproc Python text-classification training-data

Python 153

1 year ago

avinashsen707 / AUBOi5-D435-ROS-DOPE

#Computer Science#Aubo i5 Dual Arm Collaborative Robot - RealSense D435 - 3D Object Pose Estimation - ROS

pose-estimation object-detection dataset deep-learning Ubuntu ros blender training-data

C++ 119

3 years ago

tzano / fountain

Natural Language Data Augmentation Tool for Conversational Systems

nlu data-generator chatbot training-data natural-language conversational-ai

Python 115

2 years ago

enginBozkurt / carla-training-data

#Computer Science#Generating training data from the Carla driving simulator in the KITTI dataset format

carla-simulator training-data deep-learning artificial-intelligence kitti-dataset autonomous-driving self-driving-car autonomous-vehicles

Python 108

6 years ago

rahul051296 / small-talk-rasa-stack

Collection of casual conversations that can be used with the Rasa Stack

smalltalk training-data conversational-ai dialogflow

Python 85

5 years ago

google-research-datasets / swim-ir

#Natural Language Processing#SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 languages, generated using PaLM 2 and summarize-then-ask promptin...

dataset deep-learning information-retrieval machine-learning multilingual natural-language-processing training-data

1 year ago