#Computer Science# A system for quickly generating training data with weak supervision
#Data Warehouse# The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.
#Computer Science# Synthetic data generators for tabular and time-series data
#Natural Language Processing# skweak: A software toolkit for weak supervision applied to NLP tasks
#Computer Science# Computer vision based ML training data generation tool 🚀
#Computer Science# A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels for supervised learning.
Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
#Computer Science# Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
#Computer Science# Web application for image labeling and segmentation
#Natural Language Processing# 🏖TagEditor - Annotation tool for spaCy
#Computer Science# A lightweight web application for brushing labels onto time series data; useful for building training sets.
#Natural Language Processing# Augmenty is an augmentation library based on spaCy for augmenting texts.
Natural Language Data Augmentation Tool for Conversational Systems
#Computer Science# Aubo i5 Dual Arm Collaborative Robot - RealSense D435 - 3D Object Pose Estimation - ROS
#Computer Science# Generating training data from the Carla driving simulator in the KITTI dataset format
Collection of casual conversations that can be used with the Rasa Stack
#Natural Language Processing# SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 languages, generated using PaLM 2 and summarize-then-ask promptin...
COVID-19 cough files for training AI models
#Large Language Models# Convert all files in a git repository to .txt files. Useful for training LLMs on your codebase.
#Computer Science# Full resources supporting the publication "A Pragmatic Guide to Geoparsing Evaluation."