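#Data warehouse#The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Translation - Find label errors in datasets and learn with noisy labels.
#Computer science#Refine high-quality datasets and visual AI models
Translation - An open-source tool for building high-quality datasets and computer vision models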
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
A light-weight, flexible, and expressive statistical data testing library
Jupyter notebook and datasets from the pandas video series
#Natural language processing#General Assembly's 2015 Data Science course in Washington, DC
#Computer science#🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
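#Computer science#Prepping tables for machine learning
#Large language model#An open-source educational chat model from ICALK, East China Normal University. An open-source Chinese-English educational dialogue LLM (general-purpose base model, GPU deployment, data cleaning). Tribute to: LLaMA, MOSS, BELLE, Ziya, vLLM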
Schema-Inspector is a simple JavaScript object sanitization and validation module.
Easy-to-use Python library of customized functions for cleaning and analyzing data.
#Computer science#The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algor...
#Computer science#🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).
#Computer science#Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
#Computer science#Data Science Feature Engineering and Selection Tutorials
Exploratory data analysis 📊 using Python 🐍 of a used car 🚘 database taken from Kaggle