#Data warehouse# OpenRefine (formerly Google Refine) is a powerful tool for data cleaning and transformation
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
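Translated description: Query and update data structures from the command line. Similar to jq / yq, but supports JSON, TOML, YAML and XML, with zero runtime dependencies.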
#Computer science# A Collection of Cheatsheets, Books, Questions, and Portfolio For DS/ML Interview Prep
#Computer science# Carefully curated resource links for data science in one place
#Data warehouse# Blazing-fast Data-Wrangling toolkit
#Large language models# ETL, Analytics, Versioning for Unstructured Data
#Time series database# A Python toolbox for gaining geometric insights into high-dimensional data
Zui is a powerful desktop application for exploring and working with data. The official front-end to the Zed lake.
Translated description: A desktop application for efficiently searching large packet captures and Zeek logs.
#Computer science# 🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
#Computer science# Prepping tables for machine learning
Statistical Inference via Data Science: A ModernDive into R and the Tidyverse
Microsoft Program Synthesis using Examples SDK is a framework of technologies for the automatic generation of programs from input-output examples. This repo includes samples and sample data for the Mi...
#Computer science# Materials for following along with Hands-On Data Analysis with Pandas – Second Edition
#Computer science# Materials for following along with Hands-On Data Analysis with Pandas.
Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algor...
An introductory workshop on pandas with notebooks and exercises for following along. Slides contain all solutions.
Translated description: A 3-hour introductory workshop on pandas, with notebooks and exercises for following along.
Data Analysis and Visualization in R for Ecologists
Package that processes and organizes data from Brazil's National Registry of Legal Entities (Cadastro Nacional da Pessoa Jurídica, CNPJ)
Like awk, but with SQL and table joins