☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
#Computer Science# 🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Logical Replication extension for PostgreSQL 17, 16, 15, 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgra...
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Translation - Scalable fuzzy matching for data mastering, deduplication, and entity resolution.
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
A block-based API for NSValueTransformer, with a growing collection of useful examples.
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
#Time-Series Database# Advanced and Fast Data Transformation in R
Microsoft Program Synthesis using Examples SDK is a framework of technologies for the automatic generation of programs from input-output examples. This repo includes samples and sample data for the Mi...
💄 Durable and asynchronous data imports for consuming data at scale and publishing testable SDKs.
Like awk, but with SQL and table joins
Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)
📄 Concise selector to extract JSON from HTML.
#Time-Series Database# An Extensible Suite of High-Performance and Low-Dependency Packages for Statistical Computing and Data Manipulation in R
#Algorithm Practice# O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
#Computer Science# A simple Spark-powered ETL framework that just works 🍺
#Natural Language Processing# A curated list of Clojure resources for dealing with domain-specific languages.
Clojure Query: A Command-line Data Processor for JSON, YAML, EDN, XML and more
Data transformation and utility functions for R
#Algorithm Practice# Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University