Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
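Logstash is an open-source, real-time log collection engine built around pipelines. It can dynamically unify data from disparate sources and store the normalized data in the destination of your choice. It cleanses and democratizes all your data for analytics and visualization.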
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
This repository is a getting started guide to Singer.
Making data lakes work for time series
A scalable, general-purpose micro-framework for defining dataflows. This repository has been moved to www.github.com/dagworks-inc/hamilton
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
A simplified, lightweight ETL framework based on Apache Spark
Extract, Transform, Load: any SQL database in 4 lines of code.
Knowledge Graph Toolkit
A tool for building feature stores.
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
Extract, Transform, Index Data. CocoIndex is the world's first open-source engine that supports both custom transformation logic and incremental updates specialized for data indexing.
Configurable Extract, Transform, and Load