Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
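Logstash is an open-source, real-time log collection engine built around pipelines. It can dynamically unify data from disparate sources and store the normalized data in a destination of your choice, cleansing and democratizing all your data for analytics and visualization.
A high-performance ELT framework, powered by Apache Arrow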
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
This repository is a getting started guide to Singer.
Making data lake work for time series
A scalable, general-purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
A simplified, lightweight ETL Framework based on Apache Spark
Extract, Transform, Load: Any SQL Database in 4 lines of Code.
Knowledge Graph Toolkit
A tool for building feature stores.
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
Structured Data Extractor for AI Agents. Search your documents or the web for specific data and get it back in JSON or Markdown in a single tool call.
Configurable Extract, Transform, and Load