etl-framework · GitHub Topics

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

batch-processing kafka pathway Python streaming machine-learning real-time data-analytics data-pipelines data-processing dataflow etl etl-framework iot-analytics Rust stream-processing time-series-analysis

Python 24.05 k

20 hours ago

elastic / logstash

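Logstash is an open-source, real-time, pipeline-based log collection engine. It can dynamically unify data from disparate sources and store the normalized data in a destination of your choice. Cleanse and democratize all your data for analytics and visualization.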

etl-framework streaming Logging Java jruby real-time-processing

Java 14.43 k

1 day ago

cloudquery / cloudquery

A high-performance ELT framework, powered by Apache Arrow

Amazon Web Services Google Cloud Azure SQL data-integration elt etl etl-framework BigQuery data-collection data-engineering Kubernetes data airbyte GitHub API data-analysis Google Go cspm attack-surface-management

Go 6.07 k

20 hours ago

noflo / noflo

Flow-based programming for JavaScript

noflo fbp etl-framework flow-based-programming no-code

JavaScript 3.52 k

1 year ago

DAGWorks-Inc / hamilton

#Computer Science# Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

data-science Python dag data-engineering dataframe etl etl-framework etl-pipeline feature-engineering machine-learning pandas software-engineering data-analysis lineage llmops mlops orchestration Hacktoberfest rag

Jupyter Notebook 2.09 k

6 days ago

san089 / goodreads_etl_pipeline

An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.

etl-pipeline etl-framework Apache Spark apache-airflow airflow redshift emr-cluster livy s3 data-lake scheduler data-migration data-engineering data-engineering-pipeline Python etl-job

Python 1.36 k

5 years ago

singer-io / getting-started

This repository is a getting started guide to Singer.

etl etl-framework Python data-analysis

Makefile 1.29 k

7 months ago

marsupialtail / quokka

Making data lakes work for time series

data-lake-analytics distributed etl-framework mlops SQL

Python 1.16 k

8 months ago

stitchfix / hamilton

#Computer Science# A scalable, general-purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton

Python pandas dag data-science data-engineering NumPy software-engineering etl-framework etl-pipeline etl feature-engineering dataframe data-platform machine-learning

Python 861

2 years ago

Cinchoo / ChoETL

ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)

CSV Parser writer reader flat XML JSON keyvalue etl etl-framework C#.NET parquet YAML avro

C# 824

4 months ago

apache / seatunnel-web

SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).

apache data-integration data-pipeline etl-framework high-performance offline real-time seatunnel sql-engine

Java 662

13 hours ago

YotpoLtd / metorikku

A simplified, lightweight ETL Framework based on Apache Spark

big-data Apache Spark Scala etl-framework distributed-computing SQL etl etl-pipeline

Scala 585

1 year ago

seanharr11 / etlalchemy

Extract, Transform, Load: any SQL database in 4 lines of code.

etl-framework etl Python database migrations sqlalchemy

Python 557

6 years ago

usc-isi-i2 / kgtk

Knowledge Graph Toolkit

graphs RDF (Resource Description Framework) etl-framework embeddings wikidata toolkit

Jupyter Notebook 377

1 year ago

quintoandar / butterfree

A tool for building feature stores.

Python package data-engineering etl-framework etl feature-store data-science pyspark

Python 299

3 days ago

data-dot-all / dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.

aws-s3 data lakehouse redshift data-science etl-framework Amazon Web Services

Python 240

9 days ago

Nextdoor / bender

Bender - Serverless ETL Framework

aws-lambda etl-framework aws-s3 etl Java

Java 185

1 year ago

velocitybolt / open-extract

#Large Language Models# Structured data extractor for AI agents. Search your documents or the web for specific data and get it back in JSON or Markdown in a single tool call.

artificial-intelligence etl etl-framework large-language-models autogen crewai langchain langgraph openai rag unstructured-data Python

Python 166

14 days ago