SeaTunnel (formerly named Waterdrop) is an easy-to-use, high-performance distributed data integration platform supporting real-time synchronization of massive data; it can stably synchronize tens of billions of records per day.
ingestr is a CLI tool to seamlessly copy data between any databases with a single command.
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
Concurrent and multi-stage data ingestion and data processing with Elixir
Pravega - Streaming as a new software defined storage primitive
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
Copy to/from Parquet in S3 or Azure Blob Storage from within PostgreSQL
Orbital automates integration between data sources (APIs, databases, queues, and functions): BFFs, API composition, and ETL pipelines that adapt as your specs change.
Use SQL to build ELT pipelines on a data lakehouse.
#NLP# A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰
The Data Engineering Book - a data engineering book by Thai people, for Thai people
Apache Paimon Rust - the Rust implementation of Apache Paimon.
Apache Spark examples exclusively in Java
Sample code for the AWS Big Data Blog post "Building a scalable streaming data processor with Amazon Kinesis Data Streams on AWS Fargate"
Enables custom tracing of Java applications in Dynatrace
#Blockchain# Download and warehouse historical trading data