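#MySQL high-availability middleware#ShardingSphere is a database middleware for sharding tables and databases, consisting of JDBC, Proxy, and Sidecar
Airbyte is an open-source EL(T) platform that helps users sync data from applications, APIs, and databases to data warehouses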
The leader in Next-Generation Customer Data Infrastructure
Flink CDC Connector is a set of data source connectors for Apache Flink
Privacy and Security focused Segment-alternative, in Golang and React
A list of useful resources to learn Data Engineering from scratch
Memphis.dev is a highly scalable and effortless data streaming platform
ingestr is a CLI tool to copy data between any databases with a single command seamlessly.
#Computer Science#An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collect...
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
A lightweight stream processing library for Go
CLI task management & automation tool
BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every d...
🔥🔥🔥 Open-source composable CDP - alternative to Hightouch and Census.
#Computer Science#Source code accompanying the book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
#Web Crawler#Example end-to-end data engineering project.
#Natural Language Processing#Superlinked is a Python framework for AI Engineers building high-performance search & recommendation applications that combine structured and unstructured data.
#Computer Science#Pythonic tool for orchestrating machine-learning/high performance/quantum-computing workflows in heterogeneous compute environments.
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).