#Computer Science#Apache Airflow is a platform for scheduling, orchestrating, and monitoring workflows
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
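Airbyte is an open-source EL(T) platform that helps users sync data from applications, APIs, and databases to data warehouses
Doris is an MPP database, open-sourced by Baidu, that supports fast analysis of massive datasets.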
An orchestration platform for the development, production, and observation of data assets.
A Python library for building data applications: ETL, ML, data pipelines, and more.
Fancy stream processing made operationally mundane
Declarative stream processing for mundane tasks and data engineering
#Computer Science#🧙 Build, run, and manage data pipelines for integrating and transforming data.
Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.
Flink CDC Connector is a set of data source connectors for Apache Flink
Privacy and Security focused Segment-alternative, in Golang and React
#Editor#Build data pipelines, the easy way 🛠️
Orchest is a tool for creating data science pipelines.
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Par...
Open source data anonymization and synthetic data platform for developers. Anonymize your production data and sync it across your environments so that developers can safely use it.
Spreadsheet with AI, Code, Connections
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Go...
#Awesome#A curated list with resources about node-based UIs
#Natural Language Processing#🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and co...
DevLake: an open-source data lake and dashboard for DevOps tools.
A fast, simple, and cost-effective tool to replicate data from Postgres to data warehouses, queues, and storage