No-code LLM platform to launch APIs and ETL pipelines that structure unstructured documents.
#Editor#Orchest: build data pipelines, the easy way 🛠️
StreamX was created to make stream processing simpler: a one-stop big data platform offering unified stream-batch processing and an integrated lakehouse solution.
#Computer Science#Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Implementing best practices for PySpark ETL jobs and applications.
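A common best practice such templates encourage is keeping extract, transform, and load steps as small, separately testable functions. A minimal plain-Python sketch of that shape (function names and data are illustrative, not taken from the repository above):

```python
# Hypothetical sketch of the extract/transform/load separation; in a real
# PySpark job each step would operate on DataFrames and write to storage.

def extract(rows):
    """Extract: yield raw records from a source (here, an in-memory list)."""
    yield from rows

def transform(records):
    """Transform: a small, pure step that is easy to unit-test in isolation."""
    for r in records:
        yield {**r, "amount": round(r["amount"] * 1.2, 2)}

def load(records):
    """Load: collect results for a sink (a real job would write to a table)."""
    return list(records)

if __name__ == "__main__":
    raw = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.5}]
    print(load(transform(extract(raw))))
```

Because each stage takes and returns plain iterables, the pipeline composes as `load(transform(extract(source)))` and each stage can be tested on a handful of records.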
A few Data Engineering projects, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
#Computer Science#A scalable general-purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
#LLM#Enterprise-grade, API-first LLM workspace for unstructured documents, including data extraction, redaction, rights management, a prompt playground, and more!
#Computer Science#A high-performance Clojure data processing system
A blazingly fast general-purpose blockchain analytics engine specialized in systematic MEV detection
A simplified, lightweight ETL Framework based on Apache Spark
#LLM#Integrate LLMs into any pipeline - fit/predict pattern, JSON-driven flows, and built-in concurrency support.
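The fit/predict pattern borrows scikit-learn's estimator convention: "fit" records examples or configuration, "predict" runs the LLM per input. A hypothetical sketch of that idea with a stubbed client (`FakeLLM`, `PromptClassifier`, and all names here are invented for illustration, not the library's actual API):

```python
class FakeLLM:
    """Stand-in for a real LLM client: echoes the prompt it receives."""
    def complete(self, prompt):
        return f"[completion of: {prompt}]"

class PromptClassifier:
    """fit() records few-shot examples; predict() builds one prompt per input."""
    def __init__(self, llm):
        self.llm = llm
        self.examples = []

    def fit(self, texts, labels):
        self.examples = list(zip(texts, labels))
        return self  # chainable, like scikit-learn estimators

    def predict(self, texts):
        shots = "\n".join(f"{t} -> {l}" for t, l in self.examples)
        return [self.llm.complete(f"{shots}\n{t} ->") for t in texts]

clf = PromptClassifier(FakeLLM()).fit(["great!"], ["positive"])
print(clf.predict(["awful"]))
```

Keeping the LLM behind a fit/predict interface lets such a step drop into any pipeline slot that already expects an estimator.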
Pythonic Stream-like manipulation of iterables.
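The stream-like style wraps an iterable in a fluent, lazily evaluated chain of `map`/`filter`/`take` calls. A generic stdlib-only sketch of the idea (this `Stream` class is illustrative, not the actual API of the project above):

```python
from itertools import islice

class Stream:
    """A minimal fluent wrapper over an iterator; every step stays lazy."""
    def __init__(self, iterable):
        self._it = iter(iterable)

    def map(self, fn):
        return Stream(map(fn, self._it))

    def filter(self, pred):
        return Stream(filter(pred, self._it))

    def take(self, n):
        return Stream(islice(self._it, n))

    def to_list(self):
        return list(self._it)

# Lazily square even numbers and take the first three.
result = Stream(range(100)).filter(lambda x: x % 2 == 0).map(lambda x: x * x).take(3).to_list()
print(result)  # [0, 4, 16]
```

Because each method returns a new `Stream` over a lazy iterator, nothing is computed until `to_list()` consumes it, so the chain works on arbitrarily large or infinite sources.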
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All compone...
#Computer Science#A simple Spark-powered ETL framework that just works 🍺
Jayvee is a domain-specific language and runtime for automated processing of data pipelines
This is a template you can use for your next data engineering portfolio project.
Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)
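The core idea behind automatic schema management is inferring column types from sample records before bulk-loading. A hypothetical sketch in plain Python (the type mapping and DDL dialect here are illustrative, roughly Postgres-flavored, and not this service's actual behavior):

```python
# Map exact Python types to illustrative SQL types; unknown types fall back to TEXT.
SQL_TYPES = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE PRECISION", str: "TEXT"}

def infer_schema(rows):
    """Assign each column the SQL type of its first non-null value."""
    schema = {}
    for row in rows:
        for col, val in row.items():
            if col not in schema and val is not None:
                schema[col] = SQL_TYPES.get(type(val), "TEXT")
    return schema

def create_table_ddl(table, rows):
    """Emit a CREATE TABLE statement from the inferred schema."""
    cols = ", ".join(f"{c} {t}" for c, t in infer_schema(rows).items())
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols})"

rows = [{"id": 1, "price": 9.99, "name": "widget"}]
print(create_table_ddl("products", rows))
```

A production loader would additionally diff the inferred schema against the existing table and issue `ALTER TABLE` statements for new columns, which is what makes the schema management "automatic."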
The goal of this project is to track expenses from Uber Rides and Uber Eats through data engineering processes using technologies such as Apache Airflow, AWS Redshift, and Power BI.