#Natural Language Processing#Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Data pipelines from re-usable components
The open-source Useful SDK. One Python decorator in the Useful library allows for full observability of Python functions within an ETL.
#Web Crawler#A project structure for doing and sharing data engineering work.
Build ETL pipelines on Airflow to load data from BigQuery and store it in MySQL
e-Portfolio showcasing my personal projects.
DataSift automatically applies a data pre-processing pipeline to data science projects.
An extension that registers all pharmacies in Argentina.
This repo contains the DAGs that run on my local Airflow environment. I use the local environment to test my DAGs before deploying them to virtual machines via Kubernetes
#Computer Science#A deployed machine learning model that automatically classifies incoming disaster messages into 36 related categories. Project developed as a part of Udacity's Data Science Nan...
Weaving together different threads (services like image/audio converse, ETL services, etc.) to enable the World Wide Flow
A Python and Spark-based ETL framework. It operates within the speed limits set by the framework and its standards, yet offers boundless possibilities.
End-to-End MLOps Project with ETL Pipelines: Building a Network Security System
This project demonstrates a complete ETL pipeline for Formula 1 racing data using Azure Databricks, Delta Lake, and Azure Data Factory. It covers data ingestion, transformation with PySpark and Spark ...
Project in progress