data-processing · GitHub Topics

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

batch-processing kafka pathway Python streaming machine-learning real-time data-analytics data-pipelines data-processing dataflow etl etl-framework iot-analytics Rust stream-processing time-series-analysis

Python 28.54 k

12 hours ago

onceupon / Bash-Oneliner

A collection of handy Linux Bash commands

oneliner-commands Bash data-processing linux-administration terminal Linux variables grep xargs system hardware Shell one-liners

10.48 k

3 months ago

johnkerl / miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

data-processing data-cleaning CSV csv-format streaming-data streaming-algorithms tsv JSON json-data data-reduction statistics statistical-analysis DevOps devops-tools tabular-data command-line-interface command-line-tools

Go 9.33 k

3 days ago

TomWright / dasel

Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.

JSON YAML configuration selector data-structures Parser yaml-processor json-processing devops-tools Go command-line-interface toml query update XML data-processing data-wrangling

Go 7.49 k

3 months ago

NVIDIA / DALI

#Computer Science# A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

fast-data-pipeline image-augmentation data-augmentation image-processing data-processing deep-learning machine-learning Python neural-networks gpu gpu-tensorflow audio-processing PyTorch mxnet paddle

C++ 5.45 k

13 hours ago

deepseek-ai / smallpond

A lightweight data processing framework built on DuckDB and 3FS.

data-processing duckdb

Python 4.71 k

4 months ago

modelscope / data-juicer

#Large Language Models# Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

data-analysis data-science large-language-models llm data-visualization instruction-tuning pre-training multi-modal synthetic-data data data-pipeline data-processing foundation-models

Python 4.68 k

3 days ago

unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library

pandas validation schema dataframes Testing pandas-dataframe data-validation data-cleaning assertions hypothesis-testing data-processing

Python 3.88 k

20 hours ago

dashbitco / broadway

Concurrent and multi-stage data ingestion and data processing with Elixir

Elixir data-ingestion data-processing concurrent

Elixir 2.54 k

24 days ago

microsoft / DialoGPT

#Computer Science# Large-scale pretraining for dialogue

dialogue machine-learning PyTorch transformer text-generation dialogpt gpt-2 text-data data-processing

Python 2.39 k

3 years ago

asyml / texar

#Natural Language Processing# Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/