GitHub Chinese Community (GitHub 中文社区)

#data-processing

pathwaycom / pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

batch-processing, kafka, pathway, Python, streaming, machine learning, real-time, data-analytics, data-pipelines, data-processing, dataflow, etl, etl-framework, iot-analytics, Rust, stream-processing, time-series-analysis
Python 28.54 k
12 hours ago
onceupon / Bash-Oneliner

A collection of practical Linux Bash commands

oneliner-commands, Bash, data-processing, linux-administration, terminal, Linux, variables, grep, xargs, system, hardware, Shell, one-liners
10.48 k
3 months ago
johnkerl / miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

data-processing, data-cleaning, CSV, csv-format, streaming-data, streaming-algorithms, tsv, JSON, json-data, data-reduction, statistics, statistical-analysis, DevOps, devops-tools, tabular-data, command-line interface, command-line-tools
Go 9.33 k
3 days ago
TomWright / dasel

Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.

JSON, YAML, configuration, selector, data structures, Parser, yaml-processor, json-processing, devops-tools, Go, command-line interface, toml, query, update, XML, data-processing, data-wrangling
Go 7.49 k
3 months ago
NVIDIA / DALI

#Computer Science# A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

fast-data-pipeline, image-augmentation, data-augmentation, image processing, data-processing, deep learning, machine learning, Python, neural networks, gpu, gpu-tensorflow, audio-processing, PyTorch, mxnet, paddle
C++ 5.45 k
13 hours ago
deepseek-ai / smallpond

A lightweight data processing framework built on DuckDB and 3FS.

data-processing, duckdb
Python 4.71 k
4 months ago
modelscope / data-juicer

#Large Language Models# Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

data analysis, data science, large-language-models, large language models, data visualization, instruction-tuning, pre-training, multi-modal, synthetic-data, data, data-pipeline, data-processing, foundation-models
Python 4.68 k
3 days ago
unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library

pandas, validation, schema, dataframes, Testing, pandas-dataframe, data-validation, data-cleaning, assertions, hypothesis-testing, data-processing
Python 3.88 k
20 hours ago
dashbitco / broadway

Concurrent and multi-stage data ingestion and data processing with Elixir

Elixir, data-ingestion, data-processing, concurrent
Elixir 2.54 k
24 days ago
microsoft / DialoGPT

#Computer Science# Large-scale pretraining for dialogue

dialogue, machine learning, PyTorch, transformer, text-generation, dialogpt, gpt-2, text-data, data-processing
Python 2.39 k
3 years ago
asyml / texar

#Natural Language Processing# Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

machine learning, natural language processing, Tensorflow, deep learning, text-generation, Python, machine-translation, dialog-systems, texar, bert, gpt-2, xlnet, text-data, data-processing
Python 2.39 k
4 years ago
cocoindex-io / cocoindex

#Large Language Models# Real-time data transformation framework for AI. Ultra performant, with incremental processing.

artificial intelligence, change-data-capture, data, data-indexing, etl, indexing, pipeline, Python, rag, real-time, Rust, semantic-search, streaming, data-engineering, data-infrastructure, data-processing, dataflow, help-wanted, knowledge-graph, large language models
Rust 2 k
14 hours ago
numaproj / numaflow

Kubernetes-native platform to run massively parallel data/streaming jobs

Kubernetes, stream-processing, data-processing, pipeline, map-reduce, Hacktoberfest
Go 1.89 k
5 hours ago
bytewax / bytewax

#Computer Science# Python Stream Processing

Python, stream-processing, Rust, data-engineering, data-processing, data science, dataflow, machine learning, streaming-data
Python 1.77 k
3 months ago
python-bonobo / bonobo

Extract Transform Load for Python 3.5+

data-processing, bonobo, Python, automation, parallelization
Python 1.59 k
2 years ago
pyper-dev / pyper

Concurrent Python made simple

asyncio, concurrency, Python, threading, data-pipelines, data-processing, multiprocessing, parallel-computing, data, data-collection, data-engineering
Python 1.44 k
5 months ago
GoogleCloudPlatform / data-science-on-gcp

#Computer Science# Source code accompanying the book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

data analysis, data visualization, cloud-computing, machine learning, data-pipeline, data-processing, data science, data-engineering
Jupyter Notebook 1.38 k
3 months ago
allenai / dolma

#Natural Language Processing# Data and tools for generating and inspecting OLMo pre-training data.

data-processing, large-language-models, large language models, machile-learning, natural language processing
Python 1.26 k
4 days ago
NVIDIA-NeMo / Curator

#Large Language Models# Scalable data preprocessing and curation toolkit for LLMs

data-curation, large language models, data, data-prep, data-preparation, data-processing, data-quality, datacuration, datarecipes, Entity resolution, fine-tuning, large-language-models, large-scale-data-processing, llmapps, Python
Jupyter Notebook 980
2 days ago
microsoft / GODEL

#Computer Science# Large-scale pretrained models for goal-directed dialog

data-processing, dialogue, dialogue-systems, machine learning, text-data, text-generation, transformers, conversational-ai, language-grounding, grounded-generation, dialogpt, language-model, pretrained-model, PyTorch, transformer
Python 870
2 years ago