# data-processing

johnkerl/miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

Go 9.25 k
7 days ago
https://static.github-zh.com/github_avatars/TomWright?size=40

Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.

Query and update data structures from the command line. Similar to jq / yq, but supports JSON, TOML, YAML, and XML, with zero runtime dependencies.

Go 7.41 k
18 days ago
NVIDIA/DALI

#Computer Science# A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

C++ 5.35 k
15 hours ago
https://static.github-zh.com/github_avatars/deepseek-ai?size=40

A lightweight data processing framework built on DuckDB and 3FS.

Python 4.53 k
1 month ago
https://static.github-zh.com/github_avatars/unionai-oss?size=40
Python 3.75 k
5 hours ago
dashbitco/broadway

Concurrent and multi-stage data ingestion and data processing with Elixir

Elixir 2.51 k
11 days ago
https://static.github-zh.com/github_avatars/asyml?size=40

#Natural Language Processing# Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Python 2.39 k
4 years ago
https://static.github-zh.com/github_avatars/microsoft?size=40
Python 2.38 k
2 years ago
https://static.github-zh.com/github_avatars/numaproj?size=40

Kubernetes-native platform to run massively parallel data/streaming jobs

Go 1.86 k
15 hours ago
https://static.github-zh.com/github_avatars/python-bonobo?size=40
Python 1.59 k
2 years ago
https://static.github-zh.com/github_avatars/GoogleCloudPlatform?size=40
Jupyter Notebook 1.38 k
24 days ago
allenai/dolma
Python 1.2 k
8 hours ago
https://static.github-zh.com/github_avatars/GoogleCloudPlatform?size=40

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

856
4 years ago
https://static.github-zh.com/github_avatars/asyml?size=40

#Natural Language Processing# Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

Python 745
3 years ago