Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package. Comparable to jq / yq, with zero runtime dependencies.
#Computer Science# A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
A lightweight data processing framework built on DuckDB and 3FS.
A light-weight, flexible, and expressive statistical data testing library
Concurrent and multi-stage data ingestion and data processing with Elixir
#Natural Language Processing# Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
#Computer Science# Large-scale pretraining for dialogue
Kubernetes-native platform to run massively parallel data/streaming jobs
#Computer Science# Python Stream Processing
Extract Transform Load for Python 3.5+
#Computer Science# Source code accompanying the book Data Science on the Google Cloud Platform, by Valliappa Lakshmanan (O'Reilly, 2017)
Concurrent Python made simple
#Natural Language Processing# Data and tools for generating and inspecting OLMo pre-training data.
#Computer Science# Large-scale pretrained models for goal-directed dialog
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
#Large Language Models# Scalable data preprocessing and curation toolkit for LLMs
#Natural Language Processing# Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/