data-preprocessing-pipelines

#Large language models#Open-source project for data preparation for LLM application builders

data-preparation finetuning large language models llmapps data data-prep data-preprocessing data-preprocessing-pipelines datacuration large-language-models large-scale-data-processing Python ray Apache Spark datarecipes Code quality Entity resolution Malware

HTML 613

2 days ago

preprocessy / preprocessy

#Computer science#Python package for Customizable Data Preprocessing Pipelines

pipelines preprocessing machine learning python-library data-engineering data science data-preprocessing-pipelines Hacktoberfest hacktoberfest2022

Jupyter Notebook 42

6 days ago

shamspias / gpt3-data-preprocessing

#Computer science#This repository contains code for preprocessing text data from PDF and DOCX files for use with GPT-3. It includes steps such as tokenization, removal of stop words and punctuation, and formatting fo...

artificial intelligence data-preprocessing data-preprocessing-pipelines data science gpt-3 machine learning

Python 6

2 years ago

firefly-cpp / succulent

#Computer science#Collect POST requests

data-collection data-preprocessing-pipelines data science ESP32 machine learning Raspberry Pi

Python 3

1 month ago

vuanhngo14 / Decision-Tree-from-Scratch

Understand and implement a decision tree

data-preprocessing data-preprocessing-pipelines data visualization decision-tree

Jupyter Notebook 1

1 year ago

kolhesamiksha / Nemo_Curator

This repository contains sample text data-preparation code using Nemo Curator for pre-training or synthetic data generation

curator data-preprocessing-pipelines generative-ai nemo Nvidia synthetic-dataset-generation

Jupyter Notebook 1

4 months ago

PrasunDatta / adorsho-praniSheba_Preprocessing-Pipeline-of-Muzzle-Data-of-Cow

This work highlights my contribution as an "ML Engineer" at "adorsho praniSheb" (an ML-based agro-farming company in Bangladesh), where I was assigned the task of designing the preprocessing pipeline.

data-preprocessing-pipelines image-preprocessing Jupyter Notebook python-script

Jupyter Notebook 0

2 years ago

SaraLittleSquirrel / Obesity-estimator

#Computer science#Project for Machine Learning Data Mining course

adaboost data-mining data-preprocessing-pipelines decision-tree machine learning NumPy pandas random-forest scikit-learn support-vector-machines

Jupyter Notebook 0

1 year ago

DigitalLifeYZQiu / Data-Process-Library

A data processing library that supports better understanding of industrial data.

data-preprocessing-pipelines

Jupyter Notebook 0

6 months ago

MustofAhmed41 / Data-Preprocessing-using-Distributed-Database

#Computer science#Machine learning models cannot be directly applied to raw data. This desktop application consists of a central server and two client servers. The main servers send raw data to clients, where the data ...

database machine learning plsql data-preprocessing-pipelines distributed-database

2 years ago