#Large Language Models#Open-source project for data preparation, aimed at LLM application builders
#Computer Science#Python package for customizable data preprocessing pipelines
#Computer Science#This repository contains code for preprocessing text data from PDF and DOCX files for use with GPT-3. It includes steps such as tokenization, removal of stop words and punctuation, and formatting fo...
#Computer Science#Collect POST requests
Understand and implement a decision tree
This repository contains sample text data-preparation code using NeMo Curator for pre-training or synthetic data generation
This work highlights my contribution as an "ML Engineer" at "adorsho praniSheb" (an ML-based agro-farming company in Bangladesh), where I was assigned the task of designing the preprocessing pipeline.
#Computer Science#Project for a Machine Learning and Data Mining course
A data processing library for better understanding of industrial data.
#Computer Science#Machine learning models cannot be directly applied to raw data. This desktop application consists of a central server and two client servers. The main server sends raw data to clients, where the data ...