#Web crawler# Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
#Web crawler# A very simple news crawler with a funny name
#Web crawler# Bitextor generates translation memories from multilingual websites
#Natural language processing# UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
#Natural language processing# Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Python library for handling audio datasets.
#Natural language processing# OpusFilter - Parallel corpus processing toolkit
Utilities for Processing the Switchboard Dialogue Act Corpus
An advanced, extensible web front-end for the Manatee-open corpus search engine
An open source reimplementation of Benny Brodda's BETA in Python
SpeCT - Speech Corpus Toolkit for Praat. Documentation: https://lennes.github.io/spect/
#Natural language processing# A set of workflows for corpus building through OCR, post-correction and normalisation
#Computer science# Multi-Language Dataset Cleaner/Creator for Mozilla's DeepSpeech Framework
A parser for annotated MuseScore 3 files.
Python library for extracting quantitative, reproducible metrics of multi-level alignment between speakers in naturalistic language corpora.
#Natural language processing# Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.
#Natural language processing# Reading the data from OPIEC - an Open Information Extraction corpus
Rezonator: Dynamics of human engagement
Utilities for Processing the Meeting Recorder Dialogue Act Corpus