Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining different data sets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number), which may be due to differences in record shape, storage location, or curator style or preference.
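As a rough illustration of the idea, the sketch below links records from two small sources that share no common identifier by comparing normalized names. The record fields (`id`, `name`), the helper names, the sample data, and the 0.85 threshold are all assumptions made for this example; real systems typically add blocking to avoid comparing every pair, and often use probabilistic or learned models rather than a single string-similarity score.

```python
from difflib import SequenceMatcher
from itertools import product


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Compare every left/right pair and keep those at or above the threshold.

    This is a naive all-pairs comparison (hypothetical helper, fine for
    tiny inputs only); the threshold value is an arbitrary assumption.
    """
    matches = []
    for l, r in product(left, right):
        score = similarity(l["name"], r["name"])
        if score >= threshold:
            matches.append((l["id"], r["id"], round(score, 2)))
    return matches


# Two sources describing overlapping entities with different keys and styles.
customers = [{"id": "a1", "name": "Acme Corp."}, {"id": "a2", "name": "Globex Inc"}]
vendors = [{"id": "b7", "name": "ACME corp"}, {"id": "b9", "name": "Initech"}]

links = link_records(customers, vendors)
print(links)
```

Here the only pair linked is `a1` and `b7`, because their names become identical after normalization; the other pairs fall well below the threshold.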
#Natural Language Processing# Industrial-strength natural language processing (NLP) library in Python/CPython
Fast, secure, efficient backup program
#Security# Deduplicating archiver with compression and authenticated encryption.
#Security# Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.
Prometheus Alertmanager
🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
#Natural Language Processing# A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
A fast high compression read-only file system for Linux, Windows and macOS
#Security# rustic - fast, encrypted, and deduplicated backups powered by Rust
Extremely fast tool to remove duplicates and other lint from your filesystem
Simple, configuration-driven backup software for servers and workstations
#Natural Language Processing# Datasets and SOTA results for every field of Chinese NLP
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Config-driven, easy backup CLI for restic.
#Computer Science# A powerful and modular toolkit for record linkage and duplicate detection in Python
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
One line of code for thousands of state-of-the-art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
Insightful Tutorials and Papers about Knowledge Graphs
#Large Language Models# Scalable data preprocessing and curation toolkit for LLMs