🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
#Natural Language Processing#A C library for parsing and normalizing street addresses around the world, powered by statistical NLP and open geo data.
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
#Computer Science#A powerful and modular toolkit for record linkage and duplicate detection in Python
#Natural Language Processing#Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
🆔 Command-line tool for deduplicating CSV files
🆔 Examples for using the dedupe library
#Awesome#A list of free data matching and record linkage software.
Super Fast String Matching in Python
🔎 Finds fuzzy matches between CSV files
#Computer Science#PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
#Computer Science#Link Discovery Framework for Metric Spaces.
Spark RDD with Lucene's query and entity linkage capabilities
Resources for tackling record linkage / deduplication / data matching problems
#Natural Language Processing#A convenient way to link, deduplicate, aggregate and cluster data(frames) in Python using deep learning
Record Linkage ToolKit (Find and link entities)
Link Wikidata items to large catalogs
Python package for deduplication/entity resolution using active learning
Python implementation of anonymous linkage using cryptographic linkage keys
#Awesome#List of entity resolution software and resources.
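The entries above describe tools rather than algorithms, but the idea most of them share (compare records approximately, then group near-duplicates) can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the method of any project listed: the helper names `trigrams`, `jaccard` and `cluster_duplicates`, the character-trigram Jaccard similarity, and the 0.5 threshold are all hypothetical choices.

```python
from __future__ import annotations

from itertools import combinations


def trigrams(text: str) -> set[str]:
    """Character 3-grams of a lowercased, stripped string (hypothetical helper).

    Padding with spaces lets short strings still produce grams and
    marks word boundaries at the start and end.
    """
    s = f"  {text.lower().strip()} "
    return {s[i:i + 3] for i in range(len(s) - 2)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_duplicates(records: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Group records whose trigram similarity meets the threshold (hypothetical helper).

    Compares every pair (O(n^2)) and merges matches with union-find,
    so matches are transitive: if A~B and B~C, all three share a cluster.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    grams = [trigrams(r) for r in records]
    for i, j in combinations(range(len(records)), 2):
        if jaccard(grams[i], grams[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters: dict[int, list[str]] = {}
    for i, record in enumerate(records):
        clusters.setdefault(find(i), []).append(record)
    return list(clusters.values())


if __name__ == "__main__":
    names = ["Acme Corp", "ACME Corp", "Globex Inc", "globex inc ", "Initech"]
    for group in cluster_duplicates(names):
        print(group)
```

Practical record-linkage tools generally avoid this all-pairs loop, for example through blocking or approximate nearest-neighbor indexing (as one entry above mentions), and many learn match weights instead of relying on a single fixed threshold.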