Find duplicate files
Deduplication Based Filesystem
Deduplication tool for yarn.lock files
🆔 A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
RabbitMQ Plugin for filtering message duplicates
"1 + 1 = 1 or Record Deduplication with Python" Jupyter Notebook
Fast, secure, efficient backup program
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Streaming Deduplication Package for Go
Blocklist compilation and deduplication
Resources for tackling record linkage / deduplication / data matching problems
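Document deduplication for search engines: addresses semantically duplicate documents using a semantic fingerprint algorithm based on multiple hashing.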
Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Tool for managing data-deduplication within extant compressed archive files, along with a relatively performant BK tree implementation for fuzzy image searching.
Chat message deduplication addon for World of Warcraft
Very efficient backup system based on the git packfile format, providing fast incremental saves and global deduplication (among and within files, including virtual machine images). The current version is 0.30 and the development branch is master. Please post problems or patches to the mailing list for discussion (see the end of the README below).
File deduplication script in Python: recursively finds files, computes their MD5 hashes, and removes duplicates.
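The approach named in the last entry (walk a directory tree, hash each file with MD5, group files by digest) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code of the listed script: the function names `file_md5` and `find_duplicates` and the `chunk_size` parameter are hypothetical, and the sketch only reports duplicates rather than deleting anything.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Walk `root` recursively and group regular files by MD5 digest.

    Returns a dict mapping each digest to the list of paths sharing it,
    restricted to digests that occur for two or more files.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_digest[file_md5(path)].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

Calling `find_duplicates("some/dir")` yields groups of paths with identical content digests; which copy to keep or delete is left to the caller. MD5 is adequate for spotting accidental duplicates but is not collision-resistant against deliberately crafted files, so a byte-for-byte comparison before deletion is a reasonable extra safeguard.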