Tool to detect (and get rid of) similar images using perceptual hashing (pHash lib)
Very efficient backup system based on the git packfile format, providing fast incremental saves and global deduplication (among and within files, including virtual machine images). Current release is 0.30, and the development branch is master. Please post problems or patches to the mailing list for discussion (see the end of the README below).
Image Deduplication in Python
😎 Finding duplicate images made easy!
Remove exact and approximate duplicates from your dataset in FiftyOne!
Tool for managing data-deduplication within extant compressed archive files, along with a relatively performant BK tree implementation for fuzzy image searching.
Ultra Fast Optimized Image Deduplication.
Image similarity in Golang. Version 4 (LATEST)
Deduplication Based Filesystem
Deduplication tool for yarn.lock files
Streaming Deduplication Package for Go
Blocklist compilation and deduplication
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Resources for tackling record linkage / deduplication / data matching problems
"1 + 1 = 1 or Record Deduplication with Python" Jupyter Notebook
🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
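Stable Diffusion is a text-to-image diffusion model
Document deduplication for search engines: it addresses semantically duplicate documents using a semantic fingerprint algorithm built on multiple hashes.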
RabbitMQ Plugin for filtering message duplicates
#Security# Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.
#Computer Science# Image-to-Image Translation in PyTorch
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.