GitHub Chinese Community

Search results for "deduplication"

dupeguru
@arsenetar

Find duplicate files

Python, Entity resolution
Python · 6.35k stars
1 year ago

Entity resolution

Entity Resolution is the task of detecting different entity profiles that describe the same real-world objects.

  • entity-resolution

Related topics

Entity resolution, dedupe, Python, crawling, record-linkage, encryption, web crawler, backup, crawlers


sdfs
@opendedup

Deduplication Based Filesystem

Java · 381 stars
2 years ago

bedup
@g2p

Btrfs deduplication

Python · 326 stars
5 years ago

yarn-deduplicate
@scinos

Deduplication tool for yarn.lock files

Yarn, duplicates, dedupe
TypeScript · 1.39k stars
10 days ago

semhash
@MinishLab

#Data warehouse# Fast Semantic Text Deduplication & Filtering

dataset, Entity resolution, preprocessing
Python · 761 stars
1 month ago

rabbitmq-message-deduplication
@noxdafox

RabbitMQ Plugin for filtering message duplicates

exchange, rabbitmq
Elixir · 313 stars
2 months ago

dedupe
Dedupe.io (@dedupeio)

🆔 A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

dedupe, record-linkage, Python, python-library, Entity resolution
Python · 4.33k stars
1 day ago

Duke
@larsga

Duke is a fast and flexible deduplication engine written in Java

Java · 623 stars
2 years ago

deduplication-slides
Vinta Software (@vintasoftware)

"1 + 1 = 1 or Record Deduplication with Python" Jupyter Notebook

Jupyter Notebook · 85 stars
3 years ago

zingg
@zinggAI

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

fuzzymatch, fuzzy-matching, Entity resolution, dedupe, masterdata
Java · 1.06k stars
2 days ago

dedup
Klaus Post (@klauspost)

Streaming Deduplication Package for Go

Go · 187 stars
7 years ago

TheFuckingList
@eded333

Blocklist compilation and deduplication

adguard-blocklist, pihole-blocklists, blocklists, hosts, filterlist
Shell · 27 stars
3 years ago

pcompress
@moinakg

A Parallelized Data Deduplication and Compression utility

C · 277 stars
4 years ago

rdedup
@dpc

#Security# Data deduplication engine, supporting optional compression and public key encryption.

backup, encryption, Entity resolution
Rust · 843 stars
3 years ago

record-linkage-resources
@ropeladder

Resources for tackling record linkage / deduplication / data matching problems

record-linkage, Entity resolution, Python
125 stars
1 year ago

Synology_enable_Deduplication
@007revad

Enable deduplication with non-Synology SSDs and unsupported NAS models

diskstation, rackstation, synology, synology-disk-station, synology-dsm
Shell · 200 stars
2 days ago

nydus-snapshotter
@containerd

A containerd snapshotter with data deduplication and lazy loading in P2P fashion

container-image
Go · 191 stars
23 days ago

deduplication-detecting
@GuohuaZhuang

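The document deduplication feature addresses semantically duplicate documents in search engines, using a semantic fingerprint algorithm built on multiple hashes.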

Shell · 12 stars
12 years ago

kopia
@kopia

#Security# Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.

Entity resolution, backup, google-cloud-storage, encryption, cloud
Go · 10.18k stars
4 days ago

wget-lua
Archive Team (@ArchiveTeam)

#Web crawler# Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

warc, wget, Lua, archiving
C · 125 stars
6 months ago

courlan
@adbar

#Web crawler# Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters

url, url-parsing, web crawler, uri
Python · 142 stars
6 months ago