mail-deduplicate

📧 CLI to deduplicate mails from mail boxes

https://github.com/kdeldycke/mail-deduplicate

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ✓
    CITATION.cff file
    Found CITATION.cff file
  • ✓
    codemeta.json file
    Found codemeta.json file
  • ✓
    .zenodo.json file
    Found .zenodo.json file
  • ✓
    DOI references
    Found 3 DOI reference(s) in README
  • ✓
    Academic publication links
    Links to: zenodo.org
  • ✓
    Committers with academic emails
    2 of 24 committers (8.3%) from academic institutions
  • ○
    Institutional organization owner
  • ○
    JOSS paper metadata
  • ○
    Scientific vocabulary similarity
    Low similarity (12.8%) to scientific vocabulary

Keywords

babyl cleanup cli dedupe deduplication email mail mailbox maildir mbox mh mmdf python

Keywords from Contributors

gravitational-lensing profiles interactive network-simulation hacking embedded test-data-generator test-data faker-generator faker
Last synced: 4 months ago

Repository

📧 CLI to deduplicate mails from mail boxes

Basic Info
Statistics
  • Stars: 181
  • Watchers: 10
  • Forks: 40
  • Open Issues: 27
  • Releases: 25
Topics
babyl cleanup cli dedupe deduplication email mail mailbox maildir mbox mh mmdf python
Created almost 13 years ago · Last pushed 4 months ago
Metadata Files
Readme Changelog Funding License Code of conduct Citation

readme.md

Mail Deduplicate


What is Mail Deduplicate?

Provides the mdedup CLI, a utility to deduplicate mails from a set of boxes.


Features

  • Duplicate detection based on cherry-picked and normalized mail headers.
  • Fetch mails from multiple sources.
  • Reads from and writes to the mbox, maildir, babyl, mh and mmdf formats.
  • Deduplication strategies based on size, content, timestamp, file path or random choice.
  • Copy, move or delete the resulting set of duplicates.
  • Dry-run mode.
  • Protection against false-positives with safety checks on size and content differences.
  • Supports macOS, Linux and Windows.
  • Standalone executables for Linux, macOS and Windows.
  • Shell auto-completion for Bash, Zsh and Fish.
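
To make the first feature more concrete, here is a minimal sketch of header-based duplicate detection using only the Python standard library. It is not mdedup's actual implementation: the `HASH_HEADERS` list, the helper names (`normalize`, `header_hash`, `group_duplicates`, `split_by_size`) and the "keep the largest" rule are all assumptions chosen for illustration.

```python
import hashlib
import re
from collections import defaultdict
from email.message import Message

# Assumption: an illustrative header subset, not mdedup's actual default list.
HASH_HEADERS = ("Date", "From", "To", "Subject", "Message-ID")


def normalize(value: str) -> str:
    """Collapse whitespace runs and trim, so folded headers compare equal."""
    return re.sub(r"\s+", " ", value).strip()


def header_hash(msg: Message) -> str:
    """Hash the chosen headers, normalized, into one hex digest."""
    canonical = "\n".join(
        f"{name}: {normalize(str(msg.get(name, '')))}" for name in HASH_HEADERS
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_duplicates(messages):
    """Map each digest to the messages sharing it; keep groups of 2 or more."""
    groups = defaultdict(list)
    for msg in messages:
        groups[header_hash(msg)].append(msg)
    return {digest: msgs for digest, msgs in groups.items() if len(msgs) > 1}


def split_by_size(duplicates):
    """One possible size-based strategy: keep the largest, flag the rest."""
    ranked = sorted(duplicates, key=lambda m: len(m.as_bytes()), reverse=True)
    return ranked[0], ranked[1:]
```

Messages opened with the standard `mailbox` module (for example `mailbox.mbox` or `mailbox.Maildir`) can be passed straight to `group_duplicates`. A real tool would then apply safety checks on size and content differences before acting on the flagged copies.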

⚠️ Warning: Performance

The mdedup implementation is quite naive at the moment: everything resides in memory.

This is good enough for a volume of a couple of gigabytes, but the more emails mdedup tries to parse, the closer you get to the memory limits of your machine. At that point mdedup will exit abruptly, zapped by the OOM killer of your OS. Of course, your mileage may vary depending on your hardware.

You can influence the implementation of this feature with pull requests, by purchasing business support 🤝, or through sponsorship 🫶.

Example

Installation

Python

uv is the fastest way to run mdedup from sources on any platform, thanks to its uvx command:

$ uvx --from mail-deduplicate mdedup

Executables

Standalone binaries of mdedup's latest version are available for several platforms and architectures:

| Platform | x86_64 | arm64 |
| --- | --- | --- |
| Linux | Download mdedup-linux-x64.bin | Download mdedup-linux-arm64.bin |
| macOS | Download mdedup-macos-x64.bin | Download mdedup-macos-arm64.bin |
| Windows | Download mdedup-windows-x64.exe | |

Other installation methods are available in the documentation.

Owner

  • Name: Kevin Deldycke
  • Login: kdeldycke
  • Kind: user
  • Location: ☁︎

Entrepreneur, VP, Engineering Manager, Founding Engineer - Billing, Payments & IAM.

Citation (citation.cff)

cff-version: 1.2.0
title: "Mail Deduplicate"
message: "If you use this software, please cite it as below."
type: software
authors:
  - family-names: "Deldycke"
    given-names: "Kevin"
    email: kevin@deldycke.com
    orcid: "https://orcid.org/0000-0001-9748-9014"
doi: 10.5281/zenodo.7364256
version: 7.6.3
# The release date is kept up to date by the external workflows. See:
# https://github.com/kdeldycke/workflows/blob/33b704b489c1aa18b7b7efbf963e153e91e1c810/.github/workflows/changelog.yaml#L135-L137
date-released: 2025-04-20
url: "https://github.com/kdeldycke/mail-deduplicate"

GitHub Events

Total
  • Create event: 97
  • Release event: 2
  • Issues event: 18
  • Watch event: 13
  • Delete event: 95
  • Issue comment event: 198
  • Push event: 301
  • Pull request event: 216
  • Fork event: 3
Last Year
  • Create event: 97
  • Release event: 2
  • Issues event: 18
  • Watch event: 13
  • Delete event: 95
  • Issue comment event: 198
  • Push event: 301
  • Pull request event: 216
  • Fork event: 3

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 1,435
  • Total Committers: 24
  • Avg Commits per committer: 59.792
  • Development Distribution Score (DDS): 0.304
Past Year
  • Commits: 133
  • Committers: 3
  • Avg Commits per committer: 44.333
  • Development Distribution Score (DDS): 0.195
Top Committers
Name Email Commits
Kevin Deldycke k****n@d****m 999
dependabot[bot] 4****] 266
Adam Spiers a****m@s****t 36
Adam Spiers m****e@a****g 35
Rolf Leggewie f****s@r****z 28
Kevin Murray s****m@k****u 13
Kevin Deldycke k****e@s****m 12
Nicolas Cenerario n****s@r****m 7
Ryan Seto m****f@g****m 6
Jeff Epler j****r@u****t 4
Peng Bai p****g@b****m 4
Juan Tascon j****n@h****g 3
Marcel Martin m****n@t****e 3
Matija Nalis m****t@v****r 3
Zaz Brown z****n@z****m 3
reedog117 p****1@g****m 3
Ben Reser b****n@r****g 2
Hiroshi Shirosaki h****i@g****m 2
Daiji Fukagawa d****8@g****m 1
Kian-Meng Ang k****g@c****g 1
Konrad Anton k****d@m****e 1
Kristoffer Grönlund k****g@k****e 1
Tristan Henderson t****h@s****k 1
Florian Joerg f****n@a****m 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 21
  • Total pull requests: 607
  • Average time to close issues: 2 months
  • Average time to close pull requests: 16 days
  • Total issue authors: 11
  • Total pull request authors: 5
  • Average comments per issue: 1.24
  • Average comments per pull request: 1.31
  • Merged pull requests: 285
  • Bot issues: 5
  • Bot pull requests: 597
Past Year
  • Issues: 11
  • Pull requests: 191
  • Average time to close issues: 11 days
  • Average time to close pull requests: 16 days
  • Issue authors: 5
  • Pull request authors: 3
  • Average comments per issue: 0.73
  • Average comments per pull request: 1.07
  • Merged pull requests: 30
  • Bot issues: 0
  • Bot pull requests: 185
Top Authors
Issue Authors
  • deajan (4)
  • github-actions[bot] (4)
  • zaz (4)
  • dependabot[bot] (3)
  • turian (2)
  • johanneskastl (2)
  • portalgun (1)
  • danielhatton (1)
  • evrix (1)
  • ian-kelling (1)
  • diresi (1)
Pull Request Authors
  • dependabot[bot] (537)
  • github-actions[bot] (154)
  • zaz (6)
  • kianmeng (2)
  • shirosaki (2)
Top Labels
Issue Labels
🐛 bug (11) 📦 dependencies (4) 📚 documentation (4) 🙏 help wanted (4) 🎁 feature request (3) ✨ enhancement (1)
Pull Request Labels
📦 dependencies (551) 📚 documentation (83) 🆙 changelog (31) 🤖 ci (27) ✨ enhancement (3) 🐛 bug (2)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 132 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 23
  • Total maintainers: 1
pypi.org: mail-deduplicate

📧 CLI to deduplicate mails from mail boxes

  • Versions: 23
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 132 Last month
Rankings
Stargazers count: 5.8%
Forks count: 6.6%
Dependent packages count: 10.0%
Average: 11.5%
Downloads: 13.6%
Dependent repos count: 21.7%
Maintainers (1)
Funding
  • https://github.com/sponsors/kdeldycke
Last synced: 4 months ago

Dependencies

poetry.lock pypi
  • appdirs 1.4.4 develop
  • astroid 2.8.2 develop
  • atomicwrites 1.4.0 develop
  • attrs 21.2.0 develop
  • black 20.8b1 develop
  • bleach 3.3.0 develop
  • bump2version 1.0.1 develop
  • cached-property 1.5.2 develop
  • cffi 1.14.5 develop
  • check-wheel-contents 0.3.3 develop
  • coverage 6.0.1 develop
  • cryptography 3.4.7 develop
  • dataclasses 0.8 develop
  • iniconfig 1.1.1 develop
  • isort 5.8.0 develop
  • jeepney 0.6.0 develop
  • keyring 23.0.1 develop
  • lazy-object-proxy 1.6.0 develop
  • mccabe 0.6.1 develop
  • mypy-extensions 0.4.3 develop
  • pathspec 0.8.1 develop
  • pkginfo 1.7.0 develop
  • platformdirs 2.3.0 develop
  • pluggy 0.13.1 develop
  • py 1.10.0 develop
  • pycparser 2.20 develop
  • pydantic 1.7.4 develop
  • pylint 2.11.1 develop
  • pytest 6.2.5 develop
  • pytest-cov 3.0.0 develop
  • pytest-randomly 3.10.1 develop
  • pywin32-ctypes 0.2.0 develop
  • pyyaml 6.0 develop
  • readme-renderer 29.0 develop
  • regex 2021.7.1 develop
  • requests-toolbelt 0.9.1 develop
  • rfc3986 1.5.0 develop
  • secretstorage 3.3.1 develop
  • toml 0.10.2 develop
  • tomli 1.2.1 develop
  • tqdm 4.61.1 develop
  • twine 3.4.2 develop
  • typed-ast 1.4.3 develop
  • webencodings 0.5.1 develop
  • wheel-filename 1.3.0 develop
  • wrapt 1.12.1 develop
  • yamllint 1.26.3 develop
  • alabaster 0.7.12
  • arrow 1.2.0
  • babel 2.9.1
  • boltons 21.0.0
  • certifi 2021.5.30
  • chardet 4.0.0
  • click 8.0.1
  • click-help-colors 0.9.1
  • click-log 0.3.2
  • colorama 0.4.4
  • docutils 0.16
  • idna 2.10
  • imagesize 1.2.0
  • importlib-metadata 4.6.0
  • jinja2 3.0.1
  • markupsafe 2.0.1
  • packaging 20.9
  • pygments 2.9.0
  • pyparsing 2.4.7
  • python-dateutil 2.8.1
  • pytz 2021.1
  • requests 2.25.1
  • six 1.16.0
  • snowballstemmer 2.1.0
  • sphinx 4.3.2
  • sphinx-rtd-theme 1.0.0
  • sphinxcontrib-applehelp 1.0.2
  • sphinxcontrib-devhelp 1.0.2
  • sphinxcontrib-htmlhelp 2.0.0
  • sphinxcontrib-jsmath 1.0.1
  • sphinxcontrib-qthelp 1.0.3
  • sphinxcontrib-serializinghtml 1.1.5
  • tabulate 0.8.9
  • tomlkit 0.7.2
  • typing-extensions 3.10.0.0
  • urllib3 1.26.6
  • zipp 3.5.0
pyproject.toml pypi
  • black ^20.8b1 develop
  • bump2version ^1.0.1 develop
  • check-wheel-contents ^0.3.3 develop
  • coverage ^6.0 develop
  • pylint ^2.11.1 develop
  • pytest ^6.2.5 develop
  • pytest-cov ^3.0.0 develop
  • pytest-randomly ^3.10.1 develop
  • pyyaml ^6.0 develop
  • twine ^3.4.2 develop
  • yamllint ^1.26.3 develop
  • arrow >=0.17,<1.3
  • boltons >=20.2.1,<22.0.0
  • click ^8.0.0
  • click-help-colors >=0.8,<0.10
  • click-log ^0.3.2
  • python ^3.6
  • sphinx >=3.4.2,<5.0.0
  • sphinx_rtd_theme >=0.5.1,<1.1.0
  • tabulate ^0.8.7
  • tomlkit ^0.7.0
.github/workflows/tests.yaml actions
  • actions/checkout v3.3.0 composite
  • actions/setup-python v4.4.0 composite
  • codecov/codecov-action v3.1.1 composite
.github/workflows/autofix.yaml actions
.github/workflows/autolock.yaml actions
.github/workflows/changelog.yaml actions
.github/workflows/docs.yaml actions
.github/workflows/label-sponsors.yaml actions
.github/workflows/labeller-file-based.yaml actions
.github/workflows/labels.yaml actions
.github/workflows/lint.yaml actions
.github/workflows/release.yaml actions
  • actions/download-artifact v3.0.2 composite