media magic tests
e3rd committed Jan 9, 2025
1 parent 7ee56cf commit 45eba69
Showing 28 changed files with 299 additions and 403 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/run-unittest.yml
@@ -5,7 +5,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: [3.11, 3.12]
+        python-version: [3.11, 3.12, 3.13]
     steps:
       - uses: actions/checkout@v2
       - name: Set up Python ${{ matrix.python-version }}
@@ -15,4 +15,4 @@ jobs:
       - name: Install dependencies
         run: pip install -e .
       - name: Run tests
-        run: python3 tests.py
+        run: python3 -m unittest discover tests
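
The new test command relies on `unittest` discovery, which by default collects files matching `test*.py` under the start directory (here `tests`). A minimal sketch of a module that discovery would pick up, assuming a hypothetical file `tests/test_example.py` and a hypothetical helper `shorten_name`; neither is part of this commit:

```python
import unittest


def shorten_name(name: str, limit: int = 47) -> str:
    """Hypothetical helper: truncate a file name to `limit` characters."""
    return name[:limit]


class TestShortenName(unittest.TestCase):
    # Discovery collects TestCase subclasses and runs methods named test_*.
    def test_long_name_is_truncated(self):
        self.assertEqual(len(shorten_name("x" * 60)), 47)

    def test_short_name_is_kept(self):
        self.assertEqual(shorten_name("photo"), "photo")
```

Saved under `tests/`, such a file would be run by `python3 -m unittest discover tests` without being listed anywhere explicitly.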
57 changes: 39 additions & 18 deletions README.md
@@ -1,6 +1,10 @@
<p align="center"><b>Deduplidog</b> – Deduplicator that covers your back.</p>
<p align="center">
<img src="./asset/logo.jpg" />
</p>

[![Build Status](https://github.com/CZ-NIC/deduplidog/actions/workflows/run-unittest.yml/badge.svg)](https://github.com/CZ-NIC/deduplidog/actions)

Yet another file deduplicator.

- [About](#about)
* [What are the use cases?](#what-are-the-use-cases)
@@ -18,9 +22,9 @@ Yet another file deduplicator.
# About

## What are the use cases?
* I have downloaded photos and videos from the cloud. Oh, both Google Photos and Youtube shrink the file and changes the format. Moreover, it have shortened the file name to 47 characters and capitalize the extension. So how should I know that I have them all backed up offline?
* I have downloaded photos and videos from the cloud. Oh, both Google Photos and Youtube *shrink the files* and change their format. Moreover, they shorten the file names to 47 characters and capitalize the extensions. So how am I supposed to know if I have everything backed up offline when the copies are resized?
* My disk is cluttered with several backups and I'd like to be sure these are all just copies.
* I merge data from multiple sources. Some files in the backup might have the former orignal file modification date that I might wish to restore.
* I merge data from multiple sources. Some files in the backup might have *the former original file modification date* that I might wish to restore.

## What is compared?

@@ -53,19 +57,30 @@ The program does not write anything to the disk, unless `execute=True` is set. F

Install with `pip install deduplidog`.

It works as a standalone program with both CLI and TUI interfaces. Just launch the `deduplidog` command.
Moreover, it works best when imported from a [Jupyter Notebook](https://jupyter.org/).
It works as a standalone program with all the CLI, TUI and GUI interfaces. Just launch the `deduplidog` command.

# Examples

## Media magic confirmation

Let's compare two folders.

```bash
deduplidog --work-dir folder1 --original-dir folder2 --media-magic --rename --execute
```

By default, `--confirm-one-by-one` is True, so every change must be confirmed manually before it takes effect. Even though `--execute` is given, no change happens without confirmation. The change that does happen is the `--rename`: the file in the `--work-dir` will be prefixed with the `` character. The `--media-magic` mode considers an image a duplicate if it has the same name and a similar image hash, even if the files are of different sizes.

![Confirmation](asset/warnings_confirmation_example.avif "Confirmation, including warnings")

Note that the default button is 'No' as there are some warnings. First, the file in the folder we search for duplicates in is bigger than the one in the original folder. Second, it is also older, suggesting that it might be the actual original.
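
The idea of "same name and a similar image hash" can be pictured with a toy average hash. This is only a sketch: the hashing that deduplidog actually uses is not shown in this diff, and the functions `average_hash`, `hamming_distance` and `looks_like_duplicate`, as well as the `max_distance` threshold, are hypothetical names chosen for illustration. It assumes both images were already scaled down to the same small grayscale grid.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Build a bit mask: 1 where a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits in which two hashes differ."""
    return bin(a ^ b).count("1")


def looks_like_duplicate(name_a, pixels_a, name_b, pixels_b, max_distance=1):
    """Same file name and nearly identical hash; file size plays no role."""
    if name_a != name_b:
        return False
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance
```

A recompressed copy shifts the pixel values slightly but keeps the bright/dark pattern, so its hash stays close to the original's, while a different picture with the same name produces a distant hash.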


## Duplicated files
Let's take a closer look at a use case.

```python3
import logging
from deduplidog import Deduplidog

Deduplidog("/home/user/duplicates", "/media/disk/origs", ignore_date=True, rename=True)
```bash
deduplidog --work-dir /home/user/duplicates --original-dir /media/disk/origs --ignore-date --rename
```
This command produced the following output:
@@ -85,9 +100,8 @@ Warnings: 1
We found out that all the files in the *duplicates* folder seem to be useless but one. Its date is earlier than the original one. The life buoy icon would prevent any action. To suppress this, let's turn on `set_both_to_older_date`. Let's see it with the full log.
```python3
Deduplidog("/home/user/duplicates", "/media/disk/origs",
ignore_date=True, rename=True, set_both_to_older_date=True, log_level=logging.INFO)
```bash
deduplidog --work-dir /home/user/duplicates --original-dir /media/disk/origs --ignore-date --rename --set-both-to-older-date --log-level=10
```
```
@@ -112,9 +126,8 @@ Affected size: 59.9 kB
You see, the log is in its briefest yet still transparent form. The files to be affected in the work folder are prepended with the 🔨 icon, whereas those affected in the original folder use the 📄 icon. We might add the `execute=True` parameter to perform the actions, or use `inspect=True` to inspect.
```python3
Deduplidog("/home/user/duplicates", "/media/disk/origs",
ignore_date=True, rename=True, set_both_to_older_date=True, inspect=True)
```bash
deduplidog --work-dir /home/user/duplicates --original-dir /media/disk/origs --ignore-date --rename --set-both-to-older-date --inspect
```
The `inspect=True` just produces the commands we might subsequently use.
@@ -133,7 +146,7 @@ You face a directory that might contain some images twice. Let's analyze. We tur
```
$ deduplidog --work-dir ~/shuffled/ --media-magic --ignore-name --skip-bigger --log-level=20
Only files with media suffixes are taken into consideration. Nor the size nor the date is compared. Nor the name!
Duplicates from the work dir at 'shuffled' (only if smaller than the pair file) would be (if execute were True) left intact (because no action is selected).
Duplicates from the work dir at 'shuffled' (only if smaller than the pair file) would be (if execute were True) left intact (because no action is selected, nothing will happen).
Number of originals: 9
Caching image hashes: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 16.63it/s]
@@ -197,7 +210,15 @@ Find the duplicates. Normally, the file must have the same size, date and name.
| output | bool | False | Stores the output log to a file in the current working directory. (Never overwrites an older file.) |
## Utils
In the `deduplidog.utils` packages, you'll find a several handsome tools to help you. You will find parameters by using you IDE hints.
The library might be invoked from a [Jupyter Notebook](https://jupyter.org/).
```python3
from deduplidog import Deduplidog
Deduplidog("/home/user/duplicates", "/media/disk/origs", ignore_date=True, rename=True).start()
```
In the `deduplidog.utils` package, you'll find several handy tools to help you. You will find the parameters by using your IDE hints.
### `images`
*`urls: Iterable[str | Path]`* Display a ribbon of images.
Binary file added asset/logo.jpg
Binary file added asset/warnings_confirmation_example.avif
Binary file not shown.
10 changes: 5 additions & 5 deletions deduplidog/__main__.py
@@ -1,16 +1,14 @@
import sys

from mininterface import run
from mininterface import Cancelled, run

from .deduplidog import Deduplidog


def main():
# NOTE: I'd like to have the default in case work dir is not specified args=("--work-dir", str(Path.cwd()))
# Currently, args overthrows CLI arguments.
with run(Deduplidog, interface=None) as m:
# with run(Deduplidog, interface="tui") as m:
# m = run(Deduplidog, interface="gui")
# if 1:
# m.facet._layout # TODO
try:
while True:
print("")
@@ -23,6 +21,8 @@ def main():
# deduplidog.perform()
# else:
m.env.start(m)
except Cancelled:
continue
except Exception as e:
print("-"*100)
print(e)
15 changes: 0 additions & 15 deletions deduplidog/cli.py

This file was deleted.

11 changes: 6 additions & 5 deletions deduplidog/deduplidog.py
@@ -171,7 +171,7 @@ class Deduplidog:
     media: OmitArgPrefixes[Media]
     helper: OmitArgPrefixes[Helper]

-    work_dir: Path
+    work_dir: Path = Path.cwd()
     """Folder of the files suspectible to be duplicates."""

     original_dir: Path | None = None
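
One property of a default written as `Path.cwd()` in a class body is that Python evaluates it once, when the class is defined, rather than each time an instance is created. For a CLI that is launched once per run this is unlikely to matter. The sketch below only illustrates the difference, assuming a dataclass-style class; `EagerDefault` and `LazyDefault` are hypothetical names, not part of deduplidog.

```python
import os
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class EagerDefault:
    # Evaluated once, while the class body is executed.
    work_dir: Path = Path.cwd()


@dataclass
class LazyDefault:
    # Evaluated every time an instance is created.
    work_dir: Path = field(default_factory=Path.cwd)


start = Path.cwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    try:
        eager, lazy = EagerDefault(), LazyDefault()
    finally:
        os.chdir(start)  # leave the temporary directory before it is removed

print(eager.work_dir == start)  # the eager default still points at the start directory
print(lazy.work_dir == start)   # the lazy default followed the chdir
```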
@@ -255,6 +255,7 @@ def start(self, interface=None):
         self.reset()
         self.check()
         self.perform()
+        return self

     def perform(self):
         # build file list of the originals
@@ -406,7 +407,7 @@ def check(self):
     def _get_action(self, passive=False):
         action = self.action.rename, self.action.replace_with_original, self.action.delete, self.action.replace_with_symlink
         if not sum(action):
-            return f"{'left' if passive else 'leave'} intact (because no action is selected)"
+            return f"{'left' if passive else 'leave'} intact (because no action is selected, nothing will happen)"
         elif sum(action) > 1:
             raise AssertionError("Choose only one execute action (like only rename).")
         elif self.action.rename:
@@ -697,9 +698,9 @@ def _find_similar(self, work_file: Path, candidates: list[Path]):
for original in candidates:
ost, wst = original.stat(), work_file.stat()
if (self.match.ignore_date
or wst.st_mtime == ost.st_mtime
or self.match.tolerate_hour and self.match.tolerate_hour[0] <= (wst.st_mtime - ost.st_mtime)/3600 <= self.match.tolerate_hour[1]
) and (self.match.ignore_size or wst.st_size == ost.st_size and (not self.match.checksum or crc(original) == crc(work_file))):
or wst.st_mtime == ost.st_mtime
or self.match.tolerate_hour and self.match.tolerate_hour[0] <= (wst.st_mtime - ost.st_mtime)/3600 <= self.match.tolerate_hour[1]
) and (self.match.ignore_size or wst.st_size == ost.st_size and (not self.match.checksum or crc(original) == crc(work_file))):
return original

def _find_similar_media(self, work_file: Path, comparing_image: bool, candidates: list[Path]):
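
The condition in `_find_similar` combines a date test and a size/checksum test. A standalone sketch of the same boolean logic, written so it can be tried without touching the file system; `FileStat`, `is_match` and the `crc_equal` callback are hypothetical stand-ins for `os.stat_result`, the method body and the `crc(...)` comparison:

```python
from dataclasses import dataclass


@dataclass
class FileStat:
    """Hypothetical stand-in for the two stat fields the condition reads."""
    st_mtime: float
    st_size: int


def is_match(work, orig, *, ignore_date=False, ignore_size=False,
             tolerate_hour=None, checksum=False, crc_equal=lambda: True):
    # Date test: ignored, identical, or within the tolerated hour range.
    diff_hours = (work.st_mtime - orig.st_mtime) / 3600
    date_ok = (
        ignore_date
        or work.st_mtime == orig.st_mtime
        or bool(tolerate_hour and tolerate_hour[0] <= diff_hours <= tolerate_hour[1])
    )
    # Content test: `and` binds tighter than `or`, so the checksum is only
    # consulted when the size is compared and turns out to be equal.
    content_ok = ignore_size or (
        work.st_size == orig.st_size and (not checksum or crc_equal())
    )
    return bool(date_ok and content_ok)
```

For instance, a work file that is two hours newer than the original only matches when the date is ignored or when `tolerate_hour` covers the gap, such as `(1, 3)`. With `ignore_size=True` the checksum callback is never called.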
10 changes: 0 additions & 10 deletions deduplidog/form.tcss

This file was deleted.

65 changes: 0 additions & 65 deletions deduplidog/tui.py

This file was deleted.
