-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete files that don't match inventory items #117
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #117 +/- ##
===========================================
+ Coverage 31.37% 59.95% +28.57%
===========================================
Files 15 19 +4
Lines 1224 2412 +1188
===========================================
+ Hits 384 1446 +1062
- Misses 840 966 +126 ☔ View full report in Codecov by Sentry. |
fd2cf70
to
41ac719
Compare
@yarikoptic I tested this by first taking a backup of |
@yarikoptic Should this be merged now, or are there any other scenarios you want me to test first? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge, even if just to facilitate further testing by @aaronkanzer and @kabilar on their use case. I left a few questions to give me a few bits of understanding .
Also please test 1 more "scenario". Following up on
@yarikoptic I tested this by first taking a backup of dandisets/ and then running s3invsync again on the same backup directory but targeting the manifest from a week prior. The run exited successfully, and files were indeed deleted, though I didn't check the actual inventories for correctness
That is cute sneaky. Overall, if we do
- run1: DATE1
- run2: DATE2 (back in time)
- run3: DATE1
- run4: DATE2 (again)
unless some "trailing delete" policy deletes anything, I expect results from run3
to bringing back state identical to run1
, and then run4
identical to run2
"regardless" (since nothing new should emerge but we should delete the same).
Since trailing delete on manifests is not yet implemented in production, run1
and run3
must also be identical (ref).
You could use cp --reflink=always
(on typhon's btrfs) to copy after each run without incurring heavy data transfer/time. And then just straight diff
to see if anything different.
use tokio::{sync::oneshot, time::timeout}; | ||
|
||
#[test] | ||
fn nursery_is_send() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does this test work if both internal functions just announced not used or are they used by those Nursery's in the tokio async tests somehow automagically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test asserts that Nursery
implements the Send
trait (meaning it can be sent between threads) as long as the return type of the tasks does so as well. If it doesn't, the test code won't even compile, because the argument to require_send()
won't meet the type contraint.
} { | ||
if let Some(entry) = clnt.peek_inventory_csv(&fspec).await? { | ||
if sender.send((fspec, entry)).await.is_err() { | ||
// Assume we're shutting down |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
codecov reports that this condition is never hit. Does it mean that within tests we just never have "big enough" file that we do partial read or that we always downloading full files during such "peeking"?
Or related -- did you track sample executions to see that we are indeed not downloading full manifests here, thus downloading them fully twice altogether (once in peeking for sorting and once then for full use)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are currently no tests that do any I/O, so nothing in src/syncer/mod.rs
is tested. I don't know why codecov is highlighting just that line.
Also, for the record, this check has nothing to do with whether the read was partial or not; line 245 is reached if the channel for sending the peek results was closed, but I don't believe that can actually happen for the code here, so maybe codecov is smarter than it looks....
Or related -- did you track sample executions to see that we are indeed not downloading full manifests here, thus downloading them fully twice altogether (once in peeking for sorting and once then for full use)?
I did not track that. I'm not sure how you'd do that, given that the peeking code doesn't even write anything to disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have assumed it logged on its operation -- then could have been checked in the logs on either read in full or just in part
@yarikoptic OK, I did the following: I still have the backup taken for the first scenario downloaded, and I updated it to the latest (2025-01-12) manifest earlier today to make sure my last-minute tweaks hadn't broken anything. So, I used |
@yarikoptic Regarding diffing the download dir and its backup, the paths of all the files & directories are the same between the two, at least. Do you need their contents to be compared as well? |
I didn't see you not, works ensure that downloads are ok, doesn't |
@yarikoptic The problem with |
That's great! So let's consider this one good and proceed! I will do my mighty contribution then by clicking the green button ;) |
Closes #33.
To do: