-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add safe file writer/reader #46
Conversation
* log warn if waiting to obtain the lock * retry renaming & log warn when 'finalizing' a safe write
util/fileutil/safe.go
Outdated
count++ | ||
_ = os.Remove(path) | ||
err = os.Rename(tmp, path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Retrying here comes from discussion(s) around problems of what is happening:
- on fabric nodes (linux)
- on darwin when running unit tests concurrently
Looking at robustio
in github.com/rogpeppe/go-internal
- darwin is deemed flaky and
Rename
,RemoveAll
andReadFile
are retried (see code for details) up to anarbitraryTimeout
of 2s. - linux is not deemed flaky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OS is not really the problem - it's the file system. On both linux and darwin one can use different file systems and each will have its own consistency semantics. So I think making any decisions based on OS is not a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OS is what we talk to (not even the real one but the os package) and 'should' provide guarantees or document [potential] deviations.
The only things I can say:
- comment on os.Rename says 'Even within the same directory, on non-Unix platforms Rename is not an atomic operation.'
- implementation for darwin goes to
src/syscall/zsyscall_darwin_amd64.go
and calls:
_, _, e1 := syscall(abi.FuncPCABI0(libc_rename_trampoline), uintptr(unsafe.Pointer(_p0)), uintptr(unsafe.Pointer(_p1)), 0)
- for linux, is goes to
src/syscall/zsyscall_linux_amd64.go
and calls:
_, _, e1 := Syscall6(SYS_RENAMEAT, uintptr(olddirfd), uintptr(unsafe.Pointer(_p0)), uintptr(newdirfd), uintptr(unsafe.Pointer(_p1)), 0, 0)
So the question is whether the 'flaky' comment applies to the implementation or results from ad-hoc observations. I forwarded your comment to the maintainer of the project (issue 264 in github.com/rogpeppe/go-internal
), but not sure we'll ever get an answer.
I also observe comments in other packages, like github.com/google/renameio
stating that it is necessary to call fsynch
before closing and renaming.
see https://github.com/google/renameio/blob/c1c62ad1756cd19ffb560d66a3633d71d6104f82/tempfile.go#L150-L171
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but not sure we'll ever get an answer
we won't, ticket was closed with these comments:
The code, godocs, and comments are largely borrowed from the Go codebase - they support the main OSes on a best-effort basis. https://pkg.go.dev/cmd/go/internal/robustio
Closing as resolved, because we haven't made these (hard) choices ourselves.
util/fileutil/safe.go
Outdated
// lockSafeFile takes a lock in 'safeFiles' for the given path and returns the | ||
// unlocker that must be called after using the locked file. | ||
// This uses real sync.Mutex. Alternatively we could use: | ||
// - advisory file locking as done by go internals in package src/cmd/go/internal/lockedfile. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that lockedfile
is publicly available via github.com/rogpeppe/go-internal
* add func 'LockCount' to namedLocks
util/fileutil/safe.go
Outdated
count := 0 | ||
for { | ||
count++ | ||
_ = os.Remove(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe os.Remove is necessary before os.Rename. I would try without it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be right, the doc says:
Rename renames (moves) oldpath to newpath. If newpath already exists and is not a directory, Rename replaces it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this out and it works. Also, it makes the unit test TestConcurrentSafeReaderWriter fail, which wanted to show that we have a race if we are not using go-level locks while reading and writing... So, at least on OSX, it seems that go-level locks are not needed in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it just shows the algorithm is wonky (also see the comment added in the test):
- the writer fails - as expected in the test - because a reader renamed the file
- a subsequent reader was able to read the file but - at this point - this could be a partial file which would bring us to the original problem
The reason is that the algorithm requires deleting the file before renaming in order to signal a potential reader.
I am closing this PR because this code is fragile, requires strict synchronization and also I believe a call to Read
some file should never make any modification to the file system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me (other than existing comments).
* add note
I am closing this PR because this code is fragile and requires strict synchronization. Replaced by #48 |
add safe file writer/reader from branch
cbor-v2
and use
namedLocks
to lock file for safe read or writeAlso: