Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add safe file writer/reader #46

Closed
wants to merge 4 commits into from
Closed

add safe file writer/reader #46

wants to merge 4 commits into from

Conversation

elv-gilles
Copy link
Collaborator

add safe file writer/reader from branch cbor-v2

and use namedLocks to lock file for safe read or write

Also:

  • log warn if waiting to obtain the lock
  • retry renaming & log warn when 'finalizing' a safe write

lukseven and others added 2 commits July 1, 2024 20:09
* log warn if waiting to obtain the lock
* retry renaming & log warn when 'finalizing' a safe write
Comment on lines 88 to 90
count++
_ = os.Remove(path)
err = os.Rename(tmp, path)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retrying here comes from discussion(s) around problems of what is happening:

  • on fabric nodes (linux)
  • on darwin when running unit tests concurrently

Looking at robustio in github.com/rogpeppe/go-internal

  • darwin is deemed flaky and Rename, RemoveAll and ReadFile are retried (see code for details) up to an arbitraryTimeout of 2s.
  • linux is not deemed flaky

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OS is not really the problem - it's the file system. On both linux and darwin one can use different file systems and each will have its own consistency semantics. So I think making any decisions based on OS is not a good idea.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OS is what we talk to (not even the real one but the os package) and 'should' provide guarantees or document [potential] deviations.

The only things I can say:

  • comment on os.Rename says 'Even within the same directory, on non-Unix platforms Rename is not an atomic operation.'
  • implementation for darwin goes to src/syscall/zsyscall_darwin_amd64.goand calls:
	_, _, e1 := syscall(abi.FuncPCABI0(libc_rename_trampoline), uintptr(unsafe.Pointer(_p0)), uintptr(unsafe.Pointer(_p1)), 0)
  • for linux, is goes to src/syscall/zsyscall_linux_amd64.go and calls:
_, _, e1 := Syscall6(SYS_RENAMEAT, uintptr(olddirfd), uintptr(unsafe.Pointer(_p0)), uintptr(newdirfd), uintptr(unsafe.Pointer(_p1)), 0, 0)

So the question is whether the 'flaky' comment applies to the implementation or results from ad-hoc observations. I forwarded your comment to the maintainer of the project (issue 264 in github.com/rogpeppe/go-internal), but not sure we'll ever get an answer.

I also observe comments in other packages, like github.com/google/renameio stating that it is necessary to call fsynch before closing and renaming.
see https://github.com/google/renameio/blob/c1c62ad1756cd19ffb560d66a3633d71d6104f82/tempfile.go#L150-L171

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but not sure we'll ever get an answer

we won't, ticket was closed with these comments:

The code, godocs, and comments are largely borrowed from the Go codebase - they support the main OSes on a best-effort basis. https://pkg.go.dev/cmd/go/internal/robustio

Closing as resolved, because we haven't made these (hard) choices ourselves.

// lockSafeFile takes a lock in 'safeFiles' for the given path and returns the
// unlocker that must be called after using the locked file.
// This uses real sync.Mutex. Alternatively we could use:
// - advisory file locking as done by go internals in package src/cmd/go/internal/lockedfile.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that lockedfile is publicly available via github.com/rogpeppe/go-internal

* add func 'LockCount' to namedLocks
util/fileutil/safe.go Outdated Show resolved Hide resolved
count := 0
for {
count++
_ = os.Remove(path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe os.Remove is necessary before os.Rename. I would try without it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be right, the doc says:

Rename renames (moves) oldpath to newpath. If newpath already exists and is not a directory, Rename replaces it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this out and it works. Also, it makes the unit test TestConcurrentSafeReaderWriter fail, which wanted to show that we have a race if we are not using go-level locks while reading and writing... So, at least on OSX, it seems that go-level locks are not needed in that case.

Copy link
Collaborator Author

@elv-gilles elv-gilles Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it just shows the algorithm is wonky (also see the comment added in the test):

  • the writer fails - as expected in the test - because a reader renamed the file
  • a subsequent reader was able to read the file but - at this point - this could be a partial file which would bring us to the original problem

The reason is that the algorithm requires deleting the file before renaming in order to signal a potential reader.

I am closing this PR because this code is fragile, requires strict synchronization and also I believe a call to Read some file should never make any modification to the file system.

util/fileutil/safe.go Outdated Show resolved Hide resolved
util/fileutil/safe.go Outdated Show resolved Hide resolved
util/fileutil/safe.go Outdated Show resolved Hide resolved
elv-nate
elv-nate previously approved these changes Jul 4, 2024
Copy link
Contributor

@elv-nate elv-nate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me (other than existing comments).

util/syncutil/named_locks.go Show resolved Hide resolved
@elv-gilles
Copy link
Collaborator Author

elv-gilles commented Jul 15, 2024

I am closing this PR because this code is fragile and requires strict synchronization.

Replaced by #48

@elv-gilles elv-gilles closed this Jul 15, 2024
@elv-gilles elv-gilles deleted the safe-file branch July 16, 2024 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants