Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use os.fsencode to encode bytes #5662

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from
Draft

Conversation

snejus
Copy link
Member

@snejus snejus commented Mar 10, 2025

No description provided.

@snejus snejus requested review from JOJ0 and wisp3rwind March 10, 2025 07:49
Copy link

Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry.

Copy link
Member

@wisp3rwind wisp3rwind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm definitely out of my depth here; to review this properly, I'm lacking knowledge on the whole en/decoding business.

That said, I left a comment on one change that seemed unclear.

Superficially, this seems plausible as a fix to #5648, since that issue is probably caused by partial conversion of en/decoding usage.

Comment on lines +286 to +287
str1 = unidecode(str1)
str2 = unidecode(str2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch that this was a no-op these days!

@@ -216,7 +217,7 @@ def value_match(cls, pattern: P, value: Any):
"""Determine whether the value matches the pattern. The value
may have any type.
"""
return cls.string_match(pattern, util.as_string(value))
return cls.string_match(pattern, os.fsdecode(str(value or "")))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this, it definitely deviates from the previous behaviour:

  • str(b'abc') == "b'abc'"
  • Does it really make sense to involve fsdecode here? The value has no immediate relation to the filesystem, but is meant to match database fields (path queries are handled separately in library.PathQuery). Before, the ignore error handler was used: That way, there are definitely no decoding errors. Now, I'm not sure what could happen for decoding errors: Surrogates maybe? What would that mean for string_match.

@FlorentLM
Copy link

FlorentLM commented Mar 19, 2025

Damn I was about to do this exact thing, glad I checked the issues first haha

Beets is as-is unusable for me with the current util._fsencoding function (there are surrogated characters in place of some swedish chars in an Opeth album and the parsing errors and I can't import the rest of the music).

Using os.fsencode works.

@snejus snejus requested a review from Copilot March 20, 2025 17:31
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the codebase to use os.fsencode and os.fsdecode for handling bytes and string conversions, replacing custom encoding routines. Key changes include the removal of arg_encoding/_fsencoding functions, uniform conversions in command, query, and utility functions, and corresponding updates in tests and plugins.

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated no comments.

Show a summary per file
File Description
beetsplug/convert.py Replace custom decoding with os.fsdecode for command and argument processing.
beets/dbcore/query.py Update value conversion using os.fsdecode instead of util.as_string.
beets/util/init.py Remove arg_encoding/_fsencoding and simplify bytestring_path and displayable_path.
Various test files Adjust tests to match new encoding conversion methods.
beetsplug/hook.py, ipfs.py, etc Update encoding usage consistently across plugins and command handling.
Comments suppressed due to low confidence (2)

beets/util/init.py:391

  • The previous try/except fallback to utf-8 was removed in bytestring_path. Please verify that os.fsencode handles all edge cases in non-standard locales or with unusual inputs.
return os.fsencode(str_path)

beets/dbcore/query.py:220

  • [nitpick] While os.fsdecode returns a string unchanged if given a str, wrapping str(value) in os.fsdecode may be redundant; consider clarifying the intent with a comment.
return cls.string_match(pattern, os.fsdecode(str(value or "")))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UnicodeEncodeError: 'utf-8' codec can't encode character '\udce9' in position 14: surrogates not allowed
3 participants