get_fields() only reports value of last radio button #3273

saeub opened this issue Apr 30, 2025 · 5 comments
Labels
needs-discussion: The PR/issue needs more discussion before we can continue
workflow-forms: From a users perspective, forms is the affected feature/workflow

Comments

saeub commented Apr 30, 2025

I have a PDF with three (mutually exclusive) radio buttons. PdfReader.get_fields() seems to ignore all but the last radio button in the group.

Minimal example PDF: example.pdf

Reproducible using this LaTeX code:

\documentclass{article}
\usepackage{hyperref}
\begin{document}
\begin{Form}
option1: \ChoiceMenu[radio,name=option]{}{=option1}

option2: \ChoiceMenu[radio,name=option]{}{=option2}

option3: \ChoiceMenu[radio,name=option]{}{=option3}
\end{Form}
\end{document}

PdfReader("example.pdf").get_fields() returns the following:

  • When option1, option2, or no option is selected:
    {'option': {'/T': 'option', '/FT': '/Btn', '/Ff': 49152, '/V': '/\\Fld@default ', '/DV': '/\\Fld@default ', '/_States_': ['/\\376\\377\\000o\\000p\\000t\\000i\\000o\\000n\\0003', '/Off']}}
  • When option3 is selected:
    {'option': {'/T': 'option', '/FT': '/Btn', '/Ff': 49152, '/V': '/\\376\\377\\000o\\000p\\000t\\000i\\000o\\000n\\0003', '/DV': '/\\Fld@default ', '/_States_': ['/\\376\\377\\000o\\000p\\000t\\000i\\000o\\000n\\0003', '/Off']}}

/_States_ only seems to contain option3, and /V only changes when option3 is selected.
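The state names in the output above look like PDF names whose bytes are written as octal escapes; `\376\377` would then be the UTF-16BE byte order mark. A minimal sketch for making them readable, assuming that interpretation holds (the helper `decode_state_name` is hypothetical and not part of pypdf):

```python
import re


def decode_state_name(name: str) -> str:
    """Turn a PDF name with octal escapes into readable text (best effort)."""
    raw = name[1:] if name.startswith("/") else name
    # Replace every three-digit octal escape (e.g. \376) with the byte it stands for.
    data = re.sub(
        rb"\\([0-3][0-7]{2})",
        lambda m: bytes([int(m.group(1), 8)]),
        raw.encode("latin-1"),
    )
    # \376\377 is the UTF-16BE byte order mark (0xFE 0xFF).
    if data.startswith(b"\xfe\xff"):
        return data[2:].decode("utf-16-be")
    return data.decode("latin-1")


state = "/\\376\\377\\000o\\000p\\000t\\000i\\000o\\000n\\0003"
print(decode_state_name(state))   # option3
print(decode_state_name("/Off"))  # Off
```

Read this way, the single on-state reported in /_States_ is "option3", which matches the observation that only the last button is represented.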

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-6.12.20-2-MANJARO-x86_64-with-glibc2.41

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.4.0, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=none

stefan6419846 (Collaborator) commented

Thanks for the report. Could you please add the necessary/full code you used for testing here, including setting the different selections?

stefan6419846 added the needs-example-code label ("The issue needs a minimal and complete (e.g. all imports) example showing the problem") on May 1, 2025

saeub commented May 2, 2025

I selected the different options manually in a PDF viewer. Here are the PDFs, each with one option selected:

This code will reproduce the outputs:

from pypdf import PdfReader

for filename in [f"example_option{i}.pdf" for i in range(1, 4)]:
    print(PdfReader(filename).get_fields())

stefan6419846 (Collaborator) commented

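I did a quick analysis. _build_field is called for each of the buttons, but all three buttons share the same name, so

key = self._get_qualified_field_name(field)

yields the same key every time. Each button overwrites the previous entry, which explains the observed behavior that only the last button state is kept.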
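The overwrite can be illustrated without pypdf. This is a minimal sketch, assuming the fields end up in a plain dict keyed by the qualified name; `collect_by_name` and the shortened sample widgets are hypothetical and only mimic the effect, they are not pypdf's actual implementation:

```python
def collect_by_name(widgets):
    """Store each widget under its name; a later widget replaces an earlier one."""
    fields = {}
    for widget in widgets:
        key = widget["/T"]  # stand-in for the qualified field name
        fields[key] = widget  # same key for every button, so this overwrites
    return fields


# Three buttons that share /T, each with its own on-state under /AP -> /N.
sample_widgets = [
    {"/T": "option", "/AP": {"/N": {"/option1": None}}},
    {"/T": "option", "/AP": {"/N": {"/option2": None}}},
    {"/T": "option", "/AP": {"/N": {"/option3": None}}},
]

fields = collect_by_name(sample_widgets)
print(len(fields))                          # 1
print(list(fields["option"]["/AP"]["/N"]))  # ['/option3']
```

Collecting the on-states of all same-named widgets into one list, instead of replacing the entry, would be one possible direction, but whether that is the right behavior is exactly the open question here.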

These are the current definitions inside the PDF file (with option 1 selected):

{'/Type': '/Annot', '/Rect': [192, 652.31, 205.94, 668.19], '/Subtype': '/Widget', '/F': 4, '/T': 'option', '/FT': '/Btn', '/Ff': 49152, '/H': '/P', '/BS': {'/W': 1, '/S': '/S'}, '/MK': {'/BC': [1, 0, 0], '/BG': [1, 1, 1]}, '/DA': '/ZaDb 10 Tf 0 0 0 rg', '/V': '/\376\377\000o\000p\000t\000i\000o\000n\0001', '/DV': '/\Fld@default ', '/AP': {'/N': {'/\376\377\000o\000p\000t\000i\000o\000n\0001': IndirectObject(7, 0, 140371820540432)}}, '/AS': '/\376\377\000o\000p\000t\000i\000o\000n\0001', '/M': 'D:20250430214458'}

{'/Type': '/Annot', '/Rect': [191.995, 637.421, 205.942, 653.306], '/Subtype': '/Widget', '/F': 4, '/T': 'option', '/FT': '/Btn', '/Ff': 49152, '/H': '/P', '/BS': {'/W': 1, '/S': '/S'}, '/MK': {'/BC': [1, 0, 0], '/BG': [1, 1, 1], '/CA': 'H'}, '/DA': '/ZaDb 10 Tf 0 0 0 rg', '/V': '/\Fld@default ', '/DV': '/\Fld@default ', '/AP': {'/N': {'/\376\377\000o\000p\000t\000i\000o\000n\0002': IndirectObject(7, 0, 140371820540432)}}}

{'/Type': '/Annot', '/Rect': [191.995, 622.532, 205.942, 638.417], '/Subtype': '/Widget', '/F': 4, '/T': 'option', '/FT': '/Btn', '/Ff': 49152, '/H': '/P', '/BS': {'/W': 1, '/S': '/S'}, '/MK': {'/BC': [1, 0, 0], '/BG': [1, 1, 1], '/CA': 'H'}, '/DA': '/ZaDb 10 Tf 0 0 0 rg', '/V': '/\Fld@default ', '/DV': '/\Fld@default ', '/AP': {'/N': {'/\376\377\000o\000p\000t\000i\000o\000n\0003': IndirectObject(7, 0, 140371820540432)}}}

To be honest, I am not sure how we should handle this correctly. As far as I understand section 12.7.4.2 of the PDF 2.0 specification, this PDF file might violate the standard:

The T entry in the field dictionary [...] holds a text string defining the field’s partial field name. The fully qualified field name is not explicitly defined but shall be constructed from the partial field names of the field and all of its ancestors. [...]

[...]

In addition, actual field dictionaries with the same fully qualified field name shall have the same field type (FT), value (V), and default value (DV).
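The quoted requirement can be checked mechanically. The following is a rough sketch, assuming the widgets have no parents (so /T alone is the fully qualified name) and using shortened stand-ins for the values printed above; `find_inconsistent_fields` is a hypothetical helper, not a pypdf function:

```python
from collections import defaultdict


def find_inconsistent_fields(widgets, keys=("/FT", "/V", "/DV")):
    """Return {name: {key: values}} for keys that differ between same-named widgets."""
    seen = defaultdict(lambda: defaultdict(set))
    for widget in widgets:
        for key in keys:
            seen[widget.get("/T")][key].add(widget.get(key))
    report = {}
    for name, per_key in seen.items():
        conflicts = {k: v for k, v in per_key.items() if len(v) > 1}
        if conflicts:
            report[name] = conflicts
    return report


# Shortened versions of the three dictionaries (option 1 selected).
default = "/\\Fld@default "
sample_widgets = [
    {"/T": "option", "/FT": "/Btn", "/V": "/option1", "/DV": default},
    {"/T": "option", "/FT": "/Btn", "/V": default, "/DV": default},
    {"/T": "option", "/FT": "/Btn", "/V": default, "/DV": default},
]

conflicts = find_inconsistent_fields(sample_widgets)
print(conflicts)
```

For these sample values only /V is reported, since /FT and /DV agree across the three widgets.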

stefan6419846 added the needs-discussion and workflow-forms labels and removed the needs-example-code label on May 2, 2025

saeub commented May 3, 2025

Thanks a lot for your analysis! I'm not very familiar with the PDF standard, so I can't judge whether the files violate it. But I've tested several PDF viewers and they generally seem to interpret the files as expected.

I also did some more tests with different PDFs. Here's one where, curiously, it's exactly the other way around -- only the first button state is kept:

\documentclass{article}
\usepackage{hyperref}
\begin{document}
\begin{Form}
\ChoiceMenu[radio,name=option]{}{option1=option1,option2=option2,option3=option3}
\end{Form}
\end{document}

example2_option1.pdf
example2_option2.pdf
example2_option3.pdf

>>> from pypdf import PdfReader
>>> for filename in [f"example2_option{i}.pdf" for i in range(1, 4)]:
...     print(PdfReader(filename).get_fields())
...     
{'option': {'/T': 'option', '/FT': '/Btn', '/Ff': 49152, '/V': '/\\376\\377\\000o\\000p\\000t\\000i\\000o\\000n\\0001', '/DV': '/\\Fld@default ', '/_States_': ['/\\376\\377\\000o\\000p\\000t\\000i\\000o\\000n\\0001', '/Off']}}
{'option': {'/T': 'option', '/FT': '/Btn', '/Ff': 49152, '/V': '/\\Fld@default ', '/DV': '/\\Fld@default ', '/_States_': ['/\\376\\377\\000o\\000p\\000t\\000i\\000o\\000n\\0001', '/Off']}}
{'option': {'/T': 'option', '/FT': '/Btn', '/Ff': 49152, '/V': '/\\Fld@default ', '/DV': '/\\Fld@default ', '/_States_': ['/\\376\\377\\000o\\000p\\000t\\000i\\000o\\000n\\0001', '/Off']}}

stefan6419846 (Collaborator) commented

The difference between PDF readers and pypdf is that readers are able to hide such details, while pypdf needs a way to expose this to the user. Thus it only becomes obvious here and might lead to such unexpected behavior.

Regarding the new example: I decompressed the content streams and as far as I can see, only the first field/button has an actual reference.

  • /Root is object 29.
  • Object 29 references the /AcroForm as object 24.
  • The /Fields of object 24 contain only object 6.
  • Object 6 is the first field/button and set to the default value (/V == /DV).
  • Object 2 (the first page) references objects 6, 7 and 8 in its /Annots.
  • Objects 7 and 8 are the other fields/buttons.

At the moment, /Annots are not considered when retrieving fields: as per section 12.5.6.19 of the PDF 2.0 specification, they mostly relate to appearance rather than to actual fields. Apparently, this is not the case with actual PDF files.
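As a possible workaround on the user side, the widget annotations of each page can be walked directly. This is only a sketch under several assumptions: each widget carries /T and /FT itself (as in the dictionaries shown earlier, not inherited from a parent), and the on-states are the keys of /AP -> /N. The helpers `resolve` and `radio_states_from_annots` are hypothetical, and the sample page uses plain dicts with shortened state names instead of real pypdf objects:

```python
def resolve(obj):
    """Follow an indirect reference if the object offers get_object()."""
    getter = getattr(obj, "get_object", None)
    return getter() if callable(getter) else obj


def radio_states_from_annots(pages):
    """Collect the on-states of button widgets per field name from page /Annots."""
    states = {}
    for page in pages:
        annots = resolve(page.get("/Annots")) or []
        for ref in annots:
            annot = resolve(ref)
            # Only button widgets are of interest here.
            if annot.get("/Subtype") != "/Widget" or annot.get("/FT") != "/Btn":
                continue
            appearance = resolve(annot.get("/AP")) or {}
            normal = resolve(appearance.get("/N")) or {}
            found = states.setdefault(annot.get("/T"), [])
            for state in normal.keys():
                if state != "/Off" and state not in found:
                    found.append(state)
    return states


# Plain-dict stand-in for one page with three radio widgets.
sample_page = {
    "/Annots": [
        {"/Subtype": "/Widget", "/FT": "/Btn", "/T": "option",
         "/AP": {"/N": {"/option1": None}}},
        {"/Subtype": "/Widget", "/FT": "/Btn", "/T": "option",
         "/AP": {"/N": {"/option2": None}}},
        {"/Subtype": "/Widget", "/FT": "/Btn", "/T": "option",
         "/AP": {"/N": {"/option3": None}}},
    ]
}

all_states = radio_states_from_annots([sample_page])
print(all_states)  # {'option': ['/option1', '/option2', '/option3']}
```

With pypdf, passing `PdfReader(filename).pages` instead of the sample list may work in the same way, since pages and annotations are dictionary-like, but this has not been verified against the attached files.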

example2_option3_1.pdf
