get_fields() only reports value of last radio button #3273
Thanks for the report. Could you please add the full code you used for testing here, including how you set the different selections?
I selected the different options manually in a PDF viewer. Here are the PDFs, each with one option selected:
This code will reproduce the outputs:

from pypdf import PdfReader

for filename in [f"example_option{i}.pdf" for i in range(1, 4)]:
    print(PdfReader(filename).get_fields())
I just did some quick analysis and while Line 604 in 5bc9faf
These are the current definitions inside the PDF file (with option 1 selected):
To be honest, I am not sure how we should handle this correctly. As far as I understand section 12.7.4.2 of the PDF 2.0 specification, this PDF file might violate the standard:
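For comparison, here is a minimal sketch of how a radio group is usually laid out under the radio-button rules of the PDF specification: one parent field with the Radio flag set in /Ff, one kid widget per option, each kid's normal appearance dictionary (/AP /N) holding exactly one on-state name plus /Off, and the parent's /V naming the on-state of the selected kid. Plain Python dicts stand in for PDF objects; the helper names (on_state, radio_options, selected) and the option names are hypothetical and not part of pypdf.

```python
from __future__ import annotations

# Hypothetical, simplified model of a conforming radio-button group.
# Plain dicts stand in for PDF objects; this is not pypdf's API.

RADIO = 1 << 15  # Ff bit position 16 ("Radio") for button fields


def on_state(widget: dict) -> str:
    """Return the widget's on-state: the /AP /N key that is not /Off."""
    names = [name for name in widget["/AP"]["/N"] if name != "/Off"]
    if len(names) != 1:
        raise ValueError("expected exactly one on-state per widget")
    return names[0]


def radio_options(field: dict) -> list[str]:
    """All selectable values of a radio field, one per kid widget."""
    return [on_state(kid) for kid in field["/Kids"]]


def selected(field: dict) -> str:
    """The parent's /V names the selected kid's on-state (or /Off)."""
    return field.get("/V", "/Off")


group = {
    "/FT": "/Btn",
    "/Ff": RADIO,
    "/T": "option",
    "/V": "/option2",
    "/Kids": [
        # /AS is the appearance currently shown; None stands in for a stream.
        {"/AS": "/Off", "/AP": {"/N": {"/option1": None, "/Off": None}}},
        {"/AS": "/option2", "/AP": {"/N": {"/option2": None, "/Off": None}}},
        {"/AS": "/Off", "/AP": {"/N": {"/option3": None, "/Off": None}}},
    ],
}

print(radio_options(group))  # ['/option1', '/option2', '/option3']
print(selected(group))       # /option2
```

With this layout, every option can be listed from the kids and the selection is read from a single /V; the thread suggests the hyperref output may not follow this structure, which is where the ambiguity comes from.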
Thanks a lot for your analysis! I'm not very familiar with the PDF standard, so I can't judge whether the files violate it. But I've tested several PDF viewers, and they generally seem to interpret the files as expected. I also did some more tests with different PDFs. Here's one where, curiously, it's exactly the other way around, and only the first button state is kept:
\documentclass{article}
\usepackage{hyperref}
\begin{document}
\begin{Form}
\ChoiceMenu[radio,name=option]{}{option1=option1,option2=option2,option3=option3}
\end{Form}
\end{document}
example2_option1.pdf
>>> from pypdf import PdfReader
>>> for filename in [f"example2_option{i}.pdf" for i in range(1, 4)]:
...     print(PdfReader(filename).get_fields())
...
{'option': {'/T': 'option', '/FT': '/Btn', '/Ff': 49152, '/V': '/\\376\\377\\000o\\000p\\000t\\000i\\000o\\000n\\0001', '/DV': '/\\Fld@default ', '/_States_': ['/\\376\\377\\000o\\000p\\000t\\000i\\000o\\000n\\0001', '/Off']}}
{'option': {'/T': 'option', '/FT': '/Btn', '/Ff': 49152, '/V': '/\\Fld@default ', '/DV': '/\\Fld@default ', '/_States_': ['/\\376\\377\\000o\\000p\\000t\\000i\\000o\\000n\\0001', '/Off']}}
{'option': {'/T': 'option', '/FT': '/Btn', '/Ff': 49152, '/V': '/\\Fld@default ', '/DV': '/\\Fld@default ', '/_States_': ['/\\376\\377\\000o\\000p\\000t\\000i\\000o\\000n\\0001', '/Off']}}
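The /Ff value 49152 in all three outputs is a bit field. A small sketch that decodes it with the button-field flag bits from the PDF specification (the BUTTON_FLAGS table and decode_button_flags helper are illustrative, not pypdf API) shows that the field carries the Radio and NoToggleToOff flags:

```python
from __future__ import annotations

# Button-field flags from the PDF specification's button-field flag table.
# Bit positions are 1-based, so bit n has the value 1 << (n - 1).
BUTTON_FLAGS = {
    "NoToggleToOff": 1 << 14,   # bit 15
    "Radio": 1 << 15,           # bit 16
    "Pushbutton": 1 << 16,      # bit 17
    "RadiosInUnison": 1 << 25,  # bit 26
}


def decode_button_flags(ff: int) -> list[str]:
    """Return the names of the button-field flags set in an /Ff value."""
    return [name for name, bit in BUTTON_FLAGS.items() if ff & bit]


print(decode_button_flags(49152))  # ['NoToggleToOff', 'Radio']
```

Both examples report the same /Ff value, so the flags alone do not explain why one keeps the first state and the other the last.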
The difference between PDF readers and pypdf is that readers can hide such details, while pypdf needs a way to expose them to the user. That is why the problem only becomes visible here, where it can lead to such unexpected behavior. Regarding the new example: I decompressed the content streams, and as far as I can see, only the first field/button has an actual reference.
At the moment,
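One way a lookup keyed by field name can produce the symptom from the first example is sketched below, assuming (as the outputs suggest, but nothing here confirms) that each button ends up as a separate terminal field named option with a single on-state. A dictionary keyed by /T keeps only the last field it sees, while merging entries with the same name collects every state. The field dicts are simplified stand-ins (unselected buttons use /Off instead of the /\Fld@default value hyperref writes), and this is not pypdf's actual implementation. It does not cover the second example, where only the first button is referenced at all.

```python
from collections import defaultdict

# Hypothetical simplification: three terminal fields that all carry
# /T "option", each with its own single on-state. Plain dicts, not pypdf.
fields = [
    {"/T": "option", "/V": "/Off", "/_States_": ["/option1", "/Off"]},
    {"/T": "option", "/V": "/Off", "/_States_": ["/option2", "/Off"]},
    {"/T": "option", "/V": "/option3", "/_States_": ["/option3", "/Off"]},
]

# Keying a dict by /T alone: each later field overwrites the earlier one,
# so only the last field's states and value survive.
naive = {f["/T"]: f for f in fields}

# Merging instead: collect every state seen under the same name and keep
# whichever value is not /Off.
merged = defaultdict(lambda: {"/_States_": [], "/V": "/Off"})
for f in fields:
    entry = merged[f["/T"]]
    for state in f["/_States_"]:
        if state not in entry["/_States_"]:
            entry["/_States_"].append(state)
    if f["/V"] != "/Off":
        entry["/V"] = f["/V"]

print(naive["option"]["/_States_"])   # ['/option3', '/Off']
print(merged["option"]["/_States_"])  # ['/option1', '/Off', '/option2', '/option3']
```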
I have a PDF with three (mutually exclusive) radio buttons. PdfReader.get_fields() seems to ignore all but the last radio button in the group.
Minimal example PDF: example.pdf
Reproducible using this LaTeX code:
PdfReader("example.pdf").get_fields() returns the following.
If option1, option2, or no option is selected:
{'option': {'/T': 'option', '/FT': '/Btn', '/Ff': 49152, '/V': '/\\Fld@default ', '/DV': '/\\Fld@default ', '/_States_': ['/\\376\\377\\000o\\000p\\000t\\000i\\000o\\000n\\0003', '/Off']}}
If option3 is selected:
{'option': {'/T': 'option', '/FT': '/Btn', '/Ff': 49152, '/V': '/\\376\\377\\000o\\000p\\000t\\000i\\000o\\000n\\0003', '/DV': '/\\Fld@default ', '/_States_': ['/\\376\\377\\000o\\000p\\000t\\000i\\000o\\000n\\0003', '/Off']}}
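The /V and /_States_ names above are hard to read because hyperref wrote the option names as escaped UTF-16BE text (a \376\377 byte-order mark followed by \000-padded characters), and those backslash-octal sequences end up literally inside the PDF name. A minimal sketch for turning such a name back into plain text, assuming exactly that encoding; decode_hyperref_name is a hypothetical helper, not part of pypdf:

```python
import re

# A backslash followed by one to three octal digits, as in "\376".
_OCTAL = re.compile(r"\\([0-7]{1,3})")


def decode_hyperref_name(name: str) -> str:
    """Turn a hyperref-escaped field name into readable text.

    Assumes the name contains literal backslash-octal escapes that spell
    a UTF-16BE string with a byte-order mark; other names are returned
    with only the leading slash removed.
    """
    text = name[1:] if name.startswith("/") else name
    raw = bytearray()
    pos = 0
    for match in _OCTAL.finditer(text):
        raw += text[pos:match.start()].encode("latin-1")
        raw.append(int(match.group(1), 8) & 0xFF)  # octal escape -> one byte
        pos = match.end()
    raw += text[pos:].encode("latin-1")
    if raw.startswith(b"\xfe\xff"):  # UTF-16BE byte-order mark
        return raw[2:].decode("utf-16-be")
    return raw.decode("latin-1")


print(decode_hyperref_name("/\\376\\377\\000o\\000p\\000t\\000i\\000o\\000n\\0003"))  # option3
```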
/_States_ only seems to contain option3, and /V only changes when option3 is selected.
Environment
Which environment were you using when you encountered the problem?