Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditionally restore ToMan functionality #36

Open
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

xsawyerx
Copy link
Collaborator

@xsawyerx xsawyerx commented Nov 30, 2017

[Each commit here has been done separately with - hopefully - a good commit message. It is easier to review them separately than together.]

This Pull Request introduces ToMan back and provides conditions for when to use ToMan and when to use ToTerm, falling back to ToText (as before).

The conditions are as follows (or should be):

  • If groff is new enough, use ToMan. This was the previous system that was moved to ToTerm but worked well for non-macOS operating systems. ToTerm works well for those Macs but fails any non-macOS. macOS carries an older version of groff.
  • If groff is too old (macOS) and our pager is less, use ToTerm. ToTerm will continue to (conditionally) set the -R to make less work well.
  • If groff is too old and our pager is not less, use ToText. This is the fallback and was set before our code for deciding whetther to use ToMan or ToTerm.

The way this PR works is by moving the detection of groff to the main module (and hopefully refactoring it to allow other stuff in the future) and then making the decision in line 593 and below in Perldoc.pm.

This should resolve RT#120229.

The ToMan formatter would search for the groff version it has available
but based on CPAN RT #120229, we need to determine this much earlier to
decide whether to pass on to ToMan or not.

This does not make any behavioral changes (other than temporarily
removing a debug message).

* Two helper functions from Pod::Perldoc::BaseTo were moved into
  Perldoc.pm: _get_path_components() and _find_executable_in_path().
* Perldoc.pm now has a method inspect_execs() which tries to find all
  available executables of a given program. It uses data from a new
  helper function, _exec_data(). Currently only nroff is supported.
* From ToMan we removed the code for detecting nroff and instead
  incorporated them in _exec_data() or a new function
  _find_executable() -- both in Perldoc.pm.
* Perldoc.pm's _inspect_execs() now calls _find_executable(),
  _find_executable_in_path(), and _get_path_components().

ToTerm can now be switched to ToMan and everything still works as it did
before, but we are not doing this yet.

The next commit will start addressing the logic for picking ToMan
or ToTerm.
This is just too much to read or figure out at once. Instead of every
possible condition (including unless !(...), which is ridiculous), it's
simply the obvious cocnditions: A set of OS's or a check on term.

If I understand this logic correctly (and I had to write some tests to
verify), it means that we are not using ToTerm for Windows, DOS,
AmigaOS, or for the recognized terms listed ("*dumb*", "*emacs*",
"*none*", and "*unknown*". The fallback is ToText. I guess that makes
sense.
The formatter class will need to know which pagers have been found so it
could determine whether ToTerm is a good option.
The idea is:

* If we have an updated version of groff, we can just use ToMan.
* If not, we can use ToTerm on the condition that less is updated
  enough and supports the "-R" flag that will be added to it by
  ToTerm.
* Otherwise, do not apply anything, using the first default set of
  ToText.
@afresh1
Copy link
Contributor

afresh1 commented May 2, 2019

I would like to also see this default ToMan if the nroffer is mandoc, on OpenBSD it works just fine.

@xsawyerx
Copy link
Collaborator Author

xsawyerx commented May 4, 2019

This was delayed due to comments from Zefram. It's on my list of things to do, but I need to address his comments first.

xsawyerx added 2 commits May 6, 2019 20:15
This is resolving an issue raised by Zefram:

    The use of `$less_bin --version` is dubious, because, per
    ->pagers_guessing, $less_bin may contain shell redirection
    characters, such that "--version" wouldn't necessarily function as
    a command-line argument to less(1).  You may need to parse pager
    strings in more detail.

Instead we're removing any shell redirect character and anything
that runs afterwards.
We had assumed that the first `less` pager is of a sufficient version
but that is not necessarily the case.

Instead, we now go through all `less` pagers and test each one separately.
When we find one of the right version, we stop. If the user doesn't like
this, they should set their first pager to the right `less` pager.
@xsawyerx xsawyerx closed this May 27, 2019
@xsawyerx xsawyerx deleted the fix/toman-default branch May 27, 2019 21:05
@xsawyerx xsawyerx restored the fix/toman-default branch May 27, 2019 21:07
@xsawyerx xsawyerx reopened this May 27, 2019
@briandfoy
Copy link
Owner

I've just taken over the maintenance of Pod::Perldoc, so I'll look into this. I know it's years old, and it might take me a bit to catch up.

@briandfoy briandfoy added Type: enhancement improve a feature that already exists Status: stalled something is blocking progress Priority: low get to this whenever labels Dec 5, 2023
@xsawyerx
Copy link
Collaborator Author

xsawyerx commented Dec 6, 2023

I've just taken over the maintenance of Pod::Perldoc, so I'll look into this. I know it's years old, and it might take me a bit to catch up.

My "to-do list" includes this ticket, which I really want to address it. Do you want to have a quick chat about it?

@xsawyerx
Copy link
Collaborator Author

xsawyerx commented Dec 14, 2023

Here is a summary list of things that must be reviewed before this PR is merged.

  • Detection of less(1) in the pager list

You use a substring search for the character sequence "less", but that may be misleading because it could appear as substring in a $ENV{PAGER} that is for a different pager.

I considered it "fine enough," using the detection from adding -R to the pager, but I want to revisit this.

Zefram's comment on this being similar to the -R:

That logic (in its current form) has rather different requirements. It's not adding -R to the command line, but to $ENV{LESS}, so a false positive doesn't matter there. A pager other than less(1) will just ignore the environment variable. But a false positive is a big problem for the new logic.

  • Detection of less(1) version with --version

The use of $less_bin --version is dubious, because, per ->pagers_guessing, $less_bin may contain shell redirection characters, such that "--version" wouldn't necessarily function as a command-line argument to less(1). You may need to parse pager strings in more detail.

I need to find the execution file and then call only that with --version, which would remove any redirection characters which might have appeared.

  • Check we are using the pager we think we are

Where the acceptability of the formatter depends on a sufficiently recent less(1), you need to ensure that that is the pager actually used, which you are not doing. All you are doing is detecting that a sufficiently recent less(1) is somewhere in the list of candidate pagers. You probably want to filter the pager list to contain only acceptable versions of less(1).

Reducing it to "which are available" (by going through them and calling system) and then picking the first one as less makes more sense.

  • Respecting %ENV pager vars to fully override

(Determined below to keep pelrdoc's behavior different than man et. al.)

However, where the first pager setting comes from $ENV{PAGER} or $ENV{PERLDOC_PAGER} it would be rude to dishonour that. An environmental pager setting, where supplied, should be the only candidate pager, such that you will pass up the opportunity to use ToTerm rather than use a different pager.

I can change the guessing function to explicitly override whatever pager is available by those configurations and then, as a secondary measure, try to reduce the list to existing pagers, and if the first one is less, continue with the logic.

Zefram was okay with it (or at least agreed it makes sense).

  • Reliably passing -R to the pager

When you are depending on -R, you also need to actually pass -R to the pager, which you're not reliably doing. You haven't changed the logic for this, which sets $ENV{LESS} if it wasn't set but doesn't do anything to pass -R if $ENV{LESS} was set. If you're parsing shell code in pager settings, it's probably best to add -R there and leave the environment untouched.

I had left it as is since ToTerm sets it in its own formatter. I can move this out of the way, but this would make it a particular ToTerm change that appears outside it. Although, your suggestion would be safer since it would be parsing the pager string and make sure it's in the right place. Not sure which is best.

Zefram's feedback on this being part of current ToTerm logic:

ToTerm is doing a bad job. It's not strictly necessary to remove its inadequate "-R" logic to replace with the correct stuff, but it is necessary to implement correct "-R" addition. The existing crap logic would be superfluous but harmless if it remained.

  • ANSI escape sequence support in ToTerm

To make ToTerm acceptable, you need to also check that the terminal is of a type that accepts the ANSI escape sequences. The existing logic will avoid trying ToTerm if $ENV{TERM} has certain values indicating the lack of cursor addressability, but that's not sufficient, in two respects.

Firstly, an addressable cursor does not in the slightest imply the use of ANSI escape sequences. And secondly, the check needs to be the opposite way round, because the consequences of sending unrecognised escape sequences are much worse than the consequences of trying to page on a dumb terminal.

So you need to check that $ENV{TERM} is set to a type that you specifically know accepts these sequences. The default
behaviour, for a terminal type that is not specifically recognised, has to be to conservatively not send the dubious sequences.

We need to make sure less can understand escape codes as escape codes and that the terminal supports them and can display them properly.

From Zefram:

Yes. This is a rather strange requirement, but comes from the strange behaviour of the -R option. less(1) isn't interpreting the escape codes itself and then rendering a fancy text image to the actual terminal, which is what it does with backspace overstriking and which it could readily do for escape sequences. Instead it only understands how the escape sequences work syntactically, and passes the escape sequences through to the terminal without any knowledge of how the terminal will interpret them. So the burden is rather unpleasantly placed on the user to ensure that the terminal is suitable before supplying "-R" to less(1).

We have a solution for less. For the terminal, we will need to create a list of known terminals that support escape codes and check ENV{TERM} for them. If we do not know your terminal, we cannot be sure it supports escape codes, and thus move to ToText as a fallback.

If so, this means the logic is:

  • If you have an old version of groff

  • And your less is old, or

  • Your terminal is unknown (meaning it's implicitly known to not support escape codes), then:
    -> You get ToText.

  • If you have an old version of groff

  • And your less is new enough

  • And we know your terminal (and implicitly know it to support escape codes), then:
    -> You get ToTerm.

  • If you have a new enough version of groff, then:
    -> You get ToMan

Zefram approves this.

@briandfoy
Copy link
Owner

Okay, let's see how this works out.

Much of the future improvements require making sure we get the right
pager. To be able to do make the adjustments successfully, we need to
set a baseline with this test.

It tests environment variables, "-m" option, and OS options.

It already exposes a few options that might be worth creating tickets
for.
We are trying to determine the formatter class (ToTerm, ToMan, etc.)
during `init()` but we only do the pager guessing (to figure out
which pagers we have) much later when we want to run the paging.

Moving the page guessing to the init point allows us to use the
pager availability and version to determine the formatter.

We also need to improve the detection of `less(1)` version and to
do that, we need to collect all possible pagers first, which is
done in `pagers_guessing()`.
This change checks that the code for detecting the `less` version
without getting confused by arguments or redirections.

It tests every OS setup available, and checks whether it receives
the correct binary (after cleaning up redirection and arguments),
as well as the index of that binary, just so we know it's not the
same binary each time (if "less" is repeated, for example).

This regex has another character ("\s") that the original code
doesn't, to prove that it's correct to add it.

I also temporarily added a redirect to one of the "less" entries,
and it showed that the redirection regex works (though it broke
other tests, so I removed it before committing).
The `less` binary might include arguments (for example, we support
`/usr/bin/less -isrR` for Cygwin), so it's better to clean up the
arguments before calling `--version`.

Arguably, this isn't a bug, but it's better to separate the arguments
from the `--version` call.
@xsawyerx
Copy link
Collaborator Author

xsawyerx commented Dec 17, 2023

Changes:

  • Detect pagers list before determining the formatters. This will help with making finer-grained decisions on the formatters based on the pagers.
  • Improved the version checking regex. It's slightly better, even if we don't have a known case where it would break.
  • I added a LOT of tests for both checking how we determine pagers and the version checking.
* Detection of `less(1)` version with `--version`

The use of $less_bin --version is dubious, because, per ->pagers_guessing, $less_bin may contain shell redirection characters, such that "--version" wouldn't necessarily function as a command-line argument to less(1). You may need to parse pager strings in more detail.

I need to find the execution file and then call only that with --version, which would remove any redirection characters which might have appeared.

Considering we already clean up the less version now (verified by a test), I don't think this is an issue any longer.

The idea of finding the execution file is not likely, considering we're trying to support multiple operating systems and don't know which one supports which. MSWin32 has support for it, but only from certain versions, for example.

lib/Pod/Perldoc.pm Outdated Show resolved Hide resolved
@briandfoy
Copy link
Owner

Is this something that's ready to merge or are you planning more work?

@briandfoy
Copy link
Owner

briandfoy commented Dec 17, 2023

I'm about to go off grid until the second week of January, but I've made Sawyer a co-maintainer in PAUSE and he's a collaborator on this repo. Feel free to make appropriate decisions in my absence.

I suggest releasing a dev release if you want people to try it (but they can also just install from the repo). If you tag the repo (see the latest release- tag. When you push to this repo, there's a GitHub workflow that will see that tag and make a release for GitHub packages (but not CPAN).

Whatever happens can be fixed if it goes wrong. Be bold. We only really need to be bulletproof when we ask perl5-porters to include it in perl release.

@briandfoy briandfoy added Priority: high work on this first Status: needs testing the fix needs to be tested and removed Priority: low get to this whenever Status: stalled something is blocking progress labels Dec 17, 2023
@xsawyerx
Copy link
Collaborator Author

Is this something that's ready to merge or are you planning more work?

I've only addressed one issue so far. There are a few other, so definitely planning more work before we can merge.

I originally copied it over for convenience. It should've have been
committed and it's not being called.
As Zefram pointed out in a ticket, the `/less/` regexp match here doesn't
mean we didn't get a different pager with the string "less" in its name.

By checking that we received a version, we're also checking a regexp
match on the version call to return the result for "less" pager
(because the response to `--version` includes "less VERSION_NUMBER").

This means that now pagers that have the string "less" in their name
would not be tripped by it (unless they answer to `--version` with
"less VERSION_NUMBER") and `less` binaries that have the string "less"
but not only "less" would still work (like "lessng" if one would exist).

Arguably, we might want to try the `--version` check on all pagers
instead of ones that match `/less/` in their name, but... I'm not sure
we should.
@xsawyerx
Copy link
Collaborator Author

We now detect less more accurately before adding -R to it.

Copy link
Owner

@briandfoy briandfoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that less version match need to be anchored?

@xsawyerx
Copy link
Collaborator Author

Does that less version match need to be anchored?

We could, but I think we should be more permissive on this front.

We're also not being very strict on the binary. We check for any binary with the word "less" in it. We're okay with someone having a binary called lessng or myfancylessstyledmore as long as we can understand it as less (which still remains to be figured out).

@xsawyerx
Copy link
Collaborator Author

xsawyerx commented Dec 28, 2023

On:

Respecting %ENV pager vars to fully override

While man does fully override the pager based on what MAN_PAGER= says, we specifically documented the following behavior:

perldoc will use, in order of preference, the pager defined in PERLDOC_PAGER, MANPAGER, or PAGER before trying to find a pager on its own. (MANPAGER is not used if perldoc was told to display plain text or unformatted pod.)

When using perldoc in it's -m mode (display module source code), perldoc will attempt to use the pager set in PERLDOC_SRC_PAGER. A useful setting for this command is your favorite editor as in /usr/bin/nano. (Don't judge me.)

Note the words "[...] before trying to find a pager on its own."

This means we explicitly state that we will continue to try one of our own if we fail, unlike man which would do as Zefram suggests - only using the provided pager using the environment variable (MAN_PAGER).

I think we need to decide here between breaking our expected behavior so far and whatever expected behavior someone has if they assumed we had the same logic as man.

While I was in favor of the former at first, I'm now in favor of the latter. I.e., keep it as is.

@briandfoy
Copy link
Owner

Since perldoc isn't man, we've been doing it differently for at least a decade, and it's documented to do that, I say we stick with what we are doing now to find a pager.

@briandfoy
Copy link
Owner

Is there anything more this needs? Can we let people play with it to get feedback?

@briandfoy briandfoy added the Status: stalled something is blocking progress label Jan 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: high work on this first Status: needs testing the fix needs to be tested Status: stalled something is blocking progress Type: enhancement improve a feature that already exists
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants