This repository was archived by the owner on Feb 28, 2025. It is now read-only.

Commit 89e50e6

Rework iso9660 view action

- use xorriso -> isoinfo -> 7z fallback chain
- ignore the Joliet tree with isoinfo
- improve error reporting
- dev notes: src/vfs/extfs/helpers/README.iso9660
Parent commit: 9b2faec

2 files changed (+230 additions, -5 deletions)


misc/ext.d/misc.sh.in (+32, -5)
@@ -13,11 +13,38 @@ do_view_action() {
 
     case "${filetype}" in
     iso9660)
-        if which isoinfo > /dev/null 2>&1; then
-            isoinfo -d -i "${MC_EXT_FILENAME}" && isoinfo -l -R -J -i "${MC_EXT_FILENAME}"
-        else
-            7za l "${MC_EXT_FILENAME}"
-        fi
+        # Contrary to isoinfo, xorriso is happy with pretty any file, even a
+        # dir. Let's check if it is some readable iso 9660 image indeed.
+        iso=y
+        file -b -- "${MC_EXT_FILENAME}" 2>&1 | grep -q 9660 || iso=n
+
+        if [ "$iso" = y ]; then
+            if command -v xorriso >/dev/null; then
+                # 2>&1 is important here since xorriso_main.c:yell_xorriso() always
+                # prints a header like "xorriso 1.5.4 : RockRidge filesystem
+                # manipulator, libburnia project." to stderr
+                xorriso -report_about WARNING -dev "${MC_EXT_FILENAME}" -toc -print '' -find / -exec lsdl 2>&1
+            elif command -v isoinfo >/dev/null; then
+                # Joliet support in isoinfo is inadequate. It only works well
+                # for latin characters. It can't convert non-latin filenames to
+                # utf-8. `isoinfo -R` means "Rock Ridge with ECMA-119 fallback",
+                # here we ignore the Joliet tree.
+                # More details: src/vfs/extfs/helpers/README.iso9660
+                isoinfo -d -i "${MC_EXT_FILENAME}" && isoinfo -l -R -i "${MC_EXT_FILENAME}"
+            elif _7z=$(command -v 7zz || command -v 7z); then
+                # 7z prefers Joliet over Rock Ridge. When there is only Rock
+                # Ridge present, p7zip version 16.02 (shipped with some distros)
+                # incorrectly converts non-latin filenames.
+                # More details: src/vfs/extfs/helpers/README.iso9660
+                "$_7z" l -- "${MC_EXT_FILENAME}"
+            else
+                echo 'Neither of these tools is available: xorriso, isoinfo, 7z' >&2
+                false
+            fi
+        else
+            echo 'It does not look like a file of ISO 9660 format' >&2
+            false
+        fi
         ;;
     cat)
         cat "${MC_EXT_FILENAME}" 2>/dev/null

src/vfs/extfs/helpers/README.iso9660 (new file, +198 lines)

Notes on isoinfo
================

Below we'll use the following sample Rock Ridge + Joliet image, `utf8-rj.iso`
(the effective locale is en_US.UTF-8):

    mkdir utf8
    for x in latin cyrillic-{абв,а,б,в}; do echo "contents of $x.txt" > utf8/"$x".txt; done
    xorriso -joliet on -as mkisofs -r -o utf8-rj.iso utf8

Rock Ridge doesn't have a "charset" concept for filenames. By default, iso9660
tools print the names as-is. This is not a big problem these days, since the
names are most likely utf-8 encoded and the terminals are utf-8 as well. Since
2009, xorriso has supported the `-auto_charset` option to save/load the charset
from the `isofs.cs` xattr on the root dir. It is likely a xorriso-only thing.
There is also the `-in_charset` option to set the source charset when opening
an existing iso.

isoinfo is a simple tool: it always prints RR names raw, which is fine:

    > isoinfo -i utf8-rj.iso -l -R

    Directory listing of /
    dr-xr-xr-x 1 0 0 2048 May 29 2024 [ 19 02] .
    dr-xr-xr-x 1 0 0 2048 May 29 2024 [ 19 02] ..
    -r--r--r-- 1 0 0 28 May 29 2024 [ 33 00] cyrillic-а.txt
    -r--r--r-- 1 0 0 32 May 29 2024 [ 34 00] cyrillic-абв.txt
    -r--r--r-- 1 0 0 28 May 29 2024 [ 35 00] cyrillic-б.txt
    -r--r--r-- 1 0 0 28 May 29 2024 [ 36 00] cyrillic-в.txt
    -r--r--r-- 1 0 0 22 May 29 2024 [ 37 00] latin.txt

Joliet filenames are UCS-2 encoded, as the standard requires. When iso9660 tools
create images, they convert from whatever the input charset is to UCS-2. When
they list an image's contents, they convert from UCS-2 to the local charset.
This sounds much better than the RR case, but there is a problem: isoinfo can't
convert to utf-8. It can only convert to a selection of 1-byte charsets, whose
conversion tables are under `cdrkit-1.1.11/libunls/`. Among the tables there is
the almighty `nls_iconv.c`, but it is only used by mkisofs. When isoinfo can't
convert some char in a Joliet name to the current charset, it uses an
underscore instead:

    > isoinfo -i utf8-rj.iso -l -J

    Directory listing of /
    d--------- 0 0 0 2048 May 29 2024 [ 23 02] .
    d--------- 0 0 0 2048 May 29 2024 [ 23 02] ..
    ---------- 0 0 0 28 May 29 2024 [ 33 00] cyrillic-_.txt
    ---------- 0 0 0 32 May 29 2024 [ 34 00] cyrillic-___.txt
    ---------- 0 0 0 28 May 29 2024 [ 35 00] cyrillic-_.txt
    ---------- 0 0 0 28 May 29 2024 [ 36 00] cyrillic-_.txt
    ---------- 0 0 0 22 May 29 2024 [ 37 00] latin.txt
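
The collisions come straight from the encoding. In UCS-2 each of а, б and в is a single 16-bit code unit. When the target charset has no counterpart for a unit, isoinfo writes one underscore for it, so all three one-letter names collapse to `cyrillic-_.txt`. Below is a rough sketch of that mapping. `joliet_name_ascii` is a hypothetical helper, not part of isoinfo. It assumes an ASCII-only target charset (isoinfo's real 1-byte tables differ) and an iconv that knows `UCS-2BE`:

```sh
# Hypothetical helper, for illustration only: encode a UTF-8 name the way
# a Joliet directory record stores it (UCS-2 big-endian), then map every
# code unit above 0x7F to '_' (assumes an ASCII-only target charset).
joliet_name_ascii() {
    # UTF-8 -> UCS-2BE, then dump the bytes as unsigned decimals;
    # -v keeps od from collapsing repeated lines into '*'
    printf '%s' "$1" | iconv -f UTF-8 -t UCS-2BE | od -An -v -tu1 |
    awk '
        { for (i = 1; i <= NF; i++) b[n++] = $i }
        END {
            # Each UCS-2 code unit is two bytes, high byte first
            for (i = 0; i + 1 < n; i += 2) {
                u = b[i] * 256 + b[i + 1]
                printf "%s", (u < 128 ? sprintf("%c", u) : "_")
            }
            print ""
        }'
}
```

With it, `cyrillic-а.txt`, `cyrillic-б.txt` and `cyrillic-в.txt` all come out as `cyrillic-_.txt`, and `cyrillic-абв.txt` as `cyrillic-___.txt`, matching the listing above.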

Underscored names can be used to extract files:

    > isoinfo -i utf8-rj.iso -J -x /cyrillic-___.txt
    contents of cyrillic-абв.txt

Notice that the listing above contains three files named `cyrillic-_.txt`.
Let's try to extract that name:

    > isoinfo -i utf8-rj.iso -J -x /cyrillic-_.txt
    contents of cyrillic-а.txt
    contents of cyrillic-б.txt
    contents of cyrillic-в.txt

It printed the contents of ALL three files.

It is possible to produce the correct listing with isoinfo:

    > isoinfo -i utf8-rj.iso -l -J -j cp1251 | iconv -f cp1251

    Directory listing of /
    d--------- 0 0 0 2048 May 29 2024 [ 23 02] .
    d--------- 0 0 0 2048 May 29 2024 [ 23 02] ..
    ---------- 0 0 0 28 May 29 2024 [ 33 00] cyrillic-а.txt
    ---------- 0 0 0 32 May 29 2024 [ 34 00] cyrillic-абв.txt
    ---------- 0 0 0 28 May 29 2024 [ 35 00] cyrillic-б.txt
    ---------- 0 0 0 28 May 29 2024 [ 36 00] cyrillic-в.txt
    ---------- 0 0 0 22 May 29 2024 [ 37 00] latin.txt

But this only works because we know in advance that all symbols used in the
filenames can be converted to cp1251 without issues. The same trick works for
extraction as well:

    > isoinfo -i utf8-rj.iso -J -j cp1251 -x /"$(echo cyrillic-б.txt | iconv -t cp1251)"
    contents of cyrillic-б.txt
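
Whether the names fit a 1-byte charset can be checked rather than assumed. The hypothetical `fits_charset` helper below is a minimal sketch. It round-trips a UTF-8 name through the given charset with iconv and succeeds only when nothing is lost. It assumes the local iconv knows the charset name, e.g. `CP1251`:

```sh
# Hypothetical helper: succeed only if a UTF-8 name survives a round trip
# through a 1-byte charset, i.e. only then is `isoinfo -j <charset>` safe.
fits_charset() {
    local name charset back
    name=$1
    charset=$2
    # A failed conversion either makes iconv exit non-zero or yields a
    # truncated result, so compare the round trip with the input as well.
    back=$(printf '%s' "$name" | iconv -f UTF-8 -t "$charset" 2>/dev/null |
        iconv -f "$charset" -t UTF-8 2>/dev/null) || return 1
    [ "$back" = "$name" ]
}
```

It could, for example, guard the extraction command above:

    fits_charset cyrillic-б.txt cp1251 &&
        isoinfo -i utf8-rj.iso -J -j cp1251 -x /"$(echo cyrillic-б.txt | iconv -t cp1251)"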

To summarize, Joliet support in isoinfo is inadequate. It only works well for
latin characters. It can't convert non-latin filenames to utf-8, which is a
must these days. For the best results, use `isoinfo -R`, which stands for
"Rock Ridge with ECMA-119 fallback".

Notice: the `-J` option makes isoinfo use only the Joliet tree (or fail with an
error if there is none), regardless of the other options. So `isoinfo -J -R` is
effectively `isoinfo -J`.


Notes on 7-zip
==============

Below we'll use the following sample images (the effective locale is
en_US.UTF-8): `utf8-rj.iso` with Rock Ridge + Joliet, and `utf8-r.iso` with
Rock Ridge only:

    mkdir utf8
    for x in latin cyrillic-абв; do echo "contents of $x.txt" > utf8/"$x".txt; done
    xorriso -joliet on -as mkisofs -r -o utf8-rj.iso utf8
    xorriso -as mkisofs -r -o utf8-r.iso utf8

Notice: we are talking about iso9660 support in 7-zip here, so the only
binaries of interest are 7z and 7zz.

There are at least three widely used 7-zip flavours as of Q1 2024:

- p7zip 16.02, which is "the command line version of 7-Zip for Linux / Unix,
  made by an independent developer", quoting 7-zip.org. It is shipped with
  Ubuntu 16.10 to 23.10. Package: p7zip-full, binary: 7z.

- The p7zip fork by p7zip-project: https://github.com/p7zip-project/p7zip. It
  is packaged by Arch Linux. Package: p7zip, binary: 7z.

- Builds from 7-zip.org sources. They appeared in Ubuntu 22.04 (package: 7zip,
  binary: 7zz). Since Ubuntu 24.04, p7zip-full is a transitional package to
  7zip; now 7zip provides 7z, and 7zip-standalone provides 7zz.

7-zip prefers Joliet over Rock Ridge, and there is no cli option to change
that. When Joliet is present, `7z l` correctly converts filenames from Joliet's
UCS-2 to the current locale:

    > 7z l utf8-rj.iso | sed -n '/^----/,/^----/p'
    ------------------- ----- ------------ ------------ ------------------------
    2024-05-30 15:34:22 ..... 32 32 cyrillic-абв.txt
    2024-05-30 15:34:22 ..... 22 22 latin.txt
    ------------------- ----- ------------ ------------ ------------------------

But when there is only Rock Ridge, p7zip 16.02 assumes the filenames are
encoded in some 1-byte encoding (the CP_OEMCP constant in the sources) and
converts them from that encoding to the current locale. `utf8-r.iso` has RR
names in utf-8, and the current locale is utf-8 as well, so `7z l` prints the
names double utf-8 encoded:

    > 7z l utf8-r.iso | sed -n '/^----/,/^----/p'
    ------------------- ----- ------------ ------------ ------------------------
    2024-05-30 15:34:22 ..... 32 32 cyrillic-абв.txt
    2024-05-30 15:34:22 ..... 22 22 latin.txt
    ------------------- ----- ------------ ------------ ------------------------
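
The meaning of "double utf-8 encoded" can be reproduced with iconv alone. In this illustration ISO-8859-1 is only an assumption. It stands in for whichever 1-byte table p7zip 16.02 actually uses for CP_OEMCP, so real 7z output may show different characters:

```sh
# Illustration only: read UTF-8 bytes as if each byte were a character of
# a 1-byte charset, then encode the result to UTF-8 again. ISO-8859-1 is
# an assumed stand-in for p7zip's CP_OEMCP table.
double_utf8() {
    printf '%s\n' "$1" | iconv -f ISO-8859-1 -t UTF-8
}
```

For example, `double_utf8 cyrillic-абв.txt` prints `cyrillic-Ð°Ð±Ð².txt`. Each 2-byte Cyrillic letter becomes two characters, and each of those is encoded in utf-8 again.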

It can be tricked into printing the names raw:

    > LC_CTYPE=C 7z l utf8-r.iso | sed -n '/^----/,/^----/p'
    ------------------- ----- ------------ ------------ ------------------------
    2024-05-30 15:34:22 ..... 32 32 cyrillic-абв.txt
    2024-05-30 15:34:22 ..... 22 22 latin.txt
    ------------------- ----- ------------ ------------ ------------------------

But the same trick breaks it for Joliet images:

    > LC_CTYPE=C 7z l utf8-rj.iso | sed -n '/^----/,/^----/p'
    ------------------- ----- ------------ ------------ ------------------------
    2024-05-30 15:34:22 ..... 32 32 cyrillic-???.txt
    2024-05-30 15:34:22 ..... 22 22 latin.txt
    ------------------- ----- ------------ ------------ ------------------------

So, to correctly list an iso with p7zip 16.02, we need to detect whether it
contains Joliet or RR only, and apply the trick to the latter. Joliet can be
detected with a shell function like this:

    is_joliet() {
        local skip=16 mark

        # Loop through the volume descriptor set
        # https://en.wikipedia.org/wiki/ISO_9660#Volume_descriptor_set
        while true; do
            mark=$(od -j$((2048*skip)) -N6 -An -tx1 <"$1" 2>/dev/null | tr -d ' ')

            case "$mark" in
                ??4344303031) # Type (1 byte) + CD001
                    case "$mark" in
                        ff*) return 1 ;; # Terminator
                        02*) return 0 ;; # Supplementary volume descriptor, i.e. Joliet
                    esac ;;
                *)
                    return 1 ;;
            esac

            skip=$((skip+1))
        done
    }
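
`is_joliet` treats any type 2 descriptor (Supplementary Volume Descriptor) as Joliet. That is usually enough, but other type 2 descriptors exist, e.g. the ISO 9660:1999 enhanced volume descriptor. A stricter variant is sketched below under the hypothetical name `is_joliet_strict`; mc does not use it. It also checks the escape sequences field at byte offset 88 of the descriptor, where Joliet stores `%/@`, `%/C` or `%/E` for UCS-2 levels 1 to 3:

```sh
# Hypothetical stricter variant of is_joliet: a type 2 descriptor counts
# as Joliet only if its escape sequences field (byte 88) starts with
# %/@, %/C or %/E (Joliet UCS-2 levels 1, 2 and 3).
is_joliet_strict() {
    local sector=16 vd esc
    while :; do
        # Bytes 0-5 of each 2048-byte descriptor: type + "CD001"
        vd=$(od -An -v -tx1 -j $((2048 * sector)) -N6 <"$1" 2>/dev/null | tr -d ' \n')
        case "$vd" in
            ff4344303031)  # Volume Descriptor Set Terminator
                return 1 ;;
            024344303031)  # Supplementary Volume Descriptor: check escapes
                esc=$(od -An -v -tx1 -j $((2048 * sector + 88)) -N3 <"$1" 2>/dev/null | tr -d ' \n')
                case "$esc" in
                    252f40|252f43|252f45) return 0 ;;  # %/@ %/C %/E
                esac ;;
            ??4344303031)  # Any other descriptor: keep scanning
                ;;
            *)             # Not a descriptor (or past EOF)
                return 1 ;;
        esac
        sector=$((sector + 1))
    done
}
```

It takes the same argument as `is_joliet`, so it can be swapped in wherever that function is called.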

With that, listing can be done like this:

    env=
    is_joliet "$iso" || env='LC_CTYPE=C'
    env $env 7z l "$iso"

Of the 7-zip flavours mentioned, only p7zip 16.02 has the problem with RR
names conversion. The 7zz binary is a recent invention and was likely never
affected. So, when both 7z and 7zz are available, 7zz should be preferred. For
example, in Ubuntu 22.04, 7z is p7zip 16.02, while 7zz is built from 7-zip.org
sources (version 21.07).
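
This preference can be combined with the xorriso -> isoinfo -> 7z chain from the misc.sh.in change in a small selector. `pick_iso_lister` is a hypothetical helper for illustration only. The actual view action does the lookup inline with `command -v`:

```sh
# Hypothetical helper: print the first available ISO 9660 lister in the
# order xorriso -> isoinfo -> 7zz -> 7z (7zz before 7z, since 7z may be
# p7zip 16.02 with its Rock Ridge name conversion problem).
pick_iso_lister() {
    local tool
    for tool in xorriso isoinfo 7zz 7z; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    echo 'Neither of these tools is available: xorriso, isoinfo, 7z' >&2
    return 1
}
```

It prints the name of the first tool found. When none is installed, it fails with the same message the view action uses.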
