|
| 1 | +Notes on isoinfo |
| 2 | +================ |
| 3 | + |
| 4 | +Below we'll use the following sample Rock Ridge+Joliet `utf8-rj.iso` image |
| 5 | +(the effective locale is en_US.UTF-8): |
| 6 | + |
| 7 | + mkdir utf8 |
| 8 | + for x in latin cyrillic-{абв,а,б,в}; do echo "contents of $x.txt" > utf8/"$x".txt; done |
| 9 | + xorriso -joliet on -as mkisofs -r -o utf8-rj.iso utf8 |
| 10 | + |
| 11 | +Rock Ridge doesn't have a "charset" concept for filenames. By default, iso9660 |
| 12 | +tools print the names as-is. This is rarely a problem these days, since the |
| 13 | +names are most likely utf-8 encoded and terminals are utf-8 as well. Since |
| 14 | +2009, xorriso supports the `-auto_charset` option to save/load the charset |
| 15 | +to/from the `isofs.cs` xattr on the root dir. It is likely a xorriso-only |
| 16 | +thing. There is also the `-in_charset` option to set the source charset when |
| 17 | +opening an existing iso. |
| 18 | + |
| 19 | +isoinfo is a simple tool. It always prints RR names raw, which is fine: |
| 20 | + |
| 21 | + > isoinfo -i utf8-rj.iso -l -R |
| 22 | + |
| 23 | + Directory listing of / |
| 24 | + dr-xr-xr-x 1 0 0 2048 May 29 2024 [ 19 02] . |
| 25 | + dr-xr-xr-x 1 0 0 2048 May 29 2024 [ 19 02] .. |
| 26 | + -r--r--r-- 1 0 0 28 May 29 2024 [ 33 00] cyrillic-а.txt |
| 27 | + -r--r--r-- 1 0 0 32 May 29 2024 [ 34 00] cyrillic-абв.txt |
| 28 | + -r--r--r-- 1 0 0 28 May 29 2024 [ 35 00] cyrillic-б.txt |
| 29 | + -r--r--r-- 1 0 0 28 May 29 2024 [ 36 00] cyrillic-в.txt |
| 30 | + -r--r--r-- 1 0 0 22 May 29 2024 [ 37 00] latin.txt |
| 31 | + |
| 32 | +Joliet filenames are UCS-2 encoded, as the standard requires. When iso9660 |
| 33 | +tools create images, they convert from the input charset to UCS-2. When they |
| 34 | +list an image's contents, they convert from UCS-2 to the local charset. This |
| 35 | +sounds much better than the RR case, but there is a problem: isoinfo can't |
| 36 | +convert to utf-8. It can only convert to a selection of 1-byte charsets; the |
| 37 | +conversion tables are under `cdrkit-1.1.11/libunls/`. Among the tables there is |
| 38 | +the almighty `nls_iconv.c`, but it is only used by mkisofs. When isoinfo can't |
| 39 | +convert a char in a Joliet name to the current charset, it substitutes an |
| 40 | +underscore: |
| 41 | + |
| 42 | + > isoinfo -i utf8-rj.iso -l -J |
| 43 | + |
| 44 | + Directory listing of / |
| 45 | + d--------- 0 0 0 2048 May 29 2024 [ 23 02] . |
| 46 | + d--------- 0 0 0 2048 May 29 2024 [ 23 02] .. |
| 47 | + ---------- 0 0 0 28 May 29 2024 [ 33 00] cyrillic-_.txt |
| 48 | + ---------- 0 0 0 32 May 29 2024 [ 34 00] cyrillic-___.txt |
| 49 | + ---------- 0 0 0 28 May 29 2024 [ 35 00] cyrillic-_.txt |
| 50 | + ---------- 0 0 0 28 May 29 2024 [ 36 00] cyrillic-_.txt |
| 51 | + ---------- 0 0 0 22 May 29 2024 [ 37 00] latin.txt |
| 52 | + |
| 53 | +Underscored names can be used to extract files: |
| 54 | + |
| 55 | + > isoinfo -i utf8-rj.iso -J -x /cyrillic-___.txt |
| 56 | + contents of cyrillic-абв.txt |
| 57 | + |
| 58 | +Notice that the listing above contains three files named `cyrillic-_.txt`. |
| 59 | +Let's try to extract that name: |
| 60 | + |
| 61 | + > isoinfo -i utf8-rj.iso -J -x /cyrillic-_.txt |
| 62 | + contents of cyrillic-а.txt |
| 63 | + contents of cyrillic-б.txt |
| 64 | + contents of cyrillic-в.txt |
| 65 | + |
| 66 | +It printed the contents of ALL three files. |
| 67 | + |
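Before extracting by an underscored name, it may help to check whether that name is unique in the Joliet tree. The following is a minimal sketch, assuming the listing layout shown in the samples above (each entry ends with `] name`, each directory starts with `Directory listing of`). `ambiguous_names` is a hypothetical helper, not part of isoinfo, and it would misparse names that themselves contain `] `:

```shell
# Print full paths that occur more than once in an `isoinfo -l` listing on stdin.
# Assumption: entries end with "] <name>" and directories are introduced by
# a "Directory listing of <dir>" line, as in the sample listings.
ambiguous_names() {
    awk '
        /^Directory listing of / {
            dir = $0
            sub(/^Directory listing of /, "", dir)
            next
        }
        /\] / {
            name = $0
            sub(/^.*\] */, "", name)   # drop everything up to the extent column
            sub(/ *$/, "", name)       # drop trailing padding
            if (name != "." && name != "..") print dir name
        }
    ' | sort | uniq -d
}
```

Usage: `isoinfo -i utf8-rj.iso -l -J | ambiguous_names`. Any path it prints cannot be extracted unambiguously with `-x` in the current charset.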
| 68 | +It is possible to produce the correct listing with isoinfo: |
| 69 | + |
| 70 | + > isoinfo -i utf8-rj.iso -l -J -j cp1251 | iconv -f cp1251 |
| 71 | + |
| 72 | + Directory listing of / |
| 73 | + d--------- 0 0 0 2048 May 29 2024 [ 23 02] . |
| 74 | + d--------- 0 0 0 2048 May 29 2024 [ 23 02] .. |
| 75 | + ---------- 0 0 0 28 May 29 2024 [ 33 00] cyrillic-а.txt |
| 76 | + ---------- 0 0 0 32 May 29 2024 [ 34 00] cyrillic-абв.txt |
| 77 | + ---------- 0 0 0 28 May 29 2024 [ 35 00] cyrillic-б.txt |
| 78 | + ---------- 0 0 0 28 May 29 2024 [ 36 00] cyrillic-в.txt |
| 79 | + ---------- 0 0 0 22 May 29 2024 [ 37 00] latin.txt |
| 80 | + |
| 81 | +but it only works because we know in advance that the characters used in the |
| 82 | +filenames can be converted to cp1251 without loss. The same trick works for |
| 83 | +extraction: |
| 84 | + |
| 85 | + > isoinfo -i utf8-rj.iso -J -j cp1251 -x /"$(echo cyrillic-б.txt | iconv -t cp1251)" |
| 86 | + contents of cyrillic-б.txt |
| 87 | + |
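The cp1251 trick depends on every character of the name being representable in the chosen 1-byte charset. One way to check that up front is to let `iconv` attempt the conversion and look at its exit status. This is only a sketch: `fits_charset` is a hypothetical helper name, and it assumes an `iconv` that fails (rather than silently substituting) on unconvertible input, as glibc's does:

```shell
# Succeed if the UTF-8 string $1 can be converted to charset $2 without loss.
# Hypothetical helper; relies only on the exit status of iconv.
fits_charset() {
    printf '%s' "$1" | iconv -f UTF-8 -t "$2" >/dev/null 2>&1
}
```

Usage: `fits_charset 'cyrillic-б.txt' cp1251 && echo ok`. A name mixing Cyrillic with, say, CJK characters would fail the check, and `-j cp1251` would again produce underscores for it.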
| 88 | +To summarize, Joliet support in isoinfo is inadequate. It only works well for |
| 89 | +latin characters. It can't convert non-latin filenames to utf-8, which is a |
| 90 | +must these days. For the best results, use `isoinfo -R`, which stands for |
| 91 | +"Rock Ridge with ECMA-119 fallback". |
| 92 | + |
| 93 | +Notice: the `-J` option makes isoinfo use only the Joliet tree (or throw an |
| 94 | +error if there is none), regardless of the other options. So `isoinfo -J -R` |
| 95 | +is effectively `isoinfo -J`. |
| 96 | + |
| 97 | + |
| 98 | +Notes on 7-zip |
| 99 | +============== |
| 100 | + |
| 101 | +Below we'll use the following sample images, the Rock Ridge+Joliet |
| 102 | +`utf8-rj.iso` and the Rock Ridge only `utf8-r.iso` (the effective locale is en_US.UTF-8): |
| 103 | + |
| 104 | + mkdir utf8 |
| 105 | + for x in latin cyrillic-абв; do echo "contents of $x.txt" > utf8/"$x".txt; done |
| 106 | + xorriso -joliet on -as mkisofs -r -o utf8-rj.iso utf8 |
| 107 | + xorriso -as mkisofs -r -o utf8-r.iso utf8 |
| 108 | + |
| 109 | +Notice: this section is about iso9660 support in 7-zip, so the only binaries |
| 110 | +of interest are 7z and 7zz. |
| 111 | + |
| 112 | +There are at least three widely used 7-zip flavours as of Q1 2024: |
| 113 | + |
| 114 | +- p7zip 16.02, which is "the command line version of 7-Zip for Linux / Unix, |
| 115 | +made by an independent developer", quoting 7-zip.org. It ships with Ubuntu |
| 116 | +16.10 to 23.10. Package: p7zip-full, binary: 7z. |
| 117 | + |
| 118 | +- p7zip fork by p7zip-project: https://github.com/p7zip-project/p7zip. It is |
| 119 | +packaged by Arch Linux. Package: p7zip, binary: 7z. |
| 120 | + |
| 121 | +- builds from 7-zip.org sources. These appeared in Ubuntu 22.04 (package: 7zip, |
| 122 | +binary: 7zz). Since Ubuntu 24.04, p7zip-full is a transitional package for |
| 123 | +7zip; 7zip now provides 7z, and 7zip-standalone provides 7zz. |
| 124 | + |
| 125 | +7-zip prefers Joliet over Rock Ridge, and there is no CLI option to change |
| 126 | +that. When Joliet is present, `7z l` correctly converts filenames from |
| 127 | +Joliet's UCS-2 to the current locale: |
| 128 | + |
| 129 | + > 7z l utf8-rj.iso | sed -n '/^----/,/^----/p' |
| 130 | + ------------------- ----- ------------ ------------ ------------------------ |
| 131 | + 2024-05-30 15:34:22 ..... 32 32 cyrillic-абв.txt |
| 132 | + 2024-05-30 15:34:22 ..... 22 22 latin.txt |
| 133 | + ------------------- ----- ------------ ------------ ------------------------ |
| 134 | + |
| 135 | +But when there is only Rock Ridge, p7zip 16.02 assumes the filenames are |
| 136 | +encoded in some 1-byte encoding (CP_OEMCP constant in the sources) and converts |
| 137 | +them from that to the current locale. `utf8-r.iso` has RR names in utf-8, and |
| 138 | +the current locale is utf-8 as well, so `7z l` prints them double utf-8 encoded: |
| 139 | + |
| 140 | + > 7z l utf8-r.iso | sed -n '/^----/,/^----/p' |
| 141 | + ------------------- ----- ------------ ------------ ------------------------ |
| 142 | + 2024-05-30 15:34:22 ..... 32 32 cyrillic-абв.txt |
| 143 | + 2024-05-30 15:34:22 ..... 22 22 latin.txt |
| 144 | + ------------------- ----- ------------ ------------ ------------------------ |
| 145 | + |
| 146 | +It can be tricked into printing the names raw: |
| 147 | + |
| 148 | + > LC_CTYPE=C 7z l utf8-r.iso | sed -n '/^----/,/^----/p' |
| 149 | + ------------------- ----- ------------ ------------ ------------------------ |
| 150 | + 2024-05-30 15:34:22 ..... 32 32 cyrillic-абв.txt |
| 151 | + 2024-05-30 15:34:22 ..... 22 22 latin.txt |
| 152 | + ------------------- ----- ------------ ------------ ------------------------ |
| 153 | + |
| 154 | +But the same trick breaks it for Joliet images: |
| 155 | + |
| 156 | + > LC_CTYPE=C 7z l utf8-rj.iso | sed -n '/^----/,/^----/p' |
| 157 | + ------------------- ----- ------------ ------------ ------------------------ |
| 158 | + 2024-05-30 15:34:22 ..... 32 32 cyrillic-???.txt |
| 159 | + 2024-05-30 15:34:22 ..... 22 22 latin.txt |
| 160 | + ------------------- ----- ------------ ------------ ------------------------ |
| 161 | + |
| 162 | +So, to correctly list an iso with p7zip 16.02, we need to detect whether it |
| 163 | +contains Joliet or only RR, and apply the trick in the latter case. Joliet can |
| 164 | +be detected with the following shell function: |
| 165 | + |
| 166 | + is_joliet() { |
| 167 | +     local skip=16 mark |
| 168 | + |
| 169 | +     # Loop through the volume descriptor set |
| 170 | +     # https://en.wikipedia.org/wiki/ISO_9660#Volume_descriptor_set |
| 171 | +     while true; do |
| 172 | +         mark=$(od -j$((2048*skip)) -N6 -An -tx1 <"$1" 2>/dev/null | tr -d ' ') |
| 173 | + |
| 174 | +         case "$mark" in |
| 175 | +             ??4344303031) # Type (1 byte) + CD001 |
| 176 | +                 case "$mark" in |
| 177 | +                     ff*) return 1 ;; # Terminator |
| 178 | +                     02*) return 0 ;; # Joliet |
| 179 | +                 esac ;; |
| 180 | +             *) |
| 181 | +                 return 1 ;; |
| 182 | +         esac |
| 183 | + |
| 184 | +         skip=$((skip+1)) |
| 185 | +     done |
| 186 | + } |
| 187 | + |
| 188 | +With that, listing could be done like this: |
| 189 | + |
| 190 | + env= |
| 191 | + is_joliet "$iso" || env='LC_CTYPE=C' |
| 192 | + env $env 7z l "$iso" |
| 193 | + |
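The `02*` branch in `is_joliet` matches any supplementary volume descriptor. Type 2 is not exclusive to Joliet, so a stricter check could also look at the escape sequences field. The sketch below assumes that a Joliet descriptor carries one of `%/@`, `%/C` or `%/E` at byte offset 88 of its sector. `is_joliet_strict` is a hypothetical name, and this variant is an illustration rather than a tested replacement for the function above:

```shell
# Stricter Joliet check (a sketch, not the original helper from these notes).
# Assumption: a Joliet supplementary descriptor has one of the escape
# sequences %/@, %/C or %/E (hex 252f40, 252f43, 252f45) at byte offset 88.
is_joliet_strict() {
    local sector=16 id esc

    while true; do
        # Type byte + "CD001" identifier, as hex digits without separators
        id=$(od -j$((2048*sector)) -N6 -An -tx1 <"$1" 2>/dev/null | tr -d ' \n')

        case "$id" in
            ff4344303031) return 1 ;;   # terminator: no Joliet descriptor found
            024344303031)
                esc=$(od -j$((2048*sector+88)) -N3 -An -tx1 <"$1" 2>/dev/null | tr -d ' \n')
                case "$esc" in
                    252f40|252f43|252f45) return 0 ;;
                esac ;;
            ??4344303031) ;;            # some other descriptor, keep scanning
            *) return 1 ;;              # not a volume descriptor, or end of file
        esac

        sector=$((sector+1))
    done
}
```

It is a drop-in for the listing snippet above: `is_joliet_strict "$iso" || env='LC_CTYPE=C'`.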
| 194 | +Of the 7-zip flavours mentioned, only p7zip 16.02 has the problem with RR |
| 195 | +name conversion. The 7zz binary is a recent invention and was likely never |
| 196 | +affected. So, when both 7z and 7zz are available, 7zz should be preferred. For |
| 197 | +example, in Ubuntu 22.04, 7z is p7zip 16.02, while 7zz is built from |
| 198 | +7-zip.org sources (version 21.07). |
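
The preference for 7zz over 7z could be expressed as a small lookup. This is a sketch: `pick_7z` is a hypothetical helper name, and it only checks which binaries are on `PATH`, not which flavour or version they are:

```shell
# Print the preferred 7-zip binary: 7zz if present, otherwise 7z.
# Hypothetical helper; it does not inspect the flavour behind the name.
pick_7z() {
    local bin
    for bin in 7zz 7z; do
        if command -v "$bin" >/dev/null 2>&1; then
            printf '%s\n' "$bin"
            return 0
        fi
    done
    return 1
}
```

Usage: `sevenzip=$(pick_7z) && "$sevenzip" l "$iso"`. Note that on Ubuntu 24.04 the 7z name already points at a 7-zip.org build, so falling back to 7z does not necessarily mean p7zip 16.02.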