Dollar-single-quote
magicant committed Dec 10, 2024
1 parent 8307b46 commit 9b2d911
Showing 11 changed files with 271 additions and 16 deletions.
1 change: 1 addition & 0 deletions NEWS
Original file line number Diff line number Diff line change
@@ -15,6 +15,7 @@
- The non-standard terminators `;|` and `;;&` are also supported
to resume pattern matching with the next item unless in the
POSIXly-correct mode.
- Dollar-single-quotes are now supported.
- After the `bg` built-in resumed a job, the `!` special parameter
expands to the process ID of the job.
- An interactive shell no longer exits on an error in the `exec`
1 change: 1 addition & 0 deletions NEWS.ja
@@ -13,6 +13,7 @@
分岐も実行させることができるようになった
- 非標準の拡張として `;|` もしくは `;;&` で区切ることで次の分岐
からパターンマッチングを再開させることもできる
- ドル一重引用符に対応した
- `bg` 組込みでジョブを再開した後は `!` 特殊パラメータはジョブの
プロセス ID に展開されるようになった
- POSIX 準拠モードであっても、対話シェルが `exec` 組込みで失敗した
5 changes: 5 additions & 0 deletions README.ja.md
@@ -134,6 +134,11 @@ share/initialization/sample) ファイルを参考に自分用の `~/.yashrc`
不定です。この挙動は厳密には POSIX に従っていませんが、POSIX
ロケールでのバイト単位再比較がワイド文字では不可能なためこのように
なっています。
* プログラム全体に亘って文字列がワイド文字として扱われているため、yash
のほとんどの機能は有効な文字を表さないバイト列を透過的に扱うことが
できません。この設計は、シェルに任意のバイトを受け付けることを求める
要件が POSIX に追加される前に行われたもので、今後この要件をサポート
することは現実的ではありません。


## 既知の問題
5 changes: 5 additions & 0 deletions README.md
@@ -130,6 +130,11 @@ system.
total ordering of characters, order of uncomparable results are
unstable. This limitation is not strictly POSIX-compliant, but
inevitable due to use of wide characters in the whole shell.
* Most parts of the shell cannot handle bytes that do not represent
valid characters, because string operations are written in terms of
wide character strings. This design choice was made before POSIX
added requirements for the shell to accept arbitrary bytes in some
operations, and it is too late to fully implement them.


## Known Issues
26 changes: 25 additions & 1 deletion doc/ja/syntax.txt
@@ -42,9 +42,33 @@
* バックスラッシュ (+\+) は直後の一文字をクォートします。
+
例外として、バックスラッシュの直後に改行がある場合、それは改行をクォートしているのではなく、dfn:[行の連結]と見なされます。バックスラッシュと改行が削除され、バックスラッシュがあった行とその次の行が元々一つの行であったかのように扱われます。
* 二つの一重引用符 (+'+) で囲んだ部分では、全ての文字は通常の文字と同じように扱われます。改行を一重引用符で囲むこともできます。ただし、一重引用符を一重引用符で囲むことはできません
* 二つの一重引用符 (+'+) で囲んだ部分では、全ての文字は通常の文字と同じように扱われます。改行を一重引用符で囲むこともできます。通常は一重引用符を一重引用符で囲むことはできませんが、<<dollar-single,ドル一重引用符>>内でエスケープすれば一重引用符を一重引用符でクォートできます
* 二つの二重引用符 (+"+) で囲んだ部分も一重引用符で囲んだ部分と同様にクォートされますが、いくつか例外があります。二重引用符で囲んだ部分では、パラメータ展開・コマンド置換・数式展開が通常通り解釈されます。またバックスラッシュは +$+, +`+, +"+, +\+ の直前にある場合および行の連結を行う場合にのみ引用符として扱われ、それ以外のバックスラッシュは通常の文字と同様に扱われます。

[[dollar-single]]
=== ドル一重引用符

一重引用符で囲まれた部分の直前に +$+ が付いている場合、その +$&#x27;+ と +&#x27;+ で囲まれた間の部分では以下のエスケープ記法が認識されます。それ以外の点では二つの +&#x27;+ で囲まれた通常のクォートと同様です。

+\"+, +\'+, +\\+::
それぞれ +"+, +&#x27;+, +\+ そのものを表します。
このようにエスケープされた +&#x27;+ はドル一重引用符の終わりとはみなされません。
+\a+, +\b+, +\e+, +\f+, +\n+, +\r+, +\t+, +\v+::
順に、アラート (ベル)・バックスペース・エスケープ・フォームフィード・改行・キャリッジリターン・水平タブ・垂直タブの文字を表します。
+\c+::
直後にある文字に対応する制御文字を表します。例えば +\cA+ は Ctrl-A によって入力される SOH 制御文字を表します。この記法ではヌル文字 (Ctrl-@, +\c@+) はサポートされません。また FS 制御文字 (Ctrl-\) を表すにはバックスラッシュ自体もエスケープする必要があります (+\c\+ ではなく +\c\\+ のように)。
+\x+::
直後に続く 1 桁または 2 桁の十六進数によって指定された値の文字を表します。
例えば +\x20+ は十六進法で 20 の値を持つ文字に変換されます。
+\0+, +\1+, +\2+, ..., +\377+::
直後に続く 1~3 桁の八進数によって指定された値の文字を表します。
値は 8 ビット以内に収まる必要があります。

上記以外のバックスラッシュ記法はエラーとみなされ、当該部分の文字は +?+ に置換されます。

[NOTE]
十六進数または八進数のエスケープ記法で表される値は、ワイド文字のコードポイントとして解釈されます。この動作は、値を生のバイトとして扱うことを要求する POSIX には準拠していません。

[[aliases]]
== エイリアス

47 changes: 44 additions & 3 deletions doc/syntax.txt
@@ -61,16 +61,57 @@ quotation marks:
dfn:[line continuation] rather than a newline being quoted. The two
characters are removed from the input and the two lines surrounding the line
continuation are concatenated into a single line.
* A pair of single-quotation marks (+'+) quote any characters between them
except another single-quotation. Note that newlines can be quoted using
single-quotations.
* A pair of single-quotation marks (+'+) quote any characters between them.
Note that newlines can be quoted using single-quotations.
Single-quotations cannot quote a single-quotation mark itself unless the
mark is escaped in a <<dollar-single,dollar-single-quoted>> string.
* Double-quotation marks (+"+) are like single-quotations, but they have a few
exceptions: Parameter expansion, command substitution, and arithmetic
expansion are interpreted as usual even between double-quotations. A
backslash between double-quotations is treated as a quotation mark only when
it is followed by +$+, +`+, +"+, +\+, or a newline; other backslashes are
treated as normal characters.

[[dollar-single]]
=== Dollar-single-quotes

Dollar-single-quotes are a special form of quotation introduced by +$&#x27;+
and terminated by +&#x27;+. This is similar to normal single-quotation marks,
but the following backslash escapes are recognized in the quoted string:

+\"+, +\'+, +\\+::
Represents literal +"+, +&#x27;+, and +\+, respectively. Note that a
single-quotation mark escaped by a backslash does not terminate the
dollar-single-quoted string.
+\a+, +\b+, +\e+, +\f+, +\n+, +\r+, +\t+, +\v+::
Represents the alert (bell), backspace, escape, form feed, newline, carriage
return, horizontal tab, and vertical tab characters, respectively.
+\c+::
Represents the control character corresponding to a character that follows.
For example, +\cA+ represents the SOH character, which can be entered by
typing Ctrl-A.
The null character (Ctrl-@, +\c@+) is not supported.
To specify the FS character (Ctrl-\), the following backslash must also be
escaped as in +\c\\+ rather than +\c\+.
+\x+::
Represents the character whose value is specified by hexadecimal digits that
follow. For example, +\x20+ represents the character that has the value 20
in hexadecimal. There must be at least one digit that follows +\x+ and at
most two digits are recognized.
+\0+, +\1+, +\2+, ..., +\377+::
Represents the character whose value is specified by octal digits. There
must be at least one digit and at most three digits are recognized. The
value must fit in 8 bits.

Any other use of backslashes is considered an error and is converted to the
+?+ character.

[NOTE]
The value represented by a hexadecimal or octal digit escape is interpreted as
a wide character codepoint.
This behavior does not conform to POSIX, which requires that the value be
treated as a raw byte.
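
The digit rules for +\x+ and the octal escapes can be sketched in C as follows. This is only an illustrative sketch, not yash's code: the helper names `xdigit_value`, `decode_hex_escape`, and `decode_octal_escape` are hypothetical, and the sketch assumes the caller has already consumed the `\x` or `\` prefix and passes a pointer to the first candidate digit.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: value of a hexadecimal digit, or -1 if `c' is not
 * one. Like the shell source, this assumes 'A'-'F' and 'a'-'f' are
 * contiguous in the wide character encoding. */
static int xdigit_value(wchar_t c)
{
    if (L'0' <= c && c <= L'9') return (int) (c - L'0');
    if (L'A' <= c && c <= L'F') return (int) (c - L'A') + 0xA;
    if (L'a' <= c && c <= L'f') return (int) (c - L'a') + 0xA;
    return -1;
}

/* Decodes the one or two hexadecimal digits that follow "\x".
 * Stores the number of digits consumed in `*len' and returns the value,
 * or returns -1 (with `*len' set to 0) if there is no digit at all. */
static int decode_hex_escape(const wchar_t *s, size_t *len)
{
    assert(s != NULL && len != NULL);
    *len = 0;
    int value = xdigit_value(s[0]);
    if (value < 0)
        return -1;
    *len = 1;
    /* s[0] is a digit, so s[1] is at worst the terminating null. */
    int second = xdigit_value(s[1]);
    if (second >= 0) {
        value = (value << 4) | second;
        *len = 2;
    }
    return value;
}

/* Decodes up to three octal digits that follow a backslash.
 * Stores the number of digits consumed in `*len' and returns the value,
 * or returns -1 if there is no digit or the value does not fit in 8 bits. */
static int decode_octal_escape(const wchar_t *s, size_t *len)
{
    assert(s != NULL && len != NULL);
    int value = 0;
    size_t count = 0;
    while (count < 3 && L'0' <= s[count] && s[count] <= L'7') {
        value = (value << 3) | (int) (s[count] - L'0');
        count++;
    }
    *len = count;
    if (count == 0 || value > 0xFF)
        return -1;
    return value;
}
```

Under these assumptions, the digits `1000` decode to octal 100 with three digits consumed, so the trailing `0` remains an ordinary character. That is consistent with the `@0` that tests/quote-p.tst expects for `\1000`.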

[[aliases]]
== Aliases

134 changes: 133 additions & 1 deletion expand.c
@@ -1,6 +1,6 @@
/* Yash: yet another shell */
/* expand.c: word expansion */
/* (C) 2007-2021 magicant */
/* (C) 2007-2024 magicant */

/* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -140,6 +140,8 @@ static void add_empty_field(plist_T *dest, const wchar_t *p)
static inline void add_sq(
const wchar_t *restrict *ss, xwcsbuf_T *restrict buf, bool escape)
__attribute__((nonnull));
static wchar_t *interpret_dsq(const wchar_t *restrict *ss)
__attribute__((nonnull,malloc,warn_unused_result));
static inline bool should_escape(charcategory_T cc, escaping_T escaping)
__attribute__((const));
static wchar_t *quote_removal_free(
@@ -472,6 +474,23 @@ struct expand_four_T expand_four(const wordunit_T *restrict w,
assert(*ss == L'\'');
fill_ccbuf(&valuebuf, &ccbuf, defaultcc | CC_QUOTED);

wb_wccat(&valuebuf, L'\'');
sb_ccat(&ccbuf, defaultcc | CC_QUOTATION);
break;
case L'$':
// Check for dollar-single-quotes.
if (quoting != Q_WORD || indq || ss[1] != '\'')
goto default_;

wb_wccat(&valuebuf, L'$');
wb_wccat(&valuebuf, L'\'');
sb_ccat(&ccbuf, defaultcc | CC_QUOTATION);
sb_ccat(&ccbuf, defaultcc | CC_QUOTATION);

wb_catfree(&valuebuf, interpret_dsq(&ss));
assert(*ss == L'\'');
fill_ccbuf(&valuebuf, &ccbuf, defaultcc | CC_QUOTED);

wb_wccat(&valuebuf, L'\'');
sb_ccat(&ccbuf, defaultcc | CC_QUOTATION);
break;
@@ -1702,6 +1721,119 @@ void add_sq(const wchar_t *restrict *ss, xwcsbuf_T *restrict buf, bool escape)
}
}

/* Expands the content of a dollar-single-quoted string to a newly-malloced
* string.
* `ss' is a pointer to a pointer to the string to be expanded. Initially,
* `(*ss)[0]' and `(*ss)[1]' must be '$' and '\'', respectively, which are
* skipped in this function. The following characters in the string are
* accumulated in the string to be returned, interpreting any backslash escapes
* encountered, until a closing '\'' is found (or the end of the string is
* reached). When this function returns, `*ss' is updated so that `**ss' is the
* closing quote or terminating null character. */
/* If an escape contained in `*ss' produces a null character, the rest of the
* string is ignored by the caller. This is one of the behaviors allowed by
* POSIX. This function could have been designed to take an `xwcsbuf_T *'
* argument and append the result directly to it, but that would not be
* compliant because the null character would also hide the characters after the
* closing quote. */
wchar_t *interpret_dsq(const wchar_t *restrict *ss)
{
    xwcsbuf_T buf;
    wb_init(&buf);

    const wchar_t *s = *ss;
    assert(s[0] == '$');
    assert(s[1] == '\'');
    s += 2;

    for (;;) {
        switch (*s) {
            case L'\0':
            case L'\'':
                *ss = s;
                return wb_towcs(&buf);
            case L'\\':
                s++;
                switch (*s) {
                    case L'"':
                    case L'\'':
                    case L'\\':
                        wb_wccat(&buf, *s);
                        s++;
                        break;
                    case L'a': wb_wccat(&buf, L'\a'); s++; break;
                    case L'b': wb_wccat(&buf, L'\b'); s++; break;
                    case L'e': wb_wccat(&buf, L'\033'); s++; break;
                    case L'f': wb_wccat(&buf, L'\f'); s++; break;
                    case L'n': wb_wccat(&buf, L'\n'); s++; break;
                    case L'r': wb_wccat(&buf, L'\r'); s++; break;
                    case L't': wb_wccat(&buf, L'\t'); s++; break;
                    case L'v': wb_wccat(&buf, L'\v'); s++; break;
                    case L'c':
                        s++;
                        wchar_t c;
                        if (*s == L'\\') {
                            s++;
                            if (*s == L'\\')
                                c = L'\\' ^ 0x40;
                            else // Oops, unknown escape!
                                c = L'?';
                        } else if (*s == L'?' || (L'A' <= *s && *s <= L'_')) {
                            c = *s ^ 0x40;
                        } else if (L'a' <= *s && *s <= L'z') {
                            c = *s ^ 0x60;
                        } else { // Oops, unknown escape!
                            c = L'?';
                        }
                        wb_wccat(&buf, c);
                        s++;
                        break;
                    case L'x':
                        s++;
                        int value;
                        if (L'0' <= *s && *s <= L'9') {
                            value = *s - L'0';
                        } else if (L'A' <= *s && *s <= L'F') {
                            value = *s - L'A' + 0xA;
                        } else if (L'a' <= *s && *s <= L'f') {
                            value = *s - L'a' + 0xA;
                        } else { // Oops, missing digit
                            wb_wccat(&buf, L'?');
                            break;
                        }
                        s++;
                        if (L'0' <= *s && *s <= L'9')
                            value = (value << 4) | (*s - L'0');
                        else if (L'A' <= *s && *s <= L'F')
                            value = (value << 4) | (*s - L'A' + 0xA);
                        else if (L'a' <= *s && *s <= L'f')
                            value = (value << 4) | (*s - L'a' + 0xA);
                        else // Okay, no second digit
                            goto only_one_xdigit;
                        s++;
only_one_xdigit:
                        wb_wccat(&buf, (wchar_t) value);
                        break;
                    default:;
                        int count = 0;
                        value = 0;
                        while (count < 3 && L'0' <= *s && *s <= L'7')
                            value = (value << 3) | (*s - L'0'), s++, count++;
                        if (count > 0 && (value & ~0xFF) == 0)
                            wb_wccat(&buf, (wchar_t) value);
                        else
                            wb_wccat(&buf, L'?');
                        break;
                }
                break;
            default:
                wb_wccat(&buf, *s);
                s++;
                break;
        }
    }
}
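
The `\c` branch of `interpret_dsq` maps a character to its control character by flipping bits. The mapping can be tried in isolation with a sketch like the one below. The names `control_char_for` and `control_char_examples` are hypothetical and not part of yash. The sketch assumes an ASCII-compatible wide character encoding, and it assumes the caller has already collapsed the escaped backslash of `\c\\` into a single `\` before calling.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical standalone helper mirroring the `\c' branch.
 * `c' is the character named after "\c". Returns the corresponding
 * control character, or L'?' if the character is not supported. */
static wchar_t control_char_for(wchar_t c)
{
    if (c == L'?' || (L'A' <= c && c <= L'_'))
        return (wchar_t) (c ^ 0x40);  /* 'A' -> 0x01, '?' -> 0x7F (DEL) */
    if (L'a' <= c && c <= L'z')
        return (wchar_t) (c ^ 0x60);  /* 'a' -> 0x01, same as 'A' */
    return L'?';                      /* includes '@', so no null character */
}

/* A few checks that hold under the ASCII assumption stated above. */
void control_char_examples(void)
{
    assert(control_char_for(L'A') == 0x01);   /* SOH, Ctrl-A */
    assert(control_char_for(L'a') == 0x01);   /* lowercase gives the same */
    assert(control_char_for(L'^') == 0x1E);   /* RS */
    assert(control_char_for(L'\\') == 0x1C);  /* FS, written \c\\ in the shell */
    assert(control_char_for(L'?') == 0x7F);   /* DEL */
    assert(control_char_for(L'@') == L'?');   /* null character is rejected */
}
```

Returning `L'?'` for `@` keeps a null character out of the result through this escape, in line with the documented restriction on `\c@`.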

/* Backslashes characters in `s' that are contained in `t'.
* Returns a newly-malloced wide string.
* `t' may be NULL, in which case all the characters are backslashed. */
28 changes: 23 additions & 5 deletions parser.c
@@ -593,7 +593,7 @@ static void next_token(parsestate_T *ps)
__attribute__((nonnull));
static wordunit_T *parse_word(parsestate_T *ps, bool testfunc(wchar_t c))
__attribute__((nonnull,malloc,warn_unused_result));
static void skip_to_next_single_quote(parsestate_T *ps)
static void skip_to_next_single_quote(parsestate_T *ps, bool allowescape)
__attribute__((nonnull));
static wordunit_T *parse_special_word_unit(parsestate_T *ps, bool indq)
__attribute__((nonnull,malloc,warn_unused_result));
@@ -1211,11 +1211,18 @@ wordunit_T *parse_word(parsestate_T *ps, bool testfunc(wchar_t c))
continue;
}
assert(ps->src.contents[ps->index] == L'$');
if (!indq && ps->src.contents[ps->index + 1] == L'\'') {
ps->index += 2;
skip_to_next_single_quote(ps, true);
if (ps->src.contents[ps->index] == L'\'')
ps->index++;
continue;
}
break;
case L'\'':
if (!indq) {
ps->index++;
skip_to_next_single_quote(ps);
skip_to_next_single_quote(ps, false);
if (ps->src.contents[ps->index] == L'\'')
ps->index++;
continue;
@@ -1241,26 +1248,37 @@ wordunit_T *parse_word(parsestate_T *ps, bool testfunc(wchar_t c))
/* Skips to the next single quote.
* If the current position is already at a single quote, the position is not
* moved.
* It is an error if there is no single quote before the end of file. */
void skip_to_next_single_quote(parsestate_T *ps)
* It is an error if there is no single quote before the end of file.
* If `allowescape' is true, backslash escapes are considered: quotes preceded
* by a backslash are treated literally. */
void skip_to_next_single_quote(parsestate_T *ps, bool allowescape)
{
bool escape = false;
for (;;) {
bool nextescape = false;
switch (ps->src.contents[ps->index]) {
case L'\'':
return;
if (escape)
break;
else
return;
case L'\0':
if (read_more_input(ps) != INPUT_OK) {
serror(ps, Ngt("the single quotation is not closed"));
return;
}
continue;
case L'\\':
nextescape = !escape && allowescape;
break;
case L'\n':
ps->info->lineno++;
break;
default:
break;
}
ps->index++;
escape = nextescape;
}
}
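
The escape tracking added to `skip_to_next_single_quote` can be illustrated with a simplified scanner over a plain wide string. This is a sketch only: `find_closing_quote` is a hypothetical name, and unlike the real parser it does not read more input, count lines, or report errors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical simplified scanner. `s' points just after the opening quote.
 * Returns the index of the closing single quote, or -1 if the string ends
 * first. If `allowescape' is true, a quote preceded by an unescaped
 * backslash does not close the quotation. */
static ptrdiff_t find_closing_quote(const wchar_t *s, bool allowescape)
{
    assert(s != NULL);
    bool escape = false;  /* true if the previous character was an
                             unescaped backslash */
    for (size_t i = 0; s[i] != L'\0'; i++) {
        if (s[i] == L'\'' && !escape)
            return (ptrdiff_t) i;
        /* A backslash escapes the next character only when it is not
         * itself escaped, so "\\\\" followed by a quote still closes. */
        escape = allowescape && !escape && s[i] == L'\\';
    }
    return -1;
}
```

For the content `a\'b'`, the scanner returns index 4 when `allowescape` is true (the dollar-single-quote case) and index 2 when it is false (the plain single-quote case), which mirrors the two call sites in `parse_word`.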

6 changes: 0 additions & 6 deletions tests/param-y.tst
@@ -32,12 +32,6 @@ __IN__
[$][$][$]
__OUT__

test_oE "\$'"
bracket $'x'
__IN__
[$x]
__OUT__

test_oE '$"'
bracket $"x"
__IN__
13 changes: 13 additions & 0 deletions tests/quote-p.tst
@@ -399,6 +399,19 @@ b][a
b]
__OUT__

test_oE 'dollar-single-quotes'
bracket $'' $'a' $'a
b' -$'\"\'\'"\\\a\b\e\f\n\r\t\v\x20\1000'- =$'\x9'$'\11\7'=
bracket $'\cA\ca\c^\c\\\c?' $'\\
'
__IN__
[][a][a
b][-"''"\
@0-][= =]
[][\
]
__OUT__

test_oE 'double quotes'
bracket "abc" "'a'"
bracket "a