Conform non-suffixed integer literals (#5717)

fairywreath · csyonghe · web-flow · commit 98cab6fa86c9 · 2024-12-03T13:40:09.000-08:00
* Make non-suffixed integer literal type resolution conform to C

* Update integer literal tests

* Clean up integer literal implementation a bit

* Update docs on integer literals

* Clean up docs update

* Clean up docs update

* Add comment on INT64_MIN edge case

* Fixed failing test, fixed formatting and cleaned up code

---------

Co-authored-by: Yong He <yonghe@outlook.com>
diff --git a/docs/64bit-type-support.md b/docs/64bit-type-support.md
@@ -5,7 +5,8 @@ Slang 64-bit Type Support
 
 * Not all targets support 64 bit types, or all 64 bit types 
   * 64 bit integers generally require later APIs/shader models
-* When specifying 64 bit literals *always* use the type suffixes (ie `L`, `ULL`, `LL`) 
+* When specifying 64 bit floating-point literals, *always* use the type suffix (i.e. `l` or `L`)
+* An integer literal will be interpreted as 64 bits if it cannot fit in a 32 bit value.
 * GPU target/s generally do not support all double intrinsics 
   * Typically missing are trascendentals (sin, cos etc), logarithm and exponential functions
   * CUDA is the exception supporting nearly all double intrinsics
@@ -28,7 +29,7 @@ This also applies to vector and matrix versions of these types.
 
 Unfortunately if a specific target supports the type or the typical HLSL intrinsic functions (such as sin/cos/max/min etc) depends very much on the target.
 
-Special attention has to be made with respect to literal 64 bit types. By default float and integer literals if they do not have an explicit suffix are assumed to be 32 bit. There is a variety of reasons for this design choice - the main one being around by default behavior of getting good performance. The suffixes required for 64 bit types are as follows
+Special attention has to be paid to 64 bit literals. By default, float literals without an explicit suffix are assumed to be 32 bit. There are a variety of reasons for this design choice, the main one being good performance by default. The suffixes required for 64 bit types are as follows:
 
 ```
 // double - 'l' or 'L'
@@ -40,27 +41,47 @@ double b = 1.34e-200;
 // int64_t - 'll' or 'LL' (or combination of upper/lower)
 
 int64_t c = -5436365345345234ll;
-// WRONG!: This is the same as d = int64_t(int32_t(-5436365345345234)) which means d ! = -5436365345345234LL. 
-// Will produce a warning.
-int64_t d = -5436365345345234;      
 
 int64_t e = ~0LL;       // Same as 0xffffffffffffffff
-// Does produce the same result as 'e' because equivalent int64_t(~int32_t(0))
-int64_t f = ~0;         
 
 // uint64_t - 'ull' or 'ULL' (or combination of upper/lower)
 
 uint64_t g = 0x8000000000000000ull; 
-// WRONG!: This is the same as h = uint64_t(uint32_t(0x8000000000000000)) which means h = 0
-// Will produce a warning.
-uint64_t h = 0x8000000000000000u;   
 
 uint64_t i = ~0ull;       // Same as 0xffffffffffffffff
 uint64_t j = ~0;          // Equivalent to 'i' because uint64_t(int64_t(~int32_t(0)));
 ```
 
 These issues are discussed more on issue [#1185](https://github.com/shader-slang/slang/issues/1185)
 
+The type of a decimal non-suffixed integer literal is the first integer type from the list [`int`, `int64_t`]
+which can represent the specified literal value. If the value cannot fit, the literal is represented as a `uint64_t`
+and a warning is given.
+The type of a hexadecimal non-suffixed integer literal is the first type from the list [`int`, `uint`, `int64_t`, `uint64_t`]
+that can represent the specified literal value. A non-suffixed integer literal will be 64 bit if it cannot fit in 32 bits.
+```
+// Same as int64_t a = int(1), the value can fit into a 32 bit integer.
+int64_t a = 1;
+
+// Same as int64_t b = int64_t(2147483648), the value cannot fit into a 32 bit integer.
+int64_t b = 2147483648;
+
+// Same as uint64_t c = uint64_t(18446744073709551615), the value is larger than the maximum value of a signed 64 bit
+// integer, and is interpreted as an unsigned 64 bit integer. A warning is given.
+uint64_t c = 18446744073709551615;
+
+// Same as uint64_t d = int(0x7FFFFFFF), the value can fit into a signed 32 bit integer.
+uint64_t d = 0x7FFFFFFF;
+
+// Same as uint64_t e = int64_t(0x7FFFFFFFFFFFFFFF), the value cannot fit into an unsigned 32 bit integer but
+// can fit into a signed 64 bit integer.
+uint64_t e = 0x7FFFFFFFFFFFFFFF;
+
+// Same as uint64_t f = uint64_t(0xFFFFFFFFFFFFFFFF), the value cannot fit into a signed 64 bit integer, and
+// is interpreted as an unsigned 64 bit integer. No warning is given because the literal is not decimal.
+uint64_t f = 0xFFFFFFFFFFFFFFFF;
+```
+
 Double support
 ==============
 
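The lookup order documented above can be modeled in standalone C++. This is a minimal sketch, not Slang's implementation: `LiteralType`, `classifyNonSuffixed`, and `checkDocumentedExamples` are hypothetical names, and the magnitude is assumed to arrive without its leading minus sign, because Slang lexes `-` as a separate token.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the four candidate types (Slang itself uses BaseType).
enum class LiteralType
{
    Int,
    UInt,
    Int64,
    UInt64
};

// Sketch of the documented rule for non-suffixed integer literals.
// `magnitude` is the literal without any leading '-'.
// `outWarn` is set only when a decimal literal has to fall back to uint64_t.
LiteralType classifyNonSuffixed(uint64_t magnitude, bool isDecimal, bool* outWarn)
{
    *outWarn = false;
    if (magnitude <= static_cast<uint64_t>(INT32_MAX))
        return LiteralType::Int;
    // The `uint` step exists only for non-decimal (e.g. hexadecimal) literals.
    if (!isDecimal && magnitude <= static_cast<uint64_t>(UINT32_MAX))
        return LiteralType::UInt;
    if (magnitude <= static_cast<uint64_t>(INT64_MAX))
        return LiteralType::Int64;
    // Too large for any signed type: fall back to uint64_t, diagnosing decimal literals.
    *outWarn = isDecimal;
    return LiteralType::UInt64;
}

// Checks the examples a-f from the documentation above.
void checkDocumentedExamples()
{
    bool warn = false;
    assert(classifyNonSuffixed(1, true, &warn) == LiteralType::Int && !warn);
    assert(classifyNonSuffixed(2147483648ull, true, &warn) == LiteralType::Int64 && !warn);
    assert(classifyNonSuffixed(18446744073709551615ull, true, &warn) == LiteralType::UInt64 && warn);
    assert(classifyNonSuffixed(0x7FFFFFFFull, false, &warn) == LiteralType::Int && !warn);
    assert(classifyNonSuffixed(0x7FFFFFFFFFFFFFFFull, false, &warn) == LiteralType::Int64 && !warn);
    assert(classifyNonSuffixed(0xFFFFFFFFFFFFFFFFull, false, &warn) == LiteralType::UInt64 && !warn);
}
```

The two orders differ only in the `uint` step. For example, `0xFFFFFFFF` becomes `uint`, while the decimal `4294967295` skips straight to `int64_t`.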
diff --git a/docs/user-guide/02-conventional-features.md b/docs/user-guide/02-conventional-features.md
@@ -39,8 +39,11 @@ The following integer types are provided:
 
 All targets support the 32-bit `int` and `uint` types, but support for the other types depends on the capabilities of each target platform.
 
-Integer literals can be both decimal and hexadecimal, and default to the `int` type.
-A literal can be explicitly made unsigned with a `u` suffix.
+Integer literals can be both decimal and hexadecimal. An integer literal can be explicitly made unsigned 
+with a `u` suffix, and explicitly made 64-bit with the `ll` suffix. The type of a decimal non-suffixed integer literal is the first integer type from
+the list [`int`, `int64_t`] which can represent the specified literal value. If the value cannot fit, the literal is represented as
+a `uint64_t` and a warning is given. The type of a hexadecimal non-suffixed integer literal is the first type from the list
+[`int`, `uint`, `int64_t`, `uint64_t`] that can represent the specified literal value. For more information on 64 bit integer literals, see the documentation on [64 bit type support](../64bit-type-support.md).
 
 The following floating-point type are provided:
 
diff --git a/source/compiler-core/slang-lexer.cpp b/source/compiler-core/slang-lexer.cpp
@@ -673,7 +673,10 @@ static int _readOptionalBase(char const** ioCursor)
 }
 
 
-IntegerLiteralValue getIntegerLiteralValue(Token const& token, UnownedStringSlice* outSuffix)
+IntegerLiteralValue getIntegerLiteralValue(
+    Token const& token,
+    UnownedStringSlice* outSuffix,
+    bool* outIsDecimalBase)
 {
     IntegerLiteralValue value = 0;
 
@@ -698,6 +701,11 @@ IntegerLiteralValue getIntegerLiteralValue(Token const& token, UnownedStringSlic
         *outSuffix = UnownedStringSlice(cursor, end);
     }
 
+    if (outIsDecimalBase)
+    {
+        *outIsDecimalBase = (base == 10);
+    }
+
     return value;
 }
 
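The new `outIsDecimalBase` out-parameter reports only which base the lexer settled on. The following is a hypothetical, simplified model of that behaviour. It assumes a `0x`/`0X` hexadecimal prefix is the only non-decimal form, and it does not reproduce the real `_readOptionalBase`, whose body is not shown in this diff. `ParsedIntLiteral` and `parseIntLiteral` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical result type. The real lexer returns the value and writes the
// suffix and base flag through optional out-pointers instead.
struct ParsedIntLiteral
{
    uint64_t value = 0;
    std::string suffix;
    bool isDecimalBase = true;
};

// Returns the digit's value, or 16 for characters that are not hex digits.
static int hexDigitValue(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return 16;
}

ParsedIntLiteral parseIntLiteral(const std::string& text)
{
    ParsedIntLiteral result;
    int base = 10;
    std::size_t i = 0;
    // Simplification: only the "0x"/"0X" prefix is recognized here.
    if (text.size() > 2 && text[0] == '0' && (text[1] == 'x' || text[1] == 'X'))
    {
        base = 16;
        i = 2;
    }
    // Accumulate digits that are valid in this base. Unsigned arithmetic wraps on overflow.
    while (i < text.size() && hexDigitValue(text[i]) < base)
    {
        result.value = result.value * base + hexDigitValue(text[i]);
        ++i;
    }
    // Whatever remains (e.g. "u", "ll", "ull") is treated as the suffix.
    result.suffix = text.substr(i);
    // Counterpart of `*outIsDecimalBase = (base == 10);` in the diff.
    result.isDecimalBase = (base == 10);
    return result;
}
```

The flag depends only on the prefix. A small hexadecimal literal such as `0x10` therefore still reports a non-decimal base, and the later type classification decides whether the `uint` step applies.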
diff --git a/source/compiler-core/slang-lexer.h b/source/compiler-core/slang-lexer.h
@@ -172,7 +172,10 @@ String getFileNameTokenValue(Token const& token);
 typedef int64_t IntegerLiteralValue;
 typedef double FloatingPointLiteralValue;
 
-IntegerLiteralValue getIntegerLiteralValue(Token const& token, UnownedStringSlice* outSuffix = 0);
+IntegerLiteralValue getIntegerLiteralValue(
+    Token const& token,
+    UnownedStringSlice* outSuffix = 0,
+    bool* outIsDecimalBase = 0);
 FloatingPointLiteralValue getFloatingPointLiteralValue(
     Token const& token,
     UnownedStringSlice* outSuffix = 0);
diff --git a/source/slang/slang-diagnostic-defs.h b/source/slang/slang-diagnostic-defs.h
@@ -1574,6 +1574,12 @@ DIAGNOSTIC(
     Error,
     invalidFloatingPointLiteralSuffix,
     "invalid suffix '$0' on floating-point literal")
+DIAGNOSTIC(
+    39999,
+    Warning,
+    integerLiteralTooLarge,
+    "integer literal is too large to be represented in a signed integer type, interpreting as "
+    "unsigned")
 
 DIAGNOSTIC(
     39999,
diff --git a/source/slang/slang-parser.cpp b/source/slang/slang-parser.cpp
@@ -3136,8 +3136,7 @@ static Modifier* ParseSemantic(Parser* parser)
         BitFieldModifier* bitWidthMod = parser->astBuilder->create<BitFieldModifier>();
         parser->FillPosition(bitWidthMod);
         const auto token = parser->tokenReader.advanceToken();
-        UnownedStringSlice suffix;
-        bitWidthMod->width = getIntegerLiteralValue(token, &suffix);
+        bitWidthMod->width = getIntegerLiteralValue(token);
         return bitWidthMod;
     }
     else if (parser->LookAheadToken(TokenType::CompletionRequest))
@@ -6638,6 +6637,64 @@ static IntegerLiteralValue _fixIntegerLiteral(
     return value;
 }
 
+static BaseType _determineNonSuffixedIntegerLiteralType(
+    IntegerLiteralValue value,
+    bool isDecimalBase,
+    Token* token,
+    DiagnosticSink* sink)
+{
+    const uint64_t rawValue = (uint64_t)value;
+
+    /// Non-suffixed integer literal types
+    ///
+    /// The type is the first from the following list in which the value can fit:
+    /// - For decimal bases:
+    ///     - `int`
+    ///     - `int64_t`
+    /// - For non-decimal bases:
+    ///     - `int`
+    ///     - `uint`
+    ///     - `int64_t`
+    ///     - `uint64_t`
+    ///
+    /// The lexer scans the minus sign (-) of a literal as a separate token, so the value here
+    /// is only the magnitude and it is sufficient to compare against the maximum limits.
+    BaseType baseType;
+    if (rawValue <= INT32_MAX)
+    {
+        baseType = BaseType::Int;
+    }
+    else if ((rawValue <= UINT32_MAX) && !isDecimalBase)
+    {
+        baseType = BaseType::UInt;
+    }
+    else if (rawValue <= INT64_MAX)
+    {
+        baseType = BaseType::Int64;
+    }
+    else
+    {
+        baseType = BaseType::UInt64;
+
+        if (isDecimalBase)
+        {
+            // There is an edge case here: 9223372036854775808 (INT64_MAX + 1) lands in this
+            // branch, but the complete literal -9223372036854775808 is INT64_MIN and is valid.
+            // Unfortunately, because the lexer scans the minus sign (-) of the literal as a
+            // separate token, it is impossible to know here whether the literal is negated.
+            // We emit the warning and initially process the value as a uint64 anyway; the
+            // minus sign is then parsed normally and the value is still stored correctly as
+            // INT64_MIN.
+
+            // Decimal integer is too large to be represented as signed.
+            // Output warning that it is represented as unsigned instead.
+            sink->diagnose(*token, Diagnostics::integerLiteralTooLarge);
+        }
+    }
+
+    return baseType;
+}
+
 static bool _isCast(Parser* parser, Expr* expr)
 {
     if (as<PointerTypeExpr>(expr))
@@ -6925,20 +6982,18 @@ static Expr* parseAtomicExpr(Parser* parser)
             constExpr->token = token;
 
             UnownedStringSlice suffix;
-            IntegerLiteralValue value = getIntegerLiteralValue(token, &suffix);
+            bool isDecimalBase;
+            IntegerLiteralValue value = getIntegerLiteralValue(token, &suffix, &isDecimalBase);
 
             // Look at any suffix on the value
             char const* suffixCursor = suffix.begin();
             const char* const suffixEnd = suffix.end();
+            const bool suffixExists = (suffixCursor != suffixEnd);
 
-            // If no suffix is defined go with the default
-            BaseType suffixBaseType = BaseType::Int;
-
-            if (suffixCursor < suffixEnd)
+            // Mark as void, taken as an error
+            BaseType suffixBaseType = BaseType::Void;
+            if (suffixExists)
             {
-                // Mark as void, taken as an error
-                suffixBaseType = BaseType::Void;
-
                 int lCount = 0;
                 int uCount = 0;
                 int zCount = 0;
@@ -7008,6 +7063,14 @@ static Expr* parseAtomicExpr(Parser* parser)
                     suffixBaseType = BaseType::Int;
                 }
             }
+            else
+            {
+                suffixBaseType = _determineNonSuffixedIntegerLiteralType(
+                    value,
+                    isDecimalBase,
+                    &token,
+                    parser->sink);
+            }
 
             value = _fixIntegerLiteral(suffixBaseType, value, &token, parser->sink);
 
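The INT64_MIN edge case described in `_determineNonSuffixedIntegerLiteralType` can be illustrated with plain 64-bit arithmetic. This sketch assumes that negation behaves like two's-complement negation modulo 2^64. `negateMagnitude` is a hypothetical helper, not part of the parser.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: negate a literal magnitude that was classified as uint64_t,
// then reinterpret the resulting bits as int64_t.
int64_t negateMagnitude(uint64_t magnitude)
{
    // Unsigned subtraction is well defined: it yields 2^64 - magnitude (mod 2^64).
    uint64_t negatedBits = uint64_t(0) - magnitude;
    // Copy the bits instead of casting, which avoids implementation-defined
    // out-of-range conversions before C++20 (assumes two's-complement int64_t).
    int64_t result;
    std::memcpy(&result, &negatedBits, sizeof(result));
    return result;
}
```

Under this model, the magnitude `9223372036854775808` negates to exactly `INT64_MIN`. The magnitude `9223372036854775809` (the `d1` case in the test below) wraps around to `INT64_MAX`, which is one reason the warning is worth keeping.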
diff --git a/tests/diagnostics/int-literal.slang b/tests/diagnostics/int-literal.slang
@@ -2,8 +2,8 @@
 
 int doSomething(int a)
 {
-    // Warning can't fit
-    int c0 = 0x800000000;
+    // No warning, literal will be interpreted as 64 bit.
+    uint64_t c0 = 0x800000000;
     
     // No warning as top bits are just ignored
     int c1 = -1ll;
@@ -13,19 +13,30 @@ int doSomething(int a)
     // Should sign extend 
     int c3 = 0x80000000;
     
-    // Should give a warning (ideally including the preceeding -)
-    // Currently we don't have the -, because the lexer lexes - independently
-    int c4 = -0xfffffffff;
+    // No warning: the hex literal does not fit in 32 bits, so it is interpreted as int64_t and then negated.
+    int64_t c4 = -0xfffffffff;
     
-    // 
-    a += c0 + c1 + c2;
+    a += (int)c0 + c1 + c2;
     
     int64_t b = 0;
 
     // Ok
     b += 0x800000000ll;
     
     uint64_t c5 = -2ull;
+
+    // Warning: the integer literal is too large for signed 64 bit, so it is interpreted as unsigned.
+    uint64_t d0 = 18446744073709551615;
+
+    // Warning: the magnitude 9223372036854775809 is too large for signed 64 bit, so it is interpreted as unsigned.
+    uint64_t d1 = -9223372036854775809;
+
+    // This is INT64_MIN, a valid negative signed integer, but a warning is emitted because the minus sign (-) is scanned
+    // separately by the lexer, and the positive literal portion on its own triggers the warning.
+    // The final value will still be correctly set as INT64_MIN.
+    //
+    // Avoiding this warning would require the lexer to scan the minus sign and the number together.
+    uint64_t d2 = -9223372036854775808;
     
     return a + int(b);
 }
diff --git a/tests/diagnostics/int-literal.slang.expected b/tests/diagnostics/int-literal.slang.expected
@@ -1,11 +1,14 @@
 result code = 0
 standard error = {
-tests/diagnostics/int-literal.slang(6): warning 39999: integer literal '0x800000000' too large for type 'int' truncated to '0'
-    int c0 = 0x800000000;
-             ^~~~~~~~~~~
-tests/diagnostics/int-literal.slang(18): warning 39999: integer literal '0xfffffffff' too large for type 'int' truncated to '-1'
-    int c4 = -0xfffffffff;
-              ^~~~~~~~~~~
+tests/diagnostics/int-literal.slang(29): warning 39999: integer literal is too large to be represented in a signed integer type, interpreting as unsigned
+    uint64_t d0 = 18446744073709551615;
+                  ^~~~~~~~~~~~~~~~~~~~
+tests/diagnostics/int-literal.slang(32): warning 39999: integer literal is too large to be represented in a signed integer type, interpreting as unsigned
+    uint64_t d1 = -9223372036854775809;
+                   ^~~~~~~~~~~~~~~~~~~
+tests/diagnostics/int-literal.slang(39): warning 39999: integer literal is too large to be represented in a signed integer type, interpreting as unsigned
+    uint64_t d2 = -9223372036854775808;
+                   ^~~~~~~~~~~~~~~~~~~
 }
 standard output = {
 }
diff --git a/tests/hlsl-intrinsic/literal-int64.slang b/tests/hlsl-intrinsic/literal-int64.slang
diff --git a/tests/hlsl-intrinsic/literal-int64.slang.expected.txt b/tests/hlsl-intrinsic/literal-int64.slang.expected.txt
diff --git a/tests/slang-extension/atomic-int64-byte-address-buffer.slang b/tests/slang-extension/atomic-int64-byte-address-buffer.slang