expipiplus1
diff --git a/docs/64bit-type-support.md b/docs/64bit-type-support.md
diff --git a/docs/README.md b/docs/README.md
diff --git a/docs/cpu-target.md b/docs/cpu-target.md
diff --git a/docs/cuda-target.md b/docs/cuda-target.md
diff --git a/docs/design/autodiff.md b/docs/design/autodiff.md
diff --git a/docs/design/autodiff/basics.md b/docs/design/autodiff/basics.md
diff --git a/docs/design/autodiff/decorators.md b/docs/design/autodiff/decorators.md
@@ -7,7 +7,7 @@ Slang 64-bit Type Support
   * 64 bit integers generally require later APIs/shader models
 * When specifying 64 bit literals *always* use the type suffixes (ie `L`, `ULL`, `LL`) 
 * GPU target/s generally do not support all double intrinsics 
-  * Typically missing are trascendentals (sin, cos etc), logarithm and exponental functions
+  * Typically missing are transcendentals (sin, cos etc), logarithm and exponential functions
   * CUDA is the exception supporting nearly all double intrinsics
 * D3D 
   * D3D targets *appear* to support double intrinsics (like sin, cos, log etc), but behind the scenes they are actually being converted to float
@@ -26,7 +26,7 @@ The Slang language supports 64 bit built in types. Such as
 
 This also applies to vector and matrix versions of these types. 
 
-Unfortunately if a specific target supports the type or the typical HLSL instrinsic functions (such as sin/cos/max/min etc) depends very much on the target. 
+Unfortunately, whether a specific target supports the type or the typical HLSL intrinsic functions (such as sin/cos/max/min etc) depends very much on the target.
 
 Special attention has to be made with respect to literal 64 bit types. By default float and integer literals if they do not have an explicit suffix are assumed to be 32 bit. There is a variety of reasons for this design choice - the main one being around by default behavior of getting good performance. The suffixes required for 64 bit types are as follows
 
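The point about literal suffixes can be illustrated with a small C++ analogy (a sketch, not Slang code). It assumes a typical platform where `unsigned int` is 32 bits. C++ differs from Slang in one respect: an unsuffixed C++ floating literal is already a `double`, so the `f` suffix stands in here for Slang's default 32-bit float literal. In both languages the width is fixed by the literal's type before any widening assignment, so precision or range lost to a missing 64-bit suffix is not recovered later.

```cpp
#include <cassert>
#include <cstdint>

// Without a 64-bit suffix both operands are 32-bit unsigned (assumed
// 32-bit unsigned int), so the addition wraps before being widened.
std::uint64_t sum32() {
    return 0xFFFFFFFFu + 1u;  // evaluated as unsigned int: wraps to 0
}

// With the ULL suffix the whole expression is evaluated in 64 bits.
std::uint64_t sum64() {
    return 0xFFFFFFFFull + 1ull;  // 4294967296
}

// A 32-bit float literal widened to double keeps only float precision.
bool floatLiteralMatchesDouble() {
    double fromFloat = 0.1f;   // rounded to float first, then widened
    double fromDouble = 0.1;   // rounded directly to double
    return fromFloat == fromDouble;
}
```

Here `sum32()` returns 0 while `sum64()` returns 4294967296, and the two 0.1 values compare unequal.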
@@ -107,7 +107,7 @@ On dxc the following intrinsics are available with double::
 
 These are tested in the test `tests/hlsl-intrinsic/scalar-double-d3d-intrinsic.slang`.
 
-There is no suport for transcendentals (`sin`, `cos` etc) or `log`/`exp`. More surprising is that`sqrt`, `rsqrt`, `frac`, `ceil`, `floor`, `trunc`, `step`, `lerp`, `smoothstep` are also not supported.
+There is no support for transcendentals (`sin`, `cos` etc) or `log`/`exp`. More surprising is that `sqrt`, `rsqrt`, `frac`, `ceil`, `floor`, `trunc`, `step`, `lerp`, `smoothstep` are also not supported.
 
 uint64_t and int64_t Support
 ============================
 
@@ -19,7 +19,7 @@ The [target compatibility guide](target-compatibility.md) gives an overview of f
 
 The [CPU target guide](cpu-target.md) gives information on compiling Slang or C++ source into shared libraries/executables or functions that can be directly executed. It also covers how to generate C++ code from Slang source.  
 
-The [CUDA target guide](cuda-target.md) provides information on compiling Slang/HLSL or CUDA source. Slang can compile to equivalent CUDA source, as well as to PTX via the nvrtc CUDA complier.
+The [CUDA target guide](cuda-target.md) provides information on compiling Slang/HLSL or CUDA source. Slang can compile to equivalent CUDA source, as well as to PTX via the nvrtc CUDA compiler.
 
 Contributors
 ------------
 
@@ -293,7 +293,7 @@ The global can now be set from host code via
     }
 ```
 
-In terms of reflection `__global` variables are not visibile. 
+In terms of reflection `__global` variables are not visible.
 
 ## NativeString
 
@@ -309,7 +309,7 @@ TODO(JS): What happens with String with shader compile style on CPU? Shouldn't i
 
 It is currently not possible to step into LLVM-JIT code when using [slang-llvm](#slang-llvm). Fortunately it is possible to step into code compiled via a [regular C/C++ compiler](#regular-cpp). 
 
-Below is a code snippet showing how to swich to a [regular C/C++ compiler](#regular-cpp) at runtime. 
+Below is a code snippet showing how to switch to a [regular C/C++ compiler](#regular-cpp) at runtime.
 
 ```C++
     SlangPassThrough findRegularCppCompiler(slang::IGlobalSession* slangSession)
@@ -401,7 +401,7 @@ struct ComputeVaryingInput
 
 `ComputeVaryingInput` allows specifying a range of groupIDs to execute - all the ids in a grid from startGroup to endGroup, but not including the endGroupIDs. Most compute APIs allow specifying an x,y,z extent on 'dispatch'. This would be equivalent as having startGroupID = { 0, 0, 0} and endGroupID = { x, y, z }. The exported function allows setting a range of groupIDs such that client code could dispatch different parts of the work to different cores. This group range mechanism was chosen as the 'default' mechanism as it is most likely to achieve the best performance.
 
-There are two other functions that consist of the entry point name postfixed with `_Thread` and `_Group`. For the entry point 'computeMain' these functions would be accessable from the shared library interface as `computeMain_Group` and `computeMain_Thread`. `_Group` has the same signature as the listed for computeMain, but it doesn't execute a range, only the single group specified by startGroupID (endGroupID is ignored). That is all of the threads within the group (as specified by `[numthreads]`) will be executed in a single call. 
+There are two other functions that consist of the entry point name postfixed with `_Thread` and `_Group`. For the entry point 'computeMain' these functions would be accessible from the shared library interface as `computeMain_Group` and `computeMain_Thread`. `_Group` has the same signature as the one listed for computeMain, but it doesn't execute a range, only the single group specified by startGroupID (endGroupID is ignored). That is, all of the threads within the group (as specified by `[numthreads]`) will be executed in a single call.
 
 It may be desirable to have even finer control of how execution takes place down to the level of individual 'thread's and this can be achieved with the `_Thread` style. The signature looks as follows
 
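A minimal sketch of how client code might walk a half-open group range and split a full dispatch across cores, following the `ComputeVaryingInput` description above. `GroupID`, `GroupRange`, `forEachGroup` and `splitAlongX` are hypothetical names for illustration, not the types or functions Slang emits, and `runGroup` only stands in for a call to the `_Group` entry point.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a 3-component group ID.
struct GroupID { std::uint32_t x, y, z; };

// Hypothetical mirror of ComputeVaryingInput: groups in [start, end) per axis.
struct GroupRange { GroupID startGroupID; GroupID endGroupID; };

// Visit every group in the range; endGroupID itself is excluded on each axis.
int forEachGroup(const GroupRange& r, const std::function<void(GroupID)>& runGroup) {
    int count = 0;
    for (std::uint32_t z = r.startGroupID.z; z < r.endGroupID.z; ++z)
        for (std::uint32_t y = r.startGroupID.y; y < r.endGroupID.y; ++y)
            for (std::uint32_t x = r.startGroupID.x; x < r.endGroupID.x; ++x) {
                runGroup(GroupID{x, y, z});
                ++count;
            }
    return count;
}

// A dispatch of {x, y, z} groups is the range {0,0,0}..{x,y,z}.
// Split it along x into `parts` disjoint sub-ranges, e.g. one per core.
std::vector<GroupRange> splitAlongX(GroupID extent, std::uint32_t parts) {
    std::vector<GroupRange> ranges;
    for (std::uint32_t i = 0; i < parts; ++i) {
        std::uint32_t begin = extent.x * i / parts;
        std::uint32_t end = extent.x * (i + 1) / parts;
        if (begin < end)
            ranges.push_back({{begin, 0, 0}, {end, extent.y, extent.z}});
    }
    return ranges;
}
```

Splitting along x is an arbitrary choice; any partition into disjoint sub-ranges visits each group exactly once, so a 5x2x3 dispatch split in two still runs 30 groups in total.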
@@ -566,7 +566,7 @@ It may be useful to be able to include `slang-cpp-types.h` in C++ code to access
 
 Would wrap all the Slang prelude types in the namespace `CPPPrelude`, such that say a `StructuredBuffer<int32_t>` could be specified in C++ source code as `CPPPrelude::StructuredBuffer<int32_t>`.
 
-The code that sets up the prelude for the test infrastucture and command line usage can be found in ```TestToolUtil::setSessionDefaultPrelude```. Essentially this determines what the absolute path is to `slang-cpp-prelude.h` is and then just makes the prelude `#include "the absolute path"`.
+The code that sets up the prelude for the test infrastructure and command line usage can be found in ```TestToolUtil::setSessionDefaultPrelude```. Essentially this determines the absolute path to `slang-cpp-prelude.h` and then just makes the prelude `#include "the absolute path"`.
 
 The *default* prelude is set to the contents of the files for C++ held in the prelude directory and is held within the Slang shared library. It is therefore typically not necessary to distribute Slang with prelude files.
 
 
@@ -30,7 +30,7 @@ The following are a work in progress or not implemented but are planned to be so
 
 For producing PTX binaries Slang uses [NVRTC](https://docs.nvidia.com/cuda/nvrtc/index.html). NVRTC dll/shared library has to be available to Slang (for example in the appropriate PATH for example) for it to be able to produce PTX.
 
-The NVRTC compiler can be accessed directly via the pass through mechanism and is identifed by the enum value `SLANG_PASS_THROUGH_NVRTC`.
+The NVRTC compiler can be accessed directly via the pass through mechanism and is identified by the enum value `SLANG_PASS_THROUGH_NVRTC`.
 
 Much like other targets that use downstream compilers Slang can be used to compile CUDA source directly to PTX via the pass through mechansism. The Slang command line options will broadly be mapped down to the appropriate options for the NVRTC compilation. In the API the `SlangCompileTarget` for CUDA is `SLANG_CUDA_SOURCE` and for PTX is `SLANG_PTX`. These can also be specified on the Slang command line as `-target cuda` and `-target ptx`.
 
@@ -126,11 +126,11 @@ The UniformState and UniformEntryPointParams struct typically vary by shader. Un
 
 Read only textures will be bound as the opaque CUDA type CUtexObject. This type is the combination of both a texture AND a sampler. This is somewhat different from HLSL, where there can be separate `SamplerState` variables. This allows access of a single texture binding with different types of sampling.
 
-If code relys on this behavior it will be necessary to bind multiple CtexObjects with different sampler settings, accessing the same texture data.
+If code relies on this behavior, it will be necessary to bind multiple CUtexObjects with different sampler settings, accessing the same texture data.
 
 Slang has some preliminary support for TextureSampler type - a combined Texture and SamplerState. To write Slang code that can target CUDA and other platforms using this mechanism will expose the semantics appropriately within the source.
 
-Load is only supported for Texture1D, and the mip map selection argument is ignored. This is because there is tex1Dfetch and no higher dimensional equivalents. CUDA also only allows such access if the backing array is linear memory - meaning the bound texture cannot have mip maps - thus making the mip map parameter superflous anyway. RWTexture does allow Load on other texture types.
+Load is only supported for Texture1D, and the mip map selection argument is ignored. This is because there is tex1Dfetch and no higher dimensional equivalents. CUDA also only allows such access if the backing array is linear memory - meaning the bound texture cannot have mip maps - thus making the mip map parameter superfluous anyway. RWTexture does allow Load on other texture types.
 
 ## RWTexture
 
@@ -239,7 +239,7 @@ That for pass-through usage, prelude is not pre-pended, preludes are for code ge
 void setDownstreamCompilerPrelude(SlangPassThrough passThrough, const char* preludeText);
 ```
 
-The code that sets up the prelude for the test infrastucture and command line usage can be found in ```TestToolUtil::setSessionDefaultPrelude```. Essentially this determines what the absolute path is to `slang-cpp-prelude.h` is and then just makes the prelude `#include "the absolute path"`.
+The code that sets up the prelude for the test infrastructure and command line usage can be found in ```TestToolUtil::setSessionDefaultPrelude```. Essentially this determines the absolute path to `slang-cpp-prelude.h` and then just makes the prelude `#include "the absolute path"`.
 
 Half Support
 ============
@@ -292,7 +292,7 @@ Will require 3 times as many steps as the earlier scalar example just using a si
 
 ## WaveGetLaneIndex
 
-'WaveGetLaneIndex' defaults to `(threadIdx.x & SLANG_CUDA_WARP_MASK)`. Depending on how the kernel is launched this could be incorrect. There other ways to get lane index, for example using inline assembly. This mechanism though is apparently slower than the simple method used here. There is support for using the asm mechnism in the CUDA prelude using the `SLANG_USE_ASM_LANE_ID` preprocessor define to enable the feature.
+'WaveGetLaneIndex' defaults to `(threadIdx.x & SLANG_CUDA_WARP_MASK)`. Depending on how the kernel is launched this could be incorrect. There are other ways to get the lane index, for example using inline assembly. This mechanism, though, is apparently slower than the simple method used here. There is support for using the asm mechanism in the CUDA prelude using the `SLANG_USE_ASM_LANE_ID` preprocessor define to enable the feature.
 
 There is potential to calculate the lane id using the [numthreads] markup in Slang/HLSL, but that also requires some assumptions of how that maps to a lane index.
 
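The host-side C++ sketch below shows why `threadIdx.x & SLANG_CUDA_WARP_MASK` depends on the launch shape. It assumes a warp size of 32 (so a mask of 31) and uses a hypothetical `Dim3` mirror of CUDA's `threadIdx`/`blockDim`. CUDA forms warps from the linearized thread index within the block (x varies fastest, then y, then z), so the two computations agree for 1D blocks and whenever `blockDim.x` is a multiple of 32, but can diverge otherwise.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: warp size 32, so the mask is 31. This stands in for
// SLANG_CUDA_WARP_MASK, whose real definition lives in the CUDA prelude.
constexpr std::uint32_t kWarpMask = 31;

// Hypothetical host-side mirror of CUDA's dim3 / uint3.
struct Dim3 { std::uint32_t x, y, z; };

// The default described above: only threadIdx.x is consulted.
std::uint32_t laneFromX(Dim3 threadIdx) {
    return threadIdx.x & kWarpMask;
}

// Lane derived from the linearized thread index within the block,
// which is how threads are partitioned into warps.
std::uint32_t laneFromLinearIndex(Dim3 threadIdx, Dim3 blockDim) {
    std::uint32_t linear = threadIdx.x
                         + threadIdx.y * blockDim.x
                         + threadIdx.z * blockDim.x * blockDim.y;
    return linear & kWarpMask;
}
```

For example, with a 16x4 block, thread (3, 1, 0) has linear index 19 and therefore lane 19, while `threadIdx.x & 31` gives 3.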
 
@@ -201,7 +201,7 @@ DP<float> f_SSA_Proped(DP<float> dpa, DP<float> dpb)
     }
 
     // Note here that we have to 'store' all the intermediaries 
-    // _t1, _t2, _q4, _t3, _q5, _t3_d, _t4 and _q1. This is fundementally
+    // _t1, _t2, _q4, _t3, _q5, _t3_d, _t4 and _q1. This is fundamentally
     // the tradeoff between fwd_mode and rev_mode
 
     if (_b1)
@@ -288,7 +288,7 @@ void f_SSA_Rev(inout DP<float> dpa, inout DP<float> dpb, float dout)
     }
 
     // Note here that we have to 'store' all the intermediaries 
-    // _t1, _t2, _q4, _t3, _q5, _t3_d, _t4 and _q1. This is fundementally
+    // _t1, _t2, _q4, _t3, _q5, _t3_d, _t4 and _q1. This is fundamentally
     // the tradeoff between fwd_mode and rev_mode
 
     if (_b1)
@@ -330,4 +330,4 @@ void f_SSA_Rev(inout DP<float> dpa, inout DP<float> dpb, float dout)
     }
 }
 
-```
+```
@@ -4,7 +4,7 @@ This documentation is intended for Slang contributors and is written from a comp
 
 ## What is Automatic Differentiation?
 
-Before diving into the design of the automatic differentiation (for brevity, we will call it 'auto-diff') passes, it is important to understand the end goal of what auto-diff tries to acheive. 
+Before diving into the design of the automatic differentiation (for brevity, we will call it 'auto-diff') passes, it is important to understand the end goal of what auto-diff tries to achieve.
 
 The over-arching goal of Slang's auto-diff is to enable the user to compute derivatives of a given shader program or function's output w.r.t its input parameters. This critical compiler feature enables users to quickly use their shaders with gradient-based parameter optimization algorithms, which forms the backbone of modern machine learning systems. It enables users to train and deploy graphics systems that contain ML primitives (like multi-layer perceptron's or MLPs) or use their shader programs as differentiable primitives within larger ML pipelines.
 
@@ -60,7 +60,7 @@ DifferentialPair<float> fwd_f(DifferentialPair<float> dpx)
 }
 ```
 
-Note that `(2 * x)` is the multiplier corresponding to $Df(x)$. We refer to $x$ and $f(x)$ as "*primal*" values and the pertubations $dx$ and $Df(x)\cdot dx$ as "*differential*" values. The reason for this separation is that the "*differential*" output values are always linear w.r.t their "*differential*" inputs.
+Note that `(2 * x)` is the multiplier corresponding to $Df(x)$. We refer to $x$ and $f(x)$ as "*primal*" values and the perturbations $dx$ and $Df(x)\cdot dx$ as "*differential*" values. The reason for this separation is that the "*differential*" output values are always linear w.r.t their "*differential*" inputs.
 
 As the name implies, `DifferentialPair<T>` is a special pair type used by Slang to hold values and their corresponding differentials.
 
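A plain C++ sketch of the forward-mode function described above. `DiffPairF` is a hypothetical stand-in for `DifferentialPair<float>`, and f(x) = x * x is assumed here because it is consistent with the `(2 * x)` multiplier for $Df(x)$. The differential output is $Df(x)\cdot dx$, so it scales linearly with the differential input while the primal output does not depend on `dx` at all.

```cpp
#include <cassert>

// Hypothetical stand-in for Slang's DifferentialPair<float>.
struct DiffPairF {
    float p;  // primal value x
    float d;  // differential (perturbation) dx
};

// Forward-mode derivative of the assumed f(x) = x * x:
// primal output f(x) = x * x, differential output Df(x) * dx = (2 * x) * dx.
DiffPairF fwd_f(DiffPairF dpx) {
    float x = dpx.p;
    return DiffPairF{x * x, (2 * x) * dpx.d};
}
```

Calling `fwd_f({3.0f, 1.0f})` yields the pair (9, 6); halving `dx` to 0.5 halves the differential to 3 while the primal stays 9.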
@@ -256,7 +256,7 @@ void rev_f(inout DifferentialPair<float> dpx, inout DifferentialPair<float> dpy,
 
 Note that `rev_f` accepts derivatives w.r.t the output value as the input, and returns derivatives w.r.t inputs as its output (through `inout` parameters). `rev_f` still needs the primal values `x` and `y` to compute the derivatives, so those are still passed in as an input through the primal part of the differential pair. 
 
-Also note that the reverse-mode derivative function does not have to compute the primal result value (its return is void). The reason for this is a matter of convenience: reverse-mode derivatives are often invoked after all the primal fuctions, and there is typically no need for these values. We go into more detail on this topic in the checkpointing chapter.
+Also note that the reverse-mode derivative function does not have to compute the primal result value (its return is void). The reason for this is a matter of convenience: reverse-mode derivatives are often invoked after all the primal functions, and there is typically no need for these values. We go into more detail on this topic in the checkpointing chapter.
 
 The reverse mode function can be used to compute both `dOutput/dx` and `dOutput/dy` with a single invocation (unlike the forward-mode case where we had to invoke `fwd_f` once for each input)
 
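A C++ sketch of the reverse-mode convention described above, using a hypothetical f(x, y) = x * y chosen for illustration. Reference parameters play the role of Slang's `inout` pairs: primal values are read from `.p`, and the derivative of the output with respect to each input is written to `.d`. One call produces both partial derivatives, which is the advantage over calling a forward-mode function once per input.

```cpp
#include <cassert>

// Hypothetical stand-in for Slang's DifferentialPair<float>.
struct DiffPairF {
    float p;  // primal value (input)
    float d;  // on return: derivative of the output w.r.t. this input
};

// Reverse-mode derivative of the hypothetical f(x, y) = x * y.
// dOut is the derivative flowing back from the output. Like the
// reverse-mode function described above, it returns void: the primal
// result x * y is not recomputed.
void rev_f(DiffPairF& dpx, DiffPairF& dpy, float dOut) {
    float x = dpx.p;
    float y = dpy.p;
    dpx.d = y * dOut;  // dOutput/dx
    dpy.d = x * dOut;  // dOutput/dy
}
```

With x = 2, y = 5 and `dOut = 1`, a single call writes 5 into `dpx.d` and 2 into `dpy.d`.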
 
@@ -45,7 +45,7 @@ interface IFoo_after_checking_and_lowering
 ### `[TreatAsDifferentiable]`
 In large codebases where some interfaces may have several possible implementations, it may not be reasonable to have to mark all possible implementations with `[Differentiable]`, especially if certain implementations use hacks or workarounds that need additional consideration before they can be marked `[Differentiable]`
 
-In such cases, we provide the `[TreatAsDifferentiable]` decoration (AST node: `TreatAsDifferentiableAttribute`, IR: `OpTreatAsDifferentiableDecoration`), which instructs the auto-diff passes to construct an 'empty' function that returns a 0 (or 0-equivalent) for the derivative values. This allows the signature of a `[TreatAsDifferentiable]` function to match a `[Differentiable]` requirment without actually having to produce a derivative.
+In such cases, we provide the `[TreatAsDifferentiable]` decoration (AST node: `TreatAsDifferentiableAttribute`, IR: `OpTreatAsDifferentiableDecoration`), which instructs the auto-diff passes to construct an 'empty' function that returns a 0 (or 0-equivalent) for the derivative values. This allows the signature of a `[TreatAsDifferentiable]` function to match a `[Differentiable]` requirement without actually having to produce a derivative.
 
 ## Custom derivative decorators
 In many cases, it is desirable to manually specify the derivative code for a method rather than let the auto-diff pass synthesize it from the method body. This is usually desirable if:
@@ -68,7 +68,7 @@ In some cases, we face the opposite problem that inspired custom derivatives. Th
 This frequently occurs with hardware intrinsic operations that are lowered into special op-codes that map to hardware units, such as texture sampling & interpolation operations. 
 However, these operations do have reference 'software' implementations which can be used to produce the derivative.
 
-To allow user code to use the fast hardward intrinsics for the primal pass, but use synthesized derivatives for the derivative pass, we provide decorators `[PrimalSubstitute(ref-fn)]` and `[PrimalSubstituteOf(orig-fn)]` (AST Node: `PrimalSubstituteAttribute`/`PrimalSubstituteOfAttribute`, IR: `OpPrimalSubstituteDecoration`), that can be used to provide a reference implementation for the auto-diff pass.
+To allow user code to use the fast hardware intrinsics for the primal pass, but use synthesized derivatives for the derivative pass, we provide decorators `[PrimalSubstitute(ref-fn)]` and `[PrimalSubstituteOf(orig-fn)]` (AST Node: `PrimalSubstituteAttribute`/`PrimalSubstituteOfAttribute`, IR: `OpPrimalSubstituteDecoration`), that can be used to provide a reference implementation for the auto-diff pass.
 
 Example:
 ```C