
Support numerical evaluation of expectations #19

@thomasahle

Motivation

Currently, the library handles symbolic expressions for Tensors and can represent expectations (via the Expectation class). However, there is no unified way to evaluate those expectations numerically by sampling the distribution of a variable (e.g. a Gaussian) while leaving the rest of the expression unchanged. For many practical applications, it would be helpful to compute a Monte Carlo approximation of $\mathbb{E}[f(X)]$ by adding a "samples" (or "batch") dimension to $X$ and then averaging over that dimension.
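
As a point of reference, the estimate in question is $\mathbb{E}[f(X)] \approx \frac{1}{S}\sum_{s=1}^{S} f(X^{(s)})$ for $S$ independent draws $X^{(s)}$. A minimal NumPy sketch of that idea, independent of the library's API, might look as follows. It assumes $X$ has i.i.d. standard Gaussian entries and uses the sum of squared entries as $f$, chosen only because its expectation is exactly $i \cdot j$. The names `mc_expectation` and `sum_of_squares` are hypothetical.

```python
import numpy as np

def mc_expectation(f, shape, samples, seed=0):
    """Estimate E[f(X)] for X with i.i.d. standard Gaussian entries.

    `f` must accept an array with a leading "samples" axis and return
    one value per sample.
    """
    rng = np.random.default_rng(seed)
    # Numeric tensor of shape (samples, *shape): the extra leading
    # axis holds the independent draws of X.
    x = rng.standard_normal((samples, *shape))
    values = f(x)                # shape (samples,)
    return values.mean(axis=0)   # average over the samples axis

def sum_of_squares(x):
    # Reduce over every axis except the leading samples axis.
    return (x ** 2).reshape(x.shape[0], -1).sum(axis=1)

# X has symbolic shape (i, j) = (2, 3), so the exact value is i * j = 6.
estimate = mc_expectation(sum_of_squares, shape=(2, 3), samples=100_000)
```

The point of the sketch is the shape bookkeeping: the numeric array carries one more axis than the symbolic shape, which is exactly what the rest of this issue is about.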

The Problem

When we inject an extra "samples" dimension into the numeric tensor for $X$, it no longer strictly matches the symbolic shape $(i, j, \ldots)$. For instance:

  • Symbolically, X might be declared as Variable("X", i, j).
  • Numerically, we produce a tensor shaped $(\text{samples}, i, j)$ for sampling.

Because much of the code assumes that the numeric tensor has exactly the same named edges as the symbolic expression, we get:

  • KeyErrors if the code attempts old_to_new[e] for e == "samples".
  • align_to errors if "samples" is not part of the list of symbolic edges.
  • Dimension mismatch checks in _inner_evaluate or rename(...) blocks that fail when extra dimensions exist.
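
The first failure mode can be reproduced with a toy stand-in for the rename step. This is not the library's actual code: `strict_rename` and the contents of `old_to_new` are assumptions made for illustration, and dim names are modelled as plain tuples of strings.

```python
# Toy stand-in for a rename step that assumes every numeric dim is a
# known symbolic edge. Hypothetical; not the library's implementation.
symbolic_edges = ("i", "j")             # edges declared by Variable("X", i, j)
numeric_names = ("samples", "i", "j")   # dim names of the sampled tensor

# The mapping is built from the symbolic edges only.
old_to_new = {"i": "i_", "j": "j_"}

def strict_rename(names, mapping):
    """Rename every dim, looking each one up in `mapping`."""
    return tuple(mapping[e] for e in names)

try:
    strict_rename(numeric_names, old_to_new)
    error = None
except KeyError as exc:
    error = exc.args[0]   # name of the dim that had no mapping

# Dims present numerically but unknown to the symbolic expression.
extra_dims = [e for e in numeric_names if e not in symbolic_edges]
```

Here `error` ends up holding the name of the unmapped dim, and `extra_dims` lists every dim that a shape check against the symbolic edges would reject.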

Proposed Directions

  1. Partial Fix: Ellipses and Skipping Unknown Dims
    One approach is to:

    • Use .align_to(..., *self.edges) so that leftover dims like "samples" are tolerated.
    • Skip unrecognized dims in rename checks, e.g. if a dim is not in v.orig, we ignore it.

    This “patch” approach lets the code run without error, but the symbolic shapes are still (i, j) while the numeric shapes are (samples, i, j). The library remains unaware of the extra dimension apart from scattered “just skip it” logic.
  2. Symbolic Substitution / Broadcasting
    A more complete fix would be:

    • When we decide to do Monte Carlo, replace X with a new Variable("X_samples", "samples", i, j) throughout the expression.
    • That way, the shape is truly (samples, i, j) in the symbolic expression itself.
    • Everything else in the library remains consistent: no rename or align issues, because the expression literally expects a "samples" edge now.
    • Downside: This can be complex. We’d have to modify every sub-expression referencing X, handle potential edge collisions, etc.
  3. Dedicated “Batch” or “Sample” Concept
    A more architectural approach might define a “batch dimension” system at the library’s top level. Each Tensor can have zero or more batch dims plus its symbolic dims. Then the code knows about that distinction and handles it systematically.
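
To make direction 1 concrete, here is a rough sketch on plain NumPy arrays with explicit dim-name tuples. It only imitates the behaviour one would expect from `.align_to(..., *self.edges)` and from a rename that skips unknown dims. The helpers `align_with_extras` and `tolerant_rename` are hypothetical and are not part of the library.

```python
import numpy as np

def align_with_extras(array, names, edges):
    """Toy analogue of `.align_to(..., *edges)` for a NumPy array.

    Dims not listed in `edges` (e.g. "samples") are kept, in their
    original order, in front of the symbolic edges.
    """
    missing = [e for e in edges if e not in names]
    if missing:
        raise ValueError(f"missing symbolic edges: {missing}")
    extras = [n for n in names if n not in edges]
    new_names = extras + list(edges)
    perm = [names.index(n) for n in new_names]
    return array.transpose(perm), tuple(new_names)

def tolerant_rename(names, old_to_new):
    """Rename known dims and pass unrecognized dims through unchanged."""
    return tuple(old_to_new.get(n, n) for n in names)

x = np.zeros((5, 2, 3))   # numeric shape (samples, i, j)
aligned, names = align_with_extras(x, ("samples", "i", "j"), edges=("j", "i"))
renamed = tolerant_rename(names, {"i": "i_", "j": "j_"})
```

The sketch also shows the limitation noted under direction 1: the "samples" dim survives only because each helper separately chooses to let it through, and nothing in the symbolic description records that it exists.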

Next Steps

  • Decide whether a short-term patch (skipping unrecognized dims, using ellipses) is enough, or if a more thorough symbolic approach is warranted.
  • Possibly adopt the substitution idea so we truly unify the shape (samples, i, j) both numerically and symbolically.
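
For discussion, the substitution idea could be prototyped on a toy expression tree before touching the real classes. Everything below is a hypothetical sketch: `Var`, `Product`, `all_edges`, and `add_sample_edge` are illustrative stand-ins, not the library's Variable or Expectation types, and the collision check is deliberately the simplest possible one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str
    edges: tuple

@dataclass(frozen=True)
class Product:
    factors: tuple

def all_edges(expr):
    """Collect every edge name used anywhere in the expression."""
    if isinstance(expr, Var):
        return set(expr.edges)
    return set().union(*(all_edges(f) for f in expr.factors))

def _substitute(expr, target, replacement):
    # Replace every occurrence of `target`, rebuilding products on the way up.
    if expr == target:
        return replacement
    if isinstance(expr, Product):
        return Product(tuple(_substitute(f, target, replacement)
                             for f in expr.factors))
    return expr

def add_sample_edge(expr, target, sample_edge="samples"):
    """Swap `target` for a copy that carries an extra leading sample edge."""
    if sample_edge in all_edges(expr):
        # Edge collision: the chosen name is already used in the expression.
        raise ValueError(f"edge {sample_edge!r} already in use")
    sampled = Var(target.name + "_samples", (sample_edge, *target.edges))
    return _substitute(expr, target, sampled)

X = Var("X", ("i", "j"))
W = Var("W", ("j", "k"))
sampled_expr = add_sample_edge(Product((X, W)), X)
```

After the substitution the expression itself declares the "samples" edge, so rename and align steps would no longer see an unknown dim. A real implementation would still need to decide how the sample edge behaves when X appears more than once in the same expression.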

Comments, suggestions, and PRs are welcome to refine how we handle numeric evaluation of expectations with extra sample dimensions.

Metadata

Labels: enhancement (New feature or request)
