Skip to content

Shakeri-Lab/loop_modulus

Repository files navigation

Network Loop Modulus Calculation for Graph Analysis

This repository contains a Python implementation for calculating the Loop Modulus (specifically for p=2) of a network. The modulus quantifies the "richness" of cycles in a graph, providing a measure related to connectivity, robustness, and flow properties. The extremal density rho* derived from the modulus calculation can be used as edge weights to enhance downstream tasks like community detection.

This implementation utilizes an iterative algorithm based on convex optimization (Quadratic Programming - QP) and incorporates several performance enhancements:

  1. Refined Preprocessing: Identifies an initial set of potentially important cycles (currently focused on triangles) using heuristics based on edge centrality (degree product) and a greedy strategy balancing edge coverage and low pairwise overlap to "warm-start" the optimization.
  2. Efficient QP Solver: Uses the OSQP solver, known for its speed on large sparse problems relevant to this iterative constraint addition process.
  3. Warm Starting: Leverages OSQP's warm start capability (solver.warm_start()) to significantly speed up QP solves in subsequent iterations by initializing with the previous solution.
  4. Temporary Graph Pruning (Heuristic): Optionally prunes the graph temporarily for cycle searching, focusing the search near recently added constraints to potentially accelerate the bottleneck constraint-finding step (use_pruning_heuristic=True).
  5. Batch Constraint Addition: Adds multiple (cycles_to_add_per_iter) violating constraints per iteration to reduce the total number of outer loops and QP solves compared to adding only one.

Background: Loop Modulus (p=2)

The 2-Modulus of the family of all simple loops L in a graph G=(V, E) is defined via the optimization problem:

Primal Problem:

minimize   sum_{e in E} rho(e)^2
subject to sum_{e in gamma} rho(e) >= 1   for all simple loops gamma in L
           rho(e) >= 0                   for all edges e in E

where rho: E -> R+ is the density function on edges. The initial edge weights sigma(e) are assumed to be 1 in this implementation, so the energy minimized is sum(rho(e)^2). The vector rho represents the variable we solve for.

The optimal rho* represents the "importance" or expected usage of each edge within the context of the loop family, constrained by the loop length condition. The final modulus value Mod2(L) = sum_{e in E} rho*(e)^2 measures the overall richness of the loop structure satisfying the constraints.

Dual Problem & Probabilistic Interpretation: The dual problem involves finding Lagrange multipliers lambda(gamma) for each loop constraint. For p=2, this can be interpreted probabilistically: Mod2(L) is related to minimizing the expected overlap between two randomly chosen loops (E[C(gamma, gamma')]), where the probability distribution mu over loops is derived from the optimal Lagrange multipliers (lambda = nu * mu, nu is a scalar). Edges with high rho* are those with high expected usage (rho*(e) = Mod2(L) * E_mu[N(gamma, e)]) under this optimal, minimum-overlap distribution.

Algorithm Overview

The core algorithm follows the iterative approach described in the source papers (related to Algorithm 1):

  1. Initialization:

    • (Preprocessing) Generate candidate cycles (currently triangles via get_triangles). Score them based on edge centrality (degree product). Select an initial set (initial_cycles_list) using a greedy heuristic (calculate_loop_modulus_rho_preprocessed internal logic) that balances maximizing edge coverage and minimizing pairwise overlap among selected cycles, up to a target size (initial_set_target_size).
    • Build the initial constraint matrix N (sparse format) where rows correspond to the selected initial cycles (N[i, j] = 1 if edge j is in cycle i).
    • Solve the initial QP using OSQP (solver.setup and solver.solve) to get the starting density rho. Store the solution (x, y) for warm starting.
  2. Iteration: Repeat until convergence (max_iterations reached or no violated constraints found):

    • Determine Search Graph: If the pruning heuristic (use_pruning_heuristic) is enabled and the reset interval (pruning_reset_interval) hasn't been reached, use a pruned subgraph (current_graph_view) focused around nodes near the last added cycle(s); otherwise, use the full graph.
    • Find Violated Constraints: Using the current rho as edge weights (rho_map), search for simple cycles gamma where the p-length (sum_{e in gamma} rho(e)) is less than 1 - tolerance.
      • This implementation uses get_top_k_shortest_p_length_cycles which internally performs an APSP (All-Pairs Shortest Path) calculation on the current search graph weighted by rho. It finds up to cycles_to_add_per_iter unique cycles satisfying the violation condition.
    • Check Convergence: If no violated cycles are found, the algorithm has converged.
    • Select & Add Constraints: Add the found violated cycles to the master list (added_cycles). Update the list of constraint vectors (N_constraints_dynamic). Check for duplicates; stop if a duplicate is added.
    • Solve QP: Re-setup the OSQP solver with the updated constraint matrix (A, bounds l, u). Call solver.warm_start() with the stored primal (x) and potentially dual (y) solution from the previous iteration. Call solver.solve() to get the updated rho. Store the new solution for the next warm start. Update pruning state if applicable.
  3. Output: Return the final rho* map (final_rho_star_map), the list of all cycles added as constraints (added_cycles), the final modulus value (final_modulus), the total number of QP solves (qp_solves_count), and total runtime (total_time).

Implementation Details

  • Core Function: calculate_loop_modulus_rho_preprocessed(...) (Note: name includes "preprocessed" reflecting its standard operation now).
  • Dependencies: networkx, numpy, osqp, scipy (for sparse matrices). matplotlib is needed for plotting examples.
  • Cycle Finding:
    • Preprocessing uses get_triangles.
    • Iterative step uses get_top_k_shortest_p_length_cycles (APSP-based).
  • QP Solver: osqp via the osqp Python package. Warm start is used via solver.warm_start().
  • Pruning Heuristic: Optional feature controlled by use_pruning_heuristic and related parameters. Uses BFS to find nodes near recently added cycles.

Usage

import networkx as nx
# Assuming other necessary imports and helper functions (like cycle finders) are defined

# --- Create or load your graph ---
# G = nx.karate_club_graph()
# G = nx.read_graphml("my_graph.graphml")
# Example: Load Cholera data (requires geopandas, libpysal - see test script)
# from load_cholera import load_cholera_graph # Hypothetical helper
# G, positions = load_cholera_graph()

# --- Set Parameters ---
TOLERANCE = 1e-4
MAX_ITER_AFTER_PREP = 2000 # Max iterations *after* initial solve
INITIAL_SET_SIZE = 200 # Target #cycles from preprocessing (adjust based on graph size)
OVERLAP_ALPHA = 0.1    # Penalty for overlap in preprocessing heuristic
OSQP_ABS_TOL = 1e-5
OSQP_REL_TOL = 1e-5
OSQP_MAX_ITER = 15000  # Increase OSQP internal iterations if needed
CYCLES_PER_ITER = 5    # Add top 5 new violated cycles each outer iteration
USE_PRUNING = True     # Enable the temporary graph pruning heuristic
PRUNING_INTERVAL = 10  # Reset pruned graph every 10 iterations
PRUNING_DIST = 3       # Prune nodes > 3 hops away
PRINT_VERBOSE = False  # Reduce console output during iterations

# --- Run the Calculation ---
final_rho_star_map, edge_list, added_cycles, final_modulus, qp_solves_count, total_time = \
    calculate_loop_modulus_rho_preprocessed(
        graph=G,
        tolerance=TOLERANCE,
        max_iterations=MAX_ITER_AFTER_PREP,
        initial_set_target_size=INITIAL_SET_SIZE,
        overlap_penalty_alpha=OVERLAP_ALPHA,
        osqp_eps_abs=OSQP_ABS_TOL,
        osqp_eps_rel=OSQP_REL_TOL,
        osqp_max_iter=OSQP_MAX_ITER,
        cycles_to_add_per_iter=CYCLES_PER_ITER,
        use_pruning_heuristic=USE_PRUNING,
        pruning_reset_interval=PRUNING_INTERVAL,
        pruning_distance_threshold=PRUNING_DIST,
        # pruning_max_sources=20, # Can add this back if needed
        verbose_iterations=PRINT_VERBOSE
    )

# --- Analyze Results ---
print(f"\n=== Final Results ===")
print(f"QP Solves: {qp_solves_count}")
print(f"Total Time (s): {total_time:.2f}")
print(f"Final Modulus: {final_modulus:.6f}")
print(f"Total Constraints Added: {len(added_cycles)}")

# --- Optional: Create weighted graph ---
# G_weighted = nx.Graph()
# for edge, rho_val in final_rho_star_map.items():
#     if rho_val > 1e-6: G_weighted.add_edge(edge[0], edge[1], weight=rho_val)

# --- Optional: Plotting ---
# Needs positions dictionary and list of initial cycles from preprocessing (if desired)
# initial_cycles_used = added_cycles[:len(initial_cycles_list)] # Get initial cycles back
# plot_initial_cycles(G, positions, initial_cycles_used, ...)
# plot_final_rho(G, positions, final_rho_star_map, edge_list, ...)

Example Results (Cholera Dataset)

Running this improved algorithm (OSQP + Refined Preprocessing + Warm Start) on the Cholera dataset graph (~324 nodes, ~941 edges) yielded significant performance gains:

Method QP Solves Total Time (s) Final Modulus Total Constraints Notes
This Impl. (OSQP+Prep) 67 ~12.08 ~100.72 556 Latest optimized implementation
Prev. (CVXOPT+BasicPrep) ~308 ~475 ~97.6 ~518 From previous logs
Original (CVXOPT+BFS) ~475 ~700 ~99.1 ~475 From previous logs

Note: Exact times, modulus values, and QP solves may vary based on hardware, specific parameters (like initial_set_target_size, overlap_penalty_alpha), random elements in selection, and OSQP's internal convergence.

The results show a dramatic reduction in total runtime compared to previous versions, primarily due to the speed of OSQP and effective warm starting. The preprocessing helps establish a good starting point, reducing the number of subsequent iterations needed compared to starting "cold".

Cholera Dataset Visualization

Original Delaunay Graph

Delaunay Graph of Cholera Cases

Loop Modulus Results

Loop Modulus Results on Cholera Dataset

The visualization shows the initial cycle set used for preprocessing (left) and the final rho* weights calculated (right). Edges with higher rho* values (brighter color) represent more important connections in the graph with respect to cycle structure.

Running the Test Case

To run the cholera dataset test case:

# Install required dependencies
pip install -r requirements.txt
# Install geographic dependencies if needed
pip install libpysal geopandas contextily

# Run the test
python test_cholera.py

The test script will:

  1. Load the cholera dataset and create a Delaunay graph
  2. Plot the original graph
  3. Run the loop modulus calculation
  4. Visualize the initial cycles and final rho* weights
  5. Report performance metrics

Expected output:

=== Running Modulus Calculation with Preprocessing and heuristics===
--- Preprocessing: Finding initial cycles (Triangles Only) ---
Found 633 triangles.
Scored 633 valid.
Selecting initial set...
Target size: 235
Preprocessing finished (11.92s). Using 235 cycles.

--- Solving Initial QP with OSQP (235) ---
Initial QP (0.01s). Mod=78.333215

Starting Main Loop. Tolerance=0.0001, Max Iter=1882, Parallel Seeds=4, Cycles/Iter=5

Iter 1 (0.15s): Found 133 candidates (4 seeds). Adding 5 new.
  Best l_rho added: 0.333333
Solving OSQP (240)
QP solved (0.01s). Mod=79.121125

...

Finished Modulus Calculation main loop.
--- Final State Summary ---
 Converged. Last check found no new violated cycles or length >= tolerance.
 Final Modulus approx: 100.720820
 Last QP solve time: 0.01s
Constraints added (after initial): 321, Total constraints: 556
Final Modulus approx Mod2(L'): 100.720820

Potential Further Improvements

While the current implementation is significantly faster, further enhancements are possible:

  1. Active-Set QP Solver with Constraint Dropping:

    • Concept: Implement or use a QP solver library (e.g., qpOASES, Gurobi, MOSEK) that employs true active-set methods which can drop inactive constraints.
    • Benefit: Keeps the QP problem size smaller in later iterations, potentially leading to faster solves, especially if the final optimal active set is much smaller than the total number of constraints encountered.
  2. Adaptive Constraint Addition / Finding:

    • Concept: Adjust cycles_to_add_per_iter dynamically based on convergence progress. Explore faster, heuristic methods for get_top_k_shortest_p_length_cycles (e.g., sampling sources for Dijkstra) instead of full APSP, possibly accepting slightly more outer iterations for much faster constraint finding.
    • Benefit: Optimizes the trade-off between the cost of finding constraints and the number of QP solves needed.
  3. Refined Pruning Heuristic:

    • Concept: Improve the logic for when and how to prune the graph for cycle searching. Could pruning be based on rho values (pruning low-rho edges/regions) instead of just graph distance? Tune parameters like pruning_reset_interval more carefully.
    • Benefit: Make the cycle search even faster without significantly impacting the quality of the found violated constraints.
  4. C/C++ Implementation:

    • Concept: Porting the QP interface and potentially cycle searching to C/C++ for maximum speed.
    • Benefit: Raw performance gain, especially for very large graphs. High development effort.

Conclusion

This implementation provides an efficient method for calculating the loop modulus and associated edge weights (rho*). The combination of targeted preprocessing (using triangles and a coverage/overlap heuristic), an efficient QP solver (OSQP) with warm starting, batch constraint addition, and optional graph pruning offers substantial performance improvements over simpler approaches, making the analysis feasible for moderately sized graphs. Further speedups are possible by exploring true active-set QP methods and more adaptive algorithm strategies.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published