This repository contains a Python implementation for calculating the Loop Modulus (specifically for p=2) of a network. The modulus quantifies the "richness" of cycles in a graph, providing a measure related to connectivity, robustness, and flow properties. The extremal density rho*
derived from the modulus calculation can be used as edge weights to enhance downstream tasks like community detection.
This implementation utilizes an iterative algorithm based on convex optimization (Quadratic Programming - QP) and incorporates several performance enhancements:
- Refined Preprocessing: Identifies an initial set of potentially important cycles (currently focused on triangles) using heuristics based on edge centrality (degree product) and a greedy strategy balancing edge coverage and low pairwise overlap to "warm-start" the optimization.
- Efficient QP Solver: Uses the OSQP solver, known for its speed on large sparse problems relevant to this iterative constraint addition process.
- Warm Starting: Leverages OSQP's warm start capability (
solver.warm_start()
) to significantly speed up QP solves in subsequent iterations by initializing with the previous solution. - Temporary Graph Pruning (Heuristic): Optionally prunes the graph temporarily for cycle searching, focusing the search near recently added constraints to potentially accelerate the bottleneck constraint-finding step (
use_pruning_heuristic=True
). - Batch Constraint Addition: Adds multiple (
cycles_to_add_per_iter
) violating constraints per iteration to reduce the total number of outer loops and QP solves compared to adding only one.
The 2-Modulus of the family of all simple loops L
in a graph G=(V, E)
is defined via the optimization problem:
Primal Problem:
minimize sum_{e in E} rho(e)^2
subject to sum_{e in gamma} rho(e) >= 1 for all simple loops gamma in L
rho(e) >= 0 for all edges e in E
where rho: E -> R+
is the density function on edges. The initial edge weights sigma(e)
are assumed to be 1 in this implementation, so the energy minimized is sum(rho(e)^2)
. The vector rho
represents the variable we solve for.
The optimal rho*
represents the "importance" or expected usage of each edge within the context of the loop family, constrained by the loop length condition. The final modulus value Mod2(L) = sum_{e in E} rho*(e)^2
measures the overall richness of the loop structure satisfying the constraints.
Dual Problem & Probabilistic Interpretation:
The dual problem involves finding Lagrange multipliers lambda(gamma)
for each loop constraint. For p=2, this can be interpreted probabilistically: Mod2(L)
is related to minimizing the expected overlap between two randomly chosen loops (E[C(gamma, gamma')]
), where the probability distribution mu
over loops is derived from the optimal Lagrange multipliers (lambda = nu * mu
, nu
is a scalar). Edges with high rho*
are those with high expected usage (rho*(e) = Mod2(L) * E_mu[N(gamma, e)]
) under this optimal, minimum-overlap distribution.
The core algorithm follows the iterative approach described in the source papers (related to Algorithm 1):
-
Initialization:
- (Preprocessing) Generate candidate cycles (currently triangles via
get_triangles
). Score them based on edge centrality (degree product). Select an initial set (initial_cycles_list
) using a greedy heuristic (calculate_loop_modulus_rho_preprocessed
internal logic) that balances maximizing edge coverage and minimizing pairwise overlap among selected cycles, up to a target size (initial_set_target_size
). - Build the initial constraint matrix
N
(sparse format) where rows correspond to the selected initial cycles (N[i, j] = 1
if edgej
is in cyclei
). - Solve the initial QP using OSQP (
solver.setup
andsolver.solve
) to get the starting densityrho
. Store the solution (x
,y
) for warm starting.
- (Preprocessing) Generate candidate cycles (currently triangles via
-
Iteration: Repeat until convergence (
max_iterations
reached or no violated constraints found):- Determine Search Graph: If the pruning heuristic (
use_pruning_heuristic
) is enabled and the reset interval (pruning_reset_interval
) hasn't been reached, use a pruned subgraph (current_graph_view
) focused around nodes near the last added cycle(s); otherwise, use the full graph. - Find Violated Constraints: Using the current
rho
as edge weights (rho_map
), search for simple cyclesgamma
where the p-length (sum_{e in gamma} rho(e)
) is less than1 - tolerance
.- This implementation uses
get_top_k_shortest_p_length_cycles
which internally performs an APSP (All-Pairs Shortest Path) calculation on the current search graph weighted byrho
. It finds up tocycles_to_add_per_iter
unique cycles satisfying the violation condition.
- This implementation uses
- Check Convergence: If no violated cycles are found, the algorithm has converged.
- Select & Add Constraints: Add the found violated cycles to the master list (
added_cycles
). Update the list of constraint vectors (N_constraints_dynamic
). Check for duplicates; stop if a duplicate is added. - Solve QP: Re-
setup
the OSQP solver with the updated constraint matrix (A
, boundsl
,u
). Callsolver.warm_start()
with the stored primal (x
) and potentially dual (y
) solution from the previous iteration. Callsolver.solve()
to get the updatedrho
. Store the new solution for the next warm start. Update pruning state if applicable.
- Determine Search Graph: If the pruning heuristic (
-
Output: Return the final
rho*
map (final_rho_star_map
), the list of all cycles added as constraints (added_cycles
), the final modulus value (final_modulus
), the total number of QP solves (qp_solves_count
), and total runtime (total_time
).
- Core Function:
calculate_loop_modulus_rho_preprocessed(...)
(Note: name includes "preprocessed" reflecting its standard operation now). - Dependencies:
networkx
,numpy
,osqp
,scipy
(for sparse matrices).matplotlib
is needed for plotting examples. - Cycle Finding:
- Preprocessing uses
get_triangles
. - Iterative step uses
get_top_k_shortest_p_length_cycles
(APSP-based).
- Preprocessing uses
- QP Solver:
osqp
via theosqp
Python package. Warm start is used viasolver.warm_start()
. - Pruning Heuristic: Optional feature controlled by
use_pruning_heuristic
and related parameters. Uses BFS to find nodes near recently added cycles.
import networkx as nx
# Assuming other necessary imports and helper functions (like cycle finders) are defined
# --- Create or load your graph ---
# G = nx.karate_club_graph()
# G = nx.read_graphml("my_graph.graphml")
# Example: Load Cholera data (requires geopandas, libpysal - see test script)
# from load_cholera import load_cholera_graph # Hypothetical helper
# G, positions = load_cholera_graph()
# --- Set Parameters ---
TOLERANCE = 1e-4
MAX_ITER_AFTER_PREP = 2000 # Max iterations *after* initial solve
INITIAL_SET_SIZE = 200 # Target #cycles from preprocessing (adjust based on graph size)
OVERLAP_ALPHA = 0.1 # Penalty for overlap in preprocessing heuristic
OSQP_ABS_TOL = 1e-5
OSQP_REL_TOL = 1e-5
OSQP_MAX_ITER = 15000 # Increase OSQP internal iterations if needed
CYCLES_PER_ITER = 5 # Add top 5 new violated cycles each outer iteration
USE_PRUNING = True # Enable the temporary graph pruning heuristic
PRUNING_INTERVAL = 10 # Reset pruned graph every 10 iterations
PRUNING_DIST = 3 # Prune nodes > 3 hops away
PRINT_VERBOSE = False # Reduce console output during iterations
# --- Run the Calculation ---
final_rho_star_map, edge_list, added_cycles, final_modulus, qp_solves_count, total_time = \
calculate_loop_modulus_rho_preprocessed(
graph=G,
tolerance=TOLERANCE,
max_iterations=MAX_ITER_AFTER_PREP,
initial_set_target_size=INITIAL_SET_SIZE,
overlap_penalty_alpha=OVERLAP_ALPHA,
osqp_eps_abs=OSQP_ABS_TOL,
osqp_eps_rel=OSQP_REL_TOL,
osqp_max_iter=OSQP_MAX_ITER,
cycles_to_add_per_iter=CYCLES_PER_ITER,
use_pruning_heuristic=USE_PRUNING,
pruning_reset_interval=PRUNING_INTERVAL,
pruning_distance_threshold=PRUNING_DIST,
# pruning_max_sources=20, # Can add this back if needed
verbose_iterations=PRINT_VERBOSE
)
# --- Analyze Results ---
print(f"\n=== Final Results ===")
print(f"QP Solves: {qp_solves_count}")
print(f"Total Time (s): {total_time:.2f}")
print(f"Final Modulus: {final_modulus:.6f}")
print(f"Total Constraints Added: {len(added_cycles)}")
# --- Optional: Create weighted graph ---
# G_weighted = nx.Graph()
# for edge, rho_val in final_rho_star_map.items():
# if rho_val > 1e-6: G_weighted.add_edge(edge[0], edge[1], weight=rho_val)
# --- Optional: Plotting ---
# Needs positions dictionary and list of initial cycles from preprocessing (if desired)
# initial_cycles_used = added_cycles[:len(initial_cycles_list)] # Get initial cycles back
# plot_initial_cycles(G, positions, initial_cycles_used, ...)
# plot_final_rho(G, positions, final_rho_star_map, edge_list, ...)
Running this improved algorithm (OSQP + Refined Preprocessing + Warm Start) on the Cholera dataset graph (~324 nodes, ~941 edges) yielded significant performance gains:
Method | QP Solves | Total Time (s) | Final Modulus | Total Constraints | Notes |
---|---|---|---|---|---|
This Impl. (OSQP+Prep) | 67 | ~12.08 | ~100.72 | 556 | Latest optimized implementation |
Prev. (CVXOPT+BasicPrep) | ~308 | ~475 | ~97.6 | ~518 | From previous logs |
Original (CVXOPT+BFS) | ~475 | ~700 | ~99.1 | ~475 | From previous logs |
Note: Exact times, modulus values, and QP solves may vary based on hardware, specific parameters (like initial_set_target_size
, overlap_penalty_alpha
), random elements in selection, and OSQP's internal convergence.
The results show a dramatic reduction in total runtime compared to previous versions, primarily due to the speed of OSQP and effective warm starting. The preprocessing helps establish a good starting point, reducing the number of subsequent iterations needed compared to starting "cold".
The visualization shows the initial cycle set used for preprocessing (left) and the final rho* weights calculated (right). Edges with higher rho* values (brighter color) represent more important connections in the graph with respect to cycle structure.
To run the cholera dataset test case:
# Install required dependencies
pip install -r requirements.txt
# Install geographic dependencies if needed
pip install libpysal geopandas contextily
# Run the test
python test_cholera.py
The test script will:
- Load the cholera dataset and create a Delaunay graph
- Plot the original graph
- Run the loop modulus calculation
- Visualize the initial cycles and final rho* weights
- Report performance metrics
Expected output:
=== Running Modulus Calculation with Preprocessing and heuristics===
--- Preprocessing: Finding initial cycles (Triangles Only) ---
Found 633 triangles.
Scored 633 valid.
Selecting initial set...
Target size: 235
Preprocessing finished (11.92s). Using 235 cycles.
--- Solving Initial QP with OSQP (235) ---
Initial QP (0.01s). Mod=78.333215
Starting Main Loop. Tolerance=0.0001, Max Iter=1882, Parallel Seeds=4, Cycles/Iter=5
Iter 1 (0.15s): Found 133 candidates (4 seeds). Adding 5 new.
Best l_rho added: 0.333333
Solving OSQP (240)
QP solved (0.01s). Mod=79.121125
...
Finished Modulus Calculation main loop.
--- Final State Summary ---
Converged. Last check found no new violated cycles or length >= tolerance.
Final Modulus approx: 100.720820
Last QP solve time: 0.01s
Constraints added (after initial): 321, Total constraints: 556
Final Modulus approx Mod2(L'): 100.720820
While the current implementation is significantly faster, further enhancements are possible:
-
Active-Set QP Solver with Constraint Dropping:
- Concept: Implement or use a QP solver library (e.g., qpOASES, Gurobi, MOSEK) that employs true active-set methods which can drop inactive constraints.
- Benefit: Keeps the QP problem size smaller in later iterations, potentially leading to faster solves, especially if the final optimal active set is much smaller than the total number of constraints encountered.
-
Adaptive Constraint Addition / Finding:
- Concept: Adjust
cycles_to_add_per_iter
dynamically based on convergence progress. Explore faster, heuristic methods forget_top_k_shortest_p_length_cycles
(e.g., sampling sources for Dijkstra) instead of full APSP, possibly accepting slightly more outer iterations for much faster constraint finding. - Benefit: Optimizes the trade-off between the cost of finding constraints and the number of QP solves needed.
- Concept: Adjust
-
Refined Pruning Heuristic:
- Concept: Improve the logic for when and how to prune the graph for cycle searching. Could pruning be based on
rho
values (pruning low-rho
edges/regions) instead of just graph distance? Tune parameters likepruning_reset_interval
more carefully. - Benefit: Make the cycle search even faster without significantly impacting the quality of the found violated constraints.
- Concept: Improve the logic for when and how to prune the graph for cycle searching. Could pruning be based on
-
C/C++ Implementation:
- Concept: Porting the QP interface and potentially cycle searching to C/C++ for maximum speed.
- Benefit: Raw performance gain, especially for very large graphs. High development effort.
This implementation provides an efficient method for calculating the loop modulus and associated edge weights (rho*
). The combination of targeted preprocessing (using triangles and a coverage/overlap heuristic), an efficient QP solver (OSQP) with warm starting, batch constraint addition, and optional graph pruning offers substantial performance improvements over simpler approaches, making the analysis feasible for moderately sized graphs. Further speedups are possible by exploring true active-set QP methods and more adaptive algorithm strategies.