
Slow performance instantiating large problems #749

Open
3 of 4 tasks
MBradbury opened this issue May 3, 2024 · 2 comments

@MBradbury

Details for the issue

What did you do?

Creating large problems in PuLP is slow; there are two issues:

  1. Creating expressions via pulp.lpSum is slower than directly creating them via pulp.LpAffineExpression
  2. Making constraints requires allocating and filling two additional copies of an LpAffineExpression
import pulp, numpy as np, itertools

# Some reward matrix
reward = np.full((300, 200, 50), 10)

prob = pulp.LpProblem("Problem", pulp.constants.LpMaximize)
dv = pulp.LpVariable.dicts("dv",
       itertools.product(range(300), range(200), range(50)),
       cat=pulp.constants.LpBinary)

Slow objective function:

prob += pulp.lpSum([
            dv[a, b, c] * reward[a, b, c]
            for a in range(300)
            for b in range(200)
            for c in range(50)
        ])

Faster objective function:

prob += pulp.LpAffineExpression(
            (dv[a, b, c], reward[a, b, c])
            for a in range(300)
            for b in range(200)
            for c in range(50)
        )
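A toy model may make the difference clearer (these are simplified stand-in classes, not pulp's actual implementation): summing per-term products forces one small expression object to be allocated per term, while the tuple constructor fills a single coefficient mapping in one pass.

```python
class ToyExpr:
    """Toy stand-in for LpAffineExpression: a variable-name -> coefficient dict."""
    def __init__(self, terms=()):
        self.coeffs = dict(terms)

    def __mul__(self, k):
        # Each product allocates a fresh single-term expression;
        # this is the per-term overhead the lpSum-of-products form pays.
        return ToyExpr((v, c * k) for v, c in self.coeffs.items())

    def __add__(self, other):
        # Merging also copies the accumulated terms on every addition.
        merged = dict(self.coeffs)
        for v, c in other.coeffs.items():
            merged[v] = merged.get(v, 0) + c
        return ToyExpr(merged.items())

variables = {i: ToyExpr([(f"x{i}", 1)]) for i in range(5)}
rewards = {i: 10 for i in range(5)}

# Slow pattern: one ToyExpr per term, plus a merge per addition.
slow = ToyExpr()
for i in range(5):
    slow = slow + variables[i] * rewards[i]

# Fast pattern: a single dict filled directly from (name, coefficient) pairs.
fast = ToyExpr((f"x{i}", rewards[i]) for i in range(5))

assert slow.coeffs == fast.coeffs
```

The results are identical; only the number of intermediate allocations differs.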

Slow constraint:

for c in range(50):
  prob += pulp.lpSum([
              dv[a, b, c]
              for a in range(300)
              for b in range(200)
          ]) == 1

Faster constraint:

for c in range(50):
  prob += pulp.LpAffineExpression(
              (dv[a, b, c], 1)
              for a in range(300)
              for b in range(200)
          ) == 1

Making constraints is slow because self - other creates the first LpAffineExpression copy at https://github.com/coin-or/pulp/blob/master/pulp/pulp.py#L980, https://github.com/coin-or/pulp/blob/master/pulp/pulp.py#L983 or https://github.com/coin-or/pulp/blob/master/pulp/pulp.py#L986. A second copy is then made when LpConstraint's constructor is called (https://github.com/coin-or/pulp/blob/master/pulp/pulp.py#L1012).

It seems as if one of these copies can be avoided if the rhs is set when other is int/float. So in LpAffineExpression.__eq__ this would be:

def __eq__(self, other):
    if isinstance(other, (float, int)):
        return LpConstraint(self, const.LpConstraintEQ, rhs=other)
    else:
        return LpConstraint(self - other, const.LpConstraintEQ)

Avoiding the second copy of LpAffineExpression would require that LpConstraint does not inherit from LpAffineExpression, which is a more significant change.
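A sketch of what that more significant change could look like (toy classes under assumed names, not pulp's API): if the constraint holds a reference to the expression via composition rather than inheriting from it, the constructor no longer needs to copy the expression's terms at all.

```python
class ToyExpr:
    """Toy stand-in for LpAffineExpression."""
    def __init__(self, terms=()):
        self.coeffs = dict(terms)

class ToyConstraint:
    """Composition instead of inheritance: the expression is held by
    reference, so constructing a constraint copies no terms."""
    def __init__(self, expr, sense, rhs=0):
        self.expr = expr      # no copy: the caller's object is reused
        self.sense = sense    # e.g. "==", "<=", ">="
        self.rhs = rhs

expr = ToyExpr([("x", 1), ("y", 2)])
con = ToyConstraint(expr, "==", rhs=1)

# The constraint refers to the very same expression object.
assert con.expr is expr
```

The trade-off is that any code relying on LpConstraint *being* an LpAffineExpression (isinstance checks, inherited methods) would need to change, which is why this is the larger refactor.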

Some performance numbers using the attached script:

Slow Objective: 16.692 seconds
Slow Constraint: 22.000 seconds
Fast Objective: 2.945 seconds
Fast Constraint: 20.278 seconds
Faster Constraint: 15.407 seconds
  • Happy to send a PR with the change to use rhs if this looks correct.
  • Unsure about the right way forward with lpSum, as LpAffineExpressions are constructed during the internal multiplication; ideally something more lightweight (e.g., a tuple) would be used as the result of the multiplication.
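The "lightweight tuple" idea can be sketched as a hypothetical helper (sum_terms is an invented name, not a pulp function) that accumulates (variable, coefficient) pairs into one coefficient mapping in a single pass, so no intermediate expression object exists per term:

```python
def sum_terms(pairs):
    """Hypothetical helper: accumulate (name, coefficient) pairs into one
    coefficient dict, merging duplicate variables, with no per-term
    expression objects."""
    coeffs = {}
    for name, coeff in pairs:
        coeffs[name] = coeffs.get(name, 0) + coeff
    return coeffs

# Duplicate variables are merged, matching affine-expression semantics.
terms = [("x", 2), ("y", 3), ("x", 1)]
assert sum_terms(terms) == {"x": 3, "y": 3}
```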

Useful extra information

The info below often helps, please fill it out if you're able to. :)

What operating system are you using?

  • Windows: Microsoft Windows [Version 10.0.19045.4291]

I'm using python version: Python 3.10.12

I installed PuLP via:

  • pypi (python -m pip install pulp)

Did you also

@pchtsp
Collaborator

pchtsp commented May 3, 2024 via email

@MBradbury
Author

Some initial work here: https://github.com/MBradbury/pulp/tree/perf

My test file now shows these results:

Slow Objective: 16.397 seconds
Slow Constraint: 7.267 seconds
Fast Objective: 2.675 seconds
Fast Constraint: 6.413 seconds

Faster Constraint is no longer present as the change has been incorporated.

No change in performance of building objectives, improvement for constraints. There are still some performance wins when using tuples over lpSum, but constraints are faster now as they avoid copying LpAffineExpression content and just use the object that was provided to them.

The tests I am able to run are fine:

Ran 1327 tests in 40.831s

OK (skipped=1264)

fgvieira added a commit to snakemake/snakemake that referenced this issue Oct 20, 2024
I am running a workflow with ~700k jobs and, at each given time, there
are around 230k jobs ready to be run. The initial building of the DAG is
quite slow (~2h, but I'll leave that for another PR 😄), but the
main issue is that the scheduler takes a lot of time deciding the next
jobs to be submitted.

In my case, all jobs are quite fast and similar in terms of resources,
so the cluster is idle most of the time. The greedy scheduler is
considerably faster, but still too slow.

The ILP should switch to the greedy after 10s, but it sometimes ignores
the timeout (coin-or/Cbc#487) and it has been reported to be quite slow
instantiating large problems (coin-or/pulp#749). In my case, the ILP
runs for 60s (the pulp file is 100 MB) before switching to greedy. Apart
from that, and especially on slow file systems, the scheduler can still
be quite slow checking all temp and input files.

Here, I propose sampling ready jobs, so that only a subset of jobs
(instead of all ready jobs) are evaluated by the scheduler. In my tests,
this greatly reduces the scheduler time:

|  | ILP | greedy |
|---|---|---|
| Native |15 - 20 mins |30s - 1 min  |
| Sampling 1000 jobs |  | 1 - 2 s |
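The sampling idea boils down to handing the scheduler a random subset of the ready jobs each round instead of all of them. A minimal sketch of the mechanism (hypothetical function name and parameters, not Snakemake's actual code):

```python
import random

def subsample_ready_jobs(ready_jobs, limit=1000, seed=None):
    """Return at most `limit` ready jobs for this scheduling round;
    the remaining jobs simply stay ready for the next round."""
    ready_jobs = list(ready_jobs)
    if len(ready_jobs) <= limit:
        return ready_jobs
    rng = random.Random(seed)
    return rng.sample(ready_jobs, limit)

# With 230k ready jobs, the scheduler's problem size drops to `limit`.
jobs = list(range(5000))
picked = subsample_ready_jobs(jobs, limit=1000, seed=0)
assert len(picked) == 1000
```

Because jobs not picked in one round remain eligible in later rounds, correctness is unaffected; only the per-round problem size (and thus scheduling latency) shrinks.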



### QC

* [x] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [ ] The documentation (`docs/`) is updated to reflect the changes, or
this is not necessary (e.g. if the change neither modifies the language
nor the behavior or functionalities of Snakemake).


## Summary by CodeRabbit

- **New Features**
- Introduced a new argument `--scheduler-subsample` to optimize job
scheduling by limiting the number of jobs considered for execution.
- Added a method for inferring resource requirements, enhancing user
experience with better error handling.
- Updated settings to include a new attribute for job subsampling,
improving scheduling flexibility.

- **Bug Fixes**
- Improved error handling and logging for resource evaluation and
parsing, providing clearer guidance for users.
- Enhanced job selection process with a subsampling mechanism to
optimize scheduling efficiency.

- **Refactor**
- Enhanced structure and organization of job scheduling logic for better
integration with existing functionality.

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>