Problem with Aggregation of CPU, maybe even Utilization in general. #44

Open
stiesssh opened this issue Jul 3, 2024 · 0 comments

Current Behaviour

  • Probe: for a given point in time, the CPU utilization probe emits the value $1$ if jobs are being processed or are waiting, and $0$ otherwise.
  • Aggregation: Our aggregation aggregates all values while disregarding their duration. This is perfectly fine for operation response time and other probes whose values apply to a single point in time only. For utilization, however, we must also consider the duration of each value. We do in fact consider the duration (that is what the timefactor is for), but the result is still broken: when aggregating, we do not (and at this point cannot) differentiate which resource emitted a value. As a result, a resource utilization like the one on the left would yield a very high average utilization, and one like the one on the right would yield a very low utilization, whereas both should actually be around 50%.

Image

Extreme cases as in the examples are unlikely, but I did in fact have a simulation run with two replicas of a resource where, for some seconds, only one of them received requests while the other was idle.
With the current EMA calculation, the idle time was disregarded, because the $1$ reported by the other resource "ended" the idle time of the first resource.
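To make the effect concrete, here is a minimal sketch of the scenario. It is not the actual EMA/timefactor code: `time_weighted_mean` is a simplified stand-in I made up for illustration, and the timestamps are hypothetical. It only assumes that each emitted value holds until the next event in whatever stream gets aggregated.

```python
from typing import List, Tuple

Event = Tuple[float, int]  # (timestamp in seconds, emitted value 0 or 1)


def time_weighted_mean(events: List[Event], end: float) -> float:
    """Mean of a step signal: each value holds until the next event (or `end`)."""
    events = sorted(events)
    total = 0.0
    for (t, value), (t_next, _) in zip(events, events[1:] + [(end, 0)]):
        total += value * (t_next - t)
    return total / (end - events[0][0])


# Hypothetical 10 s window: replica A is idle the whole time,
# replica B becomes busy at t = 0.1 s and stays busy.
replica_a = [(0.0, 0)]
replica_b = [(0.0, 0), (0.1, 1)]

# Merged stream, as seen by an aggregator that cannot tell the resources apart:
# the 1 from replica B "ends" the idle interval of replica A after 0.1 s.
merged = time_weighted_mean(replica_a + replica_b, end=10.0)

# Per-replica values, for comparison.
util_a = time_weighted_mean(replica_a, end=10.0)
util_b = time_weighted_mean(replica_b, end=10.0)
expected = (util_a + util_b) / 2
```

In this sketch the merged stream comes out close to 1, while the mean of the two per-replica values is close to 0.5.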

Desired Behaviour

Correct Aggregation results for all metrics.

Possible Solutions

Hacky fix (I tried it, it works)

See the fix-Utilisation branches (still local) in the SPD-Interpreter and Monitoring repositories.

  • Idea: The probe already considers the duration when emitting utilisation values $\rightarrow$ the emitted values for utilisation are now "utilisation between two successive events". The aggregator must be adapted accordingly.
  • Problems:
    • This works with average, and probably also with sum, but will probably break with other aggregations.
    • The emitted measurement values are not consistent with Palladio's usual understanding of Utilization.
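The idea can be sketched roughly as follows. This is not the code from the fix-Utilisation branches: `IntervalUtilisationProbe` and `duration_weighted_average` are hypothetical names, and the sketch assumes the probe remembers the last state per resource and emits a (duration, utilisation) pair whenever that resource reports again.

```python
from typing import Dict, List, Tuple

Measurement = Tuple[float, float]  # (duration in seconds, utilisation during it)


class IntervalUtilisationProbe:
    """Hypothetical probe: emits the state that held since the resource's previous event."""

    def __init__(self) -> None:
        self._last: Dict[str, Tuple[float, int]] = {}  # resource -> (timestamp, state)

    def on_event(self, resource: str, timestamp: float, state: int) -> List[Measurement]:
        emitted: List[Measurement] = []
        if resource in self._last:
            last_time, last_state = self._last[resource]
            emitted.append((timestamp - last_time, float(last_state)))
        self._last[resource] = (timestamp, state)
        return emitted


def duration_weighted_average(measurements: List[Measurement]) -> float:
    """Average that weights each utilisation value by its duration."""
    total_duration = sum(d for d, _ in measurements)
    if total_duration == 0:
        raise ValueError("no measured duration")
    return sum(d * u for d, u in measurements) / total_duration


# Hypothetical events: A idle for 10 s, B busy from 0.1 s until 10 s.
probe = IntervalUtilisationProbe()
measurements: List[Measurement] = []
for resource, timestamp, state in [
    ("A", 0.0, 0), ("B", 0.0, 0), ("B", 0.1, 1), ("A", 10.0, 0), ("B", 10.0, 1),
]:
    measurements += probe.on_event(resource, timestamp, state)

average = duration_weighted_average(measurements)
```

Because every measurement carries its own duration, the aggregator no longer needs to know which resource emitted it, at least for the average. Aggregations such as min, max or median over these pairs are where I would expect it to break.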

Better Solution

  • Enforce Aggregation for all Utilization Monitors (as SimuLizar does)
    • $\rightarrow$ requires check for defined Monitors
  • Aggregate across multiple CPUs by aggregating the values that are already aggregated-per-resource
    • $\rightarrow$ requires filter chain to listen to MeasurementUpdated instead of MeasurementMade
    • $\rightarrow$ first make sure that aggregation-per-resource is correct (should be though, because it relies on aggregator classes from org.palladiosimulator.monitorrepository.statisticalcharacterization)
  • Problems:
    • The windows of the per-CPU aggregation must share the same start and end points in time, or else it gets weird.
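A rough sketch of the two-stage aggregation, assuming aligned windows. The function names are made up for illustration and do not correspond to the aggregator classes in org.palladiosimulator.monitorrepository.statisticalcharacterization. The sketch also assumes a resource counts as idle before its first event.

```python
from typing import Dict, List, Tuple

Event = Tuple[float, int]  # (timestamp in seconds, state 0 or 1)


def window_utilisation(events: List[Event], start: float, end: float) -> float:
    """Stage 1: time-weighted utilisation of one resource over [start, end]."""
    events = sorted(events)
    busy = 0.0
    for i, (t, state) in enumerate(events):
        t_next = events[i + 1][0] if i + 1 < len(events) else end
        # Clip the interval during which `state` holds to the window.
        lo, hi = max(t, start), min(t_next, end)
        if hi > lo:
            busy += state * (hi - lo)
    return busy / (end - start)


def aggregate_across_resources(
    per_resource: Dict[str, List[Event]], start: float, end: float
) -> float:
    """Stage 2: mean of the per-resource values, all taken over the same window."""
    values = [window_utilisation(ev, start, end) for ev in per_resource.values()]
    return sum(values) / len(values)


# Hypothetical example: CPU A is busy in the first half, CPU B in the second half.
cpus = {
    "cpu_a": [(0.0, 1), (5.0, 0)],
    "cpu_b": [(0.0, 0), (5.0, 1)],
}
overall = aggregate_across_resources(cpus, start=0.0, end=10.0)
```

Passing one shared `start` and `end` to every per-resource call is what keeps the windows aligned here. If each resource used its own window, the second stage would average values that describe different time spans.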