Describe the bug
We have recorded a high memory footprint (30 GB) on cloudbeat in cases where the GCP account under scan has a vast number of assets.
Currently, the GCP (and Azure) fetchers read all pages of a paginated response into memory before parsing the assets and sending them to the channel for further processing (see the sketch below).
As a result, the memory footprint can scale linearly with the number of scanned GCP assets: when the account/project contains a vast number of resources, RAM usage peaks accordingly.
A pprof run also shows a large memory footprint in the elastic event encoder, but it has not been verified whether it scales linearly with the number of resources or is capped.
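For illustration, here is a minimal sketch of the collect-everything-first pattern described above, written against the Cloud Asset Inventory Go client. It is an assumption about the general shape of the fetcher code, not a copy of cloudbeat's implementation:

```go
package fetchersketch

import (
	"context"

	asset "cloud.google.com/go/asset/apiv1"
	"cloud.google.com/go/asset/apiv1/assetpb"
	"google.golang.org/api/iterator"
)

// fetchAllAssets illustrates the current pattern: every page of the paginated
// response is accumulated in memory before anything is handed downstream.
// Illustrative only, not cloudbeat's actual fetcher code.
func fetchAllAssets(ctx context.Context, client *asset.Client, parent string) ([]*assetpb.Asset, error) {
	it := client.ListAssets(ctx, &assetpb.ListAssetsRequest{Parent: parent})

	var all []*assetpb.Asset // grows with every page; RAM scales with the total asset count
	for {
		a, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			return nil, err
		}
		all = append(all, a)
	}
	// Only after the last page has been read does the caller get (and send) the data.
	return all, nil
}
```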
Preconditions
A GCP project/account with a large number of resources.
Additional context
We should refactor the GCP fetchers to stream each page as it is returned by the GCP SDK, rather than waiting to collect all pages before sending the resources onward (see the streaming sketch at the end of this section).
We should check whether the rest of the pipeline needs any changes to support this stream-based approach (e.g. does the rest of the mechanism expect all resources grouped by type, or can it handle multiple batches per type?).
This can be done step by step (fetcher by fetcher), testing each one accordingly.
To test the improvement we need either a GCP account with a large number of resources or a mock of the GCP part.
Note: the same pattern is used for Azure pagination. We can investigate Azure in a later step based on the findings of the GCP refactor.
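As a rough illustration of the stream-based approach, here is a hedged sketch of what a streaming fetcher could look like. The function, channel, and batchSize names are hypothetical, not existing cloudbeat identifiers, and the batch size is an assumption:

```go
package fetchersketch

import (
	"context"

	asset "cloud.google.com/go/asset/apiv1"
	"cloud.google.com/go/asset/apiv1/assetpb"
	"google.golang.org/api/iterator"
)

// streamAssets emits assets in bounded batches as pages arrive from the SDK,
// instead of accumulating the whole result set in memory first.
func streamAssets(ctx context.Context, client *asset.Client, parent string, out chan<- []*assetpb.Asset) error {
	defer close(out)
	it := client.ListAssets(ctx, &assetpb.ListAssetsRequest{Parent: parent})

	const batchSize = 100 // roughly one SDK page; only this much is held at a time
	batch := make([]*assetpb.Asset, 0, batchSize)
	for {
		a, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			return err
		}
		batch = append(batch, a)
		if len(batch) == batchSize {
			select {
			case out <- batch: // hand a bounded batch downstream as soon as it is full
			case <-ctx.Done():
				return ctx.Err()
			}
			batch = make([]*assetpb.Asset, 0, batchSize)
		}
	}
	if len(batch) > 0 {
		out <- batch // flush the final partial batch
	}
	return nil
}
```

With this shape, the memory held by the fetcher stays proportional to batchSize rather than to the total number of assets; the trade-off is that the downstream consumer must accept multiple batches per asset type, which is exactly the implication flagged above.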