Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib: Fix to optimize the time taken while batching huge configs #17672

Merged
merged 1 commit into from
Dec 20, 2024

Conversation

raja-rajasekar
Copy link
Contributor

Issue: When the incoming config has say 30K entries of a prefix-lists, current implementation is to schedule the configs to be batched and only after batching the entire config, the processing of the configs take place. As part of batching this config, we perform string concatenation to save all the configs in the buffer which over time results in taking longer time.

Ex: Imagine each line of config is 50 chars. With a delimiter of ‘- ‘ we end up adding 52 chars to buffer for each command i.e. 52*30000 = 156K of chars. Strlcat is an expensive operation and every time we strlcat, we have to traverse at end of string to append new char.
Because of this, we end up adding extra 6-8 secs for accepting the config.

Fix: The idea here is to bring back something similar to the backoff count implemented as part of 20e9a40 (lib: introduce configuration back-off timer for YANG-modeled commands).

Essentially we keep a cap of 5000 per batch. So once 5000k config commands are batched, we process them, clear the buffer, set the count to 0 and then continue processing the rest of the config.

option1 file has 30K entries of prefix-list
Without Fix:
root@mlx-3700-20:mgmt:/var/log/raja/frr# time sudo vtysh -f option1 ..............
Waiting for children to finish applying config...
[25191|staticd] done
[25189|watchfrr] done
[25178|ospfd] done
[25190|pbrd] done
[25181|bgpd] done
[25175|zebra] done

real 0m20.123s
user 0m9.384s
sys 0m2.403s

With Fix:
root@mlx-3700-20:mgmt:/var/log/raja/frr# time sudo vtysh -f option1 ..............
Waiting for children to finish applying config...
[19887|staticd] done
[19885|watchfrr] done
[19886|pbrd] done
[19874|ospfd] done
[19877|bgpd] done
[19871|zebra] done

real 0m12.168s
user 0m7.511s
sys 0m1.981s

Issue: 3589101

Ticket# 3589101

Issue: When the incoming config has say 30K entries of a prefix-lists,
current implementation is to schedule the configs to be batched and
only after batching the entire config, the processing of the configs
take place. As part of batching this config, we perform string
concatenation to save all the configs in the buffer which over time
results in taking longer time.

Ex: Imagine each line of config is 50 chars. With a delimiter of ‘- ‘ we end
up adding 52 chars to buffer for each command i.e. 52*30000 = 156K of chars.
Strlcat is an expensive operation and every time we strlcat, we have to
traverse at end of string to append new char.
Because of this, we end up adding extra 6-8 secs for accepting the config.

Fix: The idea here is to bring back something similar to the backoff
count implemented as part of 20e9a40 (lib: introduce configuration
back-off timer for YANG-modeled commands).

Essentially we keep a cap of 5000 per batch. So once 5000k config
commands are batched, we process them, clear the buffer, set the count
to 0 and then continue processing the rest of the config.

option1 file has 30K entries of prefix-list
Without Fix:
root@mlx-3700-20:mgmt:/var/log/raja/frr# time sudo vtysh -f option1
<SNIP>..............
Waiting for children to finish applying config...
[25191|staticd] done
[25189|watchfrr] done
[25178|ospfd] done
[25190|pbrd] done
[25181|bgpd] done
[25175|zebra] done

real    0m20.123s
user    0m9.384s
sys     0m2.403s

With Fix:
root@mlx-3700-20:mgmt:/var/log/raja/frr# time sudo vtysh -f option1
<SNIP>..............
Waiting for children to finish applying config...
[19887|staticd] done
[19885|watchfrr] done
[19886|pbrd] done
[19874|ospfd] done
[19877|bgpd] done
[19871|zebra] done

real	0m12.168s
user	0m7.511s
sys	0m1.981s

Issue: 3589101

Ticket# 3589101

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
@frrbot frrbot bot added the libfrr label Dec 18, 2024
@Jafaral
Copy link
Member

Jafaral commented Dec 20, 2024

@raja-rajasekar hav you done any benchmarking with different buffer sizes to see if a bigger or a smaller buffer would perform even better than 5000?

@raja-rajasekar
Copy link
Contributor Author

@raja-rajasekar hav you done any benchmarking with different buffer sizes to see if a bigger or a smaller buffer would perform even better than 5000?

i recall trying out from 2k to 10K (intervals of 1K each)... between 4-6K things were much similar.. few ms here and there.. 5K seems to be the best

@Jafaral Jafaral merged commit c5123ff into FRRouting:master Dec 20, 2024
14 checks passed
@raja-rajasekar raja-rajasekar deleted the rajasekarr/batch_huge_cfg branch December 21, 2024 06:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants