Scaling -- option to foreach parallel? #67

Open
tobiasboone opened this issue May 28, 2024 · 5 comments

Comments

@tobiasboone

I work for an organization with a rather large tenant: 1.2 million users and tens of thousands of groups. The export times out. I'm wondering what recommendations you may have for threading this out so that it can finish. My goal is to export the files to blob storage and then ingest them into Splunk so we can report on the tenant at scale across all users and groups, and have it self-version.

@SamErde
Contributor

SamErde commented May 31, 2024

What part of the export times out? It's not surprising with that many users, and you probably are dealing with a massive number of groups as well.

My initial thoughts, while certainly not definitive or authoritative, are:

  • Have you tried running an export with everything except users and groups? You may need to break these down even more if any contain references to large numbers of users and groups. Each supported export type could potentially be run as its own exporter job instead of pulling all of them at once. (Config, AccessReviews, ConditionalAccess, Applications, ServicePrincipals, B2C, B2B, PIM, PIMAzure, PIMAAD, AppProxy, Organization, Domains, EntitlementManagement, Policies, AdministrativeUnits, SKUs, Identity, Roles, Governance)
  • There are some examples of PowerShell scripts/modules that are built to work around throttling and timeouts for long-running scripts against large numbers of cloud-based objects. Concepts from these could potentially be worked into (or wrapped around) Entra Exporter if someone has the time to work on that. (e.g., https://github.com/Canthv0/RobustCloudCommand)
  • Consider your ultimate use of the tool. User and group assignments can certainly be considered part of configuration in many ways, but I would also encourage you not to rely on this as a true backup/recovery option, especially for the identities themselves.

@tobiasboone
Author

We are actually exporting each thing independently to attain a little better performance:

$EntraOptions=@("Config","AccessReviews","ConditionalAccess","Applications","ServicePrincipals","B2C","B2B","PIM","PIMAzure","PIMAAD","AppProxy","Organization","Domains","EntitlementManagement","Policies","AdministrativeUnits","SKUs","Identity","Roles","Governance","Devices")

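# Unused options: All, Users, Groups

$EntraOptions | ForEach-Object -Parallel {
    # Each iteration runs in its own runspace, so the caller's $exportpath must be read via $using:
    # Export-Entra -Path $using:exportpath -Type $_
} -ThrottleLimit 15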
#######################

The issue is absolutely with users and their groups. It just times out after an hour, often without exporting any users at all. I am able to use the Splunk connector for Azure to get a copy of the base users on a 24-hour window, but that doesn't grab group memberships at the same time.

TY for the reference to Robust Cloud Command. This may be a way around this with something more custom.

FWIW, a filter on the user export that would allow it to run based on UPNs starting with a*, b*, c*, and so on would be excellent in this tool. That would allow us to run the user export in 26 parallel task streams, one for each beginning letter of the alphabet... :)
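
For illustration, the partitioning idea above could be sketched roughly like this. Entra Exporter does not expose such a filter as far as this thread shows, so `Get-UpnPrefixFilter` is a hypothetical helper, and the sketch assumes the Microsoft Graph PowerShell SDK (`Get-MgUser`) for the query itself. UPNs can also start with digits, so the default prefix list covers a-z and 0-9; other leading characters would need their own prefixes.

```powershell
# Hypothetical helper (not part of Entra Exporter): emits one OData filter per UPN prefix.
function Get-UpnPrefixFilter {
    param(
        # Prefixes to partition on; defaults to a-z and 0-9 when omitted.
        [string[]]$Prefix
    )
    if (-not $Prefix) {
        $Prefix = [char[]]((97..122) + (48..57)) | ForEach-Object { [string]$_ }
    }
    foreach ($p in $Prefix) {
        # Single quotes inside an OData string literal are escaped by doubling them.
        "startswith(userPrincipalName,'{0}')" -f $p.Replace("'", "''")
    }
}

# Assumed usage (requires Connect-MgGraph and the Microsoft.Graph.Users module):
# Get-UpnPrefixFilter | ForEach-Object {
#     Get-MgUser -Filter $_ -All -Property id,userPrincipalName
# }
```

Each filter yields an independent slice of users, so the slices can be run one at a time or a few at a time, and a failed slice can be retried without redoing the others.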

@SamErde
Contributor

SamErde commented Jun 28, 2024

Running in parallel may be causing you to hit throttling limits. Have a look at these two documents:

Try running Export-Entra with -Verbose or -Debug to see what happens when the operation fails. That might give you a more specific clue about what problem to solve.
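
If throttling does turn out to be the cause, one common mitigation is to retry with an increasing delay instead of adding more parallelism. The following is only a minimal sketch: `Invoke-WithRetry` is a hypothetical wrapper, not something Entra Exporter provides, and it backs off on any terminating error rather than inspecting the HTTP 429 status or `Retry-After` header that Microsoft Graph returns when it throttles.

```powershell
# Hypothetical wrapper (not part of Entra Exporter): retries a script block with exponential backoff.
function Invoke-WithRetry {
    param(
        [Parameter(Mandatory)][scriptblock]$ScriptBlock,
        [int]$MaxAttempts = 5,
        [int]$InitialDelaySeconds = 2
    )
    $delay = $InitialDelaySeconds
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            # Collect the output first so nothing is returned from a failed attempt.
            $result = & $ScriptBlock
            return $result
        }
        catch {
            # Out of attempts: rethrow the last error to the caller.
            if ($attempt -eq $MaxAttempts) { throw }
            Write-Verbose "Attempt $attempt failed: $($_.Exception.Message). Retrying in $delay s."
            Start-Sleep -Seconds $delay
            $delay *= 2   # double the wait before the next attempt
        }
    }
}

# Assumed usage:
# Invoke-WithRetry -Verbose { Export-Entra -Path $exportpath -Type Users -Verbose }
```

Only terminating errors trigger a retry here; a command that reports failures as non-terminating errors would need `-ErrorAction Stop` inside the script block.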

@merill
Contributor

merill commented Jul 3, 2024

Do you get an out of memory issue or some other exception?

@mbsnl

mbsnl commented Dec 13, 2024

We are actually exporting each thing independently to attain a little better performance:

$EntraOptions=@("Config","AccessReviews","ConditionalAccess","Applications","ServicePrincipals","B2C","B2B","PIM","PIMAzure","PIMAAD","AppProxy","Organization","Domains","EntitlementManagement","Policies","AdministrativeUnits","SKUs","Identity","Roles","Governance","Devices")

I wonder if "Config" has a lot of overlap. Maybe it would even be exactly the same as calling all the other separate "Type" parameter options. So more like this:

$EntraOptions=(Get-Command Export-Entra | Select-Object -Expand Parameters)['Type'].Attributes.ValidValues|Where-Object {$_ -notin ('All','Config')}

This would reduce the number of API calls.
Also note that "RoleManagement" has some overlap with "DirectoryRoles" and "ExchangeRoles".
I did not look into the other "Type" parameter options, nor did I look into what the differences are.
I think this module needs improvement so that we have truly separate "Type" parameter options, plus a new parameter when there is added functionality (like if "Config" would export something different).
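
As a rough illustration of how the one-liner above reads the allowed values from a parameter's ValidateSet attribute, here is the same technique against a stand-in function. `Export-Demo` and its value list are made up so the sketch runs without the module installed; the real `Export-Entra` values will differ.

```powershell
# Stand-in for Export-Entra so the sketch runs without the module; the value list is made up.
function Export-Demo {
    param(
        [ValidateSet('All', 'Config', 'Users', 'Groups', 'Roles')]
        [string[]]$Type
    )
}

# Pick out the ValidateSet attribute explicitly instead of relying on member enumeration
# across every attribute attached to the parameter.
$validateSet = (Get-Command Export-Demo).Parameters['Type'].Attributes |
    Where-Object { $_ -is [System.Management.Automation.ValidateSetAttribute] }

# Drop the aggregate options so only the individual types remain.
$types = $validateSet.ValidValues | Where-Object { $_ -notin 'All', 'Config' }
$types
```

Deriving the list this way means it follows the installed module version rather than a hard-coded array that can go stale.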