re-enable otel subcommand on Windows #6068

Merged: 7 commits, Nov 27, 2024

Conversation

leehinman
Contributor

@leehinman leehinman commented Nov 18, 2024

What does this PR do?

  1. Re-enables the otel subcommand on Windows
  2. Moves the launch of the goroutine that processes Windows Service events earlier in the boot process
  3. Adds Windows machines to the otel integration tests

Why is it important?

Previously, when Windows support was added, elastic-agent did not tell the Windows Service Manager quickly enough that it was starting, so it was deemed "unresponsive". Moving the goroutine that responds to the Windows Service Manager earlier in the boot process should make this less likely. Given Go's design of DLL loading and init code, we can't eliminate this completely.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

None.

How to test this PR locally

mage integration:auth
mage integration:test

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

Contributor

mergify bot commented Nov 18, 2024

This pull request does not have a backport label. Could you fix it @leehinman? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label to automatically backport to the 8./d branch. /d is the digit

Contributor

mergify bot commented Nov 18, 2024

backport-v8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Nov 18, 2024
@leehinman leehinman added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Nov 18, 2024
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@leehinman leehinman force-pushed the 4976_windows_startup branch from 5b13070 to e6091b9 Compare November 19, 2024 14:42
@leehinman leehinman requested a review from a team as a code owner November 19, 2024 14:42
Contributor

mergify bot commented Nov 20, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b 4976_windows_startup upstream/4976_windows_startup
git merge upstream/main
git push upstream 4976_windows_startup

@leehinman leehinman force-pushed the 4976_windows_startup branch 2 times, most recently from a922b9f to 42ed9c8 Compare November 22, 2024 14:54
@leehinman leehinman force-pushed the 4976_windows_startup branch from 42ed9c8 to d006214 Compare November 25, 2024 17:44
Member

@mauri870 mauri870 left a comment

Codewise looks good, and tests are passing, so I assume the changes did fix the issue. I guess the worst-case scenario is that this becomes flaky on CI.

Contributor

@michalpristas michalpristas left a comment

nice optimisation, thanks for that @leehinman

@@ -154,6 +142,18 @@ func run(override cfgOverrider, testingMode bool, fleetInitTimeout time.Duration
defer cancel()
go service.ProcessWindowsControlEvents(stopBeat)

if err := handleUpgrade(); err != nil {
Contributor

👍

@leehinman leehinman merged commit 8e83ce0 into elastic:main Nov 27, 2024
14 checks passed
@leehinman leehinman deleted the 4976_windows_startup branch November 27, 2024 15:27
mergify bot pushed a commit that referenced this pull request Nov 27, 2024
* move processing windows events earlier in the boot process

* add Windows to otel integration tests

(cherry picked from commit 8e83ce0)
pierrehilbert added a commit that referenced this pull request Dec 2, 2024
* move processing windows events earlier in the boot process

* add Windows to otel integration tests

(cherry picked from commit 8e83ce0)

Co-authored-by: Lee E Hinman <57081003+leehinman@users.noreply.github.com>
Co-authored-by: Pierre HILBERT <pierre.hilbert@elastic.co>
@jlind23 jlind23 added the backport-8.17 Automated backport with mergify label Jan 29, 2025
@jlind23
Contributor

jlind23 commented Jan 29, 2025

@leehinman do you recall a reason why it was not backported to 8.17?

mergify bot pushed a commit that referenced this pull request Jan 29, 2025
* move processing windows events earlier in the boot process

* add Windows to otel integration tests

(cherry picked from commit 8e83ce0)
@mlunadia

@jlind23 can we get a code snippet for the Windows onboarding flow?

@michalpristas
Contributor

michalpristas commented Feb 14, 2025

this works on powershell
i kept it readable. you can make it more concise later by merging multiple commands into a single line

# Download and extract
$distroPath = "elastic-distro-8.17.2-windows-x86_64"
$zipFile = "$distroPath.zip"

# Disable progress bar, it slows down the download
$ProgressPreference = 'SilentlyContinue'
Invoke-WebRequest -Uri "https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.17.2-windows-x86_64.zip" -OutFile $zipFile
New-Item -ItemType Directory -Force -Path $distroPath | Out-Null
Expand-Archive -Path $zipFile -DestinationPath $distroPath
Move-Item -Path "$distroPath\elastic-agent-8.17.2-windows-x86_64\*" -Destination $distroPath
Remove-Item -Path "$distroPath\elastic-agent-8.17.2-windows-x86_64" -Recurse
Remove-Item -Path $zipFile
Set-Location $distroPath

# Configure otel
Remove-Item -Path .\otel.yml -ErrorAction SilentlyContinue
Copy-Item .\otel_samples\platformlogs_hostmetrics.yml .\otel.yml
New-Item -ItemType Directory -Force -Path .\data\otelcol | Out-Null

# Replace environment variables in otel.yml
$content = Get-Content .\otel.yml
$content = $content -replace '\${env:STORAGE_DIR}', "$PWD\data\otelcol"
$content = $content -replace '\${env:ELASTIC_ENDPOINT}', "https://sample.eu-west-1.aws.qa.cld.elstc.co:443"
$content = $content -replace '\${env:ELASTIC_API_KEY}', "sampleApiKey=="
$content | Set-Content .\otel.yml

notes:

  • the uri is hardcoded and will need to be replaced
  • compared to linux, we don't support arm or 32-bit, so this defaults to amd64 with no option to make it dynamic
  • otel_resources are not packed in any of the zip files because we don't have any for windows. this needs to be fixed and published in future releases, or published on github somewhere and fetched from there
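
For anyone who prefers to do this from Go, the two steps the notes call out (the hardcoded download URI and the placeholder replacement in otel.yml) could be sketched roughly as below. This is only an illustration: `artifactURL` and `renderOtelConfig` are hypothetical helpers, and the assumption that other versions follow the same URL pattern as the 8.17.2 artifact in the script has not been verified.

```go
package main

import (
	"fmt"
	"strings"
)

// artifactURL builds a download URL by substituting the version into the
// pattern of the URL used in the script. Assumption: other versions are
// published under the same pattern.
func artifactURL(version string) string {
	return fmt.Sprintf(
		"https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-%s-windows-x86_64.zip",
		version,
	)
}

// renderOtelConfig replaces every ${env:NAME} placeholder in the template with
// the literal value given for NAME. Placeholders without a value are left as-is.
func renderOtelConfig(template string, values map[string]string) string {
	pairs := make([]string, 0, len(values)*2)
	for name, value := range values {
		pairs = append(pairs, "${env:"+name+"}", value)
	}
	// The replacement is literal, so backslashes in Windows paths need no escaping.
	return strings.NewReplacer(pairs...).Replace(template)
}

func main() {
	fmt.Println(artifactURL("8.17.2"))

	// A made-up template fragment; the real otel.yml has more content.
	tmpl := "directory: ${env:STORAGE_DIR}\nendpoint: ${env:ELASTIC_ENDPOINT}\napi_key: ${env:ELASTIC_API_KEY}\n"
	fmt.Print(renderOtelConfig(tmpl, map[string]string{
		"STORAGE_DIR":      `C:\elastic-distro\data\otelcol`,
		"ELASTIC_ENDPOINT": "https://sample.eu-west-1.aws.qa.cld.elstc.co:443",
		"ELASTIC_API_KEY":  "sampleApiKey==",
	}))
}
```

Unlike the regex-based `-replace` in the script, the replacement here is purely literal, so neither the placeholder nor the value needs escaping.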

cc @mlunadia

@mlunadia

mlunadia commented Feb 14, 2025

Amazing! @gbamparop here's the script for running EDOT on windows hosts so we can update the hosts OTel flow.

@michalpristas do we have one for Elastic Agent, so we can also add it as an option?

@michalpristas
Contributor

what do you have in mind for agent?

@mlunadia

mlunadia commented Feb 14, 2025

This is the current flow for Agent. Is it just a matter of adding the icon, or do we need a Windows-specific auto-detect script? @flash1293 maybe you can help?

[Two screenshots of the current Agent onboarding flow; images not preserved]

@flash1293

flash1293 commented Feb 14, 2025

@mlunadia The auto-detect script does a bunch of stuff that relies on bash, so it's not a trivial change; that's why we haven't added it yet. Maybe it's possible to run it via WSL, but I don't know enough about Windows to give a good opinion here.

https://github.com/elastic/kibana/blob/ef96cd5d0b9cba3f03a00ff8b6e1e93840119570/x-pack/solutions/observability/plugins/observability_onboarding/public/assets/auto_detect.sh#L3

I'm not sure whether it's worth translating it into PowerShell or something similar; this is a product question. We could think about a simplified PowerShell-based flow for Windows, but the big question is whether we want it or not.

cc @gbamparop @akhileshpok how can we make this decision?

@flash1293

Adding @michalpristas's script to the otel flow shouldn't be a problem

@michalpristas
Contributor

i was looking into autodetect, and porting it to powershell is not that trivial. i think it should be possible, but it is a regular small task for someone experienced with .net/powershell and it needs to be planned. it's not something i can produce in one morning

@akhileshpok

@mlunadia - We can and should enhance the OTel Host onboarding workflow to incorporate Windows support. As for the Elastic Agent based onboarding workflow for Hosts, we will need functionality similar to our current auto-detect script for Linux before we can consider merging it.

@gbamparop

Created https://github.com/elastic/observability-dev/issues/4317 so we can continue the discussion there.

Labels
backport-8.x Automated backport to the 8.x branch with mergify backport-8.17 Automated backport with mergify Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
10 participants