Arroyo Deployment Issue in Kubernetes #728
Comments
Based on the error logs and similar issues, this looks like a problem with worker pod recovery in a distributed Kubernetes environment. When the worker pod restarts, it fails to re-establish communication with the controller and to restore from checkpoints. This is often caused by the storage configuration: in a distributed environment, checkpoints generally need to live on storage that every pod can reach. Here's what to try:
Checkpoint storage (S3):
controller:
  env:
    - name: CHECKPOINT_STORAGE_BACKEND
      value: "s3"
    - name: AWS_ACCESS_KEY_ID
      value: "your-access-key"
    - name: AWS_SECRET_ACCESS_KEY
      value: "your-secret-key"
    - name: AWS_REGION
      value: "your-region"
    - name: CHECKPOINT_S3_BUCKET
      value: "your-bucket-name"
Worker service account:
controller:
  env:
    - name: K8S_WORKER_SERVICE_ACCOUNT_NAME
      value: "arroyo-worker"
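Both fragments set the same `controller.env` list. If they are kept in separate values files, Helm replaces lists instead of merging them, so only the list from the last file would take effect. Here is a minimal sketch of combining them into a single list before passing them to Helm. It assumes the variable names from this comment, which have not been checked against the Arroyo chart, and `merge_env` is a hypothetical helper, not part of any tool.

```python
import json


def merge_env(*env_lists):
    """Combine env lists into one; a later entry overrides an earlier one with the same name."""
    merged = {}
    for env in env_lists:
        for entry in env:
            merged[entry["name"]] = entry["value"]
    return [{"name": name, "value": value} for name, value in merged.items()]


# Names and placeholder values copied from the fragments in this comment (unverified).
storage_env = [
    {"name": "CHECKPOINT_STORAGE_BACKEND", "value": "s3"},
    {"name": "AWS_ACCESS_KEY_ID", "value": "your-access-key"},
    {"name": "AWS_SECRET_ACCESS_KEY", "value": "your-secret-key"},
    {"name": "AWS_REGION", "value": "your-region"},
    {"name": "CHECKPOINT_S3_BUCKET", "value": "your-bucket-name"},
]
worker_env = [
    {"name": "K8S_WORKER_SERVICE_ACCOUNT_NAME", "value": "arroyo-worker"},
]

# One values document with a single controller.env list.
values = {"controller": {"env": merge_env(storage_env, worker_env)}}

# JSON is valid YAML, so this output can be saved and used as a values file.
print(json.dumps(values, indent=2))
```

The merged output can be written to a file and passed with `-f`. In a real deployment the credentials would normally come from a Kubernetes Secret instead of plain values.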
Let me know if you need help with any of these steps or run into issues during the implementation.
I deployed Arroyo in Kubernetes using Helm. After creating a pipeline, I manually deleted the worker pod, but it did not restart successfully.
Arroyo version: 0.11.3
Kubernetes version: 1.23.4
The arroyo-controller logs reported the following error: