Commit 0836e39

Merge branch 'main' into RHOAIENG-16512
2 parents 11349c1 + b612ce3 commit 0836e39

6 files changed, +28 -25 lines changed

Makefile

+1 -1
@@ -393,7 +393,7 @@ test-component: envtest ginkgo ## Run component tests.
 
 .PHONY: test-e2e
 test-e2e: manifests fmt vet ## Run e2e tests.
-	go test -timeout 30m -v ./test/e2e
+	CODEFLARE_TEST_OUTPUT_DIR=/tmp/ CLUSTER_HOSTNAME=kind CODEFLARE_TEST_TIMEOUT_MEDIUM=5m CODEFLARE_TEST_TIMEOUT_LONG=40m go test -v -skip "^Test.*Gpu$$" ./test/e2e -timeout=60m
 
 .PHONY: kind-e2e
 kind-e2e: ## Set up e2e KinD cluster
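
Note on the new `-skip` expression: inside a Makefile recipe, `$$` is Make's escape for a literal `$`, so `go test` receives the pattern `^Test.*Gpu$`. The sketch below only illustrates which top-level test names that pattern would match; it uses Go's `regexp` package directly rather than the `go test` machinery. The two `...Gpu` names come from this commit, while the `...Cpu` name is a hypothetical non-GPU variant assumed for contrast.

```go
package main

import (
	"fmt"
	"regexp"
)

// skipPattern is the expression `go test -skip` receives after Make has
// collapsed the escaped `$$` in the recipe into a single `$`.
var skipPattern = regexp.MustCompile(`^Test.*Gpu$`)

// isSkipped reports whether a top-level test name matches the skip pattern.
func isSkipped(name string) bool {
	return skipPattern.MatchString(name)
}

func main() {
	names := []string{
		"TestMnistRayJobRayClusterGpu",           // named in this commit
		"TestMnistRayJobRayClusterAppWrapperGpu", // named in this commit
		"TestMnistRayJobRayClusterCpu",           // hypothetical non-GPU variant
	}
	for _, n := range names {
		fmt.Printf("%-40s skipped=%t\n", n, isSkipped(n))
	}
}
```

Anchoring on `Gpu$` means only tests whose names end in `Gpu` are excluded, so a local KinD run without accelerators should still execute the remaining tests.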

README.md

+19 -16
@@ -1,6 +1,10 @@
 # codeflare-operator
 
-Operator for installation and lifecycle management of CodeFlare distributed workload stack.
+The CodeFlare-Operator has embedded two controllers, a [RayCluster controller](https://github.com/project-codeflare/codeflare-operator/blob/main/pkg/controllers/raycluster_controller.go) which creates resources including secrets, ingress, routes, service, serviceaccounts, clusterrolebinding resources; all needed for the RayClusters created to work as expected.
+
+There's an [AppWrapper Controller](https://github.com/project-codeflare/appwrapper/blob/main/internal/controller/appwrapper/appwrapper_controller.go), which is a flexible and workload-agnostic mechanism to enable Kueue to manage a group of Kubernetes resources as a single logical unit and to provide an additional level of automatic fault detection and recovery.
+
+For each controller, there are webhooks in place that can be found [here](https://github.com/project-codeflare/codeflare-operator/tree/main/pkg/controllers).
 
 <!-- Don't delete these comments, they are used to generate Compatibility Matrix table for release automation -->
 <!-- Compatibility Matrix start -->
@@ -24,6 +28,7 @@ Requirements:
 # brew install gnu-sed
 make install -e SED=/usr/local/bin/gsed
 ```
+- Kind - Kind is used in the kind-e2e command in the Makefile. Follow these instructions for the kind setup <a href="https://kind.sigs.k8s.io/docs/user/quick-start/" target="_blank">here</a>
 
 ### Testing
 
@@ -34,11 +39,9 @@ The e2e tests can be executed locally by running the following commands:
 ```bash
 # Create a KinD cluster
 make kind-e2e
-# Install the CRDs
-make install
 ```
 
-[!NOTE]
+> [!NOTE]
 Some e2e tests cover the access to services via Ingresses, as end-users would do, which requires access to the Ingress controller load balancer by its IP.
 For it to work on macOS, this requires installing [docker-mac-net-connect](https://github.com/chipmk/docker-mac-net-connect).
 
@@ -47,16 +50,16 @@ The e2e tests can be executed locally by running the following commands:
 ```bash
 make setup-e2e
 ```
-
-[!NOTE]
+
+> [!NOTE]
 Kueue will only activate its Ray integration if KubeRay is installed before Kueue (as done by this make target).
 
-[!NOTE]
+> [!NOTE]
 In OpenShift the KubeRay operator pod gets random user assigned. This user is then used to run Ray cluster.
 However the random user assigned by OpenShift doesn't have rights to store dataset downloaded as part of test execution, causing tests to fail.
 To prevent this failure on OpenShift user should enforce user 1000 for KubeRay and Ray cluster by creating this SCC in KubeRay operator namespace (replace the namespace placeholder):
 
-```yaml
+```yaml
 kind: SecurityContextConstraints
 apiVersion: security.openshift.io/v1
 metadata:
@@ -68,21 +71,21 @@ The e2e tests can be executed locally by running the following commands:
 uid: 1000
 users:
 - 'system:serviceaccount:$(namespace):kuberay-operator'
-```
-
-3. Start the operator locally:
+```
 
+3. In the /etc/hosts file add the following lines:
 ```bash
-NAMESPACE=default make run
+127.0.0.1 ray-dashboard-raycluster-test-ns-1.kind
+127.0.0.1 ray-dashboard-raycluster-test-ns-2.kind
 ```
 
-Alternatively, You can run the operator from your IDE / debugger.
-
-4. In a separate terminal, set your output directory for test files, and run the e2e suite:
+4. Build, push and deploy the codeflare-operator image:
 ```bash
-export CODEFLARE_TEST_OUTPUT_DIR=<your_output_directory>
+make image-push IMG=<full-registry>:<tag>
+make deploy -e IMG=<full-registry>:<tag> -e ENV="e2e"
 ```
 
+5. To run the tests run the command
 ```bash
 make test-e2e
 ```
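
Note on the `/etc/hosts` entries added in step 3: the two hostnames appear to follow the shape `ray-dashboard-<cluster>-<namespace>.<domain>`, with `kind` matching the `CLUSTER_HOSTNAME=kind` value set in the Makefile and `test-ns-1` / `test-ns-2` matching the static namespaces introduced in the test file. That naming rule is an inference from this diff, not something the commit states, and the helper names below are hypothetical. A minimal Go sketch under that assumption:

```go
package main

import "fmt"

// dashboardHost builds a hostname of the shape seen in the /etc/hosts entries.
// ASSUMPTION: the pattern "ray-dashboard-<cluster>-<namespace>.<domain>" is
// inferred from the two entries in the README diff, not taken from the operator.
func dashboardHost(cluster, namespace, domain string) string {
	return fmt.Sprintf("ray-dashboard-%s-%s.%s", cluster, namespace, domain)
}

// hostsLine formats one /etc/hosts entry: an IP address followed by a hostname.
func hostsLine(ip, host string) string {
	return ip + " " + host
}

func main() {
	for _, ns := range []string{"test-ns-1", "test-ns-2"} {
		fmt.Println(hostsLine("127.0.0.1", dashboardHost("raycluster", ns, "kind")))
	}
}
```

If the inference holds, it would explain why the tests now use fixed namespace names: a generated namespace would produce a hostname that cannot be listed in `/etc/hosts` ahead of time.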

go.mod

+1 -1
@@ -11,7 +11,7 @@ require (
 	github.com/openshift/api v0.0.0-20240904015708-69df64132c91
 	github.com/openshift/client-go v0.0.0-20240904130219-3795e907a202
 	github.com/project-codeflare/appwrapper v1.0.4
-	github.com/project-codeflare/codeflare-common v0.0.0-20250128135036-f501cd31fe8b
+	github.com/project-codeflare/codeflare-common v0.0.0-20250306164418-eb812487be82
 	github.com/ray-project/kuberay/ray-operator v1.2.2
 	go.uber.org/zap v1.27.0
 	golang.org/x/exp v0.0.0-20240719175910-8a7402abbf56
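
Note on the `codeflare-common` bump: both versions are Go module pseudo-versions, which encode a UTC commit timestamp and a 12-character revision prefix. The sketch below decodes only the simple `vX.0.0-yyyymmddhhmmss-abcdefabcdef` form seen here; the other pseudo-version forms (those based on an earlier tagged release) are deliberately not handled. The type and function names are illustrative, not part of any library.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// pseudoVersion holds the three dash-separated parts of a simple pseudo-version.
type pseudoVersion struct {
	Base string    // e.g. "v0.0.0"
	Time time.Time // commit timestamp (UTC)
	Rev  string    // abbreviated commit hash
}

// parsePseudoVersion decodes "vX.0.0-yyyymmddhhmmss-abcdefabcdef".
// It rejects anything that does not split into exactly three parts.
func parsePseudoVersion(v string) (pseudoVersion, error) {
	parts := strings.Split(v, "-")
	if len(parts) != 3 {
		return pseudoVersion{}, fmt.Errorf("unexpected pseudo-version shape: %q", v)
	}
	ts, err := time.Parse("20060102150405", parts[1])
	if err != nil {
		return pseudoVersion{}, fmt.Errorf("bad timestamp in %q: %w", v, err)
	}
	return pseudoVersion{Base: parts[0], Time: ts, Rev: parts[2]}, nil
}

func main() {
	for _, v := range []string{
		"v0.0.0-20250128135036-f501cd31fe8b", // old version in this diff
		"v0.0.0-20250306164418-eb812487be82", // new version in this diff
	} {
		p, err := parsePseudoVersion(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> commit %s at %s\n", v, p.Rev, p.Time.Format(time.RFC3339))
	}
}
```

Read this way, the change moves the dependency from a commit dated 2025-01-28 to one dated 2025-03-06.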

go.sum

+2 -2
@@ -225,8 +225,8 @@ github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 h1:Jamvg5psRI
 github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
 github.com/project-codeflare/appwrapper v1.0.4 h1:364zQLX0tsi4LvBBYNKZL7PPbNWPbVU7vK6+/kVV/FQ=
 github.com/project-codeflare/appwrapper v1.0.4/go.mod h1:A1b6bMFNMX5Btv3ckgeuAHVVZzp1G30pSBe6BE/xJWE=
-github.com/project-codeflare/codeflare-common v0.0.0-20250128135036-f501cd31fe8b h1:MOmv/aLx/kcHd7PBErx8XNSTW180s8Slf/uVM0uV4rw=
-github.com/project-codeflare/codeflare-common v0.0.0-20250128135036-f501cd31fe8b/go.mod h1:DPSv5khRiRDFUD43SF8da+MrVQTWmxNhuKJmwSLOyO0=
+github.com/project-codeflare/codeflare-common v0.0.0-20250306164418-eb812487be82 h1:cL1K2+r1lJVwBkhXiVFr2A9DphnylJmilYDIqg/W62M=
+github.com/project-codeflare/codeflare-common v0.0.0-20250306164418-eb812487be82/go.mod h1:DPSv5khRiRDFUD43SF8da+MrVQTWmxNhuKJmwSLOyO0=
 github.com/prometheus/client_golang v1.20.5 h1:cxppBPuYhUnsO6yo/aoRol4L7q7UFfdm+bR9r+8l63Y=
 github.com/prometheus/client_golang v1.20.5/go.mod h1:PIEt8X02hGcP8JWbeHyeZ53Y/jReSnHgO035n//V5WE=
 github.com/prometheus/client_model v0.0.0-20190812154241-14fe0d1b01d4/go.mod h1:xMI15A0UPsDsEKsMN9yxemIoYk6Tm2C1GtYGdfGttqA=

test/e2e/kind.sh

+1 -1
@@ -23,7 +23,7 @@ kind: Cluster
 apiVersion: kind.x-k8s.io/v1alpha4
 nodes:
   - role: control-plane
-    image: kindest/node:v1.25.3@sha256:f52781bc0d7a19fb6c405c2af83abfeb311f130707a0e219175677e366cc45d1
+    image: kindest/node:v1.30.10@sha256:4de75d0e82481ea846c0ed1de86328d821c1e6a6a91ac37bf804e5313670e507
     extraPortMappings:
       - containerPort: 80
         hostPort: 80
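
Note on the node image reference: the `image:` value combines a repository, a tag, and a content digest (`repo:tag@sha256:...`), so this change moves the KinD node from the `v1.25.3` tag to `v1.30.10` while still pinning exact image content. The sketch below splits such a reference with plain string operations. It is a simplified illustration with hypothetical names, not a full implementation of the container image reference grammar.

```go
package main

import (
	"fmt"
	"strings"
)

// imageRef holds the parts of a "repo[:tag][@digest]" reference.
type imageRef struct {
	Repository string
	Tag        string
	Digest     string
}

// parseImageRef splits a reference using simple string handling.
func parseImageRef(ref string) imageRef {
	var out imageRef
	name, digest, _ := strings.Cut(ref, "@")
	out.Digest = digest
	// Only a colon after the last slash separates a tag, so a registry
	// port such as "localhost:5000/img" is not mistaken for one.
	slash := strings.LastIndex(name, "/")
	if colon := strings.LastIndex(name, ":"); colon > slash {
		out.Repository, out.Tag = name[:colon], name[colon+1:]
	} else {
		out.Repository = name
	}
	return out
}

func main() {
	ref := parseImageRef("kindest/node:v1.30.10@sha256:4de75d0e82481ea846c0ed1de86328d821c1e6a6a91ac37bf804e5313670e507")
	fmt.Println("repository:", ref.Repository)
	fmt.Println("tag:       ", ref.Tag)
	fmt.Println("digest:    ", ref.Digest)
}
```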

test/e2e/mnist_rayjob_raycluster_test.go

+4 -4
@@ -50,8 +50,8 @@ func TestMnistRayJobRayClusterGpu(t *testing.T) {
 func runMnistRayJobRayCluster(t *testing.T, accelerator string, numberOfGpus int) {
 	test := With(t)
 
-	// Create a namespace
-	namespace := test.NewTestNamespace()
+	// Create a static namespace to ensure a consistent Ray Dashboard hostname entry in /etc/hosts before executing the test.
+	namespace := test.NewTestNamespace(WithNamespaceName("test-ns-1"))
 
 	// Create Kueue resources
 	resourceFlavor := CreateKueueResourceFlavor(test, v1beta1.ResourceFlavorSpec{})
@@ -121,8 +121,8 @@ func TestMnistRayJobRayClusterAppWrapperGpu(t *testing.T) {
 func runMnistRayJobRayClusterAppWrapper(t *testing.T, accelerator string, numberOfGpus int) {
 	test := With(t)
 
-	// Create a namespace
-	namespace := test.NewTestNamespace()
+	// Create a static namespace to ensure a consistent Ray Dashboard hostname entry in /etc/hosts before executing the test.
+	namespace := test.NewTestNamespace(WithNamespaceName("test-ns-2"))
 
 	// Create Kueue resources
 	resourceFlavor := CreateKueueResourceFlavor(test, v1beta1.ResourceFlavorSpec{})
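
Note on `NewTestNamespace(WithNamespaceName(...))`: the call shape suggests a functional-options pattern, where optional arguments adjust a default object before it is created. The sketch below shows that pattern in isolation. Every type and function in it is a hypothetical stand-in (lower-cased to avoid confusion with the real helpers), and the `test-ns-` default prefix is an assumption; it is not the `codeflare-common` implementation.

```go
package main

import "fmt"

// namespace is a hypothetical stand-in for a Kubernetes Namespace object.
type namespace struct {
	Name         string // fixed name, if one was requested
	GenerateName string // prefix for a server-generated name otherwise
}

// option mutates a namespace before it is created.
type option func(*namespace)

// withNamespaceName pins the namespace to a fixed name and clears the
// generated-name prefix, so repeated runs always use the same namespace name.
func withNamespaceName(name string) option {
	return func(ns *namespace) {
		ns.Name = name
		ns.GenerateName = ""
	}
}

// newTestNamespace starts from a default that asks for a generated name
// (ASSUMPTION: the "test-ns-" prefix) and then applies each option in order.
func newTestNamespace(opts ...option) namespace {
	ns := namespace{GenerateName: "test-ns-"}
	for _, apply := range opts {
		apply(&ns)
	}
	return ns
}

func main() {
	fmt.Printf("%+v\n", newTestNamespace())
	fmt.Printf("%+v\n", newTestNamespace(withNamespaceName("test-ns-1")))
}
```

The trade-off of a fixed name is that two runs against the same cluster can collide if the namespace from an earlier run has not been deleted yet, which generated names avoid.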
