PoC: new Github CI test labels and markers #771

Open
wants to merge 10 commits into
base: main
from


dorotat-nv
Collaborator

@dorotat-nv dorotat-nv commented Mar 20, 2025

Description

Introduces new PR labels and pytest markers to partition the BioNeMo Framework tests more efficiently.

# Test pipeline labels
- name: ciflow:L1
  description: Run slow single-GPU integration tests marked by `@pytest.mark.L1`
  color: FBCA04  # Yellow/Gold

- name: ciflow:L2
  description: Run multi-GPU and long-running integration tests marked by `@pytest.mark.L2`
  color: 6F42C1  # Purple

- name: ciflow:docs
  description: Run documentation and tutorial tests
  color: 1D76DB  # Blue

- name: ciflow:all
  description: Run all tests (L0, L1, L2, and docs)
  color: B60205  # Red

- name: ciflow:skip
  description: Skip all CI tests for this PR
  color: 0E8A16  # Green

- name: ciflow:skip-subpackage
  description: Skip sub-package testing and publishing for this PR
  color: D93F0B  # Orange

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor
  • Documentation update
  • Other (please describe):

CI Pipeline Configuration

Configure CI behavior by applying the relevant labels:

Note

By default, the notebook validation tests are skipped unless explicitly enabled.

Authorizing CI Runs

We use copy-pr-bot to manage authorization of CI
runs on NVIDIA's compute resources.

  • If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will
    automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123).
  • If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an
    /ok to test comment on the pull request to trigger CI. This will need to be done for each new commit.
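
The two cases above can be summarized as a small decision sketch. The function names `needs_ok_to_test` and `mirror_branch` are hypothetical and only restate the rules described here; they are not part of copy-pr-bot's API.

```python
def needs_ok_to_test(trusted_author: bool, trusted_changes: bool) -> bool:
    """True when an NVIDIA org member must comment `/ok to test` on the PR."""
    # Only a trusted author with only trusted changes is copied automatically.
    return not (trusted_author and trusted_changes)


def mirror_branch(pr_number: int) -> str:
    """Name of the pull-request/ prefixed branch the PR code is copied to."""
    return f"pull-request/{pr_number}"


if __name__ == "__main__":
    print(mirror_branch(123), needs_ok_to_test(True, True))
```

In the untrusted case the comment has to be repeated for each new commit, so a check like `needs_ok_to_test` would be evaluated per commit rather than once per pull request.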

Usage

TODO: Add code snippet
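
A minimal sketch of how the `L1` and `L2` markers might be registered in a `conftest.py`, so that `@pytest.mark.L1` and `@pytest.mark.L2` are known to pytest and do not raise unknown-marker warnings. The `MARKERS` dictionary and its descriptions are assumptions that mirror the label descriptions in this PR; `pytest_configure` and `config.addinivalue_line` are standard pytest hooks and APIs.

```python
# conftest.py (illustrative; the descriptions mirror the PR labels)
MARKERS = {
    "L1": "slow single-GPU integration tests",
    "L2": "multi-GPU and long-running integration tests",
}


def pytest_configure(config):
    """Register the custom markers with pytest at start-up."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

With the markers registered, a test decorated with `@pytest.mark.L1` could be selected with `pytest -m L1` or excluded with `pytest -m "not L1"`.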

Pre-submit Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly
  • I have added/updated tests as needed
  • All existing tests pass successfully


copy-pr-bot bot commented Mar 20, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@dorotat-nv dorotat-nv changed the title PoC: wew Github CI test labels and markers PoC: new Github CI test labels and markers Mar 20, 2025
dorotat-nv and others added 9 commits March 20, 2025 15:46
Signed-off-by: dorotat <dorotat@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>
- Update evo2 readme to add Quickstart section (previously unfinished),
linking to notebooks.
- Move `assets/` out of `examples/` to `bionemo-evo2` root.
- Update finetuning notebook image link to use new `assets/` dir.

---------

Signed-off-by: Jared Wilber <jwilber@nvidia.com>
Co-authored-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>
### Description
This PR:
1. Aligns the location of TensorBoard logs for
   * ESM2, from `result_dir/experiment_name/lightining_logs/dev` to
     `result_dir/experiment_name/dev/`, and unit tests it in
     `sub-packages/bionemo-esm2/tests/bionemo/esm2/scripts/test_train_esm2.py::test_main_runs`
   * Evo2, from `experiment_name/dummy/lightining_logs/` to
     `result_dir/experiment_name/dev/`, and unit tests it in
     `sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py::test_train_evo2_runs`
2. Adds a TFLOPS callback to ESM2 training and unit tests it in
   `sub-packages/bionemo-esm2/tests/bionemo/esm2/scripts/test_train_esm2.py::test_main_runs`
3. Adds resume-training unit tests for
   * Evo2 in
     `sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py::test_train_evo2_stop_at_max_steps_and_continue`
     (the test fails on L40 in the internal CI, so it is skipped and
     tracked in issue #769)
   * ESM2 in
     `sub-packages/bionemo-esm2/tests/bionemo/esm2/scripts/test_train_esm2.py::test_main_stop_at_num_steps_and_continue`
     (training does not resume, so the test is marked xfail and tracked
     in issue #757)

Fixed

![image](https://github.com/user-attachments/assets/3107b2a2-d361-41a8-9635-763198aef0a7)

![image](https://github.com/user-attachments/assets/dce921a6-1e13-4810-a735-d35a65f0978d)

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [ ]  Bug fix (non-breaking change which fixes an issue)
- [x]  New feature (non-breaking change which adds functionality)
- [x]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):

### CI Pipeline Configuration
Configure CI behavior by applying the relevant labels:

- [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests
- [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest
- [INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests) - Execute tests labelled as slow in pytest for extensive testing

> [!NOTE]
> By default, the notebook validation tests are skipped unless explicitly enabled.

### Usage
<!--- How does a user interact with the changed code -->
```bash
# Evo2: short mock-data training run with TensorBoard logging enabled
python sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py --mock-data --model-size 1b_nv --num-layers 4 --hybrid-override-pattern 'SDH*' --no-activation-checkpointing --add-bias-output --warmup-steps 1 --seq-length 128 --hidden-dropout 0.1 --attention-dropout 0.1 --create-tensorboard-logger --wandb-project fix-tensorboard-logs --val-check-interval 50 --max-steps 100

# ESM2: training run with TensorBoard logging, the TFLOPS callback, and resume enabled
python sub-packages/bionemo-esm2/src/bionemo/esm2/scripts/train_esm2.py --train-cluster-path=/data/train_clusters.parquet --train-database-path=/data/train.db --valid-cluster-path=/data/valid_clusters.parquet --valid-database-path=/data/validation.db --micro-batch-size=16 --num-nodes=1 --num-gpus=1 --val-check-interval=50 --limit-val-batches=1 --min-seq-length=1024 --max-seq-length=1024 --num-layers=33 --hidden-size=1280 --num-attention-heads=20 --ffn-hidden-size=5120 --create-tensorboard-logger --wandb-project=fix-tensorboard-logs --create-tflops-callback --num-steps=100 --resume-if-exists
```

### Pre-submit Checklist
<!--- Ensure all items are completed before submitting -->

 - [x] I have tested these changes locally
 - [x] I have updated the documentation accordingly
 - [x] I have added/updated tests as needed
 - [x] All existing tests pass successfully

Signed-off-by: dorotat <dorotat@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>
Signed-off-by: dorotat <dorotat@nvidia.com>
@dorotat-nv dorotat-nv force-pushed the dorotat/add-ciflow-scopes branch from f3515f6 to 58376dd Compare March 20, 2025 14:46
@dorotat-nv dorotat-nv requested a review from farhadrgh as a code owner March 20, 2025 14:46
@dorotat-nv
Collaborator Author

/ok to test

3 participants