added fastadatamodule #790

Open
wants to merge 21 commits into
base: jstjohn/evo2-mamba


Geraldene

Description

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor
  • Documentation update
  • Other (please describe):

CI Pipeline Configuration

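Configure CI behavior by applying the relevant labels:

  • SKIP_CI - Skip all continuous integration tests
  • INCLUDE_NOTEBOOKS_TESTS - Execute notebook validation tests in pytest
  • INCLUDE_SLOW_TESTS - Execute tests labelled as slow in pytest for extensive testing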

Note

By default, the notebooks validation tests are skipped unless explicitly enabled.

Authorizing CI Runs

We use copy-pr-bot to manage authorization of CI
runs on NVIDIA's compute resources.

  • If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will
    automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123)
  • If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an
    /ok to test comment on the pull request to trigger CI. This will need to be done for each new commit.
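
The authorization rule above can be modeled as a small decision function. This is only an illustrative sketch of the policy as described here, not copy-pr-bot's implementation; the `PullRequest` fields and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical, minimal view of a pull request for this sketch."""

    number: int
    trusted_author: bool
    trusted_changes_only: bool
    # True if an org member commented "/ok to test" on the latest commit.
    ok_to_test_on_head: bool = False


def mirror_branch(pr_number):
    """Name of the branch that trusted code is copied to, e.g. pull-request/123."""
    return f"pull-request/{pr_number}"


def ci_authorized(pr):
    """Return True if CI may run for the current head commit of the PR."""
    if pr.trusted_author and pr.trusted_changes_only:
        return True
    # Untrusted author or changes: approval is required again for each new commit.
    return pr.ok_to_test_on_head


pr = PullRequest(number=123, trusted_author=False, trusted_changes_only=True)
print(mirror_branch(pr.number), ci_authorized(pr))
```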

Usage

TODO: Add code snippet

Pre-submit Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly
  • I have added/updated tests as needed
  • All existing tests pass successfully

jomitchellnv and others added 20 commits March 13, 2025 19:34
)

### Description
Removes all traces of the BIONEMO_HOME environment variable and the
setup_bionemo_home script from our repository.

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [ ]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [X]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):

### CI Pipeline Configuration
Configure CI behavior by applying the relevant labels:

- [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests
- [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest
- [INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests) - Execute tests labelled as slow in pytest for extensive testing


> [!NOTE]
> By default, the notebooks validation tests are skipped unless explicitly enabled.

### Usage
<!--- How does a user interact with the changed code -->
```python
TODO: Add code snippet
```

### Pre-submit Checklist
<!--- Ensure all items are completed before submitting -->

 - [ ] I have tested these changes locally
 - [ ] I have updated the documentation accordingly
 - [ ] I have added/updated tests as needed
 - [ ] All existing tests pass successfully
Now that our notebooks are merged in `main`, I've recreated the brev.dev
launchables to build from these notebooks and updated the badges
accordingly.

Signed-off-by: Jared Wilber <jwilber@nvidia.com>
### Type of changes
- [ ]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [x]  Documentation update
- [ ]  Other (please describe):

---------

Signed-off-by: John St John <jstjohn@nvidia.com>
Signed-off-by: Jared Wilber <jwilber@nvidia.com>
Co-authored-by: Jared Wilber <jwilber@nvidia.com>
…es are not passed to the next job. (NVIDIA#753)

### Description
<!-- Provide a detailed description of the changes in this PR -->

- Collects the outputs from both the PR and dispatch workflows, and
passes the relevant one to the job output.
- GOATed pro-tip about GitHub Actions YAML config substitutions and
passing outputs with `"` (e.g. raw string JSON arrays) between steps:
https://github.com/orgs/community/discussions/32012
- Remove `ref` input, since it's a default input for
`workflow_dispatch`.
- Shorten workflow dispatch input descriptions, which are the names of
the inputs in the GHA UI.
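
The first bullet can be sketched in Python as a step script that picks whichever trigger produced a value and writes it to the step output file. `GITHUB_OUTPUT` is the file GitHub Actions provides for step outputs; the function names, the output key `packages`, and the sample value are assumptions for illustration, and this is not the workflow's actual YAML.

```python
import json
import os
import tempfile


def pick_output(pr_value, dispatch_value):
    """Return whichever trigger produced a non-empty value (PR takes priority)."""
    return pr_value or dispatch_value


def write_step_output(name, value, output_path):
    """Append a `name=value` line to the step output file (single-line values only)."""
    with open(output_path, "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")


# Serialize the list once; the resulting string contains double quotes and is
# passed along unchanged rather than being re-quoted by hand.
dispatch_packages = json.dumps(["bionemo-core"])
chosen = pick_output("", dispatch_packages)  # empty PR value: dispatch-triggered run

# GitHub Actions sets GITHUB_OUTPUT; fall back to a scratch file elsewhere.
scratch = os.path.join(tempfile.mkdtemp(), "github_output")
write_step_output("packages", chosen, os.environ.get("GITHUB_OUTPUT", scratch))
print(chosen)
```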

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [x]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):

### Usage
<!--- How does a user interact with the changed code -->

- GitHub Actions > BioNeMo Sub-Package CI > New Workflow

### Testing

- Fixed workflow dispatch:
https://github.com/NVIDIA/bionemo-framework/actions/runs/13845308857
which publishes https://pypi.org/project/bionemo-core/ `2.4.0`.

---------

Signed-off-by: Cory Ye <cye@nvidia.com>
Cleaned up the tutorial to lower the memory requirement of the
continuous interpolant notebook

Signed-off-by: Danny <dreidenbach@nvidia.com>
A number of cleanups and changes needed to get unit-tests running
efficiently on nv-gha-runners L4 nodes.

---------

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
### Description
Marks the unit test
`sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py::test_train_single_gpu[7b_arc_longcontext]`
as xfail because of its issues on certain GPUs.

Issue as a follow-up:
NVIDIA#731

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [x]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):


---------

Signed-off-by: Dorota Toczydlowska <115542912+dorotat-nv@users.noreply.github.com>
### Description
The commit from the merge of
NVIDIA#742 introduced a change
that broke Docker builds in CI. This is a hotfix for it.

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [ ]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):

### Description
Changing version of the framework from 2.4 to 2.5

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [ ]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):

### Description
Updates the maximum number of steps to 6900 so that the script finishes
within the 4-hour maximum time limit.
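
As a rough check on this budget, the limit leaves about two seconds per step on average. The sketch below assumes the whole four hours are available for training steps, which ignores startup, validation, and checkpointing time.

```python
MAX_STEPS = 6900
TIME_LIMIT_HOURS = 4

# Average wall-clock budget per step if every second went to training.
seconds_per_step = TIME_LIMIT_HOURS * 3600 / MAX_STEPS
print(f"{seconds_per_step:.2f} s per step")  # 2.09 s per step
```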

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [x]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):


---------

Signed-off-by: Dorota Toczydlowska <115542912+dorotat-nv@users.noreply.github.com>
Bumps the ruff version we use to format files. Ruff will now check and
format Jupyter notebooks, which means we need a couple of small edits.

The ruff VS Code extension has deprecated the old ruff-lsp, so the old
version is giving me a warning every time I open the container.

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Jared Wilber <jwilber@nvidia.com>
Co-authored-by: Jared Wilber <jwilber@nvidia.com>
### Description
Marks the failing Evo2 unit test as skipped.

```
sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py::test_train_evo2_stops FAILED [ 79%]
.
.
.
11:44:55  /usr/local/lib/python3.12/dist-packages/lightning/pytorch/callbacks/model_checkpoint.py:385: in _save_topk_checkpoint
11:44:55      self._save_monitor_checkpoint(trainer, monitor_candidates)
11:44:55  /usr/local/lib/python3.12/dist-packages/lightning/pytorch/callbacks/model_checkpoint.py:705: in _save_monitor_checkpoint
11:44:55      self._update_best_and_save(current, trainer, monitor_candidates)
11:44:55  /usr/local/lib/python3.12/dist-packages/lightning/pytorch/callbacks/model_checkpoint.py:757: in _update_best_and_save
11:44:55      self._save_checkpoint(trainer, filepath)
11:44:55  /usr/local/lib/python3.12/dist-packages/nemo/lightning/pytorch/callbacks/model_checkpoint.py:628: in _save_checkpoint
11:44:55      TrainerContext.from_trainer(trainer).io_dump(ckpt_to_dir(filepath) / "context", yaml_attrs=["model"])
11:44:55  /usr/local/lib/python3.12/dist-packages/nemo/lightning/io/mixin.py:249: in io_dump
11:44:55      json = serialization.dump_json(io)
11:44:55  /usr/local/lib/python3.12/dist-packages/fiddle/_src/experimental/serialization.py:826: in dump_json
11:44:55      return json.dumps(Serialization(value, pyref_policy).result, indent=indent)
11:44:55  /usr/local/lib/python3.12/dist-packages/fiddle/_src/experimental/serialization.py:527: in __init__
11:44:55      _ROOT_KEY: self._serialize(self._root, (), all_paths=((),)),
11:44:55  /usr/local/lib/python3.12/dist-packages/nemo/lightning/io/fdl_torch.py:134: in _modified_serialize
11:44:55      return self._original_serialize(to_config(value), current_path, all_paths)
11:44:55  /usr/local/lib/python3.12/dist-packages/fiddle/_src/experimental/serialization.py:662: in _serialize
11:44:55      serialized_value = self._serialize(
11:44:55  /usr/local/lib/python3.12/dist-packages/nemo/lightning/io/fdl_torch.py:134: in _modified_serialize
11:44:55      return self._original_serialize(to_config(value), current_path, all_paths)
11:44:55  /usr/local/lib/python3.12/dist-packages/fiddle/_src/experimental/serialization.py:662: in _serialize
11:44:55      serialized_value = self._serialize(
11:44:55  /usr/local/lib/python3.12/dist-packages/nemo/lightning/io/fdl_torch.py:134: in _modified_serialize
11:44:55      return self._original_serialize(to_config(value), current_path, all_paths)
11:44:55  /usr/local/lib/python3.12/dist-packages/fiddle/_src/experimental/serialization.py:662: in _serialize
11:44:55      serialized_value = self._serialize(
11:44:55  /usr/local/lib/python3.12/dist-packages/nemo/lightning/io/fdl_torch.py:134: in _modified_serialize
11:44:55      return self._original_serialize(to_config(value), current_path, all_paths)
11:44:55  /usr/local/lib/python3.12/dist-packages/fiddle/_src/experimental/serialization.py:640: in _serialize
11:44:55      output = self._pyref(value, current_path)
11:44:55  /usr/local/lib/python3.12/dist-packages/fiddle/_src/experimental/serialization.py:579: in _pyref
11:44:55      if value is not import_symbol(self._pyref_policy, module_name, symbol):
11:44:55  /usr/local/lib/python3.12/dist-packages/fiddle/_src/experimental/serialization.py:293: in import_symbol
11:44:55      with reraised_exception.try_with_lazy_message(make_message):
11:44:55  /usr/lib/python3.12/contextlib.py:158: in __exit__
11:44:55      self.gen.throw(value)
11:44:55  /usr/local/lib/python3.12/dist-packages/fiddle/_src/reraised_exception.py:82: in try_with_lazy_message
11:44:55      raise decorate_exception(exc, message) from None
11:44:55  /usr/local/lib/python3.12/dist-packages/fiddle/_src/reraised_exception.py:74: in try_with_lazy_message
11:44:55      yield
11:44:55  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
11:44:55  
11:44:55  policy = <fiddle._src.experimental.serialization.DefaultPyrefPolicy object at 0x7f45b1797920>
11:44:55  module = 'megatron.core.utils', symbol = 'init_method_normal.<locals>.init_'
11:44:55  
11:44:55      def import_symbol(policy: PyrefPolicy, module: str, symbol: str):
11:44:55        """Returns the value obtained from importing `symbol` from `module`.
11:44:55      
11:44:55        Args:
11:44:55          policy: The `PyrefPolicy` governing which symbols can be imported.
11:44:55          module: The module to import `symbol` from.
11:44:55          symbol: The symbol to import from `module`.
11:44:55      
11:44:55        Raises:
11:44:55          ModuleNotFoundError: If `module` can't be found.
11:44:55          AttributeError: If `symbol` can't be accessed on `module`.
11:44:55          PyrefPolicyError: If importing `symbol` from `module` is disallowed by
11:44:55            this `PyrefPolicy`.
11:44:55        """
11:44:55        value = PyrefPolicyError.PRE_IMPORT
11:44:55        if policy.allows_import(module, symbol):
11:44:55          module = special_overrides.maybe_get_module_override_for_migrated_serialization_symbol(
11:44:55              module, symbol
11:44:55          )
11:44:55          make_message = functools.partial(_fiddle_pyref_context, module, symbol)
11:44:55          with reraised_exception.try_with_lazy_message(make_message):
11:44:55            value = importlib.import_module(module)
11:44:55            for attr_name in symbol.split('.'):
11:44:55  >           value = getattr(value, attr_name)
11:44:55  E           AttributeError: 'function' object has no attribute '<locals>'
11:44:55  E           Fiddle context: Error occurred while importing pyref to 'init_method_normal.<locals>.init_' from 'megatron.core.utils'.
```
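
The failure in the log above can be reproduced without Fiddle or Megatron. A function defined inside another function has `<locals>` in its `__qualname__`, so walking that dotted name with `getattr`, as the loop in `import_symbol` does, cannot succeed. The `init_method_normal` below is a stand-in written for this sketch, not Megatron's implementation, and `resolve` and `fake_module` are hypothetical names.

```python
from types import SimpleNamespace


def init_method_normal(sigma):
    """Stand-in factory that returns a nested function (a closure)."""

    def init_(tensor):
        return [sigma for _ in tensor]

    return init_


def resolve(root, dotted_name):
    """Walk a dotted name with getattr, like the loop shown in the traceback."""
    value = root
    for attr_name in dotted_name.split("."):
        value = getattr(value, attr_name)
    return value


# Plays the role of the imported module in this sketch.
fake_module = SimpleNamespace(init_method_normal=init_method_normal)

init_fn = init_method_normal(0.02)
print(init_fn.__qualname__)  # init_method_normal.<locals>.init_

try:
    resolve(fake_module, init_fn.__qualname__)
except AttributeError as exc:
    # The outer function is found, but "<locals>" is not an attribute of it.
    print(f"AttributeError: {exc}")
```

A module-level function resolves fine with the same helper, which is why serializers that store references by qualified name generally need importable, top-level callables.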

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [x]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):

### Description
Adding release notes for v2.5 release:
https://github.com/NVIDIA/bionemo-framework/releases/tag/v2.5

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [ ]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [x]  Documentation update
- [ ]  Other (please describe):


---------

Signed-off-by: Dorota Toczydlowska <115542912+dorotat-nv@users.noreply.github.com>
Co-authored-by: Farhad Ramezanghorbani <farhadrgh@users.noreply.github.com>
Switches our runner backend to use nv-gha-runners rather than the
self-hosted Azure runner.

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
- Update evo2 readme to add Quickstart section (previously unfinished),
linking to notebooks.
- Move `assets/` out of `examples/` to `bionemo-evo2` root.
- Update finetuning notebook image link to use new `assets/` dir.

---------

Signed-off-by: Jared Wilber <jwilber@nvidia.com>
Co-authored-by: Peter St. John <pstjohn@nvidia.com>
)

### Description
This PR:
1. Aligns the location of TensorBoard logs for
* ESM2 from `result_dir/experiment_name/lightning_logs/dev` to
`result_dir/experiment_name/dev/`, unit tested in
`sub-packages/bionemo-esm2/tests/bionemo/esm2/scripts/test_train_esm2.py::test_main_runs`
* Evo2 from `experiment_name/dummy/lightning_logs/` to
`result_dir/experiment_name/dev/`, unit tested in
`sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py::test_train_evo2_runs`
2. Adds a tflops callback to ESM2 training, unit tested in
`sub-packages/bionemo-esm2/tests/bionemo/esm2/scripts/test_train_esm2.py::test_main_runs`
3. Adds resume-training unit tests for
* Evo2 in
`sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py::test_train_evo2_stop_at_max_steps_and_continue`
(the test fails on L40 in the internal CI, so it is skipped and tracked in
issue NVIDIA#769)
* ESM2 in
`sub-packages/bionemo-esm2/tests/bionemo/esm2/scripts/test_train_esm2.py::test_main_stop_at_num_steps_and_continue`
(training does not resume, so the test is marked xfail and tracked in issue
NVIDIA#757)
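
The aligned layout from item 1 can be expressed as one path helper. This is a minimal sketch, assuming a hypothetical function `tensorboard_log_dir` and sample directory names; the training scripts may build this path differently.

```python
from pathlib import Path


def tensorboard_log_dir(result_dir, experiment_name, name="dev"):
    """Build result_dir/experiment_name/name, the layout described above."""
    return Path(result_dir) / experiment_name / name


# Hypothetical values for illustration only.
log_dir = tensorboard_log_dir("results", "esm2")
print(log_dir.as_posix())  # results/esm2/dev
```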


Fixed


![image](https://github.com/user-attachments/assets/3107b2a2-d361-41a8-9635-763198aef0a7)


![image](https://github.com/user-attachments/assets/dce921a6-1e13-4810-a735-d35a65f0978d)


### Type of changes
<!-- Mark the relevant option with an [x] -->

- [ ]  Bug fix (non-breaking change which fixes an issue)
- [x]  New feature (non-breaking change which adds functionality)
- [x]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):

### Usage
<!--- How does a user interact with the changed code -->
```bash
# Evo2: short mock-data run with the TensorBoard logger enabled.
# The pattern is quoted so the shell does not expand the trailing *.
python sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py \
  --mock-data --model-size 1b_nv --num-layers 4 \
  --hybrid-override-pattern 'SDH*' \
  --no-activation-checkpointing --add-bias-output \
  --warmup-steps 1 --seq-length 128 \
  --hidden-dropout 0.1 --attention-dropout 0.1 \
  --create-tensorboard-logger --wandb-project fix-tensorboard-logs \
  --val-check-interval 50 --max-steps 100

# ESM2: run with the TensorBoard logger, the tflops callback, and resume enabled.
python sub-packages/bionemo-esm2/src/bionemo/esm2/scripts/train_esm2.py \
  --train-cluster-path=/data/train_clusters.parquet \
  --train-database-path=/data/train.db \
  --valid-cluster-path=/data/valid_clusters.parquet \
  --valid-database-path=/data/validation.db \
  --micro-batch-size=16 --num-nodes=1 --num-gpus=1 \
  --val-check-interval=50 --limit-val-batches=1 \
  --min-seq-length=1024 --max-seq-length=1024 \
  --num-layers=33 --hidden-size=1280 \
  --num-attention-heads=20 --ffn-hidden-size=5120 \
  --create-tensorboard-logger --wandb-project=fix-tensorboard-logs \
  --create-tflops-callback --num-steps=100 --resume-if-exists
```

### Pre-submit Checklist
<!--- Ensure all items are completed before submitting -->

 - [x] I have tested these changes locally
 - [x] I have updated the documentation accordingly
 - [x] I have added/updated tests as needed
 - [x] All existing tests pass successfully
Docker builds were running out of memory, so we're switching to the 8-CPU workers. There's a `linux-amd64-cpu16` type if we need it as well.

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

copy-pr-bot bot commented Mar 27, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: Geraldene_Munsamy <geraldenemunsamy@gmail.com>
9 participants