edit Container ch & misc
gspetro-NOAA committed Oct 24, 2023
1 parent f13402e commit 0fa8e01
Showing 3 changed files with 37 additions and 38 deletions.
docs/UsersGuide/source/BackgroundInfo/Introduction.rst (2 changes: 1 addition & 1 deletion)
@@ -104,7 +104,7 @@ A list of available component documentation is shown in :numref:`Table %s <list_

.. _list_of_documentation:

.. list-table:: Centralized List of Documentation
   :widths: 20 50
   :header-rows: 1

@@ -28,15 +28,13 @@ Install Singularity/Apptainer

.. note::

   As of November 2021, the Linux-supported version of Singularity has been `renamed <https://apptainer.org/news/community-announcement-20211130/>`__ to *Apptainer*. Apptainer has maintained compatibility with Singularity, so ``singularity`` commands should work with either Singularity or Apptainer (see compatibility details `here <https://apptainer.org/docs/user/1.2/singularity_compatibility.html>`__.)

To build and run the SRW App using a Singularity/Apptainer container, first install the software according to the `Apptainer Installation Guide <https://apptainer.org/docs/admin/1.2/installation.html>`__. This will include the installation of all dependencies.
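
Once the installation completes, a quick way to confirm that the ``singularity`` command is available is to check its version (a sanity check, not part of the official installation guide; ``apptainer --version`` works equivalently on Apptainer installations):

.. code-block:: console

   singularity --version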

.. warning::
   Docker containers can only be run with root privileges, and users cannot have root privileges on :term:`HPCs <HPC>`. Therefore, it is not possible to build the SRW App, which uses the spack-stack, inside a Docker container on an HPC system. However, a Singularity/Apptainer image may be built directly from a Docker image for use on the system.
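
Converting a Docker image into a Singularity/Apptainer image generally takes a form along these lines (an illustrative sketch; the image name, repository, and tag are placeholders):

.. code-block:: console

   singularity build <image-name>.img docker://<repository>/<image-name>:<tag>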

.. COMMENT: Update reference to HPC-Stack --> spack-stack?
Working in the Cloud or on HPC Systems
-----------------------------------------

@@ -49,7 +47,7 @@ For users working on systems with limited disk space in their ``/home`` director
where ``/absolute/path/to/writable/directory/`` refers to a writable directory (usually a project or user directory within ``/lustre``, ``/work``, ``/scratch``, or ``/glade`` on NOAA Level 1 systems). If the ``cache`` and ``tmp`` directories do not exist already, they must be created with a ``mkdir`` command.
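
In practice, this setup tends to look something like the following (a sketch, not verbatim from the App documentation; ``SINGULARITY_CACHEDIR`` and ``SINGULARITY_TMPDIR`` are the standard Singularity/Apptainer variables for relocating the cache and temporary build space):

.. code-block:: console

   mkdir -p /absolute/path/to/writable/directory/cache /absolute/path/to/writable/directory/tmp
   export SINGULARITY_CACHEDIR=/absolute/path/to/writable/directory/cache
   export SINGULARITY_TMPDIR=/absolute/path/to/writable/directory/tmp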

On NOAA Cloud systems, the ``sudo su`` command may also be required. For example:

.. code-block::
@@ -76,24 +74,27 @@ Level 1 Systems

On most Level 1 systems, a container named ``ubuntu20.04-intel-ue-1.4.1-srw-dev.img`` has already been built at the following locations:

.. list-table:: Locations of pre-built containers
   :widths: 20 50
   :header-rows: 1

   * - Machine
     - File Location
   * - Cheyenne/Derecho
     - /glade/scratch/epicufsrt/containers
   * - Gaea
     - /lustre/f2/dev/role.epic/containers
   * - Hera
     - /scratch1/NCEPDEV/nems/role.epic/containers
   * - Jet
     - /mnt/lfs4/HFIP/hfv3gfs/role.epic/containers
   * - NOAA Cloud
     - /contrib/EPIC/containers
   * - Orion/Hercules
     - /work/noaa/epic-ps/role-epic-ps/containers

.. COMMENT: Confirm container location on Orion/Hercules! Also check for Cheyenne/Derecho
.. note::
* On Gaea, Singularity/Apptainer is only available on the C5 partition, and therefore container use is only supported on Gaea C5.
* The NOAA Cloud containers are accessible only to those with EPIC resources.
@@ -130,13 +131,6 @@ On non-Level 1 systems, users should build the container in a writable sandbox:
Some users may prefer to issue the command without the ``sudo`` prefix. Whether ``sudo`` is required is system-dependent.

.. note::
   Users can choose to build a release version of the container (SRW App |release|) using a similar command:

   .. code-block:: console

      sudo singularity build --sandbox ubuntu20.04-intel-srwapp docker://noaaepic/ubuntu20.04-intel-srwapp:release-public-v2.1.0

For easier reference, users can set an environment variable to point to the container:

.. code-block:: console
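
   # Sketch only: the variable name below is a placeholder chosen for this
   # illustration, not necessarily the one used elsewhere in the SRW App docs.
   export img=/path/to/ubuntu20.04-intel-srwapp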
@@ -236,7 +230,7 @@ Generate the Forecast Experiment
To generate the forecast experiment, users must:

#. :ref:`Activate the workflow <SetUpPythonEnvC>`
#. :ref:`Set experiment parameters to configure the workflow <SetUpConfigFileC>`
#. :ref:`Run a script to generate the experiment workflow <GenerateWorkflowC>`

The first two steps depend on the platform being used and are described here for Level 1 platforms. Users will need to adjust the instructions to match their machine configuration if their local machine is a Level 2-4 platform.
@@ -290,7 +284,7 @@ where:

* ``-c`` indicates the compiler on the user's local machine (e.g., ``intel/2022.1.2``)
* ``-m`` indicates the :term:`MPI` on the user's local machine (e.g., ``impi/2022.1.2``)
* ``<platform>`` refers to the local machine (e.g., ``hera``, ``jet``, ``noaacloud``, ``macos``, ``linux``). See ``MACHINE`` in :numref:`Section %s <user>` for a full list of options.
* ``-i`` indicates the container image that was built in :numref:`Step %s <BuildC>` (``ubuntu20.04-intel-srwapp`` or ``ubuntu20.04-intel-ue-1.4.1-srw-dev.img`` by default).

For example, on Hera, the command would be:
@@ -341,7 +335,7 @@ From here, users can follow the steps below to configure the out-of-the-box SRW
USE_USER_STAGED_EXTRN_FILES: true
EXTRN_MDL_SOURCE_BASEDIR_ICS: /scratch1/NCEPDEV/nems/role.epic/UFS_SRW_data/v2p2/input_model_data/FV3GFS/grib2/${yyyymmddhh}
On other systems, users will need to change the path for ``EXTRN_MDL_SOURCE_BASEDIR_ICS`` and ``EXTRN_MDL_SOURCE_BASEDIR_LBCS`` (below) to reflect the location of the system's data. The location of the machine's global data can be viewed :ref:`here <Data>` for Level 1 systems. Alternatively, the user can add the path to their local data if they downloaded it as described in :numref:`Section %s <InitialConditions>`.

#. Edit the ``task_get_extrn_lbcs:`` section of the ``config.yaml`` to include the correct data paths to the lateral boundary conditions files. For example, on Hera, add:
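
   The elided snippet presumably parallels the ICS settings shown above; the following is a sketch that assumes the Hera LBCS data resides under the same base path (an assumption, not confirmed by this excerpt):

   .. code-block:: console

      task_get_extrn_lbcs:
         USE_USER_STAGED_EXTRN_FILES: true
         EXTRN_MDL_SOURCE_BASEDIR_LBCS: /scratch1/NCEPDEV/nems/role.epic/UFS_SRW_data/v2p2/input_model_data/FV3GFS/grib2/${yyyymmddhh}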

@@ -413,12 +407,9 @@ If a task goes DEAD, it will be necessary to restart it according to the instruc
crontab -e
*/3 * * * * cd /path/to/expt_dirs/test_community && ./launch_FV3LAM_wflow.sh called_from_cron="TRUE"
where ``/path/to`` is replaced by the actual path to the user's experiment directory.

New Experiment
===============

To run a new experiment in the container at a later time, users will need to rerun the commands in :numref:`Section %s <SetUpPythonEnvC>` to reactivate the workflow. Then, users can configure a new experiment by updating the variables in ``config.yaml`` to reflect the desired experiment configuration. Basic instructions appear in :numref:`Section %s <SetUpConfigFileC>` above, and detailed instructions can be viewed in :numref:`Section %s <UserSpecificConfig>`. After adjusting the configuration file, regenerate the experiment by running ``./generate_FV3LAM_wflow.py``.
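
A minimal regeneration sequence might look like the following (a sketch; it assumes the script is run from the directory that contains it, typically the App's ``ush`` directory, and the path shown is a placeholder):

.. code-block:: console

   cd /path/to/ufs-srweather-app/ush
   ./generate_FV3LAM_wflow.py
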
docs/UsersGuide/source/BuildingRunningTesting/RunSRW.rst (8 changes: 8 additions & 0 deletions)
@@ -947,6 +947,14 @@ To check the experiment progress:
cd $EXPTDIR
rocotostat -w FV3LAM_wflow.xml -d FV3LAM_wflow.db -v 10
Users can track the experiment's progress by reissuing the ``rocotostat`` command above every so often until the experiment runs to completion. The following message usually means that the experiment is still getting set up:

.. code-block:: console

   08/04/23 17:34:32 UTC :: FV3LAM_wflow.xml :: ERROR: Can not open FV3LAM_wflow.db read-only because it does not exist

After a few (3-5) minutes, ``rocotostat`` should show a status-monitoring table.
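
To reissue the command automatically, the standard ``watch`` utility can be used (a convenience sketch, not part of the official instructions; the 60-second interval is arbitrary):

.. code-block:: console

   watch -n 60 rocotostat -w FV3LAM_wflow.xml -d FV3LAM_wflow.db -v 10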

After finishing the experiment, open the crontab using ``crontab -e`` and delete the crontab entry.
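
Alternatively, the entry can be removed non-interactively with standard shell tools (a sketch; it assumes the entry contains the ``launch_FV3LAM_wflow.sh`` string added earlier):

.. code-block:: console

   crontab -l | grep -v 'launch_FV3LAM_wflow.sh' | crontab -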

.. _Success:
