
Error with "datarmor" cluster #65

Open · gmaze opened this issue Oct 8, 2024 · 1 comment

gmaze commented Oct 8, 2024

Hi,
I came across the error below when trying to create a "datarmor" cluster on datarmor.

I'm running version 2023.3.2.dev20+g9a8772f under Python 3.10.14.

cluster = dask_hpcconfig.cluster("datarmor")
/home1/datahome/gmaze/conda-env/argopy-tests/lib/python3.10/site-packages/dask_jobqueue/core.py:248: FutureWarning: extra has been renamed to worker_extra_args. You are still using it (even if only set to []; please also check config files). If you did not set worker_extra_args yet, extra will be respected for now, but it will be removed in a future release. If you already set worker_extra_args, extra is ignored and you can remove it.
  warnings.warn(warn, FutureWarning)
/home1/datahome/gmaze/conda-env/argopy-tests/lib/python3.10/site-packages/dask_jobqueue/core.py:266: FutureWarning: job_extra has been renamed to job_extra_directives. You are still using it (even if only set to []; please also check config files). If you did not set job_extra_directives yet, job_extra will be respected for now, but it will be removed in a future release. If you already set job_extra_directives, job_extra is ignored and you can remove it.
  warnings.warn(warn, FutureWarning)
/home1/datahome/gmaze/conda-env/argopy-tests/lib/python3.10/site-packages/dask_jobqueue/core.py:285: FutureWarning: env_extra has been renamed to job_script_prologue. You are still using it (even if only set to []; please also check config files). If you did not set job_script_prologue yet, env_extra will be respected for now, but it will be removed in a future release. If you already set job_script_prologue, env_extra is ignored and you can remove it.
  warnings.warn(warn, FutureWarning)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[11], line 1
----> 1 cluster = dask_hpcconfig.cluster("datarmor")

File ~/git/github/umr-lops/dask-hpcconfig/dask_hpcconfig/clusters.py:68, in cluster(name, asynchronous, loop, **overrides)
     63     raise ValueError(
     64         f"cluster: malformed cluster definition of {name}: needs at least the 'cluster' key"
     65     )
     67 # instantiate cluster class
---> 68 cluster = new_cluster(name, cluster_config, asynchronous=asynchronous)
     70 # feed every other setting to `dask.config.merge` before passing it to `dask.config.set` (because
     71 # that replaces any top-level attributes)
     72 merged = dask.config.merge(
     73     dask.config.config, {k: v for k, v in definition.items() if k != "cluster"}
     74 )

File ~/git/github/umr-lops/dask-hpcconfig/dask_hpcconfig/clusters.py:14, in new_cluster(name, config, asynchronous, loop)
     11     raise ValueError(f"cluster: configuration of {name} does not have a 'type' key")
     13 type_ = _cluster_type(type_name)
---> 14 cluster = type_(
     15     asynchronous=asynchronous,
     16     loop=loop,
     17     **{k.replace("-", "_"): v for k, v in config.items() if k != "type"},
     18 )
     20 return cluster

File ~/conda-env/argopy-tests/lib/python3.10/site-packages/dask_jobqueue/core.py:661, in JobQueueCluster.__init__(self, n_workers, job_cls, loop, security, shared_temp_directory, silence_logs, name, asynchronous, dashboard_address, host, scheduler_options, scheduler_cls, interface, protocol, config_name, **job_kwargs)
    656 if "processes" in self._job_kwargs and self._job_kwargs["processes"] > 1:
    657     worker["group"] = [
    658         "-" + str(i) for i in range(self._job_kwargs["processes"])
    659     ]
--> 661 self._dummy_job  # trigger property to ensure that the job is valid
    663 super().__init__(
    664     scheduler=scheduler,
    665     worker=worker,
   (...)
    670     name=name,
    671 )
    673 if n_workers:

File ~/conda-env/argopy-tests/lib/python3.10/site-packages/dask_jobqueue/core.py:690, in JobQueueCluster._dummy_job(self)
    688     address = "tcp://<insert-scheduler-address-here>:8786"
    689 try:
--> 690     return self.job_cls(
    691         address or "tcp://<insert-scheduler-address-here>:8786",
    692         # The 'name' parameter is replaced inside Job class by the
    693         # actual Dask worker name. Using 'dummy-name here' to make it
    694         # more clear that cluster.job_script() is similar to but not
    695         # exactly the same script as the script submitted for each Dask
    696         # worker
    697         name="dummy-name",
    698         **self._job_kwargs
    699     )
    700 except TypeError as exc:
    701     # Very likely this error happened in the self.job_cls constructor
    702     # because an unexpected parameter was used in the JobQueueCluster
    703     # constructor. The next few lines builds a more user-friendly error message.
    704     match = re.search("(unexpected keyword argument.+)", str(exc))

File ~/conda-env/argopy-tests/lib/python3.10/site-packages/dask_jobqueue/pbs.py:55, in PBSJob.__init__(self, scheduler, name, queue, project, account, resource_spec, walltime, config_name, **base_class_kwargs)
     43 def __init__(
     44     self,
     45     scheduler=None,
   (...)
     53     **base_class_kwargs
     54 ):
---> 55     super().__init__(
     56         scheduler=scheduler, name=name, config_name=config_name, **base_class_kwargs
     57     )
     59     if queue is None:
     60         queue = dask.config.get("jobqueue.%s.queue" % self.config_name)

File ~/conda-env/argopy-tests/lib/python3.10/site-packages/dask_jobqueue/core.py:320, in Job.__init__(self, scheduler, name, cores, memory, processes, nanny, protocol, security, interface, death_timeout, local_directory, extra, worker_command, worker_extra_args, job_extra, job_extra_directives, env_extra, job_script_prologue, header_skip, job_directives_skip, log_directory, shebang, python, job_name, config_name)
    317 self.job_header = None
    319 if interface:
--> 320     worker_extra_args = worker_extra_args + ["--interface", interface]
    321 if protocol:
    322     worker_extra_args = worker_extra_args + ["--protocol", protocol]

TypeError: can only concatenate str (not "list") to str
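
For reference, the line that fails at dask_jobqueue/core.py:320 is worker_extra_args + ["--interface", interface], so the TypeError means worker_extra_args is a string rather than a list by the time Job.__init__ runs, presumably via the deprecated extra/worker_extra_args handling warned about above. A minimal sketch of the failure mode (the string value is an assumption for illustration, not taken from the actual "datarmor" config):

# Hypothetical reproduction of the failing concatenation at
# dask_jobqueue/core.py:320; the empty string stands in for whatever
# str value ends up being passed as worker_extra_args.
worker_extra_args = ""
interface = "ib0"  # illustrative interface name
worker_extra_args + ["--interface", interface]
# TypeError: can only concatenate str (not "list") to str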

I'm using a conda env defined by:

SYSTEM

commit: None
python: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 3.12.53-60.30-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.14.3
libnetcdf: 4.9.2

INSTALLED VERSIONS: CORE

aiohttp : 3.10.5
argopy : 0.1.17
decorator : 5.1.1
erddapy : 2.2.3
fsspec : 2024.6.1
netCDF4 : 1.7.1
packaging : 24.1
requests : 2.32.3
scipy : 1.14.1
toolz : 0.12.1
xarray : 2024.2.0

INSTALLED VERSIONS: EXT.UTIL

boto3 : 1.35.22
gsw : 3.6.19
s3fs : 2024.6.1
tqdm : 4.66.5
zarr : 2.18.3

INSTALLED VERSIONS: EXT.PERF

dask : 2024.9.0
distributed : 2024.9.0
h5netcdf : 1.3.0
pyarrow : 17.0.0

INSTALLED VERSIONS: EXT.PLOT

IPython : 8.27.0
cartopy : 0.23.0
ipykernel : 6.29.5
ipywidgets : 8.1.5
matplotlib : 3.9.2
pyproj : 3.6.1
seaborn : 0.13.2

INSTALLED VERSIONS: DEV

aiofiles : 24.1.0
black : 24.8.0
bottleneck : 1.4.0
cfgrib : 0.9.14.1
cftime : 1.6.4
codespell : 2.3.0
flake8 : 7.1.1
numpy : 1.26.4
pandas : 2.2.2
pip : 24.2
pytest : 8.3.3
pytest_cov : 5.0.0
pytest_env : 1.1.5
pytest_localftpserver: -
setuptools : 74.1.2
sphinx : -

INSTALLED VERSIONS: PIP

pytest-reportlog: 0.4.0

keewis (Collaborator) commented Oct 29, 2024

apologies for the late reply, @gmaze

I don't know exactly where the error is coming from, but I suspect it's due to a change in dask-jobqueue. If that's correct, try pinning to an earlier version (I'm not sure which one you have; it doesn't seem to be included in the environment info you posted).

Otherwise I can try reproducing, but I'm not sure when I will get to that.
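
For example, a quick way to check which dask-jobqueue you have, and a sketch of how to pin it (the 0.8.2 below is purely illustrative; I don't know which version last worked with this configuration):

# Check the installed dask-jobqueue version before deciding what to pin.
import dask_jobqueue
print(dask_jobqueue.__version__)

# Then pin with, e.g.:  python -m pip install "dask-jobqueue==0.8.2"
# (version chosen for illustration only)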
