tfep.utils.cli.launcher.SRunLauncher

class tfep.utils.cli.launcher.SRunLauncher(n_tasks: int | list[int] | None = None, multiprog: bool = False, multiprog_config_file_path: str = 'srun-job.conf', **kwargs)

Bases: Launcher

Launch a command through SLURM’s srun.

The launcher prepends "srun" to each given command, setting the specified number of nodes, tasks per node, CPUs per task, and other srun options. Except for multiprog and multiprog_config_file_path, any parameter can be passed as a list of values, one for each command (when multiprog is True, only n_tasks can be a list).
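
As an illustration of the scalar-or-list convention, the following sketch expands an option into one value per command. The helper broadcast_option is hypothetical and not part of tfep; it only assumes that a scalar applies to every command and that a list must contain exactly one entry per command.

```python
def broadcast_option(value, n_commands: int, name: str) -> list:
    """Expand an option to one value per command (hypothetical helper).

    Lists are kept as they are, but their length must match the number
    of commands; any other value (including None) is repeated.
    """
    if isinstance(value, list):
        if len(value) != n_commands:
            raise ValueError(
                f"{name} has {len(value)} values but there are {n_commands} commands"
            )
        return list(value)
    return [value] * n_commands


# Two commands: n_nodes differs per command, n_cpus_per_task is shared.
print(broadcast_option([1, 4], 2, "n_nodes"))     # [1, 4]
print(broadcast_option(4, 2, "n_cpus_per_task"))  # [4, 4]
```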

The launcher also supports running multiple commands in parallel through srun's --multi-prog feature, in which case each command is assigned a contiguous range of task ranks.

The class attribute GLOBAL_SRUN_OPTIONS holds a dictionary of srun options that are shared by all srun invocations.
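
Because a class attribute is shared by every instance, an option stored in GLOBAL_SRUN_OPTIONS affects all launchers at once. The toy class below is a hypothetical stand-in, not SRunLauncher itself, and it assumes that dictionary keys are srun long-option names without the leading dashes; the actual key format should be checked against the tfep source.

```python
class HypotheticalSRunLauncher:
    """Toy class illustrating a class-level dict of shared srun options.

    This is NOT the tfep implementation; it only shows that a class
    attribute is seen by every instance.
    """

    # Class attribute: one dictionary shared by all instances.
    GLOBAL_SRUN_OPTIONS: dict = {}

    def global_flags(self) -> list[str]:
        # Assumed convention: keys are srun long-option names without the
        # leading dashes, e.g. {"mpi": "pmix"} -> ["--mpi=pmix"].
        return [f"--{key}={value}" for key, value in self.GLOBAL_SRUN_OPTIONS.items()]


first, second = HypotheticalSRunLauncher(), HypotheticalSRunLauncher()
HypotheticalSRunLauncher.GLOBAL_SRUN_OPTIONS["mpi"] = "pmix"
print(first.global_flags())   # ['--mpi=pmix']
print(second.global_flags())  # ['--mpi=pmix']
```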

Parameters:
  • n_tasks (int or List[int], optional) – The number of tasks to pass to srun. When multiprog is True, this must be given as a list with length equal to the number of commands.

  • multiprog (bool, optional) – If True, multiple commands are run in parallel using the --multi-prog argument. In this case, srun is invoked only once, so no parameter except n_tasks (e.g., n_nodes, n_tasks_per_node) can be a list.

  • multiprog_config_file_path (str, optional) – The file path (relative to the working directory) where the multiprog configuration file is created.

  • time (str or List[str], optional) – The maximum time before the job step is terminated as a string in the same format used by SLURM (e.g., '1-00:06:00').

  • n_nodes (int or List[int], optional) – The number of nodes to pass to srun.

  • n_tasks_per_node (int or List[int], optional) – The number of tasks per node to pass to srun. Note that n_tasks takes precedence over this.

  • n_cpus_per_task (int or List[int], optional) – The number of CPUs per task to pass to srun.

  • relative_node_idx (int or List[int], optional) – Run the job step relative to the relative_node_idx-th node (counting from node 0) of the current allocation.

  • cpu_bind (str or List[str], optional) – How to bind tasks to CPUs (e.g., 'threads'). Corresponds to the srun --cpu-bind option.

  • distribution (str or List[str], optional) – Specify how to distribute tasks among cores (e.g., 'block:block:fcyclic'). Corresponds to the srun --distribution option.
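
The time parameter uses SLURM's own formats, which srun parses itself. If you want to validate or compare limits before launching, a minimal sketch like the following converts them to seconds. slurm_time_to_seconds is a hypothetical helper; it handles only the numeric formats accepted by srun --time ("minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes", "days-hours:minutes:seconds").

```python
def slurm_time_to_seconds(value: str) -> int:
    """Convert a numeric SLURM time limit string to seconds (hypothetical helper)."""
    days = 0
    if "-" in value:
        days_str, rest = value.split("-", 1)
        days = int(days_str)
        fields = [int(x) for x in rest.split(":")]
        if len(fields) > 3:
            raise ValueError(f"Invalid SLURM time: {value!r}")
        # After "days-", the fields are hours[:minutes[:seconds]].
        hours, minutes, seconds = fields + [0] * (3 - len(fields))
    else:
        fields = [int(x) for x in value.split(":")]
        if len(fields) == 1:  # "minutes"
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:  # "minutes:seconds"
            hours, minutes, seconds = 0, fields[0], fields[1]
        elif len(fields) == 3:  # "hours:minutes:seconds"
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"Invalid SLURM time: {value!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


print(slurm_time_to_seconds("1-00:06:00"))  # 86760 (1 day and 6 minutes)
```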

See also

Launcher

Standard launcher class.

Examples

If the number of nodes/tasks/CPUs is given as an integer, all parallel srun executions use the same number of nodes/tasks/CPUs.

>>> launcher = SRunLauncher(n_nodes=2, n_tasks_per_node=4, n_cpus_per_task=4)

Multiple commands can be run in parallel either by calling srun once per command or by calling it once with the --multi-prog argument, which is designed to support multiple-program multiple-data (MPMD) MPI programs. In the first case, the configuration of each srun invocation can be specified separately.

For example, the following modifies the launcher to run two commands in parallel with the same number of CPUs per task but different numbers of nodes and tasks per node.

>>> launcher.n_nodes = [1, 4]
>>> launcher.n_tasks_per_node = [8, 4]
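
With these settings, srun is called once per command, each time with its own options. The sketch below builds the resulting command lines. The helper build_srun_command, the program names, and the mapping from parameter names to the srun long options --nodes, --ntasks, --ntasks-per-node, and --cpus-per-task are assumptions for illustration, not tfep's internals.

```python
def build_srun_command(command, n_nodes=None, n_tasks=None,
                       n_tasks_per_node=None, n_cpus_per_task=None):
    """Prepend srun and its options to a single command (hypothetical helper)."""
    srun = ["srun"]
    # Assumed mapping to standard srun long options; None means "not set".
    options = {
        "--nodes": n_nodes,
        "--ntasks": n_tasks,
        "--ntasks-per-node": n_tasks_per_node,
        "--cpus-per-task": n_cpus_per_task,
    }
    for flag, value in options.items():
        if value is not None:
            srun.append(f"{flag}={value}")
    return srun + list(command)


# Hypothetical programs, configured as in the example above.
commands = [["./prog_a"], ["./prog_b", "--input", "data.txt"]]
n_nodes = [1, 4]
n_tasks_per_node = [8, 4]
n_cpus_per_task = 4  # shared by both srun invocations

srun_commands = [
    build_srun_command(cmd, n_nodes=nodes, n_tasks_per_node=tpn,
                       n_cpus_per_task=n_cpus_per_task)
    for cmd, nodes, tpn in zip(commands, n_nodes, n_tasks_per_node)
]
for srun_command in srun_commands:
    print(" ".join(srun_command))
# srun --nodes=1 --ntasks-per-node=8 --cpus-per-task=4 ./prog_a
# srun --nodes=4 --ntasks-per-node=4 --cpus-per-task=4 ./prog_b --input data.txt
```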

Instead, when --multi-prog is used, srun is invoked only once. Thus no option can be a list, except for n_tasks, which must be provided as a list and is used to determine the task ranks assigned to each program.
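
These multiprog constraints can be expressed as a simple check. check_multiprog_options is a hypothetical helper, not part of tfep, and SRunLauncher may enforce the same rules differently or at a different point.

```python
def check_multiprog_options(n_commands: int, n_tasks, **other_options) -> None:
    """Validate options for a single --multi-prog srun call (hypothetical helper)."""
    # n_tasks must be a list with one entry per program.
    if not isinstance(n_tasks, list) or len(n_tasks) != n_commands:
        raise ValueError("With multiprog, n_tasks must be a list with one entry per command.")
    # Every other option applies to the single srun call, so it cannot be a list.
    for name, value in other_options.items():
        if isinstance(value, list):
            raise ValueError(f"With multiprog, {name} cannot be a list.")


check_multiprog_options(3, [2, 3, 2], n_nodes=4)  # passes silently
```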

The following example configures the launcher to run three programs on 4 nodes with 7 tasks in total, assigning 3 tasks to the second program and 2 tasks to each of the others.

>>> launcher = SRunLauncher(n_nodes=4, n_tasks=[2, 3, 2], multiprog=True)
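
Contiguous rank assignment means the three programs receive ranks 0-1, 2-4, and 5-6. The sketch below computes those ranges and produces the text of a --multi-prog configuration file, whose lines follow SLURM's "ranks executable arguments" layout. The helper and program names are hypothetical, and arguments are assumed to contain no whitespace because each line is joined with plain spaces.

```python
from itertools import accumulate


def multiprog_rank_ranges(n_tasks: list[int]) -> list[tuple[int, int]]:
    """Return the (first, last) task rank of each program, assigned contiguously."""
    ends = list(accumulate(n_tasks))  # e.g. [2, 5, 7]
    starts = [0] + ends[:-1]          # e.g. [0, 2, 5]
    return [(start, end - 1) for start, end in zip(starts, ends)]


def multiprog_config(commands: list[list[str]], n_tasks: list[int]) -> str:
    """Build the text of an srun --multi-prog configuration file."""
    lines = []
    for (first, last), command in zip(multiprog_rank_ranges(n_tasks), commands):
        ranks = str(first) if first == last else f"{first}-{last}"
        lines.append(f"{ranks} {' '.join(command)}")
    return "\n".join(lines) + "\n"


commands = [["./prog_a"], ["./prog_b", "-v"], ["./prog_c"]]
print(multiprog_rank_ranges([2, 3, 2]))  # [(0, 1), (2, 4), (5, 6)]
print(multiprog_config(commands, [2, 3, 2]), end="")
# 0-1 ./prog_a
# 2-4 ./prog_b -v
# 5-6 ./prog_c
```

srun would then be invoked once with 7 tasks in total and --multi-prog followed by the path of this file (srun-job.conf by default); how SRunLauncher assembles that call exactly is not shown here.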

Methods

__init__([n_tasks, multiprog, ...])

run(*commands, **kwargs)

Run one or more commands with srun.

Attributes

GLOBAL_SRUN_OPTIONS

Dictionary of srun options shared by all srun invocations.

n_tasks

The number of tasks to pass to srun for each command.

multiprog

Whether the --multi-prog feature should be used to run multiple commands.

multiprog_config_file_path

The file path (relative to the working directory) where the multiprog configuration file is created.

srun_kwargs

Other keyword arguments for SRunTool.

run(*commands, **kwargs)

Run one or more commands with srun.

The method accepts all keyword arguments supported by tfep.utils.cli.Launcher.run().

Parameters:
  • *commands – One or more commands to execute, either in the same list format used by subprocess.Popen or as a CLITool.

  • **kwargs – Other keyword arguments to pass to Launcher.run.

Returns:

result – The object encapsulating the results of the process. If multiple processes are run in parallel, this is a list of results, one for each process. Note that when running with multiprog, only a single result is returned.

Return type:

subprocess.CompletedProcess or List[subprocess.CompletedProcess]

See also

tfep.utils.cli.Launcher.run

The parent class method.
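
The return type mirrors what happens when several processes run at once: one subprocess.CompletedProcess per process. The following sketch reproduces that behavior with plain subprocess.Popen. run_in_parallel is a hypothetical helper, not Launcher.run, and it leaves srun out entirely so that it runs on any machine.

```python
import subprocess
import sys


def run_in_parallel(*commands: list[str]) -> list[subprocess.CompletedProcess]:
    """Start all commands at once, then wait for each and collect its result.

    Hypothetical helper mimicking the documented return type; not tfep code.
    """
    processes = [
        subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
        for cmd in commands
    ]
    results = []
    for cmd, proc in zip(commands, processes):
        stdout, stderr = proc.communicate()  # waits for this process to finish
        results.append(subprocess.CompletedProcess(cmd, proc.returncode, stdout, stderr))
    return results


results = run_in_parallel(
    [sys.executable, "-c", "print('a')"],
    [sys.executable, "-c", "print('b')"],
)
print([r.stdout.strip() for r in results])  # ['a', 'b']
```

With multiprog, all programs belong to a single srun process, which is why only one result is returned in that case.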
