Cluster usage

8-Job management

sinfo

Use sinfo to check the state of the nodes in each partition. It lists every partition in the cluster together with the state of its nodes: idle means the node is free, mix means that some of the node's cores are allocated and the rest can still be used, and alloc means the node is fully occupied. The queue configuration is adjusted from time to time; updates are posted on the Computing Center website: http://hpc.cibr.ac.cn
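
For scripting, the state summary can be reduced to a count of idle nodes per partition. The following is a minimal sketch rather than a site-provided tool: the helper name idle_nodes_by_partition is hypothetical, and it assumes sinfo is called with -h (no header) and the format string "%P %t %D" (partition, compact node state, node count).

```shell
# Sum idle node counts per partition.
# Expects "PARTITION STATE COUNT" lines on stdin, e.g. from: sinfo -h -o "%P %t %D"
idle_nodes_by_partition() {
    awk '$2 == "idle" { n[$1] += $3 } END { for (p in n) print p, n[p] }' | sort
}

# Query the live cluster only when sinfo is actually available.
if command -v sinfo >/dev/null 2>&1; then
    sinfo -h -o "%P %t %D" | idle_nodes_by_partition
fi
```

In the %P column sinfo marks the default partition with a trailing *, so that character shows up in the partition names printed here.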

Common parameters of sinfo

-a, --all                  # Show all partitions (including hidden and inaccessible ones)
-d, --dead                 # Show only unresponsive nodes in the cluster
-l, --long                 # Long output with more information
-n, --nodes=NODES          # Show information about the specified nodes; separate multiple nodes with commas
-o, --format=FORMAT        # Output in the specified format
-p, --partition=PARTITION  # Show information about the specified partitions; separate multiple partitions with commas
Help options:
--help                     # Show the help information of the sinfo command

job/squeue

View the queue status of submitted jobs.

job     # View the jobs you submitted yourself
squeue  # View the jobs submitted by all users

By default, job prints the following columns: job number, partition, job name, user, job status, running time, number of nodes, number of CPUs requested, amount of memory requested, and the nodes the job is running on.

JOBID PARTITION NAME USER ST TIME NODES CPUS MIN_M NODELIST

By default, squeue prints the following columns: job number, partition, job name, user, job status, running time, number of nodes, and the nodes the job is running on (or, for a pending job, the reason it is waiting).

JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
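
The job command appears to be specific to this cluster. Where it is not available, a similar column layout can probably be obtained from squeue itself through its format option. The sketch below is an assumption about what job displays, not its actual implementation. The function name my_jobs is hypothetical; the format codes are standard squeue fields (%i job ID, %P partition, %j name, %u user, %t state, %M elapsed time, %D nodes, %C CPUs, %m minimum memory, %R node list or pending reason).

```shell
# Show the current user's jobs, adding CPU and memory columns to the squeue defaults.
my_jobs() {
    squeue -u "$(id -un)" -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %.5C %.8m %R"
}
```

The number after each dot is a column width, so the values can be adjusted if job names or partition names are truncated.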

Common parameters of squeue

--help               # Show the help information of the squeue command
-A <account_list>    # Show the jobs of all users under the specified accounts; separate multiple accounts with commas
-i <seconds>         # Refresh the job information every given number of seconds
-j <job_id_list>     # Show the jobs with the specified job numbers; separate multiple job numbers with commas
-n <name_list>       # Show the jobs with the specified job names; separate multiple names with commas
-t <state_list>      # Show the jobs in the specified states; separate multiple states with commas
-u <user_list>       # Show the jobs of the specified users; separate multiple users with commas
-w <hostlist>        # Show the jobs running on the specified nodes; separate multiple nodes with commas
-l, --long           # Long report
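
These filters can be combined in one call. As an illustration only, the hypothetical helper pending_jobs below combines -u, -t and -l to list the jobs of one user that are still waiting in the queue.

```shell
# List pending jobs of one user (default: the current user) in long format.
pending_jobs() {
    squeue -u "${1:-$(id -un)}" -t PENDING -l
}
```

Calling pending_jobs with no argument lists your own pending jobs. Adding -i 30 to the squeue call would refresh the listing every 30 seconds until interrupted.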

sacct and scontrol show job/node

Use sacct and scontrol show job/node to display job and node information.

Use sacct to query information about jobs that have ended, as follows:

sacct -j 899775

Output job information in a specified format:

sacct --format=jobid,user,alloccpu,allocgres,state%15,exit -S 2022-08-01
Note: Detailed parameters can be viewed through sacct --help
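
When the output is meant for a script rather than for reading, sacct can emit pipe-delimited records that are easy to filter. The sketch below assumes the flags -X (one line per job allocation), -n (no header) and -P (parsable output without a trailing delimiter); the helper name nonzero_exit is hypothetical.

```shell
# Print jobs whose exit code is not 0:0.
# Expects "JobID|State|ExitCode" lines on stdin, as produced by sacct -P.
nonzero_exit() {
    awk -F'|' '$3 != "0:0" { print $1, $2, $3 }'
}

# Query the accounting database only when sacct is actually available.
if command -v sacct >/dev/null 2>&1; then
    sacct -S 2022-08-01 -X -n -P --format=jobid,state,exitcode | nonzero_exit
fi
```

The ExitCode field has the form exit:signal, so a job that was killed by a signal is reported even if its exit status is 0.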

Use scontrol show job <jobid> to view the resources of a running job.

Use scontrol show node <nodename> to view the resources requested on an occupied node.
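
scontrol prints its records as KEY=VALUE pairs separated by spaces, which makes single fields easy to extract. The helper below is a hypothetical sketch, not part of Slurm, and it assumes the value of the requested field contains no spaces.

```shell
# Print the value of one KEY=VALUE field from scontrol output read on stdin.
# Assumption: the value itself contains no spaces.
job_field() {
    tr ' ' '\n' | awk -v key="$1" 'index($0, key "=") == 1 { print substr($0, length(key) + 2) }'
}
```

On the cluster it could be used as scontrol show job 899775 | job_field JobState, or with scontrol show node <nodename> to read a single node attribute.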

scancel

Cancel submitted jobs in the queue:

scancel jobid

Common parameters of scancel

--help                 # Show the help information of the scancel command
-n <job_name>          # Cancel the jobs with the specified job name
-p <partition_name>    # Cancel the jobs in the specified partition
-t <job_state_name>    # Cancel the jobs in the specified state: PENDING, RUNNING, or SUSPENDED
-u <user_name>         # Cancel the jobs of the specified user
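
The scancel filters can also be combined, for example to withdraw only the jobs that have not started yet. The helper name cancel_pending below is hypothetical and the function is a sketch under that assumption, not a site-provided command.

```shell
# Cancel the pending jobs of one user (default: the current user) in one partition.
# $1 = partition name, $2 = optional user name
cancel_pending() {
    scancel -p "$1" -t PENDING -u "${2:-$(id -un)}"
}
```

Because scancel acts immediately, it may be worth previewing the selection first with squeue using the same -p, -t and -u values.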
