Cluster usage

The research group server is merged into the cluster resource pool:

Process:

1. Determine requirements. The laboratory purchases a model similar to the computing center's existing equipment (or contacts the computing center to confirm). The CPU, memory, and GPU should match the existing equipment.
2. Purchase. The equipment must be fitted with an IB (InfiniBand) card and its own IB cable.
3. Install. The device's local system must run CentOS 7.6, and the partitions and related configuration must match the existing equipment.
4. Join the computing center network image.
5. Merge into the cluster pool. So that the research group has priority on its own machines and earns machine-time income when others use them, the added devices are placed in a separate queue: the Slurm queue and its parameters are configured, all users in the institute are allowed access, the research group itself has priority, and the related services are started.
6. Finish.

Advantages: Hosted equipment that is merged into the public resource pool incurs no hosting fees. The research group has priority on its own equipment, with no time limit, and when other users of the center run tasks on that equipment, the usage is converted into machine-time income for the research group. At the same time, the center's shared computing resource pool grows.

Task submission rules:

Public resource pool

1. The cluster's default QoS (high) limits each user to 720 cores and 3600G of memory, which is equivalent to 20 nodes in the q_cn queue. This limit does not apply to the QoS of research groups that have joined the public resource pool.
1.1 If the resources used by a user's submitted tasks stay below the limit, the tasks run without queuing (the default limit policy is unchanged).
1.2 If the resources used by a user's submitted tasks reach the default limit, further tasks queue. A script checks once every minute whether the queue has free resources and whether other users are queuing.
1.2.1 If there are free resources, whether or not other users are queuing, the resource limit is lifted for queued users, and tasks run in order of submission as soon as their resource requirements can be met. The script keeps checking once every minute and restores the default resource limit when no free resources remain.
1.2.2 If there are no free resources and other users are queuing, the default policy is unchanged.
1.3 The default time limit for a submitted task is 2 days (48 hours). If a task needs longer, contact the administrator of the computing center to extend it. A single extension is 2 days (48 hours) by default. No extension is granted when tasks are queuing and no resources are free.
1.4 An email reminder is sent 3 hours before a task expires and again when it expires. To avoid sending batches of emails, tasks that fall due within the same 2.5 hours share a single reminder. Emails go to the address given on the account application form. On receiving a reminder, log in to the cluster to check which tasks are due, then contact the administrator with an explanation to extend the time. The email is only an auxiliary reminder: after submitting a task you should know its approximate expiration time yourself, because mail can be delayed or fail to arrive for other reasons.
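As a rough illustration of rule 1, the sketch below checks whether a new request would keep a user within the default per-user limit of 720 cores and 3600G. The function name within_default_qos and the idea of passing current usage as arguments are assumptions made for this example; the real limit is enforced by Slurm, not by user scripts. The example assumes the limit applies to a user's total usage rather than to each task separately. The commented commands use standard sbatch and squeue options, and job.sh is a hypothetical script name.

```shell
# Default per-user limits of the "high" QoS, taken from rule 1 above.
MAX_CORES=720
MAX_MEM_GB=3600

# Succeed (exit status 0) if current usage plus the new request stays within the limit.
# Arguments: cores in use, memory in use (GB), cores requested, memory requested (GB).
within_default_qos() {
    local used_cores="$1" used_mem="$2" req_cores="$3" req_mem="$4"
    [ $(( used_cores + req_cores )) -le "$MAX_CORES" ] &&
    [ $(( used_mem + req_mem )) -le "$MAX_MEM_GB" ]
}

# Submitting with the default 48-hour time limit stated explicitly (rule 1.3):
#   sbatch --partition=q_cn --qos=high --time=2-00:00:00 --ntasks=4 --mem=16G job.sh
#
# Checking the time left on your own tasks instead of relying on the email (rule 1.4):
#   squeue -u "$USER" -o "%.10i %.20j %.12L"
```

For example, within_default_qos 684 3420 36 180 succeeds because the totals land exactly on 720 cores and 3600G, while asking for one more core fails.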

Research group

A research group that joins the cluster resource pool gets an additional priority QoS, such as high_l or high_c. The naming rule is the default QoS name followed by the abbreviation of the surname of the research group's teacher: high_(abbreviation). To list the QoS values you own, run: sacctmgr show assoc. On the research group's own queue, the group's priority is higher than the default priority (the priority takes effect only when the queued task requests no more than the other tasks under the same conditions). Tasks that a user submits to other queues (not the group's own devices) have the default priority.

Remarks: To use the research group QoS, specify it explicitly with the -q parameter when submitting a task; otherwise the task runs under the default QoS. By default, the research group QoS places no per-user usage limit; if one is needed, contact the administrator of the computing center to set it.
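The naming rule and the -q parameter can be sketched as follows. The helper names group_qos and submit_command are hypothetical and exist only for this example, as is the script name job.sh; the sketch prints the sbatch command rather than running it. The name of the research group's own queue is not given here, so the example does not set a partition.

```shell
# Build a research group QoS name from the naming rule high_<abbreviation>.
group_qos() {
    printf 'high_%s\n' "$1"
}

# Print (do not run) the sbatch command for a job script.
# With an abbreviation, the research group QoS is requested via -q;
# without one, the task falls back to the default QoS.
submit_command() {
    local script="$1" abbrev="${2:-}"
    if [ -n "$abbrev" ]; then
        printf 'sbatch -q %s %s\n' "$(group_qos "$abbrev")" "$script"
    else
        printf 'sbatch %s\n' "$script"
    fi
}

# Listing the QoS values attached to your own account:
#   sacctmgr show assoc user="$USER" format=user,account,partition,qos
```

Calling submit_command job.sh l prints a command that requests high_l, and omitting the second argument prints a plain sbatch command that uses the default QoS.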

Remark:

If cluster resources are tight, split the resources you request into the smallest units your tasks allow, so that computing resources can be allocated sooner and calculations can start faster.
Run the restat command to view the resources in use on each node of the cluster. Resources can be allocated immediately, without waiting, only when the request fits within a node's remaining resources. For example:
CfgTRES=cpu=36, mem=191891M, billing=36 AllocTRES=cpu=16, mem=125G
CfgTRES=cpu=36 is the node's maximum number of cores. mem=191891M is the system's maximum memory, but the memory overhead of the system itself must be excluded; for the maximum available memory, see the "single node core number" column of the hardware configuration: https://hpc.cibr.ac.cn/index.php?m=&c=Hpc&a=hpcInfo&pid=148. AllocTRES shows the resources that have already been requested.
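The remaining resources of a node follow from subtracting AllocTRES from CfgTRES. The sketch below does that arithmetic for the sample line above. The function names to_mb, tres_field, and free_resources are hypothetical, the values are passed in as strings rather than read from restat, and the code assumes whole-number values with an M or G suffix (1G counted as 1024M). The result is an upper bound, since the system's own memory overhead is not subtracted.

```shell
# Convert a memory value such as 191891M or 125G to whole megabytes.
to_mb() {
    local value="$1"
    case "$value" in
        *G) echo $(( ${value%G} * 1024 )) ;;
        *M) echo "${value%M}" ;;
        *)  echo "${value:-0}" ;;   # assumption: a bare number is already in MB
    esac
}

# Extract one field (cpu or mem) from a TRES string like "cpu=36, mem=191891M".
tres_field() {
    local tres="$1" key="$2"
    echo "$tres" | tr -d ' ' | tr ',' '\n' | sed -n "s/^${key}=//p"
}

# Print the free cores and free memory (MB) of a node.
# Arguments: the CfgTRES value and the AllocTRES value.
free_resources() {
    local cfg="$1" alloc="$2" cfg_cpu alloc_cpu cfg_mem alloc_mem
    cfg_cpu=$(tres_field "$cfg" cpu)
    alloc_cpu=$(tres_field "$alloc" cpu)
    cfg_mem=$(to_mb "$(tres_field "$cfg" mem)")
    alloc_mem=$(to_mb "$(tres_field "$alloc" mem)")
    # Missing fields (for example on an idle node) count as zero.
    echo "free_cpu=$(( ${cfg_cpu:-0} - ${alloc_cpu:-0} )) free_mem_mb=$(( cfg_mem - alloc_mem ))"
}

free_resources "cpu=36, mem=191891M, billing=36" "cpu=16, mem=125G"
# prints: free_cpu=20 free_mem_mb=63891
```

On this node, a request for up to 20 cores and somewhat less than 63891M of memory could be placed without waiting.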

The resource limits above are intended to let every user make good use of the cluster's computing resources and to reduce queuing. Please send us timely feedback on any problems or suggestions that come up during use; we will adjust and optimize the policy as needed.
