Introduction and Installation
BioNote 2021-10-15
SGE
Computing Cluster
# References
- http://www.softpanorama.org/HPC/Grid_engine/Queues/queue_states.shtml
- http://www.chenlianfu.com/?p=2441
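# Basic concepts
The hosts in the cluster are of two kinds: the control node (master) and the compute nodes (slaves). The control node is deployed on a single machine, which also acts as a compute node; all other hosts are compute nodes.
Computing resources are made up of the slots on each host. A subset of the cluster's hosts can be grouped into a host group.
A queue is a container for the cluster's computing resources. For example, the queue named all.q covers all computing resources in the cluster.
To keep certain users from using all of the cluster's resources, define a new queue that can use only part of them.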
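The following sketch illustrates host groups and restricted queues. It writes a host group file for `qconf -Ahgrp`, then derives a new queue definition from an existing one for `qconf -Aq`.

- The group `@smallhosts`, the queue `small.q` and the member hosts are hypothetical.
- The attribute layout is assumed to match what `qconf -shgrp` and `qconf -sq` print on SGE 6.2-era releases, so compare it with your own output first.
- Restricting users, rather than hosts, would also require setting the queue's `user_lists`, which is not shown here.

```shell
# Sketch: a host group plus a queue limited to it.
# Hypothetical names: @smallhosts, small.q, test2, test3.

# Write a host group definition in the format read by `qconf -Ahgrp FILE`.
write_hostgroup() {
  local file="$1" group="$2"
  shift 2
  {
    printf 'group_name %s\n' "$group"
    printf 'hostlist %s\n' "$*"    # member hosts, space separated
  } > "$file"
}

# Read a queue definition (e.g. from `qconf -sq all.q`) on stdin and print a
# copy with a new qname whose hostlist is replaced by the given host group.
derive_queue() {
  local qname="$1" group="$2"
  awk -v q="$qname" -v g="$group" '
    $1 == "qname"    { print "qname", q; next }
    $1 == "hostlist" { print "hostlist", g; skip = 1; next }
    skip && /^[ \t]/ { next }        # drop wrapped continuation lines of the old hostlist
    { skip = 0; print }
  '
}

# Usage on the master node:
#   write_hostgroup /tmp/smallhosts.hgrp @smallhosts test2 test3
#   qconf -Ahgrp /tmp/smallhosts.hgrp
#   qconf -sq all.q | derive_queue small.q @smallhosts > /tmp/small.q.conf
#   qconf -Aq /tmp/small.q.conf
```

Cloning all.q keeps its other attributes unchanged. Per-host overrides such as `[test1=32]` in the copied file may need cleaning up by hand.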
To run parallel jobs on an SGE cluster, a parallel environment (PE) has to be configured.
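Below is a minimal sketch of a shared-memory PE.

- It assumes the field set printed by `qconf -sp` on SGE 6.2-era releases.
- The name `smp`, the slot limit and `job.sh` are hypothetical.
- `allocation_rule $pe_slots` places all slots of a job on one host, which suits multithreaded programs.

```shell
# Sketch: define a shared-memory parallel environment (hypothetical name "smp").

# Write a PE definition in the layout shown by `qconf -sp` (SGE 6.2-era fields).
write_smp_pe() {
  local file="$1" name="${2:-smp}" slots="${3:-9999}"
  cat > "$file" <<EOF
pe_name            $name
slots              $slots
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    \$pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
EOF
}

# Usage on the master node:
#   write_smp_pe /tmp/smp.pe smp
#   qconf -Ap /tmp/smp.pe                   # register the PE
#   qconf -aattr queue pe_list smp all.q    # allow it in all.q
#   qsub -pe smp 8 job.sh                   # request 8 slots on one host
```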
# Installation
# Preparation
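- Change the cluster ports to less commonly used ones (the new numbers must match SGE_QMASTER_PORT and SGE_EXECD_PORT in the configuration file)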
sed -i 's#6444/tcp#27100/tcp#g' /etc/services
sed -i 's#6444/udp#27100/udp#g' /etc/services
sed -i 's#6445/tcp#27101/tcp#g' /etc/services
sed -i 's#6445/udp#27101/udp#g' /etc/services
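The same change can be scripted with a backup and a check afterwards.

- This sketch assumes /etc/services already contains `sge_qmaster` and `sge_execd` entries on 6444/6445. If they are missing, add them by hand instead.
- Unlike the plain sed commands above, the pattern only matches a port preceded by whitespace. An unrelated entry such as 16444/tcp is therefore not rewritten.
- The function names are hypothetical.

```shell
# Sketch: move the SGE service ports in a services file, then verify.
# Assumes entries like "sge_qmaster  6444/tcp" and "sge_execd  6445/tcp" exist.

remap_sge_ports() {
  local file="${1:-/etc/services}"
  cp -p "$file" "$file.bak" || return 1   # keep a backup before editing in place
  # Only match a port preceded by whitespace, so 16444/tcp etc. are untouched.
  # sed back-references are single digits: "\127100" means "\1" then "27100".
  sed -i -E \
    -e 's#([[:space:]])6444/(tcp|udp)#\127100/\2#g' \
    -e 's#([[:space:]])6445/(tcp|udp)#\127101/\2#g' \
    "$file"
}

# Succeeds only if both services now point at the new TCP ports.
check_sge_ports() {
  local file="${1:-/etc/services}"
  grep -Eq '^sge_qmaster[[:space:]]+27100/tcp' "$file" &&
    grep -Eq '^sge_execd[[:space:]]+27101/tcp' "$file"
}

# Usage (as root): remap_sge_ports /etc/services && check_sge_ports /etc/services
```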
- Add name resolution entries for the nodes to /etc/hosts
cat /etc/hosts
172.168.1.6 test1
172.168.1.7 test2
172.168.1.8 test3
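Every node needs the same entries. The sketch below (function name hypothetical) checks that each cluster host name appears exactly once in a hosts file. On a live node, `getent hosts test2` additionally shows what the resolver actually returns.

```shell
# Sketch: check that each given host name appears exactly once in a hosts file.
# Comment lines and trailing "# ..." comments are ignored.

check_hosts_file() {
  local file="$1" h n rc=0
  shift
  for h in "$@"; do
    # Count matches among the name fields (2..NF) of non-comment lines.
    n=$(awk -v h="$h" '
      /^[ \t]*#/ { next }
      { for (i = 2; i <= NF && $i !~ /^#/; i++) if ($i == h) c++ }
      END { print c + 0 }' "$file")
    if [ "$n" -ne 1 ]; then
      echo "$h appears $n times in $file" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage (run on every node): check_hosts_file /etc/hosts test1 test2 test3
```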
- Set the SGE root directory
The SGE installation directory must be stored in an environment variable:
export SGE_ROOT=/ssd/01.software/02.sge/ge2011
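An `export` only lasts for the current shell. One common way to make SGE_ROOT permanent on every node is a login script such as /etc/profile.d/sge.sh. That path is an assumption, not something the installer creates.

The sketch also sources `$SGE_ROOT/<cell>/common/settings.sh`. The qmaster installation generates this file for the cell named by CELL_NAME (`default` here).

```shell
# Sketch: write a login snippet that sets SGE_ROOT and SGE_CELL on a node.
# Assumption: it is installed as /etc/profile.d/sge.sh (path not created by SGE).

write_sge_profile() {
  local out="$1" root="$2" cell="${3:-default}"
  cat > "$out" <<EOF
export SGE_ROOT="$root"
export SGE_CELL="$cell"
# settings.sh is generated by the qmaster installation; load it when present
if [ -f "\$SGE_ROOT/\$SGE_CELL/common/settings.sh" ]; then
  . "\$SGE_ROOT/\$SGE_CELL/common/settings.sh"
fi
EOF
}

# Usage (as root, on every node):
#   write_sge_profile /etc/profile.d/sge.sh /ssd/01.software/02.sge/ge2011 default
```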
# Master node setup
Log in to the master node, change into the $SGE_ROOT directory, run ./install_qmaster, and follow the prompts.
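Before starting the interactive installer, it can help to confirm the basics. The sketch below is a hypothetical helper that checks three things:

- SGE_ROOT is set.
- install_qmaster is present and executable.
- The services file maps sge_qmaster to the 27100/tcp port chosen above.

```shell
# Sketch: pre-flight checks before ./install_qmaster (hypothetical helper).
# Assumes the qmaster port 27100 chosen in the preparation step.

preflight_qmaster() {
  local services="${1:-/etc/services}"
  if [ -z "${SGE_ROOT:-}" ]; then
    echo "SGE_ROOT is not set" >&2
    return 1
  fi
  if [ ! -x "$SGE_ROOT/install_qmaster" ]; then
    echo "$SGE_ROOT/install_qmaster is missing or not executable" >&2
    return 1
  fi
  if ! grep -Eq '^sge_qmaster[[:space:]]+27100/tcp' "$services"; then
    echo "sge_qmaster is not mapped to 27100/tcp in $services" >&2
    return 1
  fi
}

# Usage: preflight_qmaster && cd "$SGE_ROOT" && ./install_qmaster
```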
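# Installing execution nodes with a script
# Prerequisites
- /etc/hosts is configured correctly
- Passwordless SSH is set up (or NIS is used instead)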
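Passwordless login can be verified before running the automatic installer, which uses the remote shell named by SHELL_NAME in the configuration file.

- The sketch relies on two standard OpenSSH options: `BatchMode=yes` (fail instead of prompting for a password) and `ConnectTimeout`.
- The `SSH` override variable exists only so the function can be tested. It is not an SGE setting.

```shell
# Sketch: confirm that each node accepts SSH without a password prompt.
# SSH (hypothetical override, defaults to "ssh") exists only for testing.

check_passwordless_ssh() {
  local ssh_cmd="${SSH:-ssh}" h rc=0
  for h in "$@"; do
    # BatchMode=yes makes ssh fail instead of asking for a password.
    if ! "$ssh_cmd" -o BatchMode=yes -o ConnectTimeout=5 "$h" true </dev/null; then
      echo "passwordless ssh to $h failed" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage (from the host that runs inst_sge): check_passwordless_ssh test1 test2 test3
```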
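# Edit the configuration file
Set all three HOST_LIST entries to the node(s) being added, separating multiple nodes with spaces. The full configuration file is listed at the end of this article.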
# A List of Host which should become admin hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
ADMIN_HOST_LIST="test1"
# A List of Host which should become submit hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
SUBMIT_HOST_LIST="test1"
# A List of Host which should become exec hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
# (mandatory for execution host installation)
EXEC_HOST_LIST="test1"
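The three lists can also be edited non-interactively. Below is a sketch using sed; the function name is hypothetical. Anchoring on `^KEY=` leaves the separate EXEC_HOST_LIST_RM entry untouched.

```shell
# Sketch: point ADMIN_HOST_LIST, SUBMIT_HOST_LIST and EXEC_HOST_LIST in an
# inst.conf-style file at the same space-separated host list (hypothetical helper).

set_host_lists() {
  local conf="$1"
  shift
  local hosts="$*" key
  for key in ADMIN_HOST_LIST SUBMIT_HOST_LIST EXEC_HOST_LIST; do
    # "^KEY=" does not match EXEC_HOST_LIST_RM=, so that line is left alone.
    sed -i "s#^${key}=.*#${key}=\"${hosts}\"#" "$conf" || return 1
  done
}

# Usage: set_host_lists "$SGE_ROOT/inst.conf" test2 test3
```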
# Run the installation
./inst_sge -x -auto /ssd/01.software/02.sge/ge2011/inst.conf
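Because `-auto` runs without prompting, a wrong value only shows up partway through the installation. The template marks three keys as required here:

- SGE_ROOT and CELL_NAME are mandatory for execd installation.
- EXEC_HOST_LIST is mandatory for execution host installation.

The sketch below (hypothetical helpers) checks that those three keys are neither empty nor still a "Please ..." placeholder.

```shell
# Sketch: check the keys the template marks as mandatory for an execution
# host install before running inst_sge -x -auto (hypothetical helpers).

# Print the value of KEY="value" from an inst.conf-style file; last match wins.
conf_value() {
  sed -n "s/^$2=\"\(.*\)\"$/\1/p" "$1" | tail -n 1
}

check_execd_conf() {
  local conf="$1" key val rc=0
  for key in SGE_ROOT CELL_NAME EXEC_HOST_LIST; do
    val=$(conf_value "$conf" "$key")
    case "$val" in
      "" | Please*)
        echo "$key is empty or still a placeholder in $conf" >&2
        rc=1
        ;;
    esac
  done
  return "$rc"
}

# Usage:
#   check_execd_conf /ssd/01.software/02.sge/ge2011/inst.conf &&
#     ./inst_sge -x -auto /ssd/01.software/02.sge/ge2011/inst.conf
```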
# Configure node information
The main setting to change is complex_values:
qconf -me test1
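`qconf -me test1` opens the host definition in an editor. Its `complex_values` line holds per-host capacities written as comma-separated `name=value` pairs, for example `slots=32,h_vmem=120G` (values hypothetical).

To review what is currently set, `qconf -se test1` prints the same definition. The helper below is a sketch that lists those entries one per line.

```shell
# Sketch: list the complex_values of an execution host one entry per line.
# Input is the host definition as printed by `qconf -se HOST`; a line wrapped
# with a trailing backslash is not handled here.

show_complex_values() {
  awk '$1 == "complex_values" { print $2 }' | tr ',' '\n'
}

# Usage: qconf -se test1 | show_complex_values
# A possible non-interactive edit (check qconf(1) for your version first):
#   qconf -mattr exechost complex_values slots=32 test1
```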
# Uninstall an execution node
./inst_sge -ux -host test1
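Uninstalling a host that still runs jobs can interrupt them; the template notes that RESCHEDULE_JOBS is not implemented yet. Below is a sketch of draining the host first:

1. `qmod -d` disables its queue instances.
2. `qstat -s r -q` is polled until no running jobs remain.
3. Only then is `inst_sge -ux` called.

The function name and the 60-second polling interval are arbitrary choices.

```shell
# Sketch: stop scheduling to a host, wait for its running jobs to finish,
# then uninstall its execution daemon (hypothetical helper).

drain_and_uninstall() {
  local host="$1" interval="${2:-60}"
  qmod -d "*@${host}" || return 1   # disable every queue instance on the host
  # qstat prints nothing when no running jobs match the queue filter
  while [ -n "$(qstat -s r -q "*@${host}")" ]; do
    sleep "$interval"
  done
  ( cd "$SGE_ROOT" && ./inst_sge -ux -host "$host" )
}

# Usage: drain_and_uninstall test1
```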
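# Administration commands
qconf -ae hostname        # add an execution host
qconf -de hostname        # delete an execution host
qconf -sel                # show the list of execution hosts
qconf -ah hostname        # add an administrative host
qconf -dh hostname        # delete an administrative host
qconf -sh                 # show the list of administrative hosts
qconf -as hostname        # add a submit host
qconf -ds hostname        # delete a submit host
qconf -ss                 # show the list of submit hosts
qconf -ahgrp groupname    # add a host group
qconf -mhgrp groupname    # modify a host group
qconf -shgrp groupname    # show the members of a host group
qconf -shgrpl             # show the list of host groups
qconf -aq queuename       # add a cluster queue
qconf -dq queuename       # delete a cluster queue
qconf -mq queuename       # modify a cluster queue configuration
qconf -sq queuename       # show a cluster queue configuration
qconf -sql                # show the list of cluster queues
qconf -ap PE_name         # add a parallel environment
qconf -mp PE_name         # modify a parallel environment
qconf -dp PE_name         # delete a parallel environment
qconf -sp PE_name         # show a parallel environment
qconf -spl                # show the list of parallel environment names
qstat -f                  # show the status of execution hosts (full queue listing)
qstat -u user             # show a user's jobs
qhost                     # show resource information for execution hosts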
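In `qstat -f` output, each queue instance line ends with an optional states column. Common flags, as described in the softpanorama reference above, are:

- `a`: alarm
- `u`: unknown, e.g. execd unreachable
- `d`: disabled
- `E`: error

This sketch prints only the instances that carry a flag. It assumes queue lines have five columns before the states column.

```shell
# Sketch: from `qstat -f` output, print "queue@host states" for every queue
# instance that has a state flag set. Job lines (first field is a job ID) and
# header lines are skipped because their first field contains no "@".

flagged_queues() {
  awk '$1 ~ /@/ && NF >= 6 { print $1, $NF }'
}

# Usage: qstat -f | flagged_queues
```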
# Note
The resource (complex) configuration must be set up correctly before the cluster can be used normally. Edit it with:
qconf -mc
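A common change in the complex table is making memory consumable, so that the scheduler subtracts each job's request from the host's capacity. The capacity itself is set per host in complex_values with `qconf -me`.

Instead of using the editor, the table can be dumped with `qconf -sc`, edited, and loaded back with `qconf -Mc`.

- The sketch assumes the 8-column layout `name shortcut type relop requestable consumable default urgency`. Some forks add columns, so check your own `qconf -sc` header.
- The default of 1G is only an example.

```shell
# Sketch: mark a complex (e.g. h_vmem) as consumable with a default request.
# Reads the table as printed by `qconf -sc` on stdin and writes it to stdout.
# Assumed columns: name shortcut type relop requestable consumable default urgency

make_consumable() {
  local name="$1" default="$2"
  awk -v n="$name" -v d="$default" '
    $1 == n { $6 = "YES"; $7 = d }   # the edited line is rewritten with single spaces
    { print }'
}

# Usage (hypothetical default of 1G per job):
#   qconf -sc | make_consumable h_vmem 1G > /tmp/complex.conf
#   qconf -Mc /tmp/complex.conf
```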
# Configuration file
# Oct 11, 2019: Modify from util/install_modules/inst_template.conf
#-------------------------------------------------
# Huawei SGE configuration file
#-------------------------------------------------
# Use always fully qualified pathnames, please
# SGE_ROOT Path, this is basic information
#(mandatory for qmaster and execd installation)
SGE_ROOT="/ssd/01.software/02.sge/ge2011"
# SGE_QMASTER_PORT is used by qmaster for communication
# Please enter the port in this way: 1300
# Please do not enter it like this: 1300/tcp
#(mandatory for qmaster installation)
SGE_QMASTER_PORT="27100"
# SGE_EXECD_PORT is used by execd for communication
# Please enter the port in this way: 1300
# Please do not enter it like this: 1300/tcp
#(mandatory for qmaster installation)
SGE_EXECD_PORT="27101"
# SGE_ENABLE_SMF
# if set to false SMF will not control SGE services
SGE_ENABLE_SMF="false"
# SGE_ENABLE_ST
# if set to false Sun Service Tags will not be used
SGE_ENABLE_ST="true"
# SGE_CLUSTER_NAME
# Name of this cluster (used by SMF as a service instance name)
SGE_CLUSTER_NAME="Please enter cluster name"
# SGE_JMX_PORT is used by qmasters JMX MBean server
# mandatory if install_qmaster -jmx -auto <cfgfile>
# range: 1024-65500
SGE_JMX_PORT="Please enter port"
# SGE_JMX_SSL is used by qmasters JMX MBean server
# if SGE_JMX_SSL=true, the mbean server connection uses
# SSL authentication
SGE_JMX_SSL="false"
# SGE_JMX_SSL_CLIENT is used by qmasters JMX MBean server
# if SGE_JMX_SSL_CLIENT=true, the mbean server connection uses
# SSL authentication of the client in addition
SGE_JMX_SSL_CLIENT="false"
# SGE_JMX_SSL_KEYSTORE is used by qmasters JMX MBean server
# if SGE_JMX_SSL=true the server keystore found here is used
# e.g. /var/sgeCA/port<sge_qmaster_port>/<sge_cell>/private/keystore
SGE_JMX_SSL_KEYSTORE="Please enter absolute path of server keystore file"
# SGE_JMX_SSL_KEYSTORE_PW is used by qmasters JMX MBean server
# password for the SGE_JMX_SSL_KEYSTORE file
SGE_JMX_SSL_KEYSTORE_PW="Please enter the server keystore password"
# SGE_JVM_LIB_PATH is used by qmasters jvm thread
# path to libjvm.so
# if value is missing or set to "none" JMX thread will not be installed
# when the value is empty or path does not exist on the system, Grid Engine
# will try to find a correct value, if it cannot do so, value is set to
# "jvmlib_missing" and JMX thread will be configured but will fail to start
SGE_JVM_LIB_PATH="Please enter absolute path of libjvm.so"
# SGE_ADDITIONAL_JVM_ARGS is used by qmasters jvm thread
# jvm specific arguments as -verbose:jni etc.
# optional, can be empty
SGE_ADDITIONAL_JVM_ARGS="-Xmx256m"
# CELL_NAME, will be a dir in SGE_ROOT, contains the common dir
# Please enter only the name of the cell. No path, please
#(mandatory for qmaster and execd installation)
CELL_NAME="default"
# ADMIN_USER, if you want to use a different admin user than the owner,
# of SGE_ROOT, you have to enter the user name, here
# Leaving this blank, the owner of the SGE_ROOT dir will be used as admin user
ADMIN_USER=""
# The dir, where qmaster spools this parts, which are not spooled by DB
#(mandatory for qmaster installation)
QMASTER_SPOOL_DIR="Please, enter spooldir"
# The dir, where the execd spools (active jobs)
# This entry is needed, even if you are going to use
# berkeley db spooling. Only cluster configuration and jobs will
# be spooled in the database. The execution daemon still needs a spool
# directory
#(mandatory for qmaster installation)
EXECD_SPOOL_DIR="Please, enter spooldir"
# For monitoring and accounting of jobs, every job will get
# unique GID. So you have to enter a free GID Range, which
# is assigned to each job running on a machine.
# If you want to run 100 Jobs at the same time on one host you
# have to enter a GID-Range like that: 16000-16100
#(mandatory for qmaster installation)
GID_RANGE="Please, enter GID range"
# If SGE is compiled with -spool-dynamic, you have to enter here, which
# spooling method should be used. (classic or berkeleydb)
#(mandatory for qmaster installation)
SPOOLING_METHOD="berkeleydb"
# Name of the Server, where the Spooling DB is running on
# if spooling method is berkeleydb, it must be "none", when
# using no spooling server and it must contain the servername
# if a server should be used. In case of "classic" spooling,
# can be left out
DB_SPOOLING_SERVER="none"
# The dir, where the DB spools
# If berkeley db spooling is used, it must contain the path to
# the spooling db. Please enter the full path. (eg. /tmp/data/spooldb)
# Remember, this directory must be local on the qmaster host or on the
# Berkeley DB Server host. No NFS mount, please
DB_SPOOLING_DIR="spooldb"
# This parameter set the number of parallel installation processes.
# To prevent a system overload, or exceeding the number of open file
# descriptors the user can limit the number of parallel install processes.
# eg. set PAR_EXECD_INST_COUNT="20", maximum 20 parallel execd are installed.
PAR_EXECD_INST_COUNT="20"
# A List of Host which should become admin hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
ADMIN_HOST_LIST="test1"
# A List of Host which should become submit hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
SUBMIT_HOST_LIST="test1"
# A List of Host which should become exec hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
# (mandatory for execution host installation)
EXEC_HOST_LIST="test1"
# The dir, where the execd spools (local configuration)
# If you want configure your execution daemons to spool in
# a local directory, you have to enter this directory here.
# If you do not want to configure a local execution host spool directory
# please leave this empty
EXECD_SPOOL_DIR_LOCAL=""
# If true, the domainnames will be ignored, during the hostname resolving
# if false, the fully qualified domain name will be used for name resolving
HOSTNAME_RESOLVING="true"
# Shell, which should be used for remote installation (rsh/ssh)
# This is only supported, if your hosts and rshd/sshd is configured,
# not to ask for a password, or prompting any message.
SHELL_NAME="ssh"
# This remote copy command is used for csp installation.
# The script needs the remote copy command for distributing
# the csp certificates. Using ssl the command scp has to be entered,
# using the not so secure rsh the command rcp has to be entered.
# Both need a passwordless ssh/rsh connection to the hosts, which
# should be connected to. (mandatory for csp installation mode)
COPY_COMMAND="scp"
# Enter your default domain, if you are using /etc/hosts or NIS configuration
DEFAULT_DOMAIN="none"
# If a job stops, fails or finishes, a mail can be sent to this address
ADMIN_MAIL="none"
# If true, the rc scripts (sgemaster, sgeexecd, sgebdb) will be added,
# to start automatically during boottime
ADD_TO_RC="false"
#If this is "true" the file permissions of executables will be set to 755
#and of ordinary files to 644.
SET_FILE_PERMS="true"
# This option is not implemented, yet.
# When an exechost should be uninstalled, the running jobs will be rescheduled
RESCHEDULE_JOBS="wait"
# Enter one of the three distributed scheduler tuning configuration sets
# (1=normal, 2=high, 3=max)
SCHEDD_CONF="1"
# The name of the shadow host. This host must have read/write permission
# to the qmaster spool directory
# If you want to setup a shadow host, you must enter the servername
# (mandatory for shadowhost installation)
SHADOW_HOST=""
# Remove this execution hosts in automatic mode
# (mandatory for uninstallation of execution hosts)
EXEC_HOST_LIST_RM=""
# This option is used for startup script removing.
# If true, all rc startup scripts will be removed during
# automatic deinstallation. If false, the scripts won't
# be touched.
# (mandatory for uninstallation of execution/qmaster hosts)
REMOVE_RC="true"
# This is a Windows specific part of the auto installation template
# If you are going to install Windows execution hosts, you have to enable the
# windows support. To do this, please set the WINDOWS_SUPPORT variable
# to "true". ("false" is disabled)
# (mandatory for qmaster installation, by default WINDOWS_SUPPORT is
# disabled)
WINDOWS_SUPPORT="false"
# Enabling the WINDOWS_SUPPORT, recommends the following parameter.
# The WIN_ADMIN_NAME will be added to the list of SGE managers.
# Without adding the WIN_ADMIN_NAME the execution host installation
# won't install correctly.
# WIN_ADMIN_NAME is set to "Administrator" which is default on most
# Windows systems. In some cases the WIN_ADMIN_NAME can be prefixed with
# the windows domain name (eg. DOMAIN+Administrator)
# (mandatory for qmaster installation, if windows hosts should be installed)
WIN_ADMIN_NAME="Administrator"
# This parameter is used to switch between local ADMINUSER and Windows
# Domain Adminuser. Setting the WIN_DOMAIN_ACCESS variable to true, the
# Adminuser will be a Windows Domain User. It is recommended that
# a Windows Domain Server is configured and the Windows Domain User is
# created. Setting this variable to false, the local Adminuser will be
# used as ADMINUSER. The install script tries to create this user account
# but we recommend, because it will be safer, to create this user,
# before running the installation.
# (mandatory for qmaster installation, if windows hosts should be installed)
WIN_DOMAIN_ACCESS="false"
# This section is used for csp installation mode.
# CSP_RECREATE recreates the certs on each installation, if true.
# In case of false, the certs will be created, if not existing.
# Existing certs won't be overwritten. (mandatory for csp install)
CSP_RECREATE="true"
# The created certs won't be copied, if this option is set to false
# If true, the script tries to copy the generated certs. This
# requires passwordless ssh/rsh access for user root to the
# execution hosts
CSP_COPY_CERTS="false"
# csp information, your country code (only 2 characters)
# (mandatory for csp install)
CSP_COUNTRY_CODE="DE"
# your state (mandatory for csp install)
CSP_STATE="Germany"
# your location, eg. the building (mandatory for csp install)
CSP_LOCATION="Building"
# your organisation (mandatory for csp install)
CSP_ORGA="Organisation"
# your organisation unit (mandatory for csp install)
CSP_ORGA_UNIT="Organisation_unit"
# your email (mandatory for csp install)
CSP_MAIL_ADDRESS="name@yourdomain.com"