How can I change the Condor setup so that users don't have to chmod their experiment scripts?



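Currently, users of the Condor cluster have to run:

chmod a+x /home/user/automl-meta-learning/results_plots/main.py

before they can run their scripts with condor_submit. How can the Condor setup be changed so that users no longer need to do this?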


Here is an example of a user's submit script:

####################
#
# Experiments script
# Simple HTCondor submit description file
#
# chmod a+x test_condor.py
# chmod a+x experiments_meta_model_optimization.py
# chmod a+x meta_learning_experiments_submission.py
# chmod a+x download_miniImagenet.py
# chmod a+x ~/meta-learning-lstm-pytorch/main.py
# chmod a+x /home/user/automl-meta-learning/automl-proj/meta_learning/datasets/rand_fc_nn_vec_mu_ls_gen.py
# chmod a+x /home/user/automl-meta-learning/automl-proj/experiments/meta_learning/supervised_experiments_submission.py
# chmod a+x /home/user/automl-meta-learning/results_plots/main.py
# condor_submit -i
# condor_submit job.sub
#
####################
# Executable = /home/user/automl-meta-learning/automl-proj/experiments/meta_learning/supervised_experiments_submission.py
Executable = /home/user/automl-meta-learning/automl-proj/experiments/meta_learning/meta_learning_experiments_submission.py
# Executable = /home/user/meta-learning-lstm-pytorch/main.py
# Executable = /home/user/automl-meta-learning/automl-proj/meta_learning/datasets/rand_fc_nn_vec_mu_ls_gen.py
## Output Files
Log          = experiment_output_job.$(CLUSTER).log.out
Output       = experiment_output_job.$(CLUSTER).out.out
Error        = experiment_output_job.$(CLUSTER).err.out
# Use this to make sure 1 gpu is available. The key words are case insensitive.
REquest_gpus = 1
requirements = (CUDADeviceName != "Tesla K40m")
# requirements = (CUDADeviceName == "Quadro RTX 6000")
# requirements = ((CUDADeviceName = "Tesla K40m")) && (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && (TARGET.Cpus >= RequestCpus) && (TARGET.gpus >= Requestgpus) && ((TARGET.FileSystemDomain == MY.FileSystemDomain) || (TARGET.HasFileTransfer))
# requirements = (CUDADeviceName == "Tesla K40m")
# requirements = (CUDADeviceName == "GeForce GTX TITAN X")
# Note: to use multiple CPUs instead of the default (one CPU), use request_cpus as well
Request_cpus = 4
# Request_cpus = 16
# E-mail option
Notify_user = me@gmail.com
Notification = always
Environment = MY_CONDOR_JOB_ID= $(CLUSTER)
# "Queue" means add the setup until this line to the queue (needs to be at the end of script).
Queue

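The execute bit is only needed if you want to run the .py file directly. That is a Linux/Unix convention and has little to do with HTCondor. On the command line, if you want to run

$ ./foo.py

then foo.py needs to have the executable bit set. To work around this, you can pass foo.py as an argument to python and run

$ python foo.py

in which case the executable bit does not need to be set. To emulate this in HTCondor, set /usr/bin/python or /usr/bin/python3 as the executable and foo.py as its argument, e.g.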

executable = /usr/bin/python
arguments = foo.py
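
Applied to the submit file in the question, this could look like the sketch below. It assumes that python3 is installed at /usr/bin/python3 on the execute nodes (a conda or virtualenv interpreter would live at a different path) and that /home/user is on the shared filesystem. Only the Executable line changes and an Arguments line is added; the rest of the file stays as it was.

```
# Run the interpreter, which is already executable, and hand it the script,
# so the script itself needs no execute bit.
# Assumption: /usr/bin/python3 exists on every execute node.
Executable = /usr/bin/python3
Arguments  = /home/user/automl-meta-learning/automl-proj/experiments/meta_learning/meta_learning_experiments_submission.py
```

With this form the script's `#!` line is ignored; the interpreter named in Executable is the one that runs.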

This all assumes you have a shared filesystem. If you are using HTCondor's file transfer to send data to the worker node, you'll need a few more lines.
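
A minimal sketch of what those extra lines might be, assuming the script is foo.py in the submit directory and that the execute nodes have their own /usr/bin/python3 (both are assumptions for illustration):

```
executable              = /usr/bin/python3
# Use the interpreter installed on the execute node instead of
# copying the one from the submit machine.
transfer_executable     = false
arguments               = foo.py
should_transfer_files   = YES
# foo.py is copied into the job's scratch directory, which is also
# the job's working directory, so the relative name in arguments resolves.
transfer_input_files    = foo.py
when_to_transfer_output = ON_EXIT
```

Without transfer_executable = false, HTCondor would try to ship the submit machine's python binary along with the job, which is usually not what you want.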
