r - Load average climbs to 1000+ after adding an Rscript call to my Slurm job

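I submit jobs with Slurm, and at first everything worked fine. After I added an Rscript step for some simple filtering, the system load average suddenly rose to 1000+, which is highly abnormal. I have searched Google but found nothing. My code is shown below: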

#!/bin/bash
#SBATCH --job-name=gtool
#SBATCH --partition=Compute
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH -a 1-22
for file in output/impute2/data_chr"${SLURM_ARRAY_TASK_ID}".*impute2
do
echo "$file" start!
# file prefix
foo=$(echo "$file" | awk -F "/" '{print $NF}' | awk -F . '{print $1"."$2}')
# use R for subset ID
Rscript src/detect.impute.snp.r "$file"
# gtool subset
gtool -S \
--g "$file" \
--s output/pre_phasing/chr"${SLURM_ARRAY_TASK_ID}".sample \
--og output/impute2_subset/"$foo".gen \
--inclusion output/impute2_subset/"$foo".SNPID.txt
# gtool GEN to PED
gtool -G \
--g output/impute2_subset/"$foo".gen \
--s output/pre_phasing/chr"${SLURM_ARRAY_TASK_ID}".sample \
--ped output/impute2_subset_2_PLINK/"$foo".impute2.ped \
--map output/impute2_subset_2_PLINK/"$foo".impute2.map \
--chr "${SLURM_ARRAY_TASK_ID}" \
--snp
echo "$file" fin!
done

R script:

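options(tidyverse.quiet = TRUE)
options(readr.show_col_types = FALSE)
library("tidyverse")
# Path of the impute2 file, passed in by the submission script
args <- commandArgs(trailingOnly = TRUE)
fn <- args[1]
# Read only column 2 (SNP ID) and columns 4 and 5 (the two alleles)
d <- read_delim(fn,
col_names = FALSE,
delim = " ",
col_select = c(2, 4, 5))
# File name without its directory and without the last 8 characters (".impute2")
fn.out <- str_sub(last(str_split(fn, "/")[[1]]), 1, -9)
# Keep SNPs whose two alleles are both one character long, then write their IDs
d %>% mutate(len1 = nchar(X4),
len2 = nchar(X5)) %>%
arrange(desc(X4), desc(X5)) %>%
filter(len1 == 1, len2 == 1) %>%
select(X2) %>%
write_tsv(file = str_c("output/impute2_subset/", fn.out, ".SNPID.txt"),
col_names = FALSE)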
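For comparison, the same filter (keep column 2 where columns 4 and 5 are each a single character) can be sketched in awk, which avoids starting R and loading tidyverse once per file. This is a swapped-in technique, not the poster's method; the helper name filter_snps is hypothetical, and it assumes the impute2 file is whitespace-separated and that gtool's --inclusion list does not depend on row order, since the R version sorts by allele before filtering while this keeps the input order.

```shell
# Hypothetical awk replacement for src/detect.impute.snp.r (a sketch, not the original method).
# Prints column 2 (SNP ID) of every row whose columns 4 and 5 (alleles) are one character long.
# Unlike the R script, rows keep their input order (the R version sorts by allele first).
filter_snps() {
    local in="$1" out="$2"
    awk 'length($4) == 1 && length($5) == 1 { print $2 }' "$in" > "$out"
}
```

Inside the loop it would be called as filter_snps "$file" output/impute2_subset/"$foo".SNPID.txt in place of the Rscript line.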

scontrol also shows that my job is allocated only one CPU per task:

JobId=4873 ArrayJobId=4872 ArrayTaskId=1 JobName=gtool
......
NodeList=localhost
BatchHost=localhost
NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
......

R and gtool both run single-threaded, I don't pass any thread options, and --ntasks is set to 1. So where is the problem?

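Some of the libraries used by R and/or gtool, such as MKL, BLIS or OpenBLAS, may be configured system-wide to use all cores of the node, without detecting that Slurm has allocated only one CPU. You can try adding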

export OMP_NUM_THREADS=1
export BLIS_NUM_THREADS=1
export MKL_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1

to the submission script, before the for loop.
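If the job later requests more than one CPU, a variant is to take the thread count from the allocation instead of hard-coding 1. A minimal sketch, assuming the job sets --cpus-per-task when it wants more threads: Slurm only exports SLURM_CPUS_PER_TASK in that case, so the sketch falls back to 1 otherwise. The helper name set_thread_limits is hypothetical.

```shell
# Hypothetical helper: cap OpenMP/BLAS thread pools at the CPUs Slurm granted per task.
# SLURM_CPUS_PER_TASK exists only when --cpus-per-task was requested; default to 1 otherwise.
set_thread_limits() {
    local n="${SLURM_CPUS_PER_TASK:-1}"
    export OMP_NUM_THREADS="$n" BLIS_NUM_THREADS="$n" MKL_NUM_THREADS="$n" OPENBLAS_NUM_THREADS="$n"
}

# Call once in the submission script, before the for loop.
set_thread_limits
```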
