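I submit jobs with Slurm, and at first everything worked fine. After I added an Rscript step for some simple filtering, the system load average suddenly climbed above 1000, which is very abnormal. I have been searching Google but have not found anything. My code is shown below: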
#!/bin/bash
#SBATCH --job-name=gtool
#SBATCH --partition=Compute
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH -a 1-22
for file in output/impute2/data_chr"${SLURM_ARRAY_TASK_ID}".*impute2
do
echo "$file" start!
# file prefix
foo=$(echo "$file" | awk -F "/" '{print $NF}' | awk -F . '{print $1"."$2}')
# use R for subset ID
Rscript src/detect.impute.snp.r "$file"
# gtool subset
gtool -S \
  --g "$file" \
  --s output/pre_phasing/chr"${SLURM_ARRAY_TASK_ID}".sample \
  --og output/impute2_subset/"$foo".gen \
  --inclusion output/impute2_subset/"$foo".SNPID.txt
# gtool GEN to PED
gtool -G \
  --g output/impute2_subset/"$foo".gen \
  --s output/pre_phasing/chr"${SLURM_ARRAY_TASK_ID}".sample \
  --ped output/impute2_subset_2_PLINK/"$foo".impute2.ped \
  --map output/impute2_subset_2_PLINK/"$foo".impute2.map \
  --chr "${SLURM_ARRAY_TASK_ID}" \
  --snp
echo "$file" fin!
done
Rscript:
options(tidyverse.quiet = TRUE)
options(readr.show_col_types = FALSE)
library("tidyverse")
args <- commandArgs(trailingOnly = TRUE)
fn <- args[1]
d <- read_delim(fn,
col_names = F,
delim = " ",
col_select = c(2, 4, 5))
fn.out <- str_sub(last(str_split(fn,"/")[[1]]), 1, -9)
d %>% mutate(len1 = nchar(X4),
len2 = nchar(X5)) %>%
arrange(desc(X4), desc(X5)) %>%
filter(len1==1, len2 == 1) %>%
select(X2) %>%
write_tsv(file = str_c("output/impute2_subset/", fn.out,".SNPID.txt"),
col_names = F)
scontrol also shows that my job uses only one CPU per task:
JobId=4873 ArrayJobId=4872 ArrayTaskId=1 JobName=gtool
......
NodeList=localhost
BatchHost=localhost
NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
......
R and gtool are single-threaded, no thread arguments are passed, and --ntasks is also set to 1. So where is the problem?
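Some libraries used by R and/or gtool, such as MKL, BLIS, or OpenBLAS, may be configured system-wide to use every core on the node, without detecting that Slurm has allocated only one CPU. You can try adding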
export OMP_NUM_THREADS=1
export BLIS_NUM_THREADS=1
export MKL_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
to the submission script, before the for loop.
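If you later request more than one CPU per task, the same idea can be written so the cap follows the allocation instead of being hard-coded. This is only a sketch: the `threads` variable is a name made up for illustration, and it assumes `SLURM_CPUS_PER_TASK` is only present in the job environment when `--cpus-per-task` was requested, so it falls back to 1 otherwise.

```shell
#!/bin/bash
# Sketch: tie the library thread caps to what Slurm allocated.
# "threads" is a hypothetical helper variable, not part of the original script.
# Fall back to 1 when SLURM_CPUS_PER_TASK is not set in the environment.
threads="${SLURM_CPUS_PER_TASK:-1}"

# Same four variables as above, now all driven by one value.
export OMP_NUM_THREADS="$threads"
export BLIS_NUM_THREADS="$threads"
export MKL_NUM_THREADS="$threads"
export OPENBLAS_NUM_THREADS="$threads"

echo "thread cap: $threads"
```

As with the plain exports, these lines would go before the for loop so that both Rscript and gtool inherit them.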