I have written this rule in my Snakefile:
rule extractfeat:
    input:
        '/path/to/file/{genome}.gbk'
    output:
        '{genome}_{locus_tag}_{gene}_{substrate}.fasta'
    shell:
        '''
        extractfeat {input} {output} -value {wildcards.genome}_{wildcards.locus_tag} -type CDS -describe product,locus_tag
        '''
I would like to list all the output files in a separate file, one row per output file, looking like this:
genome locus_tag gene substrate
PalbDSM11370 02121 susC pululan
PalbDSM11370 02122 susD pululan
PalbDSM11370 01210 susC arabinan
PalbDSM11370 01209 susD arabinan
PalbDSM11370 02015 susC bglukan
PalbDSM11370 02016 susD bglukan
PpalDSM17968 00934 susC pululan
PpalDSM17968 00933 susD pululan
PpalDSM17968 02229 susC arabinan
PpalDSM17968 02228 susD arabinan
PpalDSM17968 01622 susC bglukan
PpalDSM17968 01623 susD bglukan
PREVCOP 05864 susC pululan
PREVCOP 05865 susD pululan
PREVCOP 05852 susC arabinan
PREVCOP 05851 susD arabinan
PREVCOP 05099 susC bglukan
PREVCOP 05098 susD bglukan
PREVCOP 03646 susC ksiloglukan
PREVCOP 03645 susD ksiloglukan
Psp.AGR2160 00839 susC ksiloglukan
Thanks.

You can add this as another rule whose input depends on all the outputs of the generating rule:
rule tabulate:
    input: <ALL THE FASTA FILES>
    output: 'table.txt'
    run:
        # glob_wildcards returns one list of matched values per wildcard
        wcs = glob_wildcards('{genome}_{locus_tag}_{gene}_{substrate}.fasta')
        with open(output[0], 'w') as outfile:
            outfile.write('genome\tlocus_tag\tgene\tsubstrate\n')  # header
            for row in zip(*wcs):  # order will match order in wildcard string
                outfile.write('\t'.join(row) + '\n')
Or, if you already have lists of these wildcard values, you can write them out directly instead of using glob_wildcards.
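As a rough sketch of that second option, assuming the wildcard values are already available as a list of (genome, locus_tag, gene, substrate) tuples: the names ROWS, fasta_name and write_table below are made up for illustration, and the sample values are copied from the table in the question. It is plain Python, so it can be tried outside Snakemake first.

```python
import csv

# Hypothetical, hard-coded rows: (genome, locus_tag, gene, substrate).
# In a real workflow these would come from wherever the wildcard values are defined.
ROWS = [
    ("PalbDSM11370", "02121", "susC", "pululan"),
    ("PalbDSM11370", "02122", "susD", "pululan"),
    ("PREVCOP", "05864", "susC", "pululan"),
]


def fasta_name(genome, locus_tag, gene, substrate):
    """Build a file name following the output pattern of the extractfeat rule."""
    return f"{genome}_{locus_tag}_{gene}_{substrate}.fasta"


def write_table(rows, path):
    """Write a header line plus one tab-separated line per row."""
    with open(path, "w", newline="") as outfile:
        writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
        writer.writerow(["genome", "locus_tag", "gene", "substrate"])
        writer.writerows(rows)


if __name__ == "__main__":
    write_table(ROWS, "table.txt")
```

Inside the Snakefile, the same list could presumably also supply the input of the tabulate rule, for example as [fasta_name(*row) for row in ROWS], so that the table is only written once every FASTA file exists.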