This is the format of my input file:
@SRR2056440.1 1 length=100
TGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCTGGGGGGGAAGGTGTCATGGAGCCCCCTACGATTCCCA
+SRR2056440.1 1 length=100
BCBFFFEFHHHHHJJJJJJIJJJJJJJJIJHHIJJIIJJJJJIJJIJJJJJJJJFHIJJJHHHHHHFDDDBDDD>>ACDEDDDDDDDDDDDDDDDDDEDD
@SRR2056440.2 2 length=100
CTGCCGCCACCGCAGCAGCCACAGGCAGAGGAGGACGAGGACGACTGGGAATCGTAGGGGGCTCCATGACACCTTCCCCCCCAGACCCAGACTTGGGCCA
+SRR2056440.2 2 length=100
CCCFFFFFHHHHHJJJJJJJJJJJIJIJIGJGGIGGJIJJEHFEDDDDDDDDDDABDDDDDDDDDDDDDDADDDDDDDDDDDCDDDDDDBBDDCDDBDD@
@SRR2056440.3 3 length=100
TCTGCCGCCACCGCAGCAGCCACAGGCAGAGGAGGACGAGGACGACTGGGAATCGTAGGGGGCTCCATGACACCTTCCCCCCCAGACCCAGACTTGGGCC
+SRR2056440.3 3 length=100
CCCFFFFFHGHHHJJJJJIJJJJJJIJJIJJJIJJIIIGIJ<CDBCDDDDDDDDDDDDDDDDDDDDDDDDDDDDDCDDDDDDDDDDDDDDDDDDCDCBDD
This is the command I want to run:
cat input.fq | awk 'NR%4==2{sum+=length($0);nr++;sumsq+=length($0)*length($0)}END{printf"%.1f\t%.1f\n",sum/nr,sqrt(sumsq/nr-(sum/nr)**2)}'
And the output of the command:
100,0 0,0
I want to run this command from a Python script using subprocess. I have made several attempts but cannot figure it out. This is my latest attempt:
awk_comm = r"""'NR%4==2{sum+=length($0);nr++;sumsq+=length($0)*length($0)}END{printf"%.1f\t%.1f\n",sum/nr,sqrt(sumsq/nr-(sum/nr)**2)}'"""
cmd = ['cat', 'input.fq', '|', 'awk', awk_comm]
p2 = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
out1, err = p2.communicate()
Edit:
I don't see any error in the output. It just hangs and runs forever.
The following worked for me.
>>> awk_comm = r"""cat input.fq | awk 'NR%4==2{sum+=length($0);nr++;sumsq+=length($0)*length($0)}END{printf"%.1f\t%.1f\n",sum/nr,sqrt(sumsq/nr-(sum/nr)**2)}'"""
>>> p2 = subprocess.Popen(awk_comm, stdout=subprocess.PIPE,shell=True)
>>> res = p2.communicate()
>>> res
('100.0\t0.0\n', None)
There is no need for shell=True here. Just configure your subprocess.Popen object to do everything you were otherwise using the shell for:
import subprocess

# the original awk code, with whitespace added for readability
awk_command = r"""
NR%4==2 {
    sum+=length($0);
    nr++;
    sumsq+=length($0)*length($0)
}
END {
    printf "%.1f\t%.1f\n", sum/nr, sqrt(sumsq/nr-(sum/nr)**2)
}
"""
p2 = subprocess.Popen(
    ['awk', awk_command],
    stdin=open('input.fq', 'r'),  # pass a file handle to input.fq directly on awk's stdin
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE)
out1, err = p2.communicate()
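If you would rather keep the two-stage `cat | awk` shape without involving a shell, the pipe can be wired up by hand with two Popen objects. This is only a minimal sketch: the helper name run_pipeline is hypothetical and not part of subprocess, and it assumes Python 3.7+ for text=True.

```python
import subprocess

def run_pipeline(first_cmd, second_cmd):
    """Run `first_cmd | second_cmd` without a shell; return the second command's stdout as text."""
    first = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    second = subprocess.Popen(
        second_cmd,
        stdin=first.stdout,      # connect the first process's stdout to the second's stdin
        stdout=subprocess.PIPE,
        text=True,
    )
    first.stdout.close()         # drop our copy so `first` sees a broken pipe if `second` exits early
    out, _ = second.communicate()
    first.wait()
    return out
```

Called as run_pipeline(['cat', 'input.fq'], ['awk', awk_command]), this should behave like the original pipeline, assuming cat, awk and input.fq are available.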
By default, Python does not use a shell to run commands... but the pipe is evaluated by the shell!! You need to pass shell=True:
p2 = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
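One caveat, which applies to POSIX systems: when shell=True is combined with a list, only the first item is used as the command string, and the remaining items become arguments to the shell itself rather than to the command. With the list from the question, that would leave a bare cat waiting on standard input, which would explain the hang. A small sketch using echo as a harmless stand-in:

```python
import subprocess

# List + shell=True on POSIX: effectively runs  /bin/sh -c 'echo' 'hello',
# so 'hello' becomes the shell's $0 and echo itself receives no arguments.
as_list = subprocess.run(['echo', 'hello'], shell=True,
                         stdout=subprocess.PIPE, text=True)

# Single string + shell=True: the shell parses the whole command line.
as_string = subprocess.run('echo hello', shell=True,
                           stdout=subprocess.PIPE, text=True)

print(repr(as_list.stdout))    # only a newline on POSIX
print(repr(as_string.stdout))  # 'hello' followed by a newline
```

So with shell=True, the whole pipeline should be passed as one string rather than as a list.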
You can use the commands module for this:
import commands
awk_comm = r"""'NR%4==2{sum+=length($0);nr++;sumsq+=length($0)*length($0)}END{printf"%.1f\t%.1f\n",sum/nr,sqrt(sumsq/nr-(sum/nr)**2)}'"""
p1 = commands.getoutput('cat input.fq | awk ' + awk_comm)
print p1
Hope this helps.
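The commands module exists only in Python 2. In Python 3 the closest equivalent is subprocess.getoutput, which likewise runs a command string through the shell. A minimal sketch with a placeholder pipeline (the echo and tr commands are stand-ins, not the command from the question):

```python
import subprocess

# subprocess.getoutput passes the string to the shell and returns
# stdout and stderr combined as text, with the trailing newline stripped.
result = subprocess.getoutput('echo ACGT | tr A-Z a-z')
print(result)
```

For the command in the question, the string passed in would be 'cat input.fq | awk ' + awk_comm.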