子进程打开文件和管道 awk 命令



这是我的输入文件格式:

@SRR2056440.1 1 length=100
TGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCTGGGGGGGAAGGTGTCATGGAGCCCCCTACGATTCCCA
+SRR2056440.1 1 length=100
BCBFFFEFHHHHHJJJJJJIJJJJJJJJIJHHIJJIIJJJJJIJJIJJJJJJJJFHIJJJHHHHHHFDDDBDDD>>ACDEDDDDDDDDDDDDDDDDDEDD
@SRR2056440.2 2 length=100
CTGCCGCCACCGCAGCAGCCACAGGCAGAGGAGGACGAGGACGACTGGGAATCGTAGGGGGCTCCATGACACCTTCCCCCCCAGACCCAGACTTGGGCCA
+SRR2056440.2 2 length=100
CCCFFFFFHHHHHJJJJJJJJJJJIJIJIGJGGIGGJIJJEHFEDDDDDDDDDDABDDDDDDDDDDDDDDADDDDDDDDDDDCDDDDDDBBDDCDDBDD@
@SRR2056440.3 3 length=100
TCTGCCGCCACCGCAGCAGCCACAGGCAGAGGAGGACGAGGACGACTGGGAATCGTAGGGGGCTCCATGACACCTTCCCCCCCAGACCCAGACTTGGGCC
+SRR2056440.3 3 length=100
CCCFFFFFHGHHHJJJJJIJJJJJJIJJIJJJIJJIIIGIJ<CDBCDDDDDDDDDDDDDDDDDDDDDDDDDDDDDCDDDDDDDDDDDDDDDDDDCDCBDD

这是我要执行的命令:

cat input.fq | awk 'NR%4==2{sum+=length($0);nr++;sumsq+=length($0)*length($0)}END{printf"%.1ft%.1fn",sum/nr,sqrt(sumsq/nr-(sum/nr)**2)}'

和命令的输出:

100,0 0,0

我想使用子进程在 python 脚本中执行该命令。我已经做了几次尝试,但我无法弄清楚,这是我的最后一次尝试:

awk_comm = r"""'NR%4==2{sum+=length($0);nr++;sumsq+=length($0)*length($0)}END{printf"%.1ft%.1fn",sum/nr,sqrt(sumsq/nr-(sum/nr)**2)}'"""
cmd = ['cat', 'input.fq', '|', 'awk', awk_comm]
p2 = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
out1, err = p2.communicate()

编辑:

我在输出中看不到任何错误。它被卡住了,永远运行。

以下内容对我有用。

>>> awk_comm = r"""cat input.fq | awk 'NR%4==2{sum+=length($0);nr++;sumsq+=length($0)*length($0)}END{printf"%.1ft%.1fn",sum/nr,sqrt(sumsq/nr-(sum/nr)**2)}'"""
>>> p2 = subprocess.Popen(awk_comm, stdout=subprocess.PIPE,shell=True)
>>> res = p2.communicate()
>>> res
('100.0t0.0n', None)

这里没有必要shell=True。只需设置您的subprocess.Popen对象即可执行您原本使用 shell 的所有操作:

# the original awk code, with whitespace added for readability
awk_command = r"""
NR%4==2 {
sum+=length($0);
nr++;
sumsq+=length($0)*length($0)
}
END {
printf "%.1ft%.1fn", sum/nr, sqrt(sumsq/nr-(sum/nr)**2)
}
"""
p2 = subprocess.Popen(
['awk', awk_command],
stdin=open('input.fq', 'r'),  # pass a file handle to input.fq directly on awk's stdin
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
out1, err = p2.communicate()

默认情况下,Python 不使用 shell 来运行命令...但是管道是由壳体评估的!!您需要通过shell=True

p2 = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)

您可以使用命令模块来实现此目的:

import commands
awk_comm = r"""'NR%4==2{sum+=length($0);nr++;sumsq+=length($0)*length($0)}END{printf"%.1ft%.1fn",sum/nr,sqrt(sumsq/nr-(sum/nr)**2)}'"""
p1 = commands.getoutput('cat input.fq | awk ' + awk_comm)
print p1

希望这有帮助

最新更新