Python: use values from file A to search for lines in another file

Newbie question

I have two files.
File A: a file with a list of items (apple, pear, orange).
File B: a file with all the fruit in the world (1,000,000 lines).

In unix, I would grep apple from file B and return all the results.

In unix, I would:
1. grep apple from file B >> fruitfound.txt
2. grep pear from file B >> fruitfound.txt
3. grep orange from file B >> fruitfound.txt

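I want a Python script that takes the values from file A, searches file B for them, and writes out the output. Note: file B will contain green apple, red apple, and yellow apple, and I want all three results written to fruitfound.txt.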

Kindest regards,

Kornity

grep -f $patterns $filename does exactly that. There is no need for a Python script.

To find lines that contain any of the given keywords in Python, you could use a regular expression:

import re

def fgrep(words, lines):
    # note: allow a partial match e.g., 'b c' matches 'ab cd'
    # Python 3: the built-in filter() is lazy, like itertools.ifilter in Python 2
    return filter(re.compile("|".join(map(re.escape, words))).search, lines)
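
To see what the partial-match note means in practice, here is a minimal sketch that runs the same idea on a few in-memory lines. It assumes Python 3 (built-in filter instead of ifilter), and the keyword list and sample lines are made up for illustration; they stand in for file A and file B.

```python
import re

def fgrep(words, lines):
    # Build one alternation pattern from the escaped keywords and
    # lazily keep every line in which the pattern is found anywhere.
    pattern = re.compile("|".join(map(re.escape, words)))
    return filter(pattern.search, lines)

# Hypothetical sample data standing in for file A (keywords) and file B (lines).
keywords = ["apple", "pear", "orange"]
fruit_lines = [
    "green apple\n",
    "banana\n",
    "red apple\n",
    "yellow apple\n",
    "pineapple\n",
    "pear\n",
    "kiwi\n",
]

matches = list(fgrep(keywords, fruit_lines))
print("".join(matches), end="")
```

All three apple varieties are kept, as the question asks. Because the match is a substring match, "pineapple" is kept as well; if that is not wanted, wrapping the alternation in word boundaries (\b) would be one way to tighten it.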

To turn it into a command-line script:

import sys

def main():
    with open(sys.argv[1]) as kwfile:  # read keywords from given file
        # one keyword per line
        keywords = [line.strip() for line in kwfile if line.strip()]
    if not keywords:
        sys.exit("no keywords are given")
    if len(sys.argv) > 2:  # read lines to match from given file
        with open(sys.argv[2]) as file:
            sys.stdout.writelines(fgrep(keywords, file))
    else:  # read lines from stdin
        sys.stdout.writelines(fgrep(keywords, sys.stdin))

if __name__ == "__main__":
    main()

Example:

$ python fgrep.py a b > fruitfound.txt

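There are more efficient algorithms, such as the Aho-Corasick algorithm, but this approach takes less than a second to filter millions of lines on my machine, so it may be good enough (grep is several times faster). Surprisingly, acora, which is based on the Aho-Corasick algorithm, was slower on the data I tried.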
