Python: I need to print each matching sentence only once



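I am trying to take a txt file containing 10 sentences (all words) and pass it as a command-line argument to a Python script. I want to print the sentences that contain any of the words listed in dic. The script below finds the matching sentences, but it prints each sentence as many times as it finds a matching word.

Is there another approach I can use to do this? Also, I don't want the output lines to be separated by a blank line.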

import sys
dic=["april","aprils","ask","aug","augee","august","bid","bonds","brent","buy","call","callroll","calls","chance","checking","close","collar","condor","cover"]
f=open(sys.argv[1])
for i in range(0,10):
    line=f.readline()
    words=line.split()
    if len(words) > 3:
        for j in words:
            if j in dic:
                print(line)

Output:

eighty two is what i am bidding on the brent
eighty two is what i am bidding on the brent
eighty two is what i am bidding on the brent
call on sixty five to sixty seventy
call on sixty five to sixty seventy
call on sixty five to sixty seventy
call on sixty five to sixty seventy
call on sixty five to sixty seventy
no nothing is going on double
i am bidding on the option for eighty five
i am bidding on the option for eighty five
recross sell seller selling sept
recross sell seller selling sept
recross sell seller selling sept
recross sell seller selling sept
recross sell seller selling sept
blah blah blah blah close

Desired output:

eighty two is what i am bidding on the brent
call on sixty five to sixty seventy
no nothing is going on double
i am bidding on the option for eighty five
recross sell seller selling sept
blah blah blah blah close
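
  1. Suppressing duplicate lines in the output

    Add a break right after the print(line) statement so that the inner for loop over the line's words stops at the first match.

  2. Removing the extra blank lines

    The extra newline comes from the line itself: f.readline() returns the string with its trailing \n still attached, and print() then adds a second one. Remove it with line.strip(). Iterating with for line in f keeps the trailing \n as well, but it is still the better way to read the file, because it handles any number of lines instead of exactly 10.

Here is the code:

# f and dic are defined as in the question
for line in f:
    words=line.split()
    if len(words) > 3:
        for j in words:
            if j in dic:
                print(line.strip())  # strip the trailing newline
                break  # print each line only once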
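The same idea can be tried without a file. The following is a minimal sketch, not the poster's script: the function name matching_lines and the min_words parameter are made up for illustration, and the sample sentences are shortened. It uses any(), which stops at the first matching word and so has the same effect as the break.

```python
def matching_lines(lines, vocabulary, min_words=4):
    """Return each line that contains a vocabulary word, once, without its newline."""
    result = []
    for line in lines:
        words = line.split()
        # Same condition as len(words) > 3 in the question.
        if len(words) < min_words:
            continue
        # any() short-circuits on the first hit, like break in the explicit loop.
        if any(word in vocabulary for word in words):
            result.append(line.rstrip("\n"))
    return result


if __name__ == "__main__":
    # Hypothetical sample data standing in for the input file.
    sample = [
        "call on sixty five to sixty seventy\n",
        "buy the brent and close the call\n",
        "nothing to see here today\n",
        "bid ask\n",
    ]
    vocabulary = ["call", "brent", "buy", "close", "bid", "ask"]
    for text in matching_lines(sample, vocabulary):
        print(text)
```

The second sample line contains four vocabulary words but is returned only once, and the last line is skipped because it has fewer than four words.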

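I would suggest creating a set for the dictionary of words and a second set containing the words of each line in the file. You can then compare the two sets with & to get their intersection, that is, the words they have in common. This is more efficient than looping over a list to look for matching words, because a set lookup does not have to scan every element.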

import sys
# a set gives fast membership tests and supports intersection with &
dic={"april","aprils","ask","aug","augee","august","bid","bonds","brent","buy","call","callroll","calls","chance","checking","close","collar","condor","cover"}
filename = sys.argv[1]
with open(filename) as f:
    for line in f:
        s = set(line.split())
        # a non-empty intersection means the line contains at least one word from dic
        if s & dic:
            print(line.strip())
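As a variation on the set approach, set.isdisjoint answers the question "do these share any word?" directly, without building the intersection first. This is only a sketch under assumptions: the helper name lines_with_keywords is hypothetical, the keyword set is a shortened copy of dic, and io.StringIO stands in for the real file. Any iterable of lines, including an open file, can be passed in the same way.

```python
import io

# Shortened keyword set, for illustration only.
KEYWORDS = {"april", "ask", "bid", "brent", "buy", "call", "close"}


def lines_with_keywords(lines, keywords=KEYWORDS):
    """Yield each line (whitespace stripped) that shares a word with keywords."""
    for line in lines:
        # isdisjoint() is True when the two collections have no element in common.
        if not keywords.isdisjoint(line.split()):
            yield line.strip()


if __name__ == "__main__":
    # StringIO behaves like an open text file for iteration purposes.
    fake_file = io.StringIO(
        "blah blah blah blah close\n"
        "nothing relevant in this one\n"
        "call on sixty five to sixty seventy\n"
    )
    for text in lines_with_keywords(fake_file):
        print(text)
```

Because each line is tested as a whole, a line is yielded at most once no matter how many keywords it contains.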
