Avoid outputting the same word twice in Python



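I'm fairly new to Python. I have the following code, which imports a CSV file, processes it, and writes each word in the file to its own line in a new CSV file. For example:

CSV file: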

The dog is black and has a black collar

Output CSV file:

The
dog
is
black
and
has
a
black
collar

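However, if the same word appears more than once on the same line, I'd like the output not to contain that word twice. For example:

Desired output CSV file: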

The
dog
is
black
and
has
a
collar

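Notice that the word "black" isn't printed twice? That's what I want. If anyone could help, that would be great. As I said, I'm still new to Python and figuring things out. Thanks in advance!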

for row in file1:
    row = row.strip()
    row = row.lower()
    for stopword in internal_stop_words:
        if stopword in row:
            row = row.replace(stopword," ")
    for word in row.split():
        writer.writerow([word])

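If you don't need the words printed in their original order, you can try set(), which keeps only unique items but does not preserve their order: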

>>> s = 'The dog is black and has a black collar'
>>> s.split()
['The', 'dog', 'is', 'black', 'and', 'has', 'a', 'black', 'collar']
>>> set(s.split())
{'is', 'has', 'black', 'and', 'dog', 'collar', 'a', 'The'}
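If the original word order does matter, one possible alternative is to build a dict from the words and read its keys back, since dict keys are unique and keep insertion order in Python 3.7 and later. This is only a minimal sketch; the helper name `unique_in_order` is made up for illustration.

```python
def unique_in_order(words):
    """Return the words with duplicates dropped, keeping first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(words))

s = 'The dog is black and has a black collar'
print(unique_in_order(s.split()))
```

Note that this comparison is case-sensitive, so 'The' and 'the' would count as different words unless the text is lowercased first.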

Try accumulating the words you have already seen in a set, and only output words that are not yet in it:

# before you process the file
seen_words = set()
# ... later, in the loop...
for word in row.split():
    if word not in seen_words:
        writer.writerow([word])
        seen_words.add(word)
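Because seen_words is created once before the file is processed, the snippet above writes a word only the first time it appears anywhere in the file. If the goal is to drop repeats only within a single row, as the question describes, the set could be reset for each row. The sketch below assumes that reading of the question; the function name `write_unique_words` and the in-memory sample rows are made up for the demo and stand in for the real files.

```python
import csv
import io

def write_unique_words(rows, writer):
    """Write each word of each row on its own CSV line, skipping repeats within a row."""
    for row in rows:
        seen_words = set()  # reset per row, so only same-row repeats are dropped
        for word in row.strip().lower().split():
            if word not in seen_words:
                writer.writerow([word])
                seen_words.add(word)

# In-memory stand-ins for the real input and output files (assumed for the demo)
rows = ["The dog is black and has a black collar", "A black cat"]
out = io.StringIO()
write_unique_words(rows, csv.writer(out, lineterminator="\n"))
print(out.getvalue())
```

Here "black" is written once for the first row and once more for the second row, because each row starts with an empty set.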

I actually ended up solving my own problem! Thanks for the suggestions. Here's how I did it:

for row in file1:
    row = row.strip()
    row = row.lower()
    for stopword in internal_stop_words:
        if stopword in row:
            row = row.replace(stopword," ")
    mylist = row.split()
    newlist = []
    for word in mylist:
        if word not in newlist:
            newlist.append(word)
            writer.writerow([word])
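One thing to watch in the loop above: row.replace(stopword, " ") replaces every occurrence of the stop word as a substring, so a short stop word such as "a" would also cut into words like "black". If that is not intended, a possible alternative is to split the row first and compare whole words. This is a sketch only; the contents of internal_stop_words are not shown in the question, so the list below is a made-up example, and `remove_stop_words` is a hypothetical helper name.

```python
def remove_stop_words(row, stop_words):
    """Return the row's lowercased words, dropping those that exactly match a stop word."""
    stop_words = set(stop_words)  # set membership tests are fast
    return [word for word in row.strip().lower().split() if word not in stop_words]

# Hypothetical stop-word list, for illustration only
internal_stop_words = ["a", "is"]
words = remove_stop_words("The dog is black and has a black collar", internal_stop_words)
print(words)
```

The remaining words can then go through the same duplicate check as before, since this step only filters stop words and does not remove repeats.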
