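I'm fairly new to Python. I have the following code, which reads a CSV file, processes it, and writes each word from the file onto its own row in a new CSV file. For example:
CSV file: The dog is black and has a black collar
Output CSV file: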
The
dog
is
black
and
has
a
black
collar
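However, if the same word appears more than once in a row, I don't want the output to contain that word twice. For example:
Desired output CSV file: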
The
dog
is
black
and
has
a
collar
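Notice that the word "black" isn't printed twice? That's what I want. If anyone could help me, that would be great. Like I said, I'm still new to Python and figuring things out as I go. Thanks in advance!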
for row in file1:
    row = row.strip()
    row = row.lower()
    for stopword in internal_stop_words:
        if stopword in row:
            row = row.replace(stopword, " ")
    for word in row.split():
        writer.writerow([word])
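If you don't need the words printed in their original order, you can try set(). Note that a set does not keep the order of the words: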
>>> s = 'The dog is black and has a black collar'
>>> s.split()
['The', 'dog', 'is', 'black', 'and', 'has', 'a', 'black', 'collar']
>>> set(s.split())
{'is', 'has', 'black', 'and', 'dog', 'collar', 'a', 'The'}
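If the order does matter, one possible alternative (not part of the suggestion above) is dict.fromkeys(), which keeps only the first occurrence of each key and, in Python 3.7+, preserves insertion order. A minimal sketch, where unique_in_order is just a name made up for illustration:

```python
def unique_in_order(words):
    """Return the words with duplicates removed, keeping first occurrences in order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(words))

s = 'The dog is black and has a black collar'
result = unique_in_order(s.split())
print(result)
# ['The', 'dog', 'is', 'black', 'and', 'has', 'a', 'collar']
```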
Try accumulating the words you have already seen in a set, and only output the words that aren't in the set yet:
# before you process the file
seen_words = set()
# ... later, in the loop...
for word in row.split():
    if word not in seen_words:
        writer.writerow([word])
        seen_words.add(word)
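Here is a rough end-to-end sketch of that idea. It assumes duplicates should only be dropped within a single row (as in the question), so the set is reset for each row; create it once before the loop instead if you want to drop repeats across the whole file. The function name write_unique_words and the in-memory io.StringIO objects are stand-ins for illustration, not the real files from the question:

```python
import csv
import io

def write_unique_words(rows, writer):
    """Write each word of each row on its own CSV line, skipping repeats within a row."""
    for row in rows:
        seen_words = set()  # reset per row, so repeats are only dropped within one row
        for word in row.strip().lower().split():
            if word not in seen_words:
                writer.writerow([word])
                seen_words.add(word)

# In-memory stand-ins for the input and output files (assumption for this demo).
file1 = io.StringIO("The dog is black and has a black collar\n")
out = io.StringIO()
write_unique_words(file1, csv.writer(out, lineterminator="\n"))
print(out.getvalue())
```

Checking membership in a set takes roughly constant time, so this stays fast even for rows with many words.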
I actually ended up solving my own problem! Thanks for the suggestions. Here's how I did it:
for row in file1:
    row = row.strip()
    row = row.lower()
    for stopword in internal_stop_words:
        if stopword in row:
            row = row.replace(stopword, " ")
    mylist = row.split()
    newlist = []
    for word in mylist:
        if word not in newlist:
            newlist.append(word)
            writer.writerow([word])
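One thing to watch for in the code above: row.replace(stopword, " ") replaces substrings, so a stop word like "is" would also be cut out of "this". A possible variation is to filter whole words after splitting. The contents of internal_stop_words aren't shown in this thread, so the stop word list below is made up, and unique_non_stop_words is a hypothetical helper name:

```python
def unique_non_stop_words(row, stop_words):
    """Return the row's words in order, without repeats and without whole-word stop words."""
    stop_words = {w.lower() for w in stop_words}
    result = []
    seen = set()
    for word in row.strip().lower().split():
        # skip stop words (compared as whole words) and words already emitted
        if word in stop_words or word in seen:
            continue
        seen.add(word)
        result.append(word)
    return result

# Example stop words are an assumption for this demo.
print(unique_non_stop_words("This dog is black and has a black collar", ["is", "a"]))
# ['this', 'dog', 'black', 'and', 'has', 'collar']
```

Each returned word could then be passed to writer.writerow([word]) as before.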